Machine Learning and AI Seminar Series
About
This seminar series invites experts from across the country to Columbia to present the latest cutting-edge research in Machine Learning and Artificial Intelligence. Running the gamut from theory to empirics, the seminar provides a single, unified space that brings together the ML/AI community at Columbia. Topics of interest include, but are not limited to, Language Models, Optimization for Deep Learning, Reinforcement and Imitation Learning, Learning Theory, Interpretability and AI Alignment, AI for Science, Probabilistic ML, and Bayesian Methods.
Hosts: DSI Foundations of Data Science Center; Department of Statistics, Graduate School of Arts and Sciences
Registration
Registration is preferred for all CUID holders. If you do not have an active CUID, registration is required and is due by 12:00 PM the day prior to the seminar. Unfortunately, we cannot guarantee entrance to Columbia’s Morningside campus if you register after that deadline. Thank you for understanding!
Please contact Erin Elliott, DSI Events and Marketing Coordinator, at ee2548@columbia.edu with any questions.
Next Seminar
Date: Friday, October 17, 2025 (11:00 AM – 12:00 PM)
Location: Columbia School of Social Work, Room C05

Volodymyr Kuleshov, Joan Eliasoph, M.D. Assistant Professor, Department of Computer Science, Cornell Tech and Cornell University
Title: Discrete Diffusion Language Models
Abstract: While diffusion generative models excel at high-quality image generation, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods on discrete data such as text or biological sequences. Our work takes steps towards closing this gap via a simple and effective framework for discrete diffusion. This framework is simple to understand—it optimizes a mixture of denoising (e.g., masking) losses—and can be seen as endowing BERT-like models with principled samplers and variational estimators of log-likelihood. Crucially, our algorithms are not constrained to generate data sequentially, and therefore have the potential to improve long-term planning, controllable generation, and sampling speed.
In the context of language modeling, our framework enables deriving masked diffusion language models (MDLMs), which achieve a new state-of-the-art among diffusion models, and approach AR quality. Combined with novel extensions of classifier-free and classifier-based guidance mechanisms, these algorithms are also significantly more controllable than AR models. Discrete diffusion extends beyond language to science, where it forms the basis of a new generation of DNA foundation models. Our largest models focus on plants and set a new state of the art in genome annotation, while also enabling effective generation. Discrete diffusion models hold the promise to advance progress in generative modeling and its applications in language understanding and scientific discovery.
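For readers unfamiliar with masked diffusion, the sketch below illustrates the kind of objective the abstract describes: a mixture of masking (denoising) losses over random masking levels, optimized by a BERT-style bidirectional model. This is a hedged illustration only; the tiny Transformer, vocabulary size, and 1/t loss weighting are simplifying assumptions, not the speaker's implementation.

```python
# Minimal sketch of a masked-diffusion training objective: sample a masking
# level t, mask each token independently with probability t, and train a
# bidirectional denoiser to reconstruct the masked positions with a
# 1/t-weighted cross-entropy (an ELBO-style weighting).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID, SEQ_LEN = 1000, 0, 64  # illustrative sizes; id 0 reserved for [MASK]

class TinyDenoiser(nn.Module):
    """Stand-in for any BERT-style bidirectional model."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))  # (B, L, VOCAB) logits

def masked_diffusion_loss(model, x0):
    """Monte Carlo estimate of a mixture-of-masking-losses objective."""
    B, L = x0.shape
    t = torch.rand(B, 1).clamp(min=1e-3)      # masking level per sequence, t ~ U(0, 1)
    mask = torch.rand(B, L) < t               # mask each token independently w.p. t
    xt = torch.where(mask, torch.full_like(x0, MASK_ID), x0)
    logits = model(xt)
    ce = F.cross_entropy(logits.reshape(-1, VOCAB), x0.reshape(-1),
                         reduction="none").reshape(B, L)
    # Keep only masked positions, weight by 1/t, and average per token.
    return ((ce * mask) / t).mean()

model = TinyDenoiser()
x0 = torch.randint(1, VOCAB, (8, SEQ_LEN))    # toy batch of "clean" sequences
print(masked_diffusion_loss(model, x0).item())
```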
Upcoming Seminar Schedule (Fall 2025)
Please save the dates, times, and locations below if you plan to attend the seminar series.
Friday, October 24, 2025 (11:00 AM – 12:00 PM)
- Location: School of Social Work, Room C05
- Speaker: Furong Huang, Associate Professor, Department of Computer Science at the University of Maryland
Friday, November 7, 2025 (11:00 AM – 12:00 PM)
- Location: School of Social Work, Room C05
- Speaker: Florentin Guth, Faculty Fellow, Center for Data Science, NYU; and Research Fellow, Center for Computational Neuroscience, Flatiron Institute
Friday, November 21, 2025 (11:00 AM – 12:00 PM)
- Location: School of Social Work, Room C05
- Speaker: Andrej Risteski, Associate Professor, Machine Learning Department, Carnegie Mellon University
Friday, December 12, 2025 (11:00 AM – 12:00 PM)
- Location: School of Social Work, Room 311/312
- Speaker: Jason Weston, Research Scientist at Facebook, NY, and Visiting Research Professor at NYU
Archive: Speaker Abstracts
Title: Gradient Descent Dominates Ridge: A Statistical View on Implicit Regularization
Abstract: A key puzzle in deep learning is how simple gradient methods find generalizable solutions without explicit regularization. This talk discusses the implicit regularization of gradient descent (GD) through the lens of statistical dominance. Using least squares as a clean proxy, we present two surprising findings.
First, GD dominates ridge regression: with comparable regularization, the excess risk of GD is always within a constant factor of ridge, but ridge can be polynomially worse even when tuned optimally. Second, GD is incomparable with SGD. While it is known that for certain problems GD can be polynomially better than SGD, the reverse is also true: we construct problems, inspired by benign overfitting theory, where optimally stopped GD is polynomially worse. Finally, GD dominates SGD for a significant subclass of problems — those with fast and continuously decaying covariance spectra — which includes all problems satisfying the standard capacity condition.
This is joint work with Peter Bartlett, Sham Kakade, Jason Lee, and Bin Yu.
Talk Date: October 6, 2025
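As a companion to the abstract above, here is a small numerical sketch of the GD-versus-ridge comparison on least squares. It is illustrative only: it assumes a toy Gaussian design with a decaying covariance spectrum and the standard heuristic correspondence lambda ≈ 1/(eta·k) between ridge regularization and k steps of gradient descent at learning rate eta. It is not the speakers' construction or proof.

```python
# Compare early-stopped gradient descent with "comparably regularized" ridge
# regression on a synthetic least-squares problem (all sizes are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 50, 0.5
spectrum = np.arange(1, d + 1, dtype=float) ** -1.0        # decaying covariance spectrum
X = rng.standard_normal((n, d)) * np.sqrt(spectrum)        # Gaussian design with that spectrum
w_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_star + sigma * rng.standard_normal(n)

def excess_risk(w):
    # Population excess risk under the Gaussian design above.
    diff = w - w_star
    return diff @ (spectrum * diff)

def gd(k, eta=0.01):
    # Gradient descent on the empirical squared loss, stopped after k steps.
    w = np.zeros(d)
    for _ in range(k):
        w -= eta * X.T @ (X @ w - y) / n
    return w

def ridge(lam):
    # Ridge estimator with penalty lam.
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

for k in [10, 100, 1000]:
    lam = 1.0 / (0.01 * k)  # heuristic correspondence lambda = 1 / (eta * k)
    print(f"k={k:5d}  GD risk={excess_risk(gd(k)):.4f}  "
          f"ridge(lam={lam:.2f}) risk={excess_risk(ridge(lam)):.4f}")
```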