Toward a Practical Theory of Deep Learning: Feature Learning in Deep Neural Networks and Backpropagation-free Algorithms that Learn Features

Mikhail Belkin
Professor of Data Science, Halıcıoğlu Data Science Institute, Computer Science and Engineering at UC San Diego

Banatao Auditorium | 310 Sutardja Dai Hall
Wednesday, November 29, 2023, 4PM

Remarkable recent advances in deep neural networks are rapidly changing science and society. Never before has a technology been deployed so widely and so quickly with so little understanding of its fundamentals. I will argue that developing a fundamental mathematical theory of deep learning is necessary for a successful AI transition and, furthermore, that such a theory may well be within reach. I will discuss what such a theory might look like and some of its ingredients that we already have available.

In particular, I will discuss how deep neural networks of various architectures learn features and how the lessons of deep learning can be incorporated into non-backpropagation-based algorithms that we call Recursive Feature Machines. I will provide a number of experimental results on different types of data, including texts and images, as well as some connections to classical statistical methods, such as Iteratively Reweighted Least Squares.

Speaker Bio

Mikhail Belkin is a Professor at the Halıcıoğlu Data Science Institute and the Computer Science and Engineering Department at UCSD and an Amazon Scholar. He received his Ph.D. from the Department of Mathematics at the University of Chicago. His research interests are broadly in theory and applications of machine learning, deep learning and data analysis.

His well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral graph theory to data science. His more recent work has been concerned with understanding remarkable mathematical and statistical phenomena observed in deep learning. The empirical evidence necessitated revisiting some of the classical concepts in statistics and optimization, including the basic notion of over-fitting. One of his key findings has been the “double descent” risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation. His recent work focuses on understanding over-parameterization and feature learning in statistical models.