Tutorials


Mikhail Belkin

Theory of Deep Learning

Abstract

Generalization is the central topic of machine learning and data science. What patterns can be learned from observations, and how can we be sure that they extend to future, not yet seen, data? I will try to outline the arc of recent developments in the understanding of generalization in machine learning. These changes occurred largely due to empirical findings in neural networks, which necessitated revisiting the theoretical foundations of generalization. Classically, many analyses relied on the assumption that the training loss was an accurate proxy for the test loss. This turned out to be unfounded, as good practical predictors frequently have training loss that is much lower than the test loss. Theoretical developments, such as analyses of interpolation and double descent, have recently shed light on that issue. In view of that, a common practical prescription has become to mostly ignore the training loss and to adopt early stopping, that is, to stop model training once the validation loss plateaus. The recent discovery of emergent phenomena like grokking shows that this practice is also not generally justifiable: at least in some settings, the test loss up to a certain iteration may not be predictive of the test loss just a few iterations later. I will discuss why this presents a fundamental challenge to both the theory and practice of machine learning, and attempt to describe the current state of affairs.
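To make the early-stopping prescription mentioned in the abstract concrete, here is a minimal, self-contained sketch on a toy linear-regression problem; the data, learning rate, patience and tolerance are illustrative assumptions and are not part of the tutorial material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: noisy linear data split into train and validation sets.
X = rng.standard_normal((200, 20))
w_true = rng.standard_normal(20)
y = X @ w_true + 0.5 * rng.standard_normal(200)
X_tr, y_tr, X_val, y_val = X[:100], y[:100], X[100:], y[100:]

w = np.zeros(20)
lr, patience, tol = 0.01, 50, 1e-5
best_val, since_improvement = np.inf, 0

for it in range(10_000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # gradient of the training loss
    w -= lr * grad
    val = np.mean((X_val @ w - y_val) ** 2)          # validation loss as a proxy for test loss
    if val < best_val - tol:
        best_val, since_improvement = val, 0
    else:
        since_improvement += 1
    if since_improvement >= patience:                # stop once the validation loss plateaus
        break

print(f"stopped at iteration {it} with validation loss {best_val:.3f}")
```

Phenomena such as grokking are precisely the cases where a rule of this kind can stop too early, since the test behavior may change long after the validation loss appears flat.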

Biography

Mikhail Belkin is a Professor at the Halicioglu Data Science Institute and the Computer Science and Engineering Department at UCSD and an Amazon Scholar. Prior to that he was a Professor in the Department of Computer Science and Engineering and the Department of Statistics at the Ohio State University. He received his Ph.D. from the Department of Mathematics at the University of Chicago (advised by Partha Niyogi). His research interests are broadly in the theory and applications of machine learning, deep learning and data analysis. Some of his well-known work includes the widely used Laplacian Eigenmaps, Graph Regularization and Manifold Regularization algorithms, which brought ideas from classical differential geometry and spectral graph theory to data science. His more recent work has been concerned with understanding the remarkable mathematical and statistical phenomena observed in deep learning. The empirical evidence necessitated revisiting some of the classical concepts in statistics and optimization, including the basic notion of over-fitting. One of his key findings is the “double descent” risk curve that extends the textbook U-shaped bias-variance trade-off curve beyond the point of interpolation. His recent work focuses on understanding feature learning and over-parameterization in deep learning. Mikhail Belkin is an ACM Fellow and a recipient of an NSF CAREER Award and a number of best paper and other awards. He has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence and the Journal of Machine Learning Research. He is the editor-in-chief of the SIAM Journal on Mathematics of Data Science (SIMODS).


Siva Theja Maguluri

Reinforcement Learning and Stochastic Approximation

Abstract

We consider the problem of solving fixed-point equations of the form $F(x) = x$, where $F(\cdot)$ is an operator that is a contraction with respect to some norm (e.g., the $\ell^\infty$-norm). Stochastic approximation (SA) is an algorithm that was first proposed by Robbins and Monro to achieve this when only noisy samples of the operator are accessible. SA is at the heart of several machine learning algorithms today. The focus of this tutorial is on characterizing the rate of convergence of SA and on illustrating its use in applications such as reinforcement learning (RL).
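To make the setup concrete, here is a minimal sketch of the Robbins-Monro iteration applied to a toy contraction; the specific operator, noise level and step-size schedule below are illustrative assumptions, not the ones analyzed in the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(x):
    # A toy operator that is a contraction (Lipschitz constant 0.5),
    # so the fixed-point equation F(x) = x has a unique solution.
    return 0.5 * np.tanh(x) + 1.0

x = np.zeros(3)                          # arbitrary initial iterate
for k in range(1, 20_000):
    alpha = 1.0 / k                      # diminishing step size (Robbins-Monro)
    noisy_F = F(x) + 0.1 * rng.standard_normal(x.shape)  # only noisy samples of F are available
    x = x + alpha * (noisy_F - x)        # SA update toward the fixed point

print("approximate fixed point:", x, "residual:", np.linalg.norm(F(x) - x))
```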

We first present finite-time bounds on the mean-square convergence error and show how they translate to the case of a few RL algorithms. The key to obtaining our mean-square bounds is the construction of a smooth Lyapunov/potential function based on the infimal convolution and the generalized Moreau envelope, which enables us to control the stochastic errors in the SA algorithm. We will also present an overview of several generalizations of SA that can be obtained using this type of Lyapunov argument, such as SA under seminorm-contractive operators. Finally, we present high-probability bounds on the error tail, which are also obtained using exponential supermartingales in conjunction with the Moreau envelope and a novel bootstrapping approach.
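As a hint of how the SA template specializes to RL, the following sketch runs synchronous Q-learning on a small, made-up MDP; since the Bellman optimality operator is an $\ell^\infty$-contraction, each update is an SA iteration driven by a noisy evaluation of that operator. The MDP, step sizes and iteration counts are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # random transition kernel P[s, a, s']
R = rng.uniform(0, 1, size=(nS, nA))            # random rewards

Q = np.zeros((nS, nA))
for k in range(1, 20_000):
    alpha = 1.0 / (1.0 + 0.01 * k)              # diminishing step size (illustrative)
    # Noisy evaluation of the Bellman optimality operator: for each (s, a),
    # sample one next state s' ~ P(.|s, a) instead of taking the expectation.
    s_next = np.array([[rng.choice(nS, p=P[s, a]) for a in range(nA)]
                       for s in range(nS)])
    T_hat = R + gamma * Q[s_next].max(axis=-1)  # \hat{T}Q(s,a) = r(s,a) + gamma * max_a' Q(s',a')
    Q = Q + alpha * (T_hat - Q)                 # SA update: Q_{k+1} = Q_k + alpha_k (\hat{T}Q_k - Q_k)

# Compare with the exact fixed point Q* computed by value iteration.
Q_star = np.zeros((nS, nA))
for _ in range(1_000):
    Q_star = R + gamma * P @ Q_star.max(axis=-1)
print("sup-norm error:", np.abs(Q - Q_star).max())
```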

Biography

Prof. Siva Theja Maguluri is the Fouts Family Early Career Professor and an Associate Professor in the H. Milton Stewart School of Industrial & Systems Engineering at Georgia Tech. Before joining Georgia Tech, Prof. Maguluri spent two years in the Stochastic Processes and Optimization group within the Mathematical Sciences Department at the IBM T. J. Watson Research Center. He completed a Ph.D. in Electrical and Computer Engineering (ECE) from the University of Illinois at Urbana-Champaign (UIUC) in 2014, under the supervision of Prof. R. Srikant, following an MS in ECE also advised by Prof. Srikant and Prof. Bruce Hajek. Prof. Maguluri additionally holds an MS in Applied Mathematics from UIUC and a B.Tech in Electrical Engineering from the Indian Institute of Technology Madras. Prof. Maguluri has received several notable awards, including the NSF CAREER award in 2021, the 2017 Best Publication in Applied Probability Award from the INFORMS Applied Probability Society, and second prize in the 2020 INFORMS Junior Faculty Interest Group (JFIG) Best Paper Competition. Joint research with his students received the Stephen S. Lavenberg Best Student Paper Award at IFIP Performance 2021. Recognized for teaching excellence, Prof. Maguluri received the Student Recognition of Excellence in Teaching: Class of 1934 CIOS Award in 2020 for ISyE 6761 and the CTL/BP Junior Faculty Teaching Excellence Award in 2020, presented by the Center for Teaching and Learning at Georgia Tech.


Prabha Mandayam

Quantum Information Theory and Quantum Error Correction

Abstract

This set of lectures aims to give a flavour of quantum information theory and quantum error-correcting codes. Details below:

  • Lecture 1: Quantum states and gates, Qubit and Bloch sphere, pure and mixed states, quantum measurements: projective measurements and positive operator-valued measures (POVMs), Distinguishing quantum states.
  • Lecture 2: The No-cloning Theorem, Entanglement and Bell states, Teleportation and Superdense Coding. Von Neumann Entropy and basic properties: subadditivity and negative conditional entropy.
  • Lecture 3: Quantum Channels: completely positive trace-preserving (CPTP) maps, Choi theorem and Kraus representation. Examples: quantum bit-flip and phase-flip channels, depolarising channel, quantum erasure channel and amplitude-damping channel (a short numerical sketch of the bit-flip channel follows this lecture list).
  • Lecture 4: Quantum Error Correction: Quantum repetition code, Shor’s 9-qubit code, Stabilizer codes.
  • Lecture 5: Quantum Information Processing: The Holevo bound (proof sketch), brief discussion on classical and quantum capacities of quantum channels.
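To complement the material of Lecture 3, the following short NumPy sketch illustrates the Kraus representation of the bit-flip channel and checks the CPTP completeness relation; the flip probability and input state are assumed values chosen only for illustration, not part of the lecture notes.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)   # Pauli-X (bit flip)

p = 0.2                                         # assumed flip probability
K0 = np.sqrt(1 - p) * I2                        # Kraus operators of the
K1 = np.sqrt(p) * X                             # bit-flip channel

# Completeness relation for a CPTP map: sum_i K_i^dagger K_i = I
assert np.allclose(K0.conj().T @ K0 + K1.conj().T @ K1, np.eye(2))

def bit_flip(rho):
    """E(rho) = (1 - p) rho + p X rho X, written in Kraus form."""
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# Apply the channel to the pure state |0><0|.
zero = np.array([1, 0], dtype=complex)
rho_out = bit_flip(np.outer(zero, zero.conj()))
print("trace preserved:", np.isclose(np.trace(rho_out).real, 1.0))
print("fidelity <0|E(rho)|0> =", (zero.conj() @ rho_out @ zero).real)  # equals 1 - p
```
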
Biography

Prof. Prabha Mandayam is an Associate Professor in the Department of Physics at the Indian Institute of Technology, Madras. Previously, she was an Inspire Faculty Fellow at the Chennai Mathematical Institute and a Postdoctoral Fellow with the Optics and Quantum Information Group at the Institute of Mathematical Sciences. She obtained her Ph.D. in Physics from the Institute for Quantum Information and Matter at Caltech under the supervision of John Preskill. Prof. Mandayam’s research interests lie in the field of quantum computing and quantum information theory. Specifically, she focuses on quantum error correction, the interplay between quantum foundations and quantum cryptography, and using quantum information as a tool to explore fundamental questions in theoretical physics.