The course will primarily cover online learning in stochastic environments. The emphasis will be on proving formal performance guarantees for algorithms, as well as fundamental limits on the performance of any algorithm.
The first half of the course will focus on variants of the multi-armed bandit problem:
- Regret minimization: algorithms and information-theoretic lower bounds (the regret objective is written out after the topic list below)
- Pure exploration: fixed budget and fixed confidence
- Linear bandits, contextual bandits, Bayesian bandits (Thompson sampling; a minimal sketch follows the topic list)
- Background on MDPs
- Markovian bandits (rested and restless); Gittins and Whittle index
- Reinforcement learning
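
For reference, the regret objective behind the first topic can be written out as follows; the notation (mu_i for the arm means, A_t for the arm played at round t, R_T for the regret over horizon T) is a standard convention assumed here, not fixed by the course description.

```latex
% Expected regret over T rounds of a K-armed stochastic bandit.
% \mu_i : mean reward of arm i, \qquad \mu^* = \max_{1 \le i \le K} \mu_i,
% A_t   : arm selected at round t by the algorithm.
R_T \;=\; T\,\mu^* \;-\; \mathbb{E}\!\left[\, \sum_{t=1}^{T} \mu_{A_t} \right]
```

Regret minimization seeks policies for which R_T grows sublinearly in T; the information-theoretic lower bounds in the same topic establish rates below which no algorithm can go.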
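
To make the Thompson sampling topic concrete, here is a minimal sketch for a Bernoulli bandit with independent Beta(1, 1) priors; the arm means, horizon, and function name below are illustrative assumptions, not anything prescribed by the course.

```python
import numpy as np

def thompson_sampling(true_means, horizon, rng=np.random.default_rng(0)):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors on each arm."""
    k = len(true_means)
    successes = np.ones(k)  # Beta posterior parameter alpha for each arm
    failures = np.ones(k)   # Beta posterior parameter beta for each arm
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior ...
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))  # ... and play the arm with the largest sample
        reward = rng.binomial(1, true_means[arm])
        successes[arm] += reward       # conjugate posterior update
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Example: 3 arms, best arm has mean 0.7
print(thompson_sampling([0.3, 0.5, 0.7], horizon=10_000))
```

Because the Beta prior is conjugate to Bernoulli rewards, the posterior update reduces to two counter increments per round.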
Text/References
- T. Lattimore and C. Szepesvári, “Bandit Algorithms,” Cambridge University Press, 2020. Draft available at http://tor-lattimore.com/downloads/book/book.pdf.
- A. Slivkins, “Introduction to Multi-Armed Bandits,” NOW Publishers, 2019.
- D. Russo et al., “A Tutorial on Thompson Sampling,” NOW Publishers, 2018.
- S. Shalev-Shwartz, “Online Learning and Online Convex Optimization,” NOW Publishers, 2011.
- Contemporary research papers