We first present a couple of our algorithms, based on gradient approximations, for the problem of model-free or data-driven optimization. It turns out that these algorithms, as with many other algorithms in the literature, involve biases that do not vanish asymptotically, although the resulting error remains in a bounded region. Previous analyses in the literature have focussed on the case where the errors vanish asymptotically, and the standard procedure has been to show that the algorithm converges to the set of function minima. Here, however, the bias terms do not vanish asymptotically. We will sketch a treatment in which we show that such algorithms track a differential inclusion (i.e., dynamics governed by a set-valued map) instead of an ODE. We also give a set of verifiable sufficient conditions which ensure that such an algorithm remains stable. These conditions extend similar conditions for regular stochastic approximation algorithms due to Borkar and Meyn. In the second part of the talk, we will consider the problem of off-policy control in reinforcement learning, i.e., finding an optimal policy under the constraint that the training data comes from a simulator that only outputs data (states, actions, rewards and next states) generated by a given policy and nothing else. Our algorithm here is a gradient-free and model-free algorithm derived from the cross entropy method. An important characteristic of this algorithm is that it converges almost surely to the set of global minima. This is joint work with my former Ph.D. students, Dr. Arunselvan Ramaswamy and Dr. Ajin George Joseph.
Shalabh Bhatnagar is currently a Professor in the Department of Computer Science and Automation at the Indian Institute of Science. On the theory side, his research interests include stochastic approximation algorithms, stochastic optimization, and reinforcement learning. He also has interests in designing algorithms and techniques for specific engineering applications in domains that include autonomous systems, smart grids, vehicular traffic control, and communication and wireless networks.