Decisions are increasingly taken by both humans and machine learning models. However, machine learning models are currently trained for full automation: they are not aware that some of the decisions may still be taken by humans. In this talk, we tackle two problems towards making machine learning models aware of the presence of human decision-makers.

In the first problem, we introduce ridge regression under human assistance and show that it is NP-hard. We then derive an alternative representation of the corresponding objective function as a difference of non-decreasing submodular functions. Building on this representation, we further show that the objective is non-decreasing and satisfies α-submodularity, a recently introduced notion of approximate submodularity. These properties allow a simple and efficient greedy algorithm, sketched below, to enjoy approximation guarantees for solving the problem. Experiments on synthetic and real-world data from two important applications, medical diagnosis and content moderation, demonstrate that the greedy algorithm beats several competitive baselines.

In the second problem, we consider switching between humans and machines in the context of reinforcement learning. Reinforcement learning algorithms have mostly been developed and evaluated under the assumption that they will operate in a fully autonomous manner, that is, they will take all actions. However, in safety-critical applications, full autonomy faces a variety of technical, societal, and legal challenges, which have precluded the use of reinforcement learning policies in real-world systems. In this work, our goal is to develop algorithms that, by learning to switch control between machines and humans, allow existing reinforcement learning policies to operate under different automation levels. More specifically, we first formally define the learning-to-switch problem using finite-horizon Markov decision processes. Then, we show that, if the human policy is known, we can find the optimal switching policy directly by solving a set of recursive equations using backward induction, as sketched below. However, in practice, the human policy is often unknown. To overcome this, we develop an algorithm that uses upper confidence bounds on the human policy to find a sequence of switching policies whose total regret with respect to the optimal switching policy is sublinear. Simulation experiments on two important tasks in autonomous driving, lane keeping and obstacle avoidance, demonstrate the effectiveness of the proposed algorithms and illustrate our theoretical findings.
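As a concrete illustration of the first problem, the following Python sketch implements the kind of greedy subset selection the abstract alludes to. The objective used here is a stand-in assumption: the ridge training error saved by outsourcing a subset of samples to humans, charged a constant per-sample human cost. The names (machine_error, outsourcing_gain, greedy_select) and the exact objective are hypothetical; the talk's actual formulation may differ.

    import numpy as np

    def machine_error(X, y, idx, lam=1.0):
        # Ridge training error of the machine on the samples in idx.
        idx = list(idx)
        if not idx:
            return 0.0
        Xs, ys = X[idx], y[idx]
        d = X.shape[1]
        w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)
        return float(np.sum((Xs @ w - ys) ** 2))

    def outsourcing_gain(X, y, S, human_cost=0.1, lam=1.0):
        # Stand-in objective: error saved by outsourcing the samples in S
        # to humans, minus a constant per-sample human cost (an assumption).
        n = X.shape[0]
        rest = set(range(n)) - S
        return (machine_error(X, y, range(n), lam)
                - machine_error(X, y, rest, lam)
                - human_cost * len(S))

    def greedy_select(X, y, budget, **kw):
        # Standard greedy maximization under a cardinality constraint. For
        # non-decreasing alpha-submodular objectives, this enjoys an
        # approximation guarantee that degrades gracefully with alpha.
        S = set()
        for _ in range(budget):
            candidates = set(range(X.shape[0])) - S
            if not candidates:
                break
            gains = {i: outsourcing_gain(X, y, S | {i}, **kw) for i in candidates}
            S.add(max(gains, key=gains.get))
        return S

The greedy loop evaluates the stand-in objective O(n) times per iteration; the α-submodularity property mentioned above is what lets such a simple loop retain an approximation guarantee even though plain submodularity need not hold.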
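For the second problem, here is a minimal backward-induction sketch, assuming a small tabular finite-horizon MDP in which, at every step, a switching policy decides whether the human or the machine acts. All names (P, cost, switch_cost) are illustrative stand-ins: each agent's own policy is assumed known and folded into a per-agent state-transition kernel, and changing who is in control incurs a fixed penalty. When the human policy is unknown, the talk's algorithm instead works with upper confidence bounds on the human's behavior; that part is not sketched here.

    import numpy as np

    def optimal_switching(P, cost, switch_cost, horizon):
        # P: dict agent -> (S x S) transition matrix induced by that agent
        #    acting (the agent's policy is folded into the kernel).
        # cost: dict agent -> length-S per-step cost vector.
        # switch_cost: fixed penalty for changing who is in control.
        # Returns pi[t][s][prev] (who should act) and the cost-to-go V.
        agents = list(P.keys())
        S = next(iter(P.values())).shape[0]
        # V[t][s][prev]: optimal cost-to-go at time t in state s, given
        # that `prev` (or None at the start) was in control at t - 1.
        V = {horizon: {s: {a: 0.0 for a in agents + [None]} for s in range(S)}}
        pi = {}
        for t in reversed(range(horizon)):
            V[t], pi[t] = {}, {}
            for s in range(S):
                V[t][s], pi[t][s] = {}, {}
                for prev in agents + [None]:
                    q = {}
                    for a in agents:
                        sw = switch_cost if prev is not None and a != prev else 0.0
                        future = np.array([V[t + 1][sp][a] for sp in range(S)])
                        q[a] = cost[a][s] + sw + P[a][s] @ future
                    best = min(q, key=q.get)
                    V[t][s][prev], pi[t][s][prev] = q[best], best
        return pi, V

    # Toy usage: two states; the machine is cheap in state 0 but costly
    # in state 1, where the human is the safer choice.
    P = {"machine": np.array([[0.9, 0.1], [0.2, 0.8]]),
         "human":   np.array([[0.8, 0.2], [0.6, 0.4]])}
    cost = {"machine": np.array([0.0, 1.0]), "human": np.array([0.3, 0.3])}
    pi, V = optimal_switching(P, cost, switch_cost=0.05, horizon=10)

Solving these recursions backwards from the horizon is exactly the backward induction the abstract refers to; with |S| states and two agents, each time step costs O(|S|^2).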
Abir De is an assistant professor in the CSE Department at IIT Bombay. Prior to this, he was a postdoctoral researcher at the Max Planck Institute for Software Systems in Kaiserslautern, Germany, from January 2018. He received his PhD from the Department of Computer Science and Engineering, IIT Kharagpur, in July 2018. During that time, he was a part of the Complex Network Research Group (CNeRG) at IIT Kharagpur and was supported by a Google India PhD Fellowship (2013). Prior to that, he completed his BTech in Electrical Engineering and his MTech in Control Systems Engineering, both at IIT Kharagpur. His main research interests broadly lie in the modeling, learning, and control of networked dynamical processes. Very recently, he started working on human-centric machine learning. His publications can be accessed at https://abir-de.github.io/pub.html.