Gating is a key feature in modern neural networks, including LSTMs, GRUs, and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, where several experts make regression decisions and gating controls how to weigh those decisions in an input-dependent manner. Despite having such a prominent role in both modern and classical machine learning, very little is understood about parameter recovery of mixture-of-experts, since gradient descent and EM algorithms are known to get stuck in local optima in such models. In this work, we perform a careful analysis of the optimization landscape and show that with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately. A key idea underpinning our results is the design of two distinct loss functions, one for recovering the expert parameters and another for recovering the gating parameters. We establish the first sample complexity results for parameter recovery in this model for any algorithm, and demonstrate significant performance gains over standard loss functions in numerical experiments.
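To make the setup concrete, here is a minimal sketch of the gated mixture-of-experts layer the abstract describes: each expert produces a linear regression output, and a softmax gate weighs those outputs in an input-dependent manner. The function name, shapes, and the choice of linear experts with a softmax gate are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def moe_predict(x, expert_W, gate_W):
    """Gated mixture-of-experts regression (illustrative sketch).

    x:        (d,) input vector
    expert_W: (k, d) matrix, one linear regressor per expert (assumed form)
    gate_W:   (k, d) gating parameters, softmaxed into mixture weights
    """
    logits = gate_W @ x                       # input-dependent gating scores
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    expert_outputs = expert_W @ x             # each expert's regression output
    return weights @ expert_outputs           # gated (convex) combination

# Example: with zero gating parameters the gate is uniform,
# so the output is the plain average of the expert outputs.
x = np.array([1.0, 2.0])
expert_W = np.array([[1.0, 0.0],   # expert 1 returns x[0]
                     [0.0, 1.0]])  # expert 2 returns x[1]
gate_W = np.zeros((2, 2))
y = moe_predict(x, expert_W, gate_W)  # (1.0 + 2.0) / 2 = 1.5
```

The local-optima issue the abstract refers to arises when the expert and gating parameters are fit jointly through this nonconvex composition; the proposed approach instead uses two separate loss functions, one targeting `expert_W` and one targeting `gate_W`.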
Ashok is a 5th-year graduate student in the ECE department at UIUC, advised by Prof. Pramod Viswanath. He obtained his Masters in ECE (advised by Prof. Yihong Wu) from UIUC in 2017 and his Bachelors in EE (advised by Prof. Vivek Borkar) with a minor in Mathematics from IIT Bombay in 2015. His current research interests are theoretical and algorithmic aspects of machine learning and information theory. He is a recipient of the Best Paper Award at ACM MobiHoc 2019. He has won several graduate student awards and fellowships, including the Joan and Lalit Bahl Fellowship and the Sundaram Seshu International Student Fellowship, and was a finalist for the Qualcomm Innovation Fellowship 2018. Outside the convex hull of research activities, he enjoys learning new languages, watching and reading about international films, reading history, and remembering trivia. For more details, please visit: http://makkuva2.web.engr.illinois.edu/