We study the infinite-horizon risk-sensitive control problem for discrete-time Markov decision processes with compact metric state and action spaces. We derive a variational formula for the optimal growth rate of the risk-sensitive reward. This parallels the usual variational formulation of the long-term average reward in the absence of risk sensitivity, given by the ergodic control viewpoint, with an additional relative entropy penalty on occupation measures. It can also be viewed as an extension of the Donsker-Varadhan characterization of the Perron-Frobenius eigenvalue of a positive operator. The problem of determining the optimal growth rate of the risk-sensitive reward is thereby recast as the maximization of a concave function over a convex set.
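To convey the general shape of such a formula (the notation below is illustrative, not taken from the paper): for a controlled transition kernel $p^u(x,\cdot)$, running reward $r$, and occupation measures $\mu$ on the state space $S$, a Donsker-Varadhan-type variational characterization of the optimal growth rate $\Lambda^*$ typically takes the form
$$
\Lambda^* \;=\; \sup_{u}\,\sup_{\mu \in \mathcal{P}(S)} \left[ \int_S r\, d\mu \;-\; \inf_{q:\,\mu q = \mu} \int_S D\big(q(x,\cdot)\,\big\|\,p^u(x,\cdot)\big)\, \mu(dx) \right],
$$
where the inner infimum runs over transition kernels $q$ that leave $\mu$ invariant, and $D(\cdot\,\|\,\cdot)$ denotes relative entropy. The bracketed objective is concave in $\mu$, which is the sense in which the problem becomes a concave maximization over a convex set.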
Venkat Anantharam is on the faculty of the EECS department at U. C. Berkeley, where he has been since 1994. Prior to that he was on the faculty of the School of Electrical Engineering at Cornell University, after receiving his doctorate from U. C. Berkeley in 1986.