TITLE: Information Relaxation in Stochastic Optimal Control
ABSTRACT:
Dynamic programming is a principal method for analyzing stochastic optimal control problems. However, exact dynamic programming is often computationally intractable in large-scale problems due to the "curse of dimensionality". Various approximation methods have been proposed to address this issue; they typically generate suboptimal policies, and it is generally difficult to tell how far these policies are from optimal. This thesis studies stochastic control problems from a duality perspective and generates upper bounds on the maximal expected reward, which complement the lower bounds associated with suboptimal policies: if the gap between the lower and upper bounds is small, the suboptimal policy must be close to optimal. The approach considered in this thesis is called "information relaxation": it relaxes the non-anticipativity constraint, which requires decisions to depend only on the information available to the decision maker, and imposes a penalty that punishes such violations.
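To fix ideas, here is a sketch of the weak duality underlying this approach; the notation is ours and is introduced only for illustration. Suppose rewards $r_t(x_t, a_t)$ are collected over a horizon $t = 0, \dots, T$, let $\mathcal{A}_F$ denote the set of non-anticipative policies, and let $z$ be a penalty satisfying $\mathbb{E}[z(\alpha)] \le 0$ for every $\alpha \in \mathcal{A}_F$. Then

\[ \sup_{\alpha \in \mathcal{A}_F} \mathbb{E}\left[ \sum_{t=0}^{T} r_t(x_t, \alpha_t) \right] \;\le\; \mathbb{E}\left[ \max_{a_0, \dots, a_T} \left( \sum_{t=0}^{T} r_t(x_t, a_t) - z(a) \right) \right], \]

where the inner maximization is carried out with full knowledge of the realized uncertainty (the perfect information relaxation). Any suboptimal policy thus yields a lower bound, and any feasible penalty yields an upper bound.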
In the first part of the thesis, we study the interaction of Lagrangian relaxation and information relaxation in weakly coupled dynamic programs. A commonly studied approach builds on the property that these high-dimensional problems can be decoupled by dualizing the resource constraints via Lagrangian relaxation. We generalize the information relaxation approach to improve upon the Lagrangian bound in an infinite-horizon setting. We also develop a computational method for tackling large-scale problems and provide an insightful interpretation and a performance guarantee.
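As a sketch of a standard form of the Lagrangian bound in this setting (our notation, stated for a discount factor $\gamma \in (0,1)$ and a resource budget $b$): dualizing the coupling constraints $\sum_{i=1}^{N} A^i(x^i_t, a^i_t) \le b$ with a multiplier $\lambda \ge 0$ decouples the joint problem into single-project problems,

\[ V(x) \;\le\; V^{\lambda}(x) \;=\; \frac{\lambda^{\top} b}{1 - \gamma} + \sum_{i=1}^{N} V^{\lambda, i}(x^i), \]

where each $V^{\lambda, i}$ solves an independent dynamic program for project $i$ with modified reward $r^i(x^i, a^i) - \lambda^{\top} A^i(x^i, a^i)$. Bounds of this type are the starting point that the information relaxation approach then tightens.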
In the second part, we formulate information relaxation-based duality for an important class of continuous-time decision-making models, controlled Markov diffusions, which are widely used in portfolio optimization and risk management. We find that this continuous-time model admits an optimal penalty in compact form, namely an Itô stochastic integral, which enables us to construct approximate penalties of simple form and achieve tight dual bounds. We demonstrate the approach in a dynamic portfolio choice problem subject to position and consumption constraints.
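A sketch of this compact form, in notation we introduce for illustration: for a controlled diffusion $dx_t = \mu(x_t, u_t)\,dt + \sigma(x_t, u_t)\,dW_t$ on $[0, T]$ with value function $V(t, x)$, a penalty of the form

\[ z(u) \;=\; \int_0^T \nabla_x V(t, x_t)^{\top} \sigma(x_t, u_t)\, dW_t \]

has zero expectation under every non-anticipative control, and with the true value function it renders the dual bound tight. Replacing $V$ by an approximate value function $\hat{V}$, for instance one derived from a heuristic policy, yields a computable penalty of the same simple form.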
In the third part, we consider optimal stopping of discrete-time, continuous-state, partially observable Markov processes. We develop a filtering-based dual approach that relies on the martingale duality formulation of the optimal stopping problem and on particle filtering. We carry out an error analysis and illustrate the effectiveness of our method on an example of pricing American options under partially observed stochastic volatility.
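To make the duality concrete, a sketch in our own notation: if $h_t$ denotes the (discounted) payoff from stopping at time $t$, projected onto the observation filtration in the partially observable setting, the martingale dual formulation states

\[ V_0 \;=\; \inf_{M \in \mathcal{M}_0} \mathbb{E}\left[ \max_{0 \le t \le T} \left( h_t - M_t \right) \right], \]

where $\mathcal{M}_0$ is the set of martingales adapted to the observation filtration with $M_0 = 0$. Any candidate martingale, such as one constructed from particle-filter approximations of conditional expectations, therefore yields an upper bound on the option price.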