|
Rayadurgam Srikant: Part I: Introduction to Average-Cost MDPs; Part II: Approximate Policy Iteration in Average-Cost MDPs
Abstract:
In the first part, we will introduce discounted and average-cost MDPs. Discounted-cost MDPs are more commonly studied in the reinforcement learning and approximate dynamic programming literature, and we will discuss some reasons why average-cost MDPs are harder to study in this context. In the second part, we will present some recent results on approximate dynamic programming and reinforcement learning for average-cost MDPs, specifically new results on approximate policy iteration and soft policy iteration.
|