ECSE 506: Stochastic Control and Decision Theory
Aditya Mahajan
Winter 2022
About  Lectures  Notes  Coursework
Whenever possible, I will post notes on some of the material covered in class, but that is not guaranteed. This is a graduate class and you are responsible for taking notes in class and reading the appropriate chapters of the textbooks.
The notes will be updated as we move along in the course. Please check the dates on the first page to keep track. If you find any typos/mistakes in the notes, please let me know. Pull requests welcome.
 Week 1

Introduction and course overview.
 Reading: Kumar and Varaiya (Ch 1, 2); Bertsekas (Ch 1).
 Introduction to stochastic optimization
 Newsvendor problem
 Introduction to MDPs and dynamic programming
 Matrix formulation of MDPs
 Assignment 1 posted.
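As a quick illustration of the newsvendor problem from this week, the sketch below compares a brute-force search over order quantities with the critical-fractile solution. This is a minimal example, not taken from the course notes: the Poisson demand, prices, and variable names are all assumptions made for illustration.

```python
import numpy as np

# Newsvendor sketch (illustrative numbers): order q units at cost c each,
# sell min(q, demand) at price p; leftover stock is worthless.
# The optimal order is the critical fractile q* = F^{-1}((p - c) / p),
# where F is the demand CDF.

rng = np.random.default_rng(0)
p, c = 10.0, 4.0                            # selling price, purchase cost
demand = rng.poisson(lam=20, size=100_000)  # simulated Poisson(20) demand

def expected_profit(q):
    sales = np.minimum(q, demand)
    return (p * sales - c * q).mean()

# Brute-force search over candidate order quantities
qs = np.arange(0, 60)
q_best = qs[np.argmax([expected_profit(q) for q in qs])]

# Critical-fractile solution from the empirical demand distribution
fractile = (p - c) / p
q_star = int(np.quantile(demand, fractile))

print(q_best, q_star)  # the two answers should (approximately) agree
```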
 Week 2

Examples of MDPs
 Optimal gambling
 Inventory management
 Assignment 2 posted.
 Assignment 1 due
 Week 3

Proof of optimality of dynamic programming
 See notes on MDP theory
 Cost vs. reward, costs that depend on the next state, minimax optimality, risk-sensitive control.
 See notes on MDP theory
 Assignment 3 posted.
 Assignment 2 due
 Week 4

Monotonicity in Markov decision processes
 Stochastic dominance, monotonicity, and submodularity
 Sufficient conditions for value function and optimal policy to be monotone.
 Example of power-delay tradeoff in wireless communication
 Assignment 4 posted.
 Assignment 3 due
 Week 5

Introduction to infinite horizon discounted problems
 Reward shaping
 Infinite horizon inventory management
 Service migration in mobile edge computing
 Week 6

Bellman operators, value iteration, and policy iteration
 Monotonicity, contraction, and their implications
 Value iteration and stopping conditions
 Policy iteration and convergence guarantees
 See notes on infinite horizon MDPs
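The value-iteration loop with a span-based stopping condition can be sketched on a toy MDP as follows. The two-state, two-action model below is made up for illustration; only the algorithm (repeated application of the Bellman optimality operator until the update is small, then acting greedily) reflects the material above.

```python
import numpy as np

# Toy MDP (illustrative numbers): P[a, s, s'] is the transition probability
# of action a from state s to s'; r[s, a] is the expected one-step reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma, tol = 0.9, 1e-6

V = np.zeros(2)
while True:
    # Bellman optimality operator: Q(s,a) = r(s,a) + gamma * E[V(s') | s, a]
    Q = r + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    # Stopping rule: this bound guarantees the greedy policy is tol-optimal
    if np.abs(V_new - V).max() < tol * (1 - gamma) / (2 * gamma):
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy with respect to the final V
print(V, policy)
```

Because the Bellman operator is a gamma-contraction in the sup norm, the loop is guaranteed to terminate; policy iteration would instead alternate policy evaluation and greedy improvement.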
 Week 7

Properties of value functions
 Lipschitz continuity of MDPs
 C4 properties
 Week 8

Approximate dynamic programming
 Week 9

Model approximation
 Integral probability metrics and comparing MDPs.
 State quantization (see notes on state aggregation)
 State compression (see notes on state aggregation)
 Week 10

POMDPs
 Basic model of POMDPs (see notes on POMDPs)
 Information state
 Sequential hypothesis testing
 Week 11

Approximations for POMDPs
 Approximate information states
 Week 12

Decentralized control
 Decentralized POMDPs / Dynamic team problems
 Common information approach
 Delayed sharing and control sharing models
This entry was last updated on 06 Jan 2022