Stochastic Control
Assignment 8
Author: Aditya Mahajan
Affiliation: McGill University
Updated: March 10, 2026
Exercise 16.2 from the notes on MDP algorithms.