Assignment 2
Inventory Management. Consider a variation of the inventory management problem considered in class. Suppose that the inventory \(S_t\) and the actions \(A_t\) takes values in the set \(\mathbb S = \{0, 1, \dots, L \}\), and the dynamics are given by \[ S_{t+1} = \bigl[ S_t + A_t - W_t \bigr]_0^L, \] where \([ s ]_{0}^L\) is a function which clips the values between \(0\) and \(L\), i.e., \[ [ s ]_0^L = \begin{cases} 0, & \text{if $s < 0$} \\ s, & \text{if $0 \le s \le L$} \\ L, & \text{if $s > L$}. \end{cases} \] The demand \(W_t\) takes values in the set \(\ALPHABET W = \{0, 1, 2\}\) with probability mass function \(P_W = [ 0.1, 0.7, 0.2 ]\).
The per-step cost is given by \[ c_t(S_t, A_t, W_t) = (S_t + A_t - W_t)^2. \] Note that in this case, the inventoy cannot be negative but the per-step cost does take shortage into account.
Write the dynamic programming equation and numerically solve it for a horizon \(T = 10\) and inventory level \(L = 5\). Plot the value function and the optimal policy for \(t=1\) and \(t=2\).
Exercise 5.1 from notes on Finite horizon MDPs.
Exercise 5.2 from notes on Finite horizon MDPs.