7  Inventory Management


Keywords

inventory management, post-decision state, base-stock policy, structural results

Summary

The inventory management example illustrates that a dynamic programming formulation is useful even when a closed form solution does not exist. This model also introduces the idea of post-decision state, which is useful in many contexts.

Imagine a retail store that stockpiles products in its warehouse to meet random demand. Suppose the store procures new stock at the end of each day (there is no lead time, so stock ordered at the end of a day is available the next morning). Let

- \(S_t \in \integers\) denote the stock level at the beginning of day \(t\),
- \(A_t \in \integers_{\ge 0}\) denote the stock ordered at the end of day \(t\), and
- \(W_t \in \integers_{\ge 0}\) denote the demand during day \(t\).

The random variables \(\{W_t\}_{t \ge 1}\) are i.i.d. with known probability distribution.

Excess demand is backlogged and filled when new inventory becomes available. Thus, the stock evolves according to \[S_{t+1} = S_t + A_t - W_t,\] where negative stock denotes backlogged demand.

The cost incurred during day \(t\) consists of two components:

- a procurement cost of \(p A_t\), where \(p\) is the per-unit procurement price, and
- a holding or shortage cost of \(h(S_t + A_t - W_t)\), where \[ h(s) = \begin{cases} c_h\, s, & \text{if } s \ge 0, \\ -c_s\, s, & \text{if } s < 0, \end{cases} \] with \(c_h\) denoting the per-unit holding cost and \(c_s\) the per-unit shortage cost.
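To make the model concrete, here is a minimal Python sketch of the per-period cost and the stock update; the function names and default parameter values are illustrative choices, not part of the original notes:

```python
def holding_shortage_cost(s, c_h=2.0, c_s=5.0):
    """Holding cost c_h per unit of stock held; shortage cost c_s per unit backlogged."""
    return c_h * max(s, 0) - c_s * min(s, 0)

def step(s, a, w, p=1.0, c_h=2.0, c_s=5.0):
    """One day of the inventory model: order a units, then observe demand w."""
    s_next = s + a - w                      # negative stock denotes backlogged demand
    cost = p * a + holding_shortage_cost(s_next, c_h, c_s)
    return s_next, cost
```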

7.1 Dynamic programming decomposition

\(\def\S{\mathbb{S}}\)

The above model is a Markov decision process. Therefore, the optimal solution is given by dynamic programming.

Instead of \(\integers\), we use \(\S\) to denote the set of possible states. The reason is that we will later consider the case where the state space is the set of reals, and the same equations will continue to apply.

Proposition 7.1 (Dynamic programming) Define the following value functions \(V^*_t \colon \S \to \reals\) \[V^*_{T+1}(s) = 0\] and for \(t \in \{T, \dots, 1\}\) \[ Q^*_t(s, a) = p a + \EXP[ h(s + a - W_t) + V^*_{t+1}( s + a - W_t ) ]\] and \[ \begin{align*} V^*_t(s) &= \min_{a \in \S_{\ge 0}} Q^*_t(s,a) \\ π^*_t(s) &= \arg \min_{a \in \S_{\ge 0}} Q^*_t(s,a) \end{align*} \] Then the strategy \(π^* = (π^*_1, \dots, π^*_T)\) is optimal.
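When the state space is a finite set of integers, the backward induction of Proposition 7.1 can be implemented directly. The Python sketch below is illustrative only: the truncation of the state space to \(\{S_{\min}, \dots, S_{\max}\}\), the cap \(A_{\max}\) on orders, and the name `solve_dp` are assumptions made here, not part of the notes.

```python
import numpy as np

def solve_dp(T, demand_pmf, p, c_h, c_s, S_min, S_max, A_max):
    """Backward induction for the finite-horizon inventory DP (Proposition 7.1).

    demand_pmf[w] = P(W = w) for w = 0, 1, ..., len(demand_pmf) - 1.
    States are truncated to {S_min, ..., S_max}; orders to {0, ..., A_max}.
    Returns value functions V[t] and policies pi[t] for t = 1, ..., T.
    """
    states = np.arange(S_min, S_max + 1)
    demands = np.arange(len(demand_pmf))
    h = lambda s: c_h * np.maximum(s, 0) - c_s * np.minimum(s, 0)

    V = np.zeros((T + 2, len(states)))              # V[T+1] = 0
    pi = np.zeros((T + 1, len(states)), dtype=int)

    for t in range(T, 0, -1):
        for i, s in enumerate(states):
            best_q, best_a = np.inf, 0
            for a in range(A_max + 1):
                s_next = np.clip(s + a - demands, S_min, S_max)   # keep on the grid
                q = p * a + demand_pmf @ (h(s + a - demands) + V[t + 1, s_next - S_min])
                if q < best_q:
                    best_q, best_a = q, a
            V[t, i], pi[t, i] = best_q, best_a
    return V, pi
```

Note that the inner loops evaluate the expectation separately for every \((s, a)\) pair; the post-decision-state reformulation discussed next avoids this duplication.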

It is possible to simplify the above dynamic program by exploiting a feature of the model. Notice that the dynamics can be split into two parts: \[ \begin{align*} Z_t &= S_t + A_t, \\ S_{t+1} &= Z_t - W_t. \end{align*} \] The first part, \(Z_t\), depends only on the current state and action. The second part depends only on \(Z_t\) and a primitive random variable. In this particular model, \(Z_t\) is a deterministic function of \(S_t\) and \(A_t\); but, in general, it could be stochastic as well; what is important is that the second part should only depend on \(Z_t\) and a primitive random variable. The variable \(Z_t\) is sometimes called the post-decision state.

We can now write the dynamic program in terms of the post-decision state as follows. Define \[ H_t(z) = \EXP[ h(z - W) + V^*_{t+1}(z-W) ],\] where \(W\) has the same distribution as the i.i.d. demand \(W_t\). Then the value function and optimal policy at time \(t\) can be written as: \[ \begin{align*} V^*_t(s) &= \min_{a \in \S_{\ge 0}} \bigl\{ pa + H_t(s + a) \bigr\}, \\ π^*_t(s) &= \arg \min_{a \in \S_{\ge 0}} \bigl\{ pa + H_t(s + a) \bigr\}. \end{align*} \]
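Continuing the illustrative sketch from above, the post-decision-state form computes \(H_t(z)\) once for every reachable \(z\) and reuses it for all \((s, a)\) pairs with \(s + a = z\). The helper name `solve_dp_post_decision` and the truncation are again my own assumptions:

```python
import numpy as np

def solve_dp_post_decision(T, demand_pmf, p, c_h, c_s, S_min, S_max, A_max):
    """Same DP as above, organized around the post-decision state z = s + a."""
    states = np.arange(S_min, S_max + 1)
    zs = np.arange(S_min, S_max + A_max + 1)        # reachable post-decision states
    demands = np.arange(len(demand_pmf))
    h = lambda s: c_h * np.maximum(s, 0) - c_s * np.minimum(s, 0)

    V = np.zeros((T + 2, len(states)))
    pi = np.zeros((T + 1, len(states)), dtype=int)

    for t in range(T, 0, -1):
        # Step 1: H_t(z) = E[h(z - W) + V_{t+1}(z - W)] for every post-decision state z.
        H = np.empty(len(zs))
        for j, z in enumerate(zs):
            s_next = np.clip(z - demands, S_min, S_max)
            H[j] = demand_pmf @ (h(z - demands) + V[t + 1, s_next - S_min])
        # Step 2: V_t(s) = min_a { p*a + H_t(s + a) }, reusing H across all states.
        for i, s in enumerate(states):
            q = p * np.arange(A_max + 1) + H[(s - S_min) + np.arange(A_max + 1)]
            pi[t, i] = np.argmin(q)
            V[t, i] = q[pi[t, i]]
    return V, pi
```

The expectation is now evaluated once per post-decision state rather than once per state-action pair, which is the main computational benefit of this restructuring.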

Note that the optimization problem at each step is similar to the newsvendor problem. So, as in that model, we check whether we can establish qualitative properties of the optimal solution.

To fix ideas, let’s solve this dynamic program for a specific instance.

Figure 7.1: Demand Distribution

We assume that the demand is distributed according to a Binomial(50, 0.4) distribution, as shown in Figure 7.1, and that the model parameters are as given below:

\[ c_h = 2,\quad c_s = 5,\quad p = 1. \]

We consider a horizon \(T = 15\), and solve the dynamic program shown above. The optimal value function and policy are shown below:
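For completeness, here is how this instance could be solved numerically with the hypothetical `solve_dp_post_decision` helper sketched earlier; the state-space truncation below is an arbitrary choice made for illustration:

```python
import numpy as np
from scipy.stats import binom

# Binomial(50, 0.4) demand, as in Figure 7.1.
demand_pmf = binom.pmf(np.arange(51), 50, 0.4)

V, pi = solve_dp_post_decision(
    T=15, demand_pmf=demand_pmf,
    p=1.0, c_h=2.0, c_s=5.0,
    S_min=-50, S_max=100, A_max=100,    # truncation chosen for illustration
)
print(pi[1])    # optimal order quantity at t = 1, for each stock level on the grid
```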