# 7 Inventory Management

inventory management, post-decision state, base-stock policy, structural results

Imagine a retail store that stockpiles products in its warehouse to meet random demand. Suppose the store procures new stock at the end of each day; there is no lead time, so the new stock is available the next morning. Let

- \(S_t \in \integers\) denote the amount of stock at the beginning of day \(t\),
- \(A_t \in \integers_{\ge 0}\) denote the stock ordered (and immediately delivered) at the beginning of day \(t\), and
- \(W_t \in \integers_{\ge 0}\) denote the demand during day \(t\).

The random variables \(\{W_t\}_{t \ge 1}\) are i.i.d. with known probability distribution.

Excess demand is backlogged and filled when new inventory becomes available. Thus, the stock evolves according to \[S_{t+1} = S_t + A_t - W_t,\] where negative stock denotes backlogged demand.
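These dynamics are easy to simulate. The sketch below is illustrative: the function name `simulate_inventory`, the order-up-to-10 policy, and the uniform demand are all assumptions chosen for the example, not part of the model above.

```python
import random

def simulate_inventory(s0, policy, demand_sampler, T, seed=0):
    """Simulate S_{t+1} = S_t + A_t - W_t for T days.

    Negative stock represents backlogged demand, exactly as in the text.
    """
    rng = random.Random(seed)
    s = s0
    path = [s]
    for t in range(T):
        a = policy(s)            # stock ordered (and delivered) at start of day t
        w = demand_sampler(rng)  # random demand during day t
        s = s + a - w            # backlog carried over as negative stock
        path.append(s)
    return path

# Example: order up to 10 units whenever stock is below 10;
# demand uniform on {0, ..., 4} (a placeholder distribution).
path = simulate_inventory(
    s0=5,
    policy=lambda s: max(0, 10 - s),
    demand_sampler=lambda rng: rng.randrange(5),
    T=5,
)
```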

The cost incurred during day \(t\) consists of two components:

- A procurement cost of \(p A_t\), where \(p\) is the cost per unit.
- At the end of the day, if the stock \(S_{t+1}\) is positive, then there is a holding cost of \(c_h S_{t+1}\) for storing excess inventory; if \(S_{t+1}\) is negative, then there is a shortage cost of \(-c_s S_{t+1}\) for unfilled demand.

We denote this cost by \(h(S_{t+1})\), where \[ h(s) = \begin{cases} c_h s, & \text{if } s \ge 0 \\ -c_s s, & \text{if } s < 0 \end{cases}\]
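The piecewise cost \(h\) translates directly into code; a minimal sketch (the function and parameter names are our own choices):

```python
def h(s, c_h, c_s):
    """Holding/shortage cost: c_h * s if s >= 0, else -c_s * s (both nonnegative)."""
    return c_h * s if s >= 0 else -c_s * s

# With c_h = 2 and c_s = 5: holding 3 units costs 6; a backlog of 2 costs 10.
```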

## 7.1 Dynamic programming decomposition

\(\def\S{\mathbb{S}}\)

The above model is a Markov decision process. Therefore, the optimal solution is given by dynamic programming.

Instead of \(\integers\), we use \(\S\) to denote the set of possible states. The reason is that we will later consider the case where the state space is the set of reals, and the same equations will still apply.

**Proposition 7.1 (Dynamic programming)** Define the following value functions \(V^*_t \colon \S \to \reals\) \[V^*_{T+1}(s) = 0\] and for \(t \in \{T, \dots, 1\}\) \[ Q^*_t(s, a) = p a + \EXP[ h(s + a - W_t) + V^*_{t+1}( s + a - W_t ) ]\] and \[ \begin{align*}
V^*_t(s) &= \min_{a \in \S_{\ge 0}} Q^*_t(s,a) \\
π^*_t(s) &= \arg \min_{a \in \S_{\ge 0}} Q^*_t(s,a)
\end{align*}
\] Then the strategy \(π^* = (π^*_1, \dots, π^*_T)\) is optimal.
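The backward induction of Proposition 7.1 can be sketched as follows. To make the computation finite, we truncate the state space to \(\{s_{\min}, \dots, s_{\max}\}\) and cap the order size at `a_max`; these bounds, the clamping of out-of-range states, and the function name `solve_dp` are implementation assumptions, not part of the proposition.

```python
import math

def solve_dp(T, p, c_h, c_s, pmf, s_min=-30, s_max=60, a_max=40):
    """Backward induction for the finite-horizon inventory DP.

    pmf: dict mapping demand value w -> probability (same distribution each day).
    Returns V*_1 and the list of policies [pi*_1, ..., pi*_T].
    """
    states = range(s_min, s_max + 1)
    clamp = lambda s: min(max(s, s_min), s_max)       # keep next state in the grid
    h = lambda s: c_h * s if s >= 0 else -c_s * s     # holding/shortage cost
    V = {s: 0.0 for s in states}                      # V*_{T+1} = 0
    policy = []
    for t in range(T, 0, -1):
        V_new, pi = {}, {}
        for s in states:
            best_a, best_q = 0, math.inf
            for a in range(a_max + 1):
                # Q*_t(s, a) = p*a + E[ h(s + a - W) + V*_{t+1}(s + a - W) ]
                q = p * a + sum(
                    prob * (h(s + a - w) + V[clamp(s + a - w)])
                    for w, prob in pmf.items()
                )
                if q < best_q:
                    best_a, best_q = a, q
            V_new[s], pi[s] = best_q, best_a
        V, policy = V_new, [pi] + policy
    return V, policy
```

For a quick sanity check, with deterministic demand \(W \equiv 1\), horizon \(T = 1\), and \(p = 1\), \(c_h = 2\), \(c_s = 5\), the optimal order at \(s = 0\) is exactly one unit (cost \(1\)), while at \(s = 5\) ordering nothing is optimal.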

It is possible to simplify the above dynamic program by exploiting a feature of the model. Notice that the dynamics can be split into two parts: \[ \begin{align*}
Z_t &= S_t + A_t, \\
S_{t+1} &= Z_t - W_t.
\end{align*}
\] The first part, \(Z_t\), depends only on the current state and action. The second part depends only on \(Z_t\) and a primitive random variable. In this particular model, \(Z_t\) is a deterministic function of \(S_t\) and \(A_t\); but, in general, it could be stochastic as well; what is important is that the second part should only depend on \(Z_t\) and a primitive random variable. The variable \(Z_t\) is sometimes called the **post-decision state**.

Now write the dynamic program in terms of the post-decision state as follows. Define \[ H_t(z) = \EXP[ h(z - W) + V^*_{t+1}(z-W) ].\] Then the value function and optimal policy at time \(t\) can be written as: \[ \begin{align*} V^*_t(s) &= \min_{a \in \S_{\ge 0}} \bigl\{ pa + H_t(s + a) \bigr\}, \\ π^*_t(s) &= \arg \min_{a \in \S_{\ge 0}} \bigl\{ pa + H_t(s + a) \bigr\}. \end{align*} \]
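Computationally, the benefit of the post-decision state is that \(H_t(z)\) is evaluated once per value of \(z = s + a\) and then shared across all state-action pairs with the same \(z\). A sketch of one backward step in this form (the truncation bounds and the name `dp_step_post_decision` are our own assumptions):

```python
def dp_step_post_decision(V_next, p, c_h, c_s, pmf, s_min, s_max, a_max):
    """One backward step of the DP using the post-decision state z = s + a.

    V_next: dict s -> V*_{t+1}(s) on the truncated grid {s_min, ..., s_max}.
    Returns (V_t, pi_t) on the same grid.
    """
    h = lambda s: c_h * s if s >= 0 else -c_s * s
    clamp = lambda s: min(max(s, s_min), s_max)
    # Precompute H_t(z) = E[ h(z - W) + V*_{t+1}(z - W) ] for every reachable z.
    H = {
        z: sum(prob * (h(z - w) + V_next[clamp(z - w)]) for w, prob in pmf.items())
        for z in range(s_min, s_max + a_max + 1)
    }
    V, pi = {}, {}
    for s in range(s_min, s_max + 1):
        # V*_t(s) = min_a { p*a + H_t(s + a) }, reusing the precomputed H.
        cost, a = min((p * a + H[s + a], a) for a in range(a_max + 1))
        V[s], pi[s] = cost, a
    return V, pi
```

With \(|\S|\) states and \(A\) actions, the expectation is now computed \(O(|\S| + A)\) times per stage instead of \(O(|\S| \cdot A)\) times.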

Note that the problem at each step is similar to the newsvendor problem. So, as in that model, we investigate whether qualitative properties of the optimal solution can be established.

To fix ideas, let’s solve this dynamic program for a specific instance.

We assume that the demand is distributed according to a Binomial(50,0.4) distribution, as shown in Figure 7.1. We assume that the model parameters are as given below:

\[ c_h = 2,\quad c_s = 5,\quad p = 1. \]

We consider a horizon \(T = 15\), and solve the dynamic program shown above. The optimal value function and policy are shown below:
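The instance above can be set up as follows. This sketch only constructs the Binomial(50, 0.4) demand pmf and the model parameters; feeding them into the backward induction of Proposition 7.1 produces the value function and policy plotted in the figures.

```python
from math import comb

# Demand pmf: W ~ Binomial(n, q) with n = 50, q = 0.4, so E[W] = n*q = 20.
n, q = 50, 0.4
pmf = {w: comb(n, w) * q**w * (1 - q)**(n - w) for w in range(n + 1)}

# Model parameters and horizon from the text.
c_h, c_s, p = 2, 5, 1
T = 15

mean_demand = sum(w * prob for w, prob in pmf.items())  # = 20.0
```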