7  Inventory Management


Keywords

inventory management, post-decision state, base-stock policy, structural results

Summary

The inventory management example illustrates that a dynamic programming formulation is useful even when a closed form solution does not exist. This model also introduces the idea of post-decision state, which is useful in many contexts.

Imagine a retail store that stockpiles products in its warehouse to meet random demand. Suppose the store procures new stock at the end of each day (there is no lead time, so stock ordered at the end of a day is available the next morning). Let

- \(S_t \in \integers\) denote the stock level at the beginning of day \(t\),
- \(A_t \in \integers_{\ge 0}\) denote the stock ordered at the end of day \(t\), and
- \(W_t \in \integers_{\ge 0}\) denote the demand during day \(t\).

The random variables \(\{W_t\}_{t \ge 1}\) are i.i.d. with known probability distribution.

Excess demand is backlogged and filled when new inventory becomes available. Thus, the stock evolves according to \[S_{t+1} = S_t + A_t - W_t,\] where negative stock denotes backlogged demand.

The cost incurred during day \(t\) consists of two components:

- a procurement cost of \(p A_t\), where \(p\) is the per-unit procurement price, and
- a holding or shortage cost of \(h(S_t + A_t - W_t)\), where \[ h(s) = \begin{cases} c_h\, s, & \text{if } s \ge 0, \\ -c_s\, s, & \text{if } s < 0, \end{cases} \] with \(c_h\) denoting the per-unit holding cost and \(c_s\) the per-unit shortage cost.
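To make the model concrete, here is a minimal Python sketch of the per-period cost and the stock update; the function names and default parameter values are illustrative choices, not part of the original notes:

```python
def holding_shortage_cost(s, c_h=2.0, c_s=5.0):
    """Holding cost c_h per unit of stock held; shortage cost c_s per unit backlogged."""
    return c_h * max(s, 0) - c_s * min(s, 0)

def step(s, a, w, p=1.0, c_h=2.0, c_s=5.0):
    """One day of the inventory model: order a units, then observe demand w."""
    s_next = s + a - w                      # negative stock denotes backlogged demand
    cost = p * a + holding_shortage_cost(s_next, c_h, c_s)
    return s_next, cost
```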

7.1 Dynamic programming decomposition

\(\def\S{\mathbb{S}}\)

The above model is a Markov decision process. Therefore, the optimal solution is given by dynamic programming.

Instead of \(\integers\), we use \(\S\) to denote the set of possible states. The reason is that we will later consider the case where the state space is the set of reals, and the same equations will continue to apply.

Proposition 7.1 (Dynamic programming) Define the following value functions \(V^*_t \colon \S \to \reals\) \[V^*_{T+1}(s) = 0\] and for \(t \in \{T, \dots, 1\}\) \[ Q^*_t(s, a) = p a + \EXP[ h(s + a - W_t) + V^*_{t+1}( s + a - W_t ) ]\] and \[ \begin{align*} V^*_t(s) &= \min_{a \in \S_{\ge 0}} Q^*_t(s,a) \\ π^*_t(s) &= \arg \min_{a \in \S_{\ge 0}} Q^*_t(s,a) \end{align*} \] Then the strategy \(π^* = (π^*_1, \dots, π^*_T)\) is optimal.
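When the state space is a finite set of integers, the backward induction of Proposition 7.1 can be implemented directly. The Python sketch below is illustrative only: the truncation of the state space to \(\{S_{\min}, \dots, S_{\max}\}\), the cap \(A_{\max}\) on orders, and the name `solve_dp` are assumptions made here, not part of the notes.

```python
import numpy as np

def solve_dp(T, demand_pmf, p, c_h, c_s, S_min, S_max, A_max):
    """Backward induction for the finite-horizon inventory DP (Proposition 7.1).

    demand_pmf[w] = P(W = w) for w = 0, 1, ..., len(demand_pmf) - 1.
    States are truncated to {S_min, ..., S_max}; orders to {0, ..., A_max}.
    Returns value functions V[t] and policies pi[t] for t = 1, ..., T.
    """
    states = np.arange(S_min, S_max + 1)
    demands = np.arange(len(demand_pmf))
    h = lambda s: c_h * np.maximum(s, 0) - c_s * np.minimum(s, 0)

    V = np.zeros((T + 2, len(states)))              # V[T+1] = 0
    pi = np.zeros((T + 1, len(states)), dtype=int)

    for t in range(T, 0, -1):
        for i, s in enumerate(states):
            best_q, best_a = np.inf, 0
            for a in range(A_max + 1):
                s_next = np.clip(s + a - demands, S_min, S_max)   # keep on the grid
                q = p * a + demand_pmf @ (h(s + a - demands) + V[t + 1, s_next - S_min])
                if q < best_q:
                    best_q, best_a = q, a
            V[t, i], pi[t, i] = best_q, best_a
    return V, pi
```

Note that the inner loops evaluate the expectation separately for every \((s, a)\) pair; the post-decision-state reformulation discussed next avoids this duplication.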

It is possible to simplify the above dynamic program by exploiting a feature of the model. Notice that the dynamics can be split into two parts: \[ \begin{align*} Z_t &= S_t + A_t, \\ S_{t+1} &= Z_t - W_t. \end{align*} \] The first part, \(Z_t\), depends only on the current state and action. The second part depends only on \(Z_t\) and a primitive random variable. In this particular model, \(Z_t\) is a deterministic function of \(S_t\) and \(A_t\); but, in general, it could be stochastic as well; what is important is that the second part should only depend on \(Z_t\) and a primitive random variable. The variable \(Z_t\) is sometimes called the post-decision state.

We can now write the dynamic program in terms of the post-decision state as follows. Define \[ H_t(z) = \EXP[ h(z - W) + V^*_{t+1}(z-W) ],\] where \(W\) has the same distribution as the i.i.d. demand \(W_t\). Then the value function and optimal policy at time \(t\) can be written as: \[ \begin{align*} V^*_t(s) &= \min_{a \in \S_{\ge 0}} \bigl\{ pa + H_t(s + a) \bigr\}, \\ π^*_t(s) &= \arg \min_{a \in \S_{\ge 0}} \bigl\{ pa + H_t(s + a) \bigr\}. \end{align*} \]
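Continuing the illustrative sketch from above, the post-decision-state form computes \(H_t(z)\) once for every reachable \(z\) and reuses it for all \((s, a)\) pairs with \(s + a = z\). The helper name `solve_dp_post_decision` and the truncation are again my own assumptions:

```python
import numpy as np

def solve_dp_post_decision(T, demand_pmf, p, c_h, c_s, S_min, S_max, A_max):
    """Same DP as above, organized around the post-decision state z = s + a."""
    states = np.arange(S_min, S_max + 1)
    zs = np.arange(S_min, S_max + A_max + 1)        # reachable post-decision states
    demands = np.arange(len(demand_pmf))
    h = lambda s: c_h * np.maximum(s, 0) - c_s * np.minimum(s, 0)

    V = np.zeros((T + 2, len(states)))
    pi = np.zeros((T + 1, len(states)), dtype=int)

    for t in range(T, 0, -1):
        # Step 1: H_t(z) = E[h(z - W) + V_{t+1}(z - W)] for every post-decision state z.
        H = np.empty(len(zs))
        for j, z in enumerate(zs):
            s_next = np.clip(z - demands, S_min, S_max)
            H[j] = demand_pmf @ (h(z - demands) + V[t + 1, s_next - S_min])
        # Step 2: V_t(s) = min_a { p*a + H_t(s + a) }, reusing H across all states.
        for i, s in enumerate(states):
            q = p * np.arange(A_max + 1) + H[(s - S_min) + np.arange(A_max + 1)]
            pi[t, i] = np.argmin(q)
            V[t, i] = q[pi[t, i]]
    return V, pi
```

The expectation is now evaluated once per post-decision state rather than once per state-action pair, which is the main computational benefit of this restructuring.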

Note that the optimization problem at each step is similar to the newsvendor problem. So, as in that model, we check whether we can establish qualitative properties of the optimal solution.

To fix ideas, let’s solve this dynamic program for a specific instance.

Figure 7.1: Demand Distribution

We assume that the demand is distributed according to a Binomial(50, 0.4) distribution, as shown in Figure 7.1, and that the model parameters are as given below:

\[ c_h = 2,\quad c_s = 5,\quad p = 1. \]

We consider a horizon \(T = 15\), and solve the dynamic program shown above. The optimal value function and policy are shown below:
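For completeness, here is how this instance could be solved numerically with the hypothetical `solve_dp_post_decision` helper sketched earlier; the state-space truncation below is an arbitrary choice made for illustration:

```python
import numpy as np
from scipy.stats import binom

# Binomial(50, 0.4) demand, as in Figure 7.1.
demand_pmf = binom.pmf(np.arange(51), 50, 0.4)

V, pi = solve_dp_post_decision(
    T=15, demand_pmf=demand_pmf,
    p=1.0, c_h=2.0, c_s=5.0,
    S_min=-50, S_max=100, A_max=100,    # truncation chosen for illustration
)
print(pi[1])    # optimal order quantity at t = 1, for each stock level on the grid
```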