# 14 Inventory management (revisited)

inventory management, base-stock policy, reward shaping, structural results, stochastic optimization, infinite horizon, discounted cost

Let’s reconsider the model for inventory management, now assuming that it runs over an infinite horizon with discount factor \(γ\). We assume that the per-step cost is given by \[c(s,a,s_{+}) = p a + γ h(s_{+}), \] where \[ h(s) = \begin{cases} c_h s, & \text{if $s \ge 0$} \\ -c_s s, & \text{if $s < 0$}, \end{cases}\] and \(c_h\) is the per-unit holding cost, \(c_s\) is the per-unit shortage cost, and \(p\) is the per-unit procurement cost. Note that the holding or shortage cost is discounted by \(γ\) because it is incurred at the end of the time period, whereas the procurement cost is paid at the start.
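To make the cost definition concrete, here is a minimal Python sketch of \(h\) and \(c\). The function names are hypothetical, and the parameter values in the usage lines are borrowed from the numerical example in this section. Note that the current state \(s\) does not enter the cost: only the order \(a\) and the end-of-period inventory \(s_{+}\) matter.

```python
def holding_shortage_cost(s, c_h, c_s):
    """h(s): c_h * s for leftover stock (s >= 0), -c_s * s for backlog (s < 0)."""
    return c_h * s if s >= 0 else -c_s * s


def per_step_cost(s, a, s_next, p, c_h, c_s, gamma):
    """c(s, a, s_+) = p*a + gamma*h(s_+).

    The procurement cost p*a is paid now; h(s_+) is incurred at the end of
    the period and is therefore discounted by gamma. s itself is unused.
    """
    return p * a + gamma * holding_shortage_cost(s_next, c_h, c_s)


params = dict(p=1.0, c_h=2.0, c_s=5.0, gamma=0.9)
print(per_step_cost(10, 15, 5, **params))   # 15 + 0.9 * (2 * 5)
print(per_step_cost(10, 15, -3, **params))  # 15 + 0.9 * (5 * 3)
```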

Recall that in the finite horizon setting, the optimal policy was a *base-stock* policy characterized by thresholds \(\{σ_t\}_{t \ge 1}\). In the infinite horizon discounted setting, we expect the optimal policy to be time-homogeneous, i.e., the thresholds \(\{σ_t\}_{t \ge 1}\) should all equal a constant \(σ\) that does not depend on time.
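A time-homogeneous base-stock policy is a single function of the current inventory level: order up to \(σ\) when the stock is below \(σ\), and order nothing otherwise. The sketch below makes this explicit; note that, unlike the finite-horizon case, there is no time argument. The threshold `SIGMA = 20` is a hypothetical value chosen for illustration, not the optimal threshold of any example.

```python
def base_stock_action(s: int, sigma: int) -> int:
    """Base-stock policy: raise inventory to sigma if it is below sigma, else order 0."""
    return max(sigma - s, 0)


SIGMA = 20  # hypothetical threshold, for illustration only
for s in (-5, 12, 20, 27):
    a = base_stock_action(s, SIGMA)
    # Whenever an order is placed, the post-order level s + a equals SIGMA.
    print(f"s = {s:3d}: order a = {a:2d}, post-order level s + a = {s + a}")
```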

As an illustration, let’s reconsider the example used in the finite horizon setting (where \(c_h = 2\), \(c_s = 5\), \(p=1\), and the demand is Binomial(50, 0.4)), now with discount factor \(γ = 0.9\). The value function and optimal policy in this case are shown below.
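One way to compute such a value function and policy is value iteration. The following is a minimal sketch, not necessarily the method used to produce the figures, under these assumptions: inventory levels are integers truncated to the hypothetical range \([-50, 50]\) (next states below \(-50\) are clipped), orders are nonnegative, and, as in the finite-horizon model, the next state is \(s_{+} = s + a - W\). Writing \(y = s + a\) for the post-order level, the Bellman update becomes \(V(s) = -p s + \min_{y \ge s} J(y)\) with \(J(y) = p y + γ\,\mathbb{E}[h(y - W) + V(y - W)]\), so a single suffix minimum over \(y\) updates every state at once.

```python
from math import comb

# Example parameters from the text: c_h = 2, c_s = 5, p = 1, gamma = 0.9,
# and demand W ~ Binomial(50, 0.4).
C_H, C_S, P, GAMMA = 2.0, 5.0, 1.0, 0.9
N, Q = 50, 0.4

# Assumption: integer inventory levels truncated to [S_MIN, S_MAX];
# next states below S_MIN are clipped to S_MIN.
S_MIN, S_MAX = -50, 50
STATES = list(range(S_MIN, S_MAX + 1))
PMF = [comb(N, w) * Q**w * (1 - Q) ** (N - w) for w in range(N + 1)]


def h(s):
    """Holding cost c_h * s if s >= 0, shortage cost -c_s * s if s < 0."""
    return C_H * s if s >= 0 else -C_S * s


# EH[i] = E[h(y - W)] for the post-order level y = STATES[i].
EH = [sum(pw * h(y - w) for w, pw in enumerate(PMF)) for y in STATES]


def value_iteration(tol=1e-9, max_iter=10_000):
    """Return (V, policy) as dicts keyed by the inventory level s."""
    V = [0.0] * len(STATES)
    for _ in range(max_iter):
        # J(y) = p*y + gamma * E[h(y - W) + V(y - W)], so that the Bellman
        # update reads V(s) = -p*s + min_{y >= s} J(y) with action a = y - s.
        J = [
            P * y + GAMMA * (EH[i] + sum(pw * V[max(y - w, S_MIN) - S_MIN]
                                         for w, pw in enumerate(PMF)))
            for i, y in enumerate(STATES)
        ]
        # Scan from the top so that `best` is min J(y) over y >= s
        # (ties are broken toward the smallest y).
        new_V, best_y = [0.0] * len(STATES), [S_MAX] * len(STATES)
        best, arg = float("inf"), S_MAX
        for i in range(len(STATES) - 1, -1, -1):
            if J[i] <= best:
                best, arg = J[i], STATES[i]
            new_V[i] = -P * STATES[i] + best
            best_y[i] = arg
        diff = max(abs(u - v) for u, v in zip(new_V, V))
        V = new_V
        if diff < tol:
            break
    policy = {s: best_y[i] - s for i, s in enumerate(STATES)}
    return dict(zip(STATES, V)), policy


V, policy = value_iteration()
# Order-up-to level chosen from the lowest state in the grid.
sigma = S_MIN + policy[S_MIN]
print("order-up-to level:", sigma)
```

If the computed policy has the expected base-stock form, every state below the printed level orders up to it and every state above it orders nothing; this can be checked directly from the returned `policy` dictionary. The truncation range is a modeling shortcut and should be wide enough that the clipping never affects states near the threshold.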