ECSE 506: Stochastic Control and Decision Theory
Aditya Mahajan
Winter 2022
About  Lectures  Notes  Coursework
TL;DR The newsvendor problem is a simple model of stochastic optimization problem where a decision has to be made when there is uncertainty about the outcome. It also shows that for some stochastic optimization problems it is possible to obtain the qualitative properties of the nature of optimal solution.
Each morning, a newsvendor has to decide how many newspapers to buy before knowing the demand during the day. The newsvendor purchases a newspaper at a cost of \(\$p\) per newspaper and sells them at a cost of \(\$q\) per newspaper, where \(q > p\). Any unsold newspapers at the end of the day have no salvage value.
Let \(a\) denote the number of newspapers bought and \(W\) denotes the demand. If \(W < a\), then the newsvendor will sell \(W\) newspapers and receive a total earnings of \(q W  p a\). If \(W \ge a\), then the newsvendor will sell \(a\) newspapers and receive a total earning of \(q a  p a\). Thus, the reward is \(r(a,W)\), where
\[r(a, w) = \begin{cases} q w  p a, & \text{if } w < a, \\ q a  p a, & \text{if } w \ge a. \end{cases} \]
An illustration of the reward function is shown below.
The problem above has discrete action and discrete demand. To build intuition, we first consider the case where both the actions and demand are continuous. Let \(f(w)\) denote the probability density of the demand and \(F(w)\) denote the cumulative probability density. Then, the expected reward is \[ \begin{equation} \label{eq:J} J(a) = \int_{0}^a [ q w  p a ] f(w) dw + \int_{a}^\infty [ q a  p a ] f(w) dw. \end{equation}\]
As an illustration, the PDF of a particular distribution (see the plot later for exact choices of parameters) is shown below. Try to move the slider for \(a\) to try and find an (approximately) optimal value of the action.
We can also plot the performance as a function of action, and identify its maximma as shown in the plot below.
The plot of \(J(a)\) is concave. This suggests that we can use calculus to find the optimal value. In particular, to find the optimal action, we need to compute the \(a\) such that \(dJ(a)/da = 0\). We first recall the Leibniz integral rule: \[ \dfrac{d}{dx} \left( \int_{p(x)}^{q(x)} f(x,t) dt \right) = f(x, q(x)) \cdot \dfrac {d}{dx} q(x)  f(x, p(x)) \cdot \dfrac {d}{dx} p(x) + \int_{p(x)}^{q(x)} \dfrac{\partial}{\partial x} f(x,t) dt. \]
Thus, the derivative of the first term of \eqref{eq:J} is \[ [q a  p a ] f(a) + \int_{0}^a [ p ] f(w) dw = [q a  p a ] f(a)  p F(a). \]
Similarly, the derivative of the second term of \eqref{eq:J} is \[  [q a  p a] f(a) + \int_{a}^{\infty} (qp)f(w)dw =  [q a  p a] f(a) + (q p)[ 1  F(a)]. \]
Combining the two, we get that \[ \dfrac{dJ(a)}{da} =  p F(a) + (q  p) [ 1  F(a) ]. \]
Equating this to \(0\), we get \[ F(a) = \dfrac{ q  p }{ q} \quad\text{or}\quad a = F^{1} \left( \dfrac{ q  p }{ q } \right). \] In the literature, the quantity \((qp)/q\) is called the critical fractile.
An illustration of the optimal decision rule is shown below.
 Remark

Strictly speaking, we also need to verify that the function \(J(a)\) is concave. To do so, we compute the second derivative: \[ \frac{d^2 J(a)}{da^2} =  p f(a)  (q  p) f(a) = q f(a) \le 0. \] Hence, the \(J(a)\) is concave and the extremum value computed above is a maximum.
Now, we come back to the problem with discrete actions and discrete demand. Suppose \(W\) takes the values \(\ALPHABET W = \{ w_1, w_2, \dots, w_k \}\) (where \(w_1 < w_2 < \cdots < w_k\)) with probabilities \(\{ μ_1, μ_2, \dots, μ_k \}\). It is ease to see that in this case the action \(a\) should be in the set \(\{ w_1, w_2, \dots, w_k \}\).
As an illustration, consider the case when \(\ALPHABET W = \{0, 1, \dots, 100\}\).
In the discrete case, the brute force search is easier (because there are a finite rather than continuous number of values). We cannot directly use the ideas from calculus because functions over discrete domain are not differentiable. But we can use a very similar idea. Instead of checking if \(dJ(a)/da = 0\), we check the sign of \(J(w_{i+1})  J(w_i)\).
The expected reward for choice \(w_i\) is \[ \begin{align*} J(w_i) &= \sum_{j < i} μ_j [ q w_j  p w_i ] + \sum_{j \ge i} μ_j [q w_i  p w_i] \\ &= p w_i + q \Bigl[ \sum_{j < i} μ_j w_j + \sum_{j \ge i} μ_j w_i \Bigr]. \end{align*}\]
Thus, \[ \begin{align*} J(w_{i+1})  J(w_i) &= p w_{i+1} + q \Bigl[ \sum_{j < i+1} μ_j w_j + \sum_{j \ge i+1} μ_j w_{i+1} \Bigr] \\ &\quad + p w_i  q \Bigl[ \sum_{j < i} μ_j w_j + \sum_{j \ge i} μ_j w_i \Bigr] \\ &= p (w_{i+1}  w_i) + q \Bigl[ \sum_{j \ge i + 1} μ_j ( w_{i+1}  w_i) \Bigr] \\ &= \big(  p + q [ 1  M_i ] \big) (w_{i+1}  w_i), \end{align*}\] where \(M_i\) is the cumulative probability mass function. Note that \[ M_i \le \dfrac{qp}{q} \iff p + q [ 1  M_i ] \ge 0. \] Thus, for all \(i\) such that \(M_i \le (qp)/q\), \(J(w_{i+1}) \ge J(w_i)\). On the other hand, for all \(i\) such that \(M_i > (qp)/q)\), \(J(w_{i+1}) < J(w_i)\). Thus, the optimal amount to order is the largest \(w_i\) such that \(M_i \le (qp)/q\).
Note that the structure of the optimal solution is the same for continuous and discrete demand distributions.
Exercises
Qualitative properties of optimal solution. The numerical results shown in Figure 4 suggest that, for a fixed value of \(q\), the optimal value of \(a\) is a decreasing function of \(p\). Intuitively, this makes sense: if the purchase price of the newspaper increases, but the selling price remains the same, then the newsvendor should buy less newspapers. Formally prove this statement.
Hint: The CDF of a distribution is a weakly increasing function.
Monotonicity of optimal action. Consider two scenarios for the case with continuous demand and actions. In scenario 1, the demand is distributed according to PDF \(f_1\). In scenario 2, it is distributed according to PDF \(f_2\). Suppose \(F_1(w) \le F_2(w)\) for all \(w\). Show that the optimal action \(a_1\) for scenario 1 is greater than the optimal action \(a_2\) for scenario 2.
Hint: Plot the two CDFs and try to interpret the optimal decision rule graphically.
Selling random wind. The amount \(W\) of power generated by the wind turbine is a positive realvalued random variable with probability density function \(f\). The operator of the wind turbine has to commit to provide a certain amount of power in the dayahead market. The price of power is \(\$p\) per MW.
If the operator commits to provide \(a\) MW of power and the wind generation \(W\) is less than \(a\), then he has to buy the balance \(a  W\) from a reserves market at the cost of \(\$ q\) per unit, where \(q > p\). Thus, the reward of the operator is \(r(a,W)\) where \[ r(a, w) = \begin{cases} p a, & \text{if } w > a \\ p a  q (a  w), & \text{if } w < a. \end{cases}\]
Find the value of commitment \(a\) that maximizes the expected reward.
References
Perhaps the earliest model of the newsvendor problem appeared in Edgeworth (1888) in the context of a bank setting the level of cash reserves to cover demands from its customers. The solution to the basic model presented above and some of its variants was provided in Morse and Kimball (1951); Arrow et al. (1952); Whitin (1953). See Porteus (2008) for an accessible introduction.
The property \(F_1(w) \le F_2(w)\) used in Exercise 2 is called stochastic dominance. Later in the course, we will study how stochastic dominance is useful to establish monotonicity properties of general MDPs.
The example of selling random wind in Exercise 3 is taken from Bitar et al. (2012).
This entry was last updated on 11 Jan 2022 and posted in Stochastics and tagged stochastic optimization.