```
p = 0.5
q = 1
r = function(w,a){ if(w<=a) { return q*w - p*a } else { return q*a - p*a } }
a_opt = inverseCDF( (q-p)/q )
config = ({
// Kumaraswamy Distribution: https://en.wikipedia.org/wiki/Kumaraswamy_distribution
a: 2,
b: 5,
max: 100
})
pdf = {
const a = config.a
const b = config.b
return function(x) {
var normalized = x/config.max
return a*b*normalized**(a-1)*(1 - normalized**a)**(b-1)
}
}
inverseCDF= {
const a = config.a
const b = config.b
return function(y) {
// Closed form expression for inverse CDF of Kumaraswamy distribution
return config.max * (1 - (1-y)**(1/b))**(1/a)
}
}
points = {
const n = 1000
var points = new Array(n)
var cdf = 0;
for (var i = 0 ; i < n; i++) {
var x = config.max * i/n
cdf = cdf + pdf(x) * 1/n
points[i] = {demand: x, pdf: pdf(x), CDF: cdf, reward: r(x,action) }
}
return points
}
cost_values = {
const n = 1000
var points = new Array(n)
for (var i = 0 ; i < n; i++) {
var x = config.max * i/n
points[i] = {action: x, performance: J(x)}
}
return points
}
J = {
const a = config.a
const b = config.b
return function(action) {
const n = 1000
var cost = 0
var w = 0
for (var i = 0; i < n; i++) {
w = config.max*i/n
if (w <= action) {
cost += (q*w - p*action)*pdf(w)/n
} else {
cost += (q*action - p*action)*pdf(w)/n
}
}
return cost
}
}
```

# 2 The newsvendor problem

Keywords

stochastic optimization, newsvendor problem

Each morning, a newsvendor has to decide how many newspapers to buy before knowing the demand during the day. The newsvendor purchases a newspaper at a cost of \(\$p\) per newspaper and sells them at a cost of \(\$q\) per newspaper, where \(q > p\). Any unsold newspapers at the end of the day have no salvage value.

Let \(a\) denote the number of newspapers bought and \(W\) denotes the demand. If \(W < a\), then the newsvendor will sell \(W\) newspapers and receive a total earnings of \(q W - p a\). If \(W \ge a\), then the newsvendor will sell \(a\) newspapers and receive a total earning of \(q a - p a\). Thus, the *reward* is \(r(a,W)\), where

\[r(a, w) = \begin{cases} q w - p a, & \text{if } w < a, \\ q a - p a, & \text{if } w \ge a. \end{cases} \]

## 2.1 Interlude with continuous version

The problem above has discrete action and discrete demand. To build intuition, we first consider the case where both the actions and demand are continuous. Let \(f(w)\) denote the probability density of the demand and \(F(w)\) denote the cumulative probability density. Then, the expected reward is \[ \begin{equation} \label{eq:J} J(a) = \int_{0}^a [ q w - p a ] f(w) dw + \int_{a}^\infty [ q a - p a ] f(w) dw. \end{equation}\]

To fix ideas, we consider an example where \(p = 0.5\), \(q = 1\), and the demand is a :Kumaraswamy distribution with parameters \((2,5)\) and support \([0,100]\). The performance of a function of action is shown below.

In Figure 2.1(a), the plot of \(J(a)\) is concave. We can verify that this is true in general.

Verify that \(J(a)\) is concave

To verify that the function \(J(a)\) is concave, we compute the second derivative: \[ \frac{d^2 J(a)}{da^2} = - p f(a) - (q - p) f(a) = -q f(a) \le 0. \]

This suggests that we can use calculus to find the optimal value. In particular, to find the optimal action, we need to compute the \(a\) such that \(dJ(a)/da = 0\).

**Proposition 2.1** For the newsvendor problem with continuous demand, the optimal action is \[
a = F^{-1}\left( 1 - \frac{p}{q} \right).
\] In the literature, the quantity \(1 - (p/q)\) is called the *critical fractile*.

Proof

Leibniz integral rule

\[ \dfrac{d}{dx} \left( \int_{p(x)}^{q(x)} f(x,t) dt \right) = f(x, q(x)) \cdot \dfrac {d}{dx} q(x) - f(x, p(x)) \cdot \dfrac {d}{dx} p(x) + \int_{p(x)}^{q(x)} \dfrac{\partial}{\partial x} f(x,t) dt. \]

Using the Leibniz integral rule, the derivative of the first term of \(\eqref{eq:J}\) is \[ [q a - p a ] f(a) + \int_{0}^a [ -p ] f(w) dw = [q a - p a ] f(a) - p F(a). \]

Similarly, the derivative of the second term of \(\eqref{eq:J}\) is \[ - [q a - p a] f(a) + \int_{a}^{\infty} (q-p)f(w)dw = - [q a - p a] f(a) + (q -p)[ 1 - F(a)]. \]

Combining the two, we get that \[ \dfrac{dJ(a)}{da} = - p F(a) + (q - p) [ 1 - F(a) ]. \]

Equating this to \(0\), we get \[ F(a) = \dfrac{ q - p }{ q} \quad\text{or}\quad a = F^{-1} \left( 1 - \dfrac{ p }{ q } \right). \]

Graphical interpretation of the result

The result of Proposition 2.1 has a nice graphical interpretation. Draw the CDF of the demand. The optimal action is the point where the CDF intersects the horizontal line \(1 - p/q\).

## 2.2 Back to discrete version

Now, we come back to the problem with discrete actions and discrete demand. Suppose \(W\) takes the values \(\ALPHABET W = \{ w_1, w_2, \dots, w_k \}\) (where \(w_1 < w_2 < \cdots < w_k\)) with probabilities \(\{ μ_1, μ_2, \dots, μ_k \}\). It is easy to see that in this case the action \(a\) should be in the set \(\{ w_1, w_2, \dots, w_k \}\).

To fix ideas, we repeat the above numerical example when \(\ALPHABET W = \{0, 1, \dots, 100\}\).

In the discrete case, the brute force search is easier (because there are a finite rather than continuous number of values). We cannot directly use the ideas from calculus because functions over discrete domain are not differentiable. But we can use a very similar idea. Instead of checking if \(dJ(a)/da = 0\), we check the sign of \(J(w_{i+1}) - J(w_i)\).

**Proposition 2.2** Let \(\{M_i\}_{i \ge 1}\) denote the cumulative mass function of the demand. Then, the optimal action is the largest value of \(w_i\) such that \[
M_i \le 1 - \frac{p}{q}.
\]

Proof

The expected reward for choice \(w_i\) is \[ \begin{align*} J(w_i) &= \sum_{j < i} μ_j [ q w_j - p w_i ] + \sum_{j \ge i} μ_j [q w_i - p w_i] \\ &= -p w_i + q \Bigl[ \sum_{j < i} μ_j w_j + \sum_{j \ge i} μ_j w_i \Bigr]. \end{align*}\]

Thus, \[ \begin{align*} J(w_{i+1}) - J(w_i) &= -p w_{i+1} + q \Bigl[ \sum_{j < i+1} μ_j w_j + \sum_{j \ge i+1} μ_j w_{i+1} \Bigr] \\ &\quad + p w_i - q \Bigl[ \sum_{j < i} μ_j w_j + \sum_{j \ge i} μ_j w_i \Bigr] \\ &= -p (w_{i+1} - w_i) + q \Bigl[ \sum_{j \ge i + 1} μ_j ( w_{i+1} - w_i) \Bigr] \\ &= \big( - p + q [ 1 - M_i ] \big) (w_{i+1} - w_i). \end{align*}\] Note that \[ M_i \le \dfrac{q-p}{q} \iff -p + q [ 1 - M_i ] \ge 0. \] Thus, for all \(i\) such that \(M_i \le (q-p)/q\), we have \(J(w_{i+1}) \ge J(w_i)\). On the other hand, for all \(i\) such that \(M_i > (q-p)/q)\), we have \(J(w_{i+1}) < J(w_i)\). Thus, the optimal amount to order is the largest \(w_i\) such that \(M_i \le (q-p)/q\).

Graphical interpretation of the result

The structure of the optimal solution is the same for continuous and discrete demand distributions. In particular, the result of Proposition 2.2 has the same graphical interpretation as that of ?prp-newscendor-cts:

Draw the CDF of the demand. The optimal action is the point where the CDF intersects the horizontal line \(1 - p/q\).

## Exercises

**Exercise 2.1 (Qualitative properties of optimal solution)** Intuitively, we expect that if the purchase price of the newspaper increases but the selling price remains the same, then the newsvendor should buy less newspapers. Formally prove this statement.

*Hint*: The CDF of a distribution is a weakly increasing function.

**Exercise 2.2 (Monotonicity of optimal action)** Consider two scenarios for the case with continuous demand and actions. In scenario 1, the demand is distributed according to PDF \(f_1\). In scenario 2, it is distributed according to PDF \(f_2\). Suppose \(F_1(w) \le F_2(w)\) for all \(w\). Show that the optimal action \(a_1\) for scenario 1 is greater than the optimal action \(a_2\) for scenario 2.

*Hint*: Plot the two CDFs and try to interpret the optimal decision rule graphically.

**Exercise 2.3 (Selling random wind)** The amount \(W\) of power generated by the wind turbine is a positive real-valued random variable with probability density function \(f\). The operator of the wind turbine has to commit to provide a certain amount of power in the day-ahead market. The price of power is \(\$p\) per MW.

If the operator commits to provide \(a\) MWs of power and the wind generation \(W\) is less than \(a\), then he has to buy the balance \(a - W\) from a reserves market at the cost of \(\$ q\) per unit, where \(q > p\). Thus, the reward of the operator is \(r(a,W)\) where \[ r(a, w) = \begin{cases} p a, & \text{if } w > a \\ p a - q (a - w), & \text{if } w < a. \end{cases}\]

Find the value of commitment \(a\) that maximizes the expected reward.

## Notes

Perhaps the earliest model of the newsvendor problem appeared in Edgeworth (1888) in the context of a bank setting the level of cash reserves to cover demands from its customers. The solution to the basic model presented above and some of its variants was provided in Morse and Kimball (1951); Arrow et al. (1952); Whitin (1953). See Porteus (2008) for an accessible introduction.

The property \(F_1(w) \le F_2(w)\) used in Exercise 2.2 is called stochastic dominance. Later in the course, we will study how stochastic dominance is useful to establish monotonicity properties of general MDPs.

The example of selling random wind in Exercise 2.3 is taken from Bitar et al. (2012).