1 Introduction
Keywords
stochastic optimization, newsvendor problem
Let’s start with the simplest optimization problem. A decision maker has to choose an action \(a \in \ALPHABET A\). Upon choosing the action \(a\), the decision maker incurs a cost \(c(a)\). What action should the decision maker pick to minimize the cost?
Formally, the above optimization problem may be written as \[\begin{equation} \label{eq:basic} \min_{a \in \ALPHABET A} c(a). \end{equation}\]
When the action space \(\ALPHABET A\) is finite, say \(\ALPHABET A = \{1, \dots, m\}\), solving the optimization problem \(\eqref{eq:basic}\) is conceptually straightforward: enumerate the cost of all possible actions, i.e., enumerate the set \(C = \{ c(a) : a \in \ALPHABET A \}\) and find the smallest element. See Figure 1.1(a) for an illustration.
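This brute-force search is a one-liner in code. A minimal sketch, using a made-up quadratic cost purely for illustration:

```python
# Brute-force minimization over a finite action space A = {1, ..., 6}.
# The cost function c(a) below is an illustrative example, not from the text.
def c(a):
    return (a - 3) ** 2 + 1

A = range(1, 7)
a_star = min(A, key=c)   # action with the smallest cost
```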
When the action space \(\ALPHABET A\) is continuous, say a compact subset of a Euclidean space, solving the optimization problem \(\eqref{eq:basic}\) is conceptually straightforward only when the cost function \(c\) satisfies some regularity conditions. For example, when \(c\) is convex, the optimal action can be obtained by solving \[ \dfrac {d c(a) }{ da } = 0. \]
In the absence of such regularity conditions, there is no general method to solve an optimization problem over a continuous action space exactly: brute-force enumeration is no longer possible, and first-order conditions may only identify local minima. See Figure 1.1(b) for an example of a non-convex cost function.
1.1 The stochastic optimization problem
Now consider the simplest stochastic optimization problem. A decision maker has to choose an action \(a \in \ALPHABET A\). Upon choosing the action \(a\), the decision maker incurs a cost \(c(a,W)\), where \(W \in \ALPHABET W\) is a random variable with known probability distribution. Assume that the decision maker is risk neutral and, therefore, wants to minimize \(\EXP[ c(a, W) ]\), where the expectation is with respect to the random variable \(W\).
Formally, the above optimization problem may be written as \[\begin{equation} \label{eq:stochastic} \min_{a \in \ALPHABET A} \EXP[ c(a, W) ]. \end{equation}\]
Define \(J(a) = \EXP[ c(a, W) ]\). Then Problem \(\eqref{eq:stochastic}\) is conceptually the same as Problem \(\eqref{eq:basic}\) with the cost function \(J(a)\). Numerically, Problem \(\eqref{eq:stochastic}\) is more difficult because computing \(J(a)\) involves evaluating an expectation, but we ignore the computational complexity for the time being.
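In practice, \(J(a)\) can be approximated by averaging the cost over samples of \(W\). A minimal Monte Carlo sketch (the quadratic cost and uniform demand below are illustrative assumptions, not from the text):

```python
import random

def J_hat(a, cost, sample_w, n=100_000):
    # Monte Carlo estimate of J(a) = E[c(a, W)].
    return sum(cost(a, sample_w()) for _ in range(n)) / n

random.seed(0)
cost = lambda a, w: (a - w) ** 2        # illustrative cost function
sample_w = lambda: random.random()      # W ~ Uniform[0, 1]
estimate = J_hat(0.5, cost, sample_w)   # exact value is Var(W) = 1/12
```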
Tip: Cost vs reward
In some settings, it is more natural to maximize a reward function rather than minimize a cost function. The two formulations are equivalent, so we pick the one that is more natural in the given context.
In the rest of this chapter, we present some simple examples of stochastic optimization problems.
1.2 A simple example: Reliability of Multi-Processor System on Chips (MPSoC)
Example 1.1 Consider an MPSoC that requires at least one CPU and one RAM to function properly. The chip has a total capacity for three components, so we can add redundancy to protect against component failure. There are two possible designs:
- Design 1 uses two CPUs and one RAM, whereas
- Design 2 uses one CPU and two RAM modules.
In both cases, the lifetimes of the CPU and RAM are independent and modeled as exponential random variables with parameters \(\lambda_{c}\) and \(\lambda_{r}\), respectively. When two identical components are present, they operate in cold standby: the backup component activates only after the primary fails. The overall subsystem fails only once both components have ceased to function.
Find the design that maximizes the expected lifetime (i.e., the mean time to failure) of the system.
The above problem can be modeled as a stochastic optimization problem with \(W = (W^1_c, W^2_c, W^1_r, W^2_r)\), where \(W^i_c\) and \(W^i_r\) are the lifetimes of the \(i\)-th CPU and RAM, \(i \in \{1, 2\}\) and \(\ALPHABET A = \{1, 2\}\). Then, the lifetime of the system is \[\begin{align*} r(1, W) &= \min\{ W^1_c + W^2_c, W^2_r \}, \\ r(2, W) &= \min\{ W^1_c, W^1_r + W^2_r \}. \end{align*}\]
The optimal design is the one that maximizes the expected lifetime of the system, i.e., \[ \max_{a \in \ALPHABET A} \EXP[ r(a, W) ]. \]
Before proceeding further, we recall some basic results from probability theory.
If \(X_1\) and \(X_2\) are independent exponential random variables with parameter \(\lambda\), then \[ X_1 + X_2 \sim \text{Gamma}(2, \lambda) \implies \PR(X_1 + X_2 > t) = (1 + \lambda t) e^{-\lambda t}. \]
The expected value of a non-negative random variable \(X\) can be computed as \[ \EXP[X] = \int_0^\infty \PR(X > t) \, dt. \]
Therefore, the expected value of the minimum of two independent non-negative random variables \(X\) and \(Y\) can be computed as \[ \EXP[\min\{X, Y\}] = \int_0^\infty \PR(X > t) \PR(Y > t) \, dt. \]
We also recall some standard integration formulas:
- \(\int_0^\infty e^{-\lambda t} \, dt = \frac{1}{\lambda}\)
- \(\int_0^\infty t e^{-\lambda t} \, dt = \frac{1}{\lambda^2}\)
Thus, we have \[\begin{align*} \EXP[r(1, W)] &= \int_0^\infty \PR(W^1_c + W^2_c > t) \PR(W^2_r > t) \, dt \\ &= \int_0^\infty (1 + \lambda_c t) e^{-\lambda_c t} e^{-\lambda_r t} \, dt \\ &= \int_0^\infty (1 + \lambda_c t) e^{-(\lambda_c + \lambda_r) t} \, dt \\ &= \frac{1}{\lambda_c + \lambda_r} + \frac{\lambda_c}{(\lambda_c + \lambda_r)^2} \\ &= \frac{2\lambda_c + \lambda_r}{(\lambda_c + \lambda_r)^2}, \end{align*}\] and \[\begin{align*} \EXP[r(2, W)] &= \int_0^\infty \PR(W^1_c > t) \PR(W^1_r + W^2_r > t) \, dt \\ &= \int_0^\infty e^{-\lambda_c t} (1 + \lambda_r t) e^{-\lambda_r t} \, dt \\ &= \int_0^\infty (1 + \lambda_r t) e^{-(\lambda_c + \lambda_r) t} \, dt \\ &= \frac{1}{\lambda_c + \lambda_r} + \frac{\lambda_r}{(\lambda_c + \lambda_r)^2} \\ &= \frac{\lambda_c + 2\lambda_r}{(\lambda_c + \lambda_r)^2}. \end{align*}\]
Thus, we pick Design 1 if \(2\lambda_c + \lambda_r > \lambda_c + 2\lambda_r\), i.e., if \(\lambda_c > \lambda_r\), and Design 2 otherwise. Thus, we reinforce the weakest component.
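As a sanity check, the closed-form expressions above can be compared against a Monte Carlo simulation of the two designs (the parameter values below are arbitrary):

```python
import random

def mean_lifetime(design, lam_c, lam_r, n=100_000):
    # Monte Carlo estimate of the expected lifetime of each design.
    total = 0.0
    for _ in range(n):
        c1, c2 = random.expovariate(lam_c), random.expovariate(lam_c)
        r1, r2 = random.expovariate(lam_r), random.expovariate(lam_r)
        if design == 1:                   # two CPUs (cold standby), one RAM
            total += min(c1 + c2, r1)
        else:                             # one CPU, two RAMs (cold standby)
            total += min(c1, r1 + r2)
    return total / n

random.seed(0)
lam_c, lam_r = 1.0, 2.0
mc1 = mean_lifetime(1, lam_c, lam_r)      # formula: (2*1 + 2)/(1 + 2)**2 = 4/9
mc2 = mean_lifetime(2, lam_c, lam_r)      # formula: (1 + 2*2)/(1 + 2)**2 = 5/9
```

With \(\lambda_c < \lambda_r\), the RAM is the weaker component, so the simulation should favor Design 2, consistent with the analysis.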
1.3 A more elaborate example: The newsvendor problem

Example 1.2 (The Newsvendor Problem) Each morning, a newsvendor has to decide how many newspapers to buy before knowing the demand during the day. The newsvendor purchases a newspaper at a cost of \(\$p\) per newspaper and sells them at a price of \(\$q\) per newspaper, where \(q > p\). Any unsold newspapers at the end of the day have no salvage value.
Let \(a\) denote the number of newspapers bought and \(W\) denote the demand. If \(W < a\), then the newsvendor will sell \(W\) newspapers and receive total earnings of \(q W - p a\). If \(W \ge a\), then the newsvendor will sell \(a\) newspapers and receive total earnings of \(q a - p a\). Thus, the reward is \(r(a,W)\), where
\[r(a, w) = \begin{cases} q w - p a, & \text{if } w < a, \\ q a - p a, & \text{if } w \ge a. \end{cases} \]
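In code, the two cases collapse into one expression, since the newsvendor always sells \(\min(w, a)\) newspapers:

```python
def reward(a, w, p=0.5, q=1.0):
    # Newsvendor reward: sell min(w, a) papers at price q, pay p per paper ordered.
    # The defaults p = 0.5 and q = 1 match the running numerical example.
    return q * min(w, a) - p * a
```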
1.3.1 Interlude with continuous version
The problem above has discrete actions and discrete demand. To build intuition, we first consider the case where both the actions and demand are continuous. Let \(f(w)\) denote the probability density of the demand and \(F(w)\) denote the cumulative distribution function. Then, the expected reward is \[ \begin{equation} \label{eq:J} J(a) = \int_{0}^a [ q w - p a ] f(w) dw + \int_{a}^\infty [ q a - p a ] f(w) dw. \end{equation}\]
To fix ideas, we consider an example where \(p = 0.5\), \(q = 1\), and the demand follows a Kumaraswamy distribution with parameters \((2,5)\) and support \([0,100]\). The performance as a function of the action is shown below.
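The curve in Figure 1.2(a) can be reproduced with a simple Riemann-sum approximation of \(\eqref{eq:J}\); the sketch below hard-codes the Kumaraswamy\((2,5)\) density rescaled to \([0,100]\):

```python
def pdf(x, a=2, b=5, scale=100):
    # Density of a Kumaraswamy(a, b) random variable rescaled to [0, scale].
    z = x / scale
    return a * b * z ** (a - 1) * (1 - z ** a) ** (b - 1) / scale

def J(action, p=0.5, q=1.0, n=1000, scale=100):
    # Riemann-sum approximation of the expected reward J(a) in eq. (eq:J).
    total = 0.0
    for i in range(n):
        w = scale * i / n
        r = q * w - p * action if w < action else q * action - p * action
        total += r * pdf(w) * scale / n
    return total
```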
In Figure 1.2(a), the plot of \(J(a)\) is concave. We can verify that this is true in general.
Note: Verify that \(J(a)\) is concave
To verify that the function \(J(a)\) is concave, we compute the second derivative: \[ \frac{d^2 J(a)}{da^2} = - p f(a) - (q - p) f(a) = -q f(a) \le 0. \]
This suggests that we can use calculus to find the optimal value. In particular, to find the optimal action, we need to find the \(a\) such that \(dJ(a)/da = 0\).
Proposition 1.1 For the newsvendor problem with continuous demand, the optimal action is \[ a = F^{-1}\left( 1 - \frac{p}{q} \right). \] In the literature, the quantity \(1 - (p/q)\) is called the critical fractile.
Note: Proof
Tip: Leibniz integral rule
\[ \dfrac{d}{dx} \left( \int_{p(x)}^{q(x)} f(x,t) dt \right) = f(x, q(x)) \cdot \dfrac {d}{dx} q(x) - f(x, p(x)) \cdot \dfrac {d}{dx} p(x) + \int_{p(x)}^{q(x)} \dfrac{\partial}{\partial x} f(x,t) dt. \]
Using the Leibniz integral rule, the derivative of the first term of \(\eqref{eq:J}\) is \[ [q a - p a ] f(a) + \int_{0}^a [ -p ] f(w) dw = [q a - p a ] f(a) - p F(a). \]
Similarly, the derivative of the second term of \(\eqref{eq:J}\) is \[ - [q a - p a] f(a) + \int_{a}^{\infty} (q-p)f(w)dw = - [q a - p a] f(a) + (q -p)[ 1 - F(a)]. \]
Combining the two, we get that \[ \dfrac{dJ(a)}{da} = - p F(a) + (q - p) [ 1 - F(a) ]. \]
Equating this to \(0\), we get \[ F(a) = \dfrac{ q - p }{ q} \quad\text{or}\quad a = F^{-1} \left( 1 - \dfrac{ p }{ q } \right). \]
Important: Graphical interpretation of the result
The result of Proposition 1.1 has a nice graphical interpretation. Draw the CDF of the demand. The optimal action is the point where the CDF intersects the horizontal line \(1 - p/q\).
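For the running example (\(p = 0.5\), \(q = 1\), Kumaraswamy\((2,5)\) demand on \([0,100]\)), the inverse CDF of the Kumaraswamy distribution is available in closed form, so Proposition 1.1 gives the optimal action directly:

```python
def inverse_cdf(y, a=2, b=5, scale=100):
    # Closed-form inverse CDF of the Kumaraswamy(a, b) distribution on [0, scale].
    return scale * (1 - (1 - y) ** (1 / b)) ** (1 / a)

p, q = 0.5, 1.0
a_opt = inverse_cdf(1 - p / q)   # critical fractile 1 - p/q = 0.5
```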
Example 1.3 Suppose the demand is distributed according to an exponential distribution with mean \(\theta\). Find the optimal action.
Note: Solution
Recall that the CDF of the exponential distribution with mean \(\theta\) is given by \[ F(w) = 1 - e^{-w/\theta}. \] Since \(F\) is continuous and strictly increasing, the optimal action is the unique \(a\) such that \(F(a) = 1 - p/q\). \[ F(a) = 1 - p/q \iff e^{-a/\theta} = p/q \iff -a/\theta = \ln(p/q) \iff a = -\theta \ln(p/q) = \theta \ln \frac{q}{p}. \]
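A quick numeric check of this formula, with an illustrative \(\theta = 10\):

```python
import math

theta, p, q = 10.0, 0.5, 1.0    # illustrative parameter values
a = theta * math.log(q / p)     # optimal order from Example 1.3
# Verify that a indeed sits at the critical fractile: F(a) = 1 - p/q.
F_a = 1 - math.exp(-a / theta)
```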
1.3.2 Back to discrete version
Now, we come back to the problem with discrete actions and discrete demand. Suppose \(W\) takes the values \(\ALPHABET W = \{ w_1, w_2, \dots, w_k \}\) (where \(w_1 < w_2 < \cdots < w_k\)) with probabilities \(\{ μ_1, μ_2, \dots, μ_k \}\). It is easy to see that in this case the action \(a\) should be in the set \(\{ w_1, w_2, \dots, w_k \}\).
To fix ideas, we repeat the above numerical example when \(\ALPHABET W = \{0, 1, \dots, 100\}\).
In the discrete case, brute-force search is easier (because there are only finitely many actions to compare). We cannot directly use ideas from calculus because functions over a discrete domain are not differentiable, but we can use a very similar idea: instead of checking whether \(dJ(a)/da = 0\), we check the sign of the difference \(J(w_{i+1}) - J(w_i)\).
Proposition 1.2 Let \(M_i = \sum_{j \le i} μ_j\) denote the cumulative mass function of the demand. Then, the optimal action is the largest value of \(w_i\) such that \[ M_i \le 1 - \frac{p}{q}. \]
Note: Proof
The expected reward for choice \(w_i\) is \[ \begin{align*} J(w_i) &= \sum_{j < i} μ_j [ q w_j - p w_i ] + \sum_{j \ge i} μ_j [q w_i - p w_i] \\ &= -p w_i + q \Bigl[ \sum_{j < i} μ_j w_j + \sum_{j \ge i} μ_j w_i \Bigr]. \end{align*}\]
Thus, \[\begin{align*} J(w_{i+1}) - J(w_i) &= -p w_{i+1} + q \Bigl[ \sum_{j < i+1} μ_j w_j + \sum_{j \ge i+1} μ_j w_{i+1} \Bigr] \\ &\quad + p w_i - q \Bigl[ \sum_{j < i} μ_j w_j + \sum_{j \ge i} μ_j w_i \Bigr] \\ &= -p (w_{i+1} - w_i) + q \Bigl[ \sum_{j \ge i + 1} μ_j ( w_{i+1} - w_i) \Bigr] \\ &= \big( - p + q [ 1 - M_i ] \big) (w_{i+1} - w_i). \end{align*}\] Note that \[ M_i \le \dfrac{q-p}{q} \iff -p + q [ 1 - M_i ] \ge 0. \] Thus, for all \(i\) such that \(M_i \le (q-p)/q\), we have \(J(w_{i+1}) \ge J(w_i)\). On the other hand, for all \(i\) such that \(M_i > (q-p)/q\), we have \(J(w_{i+1}) < J(w_i)\). Thus, the optimal amount to order is the largest \(w_i\) such that \(M_i \le (q-p)/q\).
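Proposition 1.2 translates directly into a short scan over the cumulative mass function:

```python
def optimal_order(ws, mus, p=0.5, q=1.0):
    # Largest w_i whose cumulative mass M_i does not exceed the
    # critical fractile (q - p)/q.  ws must be sorted in increasing order.
    target = (q - p) / q
    M, best = 0.0, ws[0]
    for w, mu in zip(ws, mus):
        M += mu
        if M <= target:
            best = w
    return best
```

For example, with demands \(\{0,1,2,3\}\) and probabilities \(\{0.2, 0.2, 0.3, 0.3\}\), the cumulative masses are \(0.2, 0.4, 0.7, 1.0\), so for \(p = 0.5\), \(q = 1\) the optimal order is \(1\).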
Important: Graphical interpretation of the result
The structure of the optimal solution is the same for continuous and discrete demand distributions. In particular, the result of Proposition 1.2 has the same graphical interpretation as that of Proposition 1.1:
Draw the CDF of the demand. The optimal action is the point where the CDF intersects the horizontal line \(1 - p/q\).
Example 1.4 Suppose the demand is distributed according to a geometric distribution with parameter \(\theta\), i.e., \(\PR(W = w) = (1-\theta)^w \theta\). Find the optimal action.
Note: Solution
Recall that the CDF of the geometric distribution is given by \[ F(w) = 1 - (1-\theta)^{\lfloor w \rfloor + 1}. \] Since \(F\) is strictly increasing, the optimal action is the largest integer \(a\) such that \(F(a) \le 1 - p/q\). \[ F(a) \le 1 - p/q \iff (1-\theta)^{a + 1} \ge p/q \iff (a + 1) \ln(1-\theta) \ge \ln(p/q) \iff a + 1 \le \frac{\ln(p/q)}{\ln(1-\theta)}, \] where the last inequality flips because \(\ln(1-\theta) < 0\). Thus, \[ a=\left\lfloor \frac{\ln(p/q)}{\ln(1-\theta)} \right\rfloor - 1. \]
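Checking the formula numerically for, say, \(\theta = 0.2\) (an illustrative value):

```python
import math

theta, p, q = 0.2, 0.5, 1.0                 # illustrative values
a = math.floor(math.log(p / q) / math.log(1 - theta)) - 1
# a should be the largest integer with F(a) <= 1 - p/q.
F = lambda w: 1 - (1 - theta) ** (w + 1)    # geometric CDF at integer w
```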
1.3.3 Acting based on historical data
In the above discussion, we assumed that the probability distribution \(F\) of the demand is known. In many instances, the decision maker only has access to a database of historical demands, which we denote by \[ \ALPHABET D_K = \{ w_1, \dots, w_K \}, \] where \(K\) denotes the number of samples. How should the decision maker choose the number of newspapers to buy?
The simplest way to solve this problem is to estimate the demand distribution from the historical data \(\ALPHABET D_K\) as \[ \hat{F}_K(w) = \frac{1}{K} \sum_{k=1}^K \IND\{ w_k \le w \}. \] where \(\IND\{\cdot\}\) is the indicator function. Then, the decision maker can choose the number of newspapers to buy by solving the optimization problem \[ \max_{a \in \ALPHABET A} \EXP_{w \sim \hat{F}_K}[ r(a, W)] = \max_{a \in \ALPHABET A} \frac{1}{K} \sum_{k=1}^K r(a, w_k). \] This simple idea is known by different names in the literature. It is called certainty equivalence in adaptive control, plug-in estimator in statistics, empirical risk minimization (ERM) in machine learning, and sample average approximation (SAA) in optimization.
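A minimal sketch of the sample average approximation for the newsvendor, with candidate actions restricted to the observed demand values:

```python
def saa_order(history, p=0.5, q=1.0):
    # Sample average approximation: pick the order quantity that maximizes
    # the average reward over the historical demands.  Candidate actions
    # are restricted to the observed demand values.
    def avg_reward(a):
        return sum(q * min(w, a) - p * a for w in history) / len(history)
    return max(sorted(set(history)), key=avg_reward)
```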
The structure of the optimal solution established in Proposition 1.1 and Proposition 1.2 implies that the optimal action is the \(\rho\)-th quantile of the demand distribution, where \(\rho = 1 - p/q\) is the critical fractile. To obtain it, order the historical demands according to their order statistics: \[ w_{1:K} \le w_{2:K} \le \cdots \le w_{K:K}. \] Then, the certainty equivalent action is given by \(w_{\lceil \rho K \rceil: K}\). This approach is called the non-parametric approach, because it does not make any assumptions about the distribution of the demand.
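In code, the non-parametric rule is just an index into the sorted sample:

```python
import math

def quantile_order(history, p=0.5, q=1.0):
    # Non-parametric rule: the ceil(rho*K)-th order statistic of the
    # historical demands, where rho = 1 - p/q is the critical fractile.
    rho = 1 - p / q
    ws = sorted(history)
    return ws[math.ceil(rho * len(history)) - 1]
```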
If we know that the demand is distributed according to a specific distribution, then we can estimate the parameters of the distribution. For example, if we know that the demand is distributed according to an exponential distribution with an unknown mean \(\theta\), then we can estimate \(\theta\) from the historical data. In particular, an unbiased estimator of \(\theta\) is the sample mean: \[ \hat \theta_{K} = \frac 1K \sum_{k=1}^K w_k. \] Given the results of Example 1.3, the certainty equivalent action is given by \[ a = \hat \theta_K \ln \frac{q}{p}. \] This approach is called the parametric approach, because it makes an assumption about the distribution of the demand.
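The corresponding parametric (certainty-equivalent) rule under the exponential assumption is a one-liner:

```python
import math

def parametric_order(history, p=0.5, q=1.0):
    # Parametric certainty-equivalent rule: assume exponential demand,
    # estimate its mean, and plug it into a = theta * ln(q/p) (Example 1.3).
    theta_hat = sum(history) / len(history)
    return theta_hat * math.log(q / p)
```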
If the assumption that the demand is exponentially distributed is correct, both approaches asymptotically converge to the same result, but for small sample sizes the parametric approach is more efficient. However, if the assumption is incorrect, the parametric approach may perform poorly. Therefore, the non-parametric approach is more robust.
Exercises
Exercise 1.1 (MPSoC with four components) Extend Example 1.1 to the case where the chip has a total capacity for four components. There are three possible designs:
- Design 1 uses three CPUs and one RAM
- Design 2 uses two CPUs and two RAMs
- Design 3 uses one CPU and three RAMs
Find the design that maximizes the expected lifetime of the system.
Hint: If \(X_1\), \(X_2\), and \(X_3\) are independent exponential random variables with parameter \(\lambda\), then \(X_1 + X_2 + X_3 \sim \text{Gamma}(3, \lambda)\) and \[\PR(X_1 + X_2 + X_3 > t) = \left(1 + \lambda t + \frac{(\lambda t)^2}{2}\right) e^{-\lambda t}.\] Moreover, in general, \[\int_0^{\infty} t^n e^{-\lambda t} \, dt = \frac{n!}{\lambda^{n+1}}, \quad n \in \naturalnumbers.\]
Exercise 1.2 (Qualitative properties of optimal solution) Intuitively, in Example 1.2, we expect that if the purchase price of the newspaper increases but the selling price remains the same, then the newsvendor should buy fewer newspapers. Formally prove this statement.
Hint: The CDF of a distribution is a weakly increasing function.
Exercise 1.3 (Monotonicity of optimal action) Consider two scenarios for the case with continuous demand and actions in Example 1.2. In scenario 1, the demand is distributed according to PDF \(f_1\). In scenario 2, it is distributed according to PDF \(f_2\). Suppose \(F_1(w) \le F_2(w)\) for all \(w\). Show that the optimal action \(a_1\) for scenario 1 is greater than the optimal action \(a_2\) for scenario 2.
Hint: Plot the two CDFs and try to interpret the optimal decision rule graphically.
Exercise 1.4 (Selling random wind) The amount \(W\) of power generated by the wind turbine is a positive real-valued random variable with probability density function \(f\). The operator of the wind turbine has to commit to provide a certain amount of power in the day-ahead market. The price of power is \(\$p\) per MW.
If the operator commits to provide \(a\) MW of power and the wind generation \(W\) is less than \(a\), then he has to buy the balance \(a - W\) from a reserves market at a cost of \(\$q\) per MW, where \(q > p\). Thus, the reward of the operator is \(r(a,W)\), where \[ r(a, w) = \begin{cases} p a, & \text{if } w \ge a, \\ p a - q (a - w), & \text{if } w < a. \end{cases}\]
Find the optimal commitment \(a\) that maximizes the expected reward.
Notes
Perhaps the earliest model of the newsvendor problem appeared in Edgeworth (1888) in the context of a bank setting the level of cash reserves to cover demands from its customers. The solution to the basic model presented above and some of its variants was provided in Morse and Kimball (1951); Arrow et al. (1952); Whitin (1953). See Porteus (2008) for an accessible introduction.
The idea of using order statistics is from Besbes and Mouchtaki (2023). They quantify the regret of the certainty equivalent policy as a function of the number of samples \(K\). Liyanage and Shanthikumar (2005) and Chu et al. (2008) proposed a more general idea of using operational statistics, that integrate estimation and optimization.
The property \(F_1(w) \le F_2(w)\) used in Exercise 1.3 is called stochastic dominance. Later in the course, we will study how stochastic dominance is useful to establish monotonicity properties of general MDPs.
The example of selling random wind in Exercise 1.4 is taken from Bitar et al. (2012).