ECSE 506: Stochastic Control and Decision Theory
Aditya Mahajan
Winter 2022
Risk sensitivity is closely related to the idea of utility. The value of a sum of money \(z\) to a decision maker may not be proportional to \(z\) itself but may be some general increasing function \(\mathsf{U}(z)\), known as the utility function. For example, in the example on optimal gambling considered earlier, we had assumed that the utility for wealth \(z\) is \(\log z\). If a decision maker has utility function \(\mathsf{U}\), then the value of a random outcome \(Z\) is defined by the expected utility \(\EXP[\mathsf{U}(Z)]\).
If the function \(\mathsf{U}\) is concave, then Jensen’s inequality implies that \(\EXP[\mathsf{U}(Z)] < \mathsf{U}( \EXP[Z] )\). That is, for a given expected return, the individual always prefers a certain return. In this case the decision maker is said to be risk averse. On the other hand, if the function \(\mathsf{U}\) is convex, the reverse inequality holds and the decision maker is said to be risk seeking. In the transitional case when \(\mathsf{U}\) is linear, the decision maker is said to be risk neutral.
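As a quick numerical illustration (the lottery below is a made-up example), Jensen’s inequality can be checked directly for a concave utility \(\log z\) and a convex utility \(z^2\):

```python
import math

# A lottery Z: 50 or 150 with equal probability; E[Z] = 100.
outcomes, probs = [50.0, 150.0], [0.5, 0.5]
mean_z = sum(p * z for p, z in zip(probs, outcomes))

# Concave utility (risk averse): expected utility falls below U(E[Z]).
eu_log = sum(p * math.log(z) for p, z in zip(probs, outcomes))
assert eu_log < math.log(mean_z)

# Convex utility (risk seeking): the inequality is reversed.
eu_sq = sum(p * z**2 for p, z in zip(probs, outcomes))
assert eu_sq > mean_z**2
```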
Risk sensitivity has immediate implications. For example, consider the gambling problem described in Exercise 3 of the notes on optimal gambling. A gambler can bet on \(n\) mutually exclusive outcomes with different success probabilities \((p_1, \dots, p_n)\). A risk-seeking gambler will concentrate his bet on the single most attractive outcome, whereas a risk-averse gambler (as was the case in the exercise with \(\mathsf{U} = \log\)) will spread his bet over multiple outcomes, thus trading peak return for assured returns.
An alternative view is to say that the risk-seeking decision maker is optimistic, since he implicitly assumes that uncertainties will turn out to his advantage. On the other hand, the risk-averse decision maker is pessimistic and implicitly assumes that the uncertainties will turn out to his disadvantage.
In general, we can phrase decision problems either in terms of maximizing rewards or, in some cases, minimizing cost. For cost minimization problems, instead of talking in terms of the utility \(\mathsf{U}(z)\) of a return \(z\), we will talk in terms of the disutility \(\mathsf{L}(z)\) of the cost \(z\). The usual connection is that \(\mathsf{L}(z) = -\mathsf{U}(-z)\), so concave \(\mathsf{L}\) corresponds to risk-seeking behavior and convex \(\mathsf{L}\) corresponds to risk-averse behavior.
It is also helpful sometimes to invert the transformation \(\mathsf{L}\) after having taken the expectation, so that we return to the cost scale. Thus, \[ γ = \mathsf{L}^{-1}( \EXP[ \mathsf{L}(Z) ] ) \] is the fixed cost which is equivalent to the uncertain cost \(Z\). This is sometimes called the certainty equivalent cost, but that phrase is already overloaded, so I will avoid using it and instead use the term effective cost.
One disutility function that is of special interest is the exponential function \(\mathsf{L}(z) = \exp(θ z)\), where the parameter \(θ\) measures the degree and nature of risk sensitivity. The exponential function is always convex, but one wishes to minimize or maximize \(\EXP[\exp(θ Z)]\) according to whether \(θ\) is positive or negative. Equivalently, we can state that the decision maker wants to minimize the effective cost \[ γ = \frac{1}{θ} \log \EXP[ \exp( θ Z) ] \] irrespective of the sign of \(θ\). When \(θ < 0\), the decision maker is risk seeking and when \(θ > 0\), the decision maker is risk averse.
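A small sketch of the effective cost in action (the costs and the values of \(θ\) below are hypothetical), using a numerically stable log-sum-exp evaluation of \(γ = \frac{1}{θ} \log \EXP[\exp(θ Z)]\) for a discrete cost:

```python
import numpy as np

def effective_cost(z, p, theta):
    """γ = (1/θ) log E[exp(θ Z)] for a discrete cost Z, via log-sum-exp."""
    a = theta * np.asarray(z)
    m = a.max()
    return (m + np.log(np.dot(p, np.exp(a - m)))) / theta

# Two cost distributions with the same mean 100 but different spread.
safe, risky = [100.0, 100.0], [50.0, 150.0]
p = [0.5, 0.5]

gamma_safe = effective_cost(safe, p, 0.01)       # a certain cost maps to itself
gamma_averse = effective_cost(risky, p, 0.01)    # θ > 0: spread increases the effective cost
gamma_seeking = effective_cost(risky, p, -0.01)  # θ < 0: spread decreases the effective cost
```

The risk-averse decision maker (\(θ > 0\)) assigns the risky cost an effective value above its mean, while the risk-seeking one (\(θ < 0\)) assigns it a value below its mean.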
The exponential disutility has a constant cost elasticity: if the outcomes \(Z\) all increase by an amount \(Δ\), then the effective cost also increases by \(Δ\). The only utility functions which satisfy constant cost elasticity are the linear and the exponential.
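The constant cost elasticity of the exponential disutility can be verified directly: shifting every outcome by \(Δ\) factors out of the expectation, \[ \frac{1}{θ} \log \EXP[ \exp(θ (Z + Δ)) ] = \frac{1}{θ} \log \bigl( e^{θΔ} \, \EXP[ \exp(θ Z) ] \bigr) = Δ + \frac{1}{θ} \log \EXP[ \exp(θ Z) ]. \]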
For small values of \(θ\), the effective cost is approximately \[ γ \approx \EXP[Z] + \tfrac{1}{2}θ \text{var}(Z), \] which approximately decouples the effects of expectation and variability: a risk-averse decision maker (\(θ > 0\)) penalizes variance, while a risk-seeking one (\(θ < 0\)) rewards it.
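This approximation can be checked by simulation. For Gaussian \(Z \sim {\cal N}(μ, σ^2)\) it is in fact exact, since \(\log \EXP[\exp(θZ)] = θμ + θ^2σ^2/2\); the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, mu, sigma = 0.05, 1.0, 2.0
z = rng.normal(mu, sigma, size=1_000_000)

# Effective cost γ = (1/θ) log E[exp(θZ)], via log-sum-exp over the samples
a = theta * z
gamma = (a.max() + np.log(np.mean(np.exp(a - a.max())))) / theta

# Small-θ approximation E[Z] + (θ/2) var(Z); exact for Gaussian Z
approx = z.mean() + 0.5 * theta * z.var()
```

Both quantities come out close to \(μ + θσ^2/2 = 1.1\) up to Monte Carlo error.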
 Remark

In the financial mathematics literature, the exponential disutility function is called the entropic risk measure. You need to be careful if you are comparing the results presented in these notes with those in financial mathematics, because they consider reward maximization problems. Therefore, the effective return is defined as \[ γ = -\frac{1}{θ} \log \EXP[ \exp( -θ Z) ], \] where \(θ > 0\) corresponds to risk aversion.
1 A simple LQG example
Suppose \(x \in \reals\) is the distance of an object from its desired position and the application of a control \(u \in \reals\) will bring it to \(x - u\). Suppose the cost of this maneuver is \[ C = \tfrac{1}{2}[ R u^2 + S (x-u)^2] . \]
Here, the two terms represent the cost of control and the final displacement from the desired position. Elementary calculus shows that the optimal value of \(u\) and the minimum cost are \[ u = \frac{S x}{S + R }, \qquad V(x) = \frac{1}{2} \cdot \frac{RS x^2}{S + R}. \]
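The elementary calculus step can be verified symbolically (a sketch using sympy):

```python
import sympy as sp

x, u, R, S = sp.symbols('x u R S', positive=True)
C = sp.Rational(1, 2) * (R * u**2 + S * (x - u)**2)

# Stationarity: dC/du = Ru - S(x - u) = 0
u_star = sp.solve(sp.diff(C, u), u)[0]
V = sp.simplify(C.subs(u, u_star))

assert sp.simplify(u_star - S * x / (S + R)) == 0
assert sp.simplify(V - R * S * x**2 / (2 * (S + R))) == 0
```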
Now suppose there is noise so that \(x - u\) is replaced by \(x - u + w\). We’ll assume that \(w \sim {\cal N}(0, Σ)\). The cost then becomes \[ C = \tfrac{1}{2}[ R u^2 + S (x-u + w)^2 ]. \]
In the risk neutral case, the optimal control is the same as earlier and the minimum cost \(V(x)\) simply increases by \(\frac12 SΣ\). This is a special case of a general phenomenon known as certainty equivalence. See the notes on the linear quadratic regulator for details.
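A symbolic check of this certainty equivalence claim, using \(\EXP[(x-u+w)^2] = (x-u)^2 + Σ\) for \(w \sim {\cal N}(0, Σ)\):

```python
import sympy as sp

x, u, R, S, Sig = sp.symbols('x u R S Sigma', positive=True)

# Expected cost: E[C] = (1/2)[R u^2 + S((x-u)^2 + Σ)]
EC = sp.Rational(1, 2) * (R * u**2 + S * ((x - u)**2 + Sig))

u_star = sp.solve(sp.diff(EC, u), u)[0]
assert sp.simplify(u_star - S * x / (S + R)) == 0       # same control as the noiseless case

V_noisy = sp.simplify(EC.subs(u, u_star))
V_det = R * S * x**2 / (2 * (S + R))
assert sp.simplify(V_noisy - V_det - S * Sig / 2) == 0  # cost increases by SΣ/2
```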
Now consider a risk-sensitive version of the problem, in which \(u\) is chosen to minimize \[ C_θ = \frac{1}{θ} \log \EXP[ \exp(θ C) ]. \]
In the risk-averse case (i.e., \(θ > 0\)), minimizing \(C_θ\) is equivalent to minimizing \[ \begin{equation} \label{eq:cost} \EXP[ \exp(θ C)] = \frac{1}{\sqrt{2πΣ}} \int \exp\Bigl( \frac{θ}{2} \Bigl( Ru^2 + S(x-u+w)^2 - \frac{w^2}{θΣ}\Bigr)\Bigr) dw. \end{equation} \] Let us write the integral on the right hand side as \(\int \exp(\frac{1}{2} θQ((x,u), w)) dw\). Note that \[ \frac{∂^2 Q((x,u), w)}{∂w^2} = 2\Bigl(S - \frac{1}{θΣ}\Bigr). \] Therefore, \(Q\) is negative definite in \(w\) if \(S - 1/θΣ < 0\), or equivalently (recall \(θ > 0\)), \[\begin{equation} \label{eq:critical} θΣS - 1 < 0 \iff 0 < θ < \frac{1}{SΣ}. \end{equation} \] For now, we assume that \(θΣS < 1\) and we will return to what happens when \(θΣS = 1\) later.
Since \(Q\) is negative definite in \(w\) (and \(θ > 0\)), the exponent \(\frac{1}{2}θQ((x,u),w)\) is a concave quadratic in \(w\) and the integral converges. Therefore, by using Lemma 1 of the notes on LEQG, we know that \[ \begin{equation} \label{eq:simplify} \int\exp\Bigl( \frac{θ}{2} Q((x,u),w) \Bigr) dw = \sqrt{\frac{2πΣ}{1 - θΣS}} \exp\Bigl( \frac{θ}{2} \max_{w}Q((x,u),w) \Bigr). \end{equation} \] Now, the maximizing value of \(w\) is \(\frac{θΣS}{1 - θΣS}(x-u)\) and therefore we get \[ \max_{w} Q((x,u), w) = R u^2 + \frac{S}{1-θΣS}(x-u)^2. \]
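The inner maximization can be checked symbolically. This is a sketch using sympy, with the stress written out explicitly as \(Q((x,u),w) = Ru^2 + S(x-u+w)^2 - w^2/(θΣ)\):

```python
import sympy as sp

x, u, w, R, S, Sig, th = sp.symbols('x u w R S Sigma theta', positive=True)

Q = R * u**2 + S * (x - u + w)**2 - w**2 / (th * Sig)

# Stationary point of Q in w
w_star = sp.solve(sp.diff(Q, w), w)[0]
assert sp.simplify(w_star - th * Sig * S * (x - u) / (1 - th * Sig * S)) == 0

# Value of Q at the maximizing w
Q_max = sp.simplify(Q.subs(w, w_star))
assert sp.simplify(Q_max - (R * u**2 + S * (x - u)**2 / (1 - th * Sig * S))) == 0
```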
Substituting this back in \eqref{eq:simplify} and then in \eqref{eq:cost}, we get \[ \EXP[\exp(θC)] = \frac{1}{\sqrt{1 - θΣS}} \exp\Bigl(\frac{θ}{2}\Bigl(R u^2 + \frac{S}{1 - θΣS}(x-u)^2\Bigr)\Bigr). \]
Now, minimizing \(\EXP[\exp(θC)]\) is the same as minimizing the coefficient of \(θ/2\) in the exponent (recall that \(θ\) is positive), which is minimized by \[ u = \frac{Sx}{S + R - θΣSR}. \] The corresponding minimum value of the effective cost is \[ V_θ(x) = \frac{1}{2} \cdot \frac{RS x^2}{R + S - θΣSR} - \frac{1}{2θ} \log (1 - θΣS). \]
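As a sanity check, the closed-form control can be compared with a brute-force numerical minimization of \(\EXP[\exp(θC)]\). The parameter values below are hypothetical, chosen so that \(θΣS < 1\):

```python
import numpy as np

R, S, Sig, th, x = 1.0, 1.0, 1.0, 0.3, 1.0

w = np.linspace(-12.0, 12.0, 2401)
dw = w[1] - w[0]
u = np.linspace(0.0, 1.0, 1001)

# E[exp(θC)] for each candidate u, integrating over w ~ N(0, Σ) numerically
U, W = np.meshgrid(u, w, indexing='ij')
C = 0.5 * (R * U**2 + S * (x - U + W)**2)
pdf = np.exp(-w**2 / (2 * Sig)) / np.sqrt(2 * np.pi * Sig)
risk = (np.exp(th * C) * pdf).sum(axis=1) * dw

u_num = u[np.argmin(risk)]
u_closed = S * x / (S + R - th * Sig * S * R)
assert abs(u_num - u_closed) < 1e-2
```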
Note that both the expression for the control action and the value become infinite as \(θ\) increases through the critical value \[ θ_{\text{crit}} = \frac{1}{Σ}\left( \frac{1}{S} + \frac{1}{R} \right). \] Note that the constraint \eqref{eq:critical} implies that \(θ < θ_{\text{crit}}\). The value \(θ = θ_{\text{crit}}\) marks a point at which the decision maker is so pessimistic that his apprehension of uncertainties completely overrides the assurances given by known statistical behavior. This is called neurotic breakdown. There is a corresponding optimistic extreme, euphoria, if the cost function contains quadratic reward terms.
 Remark

Whittle calls the term \(Q((x,u),w)\) the stress. Note that in the above calculations, we choose \(u\) to minimize the stress and choose \(w\) to maximize the stress. It is as though there is another agent, the “phantom other”, who exerts the control \(w\) at the same time as the optimizer exerts the control \(u\). When \(θ\) is positive, the phantom other opposes the optimizer and tries to maximize the stress. (Note that the maximizing value of \(w\) is \(\frac{θΣS}{1 - θΣS}(x-u)\), which at the optimal \(u\) can also be written as \(θΣRu\).) So, what started out as a one-person control problem has turned into a two-person game.
References
The material in this section is taken from Whittle (2002).
This entry was last updated on 15 Jun 2020