52 Convex sets and convex functions
52.1 Convexity
Definition 52.1 (Convex sets and convex functions) A subset \(C\) of \(\reals^n\) is convex if for every \(x_0, x_1 \in C\), the line segment \([x_0, x_1] \in C\), i.e., \[ (1-λ) x_0 + λ x_1 \in C \quad \text{for all } λ \in (0,1). \]
A function \(f\) on a convex set \(C\) is convex relative to \(C\) if for every \(x_0, x_1 \in C\), one has \[ f( (1-λ)x_0 + λ x_1 ) \le (1-λ) f(x_0) + λ f(x_1) \quad \text{for all } λ \in (0,1). \]
\(f\) is strictly convex if this inequality is strict for points \(x_0 \neq x_1\) with \(f(x_0)\) and \(f(x_1)\) finite.
A function is said to be concave when \(-f\) is convex.
52.2 Convex combinations
A convex combination of elements \(x_0, x_1, \dots, x_p \in \reals^n\) is a linear combination \(\sum_{i=0}^p λ_i x_i\) where the coefficients \(λ_i\) are non-negative and add to one. In case of two elements, convex combinations can be equivalently expressed as \((1-λ)x_0 + λx_1\) with \(λ \in [0,1]\), which we have already seen. The next result shows that the definition of convexity in terms of two elements given earlier generalizes to multiple elements.
Proposition 52.1 A set \(C \subset \reals^n\) is convex if and only if \(C\) contains all convex combinations of its elements.
A function \(f\) is convex relative to a convex set \(C\) if and only if for every choice of points \(x_0, \dots, x_p \in C\), we have: \[\begin{equation}\label{eq:Jensen-inequality} f\biggl( \sum_{i=0}^p λ_i x_i \biggr) \le \sum_{i=0}^p λ_i f(x_i) \quad \text{when } λ_i \ge 0, \sum_{i=0}^p λ_i = 1. \end{equation}\]
Inequality \eqref{eq:Jensen-inequality} is also call Jensen’s inequality.
In many applications, it is convenient to consider function \(f\) that are allowed to be extended-real-valued, i.e., take values in \(\bar \reals = [-∞, ∞]\) instead of just \(\reals = (-∞, ∞)\). The extended real line has all the properties of a compact interval: every subset \(A \subset \bar \reals\) has a supremum and an infimum, either of which could be infinite.
When extending arithmetic operations to the extended real line, most rules extend in a natural manner but we have to be careful with two operations: \(0 ⋅ ∞\) and \(∞ - ∞\). We will define \[ 0 ⋅ ∞ = 0 = 0 ⋅ (-∞). \]
For convexity, one follows the inf-addition convention: \[ ∞ + (-∞) = (-∞) + ∞ = ∞. \]
Extended arithematic then obeys the associative, commutative, and distributive laws of ordinary arithematic with one crucial exception: \[ λ ⋅ (∞ - ∞) \neq (λ ⋅ ∞ - λ ⋅ ∞) \quad \text{when } λ < 0. \]
The definition of convex functions can be generalized to function defined over the extended real line as long as we invoke the inf-addition convention.
Any convex function \(f\) on a convex set \(C \in \reals^n\) can be identified with a convex function on all of \(\reals^n\) by defining \(f(x) = ∞\) for all \(x \not\in C\). Such functions are called proper convex functions (in contrast to improper convex functions which take the value \(f(x) = -∞\) for all \(x \in \text{int}(\text{dom} f)\).
52.3 Properties of convex function
Proposition 52.2 Let \(I\) be an index set. Then:
- Intersection of convex functions is convex: 
 \(\bigcap_{i \in I} C_i\) is convex if each set \(C_i\) is convex.
- Supremum of convex functions is convex: 
 \(\sup_{i \in I} f_i\) is convex if each function \(f_i\) is convex.
- Supremum of finite number of strictly convex functions is strictly convex: 
 \(\sup_{i \in I} f_i\) is strictly convex if each function \(f_i\) is strictly convex and \(I\) is finite.
- Pointwise supremum of a collection of convex functions is convex: 
 \(f\) is convex if \(f(x) = \lim\sup_{i \in I} f_i(x)\) for all \(x\) and each \(f_i\) is convex.
The next result presents equivalent characterizations of convexity.
Proposition 52.3 For a differentiable function \(f\) on an open convex set \(O \subset \reals^n\), each of the following is both necessary and sufficient for \(f\) to be convex on \(O\):
- \(\langle x_1 - x_0, \GRAD f(x_1) - \GRAD f(x_0) \rangle \ge 0\) for all \(x_0, x_1 \in O\). 
- \(f(y) \ge f(x) + \langle \GRAD f(x), y - x \rangle\) for all \(x, y \in O\). 
- \(\GRAD^2 f(x)\) is pointwise-semidefinite for all \(x \in O\) (\(f\) twice differentiable). 
Notes
The material of these notes is adapted from Rockafellar and Wets (2009).