46  Convex sets and convex functions

Updated

July 26, 2023

46.1 Convexity

Definition 46.1 (Convex sets and convex functions) A subset \(C\) of \(\reals^n\) is convex if for every \(x_0, x_1 \in C\), the line segment \([x_0, x_1] \in C\), i.e., \[ (1-λ) x_0 + λ x_1 \in C \quad \text{for all } λ \in (0,1). \]

A function \(f\) on a convex set \(C\) is convex relative to \(C\) if for every \(x_0, x_1 \in C\), one has \[ f( (1-λ)x_0 + λ x_1 ) \le (1-λ) f(x_0) + λ f(x_1) \quad \text{for all } λ \in (0,1). \]

\(f\) is strictly convex if this inequality is strict for points \(x_0 \neq x_1\) with \(f(x_0)\) and \(f(x_1)\) finite.

A function is said to be concave when \(-f\) is convex.

46.2 Convex combinations

A convex combination of elements \(x_0, x_1, \dots, x_p \in \reals^n\) is a linear combination \(\sum_{i=0}^p λ_i x_i\) where the coefficients \(λ_i\) are non-negative and add to one. In case of two elements, convex combinations can be equivalently expressed as \((1-λ)x_0 + λx_1\) with \(λ \in [0,1]\), which we have already seen. The next result shows that the definition of convexity in terms of two elements given earlier generalizes to multiple elements.

Proposition 46.1 A set \(C \subset \reals^n\) is convex if and only if \(C\) contains all convex combinations of its elements.

A function \(f\) is convex relative to a convex set \(C\) if and only if for every choice of points \(x_0, \dots, x_p \in C\), we have: \[\begin{equation}\label{eq:Jensen-inequality} f\biggl( \sum_{i=0}^p λ_i x_i \biggr) \le \sum_{i=0}^p λ_i f(x_i) \quad \text{when } λ_i \ge 0, \sum_{i=0}^p λ_i = 1. \end{equation}\]

Inequality \eqref{eq:Jensen-inequality} is also call Jensen’s inequality.

In many applications, it is convenient to consider function \(f\) that are allowed to be extended-real-valued, i.e., take values in \(\bar \reals = [-∞, ∞]\) instead of just \(\reals = (-∞, ∞)\). The extended real line has all the properties of a compact interval: every subset \(A \subset \bar \reals\) has a supremum and an infimum, either of which could be infinite.

When extending arithmetic operations to the extended real line, most rules extend in a natural manner but we have to be careful with two operations: \(0 ⋅ ∞\) and \(∞ - ∞\). We will define \[ 0 ⋅ ∞ = 0 = 0 ⋅ (-∞). \]

For convexity, one follows the inf-addition convention: \[ ∞ + (-∞) = (-∞) + ∞ = ∞. \]

Extended arithematic then obeys the associative, commutative, and distributive laws of ordinary arithematic with one crucial exception: \[ λ ⋅ (∞ - ∞) \neq (λ ⋅ ∞ - λ ⋅ ∞) \quad \text{when } λ < 0. \]

The definition of convex functions can be generalized to function defined over the extended real line as long as we invoke the inf-addition convention.

Any convex function \(f\) on a convex set \(C \in \reals^n\) can be identified with a convex function on all of \(\reals^n\) by defining \(f(x) = ∞\) for all \(x \not\in C\). Such functions are called proper convex functions (in contrast to improper convex functions which take the value \(f(x) = -∞\) for all \(x \in \text{int}(\text{dom} f)\).

46.3 Properties of convex function

Proposition 46.2 Let \(I\) be an index set. Then:

  1. Intersection of convex functions is convex:
    \(\bigcap_{i \in I} C_i\) is convex if each set \(C_i\) is convex.

  2. Supremum of convex functions is convex:
    \(\sup_{i \in I} f_i\) is convex if each function \(f_i\) is convex.

  3. Supremum of finite number of strictly convex functions is strictly convex:
    \(\sup_{i \in I} f_i\) is strictly convex if each function \(f_i\) is strictly convex and \(I\) is finite.

  4. Pointwise supremum of a collection of convex functions is convex:
    \(f\) is convex if \(f(x) = \lim\sup_{i \in I} f_i(x)\) for all \(x\) and each \(f_i\) is convex.

The next result presents equivalent characterizations of convexity.

Proposition 46.3 For a differentiable function \(f\) on an open convex set \(O \subset \reals^n\), each of the following is both necessary and sufficient for \(f\) to be convex on \(O\):

  1. \(\langle x_1 - x_0, \GRAD f(x_1) - \GRAD f(x_0) \rangle \ge 0\) for all \(x_0, x_1 \in O\).

  2. \(f(y) \ge f(x) + \langle \GRAD f(x), y - x \rangle\) for all \(x, y \in O\).

  3. \(\GRAD^2 f(x)\) is pointwise-semidefinite for all \(x \in O\) (\(f\) twice differentiable).

Notes

The material of these notes is adapted from Rockafellar and Wets (2009).