33  Sub-Gaussian random variables

Updated

June 13, 2023

33.1 Prelim: Concentration inequalities for Gaussian random variables

Let \(\phi(\cdot)\) denote the density of a \(\mathcal{N}(0,1)\) Gaussian random variable: \[ \phi(x) = \frac{1}{\sqrt{2π}} \exp\biggl( - \frac{x^2}{2} \biggr). \]

Note that if \(X \sim \mathcal{N}(μ,σ^2)\), then the density of \(X\) is \[ \frac{1}{σ}\phi\biggl( \frac{x-μ}{σ} \biggr) = \frac{1}{\sqrt{2π}\,σ} \exp\biggl( - \frac{(x-μ)^2}{2 σ^2} \biggr). \]

The tails of Gaussian random variables decay quickly, a fact that can be quantified using the following inequality.

Proposition 33.1 (Mills inequality) If \(X \sim \mathcal{N}(0, 1)\), then for any \(t > 0\), \[ \PR( |X| > t ) \le \frac{2\phi(t)}{t} \]

More generally, if \(X \sim \mathcal{N}(0, σ^2)\), then for any \(t > 0\), \[ \PR( |X| > t ) \le 2\frac{σ}{t} \phi\biggl(\frac{t}{σ}\biggr) = \sqrt{\frac{2}{π} } \frac{σ}{t} \exp\biggl( - \frac{t^2}{2σ^2} \biggr). \]

Remark

In the communication theory literature, this bound is sometimes known as the bound on the erfc or \(Q\) function.

We’ll first prove the result for a unit-variance random variable. Note that \(X\) is symmetric around the origin. Therefore, \[ \PR(|X| > t) = 2\PR(X > t). \]

Now, by using an idea similar to the proof of Markov’s inequality, we have \[\begin{align*} t \cdot \PR( X > t) &= t \int_{t}^∞ \phi(x) dx \\ & \le \int_{t}^∞ x \phi(x) dx \\ & = \int_{t}^∞ \frac{1}{\sqrt{2π}} x \exp\biggl( - \frac{x^2}{2} \biggr) dx \\ &= \frac{1}{\sqrt{2π}} \int_{t}^∞ - \frac{∂}{∂x} \exp\biggl( -\frac{x^2}{2} \biggr) dx \\ & = \frac{1}{\sqrt{2π}} \exp\biggl( - \frac{t^2}{2} \biggr) \end{align*}\] Thus, \(\PR(X > t) \le \phi(t)/t\), and since \(\PR(|X| > t) = 2\PR(X > t)\), the stated bound follows.

The proof for the general case follows by observing that \[ \PR(|X| > t) = \PR\biggl( \biggl| \frac{X}{σ} \biggr| > \frac{t}{σ} \biggr) \] where \(X/σ \sim \mathcal{N}(0,1)\).
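
The following is a minimal numerical sketch (assuming `numpy` and `scipy` are available) that compares the exact Gaussian tail probability with the Mills bound; the bound is loose for small \(t\) but becomes tight as \(t\) grows.

```python
# Numerical check of Mills inequality: compare P(|X| > t) for X ~ N(0, sigma^2)
# against the bound 2 * (sigma / t) * phi(t / sigma).
import numpy as np
from scipy.stats import norm

sigma = 2.0
for t in [1.0, 2.0, 4.0, 8.0]:
    exact = 2 * norm.sf(t, scale=sigma)            # P(|X| > t) = 2 P(X > t)
    bound = 2 * (sigma / t) * norm.pdf(t / sigma)  # Mills bound
    print(f"t = {t:4.1f}   P(|X| > t) = {exact:.3e}   bound = {bound:.3e}")
```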

The fact that a Gaussian random variable has tails that decay to zero exponentially fast can also be seen in its moment generating function: \[ M(s) = \EXP[ \exp(sX) ] = \exp\bigl( sμ + \tfrac12 s^2 σ^2\bigr). \]

A useful application of Mills inequality is the following concentration inequality.

Proposition 33.2 (Concentration inequality.) Let \(X_i \sim \mathcal{N}(0, σ^2)\) (not necessarily independent). Then, for any \(t > 0\), \[ \PR\Bigl( \max_{1 \le i \le n} |X_i| > t\Bigr) \le 2n \frac{σ}{t} \phi\biggl( \frac{t}{σ} \biggr). \]

Proof

This follows immediately from Mills inequality and the union bound.

Another useful result is the following:

Proposition 33.3 (Max of Gaussian random variables.) Let \(X_i \sim \mathcal{N}(0,σ^2)\) (not necessarily independent). Then, \[ \EXP\Bigl[ \max_{1 \le i \le n} X_i \Bigr] \le σ \sqrt{2 \log n} \] and \[ \EXP\Bigl[ \max_{1 \le i \le n} |X_i| \Bigr] \le σ \sqrt{2 \log 2n}. \]

See these notes for a lower bound with the same rate!

We prove the first inequality. The second follows by considering \(2n\) random variables \(X_1, \dots, X_n\), \(-X_1, \dots, -X_n\).

For any \(s > 0\), \[\begin{align*} \EXP\Bigl[ \max_{1 \le i \le n} X_i \Bigr] &= \frac{1}{s} \EXP\Bigl[ \log \Bigl( \exp\Bigl( s \max_{1 \le i \le n} X_i \Bigr) \Bigr) \Bigr] \\ &\stackrel{(a)}\le \frac{1}{s} \log \Bigl( \EXP\Bigl[ \exp\Bigl( s \max_{1 \le i \le n} X_i \Bigr) \Bigr] \Bigr) \\ &\stackrel{(b)}= \frac{1}{s} \log \Bigl( \EXP\Bigl[ \max_{1 \le i \le n} \exp( s X_i ) \Bigr] \Bigr) \\ &\stackrel{(c)}\le \frac{1}{s} \log \Bigl(\sum_{i=1}^n \EXP\bigl[ \exp( s X_i ) \bigr] \Bigr) \\ &\stackrel{(d)}= \frac{1}{s} \log \Bigl( \sum_{i=1}^n\exp\Bigl( \frac{s^2 σ^2}{2} \Bigr) \Bigr) \\ &= \frac{\log n}{s} + \frac{s σ^2}{2} \end{align*}\] where \((a)\) follows from Jensen’s inequality, \((b)\) follows from the monotonicity of \(\exp(\cdot)\), \((c)\) follows from the definition of max, and \((d)\) follows from the moment generating function of Gaussian random variables. We get the result by setting \(s = \sqrt{2 \log n}/σ\) (which minimizes the upper bound).
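
As a quick sanity check, the following Monte Carlo sketch (assuming `numpy`; i.i.d. samples are used only for convenience, since Proposition 33.3 does not require independence) estimates \(\EXP[\max_i X_i]\) and compares it with \(σ\sqrt{2\log n}\).

```python
# Monte Carlo estimate of E[max_i X_i] for N(0, sigma^2) samples vs. sigma * sqrt(2 log n).
import numpy as np

rng = np.random.default_rng(0)
sigma, n, num_trials = 1.0, 1000, 5000

samples = rng.normal(0.0, sigma, size=(num_trials, n))
estimate = samples.max(axis=1).mean()      # Monte Carlo estimate of E[max_i X_i]
bound = sigma * np.sqrt(2 * np.log(n))     # bound from Proposition 33.3

print(f"E[max X_i] ~= {estimate:.3f}   bound = {bound:.3f}")
```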

Remark

We have stated and proved these inequalities for real-valued random variables, but a version of them continues to hold for vector-valued Gaussian random variables as well. For a complete treatment, see Picard (2007).

33.2 Sub-Gaussian random variables

It turns out that concentration inequalities of the form above continue to hold for distributions more general than the Gaussian. In particular, consider the bound on the max of Gaussian random variables established above. The only step that depends on the assumption that the random variables \(X_i\) are Gaussian is step \((d)\). Thus, as long as \(\EXP[ \exp(s X_i) ] \le \exp(\tfrac12 s^2 σ^2)\), the result continues to hold! This motivates the definition of sub-Gaussian random variables.

Definition 33.1 (Sub-Gaussian random variable) A random variable \(X \in \reals\) is said to be sub-Gaussian with variance proxy \(σ^2\) if \(\EXP[X] = 0\) and its moment generating function satisfies \[ \EXP[ \exp(sX) ] \le \exp( \tfrac12 s^2 σ^2), \quad \forall s \in \reals. \]

The reason the parameter \(σ^2\) is called a variance proxy is that, by a straightforward Taylor series expansion and comparison of coefficients, it can be shown that \(\text{var}(X) \le σ^2\). See Rivasplata (2012) for a proof.

This definition can be generalized to random vectors and matrices. A random vector \(X \in \reals^d\) is said to be sub-Gaussian with variance proxy \(σ^2\) if \(\EXP[X] = 0\) and for any unit vector \(u \in \reals^d\), \(u^\TRANS X\) is sub-Gaussian with variance proxy \(σ^2\).

Similarly, a random matrix \(X \in \reals^{d_1 × d_2}\) is said to be sub-Gaussian with variance proxy \(σ^2\) if \(\EXP[X] = 0\) and for any unit vectors \(u \in \reals^{d_1}\) and \(v \in \reals^{d_2}\), \(u^\TRANS X v\) is sub-Gaussian with variance proxy \(σ^2\).

We will use the phrase “\(σ\)-sub-Gaussian” as a short form of “sub-Gaussian with variance proxy \(σ^2\)”. One typically writes \(X \sim \text{subG}(σ^2)\) to denote a random variable with sub-Gaussian distribution with variance proxy \(σ^2\). (Strictly speaking, this notation is a bit ambiguous since \(\text{subG}(σ^2)\) is a class of distributions rather than a single distribution.)

33.3 Examples of sub-Gaussian distributions

  1. Let \(X\) be a Rademacher random variable, i.e., \(X\) takes the values \(\pm 1\) with probability \(1/2\) each. Then, \[ \EXP[ \exp(sX) ] = \frac12 e^{-s} + \frac12 e^s = \cosh s \le \exp(\tfrac12 s^2), \] so \(X\) is \(1\)-sub-Gaussian. (A numerical check of this bound is sketched after the list.)

  2. Let \(X\) be uniformly distributed over \([-a, a]\). Then, for any \(s \neq 0\), \[ \EXP[ \exp(s X) ] = \frac{1}{2as}[ e^{as} - e^{-as} ] = \sum_{n=0}^∞ \frac{(as)^{2n}}{(2n+1)!}. \] Using the inequality \((2n+1)! \ge n!2^n\), we get that \(X\) is \(a\)-sub-Gaussian.

  3. It can be shown that (see Rivasplata (2012) ) if \(X\) is a random variable with \(\EXP[X] = 0\) and \(|X| < 1\) a.s., then \[ \EXP[ \exp(sX) ] \le \cosh s, \quad \forall s \in \reals. \] Therefore, \(X\) is 1-sub-Gaussian.

  4. An immediate corollary of the previous example is that if \(X\) is a random variable with \(\EXP[X] = 0\) and \(|X| \le b\) a.s., then \(X\) is \(b\)-sub-Gaussian.

  5. By a similar argument, we can show that if \(X\) is a zero-mean random variable supported on some interval \([a,b]\), then \(X\) is \((b-a)/2\)-sub-Gaussian.

  6. If \(X\) is \(σ\)-sub-Gaussian, then for any \(α \in \reals\), \(α X\) is \(|α|σ\)-sub-Gaussian.

  7. If \(X_1\) and \(X_2\) are independent \(σ_1\)- and \(σ_2\)-sub-Gaussian random variables, then \(X_1 + X_2\) is \(\sqrt{σ_1^2 + σ_2^2}\)-sub-Gaussian.
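
The following is a minimal numerical sketch (assuming `numpy`) of the moment generating function bounds in Examples 1 and 2; the value of \(a\) and the grid of \(s\) values are arbitrary choices for illustration.

```python
# Check E[exp(sX)] <= exp(s^2 sigma^2 / 2) for a Rademacher variable (sigma = 1)
# and for X ~ Uniform[-a, a] (sigma = a), on a grid of s values.
import numpy as np

a = 2.0
for s in np.linspace(-5, 5, 11):
    mgf_rademacher = np.cosh(s)                                # E[exp(sX)], X Rademacher
    mgf_uniform = np.sinh(a * s) / (a * s) if s != 0 else 1.0  # E[exp(sX)], X ~ Unif[-a, a]
    assert mgf_rademacher <= np.exp(0.5 * s**2)                # variance proxy 1
    assert mgf_uniform <= np.exp(0.5 * s**2 * a**2)            # variance proxy a^2
print("MGF bounds hold on the tested grid of s values")
```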

33.4 Characterization of sub-Gaussian random variables

Sub-Gaussian random variables satisfy a concentration result similar to Mills inequality.

Lemma 33.1 Let \(X \in \reals\) be \(σ\)-sub-Gaussian. Then, for any \(t > 0\), \[\begin{equation}\label{eq:sG-tail-bounds} \PR(X > t) \le \exp\biggl( - \frac{t^2}{2σ^2} \biggr) \quad\text{and}\quad \PR(X < -t) \le \exp\biggl( - \frac{t^2}{2σ^2} \biggr) \end{equation}\]

This follows from Chernoff’s bound and the definition of sub-Gaussianity. In particular, for any \(s > 0\) \[ \PR(X > t) = \PR(\exp(sX) > \exp(st)) \le \frac{ \EXP[\exp(sX) ]} { \exp(st) } \le \exp\biggl( \frac{s^2 σ^2}{2} - st \biggr). \] Now, to find the tightest possible bound, we minimize the above bound with respect to \(s\), which is attained at \(s = t/σ^2\). Substituting this in the above bound, we get the first inequality. The second inequality follows from a similar argument.

Recall that the moments of \(Z \sim \mathcal{N}(0,σ^2)\) are given by \[ \EXP[ |Z|^k ] = \frac{1}{\sqrt{π}} (2σ^2)^{k/2} Γ\biggl(\frac{k+1}{2}\biggr), \] where \(Γ(\cdot)\) denotes the Gamma function. The next result shows that the tail bounds \eqref{eq:sG-tail-bounds} are sufficient to show that the absolute moments of \(X \sim \text{subG}(σ^2)\) can be bounded by those of \(Z \sim \mathcal{N}(0,σ^2)\) up to multiplicative constants.

Lemma 33.2 Let \(X\) be a random variable such that \[ \PR( |X| > t) \le 2 \exp\biggl(- \frac{t^2}{2σ^2} \biggr),\] then for any positive integer \(k \ge 1\), \[ \EXP[ |X|^k ] \le (2σ^2)^{k/2} k Γ(k/2). \]

Note that for the special case of \(k=1\), the above bound implies \(\EXP[ |X| ] \le σ \sqrt{2π}\) and for \(k=2\), \(\EXP[|X|^2] \le 4σ^2\).

This is a simple application of the tail bound. \[\begin{align*} \EXP[ |X|^k ] &= \int_{0}^∞ \PR( |X|^k > t ) dt \\ &= \int_{0}^∞ \PR( |X| > t^{1/k}) dt \\ &\le 2 \int_{0}^∞ \exp\biggl( - \frac{t^{2/k}}{2σ^2} \biggr) dt \\ &= (2σ^2)^{k/2} k \int_{0}^∞ e^{-u} u^{k/2 - 1} du, \qquad u = \frac{t^{2/k}}{2σ^2} \\ &= (2σ^2)^{k/2}k Γ(k/2). \end{align*}\]

The result for \(k=1\) follows from \(Γ(1/2) = \sqrt{π}\).
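
The bound of Lemma 33.2 can be checked against the exact Gaussian moments recalled above, since \(Z \sim \mathcal{N}(0,σ^2)\) satisfies the required tail bound. A minimal sketch (assuming `numpy` and `scipy`):

```python
# Compare the exact absolute moments of Z ~ N(0, sigma^2),
#   E[|Z|^k] = (2 sigma^2)^(k/2) * Gamma((k+1)/2) / sqrt(pi),
# with the bound of Lemma 33.2, (2 sigma^2)^(k/2) * k * Gamma(k/2).
import numpy as np
from scipy.special import gamma

sigma = 1.5
for k in range(1, 7):
    exact = (2 * sigma**2) ** (k / 2) * gamma((k + 1) / 2) / np.sqrt(np.pi)
    bound = (2 * sigma**2) ** (k / 2) * k * gamma(k / 2)
    print(f"k = {k}   E[|Z|^k] = {exact:9.3f}   bound = {bound:9.3f}")
```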

Using these moment bounds, we can in turn bound the moment generating function of a random variable that satisfies the tail bounds.

Lemma 33.3 Let \(X\) be a random variable such that \[ \PR( |X| > t) \le 2 \exp\biggl(- \frac{t^2}{2σ^2} \biggr)\] then, \[\EXP[ \exp(sX) ] \le \exp(4 s^2 σ^2). \]

For this reason, it is sometimes stated that \(X \sim \text{subG}(σ^2)\) (with a variance proxy that is correct only up to a constant factor) when \(X\) merely satisfies the tail bound \eqref{eq:sG-tail-bounds}.

The proof follows from the Taylor series bound on the exponential function \[ \exp(sX) \le 1 + sX + \sum_{k=2}^∞ \frac{|s|^k |X|^k}{k!}, \] after taking expectations and applying the result of Lemma 33.2 term by term. See Rigollet (2015) for details.

33.5 Properties of sub-Gaussian random vectors

Theorem 33.1 Let \(X = (X_1, \dots, X_n)\) be a vector of independent \(σ\)-sub-Gaussian random variables. Then, the random vector \(X\) is \(σ\)-sub-Gaussian.

For any unit vector \(u \in \reals^n\) and any \(s \in \reals\), using the independence of the \(X_i\) in the first equality, \[\begin{align*} \EXP[ \exp( s u^\TRANS X) ] &= \prod_{i=1}^n \EXP[ \exp(s u_i X_i) ] \\ &\le \prod_{i=1}^n \exp\bigl( \tfrac{1}{2} s^2 u_i^2 σ^2 \bigr) \\ &= \exp\bigl( \tfrac{1}{2} s^2 \| u \|^2 σ^2 \bigr) \\ &= \exp\bigl( \tfrac{1}{2} s^2 σ^2 \bigr). \end{align*}\]

33.6 Concentration inequalities

Recall that if \(X_1\) and \(X_2\) are independent \(σ_1\)- and \(σ_2\)-sub-Gaussian random variables, then \(X_1 + X_2\) is sub-Gaussian with variance proxy \(σ_1^2 + σ_2^2\). An immediate implication of this property is the following:

Proposition 33.4 (Hoeffding inequality) Suppose that the random variables \(X_i\), \(i \in \{1,\dots,n\}\), are independent and that each \(X_i\) has mean \(μ_i\) and is \(σ_i\)-sub-Gaussian. Then, for all \(t > 0\), we have

\[ \PR\biggl( \sum_{i=1}^n( X_i - μ_i) \ge t \biggr) \le \exp\biggl( - \frac{t^2}{2 \sum_{i=1}^n σ_i^2 } \biggr). \]

The Hoeffding inequality is often stated for the special case of bounded random variables. In particular, if \(X_i \in [a,b]\), then we know that \(X_i\) is sub-Gaussian with parameter \(σ = (b-a)/2\), so we obtain the bound \[ \PR\biggl( \sum_{i=1}^n( X_i - μ_i) \ge t \biggr) \le \exp\biggl( - \frac{2t^2}{n(b-a)^2 } \biggr). \]
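
As an illustration, here is a minimal Monte Carlo sketch (assuming `numpy`) of the bounded-variable form of Hoeffding’s inequality with \(X_i \sim \text{Uniform}[0,1]\), so that \(a = 0\), \(b = 1\), and \(μ_i = 1/2\); the particular values of \(n\) and \(t\) are arbitrary.

```python
# Empirical tail of sum_i (X_i - mu_i) for X_i ~ Uniform[0, 1]
# vs. the Hoeffding bound exp(-2 t^2 / n).
import numpy as np

rng = np.random.default_rng(0)
n, num_trials, t = 100, 100_000, 10.0

X = rng.uniform(0.0, 1.0, size=(num_trials, n))
deviations = X.sum(axis=1) - n * 0.5
empirical = (deviations >= t).mean()   # estimate of P(sum (X_i - mu_i) >= t)
bound = np.exp(-2 * t**2 / n)          # Hoeffding bound with (b - a) = 1

print(f"empirical = {empirical:.5f}   Hoeffding bound = {bound:.5f}")
```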

The Hoeffding inequality can be generalized to martingales. Recall that a sequence \(\{ (D_i, \mathcal F_i)\}_{i \ge 1}\) is called a martingale difference sequence if, for all \(i \ge 1\), \(D_i\) is \(\mathcal{F}_i\)-measurable, \[ \EXP[ |D_i| ] < ∞ \quad\text{and}\quad \EXP[ D_{i+1} \mid \mathcal{F}_i ] = 0. \]

Proposition 33.5 (Azuma-Hoeffding inequality.) Let \(\{ (D_i, \mathcal{F}_i)\}_{i \ge 1}\) be a martingale difference sequence and suppose that \(|D_i| \le b_i\) almost surely for all \(i \ge 1\). Then for all \(t \ge 0\) \[ \PR\biggl( \biggl| \sum_{i=1}^n D_i \biggr| \ge t \biggr) \le 2 \exp\biggl( - \frac{t^2}{2 \sum_{i=1}^n b_i^2 } \biggr). \]

Since \(|D_i| \le b_i\) and \(\EXP[D_i \mid \mathcal{F}_{i-1}] = 0\), \(D_i\) is conditionally \(b_i\)-sub-Gaussian. Using the smoothing property of conditional expectation, we have \[\begin{align} \EXP\biggl[ \exp\biggl( s \sum_{i=1}^n D_i \biggr) \biggr] &= \EXP\biggl[ \exp\biggl( s \sum_{i=1}^{n-1} D_i \biggr) \, \EXP\bigl[ \exp\bigl( s D_n \bigr) \bigm| \mathcal{F}_{n-1} \bigr] \biggr] \notag \\ &\le \EXP\biggl[ \exp\biggl( s \sum_{i=1}^{n-1} D_i \biggr) \biggr] \, \exp\bigl( \tfrac12 s^2 b_n^2 \bigr), \end{align}\] where the inequality follows from \(D_n\) being conditionally \(b_n\)-sub-Gaussian. Iterating backwards this way, we get \[ \EXP\biggl[ \exp\biggl( s \sum_{i=1}^n D_i \biggr) \biggr] \le \exp\biggl( \tfrac12 s^2 \sum_{i=1}^n b_i^2 \biggr), \] i.e., \(\sum_{i=1}^n D_i\) is sub-Gaussian with variance proxy \(\sum_{i=1}^n b_i^2\). Therefore, by Lemma 33.1, \[ \PR\biggl( \sum_{i=1}^n D_i \ge t \biggr) \le \exp\biggl( - \frac{t^2}{2 \sum_{i=1}^n b_i^2 } \biggr). \] By a symmetric argument, we can show that \[ \PR\biggl( \sum_{i=1}^n D_i \le -t \biggr) \le \exp\biggl( - \frac{t^2}{2 \sum_{i=1}^n b_i^2 } \biggr). \] Combining these two, we get the stated result.

Note that we can easily generalize the above inequality to the case when \(D_i \in [a_i, b_i]\) because in that case \(D_i\) will be \((b_i - a_i)/2\)-sub-Gaussian.
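
Here is a minimal Monte Carlo sketch (assuming `numpy`) of the Azuma-Hoeffding inequality. The martingale difference sequence used, \(D_i = ε_i c_i\) with \(ε_i\) an independent Rademacher variable and \(c_i \in \{0.5, 1\}\) a function of the past partial sum, is a hypothetical example chosen so that \(\EXP[D_i \mid \mathcal{F}_{i-1}] = 0\) and \(|D_i| \le b_i = 1\).

```python
# Empirical tail of |sum_i D_i| for a past-dependent martingale difference sequence
# vs. the Azuma-Hoeffding bound 2 * exp(-t^2 / (2 * sum_i b_i^2)) with b_i = 1.
import numpy as np

rng = np.random.default_rng(0)
n, num_trials, t = 200, 50_000, 30.0

S = np.zeros(num_trials)
for i in range(n):
    c = np.where(S > 0, 1.0, 0.5)                   # predictable coefficient, depends on the past
    eps = rng.choice([-1.0, 1.0], size=num_trials)  # independent Rademacher steps
    S += eps * c

empirical = (np.abs(S) >= t).mean()
bound = 2 * np.exp(-t**2 / (2 * n))

print(f"empirical P(|S_n| >= {t}) = {empirical:.5f}   bound = {bound:.5f}")
```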

33.7 Maximal inequalities

As we explained in the motivation for the definition of sub-Gaussian random variables, the definition implies that sub-Gaussian random variables will satisfy the concentration and maximal inequalities for Gaussian random variables. In particular, we have the following general result.

Theorem 33.2 Let \(X_i \in \reals\) be \(σ\)-sub-Gaussian random variables (not necessarily independent). Then, \[ \EXP\Bigl[ \max_{1 \le i \le n} X_i \Bigr] \le σ \sqrt{2 \log n} \quad\text{and}\quad \EXP\Bigl[ \max_{1 \le i \le n} |X_i| \Bigr] \le σ \sqrt{2 \log 2n}. \] Moreover, for any \(t > 0\), \[ \PR\Bigl( \max_{1 \le i \le n} X_i > t\Bigr) \le n \exp\biggl( -\frac{t^2}{2σ^2} \biggr) \quad\text{and}\quad \PR\Bigl( \max_{1 \le i \le n} |X_i| > t\Bigr) \le 2n \exp\biggl( -\frac{t^2}{2σ^2} \biggr). \]

The proof is exactly the same as the Gaussian case!

Now we state two generalizations without proof. See Rigollet (2015) for proof.

Maximum over a convex polytope

Theorem 33.3 Let \(\mathsf{P}\) be a polytope with \(n\) vertices \(v^{(1)}, \dots, v^{(n)} \in \reals^d\) and let \(X \in \reals^d\) be a random variable such that \([ v^{(i)} ]^\TRANS X\), \(i \in \{1, \dots, n\}\) are \(σ\)-sub-Gaussian random variables. Then, \[ \EXP\Bigl[ \max_{θ \in \mathsf{P}} θ^\TRANS X \Bigr] \le σ \sqrt{2 \log n} \quad\text{and}\quad \EXP\Bigl[ \max_{θ \in \mathsf{P}} | θ^\TRANS X | \Bigr] \le σ \sqrt{2 \log 2n}. \] Moreover, for any \(t > 0\), \[ \PR\Bigl( \max_{θ \in \mathsf{P}} θ^\TRANS X > t\Bigr) \le n \exp\biggl( -\frac{t^2}{2σ^2} \biggr) \quad\text{and}\quad \PR\Bigl( \max_{θ \in \mathsf{P}} |θ^\TRANS X| > t\Bigr) \le 2n \exp\biggl( -\frac{t^2}{2σ^2} \biggr). \]

Maximum over the \(\ell_2\) ball

Theorem 33.4 Let \(X \in \reals^d\) be a \(σ\)-sub-Gaussian random variable. Then, \[ \EXP[ \max_{ \| θ \| \le 1 } θ^\TRANS X ] = \EXP[ \max_{ \| θ \| \le 1 } | θ^\TRANS X | ] \le 4σ \sqrt{d}. \] Moreover, for any \(t > 0\) \[ \PR( \max_{ \| θ \| \le 1 } θ^\TRANS X > t) = \PR( \max_{ \| θ \| \le 1 } | θ^\TRANS X | > t ) \le 6^d \exp\biggl(- \frac{t^2}{8σ^2} \biggr). \]

Remark

For any \(δ > 0\), taking \(t = σ\sqrt{8d \log 6} + 2σ\sqrt{2 \log(1/δ)}\), we obtain that with probability at least \(1-δ\), it holds that \[ \max_{\|θ\| \le 1} θ^\TRANS X = \max_{\|θ\| \le 1} | θ^\TRANS X | \le 4σ\sqrt{d} + 2σ \sqrt{2\log(1/δ)}. \]
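
Since \(\max_{\|θ\|\le 1} θ^\TRANS X = \|X\|_2\), Theorem 33.4 can be checked directly for a standard Gaussian vector (which is \(1\)-sub-Gaussian). A minimal Monte Carlo sketch (assuming `numpy`; the value of \(d\) is an arbitrary choice):

```python
# Monte Carlo estimate of E[max_{||theta|| <= 1} theta^T X] = E[||X||_2] for X ~ N(0, I_d)
# vs. the bound 4 * sigma * sqrt(d) with sigma = 1.
import numpy as np

rng = np.random.default_rng(0)
d, num_trials = 50, 10_000

X = rng.normal(size=(num_trials, d))
estimate = np.linalg.norm(X, axis=1).mean()
bound = 4 * np.sqrt(d)

print(f"E[||X||] ~= {estimate:.2f}   bound = {bound:.2f}")
```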

33.8 Lipschitz functions of Gaussian variables.

Recall that a function \(f \colon \reals^d \to \reals\) is \(L\)-Lipschitz with respect to the Euclidean norm if \[ | f(x) - f(y) | \le L \| x - y \|_2, \quad \forall x, y \in \reals^d. \]

The following result shows that any \(L\)-Lipschitz function of a standard Gaussian random vector, once centered at its mean, is \(L\)-sub-Gaussian.

Theorem 33.5 Let \(X = (X_1, \dots, X_n)\) be a vector of i.i.d. standard Gaussian random variables and let \(f \colon \reals^n \to \reals\) be \(L\)-Lipschitz with respect to the Euclidean norm. Then, the variable \(f(X) - \EXP[ f(X) ]\) is \(L\)-sub-Gaussian and therefore \[ \PR\bigl[ \bigl| f(X) - \EXP[f(X)] \bigr| \ge t \bigr] \le 2 \exp\biggl(- \frac{t^2}{2L^2} \biggr). \]

This result is remarkable because it guarantees that any \(L\)-Lipschitz function of a standard Gaussian random vector, irrespective of the dimension, exhibits concentration like a scalar Gaussian variable with variance \(L^2\).
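
As an illustration, the Euclidean norm \(f(x) = \|x\|_2\) is \(1\)-Lipschitz, so Theorem 33.5 says that \(\|X\|_2\) concentrates around its mean like a scalar Gaussian with variance \(1\), whatever the dimension. A minimal Monte Carlo sketch (assuming `numpy`; the values of \(n\) and \(t\) are arbitrary):

```python
# Empirical tail of |f(X) - E f(X)| for the 1-Lipschitz function f(x) = ||x||_2
# vs. the bound 2 * exp(-t^2 / 2) from Theorem 33.5 with L = 1.
import numpy as np

rng = np.random.default_rng(0)
n, num_trials, t = 100, 50_000, 1.5

X = rng.normal(size=(num_trials, n))
f = np.linalg.norm(X, axis=1)
empirical = (np.abs(f - f.mean()) >= t).mean()  # sample mean of f used as a proxy for E[f(X)]
bound = 2 * np.exp(-t**2 / 2)

print(f"empirical = {empirical:.4f}   bound = {bound:.4f}")
```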

For a proof, see Chapter 2 of Wainwright (2019).