38  Sub-Gaussian random variables

Updated

June 4, 2025

38.1 Prelim: Concentration inequality of sum of Gaussian random variables

Let $\phi(\cdot)$ denote the density of the $N(0,1)$ Gaussian random variable: $\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$.

Note that if $X \sim N(\mu, \sigma^2)$, then the density of $X$ is $\frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.

The tails of Gaussian random variables decay quickly, which can be quantified using the following inequality.

Proposition 38.1 (Mills inequality) If $X \sim N(0,1)$, then for any $t > 0$, $$\mathbb{P}(|X| > t) \le \frac{2\,\phi(t)}{t}.$$

More generally, if $X \sim N(0, \sigma^2)$, then for any $t > 0$, $$\mathbb{P}(|X| > t) \le \frac{2\sigma}{t}\,\phi\!\left(\frac{t}{\sigma}\right) = \sqrt{\frac{2}{\pi}}\,\frac{\sigma}{t}\,\exp\left(-\frac{t^2}{2\sigma^2}\right).$$

Remark

In the communication theory literature, this bound is sometimes known as the bound on the erfc or Q function.

We first prove the result for a unit-variance random variable. Note that $X$ is symmetric around the origin. Therefore, $\mathbb{P}(|X| > t) = 2\,\mathbb{P}(X > t)$.

Now, by using an idea similar to the proof of Markov’s inequality, we have $$t\,\mathbb{P}(X > t) = \int_t^\infty t\,\phi(x)\,dx \le \int_t^\infty x\,\phi(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_t^\infty x\,\exp\left(-\frac{x^2}{2}\right)dx = \frac{1}{\sqrt{2\pi}}\,\exp\left(-\frac{t^2}{2}\right) = \phi(t),$$ and therefore $\mathbb{P}(|X| > t) = 2\,\mathbb{P}(X > t) \le \frac{2\,\phi(t)}{t}$.

The proof for the general case follows by observing that $\mathbb{P}(|X| > t) = \mathbb{P}\!\left(\left|\tfrac{X}{\sigma}\right| > \tfrac{t}{\sigma}\right)$, where $X/\sigma \sim N(0,1)$.
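As a quick numerical sanity check (not part of the proof), the sketch below compares the exact Gaussian tail with the Mills bound using SciPy's standard normal distribution; the choice of $\sigma$ and the grid of $t$ values are arbitrary.

```python
import numpy as np
from scipy.stats import norm

sigma = 2.0
for t in [0.5, 1.0, 2.0, 4.0, 6.0]:
    exact = 2 * norm.sf(t / sigma)                  # P(|X| > t) for X ~ N(0, sigma^2)
    mills = (2 * sigma / t) * norm.pdf(t / sigma)   # Mills bound: (2 sigma / t) * phi(t / sigma)
    print(f"t = {t:4.1f}   P(|X|>t) = {exact:.3e}   bound = {mills:.3e}   holds: {exact <= mills}")
```

The bound is loose for small $t$ (it can even exceed 1) but becomes sharp as $t$ grows.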

The fact that a Gaussian random variable has tails that decay to zero exponentially fast can also be seen in the moment generating function: $$M(s) = \mathbb{E}[\exp(sX)] = \exp\left(s\mu + \tfrac{1}{2}s^2\sigma^2\right).$$

A useful application of Mills inequality is the following concentration inequality.

Proposition 38.2 (Concentration inequality.) Let $X_i \sim N(0, \sigma^2)$, $i \in \{1, \dots, n\}$ (not necessarily independent). Then, for any $t > 0$, $$\mathbb{P}\left(\max_{1 \le i \le n} |X_i| > t\right) \le \frac{2 n \sigma}{t}\,\phi\!\left(\frac{t}{\sigma}\right).$$

Proof

This follows immediately from Mills inequality and the union bound.

Another useful result is the following:

Proposition 38.3 (Max of Gaussian random variables.) Let $X_i \sim N(0, \sigma^2)$, $i \in \{1, \dots, n\}$ (not necessarily independent). Then, $$\mathbb{E}\left[\max_{1 \le i \le n} X_i\right] \le \sigma\sqrt{2\log n} \quad\text{and}\quad \mathbb{E}\left[\max_{1 \le i \le n} |X_i|\right] \le \sigma\sqrt{2\log 2n}.$$

See these notes for a lower bound with the same rate!

We prove the first inequality. The second follows by considering the $2n$ random variables $X_1, \dots, X_n, -X_1, \dots, -X_n$.

For any $s > 0$,
$$
\mathbb{E}\left[\max_{1 \le i \le n} X_i\right]
= \frac{1}{s}\,\mathbb{E}\left[\log\left(\exp\left(s \max_{1 \le i \le n} X_i\right)\right)\right]
\overset{(a)}{\le} \frac{1}{s}\,\log\left(\mathbb{E}\left[\exp\left(s \max_{1 \le i \le n} X_i\right)\right]\right)
\overset{(b)}{=} \frac{1}{s}\,\log\left(\mathbb{E}\left[\max_{1 \le i \le n} \exp(s X_i)\right]\right)
\overset{(c)}{\le} \frac{1}{s}\,\log\left(\sum_{i=1}^n \mathbb{E}[\exp(s X_i)]\right)
\overset{(d)}{=} \frac{1}{s}\,\log\left(\sum_{i=1}^n \exp\left(\frac{s^2\sigma^2}{2}\right)\right)
= \frac{\log n}{s} + \frac{s\sigma^2}{2},
$$
where (a) follows from Jensen’s inequality, (b) follows from the monotonicity of $\exp(\cdot)$, (c) follows from the definition of $\max$, and (d) follows from the moment generating function of Gaussian random variables. We get the result by setting $s = \sqrt{2\log n}/\sigma$ (which minimizes the upper bound).
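The bound of Proposition 38.3 is easy to probe by simulation. Below is a minimal Monte Carlo sketch; the sample size and the values of $n$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, trials = 1.0, 5000
for n in [10, 100, 1000]:
    X = rng.normal(0.0, sigma, size=(trials, n))   # each row: n draws from N(0, sigma^2)
    emp = X.max(axis=1).mean()                     # Monte Carlo estimate of E[max_i X_i]
    bound = sigma * np.sqrt(2 * np.log(n))         # Proposition 38.3 upper bound
    print(f"n = {n:5d}   E[max] ≈ {emp:.3f}   sigma*sqrt(2 log n) = {bound:.3f}")
```

The simulation uses independent $X_i$ for convenience, but the bound itself does not require independence.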

Remark

We have stated and proved these inequalities for real-valued random variables, but a version of them continues to hold for vector-valued Gaussian variables as well. For a complete treatment, see Picard ().

38.2 Sub-Gaussian random variables

It turns out that concentration inequalities of the form above continue to hold for more general distributions than the Gaussian. In particular, consider the bound on the max of Gaussian random variables established above. The only step that depends on the assumption that the random variables $X_i$ are Gaussian is step (d). Thus, as long as $\mathbb{E}[\exp(s X_i)] \le \exp(\tfrac{1}{2} s^2 \sigma^2)$, the result continues to hold! This motivates the definition of sub-Gaussian random variables.

Definition 38.1 (Sub-Gaussian random variable) A random variable $X \in \mathbb{R}$ is said to be sub-Gaussian with variance proxy $\sigma^2$ if $\mathbb{E}[X] = 0$ and its moment generating function satisfies $$\mathbb{E}[\exp(sX)] \le \exp\left(\tfrac{1}{2} s^2 \sigma^2\right), \quad \forall s \in \mathbb{R}.$$

The parameter $\sigma^2$ is called a variance proxy because, by a straightforward application of the Taylor series expansion and a comparison of coefficients, it can be shown that $\mathrm{var}(X) \le \sigma^2$. See Rivasplata () for a proof.

This definition can be generalized to random vectors and matrices. A random vector $X \in \mathbb{R}^d$ is said to be sub-Gaussian with variance proxy $\sigma^2$ if $\mathbb{E}[X] = 0$ and for any unit vector $u \in \mathbb{R}^d$, $u^\top X$ is sub-Gaussian with variance proxy $\sigma^2$.

Similarly, a random matrix $X \in \mathbb{R}^{d_1 \times d_2}$ is said to be sub-Gaussian with variance proxy $\sigma^2$ if $\mathbb{E}[X] = 0$ and for any unit vectors $u \in \mathbb{R}^{d_1}$ and $v \in \mathbb{R}^{d_2}$, $u^\top X v$ is sub-Gaussian with variance proxy $\sigma^2$.

We will use the phrase “$\sigma$-sub-Gaussian” as a short form of “sub-Gaussian with variance proxy $\sigma^2$”. One typically writes $X \sim \mathrm{subG}(\sigma^2)$ to denote a random variable with a sub-Gaussian distribution with variance proxy $\sigma^2$. (Strictly speaking, this notation is a bit ambiguous since $\mathrm{subG}(\sigma^2)$ is a class of distributions rather than a single distribution.)

38.3 Examples of sub-Gaussian distributions

  1. Let $X$ be a Rademacher random variable, i.e., $X$ takes the values $\pm 1$ with probability $1/2$ each. Then, $\mathbb{E}[\exp(sX)] = \tfrac{1}{2} e^{-s} + \tfrac{1}{2} e^{s} = \cosh s \le \exp\left(\tfrac{1}{2} s^2\right)$, so $X$ is 1-sub-Gaussian (see the numerical check after this list).

  2. Suppose $X$ is uniformly distributed over $[-a, a]$. Then, for any $s \ne 0$, $\mathbb{E}[\exp(sX)] = \frac{1}{2as}\left[e^{as} - e^{-as}\right] = \sum_{n=0}^{\infty} \frac{(as)^{2n}}{(2n+1)!}$. Using the inequality $(2n+1)! \ge n!\,2^n$, we get that $X$ is $a$-sub-Gaussian.

  3. It can be shown (see Rivasplata ()) that if $X$ is a random variable with $\mathbb{E}[X] = 0$ and $|X| \le 1$ a.s., then $\mathbb{E}[\exp(sX)] \le \cosh s$ for all $s \in \mathbb{R}$. Therefore, $X$ is 1-sub-Gaussian.

  4. An immediate corollary of the previous example is that if $X$ is a random variable with $\mathbb{E}[X] = 0$ and $|X| \le b$ a.s., then $X$ is $b$-sub-Gaussian. This result is also a consequence of Hoeffding’s lemma.

  5. By a similar argument, we can show that if $X$ is a zero-mean random variable supported on some interval $[a, b]$, then $X$ is $(b-a)/2$-sub-Gaussian.

  6. If $X$ is $\sigma$-sub-Gaussian, then for any $\alpha \in \mathbb{R}$, $\alpha X$ is $|\alpha|\sigma$-sub-Gaussian.

  7. If $X_1$ and $X_2$ are independent and, respectively, $\sigma_1$- and $\sigma_2$-sub-Gaussian, then $X_1 + X_2$ is $\sqrt{\sigma_1^2 + \sigma_2^2}$-sub-Gaussian.
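The moment generating function bounds in examples 1 and 2 can be checked numerically. A minimal sketch that evaluates both sides of the claimed inequalities on an arbitrary grid of $s$ values:

```python
import numpy as np

s = np.linspace(-5, 5, 201)
s = s[s != 0]   # avoid dividing by zero in the uniform MGF below

# Example 1: Rademacher, variance proxy 1:  E[exp(sX)] = cosh(s) <= exp(s^2 / 2)
print("Rademacher:", np.all(np.cosh(s) <= np.exp(0.5 * s**2)))

# Example 2: Uniform on [-a, a], variance proxy a^2:
#   E[exp(sX)] = (e^{as} - e^{-as}) / (2as) = sinh(as) / (as) <= exp(a^2 s^2 / 2)
a = 1.5
print("Uniform[-a, a]:", np.all(np.sinh(a * s) / (a * s) <= np.exp(0.5 * a**2 * s**2)))
```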

38.4 Characterization of sub-Gaussian random variables

Sub-Gaussian random variables satisfy a concentration result similar to Mills inequality.

Lemma 38.1 Let $X \in \mathbb{R}$ be $\sigma$-sub-Gaussian. Then, for any $t > 0$, $$\mathbb{P}(X > t) \le \exp\left(-\frac{t^2}{2\sigma^2}\right) \quad\text{and}\quad \mathbb{P}(X < -t) \le \exp\left(-\frac{t^2}{2\sigma^2}\right). \tag{1}$$

This follows from the Chernoff bound and the definition of sub-Gaussianity. In particular, for any $s > 0$, $$\mathbb{P}(X > t) = \mathbb{P}(\exp(sX) > \exp(st)) \le \frac{\mathbb{E}[\exp(sX)]}{\exp(st)} \le \exp\left(\frac{s^2\sigma^2}{2} - st\right).$$ Now, to find the tightest possible bound, we minimize the right-hand side with respect to $s$; the minimum is attained at $s = t/\sigma^2$. Substituting this value gives the first inequality. The second inequality follows from a similar argument.

Recall that the absolute moments of $Z \sim N(0, \sigma^2)$ are given by $$\mathbb{E}[|Z|^k] = \frac{1}{\sqrt{\pi}}\,(2\sigma^2)^{k/2}\,\Gamma\!\left(\frac{k+1}{2}\right),$$ where $\Gamma(\cdot)$ denotes the Gamma function. The next result shows that the tail bounds (1) are sufficient to show that the absolute moments of $X \sim \mathrm{subG}(\sigma^2)$ can be bounded by those of $Z \sim N(0, \sigma^2)$ up to multiplicative constants.

Lemma 38.2 Let $X$ be a random variable such that $$\mathbb{P}(|X| > t) \le 2\exp\left(-\frac{t^2}{2\sigma^2}\right).$$ Then, for any positive integer $k \ge 1$, $$\mathbb{E}[|X|^k] \le (2\sigma^2)^{k/2}\, k\, \Gamma(k/2).$$

Note that for the special case of $k = 1$, the above bound implies $\mathbb{E}[|X|] \le \sigma\sqrt{2\pi}$, and for $k = 2$, $\mathbb{E}[|X|^2] \le 4\sigma^2$.

This is a simple application of the tail bound:
$$
\mathbb{E}[|X|^k] = \int_0^\infty \mathbb{P}(|X|^k > t)\,dt
= \int_0^\infty \mathbb{P}(|X| > t^{1/k})\,dt
\le 2\int_0^\infty \exp\left(-\frac{t^{2/k}}{2\sigma^2}\right)dt
= (2\sigma^2)^{k/2}\, k \int_0^\infty e^{-u}\, u^{k/2 - 1}\,du \qquad \left(u = \frac{t^{2/k}}{2\sigma^2}\right)
= (2\sigma^2)^{k/2}\, k\, \Gamma(k/2).
$$

The result for $k = 1$ follows from $\Gamma(1/2) = \sqrt{\pi}$.
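Since a $N(0, \sigma^2)$ variable satisfies the assumed tail bound, Lemma 38.2 can be sanity-checked against the exact Gaussian absolute moments recalled above. A small sketch with an arbitrary $\sigma$:

```python
import math

sigma = 1.3
for k in range(1, 7):
    # Exact absolute moment of Z ~ N(0, sigma^2): (2 sigma^2)^{k/2} Gamma((k+1)/2) / sqrt(pi)
    exact = (2 * sigma**2) ** (k / 2) * math.gamma((k + 1) / 2) / math.sqrt(math.pi)
    # Lemma 38.2 bound for any variable satisfying the sub-Gaussian tail bound
    bound = (2 * sigma**2) ** (k / 2) * k * math.gamma(k / 2)
    print(f"k = {k}   E|Z|^k = {exact:8.4f}   bound = {bound:8.4f}")
```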

Using these moment bounds, we can bound the moment generating function of a random variable that satisfies the tail bounds.

Lemma 38.3 Let $X$ be a random variable with $\mathbb{E}[X] = 0$ such that $\mathbb{P}(|X| > t) \le 2\exp\left(-\frac{t^2}{2\sigma^2}\right)$. Then, $$\mathbb{E}[\exp(sX)] \le \exp\left(4 s^2 \sigma^2\right).$$

For this reason, it is sometimes stated that $X \sim \mathrm{subG}(\sigma^2)$ when $X$ satisfies the tail bound (1).

The proof follows from the Taylor series bound on the exponential function, $$\exp(sX) \le 1 + sX + \sum_{k=2}^{\infty} \frac{(|s|\,|X|)^k}{k!},$$ by taking expectations (the linear term vanishes since $\mathbb{E}[X] = 0$) and applying the moment bound of Lemma 38.2. See Rigollet () for details.

38.5 Properties of sub-Gaussian random vectors

Theorem 38.1 Let $X = (X_1, \dots, X_n)$ be a vector of independent $\sigma$-sub-Gaussian random variables. Then, the random vector $X$ is $\sigma$-sub-Gaussian.

For any unit vector $u \in \mathbb{R}^n$ and any $s \in \mathbb{R}$, $$\mathbb{E}[\exp(s\, u^\top X)] = \prod_{i=1}^n \mathbb{E}[\exp(s u_i X_i)] \le \prod_{i=1}^n \exp\left(\tfrac{1}{2} s^2 u_i^2 \sigma^2\right) = \exp\left(\tfrac{1}{2} s^2 \|u\|^2 \sigma^2\right) = \exp\left(\tfrac{1}{2} s^2 \sigma^2\right).$$

38.6 Concentration inequalities

Recall that if $X_1$ and $X_2$ are independent and, respectively, $\sigma_1$- and $\sigma_2$-sub-Gaussian, then $X_1 + X_2$ is sub-Gaussian with variance proxy $\sigma_1^2 + \sigma_2^2$. An immediate implication of this property is the following:

Proposition 38.4 (Hoeffding inequality) Suppose the variables $X_i$, $i \in \{1, \dots, n\}$, are independent, and $X_i$ has mean $\mu_i$ and is $\sigma_i$-sub-Gaussian. Then, for all $t > 0$, we have

$$\mathbb{P}\left(\sum_{i=1}^n (X_i - \mu_i) \ge t\right) \le \exp\left(-\frac{t^2}{2\sum_{i=1}^n \sigma_i^2}\right).$$

The Hoeffding inequality is often stated for the special case of bounded random variables. In particular, if $X_i \in [a, b]$, then we know that $X_i$ is sub-Gaussian with parameter $\sigma = (b-a)/2$, so we obtain the bound $$\mathbb{P}\left(\sum_{i=1}^n (X_i - \mu_i) \ge t\right) \le \exp\left(-\frac{2t^2}{n(b-a)^2}\right).$$
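A Monte Carlo sketch of this bounded-variable form, using i.i.d. Uniform$[0,1]$ summands (so $a = 0$, $b = 1$, $\mu_i = 1/2$); the sample size and $t$ values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 200_000
a, b, mu = 0.0, 1.0, 0.5
S = (rng.uniform(a, b, size=(trials, n)) - mu).sum(axis=1)   # centered sums
for t in [2.0, 4.0, 6.0]:
    emp = (S >= t).mean()                                    # empirical P(sum_i (X_i - mu_i) >= t)
    bound = np.exp(-2 * t**2 / (n * (b - a) ** 2))           # Hoeffding bound
    print(f"t = {t:.0f}   empirical = {emp:.4f}   bound = {bound:.4f}")
```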

The Hoeffding inequality can be generalized to martingale difference sequences (the Azuma–Hoeffding inequality).

38.7 Maximal inequalities

As explained in the motivation for the definition of sub-Gaussian random variables, the definition implies that sub-Gaussian random variables satisfy the same concentration and maximal inequalities as Gaussian random variables. In particular, we have the following general result.

Theorem 38.2 Let $X_i \in \mathbb{R}$ be $\sigma$-sub-Gaussian random variables (not necessarily independent). Then, $$\mathbb{E}\left[\max_{1 \le i \le n} X_i\right] \le \sigma\sqrt{2\log n} \quad\text{and}\quad \mathbb{E}\left[\max_{1 \le i \le n} |X_i|\right] \le \sigma\sqrt{2\log 2n}.$$ Moreover, for any $t > 0$, $$\mathbb{P}\left(\max_{1 \le i \le n} X_i > t\right) \le n \exp\left(-\frac{t^2}{2\sigma^2}\right) \quad\text{and}\quad \mathbb{P}\left(\max_{1 \le i \le n} |X_i| > t\right) \le 2n \exp\left(-\frac{t^2}{2\sigma^2}\right).$$

The proof is exactly the same as the Gaussian case!
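Because the proof only uses the moment generating function bound, the maximal inequality holds for non-Gaussian sub-Gaussian variables as well. A sketch using normalized Rademacher sums, which are 1-sub-Gaussian but not Gaussian (the values of $m$, $n$, and the trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
trials, m = 5000, 64
for n in [10, 100, 1000]:
    # Each X_i = (1/sqrt(m)) * (sum of m Rademacher signs): 1-sub-Gaussian, not Gaussian.
    B = rng.binomial(m, 0.5, size=(trials, n))     # number of +1's among the m signs
    X = (2 * B - m) / np.sqrt(m)
    emp = X.max(axis=1).mean()
    bound = np.sqrt(2 * np.log(n))                 # Theorem 38.2 with sigma = 1
    print(f"n = {n:5d}   E[max] ≈ {emp:.3f}   sqrt(2 log n) = {bound:.3f}")
```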

Now we state two generalizations without proof; see Rigollet () for the proofs.

Maximum over a convex polytope

Theorem 38.3 Let $\mathsf{P}$ be a polytope with $n$ vertices $v^{(1)}, \dots, v^{(n)} \in \mathbb{R}^d$ and let $X \in \mathbb{R}^d$ be a random vector such that $[v^{(i)}]^\top X$, $i \in \{1, \dots, n\}$, are $\sigma$-sub-Gaussian random variables. Then, $$\mathbb{E}\left[\max_{\theta \in \mathsf{P}} \theta^\top X\right] \le \sigma\sqrt{2\log n} \quad\text{and}\quad \mathbb{E}\left[\max_{\theta \in \mathsf{P}} |\theta^\top X|\right] \le \sigma\sqrt{2\log 2n}.$$ Moreover, for any $t > 0$, $$\mathbb{P}\left(\max_{\theta \in \mathsf{P}} \theta^\top X > t\right) \le n \exp\left(-\frac{t^2}{2\sigma^2}\right) \quad\text{and}\quad \mathbb{P}\left(\max_{\theta \in \mathsf{P}} |\theta^\top X| > t\right) \le 2n \exp\left(-\frac{t^2}{2\sigma^2}\right).$$

Maximum over the $\ell_2$ ball

Theorem 38.4 Let $X \in \mathbb{R}^d$ be a $\sigma$-sub-Gaussian random vector. Then, $$\mathbb{E}\left[\max_{\|\theta\| \le 1} \theta^\top X\right] = \mathbb{E}\left[\max_{\|\theta\| \le 1} |\theta^\top X|\right] \le 4\sigma\sqrt{d}.$$ Moreover, for any $t > 0$, $$\mathbb{P}\left(\max_{\|\theta\| \le 1} \theta^\top X > t\right) = \mathbb{P}\left(\max_{\|\theta\| \le 1} |\theta^\top X| > t\right) \le 6^d \exp\left(-\frac{t^2}{8\sigma^2}\right).$$

Remark

For any $\delta > 0$, taking $t = \sigma\sqrt{8 d \log 6} + 2\sigma\sqrt{2\log(1/\delta)}$, we obtain that with probability at least $1 - \delta$, it holds that $$\max_{\|\theta\| \le 1} \theta^\top X = \max_{\|\theta\| \le 1} |\theta^\top X| \le 4\sigma\sqrt{d} + 2\sigma\sqrt{2\log(1/\delta)}.$$
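Note that $\max_{\|\theta\| \le 1} \theta^\top X = \|X\|_2$, so the expectation bound can be checked directly by simulation. A sketch with a standard Gaussian vector (which is 1-sub-Gaussian); the dimensions and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
trials, sigma = 20000, 1.0
for d in [5, 50, 500]:
    X = rng.normal(0.0, sigma, size=(trials, d))
    emp = np.linalg.norm(X, axis=1).mean()   # E[max_{||theta|| <= 1} theta^T X] = E[||X||_2]
    bound = 4 * sigma * np.sqrt(d)           # Theorem 38.4
    print(f"d = {d:4d}   E[||X||_2] ≈ {emp:7.2f}   4*sigma*sqrt(d) = {bound:7.2f}")
```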

38.8 Lipschitz functions of Gaussian variables.

Recall that a function $f \colon \mathbb{R}^d \to \mathbb{R}$ is $L$-Lipschitz with respect to the Euclidean norm if $$|f(x) - f(y)| \le L\,\|x - y\|_2, \quad \forall x, y \in \mathbb{R}^d.$$

The following result shows that any $L$-Lipschitz function of a standard Gaussian random vector, once centered, is $L$-sub-Gaussian.

Theorem 38.5 Let $X = (X_1, \dots, X_n)$ be a vector of i.i.d. standard Gaussian random variables and let $f \colon \mathbb{R}^n \to \mathbb{R}$ be $L$-Lipschitz with respect to the Euclidean norm. Then, the variable $f(X) - \mathbb{E}[f(X)]$ is $L$-sub-Gaussian, and therefore $$\mathbb{P}\left[|f(X) - \mathbb{E}[f(X)]| \ge t\right] \le 2\exp\left(-\frac{t^2}{2L^2}\right).$$

This result is remarkable because it guarantees that any $L$-Lipschitz function of a standard Gaussian random vector, irrespective of the dimension, exhibits concentration like a scalar Gaussian variable with variance $L^2$.

For a proof, see Chapter 2 of Wainwright ().
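As an illustration, $f(x) = \|x\|_2$ is 1-Lipschitz, so $\|X\|_2$ should concentrate around its mean like a scalar Gaussian with variance 1, no matter the dimension. A Monte Carlo sketch (the dimension, sample size, and $t$ values are arbitrary, and the mean is estimated from the same sample):

```python
import numpy as np

rng = np.random.default_rng(4)
d, trials = 200, 50_000
X = rng.standard_normal(size=(trials, d))
f = np.linalg.norm(X, axis=1)            # f(x) = ||x||_2 is 1-Lipschitz
dev = np.abs(f - f.mean())               # deviations around the (estimated) mean of f(X)
for t in [0.5, 1.0, 2.0, 3.0]:
    emp = (dev >= t).mean()
    bound = 2 * np.exp(-t**2 / 2)        # Theorem 38.5 with L = 1
    print(f"t = {t:.1f}   empirical = {emp:.4f}   bound = {bound:.4f}")
```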
