7 Convergence Theorems

#LLN #CLT #NormalDistribution #Convergence #JensenInequality #MGF #ChebyshevInequality #MarkovInequality

1 Convergence Theorem

1.1 LLN

Suppose we have $X_{1}, \dots, X_{n}$ , where $n$ is very large. We are curious about the sample mean $\frac{S_{n}}{n} = \frac{\sum_{i = 1}^{n} X_{i}}{n}$ . What value does it converge to? What is its distribution? The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) answer these questions.

Theorem (Weak Law of Large Numbers, WLLN)

Let $X_{1}, \dots, X_{n}$ be a sequence of iid RVs with a finite mean $μ$ and finite variance $σ^{2}$ . Let $S_{n} = X_{1} + \dots + X_{n}$ . Then $\forall ε > 0$ , $lim_{n \to \infty} P (| \frac{S_{n}}{n} - μ | > ε) = 0.$
I.e. $\frac{S_{n}}{n} \overset{p}{\to} μ$ . (see below)

Proof

Since $Var (X_{i}) = σ^{2}$ and $X_{1}, \dots, X_{n}$ are independent, $Var (\frac{S_{n}}{n}) = E [{(\frac{S_{n}}{n})}^{2}] - E {[\frac{S_{n}}{n}]}^{2} = \frac{Var (S_{n})}{n^{2}} = \frac{σ^{2}}{n} .$
By Chebyshev's Inequality, $P (| \frac{S_{n}}{n} - μ | > ε) \leq \frac{σ^{2}}{n ε^{2}} \to 0.$
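The bound in the proof can be compared with a simulation. The following is a minimal sketch, assuming iid Uniform(0, 1) draws (so $μ = \frac{1}{2}$ , $σ^{2} = \frac{1}{12}$ ); the distribution, the function names, the seed and the number of trials are illustrative choices, not part of the theorem.

```python
import random

def deviation_frequency(n, eps, trials=1000, seed=0):
    """Estimate P(|S_n/n - mu| > eps) for iid Uniform(0, 1) draws (mu = 1/2)."""
    rng = random.Random(seed)
    mu = 0.5
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - mu) > eps:
            hits += 1
    return hits / trials

def chebyshev_bound(n, eps, var=1.0 / 12.0):
    """Upper bound sigma^2 / (n * eps^2) from the proof, capped at 1."""
    return min(1.0, var / (n * eps * eps))

# The estimated frequency should sit below the bound, and both shrink with n.
for n in (10, 100, 1000):
    print(n, deviation_frequency(n, 0.05), chebyshev_bound(n, 0.05))
```

The Chebyshev bound is usually far from tight; it only needs to tend to $0$ for the proof to work.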

The finite-variance requirement can be dropped. In fact, a stronger statement holds assuming only a finite mean:

Theorem (Strong Law of Large Numbers, SLLN)

Let $X_{1}, X_{2}, \dots$ be a sequence of iid RVs with finite mean $μ$ , and let $S_{n} = X_{1} + \dots + X_{n}$ . Then $P (lim_{n \to \infty} \frac{S_{n}}{n} = μ) = 1.$
I.e. $\frac{S_{n}}{n} \overset{a . s .}{\to} μ$ .
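Almost sure convergence is a statement about individual sample paths. As a rough illustration, the sketch below follows the running mean $\frac{S_{k}}{k}$ along one simulated path of fair die rolls ( $μ = 3.5$ ); the die, the seed and the path length are assumptions made for the example.

```python
import random

def running_means(n, seed=1):
    """Return S_k/k for k = 1..n along one simulated path of fair die rolls."""
    rng = random.Random(seed)
    total = 0
    means = []
    for k in range(1, n + 1):
        total += rng.randint(1, 6)   # one roll, uniform on {1, ..., 6}
        means.append(total / k)
    return means

path = running_means(100_000)
for k in (10, 1_000, 100_000):
    print(k, path[k - 1])            # should drift towards mu = 3.5
```

A single simulated path cannot prove the theorem; it only shows the kind of behaviour the SLLN asserts for almost every path.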

1.2 CLT

Theorem (Central Limit Theorem, CLT)

Let $X_{1}, X_{2}, \dots$ be iid RVs with finite mean $μ$ and finite variance $σ^{2} > 0$ . Let $S_{n} = X_{1} + \dots + X_{n}$ . Then $lim_{n \to \infty} P (\frac{\sqrt{n}}{σ} (\frac{S_{n}}{n} - μ) \leq x) = Φ (x), \forall x \in R,$
where $Φ (x) = \int_{- \infty}^{x} \frac{1}{\sqrt{2 π}} e^{- \frac{t^{2}}{2}} d t$ is the c.d.f of $N (0, 1)$ .
I.e. $\frac{\sqrt{n}}{σ} (\frac{S_{n}}{n} - μ) \overset{d}{\to} N (0, 1)$ .

Informally, the theorem says that for large $n$ , the distribution of $\frac{S_{n}}{n}$ is well approximated by $N (μ, \frac{σ^{2}}{n})$ .

We actually perform a standardization $\frac{\frac{S_{n}}{n} - μ}{\frac{σ}{\sqrt{n}}} = \frac{\sqrt{n}}{σ} (\frac{S_{n}}{n} - μ) .$ This RV has mean $0$ and variance $1$ .
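To see the standardization at work, here is a small simulation sketch. It assumes iid Exponential(1) draws (so $μ = σ = 1$ ), and compares the empirical c.d.f. of the standardized mean with $Φ$ ; the distribution, sample sizes and seed are illustrative assumptions.

```python
import math
import random

def phi(x):
    """Standard normal c.d.f., written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def standardized_mean(n, rng):
    """One draw of sqrt(n) * (S_n/n - mu) / sigma for Exponential(1), mu = sigma = 1."""
    s = sum(rng.expovariate(1.0) for _ in range(n))
    return math.sqrt(n) * (s / n - 1.0) / 1.0

def empirical_cdf(n, x, trials=4000, seed=2):
    """Fraction of simulated standardized means that are <= x."""
    rng = random.Random(seed)
    return sum(standardized_mean(n, rng) <= x for _ in range(trials)) / trials

for x in (-1.0, 0.0, 1.0):
    print(x, empirical_cdf(100, x), phi(x))
```

The two columns should be close but not identical: part of the gap is sampling noise, part is the skewness of the exponential distribution at finite $n$ .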

Proof

This proof assumes in addition that the MGF of $X_{1}$ is finite in a neighbourhood of $0$ . Let $Y_{i} = \frac{X_{i} - μ}{σ}$ . Then $E [Y_{i}] = 0, Var (Y_{i}) = 1$ . Let $Z_{n} = \sqrt{n} \frac{\frac{S_{n}}{n} - μ}{σ} = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} Y_{i}$ . We want to show $Z_{n} \overset{d}{\to} Z \sim N (0, 1)$ , and for this we use the continuity theorem of MGF. Since the $Y_{i}$ are iid, $\begin{aligned} & M_{Z_{n}} (t) = E [e^{\frac{t}{\sqrt{n}} \sum_{i = 1}^{n} Y_{i}}] = {[M_{Y_{1}} (\frac{t}{\sqrt{n}})]}^{n} \\ \Rightarrow & \log M_{Z_{n}} (t) = n \underbrace{\log M_{Y_{1}} (\frac{t}{\sqrt{n}})}_{L (\frac{t}{\sqrt{n}})} . \end{aligned}$
Now we want to calculate $lim_{n \to \infty} n L (\frac{t}{\sqrt{n}})$ . Firstly $\begin{aligned} M_{Y_{1}} (0) = 1 & \Rightarrow L (0) = 0, \\ L' (t) = \frac{M'_{Y_{1}} (t)}{M_{Y_{1}} (t)} & \Rightarrow L' (0) = E [Y_{1}] = 0, \\ L'' (t) = \frac{M''_{Y_{1}} (t)}{M_{Y_{1}} (t)} - {[\frac{M'_{Y_{1}} (t)}{M_{Y_{1}} (t)}]}^{2} & \Rightarrow L'' (0) = E [Y_{1}^{2}] - (E [Y_{1}])^{2} = 1. \end{aligned}$
Apply L'Hospital's rule twice, differentiating with respect to $n$ : $\begin{aligned} lim_{n \to \infty} \frac{L (\frac{t}{\sqrt{n}})}{\frac{1}{n}} & = lim_{n \to \infty} \frac{- \frac{1}{2} n^{- \frac{3}{2}} t L' (\frac{t}{\sqrt{n}})}{- n^{- 2}} = lim_{n \to \infty} \frac{t L' (\frac{t}{\sqrt{n}})}{2 n^{- \frac{1}{2}}} \\ & = lim_{n \to \infty} \frac{- \frac{1}{2} n^{- \frac{3}{2}} t^{2} L'' (\frac{t}{\sqrt{n}})}{- n^{- \frac{3}{2}}} = lim_{n \to \infty} \frac{t^{2}}{2} L'' (\frac{t}{\sqrt{n}}) = \frac{t^{2}}{2} . \end{aligned}$ So $lim_{n \to \infty} n L (\frac{t}{\sqrt{n}}) = \frac{t^{2}}{2}$ , and therefore $lim_{n \to \infty} M_{Z_{n}} (t) = e^{\frac{t^{2}}{2}}$ , which is the MGF of $N (0, 1)$ .
By continuity theorem of MGF, $Z_{n} \overset{d}{\to} Z \sim N (0, 1)$ .
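The key limit in the proof can be checked numerically for a concrete distribution. The sketch below assumes a Rademacher variable $Y$ with $P (Y = 1) = P (Y = - 1) = \frac{1}{2}$ (mean $0$ , variance $1$ , MGF $\cosh t$ ); this choice and the function names are only for illustration.

```python
import math

def L(t):
    """Log-MGF of a Rademacher variable Y: log cosh(t)."""
    return math.log(math.cosh(t))

def scaled_log_mgf(n, t):
    """n * L(t / sqrt(n)), which is log M_{Z_n}(t) in the proof."""
    return n * L(t / math.sqrt(n))

t = 1.5
for n in (1, 10, 100, 10_000):
    # The first column should approach the second, t^2 / 2.
    print(n, scaled_log_mgf(n, t), t * t / 2)
```

For this distribution the gap to $\frac{t^{2}}{2}$ is of order $\frac{t^{4}}{12 n}$ , which follows from the Taylor expansion $\log \cosh u = \frac{u^{2}}{2} - \frac{u^{4}}{12} + \dots$ .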

1.2.1 Extensions of CLT

If $X_{1}, X_{2}, \dots$ are independent but not identically distributed, the CLT can still apply under additional conditions such as the Lyapunov or Lindeberg condition. A special case is for bounded RVs. Suppose $X_{1}, X_{2}, \dots$ are independent, with $E [X_{i}] = μ_{i} < \infty$ , $Var (X_{i}) = σ_{i}^{2} < \infty$ , $S_{n} = X_{1} + \dots + X_{n}$ . Then if

$\exists$ a constant $M > 0$ : $P (| X_{i} | < M) = 1, \forall i \in N$ , and
$lim_{n \to \infty} Var (S_{n}) = lim_{n \to \infty} \sum_{i = 1}^{n} σ_{i}^{2} = \infty$ ,

we have $lim_{n \to \infty} P (\frac{S_{n} - E [S_{n}]}{\sqrt{Var (S_{n})}} < x) = Φ (x), \forall x \in R .$
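A case that satisfies both conditions is a sum of independent Bernoulli variables with different success probabilities. The sketch below computes the exact distribution of $S_{n}$ and its largest c.d.f. gap to $Φ$ after standardization; the particular probabilities in `example_ps` are a hypothetical choice, bounded away from $0$ and $1$ so that $Var (S_{n}) \to \infty$ .

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def example_ps(n):
    """Hypothetical success probabilities cycling through 0.2, 0.35, 0.5, 0.65, 0.8."""
    return [0.2 + 0.15 * (i % 5) for i in range(n)]

def poisson_binomial_pmf(ps):
    """Exact pmf of S_n = X_1 + ... + X_n with independent X_i ~ Bernoulli(p_i)."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def max_cdf_gap(ps):
    """sup_x |P((S_n - E[S_n]) / sd(S_n) <= x) - Phi(x)|, evaluated at the jumps."""
    mean = sum(ps)
    sd = math.sqrt(sum(p * (1.0 - p) for p in ps))
    gap, cdf = 0.0, 0.0
    for k, mass in enumerate(poisson_binomial_pmf(ps)):
        target = phi((k - mean) / sd)
        gap = max(gap, abs(cdf - target))   # left limit at the jump k
        cdf += mass
        gap = max(gap, abs(cdf - target))   # value at the jump k
    return gap

for n in (10, 100, 1000):
    print(n, max_cdf_gap(example_ps(n)))
```

Because the standardized c.d.f. is a step function and $Φ$ is increasing, the supremum over $x$ is attained at a jump point, which is why only those points are checked.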

2 Types of Convergence for RV

$X_{1}, X_{2}, \dots$ is an infinite sequence of RVs. $X$ is another RV. Assume they are all defined on the same probability space. Of course we can define pointwise convergence $lim_{n \to \infty} X_{n} (ω) = X (ω), \forall ω \in Ω .$
This notion of convergence, however, turns out to be too strong.

Almost Sure Convergence

Almost sure convergence, also called strong convergence, convergence with probability $1$ , is defined as $P (lim_{n \to \infty} X_{n} = X) = 1.$
Denote as $X_{n} \overset{a . s .}{\to} X, n \to \infty$ .

More precisely, $P (\{ω \in Ω \mid lim_{n \to \infty} X_{n} (ω) = X (ω)\}) = 1.$
I.e., for every $ω$ outside a set of probability $0$ : $\forall ε > 0, \exists N (ω, ε) \in N, \forall n \geq N (ω, ε) : | X_{n} (ω) - X (ω) | < ε$ . This is the definition of pointwise convergence, required only on a set of probability $1$ .

Convergence in Probability

$\{X_{n}\}$ converges in probability to $X$ if $\forall ε > 0$ , $lim_{n \to \infty} P (| X_{n} - X | > ε) = 0.$
Denote as $X_{n} \overset{p}{\to} X, n \to \infty$ .

Convergence in $r$-th mean

For $r > 0$ , $\{X_{n}\}$ converges in $r$-th mean to $X$ if $lim_{n \to \infty} E [| X_{n} - X |^{r}] = 0.$
Denote as $X_{n} \overset{r}{\to} X, n \to \infty$ .

Convergence in Distribution

Convergence in distribution is also called convergence in law. Here $X$ does not need to be defined on the same probability space as $\{X_{n}\}$ . $\{X_{n}\}$ converges in distribution to $X$ if $lim_{n \to \infty} F_{X_{n}} (x) = F_{X} (x), \forall x \in C (F_{X}),$ where $C (F_{X}) = \{x \in R \mid F_{X} \text{ is continuous at } x\}$ .
Denote as $X_{n} \overset{d}{\to} X, n \to \infty$ .

Equal in distribution does not mean equal with probability $1$ . Take a fair coin and toss it twice. Assume the two tosses are independent. $Ω = \{HH,HT,TH,TT\}$ . Let $X_{1}, X_{2}$ be defined as $X_{i} (ω) = \begin{cases} 1, & i \text{-th toss shows H}, \\ 0, & i \text{-th toss shows T} . \end{cases}$
Then $X_{1} \overset{d}{=} X_{2}$ (both are Bernoulli with parameter $\frac{1}{2}$ ), but $P (X_{1} = X_{2}) = P (\{HH,TT\}) = \frac{1}{2}$ . So $X_{1}, X_{2}$ have the same distribution but are not equal with probability $1$ .
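The two-toss example is small enough to enumerate exactly. A minimal sketch, with the sample space encoded as two-letter strings (an encoding chosen here for convenience):

```python
from fractions import Fraction
from itertools import product

omega = ["".join(w) for w in product("HT", repeat=2)]   # HH, HT, TH, TT
prob = {w: Fraction(1, 4) for w in omega}               # fair, independent tosses

def X(i, w):
    """Indicator that the i-th toss (i = 1 or 2) of outcome w shows H."""
    return 1 if w[i - 1] == "H" else 0

def pmf(i):
    """Distribution of X_i as a dict: value -> probability."""
    out = {0: Fraction(0), 1: Fraction(0)}
    for w in omega:
        out[X(i, w)] += prob[w]
    return out

same_law = pmf(1) == pmf(2)
p_equal = sum(prob[w] for w in omega if X(1, w) == X(2, w))
print(same_law, p_equal)   # True 1/2
```

The distributions coincide, yet the two variables agree on only half of the sample space.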

Example for convergence in distribution: let $X_{1}, X_{2}, \dots$ be RVs with $P (X_{n} = \frac{1}{n}) = 1$ , and $X : P (X = 0) = 1$ . It's easy to see $\begin{aligned} lim_{n \to \infty} F_{X_{n}} (x) & = F_{X} (x), \forall x \neq 0, \\ lim_{n \to \infty} F_{X_{n}} (x) & \neq F_{X} (x), x = 0. \end{aligned}$
But this does not affect convergence in distribution, since $F_{X} (x)$ is not continuous at $x = 0$ and the definition only requires convergence at points of $C (F_{X})$ . So $X_{n} \overset{d}{\to} X$ .
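The two c.d.f.s in this example can be written down directly, which makes the behaviour at $x = 0$ easy to inspect. A small sketch (the evaluation points are arbitrary):

```python
def F_n(n, x):
    """C.d.f. of X_n, which equals 1/n with probability 1."""
    return 1.0 if x >= 1.0 / n else 0.0

def F(x):
    """C.d.f. of X, which equals 0 with probability 1."""
    return 1.0 if x >= 0.0 else 0.0

for x in (-0.5, 0.0, 0.02):
    values = [F_n(n, x) for n in (1, 10, 100, 1000)]
    print(x, values, F(x))
```

At $x = 0$ every $F_{X_{n}} (0)$ is $0$ while $F_{X} (0) = 1$ ; at any other point the values eventually agree.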

The three convergence theorems correspond to different types of convergence.

Almost sure convergence	Convergence in probability	Convergence in distribution
SLLN	WLLN	CLT

3 Convergence Relations

Theorem (Relations between Different Convergence Concepts)

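The relations, numbered as in the proofs below, are:
(1) $X_{n} \overset{s}{\to} X \Rightarrow X_{n} \overset{r}{\to} X$ for $0 < r < s$ ;
(2) $X_{n} \overset{r}{\to} X \Rightarrow X_{n} \overset{p}{\to} X$ for $r > 0$ ;
(3) $X_{n} \overset{a . s .}{\to} X \Rightarrow X_{n} \overset{p}{\to} X$ ;
(4) $X_{n} \overset{p}{\to} X \Rightarrow X_{n} \overset{d}{\to} X$ .
[Figure: diagram of these implications (Pasted image 20241130003132.png)]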

Proof of (1)

Lemma 1

For $0 < r < s$ , any RV $Y$ satisfies $(E [| Y |^{r}])^{\frac{1}{r}} \leq (E [| Y |^{s}])^{\frac{1}{s}} .$

Proof

Let $g (x) = | x |^{\frac{s}{r}}$ . This is a convex function since $| x |^{a}$ is convex for $a \geq 1$ .
By Jensen's inequality, $\begin{aligned} & g (E [| Y |^{r}]) \leq E [g (| Y |^{r})] \\ \Rightarrow & (E [| Y |^{r}])^{\frac{s}{r}} \leq E [| Y |^{s}] \\ \Rightarrow & (E [| Y |^{r}])^{\frac{1}{r}} \leq (E [| Y |^{s}])^{\frac{1}{s}} . \end{aligned}$

By Lemma 1 applied to $Y = X_{n} - X$ , for $0 < r < s$ , $0 \leq E [| X_{n} - X |^{r}] \leq (E [| X_{n} - X |^{s}])^{\frac{r}{s}},$ so if $lim_{n \to \infty} E [| X_{n} - X |^{s}] = 0$ , then the right-hand side tends to $0$ and $lim_{n \to \infty} E [| X_{n} - X |^{r}] = 0$ .
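Lemma 1 says that $r \mapsto (E [| Y |^{r}])^{\frac{1}{r}}$ is non-decreasing. This can be checked numerically on a small discrete distribution; the values and probabilities below are hypothetical, picked only to have something to compute.

```python
def moment_norm(values, probs, r):
    """(E[|Y|^r])^(1/r) for a discrete RV Y with P(Y = values[i]) = probs[i]."""
    return sum(p * abs(v) ** r for v, p in zip(values, probs)) ** (1.0 / r)

# Hypothetical discrete distribution, chosen only for illustration.
values = [-2.0, 0.5, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]

norms = [moment_norm(values, probs, r) for r in (0.5, 1.0, 2.0, 4.0)]
print(norms)   # the entries should be non-decreasing in r
```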

Proof of (2)

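$\forall ε > 0, r > 0$ , by generalized Markov's inequality, $0 \leq P (| X_{n} - X | \geq ε) \leq \frac{E [| X_{n} - X |^{r}]}{ε^{r}} .$ If $X_{n} \overset{r}{\to} X$ , the right-hand side tends to $0$ as $n \to \infty$ , so $X_{n} \overset{p}{\to} X$ .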
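A toy case makes the inequality concrete. Assume, purely for illustration, that $D_{n} = X_{n} - X$ equals $1$ with probability $\frac{1}{n}$ and $0$ otherwise, so that $E [| D_{n} |^{r}] = \frac{1}{n}$ for every $r > 0$ . The sketch compares the exact tail probability with the Markov bound.

```python
def tail_prob(n, eps):
    """P(|D_n| >= eps), where D_n is 1 with probability 1/n and 0 otherwise (eps > 0)."""
    return 1.0 / n if eps <= 1.0 else 0.0

def markov_bound(n, eps, r):
    """E[|D_n|^r] / eps^r, using E[|D_n|^r] = 1/n for every r > 0."""
    return (1.0 / n) / eps ** r

for n in (10, 1000, 100_000):
    # Both columns tend to 0; the tail probability never exceeds the bound.
    print(n, tail_prob(n, 0.5), markov_bound(n, 0.5, 2))
```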

Proof of (3)

Lemma 2

$X_{n} \overset{a . s .}{\to} X$ iff $\forall ε > 0$ , $lim_{n \to \infty} P (| X_{m} - X | < ε, \forall m \geq n) = 1.$

Proof

Note that $\begin{aligned} \{lim_{n \to \infty} X_{n} = X\} & = \{ω \in Ω \mid lim_{n \to \infty} X_{n} (ω) = X (ω)\} \\ & = \{ω \in Ω \mid \forall ε > 0, \exists N (ω, ε) \in N \text{ s.t. } | X_{m} (ω) - X (ω) | < ε, \forall m \geq N (ω, ε)\} \\ & = ⋂_{ε > 0} ⋃_{n = 1}^{\infty} \underbrace{\{ω \in Ω \mid | X_{m} (ω) - X (ω) | < ε, \forall m \geq n\}}_{A_{n, ε}} . \end{aligned}$
So $\begin{aligned} P (lim_{n \to \infty} X_{n} = X) = 1 & \Leftrightarrow P (⋂_{ε > 0} ⋃_{n = 1}^{\infty} A_{n, ε}) = 1 \\ & \Leftrightarrow P (⋃_{n = 1}^{\infty} A_{n, ε}) = 1, \forall ε > 0. \end{aligned}$ (For the second equivalence, $⋃_{n = 1}^{\infty} A_{n, ε}$ shrinks as $ε$ decreases, so the intersection can be taken over the countable set $ε = \frac{1}{k}$ , $k \in N$ .)
Lastly, $\forall ε > 0$ , $A_{1, ε} \subset A_{2, ε} \subset \dots$ is an increasing sequence of events, so by continuity of probability $lim_{n \to \infty} P (A_{n, ε}) = P (⋃_{n = 1}^{\infty} A_{n, ε})$ . Since $P (A_{n, ε}) = P (| X_{m} - X | < ε, \forall m \geq n)$ , the lemma follows.

Suppose $X_{n} \overset{a . s .}{\to} X$ and fix $ε > 0$ . Since $\{| X_{m} - X | < ε, \forall m \geq n\} = ⋂_{m = n}^{\infty} \{| X_{m} - X | < ε\} \subset \{| X_{n} - X | < ε\}$ and $P (A) \geq P (B)$ if $B \subset A$ , Lemma 2 gives $1 \geq P (| X_{n} - X | < ε) \geq P (| X_{m} - X | < ε, \forall m \geq n) \to 1.$ So $lim_{n \to \infty} P (| X_{n} - X | < ε) = 1$ , i.e. $lim_{n \to \infty} P (| X_{n} - X | \geq ε) = 0$ , which gives $X_{n} \overset{p}{\to} X$ .

Proof of (4)

Assume $X_{n} \overset{p}{\to} X$ .

Fix $ε > 0$ and $x \in R$ . If $X_{n} (ω) \leq x$ , then either $X (ω) \leq x + ε$ , or $| X (ω) - X_{n} (ω) | > ε$ . I.e. $\{ω \in Ω \mid X_{n} (ω) \leq x\} \subset \{ω \mid X (ω) \leq x + ε\} \cup \{ω \mid | X (ω) - X_{n} (ω) | > ε\} .$
By union bound inequality, $\begin{aligned} & P (X_{n} \leq x) \leq P (X \leq x + ε) + P (| X - X_{n} | > ε) \\ \Rightarrow & F_{X_{n}} (x) \leq F_{X} (x + ε) + P (| X - X_{n} | > ε) . \end{aligned}$
If $X_{n} (ω) > x$ , then either $X (ω) > x - ε$ or $| X (ω) - X_{n} (ω) | > ε$ . Then $1 - F_{X_{n}} (x) \leq 1 - F_{X} (x - ε) + P (| X - X_{n} | > ε) .$

Since $X_{n} \overset{p}{\to} X$ , $lim_{n \to \infty} P (| X - X_{n} | > ε) = 0, \forall ε > 0$ . So case 1 and 2 together imply $F_{X} (x - ε) \leq \underset{n \to \infty}{lim inf} F_{X_{n}} (x) \leq \underset{n \to \infty}{lim sup} F_{X_{n}} (x) \leq F_{X} (x + ε) .$
If $x \in C (F_{X})$ , then $lim_{ε \to 0} F_{X} (x - ε) = F_{X} (x) = lim_{ε \to 0} F_{X} (x + ε),$ so letting $ε \to 0$ gives $lim_{n \to \infty} F_{X_{n}} (x) = F_{X} (x)$ for every $x \in C (F_{X})$ .

Counterexample: $X_{n} \overset{p}{\to} X$ does not imply $X_{n} \overset{a . s .}{\to} X$

Suppose $Ω = [0, 1]$ . $P ([a, b]) = b - a, \forall 0 \leq a < b \leq 1$ .
Let $X (ω) = 0$ and define $\{X_{n}\}$ as follows:

$X_{1} = I_{[0, 1]}$ ;
$X_{2} = I_{[0, \frac{1}{2}]}, X_{3} = I_{[\frac{1}{2}, 1]}$ ,
$X_{4} = I_{[0, \frac{1}{3}]}, X_{5} = I_{[\frac{1}{3}, \frac{2}{3}]}, X_{6} = I_{[\frac{2}{3}, 1]}$ and so on.

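Each $X_{n}$ is the indicator of an interval whose length tends to $0$ , so for any $ε > 0$ , $P (| X_{n} - X | > ε) \leq P (X_{n} = 1) \to 0$ , i.e. $X_{n} \overset{p}{\to} X$ . But for any $ω \in Ω$ , there exist infinitely many values of $n$ for which $X_{n} (ω) = 1$ , so $X_{n} (ω)$ does not converge to $X (ω) = 0$ for any $ω \in Ω$ , and almost sure convergence does not hold.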
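The construction can be made explicit in code. The sketch below assumes the pattern continues as the first three rows suggest: the $k$-th block splits $[0, 1]$ into $k$ closed intervals of length $\frac{1}{k}$ . The helper names and the sample point $ω = \frac{1}{3}$ are illustrative choices.

```python
from fractions import Fraction

def interval(n):
    """Return (a, b) with X_n = I_[a, b]: the j-th of k equal pieces of [0, 1]."""
    k = 1
    while k * (k + 1) // 2 < n:      # block k covers indices up to k(k+1)/2
        k += 1
    j = n - k * (k - 1) // 2 - 1     # position inside block k, 0 <= j < k
    return Fraction(j, k), Fraction(j + 1, k)

def X(n, w):
    """Value of X_n at the sample point w."""
    a, b = interval(n)
    return 1 if a <= w <= b else 0

w = Fraction(1, 3)                    # an arbitrary sample point
ones = [n for n in range(1, 56) if X(n, w) == 1]
lengths = [interval(n)[1] - interval(n)[0] for n in (1, 10, 55)]
print(ones)      # indices with X_n(w) = 1: at least one in every block
print(lengths)   # P(X_n = 1) is the interval length, which shrinks to 0
```

Every block covers all of $[0, 1]$ , so each $ω$ is hit at least once per block, while the probability of a hit in a given step goes to $0$ .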

4 Rate of Convergence

Theorem (Berry-Esseen)

There exists a constant $C$ s.t. if $X_{1}, \dots, X_{n}$ are iid RVs with finite mean $μ$ , finite variance $σ^{2} > 0$ and finite $ρ = E [| X_{i} - μ |^{3}]$ , then $\forall n \in N$ , $sup_{x \in R} | F_{n} (x) - Φ (x) | \leq \frac{C ρ}{σ^{3} \sqrt{n}},$ where $S_{n} = X_{1} + \dots + X_{n}$ , $Z_{n} = \sqrt{n} \frac{\frac{S_{n}}{n} - μ}{σ}$ , $F_{n} (x) = P (Z_{n} \leq x)$ .

This implies uniform convergence of $F_{n}$ to $Φ$ , at rate $O (\frac{1}{\sqrt{n}})$ .

$C$ does not depend on the distribution of $X_{i}$ . As a matter of fact, the smallest admissible constant satisfies $0.4097 \leq C \leq 0.4748$ .
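For Bernoulli( $p$ ) summands both sides of the inequality can be computed exactly, since $S_{n}$ is binomial and $ρ = p (1 - p) ((1 - p)^{2} + p^{2})$ . The sketch below uses $C = 0.4748$ , the upper bound quoted above; the choice of distribution and the function names are assumptions made for the example.

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_gap_bernoulli(n, p):
    """Exact sup_x |F_n(x) - Phi(x)| for Z_n built from n iid Bernoulli(p) draws."""
    sigma = math.sqrt(p * (1.0 - p))
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):
        mass = math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        target = phi((k - n * p) / (sigma * math.sqrt(n)))
        gap = max(gap, abs(cdf - target))   # left limit at the jump
        cdf += mass
        gap = max(gap, abs(cdf - target))   # value at the jump
    return gap

def berry_esseen_bound(n, p, C=0.4748):
    """C * rho / (sigma^3 * sqrt(n)) with rho = E|X - p|^3 for X ~ Bernoulli(p)."""
    sigma = math.sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) * ((1.0 - p) ** 2 + p ** 2)
    return C * rho / (sigma ** 3 * math.sqrt(n))

for n in (10, 100, 400):
    print(n, sup_gap_bernoulli(n, 0.3), berry_esseen_bound(n, 0.3))
```

Both columns decay like $\frac{1}{\sqrt{n}}$ here, which suggests that for lattice distributions the rate in the theorem cannot be improved in general.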