Problem of the day: Select two points independently and uniformly at random from a meter stick, thereby obtaining 3 segments $L_1, L_2, L_3$. What is the probability that the three segments can form a triangle?
The segments can form a triangle iff
$$\begin{cases} L_1 < L_2 + L_3 = 1 - L_1 \;\Rightarrow\; L_1 < \frac{1}{2}, \\ L_2 < L_1 + L_3 = 1 - L_2 \;\Rightarrow\; L_2 < \frac{1}{2}, \\ L_3 < L_1 + L_2 = 1 - L_3 \;\Rightarrow\; L_3 < \frac{1}{2}. \end{cases}$$
Let $U_1, U_2$ denote the points chosen in $[0,1]$. On the event $U_1 < U_2$, the segments are $L_1 = U_1$, $L_2 = U_2 - U_1$, $L_3 = 1 - U_2$. Drawing, in the unit square, the region where $L_1 < \frac{1}{2}$, $L_2 < \frac{1}{2}$, $L_3 < \frac{1}{2}$ (together with the symmetric region for $U_2 < U_1$) gives total area $\frac{1}{4}$. Thus $P = \frac{1}{4}$.
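The $\frac14$ answer is easy to check by Monte Carlo (a quick sketch; the helper name is ours):

```python
import random

def breaks_form_triangle(u1, u2):
    """True iff the three pieces cut at u1, u2 satisfy the triangle
    inequality -- equivalently, every piece is shorter than 1/2."""
    a, b = min(u1, u2), max(u1, u2)
    pieces = (a, b - a, 1 - b)
    return all(p < 0.5 for p in pieces)

random.seed(0)
n = 200_000
hits = sum(breaks_form_triangle(random.random(), random.random()) for _ in range(n))
print(hits / n)  # close to 1/4
```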
Consider a linear model
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2).$$
$\vec{X} = (1, X_1, \cdots, X_{p-1})$ is the covariate (feature) vector, e.g. (age, BMI, DNA).
$\vec{\beta} = (\beta_0, \beta_1, \cdots, \beta_{p-1})$ is the vector of unknown coefficients.
Training data: $\{(\vec{X}^{(i)}, Y^{(i)}),\ i = 1, \cdots, n\}$, with $n \gg p$.
Response vector $\vec{Y}_{n\times 1} = \begin{bmatrix} Y^{(1)} \\ \vdots \\ Y^{(n)} \end{bmatrix}$; design matrix $X_{n\times p} = \begin{bmatrix} \vec{X}^{(1)} \\ \vdots \\ \vec{X}^{(n)} \end{bmatrix}$.
By MLE (see this example): $\hat{\vec{\beta}} = (X^T X)^{-1} X^T \vec{Y}$.
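The estimator $\hat{\vec{\beta}} = (X^T X)^{-1} X^T \vec{Y}$ can be computed directly by solving the normal equations $X^T X \hat{\vec{\beta}} = X^T \vec{Y}$. A minimal pure-Python sketch (the helper names and the toy data are ours); on noise-free data generated from $\beta = (1, 2, 3)$, OLS recovers the coefficients exactly:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, Y):
    """beta_hat = (X^T X)^{-1} X^T Y via the normal equations."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]
    return solve(XtX, XtY)

# Design matrix with an intercept column; responses have no noise.
X = [[1, x1, x2] for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]]
Y = [1 + 2 * x1 + 3 * x2 for _, x1, x2 in X]
beta_hat = ols(X, Y)
print(beta_hat)  # recovers (1, 2, 3) up to floating-point error
```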
Common hypothesis test: $H_0: \beta_j = 0$ vs. $H_1: \beta_j \neq 0$.
Facts:
1. Under $H_0$, $\frac{\hat\beta_j}{\sqrt{\mathrm{Var}(\hat\beta_j)}} \sim N(0,1)$.
2. $\frac{\|\vec{e}\|^2}{\sigma^2} \sim \chi^2_{n-p}$. (see here)
3. $\|\vec{e}\|^2 \perp\!\!\!\perp \hat{\vec{\beta}}$.
Today we will show that these facts imply $S \sim t_{n-p}$, where $S$ is the $t$-statistic obtained by replacing $\sigma^2$ in $\mathrm{Var}(\hat\beta_j)$ with its estimate $\hat\sigma^2 = \|\vec{e}\|^2/(n-p)$.
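In the simplest case $p = 1$ (intercept-only model, so $\hat\beta_0 = \bar{Y}$ and $\|\vec{e}\|^2 = \sum_i (Y^{(i)} - \bar{Y})^2$), $S$ reduces to the classical one-sample $t$-statistic. A simulation sketch under $H_0$ (the sample sizes are our arbitrary choices) checks that the empirical variance of $S$ matches $\mathrm{Var}(t_{n-1}) = \frac{n-1}{n-3}$:

```python
import random
import statistics

random.seed(0)
n, reps = 8, 200_000
S = []
for _ in range(reps):
    y = [random.gauss(0, 1) for _ in range(n)]        # H0: beta_0 = 0, sigma = 1
    ybar = sum(y) / n                                 # beta_0_hat
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)  # sigma^2_hat = ||e||^2 / (n - p)
    S.append(ybar / (s2 / n) ** 0.5)                  # t-statistic
print(statistics.variance(S))  # theoretical Var(t_7) = 7/5 = 1.4
```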
Conditional density of continuous random variables:
Let $X, Y$ be continuous RVs with joint density $f_{X,Y}$. Given a measurable set $A \in \mathcal{F}$, how do we calculate $P(X \in A \mid Y = y_0)$?
For $\delta \ll 1$,
$$P(X \in A \mid y_0 < Y < y_0 + \delta) = \frac{P[(X \in A),\ (y_0 < Y < y_0 + \delta)]}{P(y_0 < Y < y_0 + \delta)} = \frac{\int_A \int_{y_0}^{y_0+\delta} f_{X,Y}(x,y)\,dy\,dx}{\int_{y_0}^{y_0+\delta} f_Y(y)\,dy} \approx \frac{\int_A f_{X,Y}(x,y_0)\,\delta\,dx}{f_Y(y_0)\,\delta} = \int_A \frac{f_{X,Y}(x,y_0)}{f_Y(y_0)}\,dx.$$
This motivates the definition
$$f_{X \mid Y=y_0}(x) = \frac{f_{X,Y}(x,y_0)}{f_Y(y_0)},$$
which is well defined if $f_Y(y_0) > 0$.
Independence
$$X \perp\!\!\!\perp Y \iff f_{X \mid Y=y}(x) = f_X(x),\ \forall x \in \mathbb{R},\ \forall y \in \mathbb{R} \text{ s.t. } f_Y(y) > 0 \iff f_{X,Y}(x,y) = f_X(x)\, f_Y(y),\ \forall x, y \in \mathbb{R}.$$
Law of Total Probability
$$f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dy = \int_{-\infty}^{+\infty} f_{X \mid Y=y}(x)\, f_Y(y)\,dy.$$
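A numerical illustration of the law of total probability, for a concrete model of our choosing: if $X \mid Y = y \sim N(y, 1)$ and $Y \sim N(0,1)$, then marginally $X \sim N(0, 2)$, and integrating $f_{X\mid Y=y}(x) f_Y(y)$ over $y$ should reproduce that density:

```python
import math

def phi(z, mu=0.0, var=1.0):
    """N(mu, var) density."""
    return math.exp(-(z - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def marginal(x, lo=-8.0, hi=8.0, m=4000):
    """f_X(x) = integral of f_{X|Y=y}(x) f_Y(y) dy, by the trapezoid rule."""
    h = (hi - lo) / m
    vals = [phi(x, mu=lo + i * h) * phi(lo + i * h) for i in range(m + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

x = 0.7
print(marginal(x), phi(x, var=2.0))  # the two numbers agree
```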
$X, Y \overset{\text{i.i.d.}}{\sim} N(0,1)$. Let $R = \frac{X}{Y}$. We want to calculate
$$f_R(r) = \int_{-\infty}^{+\infty} f_{R \mid Y=y}(r)\, f_Y(y)\,dy.$$
By independence of $X, Y$,
$$(R \mid Y = y) \overset{d}{=} \left(\frac{X}{y} \,\Big|\, Y = y\right) \overset{d}{=} \frac{X}{y}.$$
Since $T(X) = \frac{X}{y}$ is invertible and differentiable,
$$f_{X/y}(r) = f_X(ry)\left|\frac{d(ry)}{dr}\right| = |y|\, f_X(ry).$$
Therefore
$$f_R(r) = \int_{-\infty}^{+\infty} |y|\, \frac{1}{\sqrt{2\pi}} e^{-\frac{(ry)^2}{2}}\, \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}}\,dy = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |y|\, e^{-\frac{y^2}{2}(1+r^2)}\,dy = \frac{1}{\pi(1+r^2)},$$
thus $R \sim \mathrm{Cauchy}$.
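Since the Cauchy distribution has no mean, the natural empirical check compares sample quantiles rather than averages. The standard Cauchy quantile function is $Q(p) = \tan(\pi(p - \frac12))$, so the quartiles are $\pm 1$ (a simulation sketch):

```python
import random

random.seed(0)
n = 200_000
r = sorted(random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n))

# Empirical quartiles of R = X/Y vs. the Cauchy quartiles -1 and 1.
q25, q75 = r[n // 4], r[3 * n // 4]
print(q25, q75)  # close to -1 and 1
```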
$X, Y_1, \cdots, Y_k \overset{\text{i.i.d.}}{\sim} N(0,1)$. Let
$$R = \frac{X}{\sqrt{(Y_1^2 + \cdots + Y_k^2)/k}}.$$
(Recall: $G = Y_1^2 + \cdots + Y_k^2 \sim \mathrm{Gamma}(\frac{k}{2}, \frac{1}{2})$.) Note that
$$(R \mid G = g) \overset{d}{=} \left(\frac{X}{\sqrt{g/k}} \,\Big|\, G = g\right) \overset{d}{=} \frac{X}{\sqrt{g/k}}.$$
Thus $f_{X/\sqrt{g/k}}(r) = \sqrt{\frac{g}{k}}\, f_X\!\left(r\sqrt{\frac{g}{k}}\right)$. By the law of total probability,
$$f_R(r) = \int_0^{+\infty} f_{R \mid G=g}(r)\, f_G(g)\,dg = \int_0^{+\infty} \sqrt{\frac{g}{k}}\, f_X\!\left(r\sqrt{\frac{g}{k}}\right) f_G(g)\,dg = \frac{1}{\sqrt{2\pi k}}\, \frac{1}{2^{k/2}\,\Gamma(k/2)} \int_0^{+\infty} g^{\frac{k+1}{2}-1}\, e^{-\frac{g}{2}\left(\frac{r^2}{k}+1\right)}\,dg = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)\sqrt{\pi k}} \left(1 + \frac{r^2}{k}\right)^{-\frac{k+1}{2}},$$
so $R \sim t_k$.
(Recall $\Gamma(\frac{1}{2}) = \sqrt{\pi}$ and $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$.)
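The normalizing constant can be sanity-checked numerically: the derived $t_k$ density should integrate to 1. A trapezoid-rule sketch for $k = 5$ (the cutoff $\pm 200$ is our arbitrary choice; the $t_5$ tails beyond it contribute negligibly):

```python
import math

k = 5
c = math.gamma((k + 1) / 2) / (math.gamma(k / 2) * math.sqrt(math.pi * k))

def t_density(r):
    """The t_k density derived above."""
    return c * (1 + r * r / k) ** (-(k + 1) / 2)

# Trapezoid rule on [-200, 200].
lo, hi, m = -200.0, 200.0, 400_000
h = (hi - lo) / m
total = h * (0.5 * (t_density(lo) + t_density(hi)) +
             sum(t_density(lo + i * h) for i in range(1, m)))
print(total)  # close to 1
```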
What happens as $k \to \infty$?
$Y_1^2, \cdots, Y_k^2 \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(\frac{1}{2}, \frac{1}{2})$, with $E(Y_i^2) = 1$.
By the SLLN, $\frac{Y_1^2 + \cdots + Y_k^2}{k} \xrightarrow{a.s.} 1$, so by Slutsky's theorem $R \Rightarrow N(0,1)$; that is, $t_k \to N(0,1)$ in distribution.
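The limit can also be seen numerically by comparing the $t_k$ density above with the $N(0,1)$ density on a grid (a sketch; `lgamma` is used to avoid overflowing $\Gamma$ for large $k$, and the grid $[-4, 4]$ is our arbitrary choice):

```python
import math

def t_density(r, k):
    """t_k density; lgamma avoids overflowing Gamma for large k."""
    logc = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(math.pi * k)
    return math.exp(logc) * (1 + r * r / k) ** (-(k + 1) / 2)

def normal_density(r):
    return math.exp(-r * r / 2) / math.sqrt(2 * math.pi)

# Max gap between the t_k and N(0,1) densities on the grid [-4, 4].
gaps = []
for k in (5, 50, 500):
    gaps.append(max(abs(t_density(i / 10, k) - normal_density(i / 10))
                    for i in range(-40, 41)))
print(gaps)  # the gap shrinks as k grows
```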