Problem of the day: Select two points independently and uniformly at random from a meter stick, thereby obtaining 3 segments $L_1, L_2, L_3$. What is the probability that the three segments can form a triangle?
The segments can form a triangle iff
$$\begin{cases} L_1 < L_2 + L_3 = 1 - L_1 \;\Rightarrow\; L_1 < \frac{1}{2}, \\ L_2 < L_1 + L_3 = 1 - L_2 \;\Rightarrow\; L_2 < \frac{1}{2}, \\ L_3 < L_1 + L_2 = 1 - L_3 \;\Rightarrow\; L_3 < \frac{1}{2}. \end{cases}$$
Let $U_1, U_2$ denote the points chosen in $[0,1]$. On the event $U_1 < U_2$, the segments are $L_1 = U_1$, $L_2 = U_2 - U_1$, $L_3 = 1 - U_2$. Drawing, in the unit square, the region where $L_1 < \frac{1}{2}$, $L_2 < \frac{1}{2}$, $L_3 < \frac{1}{2}$ (together with the symmetric region for $U_2 < U_1$) gives total area $\frac{1}{4}$. Thus $P = \frac{1}{4}$.
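The $\frac14$ answer is easy to check by Monte Carlo (a quick sketch; the helper name is ours):

```python
import random

def breaks_form_triangle(u1, u2):
    """True iff the three pieces cut at u1, u2 satisfy the triangle
    inequality -- equivalently, every piece is shorter than 1/2."""
    a, b = min(u1, u2), max(u1, u2)
    pieces = (a, b - a, 1 - b)
    return all(p < 0.5 for p in pieces)

random.seed(0)
n = 200_000
hits = sum(breaks_form_triangle(random.random(), random.random()) for _ in range(n))
print(hits / n)  # close to 1/4
```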
Consider a linear model
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2).$$
$\vec{X} = (1, X_1, \cdots, X_{p-1})$ is the covariate (feature) vector, e.g. (age, BMI, DNA).
$\vec{\beta} = (\beta_0, \beta_1, \cdots, \beta_{p-1})$ is the vector of unknown coefficients.
Training data: $\{(\vec{X}^{(i)}, Y^{(i)}),\ i = 1, \cdots, n\}$, with $n \gg p$.
Response vector $\vec{Y}_{n\times 1} = \begin{bmatrix} Y^{(1)} \\ \vdots \\ Y^{(n)} \end{bmatrix}$; design matrix $X_{n\times p} = \begin{bmatrix} \vec{X}^{(1)} \\ \vdots \\ \vec{X}^{(n)} \end{bmatrix}$.
By MLE (see this example): $\hat{\vec{\beta}} = (X^T X)^{-1} X^T \vec{Y}$.
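The estimator $\hat{\vec{\beta}} = (X^T X)^{-1} X^T \vec{Y}$ can be computed directly by solving the normal equations $X^T X \hat{\vec{\beta}} = X^T \vec{Y}$. A minimal pure-Python sketch (the helper names and the toy data are ours); on noise-free data generated from $\beta = (1, 2, 3)$, OLS recovers the coefficients exactly:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, Y):
    """beta_hat = (X^T X)^{-1} X^T Y via the normal equations."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]
    return solve(XtX, XtY)

# Design matrix with an intercept column; responses have no noise.
X = [[1, x1, x2] for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]]
Y = [1 + 2 * x1 + 3 * x2 for _, x1, x2 in X]
beta_hat = ols(X, Y)
print(beta_hat)  # recovers (1, 2, 3) up to floating-point error
```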
Common hypothesis test: $H_0: \beta_j = 0$ vs. $H_1: \beta_j \neq 0$.
Facts:
1. Under $H_0$, $\frac{\hat\beta_j}{\sqrt{\mathrm{Var}(\hat\beta_j)}} \sim N(0,1)$.
2. $\frac{\|\vec{e}\|^2}{\sigma^2} \sim \chi^2_{n-p}$. (see here)
3. $\|\vec{e}\|^2 \perp\!\!\!\perp \hat{\vec{\beta}}$.
Today we will show that these facts imply $S \sim t_{n-p}$, where $S$ is the $t$-statistic obtained by replacing $\sigma^2$ in $\mathrm{Var}(\hat\beta_j)$ with its estimate $\hat\sigma^2 = \|\vec{e}\|^2/(n-p)$.
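In the simplest case $p = 1$ (intercept-only model, so $\hat\beta_0 = \bar{Y}$ and $\|\vec{e}\|^2 = \sum_i (Y^{(i)} - \bar{Y})^2$), $S$ reduces to the classical one-sample $t$-statistic. A simulation sketch under $H_0$ (the sample sizes are our arbitrary choices) checks that the empirical variance of $S$ matches $\mathrm{Var}(t_{n-1}) = \frac{n-1}{n-3}$:

```python
import random
import statistics

random.seed(0)
n, reps = 8, 200_000
S = []
for _ in range(reps):
    y = [random.gauss(0, 1) for _ in range(n)]        # H0: beta_0 = 0, sigma = 1
    ybar = sum(y) / n                                 # beta_0_hat
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)  # sigma^2_hat = ||e||^2 / (n - p)
    S.append(ybar / (s2 / n) ** 0.5)                  # t-statistic
print(statistics.variance(S))  # theoretical Var(t_7) = 7/5 = 1.4
```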
Conditional density of continuous random variables:
Let $X, Y$ be continuous RVs with joint density $f_{X,Y}$. Given a measurable set $A \in \mathcal{F}$, how do we calculate $P(X \in A \mid Y = y_0)$?
For $\delta \ll 1$,
$$P(X \in A \mid y_0 < Y < y_0 + \delta) = \frac{P[(X \in A),\ (y_0 < Y < y_0 + \delta)]}{P(y_0 < Y < y_0 + \delta)} = \frac{\int_A \int_{y_0}^{y_0+\delta} f_{X,Y}(x,y)\,dy\,dx}{\int_{y_0}^{y_0+\delta} f_Y(y)\,dy} \approx \frac{\int_A f_{X,Y}(x,y_0)\,\delta\,dx}{f_Y(y_0)\,\delta} = \int_A \frac{f_{X,Y}(x,y_0)}{f_Y(y_0)}\,dx.$$
This motivates the definition
$$f_{X \mid Y=y_0}(x) = \frac{f_{X,Y}(x,y_0)}{f_Y(y_0)},$$
which is well defined if $f_Y(y_0) > 0$.
Independence
$$X \perp\!\!\!\perp Y \iff f_{X \mid Y=y}(x) = f_X(x),\ \forall x \in \mathbb{R},\ \forall y \in \mathbb{R} \text{ s.t. } f_Y(y) > 0 \iff f_{X,Y}(x,y) = f_X(x)\, f_Y(y),\ \forall x, y \in \mathbb{R}.$$
Law of Total Probability
$$f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dy = \int_{-\infty}^{+\infty} f_{X \mid Y=y}(x)\, f_Y(y)\,dy.$$
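A numerical illustration of the law of total probability, for a concrete model of our choosing: if $X \mid Y = y \sim N(y, 1)$ and $Y \sim N(0,1)$, then marginally $X \sim N(0, 2)$, and integrating $f_{X\mid Y=y}(x) f_Y(y)$ over $y$ should reproduce that density:

```python
import math

def phi(z, mu=0.0, var=1.0):
    """N(mu, var) density."""
    return math.exp(-(z - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def marginal(x, lo=-8.0, hi=8.0, m=4000):
    """f_X(x) = integral of f_{X|Y=y}(x) f_Y(y) dy, by the trapezoid rule."""
    h = (hi - lo) / m
    vals = [phi(x, mu=lo + i * h) * phi(lo + i * h) for i in range(m + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

x = 0.7
print(marginal(x), phi(x, var=2.0))  # the two numbers agree
```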
$X, Y \overset{\text{i.i.d.}}{\sim} N(0,1)$. Let $R = \frac{X}{Y}$. We want to calculate
$$f_R(r) = \int_{-\infty}^{+\infty} f_{R \mid Y=y}(r)\, f_Y(y)\,dy.$$
By independence of $X, Y$,
$$(R \mid Y = y) \overset{d}{=} \left(\frac{X}{y} \,\Big|\, Y = y\right) \overset{d}{=} \frac{X}{y}.$$
Since $T(X) = \frac{X}{y}$ is invertible and differentiable,
$$f_{X/y}(r) = f_X(ry)\left|\frac{d(ry)}{dr}\right| = |y|\, f_X(ry).$$
Therefore
$$f_R(r) = \int_{-\infty}^{+\infty} |y|\, \frac{1}{\sqrt{2\pi}} e^{-\frac{(ry)^2}{2}}\, \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}}\,dy = \frac{1}{2\pi} \int_{-\infty}^{+\infty} |y|\, e^{-\frac{y^2}{2}(1+r^2)}\,dy = \frac{1}{\pi(1+r^2)},$$
thus $R \sim \mathrm{Cauchy}$.
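Since the Cauchy distribution has no mean, the natural empirical check compares sample quantiles rather than averages. The standard Cauchy quantile function is $Q(p) = \tan(\pi(p - \frac12))$, so the quartiles are $\pm 1$ (a simulation sketch):

```python
import random

random.seed(0)
n = 200_000
r = sorted(random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n))

# Empirical quartiles of R = X/Y vs. the Cauchy quartiles -1 and 1.
q25, q75 = r[n // 4], r[3 * n // 4]
print(q25, q75)  # close to -1 and 1
```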
$X, Y_1, \cdots, Y_k \overset{\text{i.i.d.}}{\sim} N(0,1)$. Let
$$R = \frac{X}{\sqrt{(Y_1^2 + \cdots + Y_k^2)/k}}.$$
(Recall: $G = Y_1^2 + \cdots + Y_k^2 \sim \mathrm{Gamma}(\frac{k}{2}, \frac{1}{2})$.) Note that
$$(R \mid G = g) \overset{d}{=} \left(\frac{X}{\sqrt{g/k}} \,\Big|\, G = g\right) \overset{d}{=} \frac{X}{\sqrt{g/k}}.$$
Thus $f_{X/\sqrt{g/k}}(r) = \sqrt{\frac{g}{k}}\, f_X\!\left(r\sqrt{\frac{g}{k}}\right)$. By the law of total probability,
$$f_R(r) = \int_0^{+\infty} f_{R \mid G=g}(r)\, f_G(g)\,dg = \int_0^{+\infty} \sqrt{\frac{g}{k}}\, f_X\!\left(r\sqrt{\frac{g}{k}}\right) f_G(g)\,dg = \frac{1}{\sqrt{2\pi k}}\, \frac{1}{2^{k/2}\,\Gamma(k/2)} \int_0^{+\infty} g^{\frac{k+1}{2}-1}\, e^{-\frac{g}{2}\left(\frac{r^2}{k}+1\right)}\,dg = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)\sqrt{\pi k}} \left(1 + \frac{r^2}{k}\right)^{-\frac{k+1}{2}},$$
so $R \sim t_k$.
(Recall $\Gamma(\frac{1}{2}) = \sqrt{\pi}$ and $\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$.)
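The normalizing constant can be sanity-checked numerically: the derived $t_k$ density should integrate to 1. A trapezoid-rule sketch for $k = 5$ (the cutoff $\pm 200$ is our arbitrary choice; the $t_5$ tails beyond it contribute negligibly):

```python
import math

k = 5
c = math.gamma((k + 1) / 2) / (math.gamma(k / 2) * math.sqrt(math.pi * k))

def t_density(r):
    """The t_k density derived above."""
    return c * (1 + r * r / k) ** (-(k + 1) / 2)

# Trapezoid rule on [-200, 200].
lo, hi, m = -200.0, 200.0, 400_000
h = (hi - lo) / m
total = h * (0.5 * (t_density(lo) + t_density(hi)) +
             sum(t_density(lo + i * h) for i in range(1, m)))
print(total)  # close to 1
```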
What happens as $k \to \infty$?
$Y_1^2, \cdots, Y_k^2 \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(\frac{1}{2}, \frac{1}{2})$, with $E(Y_i^2) = 1$.
By the SLLN, $\frac{Y_1^2 + \cdots + Y_k^2}{k} \xrightarrow{a.s.} 1$, so by Slutsky's theorem $R \Rightarrow N(0,1)$; that is, $t_k \to N(0,1)$ in distribution.
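The limit can also be seen numerically by comparing the $t_k$ density above with the $N(0,1)$ density on a grid (a sketch; `lgamma` is used to avoid overflowing $\Gamma$ for large $k$, and the grid $[-4, 4]$ is our arbitrary choice):

```python
import math

def t_density(r, k):
    """t_k density; lgamma avoids overflowing Gamma for large k."""
    logc = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(math.pi * k)
    return math.exp(logc) * (1 + r * r / k) ** (-(k + 1) / 2)

def normal_density(r):
    return math.exp(-r * r / 2) / math.sqrt(2 * math.pi)

# Max gap between the t_k and N(0,1) densities on the grid [-4, 4].
gaps = []
for k in (5, 50, 500):
    gaps.append(max(abs(t_density(i / 10, k) - normal_density(i / 10))
                    for i in range(-40, 41)))
print(gaps)  # the gap shrinks as k grows
```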