1 Sinusoidal Model
Goal: to fit a simple sinusoidal model to the sunspots data.
Let's consider the expression of sinusoidal model: s ( t ) = A cos ( 2 π f t ) + B sin ( 2 π f t ) , A = R cos ϕ , B = − R sin ϕ .
For us, usually t = 1 , 2 , ⋯ , n . In this case, we can restrict the frequency f ∈ [ 0 , 1 2 ] .
Based on the remark,
For every f , ϕ , there exists f 0 ∈ [ 0 , 1 2 ] , ϕ 0 , s.t. s ( t ; f , ϕ ) = s ( t , f 0 , ϕ 0 ) , ∀ t ∈ N ∗ .
When f = 0 , s ( t ) is constant; when f = 1 2 , s ( t ) = R cos ( π t + ϕ ) = ( − 1 ) t R cos ( ϕ ) .
y t = β 0 + β 1 cos 2 π f t + β 2 sin 2 π f t + ε t , ε t ∼ i . i . d N ( 0 , σ 2 ) .
The parameters here are β 0 , β 1 , β 2 , σ , f .
If f is known, it's a linear model. Otherwise it's a nonlinear regression model.
2 Frequentist Inference
Like in linear models , frequentist calculate the MLE: ∏ i = 1 n 1 2 π σ exp [ − ( y t − β 0 − β 1 cos 2 π f t − β 2 sin 2 π f t ) 2 2 σ 2 ] ∝ σ − n exp [ − 1 2 σ 2 ∑ t = 1 n ( y t − β 0 − β 1 cos 2 π f t − β 2 sin 2 π f t ) 2 ] = σ − n exp [ − 1 2 σ 2 S ( β 0 , β 1 , β 2 , f ) ] .
So the problem is to find ( β ^ 0 , β ^ 1 , β ^ 2 , f ^ ) that minimizes S ( β 0 , β 1 , β 2 , f ) = ∑ t = 1 n ( y t − β 0 − β 1 cos 2 π f t − β 2 sin 2 π f t ) 2 (2.1) = | | y − X f β | | 2 . We first fix f , so this is just a linear regression. We have β ^ ( f ) = ( X f T X f ) − 1 X f T y , X f = [ 1 cos 2 π f ( 1 ) sin 2 π f ( 1 ) ⋮ ⋮ ⋮ 1 cos 2 π f ( n ) sin 2 π f ( n ) ] . Then f ^ = arg min f S ( β ^ ( f ) , f ) , β = ( β 0 , β 1 , β 2 ) T .
3 Bayesian Inference
The posterior is σ − n exp [ − S ( β , f ) 2 σ 2 ] 1 { 0 ≤ f ≤ 1 2 } , and assume prior β 0 , β 1 , β 2 , log σ ∼ i . i . d Uniform [ − C , C ] , f ∼ Uniform [ 0 , 1 2 ] .
Then f β , σ , f | d a t a = σ − n − 1 exp [ − S ( β , f ) 2 σ 2 ] 1 { 0 ≤ f ≤ 1 2 } 1 { − C < β 0 , β 1 , β 2 , log σ < C } , then the posterior density of f ∝ ∬ σ − n − 1 exp [ − S ( β , f ) 2 σ 2 ] d β d σ .
By Pythagorean identity , S ( β , f ) = | | y − X f β | | 2 = S ( β ^ f , f ) + ( β − β ^ f ) T X f T X f ( β − β ^ f ) .
So further on ∝ ∬ σ − n − 1 exp ( − S ( β ^ f , f ) 2 σ 2 ) [ − ( β − β ^ f ) T X f T X f ( β − β ^ f ) 2 σ 2 ] d β d σ = ∫ σ − n − 1 exp ( − S ( β ^ f , f ) 2 σ 2 ) ( 2 π ) p 2 det ( σ 2 ( X f T X f ) − 1 ) ∝ ∫ σ − n − 1 exp ( − S ( β ^ f , f ) 2 σ 2 ) σ p | X f T X f | − 1 2 d σ = | X f T X f | − 1 2 ∫ σ − n + p − 1 exp ( − S ( β ^ f , f ) 2 σ 2 ) d σ = ∫ 0 ∞ t − n + p − 1 ( S ( β ^ f , f ) ) − n − p 2 exp ( − 1 2 t 2 ) d t ∝ ( S ( β ^ f , f ) ) − n − p 2 | X f T X f | − 1 2 .
Compare with linear regression: Posterior ∝ ( S ( β ) ) − n 2 .
4 Fourier Frequency
RSS ( f ) (S above measures how well the sinusoid at frequency f fits the data). So we should take a grid of values for f , and then compute RSS ( f ) for each grid point.
Common Choice of Grid for f : (notate as G )
n is even, 0 , 1 n , 2 n , ⋯ , 1 2 ,
n is odd, 0 , 1 n , 2 n , ⋯ , n − 1 2 n .
Note that n is the size of data.
Fourier frequency is frequency f s.t. n f is an integer.
So our common choice inside G are all Fourier frequencies lying in [ 0 , 1 2 ] .
When f ∈ G , plug in β ^ f = ( X f T X f ) − 1 X t T y , RSS ( f ) = | | y − X f β ^ f | | 2 = ( y − X f β ^ f ) T ( y − X f β ^ f ) = y T y − β ^ f T X f T y − y T X f β ^ f + β ^ f T X f T X f β ^ f = y T y − y T ( X f T X f ) − 1 X f T y − y T X f ( X f T X f ) − 1 X f T y + y T X f ( X f T X f ) − 1 X f T X f ( X f T X f ) − 1 X f T y = y T y − y T X f ( X f T X f ) − 1 X f T y .
And X f T X f = [ 1 ⋯ 1 cos 2 π f ⋅ 1 ⋯ cos 2 π f ⋅ n sin 2 π f ⋅ 1 ⋯ sin 2 π f ⋅ n ] [ 1 cos 2 π f ⋅ 1 sin 2 π f ⋅ 1 ⋮ ⋮ ⋮ 1 cos 2 π f ⋅ n sin 2 π f ⋅ n ] = [ n ∑ t = 1 n cos ( 2 π f t ) ∑ t = 1 n sin ( 2 π f t ) ∗ ∑ t = 1 n cos 2 ( 2 π f t ) ∑ t = 1 n cos ( 2 π f t ) sin ( 2 π f t ) ∗ ∗ ∑ t = 1 n sin 2 2 π f t ] . Next let r = e 2 π i f t , ∑ t = 1 n cos ( 2 π f t ) = ∑ t = 1 n e 2 π i f t + e − 2 π i f t 2 = r = e 2 π i f 1 2 ∑ t = 1 n r t + 1 2 ∑ t = 1 n r − t = 1 2 e 2 π i f e 2 π i f − 1 ( e 2 π i f − 1 ) + 1 2 e − 2 π i f e − 2 π i f − 1 ( e − 2 π i f − 1 ) = 0.
Next similarly ∑ t = 1 n cos 2 ( 2 π f t ) = ∑ t = 1 n 1 + cos ( 4 π f t ) 2 = n 2 , ∑ t = 1 n cos ( 2 π f t ) sin ( 2 π f t ) = 1 2 ∑ t = 1 n sin ( 4 π f t ) = 0.
f 1 , f 2 are two distinct Fourier frequencies. Then ∑ t = 0 n − 1 cos ( 2 π f 1 t ) sin ( 2 π f 2 t ) = 0 , ∑ t = 0 n − 1 sin ( 2 π f 1 t ) sin ( 2 π f 2 t ) = 0 , ∑ t = 0 n − 1 sin ( 2 π f 1 t ) cos ( 2 π f 2 t ) = 0.
So X f T X f = [ n 0 0 0 n 2 0 0 0 n 2 ] , ( X f T X f ) − 1 = [ 1 n 0 0 0 2 n 0 0 0 2 n ] . Then RSS ( f ) = y T y − ( ∑ t = 1 n y t ∑ t = 1 n y t cos 2 π f t ∑ t = 1 n y t sin 2 π f t ) [ 1 n 0 0 0 2 n 0 0 0 2 n ] ( ∑ t = 1 n y t ∑ t = 1 n y t cos 2 π f t ∑ t = 1 n y t sin 2 π f t ) = y T y − 1 n ( ∑ t = 1 n y t ) 2 − 2 n ( ∑ t = 1 n y t cos 2 π f t ) 2 − 2 n ( ∑ t = 1 n y t sin 2 π f t ) 2 = ∑ t = 1 n ( y t − y ― ) 2 − 2 n ( ∑ t = 1 n y t cos 2 π f t ) 2 − 2 n ( ∑ t = 1 n y t sin 2 π f t ) 2 . Note again that this is only true when f ∈ [ 0 , 1 2 ] and f is Fourier frequency.
Define periodogram as I ( f ) = 1 n [ ( ∑ t = 1 n y t cos 2 π f t ) 2 + ( ∑ t = 1 n y t sin 2 π f t ) 2 ] .
So RSS ( f ) = ∑ t = 1 n ( y t − y ― ) 2 − 2 I ( f ) .
We can also rewrite I ( f ) = 1 n | ∑ t = 1 n y t ( cos 2 π f t + i sin 2 π f t ) | 2 = 1 n | ∑ t = 1 n y t e − 2 π i f t | 2 . This is exactly a Fourier transformation .
5 Some Other Nonlinear Regression Models
y t = β 0 + β 1 t + β 2 cos ( 2 π f t ) + β 3 sin ( 2 π f t ) + ε t . RSS ( f ) = ∑ t = 1 n ( y t − β 0 − β 1 t − β 2 cos ( 2 π f t ) − β 3 sin ( 2 π f t ) ) 2 .
(Broken stick / change of slope) y t = β 0 + β 1 t + β 2 ( t − s ) + + ε t , here ( t − s ) + = max { t − s , 0 } .