9 Three High-Dimensional Models

1 Model One

Consider
$$y_t \overset{\text{i.n.d.}}{\sim} N(\mu_t, \sigma^2). \tag{1.1}$$
Note that the right-hand side depends on $t$, so the distribution of $y_t$ changes with $t$: the observations are independent but not identically distributed (hence "i.n.d." rather than "i.i.d.").
The parameters here are $\mu_1, \dots, \mu_n, \sigma^2$.
Without regularization, the MLE degenerates to $\mu_t = y_t$, $\sigma^2 = 0$. So we define $\hat\mu_t^{\text{ridge}}(\lambda)$ and $\hat\mu_t^{\text{lasso}}(\lambda)$, which minimize
$$\sum_{t=1}^n (y_t - \mu_t)^2 + \lambda \sum_{t=2}^{n-1} \big((\mu_{t+1} - \mu_t) - (\mu_t - \mu_{t-1})\big)^2$$
and
$$\sum_{t=1}^n (y_t - \mu_t)^2 + \lambda \sum_{t=2}^{n-1} \big|(\mu_{t+1} - \mu_t) - (\mu_t - \mu_{t-1})\big|,$$
respectively. We already know from before that $\hat\mu^{\text{ridge}}(\lambda) = X\hat\beta^{\text{ridge}}(\lambda)$ and $\hat\mu^{\text{lasso}}(\lambda) = X\hat\beta^{\text{lasso}}(\lambda)$, where
$$X = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 2 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & n-1 & n-2 & \cdots & 1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{n-1} \end{pmatrix},$$
and $\hat\beta^{\text{ridge}}(\lambda)$, $\hat\beta^{\text{lasso}}(\lambda)$ respectively minimize
$$\|y - X\beta\|^2 + \lambda \sum_{t=2}^{n-1} \beta_t^2 \qquad \text{and} \qquad \|y - X\beta\|^2 + \lambda \sum_{t=2}^{n-1} |\beta_t|.$$
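As a numeric sanity check on the reparametrization above, the following sketch (my own illustration, not code from the notes) solves the ridge problem two ways: directly in $\mu$ via the second-difference matrix $D$, and in $\beta$ via the matrix $X$ with the penalty restricted to $\beta_2, \dots, \beta_{n-1}$. The two closed-form solutions coincide because $\mu = X\beta$ maps the penalty $\sum_{t=2}^{n-1}\beta_t^2$ exactly onto $\|D\mu\|^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
lam = 3.0
y = rng.normal(size=n)

# Second-difference matrix D: (D mu)_t = mu_{t+1} - 2 mu_t + mu_{t-1}
D = np.zeros((n - 2, n))
for t in range(n - 2):
    D[t, t:t + 3] = [1.0, -2.0, 1.0]

# Direct ridge solution: minimize ||y - mu||^2 + lam * ||D mu||^2
mu_direct = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Reparametrization: X[i, 0] = 1 and X[i, j] = max(i - j + 1, 0) for j >= 1
# (0-indexed rows), matching the lower-triangular matrix in the notes.
X = np.zeros((n, n))
X[:, 0] = 1.0
for j in range(1, n):
    X[:, j] = np.maximum(np.arange(n) - j + 1, 0)

# Penalize only beta_2, ..., beta_{n-1}; beta_0, beta_1 carry level and slope.
P = np.eye(n)
P[0, 0] = P[1, 1] = 0.0
beta = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
mu_reparam = X @ beta

print(np.allclose(mu_direct, mu_reparam))  # True
```

The same reparametrization works for the lasso criterion, but that problem has no closed form and needs an iterative solver.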

This model is an example of a "mean model," where the focus is on $\mu_t$. The next two models are examples of "variance models."

2 Model Two

Consider
$$y_t \overset{\text{i.n.d.}}{\sim} N(0, \tau_t^2). \tag{2.1}$$
The likelihood is
$$\prod_{t=1}^n \frac{1}{\tau_t} \exp\left(-\frac{y_t^2}{2\tau_t^2}\right). \tag{2.2}$$
Then $(y_1^2, \dots, y_n^2)$ forms the sufficient statistic. Under (2.1), $y_t^2 \overset{\text{i.n.d.}}{\sim} \tau_t^2 \chi_1^2$. Note that the density of $\tau_t^2 \chi_1^2$ is proportional to
$$\frac{1}{\tau_t^2} \left(\frac{x}{\tau_t^2}\right)^{-1/2} \exp\left(-\frac{x}{2\tau_t^2}\right) = x^{-1/2}\, \frac{1}{\tau_t} \exp\left(-\frac{x}{2\tau_t^2}\right).$$
The likelihood of $(y_1^2, \dots, y_n^2)$ is thus
$$\prod_{t=1}^n (y_t^2)^{-1/2}\, \frac{1}{\tau_t} \exp\left(-\frac{y_t^2}{2\tau_t^2}\right).$$
Dropping the factors $(y_t^2)^{-1/2}$, which do not involve the parameters, we recover (2.2).
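The scaled chi-square claim can be checked numerically. The snippet below (my own illustration) verifies that the density of $\tau^2\chi_1^2$ at $x$ equals the $N(0,\tau^2)$ density at $\sqrt{x}$ divided by $\sqrt{x}$, which is exactly the change of variables $x = y^2$; the value $\tau = 1.7$ is an arbitrary choice for the example.

```python
import numpy as np
from scipy import stats

tau = 1.7
x = np.linspace(0.1, 5.0, 50)

lhs = stats.chi2.pdf(x, df=1, scale=tau**2)               # density of tau^2 * chi^2_1
rhs = stats.norm.pdf(np.sqrt(x), scale=tau) / np.sqrt(x)  # change of variables x = y^2

print(np.allclose(lhs, rhs))  # True
```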

The log-likelihood is $\sum_{t=1}^n \left(-\log\tau_t - \frac{y_t^2}{2\tau_t^2}\right)$, so the negative log-likelihood is
$$\sum_{t=1}^n \left(\log\tau_t + \frac{y_t^2}{2\tau_t^2}\right) = \sum_{t=1}^n \left(\alpha_t + \frac{y_t^2}{2e^{2\alpha_t}}\right).$$
Here we use $\alpha_t = \log\tau_t$, because it removes the constraint $\tau_t > 0$ and improves computational stability. Simple minimization without regularization gives $\alpha_t = \log|y_t|$, i.e. $\tau_t^2 = y_t^2$, so we introduce regularization. Assuming $\alpha_t$ varies smoothly with $t$, define $\hat\alpha_t^{\text{ridge}}(\lambda)$ and $\hat\alpha_t^{\text{lasso}}(\lambda)$, which minimize
$$\sum_{t=1}^n \left(\alpha_t + \frac{y_t^2}{2e^{2\alpha_t}}\right) + \lambda \sum_{t=2}^{n-1} \big((\alpha_{t+1} - \alpha_t) - (\alpha_t - \alpha_{t-1})\big)^2$$
and
$$\sum_{t=1}^n \left(\alpha_t + \frac{y_t^2}{2e^{2\alpha_t}}\right) + \lambda \sum_{t=2}^{n-1} \big|(\alpha_{t+1} - \alpha_t) - (\alpha_t - \alpha_{t-1})\big|,$$
respectively.
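Unlike model one, the ridge criterion here has no linear-algebra closed form, but it is convex in $\alpha$ (each likelihood term $\alpha + c\,e^{-2\alpha}$ with $c > 0$ is convex, and the penalty is quadratic), so a generic smooth optimizer works. The sketch below (an assumed implementation, not code from the notes; the smooth `tau_true` path and $\lambda = 5$ are made-up choices for the example) minimizes the penalized negative log-likelihood, starting from the unregularized solution $\alpha_t = \log|y_t|$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 50
tau_true = np.exp(np.sin(np.linspace(0.0, 3.0, n)))  # smooth true volatility path
y = rng.normal(scale=tau_true)

def objective(alpha, lam):
    nll = np.sum(alpha + y**2 / (2.0 * np.exp(2.0 * alpha)))  # negative log-likelihood
    penalty = lam * np.sum(np.diff(alpha, n=2) ** 2)          # squared second differences
    return nll + penalty

# Without regularization the exact minimizer is alpha_t = log|y_t| (tau_t^2 = y_t^2).
alpha_mle = np.log(np.abs(y))

# Ridge fit: start the optimizer at the unregularized solution.
res = minimize(objective, x0=alpha_mle, args=(5.0,), method="L-BFGS-B")
alpha_ridge = res.x
```

The lasso criterion is also convex but not differentiable at zero second differences, so in practice it is handled with specialized solvers rather than L-BFGS.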

3 Model Three

We obtain model three by applying model two to the DFT of the data. Recall that for $y_0, \dots, y_{n-1}$, the DFT is $b_0, \dots, b_{n-1}$, where
$$b_j = \sum_{t=0}^{n-1} y_t \exp\left(-\frac{2\pi i j t}{n}\right).$$
Assume $n$ is odd and set $m = \frac{n-1}{2}$. Model three applies model two to the DFT terms $b_1, \dots, b_m$: assume
$$\mathrm{Re}(b_j),\ \mathrm{Im}(b_j) \overset{\text{i.i.d.}}{\sim} N(0, \gamma_j^2), \qquad j = 1, \dots, m,$$
and that the $b_j$ are independent across $j$. The unknown parameters here are $\gamma_1, \dots, \gamma_m$; $\gamma_j$ represents the strength of the sinusoids at frequency $j/n$. The likelihood is
$$\prod_{j=1}^m \frac{1}{\gamma_j} \exp\left(-\frac{(\mathrm{Re}(b_j))^2}{2\gamma_j^2}\right) \frac{1}{\gamma_j} \exp\left(-\frac{(\mathrm{Im}(b_j))^2}{2\gamma_j^2}\right) = \prod_{j=1}^m \frac{1}{\gamma_j^2} \exp\left(-\frac{(\mathrm{Re}(b_j))^2 + (\mathrm{Im}(b_j))^2}{2\gamma_j^2}\right) = \prod_{j=1}^m \frac{1}{\gamma_j^2} \exp\left(-\frac{|b_j|^2}{2\gamma_j^2}\right).$$
Recall that the periodogram $I(j/n)$ is defined as $I(j/n) = \frac{|b_j|^2}{n}$. We can therefore rewrite the likelihood as
$$\prod_{j=1}^m \frac{1}{\gamma_j^2} \exp\left(-\frac{n I(j/n)}{2\gamma_j^2}\right),$$
so the periodogram forms the sufficient statistic in this model. Further,
$$I(j/n) = \frac{1}{n}|b_j|^2 = \frac{1}{n}\big((\mathrm{Re}(b_j))^2 + (\mathrm{Im}(b_j))^2\big) \sim \frac{\gamma_j^2}{n}\chi_2^2.$$
So we can rewrite the model as
$$I(j/n) \overset{\text{i.n.d.}}{\sim} \frac{\gamma_j^2}{n}\chi_2^2, \qquad j = 1, \dots, m.$$
The negative log-likelihood is
$$\sum_{j=1}^m \left(2\log\gamma_j + \frac{n I(j/n)}{2\gamma_j^2}\right) = \sum_{j=1}^m \left(2\alpha_j + \frac{n I(j/n)}{2 e^{2\alpha_j}}\right),$$
where $\alpha_j = \log\gamma_j$. Minimizing it directly gives $\alpha_j = \frac{1}{2}\log\frac{n I(j/n)}{2}$, i.e. $\gamma_j^2 = e^{2\alpha_j} = \frac{n I(j/n)}{2}$. Regularizing as before, $\hat\alpha_j^{\text{ridge}}(\lambda)$ and $\hat\alpha_j^{\text{lasso}}(\lambda)$ minimize
$$\sum_{j=1}^m \left(2\alpha_j + \frac{n I(j/n)}{2 e^{2\alpha_j}}\right) + \lambda \sum_{j=2}^{m-1} \big((\alpha_{j+1} - \alpha_j) - (\alpha_j - \alpha_{j-1})\big)^2$$
and
$$\sum_{j=1}^m \left(2\alpha_j + \frac{n I(j/n)}{2 e^{2\alpha_j}}\right) + \lambda \sum_{j=2}^{m-1} \big|(\alpha_{j+1} - \alpha_j) - (\alpha_j - \alpha_{j-1})\big|.$$
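The periodogram and the unregularized MLE $\gamma_j^2 = n I(j/n)/2 = |b_j|^2/2$ are a one-liner with the FFT. The sketch below (my own check, not code from the notes) confirms that `np.fft.fft` uses the same sign convention as the DFT definition above, so its output can be used directly as the $b_j$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 101                          # odd, so m = (n - 1) / 2
m = (n - 1) // 2
y = rng.normal(size=n)

b = np.fft.fft(y)                # b_j = sum_t y_t exp(-2 pi i j t / n)
I = np.abs(b) ** 2 / n           # periodogram at frequencies j / n

# Confirm np.fft.fft matches the definition (same sign convention) at j = 1.
t = np.arange(n)
b1 = np.sum(y * np.exp(-2j * np.pi * t / n))
print(np.isclose(b[1], b1))  # True

gamma2_hat = n * I[1:m + 1] / 2  # MLE for gamma_1^2, ..., gamma_m^2; equals |b_j|^2 / 2
```

The ridge and lasso fits for $\alpha_j = \log\gamma_j$ then proceed exactly as in model two, with `n * I / 2` playing the role that `y**2 / 2` played there.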