9 Three High-Dimensional Models

1 Model One

Consider
$$y_t \overset{\text{i.n.d.}}{\sim} N(\mu_t, \sigma^2). \tag{1.1}$$
Note that the right-hand side depends on $t$, so the distribution of $y_t$ changes with $t$: the observations are independent but not identically distributed (hence "i.n.d." rather than "i.i.d.").
The parameters here are $\mu_1, \dots, \mu_n, \sigma^2$.
Without regularization, the MLE degenerates to $\mu_t = y_t$, $\sigma^2 = 0$. So we define $\hat\mu_t^{\text{ridge}}(\lambda)$ and $\hat\mu_t^{\text{lasso}}(\lambda)$, which minimize
$$\sum_{t=1}^n (y_t - \mu_t)^2 + \lambda \sum_{t=2}^{n-1} \big((\mu_{t+1} - \mu_t) - (\mu_t - \mu_{t-1})\big)^2$$
and
$$\sum_{t=1}^n (y_t - \mu_t)^2 + \lambda \sum_{t=2}^{n-1} \big|(\mu_{t+1} - \mu_t) - (\mu_t - \mu_{t-1})\big|,$$
respectively. We already know from before that $\hat\mu^{\text{ridge}}(\lambda) = X\hat\beta^{\text{ridge}}(\lambda)$ and $\hat\mu^{\text{lasso}}(\lambda) = X\hat\beta^{\text{lasso}}(\lambda)$, where
$$X = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 2 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & n-1 & n-2 & \cdots & 1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{n-1} \end{pmatrix},$$
and $\hat\beta^{\text{ridge}}(\lambda)$, $\hat\beta^{\text{lasso}}(\lambda)$ respectively minimize
$$\|y - X\beta\|^2 + \lambda \sum_{t=2}^{n-1} \beta_t^2 \qquad \text{and} \qquad \|y - X\beta\|^2 + \lambda \sum_{t=2}^{n-1} |\beta_t|.$$
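As a numeric sanity check on the reparametrization above, the following sketch (my own illustration, not code from the notes) solves the ridge problem two ways: directly in $\mu$ via the second-difference matrix $D$, and in $\beta$ via the matrix $X$ with the penalty restricted to $\beta_2, \dots, \beta_{n-1}$. The two closed-form solutions coincide because $\mu = X\beta$ maps the penalty $\sum_{t=2}^{n-1}\beta_t^2$ exactly onto $\|D\mu\|^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
lam = 3.0
y = rng.normal(size=n)

# Second-difference matrix D: (D mu)_t = mu_{t+1} - 2 mu_t + mu_{t-1}
D = np.zeros((n - 2, n))
for t in range(n - 2):
    D[t, t:t + 3] = [1.0, -2.0, 1.0]

# Direct ridge solution: minimize ||y - mu||^2 + lam * ||D mu||^2
mu_direct = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Reparametrization: X[i, 0] = 1 and X[i, j] = max(i - j + 1, 0) for j >= 1
# (0-indexed rows), matching the lower-triangular matrix in the notes.
X = np.zeros((n, n))
X[:, 0] = 1.0
for j in range(1, n):
    X[:, j] = np.maximum(np.arange(n) - j + 1, 0)

# Penalize only beta_2, ..., beta_{n-1}; beta_0, beta_1 carry level and slope.
P = np.eye(n)
P[0, 0] = P[1, 1] = 0.0
beta = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
mu_reparam = X @ beta

print(np.allclose(mu_direct, mu_reparam))  # True
```

The same reparametrization works for the lasso criterion, but that problem has no closed form and needs an iterative solver.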

This model is an example of a "mean model," where the focus is on $\mu_t$. The next two models are examples of "variance models."

2 Model Two

Consider
$$y_t \overset{\text{i.n.d.}}{\sim} N(0, \tau_t^2). \tag{2.1}$$
The likelihood is
$$\prod_{t=1}^n \frac{1}{\tau_t} \exp\left(-\frac{y_t^2}{2\tau_t^2}\right). \tag{2.2}$$
Then $(y_1^2, \dots, y_n^2)$ forms the sufficient statistic. Under (2.1), $y_t^2 \overset{\text{i.n.d.}}{\sim} \tau_t^2 \chi_1^2$. Note that the density of $\tau_t^2 \chi_1^2$ is proportional to
$$\frac{1}{\tau_t^2} \left(\frac{x}{\tau_t^2}\right)^{-1/2} \exp\left(-\frac{x}{2\tau_t^2}\right) = x^{-1/2}\, \frac{1}{\tau_t} \exp\left(-\frac{x}{2\tau_t^2}\right).$$
The likelihood of $(y_1^2, \dots, y_n^2)$ is thus
$$\prod_{t=1}^n (y_t^2)^{-1/2}\, \frac{1}{\tau_t} \exp\left(-\frac{y_t^2}{2\tau_t^2}\right).$$
Dropping the factors $(y_t^2)^{-1/2}$, which do not involve the parameters, we recover (2.2).
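The scaled chi-square claim can be checked numerically. The snippet below (my own illustration) verifies that the density of $\tau^2\chi_1^2$ at $x$ equals the $N(0,\tau^2)$ density at $\sqrt{x}$ divided by $\sqrt{x}$, which is exactly the change of variables $x = y^2$; the value $\tau = 1.7$ is an arbitrary choice for the example.

```python
import numpy as np
from scipy import stats

tau = 1.7
x = np.linspace(0.1, 5.0, 50)

lhs = stats.chi2.pdf(x, df=1, scale=tau**2)               # density of tau^2 * chi^2_1
rhs = stats.norm.pdf(np.sqrt(x), scale=tau) / np.sqrt(x)  # change of variables x = y^2

print(np.allclose(lhs, rhs))  # True
```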

The log-likelihood is $\sum_{t=1}^n \left(-\log\tau_t - \frac{y_t^2}{2\tau_t^2}\right)$, so the negative log-likelihood is
$$\sum_{t=1}^n \left(\log\tau_t + \frac{y_t^2}{2\tau_t^2}\right) = \sum_{t=1}^n \left(\alpha_t + \frac{y_t^2}{2e^{2\alpha_t}}\right).$$
Here we use $\alpha_t = \log\tau_t$, because it removes the constraint $\tau_t > 0$ and improves computational stability. Simple minimization without regularization gives $\alpha_t = \log|y_t|$, i.e. $\tau_t^2 = y_t^2$, so we introduce regularization. Assuming $\alpha_t$ varies smoothly with $t$, define $\hat\alpha_t^{\text{ridge}}(\lambda)$ and $\hat\alpha_t^{\text{lasso}}(\lambda)$, which minimize
$$\sum_{t=1}^n \left(\alpha_t + \frac{y_t^2}{2e^{2\alpha_t}}\right) + \lambda \sum_{t=2}^{n-1} \big((\alpha_{t+1} - \alpha_t) - (\alpha_t - \alpha_{t-1})\big)^2$$
and
$$\sum_{t=1}^n \left(\alpha_t + \frac{y_t^2}{2e^{2\alpha_t}}\right) + \lambda \sum_{t=2}^{n-1} \big|(\alpha_{t+1} - \alpha_t) - (\alpha_t - \alpha_{t-1})\big|,$$
respectively.
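Unlike model one, the ridge criterion here has no linear-algebra closed form, but it is convex in $\alpha$ (each likelihood term $\alpha + c\,e^{-2\alpha}$ with $c > 0$ is convex, and the penalty is quadratic), so a generic smooth optimizer works. The sketch below (an assumed implementation, not code from the notes; the smooth `tau_true` path and $\lambda = 5$ are made-up choices for the example) minimizes the penalized negative log-likelihood, starting from the unregularized solution $\alpha_t = \log|y_t|$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 50
tau_true = np.exp(np.sin(np.linspace(0.0, 3.0, n)))  # smooth true volatility path
y = rng.normal(scale=tau_true)

def objective(alpha, lam):
    nll = np.sum(alpha + y**2 / (2.0 * np.exp(2.0 * alpha)))  # negative log-likelihood
    penalty = lam * np.sum(np.diff(alpha, n=2) ** 2)          # squared second differences
    return nll + penalty

# Without regularization the exact minimizer is alpha_t = log|y_t| (tau_t^2 = y_t^2).
alpha_mle = np.log(np.abs(y))

# Ridge fit: start the optimizer at the unregularized solution.
res = minimize(objective, x0=alpha_mle, args=(5.0,), method="L-BFGS-B")
alpha_ridge = res.x
```

The lasso criterion is also convex but not differentiable at zero second differences, so in practice it is handled with specialized solvers rather than L-BFGS.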

3 Model Three

We obtain model three by applying model two to the DFT of the data. Recall that for $y_0, \dots, y_{n-1}$, the DFT is $b_0, \dots, b_{n-1}$, where
$$b_j = \sum_{t=0}^{n-1} y_t \exp\left(-\frac{2\pi i j t}{n}\right).$$
Assume $n$ is odd and set $m = \frac{n-1}{2}$. Model three applies model two to the DFT terms $b_1, \dots, b_m$: assume
$$\mathrm{Re}(b_j),\ \mathrm{Im}(b_j) \overset{\text{i.i.d.}}{\sim} N(0, \gamma_j^2), \qquad j = 1, \dots, m,$$
and that the $b_j$ are independent across $j$. The unknown parameters here are $\gamma_1, \dots, \gamma_m$; $\gamma_j$ represents the strength of the sinusoids at frequency $j/n$. The likelihood is
$$\prod_{j=1}^m \frac{1}{\gamma_j} \exp\left(-\frac{(\mathrm{Re}(b_j))^2}{2\gamma_j^2}\right) \frac{1}{\gamma_j} \exp\left(-\frac{(\mathrm{Im}(b_j))^2}{2\gamma_j^2}\right) = \prod_{j=1}^m \frac{1}{\gamma_j^2} \exp\left(-\frac{(\mathrm{Re}(b_j))^2 + (\mathrm{Im}(b_j))^2}{2\gamma_j^2}\right) = \prod_{j=1}^m \frac{1}{\gamma_j^2} \exp\left(-\frac{|b_j|^2}{2\gamma_j^2}\right).$$
Recall that the periodogram $I(j/n)$ is defined as $I(j/n) = \frac{|b_j|^2}{n}$. We can therefore rewrite the likelihood as
$$\prod_{j=1}^m \frac{1}{\gamma_j^2} \exp\left(-\frac{n I(j/n)}{2\gamma_j^2}\right),$$
so the periodogram forms the sufficient statistic in this model. Further,
$$I(j/n) = \frac{1}{n}|b_j|^2 = \frac{1}{n}\big((\mathrm{Re}(b_j))^2 + (\mathrm{Im}(b_j))^2\big) \sim \frac{\gamma_j^2}{n}\chi_2^2.$$
So we can rewrite the model as
$$I(j/n) \overset{\text{i.n.d.}}{\sim} \frac{\gamma_j^2}{n}\chi_2^2, \qquad j = 1, \dots, m.$$
The negative log-likelihood is
$$\sum_{j=1}^m \left(2\log\gamma_j + \frac{n I(j/n)}{2\gamma_j^2}\right) = \sum_{j=1}^m \left(2\alpha_j + \frac{n I(j/n)}{2 e^{2\alpha_j}}\right),$$
where $\alpha_j = \log\gamma_j$. Minimizing it directly gives $\alpha_j = \frac{1}{2}\log\frac{n I(j/n)}{2}$, i.e. $\gamma_j^2 = e^{2\alpha_j} = \frac{n I(j/n)}{2}$. Regularizing as before, $\hat\alpha_j^{\text{ridge}}(\lambda)$ and $\hat\alpha_j^{\text{lasso}}(\lambda)$ minimize
$$\sum_{j=1}^m \left(2\alpha_j + \frac{n I(j/n)}{2 e^{2\alpha_j}}\right) + \lambda \sum_{j=2}^{m-1} \big((\alpha_{j+1} - \alpha_j) - (\alpha_j - \alpha_{j-1})\big)^2$$
and
$$\sum_{j=1}^m \left(2\alpha_j + \frac{n I(j/n)}{2 e^{2\alpha_j}}\right) + \lambda \sum_{j=2}^{m-1} \big|(\alpha_{j+1} - \alpha_j) - (\alpha_j - \alpha_{j-1})\big|.$$
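The periodogram and the unregularized MLE $\gamma_j^2 = n I(j/n)/2 = |b_j|^2/2$ are a one-liner with the FFT. The sketch below (my own check, not code from the notes) confirms that `np.fft.fft` uses the same sign convention as the DFT definition above, so its output can be used directly as the $b_j$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 101                          # odd, so m = (n - 1) / 2
m = (n - 1) // 2
y = rng.normal(size=n)

b = np.fft.fft(y)                # b_j = sum_t y_t exp(-2 pi i j t / n)
I = np.abs(b) ** 2 / n           # periodogram at frequencies j / n

# Confirm np.fft.fft matches the definition (same sign convention) at j = 1.
t = np.arange(n)
b1 = np.sum(y * np.exp(-2j * np.pi * t / n))
print(np.isclose(b[1], b1))  # True

gamma2_hat = n * I[1:m + 1] / 2  # MLE for gamma_1^2, ..., gamma_m^2; equals |b_j|^2 / 2
```

The ridge and lasso fits for $\alpha_j = \log\gamma_j$ then proceed exactly as in model two, with `n * I / 2` playing the role that `y**2 / 2` played there.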