11 AR Models

1 AR Models

AR Model

The AR (autoregression) model of order $p$, written AR($p$), is given by
$$y_t = \phi_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t \tag{1.1}$$
for $t = p+1, \dots, n$. In matrix notation, $Y = X\beta + \varepsilon$, where
$$Y = \begin{pmatrix} y_{p+1} \\ \vdots \\ y_n \end{pmatrix},\quad
X = \begin{pmatrix} 1 & y_p & \cdots & y_1 \\ \vdots & \vdots & & \vdots \\ 1 & y_{n-1} & \cdots & y_{n-p} \end{pmatrix},\quad
\beta = \begin{pmatrix} \phi_0 \\ \vdots \\ \phi_p \end{pmatrix},\quad
\varepsilon = \begin{pmatrix} \varepsilon_{p+1} \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

$y_t$ is regressed on its own lagged values $y_{t-1}, \dots, y_{t-p}$.

To predict $y_{n+1}$, plug $t = n+1$ into (1.1):
$$\hat y_{n+1} = \hat\phi_0 + \hat\phi_1 y_n + \cdots + \hat\phi_p y_{n+1-p}.$$
Note that $y_n, y_{n-1}, \dots, y_{n+1-p}$ are all observed: they are the last $p$ observations. Then for $y_{n+2}$,
$$\hat y_{n+2} = \hat\phi_0 + \hat\phi_1 y_{n+1} + \cdots + \hat\phi_p y_{n+2-p}.$$
Here $y_{n+1}$ is not observed, but we can replace it with the predicted value $\hat y_{n+1}$. Proceeding recursively, we can predict the value at any future time point.

We can see AR as simply a regression of the observed time series on lagged versions of itself.
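This view translates directly into code. Below is a minimal NumPy sketch — the function names `fit_ar_ols` and `forecast_ar` and all parameter values are ours, purely for illustration — that fits an AR($p$) by least squares on the lagged design matrix of (1.1) and then forecasts recursively as described above:

```python
import numpy as np

def fit_ar_ols(y, p):
    """Fit AR(p) by least squares on the lagged design (1, y_{t-1}, ..., y_{t-p})."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Y = y[p:]                                    # responses y_{p+1}, ..., y_n
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - j:n - j] for j in range(1, p + 1)])  # lag-j column
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (phi_0, phi_1, ..., phi_p)
    return beta

def forecast_ar(y, beta, steps):
    """Recursive forecasts: unobserved lags are replaced by their own predictions."""
    p = len(beta) - 1
    hist = list(y[-p:])
    preds = []
    for _ in range(steps):
        yhat = beta[0] + sum(beta[j] * hist[-j] for j in range(1, p + 1))
        preds.append(yhat)
        hist.append(yhat)
    return np.array(preds)

# Simulate an AR(2) series and recover the coefficients (illustrative values).
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 + 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal(scale=0.5)
beta_hat = fit_ar_ols(y, p=2)
preds = forecast_ar(y, beta_hat, steps=3)
```

With a reasonably long simulated series, `beta_hat` should land close to the coefficients used to generate the data.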

2 Relation to MLE

Start with AR(1):
$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t, \qquad t = 2, \dots, n. \tag{2.1}$$

2.1 Usual Regression

(2.1) looks just like
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i. \tag{2.2}$$
For (2.2), given independence of the pairs $(x_i, y_i)$,
$$\text{Likelihood} = f_{x_1,y_1,\dots,x_n,y_n \mid \theta}(x_1,y_1,\dots,x_n,y_n) = \prod_{i=1}^n f_{x_i,y_i \mid \theta}(x_i,y_i) = \prod_{i=1}^n f_{y_i \mid x_i,\theta}(y_i)\, f_{x_i \mid \theta}(x_i).$$
Now use $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$:
$$\begin{aligned}
\text{Likelihood} &= \prod_{i=1}^n f_{\varepsilon_i \mid x_i,\theta}(y_i - \beta_0 - \beta_1 x_i)\, f_{x_i \mid \theta}(x_i) \\
&= \prod_{i=1}^n f_{\varepsilon_i \mid \theta}(y_i - \beta_0 - \beta_1 x_i)\, f_{x_i \mid \theta}(x_i) \\
&= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right) f_{x_i \mid \theta}(x_i) \\
&= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right) \prod_{i=1}^n f_{x_i \mid \theta}(x_i) \\
&\propto \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right).
\end{aligned}$$
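The last line shows that, up to factors free of $(\beta_0, \beta_1)$, the likelihood is a decreasing function of the residual sum of squares, so the MLE for $\beta$ must coincide with least squares. A quick numeric sanity check (simulated data, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

# Closed-form least squares.
X = np.column_stack([np.ones_like(x), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# The likelihood is maximized where the residual sum of squares is minimized,
# so the OLS solution should beat any perturbed coefficient pair.
def rss(b0, b1):
    return np.sum((y - b0 - b1 * x) ** 2)

base = rss(*b_ols)
perturbed = [rss(b_ols[0] + d0, b_ols[1] + d1)
             for d0 in (-0.1, 0.1) for d1 in (-0.1, 0.1)]
```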
To sum up, here are the assumptions we made:

  - The pairs $(x_i, y_i)$, $i = 1, \dots, n$, are independent across $i$.
  - $\varepsilon_i$ is independent of $x_i$, with $\varepsilon_i \sim N(0, \sigma^2)$.
  - $f_{x_i \mid \theta}(x_i)$ does not depend on $(\beta_0, \beta_1, \sigma)$, so it can be dropped as a proportionality constant.

2.2 Back to AR(1)

When we try to apply the regression likelihood above to the AR(1) model (2.1), these assumptions no longer hold:

  - The observations $y_2, \dots, y_n$ are not independent: each $y_t$ depends on its own past.
  - The "covariate" $y_{t-1}$ is itself a previous response, so the likelihood cannot be factored over independent pairs.

So now, given $y_1, \dots, y_n$,
$$\begin{aligned}
\text{Likelihood for (2.1)} &= f_{y_1,\dots,y_n \mid \theta}(y_1,\dots,y_n) \\
&= f_{y_1 \mid \theta}(y_1)\, f_{y_2 \mid y_1,\theta}(y_2) \cdots f_{y_n \mid y_1,\dots,y_{n-1},\theta}(y_n) \\
&= f_{y_1 \mid \theta}(y_1) \prod_{t=2}^n f_{y_t \mid y_1,\dots,y_{t-1},\theta}(y_t) \\
&= f_{y_1 \mid \theta}(y_1) \prod_{t=2}^n f_{\varepsilon_t \mid y_1,\dots,y_{t-1},\theta}(y_t - \phi_0 - \phi_1 y_{t-1}) \\
&= f_{y_1 \mid \theta}(y_1) \prod_{t=2}^n f_{\varepsilon_t}(y_t - \phi_0 - \phi_1 y_{t-1}) \\
&= f_{y_1 \mid \theta}(y_1) \prod_{t=2}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(y_t - \phi_0 - \phi_1 y_{t-1})^2\right) \\
&= f_{y_1 \mid \theta}(y_1) \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n-1} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=2}^n (y_t - \phi_0 - \phi_1 y_{t-1})^2\right).
\end{aligned}$$
Here we assume $\varepsilon_t \mid y_1, \dots, y_{t-1} \sim N(0, \sigma^2)$ for $t = 2, \dots, n$.

2.2.1 Computation of $f_{y_1 \mid \theta}(y_1)$

We can't get $f_{y_1 \mid \theta}(y_1)$ from (2.1). There are two approaches to obtain it.

First, we simply assume $f_{y_1 \mid \theta}(y_1)$ does not depend on $\theta$. Then
$$\text{Likelihood} \propto \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n-1} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=2}^n (y_t - \phi_0 - \phi_1 y_{t-1})^2\right). \tag{2.3}$$
It is easy to verify that the resulting estimates $\hat\phi_0, \hat\phi_1$ are identical to the least-squares estimates of Section 2.1. We call this the conditional MLE.
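Concretely, maximizing (2.3) over $(\phi_0, \phi_1)$ minimizes $\sum_{t=2}^n (y_t - \phi_0 - \phi_1 y_{t-1})^2$, so the conditional MLE is one least-squares call of $y_t$ on $y_{t-1}$. A small simulation sketch (illustrative parameter values, our own):

```python
import numpy as np

# Simulate an AR(1) with phi0 = 1.0, phi1 = 0.7, sigma = 0.5 (illustrative).
rng = np.random.default_rng(2)
n = 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = 1.0 + 0.7 * y[t - 1] + rng.normal(scale=0.5)

# Conditional MLE = OLS of y_t on (1, y_{t-1}) for t = 2, ..., n.
X = np.column_stack([np.ones(n - 1), y[:-1]])
phi_hat, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
```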

Second, we extend the model to $t = 1, 0, -1, -2, \dots$. Then
$$\begin{aligned}
y_1 &= \phi_0 + \phi_1 y_0 + \varepsilon_1 \\
&= \phi_0 + \phi_1(\phi_0 + \phi_1 y_{-1} + \varepsilon_0) + \varepsilon_1 \\
&= \phi_0(1 + \phi_1) + \phi_1^2 y_{-1} + \phi_1 \varepsilon_0 + \varepsilon_1 \\
&= \phi_0(1 + \phi_1) + \phi_1^2(\phi_0 + \phi_1 y_{-2} + \varepsilon_{-1}) + \phi_1 \varepsilon_0 + \varepsilon_1 \\
&= \phi_0(1 + \phi_1 + \phi_1^2) + \phi_1^3 y_{-2} + \phi_1^2 \varepsilon_{-1} + \phi_1 \varepsilon_0 + \varepsilon_1 \\
&= \cdots \\
&= \phi_0 \sum_{j=0}^{M} \phi_1^j + \phi_1^{M+1} y_{-M} + \sum_{j=0}^{M} \phi_1^j \varepsilon_{1-j}.
\end{aligned}$$
If $|\phi_1| < 1$, the coefficient $\phi_1^{M+1}$ is very small for large $M$, so
$$y_1 \approx \phi_0 \sum_{j=0}^{M} \phi_1^j + \sum_{j=0}^{M} \phi_1^j \varepsilon_{1-j} \approx \phi_0 \sum_{j=0}^{\infty} \phi_1^j + \sum_{j=0}^{\infty} \phi_1^j \varepsilon_{1-j} = \frac{\phi_0}{1-\phi_1} + \sum_{j=0}^{\infty} \phi_1^j \varepsilon_{1-j}.$$
We have $E\!\left(\sum_{j=0}^{\infty} \phi_1^j \varepsilon_{1-j}\right) = 0$ and
$$\operatorname{Var}\!\left(\sum_{j=0}^{\infty} \phi_1^j \varepsilon_{1-j}\right) = \sum_{j=0}^{\infty} \operatorname{Var}(\phi_1^j \varepsilon_{1-j}) = \sum_{j=0}^{\infty} \phi_1^{2j} \operatorname{Var}(\varepsilon_{1-j}) = \sigma^2 \sum_{j=0}^{\infty} \phi_1^{2j} = \frac{\sigma^2}{1-\phi_1^2}.$$
Thus when $|\phi_1| < 1$,
$$y_1 \sim N\!\left(\frac{\phi_0}{1-\phi_1},\ \frac{\sigma^2}{1-\phi_1^2}\right),$$
so
$$f_{y_1 \mid \theta}(y_1) = \frac{\sqrt{1-\phi_1^2}}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1-\phi_1^2}{2\sigma^2}\left(y_1 - \frac{\phi_0}{1-\phi_1}\right)^{\!2}\right),$$
and finally
$$\text{Likelihood} = \sqrt{1-\phi_1^2}\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\!\left(-\frac{1-\phi_1^2}{2\sigma^2}\left(y_1 - \frac{\phi_0}{1-\phi_1}\right)^{\!2} - \frac{1}{2\sigma^2}\sum_{t=2}^n (y_t - \phi_0 - \phi_1 y_{t-1})^2\right). \tag{2.4}$$
Now compare (2.3) and (2.4). (2.4) is referred to as the full likelihood for AR(1), and maximizing it gives the full MLE. (2.3) and (2.4) will be quite close when $|\phi_1| < 1$ and $n$ is large.

2.3 AR(p)

The AR($p$) model is given by
$$y_t = \phi_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t.$$
The likelihood is
$$f_{y_1,\dots,y_n \mid \theta}(y_1,\dots,y_n) = f_{y_{p+1},\dots,y_n \mid y_1,\dots,y_p,\theta}(y_{p+1},\dots,y_n)\, f_{y_1,\dots,y_p \mid \theta}(y_1,\dots,y_p).$$
The conditional likelihood is
$$\begin{aligned}
f_{y_{p+1},\dots,y_n \mid y_1,\dots,y_p,\theta}(y_{p+1},\dots,y_n)
&= \prod_{t=p+1}^{n} f_{y_t \mid y_{t-1},\dots,y_1}(y_t) \\
&= \prod_{t=p+1}^{n} f_{\varepsilon_t \mid y_{t-1},\dots,y_1}(y_t - \phi_0 - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}) \\
&= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n-p} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=p+1}^{n} (y_t - \phi_0 - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p})^2\right).
\end{aligned}$$
Here we assume $\varepsilon_t \mid y_{t-1},\dots,y_1 \sim N(0,\sigma^2)$ for $t = p+1, \dots, n$.

To obtain the parameter estimates, we can directly maximize the likelihood. If we assume $f_{y_1,\dots,y_p \mid \theta}(y_1,\dots,y_p)$ does not depend on $\theta$, this is equivalent to maximizing the conditional likelihood.

2.3.1 Bayesian Approach

However, deriving $f_{y_1,\dots,y_p \mid \theta}(y_1,\dots,y_p)$ in a more principled way requires applying (1.1) to smaller values of $t$, which is complicated and not really worth the effort. We can instead work under some "stationarity" assumptions on $\phi_0, \dots, \phi_p$ (much simpler than the conditional-likelihood route): using the matrix notation above (see here),
$$\text{Likelihood}_\theta \propto \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n-p} \exp\!\left(-\frac{\|Y - X\beta\|^2}{2\sigma^2}\right).$$
Assume $\phi_0, \dots, \phi_p, \log\sigma \overset{\text{i.i.d.}}{\sim} \text{Unif}(-C, C)$; then (see here)
$$\beta \mid \text{data} \sim t_{n-2p-1,\,p+1}\!\left(\hat\beta,\ \hat\sigma^2 (X^T X)^{-1}\right),
\qquad \hat\beta = (X^T X)^{-1} X^T Y,
\qquad \hat\sigma = \frac{\|Y - X\hat\beta\|}{\sqrt{n - 2p - 1}}.$$
If inference for $\sigma$ is desired, we can use
$$\frac{\|Y - X\hat\beta\|^2}{\sigma^2} \,\Big|\, \text{data} \sim \chi^2_{n-2p-1}.$$
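These posterior facts make direct sampling easy: draw $\sigma^2$ from its scaled inverse-$\chi^2$ posterior, then $\beta \mid \sigma, \text{data} \sim N(\hat\beta, \sigma^2 (X^T X)^{-1})$. A sketch (the helper `posterior_samples` and all simulation values are ours, for illustration):

```python
import numpy as np

def posterior_samples(y, p, N, rng):
    """Draw (beta, sigma) from the flat-prior posterior:
    ||Y - X beta_hat||^2 / sigma^2 | data ~ chi^2_{n-2p-1}, and
    beta | sigma, data ~ N(beta_hat, sigma^2 (X^T X)^{-1})."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Y = y[p:]
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - j:n - j] for j in range(1, p + 1)])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    rss = np.sum((Y - X @ beta_hat) ** 2)
    sigma2 = rss / rng.chisquare(n - 2 * p - 1, size=N)  # scaled inverse chi-square
    L = np.linalg.cholesky(XtX_inv)
    z = rng.standard_normal((N, p + 1))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, np.sqrt(sigma2)

# Demo on a simulated AR(1) with phi0 = 0.5, phi1 = 0.6, sigma = 0.5.
rng = np.random.default_rng(5)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 0.5 + 0.6 * y[t - 1] + rng.normal(scale=0.5)
beta, sigma = posterior_samples(y, p=1, N=2000, rng=rng)
```

The Cholesky factor of $(X^T X)^{-1}$ turns standard-normal draws into draws with the required covariance, so each row of `beta` is one joint posterior draw of $(\phi_0, \phi_1)$.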

Bayesian inference for AR is identical to that for linear regression models because the likelihood is the same, and Bayesian inference depends on the data only through the likelihood.
Frequentist inference is based on the MLE, given by $\hat\beta$ and $\hat\sigma_{\text{MLE}} = \|Y - X\hat\beta\| / \sqrt{n-p}$. The frequentist analysis here is quite different from that of linear regression; the results are slightly different but close.

3 Predictions and Difference Equations

Given a fitted AR($p$) model with $\hat\phi_0, \dots, \hat\phi_p$, predictions $\hat y_{n+i}$ for $i = 1, 2, \dots$ are obtained by
$$\hat y_{n+i} = \hat\phi_0 + \hat\phi_1 \hat y_{n+i-1} + \cdots + \hat\phi_p \hat y_{n+i-p}, \qquad i = 1, 2, \dots, \tag{3.1}$$
where the recursion is initialized with $\hat y_j = y_j$ for $j = n, n-1, \dots, n+1-p$.
We can rewrite this as
$$u_k = \alpha_0 + \alpha_1 u_{k-1} + \cdots + \alpha_p u_{k-p}, \qquad k = p, p+1, \dots, \tag{3.2}$$
initialized by $u_0, \dots, u_{p-1}$. (3.2) is called a difference equation of order $p$.

3.1 First Order (p=1)

Now (3.2) becomes $u_k = \alpha_0 + \alpha_1 u_{k-1}$ with initial value $u_0$. Convert it to a homogeneous equation (no intercept term) by setting $v_k = u_k - \frac{\alpha_0}{1-\alpha_1}$:
$$v_k = \alpha_1 v_{k-1}.$$
So $v_k = \alpha_1^k v_0$, and hence
$$u_k = \frac{1 - \alpha_1^k}{1 - \alpha_1}\,\alpha_0 + \alpha_1^k u_0.$$
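A quick numeric check of the closed form against the recursion ($\alpha_0, \alpha_1, u_0$ values are illustrative):

```python
import numpy as np

# First-order difference equation u_k = alpha0 + alpha1 * u_{k-1}.
alpha0, alpha1, u0 = 2.0, 0.8, 1.0

# Direct recursion for k = 0, ..., 20.
u = [u0]
for _ in range(20):
    u.append(alpha0 + alpha1 * u[-1])

# Closed form: u_k = (1 - alpha1^k) / (1 - alpha1) * alpha0 + alpha1^k * u0.
k = np.arange(21)
closed = (1 - alpha1**k) / (1 - alpha1) * alpha0 + alpha1**k * u0
```

Since $|\alpha_1| < 1$ here, both sequences converge to the fixed point $\alpha_0 / (1 - \alpha_1)$.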

3.2 General Case: Bayesian Approach

In the Bayesian context, prediction is done via the joint distribution of $y_{n+1}, \dots, y_{n+k}$ conditional on $y_1, \dots, y_n$. Consider the conditional expectations
$$E(y_{n+i} \mid y_1,\dots,y_n) = \int E(y_{n+i} \mid y_1,\dots,y_n,\theta)\, f_{\theta \mid y_1,\dots,y_n}(\theta)\, d\theta. \tag{3.3}$$
First calculate $\hat y_{n+i}(\theta) = E(y_{n+i} \mid y_1,\dots,y_n,\theta)$ for fixed $\theta$:
$$\hat y_{n+i}(\theta) = \phi_0 + \phi_1 \hat y_{n+i-1}(\theta) + \cdots + \phi_p \hat y_{n+i-p}(\theta). \tag{3.4}$$
If we initialize this with $\hat y_j(\theta) = y_j$, $j = n, n-1, \dots, n+1-p$, then (3.4) can be evaluated in sequence for $i = 1, 2, \dots$.

Now (3.3) becomes
$$E(y_{n+i} \mid y_1,\dots,y_n) = \int \hat y_{n+i}(\theta)\, f_{\theta \mid y_1,\dots,y_n}(\theta)\, d\theta.$$
We can compute this in one of two ways:

  1. Generate posterior samples $\theta^{(1)}, \dots, \theta^{(N)}$ from $f_{\theta \mid y_1,\dots,y_n}(\theta)$; then
     $$E(y_{n+i} \mid y_1,\dots,y_n) \approx \frac{1}{N}\sum_{j=1}^{N} \hat y_{n+i}(\theta^{(j)}).$$
  2. Use the fact that $f_{\theta \mid y_1,\dots,y_n}(\theta)$ is usually highly concentrated around $\hat\theta = (\hat\beta, \hat\sigma)$, and ignore the small uncertainty of $\theta$ around $\hat\theta$:
     $$E(y_{n+i} \mid y_1,\dots,y_n) \approx \hat y_{n+i}(\hat\theta).$$
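The two approaches can be sketched side by side. Note the "posterior draws" below are stand-in normal perturbations around $\hat\beta$, purely for illustration, not samples from an actual posterior; all simulation values are ours:

```python
import numpy as np

def forecast_path(y_hist, beta, steps):
    """Evaluate (3.4): recursive conditional-mean forecasts for one fixed theta."""
    p = len(beta) - 1
    h = list(y_hist[-p:])
    out = []
    for _ in range(steps):
        yhat = beta[0] + sum(beta[j] * h[-j] for j in range(1, p + 1))
        out.append(yhat)
        h.append(yhat)
    return np.array(out)

# Simulated AR(1) (phi0 = 0.5, phi1 = 0.6, sigma = 0.5) and its least-squares fit.
rng = np.random.default_rng(4)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.5 + 0.6 * y[t - 1] + rng.normal(scale=0.5)
X = np.column_stack([np.ones(299), y[:-1]])
beta_hat, *_ = np.linalg.lstsq(X, y[1:], rcond=None)

# Approach 2: plug in the point estimate.
plug_in = forecast_path(y, beta_hat, steps=5)

# Approach 1: average the forecast paths over draws of beta.
draws = beta_hat + 0.02 * rng.standard_normal((1000, 2))
avg = np.mean([forecast_path(y, b, steps=5) for b in draws], axis=0)
```

When the draws are tightly concentrated around $\hat\beta$, the averaged forecasts and the plug-in forecasts nearly coincide, which is exactly the justification for approach 2.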