11 AR Models

1 AR Models

AR Model

The AR (autoregression) model of order $p$, written AR($p$), is given by
$$y_t = \phi_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t \tag{1.1}$$
for $t = p+1, \dots, n$. In matrix notation, $Y = X\beta + \varepsilon$, where
$$Y = \begin{pmatrix} y_{p+1} \\ \vdots \\ y_n \end{pmatrix},\quad
X = \begin{pmatrix} 1 & y_p & \cdots & y_1 \\ \vdots & \vdots & & \vdots \\ 1 & y_{n-1} & \cdots & y_{n-p} \end{pmatrix},\quad
\beta = \begin{pmatrix} \phi_0 \\ \vdots \\ \phi_p \end{pmatrix},\quad
\varepsilon = \begin{pmatrix} \varepsilon_{p+1} \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$

$y_t$ is regressed on its own lagged values $y_{t-1}, \dots, y_{t-p}$.

To predict $y_{n+1}$, plug $t = n+1$ into (1.1):
$$\hat y_{n+1} = \hat\phi_0 + \hat\phi_1 y_n + \cdots + \hat\phi_p y_{n+1-p}.$$
Note that $y_n, y_{n-1}, \dots, y_{n+1-p}$ are all observed: they are the last $p$ observations. Then for $y_{n+2}$,
$$\hat y_{n+2} = \hat\phi_0 + \hat\phi_1 y_{n+1} + \cdots + \hat\phi_p y_{n+2-p}.$$
Here $y_{n+1}$ is not observed, but we can replace it with the predicted value $\hat y_{n+1}$. Proceeding recursively, we can predict the value at any future time point.

We can see AR as simply a regression of the observed time series on lagged versions of itself.
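This view translates directly into code. Below is a minimal NumPy sketch — the function names `fit_ar_ols` and `forecast_ar` and all parameter values are ours, purely for illustration — that fits an AR($p$) by least squares on the lagged design matrix of (1.1) and then forecasts recursively as described above:

```python
import numpy as np

def fit_ar_ols(y, p):
    """Fit AR(p) by least squares on the lagged design (1, y_{t-1}, ..., y_{t-p})."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Y = y[p:]                                    # responses y_{p+1}, ..., y_n
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - j:n - j] for j in range(1, p + 1)])  # lag-j column
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (phi_0, phi_1, ..., phi_p)
    return beta

def forecast_ar(y, beta, steps):
    """Recursive forecasts: unobserved lags are replaced by their own predictions."""
    p = len(beta) - 1
    hist = list(y[-p:])
    preds = []
    for _ in range(steps):
        yhat = beta[0] + sum(beta[j] * hist[-j] for j in range(1, p + 1))
        preds.append(yhat)
        hist.append(yhat)
    return np.array(preds)

# Simulate an AR(2) series and recover the coefficients (illustrative values).
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 + 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal(scale=0.5)
beta_hat = fit_ar_ols(y, p=2)
preds = forecast_ar(y, beta_hat, steps=3)
```

With a reasonably long simulated series, `beta_hat` should land close to the coefficients used to generate the data.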

2 Relation to MLE

Start with AR(1):
$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t, \qquad t = 2, \dots, n. \tag{2.1}$$

2.1 Usual Regression

(2.1) looks just like
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i. \tag{2.2}$$
For (2.2), given independence of the pairs $(x_i, y_i)$,
$$\text{Likelihood} = f_{x_1,y_1,\dots,x_n,y_n \mid \theta}(x_1,y_1,\dots,x_n,y_n) = \prod_{i=1}^n f_{x_i,y_i \mid \theta}(x_i,y_i) = \prod_{i=1}^n f_{y_i \mid x_i,\theta}(y_i)\, f_{x_i \mid \theta}(x_i).$$
Now use $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$:
$$\begin{aligned}
\text{Likelihood} &= \prod_{i=1}^n f_{\varepsilon_i \mid x_i,\theta}(y_i - \beta_0 - \beta_1 x_i)\, f_{x_i \mid \theta}(x_i) \\
&= \prod_{i=1}^n f_{\varepsilon_i \mid \theta}(y_i - \beta_0 - \beta_1 x_i)\, f_{x_i \mid \theta}(x_i) \\
&= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right) f_{x_i \mid \theta}(x_i) \\
&= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right) \prod_{i=1}^n f_{x_i \mid \theta}(x_i) \\
&\propto \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right).
\end{aligned}$$
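The last line shows that, up to factors free of $(\beta_0, \beta_1)$, the likelihood is a decreasing function of the residual sum of squares, so the MLE for $\beta$ must coincide with least squares. A quick numeric sanity check (simulated data, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

# Closed-form least squares.
X = np.column_stack([np.ones_like(x), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# The likelihood is maximized where the residual sum of squares is minimized,
# so the OLS solution should beat any perturbed coefficient pair.
def rss(b0, b1):
    return np.sum((y - b0 - b1 * x) ** 2)

base = rss(*b_ols)
perturbed = [rss(b_ols[0] + d0, b_ols[1] + d1)
             for d0 in (-0.1, 0.1) for d1 in (-0.1, 0.1)]
```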
To sum up, here are the assumptions we made:

  - The pairs $(x_i, y_i)$, $i = 1, \dots, n$, are independent across $i$.
  - $\varepsilon_i$ is independent of $x_i$, with $\varepsilon_i \sim N(0, \sigma^2)$.
  - $f_{x_i \mid \theta}(x_i)$ does not depend on $(\beta_0, \beta_1, \sigma)$, so it can be dropped as a proportionality constant.

2.2 Back to AR(1)

When we try to apply the regression likelihood above to the AR(1) model (2.1), these assumptions no longer hold:

  - The observations $y_2, \dots, y_n$ are not independent: each $y_t$ depends on its own past.
  - The "covariate" $y_{t-1}$ is itself a previous response, so the likelihood cannot be factored over independent pairs.

So now, given $y_1, \dots, y_n$,
$$\begin{aligned}
\text{Likelihood for (2.1)} &= f_{y_1,\dots,y_n \mid \theta}(y_1,\dots,y_n) \\
&= f_{y_1 \mid \theta}(y_1)\, f_{y_2 \mid y_1,\theta}(y_2) \cdots f_{y_n \mid y_1,\dots,y_{n-1},\theta}(y_n) \\
&= f_{y_1 \mid \theta}(y_1) \prod_{t=2}^n f_{y_t \mid y_1,\dots,y_{t-1},\theta}(y_t) \\
&= f_{y_1 \mid \theta}(y_1) \prod_{t=2}^n f_{\varepsilon_t \mid y_1,\dots,y_{t-1},\theta}(y_t - \phi_0 - \phi_1 y_{t-1}) \\
&= f_{y_1 \mid \theta}(y_1) \prod_{t=2}^n f_{\varepsilon_t}(y_t - \phi_0 - \phi_1 y_{t-1}) \\
&= f_{y_1 \mid \theta}(y_1) \prod_{t=2}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(y_t - \phi_0 - \phi_1 y_{t-1})^2\right) \\
&= f_{y_1 \mid \theta}(y_1) \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n-1} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=2}^n (y_t - \phi_0 - \phi_1 y_{t-1})^2\right).
\end{aligned}$$
Here we assume $\varepsilon_t \mid y_1, \dots, y_{t-1} \sim N(0, \sigma^2)$ for $t = 2, \dots, n$.

2.2.1 Computation of $f_{y_1 \mid \theta}(y_1)$

We can't get $f_{y_1 \mid \theta}(y_1)$ from (2.1). There are two approaches to obtain it.

First, we simply assume $f_{y_1 \mid \theta}(y_1)$ does not depend on $\theta$. Then
$$\text{Likelihood} \propto \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n-1} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=2}^n (y_t - \phi_0 - \phi_1 y_{t-1})^2\right). \tag{2.3}$$
It is easy to verify that the resulting estimates $\hat\phi_0, \hat\phi_1$ are identical to the least-squares estimates of Section 2.1. We call this the conditional MLE.
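Concretely, maximizing (2.3) over $(\phi_0, \phi_1)$ minimizes $\sum_{t=2}^n (y_t - \phi_0 - \phi_1 y_{t-1})^2$, so the conditional MLE is one least-squares call of $y_t$ on $y_{t-1}$. A small simulation sketch (illustrative parameter values, our own):

```python
import numpy as np

# Simulate an AR(1) with phi0 = 1.0, phi1 = 0.7, sigma = 0.5 (illustrative).
rng = np.random.default_rng(2)
n = 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = 1.0 + 0.7 * y[t - 1] + rng.normal(scale=0.5)

# Conditional MLE = OLS of y_t on (1, y_{t-1}) for t = 2, ..., n.
X = np.column_stack([np.ones(n - 1), y[:-1]])
phi_hat, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
```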

Second, we extend the model to $t = 1, 0, -1, -2, \dots$. Then
$$\begin{aligned}
y_1 &= \phi_0 + \phi_1 y_0 + \varepsilon_1 \\
&= \phi_0 + \phi_1(\phi_0 + \phi_1 y_{-1} + \varepsilon_0) + \varepsilon_1 \\
&= \phi_0(1 + \phi_1) + \phi_1^2 y_{-1} + \phi_1 \varepsilon_0 + \varepsilon_1 \\
&= \phi_0(1 + \phi_1) + \phi_1^2(\phi_0 + \phi_1 y_{-2} + \varepsilon_{-1}) + \phi_1 \varepsilon_0 + \varepsilon_1 \\
&= \phi_0(1 + \phi_1 + \phi_1^2) + \phi_1^3 y_{-2} + \phi_1^2 \varepsilon_{-1} + \phi_1 \varepsilon_0 + \varepsilon_1 \\
&= \cdots \\
&= \phi_0 \sum_{j=0}^{M} \phi_1^j + \phi_1^{M+1} y_{-M} + \sum_{j=0}^{M} \phi_1^j \varepsilon_{1-j}.
\end{aligned}$$
If $|\phi_1| < 1$, the coefficient $\phi_1^{M+1}$ is very small for large $M$, so
$$y_1 \approx \phi_0 \sum_{j=0}^{M} \phi_1^j + \sum_{j=0}^{M} \phi_1^j \varepsilon_{1-j} \approx \phi_0 \sum_{j=0}^{\infty} \phi_1^j + \sum_{j=0}^{\infty} \phi_1^j \varepsilon_{1-j} = \frac{\phi_0}{1-\phi_1} + \sum_{j=0}^{\infty} \phi_1^j \varepsilon_{1-j}.$$
We have $E\!\left(\sum_{j=0}^{\infty} \phi_1^j \varepsilon_{1-j}\right) = 0$ and
$$\operatorname{Var}\!\left(\sum_{j=0}^{\infty} \phi_1^j \varepsilon_{1-j}\right) = \sum_{j=0}^{\infty} \operatorname{Var}(\phi_1^j \varepsilon_{1-j}) = \sum_{j=0}^{\infty} \phi_1^{2j} \operatorname{Var}(\varepsilon_{1-j}) = \sigma^2 \sum_{j=0}^{\infty} \phi_1^{2j} = \frac{\sigma^2}{1-\phi_1^2}.$$
Thus when $|\phi_1| < 1$,
$$y_1 \sim N\!\left(\frac{\phi_0}{1-\phi_1},\ \frac{\sigma^2}{1-\phi_1^2}\right),$$
so
$$f_{y_1 \mid \theta}(y_1) = \frac{\sqrt{1-\phi_1^2}}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1-\phi_1^2}{2\sigma^2}\left(y_1 - \frac{\phi_0}{1-\phi_1}\right)^{\!2}\right),$$
and finally
$$\text{Likelihood} = \sqrt{1-\phi_1^2}\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n} \exp\!\left(-\frac{1-\phi_1^2}{2\sigma^2}\left(y_1 - \frac{\phi_0}{1-\phi_1}\right)^{\!2} - \frac{1}{2\sigma^2}\sum_{t=2}^n (y_t - \phi_0 - \phi_1 y_{t-1})^2\right). \tag{2.4}$$
Now compare (2.3) and (2.4). (2.4) is referred to as the full likelihood for AR(1), and maximizing it gives the full MLE. (2.3) and (2.4) will be quite close when $|\phi_1| < 1$ and $n$ is large.

2.3 AR(p)

The AR($p$) model is given by
$$y_t = \phi_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t.$$
The likelihood is
$$f_{y_1,\dots,y_n \mid \theta}(y_1,\dots,y_n) = f_{y_{p+1},\dots,y_n \mid y_1,\dots,y_p,\theta}(y_{p+1},\dots,y_n)\, f_{y_1,\dots,y_p \mid \theta}(y_1,\dots,y_p).$$
The conditional likelihood is
$$\begin{aligned}
f_{y_{p+1},\dots,y_n \mid y_1,\dots,y_p,\theta}(y_{p+1},\dots,y_n)
&= \prod_{t=p+1}^{n} f_{y_t \mid y_{t-1},\dots,y_1}(y_t) \\
&= \prod_{t=p+1}^{n} f_{\varepsilon_t \mid y_{t-1},\dots,y_1}(y_t - \phi_0 - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p}) \\
&= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n-p} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{t=p+1}^{n} (y_t - \phi_0 - \phi_1 y_{t-1} - \cdots - \phi_p y_{t-p})^2\right).
\end{aligned}$$
Here we assume $\varepsilon_t \mid y_{t-1},\dots,y_1 \sim N(0,\sigma^2)$ for $t = p+1, \dots, n$.

To obtain the parameter estimates, we can directly maximize the likelihood. If we assume $f_{y_1,\dots,y_p \mid \theta}(y_1,\dots,y_p)$ does not depend on $\theta$, this is equivalent to maximizing the conditional likelihood.

2.3.1 Bayesian Approach

However, deriving $f_{y_1,\dots,y_p \mid \theta}(y_1,\dots,y_p)$ in a more principled way requires applying (1.1) to smaller values of $t$, which is complicated and not really worth the effort. We can instead work under some "stationarity" assumptions on $\phi_0, \dots, \phi_p$ (much simpler than the conditional-likelihood route): using the matrix notation above (see here),
$$\text{Likelihood}_\theta \propto \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!n-p} \exp\!\left(-\frac{\|Y - X\beta\|^2}{2\sigma^2}\right).$$
Assume $\phi_0, \dots, \phi_p, \log\sigma \overset{\text{i.i.d.}}{\sim} \text{Unif}(-C, C)$; then (see here)
$$\beta \mid \text{data} \sim t_{n-2p-1,\,p+1}\!\left(\hat\beta,\ \hat\sigma^2 (X^T X)^{-1}\right),
\qquad \hat\beta = (X^T X)^{-1} X^T Y,
\qquad \hat\sigma = \frac{\|Y - X\hat\beta\|}{\sqrt{n - 2p - 1}}.$$
If inference for $\sigma$ is desired, we can use
$$\frac{\|Y - X\hat\beta\|^2}{\sigma^2} \,\Big|\, \text{data} \sim \chi^2_{n-2p-1}.$$
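These posterior facts make direct sampling easy: draw $\sigma^2$ from its scaled inverse-$\chi^2$ posterior, then $\beta \mid \sigma, \text{data} \sim N(\hat\beta, \sigma^2 (X^T X)^{-1})$. A sketch (the helper `posterior_samples` and all simulation values are ours, for illustration):

```python
import numpy as np

def posterior_samples(y, p, N, rng):
    """Draw (beta, sigma) from the flat-prior posterior:
    ||Y - X beta_hat||^2 / sigma^2 | data ~ chi^2_{n-2p-1}, and
    beta | sigma, data ~ N(beta_hat, sigma^2 (X^T X)^{-1})."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Y = y[p:]
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - j:n - j] for j in range(1, p + 1)])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    rss = np.sum((Y - X @ beta_hat) ** 2)
    sigma2 = rss / rng.chisquare(n - 2 * p - 1, size=N)  # scaled inverse chi-square
    L = np.linalg.cholesky(XtX_inv)
    z = rng.standard_normal((N, p + 1))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return beta, np.sqrt(sigma2)

# Demo on a simulated AR(1) with phi0 = 0.5, phi1 = 0.6, sigma = 0.5.
rng = np.random.default_rng(5)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 0.5 + 0.6 * y[t - 1] + rng.normal(scale=0.5)
beta, sigma = posterior_samples(y, p=1, N=2000, rng=rng)
```

The Cholesky factor of $(X^T X)^{-1}$ turns standard-normal draws into draws with the required covariance, so each row of `beta` is one joint posterior draw of $(\phi_0, \phi_1)$.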

Bayesian inference for AR is identical to that for linear regression models because the likelihood is the same, and Bayesian inference depends on the data only through the likelihood.
Frequentist inference is based on the MLE, given by $\hat\beta$ and $\hat\sigma_{\text{MLE}} = \|Y - X\hat\beta\| / \sqrt{n-p}$. The frequentist analysis here is quite different from that of linear regression; the results are slightly different but close.

3 Predictions and Difference Equations

Given a fitted AR($p$) model with $\hat\phi_0, \dots, \hat\phi_p$, predictions $\hat y_{n+i}$ for $i = 1, 2, \dots$ are obtained by
$$\hat y_{n+i} = \hat\phi_0 + \hat\phi_1 \hat y_{n+i-1} + \cdots + \hat\phi_p \hat y_{n+i-p}, \qquad i = 1, 2, \dots, \tag{3.1}$$
where the recursion is initialized with $\hat y_j = y_j$ for $j = n, n-1, \dots, n+1-p$.
We can rewrite this as
$$u_k = \alpha_0 + \alpha_1 u_{k-1} + \cdots + \alpha_p u_{k-p}, \qquad k = p, p+1, \dots, \tag{3.2}$$
initialized by $u_0, \dots, u_{p-1}$. (3.2) is called a difference equation of order $p$.

3.1 First Order (p=1)

Now (3.2) becomes $u_k = \alpha_0 + \alpha_1 u_{k-1}$ with initial value $u_0$. Convert it to a homogeneous equation (no intercept term) by setting $v_k = u_k - \frac{\alpha_0}{1-\alpha_1}$:
$$v_k = \alpha_1 v_{k-1}.$$
So $v_k = \alpha_1^k v_0$, and hence
$$u_k = \frac{1 - \alpha_1^k}{1 - \alpha_1}\,\alpha_0 + \alpha_1^k u_0.$$
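A quick numeric check of the closed form against the recursion ($\alpha_0, \alpha_1, u_0$ values are illustrative):

```python
import numpy as np

# First-order difference equation u_k = alpha0 + alpha1 * u_{k-1}.
alpha0, alpha1, u0 = 2.0, 0.8, 1.0

# Direct recursion for k = 0, ..., 20.
u = [u0]
for _ in range(20):
    u.append(alpha0 + alpha1 * u[-1])

# Closed form: u_k = (1 - alpha1^k) / (1 - alpha1) * alpha0 + alpha1^k * u0.
k = np.arange(21)
closed = (1 - alpha1**k) / (1 - alpha1) * alpha0 + alpha1**k * u0
```

Since $|\alpha_1| < 1$ here, both sequences converge to the fixed point $\alpha_0 / (1 - \alpha_1)$.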

3.2 General Case: Bayesian Approach

In the Bayesian context, prediction is done via the joint distribution of $y_{n+1}, \dots, y_{n+k}$ conditional on $y_1, \dots, y_n$. Consider the conditional expectations
$$E(y_{n+i} \mid y_1,\dots,y_n) = \int E(y_{n+i} \mid y_1,\dots,y_n,\theta)\, f_{\theta \mid y_1,\dots,y_n}(\theta)\, d\theta. \tag{3.3}$$
First calculate $\hat y_{n+i}(\theta) = E(y_{n+i} \mid y_1,\dots,y_n,\theta)$ for fixed $\theta$:
$$\hat y_{n+i}(\theta) = \phi_0 + \phi_1 \hat y_{n+i-1}(\theta) + \cdots + \phi_p \hat y_{n+i-p}(\theta). \tag{3.4}$$
If we initialize this with $\hat y_j(\theta) = y_j$, $j = n, n-1, \dots, n+1-p$, then (3.4) can be evaluated in sequence for $i = 1, 2, \dots$.

Now (3.3) becomes
$$E(y_{n+i} \mid y_1,\dots,y_n) = \int \hat y_{n+i}(\theta)\, f_{\theta \mid y_1,\dots,y_n}(\theta)\, d\theta.$$
We can compute this in one of two ways:

  1. Generate posterior samples $\theta^{(1)}, \dots, \theta^{(N)}$ from $f_{\theta \mid y_1,\dots,y_n}(\theta)$; then
     $$E(y_{n+i} \mid y_1,\dots,y_n) \approx \frac{1}{N}\sum_{j=1}^{N} \hat y_{n+i}(\theta^{(j)}).$$
  2. Use the fact that $f_{\theta \mid y_1,\dots,y_n}(\theta)$ is usually highly concentrated around $\hat\theta = (\hat\beta, \hat\sigma)$, and ignore the small uncertainty of $\theta$ around $\hat\theta$:
     $$E(y_{n+i} \mid y_1,\dots,y_n) \approx \hat y_{n+i}(\hat\theta).$$
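The two approaches can be sketched side by side. Note the "posterior draws" below are stand-in normal perturbations around $\hat\beta$, purely for illustration, not samples from an actual posterior; all simulation values are ours:

```python
import numpy as np

def forecast_path(y_hist, beta, steps):
    """Evaluate (3.4): recursive conditional-mean forecasts for one fixed theta."""
    p = len(beta) - 1
    h = list(y_hist[-p:])
    out = []
    for _ in range(steps):
        yhat = beta[0] + sum(beta[j] * h[-j] for j in range(1, p + 1))
        out.append(yhat)
        h.append(yhat)
    return np.array(out)

# Simulated AR(1) (phi0 = 0.5, phi1 = 0.6, sigma = 0.5) and its least-squares fit.
rng = np.random.default_rng(4)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.5 + 0.6 * y[t - 1] + rng.normal(scale=0.5)
X = np.column_stack([np.ones(299), y[:-1]])
beta_hat, *_ = np.linalg.lstsq(X, y[1:], rcond=None)

# Approach 2: plug in the point estimate.
plug_in = forecast_path(y, beta_hat, steps=5)

# Approach 1: average the forecast paths over draws of beta.
draws = beta_hat + 0.02 * rng.standard_normal((1000, 2))
avg = np.mean([forecast_path(y, b, steps=5) for b in draws], axis=0)
```

When the draws are tightly concentrated around $\hat\beta$, the averaged forecasts and the plug-in forecasts nearly coincide, which is exactly the justification for approach 2.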