15 ARIMA and SARIMA Models

1 ARMA(p, q) Model Rewrite

Recall the ARMA model and its backshift notation. We rewrite $\phi(B)(y_t-\mu)=\theta(B)\varepsilon_t$ as $\phi(B)y_t=\delta+\theta(B)\varepsilon_t$ and try to determine $\delta$. We first write
$$y_t-\mu=\frac{\theta(B)}{\phi(B)}\varepsilon_t.$$
Factorize $\phi(z)$ as $\phi(z)=(1-a_1z)\cdots(1-a_pz)$. Here $a_1^{-1},\dots,a_p^{-1}$ are the roots of $\phi(z)$. Then
$$y_t-\mu=\frac{\theta(B)}{\prod_{k=1}^p(1-a_kB)}\varepsilon_t=\theta(B)(1-a_1B)^{-1}\cdots(1-a_pB)^{-1}\varepsilon_t.$$
Recalling the expansion in (2.4), if all $|a_k|<1$ we can rewrite
$$y_t=\mu+\sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}.$$
This is a causal stationary process.
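To make the expansion concrete, the $\psi$-weights can be computed by matching coefficients in $\phi(z)\psi(z)=\theta(z)$, which gives the recursion $\psi_j=\theta_j+\sum_k\phi_k\psi_{j-k}$. A minimal numpy sketch (the function name `psi_weights` is my own):

```python
import numpy as np

def psi_weights(phi, theta, m):
    """First m coefficients psi_0, ..., psi_{m-1} of the causal expansion
    y_t = mu + sum_j psi_j eps_{t-j}, found by matching coefficients in
    phi(z) psi(z) = theta(z): psi_j = theta_j + sum_k phi_k psi_{j-k}."""
    phi = np.asarray(phi, dtype=float)      # AR coefficients phi_1..phi_p
    theta = np.asarray(theta, dtype=float)  # MA coefficients theta_1..theta_q
    psi = np.zeros(m)
    psi[0] = 1.0
    for j in range(1, m):
        val = theta[j - 1] if j <= len(theta) else 0.0
        for k in range(1, min(j, len(phi)) + 1):
            val += phi[k - 1] * psi[j - k]
        psi[j] = val
    return psi

# AR(1) with phi_1 = 0.7: psi_j = 0.7**j, decaying geometrically
print(psi_weights([0.7], [], 5))   # [1.0, 0.7, 0.49, 0.343, 0.2401]
```

For an MA($q$) model the weights cut off after lag $q$; for any causal ARMA they decay geometrically, which is what makes the process stationary.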

Since neither the ACF nor the PACF of an ARMA$(p,q)$ process has a clear cutoff, we usually use AIC or BIC to determine $p$ and $q$.

2 Box-Jenkins Time Series Modeling Strategy

The idea is:

1. Transform the data into an (approximately) stationary time series.
2. Fit an ARMA model to the transformed series.

To implement the first idea, we usually have two ways: differencing the series, or estimating trend/seasonal components explicitly and removing them.

3 ARIMA Models

ARIMA (AutoRegressive Integrated Moving Average)

$\{y_t\}$ is ARIMA$(p,d,q)$ if $\phi(B)(\nabla^dy_t-\mu)=\theta(B)\varepsilon_t$, where $\varepsilon_t\overset{\text{i.i.d.}}{\sim}N(0,\sigma^2)$ and $\nabla^d=(1-B)^d$ is the $d$-th order differencing operator.
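As a quick check on the role of $\nabla$: applying the difference operator $d$ times removes a polynomial trend of degree $d$ exactly. A minimal numpy sketch with a quadratic trend:

```python
import numpy as np

# Differencing operator: nabla y_t = (1 - B) y_t = y_t - y_{t-1}.
t = np.arange(20, dtype=float)
y = 3.0 * t**2 + 2.0 * t + 1.0   # deterministic quadratic trend

d1 = np.diff(y)                  # nabla y_t: linear in t (6t - 1)
d2 = np.diff(y, n=2)             # nabla^2 y_t: constant 2 * 3.0 = 6.0
print(d2)
```

In practice $y_t$ is a quadratic trend plus a stationary disturbance, and $\nabla^2y_t$ is then the (differenced) stationary part plus the constant, to which an ARMA model can be fitted.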

3.1 Seasonal ARMA Models

We say $\{y_t\}$ is a seasonal ARMA$(P,Q)$ with period $s$ if $\Phi(B^s)(y_t-\mu)=\Theta(B^s)\varepsilon_t$, where
$$\Phi(B^s)=1-\Phi_1B^s-\cdots-\Phi_PB^{Ps},\qquad\Theta(B^s)=1+\Theta_1B^s+\cdots+\Theta_QB^{Qs}.$$
This is a special case of an ARMA$(Ps,Qs)$ model, but it has only $P+Q+1$ parameters (one of them $\sigma^2$), while a general ARMA$(Ps,Qs)$ has $Ps+Qs+1$.

The ACF and PACF of a seasonal ARMA process are non-zero only at the seasonal lags $h=0,s,2s,\dots$. At those lags, the ACF and PACF behave just like those of the non-seasonal ARMA model $\Phi(B)X_t=\Theta(B)\varepsilon_t$.
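This can be verified from the theoretical ACF. For a seasonal MA(1) with period $s=12$, $y_t=\varepsilon_t+\Theta\varepsilon_{t-12}$, the $\psi$-weights are non-zero only at lags $0$ and $12$, so $\gamma(h)=\sigma^2\sum_j\psi_j\psi_{j+h}$ vanishes except at $h=0,12$. A numpy sketch (parameter value is illustrative):

```python
import numpy as np

def ma_acf(psi, max_lag):
    """ACF of an MA process y_t = sum_j psi_j eps_{t-j},
    from gamma(h) = sigma^2 * sum_j psi_j psi_{j+h}."""
    psi = np.asarray(psi, dtype=float)
    gamma = np.array([psi[:len(psi) - h] @ psi[h:] for h in range(max_lag + 1)])
    return gamma / gamma[0]

# Seasonal MA(1), period 12: y_t = eps_t + 0.6 eps_{t-12}
Theta = 0.6
psi = np.zeros(13)
psi[0], psi[12] = 1.0, Theta
rho = ma_acf(psi, 13)
# rho vanishes at lags 1..11 and 13; rho[12] = Theta / (1 + Theta^2),
# exactly the lag-1 ACF of a non-seasonal MA(1) moved to lag 12
```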

3.2 Multiplicative Seasonal ARMA Models

Multiplicative Seasonal ARMA Model

ARMA$(p,q)\times(P,Q)_s$: $\Phi(B^s)\phi(B)(y_t-\mu)=\Theta(B^s)\theta(B)\varepsilon_t$.

For a dataset whose sample autocorrelations are non-negligible at lags $0,1,11,12,13$ (like the co2 dataset), we can use this structure to reduce the number of parameters.
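To see why the multiplicative form matches that pattern, multiply out the MA polynomials: $(1+\Theta B^{12})(1+\theta B)$ has non-zero coefficients at lags $0,1,12,13$, and the resulting ACF is non-zero exactly at lags $1,11,12,13$. A numpy sketch with illustrative parameter values:

```python
import numpy as np

# Multiplicative MA part: Theta(B^12) * theta(B) = (1 + 0.4 B^12)(1 + 0.5 B)
theta_poly = np.array([1.0, 0.5])                 # theta(B), coefficients by lag
Theta_poly = np.zeros(13)
Theta_poly[0], Theta_poly[12] = 1.0, 0.4          # Theta(B^12)
psi = np.convolve(theta_poly, Theta_poly)         # non-zero at lags 0, 1, 12, 13

# ACF of the resulting MA(13): gamma(h) = sigma^2 sum_j psi_j psi_{j+h}
gamma = np.array([psi[:len(psi) - h] @ psi[h:] for h in range(15)])
rho = gamma / gamma[0]
# rho is non-zero exactly at lags 1, 11, 12, 13 -- the co2-style pattern,
# captured with only two MA parameters instead of thirteen
```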

4 SARIMA Models

SARIMA model

ARIMA$(p,d,q)\times(P,D,Q)_s$: $\Phi(B^s)\phi(B)\nabla_s^D\nabla^d(y_t-\mu)=\delta+\Theta(B^s)\theta(B)\varepsilon_t$. Recall $\nabla_s^D=(1-B^s)^D$ and $\nabla^d=(1-B)^d$.
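To see the two differencing operators working together, here is a small numpy sketch on a synthetic series (made up for illustration) with a linear trend plus a period-12 cycle; $\nabla$ removes the trend and $\nabla_{12}$ removes the remaining periodic component:

```python
import numpy as np

# Synthetic series: linear trend + exactly periodic seasonal pattern
t = np.arange(60)
seasonal = np.tile(np.sin(2 * np.pi * np.arange(12) / 12), 5)
y = 0.3 * t + 10.0 + 5.0 * seasonal

dy = y[1:] - y[:-1]          # nabla y_t: constant + seasonal increments
ddy = dy[12:] - dy[:-12]     # nabla_12 nabla y_t: identically zero here
print(np.max(np.abs(ddy)))
```

With noisy data, $\nabla_{12}\nabla y_t$ would instead be a stationary series to which the ARMA$(p,q)\times(P,Q)_{12}$ part is fitted.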

5 Parameter Estimation in MA(1)

Estimating the parameters of ARMA/ARIMA/SARIMA models is much harder than for AR models. We will illustrate the difficulty using the example of MA(1). Recall that the MA(1) model is given by
$$y_t=\mu+\varepsilon_t+\theta\varepsilon_{t-1}=\mu+\theta(B)\varepsilon_t,\qquad\theta(B)=1+\theta B,\qquad\varepsilon_t\overset{\text{i.i.d.}}{\sim}N(0,\sigma^2).\tag{5.1}$$
The joint density of $y_1,\dots,y_n$ is multivariate normal with mean $m=(\mu,\dots,\mu)^T$ and covariance matrix $\Sigma$ given by
$$\Sigma(i,j)=\begin{cases}\sigma^2(1+\theta^2),&i=j,\\\sigma^2\theta,&|i-j|=1,\\0,&\text{otherwise}.\end{cases}$$
The likelihood is
$$\left(\frac{1}{\sqrt{2\pi}}\right)^n(\det\Sigma)^{-\frac12}\exp\left(-\frac12(y-m)^T\Sigma^{-1}(y-m)\right),$$
where $y=(y_1,\dots,y_n)^T$. This is a function of $\mu,\theta,\sigma$, which can be estimated by maximizing the logarithm of the likelihood. However, computing $\Sigma^{-1}$ makes this expensive, so we look for an approach that avoids the matrix inverse.
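As an aside, for MA(1) the tridiagonal structure of $\Sigma$ already lets us evaluate the exact Gaussian log-likelihood without forming $\Sigma^{-1}$, via a banded Cholesky factorization. A scipy sketch (the function name `ma1_loglik` is my own):

```python
import numpy as np
from scipy.linalg import cholesky_banded, cho_solve_banded

def ma1_loglik(y, mu, theta, sigma):
    """Exact Gaussian log-likelihood of MA(1) using the tridiagonal
    structure of Sigma (banded Cholesky; no dense inverse is formed)."""
    n = len(y)
    # Sigma in upper banded storage: row 0 = superdiagonal, row 1 = diagonal
    ab = np.zeros((2, n))
    ab[1, :] = sigma**2 * (1 + theta**2)
    ab[0, 1:] = sigma**2 * theta
    c = cholesky_banded(ab)                       # banded Cholesky factor
    r = y - mu
    quad = r @ cho_solve_banded((c, False), r)    # r^T Sigma^{-1} r
    logdet = 2.0 * np.sum(np.log(c[1, :]))        # log det Sigma from the factor
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
```

This costs $O(n)$ rather than the $O(n^3)$ of a dense solve, but it does not generalize easily beyond low-order MA models, which motivates the AR-based approximation below.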
An alternative approach is to find a connection to AR models. We can invert the MA polynomial,
$$\varepsilon_t=\frac{1}{\theta(B)}(y_t-\mu)=(1-\theta B+\theta^2B^2-\theta^3B^3+\cdots)(y_t-\mu),$$
so that
$$y_t-\theta y_{t-1}+\theta^2y_{t-2}-\theta^3y_{t-3}+\cdots=\frac{\mu}{1+\theta}+\varepsilon_t.$$
This requires $|\theta|<1$. For this AR model, the likelihood is
$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n\exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^n\left(y_t-\frac{\mu}{1+\theta}-\theta y_{t-1}+\theta^2y_{t-2}-\theta^3y_{t-3}+\cdots\right)^2\right).$$
This involves $y_0,y_{-1},y_{-2},\dots$, for which we have no data. We can simply set them to $0$. The likelihood then becomes
$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n\exp\left(-\frac{S(\mu,\theta)}{2\sigma^2}\right),$$
where
$$S(\mu,\theta)=\left(y_1-\frac{\mu}{1+\theta}\right)^2+\left(y_2-\frac{\mu}{1+\theta}-\theta y_1\right)^2+\cdots+\left(y_n-\frac{\mu}{1+\theta}-\theta y_{n-1}+\theta^2y_{n-2}-\cdots+(-1)^{n-1}\theta^{n-1}y_1\right)^2.$$
The MLE of $\mu,\theta$ comes from
$$\min_{\hat\mu,\hat\theta}S(\mu,\theta).$$
This is a nonlinear minimization that can be done with packages in Python such as scipy. It is easy to see that $\hat\sigma=\sqrt{S(\hat\mu,\hat\theta)/n}$.
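A sketch of this minimization with `scipy.optimize`. The recursion $p_t=y_t-\theta p_{t-1}$ (with $p_0=0$) accumulates the truncated series $y_t-\theta y_{t-1}+\theta^2y_{t-2}-\cdots$, so each residual is $e_t=p_t-\mu/(1+\theta)$; the simulated data and starting point are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def S(params, y):
    """Truncated sum of squares S(mu, theta) for MA(1),
    with pre-sample values y_0, y_{-1}, ... set to 0 as in the text."""
    mu, theta = params
    e = np.empty(len(y))
    p = 0.0                          # p_t = y_t - theta * p_{t-1}
    for t in range(len(y)):
        p = y[t] - theta * p
        e[t] = p - mu / (1.0 + theta)
    return np.sum(e**2)

# Simulated MA(1) data: mu = 1, theta = 0.5, sigma = 1 (illustrative only)
rng = np.random.default_rng(0)
eps = rng.standard_normal(501)
y = 1.0 + eps[1:] + 0.5 * eps[:-1]

res = minimize(S, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
mu_hat, theta_hat = res.x
sigma_hat = np.sqrt(res.fun / len(y))
```

In practice one should keep the search inside $|\theta|<1$ (e.g. via bounds or a reparametrization), since $S$ blows up as $\theta\to-1$.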

For uncertainty quantification, we can take a Bayesian approach. First assume the priors $\theta\sim\mathrm{Uniform}(-1,1)$, $\mu\sim\mathrm{Uniform}(-C,C)$, and $\log\sigma\sim\mathrm{Uniform}(-C,C)$ for a large $C$. Note that we restrict $|\theta|<1$.
The posterior is then
$$f_{\mu,\theta,\sigma\mid\text{data}}(\mu,\theta,\sigma)\propto\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n\exp\left(-\frac{S(\mu,\theta)}{2\sigma^2}\right)\times\frac{1}{\sigma}\mathbf{1}\{-1<\theta<1,\,-C<\mu,\log\sigma<C\}\propto\sigma^{-n-1}\exp\left(-\frac{S(\mu,\theta)}{2\sigma^2}\right)\mathbf{1}\{-1<\theta<1,\,-C<\mu,\log\sigma<C\}.$$
To obtain the posterior of $\mu,\theta$ alone, we integrate the above with respect to $\sigma$. Then we have
$$f_{\mu,\theta\mid\text{data}}(\mu,\theta)\propto\left(\frac{1}{S(\mu,\theta)}\right)^{n/2}\mathbf{1}\{-1<\theta<1,\,-C<\mu<C\}.$$
This can be evaluated numerically over a grid of $\mu,\theta$ and then normalized. Alternatively, we can approximate it with a suitable $t$ distribution by doing a Taylor expansion of $S(\mu,\theta)$ near $\hat\mu,\hat\theta$. I.e., let $\alpha=(\mu,\theta)$ and $\hat\alpha=(\hat\mu,\hat\theta)$:
$$S(\alpha)\approx S(\hat\alpha)+\langle\nabla S(\hat\alpha),\alpha-\hat\alpha\rangle+(\alpha-\hat\alpha)^T\left(\tfrac12H_S(\hat\alpha)\right)(\alpha-\hat\alpha)=S(\hat\alpha)+(\alpha-\hat\alpha)^T\left(\tfrac12H_S(\hat\alpha)\right)(\alpha-\hat\alpha),$$
where we used $\nabla S(\hat\alpha)=0$ because $\hat\alpha$ is a minimizer, and $H_S(\hat\alpha)$ is the Hessian of $S$. Therefore, writing $\mathbf{1}\{\cdots\}$ for the indicator $\mathbf{1}\{-1<\theta<1,\,-C<\mu<C\}$,
$$f_{\mu,\theta\mid\text{data}}(\mu,\theta)\propto\left(\frac{1}{S(\mu,\theta)}\right)^{n/2}\mathbf{1}\{\cdots\}\propto\left(\frac{S(\hat\alpha)}{S(\alpha)}\right)^{n/2}\mathbf{1}\{\cdots\}$$
$$\approx\left(\frac{S(\hat\alpha)}{S(\hat\alpha)+(\alpha-\hat\alpha)^T\left(\tfrac12H_S(\hat\alpha)\right)(\alpha-\hat\alpha)}\right)^{n/2}\mathbf{1}\{\cdots\}=\left(\frac{1}{1+(\alpha-\hat\alpha)^T\left(\frac{1}{2S(\hat\alpha)}H_S(\hat\alpha)\right)(\alpha-\hat\alpha)}\right)^{n/2}\mathbf{1}\{\cdots\}$$
$$=\left(\frac{1}{1+\frac{1}{n-2}(\alpha-\hat\alpha)^T\left(\frac{n-2}{2S(\hat\alpha)}H_S(\hat\alpha)\right)(\alpha-\hat\alpha)}\right)^{\frac{(n-2)+2}{2}}\mathbf{1}\{\cdots\}.$$
Comparing with
$$\left(\frac{1}{1+\frac{1}{k}(x-m)^T\Sigma^{-1}(x-m)}\right)^{\frac{k+p}{2}}$$
for the $p$-variate $t$-density $t_{k,p}(m,\Sigma)$ with $k$ degrees of freedom, we see that, approximately,
$$\alpha\mid\text{data}\;\sim\;t_{n-2,2}\!\left(\hat\alpha,\;\frac{S(\hat\alpha)}{n-2}\left(\tfrac12H_S(\hat\alpha)\right)^{-1}\right).$$
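The grid-evaluation route mentioned above can be sketched directly: evaluate $S(\mu,\theta)^{-n/2}$ on a grid inside $|\theta|<1$, normalize, and read off posterior summaries. A numpy sketch (the toy series and grid ranges are made up for illustration):

```python
import numpy as np

def S(mu, theta, y):
    # truncated sum of squares S(mu, theta) for MA(1) (pre-sample y's = 0)
    p, total = 0.0, 0.0
    for yt in y:
        p = yt - theta * p
        total += (p - mu / (1.0 + theta)) ** 2
    return total

# Toy observed series; posterior on a grid over (mu, theta)
y = np.array([0.3, 1.5, 0.8, -0.2, 1.1, 0.9, 0.4, 1.6, 0.7, 0.1])
n = len(y)
mus = np.linspace(-2, 2, 81)
thetas = np.linspace(-0.95, 0.95, 77)   # stay strictly inside |theta| < 1

# log posterior = -(n/2) log S(mu, theta), up to an additive constant
logpost = np.array([[-0.5 * n * np.log(S(m, th, y)) for th in thetas]
                    for m in mus])
post = np.exp(logpost - logpost.max())  # subtract max for numerical stability
post /= post.sum()                      # normalize over the grid

# posterior means as grid-weighted averages of the marginals
mu_mean = np.sum(post.sum(axis=1) * mus)
theta_mean = np.sum(post.sum(axis=0) * thetas)
```

Credible intervals can be read off the same grid by accumulating the marginal weights, which gives a useful check on the $t$ approximation derived above.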