13 Stationarity, MA Models, ACF & PACF

1 Time Series Models, Stationarity

Examples

  1. Let $y_t=\beta_0+\beta_1 t+\varepsilon_t$, where $\varepsilon_t\overset{\text{i.i.d.}}{\sim}N(0,\sigma^2)$. Then $Ey_t=\beta_0+\beta_1 t$, $\mathrm{Var}(y_t)=\sigma^2$, and $\mathrm{Cov}(y_{t_1},y_{t_2})=0$ for $t_1\neq t_2$.
  2. Let $y_t=\beta_0+\beta_1\cos(2\pi f t)+\beta_2\sin(2\pi f t)+\varepsilon_t$. Then $Ey_t=\beta_0+\beta_1\cos(2\pi f t)+\beta_2\sin(2\pi f t)$, $\mathrm{Var}(y_t)=\sigma^2$, and $\mathrm{Cov}(y_{t_1},y_{t_2})=0$ for $t_1\neq t_2$.
  3. Let $y_t=\beta_0+\sum_{j=1}^m\left(\beta_{1j}\cos\frac{2\pi jt}{n}+\beta_{2j}\sin\frac{2\pi jt}{n}\right)$, with $\beta_{1j},\beta_{2j}\overset{\text{i.i.d.}}{\sim}N(0,\tau_j^2)$. So $Ey_t=\beta_0$, and
     $$\mathrm{Cov}(y_{t_1},y_{t_2})=\sum_{j=1}^m\left[\mathrm{Cov}\!\left(\beta_{1j}\cos\tfrac{2\pi jt_1}{n},\,\beta_{1j}\cos\tfrac{2\pi jt_2}{n}\right)+\mathrm{Cov}\!\left(\beta_{2j}\sin\tfrac{2\pi jt_1}{n},\,\beta_{2j}\sin\tfrac{2\pi jt_2}{n}\right)\right]=\sum_{j=1}^m\tau_j^2\left[\cos\tfrac{2\pi jt_1}{n}\cos\tfrac{2\pi jt_2}{n}+\sin\tfrac{2\pi jt_1}{n}\sin\tfrac{2\pi jt_2}{n}\right]=\sum_{j=1}^m\tau_j^2\cos\frac{2\pi j|t_1-t_2|}{n}.$$
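The last equality in Example 3 uses the identity $\cos a\cos b+\sin a\sin b=\cos(a-b)$, together with the evenness of cosine. A quick numerical sanity check in NumPy (random test points, arbitrary seed):

```python
import numpy as np

# Verify cos(a)*cos(b) + sin(a)*sin(b) == cos(a - b), the identity
# that collapses the covariance sum in Example 3.
rng = np.random.default_rng(0)
a, b = rng.uniform(0, 2 * np.pi, size=(2, 1000))

lhs = np.cos(a) * np.cos(b) + np.sin(a) * np.sin(b)
rhs = np.cos(a - b)
print(np.allclose(lhs, rhs))  # True
```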

Stationarity

A doubly infinite sequence of random variables $\{y_t\}$ is said to be stationary if all of the following conditions hold:
  1. $Ey_t$ is the same for all times $t$.
  2. $\mathrm{Var}(y_t)$ is the same for all times $t$.
  3. $\mathrm{Cov}(y_{t_1},y_{t_2})$ depends only on $|t_1-t_2|$.

For a stationary $\{y_t\}$, we can define $\gamma(h)=\mathrm{Cov}(y_t,y_{t+h})$. Call $\gamma(h)$ the ACVF (AutoCovariance Function). Observe that $\gamma(0)=\mathrm{Cov}(y_t,y_t)=\mathrm{Var}(y_t)$ and $\gamma(-h)=\gamma(h)$, so $\gamma(h)$ is a symmetric function of $h$, and we need only consider nonnegative $h$.

Define the ACF (AutoCorrelation Function):
$$\rho(h)=\frac{\mathrm{Cov}(y_t,y_{t+h})}{\sqrt{\mathrm{Var}(y_t)\,\mathrm{Var}(y_{t+h})}}=\frac{\gamma(h)}{\gamma(0)}.$$
So $\rho(0)=1$ and $\rho(-h)=\rho(h)$.


(Gaussian) White Noise Model

$y_t=\varepsilon_t$, $\varepsilon_t\overset{\text{i.i.d.}}{\sim}N(0,\sigma^2)$. It is easy to check that $Ey_t=0$, $\gamma(h)=\sigma^2\,\mathbf{1}\{h=0\}$, and $\rho(h)=\mathbf{1}\{h=0\}$.
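A quick simulation check of these white-noise moments (a sketch; the sample size, $\sigma$, and seed are arbitrary):

```python
import numpy as np

# Simulate Gaussian white noise and check E y_t = 0, gamma(0) = sigma^2,
# and gamma(1) ~ 0, up to sampling error.
rng = np.random.default_rng(1)
sigma = 2.0
y = rng.normal(0.0, sigma, size=100_000)

gamma0 = y.var()
gamma1 = np.mean((y[:-1] - y.mean()) * (y[1:] - y.mean()))
print(gamma0)  # close to sigma^2 = 4
print(gamma1)  # close to 0
```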

2 MA Models

The Moving Average model of order $q$, denoted MA($q$), is defined by
$$y_t=\mu+\varepsilon_t+\theta_1\varepsilon_{t-1}+\cdots+\theta_q\varepsilon_{t-q},\tag{2.1}$$
where $\varepsilon_t\overset{\text{i.i.d.}}{\sim}N(0,\sigma^2)$. There are $q+2$ unknown parameters: $\mu,\theta_1,\ldots,\theta_q,\sigma$.

For the MA model (taking $\theta_0=1$),
$$\mathrm{Cov}(y_t,y_{t+h})=\mathrm{Cov}\!\left(\mu+\sum_{j=0}^q\theta_j\varepsilon_{t-j},\ \mu+\sum_{k=0}^q\theta_k\varepsilon_{t+h-k}\right)=\sum_{j=0}^q\sum_{k=0}^q\theta_j\theta_k\,\mathrm{Cov}(\varepsilon_{t-j},\varepsilon_{t+h-k}).$$
Since $\{\varepsilon_t\}$ is Gaussian white noise, $\mathrm{Cov}(\varepsilon_{t-j},\varepsilon_{t+h-k})=0$ unless $t-j=t+h-k$, i.e. $k=j+h$. So we need $0\le j\le q$, $0\le k\le q$, $k=j+h$. Then
$$\mathrm{Cov}(y_t,y_{t+h})=\begin{cases}\sigma^2\sum_{j=0}^{q-h}\theta_j\theta_{j+h},&0\le h\le q,\\[2pt]0,&h>q.\end{cases}$$
This does not depend on $t$, so MA($q$) is stationary, $\mathrm{Cov}(y_t,y_{t+h})=\gamma(h)$, and
$$\rho(h)=\begin{cases}\dfrac{\sum_{j=0}^{q-h}\theta_j\theta_{j+h}}{\sum_{j=0}^{q}\theta_j^2},&0\le h\le q,\\[2pt]0,&h>q.\end{cases}$$
For MA(1), $y_t=\mu+\varepsilon_t+\theta_1\varepsilon_{t-1}$, and
$$\rho(h)=\begin{cases}1,&h=0,\\[2pt]\dfrac{\theta_1}{1+\theta_1^2},&h=1,\\[2pt]0,&h>1.\end{cases}$$
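As a sanity check, we can simulate an MA(1) series and compare its sample ACF at lags 1 and 2 with the formula above (a sketch in NumPy; the parameter values and seed are arbitrary):

```python
import numpy as np

# Simulate y_t = mu + eps_t + theta*eps_{t-1} and compare the sample ACF
# at lag 1 with the theoretical value theta / (1 + theta^2).
rng = np.random.default_rng(2)
mu, theta, n = 1.0, 0.6, 200_000
eps = rng.normal(size=n + 1)
y = mu + eps[1:] + theta * eps[:-1]

yc = y - y.mean()
r1 = np.sum(yc[:-1] * yc[1:]) / np.sum(yc**2)
r2 = np.sum(yc[:-2] * yc[2:]) / np.sum(yc**2)
print(theta / (1 + theta**2))  # 0.44117...
print(r1)                      # close to the value above
print(r2)                      # close to 0: the ACF cuts off after lag q = 1
```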

3 Sample ACF

For fixed $h$, the sample ACF at lag $h$ is defined as the sample correlation of the pairs $(y_t,y_{t+h})$, $t=1,\ldots,n-h$:
$$\frac{\sum_{t=1}^{n-h}(y_t-a)(y_{t+h}-b)}{\sqrt{\sum_{t=1}^{n-h}(y_t-a)^2\sum_{t=1}^{n-h}(y_{t+h}-b)^2}},\qquad\text{where } a=\frac{1}{n-h}\sum_{t=1}^{n-h}y_t,\quad b=\frac{1}{n-h}\sum_{t=1}^{n-h}y_{t+h}.$$
We can simplify by $a\approx\bar y$, $b\approx\bar y$, and $\sum_{t=1}^{n-h}(y_t-a)^2\approx\sum_{t=1}^{n}(y_t-\bar y)^2$, $\sum_{t=1}^{n-h}(y_{t+h}-b)^2\approx\sum_{t=1}^{n}(y_t-\bar y)^2$ (reasonable when $h$ is small compared to $n$). Then define the sample ACF:
$$r_h=\frac{\sum_{t=1}^{n-h}(y_t-\bar y)(y_{t+h}-\bar y)}{\sum_{t=1}^{n}(y_t-\bar y)^2},\qquad h=0,1,2,\ldots$$
Note that $r_0=1$.
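A direct NumPy implementation of $r_h$ (a sketch; the function name `sample_acf` is ours, not from any library):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample ACF: r_h = sum_{t=1}^{n-h} (y_t - ybar)(y_{t+h} - ybar)
    divided by sum_{t=1}^{n} (y_t - ybar)^2, for h = 0, ..., max_lag."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    denom = np.sum(yc**2)
    return np.array([np.sum(yc[: len(y) - h] * yc[h:]) / denom
                     for h in range(max_lag + 1)])

r = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2)
print(r[0])  # 1.0, as noted above
print(r[1])  # 0.4 for this toy series
```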

Although the sample ACF can be computed for any time series, it is only meaningful for stationary ones.

The sample ACF is useful in determining $q$ for an MA($q$) model: the sample ACF beyond lag $q$ is very small/close to 0.

4 Sample PACF

Define the sample PACF (Partial AutoCorrelation Function) at lag $h$ as $\hat\phi_h$, the estimated coefficient of $y_{t-h}$ when an AR($h$) model is fit to the data.

The sample PACF is useful in determining $p$ in AR($p$): the sample PACF beyond lag $p$ is very small/close to 0.

Why PACF? Suppose we have data $(x_1,y_1),\ldots,(x_n,y_n)$. The sample correlation is
$$\mathrm{Corr}(x,y)=\frac{\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum_{i=1}^n(x_i-\bar x)^2\sum_{i=1}^n(y_i-\bar y)^2}}.$$
Under the usual OLS,
$$\hat\beta_1=\frac{\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n(x_i-\bar x)^2}=\mathrm{Corr}(x,y)\sqrt{\frac{\mathrm{Var}(y)}{\mathrm{Var}(x)}}.$$
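This identity between the OLS slope and the correlation is easy to verify numerically (a sketch; the simulated data and coefficients are arbitrary):

```python
import numpy as np

# Check beta1_hat = Corr(x, y) * sqrt(Var(y) / Var(x)) on simulated data,
# using np.polyfit for the OLS slope.
rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = 2.0 + 1.5 * x + rng.normal(size=500)

slope = np.polyfit(x, y, 1)[0]                 # OLS slope of y on x
r = np.corrcoef(x, y)[0, 1]                    # sample correlation
identity = r * np.sqrt(y.var() / x.var())      # right-hand side
print(abs(slope - identity) < 1e-8)  # True
```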
Now suppose we also have data on other variables $z_1,\ldots,z_k$, so the dataset becomes $(y_i,x_i,z_{i1},\ldots,z_{ik})$, $i=1,\ldots,n$. The partial correlation between $x$ and $y$ given $z_1,\ldots,z_k$ is $\mathrm{Corr}(x,y\mid z_1,\ldots,z_k)$: define the residual of $x$ given $z_1,\ldots,z_k$ as the residual in the linear regression of $x$ on $z_1,\ldots,z_k$,
$$e_i^{x\mid z_1,\ldots,z_k}=x_i-\hat\beta_0^x-\hat\beta_1^x z_{i1}-\cdots-\hat\beta_k^x z_{ik},$$
and
$$\mathrm{Corr}(x,y\mid z_1,\ldots,z_k)=\mathrm{Corr}\!\left(e^{x\mid z_1,\ldots,z_k},\,e^{y\mid z_1,\ldots,z_k}\right).$$
And for the multiple linear regression of $y$ on $x,z_1,\ldots,z_k$, denote the coefficients (RSS minimizer) by $\hat\beta_0,\hat\beta_x,\hat\beta_1,\ldots,\hat\beta_k$; then
$$\hat\beta_x=\mathrm{Corr}(x,y\mid z_1,\ldots,z_k)\sqrt{\frac{\mathrm{Var}(e^{y\mid z_1,\ldots,z_k})}{\mathrm{Var}(e^{x\mid z_1,\ldots,z_k})}}.$$
Now, in the time series setting with $y_1,\ldots,y_n$ and an AR($p$) fit, we can write
$$\hat\phi_p=\mathrm{Corr}(y_{t-p},y_t\mid y_{t-1},\ldots,y_{t-p+1})\sqrt{\frac{\mathrm{Var}(e^{y_t\mid y_{t-1},\ldots,y_{t-p+1}})}{\mathrm{Var}(e^{y_{t-p}\mid y_{t-1},\ldots,y_{t-p+1}})}}.$$
When the AR($p$) process is stationary, the two residual variances are approximately equal, so $\hat\phi_p\approx\mathrm{Corr}(y_{t-p},y_t\mid y_{t-1},\ldots,y_{t-p+1})$.
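On simulated AR(2) data, we can check that the last coefficient of an OLS AR(2) fit and the partial correlation of $y_t$ and $y_{t-2}$ given $y_{t-1}$ are indeed close (a sketch; the model parameters and seed are arbitrary):

```python
import numpy as np

# Sample PACF at lag 2, two ways, on a simulated stationary AR(2) series:
#  (i) the last coefficient phi_hat_2 from an OLS fit of AR(2);
# (ii) the correlation of the residuals of y_t and y_{t-2}
#      after regressing each on the intermediate lag y_{t-1}.
rng = np.random.default_rng(4)
n = 5000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

y0, y1, y2 = y[2:], y[1:-1], y[:-2]  # y_t, y_{t-1}, y_{t-2}

# (i) OLS fit of y_t on an intercept, y_{t-1}, and y_{t-2}
X = np.column_stack([np.ones_like(y0), y1, y2])
phi_hat = np.linalg.lstsq(X, y0, rcond=None)[0][2]

# (ii) partial correlation of y_t and y_{t-2} given y_{t-1}
def resid(v, z):
    Z = np.column_stack([np.ones_like(z), z])
    return v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]

pcorr = np.corrcoef(resid(y0, y1), resid(y2, y1))[0, 1]
print(phi_hat)  # close to the true phi_2 = -0.3
print(pcorr)    # close to phi_hat, as the residual variances nearly agree
```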