R packages:
# Install necessary packages
list_packages <- c("forecast", "readxl", "stargazer", "fpp",
"fpp2", "scales", "quantmod", "urca",
"vars", "tseries", "ggplot2", "dplyr")
new_packages <- list_packages[!(list_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
# Load necessary packages
lapply(list_packages, require, character.only = TRUE)
After fitting a model, the next step is forecasting.
Before we introduce the forecast formulas, let's go over some notation.
\(z_n\): last observation, \(z_1,...,z_n\) are observed
\(t=n\): forecast origin
\(z_{n+l}\): value of the time series at lead time \(l\), not observed
\(\hat{z}_{n}(l)\): forecast of \(z_{n+l}\) at forecast origin \(n\).
\(\mathbb{E}_n[\cdot]\): conditional expectation given all observations up to and including the forecast origin \(n\), i.e., \(\mathbb{E}[\cdot|z_n,z_{n-1},...]\).
Definition: The best forecast of \(z_{n+l}\) is the forecast \(\hat{z}_{n}(l)\) that minimizes the mean squared error, \(\mathbb{E}_n[(Z_{n+l} - \hat{z}_{n}(l))^2]\).
Based on this definition, we can prove that the conditional mean \(\mathbb{E}_n[Z_{n+l}]=\mathbb{E}[Z_{n+l}|z_n,z_{n-1},...]\) is the best forecast of \(z_{n+l}\).
So from now on, we take our forecast to be \(\hat{z}_{n}(l) = \mathbb{E}_n[Z_{n+l}]=\mathbb{E}[Z_{n+l}|z_n,z_{n-1},...]\).
Forecast of AR(1)
Consider an AR(1) model, written in deviations from the mean, \(\tilde{Z}_t = Z_t - \mu\):
\[\begin{align*} \tilde{Z}_t &= \phi_1 \tilde{Z}_{t-1} + a_t\\ Z_{n+1} &= \mu + \phi_1 (Z_{n} - \mu ) + a_{n+1} \end{align*}\] We observe \(z_1, ..., z_n\). Our forecast is \[\begin{align*} \hat{z}_{n}(1) &= \mathbb{E}_n [ Z_{n+1} ]\\ & = \mu + \phi_1 (\mathbb{E}_n [Z_n] - \mu ) + \mathbb{E}_n[a_{n+1}]\\ & = \mu + \phi_1 (z_n - \mu ) + 0 \end{align*}\]
By assumption, \(\mathbb{E}_n[a_{n+1}]=0\), because \(a_{n+1}\) occurs after \(z_n, z_{n-1},...\) and is independent of them.
\[\begin{align*} \hat{z}_{n}(1) &= \mu + \phi_1 (z_n - \mu ) \end{align*}\]
Since \(\mu\) and \(\phi_1\) are unknown, we replace them by their estimates \(\hat{\mu}\) and \(\hat{\phi}_1\), and our final forecast is
\[\begin{align*} \hat{z}_{n}(1) &= \hat{\mu} + \hat{\phi}_1 (z_n - \hat{\mu} ) \end{align*}\]
Similarly, for \(l = 2\), \[\begin{align*} \hat{z}_{n}(2) &= \mathbb{E}_n [ Z_{n+2} ]\\ & = \mu + \phi_1 (\mathbb{E}_n [Z_{n+1}] - \mu ) + \mathbb{E}_n[a_{n+2}]\\ & = \mu + \phi_1 (\hat{z}_{n}(1) - \mu )\\ &= \mu + \phi_1^2 (z_n - \mu ) \end{align*}\]
In general,
\[\begin{align*} \hat{z}_{n}(l) &= \mu + \phi_1^l (z_n - \mu ) \end{align*}\]
for \(l=1,2,...\)
Note that for a stationary AR(1), \(\phi_1^l \to 0\) as \(l \to \infty\) because \(|\phi_1|<1\), so the forecasts decay toward the mean \(\mu\).
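As a quick numerical check, here is a minimal R sketch (simulated data, so all object names are ours, not from the case studies) that computes the plug-in forecast \(\hat{\mu} + \hat{\phi}_1^l (z_n - \hat{\mu})\) by hand and compares it with predict():
# Sketch: verify the AR(1) forecast formula on simulated data
set.seed(1)
z <- arima.sim(model = list(ar = 0.7), n = 200) + 5   # AR(1) around mean 5
fit <- arima(z, order = c(1, 0, 0))
phi_hat <- coef(fit)["ar1"]
mu_hat  <- coef(fit)["intercept"]   # for d = 0, 'intercept' is the estimated mean
z_n <- as.numeric(tail(z, 1))       # last observation z_n
l <- 1:10
by_hand <- mu_hat + phi_hat^l * (z_n - mu_hat)
cbind(by_hand, predict = as.numeric(predict(fit, n.ahead = 10)$pred))
# the two columns agree; as l grows, phi_hat^l -> 0 and the forecasts approach mu_hat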
Forecast of MA(1)
Now consider an MA(1) model:
\[ \tilde{Z}_t = a_t - \theta_1 a_{t-1} \]
Case 1: \(l>1\)
\[ Z_{n+l} = \mu + a_{n+l} - \theta_1 a_{n+l-1}\\ \hat{z}_{n}(l) = \mathbb{E}_n(Z_{n+l}) = \mu + 0 - 0 = \mu \]
Case 2: \(l=1\)
Since \(Z_{n+1} = \mu + a_{n+1} - \theta_1 a_{n}\), we have \[ \hat{z}_{n}(1) = \mathbb{E}_n(Z_{n+1}) = \mu + 0 - \theta_1 \mathbb{E}_n(a_{n}) \]
Since \(a_n\) is determined by the current and past observations \(z_n, z_{n-1}, \dots\), \(\mathbb{E}_n(a_n) = a_n\), which is not 0 in general. In practice we estimate it by the residual at time \(n\), \(\hat{a}_n = z_n - \hat{z}_{n-1}(1)\).
Therefore,
\[ \hat{z}_{n}(1) = \mu + 0 - \theta_1 \hat{a}_n \] To get an actual forecast, we plug in \(\hat{\mu}\) and \(\hat{\theta}_1\).
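The same behavior is easy to see numerically. A minimal sketch with simulated data; note that R's arima() and arima.sim() write the MA part with a plus sign, \(Z_t = \mu + a_t + \theta a_{t-1}\), so their ma1 is the negative of our \(\theta_1\):
# Sketch: MA(1) forecasts revert to the mean after one step
set.seed(2)
z <- arima.sim(model = list(ma = -0.5), n = 200) + 10
fit <- arima(z, order = c(0, 0, 1))
predict(fit, n.ahead = 3)$pred    # lead 1 uses the last residual; leads 2 and 3 equal
coef(fit)["intercept"]            # the estimated mean mu_hat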
A general forecast formula of ARIMA
Suppose \(Z_t \sim ARIMA(p,d,q)\), which can be viewed as a nonstationary \(ARMA(p+d,q)\) process:
\[\begin{align*} Z_t &= C + \Phi_1 Z_{t-1} + \dots + \Phi_{p+d} Z_{t-p-d} + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q} \end{align*}\]
At time \(n+l\),
\[\begin{align*} Z_{n+l} &= C + \Phi_1 Z_{n+l-1} + \dots + \Phi_{p+d} Z_{n+l-p-d} + a_{n+l} - \theta_1 a_{n+l-1} - \dots - \theta_q a_{n+l-q} \end{align*}\]
Taking conditional expectations given \(\mathcal{F}_n = \{z_n, z_{n-1}, \dots\}\), the information available at the forecast origin, gives the general formula \[\begin{align*} \hat{z}_{n}(l) &= \mathbb{E}[Z_{n+l}|\mathcal{F}_n] \\ &= C + \Phi_1 \mathbb{E}[Z_{n+l-1}|\mathcal{F}_n] + \dots + \Phi_{p+d} \mathbb{E}[Z_{n+l-p-d}|\mathcal{F}_n] + \\ &\quad \mathbb{E}[a_{n+l}|\mathcal{F}_n] - \theta_1 \mathbb{E}[a_{n+l-1}|\mathcal{F}_n] - \dots - \theta_q \mathbb{E}[a_{n+l-q}|\mathcal{F}_n] \end{align*}\]
\[\begin{align*} \mathbb{E}[Z_{s}|\mathcal{F}_n]=\begin{cases} z_s, \quad \quad \text{ if } s \leq n\\ \hat{z}_n(s-n), \quad \text{ if } s > n \end{cases} \end{align*}\]
\[\begin{align*} \mathbb{E}[a_{s}|\mathcal{F}_n]=\begin{cases} \hat{a}_s=z_s - \hat{z}_{s-1}(1), \quad \text{ if } s \leq n\\ 0, \quad \quad \text{ if } s >n \end{cases} \end{align*}\]
Define
\[\begin{align*} \hat{Z}_{n+l}=\begin{cases} \hat{z}_n(l), \quad \text{ if } l \geq 1\\ z_{n+l}, \quad \text{ if } l \leq 0 \end{cases} \end{align*}\]
\[\begin{align*} [a_{s}]=\begin{cases} \hat{a}_s, \quad \text{ if } s \leq n\\ 0, \quad \quad \text{ if } s > n \end{cases} \end{align*}\]
Then \[ \hat{z}_n(l) = C + \Phi_1 \hat{Z}_{n+l-1} + \dots + \Phi_{p+d} \hat{Z}_{n+l-p-d} - \theta_1 [a_{n+l-1}] - \dots - \theta_q [a_{n+l-q}] \]
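To see the recursion in action, here is a small R sketch that applies it to a fitted ARMA(1,1) and checks the result against predict(); the data are simulated and the names are ours. Recall that arima() uses the plus-sign MA convention, so we write + theta below where the text has \(-\theta_1\):
# Sketch: the general forecast recursion for an ARMA(1,1) fit
set.seed(3)
z <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 300) + 20
fit <- arima(z, order = c(1, 0, 1))
phi   <- coef(fit)["ar1"]
theta <- coef(fit)["ma1"]           # R's convention: Z_t = ... + a_t + theta a_{t-1}
mu    <- coef(fit)["intercept"]
a_n   <- as.numeric(tail(residuals(fit), 1))   # a_hat_n, the last residual
z_n   <- as.numeric(tail(z, 1))
z_hat <- numeric(5)
z_hat[1] <- mu + phi * (z_n - mu) + theta * a_n   # l = 1: the residual enters
for (l in 2:5)                                    # l > 1: future a's are replaced by 0
  z_hat[l] <- mu + phi * (z_hat[l - 1] - mu)
cbind(recursion = z_hat, predict = as.numeric(predict(fit, n.ahead = 5)$pred))
# the columns agree up to negligible initialization effects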
What happens to the forecast as the lead time \(l\) grows? Let's use AR(1) as an example.
\[\begin{align*} \hat{z}_n(l)&=\mathbb{E}_n[Z_{n+l}]\\ &=\mathbb{E}_n[\mu + \phi_1 (Z_{n+l-1}-\mu) + a_{n+l}]\\ &=\mathbb{E}_n[\mu + \phi_1^2 (Z_{n+l-2}-\mu) + \phi_1 a_{n+l-1} + a_{n+l}]\\ &\quad \vdots\\ &=\mathbb{E}_n[\mu + a_{n+l} + \phi_1 a_{n+l-1} + \phi_1^2 a_{n+l-2} + \dots ]\\ &=\mu + 0 + \dots + 0 + \phi_1^l a_n + \phi_1^{l+1} a_{n-1} + \dots \end{align*}\] Note that \(|\phi_1|<1\) for a stationary model, so \(\phi_1^l \to 0\) as \(l \to \infty\). Therefore, the limit of the forecast is
\[\begin{align*} \lim_{l \to \infty} \hat{z}_n(l) = \mu + \lim_{l \to \infty} \left(\phi_1^l \hat{a}_n + \phi_1^{l+1} \hat{a}_{n-1} + \dots\right) = \mu \end{align*}\]
Therefore, the forecast converges to the mean for the stationary time series model. This result holds for AR(1), and it is generally true for all stationary AR, MA, ARMA models.
The forecasts from a nonstationary time series model do not revert to a fixed mean.
Consider a random walk \(Z_t=Z_{t-1}+a_t\). We have
\[\begin{align*} \hat{z}_n(l)&= \begin{cases} z_n, \quad l=1\\ \hat{z}_n(l-1), \quad \quad l > 1 \end{cases} \end{align*}\]
Combining the cases, we have
\[ \hat{z}_n(l) = z_n \text{ for }l \geq 1 \]
Thus, different realizations of the same random walk model produce forecasts that converge to different values.
On the other hand, different realizations of the same stationary time series model always produce forecasts that converge to the same constant, \(\mu\).
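A minimal sketch of the random walk case (simulated data):
# Sketch: random walk forecasts stay flat at the last observation
set.seed(4)
rw  <- ts(cumsum(rnorm(100)))
fit_rw <- arima(rw, order = c(0, 1, 0))
predict(fit_rw, n.ahead = 5)$pred   # all five forecasts equal rw[100]
rw[100]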
Next, consider the limit of the variance of the forecast error of a stationary time series model. Let’s take AR(1) as an example.
\[\begin{align*} \lim_{l \to \infty} \text{Var}(Z_{n+l} - \hat{z}_{n}(l)) &= \lim_{l \to \infty} \sigma_a^2\left(1+\phi_1^2+\phi_1^4+\dots+\phi_1^{2(l-1)}\right)\\ &= \sigma_a^2 \sum_{j=0}^{\infty} (\phi_1^2)^j = \frac{\sigma_a^2}{1-\phi_1^2} \end{align*}\]
Thus the prediction “bands” about the forecast profile become parallel lines for large lead time (\(l \to \infty\)).
In fact, for AR(1), we can show
\[ \text{Var}(Z_{n+l} - \hat{z}_{n}(l)) = \sigma_a^2\left(1 + \phi_1^2 + \phi_1^4 + \dots + \phi_1^{2(l-1)}\right) \]
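A minimal sketch (simulated data) that checks this formula against the standard errors reported by predict():
# Sketch: AR(1) forecast-error standard errors match the formula
set.seed(5)
z   <- arima.sim(model = list(ar = 0.7), n = 300)
fit <- arima(z, order = c(1, 0, 0))
phi <- coef(fit)["ar1"]
l   <- 1:10
se_formula <- sqrt(fit$sigma2 * cumsum(phi^(2 * (l - 1))))
cbind(se_formula, predict = as.numeric(predict(fit, n.ahead = 10)$se))
# both columns level off near sigma_a / sqrt(1 - phi^2) as l grows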
On the other hand, consider the limit of the variance of the forecast error for a nonstationary time series, for example, random walk.
\[ (1-B)Z_t = a_t\\ Z_t = (1-B)^{-1} a_t = (1+B+B^2+\dots) a_t = a_t + a_{t-1} + a_{t-2} + \dots \] The \(l\)-step forecast error is \(Z_{n+l} - \hat{z}_n(l) = a_{n+1} + a_{n+2} + \dots + a_{n+l}\), so the variance becomes
\[ \text{Var}(Z_{n+l} - \hat{z}_n(l)) = \sigma^2_a \sum_{j=1}^{l} 1 = l \, \sigma^2_a \] The variance \(\text{Var}(Z_{n+l} - \hat{z}_n(l))\) therefore tends to infinity as \(l \to \infty\), and the prediction band keeps widening.
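Again as a sketch with simulated data, the standard errors reported by predict() grow like \(\sqrt{l}\,\sigma_a\) for a random walk:
# Sketch: random-walk forecast-error variance grows linearly in l
set.seed(6)
rw  <- ts(cumsum(rnorm(200)))
fit <- arima(rw, order = c(0, 1, 0))
cbind(formula = sqrt(fit$sigma2 * (1:10)),
      predict = as.numeric(predict(fit, n.ahead = 10)$se))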
Case 1: business inventories
## Time Series:
## Start = 1
## End = 60
## Frequency = 1
## [1] 4.4 5.8 6.7 7.1 5.7 4.1 4.6 4.3 2.0 2.2 3.6 -2.2 -5.1 -4.9
## [15] 0.1 4.1 3.8 9.9 0.0 6.5 10.8 4.1 2.7 -2.9 -2.9 1.5 5.7 5.0
## [29] 7.9 6.8 7.1 4.1 5.5 5.1 8.0 5.6 4.5 6.1 6.7 6.1 10.6 8.6
## [43] 11.6 7.6 10.9 14.6 14.5 17.4 11.7 5.8 11.5 11.7 5.0 10.0 8.9 7.1
## [57] 8.3 10.2 13.3 6.2
It seems that fitting a stationary model would be sufficient, so we start with an AR(1) fit.
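The fit below was presumably produced by a call along these lines (the loading of case1 is not shown; the object name is ours):
fit1_ar <- arima(case1, order = c(1, 0, 0))   # stationary AR(1) fit
fit1_ar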
##
## Call:
## arima(x = case1, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.6803 6.0423
## s.e. 0.0914 1.2870
##
## sigma^2 estimated as 10.88: log likelihood = -157.04, aic = 320.08
However, the ADF test gives conflicting results: with lag order 1 the unit root is rejected at the 5% level (p = 0.042), while with lag order 2 it is not (p = 0.086).
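The two tests below presumably come from tseries::adf.test with different lag orders:
adf.test(case1, k = 1)   # lag order 1
adf.test(case1, k = 2)   # lag order 2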
##
## Augmented Dickey-Fuller Test
##
## data: case1
## Dickey-Fuller = -3.5868, Lag order = 1, p-value = 0.04192
## alternative hypothesis: stationary
##
## Augmented Dickey-Fuller Test
##
## data: case1
## Dickey-Fuller = -3.2623, Lag order = 2, p-value = 0.08614
## alternative hypothesis: stationary
So we also fit a nonstationary model (a random walk), and use auto.arima
to select a third model. We then compare their forecast results.
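Presumably via calls along these lines (object names are ours):
fit1_rw   <- arima(case1, order = c(0, 1, 0))   # random walk model
fit1_auto <- auto.arima(case1)                  # automatic selection
fit1_rw
fit1_auto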
##
## Call:
## arima(x = case1, order = c(0, 1, 0))
##
##
## sigma^2 estimated as 13.07: log likelihood = -159.53, aic = 321.07
## Series: case1
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.5017 -0.8767
## s.e. 0.1643 0.0918
##
## sigma^2 estimated as 11.34: log likelihood=-154.63
## AIC=315.26 AICc=315.7 BIC=321.5
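A sketch of the forecast comparison, reusing the hypothetical object names above (the forecast package was loaded at the top):
h <- 12
plot(forecast(fit1_ar, h = h))     # reverts toward the estimated mean 6.04
plot(forecast(fit1_rw, h = h))     # flat at the last observation, band keeps widening
plot(forecast(fit1_auto, h = h))   # ARIMA(1,1,1): no mean reversion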
Case 6: AT&T stock
We repeat the same analysis for the AT&T stock data.
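The output below was presumably generated by calls along these lines (object names are ours; data loading not shown):
case6                                       # print the series
adf.test(case6, k = 3)                      # unit root not rejected
fit6_ar   <- arima(case6, order = c(1, 0, 0)); fit6_ar
fit6_rw   <- arima(case6, order = c(0, 1, 0)); fit6_rw
fit6_auto <- auto.arima(case6); fit6_auto   # selects ARIMA(0,1,1)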
## Time Series:
## Start = 1
## End = 56
## Frequency = 1
## [1] 166.8 172.8 178.3 180.3 182.6 184.2 188.9 184.4 181.7 178.5 177.6
## [12] 181.0 186.5 185.7 186.4 186.3 189.3 190.6 191.7 196.1 189.3 192.6
## [23] 192.1 189.4 189.7 191.9 182.0 175.7 192.0 192.8 193.3 200.2 208.8
## [34] 211.4 214.4 216.3 221.8 217.1 214.0 202.4 191.7 183.9 185.2 194.5
## [45] 195.8 198.0 200.9 199.0 200.6 209.5 208.4 206.7 193.3 197.3 213.7
## [56] 225.1
##
## Augmented Dickey-Fuller Test
##
## data: case6
## Dickey-Fuller = -3.051, Lag order = 3, p-value = 0.1508
## alternative hypothesis: stationary
##
## Call:
## arima(x = case6, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.9416 194.6908
## s.e. 0.0530 10.8988
##
## sigma^2 estimated as 35.64: log likelihood = -180.61, aic = 367.21
##
## Call:
## arima(x = case6, order = c(0, 1, 0))
##
##
## sigma^2 estimated as 36.12: log likelihood = -176.68, aic = 355.36
## Series: case6
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## 0.3589
## s.e. 0.1245
##
## sigma^2 estimated as 32.75: log likelihood=-173.55
## AIC=351.1 AICc=351.33 BIC=355.11
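As in Case 1, we can compare the forecasts; a sketch with the hypothetical names above:
plot(forecast(fit6_ar, h = 12))     # AR(1): slowly reverts toward the estimated mean 194.7
plot(forecast(fit6_auto, h = 12))   # ARIMA(0,1,1): flat after lead 1, band keeps widening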