# Install necessary packages
list_packages <- c("AER", "dynlm", "tidyverse", "fpp", "fpp2",
"forecast", "readxl", "stargazer", "scales",
"quantmod", "urca", "vars", "tseries", "sarima")
new_packages <- list_packages[!(list_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
# Load necessary packages
lapply(list_packages, require, character.only = TRUE)
So far, we have dealt with univariate time series data. What about multiple variables?
The time series models in the previous chapters allow for the inclusion of information from past observations, but not for the inclusion of other variables.
We considered regression models of the form \[ Z_t=\beta_0 + \beta_1 X_{1,t} + \cdots + \beta_k X_{k,t} + \epsilon_t \]
In other words, \(Z_t\) is a linear combination of \(X_{1,t}\) … \(X_{k,t}\). When \(\epsilon_t\) is assumed to be an uncorrelated error term (i.e., it is white noise), this is a typical multiple linear regression.
We further allow the errors from a regression to contain autocorrelation. To emphasise this change, we replace \(\epsilon_t\) with \(\eta_t\). The error series \(\eta_t\) is assumed to follow an ARIMA model. For example, if \(\eta_t\) follows an ARIMA(1,1,1) model, we can write
\[ Z_t=\beta_0 + \beta_1 X_{1,t} + \cdots + \beta_k X_{k,t} + \eta_t\\ (1-\phi_1 B)(1- B)\eta_t = (1-\theta_1 B) a_t \]
Notice that the model has two error terms here — the error from the regression model, which we denote by \(\eta_t\), and the error from the ARIMA model, which we denote by \(a_t\). Only the ARIMA model errors are assumed to be white noise.
Note that if \(X_{1,t}\),…,\(X_{k,t}\) are excluded from the model, the model becomes an ARIMA model.
We cannot use ordinary least squares to estimate this model because the errors \(\eta_t\) are autocorrelated: the least squares estimates are no longer efficient, and the usual standard errors, tests and information criteria are invalid.
We therefore use maximum likelihood estimation.
An important consideration when estimating a regression with ARMA errors is that all of the variables in the model must first be stationary. Thus, we first have to check that \(Z_t\) and all of the predictors appear to be stationary. If we estimate the model when any of these are non-stationary, the estimated coefficients will not be consistent estimates (and therefore may not be meaningful). One exception to this is the case where non-stationary variables are co-integrated. If there exists a linear combination of the non-stationary \(Z_t\) and the predictors that is stationary, then the estimated coefficients will be consistent. We will explain this later.
It is often desirable to maintain the form of the relationship between \(Z_t\) and the predictors, and consequently it is common to difference all of the variables if any of them need differencing. The resulting model is then called a “model in differences”, as distinct from a “model in levels”, which is what is obtained when the original data are used without differencing.
If the above regression model with ARIMA(1,1,1) errors is differenced we obtain the model
\[ \nabla Z_t= \beta_1 \nabla X_{1,t} + \cdots + \beta_k \nabla X_{k,t} + \nabla \eta_t\\ (1-\phi_1 B) \nabla \eta_t = (1-\theta_1 B) a_t \] Note that the constant \(\beta_0\) drops out when the model is differenced.
The R function Arima() will fit a regression model with ARIMA errors if the xreg argument is used. The order argument specifies the order of the ARIMA error model. The auto.arima() function will also handle regression terms via the xreg argument; the user must specify the predictor variables to include, but auto.arima() will select the best ARIMA model for the errors. If differencing is required, then all variables are differenced during the estimation process, although the final model will be expressed in terms of the original variables. To include a constant in the differenced model, specify include.drift=TRUE.
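For instance, here is a minimal sketch on simulated data (the series y, the predictor x and the chosen orders are purely illustrative):
library(forecast)
set.seed(123)
x <- ts(rnorm(100))                                    # illustrative predictor
y <- ts(5 + 2*x + arima.sim(list(ar = 0.6), n = 100))  # illustrative response with AR(1) errors
fit_manual <- Arima(y, xreg = x, order = c(1, 0, 0))   # error order specified by hand
fit_auto <- auto.arima(y, xreg = x)                    # auto.arima() selects the error model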
Here is an example: we analyze the quarterly changes in personal consumption expenditure and personal disposable income from 1970Q1 to 2016Q3, and would like to forecast changes in expenditure based on changes in income.
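The first few quarters of the uschange data (from the fpp2 package) are listed below; the listing was presumably produced by something like
window(uschange, end = c(1971, 2))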
## Consumption Income Production Savings Unemployment
## 1970 Q1 0.6159862 0.9722610 -2.4527003 4.8103115 0.9
## 1970 Q2 0.4603757 1.1690847 -0.5515251 7.2879923 0.5
## 1970 Q3 0.8767914 1.5532705 -0.3587079 7.2890131 0.5
## 1970 Q4 -0.2742451 -0.2552724 -2.1854549 0.9852296 0.7
## 1971 Q1 1.8973708 1.9871536 1.9097341 3.6577706 -0.1
## 1971 Q2 0.9119929 1.4473342 0.9015358 6.0513418 -0.1
autoplot(uschange[,1:2], facets=TRUE) +
xlab("Year") + ylab("") +
ggtitle("Quarterly changes in US consumption and personal income")
The data are clearly already stationary, so no differencing is required before fitting the model.
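The output below was presumably produced by fitting the regression with ARIMA errors via auto.arima(), with income as the single regressor; a sketch of the call (the object name fit matches its use further down):
fit <- auto.arima(uschange[, "Consumption"], xreg = uschange[, "Income"])
print(fit)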
## Series: uschange[, "Consumption"]
## Regression with ARIMA(1,0,2) errors
##
## Coefficients:
## ar1 ma1 ma2 intercept xreg
## 0.6922 -0.5758 0.1984 0.5990 0.2028
## s.e. 0.1159 0.1301 0.0756 0.0884 0.0461
##
## sigma^2 estimated as 0.3219: log likelihood=-156.95
## AIC=325.91 AICc=326.37 BIC=345.29
The fitted model is
\[ Z_t = 0.599 + 0.203 X_t + \eta_t,\\ \eta_t = 0.692 \eta_{t-1} + a_t - 0.576 a_{t-1} + 0.198 a_{t-2} \]
where \(a_t\) is white noise.
We can recover estimates of both the \(\eta_t\) and \(a_t\) series using the residuals() function.
cbind("Regression Errors" = residuals(fit, type="regression"),
"ARIMA errors" = residuals(fit, type="innovation")) %>%
autoplot(facets=TRUE)
It is the ARIMA errors that should resemble a white noise series.
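The Ljung-Box output below was presumably produced by the residual diagnostics helper in the forecast package:
checkresiduals(fit)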
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,2) errors
## Q* = 5.8916, df = 3, p-value = 0.117
##
## Model df: 5. Total lags used: 8
To forecast using a regression model with ARIMA errors, we need to forecast the regression part of the model and the ARIMA part of the model, and combine the results.
When the predictors are themselves unknown, we must either model them separately, or use assumed future values for each predictor.
We will calculate forecasts for the next 20 quarters (five years) assuming that the future percentage changes in personal disposable income will be equal to the mean percentage change over the historical data.
fcast <- forecast(fit, xreg=rep(mean(uschange[,2]),20))
autoplot(fcast) + xlab("Year") +
ylab("Percentage change")
The prediction intervals for this model are narrower than those from a pure ARIMA model for \(Z_t\) (fitted next for comparison), because we are now able to explain some of the variation in the data using the income predictor.
fit2 <- Arima(uschange[,"Consumption"], order=c(1,0,2))
fcast2 <- forecast(fit2,h=20)
autoplot(fcast2) + xlab("Year") +
ylab("Percentage change")
It is important to realise that the prediction intervals from regression models (with or without ARIMA errors) do not take into account the uncertainty in the forecasts of the predictors. So they should be interpreted as being conditional on the assumed (or estimated) future values of the predictor variables.
Suppose we are interested in the level of gas prices as a function of various explanatory variables.
We observe gas prices \(Y_t\) over \(n\) time periods, \(t = 1, 2, \dots, n\).
We can forecast \(Y_t\) purely from time and its own past values; alternatively, we can bring in other possible explanatory variables (in a regression or, later, a VAR): personal disposable income, the Consumer Price Index, unemployment, housing starts, and the price of crude oil.
# Import the data ---------------------------------------------------------
GasPrices <- readxl::read_xlsx(path = "Gas_prices_1.xlsx")
head(GasPrices)
## # A tibble: 6 x 29
## Date Year Month Unleaded Crude_price SP500 PDI CPI RR_sales Unemp
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 15-J~ 1996 1 109. 18.9 636. 5660 154. 134926 5.6
## 2 15-F~ 1996 2 109. 19.1 640. 5713. 155. 136781 5.5
## 3 15-M~ 1996 3 114. 21.3 646. 5749. 156. 137527 5.5
## 4 15-A~ 1996 4 123. 23.5 654. 5733. 156. 137504 5.6
## 5 15-M~ 1996 5 128. 21.2 669. 5816 157. 138302 5.6
## 6 15-J~ 1996 6 126. 20.4 671. 5851. 157. 137881 5.3
## # ... with 19 more variables: Housing <dbl>, Demand <dbl>, Production <dbl>,
## # Imports <dbl>, Stocks <dbl>, Time <dbl>, L1_Unleaded <dbl>,
## # L1_Crude_price <dbl>, L1_SP500 <dbl>, L1_PDI <dbl>, L1_CPI <dbl>,
## # L1_RR_sales <dbl>, L1_Unemp <dbl>, L1_Housing <dbl>, L1_Demand <dbl>,
## # L1_Production <dbl>, L1_Imports <dbl>, L1_Stocks <dbl>, Recession <dbl>
if (!require("forecast")){install.packages("forecast")}
Unleaded <- ts(GasPrices$Unleaded, start=1996, frequency = 12, end=c(2015, 12))
# Transformation----------------------------------------------------------
log_Unleaded <- log(Unleaded)
# auto.arima
fit.auto <- auto.arima(log_Unleaded)
print(fit.auto)
## Series: log_Unleaded
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## 0.5588
## s.e. 0.0488
##
## sigma^2 estimated as 0.003014: log likelihood=354.82
## AIC=-705.64 AICc=-705.59 BIC=-698.69
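The residual diagnostics below (the small p-value suggests some remaining autocorrelation) were presumably produced by:
checkresiduals(fit.auto)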
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 46.458, df = 23, p-value = 0.002622
##
## Model df: 1. Total lags used: 24
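For comparison, an ordinary least squares regression is fitted next; the call matches the one shown in the output, and the object name fit.lm is illustrative:
fit.lm <- lm(Unleaded ~ Crude_price + CPI + Demand, data = GasPrices)
summary(fit.lm)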
##
## Call:
## lm(formula = Unleaded ~ Crude_price + CPI + Demand, data = GasPrices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.549 -10.814 -1.954 7.357 65.137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -48.26392 19.52878 -2.471 0.0142 *
## Crude_price 2.30659 0.06870 33.575 < 2e-16 ***
## CPI 0.76253 0.09357 8.149 2.14e-14 ***
## Demand -0.14675 0.14335 -1.024 0.3070
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.58 on 236 degrees of freedom
## Multiple R-squared: 0.9681, Adjusted R-squared: 0.9677
## F-statistic: 2386 on 3 and 236 DF, p-value: < 2.2e-16
fit <- auto.arima(GasPrices$Unleaded, xreg=as.matrix(GasPrices[,c("Crude_price", "CPI", "Demand")]))
summary(fit)
## Series: GasPrices$Unleaded
## Regression with ARIMA(1,0,2) errors
##
## Coefficients:
## ar1 ma1 ma2 intercept Crude_price CPI Demand
## 0.7134 0.4971 0.1281 -207.2742 1.6327 1.6563 0.3211
## s.e. 0.0955 0.0997 0.0950 57.5287 0.1951 0.3035 0.2907
##
## sigma^2 estimated as 92.26: log likelihood=-880.77
## AIC=1777.55 AICc=1778.17 BIC=1805.39
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.06211942 9.463917 6.632496 -0.1367517 3.007046 0.6452159
## ACF1
## Training set -0.00322775
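The Ljung-Box output that follows was presumably obtained from the same residual diagnostics as before:
checkresiduals(fit)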
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(1,0,2) errors
## Q* = 6.8914, df = 3, p-value = 0.07544
##
## Model df: 7. Total lags used: 10
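As a further example, daily electricity demand (the elecdaily data from fpp2) can be modelled as a function of maximum temperature, its square, and a working-day indicator: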
xreg <- cbind(MaxTemp = elecdaily[, "Temperature"],
MaxTempSq = elecdaily[, "Temperature"]^2,
Workday = elecdaily[, "WorkDay"])
fit <- auto.arima(elecdaily[, "Demand"], xreg = xreg)
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,1,2)(2,0,0)[7] errors
## Q* = 28.229, df = 4, p-value = 1.121e-05
##
## Model df: 10. Total lags used: 14
There are two different ways of modelling a linear trend. A deterministic trend is obtained using the regression model
\[ Z_t= \beta_0 + \beta_1 t + \eta_t, \]
where \(\eta_t\) is an ARMA process. A stochastic trend is obtained using the model
\[ Z_t= \beta_0 + \beta_1 t + \eta_t, \]
where \(\eta_t\) is an ARIMA process with \(d=1\). In the latter case, we can difference both sides so that \(\nabla Z_t = \beta_1 + \nabla \eta_t\), where \(\nabla \eta_t\) is an ARMA process. In other words, \[ Z_t = Z_{t-1} + \beta_1 + \nabla \eta_t \]
This is a random walk with drift, but here the error term is an ARMA process rather than simply white noise.
Although these models appear quite similar (they only differ in the number of differences that need to be applied to \(\eta_t\) ), their forecasting characteristics are quite different.
Example: International visitors to Australia
We analyze the total number of international visitors to Australia each year from 1980 to 2015. We will fit both a deterministic and a stochastic trend model to these data.
data(austa)
autoplot(austa) + xlab("Year") +
ylab("millions of people") +
ggtitle("Total annual international visitors to Australia")
The deterministic trend model is obtained as follows:
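A sketch of the call (the regressor is a simple linear time index, and the object name fit1 matches its later use in the forecasting code):
trend <- seq_along(austa)
fit1 <- auto.arima(austa, d = 0, xreg = trend)
print(fit1)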
## Series: austa
## Regression with ARIMA(2,0,0) errors
##
## Coefficients:
## ar1 ar2 intercept xreg
## 1.1127 -0.3805 0.4156 0.1710
## s.e. 0.1600 0.1585 0.1897 0.0088
##
## sigma^2 estimated as 0.02979: log likelihood=13.6
## AIC=-17.2 AICc=-15.2 BIC=-9.28
This model can be written as
\[ Z_t = 0.416 + 0.171 t + \eta_t\\ \eta_t = 1.113 \eta_{t-1} -0.380 \eta_{t-2} + a_t \]
where \(a_t\) is white noise with \(\sigma^2_a = 0.030\). The estimated growth in visitor numbers is 0.17 million people per year.
The stochastic trend model can be estimated.
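A sketch of the call (the object name fit2 matches its later use):
fit2 <- auto.arima(austa, d = 1)
print(fit2)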
## Series: austa
## ARIMA(0,1,1) with drift
##
## Coefficients:
## ma1 drift
## 0.3006 0.1735
## s.e. 0.1647 0.0390
##
## sigma^2 estimated as 0.03376: log likelihood=10.62
## AIC=-15.24 AICc=-14.46 BIC=-10.57
This model can be written as
\[ Z_t = z_{0} + 0.173 t + \eta_t\\ \eta_t = \eta_{t-1} + a_t + 0.301 a_{t-1} \] where \(a_t\) is white noise with \(\sigma^2_a = 0.034\). This model can be equivalently written as \(\nabla Z_t = 0.173 + \nabla \eta_t\).
In this case, the estimated growth in visitor numbers is also 0.17 million people per year. Although the growth estimates are similar, the prediction intervals are not, as the plot below shows. In particular, stochastic trends have much wider prediction intervals because the errors are non-stationary.
fc1 <- forecast(fit1,
xreg = length(austa) + 1:10)
fc2 <- forecast(fit2, h=10)
autoplot(austa) +
autolayer(fc2, series="Stochastic trend") +
autolayer(fc1, series="Deterministic trend") +
ggtitle("Forecasts from trend models") +
xlab("Year") + ylab("Visitors to Australia (millions)") +
guides(colour=guide_legend(title="Forecast"))
There is an implicit assumption with deterministic trends that the slope of the trend is not going to change over time. On the other hand, stochastic trends can change, and the estimated growth is only assumed to be the average growth over the historical period, not necessarily the rate of growth that will be observed into the future. Consequently, it is safer to forecast with stochastic trends, especially for longer forecast horizons, as the prediction intervals allow for greater uncertainty in future growth.
One limitation of the models that we have considered so far is that they impose a unidirectional relationship: the forecast variable is influenced by the predictor variables, but not vice versa. However, there are many cases where the reverse should also be allowed for, that is, where all variables affect each other. For example, the changes in personal consumption expenditure were forecast above based on the changes in personal disposable income. In this case a bi-directional relationship may be more suitable: an increase in consumption will lead to an increase in income, and vice versa.
Such feedback relationships are allowed for in the vector autoregressive (VAR) framework. In this framework, all variables are treated symmetrically. They are all modelled as if they all influence each other equally. In more formal terminology, all variables are now treated as “endogenous”.
A VAR model is a generalisation of the univariate autoregressive model for forecasting a vector of time series. It comprises one equation per variable in the system. To keep it simple, we will consider a two-variable VAR with one lag. We write a 2-dimensional VAR(1) as
\[ Z_{1,t} = c_1 + \phi_{11,1} Z_{1,t-1} + \phi_{12,1} Z_{2,t-1} + a_{1,t}\\ Z_{2,t} = c_2 + \phi_{21,1} Z_{1,t-1} + \phi_{22,1} Z_{2,t-1} + a_{2,t} \] where \(a_{1,t}\) and \(a_{2,t}\) are white noise processes that may be contemporaneously correlated.
If the series are stationary, we forecast them by fitting a VAR to the data directly (known as a “VAR in levels”). If the series are non-stationary, we take differences of the data in order to make them stationary, then fit a VAR model (known as a “VAR in differences”).
The other possibility is that the series may be non-stationary but cointegrated, which means that there exists a linear combination of them that is stationary. In this case, a VAR specification that includes an error correction mechanism, usually referred to as a vector error correction model (VEC), should be used instead.
Forecasts are generated from a VAR in a recursive manner. The one-step-ahead forecasts are generated by
\[ \hat{Z}_{1,t}(1) = \hat{c}_1 + \hat{\phi}_{11,1} Z_{1,t} + \hat{\phi}_{12,1} Z_{2,t}\\ \hat{Z}_{2,t}(1) = \hat{c}_2 + \hat{\phi}_{21,1} Z_{1,t} + \hat{\phi}_{22,1} Z_{2,t} \]
There are two decisions one has to make when using a VAR to forecast, namely how many variables (denoted by K) and how many lags (denoted by p) should be included in the system. The number of coefficients to be estimated in a VAR is equal to \(K+pK^2\) (or \(1+pK\) per equation). It is usual to keep K small and include only variables that are correlated with each other, and therefore useful in forecasting each other. Information criteria are commonly used to select the number of lags to be included.
VAR models are implemented in the vars package in R. It contains the function VARselect() for selecting the number of lags p.
VARs are useful in several contexts: forecasting a collection of related variables where no explicit interpretation of the system is required; testing whether one variable is useful in forecasting another (the basis of Granger causality tests); impulse response analysis; and forecast error variance decomposition.
Example: a VAR model for forecasting US consumption
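The lag-length selection output below was presumably produced by VARselect(); a sketch of the call (the choice lag.max = 8 is an assumption):
VARselect(uschange[, 1:2], lag.max = 8, type = "const")[["selection"]]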
## AIC(n) HQ(n) SC(n) FPE(n)
## 5 1 1 5
The R output shows the lag length selected by each of the information criteria available in the vars package. We have met the AIC before, and SC is simply another name for the BIC (SC stands for Schwarz Criterion, after Gideon Schwarz who proposed it). HQ is the Hannan-Quinn criterion, and FPE is the “Final Prediction Error” criterion. Care should be taken when using the AIC as it tends to choose large numbers of lags. Instead, for VAR models, we prefer to use the BIC.
There is a large discrepancy between the VAR(5) selected by the AIC and the VAR(1) selected by the BIC. This is not unusual. As a result we first fit a VAR(1), as selected by the BIC.
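A sketch of the corresponding calls (the object names var1 and var2 match the output below, and lags.pt = 10 is consistent with the reported degrees of freedom):
var1 <- VAR(uschange[, 1:2], p = 1, type = "const")
serial.test(var1, lags.pt = 10, type = "PT.asymptotic")
var2 <- VAR(uschange[, 1:2], p = 2, type = "const")
serial.test(var2, lags.pt = 10, type = "PT.asymptotic")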
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var1
## Chi-squared = 49.102, df = 36, p-value = 0.07144
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var2
## Chi-squared = 47.741, df = 32, p-value = 0.03633
In similar fashion to the univariate ARIMA methodology, we test that the residuals are uncorrelated using a Portmanteau test. Both a VAR(1) and a VAR(2) have some residual serial correlation, and therefore we fit a VAR(3).
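A sketch of the corresponding call:
var3 <- VAR(uschange[, 1:2], p = 3, type = "const")
serial.test(var3, lags.pt = 10, type = "PT.asymptotic")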
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var3
## Chi-squared = 33.617, df = 28, p-value = 0.2138
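The coefficient estimates listed next appear to come from printing the fitted object:
print(var3)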
##
## VAR Estimation Results:
## =======================
##
## Estimated coefficients for equation Consumption:
## ================================================
## Call:
## Consumption = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const
##
## Consumption.l1 Income.l1 Consumption.l2 Income.l2 Consumption.l3
## 0.19100120 0.07836635 0.15953548 -0.02706495 0.22645563
## Income.l3 const
## -0.01453688 0.29081124
##
##
## Estimated coefficients for equation Income:
## ===========================================
## Call:
## Income = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const
##
## Consumption.l1 Income.l1 Consumption.l2 Income.l2 Consumption.l3
## 0.45349152 -0.27302538 0.02166532 -0.09004735 0.35376691
## Income.l3 const
## -0.05375916 0.38749574
We can conduct Granger causality tests, and examine the detailed estimation results, as follows.
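A sketch of the calls using the vars package (the Granger-test output itself is not reproduced in these notes; the summary() output follows):
causality(var3, cause = "Income")        # does Income Granger-cause Consumption?
causality(var3, cause = "Consumption")   # does Consumption Granger-cause Income?
summary(var3)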
##
## VAR Estimation Results:
## =========================
## Endogenous variables: Consumption, Income
## Deterministic variables: const
## Sample size: 184
## Log Likelihood: -380.155
## Roots of the characteristic polynomial:
## 0.7821 0.4981 0.4981 0.4441 0.2857 0.2857
## Call:
## VAR(y = uschange[, 1:2], p = 3, type = "const")
##
##
## Estimation results for equation Consumption:
## ============================================
## Consumption = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Consumption.l1 0.19100 0.07965 2.398 0.017518 *
## Income.l1 0.07837 0.05453 1.437 0.152445
## Consumption.l2 0.15954 0.08252 1.933 0.054792 .
## Income.l2 -0.02706 0.05723 -0.473 0.636831
## Consumption.l3 0.22646 0.07970 2.841 0.005018 **
## Income.l3 -0.01454 0.05425 -0.268 0.789055
## const 0.29081 0.08300 3.504 0.000581 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.5955 on 177 degrees of freedom
## Multiple R-Squared: 0.2132, Adjusted R-squared: 0.1865
## F-statistic: 7.994 on 6 and 177 DF, p-value: 1.216e-07
##
##
## Estimation results for equation Income:
## =======================================
## Income = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const
##
## Estimate Std. Error t value Pr(>|t|)
## Consumption.l1 0.45349 0.11560 3.923 0.000125 ***
## Income.l1 -0.27303 0.07915 -3.450 0.000702 ***
## Consumption.l2 0.02167 0.11977 0.181 0.856663
## Income.l2 -0.09005 0.08306 -1.084 0.279788
## Consumption.l3 0.35377 0.11568 3.058 0.002572 **
## Income.l3 -0.05376 0.07875 -0.683 0.495699
## const 0.38750 0.12047 3.217 0.001543 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.8643 on 177 degrees of freedom
## Multiple R-Squared: 0.1755, Adjusted R-squared: 0.1475
## F-statistic: 6.279 on 6 and 177 DF, p-value: 5.309e-06
##
##
##
## Covariance matrix of residuals:
## Consumption Income
## Consumption 0.3546 0.1845
## Income 0.1845 0.7470
##
## Correlation matrix of residuals:
## Consumption Income
## Consumption 1.0000 0.3585
## Income 0.3585 1.0000
The residuals for this model pass the test for serial correlation. The forecasts generated by the VAR(3) are plotted below.
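The plotting code is not shown here; presumably something like
forecast(var3) %>% autoplot() + xlab("Year")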