Basic Tools For Forecasting

Xiaorui(Jeremy) Zhu

08/30/2019

Summarizing the Data

Suppose the sales of a popular book over a seven-week period are as follows:

Sales_D <- data.frame(Week=1:7, Sales=c(15, 10, 12, 16, 9, 8, 14))

Measures of Location

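mean(Sales_D$Sales)     # sample mean
sort(Sales_D$Sales)     # values in increasing order
median(Sales_D$Sales)   # middle value of the sorted data
summary(Sales_D$Sales)  # minimum, quartiles, mean and maximum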

Exercise:

How to obtain order statistics in R?
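One possible approach, as a sketch: the k-th order statistic is the k-th smallest value, so it can be read from the sorted vector. `sort(x, partial = k)` only guarantees that position k is correct, which saves work on long vectors. The helper name `order_stat` is hypothetical.

```r
# Weekly book sales from the example above
sales <- c(15, 10, 12, 16, 9, 8, 14)

# Hypothetical helper: k-th order statistic (k-th smallest value)
order_stat <- function(x, k) {
  stopifnot(k >= 1, k <= length(x))
  # partial = k places the k-th smallest value at position k
  sort(x, partial = k)[k]
}

sort(sales)             # all order statistics at once
order_stat(sales, 1)    # minimum, same as min(sales)
order_stat(sales, 4)    # middle of 7 observations, i.e. the median
order_stat(sales, 7)    # maximum, same as max(sales)
```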

Measures of Variation

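# Mean absolute deviation about the median
sum(abs(Sales_D$Sales - median(Sales_D$Sales)))/length(Sales_D$Sales)
# Sample variance and standard deviation (divisor n - 1)
var(Sales_D$Sales)
sd(Sales_D$Sales)
# Standardized scores (z-scores): subtract the mean, divide by the standard deviation
scale(Sales_D$Sales, center = TRUE, scale = TRUE)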
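The built-in functions can be checked against their textbook definitions. This sketch uses the same seven weekly sales figures: it computes the sample variance with the n - 1 divisor that `var()` uses, the mean absolute deviation about the median (not the same as R's `mad()`, which takes the median of the absolute deviations and multiplies it by 1.4826), and z-scores by hand. `scale()` returns the same z-scores as a one-column matrix, so `as.vector()` is used before comparing.

```r
sales <- c(15, 10, 12, 16, 9, 8, 14)
n <- length(sales)

# Sample variance with divisor n - 1 (what var() computes)
var_manual <- sum((sales - mean(sales))^2) / (n - 1)
sd_manual  <- sqrt(var_manual)

# Mean absolute deviation about the median
mad_median <- mean(abs(sales - median(sales)))
mad_median                                     # 18/7, about 2.571

# Z-scores: subtract the mean, divide by the standard deviation
z_manual <- (sales - mean(sales)) / sd_manual

all.equal(var_manual, var(sales))              # TRUE
all.equal(z_manual, as.vector(scale(sales)))   # TRUE
```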

Measures of Linear Relationship

The correlation coefficient measures the strength and direction of the linear relationship between two variables. It ranges over [-1, +1]:

- A value of +1 indicates a perfect positive (upward sloping) linear relationship between the two variables
- A value of -1 indicates a perfect negative (downward sloping) linear relationship between the two variables
- A value of zero indicates no linear relationship between the two variables

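The correlation coefficient, \(\rho\), may be estimated from sample data using \(\rho=\frac{S_{XY}}{S_XS_Y}\), where \(S_{XY}\) is the sample covariance of X and Y, and \(S_X\) and \(S_Y\) are their sample standard deviations.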
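To make the formula concrete, the sketch below computes the sample covariance \(S_{XY}\) and the sample standard deviations \(S_X\), \(S_Y\) directly from their definitions (all with divisor n - 1) for the week number and sales of the book example, then checks the ratio against `cor()`. Pairing `Week` with `Sales` is only for illustration.

```r
week  <- 1:7
sales <- c(15, 10, 12, 16, 9, 8, 14)
n <- length(sales)

# Sample covariance S_XY and standard deviations S_X, S_Y (divisor n - 1)
s_xy <- sum((week - mean(week)) * (sales - mean(sales))) / (n - 1)
s_x  <- sqrt(sum((week - mean(week))^2) / (n - 1))
s_y  <- sqrt(sum((sales - mean(sales))^2) / (n - 1))

rho <- s_xy / (s_x * s_y)
rho                                  # about -0.248: a weak downward linear pattern
all.equal(rho, cor(week, sales))     # TRUE
```

Because the n - 1 divisors cancel in the ratio, using n throughout would give the same \(\rho\).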

Example: German Forecasts

Data files: German Forecasts (R data and Excel versions)

German <- readxl::read_xlsx(path = "data/German_forecasts.xlsx")
German
## # A tibble: 25 x 9
##    Institutio~   GDP Privcons  GFCF Exports Imports Govsurp Consprix Unemp
##    <chr>       <dbl>    <dbl> <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <dbl>
##  1 Bundesbank    0.4      1    -0.1     1.9     3     -0.75      1.5   7.2
##  2 Commerzbank   0.5      1.3   0.1     2.8     4.1   -0.5       1.9   7.1
##  3 Deka          0.7      1.1  -0.3     3.3     3.3   -0.3       1.9   6.9
##  4 Deutsche B~   0.3      0.6   1.1     3.2     4.2   -0.5       1.7   7  
##  5 DIW           0.9      1.1   0.9     4.2     4.6    0         2.8   7  
##  6 DZ Bank       0.4      0.9   0.1     3       3.8   -0.7       2.1   7.1
##  7 Feri          1.2      1.2   1.9     4.1     4.1   -0.3       2     6.6
##  8 Gemeinscha~   1        1.1   1.9     3.8     4.6   -0.2       2.1   6.8
##  9 Helaba        1.1      1.2   2.6     5.5     5      0         2     7  
## 10 HSBC          0.6      1     0.5     2.9     4.1   -0.4       2     7  
## # ... with 15 more rows

Scatter Plot Matrices:

pairs(x = German[,c("GDP", "GFCF", "Govsurp", "Unemp")], pch = 19)

Example: German Forecasts

Assessing Variability: The standardized score (Z-score)

head(cbind(German[,1], scale(German[,-1])), 10)
##             Institutions         GDP   Privcons       GFCF     Exports
## 1             Bundesbank -1.13391175  0.4365376 -0.8399654 -1.51994386
## 2            Commerzbank -0.73181539  1.4286685 -0.6177524 -0.57261043
## 3                   Deka  0.07237735  0.7672479 -1.0621785 -0.04631408
## 4          Deutsche Bank -1.53600812 -0.8863036  0.4933130 -0.15157335
## 5                    DIW  0.87657008  0.7672479  0.2711000  0.90101935
## 6                DZ Bank -1.13391175  0.1058273 -0.6177524 -0.36209189
## 7                   Feri  2.08285918  1.0979582  1.3821653  0.79576008
## 8  Gemeinschaftsdiagnose  1.27866644  0.7672479  1.3821653  0.47998227
## 9                 Helaba  1.68076281  1.0979582  2.1599111  2.26938987
## 10                  HSBC -0.32971902  0.4365376 -0.1733262 -0.46735116
##       Imports     Govsurp   Consprix      Unemp
## 1  -0.7134526 -1.84780525 -1.4626070  1.5531605
## 2   0.4206484 -0.83475413  0.0222732  0.9984604
## 3  -0.4041523 -0.02431323  0.0222732 -0.1109400
## 4   0.5237485 -0.83475413 -0.7201669  0.4437602
## 5   0.9361488  1.19134812  3.3632536  0.4437602
## 6   0.1113481 -1.64519502  0.7647133  0.9984604
## 7   0.4206484 -0.02431323  0.3934932 -1.7750406
## 8   0.9361488  0.38090722  0.7647133 -0.6656402
## 9   1.3485492  1.19134812  0.3934932  0.4437602
## 10  0.4206484 -0.42953368  0.3934932  0.4437602

Example: German Correlation

round(cor(German[,-1]), digits = 3)
##             GDP Privcons   GFCF Exports Imports Govsurp Consprix  Unemp
## GDP       1.000    0.430  0.601   0.459   0.209   0.620    0.317 -0.363
## Privcons  0.430    1.000  0.295   0.262   0.469   0.033    0.192  0.073
## GFCF      0.601    0.295  1.000   0.192   0.371   0.223    0.336 -0.295
## Exports   0.459    0.262  0.192   1.000   0.641   0.518    0.210 -0.273
## Imports   0.209    0.469  0.371   0.641   1.000   0.085    0.382 -0.087
## Govsurp   0.620    0.033  0.223   0.518   0.085   1.000    0.179 -0.317
## Consprix  0.317    0.192  0.336   0.210   0.382   0.179    1.000  0.063
## Unemp    -0.363    0.073 -0.295  -0.273  -0.087  -0.317    0.063  1.000

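Exercise: reproduce the GDP-GFCF correlation (0.601 in the matrix above) from the covariance and the two standard deviations:

cov(German$GDP, German$GFCF); sd(German$GDP); sd(German$GFCF)
# Covariance divided by the product of the standard deviations
cov(German$GDP, German$GFCF) / (sd(German$GDP) * sd(German$GFCF))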

Example: German Forecasts

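library("PerformanceAnalytics")
my_data <- German[,-1]   # drop the Institutions column, keep the numeric forecasts
# Pairwise scatter plots, histograms and correlations in one chart
chart.Correlation(R = my_data, histogram=TRUE, pch=19)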

Transformations

Summary of the Dulles passenger data (Excel file)

Dulles <- readxl::read_xlsx(path = "data/Dulles_2.xlsx")
summary(Dulles)
##      Year           Passengers (000s)
##  Length:53          Min.   :  640.5  
##  Class :character   1st Qu.: 2083.1  
##  Mode  :character   Median : 8946.6  
##                     Mean   : 8455.9  
##                     3rd Qu.:14393.0  
##                     Max.   :22129.0

Transformations

Time series plot of Dulles

library(ggplot2); library(ggfortify); library(zoo)
Dulles2 <- ts(data = Dulles$`Passengers (000s)`, start = as.yearmon("1963-12"), frequency = 1)
autoplot(Dulles2)

Transformations: Differences

The single (or “first order”) difference is \(Y_t - Y_{t-1}\), the change from one period to the next; more generally, the lag-k difference is \(Y_t - Y_{t-k}\).
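A quick check of the definitions: the sketch below computes \(Y_t - Y_{t-1}\) and \(Y_t - Y_{t-k}\) by index arithmetic on a short, made-up vector `y` and compares them with `diff()`, whose `lag` argument plays the role of k.

```r
y <- c(5, 8, 6, 11, 15, 14)   # toy series, not real data
n <- length(y)

# First difference: Y_t - Y_{t-1}, for t = 2, ..., n
d1 <- y[2:n] - y[1:(n - 1)]

# Lag-k difference: Y_t - Y_{t-k}, for t = k + 1, ..., n
k  <- 2
dk <- y[(k + 1):n] - y[1:(n - k)]

all.equal(d1, diff(y))            # TRUE
all.equal(dk, diff(y, lag = k))   # TRUE
```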

Time Series Plot for the First Differences of the Dulles Passenger Series

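# How to use the "diff" function: diff(x, lag, differences)
diff(1:10, 2, 1)            # lag = 2, differences = 1
## [1] 2 2 2 2 2 2 2 2
x <- c(1,4,10,20,35)
diff(x, lag = 2)            # x[t] - x[t-2]
## [1]  9 16 25
diff(x, differences = 2)    # first difference applied twice
## [1] 3 4 5
# First differences of the Dulles passenger series
Diff_1 <- diff(Dulles2, differences = 1)
autoplot(Diff_1, xlab='Time', colour="blue")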

Transformations: Growth Rates

Time Series Plot for the Growth Rate for the Dulles Passenger Series

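# Growth rate (%) relative to the previous year; stats::lag(Dulles2, -1) holds Y_{t-1} at time t
Growth_1 <- 100 * Diff_1 / stats::lag(Dulles2, -1)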
autoplot(Growth_1, xlab='Time', colour="blue")
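The growth rate is conventionally measured relative to the previous period, \(100(Y_t - Y_{t-1})/Y_{t-1}\). The sketch below checks this on a made-up series that grows by exactly 10% per period, first with plain vectors and then with a `ts` object, where `stats::lag(x, -1)` shifts the series so that its value at time t is \(Y_{t-1}\) and `ts` arithmetic keeps only the common time points.

```r
y <- c(100, 110, 121, 133.1)   # toy series growing 10% per period

# Growth rate in percent, relative to the previous value
growth <- 100 * diff(y) / y[-length(y)]
growth                          # 10 10 10 (up to floating-point rounding)

# Same computation on a ts object
y_ts <- ts(y, start = 2000, frequency = 1)
growth_ts <- 100 * diff(y_ts) / stats::lag(y_ts, -1)
all.equal(as.vector(growth_ts), growth)   # TRUE
```

Writing `stats::lag` explicitly avoids picking up `dplyr::lag`, which has different semantics, if dplyr happens to be attached.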

Transformations: Logarithm

The Log Transform

Log_Dulles <- log(Dulles2)
autoplot(Log_Dulles, xlab='Time', colour="blue")

Transformations

The (first) difference in logarithms

Diff_Log <- diff(Log_Dulles, differences = 1)
autoplot(Diff_Log, xlab='Time', colour="blue")
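For small changes, the difference in logarithms is close to the proportional growth rate: with \(g_t = (Y_t - Y_{t-1})/Y_{t-1}\), we have \(\log(Y_t) - \log(Y_{t-1}) = \log(1 + g_t) \approx g_t\) when \(g_t\) is small. The sketch below compares the two on a made-up series; the approximation deteriorates for the large final jump.

```r
y <- c(100, 102, 105, 103, 150)   # toy series; the last step is a 45.6% jump

growth   <- diff(y) / y[-length(y)]   # proportional growth rate
log_diff <- diff(log(y))              # first difference of logarithms

# Side by side: close for small changes, clearly apart for the large jump
round(cbind(growth, log_diff), 4)
```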

How to Evaluate Forecasting Accuracy

The E in PIVASE

“The thrill of winning is much less than the agony of defeat.” — Michigan Football Coach [paraphrased]

How to Evaluate Forecasting Accuracy

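When one-step-ahead forecasts are made repeatedly, moving the forecast origin forward one period at a time and comparing each forecast with the value subsequently observed, the process is known as rolling origin forecasting. The one-step-ahead forecast error at time \(t + i\) may be denoted by \[e_{t+i}=Y_{t+i} - F_{t+i}\] where \(Y_{t+i}\) is the actual value and \(F_{t+i}\) is its forecast made one period earlier.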
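As an illustration of rolling origin evaluation, the sketch below moves the origin forward one period at a time, makes a one-step-ahead forecast from the data available at that origin, and records \(e_{t+1} = Y_{t+1} - F_{t+1}\). The naive forecast \(F_{t+1} = Y_t\) and the helper name `rolling_naive_errors` are assumptions chosen for simplicity; any forecasting rule could replace the naive one inside the loop.

```r
# Hypothetical helper: one-step-ahead errors with a rolling forecast origin,
# using the naive forecast F_{t+1} = Y_t (last observed value)
rolling_naive_errors <- function(y) {
  n <- length(y)
  stopifnot(n >= 2)
  errors <- numeric(n - 1)
  for (t in seq_len(n - 1)) {
    history  <- y[1:t]                    # data available at origin t
    forecast <- history[length(history)]  # naive one-step-ahead forecast
    errors[t] <- y[t + 1] - forecast      # e_{t+1} = Y_{t+1} - F_{t+1}
  }
  errors
}

sales <- c(15, 10, 12, 16, 9, 8, 14)
rolling_naive_errors(sales)   # -5  2  4 -7 -1  6
```

For the naive rule the errors are simply `diff(y)`; the loop form is kept because it makes the moving origin explicit.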

DC weather forecasts and actual values (Excel file)

DC_w <- readxl::read_xlsx(path = "data/DC_weather_3.xlsx")
summary(DC_w)
##       Date                        Forecast         Actual     
##  Min.   :2016-06-01 00:00:00   Min.   :74.00   Min.   :73.00  
##  1st Qu.:2016-06-09 12:00:00   1st Qu.:82.00   1st Qu.:82.00  
##  Median :2016-06-18 00:00:00   Median :84.50   Median :85.00  
##  Mean   :2016-06-18 00:00:00   Mean   :84.26   Mean   :84.65  
##  3rd Qu.:2016-06-26 12:00:00   3rd Qu.:86.75   3rd Qu.:87.75  
##  Max.   :2016-07-05 00:00:00   Max.   :94.00   Max.   :94.00  
##                                NA's   :1       NA's   :1
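# Draw both series on a common y-axis range so that no points are clipped
plot(x = DC_w$Date, y = DC_w$Forecast, xlab='Time', ylab="Forecast and actual", type="l", col="blue",
     ylim = range(c(DC_w$Forecast, DC_w$Actual), na.rm = TRUE))
lines(x = DC_w$Date, y = DC_w$Actual, col="red")
legend("topleft", legend = c("Forecast", "Actual"), col = c("blue", "red"), lty = 1)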

How to Evaluate Forecasting Accuracy

# Calculate the forecast error: pair the forecast in row t with the actual value in row t + 1
Actual1 <- DC_w$Actual[-1]
Forecast1 <- DC_w$Forecast[-nrow(DC_w)]
error <- Actual1 - Forecast1
# Mean error
mean(error)
## [1] 0.3823529
# Mean percentage error (a proportion; multiply by 100 for percent)
mean(error/Actual1)
## [1] 0.003828892
# Mean absolute error
mean(abs(error))
## [1] 1.911765
library(DescTools)
MAE(Forecast1, Actual1)
## [1] 1.911765

How to Evaluate Forecasting Accuracy

# Mean absolute percentage error (a proportion)
mean(abs(error)/Actual1)
## [1] 0.02244587
MAPE(Forecast1, Actual1)
## [1] 0.02244587
# Mean squared error
mean(error^2)
## [1] 6.970588
MSE(Forecast1, Actual1)
## [1] 6.970588
# Root mean squared error
RMSE(Forecast1, Actual1)
## [1] 2.640187
sqrt(mean(error^2))
## [1] 2.640187
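The measures above can be collected into a single helper. This is a sketch, assuming `actual` and `forecast` are already aligned numeric vectors with no missing values (as `Actual1` and `Forecast1` are above); the name `accuracy_summary` is hypothetical, and the percentage measures are returned as proportions to match the calculations above.

```r
# Hypothetical helper collecting the accuracy measures used above.
# Errors are defined as actual - forecast; percentage measures are proportions.
accuracy_summary <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast))
  error <- actual - forecast
  c(ME   = mean(error),
    MPE  = mean(error / actual),
    MAE  = mean(abs(error)),
    MAPE = mean(abs(error / actual)),
    MSE  = mean(error^2),
    RMSE = sqrt(mean(error^2)))
}

# Toy example with made-up numbers
res <- accuracy_summary(actual = c(80, 85, 90), forecast = c(82, 84, 87))
round(res, 4)
```

Applied to the weather data, `accuracy_summary(Actual1, Forecast1)` should reproduce the individual values computed above.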