Measures of Location

Suppose the sales of a popular book over a seven-week period are as follows:

Sales_D <- data.frame(Week=1:7, Sales=c(15, 10, 12, 16, 9, 8, 14))

Arithmetic/Geometric Mean

mean(Sales_D$Sales)
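
The heading above names the geometric mean, but only the arithmetic mean is computed. A minimal sketch of both follows, assuming the same seven sales figures; the names `sales`, `arith_mean` and `geo_mean` are illustrative and not part of the original slides. Base R has no built-in geometric mean, so it is computed on the log scale here.

```r
# Weekly sales from the example above
sales <- c(15, 10, 12, 16, 9, 8, 14)

# Arithmetic mean: sum of the values divided by their count
arith_mean <- mean(sales)

# Geometric mean: n-th root of the product, computed via logs
# (only valid when every value is strictly positive)
geo_mean <- exp(mean(log(sales)))

print(c(arithmetic = arith_mean, geometric = geo_mean))
```

For positive data the geometric mean never exceeds the arithmetic mean, which is a quick sanity check on the result.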

Order Statistics

sort(Sales_D$Sales)

Median

median(Sales_D$Sales)

Five-Number Summary (Quartiles)

summary(Sales_D$Sales)
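
Note that `summary()` reports six numbers, because it adds the mean to the five-number summary. As a hedged illustration, the five numbers alone can be obtained with `fivenum()` or `quantile()`; the variable names below are illustrative.

```r
sales <- c(15, 10, 12, 16, 9, 8, 14)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(sales)

# Quantiles at the same five probabilities (R's default type 7)
q <- quantile(sales, probs = c(0, 0.25, 0.5, 0.75, 1))

# Interquartile range: third quartile minus first quartile
iqr <- IQR(sales)

print(five)
print(q)
print(iqr)
```

For these seven observations the hinges and the type 7 quartiles coincide; in general the two methods can differ slightly.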

Exercise:

How to obtain order statistics in R?
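
One possible answer, sketched with base R only: sort the data and index into the result. The names `ordered`, `k` and `kth` are illustrative.

```r
sales <- c(15, 10, 12, 16, 9, 8, 14)

# All order statistics: the data sorted in ascending order
ordered <- sort(sales)

# The k-th order statistic is the k-th smallest value
k <- 2
kth <- ordered[k]

# order() gives the positions of the sorted values in the original vector;
# rank() gives each observation's position in the sorted sequence
idx <- order(sales)
rnk <- rank(sales)

print(ordered)
print(kth)
```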

Measures of Variation

- Range
- Measures of dispersion
  - mean absolute deviation
  - variance
  - standard deviation

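# Mean absolute deviation, here taken about the median
sum(abs(Sales_D$Sales - median(Sales_D$Sales)))/length(Sales_D$Sales)
# Sample variance (divisor n - 1)
var(Sales_D$Sales)
# Sample standard deviation
sd(Sales_D$Sales)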
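
The range is listed above but not computed. The sketch below adds it and works out each dispersion measure from its definition, so the built-in functions can be checked by hand. Variable names are illustrative, and the mean absolute deviation is taken about the mean here, which is an assumption about the intended definition (for these data the mean and the median are both 12, so either choice gives the same value).

```r
sales <- c(15, 10, 12, 16, 9, 8, 14)
n <- length(sales)

# Range as a single number: largest value minus smallest value
data_range <- diff(range(sales))

# Deviations from the mean
dev <- sales - mean(sales)

# Mean absolute deviation (divisor n)
mad_mean <- sum(abs(dev)) / n

# Sample variance (divisor n - 1) and standard deviation
var_manual <- sum(dev^2) / (n - 1)
sd_manual <- sqrt(var_manual)

print(c(range = data_range, MAD = mad_mean, var = var_manual, sd = sd_manual))
```

Note that base R's `mad()` computes the (scaled) median absolute deviation, which is a different quantity.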

Assessing Variability
- Z-score: a standardized score that measures the relative location of an observation in a data set, expressed as the number of standard deviations it lies from the mean.

scale(Sales_D$Sales, center = TRUE, scale = TRUE)
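
As a check on what `scale()` does with these settings, the Z-scores can be computed directly from the definition. This is a small sketch with illustrative variable names.

```r
sales <- c(15, 10, 12, 16, 9, 8, 14)

# Z-score: (observation - mean) / standard deviation
z_manual <- (sales - mean(sales)) / sd(sales)

# scale() returns a one-column matrix; flatten it for comparison
z_scale <- as.numeric(scale(sales, center = TRUE, scale = TRUE))

print(round(z_manual, 3))
print(isTRUE(all.equal(z_manual, z_scale)))
```

Standardized scores always have mean 0 and standard deviation 1, whatever the units of the original data.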

Measure of Linear Relationships

The correlation coefficient measures linear relationships:

Ranges over [–1, +1]

- A value of +1 indicates a perfect positive (upward sloping) linear relationship between the two variables
- A value of –1 indicates a perfect negative (downward sloping) linear relationship between the two variables
- A value of zero indicates no linear relationship between the two variables

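The correlation coefficient, \(\rho\), may be calculated from sample data using \(\rho=\frac{S_{XY}}{S_XS_Y}\), where \(S_{XY}\) is the sample covariance of \(X\) and \(Y\), and \(S_X\) and \(S_Y\) are their sample standard deviations.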
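
The three cases listed above can be illustrated with made-up data. In this sketch, `r_manual` is a hypothetical helper that applies the covariance formula, and the vectors are invented for illustration only.

```r
# Hypothetical helper: covariance divided by the product of standard deviations
r_manual <- function(a, b) cov(a, b) / (sd(a) * sd(b))

x <- 1:10
y_up <- 2 * x + 1            # perfect upward-sloping line
y_down <- -3 * x + 5         # perfect downward-sloping line
y_curve <- (x - mean(x))^2   # exact but purely nonlinear relationship

print(r_manual(x, y_up))
print(r_manual(x, y_down))
print(round(r_manual(x, y_curve), 10))

# The helper agrees with the built-in cor()
print(isTRUE(all.equal(r_manual(x, y_up), cor(x, y_up))))
```

The last case shows why a zero correlation means no linear relationship rather than no relationship at all: `y_curve` is fully determined by `x`.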

Example: German Forecasts


German <- readxl::read_xlsx(path = "data/German_forecasts.xlsx")
German

## # A tibble: 25 x 9
##    Institutio~   GDP Privcons  GFCF Exports Imports Govsurp Consprix Unemp
##    <chr>       <dbl>    <dbl> <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <dbl>
##  1 Bundesbank    0.4      1    -0.1     1.9     3     -0.75      1.5   7.2
##  2 Commerzbank   0.5      1.3   0.1     2.8     4.1   -0.5       1.9   7.1
##  3 Deka          0.7      1.1  -0.3     3.3     3.3   -0.3       1.9   6.9
##  4 Deutsche B~   0.3      0.6   1.1     3.2     4.2   -0.5       1.7   7  
##  5 DIW           0.9      1.1   0.9     4.2     4.6    0         2.8   7  
##  6 DZ Bank       0.4      0.9   0.1     3       3.8   -0.7       2.1   7.1
##  7 Feri          1.2      1.2   1.9     4.1     4.1   -0.3       2     6.6
##  8 Gemeinscha~   1        1.1   1.9     3.8     4.6   -0.2       2.1   6.8
##  9 Helaba        1.1      1.2   2.6     5.5     5      0         2     7  
## 10 HSBC          0.6      1     0.5     2.9     4.1   -0.4       2     7  
## # ... with 15 more rows

Scatter Plot Matrices:

pairs(x = German[,c("GDP", "GFCF", "Govsurp", "Unemp")], pch = 19)

Example: German Forecasts

Assessing Variability: The standardized score (Z-score)

head(cbind(German[,1], scale(German[,-1])), 10)

##             Institutions         GDP   Privcons       GFCF     Exports
## 1             Bundesbank -1.13391175  0.4365376 -0.8399654 -1.51994386
## 2            Commerzbank -0.73181539  1.4286685 -0.6177524 -0.57261043
## 3                   Deka  0.07237735  0.7672479 -1.0621785 -0.04631408
## 4          Deutsche Bank -1.53600812 -0.8863036  0.4933130 -0.15157335
## 5                    DIW  0.87657008  0.7672479  0.2711000  0.90101935
## 6                DZ Bank -1.13391175  0.1058273 -0.6177524 -0.36209189
## 7                   Feri  2.08285918  1.0979582  1.3821653  0.79576008
## 8  Gemeinschaftsdiagnose  1.27866644  0.7672479  1.3821653  0.47998227
## 9                 Helaba  1.68076281  1.0979582  2.1599111  2.26938987
## 10                  HSBC -0.32971902  0.4365376 -0.1733262 -0.46735116
##       Imports     Govsurp   Consprix      Unemp
## 1  -0.7134526 -1.84780525 -1.4626070  1.5531605
## 2   0.4206484 -0.83475413  0.0222732  0.9984604
## 3  -0.4041523 -0.02431323  0.0222732 -0.1109400
## 4   0.5237485 -0.83475413 -0.7201669  0.4437602
## 5   0.9361488  1.19134812  3.3632536  0.4437602
## 6   0.1113481 -1.64519502  0.7647133  0.9984604
## 7   0.4206484 -0.02431323  0.3934932 -1.7750406
## 8   0.9361488  0.38090722  0.7647133 -0.6656402
## 9   1.3485492  1.19134812  0.3934932  0.4437602
## 10  0.4206484 -0.42953368  0.3934932  0.4437602

Example: German Correlation

round(cor(German[,-1]), digits = 3)

##             GDP Privcons   GFCF Exports Imports Govsurp Consprix  Unemp
## GDP       1.000    0.430  0.601   0.459   0.209   0.620    0.317 -0.363
## Privcons  0.430    1.000  0.295   0.262   0.469   0.033    0.192  0.073
## GFCF      0.601    0.295  1.000   0.192   0.371   0.223    0.336 -0.295
## Exports   0.459    0.262  0.192   1.000   0.641   0.518    0.210 -0.273
## Imports   0.209    0.469  0.371   0.641   1.000   0.085    0.382 -0.087
## Govsurp   0.620    0.033  0.223   0.518   0.085   1.000    0.179 -0.317
## Consprix  0.317    0.192  0.336   0.210   0.382   0.179    1.000  0.063
## Unemp    -0.363    0.073 -0.295  -0.273  -0.087  -0.317    0.063  1.000

Exercise:

Check correlation between GDP and GFCF

cov(German$GDP, German$GFCF); sd(German$GDP); sd(German$GFCF)
cov(German$GDP, German$GFCF) / (sd(German$GDP) * sd(German$GFCF))

Example: German Forecasts

library("PerformanceAnalytics")
my_data <- German[,-1]
chart.Correlation(R = my_data, histogram=TRUE, pch=19)

Transformations

Summary of Dulles

Dulles <- readxl::read_xlsx(path = "data/Dulles_2.xlsx")
summary(Dulles)

##      Year           Passengers (000s)
##  Length:53          Min.   :  640.5  
##  Class :character   1st Qu.: 2083.1  
##  Mode  :character   Median : 8946.6  
##                     Mean   : 8455.9  
##                     3rd Qu.:14393.0  
##                     Max.   :22129.0

Transformations

Time series plot of Dulles

library(ggplot2); library(ggfortify); library(zoo)
Dulles2 <- ts(data = Dulles$`Passengers (000s)`, start = as.yearmon("1963-12"), frequency = 1)
autoplot(Dulles2)

Transformations: Differences

A first-order (single) difference is \(Y_t - Y_{t-1}\), the change from one period to the next. More generally, a lag-\(k\) difference is \(Y_t - Y_{t-k}\).

Time Series Plot for the First Differences of the Dulles Passenger Series

# How to use the "diff" function: diff(x, lag, differences)
diff(1:10, lag = 2, differences = 1)

## [1] 2 2 2 2 2 2 2 2

x <- c(1,4,10,20,35)
diff(x, lag = 2)

## [1]  9 16 25

diff(x, differences = 2)

## [1] 3 4 5
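
To make the `lag` and `differences` arguments concrete, the same results can be reproduced with plain indexing. This is a sketch with illustrative names, using the vector `x` from above.

```r
x <- c(1, 4, 10, 20, 35)
n <- length(x)

# First difference: each value minus the one before it
first_diff <- x[-1] - x[-n]

# Lag-2 difference: each value minus the one two places before it
lag2_diff <- x[3:n] - x[1:(n - 2)]

# Second-order difference: the first difference of the first difference
second_diff <- diff(diff(x))

print(first_diff)
print(lag2_diff)
print(second_diff)
```

So `lag` controls how far back the subtraction reaches, while `differences` controls how many times the differencing is repeated.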

Diff_1 <- diff(Dulles2, differences = 1)
autoplot(Diff_1, xlab='Time', colour="blue")

Transformations: Growth Rates

Time Series Plot for the Growth Rate for the Dulles Passenger Series

# Growth rate (%): change relative to the previous period's value, Y_{t-1}
Growth_1 <- 100 * Diff_1 / stats::lag(Dulles2, k = -1)
autoplot(Growth_1, xlab='Time', colour="blue")
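
A growth rate is conventionally measured relative to the previous period, that is \(100 (Y_t - Y_{t-1}) / Y_{t-1}\). The sketch below checks this on a made-up annual series that grows by exactly 10% per year; the series and the names `y`, `growth` and `growth_vec` are illustrative. Because arithmetic on `ts` objects aligns values by time, the denominator has to be shifted with `stats::lag()`.

```r
# Made-up series growing 10% per year
y <- ts(c(100, 110, 121), start = 2001, frequency = 1)

# stats::lag(y, k = -1) moves each value one period later, so at time t
# it holds Y_{t-1}; dividing aligns each change with the previous level
growth <- 100 * diff(y) / stats::lag(y, k = -1)
print(growth)

# Equivalent calculation on a plain numeric vector
v <- as.numeric(y)
growth_vec <- 100 * diff(v) / head(v, -1)
print(growth_vec)
```

Dividing `diff(y)` by `y` itself would instead use the current value \(Y_t\) as the base, which understates positive growth.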

Transformations: Logarithm

The Log Transform

Log_Dulles <- log(Dulles2)
autoplot(Log_Dulles, xlab='Time', colour="blue")

Transformations

The (first) difference in logarithms

Diff_Log <- diff(Log_Dulles, differences = 1)
autoplot(Diff_Log, xlab='Time', colour="blue")
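
The difference in logarithms is closely related to the growth rate: \(\log Y_t - \log Y_{t-1} = \log(Y_t / Y_{t-1})\), which is approximately the proportional change when that change is small. A sketch on a made-up series (names are illustrative):

```r
# Made-up series growing 10% per year
y <- ts(c(100, 110, 121), start = 2001, frequency = 1)

# First difference of the logs: log(Y_t / Y_{t-1})
log_diff <- diff(log(y))

# Percentage growth relative to the previous period
growth <- 100 * diff(y) / stats::lag(y, k = -1)

# Exact link between the two: growth = 100 * (exp(log_diff) - 1)
growth_from_logs <- 100 * (exp(log_diff) - 1)

print(round(100 * log_diff, 2))
print(growth)
```

Here the log difference times 100 is about 9.53 against a true growth rate of 10, so the approximation is close but not exact.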

How to Evaluate Forecasting Accuracy

The E in PIVASE

What do you need from the forecasting activity?
- Quantitative accuracy
- Timing accuracy

How do we measure these?
- Asymmetric measures?

“The thrill of winning is much less than the agony of defeat.” — Michigan Football Coach [paraphrased]

How to Evaluate Forecasting Accuracy

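In rolling origin forecasting, the forecast origin moves forward one period at a time, and a new forecast is made from each origin using only the data available up to that point. The one-step-ahead forecast error at time \(t + i\) may be denoted by \[e_{t+i}=Y_{t+i} - F_{t+i}\] where \(Y_{t+i}\) is the observed value and \(F_{t+i}\) is the forecast of it made one period earlier.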
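
A minimal sketch of rolling origin one-step-ahead errors, reusing the seven sales figures from the start of the deck. The forecasting rule used here (the mean of all observations up to the origin) and the starting origin of 3 are assumptions chosen only to keep the example short; any forecasting method could be substituted.

```r
y <- c(15, 10, 12, 16, 9, 8, 14)
n <- length(y)

# Forecast origins: the last period whose data each forecast may use
origins <- 3:(n - 1)

# One-step-ahead forecast from each origin (assumed rule: mean of data so far)
forecasts <- sapply(origins, function(origin) mean(y[1:origin]))

# Observed values one step after each origin, and the resulting errors
actuals <- y[origins + 1]
errors <- actuals - forecasts

print(data.frame(origin = origins, forecast = forecasts,
                 actual = actuals, error = errors))
```

Each row uses one more observation than the row before it, which is what makes the origin "roll".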

Weather Excel

DC_w <- readxl::read_xlsx(path = "data/DC_weather_3.xlsx")
summary(DC_w)

##       Date                        Forecast         Actual     
##  Min.   :2016-06-01 00:00:00   Min.   :74.00   Min.   :73.00  
##  1st Qu.:2016-06-09 12:00:00   1st Qu.:82.00   1st Qu.:82.00  
##  Median :2016-06-18 00:00:00   Median :84.50   Median :85.00  
##  Mean   :2016-06-18 00:00:00   Mean   :84.26   Mean   :84.65  
##  3rd Qu.:2016-06-26 12:00:00   3rd Qu.:86.75   3rd Qu.:87.75  
##  Max.   :2016-07-05 00:00:00   Max.   :94.00   Max.   :94.00  
##                                NA's   :1       NA's   :1

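# Draw both series on one plot: forecast in blue, actual in red
plot(x = DC_w$Date, y = DC_w$Forecast, xlab = "Time", ylab = "Forecast / Actual", type = "l", col = "blue",
     ylim = range(c(DC_w$Forecast, DC_w$Actual), na.rm = TRUE))
lines(x = DC_w$Date, y = DC_w$Actual, col = "red")
legend("topleft", legend = c("Forecast", "Actual"), col = c("blue", "red"), lty = 1)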

How to Evaluate Forecasting Accuracy

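# Calculate the forecast error
# Forecast[i] is paired with Actual[i + 1], i.e. each row's forecast is treated as a forecast for the following day
Actual1 <- DC_w$Actual[-1]
Forecast1 <- DC_w$Forecast[-nrow(DC_w)]
error <- Actual1 - Forecast1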
# Mean error
mean(error)

## [1] 0.3823529

# Mean percentage error (reported as a proportion; multiply by 100 for a percentage)
mean(error/Actual1)

## [1] 0.003828892

# Mean absolute error
mean(abs(error))

## [1] 1.911765

library(DescTools)
MAE(Forecast1, Actual1)

## [1] 1.911765

How to Evaluate Forecasting Accuracy

# Mean absolute percentage error (reported as a proportion; multiply by 100 for a percentage)
mean(abs(error)/Actual1)

## [1] 0.02244587

MAPE(Forecast1, Actual1)

## [1] 0.02244587

# Mean square error 
mean(error^2)

## [1] 6.970588

MSE(Forecast1, Actual1)

## [1] 6.970588

# Root mean square error
RMSE(Forecast1, Actual1)

## [1] 2.640187

sqrt(mean(error^2))

## [1] 2.640187
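
The measures above can be collected in one place using base R only. The sketch below uses three made-up actual and forecast values so that every result can be verified by hand; the data and names are illustrative and are not taken from the weather file. MAPE is multiplied by 100 here so that it reads as a percentage.

```r
# Made-up data for illustration
actual <- c(80, 85, 90)
forecast <- c(82, 84, 87)

# Forecast error: actual minus forecast
e <- actual - forecast

accuracy <- c(
  ME   = mean(e),                          # mean error (bias)
  MAE  = mean(abs(e)),                     # mean absolute error
  MAPE = 100 * mean(abs(e) / abs(actual)), # mean absolute percentage error, in %
  MSE  = mean(e^2),                        # mean square error
  RMSE = sqrt(mean(e^2))                   # root mean square error
)

print(round(accuracy, 3))
```

ME keeps the sign of the errors, so positive and negative errors can cancel; the absolute and squared measures do not allow this, and RMSE is in the same units as the data.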

Basic Tools For Forecasting

Summarizing the Data
