Xiaorui(Jeremy) Zhu
08/30/2019
Suppose the sales of a popular book over a seven-week period are as follows:
Sales_D <- data.frame(Week=1:7, Sales=c(15, 10, 12, 16, 9, 8, 14))
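mean(Sales_D$Sales)     # arithmetic mean
sort(Sales_D$Sales)     # values in ascending order
median(Sales_D$Sales)   # middle value of the sorted data
summary(Sales_D$Sales)  # min, quartiles, median, mean, max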
How to obtain order statistics in R?
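A minimal sketch answering the question for the Sales data: the \(k\)-th order statistic is the \(k\)-th smallest value, so sorting and indexing is enough. The names `sales`, `k`, and `x_sorted` are illustrative, and the choice `k = 2` is arbitrary.

```r
# Sales figures from Sales_D above
sales <- c(15, 10, 12, 16, 9, 8, 14)

# The k-th order statistic is the k-th smallest value
k <- 2
x_sorted <- sort(sales)
x_sorted[k]                  # 2nd smallest value: 9

# The extremes are the first and last order statistics
c(min(sales), max(sales))    # same as x_sorted[1] and x_sorted[length(sales)]

# A partial sort places the k-th smallest value correctly without a full sort
sort(sales, partial = k)[k]
```

The partial sort only guarantees that position `k` holds the correct value, which can save work on long vectors.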
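# Mean absolute deviation from the median
sum(abs(Sales_D$Sales - median(Sales_D$Sales)))/length(Sales_D$Sales)
# Sample variance and standard deviation (divisor n - 1)
var(Sales_D$Sales)
sd(Sales_D$Sales)
# Standardized (z) scores
scale(Sales_D$Sales, center = TRUE, scale = TRUE)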
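To see what scale() does, the same z-scores can be computed by hand, assuming the usual definition \(z_i = (x_i - \bar{x})/s\) with \(s\) the sample standard deviation that sd() returns. A sketch (object names are illustrative):

```r
# Sales figures from Sales_D above
sales <- c(15, 10, 12, 16, 9, 8, 14)

# Z-score: subtract the sample mean, divide by the sample standard deviation
z_manual <- (sales - mean(sales)) / sd(sales)

# scale() returns a one-column matrix with attributes; as.vector() drops them
z_scale <- as.vector(scale(sales, center = TRUE, scale = TRUE))
all.equal(z_manual, z_scale)   # TRUE
```

scale() stores the centering and scaling values as attributes of its result, which is why the comparison strips them first. The z-scores have mean 0 and standard deviation 1 by construction.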
The correlation coefficient measures the strength and direction of a linear relationship between two variables:
- It ranges over [-1, +1]
- A value of +1 indicates a perfect positive (upward sloping) linear relationship between the two variables
- A value of -1 indicates a perfect negative (downward sloping) linear relationship between the two variables
- A value of zero indicates no linear relationship between the two variables (a nonlinear relationship may still exist)
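The correlation coefficient, \(\rho\), may be calculated from sample data using \(\rho=\frac{S_{XY}}{S_X S_Y}\), where \(S_{XY}\) is the sample covariance of \(X\) and \(Y\), and \(S_X\) and \(S_Y\) are their sample standard deviations.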
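As a check on the formula, here is a sketch that computes \(S_{XY}\), \(S_X\), and \(S_Y\) directly from their definitions (all with divisor \(n-1\)) for Week and Sales in Sales_D, and compares the ratio with R's cor(). Variable names are illustrative.

```r
# Week and Sales columns from Sales_D above
week  <- 1:7
sales <- c(15, 10, 12, 16, 9, 8, 14)
n <- length(sales)

# Sample covariance S_XY and sample standard deviations S_X, S_Y (divisor n - 1)
s_xy <- sum((week - mean(week)) * (sales - mean(sales))) / (n - 1)
s_x  <- sqrt(sum((week - mean(week))^2) / (n - 1))
s_y  <- sqrt(sum((sales - mean(sales))^2) / (n - 1))

# Ratio from the formula, compared with the built-in function
rho_hat <- s_xy / (s_x * s_y)
all.equal(rho_hat, cor(week, sales))   # TRUE
```

The \(n-1\) divisors cancel in the ratio, so using \(n\) throughout would give the same correlation.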
German Forecasts data (available in R data and Excel formats)
German <- readxl::read_xlsx(path = "data/German_forecasts.xlsx")
German
## # A tibble: 25 x 9
## Institutio~ GDP Privcons GFCF Exports Imports Govsurp Consprix Unemp
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Bundesbank 0.4 1 -0.1 1.9 3 -0.75 1.5 7.2
## 2 Commerzbank 0.5 1.3 0.1 2.8 4.1 -0.5 1.9 7.1
## 3 Deka 0.7 1.1 -0.3 3.3 3.3 -0.3 1.9 6.9
## 4 Deutsche B~ 0.3 0.6 1.1 3.2 4.2 -0.5 1.7 7
## 5 DIW 0.9 1.1 0.9 4.2 4.6 0 2.8 7
## 6 DZ Bank 0.4 0.9 0.1 3 3.8 -0.7 2.1 7.1
## 7 Feri 1.2 1.2 1.9 4.1 4.1 -0.3 2 6.6
## 8 Gemeinscha~ 1 1.1 1.9 3.8 4.6 -0.2 2.1 6.8
## 9 Helaba 1.1 1.2 2.6 5.5 5 0 2 7
## 10 HSBC 0.6 1 0.5 2.9 4.1 -0.4 2 7
## # ... with 15 more rows
Scatter Plot Matrices:
pairs(x = German[,c("GDP", "GFCF", "Govsurp", "Unemp")], pch = 19)
Assessing Variability: The standardized score (Z-score)
head(cbind(German[,1], scale(German[,-1])), 10)
## Institutions GDP Privcons GFCF Exports
## 1 Bundesbank -1.13391175 0.4365376 -0.8399654 -1.51994386
## 2 Commerzbank -0.73181539 1.4286685 -0.6177524 -0.57261043
## 3 Deka 0.07237735 0.7672479 -1.0621785 -0.04631408
## 4 Deutsche Bank -1.53600812 -0.8863036 0.4933130 -0.15157335
## 5 DIW 0.87657008 0.7672479 0.2711000 0.90101935
## 6 DZ Bank -1.13391175 0.1058273 -0.6177524 -0.36209189
## 7 Feri 2.08285918 1.0979582 1.3821653 0.79576008
## 8 Gemeinschaftsdiagnose 1.27866644 0.7672479 1.3821653 0.47998227
## 9 Helaba 1.68076281 1.0979582 2.1599111 2.26938987
## 10 HSBC -0.32971902 0.4365376 -0.1733262 -0.46735116
## Imports Govsurp Consprix Unemp
## 1 -0.7134526 -1.84780525 -1.4626070 1.5531605
## 2 0.4206484 -0.83475413 0.0222732 0.9984604
## 3 -0.4041523 -0.02431323 0.0222732 -0.1109400
## 4 0.5237485 -0.83475413 -0.7201669 0.4437602
## 5 0.9361488 1.19134812 3.3632536 0.4437602
## 6 0.1113481 -1.64519502 0.7647133 0.9984604
## 7 0.4206484 -0.02431323 0.3934932 -1.7750406
## 8 0.9361488 0.38090722 0.7647133 -0.6656402
## 9 1.3485492 1.19134812 0.3934932 0.4437602
## 10 0.4206484 -0.42953368 0.3934932 0.4437602
round(cor(German[,-1]), digits = 3)
## GDP Privcons GFCF Exports Imports Govsurp Consprix Unemp
## GDP 1.000 0.430 0.601 0.459 0.209 0.620 0.317 -0.363
## Privcons 0.430 1.000 0.295 0.262 0.469 0.033 0.192 0.073
## GFCF 0.601 0.295 1.000 0.192 0.371 0.223 0.336 -0.295
## Exports 0.459 0.262 0.192 1.000 0.641 0.518 0.210 -0.273
## Imports 0.209 0.469 0.371 0.641 1.000 0.085 0.382 -0.087
## Govsurp 0.620 0.033 0.223 0.518 0.085 1.000 0.179 -0.317
## Consprix 0.317 0.192 0.336 0.210 0.382 0.179 1.000 0.063
## Unemp -0.363 0.073 -0.295 -0.273 -0.087 -0.317 0.063 1.000
Exercise:
- Check correlation between GDP and GFCF
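cov(German$GDP, German$GFCF); sd(German$GDP); sd(German$GFCF)
# Correlation from its definition, S_XY / (S_X * S_Y)
cov(German$GDP, German$GFCF) / (sd(German$GDP) * sd(German$GFCF))
cor(German$GDP, German$GFCF)  # built-in check; 0.601 in the matrix above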
library("PerformanceAnalytics")
my_data <- German[,-1]
chart.Correlation(R = my_data, histogram=TRUE, pch=19)
Summary of the Dulles passenger data
Dulles <- readxl::read_xlsx(path = "data/Dulles_2.xlsx")
summary(Dulles)
## Year Passengers (000s)
## Length:53 Min. : 640.5
## Class :character 1st Qu.: 2083.1
## Mode :character Median : 8946.6
## Mean : 8455.9
## 3rd Qu.:14393.0
## Max. :22129.0
Time series plot of Dulles
library(ggplot2); library(ggfortify)
# Annual passenger counts (thousands): one observation per year, starting in 1963
Dulles2 <- ts(data = Dulles$`Passengers (000s)`, start = 1963, frequency = 1)
autoplot(Dulles2)
The first (or “first order”) difference is \(\Delta Y_t = Y_t - Y_{t-1}\); more generally, the difference at lag \(k\) is \(Y_t - Y_{t-k}\).
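The definition can be written directly with vector indexing. In this sketch, `lag_diff()` is a hypothetical helper (not a base R function) that assumes a plain numeric vector and `k` smaller than its length; applied to a `ts` object, diff() additionally keeps the time attributes.

```r
# Hypothetical helper: lag-k difference Y_t - Y_{t-k} via indexing
lag_diff <- function(y, k = 1) {
  n <- length(y)
  y[(k + 1):n] - y[1:(n - k)]   # later values minus earlier values
}

x <- c(1, 4, 10, 20, 35)
lag_diff(x)          # first differences: 3 6 10 15
lag_diff(x, k = 2)   # lag-2 differences: 9 16 25
all.equal(lag_diff(x, k = 2), diff(x, lag = 2))   # TRUE
```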
Time Series Plot for the First Differences of the Dulles Passenger Series
# How to use the diff() function: diff(x, lag, differences)
diff(1:10, lag = 2, differences = 1)
## [1] 2 2 2 2 2 2 2 2
x <- c(1,4,10,20,35)
diff(x, lag = 2)
## [1] 9 16 25
diff(x, differences = 2)
## [1] 3 4 5
Diff_1 <- diff(Dulles2, differences = 1)
autoplot(Diff_1, xlab='Time', colour="blue")
Time Series Plot for the Growth Rate for the Dulles Passenger Series
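# Growth rate (%) relative to the previous year: 100 * (Y_t - Y_{t-1}) / Y_{t-1}
Growth_1 <- 100 * Diff_1 / stats::lag(Dulles2, k = -1)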
autoplot(Growth_1, xlab='Time', colour="blue")
The Log Transform
Log_Dulles <- log(Dulles2)
autoplot(Log_Dulles, xlab='Time', colour="blue")
The (first) difference in logarithms
Diff_Log <- diff(Log_Dulles, differences = 1)
autoplot(Diff_Log, xlab='Time', colour="blue")
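For small changes, 100 times the difference in logs is close to the percentage growth rate, because \(\log Y_t - \log Y_{t-1} = \log(1 + g) \approx g\) when the growth \(g\) is small. A sketch on a made-up series (the values in `y` are hypothetical, not Dulles data):

```r
# Toy series (hypothetical values)
y <- c(100, 102, 105, 103, 108)

# Percentage growth rate relative to the previous value
growth <- 100 * diff(y) / y[-length(y)]

# 100 times the first difference of the logs
log_growth <- 100 * diff(log(y))

# Side-by-side comparison of the two measures
round(cbind(growth, log_growth), 3)
```

The log version is always slightly below the percentage growth rate, since \(\log(1+g) < g\) for any nonzero \(g > -1\); the gap widens as the changes get larger.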
The E in PIVASE
“The thrill of winning is much less than the agony of defeat.” — Michigan Football Coach [paraphrased]
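Producing a new one-step-ahead forecast each time the forecast origin moves forward one period, as each new observation becomes available, is known as rolling origin forecasting. The one-step-ahead forecast error at time \(t + i\) may be denoted by \[e_{t+i}=Y_{t+i} - F_{t+i}\] where \(Y_{t+i}\) is the actual value and \(F_{t+i}\) is the forecast made one period earlier.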
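The summary measures computed below average these errors over the \(m\) one-step-ahead forecasts in the evaluation period (\(m\) is notation introduced here). They are written to match the R code that follows, which reports MPE and MAPE as proportions; multiplying by 100 gives percentages:

```latex
\begin{aligned}
\text{ME}   &= \frac{1}{m}\sum_{i=1}^{m} e_{t+i}, &
\text{MPE}  &= \frac{1}{m}\sum_{i=1}^{m} \frac{e_{t+i}}{Y_{t+i}}, \\
\text{MAE}  &= \frac{1}{m}\sum_{i=1}^{m} \lvert e_{t+i}\rvert, &
\text{MAPE} &= \frac{1}{m}\sum_{i=1}^{m} \frac{\lvert e_{t+i}\rvert}{Y_{t+i}}, \\
\text{MSE}  &= \frac{1}{m}\sum_{i=1}^{m} e_{t+i}^{2}, &
\text{RMSE} &= \sqrt{\text{MSE}}.
\end{aligned}
```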
DC_w <- readxl::read_xlsx(path = "data/DC_weather_3.xlsx")
summary(DC_w)
## Date Forecast Actual
## Min. :2016-06-01 00:00:00 Min. :74.00 Min. :73.00
## 1st Qu.:2016-06-09 12:00:00 1st Qu.:82.00 1st Qu.:82.00
## Median :2016-06-18 00:00:00 Median :84.50 Median :85.00
## Mean :2016-06-18 00:00:00 Mean :84.26 Mean :84.65
## 3rd Qu.:2016-06-26 12:00:00 3rd Qu.:86.75 3rd Qu.:87.75
## Max. :2016-07-05 00:00:00 Max. :94.00 Max. :94.00
## NA's :1 NA's :1
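# Draw both series on one set of axes; ylim covers the range of both
plot(x = DC_w$Date, y = DC_w$Forecast, xlab = 'Time', ylab = "Forecast / Actual", type = "l", col = "blue",
     ylim = range(c(DC_w$Forecast, DC_w$Actual), na.rm = TRUE))
lines(x = DC_w$Date, y = DC_w$Actual, col = "red")
legend("topleft", legend = c("Forecast", "Actual"), col = c("blue", "red"), lty = 1)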
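# One-step-ahead errors: pair the forecast in row t with the actual value in row t + 1
# by dropping the first actual and the last forecast
Actual1 <- DC_w$Actual[-1]
Forecast1 <- DC_w$Forecast[-nrow(DC_w)]
error <- Actual1 - Forecast1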
# Mean error
mean(error)
## [1] 0.3823529
# Mean percentage error (as a proportion; multiply by 100 for percent)
mean(error/Actual1)
## [1] 0.003828892
# Mean absolute error
mean(abs(error))
## [1] 1.911765
library(DescTools)
MAE(Forecast1, Actual1)
## [1] 1.911765
# Mean absolute percentage error (as a proportion; multiply by 100 for percent)
mean(abs(error)/Actual1)
## [1] 0.02244587
MAPE(Forecast1, Actual1)
## [1] 0.02244587
# Mean square error
mean(error^2)
## [1] 6.970588
MSE(Forecast1, Actual1)
## [1] 6.970588
RMSE(Forecast1, Actual1)
## [1] 2.640187
sqrt(mean(error^2))
## [1] 2.640187
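The individual calls above can be collected into one function. This is a sketch only: `forecast_accuracy()` is a hypothetical helper (not part of DescTools or base R), and it assumes the two vectors are already aligned so that `forecast[i]` is the forecast of `actual[i]`, with no missing values.

```r
# Hypothetical helper: summary accuracy measures for aligned forecasts
# `forecast[i]` must be the forecast of `actual[i]`
forecast_accuracy <- function(forecast, actual) {
  e <- actual - forecast                    # forecast errors
  c(ME   = mean(e),
    MPE  = mean(e / actual),                # proportion, not percent
    MAE  = mean(abs(e)),
    MAPE = mean(abs(e) / actual),           # proportion, not percent
    MSE  = mean(e^2),
    RMSE = sqrt(mean(e^2)))
}

# Small made-up example: errors are 2, -1, 0
toy <- forecast_accuracy(forecast = c(80, 82, 85), actual = c(82, 81, 85))
round(toy, 4)
```

Called as `forecast_accuracy(Forecast1, Actual1)`, it evaluates the same expressions as the separate calls above, so it should return the same six values.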