Give me your reasons.
Why?
Gas_prices <- readxl::read_xlsx(path = "data/Gas_prices_1.xlsx")
head(Gas_prices)
tail(Gas_prices)
if (!require("forecast")) {install.packages("forecast"); library(forecast)} # seasonplot() comes from forecast
L1_Crude_price <- Gas_prices$Crude_price[-1]
Unleaded <- Gas_prices$Unleaded[-dim(Gas_prices)[1]]
ts_Unleaded <- ts(Unleaded, start=1996, frequency = 12)
plot(ts_Unleaded) # Nonstationary time-series
seasonplot(ts_Unleaded) # Strong evidence of seasonality
plot(Unleaded~L1_Crude_price)
model1 <- lm(Unleaded ~ L1_Crude_price) # fit by ordinary least squares
plot(L1_Crude_price,Unleaded,pch=20)
abline(model1, col="red")
summary(model1)
##
## Call:
## lm(formula = Unleaded ~ L1_Crude_price)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -78.109 -14.510  -5.058  11.189  90.272
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)    67.17064    3.43101   19.58   <2e-16 ***
## L1_Crude_price  2.84481    0.05435   52.34   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.13 on 237 degrees of freedom
## Multiple R-squared: 0.9204, Adjusted R-squared: 0.92
## F-statistic: 2740 on 1 and 237 DF, p-value: < 2.2e-16
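To make the coefficient table concrete, the short sketch below plugs the two estimates into the fitted line. The crude price of 20 is a hypothetical value chosen only for illustration; it is not taken from the data set.

```r
# Coefficient estimates copied from the summary(model1) output
b0 <- 67.17064  # intercept
b1 <- 2.84481   # slope on L1_Crude_price

# Hypothetical crude price, chosen only for illustration
crude <- 20

# Fitted unleaded price from the estimated regression line
predicted <- b0 + b1 * crude
predicted
```

With `model1` in the workspace, `predict(model1, newdata = data.frame(L1_Crude_price = 20))` should give the same number up to rounding of the printed coefficients.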
Merge1 <- cbind(Unleaded, Fitted=model1$fitted.values)
rownames(Merge1) <- Gas_prices$Date[-dim(Gas_prices)[1]]
head(Merge1)
##             Unleaded   Fitted
## 15-JAN-1996    109.0 121.4780
## 15-FEB-1996    108.9 127.8504
## 15-MAR-1996    113.7 134.0236
## 15-APR-1996    123.1 127.3952
## 15-MAY-1996    127.9 125.2616
## 15-JUN-1996    125.6 127.7650
plot(L1_Crude_price, Unleaded, pch=20, ylim = c(90,450))
points(L1_Crude_price, model1$fitted.values, pch=4)
abline(model1, col="red")
Merge2 <- cbind(Merge1, Residuals=model1$residuals)
head(Merge2)
##             Unleaded   Fitted  Residuals
## 15-JAN-1996    109.0 121.4780 -12.478008
## 15-FEB-1996    108.9 127.8504 -18.950376
## 15-MAR-1996    113.7 134.0236 -20.323607
## 15-APR-1996    123.1 127.3952  -4.295207
## 15-MAY-1996    127.9 125.2616   2.638398
## 15-JUN-1996    125.6 127.7650  -2.165032
plot(L1_Crude_price, Unleaded, pch=20, ylim = c(90,450))
points(L1_Crude_price, model1$fitted.values, pch=4)
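# Draw one vertical segment per observation, from the fitted value to the observed value.
# Loop over the modelled observations: Unleaded has one fewer element than Gas_prices has rows.
for (i in seq_along(Unleaded)) {
  lines(c(L1_Crude_price[i], L1_Crude_price[i]),
        c(model1$fitted.values[i], Unleaded[i]), col="red")
}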
For the gas price data set, we have fitted a simple linear regression of the gasoline price \(y\) on the lagged crude oil price \(x\). But how strong is the relationship between \(x\) and \(y\)? To answer this question, we need the coefficient of determination, \(R^2\), which builds on the following decomposition of each observation's deviation from the sample mean.
\[y_i-\bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i -\bar{y})\]
Squaring both sides and summing over all observations gives the sum-of-squares decomposition; the cross-product term vanishes for a least-squares fit that includes an intercept.
\[\underbrace{\sum_{i=1}^{n} (y_i-\bar{y})^2}_{SST} = \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE} + \underbrace{\sum_{i=1}^{n} (\hat{y}_i -\bar{y})^2}_{SSR}\]
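The identity can be checked numerically. The sketch below is a minimal illustration on simulated data (the values of `x` and `y` are hypothetical stand-ins, not the gas price data set). It computes the three sums of squares from a fitted `lm` object and forms \(R^2 = SSR/SST\), the share of the total variation explained by the regression.

```r
# Synthetic stand-in data (hypothetical values, not the gas price data set)
set.seed(1)
x <- runif(100, min = 10, max = 100)
y <- 60 + 3 * x + rnorm(100, sd = 25)

fit   <- lm(y ~ x)
y_hat <- fitted(fit)

SST <- sum((y - mean(y))^2)      # total sum of squares
SSE <- sum((y - y_hat)^2)        # error (residual) sum of squares
SSR <- sum((y_hat - mean(y))^2)  # regression sum of squares

# The decomposition SST = SSE + SSR holds up to floating-point error
all.equal(SST, SSE + SSR)

# R^2 as the share of total variation explained by the regression
R2 <- SSR / SST
all.equal(R2, summary(fit)$r.squared)
```

Replacing `x`, `y`, and `fit` with `L1_Crude_price`, `Unleaded`, and `model1` should give an \(R^2\) that matches the `Multiple R-squared` line of `summary(model1)`.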
Here is a figure illustrating this decomposition.