Time series data: observations taken on a variable over a sequence of time points, e.g., daily stock price, daily stock return, GDP by year, hourly blood pressure, hourly temperature. Time series data can be considered as one realization (one trajectory) generated from a time series model.
Time series model (i.e., stochastic process): Let Z_t be a sequence of random variables, where the index t=1,2,3,...,n is often interpreted as a time point; then \{Z_t, t=1,2,...,n\} is called a stochastic process.
The sequence of actual observations from this sequence of random variables is written as z_1,z_2,...,z_n. This realization is the time series data.
\begin{align*} \begin{array}{ccccc} Z_1 & Z_2 & Z_3 & \cdots & Z_n\\ \downarrow & \downarrow & \downarrow & \cdots & \downarrow\\ z_1 & z_2 & z_3 & \cdots & z_n \end{array} \end{align*} For example, if we simulate one trajectory from a time series model (Z_t=0.8Z_{t-1}+a_t, explained later), we observe the following.
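A minimal R sketch of this simulation (the seed, the sample size, and starting Z_1 from the stationary distribution of this AR(1) model are our own choices):
set.seed(1)                               # for reproducibility
n <- 100                                  # length of the trajectory
Z <- numeric(n)
Z[1] <- rnorm(1, sd = sqrt(1/(1-0.8^2)))  # draw Z_1 from the stationary distribution
for (t in 2:n) Z[t] <- 0.8*Z[t-1] + rnorm(1)  # the recursion Z_t = 0.8 Z_{t-1} + a_t
plot(Z, type = "l", xlab = "t", ylab = "Z_t", main = "one simulated trajectory")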
Hypothetically, if multiple trajectories are generated from the same time series model, we could observe the following.
In real world situations, unfortunately, we only observe one trajectory generated from the time series model. The goal is to use this one trajectory to infer about the probabilistic behavior of the time series model.
Discussion:
Since we only observe one trajectory, can we truly recover the probabilistic behavior of the underlying model?
Yes? Reason?
No? Reason?
The probabilistic behavior of a time series model essentially means the probabilistic behavior of the vector of random variables (Z_1,Z_2,Z_3,...,Z_n)=\{Z_t\}_{t=1}^n, which is just its joint probability distribution. To specify the probability distribution of (Z_1,Z_2,Z_3,...,Z_n), it is often sufficient to specify the probability distribution of some subsets of (Z_1,Z_2,Z_3,...,Z_n), for example, (Z_1,Z_2).
For example, the distribution of Z_1 is called the 1st order marginal distribution of Z_1, f_{Z_1}(z); the joint distribution of Z_2 and Z_3 is called the 2nd order marginal distribution of Z_2 and Z_3, f_{Z_2,Z_3}(z_2,z_3); and so on and so forth. Note that the n-th order marginal distribution, f_{Z_1,...,Z_n}(z_1,...,z_n), essentially captures the entire probabilistic behavior of (Z_1,Z_2,Z_3,...,Z_n).
To visualize these distributions, we have simulated 500 trajectories below. Suppose we want to plot the 1st order marginal distribution of Z_3, the 2nd order marginal distribution f_{Z_5,Z_6}(z_5,z_6), and the 2nd order marginal distribution f_{Z_{10},Z_{17}}(z_{10},z_{17}).
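The plotting code below uses ts_mat, an n-by-500 matrix whose column r holds trajectory r (so row t collects Z_t across the 500 trajectories). A minimal sketch of its construction, under the same assumptions as the single-trajectory simulation above:
set.seed(1)
n <- 100; nrep <- 500                     # trajectory length and number of trajectories
ts_mat <- matrix(0, nrow = n, ncol = nrep)
for (r in 1:nrep) {
  ts_mat[1, r] <- rnorm(1, sd = sqrt(1/(1-0.8^2)))            # stationary start
  for (t in 2:n) ts_mat[t, r] <- 0.8*ts_mat[t-1, r] + rnorm(1) # Z_t = 0.8 Z_{t-1} + a_t
}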
par(mfrow=c(1,3))  # three panels side by side
# 1st order marginal distribution of Z_3 across the 500 trajectories
hist(ts_mat[3,],xlab="Z_3",ylab="distribution",col="blue",breaks=20,main="dist of Z_3")
# 2nd order marginal distribution of (Z_5, Z_6): scatter plot with sample correlation
plot(ts_mat[5,],ts_mat[6,],xlab="Z_5",ylab="Z_6",col="red",pch=20,main=paste("corr=",round(cor(ts_mat[5,],ts_mat[6,]),2)))
# 2nd order marginal distribution of (Z_10, Z_17)
plot(ts_mat[10,],ts_mat[17,],xlab="Z_10",ylab="Z_17",col="green",pch=20,main=paste("corr=",round(cor(ts_mat[10,],ts_mat[17,]),2)))
Note that, usually, we assume all the joint distributions are multivariate normal. Without the normality assumption, it is almost hopeless to identify all the distributions.
Answer to the above discussion:
Yes, because the probabilistic behavior of the underlying model does not change!
Exercise:
Quickly replicate the above example code to plot the correlation of f_{Z_2,Z_3}(z_2,z_3) and f_{Z_6,Z_7}(z_6,z_7);
Quickly replicate the above example code to plot the correlation of f_{Z_9,Z_{16}}(z_9,z_{16}) and f_{Z_{11},Z_{18}}(z_{11},z_{18});
Definition A time series model is said to be strictly stationary if the joint distribution of (Z_{t_1},...,Z_{t_m}) is the same as the joint distribution of the time-shifted (Z_{t_1+k},...,Z_{t_m+k}) for any choice of time points t_1,...,t_m and any shift k. For example, f_{Z_1}(z)=f_{Z_2}(z)=...=f_{Z_n}(z) and f_{Z_1,Z_2}(x,y)=f_{Z_2,Z_3}(x,y)=...=f_{Z_{n-1},Z_n}(x,y).
This definition means that the stochastic behavior of the time series model is invariant under a shift in time, k. Implications:
1. \mathbb{E}Z_t is the same for t=1,2,...,n because \begin{align*} \mathbb{E}Z_t=\int_{-\infty}^{+\infty} z f_{Z_t}(z)dz=\int_{-\infty}^{+\infty} z f_{Z_{t+k}}(z)dz=\mathbb{E}Z_{t+k}. \end{align*} In other words, you have a constant mean.
2. \text{Var}[Z_t] is the same for t=1,2,...,n because \begin{align*} \text{Var}[Z_t]=\int_{-\infty}^{+\infty} (z-\mathbb{E}Z_t)^2 f_{Z_t}(z)dz=\int_{-\infty}^{+\infty} (z-\mathbb{E}Z_{t+k})^2 f_{Z_{t+k}}(z)dz=\text{Var}[Z_{t+k}]. \end{align*}
3. The covariance matrix of Z_1,...,Z_n is the same as the covariance matrix of Z_{1+k},...,Z_{n+k}, for all subsets of \{1,...,n\} and all k=1,2,....
Example: The following case shows Z_1, Z_2, Z_3 and k=6.
\begin{align*} \text{Cov}(Z_1,Z_2,Z_3)= \left[\begin{array}{ccc} \text{Var}[Z_1] & \text{Cov}(Z_1,Z_2) & \text{Cov}(Z_1,Z_3)\\ \text{Cov}(Z_2,Z_1) & \text{Var}[Z_2] & \text{Cov}(Z_2,Z_3)\\ \text{Cov}(Z_3,Z_1) & \text{Cov}(Z_3,Z_2) & \text{Var}[Z_3] \end{array}\right] = \left[\begin{array}{ccc} \text{Var}[Z_7] & \text{Cov}(Z_7,Z_8) & \text{Cov}(Z_7,Z_9)\\ \text{Cov}(Z_8,Z_7) & \text{Var}[Z_8] & \text{Cov}(Z_8,Z_9)\\ \text{Cov}(Z_9,Z_7) & \text{Cov}(Z_9,Z_8) & \text{Var}[Z_9] \end{array}\right] =\text{Cov}(Z_7,Z_8,Z_9) \end{align*} where \text{Cov}(Z_1,Z_2)=\text{Cov}(Z_7,Z_8) because \begin{align*} \text{Cov}(Z_1,Z_2)=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x-\mathbb{E}Z_1)(y-\mathbb{E}Z_2) f_{Z_1,Z_2}(x,y)dxdy=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x-\mathbb{E}Z_7)(y-\mathbb{E}Z_8) f_{Z_7,Z_8}(x,y)dxdy=\text{Cov}(Z_7,Z_8). \end{align*}
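As a numerical check, we can compare empirical covariance matrices estimated across the 500 simulated trajectories (reusing the ts_mat constructed above); up to sampling error, the two matrices should agree:
cov(t(ts_mat[1:3, ]))   # empirical Cov(Z_1, Z_2, Z_3), variables in columns
cov(t(ts_mat[7:9, ]))   # empirical Cov(Z_7, Z_8, Z_9); close to the matrix above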
Definition A time series model (stochastic process) is said to be weakly stationary if it satisfies conditions 1, 2, and 3 above (i.e., the implications of strict stationarity):
1. \mathbb{E}Z_t is the same for t=1,2,...,n (constant mean).
2. \text{Var}[Z_t] is the same for t=1,2,...,n (constant variance).
3. The covariance matrix of Z_1,...,Z_n is the same as the covariance matrix of Z_{1+k},...,Z_{n+k} for all subsets of \{1,...,n\} and all k=1,2,... (constant covariance structure over time).
strictly stationary ⇒ weakly stationary (provided the variance is finite)
strictly stationary ⇍ weakly stationary
If all the joint distributions of Z_1,...,Z_n are multivariate normal, then strictly stationary \Leftrightarrow weakly stationary. This is because the multivariate normal assumption together with (1), (2), and (3) completely specifies all the joint distributions and their properties.
Mean of a stationary time series model: \begin{align*} \mu = \mathbb{E}Z_t = \int_{-\infty}^{+\infty} z f_{Z_t}(z)dz \end{align*} for any t=1,...,n. It is also called the “level” of the time series model.
Variance of a stationary time series model: \begin{align*} \sigma^2(Z_t) =\sigma^2_{Z_t}= \text{Var}[Z_t] = \int_{-\infty}^{+\infty} (z - \mathbb{E}Z_t)^2 f_{Z_t}(z)dz \end{align*} for any t=1,...,n. It is a measure of the fluctuation of the time series model around its constant level \mu.
For a stationary time series model, we can usually estimate \mu and \sigma^2 by the sample mean and the sample variance.
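For example, treating one simulated trajectory as our observed data (here we take ts1 to be the first column of the ts_mat above; the code later in this section reuses the same name):
ts1 <- ts_mat[, 1]   # one observed trajectory
mean(ts1)            # sample mean: estimate of the level mu
var(ts1)             # sample variance: estimate of sigma^2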
Definition Let Z_t and Z_{t+k} be two random variables belonging to a stationary time series model, separated by a constant time interval k (the lag). The autocovariance of Z_t and Z_{t+k} (or the k-th order autocovariance) is \begin{align*} \gamma_k = \mathbb{E}[(Z_t-\mu)(Z_{t+k}-\mu)]=\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x - \mathbb{E}Z_t)(y - \mathbb{E}Z_{t+k}) f_{Z_t,Z_{t+k}}(x,y)dxdy \end{align*} for any t=1,...,n.
Note that \gamma_k does not depend on t for a stationary time series model since f_{Z_t,Z_{t+k}}(x,y) does not depend on t (i.e., stationarity).
Note that \gamma_0 = \text{Var}[Z_t].
Definition For a stationary time series model, the autocorrelation of Z_t and Z_{t+k} (or k-th order autocorrelation) is \rho_k=\frac{\gamma_k}{\gamma_0} for all t=1,...,n. As a function of k, \rho_k is called the autocorrelation function (ACF) and its graph is called the correlogram. Note that \rho_0=1 and \rho_{k}=\rho_{-k}.
Importance of \rho_k: we can identify a suitable model for time series data by knowing its autocorrelation function or an estimate of this function (explained later).
Estimation of Autocovariance and ACF: We can use the sample autocovariance \hat{\gamma}_k = \frac{1}{n-k}\sum_{i=k+1}^{n} (z_i-\bar{z})(z_{i-k}-\bar{z}) to estimate the autocovariance \gamma_k, then obtain the sample autocorrelation by \hat{\rho}_k=\frac{\hat{\gamma}_k}{\hat{\gamma}_0} as an estimate of the autocorrelation \rho_k.
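A minimal sketch of these estimators applied to ts1; note that R's built-in acf() uses the divisor n instead of n-k, so small numerical differences from the formula above are expected:
z_bar <- mean(ts1)
gamma_hat <- sapply(0:13, function(k)
  sum((ts1[(k+1):n] - z_bar) * (ts1[1:(n-k)] - z_bar)) / (n - k))
rho_hat <- gamma_hat / gamma_hat[1]   # gamma_hat[1] holds gamma_0, so rho_hat[1] = 1
rho_hat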
par(mfrow=c(1,4))  # four panels: lag-1, lag-3, lag-10 scatter plots, then sample ACF
# scatter plot of Z_t against Z_{t-1} (lag 1)
plot(ts1[1:(n-1)],ts1[2:n],pch=20,
xlab="Z_{t-1}",ylab="Z_t",
xlim=c(min(ts1)-sd(ts1),max(ts1)+sd(ts1)),
ylim=c(min(ts1)-sd(ts1),max(ts1)+sd(ts1)),col="red",
main=paste("sample corr=",round(cor(ts1[1:(n-1)],ts1[2:n]),3)))
# scatter plot of Z_t against Z_{t-3} (lag 3)
plot(ts1[1:(n-3)],ts1[4:n],pch=20,
xlab="Z_{t-3}",ylab="Z_t",
xlim=c(min(ts1)-sd(ts1),max(ts1)+sd(ts1)),
ylim=c(min(ts1)-sd(ts1),max(ts1)+sd(ts1)),col="red",
main=paste("sample corr=",round(cor(ts1[1:(n-3)],ts1[4:n]),3)))
# scatter plot of Z_t against Z_{t-10} (lag 10)
plot(ts1[1:(n-10)],ts1[11:n],pch=20,
xlab="Z_{t-10}",ylab="Z_t",
xlim=c(min(ts1)-sd(ts1),max(ts1)+sd(ts1)),
ylim=c(min(ts1)-sd(ts1),max(ts1)+sd(ts1)),col="red",
main=paste("sample corr=",round(cor(ts1[1:(n-10)],ts1[11:n]),3)))
acf(ts1,lag.max=13)  # sample ACF up to lag 13
Definition Partial autocorrelation is the correlation between Z_t and Z_{t+k} after their dependence on the intervening variables Z_{t+1},...,Z_{t+k-1} has been removed. In other words, it is the conditional correlation between Z_t and Z_{t+k} given Z_{t+1},...,Z_{t+k-1}, that is,
\begin{align*} \phi_{kk} = \text{cor}(Z_t,Z_{t+k}|Z_{t+1},...,Z_{t+k-1}) \end{align*} for any t=1,...,n.
Note that we usually don’t talk about \phi_{00} because it doesn’t make sense (what is the correlation between Z_t and Z_t conditional on the random variables between Z_t and Z_t? There is nothing to condition on). Also note that \phi_{11}=\rho_1.
Alternative view on PACF (optional) The PACF can be derived and thought of in the following way.
Consider a stationary time series model \{Z_t\}_{t=1}^n. First we make the transformation \tilde{Z}_t = Z_t - \mu, ... , \tilde{Z}_{t+k} = Z_{t+k} - \mu, so that \tilde{Z}_t,...,\tilde{Z}_{t+k} all have zero mean. Now regress \tilde{Z}_{t+k} on \tilde{Z}_{t+k-1},...,\tilde{Z}_t, that is,
\begin{align*} \tilde{Z}_{t+k}=\beta_1 \tilde{Z}_{t+k-1} + ... + \beta_k \tilde{Z}_t + e_{t+k} \end{align*}
Then \beta_k is just the partial autocorrelation of Z_t and Z_{t+k}, i.e., \beta_k=\phi_{kk}. Rewrite the regression as
\begin{align*} \tilde{Z}_{t+k}&=\phi_{k1} \tilde{Z}_{t+k-1} + ... + \phi_{kk} \tilde{Z}_t + e_{t+k}\\ \mathbb{E}\tilde{Z}_{t+k}\tilde{Z}_{t+k-j}&=\phi_{k1} \mathbb{E}\tilde{Z}_{t+k-1}\tilde{Z}_{t+k-j} + ... + \phi_{kk} \mathbb{E}\tilde{Z}_t\tilde{Z}_{t+k-j} + \mathbb{E}e_{t+k}\tilde{Z}_{t+k-j}\\ \gamma_j&= \phi_{k1} \gamma_{j-1} + ... + \phi_{kk} \gamma_{j-k}\\ \rho_j&= \phi_{k1} \rho_{j-1} + ... + \phi_{kk} \rho_{j-k}. \end{align*} Here the second line multiplies the regression by \tilde{Z}_{t+k-j} and takes expectations, the third line uses \mathbb{E}e_{t+k}\tilde{Z}_{t+k-j}=0 (the regression error is uncorrelated with the regressors), and the last line divides through by \gamma_0.
Letting j=1,...,k, we have
\begin{align*} \rho_1&= \phi_{k1} \rho_{0} + ... + \phi_{kk} \rho_{k-1}\\ \rho_2&= \phi_{k1} \rho_{1} + ... + \phi_{kk} \rho_{k-2}\\ &...\\ \rho_k&= \phi_{k1} \rho_{k-1} + ... + \phi_{kk} \rho_{0}. \end{align*}
Equivalently, we have \begin{align*} \left[\begin{array} {r} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{array}\right] = \left[\begin{array} {rrrr} 1 & \rho_1 & ... & \rho_{k-1} \\ \rho_1 & 1 & ... & \rho_{k-2} \\ \vdots&\vdots&\ddots&\vdots\\ \rho_{k-1}&\rho_{k-2}&...&1 \end{array}\right] \left[\begin{array} {r} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{array}\right] \end{align*}
We can solve the above equation by
\begin{align*} \left[\begin{array} {r} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{array}\right] = \left[\begin{array} {rrrr} 1 & \rho_1 & ... & \rho_{k-1} \\ \rho_1 & 1 & ... & \rho_{k-2} \\ \vdots&\vdots&\ddots&\vdots\\ \rho_{k-1}&\rho_{k-2}&...&1 \end{array}\right]^{-1} \left[\begin{array} {r} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{array}\right] \end{align*}
This is the connection between PACF and ACF.
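For instance, for k=2 the system reduces to two equations in \phi_{21} and \phi_{22}, and solving it gives \begin{align*} \phi_{22}=\frac{\rho_2-\rho_1^2}{1-\rho_1^2}. \end{align*}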
Estimation of PACF: We can use the relationship between ACF and PACF (derived in the formula above) to estimate PACF by plugging in \hat{\rho}_k to obtain \hat{\phi}_{kk}.
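A minimal sketch of this plug-in estimator (variable names are our own): solve the system above with \hat{\rho}_k for k=1,...,13 and compare against R's pacf(), which should essentially match:
rho_acf <- acf(ts1, lag.max = 13, plot = FALSE)$acf[-1]   # sample rho_1,...,rho_13
phi_hat <- sapply(1:13, function(k) {
  R_k <- toeplitz(c(1, rho_acf))[1:k, 1:k, drop = FALSE]  # matrix of rho_{|i-j|}
  solve(R_k, rho_acf[1:k])[k]                             # last coefficient is phi_kk
})
phi_hat
pacf(ts1, lag.max = 13, plot = FALSE)$acf                 # sample PACF from R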
Definition A time series model is called white noise if it is a sequence of uncorrelated random variables from a fixed distribution with constant zero mean \mathbb{E}[a_t]=0, constant variance \text{Var}[a_t]=\sigma^2_{a}, and autocovariance \gamma_k = \text{cov}(a_t,a_{t+k})=0 for all k=1,2,.... A white noise is usually denoted as \{a_t\}_{t=1}^n.
The white noise is an important component in time series models because it is a building block for more complex models.
For a white noise \{a_t\}_{t=1}^n:
ACF: \begin{align*} \rho_k= \begin{cases} 1, k=0\\ 0, k\neq 0 \end{cases} \end{align*}
PACF: \begin{align*} \phi_{kk}= \begin{cases} 1, k=0\\ 0, k\neq 0 \end{cases} \end{align*}
In this course, we assume \{a_t\} follows a normal distribution. Below is an example of white noise and its sample ACF and PACF.
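A minimal sketch of such an example (the seed and series length are our own choices):
set.seed(1)
at <- rnorm(200)     # Gaussian white noise with mean 0 and variance 1
par(mfrow = c(1, 3))
plot(at, type = "l", xlab = "t", ylab = "a_t", main = "white noise")
acf(at, lag.max = 13, main = "sample ACF")    # near 0 for all k >= 1
pacf(at, lag.max = 13, main = "sample PACF")  # near 0 for all k >= 1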
Backshift operator, B
\begin{align*} BZ_t&=Z_{t-1}\\ B^2Z_t&=B(BZ_t)= BZ_{t-1}=Z_{t-2}\\ B^kZ_t&=Z_{t-k} \end{align*}
Forwardshift operator, F=B^{-1} \begin{align*} FZ_t&=Z_{t+1}\\ FBZ_t&=F(BZ_t)= FZ_{t-1}=Z_{t}\\ F^kZ_t&=Z_{t+k} \end{align*}
In addition, we have Ba_t=a_{t-1} and Fa_t=a_{t+1}.
Difference operator, \nabla=1-B \begin{align*} \nabla Z_t&=(1-B)Z_t=Z_t - BZ_t=Z_t - Z_{t-1}\\ \nabla^2 Z_t&=\nabla (\nabla Z_t)= (1-B)(Z_t - Z_{t-1}) = (Z_t - Z_{t-1}) - (Z_{t-1}-Z_{t-2}) =Z_t - 2 Z_{t-1} + Z_{t-2} \\ \end{align*}
Note that \nabla^2=(1-B)^2=1-2B+B^2
Inverse of difference operator, S=\nabla^{-1} \begin{align*} S Z_t&=\sum_{j=0}^{\infty}Z_{t-j}=Z_t + Z_{t-1} + ...\\ S \nabla Z_t&=S (Z_t - Z_{t-1}) = \sum_{j=0}^{\infty}Z_{t-j} - \sum_{j=0}^{\infty}Z_{t-1-j} =Z_t\\ \end{align*}
Note that \nabla^{-1}=(1-B)^{-1}=1+B+B^2+...=\frac{1}{1-B}, where 1+B+B^2+... is a geometric series (formally, 1+B+...+B^m=\frac{1-B^{m+1}}{1-B}\to\frac{1}{1-B}).
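In R, \nabla corresponds to diff() and S amounts to a cumulative sum; a quick check on a toy series (the values are arbitrary):
z  <- c(3, 5, 4, 7, 6)
dz <- diff(z)               # nabla Z_t = Z_t - Z_{t-1}
z[1] + cumsum(dz)           # recovers Z_2,...,Z_n from Z_1 and the differences
diff(z, differences = 2)    # nabla^2 Z_t = Z_t - 2*Z_{t-1} + Z_{t-2}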