import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.dpi'] = 100
plt.rcParams['figure.figsize'] = [10,5]
plt.rcParams['axes.grid'] = True
import pandas as pd
import pandas_datareader.data as web
import datetime
import pprint as pp
import scipy as sc
import scipy.linalg as lg # SciPy Linear Algebra Library
import scipy.sparse as sp
import scipy.sparse.linalg as sl
from IPython.display import HTML
from matplotlib import animation, rc
"Oh, honestly, Ron. If you ask me, Divination's a woolly discipline."
def l(n, x, Xn):
    """n-th Lagrange basis polynomial for the nodes Xn, evaluated at x."""
    res = 1
    for k in range(len(Xn)):
        if k != n:
            res *= (x - Xn[k])/(Xn[n] - Xn[k])
    return res
def show_interpolation_example(F, N=11, my_ylims=()):
    x = np.linspace(-1, 1.5, 1000)   # dense grid for plotting
    X = np.linspace(-1, 1, N)        # collocation points
    Y = F(X)
    y = F(x)
    res = 0*x
    for n in range(N):
        res += l(n, x, X)*Y[n]       # Lagrange interpolant
    plt.grid(True)
    plt.xlabel("$x$")
    plt.ylabel("$y$")
    plt.ylim(*my_ylims)
    plt.plot(x, y)
    plt.plot(x[x <= 1], res[x <= 1])
    plt.plot(X, Y, "o")
    plt.plot(x[x > 1], res[x > 1], linestyle='dashed')
    plt.legend(['$f(x)$', '$f_{aprx}(x)$', 'col. p.', 'extrapolation'])
    plt.show()
    return {'max_error': np.max(np.abs(y - res))}
def f(x): return 1/(25*x**2+1)
plt.title("Lagrange interpolation")
show_interpolation_example(f, N=11, my_ylims=(-6, 2))
{'max_error': 3813.8276479129663}
def f(x): return 1/(25*(x+1.2)**2+1)
plt.title("Lagrange interpolation")
show_interpolation_example(f, N=11)
{'max_error': 0.41060566040601904}
def f(x): return 1/(25*(x+1.2)**2+1) + 1e-2*np.random.random(len(x))
plt.title("Lagrange interpolation with noisy data")
show_interpolation_example(f, N=11, my_ylims=(-0.5, 0.5))
{'max_error': 112.34458879852396}
One-dimensional case: we have a single time-dependent variable $X_t$ (stock prices).
If the returns
$$R_t=\frac{X_t-X_{t-1}}{X_{t-1}}$$are stationary with mean $\bar R$ and standard deviation $\sigma$,
then we can generate the series into the future. Let $t=0$ denote the end of the recorded data and the beginning of the generated data; then we can expect
$$E(X_{t>0}) = X_0 + \bar R t$$with uncertainty
$$\sqrt{\textrm{var}(X_{t>0})} = \sigma\sqrt{t}$$Or, in the simplest scheme for a small time step $dt$,
$$X_{t+dt} = X_t+\bar R\, dt + \sigma\sqrt{dt}\,\epsilon_t$$where $\epsilon_t$ is a random number drawn from a (presumably normal) distribution.
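A minimal sketch of this scheme (the starting price, drift and volatility below are made-up numbers used only for illustration, and $\epsilon_t$ is taken as standard normal):

X0, R_bar, sigma = 100.0, 0.0005, 0.01      # assumed start price, mean return, volatility per step
dt, nsteps, npaths = 1.0, 250, 20

rng = np.random.default_rng(0)
eps = rng.normal(size=(npaths, nsteps))           # epsilon_t ~ N(0, 1)
increments = R_bar*dt + sigma*np.sqrt(dt)*eps     # X_{t+dt} - X_t, as in the formula above
Xpaths = X0 + np.cumsum(increments, axis=1)       # simulated paths

plt.plot(Xpaths.T, alpha=0.5)
plt.xlabel('$t$')
plt.ylabel('$X_t$')
plt.show()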
What about multidimensional data, when $X$ and $R$ are vectors, as we have for a portfolio?
Instead of a variance we have a covariance matrix
$$\Sigma_{ij}=\Sigma_{ji} = E\left[(R_i-E(R_i))\left(R_j-E(R_j)\right)\right]$$which is positive semi-definite: $x^T\Sigma x\geq 0$ for all real vectors $x$.
We need to generate numbers which are correlated.
Suppose that a vector $Y$ has elements which are independent and drawn from a normal distribution with $E[Y]=0$ and
$$ E\left[Y_iY_j\right] = \delta_{ij}$$For the transformed variable $R' = L Y$ we have
$$E\left[R'_iR'_j\right]=E\left[L_{ik}Y_kL_{jl}Y_l\right]=L_{ik}L_{jl}E\left[Y_kY_l\right] = L_{ik}L_{jl}\delta_{kl}=L_{ik}L_{jk}$$or, in matrix notation,
$$\Sigma=LL^T$$In order to generate correlated data it is therefore enough to find such an $L$ (e.g. from the Cholesky decomposition of $\Sigma$) and set $R' = LY$.
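A short sketch of this recipe; the 3-asset covariance matrix and mean returns below are invented purely for illustration:

Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])     # illustrative covariance matrix of returns
mean_R = np.array([0.001, 0.002, 0.0015])  # assumed mean returns

L = np.linalg.cholesky(Sigma)              # Sigma = L @ L.T
rng = np.random.default_rng(1)
Y = rng.standard_normal(size=(3, 10000))   # independent N(0, 1) samples
R = mean_R[:, None] + L @ Y                # correlated returns R' = L Y (plus the mean)

print(np.cov(R))                           # should be close to Sigma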
A similar but computationally more expensive method is to construct an eigenportfolio. Diagonalizing $\Sigma$ we obtain
$$\Sigma = O^T\Sigma'O\qquad \text{where}\qquad O^TO=OO^T=\mathbb{1}$$and $\Sigma'$ is a diagonal matrix with the eigenvalues (eigen-variances) on the diagonal.
The vector $R$ can be decomposed as
$$R=\sum_{i}\alpha_iv_i$$Since $v_i^Tv_j=\delta_{ij}$, there is no correlation between the components along $v_i$ and $v_j$ for $i\neq j$, so the coefficients $\alpha_i$ can be drawn independently (with variances equal to the eigenvalues).
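A sketch of the eigen-decomposition variant, reusing the same hypothetical covariance matrix as above:

Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])          # same illustrative matrix as above

eigvals, V = np.linalg.eigh(Sigma)              # columns of V are the eigenvectors v_i
rng = np.random.default_rng(2)
# draw alpha_i independently with variance equal to the eigenvalue, then rotate back
alpha = rng.standard_normal(size=(3, 10000)) * np.sqrt(eigvals)[:, None]
R = V @ alpha                                   # R = sum_i alpha_i v_i

print(np.cov(R))                                # again close to Sigma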
In general we have some function
$$y_{i}=\beta _{1}x_{i1}+\beta _{2}x_{i2}+\cdots +\beta _{p}x_{ip}+\epsilon _{i}$$or in a matrix form
$$y=X\beta +\epsilon$$where the $\beta_p$ are unknown parameters and the $x_{ip}$ are observed quantities (not necessarily raw values; they can be functions of some independent variables, weighted values, etc.)
Mathematical properties are well understood only for functions linear in $\beta$.
But still there are a lot of questions that need to be answered:
How to find the parameters:
Two uses of regression models: prediction (forecasting new values of $y$) and inference (explaining the relation between $y$ and the regressors).
x = np.linspace(-3,3, 1000)
plt.plot(x, np.exp(-(x)**2/0.7), label = 'more accurate')
plt.plot(x, np.exp(-(x-0.2)**2/0.01)*1.2, label='biased but better predictions')
plt.axvline(x=0, color='black', linestyle = 'dashed')
plt.legend()
OLS is BLUE (Best Linear Unbiased Estimator)
Usually we look for the minimum of the $\chi^2$ cost function
$$\chi^2(\beta) = ||y-X\beta||^2=\sum_i\epsilon_i^2$$Usually $X$ is an $N\times M$ matrix with $N>M$.
Why not $||\cdot||=\min$ (MAD estimator), or $||\cdot||^7=\min$, or $\log||\cdot||=\min$?
$||\cdot||^2$ is BLUE
OLS is BLUE if (Gauss-Markov assumptions) the errors have zero mean, equal variance (homoskedasticity) and are mutually uncorrelated.
OLS may work badly if one of these assumptions is not met.
def homoskedasticity():
    plt.figure(figsize=(10, 3))
    plt.subplot(1, 2, 1)
    x = np.linspace(0, 10, 1000)
    y = 10 + 0.4*x
    sig = 1 + 0.1*x**2              # noise level growing with x
    y1 = y + sig
    y2 = y - sig
    plt.title('scale effect')
    plt.plot(x, y, 'black', linestyle='dashed')
    plt.plot(x, y1, 'blue')
    plt.plot(x, y2, 'blue')
    plt.fill_between(x, y1, y2, alpha=0.25)
    plt.xlabel('$x$')
    plt.ylabel('$y$')
    plt.subplot(1, 2, 2)
    sig = 5*np.exp(-0.2*x)          # noise level decaying with x
    y1 = y + sig
    y2 = y - sig
    plt.title('learning effect')
    plt.plot(x, y, 'black', linestyle='dashed')
    plt.plot(x, y1, 'green')
    plt.plot(x, y2, 'green')
    plt.fill_between(x, y1, y2, alpha=0.15, color='green')
    plt.xlabel('$x$')
    plt.ylabel('$y$')
    plt.plot()

homoskedasticity()
In that case weight each residual by $1/\sigma_i$ (weighted least squares).
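A minimal weighted-least-squares sketch using statsmodels; the heteroskedastic noise model below is made up, and note that `sm.WLS` expects weights proportional to $1/\sigma_i^2$:

import statsmodels.api as sm

rng = np.random.default_rng(3)
xw = np.linspace(0, 10, 500)
sigma_w = 1 + 0.1*xw**2                            # assumed noise level (the 'scale effect' above)
yw = 10 + 0.4*xw + rng.normal(scale=sigma_w)       # heteroskedastic data

Xw = sm.add_constant(xw)                           # design matrix [1, x]
ols_fit = sm.OLS(yw, Xw).fit()
wls_fit = sm.WLS(yw, Xw, weights=1/sigma_w**2).fit()   # inverse-variance weights
print(ols_fit.params, wls_fit.params)              # WLS parameter errors are usually tighter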
def error_correlation():
    plt.figure(figsize=(10, 3))
    plt.subplot(1, 2, 1)
    x = np.linspace(0, 10, 1000)
    y = 10 + 0.4*x
    plt.plot(x, y, 'black', linestyle='dashed')
    x1 = np.linspace(0, 10, 50)
    errors = np.random.normal(size=len(x1), scale=0.2)
    y1 = 10 + 0.4*x1 + errors
    plt.scatter(x1, y1, s=10)
    plt.title('uncorrelated errors')
    plt.xlabel('$x$')
    plt.ylabel('$y$')
    plt.subplot(1, 2, 2)
    plt.plot(x, y, 'black', linestyle='dashed')
    x1 = np.linspace(0, 10, 50)
    # AR(1)-like errors: each error remembers the previous one
    errors = np.zeros_like(x1)
    errors[0] = np.random.normal(scale=1)
    for n in range(1, 50):
        errors[n] = 0.8*errors[n-1] + np.random.normal(scale=0.1)
    y1 = 10 + 0.4*x1 + errors
    plt.scatter(x1, y1, color='red', s=10)
    plt.title('correlated errors')
    plt.xlabel('$x$')
    plt.ylabel('$y$')
    plt.plot()

error_correlation()
Suppose we have a model
$$y_i = Ax_i + \epsilon_i$$
while the actual relation is $y=Ax+B$ (the intercept is omitted from the model).
import statsmodels.api as sm
""" Preparing some example data """
nsample = 10000
x = np.linspace(0, 2, nsample)
y_exact = 0.31416 + 0.27182*x
y = y_exact + np.random.normal(size=len(x), scale=0.15)
""" Preparing toregression """
X = np.ones_like(x)
X = np.column_stack((X, x)) # Two vectors X ~ [1, x]
model = sm.OLS(y, X)
results = model.fit()
results.summary()
Dep. Variable: | y | R-squared: | 0.530 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.530 |
Method: | Least Squares | F-statistic: | 1.127e+04 |
Date: | Thu, 23 Jan 2020 | Prob (F-statistic): | 0.00 |
Time: | 19:40:11 | Log-Likelihood: | 4801.4 |
No. Observations: | 10000 | AIC: | -9599. |
Df Residuals: | 9998 | BIC: | -9584. |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
const | 0.3099 | 0.003 | 103.488 | 0.000 | 0.304 | 0.316 |
x1 | 0.2753 | 0.003 | 106.161 | 0.000 | 0.270 | 0.280 |
Omnibus: | 1.377 | Durbin-Watson: | 1.996 |
---|---|---|---|
Prob(Omnibus): | 0.502 | Jarque-Bera (JB): | 1.342 |
Skew: | 0.018 | Prob(JB): | 0.511 |
Kurtosis: | 3.044 | Cond. No. | 3.78 |
def show_reg1(x, y, y_exact, y_fit):
    plt.figure(figsize=(10, 6))
    plt.subplot(2, 2, 3)
    plt.errorbar(x, y_exact, [0*x, y-y_exact], alpha=0.1, color='orange')
    plt.plot(x, y_exact, 'black')
    plt.plot(x, y_fit, 'red', ls='dashed')
    plt.subplot(2, 2, (1, 2))
    plt.errorbar(x, 0*x, [0*x, y-y_fit], alpha=0.1, color='green')   # residuals
    plt.subplot(2, 2, 4)
    plt.hist(y-y_fit, np.linspace(-1, 1, 50), density=1)             # residual histogram
𝛽 = results.params
# y_fit = 𝛽[0]*X[:, 0] + 𝛽[1]*X[:, 1] # or better
y_fit = np.dot(X, 𝛽)
show_reg1(x, y, y_exact, y_fit)
X2 = x # only one vector X2 = [x]
model2 = sm.OLS(y, X2)
results2 = model2.fit()
results2.summary()
Dep. Variable: | y | R-squared (uncentered): | 0.881 |
---|---|---|---|
Model: | OLS | Adj. R-squared (uncentered): | 0.881 |
Method: | Least Squares | F-statistic: | 7.402e+04 |
Date: | Thu, 23 Jan 2020 | Prob (F-statistic): | 0.00 |
Time: | 19:40:12 | Log-Likelihood: | 1160.7 |
No. Observations: | 10000 | AIC: | -2319. |
Df Residuals: | 9999 | BIC: | -2312. |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
x1 | 0.5077 | 0.002 | 272.070 | 0.000 | 0.504 | 0.511 |
Omnibus: | 35.515 | Durbin-Watson: | 0.964 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 27.874 |
Skew: | -0.036 | Prob(JB): | 8.85e-07 |
Kurtosis: | 2.752 | Cond. No. | 1.00 |
b2 = results2.params
y_fit2 = b2[0]*X2
show_reg1(x, y, y_exact, y_fit2)
X3 = np.ones_like(x) # only one vector X3 = [1]
model3 = sm.OLS(y, X3)
results3 = model3.fit()
b3 = results3.params
y_fit3 = b3[0]*X3
results3.summary()
Dep. Variable: | y | R-squared: | 0.000 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.000 |
Method: | Least Squares | F-statistic: | inf |
Date: | Thu, 23 Jan 2020 | Prob (F-statistic): | nan |
Time: | 19:40:42 | Log-Likelihood: | 1027.3 |
No. Observations: | 10000 | AIC: | -2053. |
Df Residuals: | 9999 | BIC: | -2045. |
Df Model: | 0 | ||
Covariance Type: | nonrobust |
coef | std err | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
const | 0.5851 | 0.002 | 267.972 | 0.000 | 0.581 | 0.589 |
Omnibus: | 66.201 | Durbin-Watson: | 0.938 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 45.852 |
Skew: | 0.003 | Prob(JB): | 1.11e-10 |
Kurtosis: | 2.668 | Cond. No. | 1.00 |
show_reg1(x, y, y_exact, y_fit3)
X4 = np.column_stack((X, x**2)) # Overfitting with three vectors X ~ [1, x, x**2]
model4 = sm.OLS(y, X4)
results4 = model4.fit()
#results4.summary()
Finding $\beta$'s $$\chi^2(\beta) = \beta^TX^TX\beta - 2\beta^TX^Ty +y^Ty$$
$$\frac{\partial \chi^2}{\partial \beta}=2X^TX\beta - 2X^Ty = 0$$The best-fitting $\beta$ is
$$\beta = (X^TX)^{-1}X^Ty$$If $X^TX$ is singular or nearly singular, an SVD decomposition should be used instead.
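A sketch of solving the normal equations directly for the example design matrix `X` and data `y` built above (in practice `np.linalg.lstsq` is preferable to forming $(X^TX)^{-1}$ explicitly):

# beta = (X^T X)^{-1} X^T y, written as a linear solve instead of an explicit inverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# the numerically safer route (uses the SVD internally)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_lstsq)     # both should agree with results.params above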
SVD decomposition:
$$X = U\Sigma V^*$$where $U$ and $V$ are unitary matrices and $\Sigma$ is a (rectangular) diagonal matrix with non-negative singular values on the diagonal.
Zero diagonal elements of $\Sigma$ correspond to the null space of $X$.
Pseudo-inverse
$$X^+ = V\Sigma ^+U^*$$$$\left.XX^+ = U\Sigma V^* V\Sigma ^+U^* = U\Sigma\Sigma ^+U^* =\mathbb{1}\right|_{\text{range}}$$if $$\Sigma = \begin{pmatrix} \lambda_1 \\ & \lambda_2 \\ & & 0 \\ & & & \ddots \\ \end{pmatrix}\quad\textrm{then}\quad \Sigma^+ = \begin{pmatrix} \lambda_1^{-1} \\ & \lambda_2^{-1} \\ & & 0 \\ & & & \ddots \\ \end{pmatrix}$$
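The same fit written via the pseudo-inverse, as a short sketch with NumPy (equivalent to the SVD route above):

X_pinv = np.linalg.pinv(X)     # computed from the SVD; tiny singular values are cut off
beta_pinv = X_pinv @ y
print(beta_pinv)               # again should match results.params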
In principle we could also minimize
$$F(\beta) = ||y-f(x; \beta)||^2$$where $f$ is an arbitrary (even nonlinear) function of $\beta$. But then the solution may be neither unique nor easy to find.
The errors of $\beta$ may also have poor statistical properties.
Example: $$f(x;y_0, A, x_0, \sigma) = y_0+Ae^{-\frac{(x-x_0)^2}{2\sigma^2}}$$
The above function cannot be written in terms of linear regression.
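A sketch of fitting this Gaussian model with `scipy.optimize.curve_fit`; the synthetic data and starting values are invented for illustration, and nonlinear fits generally need a sensible initial guess:

from scipy.optimize import curve_fit

def gauss(x, y0, A, x0, sigma):
    return y0 + A*np.exp(-(x - x0)**2/(2*sigma**2))

rng = np.random.default_rng(4)
xg = np.linspace(-3, 3, 200)
yg = gauss(xg, 0.5, 2.0, 0.3, 0.8) + rng.normal(scale=0.05, size=xg.size)

p0 = [0, 1, 0, 1]                           # initial guess for (y0, A, x0, sigma)
popt, pcov = curve_fit(gauss, xg, yg, p0=p0)
print(popt)                                 # fitted parameters
print(np.sqrt(np.diag(pcov)))               # their (approximate) standard errors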
However, a sum of exponentials with unknown (possibly complex) frequencies
$$y(t_i) = \sum_n A_ne^{i \omega_n t_i}$$can be solved using linear algebra after finding the zeros of a certain polynomial. This works for uniform sampling $t_i = i\,dt$ (Prony's method).
For noisy data the original Prony's method performs badly, but using the SVD one can deal with the noise.
Applications:
Information criterion
$$AIC = 2k-2\log(L)$$where $L$ is the likelihood function and $k$ the number of parameters.
Relative likelihood:
$$r_{12} = \exp\left(\frac{AIC_1-AIC_2}{2}\right)\qquad AIC_1<AIC_2$$
idx = ['$\large \\beta_0 + \\beta_1x$', '$\large \\beta_0 x$', ' $\large \\beta_0 $', ' $\large \\beta_0 + \\beta_1x + \\beta_2 x^2$']
data = np.array([results.aic, results2.aic, results3.aic, results4.aic])
data = np.column_stack((data, np.exp(0.5*(data[0]-data))))
pd.DataFrame(data=data, index=idx, columns=['AIC', 'rel. likelihood'])
AIC | rel. likelihood | |
---|---|---|
$\large \beta_0 + \beta_1x$ | -9598.749646 | 1.000000 |
$\large \beta_0 x$ | -2319.481421 | 0.000000 |
$\large \beta_0 $ | -2052.505505 | 0.000000 |
$\large \beta_0 + \beta_1x + \beta_2 x^2$ | -9602.665141 | 7.083357 |
The model $y=\beta_0+\beta_1x$ is roughly $10^{1600}$ times more likely than $y=\beta_0$ or $y=\beta_0x$,
while the relative likelihood of $y=\beta_0+\beta_1x+\beta_2x^2$ is of order one (the exact value depends on the random sample), so the extra quadratic term is not clearly justified.
$AR(p)$, an autoregressive model of order $p$, is a model in which the value at time $t$ is a linear function of the previous $p$ values:
$$y_t = c+\beta_1y_{t-1}+\beta_2y_{t-2} + \beta_3y_{t-3} + \cdots + \beta_py_{t-p} + \epsilon_t$$Often the backshift (lag) operator $B_sy_t = y_{t-s}$ is used; then the $AR(p)$ model can be written in a more concise form
$$B_0y_t=c+\sum_{i=1}^p\beta_iB_iy_t+\epsilon_t$$or even
$$\beta(B)y_t=c+\epsilon_t$$The last expression can be used to test whether the residuals have the appropriate statistical properties, i.e. whether $\epsilon_t$ is white noise.
def generate_AR_series(𝛽, c, scale=0.1, nsamples=200):
    y = np.empty(nsamples)
    y[0] = 0
    for n in range(1, nsamples):
        y[n] = c + 𝛽*y[n-1] + np.random.normal(scale=scale)
    return (np.arange(nsamples), y)
def show_AR_examples():
    for n, parameters in enumerate([(0, 0, 0.1), (0, 1, 0.1), (0.03, 1, 0.1), (0.1, 0.9, 0.)]):
        plt.subplot(2, 2, n+1)
        c, 𝛽, scale = parameters
        plt.plot(*generate_AR_series(𝛽, c, scale), f'C{n}',
                 label=f'$c$ = {c}, $\\beta_1=${𝛽}, $\\sigma={scale}$')
        plt.legend()

show_AR_examples()
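To recover the parameters from such a series one can fit an AR model back to it; a sketch using statsmodels (the `AutoReg` class is assumed to be available in the installed version):

from statsmodels.tsa.ar_model import AutoReg

# simulate an AR(1) series with known parameters and fit it back
_, y_ar = generate_AR_series(0.9, 0.1, scale=0.1, nsamples=2000)
ar_fit = AutoReg(y_ar, lags=1).fit()
print(ar_fit.params)     # should be close to [c, beta_1] = [0.1, 0.9]
print(ar_fit.aic)        # information criterion, as discussed above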
In the case of the $AR(1)$ model: for $|\beta_1|<1$ the series is stationary with mean $c/(1-\beta_1)$; for $\beta_1=1$ it is a random walk (with drift if $c\neq 0$); for $|\beta_1|>1$ it explodes.
In AR models any shock $\epsilon_t$ implicitly affects all the future values:
$$y_t = c+\beta_1y_{t-1}+\epsilon_t = c+\beta_1(c+\beta_1y_{t-2}+\color{red}{\epsilon_{t-1}})+\epsilon_t = \ldots = F(\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_0)$$Moving average ($MA(q)$) models, in contrast, limit the shocks to only the last $q$ steps.
def show_ma_example():
    nsamples = 20000 + 3
    eps = np.random.normal(size=nsamples)
    # MA(3) series: y_t = 1 + eps_t - 0.9 eps_{t-1} + 0 eps_{t-2} - 0.8 eps_{t-3}
    y = 1 + eps[3:] - 0.9*eps[2:-1] + 0.*eps[1:-2] - 0.8*eps[0:-3]
    n = np.arange(nsamples - 3)
    plt.subplot(2, 2, 1)
    plt.plot(n, y)
    for lag in range(1, 4):
        plt.subplot(2, 2, lag+1)
        plt.xlabel('n')
        plt.ylabel(f'n-{lag}')
        plt.scatter(y[0:-lag], y[lag:], s=10, alpha=0.01, color=f'C{lag}')

show_ma_example()
#np.cov()
A = np.array([y[0:-3], y[1:-2], y[2:-1], y[3:]])
np.corrcoef(A)
array([[1.        , 0.53033413, 0.53200663, 0.52811241],
       [0.53033413, 1.        , 0.53035447, 0.5322098 ],
       [0.53200663, 0.53035447, 1.        , 0.53044532],
       [0.52811241, 0.5322098 , 0.53044532, 1.        ]])
High-order $AR(p)$ models are rather impractical: it is difficult to obtain the parameters (the OLS fit loses many degrees of freedom), and their memory is still finite.
Koyck distributed lag model - an infinitely lagged model with geometrically decaying coefficients:
$$y_t = c+\beta_0\sum_k\lambda^kx_{t-k}+\epsilon_t$$We need to obtain only three parameters.
The model works for $0<\lambda<1$.
Writing the same equation for $y_{t-1}$, multiplying it by $\lambda$ and subtracting it from the equation for $y_t$, we obtain:
$$y_t=(1-\lambda)c + \beta_0x_t+\lambda y_{t-1}+\xi_t$$where $\xi_t=\epsilon_t-\lambda\epsilon_{t-1}$ is a composite error term.
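A sketch of estimating the transformed Koyck equation by OLS; the exogenous series and the true parameters below are invented for illustration, and note that the lagged $y_{t-1}$ regressor together with the MA(1) error $\xi_t$ formally violates the OLS assumptions, so this gives only a rough estimate:

import statsmodels.api as sm

rng = np.random.default_rng(5)
nsamples, c, beta0, lam = 5000, 1.0, 0.5, 0.7      # assumed true parameters

x_t = rng.normal(size=nsamples)                    # made-up exogenous variable
eps = rng.normal(scale=0.1, size=nsamples)
y_t = np.zeros(nsamples)
for t in range(1, nsamples):
    # transformed Koyck recursion: y_t = (1-lam)c + beta0 x_t + lam y_{t-1} + xi_t
    y_t[t] = (1 - lam)*c + beta0*x_t[t] + lam*y_t[t-1] + eps[t] - lam*eps[t-1]

# regress y_t on [1, x_t, y_{t-1}]
Xk = sm.add_constant(np.column_stack((x_t[1:], y_t[:-1])))
print(sm.OLS(y_t[1:], Xk).fit().params)            # roughly [(1-lam)c, beta0, lam]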
All these models have generalizations to the multivariate case, in which $y_i, x_i, \epsilon_i$ are vectors whose components can moreover be correlated, etc.
Nonlinear models also admit various kinds of combinations, such as $$\sum_i\sum_j y_{t-i}y_{t-j}$$ $$\sum_i\sum_j y_{t-i}x_{t-j}$$ $$\sum_i\sum_j y_{t-i}\epsilon_{t-j}$$ $$\sum_i\sum_j\sum_k y_{t-i}y_{t-j}y_{t-k}$$
etc.