import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.dpi'] = 120
plt.rcParams['figure.figsize'] = [8,4]
plt.rcParams['axes.grid'] = True
#plt.rcParams.keys()
N = 1000
results = np.empty(N)
for s in range(N):
    Nsamples, k = 1000, 0
    for n in range(Nsamples):
        x = np.random.random(2)  # two random numbers in the unit square
        if np.sum(x**2) < 1: k += 1  # point falls inside the quarter circle
    results[s] = 4*k/Nsamples
plt.hist(results, bins=np.linspace(3, 3.3, 40), density=True, label='generated dist.')
plt.show()
print("π = {:.6f} ± {:.6f}".format(np.mean(results), np.sqrt(np.var(results/(N-1)))))
Nsamples = 10000
n = 1+np.arange(Nsamples)
x = np.random.random((Nsamples,2))
r2 = np.int32((x[:, 0]**2+x[:, 1]**2<1))
result = 4*np.cumsum(r2)/n
plt.loglog(n[10:], np.abs(result[10:]-np.pi), 'ro')
plt.loglog(n[10:], 4/np.sqrt(n[10:]), '--')
from scipy.special import erfinv, erf

for confidence_level in [0.25, 0.5, 0.75, 0.95, 0.99]:
    b = np.sqrt(2)*erfinv(confidence_level)
    print(f"{100*confidence_level:.0f}% [μ-{b:.2f}σ, μ+{b:.2f}σ]")
If a single measurement gives $\mu=E(X)$ and $\sigma=\sqrt{\mathrm{var}(X)}$,
then the average of $M$ measurements gives $\mu$ with uncertainty $\sigma/\sqrt{M}$.
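A minimal numerical sketch of this scaling, using uniform random numbers as the "measurements" (the two printed numbers should be close):
M = 100
single = np.random.random(100000)                      # single measurements
averaged = np.random.random((100000, M)).mean(axis=1)  # averages of M measurements
print(np.std(single)/np.std(averaged), np.sqrt(M))     # ratio of spreads ≈ sqrt(M)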
The Metropolis algorithm allows sampling from an unnormalized distribution.
Let $f(x)$ be the desired distribution.
def f(x):
    return 0.5*np.exp(-x**2/0.25) + np.exp(-(x-2)**2/0.04)

x = np.linspace(-5, 5, 1000)
plt.plot(x, f(x), 'black')
def metropolis_step(x, y, target_dist, sigma):
    # propose a move drawn from a Gaussian centered at the current position
    x_test = x + np.random.normal(0, sigma)
    y_test = target_dist(x_test)
    if y_test > y:
        # uphill moves are always accepted
        return (x_test, y_test)
    else:
        # downhill moves are accepted with probability y_test/y
        p = np.random.random()
        if p < y_test/y:
            return (x_test, y_test)
        else:
            return (x, y)
Nsamples = 1000
x = np.zeros(Nsamples)
x[0] = 0.5
y = f(x[0])
for n in range(x.size-1):
    x[n+1], y = metropolis_step(x[n], y, f, 0.8)
plt.hist(x, bins=np.linspace(-4, 4, 100), density=True, label='generated dist.')
plt.show()
It is useful to calculate averages such as
$$\frac{\int x^2f(x)\,dx}{\int f(x)\,dx}$$
print("{:.6f} ± {:.6f}".format(np.mean(x), np.sqrt(np.var(x))))
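With the Metropolis samples `x` drawn from $f$, such weighted integrals reduce to plain sample averages; a minimal sketch for the ratio above:
# Sketch: ∫x²f(x)dx / ∫f(x)dx estimated as the sample mean of x²
print("<x^2> ≈ {:.4f}".format(np.mean(x**2)))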
def F(x, y):
    return np.exp(-((x-1)**2 + (y+0.5)**2)/0.25) \
        + np.exp(-((x+2) - y**2)**2/0.25)*(1 - np.tanh(x))*0.5 \
        + np.exp(-(x**2 + (y-2)**2)/0.125)
def show_Metropolis_random_walk(F, walk):
    plt.figure(figsize=(12, 8), dpi=80)
    plt.grid(False)
    xlist = np.linspace(-4.0, 4.0, 300)
    ylist = np.linspace(-3.0, 3.0, 300)
    X, Y = np.meshgrid(xlist, ylist)
    Z = F(X, Y)
    levels = np.linspace(0, 1.1, 6)
    plt.contour(X, Y, Z, levels, colors='k')
    contour_filled = plt.contourf(X, Y, Z, levels, alpha=0.6)
    plt.plot(walk[:, 0], walk[:, 1], 'r', linewidth=1)
    plt.xlabel('x')
    plt.ylabel('y')
def metropolis_step_2d(pos, f, target_dist, sigma):
    # propose a 2D Gaussian step from the current position
    pos_test = pos + np.random.normal(0, sigma, 2)
    f_test = target_dist(*pos_test)
    if f_test > f:
        return (pos_test, f_test)
    else:
        # accept the downhill move with probability f_test/f
        p = np.random.random()
        if p < f_test/f:
            return (pos_test, f_test)
        else:
            return (pos, f)
def make_walk(Nsamples):
    walk = np.zeros([Nsamples, 2])
    walk[0, :] = [0, 0]
    f = F(*walk[0])
    for n in range(Nsamples-1):
        walk[n+1, :], f = metropolis_step_2d(walk[n], f, F, 0.8)
    show_Metropolis_random_walk(F, walk)
make_walk(1)
make_walk(3000)
from scipy.stats import norm

N = 1000

def brownian_motion(σ=2, T=10, N=1000):
    t = np.linspace(0, T, N)
    dt = t[1] - t[0]
    n1, n2 = 300, 500
    # increments of a Wiener process have standard deviation σ·sqrt(dt)
    r = norm.rvs(size=t.shape, scale=σ*np.sqrt(dt))
    x = np.cumsum(r)  # used cumulative sum instead of a loop
    x[n1:n2] = np.nan  # remove a chunk of data to simulate missing observations
    plt.plot(t, x)
    plt.xlabel('$t$', fontsize=16)
    plt.ylabel('$x$', fontsize=16)
    plt.show()
    return x, n1, n2

x, n1, n2 = brownian_motion(σ=2, T=10, N=1000)
y = np.concatenate((np.diff(x[0:n1]), np.diff(x[n2:])))
mu, var = np.mean(y), np.var(y)
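A quick consistency sketch: the increments of the simulated walk should have zero mean and standard deviation $\sigma\sqrt{dt}$ (here $\sigma=2$, $dt=10/999$).
# Sketch: compare the sample statistics of the increments with the generating parameters
print(f"mean = {mu:.4f}, std = {np.sqrt(var):.4f}, expected std = {2*np.sqrt(10/999):.4f}")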
Missing data problem: reconstruct $X(t)$ for $t_1<t<t_2$. The midpoint is drawn as
$$X\left(\tfrac{t_1+t_2}{2}\right) = \tfrac{1}{2}\bigl[X(t_1)+X(t_2)\bigr] + W(\sigma),$$
where $W(\sigma)$ is a Gaussian distribution with width $\sigma$. The width grows as $\sqrt{\Delta t}$.
We repeat the same procedure for the data between $\left[t_1,\frac{t_1+t_2}{2} \right]$ and $\left[\frac{t_1+t_2}{2}, t_2 \right]$, scaling the Gaussians accordingly.
#z = np.zeros(21)
z = np.copy(x[n1-1:n2+1])

def rec(z, a, b):
    # midpoint of [a, b]: mean of the endpoints plus Gaussian noise scaled with the interval length
    c = (a+b)//2
    dn = np.sqrt((b-a)/2)
    z[c] = 0.5*(z[a] + z[b]) + norm.rvs(scale=np.sqrt(var)*dn)
    if b-a > 2:
        rec(z, a, c)
        rec(z, c, b)

rec(z, 0, z.size-1)
#print(z)
t = np.linspace(0, 10, N)
plt.plot(t, x)
plt.plot(t[n1-1:n2+1], z)
Alternatively, we can generate a straight line between the points $X(t_1)$ and $X(t_2)$
and add random displacements to each intermediate point so that the local statistics match those of the Brownian increments.
This procedure converges more slowly and may require as many as $N^2$ samples, where $N$ is the number of missing data points.
u = np.linspace(z[0], z[-1], z.size)
NN = z.size-2
for n in range(100000):
    # pick a random interior point and set it to the average of its neighbours plus a Gaussian kick
    k = 1 + int(np.random.random()*NN)
    d = norm.rvs(scale=np.sqrt(var))
    u[k] = (u[k-1] + u[k+1])/2 + d
plt.plot(t, x)
plt.plot(t[n1-1:n2+1], u)
# not in anaconda, I used: python3.7 -m pip install sdeint
import sdeint
Algorithms:

itoEuler(f, G, y0, tspan): the Euler-Maruyama algorithm for Ito equations.
stratHeun(f, G, y0, tspan): the Stratonovich Heun algorithm for Stratonovich equations.
itoSRI2(f, G, y0, tspan): the Rößler 2010 order 1.0 strong Stochastic Runge-Kutta algorithm SRI2 for Ito equations.
itoSRI2(f, [g1,...,gm], y0, tspan): as above, with the G matrix given as a separate function for each column (gives a speedup for large m or complicated G).
stratSRS2(f, G, y0, tspan): the Rößler 2010 order 1.0 strong Stochastic Runge-Kutta algorithm SRS2 for Stratonovich equations.
stratSRS2(f, [g1,...,gm], y0, tspan): as above, with the G matrix given as a separate function for each column (gives a speedup for large m or complicated G).
stratKP2iS(f, G, y0, tspan): the Kloeden and Platen two-step implicit order 1.0 strong algorithm for Stratonovich equations.

Example:
$$dx = -(a+b^2x)(1-x^2)\,dt + b(1-x^2)\,dW$$
a = 1.0
b = 0.8
tspan = np.linspace(0.0, 5.0, 5001)
x0 = 0.1
def f(x, t):
    return -(a + x*b**2)*(1 - x**2)

def g(x, t):
    return b*(1 - x**2)
result = sdeint.itoint(f, g, x0, tspan)
plt.plot(tspan, result)
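The solvers listed above share the same `(f, G, y0, tspan)` call signature; as a sketch, the basic Euler-Maruyama solver can be called on the same equation (each call draws its own Wiener path, so the trajectories differ):
result_euler = sdeint.itoEuler(f, g, x0, tspan)  # Euler-Maruyama, same signature as itoint
plt.plot(tspan, result, label='itoint')
plt.plot(tspan, result_euler, label='itoEuler')
plt.legend()
plt.show()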
Black-Scholes-Merton model for European options: $$C(S_t, K, r, \sigma, T, t) = S_tN(d_1)-Ke^{-r(T-t)}N(d_2)$$
$$d_{1,2} = \frac{1}{\sigma\sqrt{T-t}}\left[\log\left(\frac{S_t}{K}\right) + \left(r\pm\frac{\sigma^2}{2}\right)(T-t) \right]$$
where $N(\cdot)$ is the standard normal CDF.
import pandas as pd
import pandas_datareader as pdr
start_date = '2015-01-01'
end_date = '2019-11-28'
end_date = '2016-01-04'
fb_data = pdr.DataReader("FB", "yahoo", start_date, end_date)
fb_data.tail()
from scipy import stats

def Black_Scholes_call(S_t, K, r, sigma, T):
    den = 1 / (sigma * np.sqrt(T))
    d1 = den * (np.log(S_t / K) + (r + 0.5 * sigma ** 2) * T)
    d2 = den * (np.log(S_t / K) + (r - 0.5 * sigma ** 2) * T)
    C = S_t * stats.norm.cdf(d1) \
        - K * np.exp(-r * T) * stats.norm.cdf(d2)
    return C
# Calculate volatility
# Returns
ret = fb_data['Close'][1:].values / fb_data['Close'][:-1].values - 1
# Volatility
sigma = np.std(ret) * np.sqrt(252)
sigma
# Get the risk-free rate
rfr = pdr.DataReader("DGS10", "fred", end_date)
rfr.head()
# Get the opening price on the day of interest
S_t = fb_data['Open'][-1]
# Range of strike prices
K = S_t *(1 + np.linspace(0.05, 1, 20))
# Risk free rate on day of interest
r = rfr.loc[fb_data.index[-1]][0]
# Time to maturity in year fractions
T = 0.5
# Calculate option prices
C = [Black_Scholes_call(S_t, k, r / 100, sigma, T) for k in K]
plt.plot(K, C)
plt.xlabel("Strike Price")
plt.ylabel("Option Price")
plt.title("Option Price vs. Strike Price for 6-Month European Call Options")
plt.show()
# Keep values from previous BSM for comparison
K_bsm = S_t *(1 + 0.05)
C_bsm = Black_Scholes_call(S_t, K_bsm, r / 100, sigma, T)
# Parameters - same values as used in the example above
# repeated here for a reminder, change as you like
# Initial asset price
S_0 = S_t
# Strike price for call option
K = K_bsm
# Time to maturity in years
T = 0.5
# Risk-free rate of interest
r = rfr.loc[fb_data.index[-1]][0] / 100
# Historical Volatility
sigma = np.std(ret) * np.sqrt(252)
# Number of time steps for simulation
n_steps = int(T * 252)
# Time interval
dt = T / n_steps
# Number of simulations
N = 100000
# Zero array to store values (often faster than appending)
S = np.zeros((n_steps, N))
S[0] = S_0
The equation $$dS_t = rS_t\,dt+\sigma S_t\,dW$$ can be rewritten as $$\frac{dS_t}{S_t} = r\,dt+\sigma\, dW$$ The right-hand side has constant coefficients, so (from Itô's lemma) the discretized solution is
$$S_t \approx S_{t-dt}\exp\left[\left(r - \frac{\sigma^2}{2}\right)dt +\sigma \sqrt{dt}\,W\right]$$
The option price is the discounted expected payoff at maturity:
$$C=e^{-rT}E\bigl(\max(S_T-K, 0)\bigr)$$
for t in range(1, n_steps):
    # Draw random values to simulate Brownian motion
    W = np.random.standard_normal(N)
    S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2)*dt + sigma * np.sqrt(dt) * W)
# Average the payoffs at maturity and discount to present value
C = np.exp(-r * T) * np.mean(np.maximum(S[-1] - K, 0))
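Since `C_bsm` was kept above precisely for comparison, a short sketch checking the Monte-Carlo estimate against the closed-form price, with the standard error of the Monte-Carlo mean:
payoff = np.exp(-r * T) * np.maximum(S[-1] - K, 0)  # discounted payoff of each path
print(f"Monte Carlo: {C:.4f} ± {payoff.std()/np.sqrt(N):.4f}   Black-Scholes: {C_bsm:.4f}")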
def show_BS_example():
    plt.plot(S[:, 0:30])
    plt.axhline(K, c="k", xmin=0,
                xmax=n_steps,
                label="Strike Price")
    plt.xlim([0, n_steps])
    plt.ylabel("Non-Discounted Value")
    plt.xlabel("Time step")
    plt.title("First 30 Option Paths")
    plt.legend(loc="best")
    plt.show()

show_BS_example()
import quandl as qdl
data = qdl.get("WSE/WIG30")
data
type(data)
data.dtypes
vars(data)
data.columns
data.index
data['Open']
plt.plot(data['Close']-data['Open'])
type(data['Open'])
print(data['Open'].array)
vars(data['Open'])