Numerical Methods

Lecture 7: Monte Carlo simulations cont.

by: Tomasz Romańczukiewicz


Outline

  • Convergence
  • Metropolis algorithm
  • Filling gaps in data
  • Option prices
  • Pandas
In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline  
plt.rcParams['figure.dpi'] = 120
plt.rcParams['figure.figsize'] = [8,4]
plt.rcParams['axes.grid'] = True
#plt.rcParams.keys()
In [6]:
N = 1000
results = np.empty(N)
for s in range(N):
    Nsamples, k = 1000, 0
    for n in range(Nsamples):
        x = np.random.random(2) # two random numbers
        if np.sum(x**2)<1: k+=1
    results[s] = 4*k/Nsamples
In [7]:
plt.hist(results, bins=np.linspace(3, 3.3, 40), density=True, label='generated dist.')
plt.legend()
plt.show()
print("π = {:.6f} ± {:.6f}".format(np.mean(results), np.sqrt(np.var(results)/(N-1))))
π = 3.143228 ± 0.001675
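The scatter of these estimates is itself approximately Gaussian (central limit theorem). As a quick visual check, one can overlay the normal density built from the sample mean and standard deviation; a minimal sketch:
In [ ]:
from scipy.stats import norm

# overlay the Gaussian implied by the sample mean and std of the estimates
xs = np.linspace(3, 3.3, 200)
plt.hist(results, bins=np.linspace(3, 3.3, 40), density=True, label='generated dist.')
plt.plot(xs, norm.pdf(xs, loc=results.mean(), scale=results.std()), label='normal fit')
plt.legend()
plt.show()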
In [8]:
Nsamples  = 10000
n = 1+np.arange(Nsamples)
x = np.random.random((Nsamples,2))
r2 = np.int32((x[:, 0]**2+x[:, 1]**2<1))
result = 4*np.cumsum(r2)/n

plt.loglog(n[10:], np.abs(result[10:]-np.pi), 'ro')
plt.loglog(n[10:], 4/np.sqrt(n[10:]), '--')
Out[8]:
[<matplotlib.lines.Line2D at 0x7f30ac436160>]
  • Normal distribution:
$$\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
  • Cumulative distribution function:
$$\frac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$
  • Quantile (confidence level):
$$\mu+\sigma\sqrt{2}\,\operatorname{erf}^{-1}(2p-1)$$
In [9]:
from scipy.special import erfinv, erf
for confidence_level in [0.25, 0.5, 0.75, 0.95, 0.99]:
    b = np.sqrt(2)*erfinv(confidence_level)
    print(f"{100*confidence_level:.0f}%  [μ-{b:.2f}σ, μ+{b:.2f}σ]")
25%  [μ-0.32σ, μ+0.32σ]
50%  [μ-0.67σ, μ+0.67σ]
75%  [μ-1.15σ, μ+1.15σ]
95%  [μ-1.96σ, μ+1.96σ]
99%  [μ-2.58σ, μ+2.58σ]

If a single measurement gives $\mu=E(X)$ and $\sigma=\sqrt{\operatorname{var}(X)}$,
then averaging $M$ independent measurements gives

$$\sigma_M = \frac{\sigma}{\sqrt{M}}$$
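A quick numerical check of this scaling, a sketch with synthetic Gaussian measurements (the sample counts are illustrative):
In [ ]:
# the standard error of the mean should fall off as sigma/sqrt(M)
sigma = 1.0                 # std of a single measurement
for M in [10, 100, 1000]:
    # 5000 independent experiments, each averaging M measurements
    means = np.random.normal(0, sigma, size=(5000, M)).mean(axis=1)
    print(f"M={M:5d}: std of mean = {means.std():.4f},  σ/√M = {sigma/np.sqrt(M):.4f}")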

Metropolis-Hastings algorithm


Allows sampling from a distribution known only up to normalization.

Let $f(x)$ be the desired (target) distribution.

  1. choose an initial point $x$
  2. propose a test point $x'=x+W$, where $W$ is a random number drawn from a symmetric distribution (usually Gaussian)
  3. if $f(x')>f(x)$, accept $x'$ as the new position
  4. if $f(x')<f(x)$, accept $x'$ as the new position only if $$\frac{f(x')}{f(x)}>p$$ where $p\in [0,1)$ is a random number drawn from a uniform distribution
In [10]:
def f(x):
    return 0.5*np.exp(-x**2/0.25)+np.exp(-(x-2)**2/0.04)

x = np.linspace(-5,5, 1000)
plt.plot(x, f(x), 'black')
Out[10]:
[<matplotlib.lines.Line2D at 0x7f309fe6ce48>]
In [11]:
def metropolis_step(x, y, target_dist, sigma):
    x_test = x + np.random.normal(0, sigma)  # propose a move
    y_test = target_dist(x_test)
    if y_test > y:   # always accept moves uphill
        return (x_test, y_test)
    else:            # accept downhill moves with probability f(x')/f(x)
        p = np.random.random()
        if p < y_test/y:
            return (x_test, y_test)
        else: return (x, y)

Nsamples = 1000
x = np.zeros(Nsamples)
x[0] = 0.5
y = f(x[0])
for n in range(x.size-1):
    x[n+1], y = metropolis_step(x[n], y, f, 0.8)
In [12]:
plt.hist(x, bins=np.linspace(-4, 4, 100), density=True, label='generated dist.')
plt.legend()
plt.show()

It is useful to calculate averages such as

$$\frac{\int x^2f(x)\,dx}{\int f(x)dx}$$
In [13]:
print("{:.6f} ± {:.6f}".format(np.mean(x), np.sqrt(np.var(x))))
0.842068 ± 1.039634
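As a cross-check, the same averages can be computed by direct numerical integration of the target density; a minimal sketch using scipy.integrate.quad and the f and samples x from above:
In [ ]:
from scipy.integrate import quad

# averages estimated from the Metropolis-Hastings samples
mc_x, mc_x2 = np.mean(x), np.mean(x**2)

# the same averages by direct quadrature
den = quad(f, -np.inf, np.inf)[0]
qd_x  = quad(lambda s: s*f(s),    -np.inf, np.inf)[0]/den
qd_x2 = quad(lambda s: s**2*f(s), -np.inf, np.inf)[0]/den
print("<x> : MH {:.4f}, quadrature {:.4f}".format(mc_x, qd_x))
print("<x²>: MH {:.4f}, quadrature {:.4f}".format(mc_x2, qd_x2))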
In [14]:
def F(x, y):
    return np.exp(-((x-1)**2+(y+0.5)**2)/0.25)  + np.exp(-((x+2)-y**2)**2/0.25)*(1-np.tanh(x))*0.5 + \
        np.exp(-((x)**2+(y-2)**2)/0.125)

def show_Metropolis_random_walk(F, walk):
    plt.figure(figsize=(12,8), dpi=80)
    plt.grid(False)
    xlist = np.linspace(-4.0, 4.0, 300)
    ylist = np.linspace(-3.0, 3.0, 300)
    X, Y = np.meshgrid(xlist, ylist)
    Z = F(X,Y)
    levels = np.linspace(0, 1.1, 6)
    plt.contour(X, Y, Z, levels, colors='k')    
    contour_filled = plt.contourf(X, Y, Z, levels, alpha=0.6)
    plt.plot(walk[:, 0], walk[:, 1], 'r', linewidth=1)
    plt.xlabel('x')
    plt.ylabel('y')
   
In [15]:
def metropolis_step_2d(pos, f, target_dist, sigma):
    pos_test = pos + np.random.normal(0, sigma, 2)    
    f_test = target_dist(*pos_test)
    if f_test >  f :
        return (pos_test, f_test)
    else:
        p = np.random.random()
        if p < f_test/f:
            return (pos_test, f_test)
        else: return (pos, f)

def make_walk(Nsamples):
    walk = np.zeros([Nsamples, 2])
    walk[0, :] = [0, 0]
    f = F(*walk[0])
    for n in range(Nsamples-1):
        walk[n+1,:], f  = metropolis_step_2d(walk[n], f, F, 0.8)

    show_Metropolis_random_walk(F, walk)   
In [16]:
make_walk(1)
In [17]:
make_walk(3000)

Filling gaps

In [46]:
from scipy.stats import norm
N = 1000
def brownian_motion(σ=2, T=10, N=1000):
    t = np.linspace(0, T, N)
    dt = t[1]-t[0]
    n1, n2 = 300, 500
    r = norm.rvs(size=t.shape, scale=σ*np.sqrt(dt))  # increments with std σ√dt
    x = np.cumsum(r)  # cumulative sum instead of a loop
    x[n1:n2] = np.nan  # erase part of the data to create a gap
    plt.plot(t, x)
    plt.xlabel('$t$', fontsize=16)
    plt.ylabel('$x$',  fontsize=16)
    plt.show()
    return x, n1, n2
In [19]:
x, n1, n2 = brownian_motion(σ=2, T=10, N=1000)
y = np.concatenate((np.diff(x[0:n1]), np.diff(x[n2:])))
mu, var = np.mean(y), np.var(y)

Missing data problem $X(t_1<t<t_2)$:

  • from the existing data we can obtain the distributions (assuming Gaussian for simplicity)
  • we need to construct a random walk from point $X(t_1)$ to $X(t_2)$ with properties similar to the rest of the data
$$X\left(\frac{t_1+t_2}{2}\right)=\frac{X(t_1)+X(t_2)}{2} + W\left(\sigma_1\sqrt{\frac{t_2-t_1}{2}}\right)$$

where $W(\sigma)$ is a Gaussian distribution with width $\sigma$. The width grows as $\sqrt{\Delta t}$.

We repeat the same procedure for the data between $\left[t_1,\frac{t_1+t_2}{2} \right]$ and $\left[\frac{t_1+t_2}{2}, t_2 \right]$, scaling the Gaussians accordingly.

In [20]:
z = np.copy(x[n1-1:n2+1])
def rec(z, a, b):
    # fill the midpoint between indices a and b, then recurse into each half
    c = (a+b)//2
    dn = np.sqrt((b-a)/2)  # the width grows as the square root of the interval length
    z[c] = 0.5*(z[a]+z[b]) + norm.rvs(scale=np.sqrt(var)*dn)
    if b-a>2:
        rec(z, a, c)
        rec(z, c, b)

rec(z, 0, z.size-1)
t = np.linspace(0, 10, N)
plt.plot(t, x)
plt.plot(t[n1-1:n2+1], z)
Out[20]:
[<matplotlib.lines.Line2D at 0x7f309d585cc0>]

Or we can draw a straight line between the points $X(t_1)$ and $X(t_2)$
and add random displacements to each intermediate point so that

$$X(t) = \frac{X(t-dt)+X(t+dt)}{2} + W(\sigma_1 \sqrt{dt})$$

This procedure converges more slowly and may require as many as $N^2$ samples, where $N$ is the number of missing data points.

In [21]:
u = np.linspace(z[0], z[-1], z.size)  # straight line between the endpoints
NN = z.size-2
for n in range(100000):
    k = 1 + int(np.random.random()*NN)   # pick a random interior point
    d = norm.rvs(scale=np.sqrt(var))
    u[k] = (u[k-1]+u[k+1])/2 + d         # average of neighbours plus noise

plt.plot(t, x)
plt.plot(t[n1-1:n2+1], u)
Out[21]:
[<matplotlib.lines.Line2D at 0x7f3082f882b0>]
In [22]:
# not in anaconda, I used: python3.7 -m pip install sdeint
import sdeint

Algorithms:

  • itoEuler(f, G, y0, tspan): the Euler-Maruyama algorithm for Ito equations.
  • stratHeun(f, G, y0, tspan): the Stratonovich Heun algorithm for Stratonovich equations.
  • itoSRI2(f, G, y0, tspan): the Rößler2010 order 1.0 strong Stochastic Runge-Kutta algorithm SRI2 for Ito equations.
  • itoSRI2(f, [g1,...,gm], y0, tspan): as above, with G matrix given as a separate function for each column (gives speedup for large m or complicated G).
  • stratSRS2(f, G, y0, tspan): the Rößler2010 order 1.0 strong Stochastic Runge-Kutta algorithm SRS2 for Stratonovich equations.
  • stratSRS2(f, [g1,...,gm], y0, tspan): as above, with G matrix given as a separate function for each column (gives speedup for large m or complicated G).
  • stratKP2iS(f, G, y0, tspan): the Kloeden and Platen two-step implicit order 1.0 strong algorithm for Stratonovich equations.

Example:

$$dx = -(a+b^2x)(1-x^2)dt + b(1-x^2)dW$$
In [23]:
a = 1.0
b = 0.8
tspan = np.linspace(0.0, 5.0, 5001)
x0 = 0.1

def f(x, t):
    return -(a + x*b**2)*(1 - x**2)

def g(x, t):
    return b*(1 - x**2)

result = sdeint.itoint(f, g, x0, tspan)
In [24]:
plt.plot(tspan, result)
Out[24]:
[<matplotlib.lines.Line2D at 0x7f309fe03828>]
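For comparison, the Euler-Maruyama scheme behind itoEuler is only a few lines. A minimal hand-rolled sketch for the same equation, reusing f, g, x0 and tspan from above:
In [ ]:
# Euler-Maruyama: x_{n+1} = x_n + f(x_n, t_n) dt + g(x_n, t_n) dW
dt = tspan[1] - tspan[0]
x_em = np.empty_like(tspan)
x_em[0] = x0
for i in range(tspan.size - 1):
    dW = np.random.normal(0, np.sqrt(dt))   # Wiener increment ~ N(0, dt)
    x_em[i+1] = x_em[i] + f(x_em[i], tspan[i])*dt + g(x_em[i], tspan[i])*dW

plt.plot(tspan, x_em)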

Option prices


Black-Scholes-Merton model for European options: $$C(S_t, K, r, \sigma, T, t) = S_tN(d_1)-Ke^{-r(T-t)}N(d_2)$$

$$d_{1,2} = \frac{1}{\sigma\sqrt{T-t}}\left[\log\left(\frac{S_t}{K}\right) + \left(r\pm\frac{\sigma^2}{2}\right)(T-t) \right]$$
  • $S_t$ - price at time $t$
  • $K$ - strike price
  • $T$ - time to maturity (in years)
  • $\sigma$ - volatility
  • $N$ - cumulative distribution function (CDF) of the standard normal distribution
In [25]:
import pandas as pd
import pandas_datareader as pdr

start_date = '2015-01-01'
end_date = '2016-01-04'
fb_data = pdr.DataReader("FB", "yahoo", start_date, end_date)
fb_data.tail()
Out[25]:
High Low Open Close Volume Adj Close
Date
2015-12-28 105.980003 104.529999 105.019997 105.930000 13069700 105.930000
2015-12-29 107.739998 106.250000 106.419998 107.260002 17179900 107.260002
2015-12-30 107.250000 106.059998 107.000000 106.220001 13115000 106.220001
2015-12-31 106.169998 104.620003 106.000000 104.660004 18391100 104.660004
2016-01-04 102.239998 99.750000 101.949997 102.220001 37912400 102.220001
In [26]:
from scipy import stats

def Black_Scholes_call(S_t, K, r, sigma, T):
    den = 1 / (sigma * np.sqrt(T))
    d1 = den * (np.log(S_t / K) + (r + 0.5 * sigma ** 2) * T)
    d2 = den * (np.log(S_t / K) + (r - 0.5 * sigma ** 2) * T)
    C = S_t * stats.norm.cdf(d1) \
        - K * np.exp(-r * T) * stats.norm.cdf(d2)
    return C
In [27]:
# Calculate volatility
# Returns
ret = fb_data['Close'][1:].values / fb_data['Close'][:-1].values - 1
# Volatility
sigma = np.std(ret) * np.sqrt(252)

sigma
Out[27]:
0.25740530517111704
In [28]:
# Get the risk-free rate

rfr = pdr.DataReader("DGS10", "fred", end_date)
rfr.head()
Out[28]:
DGS10
DATE
2016-01-04 2.24
2016-01-05 2.25
2016-01-06 2.18
2016-01-07 2.16
2016-01-08 2.13
In [29]:
# Get the opening price on the day of interest
S_t = fb_data['Open'][-1]
# Range of strike prices
K = S_t *(1 + np.linspace(0.05, 1, 20))
# Risk free rate on day of interest
r = rfr.loc[fb_data.index[-1]][0]
# Time to maturity in year fractions
T = 0.5
In [30]:
# Calculate option prices
C = [Black_Scholes_call(S_t, k, r / 100, sigma, T) for k in K]

plt.plot(K, C)
plt.xlabel("Strike Price")
plt.ylabel("Option Price")
plt.title("Option Price vs. Strike Price for 6-Month European Call Options")
plt.show()
In [31]:
# Keep values from previous BSM for comparison
K_bsm =  S_t *(1 + 0.05)
C_bsm = Black_Scholes_call(S_t, K_bsm, r / 100, sigma, T)

# Parameters - same values as used in the example above 
# repeated here for a reminder, change as you like
# Initial asset price
S_0 = S_t
# Strike price for call option
K = K_bsm
# Time to maturity in years
T = 0.5
# Risk-free rate of interest
r = rfr.loc[fb_data.index[-1]][0] / 100
# Historical Volatility
sigma = np.std(ret) * np.sqrt(252)
# Number of time steps for simulation
n_steps = int(T * 252)
# Time interval
dt = T / n_steps
# Number of simulations
N = 100000
# Zero array to store values (often faster than appending)
S = np.zeros((n_steps, N))
S[0] = S_0

The equation $$dS_t = rS_t\,dt+\sigma S_t\,dW$$ can be rewritten as $$\frac{dS_t}{S_t} = r\,dt+\sigma\, dW$$ The right-hand side has constant coefficients, so (from Itô's lemma)

$$S_t \approx S_{t-dt}\exp\left[\left(r - \frac{\sigma^2}{2}\right)dt +\sigma \sqrt{dt}\,W\right]$$

$$C=e^{-rT}E\biggl(\max(S_T-K, 0)\biggr)$$
In [32]:
for t in range(1, n_steps):
    # Draw random values to simulate Brownian motion
    W = np.random.standard_normal(N)
    S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2)*dt + (sigma * np.sqrt(dt) * W))

# Sum and discount values
C = np.exp(-r * T) * np.mean(np.maximum(S[-1] - K, 0))
In [33]:
def show_BS_example():
    plt.plot(S[:,0:30])
    plt.axhline(K, c="k", label="Strike Price")  # spans the full axis by default
    plt.xlim([0, n_steps])
    plt.ylabel("Non-Discounted Value")
    plt.xlabel("Time step")
    plt.title("First 30 Option Paths")
    plt.legend(loc="best")
    plt.show()
In [34]:
show_BS_example()

Pandas data types

In [35]:
import quandl as qdl

data = qdl.get("WSE/WIG30")
data
Out[35]:
Open High Low Close % Change Turnover (1000s)
Date
2016-10-03 1982.01 1997.93 1978.65 1994.76 0.92 447235.01
2016-10-04 1999.99 2029.75 1999.87 2026.20 1.58 788854.18
2016-10-05 2027.49 2042.42 2016.23 2035.46 0.46 779566.16
2016-10-06 2033.05 2036.45 2023.94 2034.34 -0.06 574269.09
2016-10-07 2034.51 2035.05 2022.55 2025.38 -0.44 507426.68
... ... ... ... ... ... ...
2019-10-08 2406.63 2412.69 2390.39 2410.47 0.49 642848.41
2019-10-09 2410.00 2420.15 2403.99 2412.44 0.08 514846.90
2019-10-10 2415.63 2415.87 2393.06 2399.63 -0.53 612144.83
2019-10-14 2445.28 2447.26 2425.79 2431.70 -0.47 559119.87
2019-10-15 2445.07 2445.16 2431.94 2439.89 0.34 569496.68

754 rows × 6 columns

In [36]:
type(data)
Out[36]:
pandas.core.frame.DataFrame
In [37]:
data.dtypes
Out[37]:
Open                float64
High                float64
Low                 float64
Close               float64
% Change            float64
Turnover (1000s)    float64
dtype: object
In [38]:
vars(data)
Out[38]:
{'_is_copy': None, '_data': BlockManager
 Items: Index(['Open', 'High', 'Low', 'Close', '% Change', 'Turnover (1000s)'], dtype='object')
 Axis 1: DatetimeIndex(['2016-10-03', '2016-10-04', '2016-10-05', '2016-10-06',
                '2016-10-07', '2016-10-10', '2016-10-11', '2016-10-12',
                '2016-10-13', '2016-10-14',
                ...
                '2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04',
                '2019-10-07', '2019-10-08', '2019-10-09', '2019-10-10',
                '2019-10-14', '2019-10-15'],
               dtype='datetime64[ns]', name='Date', length=754, freq=None)
 FloatBlock: slice(0, 6, 1), 6 x 754, dtype: float64, '_item_cache': {}}
In [39]:
data.columns
Out[39]:
Index(['Open', 'High', 'Low', 'Close', '% Change', 'Turnover (1000s)'], dtype='object')
In [40]:
data.index
Out[40]:
DatetimeIndex(['2016-10-03', '2016-10-04', '2016-10-05', '2016-10-06',
               '2016-10-07', '2016-10-10', '2016-10-11', '2016-10-12',
               '2016-10-13', '2016-10-14',
               ...
               '2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04',
               '2019-10-07', '2019-10-08', '2019-10-09', '2019-10-10',
               '2019-10-14', '2019-10-15'],
              dtype='datetime64[ns]', name='Date', length=754, freq=None)
In [41]:
data['Open']
Out[41]:
Date
2016-10-03    1982.01
2016-10-04    1999.99
2016-10-05    2027.49
2016-10-06    2033.05
2016-10-07    2034.51
               ...   
2019-10-08    2406.63
2019-10-09    2410.00
2019-10-10    2415.63
2019-10-14    2445.28
2019-10-15    2445.07
Name: Open, Length: 754, dtype: float64
In [43]:
plt.plot(data['Close']-data['Open'])
type(data['Open'])
Out[43]:
pandas.core.series.Series
In [44]:
print(data['Open'].array)
<PandasArray>
[1982.01, 1999.99, 2027.49, 2033.05, 2034.51, 2034.33, 2044.37, 2028.54,
 2028.46, 2013.79,
 ...
  2456.7, 2424.05, 2372.05, 2382.94,  2405.9, 2406.63,  2410.0, 2415.63,
 2445.28, 2445.07]
Length: 754, dtype: float64
In [45]:
vars(data['Open'])
Out[45]:
{'_is_copy': None, '_data': SingleBlockManager
 Items: DatetimeIndex(['2016-10-03', '2016-10-04', '2016-10-05', '2016-10-06',
                '2016-10-07', '2016-10-10', '2016-10-11', '2016-10-12',
                '2016-10-13', '2016-10-14',
                ...
                '2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04',
                '2019-10-07', '2019-10-08', '2019-10-09', '2019-10-10',
                '2019-10-14', '2019-10-15'],
               dtype='datetime64[ns]', name='Date', length=754, freq=None)
 FloatBlock: 754 dtype: float64, '_item_cache': {}, '_name': 'Open', '_subtyp': 'time_series', '_index': DatetimeIndex(['2016-10-03', '2016-10-04', '2016-10-05', '2016-10-06',
                '2016-10-07', '2016-10-10', '2016-10-11', '2016-10-12',
                '2016-10-13', '2016-10-14',
                ...
                '2019-10-01', '2019-10-02', '2019-10-03', '2019-10-04',
                '2019-10-07', '2019-10-08', '2019-10-09', '2019-10-10',
                '2019-10-14', '2019-10-15'],
               dtype='datetime64[ns]', name='Date', length=754, freq=None), '_cacher': ('Open',
  <weakref at 0x7f3079b1fbd8; to 'DataFrame' at 0x7f307fc78978>)}
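The DatetimeIndex is what makes date-based selection and resampling convenient. A short sketch, assuming the WIG30 frame data downloaded above (the date ranges are illustrative):
In [ ]:
# label-based slicing with date strings works because the index is a DatetimeIndex
subset = data.loc['2017-01':'2017-03', ['Open', 'Close']]
print(subset.head())

# day-over-day returns and month-end closing values
daily_ret = data['Close'].pct_change()
monthly_close = data['Close'].resample('M').last()
print(daily_ret.describe())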

Tasks:

  1. Using the Metropolis-Hastings algorithm, calculate the volume of an $N$-ball.
  2. Can the MH algorithm be used to fill gaps in data?
  3. Take some real data (e.g. the example from the lecture), erase 10% of it and reconstruct the missing part.
  4. Implement the BS model for other options.
  5. Find and download option data for some other examples.