Numerical Python

0.1

"Wine Tasting"

Over the years, Python has become the top language choice for scientists and financial engineers.

It is not possible to go through all packages and tools that do numerical computation in Python.

In this course, however, let me take you through a number of tasting sessions, hopefully the sampling of which will whet your appetite for future exploration into the numerical world.

To begin, run the block below to load the dependencies.

# %%HIDE%% %%AUTOEXEC%%
import numpy as np, sympy as sp
sp.init_printing(use_unicode=False, wrap_line=False, no_global=True)
import pandas
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook()

0.2

Analytical Python via Sympy

Old Bottle, New Wine

What does Newton's Second Law of Motion say?

$$ F = ma $$ With the Sympy package (shortened to sp here), we can easily do maths you have forgotten

m, a = sp.symbols('m,a')
F = m * a
F

If $m=10kg$, $a=10m/s^2$, we can evaluate the total force $F$ as

F.evalf(subs={m:10.0, a:10.0})

1.1

Solving an equation

What if we know the force $F=80N$, the mass $m=16kg$, but not the $a$?

School maths tells us to solve it using $a=F/m$.

All can do this trivial question, and so is Python.

sp.solve(sp.Eq(F.subs(m, 16.0), 80), a)

Of course the above looks like a complete overkill.
But maybe not this one $a^3 + 7a = 2$, which sympy deals with with ease.

sp.solve(a**3 + 7*a - 2, a)

1.2

Integration

Going one step further

We know that the speed $v=at$,

v, a, t = sp.symbols('v,a,t')
v = a*t
v

To get the distance travelled over a period of time,
we need to integrate the speed $v$ w.r.t. time $t$,

sp.integrate(v, t)

which sympy also is good at. This integral can be evaluated either analytically using evalf(), or numerically using trapezoidal/other rules.

Explore the sympy doc for more fun with symbolic maths using Python.

1.3

Numerical Python via Numpy

Generally speaking, analytical solutions are not computers were known for, or invented for. The fact that computers are embarrassingly fast, parallel calculators, gives hope for numerical solutions to many problems that are not solvable analytically.

Let's start the journey. This time, we ditch sympy for numpy, Python's best numerical package.

t = np.linspace(0, 10, 50)
a = 2
s = a*t*t/2
p = figure(plot_height=250, sizing_mode='scale_width')
p.line(t, s)
show(p)

You may have guessed that variable $t$, $s$ are not normal variables.
In fact they are vectors, representing a collection of mappings between $t$ and $s$.

2.1

Numerical Integrals

The previous plot of the parabola looks neat and smooth. What if we don't know this magic integral? What if we only have an expression of velocity?

We can rely on numerical integration.

dt = 10/50.0
ti, vi, si = 0, 0, 0
t, v, s = [], [], []
while ti <= 10:
    vi += a * dt
    si += vi * dt
    t.append(ti); v.append(vi); s.append(si)
    ti += dt
p = figure(plot_height=250, sizing_mode='scale_width')
p.line(t, s)
show(p)

Eagle-eyed learners may see a small issue from the plot. What is it?

2.2

Spice it up with Randomness

The parabola example does not do justice to demonstrate the power of numerical computing. Everyone knows $s = 1/2at^2$.

Now we make our example slightly more compex by adding a Brownian displacement.

import numpy.random as rnd
import time
rnd.seed(int(time.time()))
dt = 10/300.0
ti, vi, si = 0, 0, 0
t, v, s = [], [], []
while ti <= 10:
    vi += a * dt
    si += vi * dt + rnd.uniform(-5, 5)
    t.append(ti); v.append(vi); s.append(si)
    ti += dt
p = figure(plot_height=250, sizing_mode='scale_width')
p.line(t, s)
show(p)

As we seed the random number generator differently at each run-time, a different path is generated.

2.3

How to Measure Random Data

As soon as we introduced random variables into the system, we can no longer bank on results from individual runs, but their collective traits.

In maths this is designated as the law of large numbers. Let's try in our system of random-speed walks.

rnd.seed(int(time.time()))
dt = 10/300.0
N = 100
ss = []
for n in range(N):
    ti, vi, si = 0, 0, 0
    t, v, s = [], [], []
    ss.append(s)
    while ti <= 10:
        vi += a * dt
        si += vi * dt + rnd.uniform(-5, 5)
        t.append(ti); v.append(vi); s.append(si)
        ti += dt
p = figure(plot_height=250, sizing_mode='scale_width')
p.multi_line([t]*N, ss)
show(p)

Experiment with different values for variables in the system and see what happens.

2.4

Monte Carlo

You may or may not know the titled term; but you just completed a full, albeit minimalistic, Monte Carlo simulation. How do we interpret the results?

Just do the average. In maths, this is the expected value

$$ E(S_t) = \frac{1}{N}\sum_i^N s(a, t) $$

And in code, this is

es = []
ti = 0
i = 0
while ti <= 10:
    sum_s_t = 0.0
    for n in range(N):
        sum_s_t += ss[n][i]
    es.append(1.0/N * sum_s_t)
    ti += dt
    i += 1
p = figure(plot_height=250, sizing_mode='scale_width')
p.line(t, es)
show(p)

What have you found?

2.5

Handling Data

With modern computing power, there is not much technical difference between processing data of 10 rows, or those of 1 million rows. The important bit, however, is that the right tool should be used! In Python this tool is called

pandas

We take the data generated from the Monte Carlo simulations as an example for analysis. First, assemble the dataframe

import pandas as pd
data = {'Time': t, 'Average': es}
for n, mcpath in enumerate(ss):
    data['Run_%s' % n] = mcpath
df = pd.DataFrame.from_dict(data)
df.describe()

Wait - there's a bit of problem. The statistics here is not very meaningful is it? Why?

3.1

Manipulations: Transpose

Obviously, taking the average, min, or max of any single path does not carry any value of statistical importance. We want the statistics of like for like items, which means stats on all values of the same timestep.

df_new = df.transpose()
df_new.describe()

Better, not ideal though. Why?

3.2

Manipulations: Functors

When we transposed the whole dataframe, the new statistics inadvertantly included the data from the "Time" and "Average" row, which is obviously incorrect.

To apply any functions to the dataframe on a different axis, use the following approach:

# here pd.Data.describe is function, which gets passed in as an object (functor)
df_stats = df[2:].apply(pd.DataFrame.describe, axis=1)
# output the last 5 entries
df_stats.tail(5)

3.3

Exercise: Pollen Movements

  1. We want to study the movements of pollens. We would like to develop a 2D mapping of the displacement of pollens, centred around the flower. We would also like to plot the results, and ideally get some stats from it.

  2. Time permitting, we would also like to recompute the paths of the pollens, with a 1D wall positioned not far away from the centre.

  3. Time still permitting, we would like to use pandas to calculate the properties of the scattered pollens.

4.1

from time import sleep

class Pollen(object):
    def __init__(self):
        pass
        
    def blown(self, direction, strength):
        pass
        
class Wind(object):
    def __init__(self):
        pass
    
    def blow(self, pollens):
        pass
            
class Simulator(object):
    def __init__(self, n_pollens, n_winds):
        pass
        
    def draw_pollens(self):
        pass
        
    def run(self):
        pass
        
mysim = Simulator(1000, 10)
mysim.run()

4.2