0.1
Over the years, Python has become the top language choice for scientists and financial engineers.
It is not possible to go through all packages and tools that do numerical computation in Python.
In this course, however, let me take you through a number of tasting sessions, hopefully the sampling of which will whet your appetite for future exploration into the numerical world.
To begin, run the block below to load the dependencies.
# %%HIDE%% %%AUTOEXEC%% import numpy as np, sympy as sp sp.init_printing(use_unicode=False, wrap_line=False, no_global=True) import pandas from bokeh.io import output_notebook, show from bokeh.plotting import figure output_notebook()
0.2
What does Newton's Second Law of Motion say?
$$ F = ma $$ With the Sympy package (shortened to sp here), we can easily do maths you have forgotten
m, a = sp.symbols('m,a') F = m * a F
If $m=10kg$, $a=10m/s^2$, we can evaluate the total force $F$ as
F.evalf(subs={m:10.0, a:10.0})
1.1
What if we know the force $F=80N$, the mass $m=16kg$, but not the $a$?
School maths tells us to solve it using $a=F/m$.
All can do this trivial question, and so is Python.
sp.solve(sp.Eq(F.subs(m, 16.0), 80), a)
Of course the above looks like a complete overkill.
But maybe not this one $a^3 + 7a = 2$, which sympy deals with with ease.
sp.solve(a**3 + 7*a - 2, a)
1.2
We know that the speed $v=at$,
v, a, t = sp.symbols('v,a,t') v = a*t v
To get the distance travelled over a period of time,
we need to integrate the speed $v$ w.r.t. time $t$,
sp.integrate(v, t)
which sympy also is good at. This integral can be evaluated either analytically using evalf(), or numerically using trapezoidal/other rules.
Explore the sympy doc for more fun with symbolic maths using Python.
1.3
Generally speaking, analytical solutions are not computers were known for, or invented for. The fact that computers are embarrassingly fast, parallel calculators, gives hope for numerical solutions to many problems that are not solvable analytically.
Let's start the journey. This time, we ditch sympy for numpy, Python's best numerical package.
t = np.linspace(0, 10, 50) a = 2 s = a*t*t/2 p = figure(plot_height=250, sizing_mode='scale_width') p.line(t, s) show(p)
You may have guessed that variable $t$, $s$ are not normal variables.
In fact they are vectors, representing a collection of mappings between $t$ and $s$.
2.1
The previous plot of the parabola looks neat and smooth. What if we don't know this magic integral? What if we only have an expression of velocity?
We can rely on numerical integration.
dt = 10/50.0 ti, vi, si = 0, 0, 0 t, v, s = [], [], [] while ti <= 10: vi += a * dt si += vi * dt t.append(ti); v.append(vi); s.append(si) ti += dt p = figure(plot_height=250, sizing_mode='scale_width') p.line(t, s) show(p)
Eagle-eyed learners may see a small issue from the plot. What is it?
2.2
The parabola example does not do justice to demonstrate the power of numerical computing. Everyone knows $s = 1/2at^2$.
Now we make our example slightly more compex by adding a Brownian displacement.
import numpy.random as rnd import time rnd.seed(int(time.time())) dt = 10/300.0 ti, vi, si = 0, 0, 0 t, v, s = [], [], [] while ti <= 10: vi += a * dt si += vi * dt + rnd.uniform(-5, 5) t.append(ti); v.append(vi); s.append(si) ti += dt p = figure(plot_height=250, sizing_mode='scale_width') p.line(t, s) show(p)
As we seed the random number generator differently at each run-time, a different path is generated.
2.3
As soon as we introduced random variables into the system, we can no longer bank on results from individual runs, but their collective traits.
In maths this is designated as the law of large numbers. Let's try in our system of random-speed walks.
rnd.seed(int(time.time())) dt = 10/300.0 N = 100 ss = [] for n in range(N): ti, vi, si = 0, 0, 0 t, v, s = [], [], [] ss.append(s) while ti <= 10: vi += a * dt si += vi * dt + rnd.uniform(-5, 5) t.append(ti); v.append(vi); s.append(si) ti += dt p = figure(plot_height=250, sizing_mode='scale_width') p.multi_line([t]*N, ss) show(p)
Experiment with different values for variables in the system and see what happens.
2.4
You may or may not know the titled term; but you just completed a full, albeit minimalistic, Monte Carlo simulation. How do we interpret the results?
Just do the average. In maths, this is the expected value
$$ E(S_t) = \frac{1}{N}\sum_i^N s(a, t) $$
And in code, this is
es = [] ti = 0 i = 0 while ti <= 10: sum_s_t = 0.0 for n in range(N): sum_s_t += ss[n][i] es.append(1.0/N * sum_s_t) ti += dt i += 1 p = figure(plot_height=250, sizing_mode='scale_width') p.line(t, es) show(p)
What have you found?
2.5
With modern computing power, there is not much technical difference between processing data of 10 rows, or those of 1 million rows. The important bit, however, is that the right tool should be used! In Python this tool is called
We take the data generated from the Monte Carlo simulations as an example for analysis. First, assemble the dataframe
import pandas as pd data = {'Time': t, 'Average': es} for n, mcpath in enumerate(ss): data['Run_%s' % n] = mcpath df = pd.DataFrame.from_dict(data) df.describe()
Wait - there's a bit of problem. The statistics here is not very meaningful is it? Why?
3.1
Obviously, taking the average, min, or max of any single path does not carry any value of statistical importance. We want the statistics of like for like items, which means stats on all values of the same timestep.
df_new = df.transpose() df_new.describe()
Better, not ideal though. Why?
3.2
When we transposed the whole dataframe, the new statistics inadvertantly included the data from the "Time" and "Average" row, which is obviously incorrect.
To apply any functions to the dataframe on a different axis, use the following approach:
# here pd.Data.describe is function, which gets passed in as an object (functor) df_stats = df[2:].apply(pd.DataFrame.describe, axis=1) # output the last 5 entries df_stats.tail(5)
3.3
We want to study the movements of pollens. We would like to develop a 2D mapping of the displacement of pollens, centred around the flower. We would also like to plot the results, and ideally get some stats from it.
Time permitting, we would also like to recompute the paths of the pollens, with a 1D wall positioned not far away from the centre.
Time still permitting, we would like to use pandas to calculate the properties of the scattered pollens.
4.1
from time import sleep class Pollen(object): def __init__(self): pass def blown(self, direction, strength): pass class Wind(object): def __init__(self): pass def blow(self, pollens): pass class Simulator(object): def __init__(self, n_pollens, n_winds): pass def draw_pollens(self): pass def run(self): pass mysim = Simulator(1000, 10) mysim.run()
4.2
1
2
3
4
5
6
7
8
9
10
11
12
13
14