Serving the Quantitative Finance Community

 
User avatar
Cuchulainn
Posts: 64093
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Re: Python tricks

September 1st, 2020, 9:41 am

Great, thank you very much. I will relate it to the hinterland.
Will also send you our pybind11 - C++ checklist.
Any thoughts on how to generate HTML documentation automatically for pybind11-generated Python modules? For C++ I do with Doxygen, but I don't know how to handle binary Python wrappers.
First reaction would be some kind of Reflection API. I've done similar stuff in C# but maybe Python can do it as well.

https://docs.python.org/3/c-api/reflection.html

?

No idea if binary complicates things. Maybe inspect object attributes at run-time?
 
User avatar
ISayMoo
Posts: 2368
Joined: September 30th, 2015, 8:30 pm

Re: Python tricks

September 1st, 2020, 7:54 pm

I can do stuff like
> from cppyml import linear_regression
> help(linear_regression)
but I was hoping avoid running the intepreter... I want to do something what Sphinx does for Python source code, but with a .pyd file.
 
ZeroSumGame
Posts: 3
Joined: January 23rd, 2020, 11:26 am

Re: Python tricks

September 6th, 2020, 10:14 pm

Hi, 

Sorry if I'm on the wrong board here - new to the forum. 

I'm trying to get to grips with Cython - just doing a basic explicit finite difference function and trying to test the performance gains of various implementations. I know my code is working, and it's an order of magnitude quicker than pure python/numpy, but the numba jit compilation is another 10x faster than my Cython code - is anyone familiar with C/Cython and able to spot the bottleneck in the following please? It's definitely something to do with my V[:,:] array but I don't know how to optimise this further. 

Can obviously just use the numba version for speed but feel like I should be able to at least get to the same with Cython... so wondering what I've missed.

Thanks!!

Numpy/Numba versions (~1.5ms and 5 microseconds, respectively):
import numpy as np
import numba as nb
def FDEur_py(option_type, vol, r, K, T, n_ds):
    ds = 2 * K / n_ds
    dt = 0.9 / vol ** 2 / n_ds ** 2
    s = np.arange(0,2*K+ds,ds)
    n_dt = round(T / dt)
    dt = T / n_dt
    V = np.empty((n_ds+1, n_dt+1))
    
    q = 1 if option_type == 'C' else -1
    
    V[:,0] = np.maximum(q * (s - K),0)
    
    for k in range(1,n_dt+1):
        for i in range(1,n_ds):
            delta = (V[i+1,k-1] - V[i-1,k-1]) / 2/ds
            gamma = (V[i+1,k-1] - 2*V[i,k-1] + V[i-1,k-1]) / ds/ds
            theta = -0.5 * vol ** 2 * s[i] ** 2 * gamma - r * s[i] * delta + r * V[i,k-1]
            V[i,k] = V[i,k-1] - dt * theta
        
        V[0,k] = V[0,k-1] * (1 - r * dt)
        V[n_ds,k] = 2 * V[n_ds-1,k] - V[n_ds-2,k]
    
    return V

FDEur_nb = nb.jit(FDEur_py)

Cython attempt (~50 microseconds):
%%cython
import numpy as np
cimport numpy as np

def FDEur(str option_type, float vol, float r, float K, float T, int n_ds):
    cdef double ds = 2 * K / n_ds
    cdef double dt = 0.9 / vol ** 2 / n_ds ** 2
    cdef int n_dt = round(T / dt)
    cdef double[:] s = np.zeros(n_ds+1)
    cdef double[:,:] V = np.zeros((n_ds+1,n_dt+1))
    cdef int q, k, i
    
    dt = T / n_dt
    q = 1 if option_type == 'C' else -1
    
    for i in range(0,n_ds+1):
        s[i] = i * ds
        V[i,0] = max(q * (s[i] - K),0)
    
    for k in range(1,n_dt+1):
        for i in range(1,n_ds):
            delta = (V[i+1,k-1] - V[i-1,k-1]) / 2/ds
            gamma = (V[i+1,k-1] - 2*V[i,k-1] + V[i-1,k-1]) / ds/ds
            theta = -0.5 * vol ** 2 * s[i] ** 2 * gamma - r * s[i] * delta + r * V[i,k-1]
            V[i,k] = V[i,k-1] - dt * theta
        
        V[0,k] = V[0,k-1] * (1 - r * dt)
        V[n_ds,k] = 2 * V[n_ds-1,k] - V[n_ds-2,k]
    
    return np.array(V)
 
User avatar
katastrofa
Posts: 9910
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Python tricks

September 6th, 2020, 11:32 pm

Python stores multidimensional arrays in row-major order, so your V indexing produces a lot of costly jumps, you can't fetch and keep the whole chunk in L1 cache, etc. - just trying to imagine what happens, I may be wrong.
Also: https://cython.readthedocs.io/en/latest ... t-indexing
 
User avatar
Cuchulainn
Posts: 64093
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Re: Python tricks

September 7th, 2020, 8:45 am

The default order in Python is row-order but you can switch to column-order (saves you having to modify your working code). The other option is to exchange the i and k loops and then it is standardised C-way.

It is certainly worth investigating. Stale caches lines an even false sharing are a possibility.

One more thing ... is bounds checking on/off?


 https://en.wikipedia.org/wiki/Row-_and_ ... ajor_order

I'm basically reiterating what katastrofa has ventilated.
 
User avatar
Cuchulainn
Posts: 64093
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Re: Python tricks

September 7th, 2020, 9:41 am

Regarding the numerics, explicit Euler is too slow, but that fact is probably irrelevant in this test case.
ADE (Barakat and Clark) is stable. explicit and 2nd order.
 
ZeroSumGame
Posts: 3
Joined: January 23rd, 2020, 11:26 am

Re: Python tricks

September 7th, 2020, 11:35 am

Thanks very much both - I'm aware the numerics are sub-optimal. Just stole some VB code from a text book. Starting with explicit Euler and then adapting to get familiar with the various FD methods (I'm still a CQF student). 

The row/column order issue might be causing slight issue, but I realised the real issue with fresh eyes this morning. I hadn't added C static type definitions for my delta, gamma and theta intermediate variables in the loop. Resulting code is ~ 5 microseconds, same as the numba version. 

Thanks again for the help! 
 
User avatar
katastrofa
Posts: 9910
Joined: August 16th, 2007, 5:36 am
Location: Alpha Centauri

Re: Python tricks

September 7th, 2020, 1:19 pm

The textbook solutions are often suboptimal - for the sake of clarity. Not always a bad thing.
Another optimisation: If the Cython function is called multiple times for the same V size (which is the case in this example, I suspect), I'd allocate V once, outside of the function.
 
User avatar
Cuchulainn
Posts: 64093
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Re: Python tricks

September 7th, 2020, 3:52 pm

Well done, you're welcome. "Get it working, then get it right, then and only then get itvoptimised" :-)
 
User avatar
Cuchulainn
Posts: 64093
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Re: Python tricks

September 7th, 2020, 3:55 pm

The textbook solutions are often suboptimal - for the sake of clarity. Not always a bad thing.
Another optimisation: If the Cython function is called multiple times for the same V size (which is the case in this example, I suspect), I'd allocate V once, outside of the function.
It might need to have optimising compiling features turned on (just a guess).
So, cdef is the magic wand? Sounds logical I suppose.

https://cython.readthedocs.io/en/latest ... onize.html
 
User avatar
Cuchulainn
Posts: 64093
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Re: Python tricks

September 7th, 2020, 4:19 pm

The textbook solutions are often suboptimal - for the sake of clarity. Not always a bad thing.
Depends how to  define 'optimal'. I suspect you mean run-time, as many technical folk think this way.  It's only one of many. 
In current case code in language A is being ported to code in language B. Stuff gets lost in translation.

The structural solutions is to know how compilers work.
 
User avatar
Cuchulainn
Posts: 64093
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Re: Python tricks

September 7th, 2020, 4:31 pm

Thanks very much both - I'm aware the numerics are sub-optimal. Just stole some VB code from a text book. Starting with explicit Euler and then adapting to get familiar with the various FD methods (I'm still a CQF student). 
VBA stores arrays in column-major order. This clarifies your nested array style. 

I saw a similar effort when someone ported Fortran code (column-major order) to C++. I wonder if the scientists from Imperial College are aware of this.

https://www.datasim.nl/blogs/27/analysi ... are-models

Ideally, you learn the most by writing the Cython code from scratch.
 
User avatar
Cuchulainn
Posts: 64093
Joined: July 16th, 2004, 7:38 am
Location: Drosophila melanogaster
Contact:

Re: Python tricks

September 9th, 2020, 9:32 am

Moving beyond the original question and beyond scope, using divided differences to compute greeks is OK for pedagogy but their use leads to several challenges.

This thesis discusses some alternatives

https://www.datasim.nl/application/file ... hesis_.pdf
 
User avatar
ISayMoo
Posts: 2368
Joined: September 30th, 2015, 8:30 pm

Re: Python tricks

September 11th, 2020, 12:05 am

I can do stuff like
> from cppyml import linear_regression
> help(linear_regression)
but I was hoping avoid running the intepreter... I want to do something what Sphinx does for Python source code, but with a .pyd file.
OK, done. Turns out Sphinx can handle binary modules: https://romanwerpachowski.github.io/ML/ ... index.html
 
User avatar
ISayMoo
Posts: 2368
Joined: September 30th, 2015, 8:30 pm

Re: Python tricks

September 11th, 2020, 12:16 pm

Moving beyond the original question and beyond scope, using divided differences to compute greeks is OK for pedagogy 
Also OK for trading, in practice. Traders don't care about derivatives, the market doesn't move in epsilon increments (see Taleb).
ABOUT WILMOTT

Wilmott

Wilmott.com has been "Serving the Quantitative Finance Community" since 2001. Continued...


Twitter LinkedIn Instagram

JOBS BOARD

JOBS BOARD

Looking for a quant job, risk, algo trading,...? Browse jobs here...


GZIP: On