Python tricks

Cuchulainn · September 16th, 2020, 7:36 pm

Hi,

Sorry if I'm on the wrong board here - new to the forum.

I'm trying to get to grips with Cython - just writing a basic explicit finite difference function and testing the performance gains of various implementations. I know my code works, and it's an order of magnitude quicker than pure Python/NumPy, but the Numba JIT-compiled version is another 10x faster than my Cython code. Is anyone familiar with C/Cython and able to spot the bottleneck in the following, please? It's definitely something to do with my V[:,:] array, but I don't know how to optimise it further.

Can obviously just use the numba version for speed but feel like I should be able to at least get to the same with Cython... so wondering what I've missed.

Thanks!!

Numpy/Numba versions (~1.5ms and 5 microseconds, respectively):
import numpy as np
import numba as nb
def FDEur_py(option_type, vol, r, K, T, n_ds):
    ds = 2 * K / n_ds
    dt = 0.9 / vol ** 2 / n_ds ** 2
    s = np.linspace(0, 2*K, n_ds+1)  # exactly n_ds+1 nodes; np.arange with a float step can return one extra point
    n_dt = round(T / dt)
    dt = T / n_dt
    V = np.empty((n_ds+1, n_dt+1))
    
    q = 1 if option_type == 'C' else -1
    
    V[:,0] = np.maximum(q * (s - K),0)
    
    for k in range(1,n_dt+1):
        for i in range(1,n_ds):
            delta = (V[i+1,k-1] - V[i-1,k-1]) / 2/ds
            gamma = (V[i+1,k-1] - 2*V[i,k-1] + V[i-1,k-1]) / ds/ds
            theta = -0.5 * vol ** 2 * s[i] ** 2 * gamma - r * s[i] * delta + r * V[i,k-1]
            V[i,k] = V[i,k-1] - dt * theta
        
        V[0,k] = V[0,k-1] * (1 - r * dt)
        V[n_ds,k] = 2 * V[n_ds-1,k] - V[n_ds-2,k]
    
    return V

FDEur_nb = nb.jit(FDEur_py)
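Timings like the ones quoted depend on how they are measured. Below is a minimal sketch of a timing helper using plain `timeit` (in Jupyter, `%timeit` does a similar job). It makes one warm-up call first because a Numba-jitted function compiles on its first call with a given set of argument types, and that compile time should not be counted in the per-call figure. `best_time` is a hypothetical helper name, not part of any library.

```python
import timeit


def best_time(fn, *args, repeat=5, number=100):
    """Best-of-`repeat` average time in seconds per call of fn(*args)."""
    fn(*args)  # warm-up: for a Numba-jitted function the first call includes compilation
    runs = timeit.repeat(lambda: fn(*args), repeat=repeat, number=number)
    return min(runs) / number  # min is the least noisy estimate of the per-call cost


print(f"{best_time(sorted, list(range(1000))) * 1e6:.1f} µs per call")
```

With the functions above it would be called as e.g. `best_time(FDEur_nb, 'C', 0.2, 0.05, 100.0, 1.0, 50)`, where the parameter values are purely illustrative.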
Cython attempt (~50 microseconds):
%%cython
import numpy as np
cimport numpy as np

def FDEur(str option_type, float vol, float r, float K, float T, int n_ds):
    cdef double ds = 2 * K / n_ds
    cdef double dt = 0.9 / vol ** 2 / n_ds ** 2
    cdef int n_dt = round(T / dt)
    cdef double[:] s = np.zeros(n_ds+1)
    cdef double[:,:] V = np.zeros((n_ds+1,n_dt+1))
    cdef int q, k, i
    
    dt = T / n_dt
    q = 1 if option_type == 'C' else -1
    
    for i in range(0,n_ds+1):
        s[i] = i * ds
        V[i,0] = max(q * (s[i] - K),0)
    
    for k in range(1,n_dt+1):
        for i in range(1,n_ds):
            delta = (V[i+1,k-1] - V[i-1,k-1]) / 2/ds
            gamma = (V[i+1,k-1] - 2*V[i,k-1] + V[i-1,k-1]) / ds/ds
            theta = -0.5 * vol ** 2 * s[i] ** 2 * gamma - r * s[i] * delta + r * V[i,k-1]
            V[i,k] = V[i,k-1] - dt * theta
        
        V[0,k] = V[0,k-1] * (1 - r * dt)
        V[n_ds,k] = 2 * V[n_ds-1,k] - V[n_ds-2,k]
    
    return np.array(V)
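For comparison, here is a minimal sketch (not from the original post) of the same explicit scheme as `FDEur_py` with the inner `i`-loop replaced by NumPy slices. This works because each `V[i, k]` depends only on column `k-1`, so all interior nodes of one time step can be updated at once. `fd_eur_vectorized` and `bs_price` are hypothetical names; the closed-form Black-Scholes price (European option, no dividends) is included only as an assumed sanity check, and the parameters in the usage lines are illustrative.

```python
import math

import numpy as np


def fd_eur_vectorized(option_type, vol, r, K, T, n_ds):
    """Same explicit scheme as FDEur_py, but each time step updates all
    interior nodes at once with array slices instead of an inner i-loop."""
    ds = 2 * K / n_ds
    dt = 0.9 / vol ** 2 / n_ds ** 2        # same time-step choice as the original
    n_dt = round(T / dt)
    dt = T / n_dt
    s = np.linspace(0.0, 2 * K, n_ds + 1)  # exactly n_ds + 1 asset-price nodes
    V = np.empty((n_ds + 1, n_dt + 1))

    q = 1 if option_type == 'C' else -1
    V[:, 0] = np.maximum(q * (s - K), 0.0)  # payoff at time-to-expiry 0

    si = s[1:-1]                            # interior nodes i = 1 .. n_ds-1
    for k in range(1, n_dt + 1):
        # neighbours of every interior node, all taken from the previous column
        up, mid, down = V[2:, k - 1], V[1:-1, k - 1], V[:-2, k - 1]
        delta = (up - down) / 2 / ds
        gamma = (up - 2 * mid + down) / ds / ds
        theta = -0.5 * vol ** 2 * si ** 2 * gamma - r * si * delta + r * mid
        V[1:-1, k] = mid - dt * theta
        V[0, k] = V[0, k - 1] * (1 - r * dt)              # S = 0 boundary
        V[n_ds, k] = 2 * V[n_ds - 1, k] - V[n_ds - 2, k]  # linear at S = 2K
    return V


def bs_price(option_type, S, vol, r, K, T):
    """Closed-form Black-Scholes price (no dividends), used only as a check."""
    def N(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(S / K) + (r + 0.5 * vol ** 2) * T) / (vol * math.sqrt(T))
    d2 = d1 - vol * math.sqrt(T)
    if option_type == 'C':
        return S * N(d1) - K * math.exp(-r * T) * N(d2)
    return K * math.exp(-r * T) * N(-d2) - S * N(-d1)


# Illustrative parameters: with n_ds = 100 the strike S = K sits on node 50.
V = fd_eur_vectorized('C', 0.2, 0.05, 100.0, 1.0, 100)
print(V[50, -1], bs_price('C', 100.0, 0.2, 0.05, 100.0, 1.0))
```

No claim is made here about how this compares with the Numba timing: it removes the Python-level inner loop but still loops over time steps in Python and allocates temporary arrays at every step.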

Hi ZSG,
I sent you a PM (Private Mail), top right corner of screen.

ZeroSumGame · September 16th, 2020, 8:05 pm

Hey - apparently I'm still too new to be able to send PMs! But unfortunately I don't know if I can help, sorry - I'm just working in a Jupyter notebook and learned what I know from chapter 10 of Yves Hilpisch's Python for Finance, then just started trying different problems. I haven't attempted a proper setup with .pyx files or anything yet.

Cuchulainn · October 1st, 2020, 1:46 pm

What do masked arrays offer when compared to normal arrays?

bearish · October 1st, 2020, 2:38 pm

Covid protection?

Cuchulainn · October 1st, 2020, 4:45 pm

Inside every masked array hides a normal array trying to get out. It's a cover.

katastrofa · October 1st, 2020, 5:29 pm

They are a convenient way of creating masks, e.g. for missing values, or for special values that stand in for missing values (in some surveys, negative numbers indicate that the answer wasn't obtained, for various reasons).

import numpy as np
import numpy.ma as ma
zorro = np.array([2, 0, 12, 12, 0])
masked_zorro = ma.masked_less_equal(zorro, 0)
print('Mean Zorro: {} vs. Mean masked Zorro: {}'.format(zorro.mean(), masked_zorro.mean()))

RTFM answer: https://numpy.org/doc/stable/reference/ ... #rationale
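To illustrate the survey case, here is a minimal sketch with made-up response data in which negative codes mark answers that were not obtained. `ma.masked_less` hides them from reductions such as `mean`, `count` reports how many valid entries remain, and `.data` still holds the original array underneath the mask.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical survey answers; negative codes mean "no answer obtained"
# (e.g. -1 = refused, -2 = don't know; the code meanings are made up here).
responses = np.array([34, -1, 52, 41, -2, 29])
answered = ma.masked_less(responses, 0)   # mask every negative code

print(responses.mean())    # 25.5: the codes drag the naive mean down
print(answered.mean())     # 39.0: mean over the 4 real answers only
print(answered.count())    # 4: number of unmasked entries
print(answered.filled(0))  # plain ndarray with the masked slots set to 0
print(answered.data)       # the original values, codes included
```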

Cuchulainn · October 1st, 2020, 5:40 pm

Thanks; I was looking for a second opinion, maybe an application that no one else thought of.

katastrofa · January 4th, 2021, 10:33 am

{True: 0, 1: False}

Cuchulainn · January 4th, 2021, 9:14 pm

Inconceivable.
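A short sketch of why that literal collapses, assuming CPython 3: `bool` is a subclass of `int`, so `True == 1` and `hash(True) == hash(1)`, and the dict treats them as one key. The second entry overwrites the value, but the key object that stays in the dict is the first one. `d` and `e` are illustrative names.

```python
# True == 1 and hash(True) == hash(1), so this literal defines the same key twice.
d = {True: 0, 1: False}
print(d)        # {True: False}: value from the last entry, key from the first
print(len(d))   # 1
print(d[1])     # False: looking up 1 finds the True entry

# The same thing happens with floats that compare equal to an int.
e = {1: 'int', 1.0: 'float', True: 'bool'}
print(e)        # {1: 'bool'}
```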

katastrofa · January 5th, 2021, 12:49 am

You keep using Python's bool... I don't think it means what you think it means.

l = [0, 1, 2]
l[True]   # -> 1
l[False]  # -> 0
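Because `bool` subclasses `int`, `True` and `False` act as 1 and 0 wherever an integer is expected, which is what makes `l[True]` work. A small sketch of the same idea in arithmetic and counting; `readings` is made-up data.

```python
l = [0, 1, 2]

print(isinstance(True, int))  # True: bool is a subclass of int
print(True + True)            # 2
print(l[-True])               # 2: -True == -1, so this is the last element

# A common use: summing booleans counts how many conditions hold.
readings = [3.2, -1.0, 0.5, -0.7, 2.0]
n_positive = sum(x > 0 for x in readings)
print(n_positive)             # 3

# They still print as bools, not as numbers.
print(str(True), str(int(True)))  # True 1
```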

Cuchulainn · January 5th, 2021, 1:00 pm

Doublethink requires using logic against logic or suspending disbelief in the contradiction.

// Python 4.9

tags · April 23rd, 2021, 12:44 pm

Is there a clean way to compare two separate dataframes in long format (dimension1, dimension2, ..., dimensionN, value)?
I have a pile of CSV files, a new one published each month, containing accumulated values, and I need to show what happened during each month. Any comments on this are much welcome!

EDIT: maybe I can pass all the dimensions as index levels (the MultiIndex feature of pd.DataFrame) while the values are left as pandas columns. It is then easy to take the row-by-row difference between the value columns.
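The MultiIndex idea can be sketched as follows, assuming pandas and two hypothetical monthly snapshots with made-up dimension columns `region` and `product`. Moving every dimension into the index makes the rows align on the dimensions, and `Series.sub` with `fill_value=0` treats a key that is missing from one month as zero, so newly appearing combinations still get an increment.

```python
import pandas as pd

dims = ["region", "product"]   # hypothetical dimension columns

# Two hypothetical monthly snapshots in long format, values accumulated.
jan = pd.DataFrame({"region": ["EU", "EU", "US"],
                    "product": ["A", "B", "A"],
                    "value": [10.0, 5.0, 7.0]})
feb = pd.DataFrame({"region": ["EU", "EU", "US", "US"],
                    "product": ["A", "B", "A", "B"],
                    "value": [12.0, 5.0, 10.0, 4.0]})

# Dimensions go into a MultiIndex, so subtraction aligns on them, not on row order.
monthly = feb.set_index(dims)["value"].sub(jan.set_index(dims)["value"], fill_value=0)
print(monthly)
```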

katastrofa · April 23rd, 2021, 7:42 pm

You didn't give too many details, but is the diff enough to compare such multidimensional datasets?
Can the components be correlated? Python has lots of nice clustering algorithms, multidimensional scaling, etc. (mostly sklearn) for that. Functional data analysis, baby!

You can simply do diff on a multidimensional df, I think.
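With more than two months, the "just diff it" route can be sketched by stacking all snapshots with a month column and reshaping so that each month is a row and each dimension combination is a column; `diff()` then gives month-on-month increments. This assumes pandas and uses made-up data with a single `region` dimension (pass a list of levels to `unstack` for more). The first month has no predecessor, so here its increment is taken to be the accumulated level itself.

```python
import pandas as pd

# Hypothetical stacked snapshots: one row per (month, region), accumulated values.
df = pd.DataFrame({
    "month":  ["2021-01", "2021-01", "2021-02", "2021-02", "2021-03", "2021-03"],
    "region": ["EU", "US", "EU", "US", "EU", "US"],
    "value":  [10.0, 7.0, 12.0, 10.0, 15.0, 10.0],
})

# One row per month, one column per dimension value.
wide = df.set_index(["month", "region"])["value"].unstack("region").sort_index()

# Month-on-month increments; NaNs (first month, or a key's first appearance)
# are filled with the accumulated level at that point.
increments = wide.diff().fillna(wide)
print(increments)
```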

tags · April 23rd, 2021, 8:03 pm

Thanks for the suggestion, kat. Apologies, I wasn't clear enough earlier. My dataset was simple fundamental market data. I went with the multi-indexing solution that popped into my mind (the one I quickly described in my previous post). I hadn't visited the sklearn website for ages, I have to say. The library seems to have grown substantially.

tags · May 1st, 2022, 8:00 pm

PyScript is a framework that allows users to create rich Python applications in the browser using a mix of Python and standard HTML.
