Hi,

Sorry if I'm on the wrong board here - new to the forum.

I'm trying to get to grips with Cython - just doing a basic explicit finite difference function and trying to test the performance gains of various implementations. I know my code is working, and it's an order of magnitude quicker than pure python/numpy, but the numba jit compilation is another 10x faster than my Cython code - is anyone familiar with C/Cython and able to spot the bottleneck in the following please? It's definitely something to do with my V[:,:] array but I don't know how to optimise this further.

Can obviously just use the numba version for speed but feel like I should be able to at least get to the same with Cython... so wondering what I've missed.

Thanks!!

Numpy/Numba versions (~1.5ms and 5 microseconds, respectively):

```
import numpy as np
import numba as nb
def FDEur_py(option_type, vol, r, K, T, n_ds):
ds = 2 * K / n_ds
dt = 0.9 / vol ** 2 / n_ds ** 2
s = np.arange(0,2*K+ds,ds)
n_dt = round(T / dt)
dt = T / n_dt
V = np.empty((n_ds+1, n_dt+1))
q = 1 if option_type == 'C' else -1
V[:,0] = np.maximum(q * (s - K),0)
for k in range(1,n_dt+1):
for i in range(1,n_ds):
delta = (V[i+1,k-1] - V[i-1,k-1]) / 2/ds
gamma = (V[i+1,k-1] - 2*V[i,k-1] + V[i-1,k-1]) / ds/ds
theta = -0.5 * vol ** 2 * s[i] ** 2 * gamma - r * s[i] * delta + r * V[i,k-1]
V[i,k] = V[i,k-1] - dt * theta
V[0,k] = V[0,k-1] * (1 - r * dt)
V[n_ds,k] = 2 * V[n_ds-1,k] - V[n_ds-2,k]
return V
FDEur_nb = nb.jit(FDEur_py)
```

Cython attempt (~50 microseconds):

```
%%cython
import numpy as np
cimport numpy as np
def FDEur(str option_type, float vol, float r, float K, float T, int n_ds):
cdef double ds = 2 * K / n_ds
cdef double dt = 0.9 / vol ** 2 / n_ds ** 2
cdef int n_dt = round(T / dt)
cdef double[:] s = np.zeros(n_ds+1)
cdef double[:,:] V = np.zeros((n_ds+1,n_dt+1))
cdef int q, k, i
dt = T / n_dt
q = 1 if option_type == 'C' else -1
for i in range(0,n_ds+1):
s[i] = i * ds
V[i,0] = max(q * (s[i] - K),0)
for k in range(1,n_dt+1):
for i in range(1,n_ds):
delta = (V[i+1,k-1] - V[i-1,k-1]) / 2/ds
gamma = (V[i+1,k-1] - 2*V[i,k-1] + V[i-1,k-1]) / ds/ds
theta = -0.5 * vol ** 2 * s[i] ** 2 * gamma - r * s[i] * delta + r * V[i,k-1]
V[i,k] = V[i,k-1] - dt * theta
V[0,k] = V[0,k-1] * (1 - r * dt)
V[n_ds,k] = 2 * V[n_ds-1,k] - V[n_ds-2,k]
return np.array(V)
```