Here I try to explain the method in more detail. Suppose we want to calculate the density of an SDE given as

[$]dX(t)=\mu (t) X(t)^{\beta} dt + \sigma (t) X(t)^{\gamma} dz(t) [$] Eq(1)

The second order Ito-Taylor expansion of the above SDE is given as

[$]X(t)=X(t_0) + \mu X(t_0)^{\beta} \int_{t_0}^t ds + \sigma X(t_0)^{\gamma} \int_{t_0}^t dz(s)[$]

[$]+\mu \sigma \beta X(t_0)^{\beta + \gamma -1} \int_{t_0}^t \int_{t_0}^s dz(v) ds [$]

[$]+\mu \sigma \gamma X(t_0)^{\beta + \gamma -1} \int_{t_0}^t \int_{t_0}^s dv dz(s)[$]

[$]+{\sigma}^2 \gamma X(t_0)^{2 {\gamma} -1 } \int_{t_0}^t \int_{t_0}^s dz(v) dz(s)[$]

[$]+ .5 {\sigma}^3 \gamma (\gamma-1) X(t_0)^{3 \gamma -2} \int_0^t \int_0^s dv dz(s)[$]

[$]+ {\mu}^2 \beta X(t_0)^{2 \beta -1} \int_{t_0}^t \int_{t_0}^s dv ds + .5 \beta (\beta -1) \mu {\sigma}^2 X(t_0)^{\beta + 2 \gamma -2} \int_{t_0}^t \int_{t_0}^s dv ds[$] Eq(2)

if [$]\Delta t=t-t_0[$], we can write the above expressions in terms of time variances and standard normal as

[$]X(t)=X(t_0) + \mu X(t_0)^{\beta} \Delta t + \sigma X(t_0)^{\gamma} \sqrt{\Delta t} N[$]

[$] +\mu \sigma \beta X(t_0)^{\beta + \gamma -1} \frac{1}{\sqrt{3}} {\Delta t}^{1.5} N[$]

[$]+\mu \sigma \gamma X(t_0)^{\beta + \gamma -1} (1-\frac{1}{\sqrt{3}}) {\Delta t}^{1.5} N[$]

[$]+{\sigma}^2 \gamma X(t_0)^{2 {\gamma} -1 } \frac{\Delta t}{2} (N^2-1) [$]

[$]+.5 \gamma (\gamma - 1) {\sigma}^3 X(t_0)^{3 \gamma -2} \frac{1}{\sqrt{3}} {\Delta t}^{1.5} N [$]

[$] + {\mu}^2 \beta X(t_0)^{2 {\beta} -1} \frac{{\Delta t}^2}{2}[$]

[$]+ .5 \beta (\beta -1) \mu {\sigma}^2 X(t_0)^{\beta + 2 \gamma -2} \frac{{\Delta t}^2}{2}[$] Eq(3)

In my program, I simply wrote the expression so that powers of N are lumped together, so just re-arranging above equation, we get

[$]X(t)=X(t_0) + \mu X(t_0)^{\beta} {\Delta t} [$]

[$]+ {\mu}^2 \beta X(t_0)^{2 {\beta} -1} \frac{{\Delta t}^2}{2}[$]

[$]+ .5 \beta (\beta -1) \mu {\sigma}^2 X(t_0)^{\beta + 2 \gamma -2} \frac{{\Delta t}^2}{2} -{\sigma}^2 \gamma X(t_0)^{2 {\gamma} -1 } \frac{\Delta t}{2}[$]

[$] + \sigma X(t_0)^{\gamma} \sqrt{\Delta t} N + \mu \sigma \beta X(t_0)^{\beta + \gamma -1} \frac{1}{\sqrt{3}} {\Delta t}^{1.5} N[$]

[$] +.5 \gamma (\gamma -1) {\sigma}^3 X(t_0)^{3 \gamma - 2} \frac{1}{\sqrt{3}} {\Delta t}^{1.5} N[$]

[$] + \mu \sigma \gamma X(t_0)^{\beta + \gamma -1} (1- \frac{1}{\sqrt{3}}) {\Delta t}^{1.5} N [$]

[$]+{\sigma}^2 \gamma X(t_0)^{2 {\gamma} -1 } \frac{\Delta t}{2} N^2[$] Eq(4)

So basically, we can see that one step evolution computational effort is exactly comparable to one step effort in monte carlo when the monte carlo simulation is based on two hermite polynomials as above.

In my algorithm, I simply used 81 paths and assigned a normal value to each path. The simulation algorithm for nth path would be

[$]X_n(t)=X_n(t_0) + \mu X_n(t_0)^{\beta} {\Delta t} [$]

[$]+ {\mu}^2 \beta X_n(t_0)^{2 {\beta} -1 } \frac{{\Delta t}^2}{2}[$]

[$]+ .5 \beta (\beta -1) \mu {\sigma}^2 X_n(t_0)^{\beta + 2 \gamma -2} \frac{{\Delta t}^2}{2} -{\sigma}^2 \gamma X_n(t_0)^{2 {\gamma} -1 } \frac{\Delta t}{2}[$]

[$] + \sigma X_n(t_0)^{\gamma} \sqrt{\Delta t} N_n + \mu \sigma \beta X_n(t_0)^{\beta + \gamma -1} \frac{1}{\sqrt{3}} {\Delta t}^{1.5} N_n[$]

[$] +.5 \gamma (\gamma -1) {\sigma}^3 X_n(t_0)^{3 \gamma - 2} \frac{1}{\sqrt{3}} {\Delta t}^{1.5} N_n[$]

[$] + \mu \sigma \gamma X_n(t_0)^{\beta + \gamma -1} (1- \frac{1}{\sqrt{3}}) {\Delta t}^{1.5} N_n [$]

[$]+{\sigma}^2 \gamma X_n(t_0)^{2 {\gamma} -1 } \frac{\Delta t}{2} N_n^2[$] Eq(5)

In my implementation [$]N_n[$], the normal variable grid, ranges from -4 to +4 with an increment of .1, so there are only 81 paths/branches each for a specific value of associated normal variable value.

You could safely use the above algorithm but the variance added over normal increments would go to [$]T[$] if there are T time intervals of equal width. And you would have to map it using a normal density of variance [$]\frac{1}{T}[$]

In my algorithm, I simply used a slight modification and weighted every normal random number value with weight [$]\sqrt{\frac{1}{T}}[$] so that variance of weighted normal would add to one and we could simply map the evolution of SDE at terminal time (after T time intervals) to a standard normal. Please note that I have used [$]T[$] as number of time intervals and it is not the absolute value of time anywhere. Here is the algorithm I used.

[$]X_n(t)=X_n(t_0) + \mu X_n(t_0)^{\beta} {\Delta t} [$]

[$]+ {\mu}^2 \beta X_n(t_0)^{2 {\beta} -1 } \frac{{\Delta t}^2}{2}[$]

[$]+ .5 \beta (\beta -1) \mu {\sigma}^2 X_n(t_0)^{\beta + 2 \gamma -2} \frac{{\Delta t}^2}{2} -{\sigma}^2 \gamma X_n(t_0)^{2 {\gamma} -1 } \frac{\Delta t}{2}[$]

[$] + \sigma X_n(t_0)^{\gamma} \sqrt{\Delta t} (\sqrt{\frac{1}{T}} N_n) + \mu \sigma \beta X_n(t_0)^{\beta + \gamma -1} \frac{1}{\sqrt{3}} {\Delta t}^{1.5} (\sqrt{\frac{1}{T}} N_n)[$]

[$] +.5 \gamma (\gamma -1) {\sigma}^3 X_n(t_0)^{3 \gamma - 2} \frac{1}{\sqrt{3}} {\Delta t}^{1.5} (\sqrt{\frac{1}{T}} N_n)[$]

[$] + \mu \sigma \gamma X_n(t_0)^{\beta + \gamma -1} (1- \frac{1}{\sqrt{3}}) {\Delta t}^{1.5} (\sqrt{\frac{1}{T}} N_n) [$]

[$]+{\sigma}^2 \gamma X_n(t_0)^{2 {\gamma} -1 } \frac{\Delta t}{2} (\sqrt{\frac{1}{T}} N_n)^2[$] Eq(6)

In my implementation [$]N_n[$], the normal variable grid, ranges from -4 to +4 with an increment of .1, so there are only 81 paths/branches each for a specific value of associated normal variable value.

And finally, I calculated the density of SDE by doing a change of variable derivative with respect to standard normal variable as ( I have omitted time index)

[$]dX_n=\frac{X_{n+1}- X_{n-1}}{N_{n+1} - N_{n-1} }[$] Eq(7)

Here [$]X_n[$] is the value of X calculated along nth branch and [$]N_n[$] is the value of normal variable associated with the nth branch/ or equivalently at nth point on normal variable grid.

and then used the formual for density of SDE as

[$]f(X_n)=\frac{f(N_n)}{dX_N}[$] Eq(8)

In my implementation [$]f(N_n)[$] is the value of standard normal density at [$]N=N_n[$] which was the specific value of normal assigned to nth branch.

I used a standard normal density for calculation of SDE density since I used a weighting [$]\sqrt{\frac{1}{T}}[$] but for different possible weights, you would use a different variance of the normal density in formula of Eq(8). For example, if I used no weight (or unity weight) as in Eq(5), the squared weights summed over T intervals would give a weight of T and then I would have to use a variance of [$]\frac{1}{T}[$] for normal density in equation(8). Again, Please note that I have used [$]T[$] as number of time intervals and it is not the absolute value of time anywhere. Sorry for this slightly ambiguous notation.

And there could be a large number of ways of assigning weights to normal variable in the algorithm and mapping to normal density with different variance corresponding to weights used in the algorithm.