Serving the Quantitative Finance Community

 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 2nd, 2016, 6:20 am

QuoteOriginally posted by: EBalQuoteOriginally posted by: CuchulainnWhat is the best method for"Best" is in eye of beholder but the following are a good start:1. Efficiency (aka fast, the easy one)2. Adaptability to 1) other distributions 2) multi-variate versions3. Robustness for all x,y and rho4. Accuracy for all x,y and rho5. Effort needed to implement the numerical approximation (I'm assuming no one has a closed solution ;).. but you never can tell)6. Maintainability of the code7. Double and multiprecision versionsEast-west, what's best?Taking what list1 suggested one step further, you can transform this double integral to an ordinary integral, and use your favorite method to compute it[$]\frac{1}{\sqrt{2\pi}} \int_{-\infty}^b dy\, e^{-y^2/2} N\left(\frac{a-\rho y}{\sqrt{1-\rho^2}}\right)[$]where [$]N[$] is cumulative normal function. Most likely, there is no closed form solution to the ordinary integral, except for special values of [$]a[$], [$]\rho[$].This is interesting. It can be applied to write a trivariate normal integral in terms of 1d integral of exp(-x^2) X bivariate integral (Genz).
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 2nd, 2016, 7:08 am

QuoteOne can differentiate with respect to a and b above. Then we get a Goursat PDE which evolves into a characteristic boundary value problem by giving zero conditions at minus infinity in x and y.We then use a finite difference method to the Goursat PDE and the results are consistent with Genz/West. The algorithm produces a matrix of values.Goursat can be applied to trivariate and quadvariate numerically. I have not seen the mathematical foundations for e.g. trivariate PDEu_xyz = f(x,y,z)u_xyzp = f(x,y,z,p)// I have not yet been able to emulate [$]M(a,b;\rho) = M(b,a;\rho)[$] in the numeric scheme because doing so means you only have to compute half the matrix.
Last edited by Cuchulainn on January 1st, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

The "best" method to compute bivariate cumulative normal distribution

January 2nd, 2016, 4:43 pm

QuoteOriginally posted by: CuchulainnWhat is the best method for"Best" is in eye of beholder but the following are a good start:A few comments on the requirements.1. Efficiency (aka fast, the easy one)This is easy to specify but hard to do! Yet there is one underspecified aspect to this requirement which is the context of usage. One type of usage would measure latency in independent computations of N(a,b,rho). Another might measure throughput in which one is computing N(a,b,rho) for a pattern in {[a,b,rho]}. If one has already computed N(a,b,rho), might it be easy to compute N(a+∆,b,rho) incrementally from already computed values.2. Adaptability to 1) other distributions 2) multi-variate versionsThis would seem to favor a generic, brute-force multi-dimensional integration solution (perhaps with some adaptive magic for dx and dy). Are requirements #1 and #2 antithetical to each other?3. Robustness for all x,y and rhoThe only obvious robustness issue happens if rho -> 1.4. Accuracy for all x,y and rhoThis requirement is ill-defined. Is accuracy measured in relative or absolute terms? Is accuracy measured in worst-case, mean-square, or average-absolute error terms?5. Effort needed to implement the numerical approximation (I'm assuming no one has a closed solution ;).. but you never can tell)This seems to have similar issues as requirement #3. The more we use logic that is specific to this distribution, the we've added effort to the numerical approximation.6. Maintainability of the codeThis also favors the generic brute force method.7. Double and multi precision versionsThis seems linked to requirements #1, 3, 4, 5, and 6.8. ParallelisationDo you require SIMD or MIMD parallelism? Are you trying to minimize latency or maximize throughput?
 
User avatar
list1
Posts: 827
Joined: July 22nd, 2015, 2:12 pm

The "best" method to compute bivariate cumulative normal distribution

January 11th, 2016, 5:22 pm

QuoteOriginally posted by: CuchulainnQuoteOne can differentiate with respect to a and b above. Then we get a Goursat PDE which evolves into a characteristic boundary value problem by giving zero conditions at minus infinity in x and y.We then use a finite difference method to the Goursat PDE and the results are consistent with Genz/West. The algorithm produces a matrix of values.Goursat can be applied to trivariate and quadvariate numerically. I have not seen the mathematical foundations for e.g. trivariate PDEu_xyz = f(x,y,z)u_xyzp = f(x,y,z,p)// I have not yet been able to emulate [$]M(a,b;\rho) = M(b,a;\rho)[$] in the numeric scheme because doing so means you only have to compute half the matrix.As far as I remember there exists Goursat problem. It is boundary problem of the type [$] u\,_{x\,y} \,= \, f ( x , y ) \; ,\:( x , y ) \in ( 0 , X ] \times ( 0 , Y ] [$]with boundary conditions [$] u ( x , 0 ) \,= \, h ( x ) \;, \; u ( 0 , y ) \,= \, g ( y ) \;, \; h ( 0 ) \,= \,g ( 0 )[$]In your case the double integral can be double differentiated and you arrive at the equation. But boundary conditions will be on u ( x , - [$]\infty [$] ) = 0 , u ( - [$]\infty [$] , y) = 0 and you should integrate over infinite region. I afraid that by differentiated double integral we arrive at the problem that does not look easier than initial problem
Last edited by list1 on January 10th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 11th, 2016, 5:57 pm

QuoteOriginally posted by: list1QuoteOriginally posted by: CuchulainnQuoteOne can differentiate with respect to a and b above. Then we get a Goursat PDE which evolves into a characteristic boundary value problem by giving zero conditions at minus infinity in x and y.We then use a finite difference method to the Goursat PDE and the results are consistent with Genz/West. The algorithm produces a matrix of values.Goursat can be applied to trivariate and quadvariate numerically. I have not seen the mathematical foundations for e.g. trivariate PDEu_xyz = f(x,y,z)u_xyzp = f(x,y,z,p)// I have not yet been able to emulate [$]M(a,b;\rho) = M(b,a;\rho)[$] in the numeric scheme because doing so means you only have to compute half the matrix.As far as I remember there exists Goursat problem. It is boundary problem of the type [$] u\,_{x\,y} \,= \, f ( x , y ) \; ,\:( x , y ) \in ( 0 , X ] \times ( 0 , Y ] [$]with boundary conditions [$] u ( x , 0 ) \,= \, h ( x ) \;, \; u ( 0 , y ) \,= \, g ( y ) \;, \; h ( 0 ) \,= \,g ( 0 )[$]In your case the double integral can be double differentiated and you arrive at the equation. But boundary conditions will be on u ( x , - [$]\infty [$] ) = 0 , u ( - [$]\infty [$] , y) = 0 and you should integrate over infinite region. I afraid that by differentiated double integral we arrive at the problem that does not look easier than initial problemThis is a correct formulation of the characteristic boundary value problem for the Goursat PDE.The presence of infinities is not an issue because:1. Domain truncation to [A,x] X [B,y] (e.g. A = B = -6)2. Domain transformation [$](-\infty [$] , y) to (0,y) to get a new transformed PDE(of course, prove 1/2 are well-posed)In either case we can approximate the new PDE by FDM and my computed numerical results are very good/robust and fast.
Last edited by Cuchulainn on January 10th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 11th, 2016, 6:23 pm

QuoteOriginally posted by: outrunI once proposed to add this to boost based on the G. West paper. An obstacle was defining the interface, especially the 3d version would require something like a cor matrix (2d only a cor value). Another issue is that that don't easily accept methods that don't have full machine precision.But my first choice would be G. West.The correlation matrix the easy part. It just requires a single Cholesky?A 3d version could be done using Genz paper. But the maths is tricky and I don't think it is so easy to make it robust. But someone can do it C++, which has not been don AFAIK. I've done 3d and 4d using Goursat PDE. Unfortunately, Graeme is not with us..
Last edited by Cuchulainn on January 10th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 11th, 2016, 6:31 pm

Would Euler have solved this problem by asymptotics, pde or integration? Think about it.
Last edited by Cuchulainn on January 10th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
list1
Posts: 827
Joined: July 22nd, 2015, 2:12 pm

The "best" method to compute bivariate cumulative normal distribution

January 11th, 2016, 6:35 pm

This is a correct formulation of the characteristic boundary value problem for the Goursat PDE.The presence of infinities is not an issue because:1. Domain truncation to [A,x] X [B,y] (e.g. A = B = -6) ////////////////// It is clear2. Domain transformation (−∞ , y) to (0,y) to get a new transformed PDE \\\\\\\\\\\\\\\\\\\\\\\\ It makes sense.(of course, prove 1/2 are well-posed) ///////////// It is not clear what does it mean "prove 1/2 are well-posed"It also probably that constant [$]\rho[$] will be changed.It is also interesting to get first values u on characteristics after infinite gegion reduction.
Last edited by list1 on January 10th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 11th, 2016, 6:44 pm

QuoteOriginally posted by: list1This is a correct formulation of the characteristic boundary value problem for the Goursat PDE.The presence of infinities is not an issue because:1. Domain truncation to [A,x] X [B,y] (e.g. A = B = -6) ////////////////// It is clear2. Domain transformation (−∞ , y) to (0,y) to get a new transformed PDE \\\\\\\\\\\\\\\\\\\\\\\\ It makes sense.(of course, prove 1/2 are well-posed) ///////////// It is not clear what does it mean "prove 1/2 are well-posed"It also probably that constant [$]\rho[$] will be changed.It is also interesting to get first values u on characteristics after infinite gegion reduction.1/2 means PDEs 1 and 2 lead to well-posed problems (Hadamard)I have applied FDM to PDE 1. QuoteIt is also interesting to get first values u on characteristics after infinite region reduction.Values == 0.0
Last edited by Cuchulainn on January 10th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 11th, 2016, 6:45 pm

If you are up to the challenge, try Genz 3d modelGood luck. When finished, we can compare with Goursat.
Last edited by Cuchulainn on January 10th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 13th, 2016, 7:53 pm

I have built a small automagic framework (random generation of parameters in C++ <random>, store values in Boost uBLAS matrices) to test Genz algorithm versus Goursat PDE. Conclusion:1. The C code for Genz gives the same results as Goursat (da's good).2. In release mode Goursat is between 14 and 30 times faster than Genz. The code for Goursat is ~ 10 lines.3. We can speed up Genz by loop parallelization.The generated report is like this:** Trial #0 Rectangle: [-6,4.4499] X [-6,0.0920059]Correlation: -0.923946Subdivisions NX, NY: 5608,5661Elapsed time Goursat PDE: 1.00002sElapsed time Genz West: 30.7706sLinfinity and L2 norms: 4.82373e-05, 4.18413e-05Goursat : 0.5366490122Genz West: 0.5366490098Current max error: 4.823728345e-05** Trial #1Rectangle: [-6,0.689325182] X [-6,-3.559694106]Correlation: -0.626499745Subdivisions NX, NY: 5262,5117Elapsed time Goursat PDE: 0.8300166sElapsed time Genz West: 15.150303sLinfinity and L2 norms: 1.144792956e-09, 4.463956166e-10Goursat : 2.971815024e-06Genz West: 2.97181765e-06Current max error: 4.823728345e-05** Trial #2Rectangle: [-6,-5.589745361] X [-6,3.929944493]Correlation: -0.3239965339Subdivisions NX, NY: 5409,5795Elapsed time Goursat PDE: 0.9400188sElapsed time Genz West: 17.1903438sLinfinity and L2 norms: 1.118498265e-06, 2.085454818e-06Goursat : 1.023675424e-08Genz West: 1.12030373e-08Current max error: 4.823728345e-05 /// ...** Trial #101Rectangle: [-6,5.667568808] X [-6,5.982663765]Correlation: -0.4486942594Subdivisions NX, NY: 5831,5023Elapsed time Goursat PDE: 1.2300246sElapsed time Genz West: 17.8803576sLinfinity and L2 norms: 0.0001616173889, 0.0001230238909Goursat : 0.9999999908Genz West: 0.9999999917Current max error: 0.0001616173889Final max error: 0.0001616173889[0:4.823728345e-05] , [1:1.144792956e-09] , [2:1.118498265e-06] , [3:2.395796051e-05] , [4:5.866559865e-06] , [5:0.0001072721908] , [6:1.063495919e-06] , [7:1.698521372e-05] , [8:2.804686176e-07] , [9:9.137599408e-05] , [10:3.904865893e-07] , [11:5.409969692e-05] , [12:0.0001383533517] , [13:0.0001206261733] , [14:8.516059047e-06] , [15:8.474615985e-05] , [16:1.284307582e-05] , [17:6.824058984e-05] , [18:5.971917145e-06] , [19:0.000146584181] , [20:8.238167043e-13] , [21:4.211089936e-09] , [22:8.61330245e-06] , [23:0.00015917265] , [24:7.581637051e-10] , [25:2.426752722e-13] , [26:0.0001363983426] , [27:5.613169928e-06] , [28:5.581172838e-06] , [29:1.934609148e-10] , [30:1.133181074e-05] , [31:1.413290172e-05] , [32:2.005320913e-05] , [33:8.433679815e-05] , [34:8.284459694e-06] , [35:7.627997376e-05] , [36:8.259011483e-05] , [37:8.05159357e-06] , [38:3.194159839e-06] , [39:8.451641854e-06] , [40:7.159648002e-07] , [41:6.840784597e-05] , [42:1.000952339e-05] , [43:5.808672152e-06] , [44:8.240898005e-10] , [45:0.0001282703168] , [46:2.16618588e-06] , [47:1.500479385e-06] , [48:1.97626663e-05] , [49:1.547923114e-05] , [50:5.910753463e-08] , [51:3.43831961e-05] , [52:8.380697574e-06] , [53:5.638146842e-06] , [54:0.0001348624906] , [55:3.641252259e-06] , [56:1.710369016e-05] , [57:5.637045892e-07] , [58:2.262943273e-06] , [59:4.049110916e-06] , [60:5.865316449e-06] , [61:1.560741375e-06] , [62:0.0001148683087] , [63:1.927912459e-06] , [64:4.476014156e-09] , [65:7.417168372e-06] , [66:1.456790645e-07] , [67:5.024356035e-06] , [68:9.343238457e-06] , [69:0.0001289501649] , [70:3.368413986e-06] , [71:4.981258397e-06] , [72:6.041018273e-06] , [73:8.318213256e-05] , [74:6.761006932e-05] , [75:8.515063427e-06] , [76:3.823544647e-06] , [77:1.335054308e-06] , [78:7.757811545e-21] , [79:4.028731501e-14] , [80:5.928517055e-06] , [81:0.0001305634464] , [82:5.298276209e-06] , [83:1.497368312e-05] , [84:0.0001054954045] , [85:2.582948462e-09] , [86:1.598449682e-255] , [87:1.425704641e-07] , [88:2.685411137e-06] , [89:4.609355473e-22] , [90:3.382135114e-06] , [91:1.572435288e-08] , [92:4.410223196e-14] , [93:5.311261604e-06] , [94:3.667685203e-06] , [95:8.620246384e-06] , [96:1.592048428e-05] , [97:7.201309276e-87] , [98:9.904806466e-05] , [99:8.031648708e-16] , [100:5.203155049e-06] , [101:0.0001616173889] ,
Last edited by Cuchulainn on January 12th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 13th, 2016, 8:43 pm

Values is the matrix of cdf in x and y for a fixed rho.
Last edited by Cuchulainn on January 12th, 2016, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Topic Author
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

The "best" method to compute bivariate cumulative normal distribution

January 14th, 2016, 9:28 am

QuoteOriginally posted by: outrunThe West code is evaluating a (max 10th order) polynomial, no? How can a 5000x5000 PDE be 15x faster than that?10th degree polynomial?? I did not see that. Don't know what you are trying to say. Have you run/examined the Genz C code? There's a lot of if-else logic in those loops.I create a _matrix_ of values in both cases. Look at the output. The point about PDE is that I calculate all values up to the values a and b and use simple arithmetic like + and *.// The GW code can be OpenMP and I get > 2 speedup.
Last edited by Cuchulainn on January 13th, 2016, 11:00 pm, edited 1 time in total.