I am using Aclam's formula. It works well with doubles, but not so good with floats. I am thinking to use Shaw-Brickman with floats.

I miss the performance test ...inverse cdfN using Shaw-Brickman: there are various versions and playing withthe 'original' one (using a rational approximation of degree = (6,6)) it seems tome, that the relative error (for that branch less way) is up to ~ 10 - 20 FTL_EPS. Edited 09 Oct 2010: a more careful test and using their version 4 with degree (8,8)from http://arxiv.org/abs/0901.0638 shows relative error are only up to ~ 1 FTL_EPS

Last edited by AVt on October 8th, 2010, 10:00 pm, edited 1 time in total.

Well, it turns out that Intel C++ compilers come with so called Short Vector Math Library (SVML). Every cmath and some C99 functions have both single precision and double precision versions implemented in SVML. Looks like what was Intel Approximate Math Library became SVML. Intel compiler automatically replaces cmath functions with SVML versions in vectorized loops. Unfortunately SVML is not available as standalone library.AVt,I saw that paper. In section 5.3 they claim that ICNDfloat2 has better single precision accuracy than Acklam's approximation. Looks like it is just what I need.Acklam's approximation has higher double precision accuracy, but for typical Monte-Carlo usage Shaw-Brickman is good enough. Another alternative is Ziggurat method.It is very fast and doesn't have approximation issues (just 1 floating point multiplication). However, Ziggurat doesn't work with low discrepancy sequences.

Last edited by renorm on October 8th, 2010, 10:00 pm, edited 1 time in total.

For ICNDfloat2 I see relative errors up to ~ 4*FLT_EPS: x = .230665847659111023gives -.736656129360198976, Maple based on 105 Digits says -.736655795076218376,where input is x = 15479723/16777216*pow(2,-2) and output is rounded to singlesin IEEE nearest, giving a relative error of -6/12359033 or ~ 4*FLT_EPS.Compiling to a single precision DLL was with VC2005. Since interfacing to Mapleis a bit of blackbox it would be worth to cross check the above indepentend(I do not want - but, of course, I may have made an error myself in the tests ...)

I get similar error. One issue is that, the compiler can change intermediate steps to use doubles, because double precision operations are no slower than single precision operation on Intel CPUs. What you see could be the formula error and not the error due to the limited hardware precision. My implementation uses single precision SIMD, which is not auto-promoted into double precision.

- Cuchulainn
**Posts:**62129**Joined:****Location:**Amsterdam-
**Contact:**

The (1,1) Pade approximation to exp(-z) is about 5 times faster. I tried it in a binomial tree pricer. Did not look at it since a while but continued fractions might be your friend.Have you looked at how the Fortran boys do it? For example, E02RAF

Last edited by Cuchulainn on October 9th, 2010, 10:00 pm, edited 1 time in total.

- Cuchulainn
**Posts:**62129**Joined:****Location:**Amsterdam-
**Contact:**

QuoteOriginally posted by: outrunQuoteOriginally posted by: CuchulainnThe (1,1) Pade approximation to exp(-z) is about 5 times faster. I tried it in a binomial tree pricer. Did not look at it since a while but continued fractions might be your friend.Nice. That would giveexp(z) = which is indeed very compact.. but what is the precision?This is O(z^2) by taking the series expansion of R(1,1) and comparing to exp(z). Higher order Pade R(m,n) give even better accuracy.BTW R(1,1) aka Cayley transform for Schroedinger's PDE.You would need to examine the radius of convergence to check for which values of z .. but it's all documented in the literature. Maybe used a swith depending on the value of z; if in a radius use Pade, otherwise exp(z)? In many apps z will be like exp(-r*deltaT)..Pade appoximants are very important in ODEU_t + AU = 0sol U(t) = exp(-At).Then R(1,1) == Crank NiclsonR(1,0) and R(0.1) == Euler twinsDo Microsoft /Intel use Pade, or is it something else? I think it is worth having a look at these approximations.

Last edited by Cuchulainn on October 10th, 2010, 10:00 pm, edited 1 time in total.

- Cuchulainn
**Posts:**62129**Joined:****Location:**Amsterdam-
**Contact:**

QuoteOriginally posted by: outrunQuoteOriginally posted by: CuchulainnDo Microsoft /Intel use Pade, or is it something else? This is how Intel does it: The Computation of Transcendental Functions on the IA-64 ArchitectureI had a look at this. Fair enough.Now play 'advocaat van de duivel'.They wrote x = Nlog(2)/2^K + rand then exp(x) = 2^(N/2)^K * exp(r)Now:1. They they use Remez to approxmate exp(r) by a polynomial of degree 5 (==> Solve a matrix system). 2. They get typically r < log(2)/64 = 0.01 which can be done with Pade, so why use Remez?You could try Intel solution versus the bespoke R(1,1) solution. I miss the rationale for Remez (btw Remez uses rational functions, they mention polynomials which can be notoriously bad). They build table lookups as well.

Last edited by Cuchulainn on October 10th, 2010, 10:00 pm, edited 1 time in total.

- Cuchulainn
**Posts:**62129**Joined:****Location:**Amsterdam-
**Contact:**

Here is a random search on computing log(1+x) with PadeThe rational approximation is particularly good for series with alternating terms and poor polynomial convergence like log(1+x). The polynomial series for exp(x) converges too well gain much benefit from Pade approximation using a first order rational polynomial. Longer versions of optimsed rational polynomials are frequently used for the computation of transcendental functions on larger machines. It is worth considering using them on microcontrollers too, because they provide a good compromise in accuracy, size and speed. Compare with bespoke paper page 6 to calculate Ln (polynomial of degree 6, which I shudder at..) On the other hand, maybe Remez works with a wider range of parameters. Maybe someone has looked into this issue. //BTW Remez is is Boost Maths.

Last edited by Cuchulainn on October 10th, 2010, 10:00 pm, edited 1 time in total.

This is more or less (besides the table method) a variant of what I said above aboutLuke, Moshier and the ancients.It 'completely' depends, where one wants to use exp(x), otherwise reduction is needed.If it for discounting factors only, then ~ 5% * 3 years or abs(x) <= 0.15 is enough andnot too hard to give.

Outrun,The full book for Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables Abramowitz and Stegun is here:http://www.math.sfu.ca/~cbm/aands/

GZIP: On