- Cuchulainn
**Posts:**62395**Joined:****Location:**Amsterdam-
**Contact:**

Quote (originally posted by AVt):

> What are your "specifications"? You kept it a bit vague (intentionally?).
> - For 'fast', one could always glance at what the gaming/visual guys suggest.
> - For 'compact', ask the engineers around embedded systems.
> - For 'correct' (but poor), look at the 'ancients' before IEEE norming.
> - For correct and not poor, something like Moshier's cephes or SUN as an approach.

Good list. For some values of z, you can get good approximations to exp(-z) by using rational approximations that are 13 times faster than normal. The type of application determines how much speedup is necessary. At some stage you hit some equivalent of Amdahl's law.

Last edited by Cuchulainn on October 5th, 2010, 10:00 pm, edited 1 time in total.

- Cuchulainn

Hey, where has exp_ps disappeared to? I was about to answer. Anyway: so the question is to compute exp(z) 100*N times and test the speed? What about accuracy in this test?

Last edited by Cuchulainn on October 5th, 2010, 10:00 pm, edited 1 time in total.

- Cuchulainn

Quote (originally posted by outrun):

> Quote (originally posted by Cuchulainn):
>
> > Hey, where has exp_ps disappeared to? I was about to answer.
>
> Ha! I was testing if you were awake enough. Can do both cases: x = (0.00001*i); and x = exp(0.00001*i);

Eternal vigilance! Not sure what the first test does.

- Cuchulainn

Quote (originally posted by outrun):

> Quote (originally posted by Cuchulainn):
>
> > Hey, where has exp_ps disappeared to? I was about to answer. Anyway: so the question is to compute exp(z) 100*N times and test the speed? What about accuracy in this test?
>
> Yes, exp is one case, accuracy too, but to have a baseline, a simple floating-point assignment "= 0.0001*i". The reason is that I can't even do an assignment as fast as renorm can do exp (and my exp is 10x as slow as the assignment). Is my compiler wrong?

Clear. I will have a look at this in the coming number of time units.

Here is my test code. Compile it with the USE_SSE2 macro defined.

Last edited by renorm on October 5th, 2010, 10:00 pm, edited 1 time in total.

That works on the MS compiler only. To build it with GCC, replace __declspec with __attribute__.
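The thread does not show which `__declspec` is involved. Assuming it is the 16-byte alignment specifier that SSE constants usually need (an assumption, not something stated in the posts), a portable wrapper might look like the sketch below. `ALIGN16`, `four_ones` and `is_aligned16` are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* Assumption: the __declspec in question is an alignment specifier. */
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))   /* GCC and Clang spelling */
#endif

/* Example constant laid out so that an aligned SSE load would be legal. */
ALIGN16 static const float four_ones[4] = {1.0f, 1.0f, 1.0f, 1.0f};

/* Returns 1 if the pointer sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}
```

With a macro like this the same source builds on both compilers, so the replacement does not have to be made by hand at every declaration.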

My coding knowledge is much more limited ... anyway (without even trying to improve the code or hoping to beat Intel's lib), and only for the 'exp' function:

As that SSE stuff refers to Moshier's cephes, it is 'understandable', and it reduces to approximating for abs(x) <= log(2)/2, where a polynomial is used (coefficients = cephes_exp_p0 etc.). That polynomial (in Horner form) needs 7 additions + 6 multiplications (plus z = x*x).

The expression

(x+x) / ( (.123858701145554193e-6+.165854291283872459*z) / (.995148572817264210+.164816914752296508e-1*z) + 2.0-x ) + 1

needs 6 additions + 2 multiplications + 2 divisions. It is rational and seems to be almost as accurate at somewhat lower cost (the coefficients, taken as singles, should become correct).

(x+x) / (2.00000047683715820+(.166654065251350403-.272036157548427582e-(z+z))*z-x)+1

with 6 additions + 1 multiplication + 1 division is also something to dare; it still seems to have relative errors of ~ 1 FLT_EPSILON. It would be fine if somebody were willing to check this by tests.

If one is willing to accept larger errors (1%?), then the Cody-Waite reduction for 1/ln(2) can be left off and cheaper approximations are enough; one can even think about exp(x) = exp(x/2)^2 (or something more brute-force towards x ~~ 0, or variants).

AVt, if you give me plain working code (even Matlab would do), I can port it to SSE.
