Serving the Quantitative Finance Community

 
User avatar
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm

Optimized math functions

October 16th, 2010, 5:17 pm

C compiler treats 1/2 as zero. x + 1/2 would be equal to x, because division has higher precedence than addition.
Last edited by renorm on October 15th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
AVt
Posts: 90
Joined: December 29th, 2001, 8:23 pm

Optimized math functions

October 16th, 2010, 7:26 pm

Thx - I think it worked for me in VC2005 ... so better use 0.5fThe code below holds an explicit approximation for small inputsand I use some hacky floor function for speed (but did no do thatfor 'ldexp' or 'modulo').Compiling into a DLL and checking with Maple (where I convertedinputs and the approximated and the 'exact' output to IEEE single)this gives relative error below 2*FLT_EPS, see appended graphics(being aware of the warning, that internally it may compute beyondsingle precision, but leave that to better educated ones).Certainly an educated coder can do better - and I also was too lazyto check timing improvements.
Attachments
exp_single_relative_errors.gif.zip
(32.5 KiB) Downloaded 35 times
Last edited by AVt on October 15th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm

Optimized math functions

October 17th, 2010, 12:54 pm

We are not there yet. FLOOR can be replaced with SSE2 conversion intrinsics. ldexpf should be dropped, because it is too slow. Because its 2nd argument is always integer, we can replace it with mul and left shift (assuming k=>0):expt*(1<<(k>>1))Also, we need to get rid off branching. What is the allowed range of input variable x?
Last edited by renorm on October 16th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
AVt
Posts: 90
Joined: December 29th, 2001, 8:23 pm

Optimized math functions

October 17th, 2010, 1:39 pm

Good ideas. The range for x is like for the usual floats, roughly -37 ... 38.Note that k is an actual integer and will be negative for x < ln(2)/4.For floor I only have some ASM version (I don't use SSE).Branching is not needed I think: using higher order approximationshould do (though I do not like polynomials of higher order withouta must), it was more because of the cited table method.Have to think about it. Edited: 1+(1+(.5000000000+(.1666657627+(.4166637361e-1+(.8363175206e-2+.1394111314e-2*x)*x)*x)*x)*x)*x should dofor |x| <= ln(2)/2 without branching for the return (but modifyingthe code accordingly, i.e. the factors used).
Last edited by AVt on October 16th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm

Optimized math functions

October 17th, 2010, 3:09 pm

I can replace branching with bitwise operations. It will cost only few operations.
 
User avatar
AVt
Posts: 90
Joined: December 29th, 2001, 8:23 pm

Optimized math functions

October 17th, 2010, 3:48 pm

And the bitshift for a signed integer by switching to abs and branching as bitwise operation as well?
 
User avatar
Cuchulainn
Posts: 20256
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Optimized math functions

October 18th, 2010, 5:16 am

Just thinking out loud; if round off errors is one of the issues(?) what about interval arithmetic it supports the usual trigonometric and transendental functions. It is very 60's but is it useful now?
Last edited by Cuchulainn on October 17th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
AVt
Posts: 90
Joined: December 29th, 2001, 8:23 pm

Optimized math functions

October 18th, 2010, 7:10 pm

I do wait for a concrete suggestion by renorm to learn about coding ...Concerning the (relative) errors: hoping for err < 0.5 ULP and all tweaks is likehoping for the quality of already implemented library solution, which may berelated to hardware ... unlikely to get it faster.The error of 2 EPS mainly has two sources:- evaluating a polynomial (for the central region) is not exact using the givenHorner form, even if the given polynomial is a minimax solution with err belowESP/8 (or so, and as exp(0) = 1 relative and abs error are the 'same'). The reason is, that the error increases by degree*EPS as a rule of thumb.Therefore one wants low degrees (or smaller ranges ---> table method).- the additive range reduction x - integer*ln(2) [or in more educated words:the modulus of x in Reals / Integers*ln(2)] is not exact enough (BTW thatalso occurs for trigonometric function as remainders of Pi, where the wayis Cody-Waite as the old and classical approach).I think, that both can be healed (as the libraries show), but not at cost=0(essentially one artificially goes beyond working precision in a 'local manner',i.e. for the problem only but not passing to extending precision in general).But costs for interval methods are tremendous and usually give estimates,not correctly rounded values.
 
User avatar
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm

Optimized math functions

October 19th, 2010, 5:40 pm

Here is SSE2 implementation with a small test program.Resume:(*) it is slightly less accurate and noticeably slower than cephes in (0, 20).(*) relative error is 1e-3 for negative arguments.(*) sometime after 20 it breaks.Note: on GCC replace __declspec with __attribute__.
Last edited by renorm on October 18th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
quartz
Posts: 3
Joined: June 28th, 2005, 12:33 pm

Optimized math functions

October 19th, 2010, 6:13 pm

So you guys are having a hacking party without inviting me? Shame! Which branching are you referring to? the "?:"? Keeping the (rearranged) statement can do magic if you have the right compiler and processor: no jumps, no comparisons, just two stores and happy pipelines (nominal clock cycles are not everything in this cruel world): no need for much bit twiddling. (what about abs(k)%2 => k&1? the compiler *should be* your friend, but who knows...)Or were you talking of the inv. cdf branching?Ah, interval arithmetic is(/was) blazingly fast on Sun Studio; something like 2x time only afair. But donno right know if here it can be useful...PS: renorm, just noticed your code, could you please comment a bit more? looks interesting though!
Last edited by quartz on October 18th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm

Optimized math functions

October 19th, 2010, 6:18 pm

My code is SSE2 port of Avt's code. I might need some debugging.P.S. I was wrong about ldexpf. I didn't notice that the 2nd argument is required to be int. Stock implementation should be fairly optimal.
Last edited by renorm on October 18th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
AVt
Posts: 90
Joined: December 29th, 2001, 8:23 pm

Optimized math functions

October 19th, 2010, 7:19 pm

Thx! At least half is understandable for me :-)That is strange: both error and 'breaks' (and speed, I have to ignore speed being off boost).It is completely different from my tests - may I be be fooled that much by my compiling?It is not so much different from cephes.Would you mind to give specific values for prominent errors (and breaks, if you can locate it)?
Last edited by AVt on October 18th, 2010, 10:00 pm, edited 1 time in total.
 
User avatar
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm

Optimized math functions

October 19th, 2010, 8:43 pm

Non SSE implementation is fine. SSE implementation behaviors rather strangely. Attached is the file with errors 1.0 - _mm_myEXP_ps(x)/myEXP(x), where x is in (-21.5, 21.5).SSE implementation doesn't compute exp for negative arguments explicitly. It computes exp(abs(x)) and then returns reciprocal. For positive argument values the relative error is either exactly zero, -0.000000119 or 0.000000060 until it becomes exactly 2.0 around 21.3. Weird.P.S. I edited SSE code slightly, to match the code that was used to generate the attached file.
Last edited by renorm on October 18th, 2010, 10:00 pm, edited 1 time in total.