Serving the Quantitative Finance Community

renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm

Optimized math functions

October 5th, 2010, 8:59 am

Is anyone aware of optimized implementations of the exp, sqrt and log functions? The compiler-supplied implementations are designed for precision, not speed. If one doesn't need 15-digit precision, there should be a way to speed up those math functions. Any thoughts?
 
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm


October 5th, 2010, 10:48 am

That might be just what I need. Thanks. Btw, it is single precision, not double. The good news is that each function does 4 floats per call. MS compiler intrinsics are all double precision. (Yes/No?) I presume that all intrinsics benefit from SSE2. (Yes/No?)
 
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm


October 5th, 2010, 3:12 pm

List of suspects: Intel Compiler's Math Library (ICML), Julien Pommier's SSE math library (SSE_Math), MS intrinsic math functions (MS).
Platform: Visual Studio 2008, WinXP box running on a Core 2 Duo.
Math functions tested: exp, log, sin, cos.
Remarks: ICML is a part of Intel's compiler suite. ICML automatically replaces cmath functions with optimized intrinsics. MS intrinsics likewise automatically replace the stock cmath functions.

And the verdict is...
Normalized single precision performance: MS (1), SSE_Math (4), ICML (6).
In single precision, ICML is 6 times faster than MS and 1.5 times faster than SSE_Math.

But...
ICML must be used in a loop. The loop must be auto-vectorized by the compiler. Not all loops can be auto-vectorized. Using std::vector instead of a plain array will prevent vectorization. Aliased pointers will prevent vectorization. Many other things can prevent vectorization. Intel Compiler's manual provides some info on how to facilitate vectorization.

SSE_Math has full single precision accuracy. SSE_Math can compute sin and cos of the same argument in 1 pass with almost zero overhead. It can be 8 times faster than MS in some circumstances.
Last edited by renorm on October 4th, 2010, 10:00 pm, edited 1 time in total.
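
To illustrate the aliasing remark in the post above, here is a minimal sketch of a loop written so that an auto-vectorizer has a fair chance. The function name `exp_array` is made up for this example, and `__restrict` is a compiler extension (accepted by MSVC, GCC, Clang and the Intel compiler), not standard C++. Whether the loop is actually vectorized depends on the compiler and its flags, so treat this as an assumption to verify with the compiler's vectorization report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from the thread): applies exp to n floats.
// __restrict promises the compiler that `in` and `out` never overlap,
// which removes one of the obstacles to auto-vectorization named above.
void exp_array(const float* __restrict in, float* __restrict out, std::size_t n)
{
    assert(in != nullptr && out != nullptr);
    // Plain pointers, unit stride, no early exits: a simple loop shape.
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::exp(in[i]);  // float overload, stays in single precision
    }
}
```

If the data lives in a std::vector, one option is to pass raw pointers to its storage (for example `&v[0]`) so the loop body only ever sees plain arrays.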
 
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm


October 5th, 2010, 6:34 pm

A 4% error is too much. ICML and MKL are free to try for 1 month.
 
AVt
Posts: 90
Joined: December 29th, 2001, 8:23 pm


October 5th, 2010, 6:45 pm

What are your "specifications"? You kept it a bit vague (intentionally?).
For 'fast', one could always glance at what the gaming/visual guys suggest;
for 'compact', at what the engineers around embedded systems do;
for 'correct' (but poor), at the 'ancients' from before IEEE standardization;
for correct and not poor, at something like Moshier's Cephes or Sun's approach.
 
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm


October 5th, 2010, 7:14 pm

At least single precision accuracy.
 
quantmeh
Posts: 0
Joined: April 6th, 2007, 1:39 pm


October 6th, 2010, 1:53 am

I just coded a LUT for exp and it's ~10 times faster than the stock version. Precision can be tuned; in my case it's at least 5 digits.
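
The poster's code was not shown, so the following is only a guess at what a tunable lookup-table exp might look like. It assumes the identity exp(x) = 2^(x*log2(e)), splits the exponent into an integer part and a fraction, and linearly interpolates 2^fraction from a 256-entry table. The class name `ExpLut` and the table size are hypothetical. With 256 entries the interpolation error is bounded by roughly (ln 2 / 256)^2 / 8, about 1e-6 relative, before float rounding; a larger table trades memory for precision. No speed claim is made for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical table-driven exp (an assumption, not quantmeh's actual code).
class ExpLut {
public:
    ExpLut() {
        // table_[j] = 2^(j / kSize) for j = 0 .. kSize
        for (int j = 0; j <= kSize; ++j)
            table_[j] = static_cast<float>(std::pow(2.0, static_cast<double>(j) / kSize));
    }

    // Approximates exp(x) for moderate |x|; no overflow, underflow or NaN handling.
    float operator()(float x) const {
        assert(x > -80.0f && x < 80.0f);
        const float y = x * 1.44269504f;       // x * log2(e), so exp(x) = 2^y
        const float k = std::floor(y);         // integer part of the exponent
        const float t = (y - k) * kSize;       // table position, nominally in [0, kSize)
        int j = static_cast<int>(t);           // lower table index
        if (j >= kSize) j = kSize - 1;         // guard against rounding up to kSize
        const float w = t - static_cast<float>(j);
        // Linear interpolation of 2^fraction between neighbouring entries.
        const float m = table_[j] + w * (table_[j + 1] - table_[j]);
        return std::ldexp(m, static_cast<int>(k));  // m * 2^k
    }

private:
    static constexpr int kSize = 256;          // tuning knob: more entries, more digits
    float table_[kSize + 1];
};
```

Usage would be along the lines of `ExpLut fast_exp; float v = fast_exp(1.5f);`, with the table built once and reused for every call.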
 
renorm
Topic Author
Posts: 1
Joined: February 11th, 2010, 10:20 pm


October 6th, 2010, 5:59 am

Can you post the actual speed? Here are my measurements for single precision exp.
CPU: 2.33 GHz Core 2 Duo (single-threaded test).
Intel Math Library speed: 291M per sec
SSE_Math speed: 187M per sec
An array of 2^20 numbers, (1 : 2^20)*1e-5, was used as the input (range ~ (0, 10.5)). The output was stored in another array of length 2^20 and then compared with stock exp. The relative error didn't exceed 2^-22. Everything was cycled 100 times and the total time measured. Using a smaller/larger array with more/fewer cycles doesn't make any noticeable difference. Cache issues are less important in number crunching.
Last edited by renorm on October 5th, 2010, 10:00 pm, edited 1 time in total.
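
For anyone who wants to repeat a measurement like the one described above, here is a rough sketch of such a harness. It is not the poster's code: `bench_exp` and `BenchResult` are names invented for this example, the timer is `std::chrono::steady_clock`, and the reference is double-precision `std::exp`. It fills an array with (1 : n)*1e-5, runs the candidate function over it for a number of passes, and reports the worst relative error and the number of calls per second.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

struct BenchResult {
    double max_rel_error;  // worst relative error versus double-precision std::exp
    double calls_per_sec;  // function evaluations per second over all passes
};

// Hypothetical harness: f is any callable taking a float and returning a float.
template <typename F>
BenchResult bench_exp(F f, std::size_t n = std::size_t(1) << 20, int passes = 100)
{
    assert(n > 0 && passes > 0);
    std::vector<float> in(n), out(n);
    for (std::size_t i = 0; i < n; ++i)
        in[i] = static_cast<float>((i + 1) * 1e-5);  // (1 : n) * 1e-5

    const float* src = in.data();
    float* dst = out.data();

    // Timed section: the same array is processed `passes` times.
    const auto start = std::chrono::steady_clock::now();
    for (int p = 0; p < passes; ++p)
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = f(src[i]);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    // Accuracy check, done outside the timed section.
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double ref = std::exp(static_cast<double>(src[i]));
        const double err = std::fabs(static_cast<double>(dst[i]) - ref) / ref;
        if (err > worst) worst = err;
    }
    return {worst, static_cast<double>(n) * passes / elapsed.count()};
}
```

A call such as `bench_exp([](float x) { return std::exp(x); })` would give a baseline for the stock single-precision exp. An aggressive optimizer may skip work whose results are never read, so it is worth checking that the reported throughput is plausible before comparing libraries.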