
### Optimized math functions

Posted: October 12th, 2010, 10:07 pm
Outrun, I agree - I was just trying to understand why you wanted to do Remez on [1/sqrt(pi) exp(-.5 x^2)] (to me that was only useful if it was a scalar Gaussian, and you then took the product). Otherwise it would seem better to calculate the Euclidean distance and then do Remez etc. on exp - in which case AVT's suggestion was useful... but AVT is following this up.

> Originally posted by: spv205
> exp(-1/2 sum x^2) is slower than product exp(-1/2 x^2) [if you can be bothered!]

Good idea (to check order)! ... I think it's the other way around though. The left one has only 1x exp(), the right one N x exp().

### Optimized math functions

Posted: October 13th, 2010, 6:14 am
> Originally posted by: outrun
> just found that boost has the Remez algorithm (boost/math/tools/remez.hpp), so that should make finding min-max approximation coefficients very easy! boost doc

It would be nice if you would comment on this algorithm in Boost if you get some time. I did a few experiments with Brent's method in Maths and it worked well for the test cases.

### Optimized math functions

Posted: October 13th, 2010, 7:40 am
Maybe I can have a look at Remez then, if you like, if someone can specify the problem (e.g. exp(z) in range [a,b], desired accuracy).

### Optimized math functions

Posted: October 14th, 2010, 5:43 am
> Originally posted by: outrun
>> Originally posted by: Cuchulainn
>> They wrote x = N log(2)/2^K + r and then exp(x) = 2^(N/2^K) * exp(r). Now:
>> 1. They use Remez to approximate exp(r) by a polynomial of degree 5 (==> solve a matrix system).
>> 2. They get typically r < log(2)/64 = 0.01, which can be done with Pade, so why use Remez?
>
> ..we can start with this.. can we get K directly from the last bits (the exponent) of the float? We can then do various order approximations of exp(r) on r in [0 .. 1/2].

Here are some tests for Pade R(m,n) == Rmn at z = 1/2 (since exp(z) is a monotonically increasing function of z), then outside the radius of convergence at z = 1 to see what happens, and at r = log(2)/64:

| z | R01 | R11 | R22 | R33 | exp(z) |
|---|-----|-----|-----|-----|--------|
| 1/2 | 2 | 1.66667 | 1.64865 | 1.64872 | 1.64872 |
| 1 | blowup (1/(1-z)) | 3 | 2.71429 | 2.71831 | 2.71828 |
| log(2)/64 | 1.01095 | 1.01089 | 1.01089 | 1.01089 | 1.01089 |

BTW this polynomial function

> exp(z) = (32793 + 31836 z + 21146 z^2)/32768

gives 1.64787 at z = 1/2 and 2.61765 at z = 1. Not surprising, because polynomials are in general not very good at approximation when they go outside their domain. For r = log(2)/64 it gives 1.01136, clearly worse than R(m,n). I have not done boost Remez yet.

### Optimized math functions

Posted: October 14th, 2010, 5:58 am
> Originally posted by: outrun
>> Originally posted by: Cuchulainn
>> Do Microsoft/Intel use Pade, or is it something else?
>
> This is how Intel does it: The Computation of Transcendental Functions on the IA-64 Architecture

I had another look at this paper. Apart from h/w and machine dependencies, I really have not been able to discover the rationale (no pun intended) for why the authors use Remez. A glimpse could be remark 4 on page 3 ("do not fear polynomials"); polynomials are not very good, they are not even wrong. Some of the features of Remez that I suspect will affect performance are:

1. It is an iterative scheme.
2. It calculates Chebyshev points cos(x[k]) ==> yet another trig function.
3. The derivatives of the polynomial up to order 2 are needed.
4. LU decomposition.
5. It needs high-precision arithmetic.

First impression is that it is top-heavy and not optimal in this case. The reasons for using it are not compelling IMO.

### Optimized math functions

Posted: October 14th, 2010, 7:35 am
> Originally posted by: outrun
>> Originally posted by: Cuchulainn
>> z = 1/2
>> R01: 2
>> R11: 1.66667
>> R22: 1.64865
>> R33: 1.64872
>> exp(z): 1.64872
>
> ..so is it correct that the final approximation equation is something like?
> exp(z) ~ (2 + 1.66667*z) / (1 + 1.64864*z + 1.64872*z^2)

### Optimized math functions

Posted: October 14th, 2010, 7:41 am
> Originally posted by: outrun
> The steps you describe are pre-processing steps, with the goal to estimate the coefficients of a Chebyshev approximation. The end result is something like this:
> exp(z) = (32793 + 31836 z + 21146 z^2)/32768
> ..nothing but a couple of simple +, *, and / float operators [...]
> edit: That's why I said earlier that you can pick any way to estimate these coefficients: C++, Matlab, Mathematica, a simple google search..

I am not exactly sure what you are saying here. Preprocessing is also time-consuming. The polynomial solution is not very robust.

### Optimized math functions

Posted: October 14th, 2010, 7:48 am
> Originally posted by: outrun
> Great! This is very simple and to the point! and allows for error analysis. And R33 is optimized for z in [0,1]? I have a deadline tonight, working my *** off, but will contribute soon.. I'll do the Chebyshev polynomial approx. for various numbers of terms. One quick thing... Why does Intel choose to optimize the (rescalable) region [0, 1/2] instead of the more common [0,1] or [-1,1]? Has that to do with the storage of the exponent in floating-point notation? If so, then we should do that too, otherwise we need to waste cycles on processing that part too (the exponent, K).

R33 at z = 1 is accurate up to 4/5 decimal places; even for z = 2, R33 = 7.4 while exp(z) = 7.38. I am not sure why [0, 1/2]. The rationale behind the numerical analysis is not clear, as I mentioned. BTW you need the NTL library for boost Remez. Maybe R22 is OK as well.