Hi everyone, first of all, I am new in this forum and to this kind of topic, but I have been talking to tagoma by mail, and he told me to try my chances here too. So I hope I am not going to say something (too) irrelevant for you, however I think that what I am about to ask you is fairly uncommon.So I have done my PhD in computer sciences by studying how we could automatically improve numerical accuracy and therefore, improve numerical stability of critical embedded software relying on floating-point arithmetic. My focus was on static analysis combine with abstract interpretation (for those who ever heard of it) of avionic codes. Another nice side effect was that by knowing how much drift a program introduces, we could also tune the precision of the calculations in order to boost performance when it was safe to do so. Results were pretty good, so we decide to build a startup around this idea to put this kind of technology out of the labToday I am interested to understand if our approach makes sense to people dealing with heavy financial calculations. I understand that you use a lot of Monte Carlo methodology, which always implies some confidence interval. So my guess is that numerical accuracy should not be a risky issue there. But, I think there may be other fields of interest that some of you have. Where calculations have to be absolutely correct and where there is no approximation induced by the method itself. Maybe in the risk management sect? Or in the HFT? I am not a quant or an expert on that kind of topic, so I could really use your opinion there.So this is my question(s): do you have some of these interesting problems that have some accuracy issues? And, if so, how do you currently solve them?I am interested in real case examples so that I could try my set of tools on them and see how much I can improve accuracy and/or performance on them.Thanks a lot for you future (many) answers, and I hope this topic will find some echoes in your community! Please feel free to engage me in private if you want toArnault

Last edited by ioualalen on December 2nd, 2015, 11:00 pm, edited 1 time in total.

- Cuchulainn
**Posts:**60518**Joined:****Location:**Amsterdam-
**Contact:**

Welcome to Wilmott.Sounds like it would be a good idea to give a compelling example and the rationale. Are current methods not good enough? ===I have a problem I encountered; a C++ application that creates a DLL that I call from Excel. Sometimes the app crashed in Excel but why? is it VBA, C++, Excel or is the maths wrong? For example, a Newton Raphson that is not converging, how can I detect it in IEEE 754, for example? I want to trace the variable across process and language boundaries.or calling log(1.0/0.0) by accident leads to sNaN.

Last edited by Cuchulainn on December 2nd, 2015, 11:00 pm, edited 1 time in total.

http://www.datasimfinancial.com

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

- Cuchulainn
**Posts:**60518**Joined:****Location:**Amsterdam-
**Contact:**

QuoteIt sounds technically very interesting, have you written papers? I had a look at the (very flashy:)) web site to look for some papers. This is a short discussion of Monte Carlo but I can't seem to display the figures. Couple broke links (e.g. documentation and pdf links).Numerical accuracy and performance IMO is determined by the quality of the numerical method being used. Does your approach address this or is it more like machine accuracy and reliability?It's not clear to me what you want to do.QuoteI am interested in real case examples so that I could try my set of tools on them and see how much I can improve accuracy and/or performance on them.Here is an example Crank Nicolson for BSThe accuracy is O(ds^2 + dt^2) and that's as good as it gets IMO.Speaking as a numerical analyst, what is interesting is to analyse machine accuracy and robustness of the answer and avoiding NaNs, under/overflow nasties.

Last edited by Cuchulainn on December 3rd, 2015, 11:00 pm, edited 1 time in total.

http://www.datasimfinancial.com

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

Hi everyone, and sorry for the long delay of my answer! First let me thank you for all you responses.OK so let me try to answer you all.QuoteIt sounds technically very interesting, have you written papers? Yes during my PhD I have wrote several papers, you can find them on my webpage on the university here: http://perso.univ-perp.fr/arnault.ioual ... .phpMainly in my thesis I was focusing on expressions of synchronous programs but now we have extended it to other types of language.QuoteAre current methods not good enough? Well that is a wide question. Current methods could, in some industries, be just switch to double or wider precision (but then performance could be an issue). Others will change their algorithms to try new one with better convergence properties (whenever these actually exist). Others will just buy some well-written libraries (hoping that accuracy issues are, indeed, resolved). Finally other will just switch to fully equipped environment, like Matlab or Simulink, but then they cannot really know what is going to happen when it will be implemented in C++ for example.From my perspective very few people give the answer we propose, which is : write the code differently so that mathematically it's the same, but for the computer it's more accurate. Also what we have is a methodology to find the good tradeoff between accuracy and performance for the need of the programmer.QuoteI do think that from a startup perspective you're looking at a very narrow business opportunity:That's a way to put it, on the other hand there are more and more calculations and simulations everywhere. So at some point accuracy and performance could be a hot topic. But, yes, currently it is an ?initiate? topic, not many people know about it.QuoteIf I had a financial company that has numerous stability issues due to low level floating point truncation then my first reaction would be to use check if we can increase the number of bits inside the compiler framework, or else start using an arbitrary precision library.Indeed you could do such a thing, but if you calculations were already slow and intensive this will make it even slower. I have seen some place were for performance gain, programmer were okay to use only single precision number for example. So I am not sure this approach could scale very much.QuoteMaybe look at large scale supercomputer simulations?That is an interesting take indeed! I will look closer to that, Thanks!In a more business perspective we though about quants in investment banking at first, then we were approach by some military contractors about the safety of critical piece of software. But I do think that simulations are also a possibility for us, I was not planning at first on supercomputer-like simulations, but more on geological simulations or such (for oil drilling for example). Do you know some people working in this kind of very large supercomputer simulations?QuoteI have a problem I encountered; a C++ application that creates a DLL that I call from Excel. Sometimes the app crashed in Excel but why? is it VBA, C++, Excel or is the maths wrong? For example, a Newton Raphson that is not converging, how can I detect it in IEEE 754, for example? I want to trace the variable across process and language boundaries. or calling log(1.0/0.0) by accident leads to sNaN. That's interesting; do you still have the way to reproduce such bugs?Concerning Newton Raphson one of my associate publish an interesting paper where even when your algorithm converge, it still loose iteration due to IEEE754 rounding errors. So, improving accuracy actually can make more convergent algorithms. You can look it up here if your interested: http://perso.univ-perp.fr/mmartel/lopstr15.pdfFor us any algorithm can be qualified to know if it is unstable or not. If you give me some code, and some data to run it, then I can tell you such a thing.Quote Couple broke links (e.g. documentation and pdf links).Thanks for the head?s up, we launched this version not very long ago, we have not yet review each pages. I will correct that.QuoteNumerical accuracy and performance IMO is determined by the quality of the numerical method being used. Does your approach address this or is it more like machine accuracy and reliability?This is a key question indeed. We focus on the accuracy of the machine, not the accuracy of the method. What we want to do it to make the software performed the same as the mathematical function. If the math is broken I cannot help it, but if the math is correct it does not say that the software is correct too. That is where we come in.Quote Or maybe it has potential relations to communication over noisy channels, error correcting codes and cryptography?On this one I am not sure, maybe we can improve DSP performing calculations in fixed-point arithmetic, but cryptography seems to be more of an ?integer world? than floating-point.Quote Here is an example Crank Nicolson for BS. The accuracy is O(ds^2 + dt^2) and that's as good as it gets IMO.Okay, thank you very much for that material. I let you know how it turns out using our tools. It will probably take me some time as I do not know much about this algorithm and I will have to make it work first.Quote Speaking as a numerical analyst, what is interesting is to analyse machine accuracy and robustness of the answer and avoiding NaNs, under/overflow nasties.Sure, these are the reason that makes the calculation actually crash. However does the numerical drift not important? For example you can have some result with no correct digit at all, without any overflow or underflow for example. I am not sure that over/under-flow could be the only operational risk in the business.I hope I have not forgot any answer for you, but there is still much to talk I guess.

- Cuchulainn
**Posts:**60518**Joined:****Location:**Amsterdam-
**Contact:**

QuoteThat's interesting; do you still have the way to reproduce such bugs?Sure. Take Newton Raphson with a bad seed. AFAIR you can get underflow or overflow super quick.Here is another example. It is a very simple approximation to the normal cdf (accuracy is not the point here). The coder (me) was trying to be clever by writing a recursive function but once you provide a NaN as input you get zero. NaNs don't obey total ordering of course.It's a quiet NaN and it propagates down the function chain. Where do I check for NaN without affecting performance?

Last edited by Cuchulainn on December 3rd, 2015, 11:00 pm, edited 1 time in total.

http://www.datasimfinancial.com

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

- Traden4Alpha
**Posts:**23951**Joined:**

Cool concept, ioualalen! A few thoughts for applications of your automagical algorithmical accurizer to QF:1. Monte Carlo with extremely high N (e.g., O(2^32) samples using 32-bit floats). Basic statistical running sums for the moments (e.g., sums for the mean, variance, skew, kurtosis) grow very large to the point where new data is less than the round-off error of the sum. There are known solutions to this, but it would be interesting to see how your methods handle this known case.2. Generation of random numbers with a nonuniform distribution (e.g., Normally-distributed random numbers): The classic algorithms start with a uniform random number (a random integer normalized to a 0-1 floating point interval) and then map that through a transform that models the inverse of the CDF (cumulative distribution function). The result creates discontinuities in the samples (e.g., values that fall between the mapping of consecutive integers from the original random numbers) as well as no samples in the tails of the distribution beyond some value.3. Matrix decomposition with near-singular matrices. Round-off errors in the matrix elements can make a matrix that should be PSD (positive semi-definite) have negative eigenvalues (failing the PSD criterion). There are ad hoc methods for tweaking the matrix to make it PSD, but they are not pleasant.

- Cuchulainn
**Posts:**60518**Joined:****Location:**Amsterdam-
**Contact:**

QuoteCool concept, ioualalen! A few thoughts for applications of your automagical algorithmical accurizer to QF:It would be nice to have a feature/wish list. A random sample hopefully shows what I am trying to get at here Like buying a car, sales has a list of enticing features. I assume that this will be a market-driven software product rather than an in-house piece of software for a single client?

Last edited by Cuchulainn on December 4th, 2015, 11:00 pm, edited 1 time in total.

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

- Traden4Alpha
**Posts:**23951**Joined:**

QuoteOriginally posted by: CuchulainnQuoteCool concept, ioualalen! A few thoughts for applications of your automagical algorithmical accurizer to QF:It would be nice to have a feature/wish list. A random sample hopefully shows what I am trying to get at here Like buying a car, sales has a list of enticing features. I assume that this will be a market-driven software product rather than an in-house piece of software for a single client?It would seem this tool would be something that goes into an advanced Integrated Development Environment. It would analyze a programmer's code to detect risky code segments that are prone to numerical accuracy problems.

Hi everyone, and thank again for all of your advices. That?s very helpful.QuoteI know(ish) the owner of this company, Indeed outrun! FPGA and GPU are very hot for us right now, we are still learning about the opportunities our technology represents for them. Any external and relevant opinion interests me.Quote Sure. Take Newton Raphson with a bad seed. AFAIR you can get underflow or overflow super quick.Okay, I will try this, but does this kind of algorithms really useful in you line of work Cuchulainn ?Quote It's a quiet NaN and it propagates down the function chain. Where do I check for NaN without affecting performance? Hem, let me check about this and I will give you an answer ASAP.Quote 1. Monte Carlo with extremely high N (e.g., O(2^32) samples using 32-bit floats). Basic statistical running sums for the moments (e.g., sums for the mean, variance, skew, kurtosis) grow very large to the point where new data is less than the round-off error of the sum. There are known solutions to this, but it would be interesting to see how your methods handle this known case. Sure when the confidence interval tends to zero my guess is that the cumulative drift will be larger. However how much is this a real case scenario? Do you actually do 2^32 simulations? If so, I?ll be very interested to run one on my own, I will just need some input to code it properly.Quote 2. Generation of random numbers with a nonuniform distribution (e.g., Normally-distributed random numbers): The classic algorithms start with a uniform random number (a random integer normalized to a 0-1 floating point interval) and then map that through a transform that models the inverse of the CDF (cumulative distribution function). The result creates discontinuities in the samples (e.g., values that fall between the mapping of consecutive integers from the original random numbers) as well as no samples in the tails of the distribution beyond some value. You lost me there, what kind of algorithms are talking about? do you have some pointers so I could learn more about it?Quote 3. Matrix decomposition with near-singular matrices. Round-off errors in the matrix elements can make a matrix that should be PSD (positive semi-definite) have negative eigenvalues (failing the PSD criterion). There are ad hoc methods for tweaking the matrix to make it PSD, but they are not pleasant.This indeed is a classical numerical problem. If you have here again some pointers of this method to avoid it you could try to compare to it.Quote I assume that this will be a market-driven software product rather than an in-house piece of software for a single client? Quote It would seem this tool would be something that goes into an advanced Integrated Development Environment. It would analyze a programmer's code to detect risky code segments that are prone to numerical accuracy problems.You are both correct. We have a whole set of techniques in our bag of tricks, but we do not know yet which one is suitable of a specific given market. That were we are now. The endgame we aimed at is an IDE integration, in order to give a real spell checker but for calculations.

- Cuchulainn
**Posts:**60518**Joined:****Location:**Amsterdam-
**Contact:**

QuoteIt would seem this tool would be something that goes into an advanced Integrated Development Environment. It would analyze a programmer's code to detect risky code segments that are prone to numerical accuracy problems.This reminds me a bit of IBM/Rational '-ify' products like purify, quantify etc. A non-finite number and a memory leak are not a million miles apart?I find this text below a bit weird: an underflow is not an error.QuoteThe IEEE standard for floating point operations does not consider overflows, underflows, and divide by zeros to be errors. When an underflow occurs, most people want the variable?s value set to zero and execution to continue. This is the default behavior for most and probably all compilers and run-time systems. We do not consider underflows to be an error. However, floating point overflows and divide by zeros usually are considered errors that require changes in the source code. Thus, one would want run-time error detection software to detect floating point overflows and divide by zeros and to show where they occurred.Some compilers and run-time systems provide detection of floating point overflows and divide by zeros but often not for integer divide by zeros and not for integer overflows.

Last edited by Cuchulainn on December 6th, 2015, 11:00 pm, edited 1 time in total.

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

- Traden4Alpha
**Posts:**23951**Joined:**

QuoteOriginally posted by: CuchulainnQuoteIt would seem this tool would be something that goes into an advanced Integrated Development Environment. It would analyze a programmer's code to detect risky code segments that are prone to numerical accuracy problems.This reminds me a bit of IBM/Rational '-ify' products like purify, quantify etc. A non-finite number and a memory leak are not a million miles apart?I find this text below a bit weird: an underflow is not an error.QuoteThe IEEE standard for floating point operations does not consider overflows, underflows, and divide by zeros to be errors. When an underflow occurs, most people want the variable?s value set to zero and execution to continue. This is the default behavior for most and probably all compilers and run-time systems. We do not consider underflows to be an error. However, floating point overflows and divide by zeros usually are considered errors that require changes in the source code. Thus, one would want run-time error detection software to detect floating point overflows and divide by zeros and to show where they occurred.Some compilers and run-time systems provide detection of floating point overflows and divide by zeros but often not for integer divide by zeros and not for integer overflows.Indeed! It's a clear sign of the thinking of a first-order mind -- they think only of computing y = f(x) and ignore the pathologies if one also needs to compute the inverse, x = g(y), or the derivatives or integrals of f(x). I can understand the logic that underflows should not crash the code, but it should be signaled in some way. There's a huge difference between a true zero and a number that is not zero but is too small to represent in selected real number binary encoding scheme.

- Traden4Alpha
**Posts:**23951**Joined:**

QuoteOriginally posted by: ioualalenQuote 1. Monte Carlo with extremely high N (e.g., O(2^32) samples using 32-bit floats). Basic statistical running sums for the moments (e.g., sums for the mean, variance, skew, kurtosis) grow very large to the point where new data is less than the round-off error of the sum. There are known solutions to this, but it would be interesting to see how your methods handle this known case. Sure when the confidence interval tends to zero my guess is that the cumulative drift will be larger. However how much is this a real case scenario? Do you actually do 2^32 simulations? If so, I?ll be very interested to run one on my own, I will just need some input to code it properly.It's not an issue of confidence interval or drift but a more fundamental limitation of adding small numbers to large ones with a fixed-length floating point representation. Actually, in single precision floats, the problem starts showing up at much lower sample counts than 2^32 especially for calculations of higher moments.If you want a simple test of this issue, create a loop that adds 1.0 to a 32-bit float and let it loop 2^26 times. What's the result? It should be 2^26 but it won't be. You'll reach a point where x+1.0 == x.QuoteQuote 2. Generation of random numbers with a nonuniform distribution (e.g., Normally-distributed random numbers): The classic algorithms start with a uniform random number (a random integer normalized to a 0-1 floating point interval) and then map that through a transform that models the inverse of the CDF (cumulative distribution function). The result creates discontinuities in the samples (e.g., values that fall between the mapping of consecutive integers from the original random numbers) as well as no samples in the tails of the distribution beyond some value. You lost me there, what kind of algorithms are talking about? do you have some pointers so I could learn more about it?Take a look at Box Muller method for computing random Normal values and think about what happens when a 32-bit uniform integer is converted into a 32-bit float and then the values are transformed via Box Muller into Normal distribution. IIRC, this method never produces random values outside of about -6 to +6 sigma (i.e., it truncates the tails of the distribution) no matter how many samples you run which means it will underestimate all the even moments of the distribution and cause some anomalies in the mean and skew if the MC involves nonlinear payoffs.Ziggurat also has the problems, but the algorithm is more complex.

- Cuchulainn
**Posts:**60518**Joined:****Location:**Amsterdam-
**Contact:**

Quote It's not an issue of confidence interval or drift but a more fundamental limitation of adding small numbers to large ones with a fixed-length floating point representation. Actually, in single precision floats, the problem starts showing up at much lower sample counts than 2^32 especially for calculations of higher moments.If you want a simple test of this issue, create a loop that adds 1.0 to a 32-bit float and let it loop 2^26 times. What's the result? It should be 2^26 but it won't be. You'll reach a point where x+1.0 == x.Is T4A referring to machine precision(ULP)?

Last edited by Cuchulainn on December 6th, 2015, 11:00 pm, edited 1 time in total.

http://www.datasim.nl

Approach your problem from the right end and begin with the answers. Then one day, perhaps you will find the final question..

R. van Gulik

GZIP: On