Let's benchmark C++, Java, C#

renorm · June 4th, 2010, 5:12 pm

I wrote very simple MC code to price future options. The code uses Spot=Strike=10, Rate=0, Time=1 and Volatility=0.3. The exact call price according to Black's formula is 1.1924.The algorithm is very straightforward and doesn't use any fancy libraries. RNGs are generated using mersenne twister. Gaussian distribution is approximated as a sum of 1200 uniform distribution. If you sum 1200 random variables uniformly distributed in (0, 1) and divide the sum by 10, then the result is approximately standard normal distribution. So far so good. Now, let's do to the following. Let's implement the algorithm in C++, Java and C# and see how fast it runs. We know that C++ is very fast, C# is fast too and Java is slightly slower . But how much faster or how much slower? Let's measure it.I provide C++ implementation, which doesn't use any library algorithms. Everything is done using simple loops. It should be straightforward to port my implementation into Java or C#. If no mersenne twister is available, you can use Random.nextLong() in Java and Random.next() in C# or any other implementation you have. Once Java and C# implementations are posted here, we can run them and post the results back here. Good luck!P.S. Parallel implementation is below.

capafan2 · June 4th, 2010, 5:26 pm

QuoteOriginally posted by: renormI wrote very simple MC code to price future options. The code uses Spot=Strike=10, Rate=0, Time=1 and Volatility=0.3. The exact call price according to Black's formula is 1.1924.The algorithm is very straightforward and doesn't use any fancy libraries. RNGs are generated using mersenne twister. Gaussian distribution is approximated as a sum of 1200 uniform distribution. If you sum 1200 random variables uniformly distributed in (0, 1) and divide the sum by 10, then the result is approximately standard normal distribution. So far so good. Now, let's do to following. Let's implement the algorithm in C++, Java and C# and see how fast it runs. We know that C++ is very fast, C# is fast too and Java is slightly slower . But how much faster or how much slower? Let's measure it.I provide C++ implementation, which doesn't use any library algorithms. Everything is done using simple loops. It should be straightforward to port my implementation into Java or C#. If no mersenne twister is available, you can use Random.nextLong() in Java and Random.next() in C# or any other implementation you have. Once Java and C# implementations are posted here, we can run them and post the results back here. Good luck!P.S. Code.I will rewrite your code in java and give you a concurrent version (detects the number of cores and uses them) and a non concurrent version. Sometime this weekend

. I am curious too. We can discuss it here.

renorm · June 4th, 2010, 6:52 pm

Here is TBB based parallel implementation of the same MC algorithm. It detects the number of cores and runs exactly one thread on each core at any time, without preempting threads and without leaving cores idle (except during mutex locks to use mersenne twister). On my dual core CPU it ran 1.8 faster than the serial implementation.

Cuchulainn · June 5th, 2010, 4:18 am

I have a proposal on this in order to address some new developments(and to avoid going over old territory ). Concurrent Java is new for me but parallel C++ and C# is known. Some ideas:1. Parallel design and coding in the big 3 2. Parallel design patterns (e.g. renorm's producer consumer) in B3 3. libs: PPL, TPL, TBB etc.At one level all these look the same, so you only need to _describe_ a design once and then you can implement it how you like.What do you think?BTW do you use Mattson 2005 (Patterns for Parallel Programming) Here is a site for parallel design patterns

Cuchulainn · June 5th, 2010, 5:23 am

Is this mutex ensuring that RNG is essentially single threaded? Is the whole loop locked? btw an idea to include SE and SD in your code as well?

renorm · June 5th, 2010, 9:14 am

No, I don't use Mattson 2005. I use TBB parallel design patterns reference.Yes, the loop is locked. Only 1 thread can use RNG at any time. What is SE and SD?

Cuchulainn · June 5th, 2010, 9:20 am

QuoteOriginally posted by: renormYes, the loop is locked. Only 1 thread can use RNG at any time. What is SE and SD?This means that speedup will be impaired. A better solution might be producer consumer (or 'single' construct in OpenMP).Standard (Error, Deviation) QuoteNo, I don't use Mattson 2005. I use TBB parallel design patterns reference.Mattson can be applied to any library (pattern) while I suspect TBB describes the pattern in TBB terms?

quantmeh · June 5th, 2010, 12:13 pm

this is what's called micro-benchmarking, useless activity

capafan2 · June 5th, 2010, 1:36 pm

QuoteOriginally posted by: jawabeanthis is what's called micro-benchmarking, useless activityYou are right it is useless because the benchmark will do well regardless of the language you use!! But then I said I will do it

.

quantmeh · June 5th, 2010, 1:44 pm

QuoteOriginally posted by: capafan2QuoteOriginally posted by: jawabeanthis is what's called micro-benchmarking, useless activityYou are right it is useless because the benchmark will do well regardless of the language you use!! But then I said I will do it

.it's great when people have a lot of spare time, i envy you the test I want to do is to port our credit model to GPU from Java and see what happens. i mean end-to-end comparison, with all the costs included. this is the only meaningful test to me.

capafan2 · June 5th, 2010, 2:35 pm

Java CodeMersenneTwisterFast Code (Cuch had lead me to it in a earlier conversation) MersenneTwisterFast codeResults on a dual core machineSamples = 1000000 (188 milliseconds for 1 thread and 94 milliseconds for 2 threads)Samples = 4000000 (704 milliseconds for 1 thread and 390 milliseconds for 2 threads)

Cuchulainn · June 5th, 2010, 2:40 pm

Quoteit's great when people have a lot of spare time, i envy you Yah, we missed you.

capafan2 · June 5th, 2010, 2:43 pm

QuoteOriginally posted by: jawabeanQuoteOriginally posted by: capafan2QuoteOriginally posted by: jawabeanthis is what's called micro-benchmarking, useless activityYou are right it is useless because the benchmark will do well regardless of the language you use!! But then I said I will do it

.it's great when people have a lot of spare time, i envy you No need to get sarcastic

! I do have some time currently but the reason for taking these up is because I think its the right thing to do. Most comments here would have you buy that numerical programming is all everyone does. In the real world these programs presumably live within a large applications where several factors play a role, disk, network, memory , standardized hardware and so on. Productivity is a important quality criteria as is maintainability. We do not go to work to indulge in our brilliance and people do not employ is for those reasons. There is a reason by Java, .NEt, Python and Ruby ( on Rails) are popular. They improve productivity. You can do more with less. Sometimes they fall short and then you don't use them.Quotethe test I want to do is to port our credit model to GPU from Java and see what happens. i mean end-to-end comparison, with all the costs included. this is the only meaningful test to me.Only that can be a meaningful test as a chain is only as strong as its weakest link. If serial IO component dominates then all your parallelization is bounded by a lower number.

Cuchulainn · June 5th, 2010, 2:46 pm

capa, don't mind JB, he's harmless.But C++ is like a red flag to a bull.

renorm · June 5th, 2010, 3:00 pm

Quotethis is what's called micro-benchmarking, useless activity Not useless. We want to see how fast numbers can be crunched, nothing else. The stuff I put into the code is a good representation of typical MC algorithm, because a typical MC spends 99% of time running RNGs and computing paths.Quotethe test I want to do is to port our credit model to GPU from Java and see what happens. i mean end-to-end comparison, with all the costs included. this is the only meaningful test to me.By "end-to-end comparison" you mean the development efforts? GPU coding is harder, no doubt. But C++ is not CUDA. In C++ you can code at the same abstraction as you do in Java. It should be relatively easy to port numerical code into C++ from Java.The pattern I used applicable pretty much to any MC algorithm in quant finance. In Java or C# one would use exactly the same pattern to parallelize MC code. The code uses producer-consumer pattern with serial RNG assigned to produce random numbers. Each thread locks the RNG and generates a batch of numbers for its consumption. Production of random numbers is fast, but the consumption is slow. Once all threads get the first batch of numbers, consecutive requests from the threads will become staggered in time. This means that when a thread locks the RNG, all other treads should by busy consuming their last batch of numbers.