pcaspers
Posts: 30
Joined: June 6th, 2005, 9:49 am
Location: Germany

Parallel RNG and distributed MC

January 30th, 2015, 7:14 pm

There are nextInt32 and nextReal methods, the latter for U(0,1). Everything else is up to you (or to the generic QL helper classes, to get other distributions).

Uh, I have to check whether nextReal() works with w=31; I have some doubts now.

For the array of template-based MT instances: you have to use Mtdesc19937_0, _1, _2, ... to get independent instances. There is some example code on my slides, using the Boost preprocessor library, concerning this.
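A minimal sketch of why the word size w matters when turning an integer draw into U(0,1). The helper name and the midpoint mapping are assumptions for illustration, not necessarily what the generator in this thread does:

```cpp
#include <cstdint>

// Hypothetical helper: maps a w-bit integer draw x in [0, 2^w - 1] to the
// midpoint of its bin, so the result lies strictly inside (0, 1).
// Assumes 1 <= w <= 32.
inline double toOpenUnitInterval(std::uint32_t x, unsigned w) {
    const double range = static_cast<double>(std::uint64_t(1) << w); // 2^w
    return (static_cast<double>(x) + 0.5) / range;
}
```

If a generator emits only 31 bits but the divisor stays at 2^32, every result falls below 0.5, which is the kind of problem the w=31 doubt points at.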
 
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm

Parallel RNG and distributed MC

February 2nd, 2015, 7:49 pm

Relevant? http://arxiv.org/abs/1501.07701

Quote: "Reliable Initialization of GPU-enabled Parallel Stochastic Simulations Using Mersenne Twister for Graphics Processors"

Parallel stochastic simulations tend to exploit more and more computing power and they are now also developed for General Purpose Graphics Process Units (GP-GPUs). Consequently, they need reliable random sources to feed their applications. We propose a survey of the current Pseudo Random Numbers Generators (PRNG) available on GPU. We give a particular focus to the recent Mersenne Twister for Graphics Processors (MTGP) that has just been released. Our work provides empirically checked statuses designed to initialize a particular configuration of this generator, in order to prevent any potential bias introduced by the parallelization of the PRNG.
Last edited by Polter on February 1st, 2015, 11:00 pm, edited 1 time in total.
 
Cuchulainn
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Parallel RNG and distributed MC

February 3rd, 2015, 8:03 pm

Well, I would like just this:

Quote: Let's say we want to generate a (large) matrix of random numbers. It's got to be done in parallel somehow. So, how to do it with a good speedup and no race conditions?

pcaspers' rng looks promising, although I need U(0,1), U(a,b) (int and real), N(0,1), and then something a bit like the C++11 calling style. And how do I dynamically create an array of rngs (up to 8 cores, say; no GPU)?

I have not seen any other one besides the above.
Last edited by Cuchulainn on February 2nd, 2015, 11:00 pm, edited 1 time in total.
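For reference, a minimal sketch of what the wish list in the post above looks like with the standard C++11 <random> header alone. The names makeEngines, Draws and sample are hypothetical, and seeding with base + i is only an illustrative assumption: it gives distinct streams but no guarantee of independence.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// One engine per worker, created at run time (e.g. n = 8 for 8 cores).
std::vector<std::mt19937_64> makeEngines(std::size_t n, std::uint64_t base) {
    std::vector<std::mt19937_64> engines;
    engines.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        engines.emplace_back(base + i);  // assumption: distinct seeds only
    return engines;
}

struct Draws { double u01; double uab; int k; double z; };

// C++11 calling style: a distribution object is applied to an engine.
Draws sample(std::mt19937_64& eng, double a, double b, int lo, int hi) {
    std::uniform_real_distribution<double> u01(0.0, 1.0);  // U[0, 1)
    std::uniform_real_distribution<double> uab(a, b);      // U[a, b)
    std::uniform_int_distribution<int> kd(lo, hi);         // integers in [lo, hi]
    std::normal_distribution<double> z(0.0, 1.0);          // N(0, 1)
    return Draws{u01(eng), uab(eng), kd(eng), z(eng)};
}
```

Each worker would then call sample(engines[i], ...) on its own engine only, so no engine is shared between threads.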
 
pcaspers
Posts: 30
Joined: June 6th, 2005, 9:49 am
Location: Germany

Parallel RNG and distributed MC

February 6th, 2015, 7:50 am

outrun, what is the method behind it? Sorry if this is clear from the code you posted ...
 
Cuchulainn
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Parallel RNG and distributed MC

February 6th, 2015, 11:33 am

Quote (originally posted by outrun):
why don't you use the one I've written and submitted to boost? thread. It's fast, high quality, and offers a C++11 random engine interface. It's e.g. been used for "massively parallelized lattice Quantum Chromodynamics simulations", someone ported it to R also ..

Maybe because of the useless boost dependency?

If you want speed then you'll need to worry about memory layout and cache hits. I'll bet you get *very* different performance if you have threads t0..t7 write to consecutive elements like this

V[t0, t1, t2, ... t7, t0, t1, t2, ... t7, ...]

compared to this

V[t0, t0, t0, ... , t1, t1, t1, ... , t7, t7, t7, ...]

My model case is

Quote: Let's say we want to generate a (large) matrix of random numbers. It's got to be done in parallel somehow. So, how to do it with a good speedup and no race conditions?

How do I do it, really?

Quote: If you want speed then you'll need to worry about memory layout and cache hits

Let's worry about that when I get some running code whose speedup can be measured.

I have some experience with sitmo from when I wrapped it in C#. However, it is not clear to me how to use it with OpenMP or C++11 Concurrency, for example.
Last edited by Cuchulainn on February 5th, 2015, 11:00 pm, edited 1 time in total.
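The two layouts described in the post above can be sketched with std::thread as follows. This is a sketch under stated assumptions, not anyone's posted code: fillParallel and Layout are hypothetical names, std::mt19937_64 stands in for whichever engine is used, and seeding each engine with the thread index is purely illustrative. Every element is written by exactly one thread, so neither variant needs a lock. In the interleaved variant, neighbouring elements belong to different threads and will often share a cache line, which is the likely source of the performance gap.

```cpp
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

enum class Layout { Interleaved, Blocked };

// Fills v with U[0,1) draws using nThreads workers, each with its own engine.
void fillParallel(std::vector<double>& v, unsigned nThreads, Layout layout) {
    const std::size_t n = v.size();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&v, n, nThreads, layout, t] {
            std::mt19937_64 eng(t);  // assumption: thread index as seed
            std::uniform_real_distribution<double> u(0.0, 1.0);
            if (layout == Layout::Interleaved) {
                // V[t0, t1, ..., t7, t0, t1, ...]: indices t, t+T, t+2T, ...
                for (std::size_t i = t; i < n; i += nThreads) v[i] = u(eng);
            } else {
                // V[t0, t0, ..., t1, t1, ...]: one contiguous chunk per thread
                const std::size_t begin = n * t / nThreads;
                const std::size_t end = n * (t + 1) / nThreads;
                for (std::size_t i = begin; i < end; ++i) v[i] = u(eng);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

A large matrix stored contiguously can be filled the same way by treating its storage as v. Timing both layouts on the target machine would settle the bet.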
 
Cuchulainn
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Parallel RNG and distributed MC

February 6th, 2015, 2:44 pm

Thanks, I'll get back on this. Any feeling for the speedup on an 8-core machine? Anything better than one is better than a speedup of 1.

C++11 libs?

#include <thread>
#include <mutex>
#include <stdfin/random/threefish_engine.hpp>

My guess is that a mutex in a loop will slow things down? OpenMP would use thread-private and/or reduction variables.
Last edited by Cuchulainn on February 5th, 2015, 11:00 pm, edited 1 time in total.
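To make the mutex question concrete, here is a hedged sketch contrasting the two patterns. sumDraws is a hypothetical name, std::mt19937_64 replaces the threefish engine so that only the standard library is needed, and the seeds are arbitrary. With shareEngine set, every draw takes the lock and the threads serialize on it. Without it, each thread draws from a private engine and the only shared step is the final reduction, which is roughly what OpenMP private and reduction variables would express.

```cpp
#include <cstddef>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Sums nThreads * drawsPerThread uniform draws, either from one
// mutex-protected engine or from one private engine per thread.
double sumDraws(unsigned nThreads, std::size_t drawsPerThread, bool shareEngine) {
    std::mt19937_64 shared(12345u);
    std::mutex m;
    std::vector<double> partial(nThreads, 0.0);  // one slot per thread
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&, t] {
            std::mt19937_64 local(1000u + t);  // assumption: arbitrary distinct seeds
            std::uniform_real_distribution<double> u(0.0, 1.0);
            double s = 0.0;
            for (std::size_t i = 0; i < drawsPerThread; ++i) {
                if (shareEngine) {
                    std::lock_guard<std::mutex> lock(m);  // lock on every draw
                    s += u(shared);
                } else {
                    s += u(local);  // no synchronization inside the loop
                }
            }
            partial[t] = s;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    double total = 0.0;
    for (double p : partial) total += p;  // reduction after the join
    return total;
}
```

Both variants are free of data races. Measuring the two on an 8-core machine is the way to see how much the lock costs.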
 
Cuchulainn
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Parallel RNG and distributed MC

February 6th, 2015, 4:27 pm

Ok, good idea. It's a domain decomposition design. I think the same can be used with pcaspers' approach.

BTW, can I use <random> instead of #include <stdfin/random/threefish_engine.hpp>?
Last edited by Cuchulainn on February 5th, 2015, 11:00 pm, edited 1 time in total.
 
Traden4Alpha
Posts: 3300
Joined: September 20th, 2002, 8:30 pm

Parallel RNG and distributed MC

February 6th, 2015, 4:39 pm

Quote (originally posted by outrun):
@cuch, in general (not specific to my rng, but any C++11 rng) give each thread its own local engine var and seed each one differently. The seeding of the thread engines is typically done blocking, e.g. by letting each thread read a counter and increment it so that the next thread gets a different value. After that each thread will have its own random stream.

How does one prevent creating a seed for thread i that emits an RN stream that leads to the RN stream emitted by thread j? As the number of threads grows and the length of each thread stream grows, the chance that some pair of threads will have 100% cross-correlation with some lag would seem to grow quite large. If the core RNG has a cycle of length C, and seeds initialize T threads at random points in that cycle, which generate N numbers each, then the chance of overlap is O(T^2*N / C).

Or have I missed something someplace?
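To put numbers on the O(T^2*N / C) estimate, a small sketch that evaluates it in log2 so that very long periods do not overflow. The function name is hypothetical, and the estimate is taken from the post as given, constants ignored:

```cpp
#include <cmath>

// log2 of T^2 * N / C, with the period passed as log2(C).
double log2OverlapBound(double threads, double drawsPerThread, double log2Period) {
    return 2.0 * std::log2(threads) + std::log2(drawsPerThread) - log2Period;
}
```

For T = 8 threads drawing N = 2^40 numbers each from a generator with period about 2^19937 (the Mersenne Twister's), this gives 6 + 40 - 19937 = -19891, i.e. a chance of order 2^-19891. For a hypothetical generator with period 2^64, T = 1024 and N = 2^40, it gives 20 + 40 - 64 = -4, i.e. order 1/16, which is not negligible. So the concern depends heavily on the period of the underlying generator.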
 
Cuchulainn
Posts: 20253
Joined: July 16th, 2004, 7:38 am
Location: 20, 000

Parallel RNG and distributed MC

February 6th, 2015, 6:19 pm

Quote (originally posted by Traden4Alpha):
Quote (originally posted by outrun):
@cuch, in general (not specific to my rng, but any C++11 rng) give each thread its own local engine var and seed each one differently. The seeding of the thread engines is typically done blocking, e.g. by letting each thread read a counter and increment it so that the next thread gets a different value. After that each thread will have its own random stream.

How does one prevent creating a seed for thread i that emits an RN stream that leads to the RN stream emitted by thread j? As the number of threads grows and the length of each thread stream grows, the chance that some pair of threads will have 100% cross-correlation with some lag would seem to grow quite large. If the core RNG has a cycle of length C, and seeds initialize T threads at random points in that cycle, which generate N numbers each, then the chance of overlap is O(T^2*N / C).

Or have I missed something someplace?

Indeed, you have not. This was one of my questions already. If the streams are correlated then what's the point?

This is standard:

Quote:
1. No correlations between the numbers in different sequences.
2. Scalable: support many processes/threads, each with its own sequence.
3. Locality: a thread can generate a sequence of random numbers with no inter-thread communication.
Last edited by Cuchulainn on February 5th, 2015, 11:00 pm, edited 1 time in total.
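One way to rule out overlap by construction, rather than by probability, is to split a single stream into consecutive blocks. The sketch below is an illustration, not the method of any library mentioned in this thread: generateSplit is a hypothetical name, and it relies on discard(), which in common standard library implementations takes time proportional to the number of skipped values. It is therefore only practical for modest block sizes, or for engines with a cheap skip-ahead.

```cpp
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Thread t reproduces elements [t*block, (t+1)*block) of the one serial
// stream defined by `seed`, so the per-thread streams cannot overlap.
std::vector<std::mt19937_64::result_type>
generateSplit(std::mt19937_64::result_type seed, unsigned nThreads, std::size_t block) {
    std::vector<std::mt19937_64::result_type> out(nThreads * block);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        workers.emplace_back([&out, seed, block, t] {
            std::mt19937_64 eng(seed);
            // Skip the blocks that belong to threads 0 .. t-1.
            eng.discard(static_cast<unsigned long long>(t) * block);
            for (std::size_t i = 0; i < block; ++i)
                out[t * block + i] = eng();
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

The result is identical to drawing nThreads * block values from one engine serially, and after construction no thread communicates with another, which matches points 1 and 3 of the list. Scalability (point 2) then depends on how cheaply the engine can skip ahead.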