Page **9** of **39**

### Parallel RNG and distributed MC

Posted: **February 7th, 2012, 6:34 am**

by **Cuchulainn**

Sorry to hear that, Milo. C++ without templates is unthinkable in this day and age.I do hope you stick around QuoteYes. I think I remember that some pieces of the boost random library had inefficient performance.Also in 1.47? This version has a lot of new stuff as well.

### Parallel RNG and distributed MC

Posted: **February 7th, 2012, 12:38 pm**

by **Cuchulainn**

QuoteOriginally posted by: outrunQuoteOriginally posted by: CuchulainnSorry to hear that, Milo. C++ without templates is unthinkable in this day and age.I do hope you stick around QuoteYes. I think I remember that some pieces of the boost random library had inefficient performance.Also in 1.47? This version has a lot of new stuff as well.Yes in 1.48, not just inefficient but also wrong if you get formal. I'm working on the outline posted a bit back: uniform [0,1] [0,1) (0,1) and (0,1], and 3x Normal, .. that's orthogonal to the (p)rngIs Boost Random ~ C++ 11 Random?

### Parallel RNG and distributed MC

Posted: **February 7th, 2012, 5:12 pm**

by **quartz**

QuoteOriginally posted by: MiloRambaldiI've read several papers on this. I think the "Jumping ahead" is called the leap-frog method. I don't believe that its as scalable as giving each path its own seed, but its been a while since I read about this. Also leap-frog can be difficult to implement, but of course the more methods to choose from the better. Unfortunately there's quite some confusion in the literature between these terms (and "skip-ahead") as well as the two corresponding techniques which both involve jumping. Get a stream of samples {o0, o1, o2, o3, o4, o5, o6, o7,...} (to simplify imagine that the output sample is also the generator state), then two disjoint/uncorrelated streams can be obtained by: - interleaving (short jumping at each sample): stream 1: {o0. o2, o4, o6....} and stream 2: {o1. o3, o5, o7....} - this is most commonly referred to as leap-frog, sometimes jump-ahead too (iirc)- nonoverlapping streams (large jump at generator seeding): stream 1: {o0. o1, o2, o3....} and stream 2: {o10000. o10001, o10002, o10003....} - this is most often "skip ahead" or "jump ahead" (iirc again)However interleaving is not really practical because of the repeated jumps draining performance (jumping with MT is much slower than just getting the next sample).QuoteI'm not sure what you meant by "having different internal MT state vectors". The state of the MT19337 is 624 bytes (or something like that). The boost code contains Matsumoto's algorithm (and maybe one of his colleages, I forget) of converting seeds to internal states in such a way to avoid "bad" internal states, i.e. that lead to cycles with bad statistical properties (this was a problem with the original version of MT). That improvement merely tried to remedy to those statistical issues but still gives no guarantee that the states are "far apart" enough to generate distinct streams.Therefore Matsumoto later also published a method to perform large jumps, but there are faster ones around ;-)Quote I also implement another algorithm for converting multiple seeds into an internal state.Interesting; wanna talk about it?

### Parallel RNG and distributed MC

Posted: **February 8th, 2012, 2:48 am**

by **MiloRambaldi**

QuoteOriginally posted by: CuchulainnSorry to hear that, Milo. C++ without templates is unthinkable in this day and age.I do hope you stick around Thanks. I wasn't planning on going anywhere

### Parallel RNG and distributed MC

Posted: **February 8th, 2012, 3:41 am**

by **MiloRambaldi**

QuoteOriginally posted by: quartzUnfortunately there's quite some confusion in the literature between these terms (and "skip-ahead") as well as the two corresponding techniques which both involve jumping. Get a stream of samples {o0, o1, o2, o3, o4, o5, o6, o7,...} (to simplify imagine that the output sample is also the generator state), then two disjoint/uncorrelated streams can be obtained by: - interleaving (short jumping at each sample): stream 1: {o0. o2, o4, o6....} and stream 2: {o1. o3, o5, o7....} - this is most commonly referred to as leap-frog, sometimes jump-ahead too (iirc)- nonoverlapping streams (large jump at generator seeding): stream 1: {o0. o1, o2, o3....} and stream 2: {o10000. o10001, o10002, o10003....} - this is most often "skip ahead" or "jump ahead" (iirc again)However interleaving is not really practical because of the repeated jumps draining performance (jumping with MT is much slower than just getting the next sample). So suppose we want a scalable MC simulation, i.e. the result from a given seed should be the same regardless of how many parallel processes are used. If I am not mistaken, the interleaving method has the limitation (besides performance) that the # of streams must be at least as large as the # of paths in the MC simulation. While the nonoverlapping method has the limitation that the jump ahead must be at least as long as the longest path. I don't understand why there is so much emphasis on these two jumping techniques in the literature. The third alternative, the one I implemented, is to have each path assigned a seed, where the seeds are generated by a secondary generator (in my case, MT for the primary and LCG for the secondary). This is trivially scalable, and did extremely well on tests for inter-path correlations. This is not a new method, but I could barely find anything about it in the literature.QuoteThat improvement merely tried to remedy to those statistical issues but still gives no guarantee that the states are "far apart" enough to generate distinct streams.Right. I don't think it had anything to do with distinct streams. IIRC, the issue was that if there were too many zero bytes in the initial state then it would take a *very* long time before the sequence started to behave like a random sequence of numbers. Quote MiloRambaldi: I also implement another algorithm for converting multiple seeds into an internal state. Interesting; wanna talk about it?I found the algorithm for converting multiple seeds into states on Matsumoto's website, where it was implemented in his C code. I don't recall any theoretical justification and it looked a bit "ad hoc". I understand there is some SVN for this project? Maybe I should just upload my implementation?

### Parallel RNG and distributed MC

Posted: **February 8th, 2012, 9:47 am**

by **quartz**

QuoteOriginally posted by: MiloRambaldiSo suppose we want a scalable MC simulation, i.e. the result from a given seed should be the same regardless of how many parallel processes are used. If I am not mistaken, the interleaving method has the limitation (besides performance) that the # of streams must be at least as large as the # of paths in the MC simulation. While the nonoverlapping method has the limitation that the jump ahead must be at least as long as the longest path. Yes, right! I didnt mention those problems cause in practice with large enough states (such as with MT) the jump can be long enough for all practical purposes, with all foreseeable technology. (for terminology: if you mean stochastic process paths, then you can still have more of them with the same RNG stream, so the problem with interleaving in the simplest case is the # of stream "seeds" vs # of threads/workers)The only case where things become more complicated is when there's a random and frequent demand for MANY new streams during the whole simulation, such as in particle physics, but I don't think this is now a matter concern for us (but please correct me in case it is). QuoteI don't understand why there is so much emphasis on these two jumping techniques in the literature. The third alternative, the one I implemented, is to have each path assigned a seed, where the seeds are generated by a secondary generator (in my case, MT for the primary and LCG for the secondary). This is trivially scalable, and did extremely well on tests for inter-path correlations. This is not a new method, but I could barely find anything about it in the literature.The problem is again that with such methods theres no *guarantee* that streams are uncorrelated. Often it all works fine when the generator states are large, but one can be "unlucky" and get overlapping streams. What's worse: there's not even a simple way to calculate the actual "probability" of this to happen, so you cant even be confident that it is rare "enough". It might even be just rare enough to pass manual "tests" AND frequent enough to slip occurring into production, which is certainly not what we want in a banking environment (competent audits wont like it either). Quality PRNG is very very tricky, that's why in 2012 even boost or CERN code still relies on hacks and prayer.On the other hand there are other generators than MT where interleaving is reasonably fast to be practical (e.g. a LCG, but we dont want to use'm anyway because of the poorer statistical quality).There are further reasons (e.g. "mixing" two PRNGs is doomed to cause problems) but I stop here...QuoteI found the algorithm for converting multiple seeds into states on Matsumoto's website, where it was implemented in his C code. I don't recall any theoretical justification and it looked a bit "ad hoc". Ah ok, I thought you were now talking of a further method instead.

### Parallel RNG and distributed MC

Posted: **February 8th, 2012, 7:55 pm**

by **Cuchulainn**

QuoteOriginally posted by: outrunI've checked in the 4 uniform rng's, and "normal box muller polar", working on the basic "normal box muller now", and the inverse method..Have you come across the Neave effect?

http://www.wilmott.com/messageview.cfm? ... adid=68934

### Parallel RNG and distributed MC

Posted: **February 8th, 2012, 9:45 pm**

by **Traden4Alpha**

QuoteOriginally posted by: outrunQ:If I have a rng that generates numbers [1,2,3,4,5,6], what is the best mapping to [0,1] prior to applying the inverse normal?1/7, 2/7 ,.. 6/7 ??That depends on what you want to accomplish.If you want to have each number represent the mid-point of a set of equally-sized intervals in [0,1], then the answer is 1/12, 3/12, 5/12, 7/12, 9/12, 11/12 (as you suggested).But if you want samples drawn from NORMINV(f(rng)) to have the same moments as a regular normal distribution, then you'll need something a bit different. In fact, I suspect you will need a non-linear transform to get the 2nd the 4th moment to behave.