SERVING THE QUANTITATIVE FINANCE COMMUNITY

 
User avatar
Cuchulainn
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Parallel RNG and distributed MC

February 9th, 2012, 5:27 pm

If you can specify the problem a bit more I can look up my code and try to modify to suit the MC needs. BTW I have a fixed point algo with Aitken acceleration for any inverse cdf if useful. It converges in 3-5 iterations (no derivs needed). http://en.wikipedia.org/wiki/Aitken%27s ... ed_process http://en.wikipedia.org/wiki/Fixed-point_iteration http://newtonexcelbach.wordpress.com/20 ... nh-method/
Last edited by Cuchulainn on February 8th, 2012, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Parallel RNG and distributed MC

February 9th, 2012, 5:54 pm

Sinh-tanh is better, something like ??http://docserver.carma.newcastle.edu.au ... un_mcs.pdf It has exponential convergence, it seems.
Last edited by Cuchulainn on February 8th, 2012, 11:00 pm, edited 1 time in total.
 
User avatar
MiloRambaldi
Posts: 273
Joined: July 26th, 2010, 12:27 am

Parallel RNG and distributed MC

February 10th, 2012, 7:19 am

QuoteOriginally posted by: quartzThe only case where things become more complicated is when there's a random and frequent demand for MANY new streams during the whole simulation, such as in particle physics, but I don't think this is now a matter concern for us (but please correct me in case it is). QuoteI don't understand why there is so much emphasis on these two jumping techniques in the literature. The third alternative, the one I implemented, is to have each path assigned a seed, where the seeds are generated by a secondary generator (in my case, MT for the primary and LCG for the secondary). This is trivially scalable, and did extremely well on tests for inter-path correlations. This is not a new method, but I could barely find anything about it in the literature.The problem is again that with such methods theres no *guarantee* that streams are uncorrelated. Often it all works fine when the generator states are large, but one can be "unlucky" and get overlapping streams. What's worse: there's not even a simple way to calculate the actual "probability" of this to happen, so you cant even be confident that it is rare "enough". It might even be just rare enough to pass manual "tests" AND frequent enough to slip occurring into production, which is certainly not what we want in a banking environment (competent audits wont like it either). Quality PRNG is very very tricky, that's why in 2012 even boost or CERN code still relies on hacks and prayer.On the other hand there are other generators than MT where interleaving is reasonably fast to be practical (e.g. a LCG, but we dont want to use'm anyway because of the poorer statistical quality).There are further reasons (e.g. "mixing" two PRNGs is doomed to cause problems) but I stop here...I will mention one paper I just looked at (it was fairly informative, but there surely better sources): "Testing parallel random number generators", by Srinivasan, Mascagni and Ceperley. They mention methods of parallelizing RNGs (they don't go into detail, since the paper focuses on test results). What they call "cycle division" consists of exactly the 3 methods we discusses above: the one I implemented with seeds chosen by a secondary RNG, interleaving which is commonly called "leap-frog" and non-overlapping which is usually called "stream splitting". The advantage of the 2 methods you gave over random seeding is that it is guaranteed that the streams will not overlap, which is the worst form of inter-stream correlation. However, there is still no guarantee that streams are uncorrelated. There is the well-known danger that by splitting up the cycle, long-range correlations become short-range correlations which are vastly worse. I agree that quality PRNG is extremely tricky. However, I found your scenario above unconvincing. If the streams are overlapping frequently enough to affect the results then this will be revealed with proper testing. It is similar to serial random number generation, where you cannot prove that there are no intra-stream correlations (apparently even a suitable mathematical definition of pseudo-randomness is unknown) but must rely on tests. A couple of more topics from the paper: (1) There are even more methods: They recommend parallelization by what they call "parameterization", rather than the 3 cycle division methods above. For many RNGs (MT is not discussed there), the state space naturally divides itself into a number of disjoint cycles. Alternatively, it may be possible to parameterize the iteration function for obtaining the next state from the current one.(2) Application based tests as opposed to statistical tests (I don't think this was mentioned in this thread).Anyhow, I guess the more methods the better (unless the method is inferior in all respects)? I can implement the random seed method, if you're not interested
 
User avatar
quartz
Posts: 424
Joined: June 28th, 2005, 12:33 pm

Parallel RNG and distributed MC

February 10th, 2012, 6:59 pm

Dear Milo, I was not making the case for jumping ahead, it was just a report of the common reasons why that technique has been perceived in the literature as an improvement over the pre-2006 situation, as had been asked. Among others Mascagni was of such opinion, together e.g. with L'Ecuyer and Matsumoto. Again, we're happy if you contribute code for random initialization, which is just as useful. No parallelization method is perfect for what we know (talking of MT-like generators). But the better you know the assumptions and the risks, the better one can hedge them. E.g. in the random initialization case I would suggest using a source of real random numbers for seeding, instead of an LCG, or atleast use a nonlinear generator (mixing two linear generators is way too risky).QuoteHowever, there is still no guarantee that streams are uncorrelated. There is the well-known danger that by splitting up the cycle, long-range correlations become short-range correlations which are vastly worse. Sure, you might get inter-stream correlations, that's aknowledged in all jump-ahead works afair. But the same holds for leap-frog, *in addition* to other problems. On the other hand random init IS risky for sure, we just dont know how much. Choice is a matter of taste and it all depends on environment and constraints. Moreover there are reasons (related to the way MT works) to believe that long range correlations - if any - would not be significantly different than those just beyond the state size limit. For other PRNGs it's a different story.Anyway we cant compare a surely existing but not measurable risk and a hypotetical but non measurable risk, there's not enough data. I could not really favour any of the two in such uncertainty, so better keep the choice and provide both options.QuoteIf the streams are overlapping frequently enough to affect the results then this will be revealed with proper testing. It is similar to serial random number generation, where you cannot prove that there are no intra-stream correlations (apparently even a suitable mathematical definition of pseudo-randomness is unknown) but must rely on tests. Why should a problem be revealed by testing already? and what is "proper" testing anyway? There's no way to know in advance how much a test guarantees, especially in a "random" setting where 1) test results can vary by definition :-) and 2) simulation outputs are confidence intervals that may well hide a bias even when they're consistent with eachother (and analytical values). We hope that QFCL will be used in a much larger amount of ways and longer times than any testing can compare to!See the Ferrenberg affair, where testing didnt discover any PRNG problems in time and issues came up later only on the field, on an actual application. Besides, if you're worried of long range interactions, then you should be even more so of the quality of MT itself in a serial setting already, which is known to have low diffusion speed and fail various statistical tests!Quote(1) There are even more methods: They recommend parallelization by what they call "parameterization", rather than the 3 cycle division methods above. For many RNGs (MT is not discussed there), the state space naturally divides itself into a number of disjoint cycles. Alternatively, it may be possible to parameterize the iteration function for obtaining the next state from the current one.This is the approach taken e.g. by a variant of MT, which is also implemented in Intel's MKL. It has similar drawbacks to jumping ahead with the classical MT, but might be preferable in some situations. We might code this too, it's just a matter of time and priorities.Quote (2) Application based tests as opposed to statistical tests (I don't think this was mentioned in this thread).Sure, the more testing QFCL gets the better! It is also more practical for us even though in various ways not as powerful as modern RNG test batteries.
Last edited by quartz on February 9th, 2012, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Parallel RNG and distributed MC

February 10th, 2012, 7:51 pm

QuoteOriginally posted by: outrunquick note: I got CUDA SDK working! It was a crappy install process I must say....My 100 Euro card can do (I ran provided benchmark examples)* a binomial tree option pricing with 2.000 time step: 3.000x per sec* 1.2 billion MC sampels / sec. * 1 billion B&S formula evaluations / sec* pricing an option with 250.000 MC paths: 25.000x per secWhat is the speedup?
 
User avatar
Cuchulainn
Posts: 62391
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Parallel RNG and distributed MC

February 11th, 2012, 9:47 am

QuoteIf I use the 2nd bullet to compare, then the 100 Euro GPU does 60x more than the dual quad core server. The large server has slots and power for 4 fermi cards, scaling things another factor 40 up. My current card has 96 cores, the largest cards have 1.000 cores each. I am trying to compute effciency hereE = T(1)/(p * T(p))where T(p) is computing time on p processors (cores???). E in the closed range [0,1].Quotedual quad core server. T(4) or T(8)? What is T(1) (sequential)
Last edited by Cuchulainn on February 10th, 2012, 11:00 pm, edited 1 time in total.
ABOUT WILMOTT

PW by JB

Wilmott.com has been "Serving the Quantitative Finance Community" since 2001. Continued...


Twitter LinkedIn Instagram

JOBS BOARD

JOBS BOARD

Looking for a quant job, risk, algo trading,...? Browse jobs here...


GZIP: On