Quote:
Ah, it (the Wallace, 1996 one) is discussed in this GPU Gems link I've posted just before, together with some trade-offs compared to B-M when used on GPUs. Still, perhaps the 2010 article you've encountered offers some new insights on the implementation?

Wallace implicitly includes an underlying uniform generator, so you cannot plug in your favourite, which impedes QMC too (and parallelization is kind of a mystery). Moreover, it is known to have poor statistical quality (samples are quite dependent by construction). Hey, you have to pay something for that extreme speed.
(Oh well, in the end it WAS blazingly fast at the time, when memory transfers were fast compared to calculations; nowadays the situation has changed, except maybe on vector architectures. Also remember that speed tests which serially generate a lot of random samples, and nothing else, mask the effect those samples have in polluting cache memory, which will slow down any real-world calculation.)

I would expand on Polter's suggestion: have both BM and a fast inverse cumulative (such as Acklam's) for GPU and/or QMC, and Ziggurat for fast CPU MC.
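
To make the "plug in your favourite uniform generator" point concrete, here is a rough Python sketch, not anyone's actual library code. The function names (box_muller, inverse_cumulative, van_der_corput) are made up for illustration, and the standard library's NormalDist.inv_cdf is used only as a stand-in for Acklam's approximation. Both transforms take uniforms as plain inputs, so the same code can be fed by a pseudo-random generator (MC) or by a low-discrepancy sequence (QMC), which is exactly what Wallace does not allow.

```python
import math
import random
from statistics import NormalDist


def box_muller(u1, u2):
    """Map two independent uniforms (u1 in (0, 1], u2 in [0, 1))
    to two independent standard normals."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


def inverse_cumulative(u):
    """Map one uniform in (0, 1) to one standard normal.
    Assumption: stdlib inverse CDF used as a stand-in for Acklam's."""
    return NormalDist().inv_cdf(u)


def van_der_corput(n, base=2):
    """n-th point (n >= 1) of the van der Corput low-discrepancy sequence."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x


# MC: any pseudo-random uniform source will do; here Python's own generator.
rng = random.Random(42)
mc_samples = []
for _ in range(5000):
    # 1 - random() lies in (0, 1], which keeps log() away from zero.
    z0, z1 = box_muller(1.0 - rng.random(), rng.random())
    mc_samples.extend((z0, z1))

# QMC: one uniform in, one normal out, so the sequence structure is preserved.
qmc_samples = [inverse_cumulative(van_der_corput(i)) for i in range(1, 1024)]
```

The inverse cumulative is the natural partner for QMC because it is a monotone one-to-one map of each coordinate; Box-Muller mixes two coordinates into each output, which is fine for pseudo-random input but changes how a low-discrepancy point set is used.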