Serving the Quantitative Finance Community

 
quartz
Posts: 3
Joined: June 28th, 2005, 12:33 pm

Parallel RNG and distributed MC

February 4th, 2012, 8:59 am

Quote:
ah, it (the Wallace, 1996 one) is discussed in this GPU Gems link I've posted just before :-), together with some trade-offs compared to B-M when used on GPUs. Still, perhaps the 2010 article you've encountered offers some new insights on the implementation?

Wallace implicitly includes an underlying uniform generator, so you cannot plug in your favourite, which impedes QMC too (and parallelization is something of a mystery). Moreover, it is known to have poor statistical quality (samples are quite dependent by construction). Hey, you have to pay something for that extreme speed :-)

(Oh well, in the end it WAS blazingly fast at the time, when memory transfers were fast compared to calculations; nowadays the situation has changed, except maybe on vector architectures. Also remember that speed tests which serially generate a lot of random samples, and nothing else, mask the effect they have in polluting cache memory, which will slow down any real-world calculation.)

I would expand on Polter's suggestion: have both B-M and a fast inverse cumulative (such as Acklam's) for GPU and/or QMC, and Ziggurat for fast CPU MC.
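For reference, the basic trigonometric Box-Muller transform discussed above can be sketched in a few lines of C++ (a minimal illustration, not taken from any of the libraries mentioned; Acklam's inverse-CDF approximation and the Ziggurat are separate algorithms and not shown):

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>

// Basic (trigonometric) Box-Muller: two independent U(0,1) draws map to
// two independent N(0,1) draws. Unlike Wallace's method, any uniform
// source can be plugged in underneath -- including a QMC sequence.
std::pair<double, double> box_muller(double u1, double u2) {
    const double two_pi = 2.0 * std::acos(-1.0);
    // u1 must lie in (0,1]: log(0) would give -inf.
    double r = std::sqrt(-2.0 * std::log(u1));
    return { r * std::cos(two_pi * u2), r * std::sin(two_pi * u2) };
}
```

This form maps well onto GPUs, where the sin/cos pair is cheap relative to memory traffic, which is part of the trade-off against table-based methods discussed in the GPU Gems chapter.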
 
quartz

February 4th, 2012, 9:31 am

Quote, originally posted by Polter:
outrun, have you played with C++ AMP yet? It looks quite interesting, http://www.danielmoth.com/Blog/C-AMP-Op ... tion.aspx

I would like the HW guys to finally decide on what they are doing. PeakStream and RapidMind were trying to develop a common SW standard, but were acquired and more or less killed, be the majors damned. Now NVIDIA has just added another ball to the game: OpenACC. We have OpenCL already, Intel's ArBB, and I have probably forgotten a couple more (yeah, each of them has a different scope), without counting all the small-company solutions. And CUDA isn't so stable from version to version already... The problem is that as long as there's no clear HW standard, neither will there be a SW one.

My humble opinion: the most sensible thing now might be focusing on algorithmic issues, and on minimum standard requirements such as those offered by OpenCL. Sure, you won't get optimal performance, no hardware squeezing (which had been my focus for ages), but time to delivery will be much faster, partially compensating for that (earlier adoption of faster HW), and in particular there is less risk of getting locked in at a dead end.
Last edited by quartz on February 3rd, 2012, 11:00 pm, edited 1 time in total.
 
Cuchulainn
Posts: 20254
Joined: July 16th, 2004, 7:38 am
Location: 20, 000


February 4th, 2012, 10:11 am

Quote, originally posted by quartz:
My humble opinion: the most sensible thing now might be focusing on algorithmic issues, and on minimum standard requirements such as those offered by OpenCL. Sure, you won't get optimal performance, no hardware squeezing (which had been my focus for ages), but time to delivery will be much faster, partially compensating for that (earlier adoption of faster HW), and in particular there is less risk of getting locked in at a dead end.

In that case, we have to avoid spending ages on h/w and s/w tools only to realise that they are not future-proof. Risk-driven software development again (I like the Spiral model).
Last edited by Cuchulainn on February 3rd, 2012, 11:00 pm, edited 1 time in total.
 
Cuchulainn

February 4th, 2012, 12:02 pm

Quote, originally posted by outrun (replying to the exchange above):
Indeed! Hopefully these algos only cost 10 lines of code, and then it's no issue. Let's do it incrementally: first get the GPU up and running with BM. I have norminv code from Graeme West ready.

Great. Unfortunately, Graeme is not around to see his work used. BTW, my own ADE code is also 10 lines; the rest is all 'baggage' but still has to be done. Kind of BB (boring boilerplate).
Last edited by Cuchulainn on February 3rd, 2012, 11:00 pm, edited 1 time in total.
 
MiloRambaldi
Posts: 1
Joined: July 26th, 2010, 12:27 am


February 6th, 2012, 3:51 am

I haven't been following this thread. Apologies in case I am repeating things...

I built a PRNG library at work. The MC methodology is as follows: every path in an MC simulation is assigned its own seed, and the path is generated from this seed by the primary RNG. The seed for each path is generated with a secondary RNG. The library was generic, but MT19937_64 (the 64-bit Mersenne Twister) was used for the primary RNG, and a 64-bit LCG was used as the secondary seed generator. This is fully scalable: the MC simulation will be identical regardless of how many parallel processors are used. The U(0,1) variates generated by this scheme passed all of the tests in L'Ecuyer's BigCrush test suite (i.e. both intra- and inter-path correlations were tested).

The implementation of the MT and the LCG was largely copied from boost. However, one article I read claims that the new 128-bit version of MT gives the best U(0,1) generator for multicore platforms. It takes advantage of a SIMD instruction set (I think; I know little about this), and would be worth implementing.

There is also the following issue with generating N(0,1) and other variates from U(0,1). There are obviously many different methods, and which one is best depends on the application. For example, the quantile (i.e. inverse CDF) method has certain optimal properties for MC simulations, while apparently the ziggurat is the fastest N(0,1) generator for multicore processing. A good design would allow for multiple methods, whereas boost only allows one method for each variate (e.g. boost uses the slow form of BM for generating normal variates from U(0,1)). Even if the client decides on the quantile method, I would assume there are different normal quantile algorithms with different speed/accuracy trade-offs, and thus ideally multiple algorithms should be available for a given method as well.

We also used the Eigen library to make multivariate generators.

Please let me know if anything I have mentioned would be of use to the QFCL project.
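The per-path seeding scheme described above might be sketched roughly like this (a hedged reconstruction from the description, not the actual library code; the names, the structure, and the particular LCG constants are my own choices):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Secondary seed generator: a full-period 64-bit LCG (Knuth's MMIX
// constants here) hands out one seed per path.
struct SeedSequence {
    std::uint64_t state;
    explicit SeedSequence(std::uint64_t master) : state(master) {}
    std::uint64_t next() {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        return state;
    }
};

// Primary generator: each path runs its own mt19937_64 stream, seeded
// from the secondary LCG. The draws for `path_index` depend only on the
// master seed and the index, so the simulation is identical no matter
// how the paths are spread across processors.
std::vector<double> path_uniforms(std::uint64_t master_seed,
                                  std::size_t path_index,
                                  std::size_t n_steps) {
    SeedSequence seeds(master_seed);
    std::uint64_t path_seed = seeds.next();
    for (std::size_t i = 0; i < path_index; ++i) path_seed = seeds.next();
    std::mt19937_64 gen(path_seed);
    std::uniform_real_distribution<double> U(0.0, 1.0);
    std::vector<double> draws(n_steps);
    for (double& u : draws) u = U(gen);
    return draws;
}
```

Any worker can then be handed an arbitrary subset of path indices; reproducibility follows because each path's seed is a pure function of (master seed, path index), never of the scheduling.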
 
Cuchulainn

February 6th, 2012, 8:44 am

This sounds very good. Ideally, a how-to document and code, so that it can be used with other libraries. Does it have a boost-style interface?
 
MiloRambaldi

February 7th, 2012, 12:38 am

Quote, originally posted by Cuchulainn:
Does it have a boost-style interface?

Yes.

[rant] This was completed before my boss realized that templates involve having source code in header files. More recently I received a long lecture that "no one puts code in header files" because "it's ridiculous" and "violates the Microsoft standard", and that it is impossible when there are multiple developers because "there would be too many header files to share". It seems it was not appreciated when I pointed out that the STL is header-only and the "S" stands for "standard". Today I was seriously reprimanded and told I was well on my way to being fired, partly because I continued to use templates despite the lecture (I was able to remove them with a few hours' work). PS. Milo is not my real name. [/rant]

Sorry, I know this isn't the career forum...
 
MiloRambaldi

February 7th, 2012, 1:13 am

Quote, originally posted by Cuchulainn:
This sounds very good. Ideally, a how-to document and code, so that it can be used with other libraries.

Apologies, but I wasn't clear enough. The code and documentation are the property of the IB that I work at. However, there is nothing stopping me from rewriting this from scratch, and it will be better the second time, but of course that means it won't be available right away...
 
Polter
Posts: 1
Joined: April 29th, 2008, 4:55 pm


February 7th, 2012, 2:11 am

Milo, my commiserations! Just to cheer you up (disclaimer: to be treated strictly in a tongue-in-cheek manner, no responsibility for career advice or otherwise assumed, etc.), here ;-)

EDIT: Incidentally, are you allowed to use assembly language? Perhaps you could provide only the compiler-generated assembly code, while continuing to use C++ source code and templates yourself? (The disclaimer applies at least as much as above.)
Last edited by Polter on February 6th, 2012, 11:00 pm, edited 1 time in total.
 
MiloRambaldi

February 7th, 2012, 3:34 am

Quote, originally posted by Polter:
Milo, my commiserations! Just to cheer you up (disclaimer: to be treated strictly in a tongue-in-cheek manner, no responsibility for career advice or otherwise assumed, etc.), here ;-)

"Bizarre nonsensical reasons": very relevant indeed!

Quote:
EDIT: Incidentally, are you allowed to use assembly language? Perhaps you could provide only the compiler-generated assembly code, while continuing to use C++ source code and templates yourself? (The disclaimer applies at least as much as above.)

Hehe. It's worth a shot, but I seriously doubt it. The problem seems to be that my boss is a complete novice programmer and he is trying to force everyone to code just like him (it's not just templates: a large amount of my code was thrown out, and it seems that the problem is that I used some basic design patterns).
 
MiloRambaldi

February 7th, 2012, 3:47 am

Quote, originally posted by outrun:
I see a slight thing with the seed method: the random seed might be replaced with alternative methods that generate guaranteed independent sequences. Quartz is working on that too. There are two methods: jumping ahead in the sequence, or having different internal MT state vectors (there is a paper on that; I think you can find it via Tina's RNG).

I've read several papers on this. I think "jumping ahead" is called the leap-frog method. I don't believe it is as scalable as giving each path its own seed, but it has been a while since I read about this. Also, leap-frog can be difficult to implement, but of course the more methods to choose from, the better. I'm not sure what you meant by "having different internal MT state vectors". The state of MT19937 is 624 32-bit words. The boost code contains Matsumoto's algorithm (and maybe one of his colleagues', I forget) for converting seeds to internal states in such a way as to avoid "bad" internal states, i.e. ones that lead to cycles with bad statistical properties (this was a problem with the original version of MT). I also implemented another algorithm for converting multiple seeds into an internal state.

Quote:
Would it be an idea to merge yours and Quartz's code into svn so that the best of both worlds gets merged?

As per my previous post, I will have to redo this from scratch. We should therefore make some design decisions first...

Quote:
I've just gone through the boost source code, and will post a suggestion on how we work with that. It has both some statistical and performance problems that we need to watch out for. ...

Yes. I think I remember that some pieces of the boost random library had inefficient performance.
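For concreteness, here is a deliberately naive sketch of the leap-frog partitioning mentioned above (my own illustrative code, not from any paper or library; each worker here literally generates and discards the other workers' draws, which is exactly the inefficiency that a true jump-ahead of the generator state avoids):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Leap-frog partitioning of one shared uniform stream: with P workers,
// worker k consumes draws k, k+P, k+2P, ... of the serial sequence.
// Done naively (as here) each worker still advances through everyone
// else's draws, so this only pays off when generation is very cheap;
// real implementations advance the RNG state directly.
std::vector<double> leapfrog_draws(std::uint64_t seed, std::size_t worker,
                                   std::size_t n_workers,
                                   std::size_t n_draws) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> U(0.0, 1.0);
    std::vector<double> out;
    out.reserve(n_draws);
    for (std::size_t i = 0; i < worker; ++i) U(gen); // skip to first draw
    for (std::size_t d = 0; d < n_draws; ++d) {
        out.push_back(U(gen));
        for (std::size_t i = 1; i < n_workers; ++i) U(gen); // discard others'
    }
    return out;
}
```

Interleaving the workers' outputs reproduces the serial sequence exactly; contrast this with the per-path seeding scheme, where each path owns an entire independent stream instead of every P-th draw of one stream.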