SERVING THE QUANTITATIVE FINANCE COMMUNITY

 
User avatar
quartz
Posts: 424
Joined: June 28th, 2005, 12:33 pm

Parallel RNG and distributed MC

January 29th, 2012, 1:30 pm

Cool, this Random123 seems really nice. Good tip, outrun!

Quote (originally posted by outrun):
From Random123: "They all satisfy rigorous statistical testing (passing BigCrush in TestU01), vectorize and parallelize well (each generator can produce at least 2^64 independent streams), have long periods (the period of each stream is at least 2^128), require little or no memory or state, and have excellent performance (a few clock cycles per byte of random output)." The other test suite is DieHarder.

Is there still interest in a parallelization of MT? The advantage would be not requiring third-party libs, but the quality of a crypto PRNG should be higher. Maybe providing both is the best route...
 
User avatar
Cuchulainn
Posts: 62067
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Parallel RNG and distributed MC

January 29th, 2012, 1:40 pm

Quote (originally posted by quartz):
Is there still interest in a parallelization of MT? The advantage would be not requiring third-party libs, but the quality of a crypto PRNG should be higher. Maybe providing both is the best route...

Yes, both. For the reasons given.
Last edited by Cuchulainn on January 28th, 2012, 11:00 pm, edited 1 time in total.
 
User avatar
Alan
Posts: 10163
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Parallel RNG and distributed MC

January 29th, 2012, 2:12 pm

Quote (originally posted by Cuchulainn):
Alan, what about a more high-level interface to the GPU, i.e. Accelerator? F# Black-Scholes running on GPUs and SSE3 Multicore Processors using Accelerator. Edit: if you were running the code from C/C++, then a C++/CLI wrapper can be made around F#. Granted, an extra level of indirection, but probably still a massive speedup. http://msdn.microsoft.com/en-us/magazin ... 81.aspx

How would it work for Monte Carlos? (I looked briefly and saw no mention of that.)
 
User avatar
Alan
Posts: 10163
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Parallel RNG and distributed MC

January 29th, 2012, 2:22 pm

Quote (originally posted by outrun):
1 thread, 100M samples takes 7 sec; 4 threads, 400M samples total take 14 sec... so I have 2 cores in my machine.

Naive question: suppose you create the threads without Boost. Same result?
 
User avatar
Cuchulainn
Posts: 62067
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Parallel RNG and distributed MC

January 29th, 2012, 2:36 pm

Quote (originally posted by Alan):
Quote (originally posted by Cuchulainn):
Alan, what about a more high-level interface to the GPU, i.e. Accelerator? F# Black-Scholes running on GPUs and SSE3 Multicore Processors using Accelerator. Edit: if you were running the code from C/C++, then a C++/CLI wrapper can be made around F#. Granted, an extra level of indirection, but probably still a massive speedup. http://msdn.microsoft.com/en-us/magazin ... 81.aspx

How would it work for Monte Carlos? (I looked briefly and saw no mention of that.)

Well: 1) the code hides the GPU-specific stuff for you; 2) write C/C++ code; and 3) wrap it in C++/CLI.
Last edited by Cuchulainn on January 28th, 2012, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Posts: 62067
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Parallel RNG and distributed MC

January 29th, 2012, 2:40 pm

One scenario that is needed is for multiple 'simulator' threads to use ONE RNG. Since Random123 is thread-safe and all the rest, this should be fine, yes?
Last edited by Cuchulainn on January 28th, 2012, 11:00 pm, edited 1 time in total.
 
User avatar
Alan
Posts: 10163
Joined: December 19th, 2001, 4:01 am
Location: California
Contact:

Parallel RNG and distributed MC

January 29th, 2012, 4:21 pm

Quote (originally posted by outrun):
Quote (originally posted by Alan):
Quote (originally posted by outrun):
1 thread, 100M samples takes 7 sec; 4 threads, 400M samples total take 14 sec... so I have 2 cores in my machine.

Naive question: suppose you create the threads without Boost. Same result?

Yes, the same result. Boost merely unifies the different platform-specific thread code into a uniform coding standard.

Thanks. Another naive question: I see workstations advertised with "dual 12-core". So your posted code (with, say, 24 threads) would allocate to all 24 hardware cores and run approximately 24 times faster than the single-threaded version? (Assuming the cores weren't busy with other processes.)
Last edited by Alan on January 28th, 2012, 11:00 pm, edited 1 time in total.
 
User avatar
Cuchulainn
Posts: 62067
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

Parallel RNG and distributed MC

January 29th, 2012, 6:16 pm

That is the theoretical (linear) speedup. In practice you then have Amdahl's law, etc.; the achievable speedup depends on the type of application.
Last edited by Cuchulainn on January 28th, 2012, 11:00 pm, edited 1 time in total.
 
User avatar
AVt
Posts: 1074
Joined: December 29th, 2001, 8:23 pm

Parallel RNG and distributed MC

January 29th, 2012, 7:32 pm

I always wonder whether one needs that, and about the statistical properties of the union over those samples (are they really that independent?). Even on my oldish PC (2.2 GHz AMD with 32-bit Win, 1 Mb memory), Marsaglia's Ziggurat needs only ~1.3 sec for 50*10^6 samples into one array, which I interface to Excel for access (beyond that I run out of memory); creating the array needs ~0.3 sec (rough measures using 'GetTickCount'), and that is just with classical C code, though it is for normal variates only.

The main consumption would be to *process* the samples, no? Or is that too naive?