Multithreading tool

imirman · February 4th, 2009, 7:32 pm

I am looking for a compute-intensive C/C++ algorithm to multithread using the recently released Cilk++ programming platform (targeted at multicore programming, in particular bringing serial algorithms into the multicore realm). Any advice, or interest in playing around with the (free) download? Thanks in advance.

vgoklani · February 5th, 2009, 12:20 am

Why choose this framework over another? What about CUDA?

imirman · February 5th, 2009, 8:23 am

My sense is that in mainstream multicore programming for C/C++, it's currently a 4-horse race: Intel's TBB, OpenMP, Cilk++, and, soon, Microsoft's Concurrency Runtime platform.

Each has pros and cons. For example, a library approach such as TBB works best for applications that have relatively distinct, long-running functions that don't interact much with the rest of the app; may be less intrusive; and works best at the "leaves" of the computation. On the other hand, a language extension such as Cilk++ has cleaner, simpler syntax; typically requires less restructuring of legacy code; works well for unbalanced problems; and makes race detection easier.

Regarding why one might choose Cilk++ over CUDA (caveat: I am not an expert on CUDA): CUDA targets GPUs, arguably a specialty processor; it doesn't support recursion; and it is limited to SIMD (Single Instruction, Multiple Data) execution.

So why use Cilk++? The considerations boil down to the combination of ease of use, performance, and reliability. Cilk++ maintains your serial semantics, so an app doesn't need to be substantially restructured. It has a provably good scheduler that dynamically load balances, has low overhead on a single core, and scales linearly. It has a race detector that is guaranteed to find races that could arise due to any scheduling of the threads. And it has a solution to the global variable problem that eliminates data races without having to resort to locks (which tend to destroy parallelism).

[More info is available on our site, or I'd be glad to answer by email at ilya[at]cilk.com. I don't mean to turn a post into a product pitch.]
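To make the "maintains your serial semantics" point concrete, here is a minimal fork-join sketch. It is not Cilk++ code: it uses standard C++11 `std::async` as a stand-in, where Cilk++ would use its `cilk_spawn` and `cilk_sync` keywords and its own scheduler. The function names and the `depth` cutoff are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <future>

// Serial reference version: this defines the expected result.
std::uint64_t fib_serial(unsigned n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join version with the same recursive structure as the serial code.
// `depth` (an illustrative parameter) limits how many recursion levels
// start a new task; below that the plain serial function is used, so the
// number of threads stays small.
std::uint64_t fib_parallel(unsigned n, unsigned depth = 3) {
    assert(n <= 93);  // fib(94) no longer fits in 64 bits
    if (n < 2) return n;
    if (depth == 0) return fib_serial(n);

    // "Spawn": run one branch as a separate task.
    auto left = std::async(std::launch::async,
                           [n, depth] { return fib_parallel(n - 1, depth - 1); });
    // The other branch runs in the current thread.
    const std::uint64_t right = fib_parallel(n - 2, depth - 1);
    // "Sync": wait for the spawned branch before combining the results.
    return left.get() + right;
}
```

Removing the `std::async` wrapper gives back the serial program, which is the property described above. Note that this sketch divides work statically by recursion depth; it does not do the dynamic load balancing that the post attributes to the Cilk++ scheduler.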

Cuchulainn · February 5th, 2009, 9:32 am

What is needed, in my opinion, is an accepted interface standard that hides all these hardware dependencies. One such effort is OpenCL.

bojan · February 5th, 2009, 11:40 am

In terms of multi-core processing models, one should not forget plain old Unix processes connected with sockets. For some things in life, this is quite enough... If you need to value 1,000 different contracts that have to be valued independently anyway, a very simple server/client setup with sockets is all you need.

I think Ilya is after some concrete code so he can try to multithread it and see how well Cilk++ does? The big one to try is QuantLib, http://quantlib.org/index.shtml

Cuchulainn · February 5th, 2009, 12:37 pm

Quote, originally posted by bojan:
"In terms of multi-core processing models, one should not forget plain old Unix processes connected with sockets. For some things in life, this is quite enough... If you need to value 1,000 different contracts that have to be valued independently anyway, a very simple server/client setup with sockets is all you need. I think Ilya is after some concrete code so he can try to multithread it and see how well Cilk++ does? The big one to try is QuantLib, http://quantlib.org/index.shtml"

I hear what Ilya is saying. And indeed we need to distinguish between coarse and fine parallelism.

In my experience, porting serial C++ code to parallel is very time-consuming and error-prone, especially if the code was designed bottom-up, which I expect is true of a large percentage of applications. There are so many optimisations that should take place before we even consider MT. So, the choices are:

1. Incremental parallelism (loop-level at best?)
2. Do a parallel design and implementation up-front

Option 1 is high-risk and may even give a speedup < 1 compared to the serial version. It's scary (e.g. silent data corruption).

graam · February 6th, 2009, 5:54 am

How is this done on the Java platform?

imirman · February 6th, 2009, 1:12 pm

Thanks guys for the dialogue.

Regarding QuantLib: is there a particular function and input data set that would be interesting? I ask because we're algorithms and parallel computing guys, but are novices with respect to what hard-core quants care about.

Thanks!
Ilya

Cuchulainn · February 8th, 2009, 10:44 am

In some applications (small data sets, not so much computation) the serial version is faster than the MT version. So, if the current response time is good enough, don't speed it up. But applications that do not scale are candidates for MT. Another scenario: there is no point parallelising an existing application in its current form, and there would be many reasons for this conclusion...

Ilya,

Good candidates, imo:

PRNG
MC
Parallel PDE
Parallel linear algebra (and other libraries)
Optimisation

I think there is a need for a company to provide these libraries; the alternative is that each developer (and the vast majority do not have any experience of MT) has to do it themselves. This means less time for their own core activities (quant models). Ideally, the API should be h/w-independent.

hth
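As an illustration of the PRNG and MC candidates listed above, here is a minimal sketch of a multithreaded Monte Carlo estimate (of pi, as a neutral stand-in for a pricing problem) in which each thread owns its own generator. The function name, the seeding scheme `base_seed + t`, and the default seed are assumptions for illustration; seeding with consecutive integers is simple but does not by itself guarantee statistically independent streams, which is part of why a vetted parallel PRNG library would be valuable.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Estimate pi by sampling points in the unit square and counting those
// that fall inside the quarter circle. Each thread has a private PRNG,
// so no generator state is shared between threads.
double estimate_pi(unsigned num_threads, std::uint64_t samples_per_thread,
                   std::uint64_t base_seed = 42) {
    assert(num_threads > 0 && samples_per_thread > 0);
    std::vector<std::uint64_t> hits(num_threads, 0);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&hits, t, samples_per_thread, base_seed] {
            std::mt19937_64 rng(base_seed + t);  // illustrative seeding only
            std::uniform_real_distribution<double> unit(0.0, 1.0);
            std::uint64_t local = 0;
            for (std::uint64_t i = 0; i < samples_per_thread; ++i) {
                const double x = unit(rng);
                const double y = unit(rng);
                if (x * x + y * y <= 1.0) ++local;
            }
            hits[t] = local;  // one write per thread to its own slot, no lock
        });
    }
    for (auto& w : workers) w.join();

    std::uint64_t total = 0;
    for (auto h : hits) total += h;
    return 4.0 * total / (num_threads * samples_per_thread);
}
```

For example, `estimate_pi(4, 200000)` runs four workers with 200,000 samples each. Accumulating into a local counter and writing once at the end avoids both locks and repeated writes to neighbouring array slots.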

MP3HiFi · February 13th, 2009, 4:21 pm

Hi,

I think the performance depends on the interaction between the threads. Sometimes the overhead of multithreading is much higher than the benefit. It is easier if the work can be split into independent parts.

Example: searching in a file. With two threads, the file can be split so that each thread searches its own part. But if the file has only 4 lines, creating the threads costs more than searching serially.

Bye
Martin
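The file-search example can be sketched as follows, assuming the file has already been read into a vector of lines. The function names and the `serial_threshold` default of 10,000 lines are hypothetical; the right cutoff depends on the machine and the cost per line and would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Count the lines in [begin, end) that contain `needle`.
std::size_t count_matches(const std::vector<std::string>& lines,
                          std::size_t begin, std::size_t end,
                          const std::string& needle) {
    assert(begin <= end && end <= lines.size());
    std::size_t count = 0;
    for (std::size_t i = begin; i < end; ++i) {
        if (lines[i].find(needle) != std::string::npos) ++count;
    }
    return count;
}

// Search with two threads only when the input is large enough that the
// thread start-up cost is likely to pay off; otherwise search serially.
std::size_t parallel_count(const std::vector<std::string>& lines,
                           const std::string& needle,
                           std::size_t serial_threshold = 10000) {
    if (lines.size() < serial_threshold) {
        return count_matches(lines, 0, lines.size(), needle);
    }
    const std::size_t mid = lines.size() / 2;
    std::size_t second_half = 0;
    // The worker searches the second half while this thread does the first.
    std::thread worker([&] {
        second_half = count_matches(lines, mid, lines.size(), needle);
    });
    const std::size_t first_half = count_matches(lines, 0, mid, needle);
    worker.join();  // wait before reading the worker's result
    return first_half + second_half;
}
```

With a 4-line input the function never creates a thread, which is exactly the case described above. The two halves share no mutable state, so no locking is needed.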