February 5th, 2009, 8:23 am
My sense is that in mainstream multicore programming for C/C++, it's currently a 4-horse race: Intel's TBB, OpenMP, Cilk++, and - soon - Microsoft's Concurrency Runtime platform.

Each has pros and cons. For example, a library approach such as TBB works best for applications that have relatively distinct, long-running functions that don't interact much with the rest of the app; it may be less intrusive; and it works best at the "leaves" of the computation. On the other hand, a language extension such as Cilk++ has cleaner, simpler syntax; typically requires less restructuring of legacy code; works well for unbalanced problems; and makes race detection easier.

Regarding why one might choose Cilk++ over CUDA (caveat - I'm not an expert on CUDA): CUDA targets GPUs, arguably a specialty processor; doesn't support recursion; and is limited to SIMD (Single Instruction, Multiple Data) execution.

So why use Cilk++? The considerations boil down to the combination of ease of use, performance, and reliability. Cilk++ maintains your serial semantics, so an app doesn't need to be substantially restructured; it has a provably good scheduler that dynamically load balances, has low overhead on a single core, and scales linearly; it has a race detector that is guaranteed to find races that could arise due to any scheduling of the threads; and it offers a solution to the global variable problem that eliminates data races without having to resort to locks (which tend to destroy parallelism).

[More info is available on our site, or I'd be glad to answer by email at ilya[at]cilk.com - I don't mean to turn a post into a product pitch.]
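To make the point about shared variables and locks a bit more concrete, here is a minimal sketch in standard C++11. It is not Cilk++ syntax and not Cilk++'s own mechanism. It shows one general way to avoid locking a shared accumulator: each task sums into its own private partial result, and the partials are combined once at the end. The name `parallel_sum` and its `tasks` parameter are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not a Cilk++ or TBB API): sums `data` using up to
// `tasks` concurrent workers. Each worker accumulates into its own private
// partial sum, so no shared variable is written concurrently and no lock
// is needed.
long long parallel_sum(const std::vector<int>& data, std::size_t tasks) {
    assert(tasks > 0);
    std::vector<std::future<long long>> parts;

    // Ceiling division so that every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + tasks - 1) / tasks;

    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);  // private partial sum
        }));
    }

    // Combine the partial sums serially, in chunk order.
    long long total = 0;
    for (auto& part : parts) {
        total += part.get();
    }
    return total;
}
```

Calling `parallel_sum(values, 4)` returns the same total as a plain serial loop over `values`, because the only cross-task step is the final combine. The trade-off in this sketch is that the chunking is static, so it does not load balance the way a dynamic scheduler would.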