Serving the Quantitative Finance Community

Search found 8 matches

by hasmanean
December 1st, 2010, 9:13 am
Forum: Programming and Software Forum
Topic: A short test on the code efficiency of CUDA and thrust
Replies: 47
Views: 41987

A short test on the code efficiency of CUDA and thrust

Quote (originally posted by wwmch): Quote (originally posted by renorm): The language of CUDA is only a modified C (some features of C++ are also supported). It is not difficult if you are familiar with C/C++. However, if we only translate the C/C++ code to CUDA, the performance won't be good at all. The...
by hasmanean
December 1st, 2010, 8:47 am
Forum: Programming and Software Forum
Topic: A short test on the code efficiency of CUDA and thrust
Replies: 47
Views: 41987

A short test on the code efficiency of CUDA and thrust

It's nice that thrust has such low overhead on the GPU, although the numbers for thrust::host_vector on the CPU look a little slow. Was the library compiled from source and optimized for your particular processor, or did it use a precompiled binary? Maybe if you turned on all the right optimizati...
by hasmanean
November 20th, 2010, 11:07 am
Forum: Programming and Software Forum
Topic: GPU vs SIMD
Replies: 62
Views: 41298

GPU vs SIMD

Well, massive threading (massively parallel threading) plays a role in the performance analysis. With CUDA, even if 1% of the threads branch and 99% do not (and so finish their task quickly), threads are not run individually (like on a CPU); rather, up to 32 threads run at a time (known as a "...
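
To put a rough number on the 1% case above, here is a minimal sketch. It assumes each thread branches independently with probability p (an assumption made for illustration, not something the post states) and uses the 32-thread group size mentioned above. The function name `divergent_warp_fraction` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected fraction of 32-thread groups that contain at least one
// branching thread, ASSUMING threads branch independently with
// probability p. Real workloads are often correlated, so treat this
// as a back-of-the-envelope estimate only.
double divergent_warp_fraction(double p, int warp_size = 32) {
    assert(p >= 0.0 && p <= 1.0);
    assert(warp_size > 0);
    // P(no thread in the group branches) = (1 - p)^warp_size
    return 1.0 - std::pow(1.0 - p, warp_size);
}
```

Under that independence assumption, `divergent_warp_fraction(0.01)` comes out at roughly 0.275, so a 1% per-thread branch rate would touch more than a quarter of the 32-thread groups.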
by hasmanean
November 20th, 2010, 1:43 am
Forum: Programming and Software Forum
Topic: GPU vs SIMD
Replies: 62
Views: 41298

GPU vs SIMD

IIRC, CUDA originally did not support if statements inside GPU kernels; this was added later, in version 2. As long as you access the data sequentially, and as long as there is enough register space available, there is enough leeway in how you process it to not affect performance to...
by hasmanean
November 19th, 2010, 2:10 pm
Forum: Programming and Software Forum
Topic: GPU vs SIMD
Replies: 62
Views: 41298

GPU vs SIMD

The two-pass algorithm works better, provided that the result of processing "illegal" inputs (which the branch identifies) does not affect the output. GPUs really only care that memory be accessed in a linear, sequential fashion, so to do that there are 128-bit-wide datatypes defined lik...
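
One way to read the two-pass idea is sketched below as plain CPU-side C++. The details are assumptions for illustration: the function name `two_pass_sqrt`, the choice of square root as the operation, and the rule that negative inputs are "illegal" and should yield 0 are all hypothetical and do not come from the thread.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical example: negative inputs are treated as "illegal" and
// should produce 0 in the output.
std::vector<float> two_pass_sqrt(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    // Pass 1: branch-free linear sweep over every element. Illegal
    // inputs yield NaN here, which is harmless because pass 2
    // overwrites exactly those slots.
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = std::sqrt(in[i]);
    }
    // Pass 2: second linear sweep that patches the illegal positions.
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] < 0.0f) {
            out[i] = 0.0f;
        }
    }
    return out;
}
```

For example, `two_pass_sqrt({4.0f, -1.0f, 9.0f})` gives `{2, 0, 3}`. The condition from the post still applies: this only works if computing on the illegal inputs in pass 1 has no side effect on the rest of the output.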
by hasmanean
November 19th, 2010, 10:54 am
Forum: Programming and Software Forum
Topic: GPU vs SIMD
Replies: 62
Views: 41298

GPU vs SIMD

Small detail: I'm assuming that since the branch conditional is evaluated inside the loop, the data has to be loaded into general-purpose registers in scalar form first and packed into the vector (SSE) registers manually. So for the vectorized case M>1 it should read N*(1/M + P + (1 - 0.99^M)...
by hasmanean
November 14th, 2010, 7:13 pm
Forum: Programming and Software Forum
Topic: How to do fast element-wise array operations in C?
Replies: 3
Views: 28858

How to do fast element-wise array operations in C?

Have you looked at sparse matrices?
by hasmanean
November 12th, 2010, 7:12 pm
Forum: Programming and Software Forum
Topic: GPU vs SIMD
Replies: 62
Views: 41298

GPU vs SIMD

Back in the 1980s, processors were slower than memory, so optimizing meant reducing the number of arithmetic operations you carried out. The bottleneck today is memory access latency, so the point is to optimize the data access patterns. Both GPUs and Intel processors can do arithmetic operations ...
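
As a small illustration of what "data access pattern" means, the sketch below sums the same row-major matrix in two traversal orders. Both functions do identical arithmetic; only the order of memory accesses differs. The function names are hypothetical, and no timings are claimed here since they depend on the machine and matrix size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row-major in one contiguous buffer.
// The inner loop walks adjacent addresses (stride 1).
double sum_row_order(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same additions, but the inner loop jumps `cols` elements per step,
// so consecutive accesses are far apart in memory.
double sum_column_order(const std::vector<double>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

On large matrices the stride-1 version is typically the faster of the two on cached CPUs, even though the operation count is the same, which is the point about memory rather than arithmetic being the bottleneck.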