
 
dgibsontx
Topic Author
Posts: 2
Joined: March 18th, 2010, 8:30 pm

MATLAB and GPUs thru Jacket and CUDA

March 19th, 2010, 4:22 am

Hello all. First, I will state up front that this post borders on advertising, but I could not help providing visibility to some software that may be very valuable to this community: it does not require programming in C, C++, or CUDA, yet it still delivers the performance improvements available via GPUs. I could not find any post referencing the Jacket software platform for MATLAB from AccelerEyes. Jacket has been referred to as the GPU engine for MATLAB: it enables quants to build algorithms in MATLAB and offload data to the GPU via Jacket's GPU datatypes. Behind the scenes, Jacket provides just-in-time compilation, leveraging CUDA, to get performance out of NVIDIA GPUs (AMD and Intel support will come in the future).

A blog post this week on the AccelerEyes blog outlines some key areas where Jacket is being targeted in finance, and testing just completed at the Stern School of Business at NYU showed 8 to 10x performance improvements, for days of effort, on large multiple regressions. We are confident the quant community can benefit from the platform and are willing to work with the community to assist where needed.

It seems to me that so much power already available in notebooks and desktop computers, in the form of GPUs, should not go unused by a group that is so hungry for performance. Hope we can help, and that I don't get crucified for this post.
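For flavour, here is a minimal sketch of the workflow, assuming Jacket's gdouble cast and its overloaded operators (check the AccelerEyes docs for your Jacket version; the exact names here are illustrative):

    % push a large regression problem to the GPU via Jacket-style datatypes
    X = gdouble(randn(1e5, 20));   % design matrix lives in GPU memory
    y = gdouble(randn(1e5, 1));    % response vector
    beta = (X' * X) \ (X' * y);    % normal equations, JIT-compiled down to CUDA
    beta_host = double(beta);      % copy the 20 coefficients back to the CPU

Only the handful of coefficients cross back over the bus, which is the access pattern that makes the GPU worthwhile.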
 
willsmith
Posts: 281
Joined: January 14th, 2008, 11:59 pm

MATLAB and GPUs thru Jacket and CUDA

March 24th, 2010, 9:29 am

See also www.gp-you.org, which is an open-source version of the same idea. Currently (2010a) there is nothing built into Matlab that knows anything about CUDA. Admittedly the above product is further along than GP-you. For example, I tried to use GP-you for some Monte Carlo. You can do all the simulation on the GPU, but unfortunately you can't yet generate the random numbers (normally distributed or whatever) on the GPU, so you have to generate them on the CPU and send them over. But then the communication overhead dwarfs the gain from doing the matrix calculations on the GPU. And you have to write all the GPU code well vectorised (no, or minimal, 'for' loops) to get the gain from the GPU, meaning your CPU equivalent is already quite well coded. My conclusion is that GPU stuff works well only where you can keep the CPU<->GPU communication much smaller than the problem size.
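To make the round trip concrete, here is a sketch of that workflow using GPUmat's GPUsingle type from gp-you.org (exact names and overloads per their docs; treat this as illustrative):

    r = 0.05; sigma = 0.2; T = 1; S0 = 100; K = 100; n = 1e6;
    Z  = randn(n, 1, 'single');        % the normals must come from the CPU...
    Zg = GPUsingle(Z);                 % ...and cross the bus: this is the overhead
    ST = S0 * exp((r - 0.5*sigma^2)*T + sigma*sqrt(T) .* Zg);  % vectorised, on the GPU
    pay = max(ST - K, 0);              % vectorised payoff, still on the GPU
    price = exp(-r*T) * sum(single(pay)) / n;  % pull back to the CPU and average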
Last edited by willsmith on March 23rd, 2010, 11:00 pm, edited 1 time in total.
 
zeta
Posts: 1973
Joined: September 27th, 2005, 3:25 pm
Location: Houston, TX
Contact:

MATLAB and GPUs thru Jacket and CUDA

March 24th, 2010, 1:08 pm

I've used and would recommend Jacket, although you can hack Matlab/Octave + CUDA yourself; I give details here. I had an interesting discussion at the same blog post with someone who's done this more extensively; their work is here. Interested parties might also want to check out GPULib. BTW willsmith, if you want better RNG, try the Mersenne example in the CUDA SDK. IMHO a good parallel RNG for CUDA in higher dimensions doesn't exist yet.
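For anyone curious about the DIY route, the recipe is roughly: write the kernel in a .cu file, compile it into a MEX function, and call it like any other MATLAB function. A hypothetical sketch (gpu_mt_randn is a made-up name, e.g. for a wrapper around the SDK's Mersenne Twister sample; paths are illustrative):

    % shell: compile the CUDA source into a MEX file
    %   nvcc -c gpu_mt_randn.cu -o gpu_mt_randn.o
    %   mex gpu_mt_randn.o -L/usr/local/cuda/lib64 -lcudart
    n = 1e6; seed = 1234;
    Z = gpu_mt_randn(n, seed);   % n normals generated on the GPU, returned to MATLAB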
 
Cuchulainn
Posts: 62626
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

MATLAB and GPUs thru Jacket and CUDA

March 24th, 2010, 1:45 pm

Interesting conclusions.

Quote:
My conclusion is that GPU stuff works well only where you can keep the CPU<->GPU communication much smaller than the problem size.

Or, alternatively, for small problems. I suppose a 4-factor PDE is out of the question.
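A rough way to see where that line falls, again with GPUmat-style GPUsingle (illustrative; converting back with single() makes the GPU work finish before toc):

    A = rand(4000, 'single');
    tic; Ag = GPUsingle(A);   t_transfer = toc;          % host -> device: O(n^2) data
    tic; Bg = Ag * Ag; B = single(Bg); t_compute = toc;  % device multiply: O(n^3) flops

A dense multiply amortises the O(n^2) transfer over O(n^3) flops; a 4-factor PDE stepper that ships the whole grid back every time step gets no such amortisation.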
Last edited by Cuchulainn on March 23rd, 2010, 11:00 pm, edited 1 time in total.
Step over the gap, not into it. Watch the space between platform and train.
http://www.datasimfinancial.com
http://www.datasim.nl
 
zeta
Posts: 1973
Joined: September 27th, 2005, 3:25 pm
Location: Houston, TX
Contact:

MATLAB and GPUs thru Jacket and CUDA

March 24th, 2010, 2:05 pm

Au contraire! Just don't be transferring data too much, unless it's asynchronous and hidden behind the cost of calculation. I'll get to my homework assignment in the other thread eventually.
 
Cuchulainn
Posts: 62626
Joined: July 16th, 2004, 7:38 am
Location: Amsterdam
Contact:

MATLAB and GPUs thru Jacket and CUDA

March 24th, 2010, 4:08 pm

Quote:
Just don't be transferring data too much, unless it's asynchronous and hidden behind the cost of calculation.

This is not a realistic scenario for me. My problem is a data-intensive/blackboard one, not an event-driven one, unfortunately.
Last edited by Cuchulainn on March 23rd, 2010, 11:00 pm, edited 1 time in total.
Step over the gap, not into it. Watch the space between platform and train.
http://www.datasimfinancial.com
http://www.datasim.nl
 
zeta
Posts: 1973
Joined: September 27th, 2005, 3:25 pm
Location: Houston, TX
Contact:

MATLAB and GPUs thru Jacket and CUDA

March 24th, 2010, 4:34 pm

No problem; then we take option two, async: http://xcr.cenit.latech.edu/hapc/termpr ... treams.pdf
 
hamster
Posts: 216
Joined: October 12th, 2008, 3:51 pm

MATLAB and GPUs thru Jacket and CUDA

August 3rd, 2010, 2:05 pm

GP-you uses Matlab OOP classes. However, the GPUdouble and GPUsingle objects behave like objects, not like common Matlab data types. For example, if you manipulate a GP-you object within a Matlab function (without intending the data to be changed outside the function), the data is changed inside GPU memory, and it is changed permanently. The whole memory-allocation model of GP-you does not conform to how you write programs in Matlab.

Second, it is not really fast to use indexing with the GP-you data types, e.g. x(:,1); you need to use special "low-level" functions such as assign instead. Whenever you use indexing such as x(:,1), you cause data transfers between GPU and CPU, which is very time-consuming. But what is the point of special functions for trivial Matlab stuff?

Third, if you want the Matlab program to be a little bit faster, you need to compile a MEX file, and the GPU compiler has a huge problem: if you compile a for-loop you must use GPUfor instead of for, but the number of iterations is then fixed. If you want your program to be flexible, e.g. GPUfor n=1:N, that is not possible, because N is frozen at the value used during compilation. So you need to recompile over and over, every time you want a different number of loop iterations. The maturity of the GP-you interface is less than beta.
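To illustrate the first point, a sketch of the by-reference surprise (GPUmat names per the gp-you docs; illustrative only). Suppose scaleFirst.m contains:

    function scaleFirst(g)
    g(1) = g(1) * 2;   % with a native double this edit stays local (copy-on-write)
    end

Then at the prompt:

    A = ones(10, 1);
    scaleFirst(A);                  % native array: caller's A(1) is still 1
    Ag = GPUsingle(ones(10, 1));
    scaleFirst(Ag);                 % GP-you object: the GPU buffer itself is
                                    % modified, so Ag(1) is 2 after the call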