
 
User avatar
Bon
Topic Author
Posts: 0
Joined: May 24th, 2006, 9:12 am

parallel programming for Matrix class

August 7th, 2007, 5:07 am

If I'm dealing with very large matrices, what kinds of common operations could reasonably be implemented with parallel programming? In that regard, how does the performance of QR factorization compare to that of SVD in computing the inverse of a matrix?
 
User avatar
stali
Posts: 0
Joined: January 10th, 2006, 12:40 am

parallel programming for Matrix class

August 7th, 2007, 10:17 am

Just use ScaLAPACK. It is trivial to use and has routines for SVD as well as QR factorization. Btw, what sort of problem (in CF) gives you an extremely large dense matrix? Just curious.

Quote:
what kinds of common operations could reasonably be implemented with parallel programming?

BLAS operations (look here).

Edit: Most vendors already provide ScaLAPACK libraries (for example Intel's MKL, Sun's S3L, etc.). If there is none on your machine, you can install one yourself from source. It's really trivial.
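For instance, here is a minimal sketch of the kind of BLAS 3 call such a library parallelizes for you (the matrix size and contents below are just placeholders, not from any particular problem):

Code:
! A toy BLAS 3 call. Linked against a threaded vendor BLAS (e.g. MKL)
! with OMP_NUM_THREADS > 1, this single DGEMM call runs in parallel.
program gemm_toy
  implicit none
  integer, parameter :: n = 2000
  double precision, allocatable :: a(:,:), b(:,:), c(:,:)

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)
  c = 0.0d0

  ! C := 1.0*A*B + 0.0*C
  call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)

  print *, 'C(1,1) =', c(1,1)
end program gemm_toy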
Last edited by stali on August 6th, 2007, 10:00 pm, edited 1 time in total.
 
User avatar
dmaniyar
Posts: 0
Joined: October 16th, 2006, 9:34 pm

parallel programming for Matrix class

August 7th, 2007, 11:19 am

If you want a library which can do this with a very good design and MATLAB-like syntax, look into IT++ (http://itpp.sourceforge.net/). It uses BLAS, LAPACK, and ATLAS routines, though the parallel programming part is still not that mature.
 
User avatar
stali
Posts: 0
Joined: January 10th, 2006, 12:40 am

parallel programming for Matrix class

August 8th, 2007, 12:01 am

I just wanted to make a quick note (which you probably know) that ScaLAPACK is best used on a distributed-memory machine. If you want parallelism on a shared-memory machine (dual/quad core/proc etc.), all you need to do is use the vendor-supplied LAPACK (e.g. Intel's MKL) and set the OMP_NUM_THREADS variable to 2 or more before running the code. The multi-threaded BLAS 3 will automatically parallelize parts of it.

Attached is a toy example (in Fortran) which I just ran on a quad-proc machine. Here are the results (for a matrix of size N=2500, double precision):

with OMP_NUM_THREADS = 1
SVD Time = 380.650000000000

with OMP_NUM_THREADS = 4
SVD Time = 181.630000000000

Please note that other people were logged on to the machine, so the speedup might be skewed.
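A minimal sketch along those lines (not the exact attachment; it assumes the standard LAPACK driver DGESVD and wall-clock timing via SYSTEM_CLOCK):

Code:
! Time a full SVD of a random N x N matrix. The BLAS 3 kernels inside
! DGESVD are what the threaded BLAS parallelizes via OMP_NUM_THREADS.
program svd_timing
  implicit none
  integer, parameter :: n = 2500
  double precision, allocatable :: a(:,:), s(:), u(:,:), vt(:,:), work(:)
  double precision :: wq(1)
  integer :: lwork, info, t0, t1, rate

  allocate(a(n,n), s(n), u(n,n), vt(n,n))
  call random_number(a)

  ! Workspace query: LWORK = -1 returns the optimal workspace size in wq(1).
  call dgesvd('A', 'A', n, n, a, n, s, u, n, vt, n, wq, -1, info)
  lwork = int(wq(1))
  allocate(work(lwork))

  ! Wall-clock timing (CPU_TIME would sum the time over all threads).
  call system_clock(t0, rate)
  call dgesvd('A', 'A', n, n, a, n, s, u, n, vt, n, work, lwork, info)
  call system_clock(t1)

  print *, 'INFO =', info
  print *, 'SVD Time =', dble(t1 - t0) / dble(rate)
end program svd_timing

Link against the vendor's LAPACK/BLAS (the exact flags vary by platform), then vary OMP_NUM_THREADS in the shell before each run.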