Hello all,I am doing some research on the PCG algorithm, because I have been tasked with optimizing one at work. I was able to do this on the serial version by storing the sparse matrix in CSR which greatly sped up the matrix vector multiplication and was great for the serial version of this algorithm.However, it needs to be parallelized. I am having a bit of difficulty with this. Does anyone have any links to a good discussion of the PCG algorithm's parts, as well as what needs to be shared when doing it across multiple processors? I basically have my x0 defined as my initial mesh for this particle in cell problem. I am thinking i dont want to mess with CSR for the parallel version because the accounting and packing/unpacking could get nasty.Also as a side note - it seems every single reference i find on the internet has a different set of vars defined for the PCG algorithm, and uses a different pre-conditioner. And none of them seem to use the same preconditioner this linear solver I am working on uses (basically a multiply through by the original diagonal best I can tell). Does the preconditioner part happen only at the beggining of the iterations, before they start, or sporadically throughout if the solver is starting to get stuck or some such thing?ThanksDerek

Get yourself a copy of GreenbaumQuoteOriginally posted by: Dcole[...]However, it needs to be parallelized. I am having a bit of difficulty with this. Does anyone have any links to a good discussion of the PCG algorithm's parts, as well as what needs to be shared when doing it across multiple processors? I basically have my x0 defined as my initial mesh for this particle in cell problem. I am thinking i dont want to mess with CSR for the parallel version because the accounting and packing/unpacking could get nasty.[...]CG comprises simple vector operations (dot products, addition etc) and matrix-vector products. The vector operations easily parallelised. The matrix-vector products require more thought but can be well parallelised. I presume you will be able to find some reference implementations with google.QuoteOriginally posted by: Dcole[...]Also as a side note - it seems every single reference i find on the internet has a different set of vars defined for the PCG algorithm, and uses a different pre-conditioner. And none of them seem to use the same preconditioner this linear solver I am working on uses (basically a multiply through by the original diagonal best I can tell). Does the preconditioner part happen only at the beggining of the iterations, before they start, or sporadically throughout if the solver is starting to get stuck or some such thing?[...]Preconditioners are inherently problem specific. The diagonal preconditioner you describe is about the simplest, and is rarely useful. The preconditioner is applied within every iteration of CG.

QuoteOriginally posted by: DcoleHowever, it needs to be parallelized. I am having a bit of difficulty with this. Does anyone have any links to a good discussion of the PCG algorithm's parts, as well as what needs to be shared when doing it across multiple processors? I basically have my x0 defined as my initial mesh for this particle in cell problem. I am thinking i dont want to mess with CSR for the parallel version because the accounting and packing/unpacking could get nasty.rchardlm is perfectly right. You only need to parallelize your preconditioner, matrix-vector products and vector-vector operations.You do not want to drop the CSR format though. In contrast you want to reuse all of your efficient serial linear algebraMaybe this helps?I still have to ask: Why do you need to do all this on your own? There are lots of libraries out there.

Last edited by mblatt on August 6th, 2010, 10:00 pm, edited 1 time in total.

- FaridMoussaoui
**Posts:**507**Joined:****Location:**Genève, Genf, Ginevra, Geneva

Hi,It is not that difficult to parallelise the PCG algorithm. The parallelisation depends on the use of overlapping or not. Have a look to Saad software, pARMS a portable library of distributed-memory sparse iterative solvers. You can also have a look to the Trilinos project and especially to the AztecOO package.F. PS: An efficient preconditioner depends also on the problem. How you generate your symmetric matrix?

Last edited by FaridMoussaoui on August 9th, 2010, 10:00 pm, edited 1 time in total.

BTW: Jack Dongarra maintains a nice list of freely available linear algebra software.

Central to many precon schemes is incomplete LU which *isn't* easily parallelizable. While there are many parallel spMV kernels, even GPU based, I have yet to see fully parallel precon schemes. As I say elsewhere I would check out Olaf Schenk's excellent work (eg., pardiso), he is also very friendly and helpful

QuoteOriginally posted by: zeta Central to many precon schemes is incomplete LU which *isn't* easily parallelizable. And that is why often domain decomposition methods or parallel multilevel schemes are used.QuoteI have yet to see fully parallel precon schemes. Would you care to mention what you mean with "fully parallel"?Regarding Pardiso, maybe the original poster should point out what kind of parallelization he is aiming ar, GPU, shared memory, distributed memory (MPI)?

specifically, an ilu algorithm which executes from start to finish in parallel, preferably using a shared memory architecture; I don't think it exists in mkl for instance

QuoteOriginally posted by: zetaCentral to many precon schemes is incomplete LU which *isn't* easily parallelizable.I'm not familiar with any problems where ILU is the best preconditioner. Where would you use it?

iluo + gmres in reservoir simulation, the A matrices can be terrible. Here's details of the mkl interface

As we collaborate(d) with both INRIA in France and Statoil Research Centre in Norway, I can assure you that at least some of the people doing reservoir simulation use algebraic multigrid methods for these kind of problems.BTW if it is not in the mkl, it does not mean that it is not there in other libraries. I am pretty sure that parallel ilu preconditioners exist, as the parallelization should be rather straight forward due to the limited and forseeable additional fill-in.

I have no doubt -in fact the codebase is going to amg ultimately- when we get there I think the plan is saad et als miluot

QuoteOriginally posted by: mblatt[...]BTW if it is not in the mkl, it does not mean that it is not there in other libraries. I am pretty sure that parallel ilu preconditioners exist, as the parallelization should be rather straight forward due to the limited and forseeable additional fill-in.I don't think ILU (even ILU(0)) is very easily parallelised. For example, the ILU in Trilinos Ifpack (from a group who tend to write quality code) does not scale too well.QuoteOriginally posted by: zetaI have no doubt -in fact the codebase is going to amg ultimately- when we get there I think the plan is saad et als miluotDo you have a reference? (Google scholar wasn't very helpful).

should be some references hereI highly recommend spike/pspike by Sameh, schenk et al for those who want something highly scalable and alternative to precon krylov type solutions (I don't know if pspike is in the public domain yet)

GZIP: On