[Dune] Parallel AMG on up to 65536 cores on Hector

Tue Feb 28 17:54:15 CET 2012

Hi,

I will produce some TOFU here.

The coarsening in our AMG method is decoupled. That is, every
process coarsens only its own region, and no agglomeration can take
place across process boundaries.

If you do not have ParMETIS installed, we coarsen until we reach the
coarsening target (defaults to 2000 dofs) or until we cannot coarsen
any further. In your 32K-core run every process has only 1 unknown left
at that point. Then we agglomerate all the data on one master process
and solve that system there.
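
To make the stopping rule concrete, here is a toy model in Python (not
the dune-istl implementation). It assumes that every process starts
with 2^19 unknowns (4096 x 4096 x 1024 divided by 32768) and that each
coarsening step shrinks the local problem by a made-up factor of 8. The
minimum rate of 1.2 is taken from the warning quoted below. The function
name and its parameters are hypothetical.

```python
def decoupled_hierarchy(local_dofs, processes, coarsen_target=2000,
                        min_rate=1.2, assumed_factor=8):
    """Return the global number of unknowns on each level.

    assumed_factor is an invented per-level reduction; the real rate
    depends on the matrix and on the aggregation parameters.
    """
    levels = [local_dofs * processes]
    while levels[-1] > coarsen_target:
        # Aggregates never cross a process boundary, so each process
        # keeps at least one unknown no matter how often we coarsen.
        coarser_local = max(1, local_dofs // assumed_factor)
        rate = local_dofs / coarser_local
        if rate < min_rate:
            break  # rate breakdown: coarsening no longer shrinks the system
        local_dofs = coarser_local
        levels.append(local_dofs * processes)
    return levels


if __name__ == "__main__":
    for p in (64, 32768):
        levels = decoupled_hierarchy(local_dofs=2**19, processes=p)
        print(p, "processes:", len(levels), "levels, coarsest level has",
              levels[-1], "unknowns")
```

In this toy model the 64-process run reaches the coarsening target,
while the 32768-process run stops with one unknown per process, that
is, 32768 unknowns that have to be gathered on the master process.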

I would recommend installing ParMETIS. Note, however, that because we
had a lot of trouble with ParMETIS at large core counts, we use METIS
on one processor to compute the data agglomeration. (We use the METIS
routines provided with ParMETIS.)

If your coarse level system can be solved with BiCGSTAB preconditioned
by your smoother, you do not need to install SuperLU. Otherwise you
should.
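
For illustration, here is a small stdlib-only Python sketch of BiCGSTAB
preconditioned by one symmetric Gauss-Seidel sweep (SSOR with
relaxation factor 1). This is not the ISTL code: the 1D Poisson matrix
is only a stand-in for a coarse level system, and all function names
are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ssor_apply(A, r):
    """Return z = M^-1 r for M = (D + L) D^-1 (D + U), i.e. SSOR with omega = 1."""
    n = len(r)
    u = [0.0] * n
    for i in range(n):  # forward solve (D + L) u = r
        u[i] = (r[i] - sum(A[i][j] * u[j] for j in range(i))) / A[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):  # backward solve (D + U) z = D u
        upper = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][i] * u[i] - upper) / A[i][i]
    return z


def bicgstab(A, b, precond, tol=1e-10, max_iter=200):
    """Right-preconditioned BiCGSTAB, starting from the zero vector."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for x = 0
    r_hat = list(r)  # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = precond(A, p)
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            return [xi + alpha * yi for xi, yi in zip(x, y)], it
        z = precond(A, s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        rho = rho_new
    raise RuntimeError("BiCGSTAB did not converge")


def poisson_1d(n):
    """Dense 1D Poisson matrix, used here only as a small test system."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = poisson_1d(32)
    b = [1.0] * 32
    x, iterations = bicgstab(A, b, ssor_apply)
    residual = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    print("iterations:", iterations,
          "residual norm:", math.sqrt(dot(residual, residual)))
```

If such an iteration stagnates on your real coarse level system, that
is the case where a direct solver such as SuperLU is needed.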

BTW: If you think that you do not need to agglomerate the data, it is
possible to switch the agglomeration off.

Cheers,

Markus

On Fri, Feb 24, 2012 at 04:00:48PM +0000, Eike Mueller wrote:
> I have now started some highly parallel runs on Hector where my
> first goal is to get the solver to scale to 65536 cores (the maximal
> available core count in Phase 3 is ~90,000). So far I have done some
> weak scaling runs on 64, 512, 4096 and 32768 cores.
> 
> I have not tuned anything, I use the ISTL Overlapping CG solver
> backend with the parallel AMG preconditioner (with an SSOR point
> smoother). I am not using SuperLU to solve the coarse level problem.
> On the smaller machine (up to 800 cores), which I have used so far,
> this already gave quite good results.
> 
> Basically, as compute time on Hector is expensive, I would be
> interested in whether anybody already has experience with the ideal
> setup for the parallel AMG for very large core counts, which I could
> use as a starting point.
> 
> The two main questions are:
> 
> * Will using SuperLU help (or be essential)?
> * Will using ParMETIS help (or be essential) (and do I need to use
> Metis in addition to ParMETIS, or will ParMETIS alone be enough?)?
> 
> The first three runs (on 64, 512 and 4096 cores) look ok, with the
> time per iteration increasing from 0.6s to 0.65s to 1.1s between 64
> and 512 and 4096 cores (and on 8 cores I get 0.59s). The 32768 run
> does not complete in 10 minutes, but manages to get to the point
> where it has built the coarse grid matrices. This, however, takes
> 48.7s instead of 6.5s on 4096 cores, so it has effectively stopped
> scaling as 48.7/6.5 is not very far from 8.
> 
> In the largest run I use 4096 x 4096 x 1024 = 1.7E10 degrees of freedom.
> 
> I observed that for the 4096 and 32768 core runs I get this warning message:
> 'Stopped coarsening because of rate breakdown 32768/32768=1<1.2
> and the hierarchy is built up to 9 level only.'
> I guess this is potentially a problem if I do not use SuperLU.
> 
> I have not compiled with ParMetis support, which is why I get this
> message as well:
> 'Successive accumulation of data on coarse levels only works with
> ParMETIS installed.  Fell back to accumulation to one domain on
> coarsest level'
> 
> Thank you very much for any ideas,
> 
> Eike

-- 
Do you need more support with DUNE or HPC in general? 

Dr. Markus Blatt - HPC-Simulation-Software & Services http://www.dr-blatt.de
Rappoltsweilerstr. 5, 68229 Mannheim, Germany
Tel.: +49 (0) 160 97590858  Fax: +49 (0)322 1108991658