[Dune] Parallel AMG on up to 65536 cores on Hector

Eike Mueller em459 at bath.ac.uk
Wed Feb 29 00:20:53 CET 2012


Hi Markus,

many thanks for your explanations. I'm not sure I understand
everything, so perhaps you can confirm whether the following is right,
or whether I have got it completely wrong:

Without ParMETIS, each processor coarsens its local region, and data
is only pulled together onto one processor to solve the problem on the
coarsest level. I can see that gathering everything onto a single
process might become a problem for parallel scalability (when I say my
code stopped scaling, I have not looked in detail at what causes the
problem, so this is just a guess).
I presume that in my case, even if each process has only one dof left
(or maybe more, if the lower limit is 2000), the agglomerated problem
that is solved on one processor still has 32768 dofs, whereas on
smaller processor counts it was much smaller?

If ParMETIS is installed, then as soon as the lower limit of dofs per
process is reached, data is pulled together onto a smaller number of
processes, each of which then has more than 2000 dofs again. Coarsening
continues until the dofs per process fall below 2000, and this is
repeated until we end up with a single process holding fewer than 2000
dofs. So the size of the problem solved on the coarsest level does not
grow with the process count. ParMETIS is used (or rather the METIS
subroutines inside it, which makes me believe I do not need to install
METIS in addition to ParMETIS) to work out the best way of pulling the
data together, i.e. to partition the problem between a decreasing
number of processors on the coarser levels.
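
To check that I am looking at the right knobs, this is roughly how I
would set these parameters in code. It is only a sketch based on my
reading of the paamg headers, so please correct me if the criterion
class or the AccumulationMode values below are not the right ones:

  #include <dune/istl/bcrsmatrix.hh>
  #include <dune/istl/bvector.hh>
  #include <dune/istl/paamg/amg.hh>

  // scalar problem with 1x1 blocks (just an assumption to make the sketch concrete)
  typedef Dune::BCRSMatrix<Dune::FieldMatrix<double,1,1> > Matrix;
  typedef Dune::BlockVector<Dune::FieldVector<double,1> > Vector;

  // aggregation-based coarsening criterion for a symmetric problem
  typedef Dune::Amg::CoarsenCriterion<
      Dune::Amg::SymmetricCriterion<Matrix, Dune::Amg::FirstDiagonal> > Criterion;

  Criterion criterion;
  criterion.setCoarsenTarget(2000);   // stop coarsening below ~2000 dofs per process
  criterion.setMinCoarsenRate(1.2);   // the 1.2 from the "rate breakdown" warning
  // without ParMETIS: gather everything onto one master process
  criterion.setAccumulate(Dune::Amg::atOnceAccu);
  // with ParMETIS: gradual agglomeration onto fewer and fewer processes
  // criterion.setAccumulate(Dune::Amg::successiveAccu);
  // or switch data agglomeration off completely
  // criterion.setAccumulate(Dune::Amg::noAccu);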

If I switch off data agglomeration, what happens at the coarsest level?

The next thing I'm going to do is run some tests on our smaller  
machine to see what impact the use of ParMETIS and SuperLU has on up  
to 512 cores.
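
For reference, the setup I am going to test is roughly the one sketched
below. Again this is only a sketch, not the exact code I run: it assumes
the assembled matrix A, the vectors x and b and a fully set-up parallel
communication object comm already exist, it reuses the criterion from
the sketch above, and the tolerance and iteration count are just
placeholders:

  #include <dune/istl/owneroverlapcopy.hh>
  #include <dune/istl/schwarz.hh>
  #include <dune/istl/preconditioners.hh>
  #include <dune/istl/solvers.hh>
  #include <dune/istl/paamg/amg.hh>

  typedef Dune::OwnerOverlapCopyCommunication<int> Comm;
  typedef Dune::OverlappingSchwarzOperator<Matrix, Vector, Vector, Comm> Operator;
  // SSOR applied to the local rows, wrapped for the overlapping decomposition
  typedef Dune::BlockPreconditioner<Vector, Vector, Comm,
      Dune::SeqSSOR<Matrix, Vector, Vector> > Smoother;
  typedef Dune::Amg::AMG<Operator, Vector, Smoother, Comm> AMG;

  // Comm comm(...);  // parallel index set / remote indices set up elsewhere
  Operator op(A, comm);
  Dune::OverlappingSchwarzScalarProduct<Vector, Comm> sp(comm);

  Dune::Amg::SmootherTraits<Smoother>::Arguments smootherArgs;
  smootherArgs.iterations = 1;        // one SSOR sweep per smoothing step
  smootherArgs.relaxationFactor = 1.0;

  AMG amg(op, criterion, smootherArgs, comm);   // parallel AMG preconditioner

  // overlapping CG: reduction 1e-8, at most 500 iterations, verbose output
  Dune::CGSolver<Vector> cg(op, sp, amg, 1e-8, 500, 2);
  Dune::InverseOperatorResult stat;
  cg.apply(x, b, stat);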

Thanks a lot,

Eike


On 28 Feb 2012, at 16:54, Markus Blatt wrote:

> Hi,
>
> I will produce some TOFU here.
>
> The coarsening in our AMG method is decoupled. That is, every
> process coarsens its region and no agglomeration can take place
> across process boundaries.
>
> If you do not have ParMETIS installed, we coarsen until we reach the
> coarsen target (defaults to 2000 dofs) or until we cannot coarsen any
> more. In your 32K case every process only has 1 unknown left. Then we
> agglomerate all the data on one master process and solve that system.
>
> I would recommend installing ParMETIS. Anyway, because we had a lot of
> trouble with ParMETIS on large core counts, we use METIS on one
> processor for computing the data agglomeration. (We use the METIS
> methods provided with ParMETIS.)
>
> If your coarse level system can be solved with BiCGSTAB preconditioned
> by your smoother, you do not need to install SuperLU. Otherwise you
> should.
>
> BTW: If you think that you do not need to agglomerate the data, there
> is the possibility to switch it off.
>
> Cheers,
>
> Markus
> On Fri, Feb 24, 2012 at 04:00:48PM +0000, Eike Mueller wrote:
>> I have now started some highly parallel runs on Hector where my
>> first goal is to get the solver to scale to 65536 cores (the maximal
>> available core count in Phase 3 is ~90,000). So far I have done some
>> weak scaling runs on 64, 512, 4096 and 32768 cores.
>>
>> I have not tuned anything, I use the ISTL Overlapping CG solver
>> backend with the parallel AMG preconditioner (with an SSOR point
>> smoother). I am not using SuperLU to solve the coarse level problem.
>> On the smaller machine (up to 800 cores), which I have used so far,
>> this already gave quite good results.
>>
>> Basically, as compute time on Hector is expensive, I would be
>> interested in whether anybody already has experience with the ideal
>> setup for the parallel AMG for very large core counts, which I could
>> use as a starting point.
>>
>> The two main questions are:
>>
>> * Will using SuperLU help (or be essential)?
>> * Will using ParMETIS help (or be essential)? And do I need to install
>> METIS in addition to ParMETIS, or is ParMETIS alone enough?
>>
>> The first three runs look ok, with the time per iteration increasing
>> from 0.6s on 64 cores to 0.65s on 512 cores and 1.1s on 4096 cores
>> (on 8 cores I get 0.59s). The 32768 core run does not complete in 10
>> minutes, but it manages to get to the point where it has built the
>> coarse grid matrices. This, however, takes 48.7s instead of the 6.5s
>> it takes on 4096 cores, so the setup has effectively stopped scaling:
>> 48.7/6.5 is not far from 8, the factor by which the core count grows.
>>
>> In the largest run I use 4096 x 4096 x 1024 = 1.8E10 degrees of  
>> freedom.
>>
>> I observed that for the 4096 and 32768 core runs I get this warning  
>> message:
>> 'Stopped coarsening because of rate breakdown 32768/32768=1<1.2
>> and the hierarchy is built up to 9 level only.'
>> I guess this is potentially a problem if I do not use SuperLU.
>>
>> I have not compiled with ParMetis support, which is why I get this
>> message as well:
>> 'Successive accumulation of data on coarse levels only works with
>> ParMETIS installed.  Fell back to accumulation to one domain on
>> coarsest level'
>>
>> Thank you very much for any ideas,
>>
>> Eike
>>
>
> -- 
> Do you need more support with DUNE or HPC in general?
>
> Dr. Markus Blatt - HPC-Simulation-Software & Services http://www.dr-blatt.de
> Rappoltsweilerstr. 5, 68229 Mannheim, Germany
> Tel.: +49 (0) 160 97590858  Fax: +49 (0)322 1108991658
>

Dr Eike Mueller
Department of Mathematical Sciences
University of Bath
e.mueller at bath.ac.uk
