[Dune] Scalability of parallel grid construction

Tue Dec 6 10:52:10 CET 2016

As you load the whole mesh on each rank, I assume that your coarse
mesh is not too large. In this case I suggest to use metis instead of
parmetis. As many people observed parmetis doesn't scale well, thus it
is better (if you have sufficient resources) to run metis locally and
then communicate the data.

Christian

On Tue, Dec 06, 2016 at 09:47:30AM +0100, Aleksejs Fomins wrote:
> Hey all,
> 
> Thank you very much for your replies.
> 
> The situation is like this:
> 1) The grid does not exist yet
> 2) I have just read the grid in parallel from gmsh, and partitioned it using ParMETIS. It is reasonable to assume that partition results in contiguous mesh parts on each process, that the element number is balanced within 1% and that the total number of interprocess boundaries is close to minimal. Other than that, ParMETIS does not provide additional information on, e.g, neighbouring processes. George Karypis has explicitly told me that this info is internally available, but I would have to internally modify ParMETIS code to access it.
> 3) As Markus noticed, the task is to construct the ghost elements and the communication interfaces that would implement DataHandle. In my view, the most important step in such implementation is to find out all neighbouring processes for each boundary face, edge and vertex.
> 
> I mentioned in my previous email, that currently I have one non-scalable operation in my code - all to all communication of all boundary vertex global indices, such that each process can find its vertex-neighbour-processes. All to all mpi communication is bad. As Markus suggests, communication can be done in a ring, communicating vertex indices between all pairs of processes. It does remove the memory bottleneck of such communication. However, in terms of computation speed, the complexity is still O(nTotalProcessBoundaries), which does not scale with gigantic mesh sizes.
> 
> While it is not of immediate concern to run curvilinear grid on gigantic meshes, it would be good to know if someone has already invented/published an optimal solution to this problem. My very rough understanding is that an optimal algorithm would benefit from two observations:
> 1) Most interprocessor boundary vertices are shared by two processes only, all other cases are progressively more rare
> 2) If an interprocessor boundary vertex is shared between two processes, the surrounding vertices are likely to be shared by the same two processes
> 
> My question is about the current theoretical progress on such optimal algorithm, as well as how this problem is currently solved in other unstructured grids (e.g. ALU,UG)
> 
> Best regards,
> Aleksejs
> 
> 
> On 05.12.2016 23:47, Christian Engwer wrote:
> > Hi Markus,
> >
> >> I  do not think that this is an answer to Aleksejs question. I sounded
> >> to as if he was asking how to setup the communication interfaces
> >> within the grid implementation in a scalable manner.
> > then I'm even more puzzled, as he has full access to the internals, it
> > should be information which is immediatelly available.
> >
> > @Aleksejs could you shed some more light on this question?
> >
> > Christian
> 
> 

-- 
Prof. Dr. Christian Engwer 
Institut für Numerische und Angewandte Mathematik
Fachbereich Mathematik und Informatik der Universität Münster
Einsteinstrasse 62
48149 Münster

E-Mail  christian.engwer at uni-muenster.de
Telefon +49 251 83-35067
FAX     +49 251 83-32729