[Dune] Scalability of parallel grid construction
Aleksejs Fomins
aleksejs.fomins at lspr.swiss
Tue Dec 6 11:40:57 CET 2016
Dear Christian,
The Curvilinear GMSH Reader is fully parallel: each process reads only its own part of the mesh, precisely to avoid this bottleneck. It is sad to hear that ParMETIS is slow, but is there really an alternative for very large grids?
Regards,
Aleksejs
On 06.12.2016 10:52, Christian Engwer wrote:
> As you load the whole mesh on each rank, I assume that your coarse
> mesh is not too large. In this case I suggest using METIS instead of
> ParMETIS. As many people have observed, ParMETIS doesn't scale well,
> so it is better (if you have sufficient resources) to run METIS
> locally and then communicate the data.
>
> Christian
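For reference, running METIS locally on the replicated coarse mesh, as suggested above, could look roughly like the sketch below. It assumes tetrahedral elements and the METIS 5 C API; the function name and data layout are purely illustrative, not existing Dune or reader code.

// Serial METIS partition of the replicated coarse mesh.
// eptr/eind is the element->vertex CSR layout expected by METIS.
#include <metis.h>
#include <mpi.h>
#include <vector>

std::vector<idx_t> partitionCoarseMesh(std::vector<idx_t>& eptr,
                                       std::vector<idx_t>& eind,
                                       idx_t nVertices,
                                       MPI_Comm comm)
{
  int nRanks;
  MPI_Comm_size(comm, &nRanks);

  idx_t nElements = static_cast<idx_t>(eptr.size()) - 1;
  idx_t nParts    = nRanks;
  idx_t nCommon   = 3;   // tetrahedra: face neighbours share 3 vertices
  idx_t objval    = 0;

  std::vector<idx_t> elemPart(nElements);   // target partition per element
  std::vector<idx_t> nodePart(nVertices);   // target partition per vertex

  // With default options METIS should return the same partition on every
  // rank for identical input; if in doubt, run on rank 0 and MPI_Bcast.
  METIS_PartMeshDual(&nElements, &nVertices, eptr.data(), eind.data(),
                     /*vwgt=*/nullptr, /*vsize=*/nullptr, &nCommon, &nParts,
                     /*tpwgts=*/nullptr, /*options=*/nullptr,
                     &objval, elemPart.data(), nodePart.data());

  // Each rank then keeps the elements e with elemPart[e] == its own rank.
  return elemPart;
}

Each rank would afterwards discard the elements it does not own, which replaces the ParMETIS redistribution step.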
>
> On Tue, Dec 06, 2016 at 09:47:30AM +0100, Aleksejs Fomins wrote:
>> Hey all,
>>
>> Thank you very much for your replies.
>>
>> The situation is like this:
>> 1) The grid does not exist yet
>> 2) I have just read the grid in parallel from gmsh and partitioned it using ParMETIS (a sketch of such a call follows below this list). It is reasonable to assume that the partition results in contiguous mesh parts on each process, that the element count is balanced to within 1%, and that the total number of interprocess boundaries is close to minimal. Beyond that, ParMETIS does not provide additional information on, e.g., neighbouring processes. George Karypis has explicitly told me that this information is available internally, but I would have to modify the ParMETIS code to access it.
>> 3) As Markus noticed, the task is to construct the ghost elements and the communication interfaces that would implement the DataHandle. In my view, the most important step in such an implementation is to find all neighbouring processes for each process-boundary face, edge and vertex.
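The distributed partitioning in step 2 essentially boils down to a single ParMETIS call, sketched below for tetrahedra. The array and function names are illustrative and not the actual CurvilinearGmshReader internals.

// Distributed partition of the locally-read mesh chunk with ParMETIS.
// elmdist[p]..elmdist[p+1] is the global element range held by rank p before
// partitioning; eptr/eind is the local element->vertex CSR (global vertex ids).
#include <parmetis.h>
#include <mpi.h>
#include <vector>

std::vector<idx_t> partitionDistributedMesh(std::vector<idx_t>& elmdist,
                                            std::vector<idx_t>& eptr,
                                            std::vector<idx_t>& eind,
                                            MPI_Comm comm)
{
  int nRanks;
  MPI_Comm_size(comm, &nRanks);

  idx_t wgtflag = 0;                 // no element weights
  idx_t numflag = 0;                 // C-style numbering
  idx_t ncon    = 1;                 // one balance constraint
  idx_t ncommonnodes = 3;            // tets: face neighbours share 3 vertices
  idx_t nParts  = nRanks;
  idx_t options[3] = {0, 0, 0};      // default options
  idx_t edgecut = 0;

  std::vector<real_t> tpwgts(ncon * nParts, real_t(1.0) / real_t(nParts)); // uniform parts
  std::vector<real_t> ubvec(ncon, real_t(1.05));                           // 5% imbalance tolerance

  idx_t nLocalElements = static_cast<idx_t>(eptr.size()) - 1;
  std::vector<idx_t> part(nLocalElements);   // target rank per local element

  ParMETIS_V3_PartMeshKway(elmdist.data(), eptr.data(), eind.data(),
                           /*elmwgt=*/nullptr, &wgtflag, &numflag,
                           &ncon, &ncommonnodes, &nParts,
                           tpwgts.data(), ubvec.data(), options,
                           &edgecut, part.data(), &comm);

  return part;   // elements are then redistributed according to 'part'
}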
>>
>> As I mentioned in my previous email, my code currently has one non-scalable operation: an all-to-all communication of all boundary vertex global indices, so that each process can find its vertex-neighbour processes. All-to-all MPI communication is bad. As Markus suggests, the communication can instead be done in a ring, passing the vertex indices between all pairs of processes (sketched below). This removes the memory bottleneck, but in terms of computation the complexity is still O(nTotalProcessBoundaries), which does not scale to gigantic mesh sizes.
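Concretely, the ring variant could look roughly like the following sketch, assuming 64-bit global indices and sorted per-rank boundary lists; all names are illustrative.

// Ring exchange of boundary vertex global indices.
// Each rank passes its (sorted) boundary vertex ids to every other rank in
// P-1 rounds and intersects the incoming list with its own, so it learns
// which ranks share which of its boundary vertices.
#include <mpi.h>
#include <algorithm>
#include <iterator>
#include <map>
#include <vector>

using GlobalId = long long;

// myBoundary must be sorted; returns, per neighbouring rank, the shared ids.
std::map<int, std::vector<GlobalId>>
findVertexNeighbours(const std::vector<GlobalId>& myBoundary, MPI_Comm comm)
{
  int rank, nRanks;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &nRanks);

  std::map<int, std::vector<GlobalId>> shared;

  for (int step = 1; step < nRanks; ++step)
  {
    int dst = (rank + step) % nRanks;            // where my list goes this round
    int src = (rank - step + nRanks) % nRanks;   // whose list I receive

    // Exchange sizes first, then the index lists themselves.
    int mySize = static_cast<int>(myBoundary.size());
    int srcSize = 0;
    MPI_Sendrecv(&mySize, 1, MPI_INT, dst, 0,
                 &srcSize, 1, MPI_INT, src, 0, comm, MPI_STATUS_IGNORE);

    std::vector<GlobalId> srcBoundary(srcSize);
    MPI_Sendrecv(myBoundary.data(), mySize, MPI_LONG_LONG, dst, 1,
                 srcBoundary.data(), srcSize, MPI_LONG_LONG, src, 1,
                 comm, MPI_STATUS_IGNORE);

    // Both lists are sorted, so shared vertices are a set intersection.
    std::vector<GlobalId> common;
    std::set_intersection(myBoundary.begin(), myBoundary.end(),
                          srcBoundary.begin(), srcBoundary.end(),
                          std::back_inserter(common));
    if (!common.empty())
      shared[src] = std::move(common);
  }
  return shared;
}

Memory use per round is bounded by the largest remote list, but every rank still touches every other rank's boundary indices, which is where the O(nTotalProcessBoundaries) work comes from.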
>>
>> While running the curvilinear grid on gigantic meshes is not of immediate concern, it would be good to know whether someone has already invented or published an optimal solution to this problem. My very rough understanding is that an optimal algorithm would benefit from two observations (one possible scheme is sketched after the list):
>> 1) Most interprocessor boundary vertices are shared by two processes only; cases with more sharing processes are progressively rarer
>> 2) If an interprocessor boundary vertex is shared between two processes, the surrounding vertices are likely to be shared by the same two processes
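One scheme that spreads this work across the ranks instead of replicating it is a hash-based rendezvous (a distributed directory): each vertex id is assigned to a directory rank by a hash, every process reports its boundary vertices there, and the directory replies with the complete list of sharing ranks. The sketch below is illustrative only, with made-up names; it is not a published reference implementation, does not yet exploit the two observations above, and still relies on an MPI_Alltoall of message counts.

// Hash-based rendezvous ("distributed directory") for vertex sharers.
#include <mpi.h>
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

using GlobalId = long long;

std::map<GlobalId, std::vector<int>>
rendezvousVertexSharers(const std::vector<GlobalId>& myBoundary, MPI_Comm comm)
{
  int rank, P;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &P);

  auto directoryOf = [&](GlobalId g) { return static_cast<int>(g % P); };

  // ---- Phase 1: send my boundary ids to their directory ranks -------------
  std::vector<std::vector<GlobalId>> outbox(P);
  for (GlobalId g : myBoundary)
    outbox[directoryOf(g)].push_back(g);

  std::vector<int> sendCnt(P), recvCnt(P), sendDsp(P + 1, 0), recvDsp(P + 1, 0);
  for (int p = 0; p < P; ++p) sendCnt[p] = static_cast<int>(outbox[p].size());
  MPI_Alltoall(sendCnt.data(), 1, MPI_INT, recvCnt.data(), 1, MPI_INT, comm);
  for (int p = 0; p < P; ++p) {
    sendDsp[p + 1] = sendDsp[p] + sendCnt[p];
    recvDsp[p + 1] = recvDsp[p] + recvCnt[p];
  }
  std::vector<GlobalId> sendBuf(sendDsp[P]), recvBuf(recvDsp[P]);
  for (int p = 0; p < P; ++p)
    std::copy(outbox[p].begin(), outbox[p].end(), sendBuf.begin() + sendDsp[p]);
  MPI_Alltoallv(sendBuf.data(), sendCnt.data(), sendDsp.data(), MPI_LONG_LONG,
                recvBuf.data(), recvCnt.data(), recvDsp.data(), MPI_LONG_LONG, comm);

  // Directory role: collect, for each managed vertex, the ranks reporting it.
  std::map<GlobalId, std::vector<int>> sharers;
  for (int p = 0; p < P; ++p)
    for (int i = recvDsp[p]; i < recvDsp[p + 1]; ++i)
      sharers[recvBuf[i]].push_back(p);

  // ---- Phase 2: reply to every sharer with the full sharer list -----------
  // Reply records are packed as: [vertex id, #sharers, sharer ranks ...].
  std::vector<std::vector<GlobalId>> reply(P);
  for (const auto& [g, ranks] : sharers) {
    if (ranks.size() < 2) continue;           // not an interprocess vertex
    for (int r : ranks) {
      reply[r].push_back(g);
      reply[r].push_back(static_cast<GlobalId>(ranks.size()));
      for (int s : ranks) reply[r].push_back(s);
    }
  }
  for (int p = 0; p < P; ++p) sendCnt[p] = static_cast<int>(reply[p].size());
  MPI_Alltoall(sendCnt.data(), 1, MPI_INT, recvCnt.data(), 1, MPI_INT, comm);
  for (int p = 0; p < P; ++p) {
    sendDsp[p + 1] = sendDsp[p] + sendCnt[p];
    recvDsp[p + 1] = recvDsp[p] + recvCnt[p];
  }
  sendBuf.assign(sendDsp[P], 0);  recvBuf.assign(recvDsp[P], 0);
  for (int p = 0; p < P; ++p)
    std::copy(reply[p].begin(), reply[p].end(), sendBuf.begin() + sendDsp[p]);
  MPI_Alltoallv(sendBuf.data(), sendCnt.data(), sendDsp.data(), MPI_LONG_LONG,
                recvBuf.data(), recvCnt.data(), recvDsp.data(), MPI_LONG_LONG, comm);

  // ---- Unpack: for each of my boundary vertices, the ranks sharing it -----
  std::map<GlobalId, std::vector<int>> result;
  for (std::size_t i = 0; i < recvBuf.size();) {
    GlobalId g = recvBuf[i++];
    GlobalId k = recvBuf[i++];
    std::vector<int>& ranks = result[g];
    for (GlobalId j = 0; j < k; ++j) {
      int s = static_cast<int>(recvBuf[i++]);
      if (s != rank) ranks.push_back(s);      // drop my own rank from the list
    }
  }
  return result;
}

The aggregate data volume here is proportional to the total number of process-boundary vertices spread over all ranks, rather than that full amount handled by every rank.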
>>
>> My question is about the current theoretical progress on such an optimal algorithm, as well as how this problem is currently solved in other unstructured grids (e.g. ALU, UG).
>>
>> Best regards,
>> Aleksejs
>>
>>
>> On 05.12.2016 23:47, Christian Engwer wrote:
>>> Hi Markus,
>>>
>>>> I do not think that this is an answer to Aleksejs' question. It
>>>> sounded to me as if he was asking how to set up the communication
>>>> interfaces within the grid implementation in a scalable manner.
>>> then I'm even more puzzled; as he has full access to the internals,
>>> it should be information which is immediately available.
>>>
>>> @Aleksejs could you shed some more light on this question?
>>>
>>> Christian
>>