[Dune] [dune-istl] mpi error due to wrong send rank

Simon Praetorius simon.praetorius at tu-dresden.de
Tue Jun 18 23:32:13 CEST 2019


Hi again,

I sent my message too early:

The solution below does not solve the whole problem.
First, it results in a singular matrix after redistribution. And
second, changing the grid overlapSize to 0 again gives me the MPI error
"MPI_ERR_RANK: invalid rank".

Best,
Simon


Am 18.06.19 um 23:21 schrieb Simon Praetorius:
> Hi everyone,
>
> after working another day on that problem, I found the source of the
> MPI error by chance: the ParallelIndexSet was not a real set, meaning
> it contained duplicates.
>
> While the name ParallelIndexSET suggests that internally it is a set
> or a map from global IDs to local indices, it actually is not. It is
> closer to a std::deque: inserting indices does not guarantee their
> uniqueness. This was a trap I fell into. When the ParallelIndexSet
> does not contain unique entries, the RemoteIndices container also
> contains duplicates.
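>
> To illustrate (a minimal sketch; the attribute enum, the global id 42,
> and the helper functions are made up for this example, only the
> ParallelIndexSet/ParallelLocalIndex calls are the dune-common API):
>
>   #include <set>
>   #include <dune/common/parallel/indexset.hh>
>   #include <dune/common/parallel/plocalindex.hh>
>
>   enum Flag { owner, overlap };                    // illustrative attribute set
>   using LocalIndex = Dune::ParallelLocalIndex<Flag>;
>   using PIndexSet  = Dune::ParallelIndexSet<int, LocalIndex, 256>;
>
>   void fill(PIndexSet& pis)
>   {
>     pis.beginResize();
>     pis.add(42, LocalIndex(0, owner, true));
>     pis.add(42, LocalIndex(1, overlap, true));     // same global id: not rejected
>     pis.endResize();
>     // pis.size() == 2, i.e. the "set" now holds a duplicate global id
>   }
>
>   // One way to avoid it on the user side: remember the global ids already added.
>   void fillUnique(PIndexSet& pis)
>   {
>     std::set<int> seen;
>     pis.beginResize();
>     if (seen.insert(42).second)
>       pis.add(42, LocalIndex(0, owner, true));
>     if (seen.insert(42).second)                    // duplicate is skipped here
>       pis.add(42, LocalIndex(1, overlap, true));
>     pis.endResize();
>   }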
>
> This is probably the reason why the computation of a `setPartition` gets
> messed up. It first allocates a vector with
> size = parallelIndexSet.size(), where the latter counts all the
> duplicate entries, and initializes the vector entries with -1. The
> vector entries are then overwritten with the correct processor rank, but
> this only handles the unique entries; the duplicates are left untouched,
> so several -1 values remain in the vector. Those are later used as send
> ranks, leading to the MPI error.
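>
> Schematically, the pattern is something like the following (hypothetical
> names, not the actual dune-istl code):
>
>   #include <cstddef>
>   #include <utility>
>   #include <vector>
>
>   // sized by the index set *including* duplicates; only the unique
>   // entries ever get a rank assigned, so duplicates keep the -1
>   std::vector<int> computeSendRanks(std::size_t indexSetSize,
>       const std::vector<std::pair<std::size_t,int>>& owners)  // (local index, owner rank)
>   {
>     std::vector<int> sendRank(indexSetSize, -1);
>     for (const auto& entry : owners)
>       sendRank[entry.first] = entry.second;
>     return sendRank;   // a leftover -1 later becomes the destination of an
>                        // MPI_Send and triggers MPI_ERR_RANK
>   }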
>
> So, it clearly was my mistake to fill the set with duplicate entries.
> However, it is nowhere stated as a strict requirement that inserted
> indices must be unique. Maybe extended documentation of the
> ParallelIndexSet could help others avoid the same mistake, or better:
> let the class itself guarantee that the set does not contain
> duplicates, by (a) internally using a set or map data structure, or (b)
> erasing duplicates after sorting the indices in endResize() (which
> should be a cheap operation).
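>
> A user-side check along the lines of (b) could look like this (a sketch;
> it relies on endResize() sorting the entries by global index, so that
> duplicates end up adjacent):
>
>   #include <dune/common/exceptions.hh>
>
>   template <class ParallelIndexSet>
>   void assertUniqueGlobals(const ParallelIndexSet& pis)
>   {
>     auto it = pis.begin();
>     if (it == pis.end())
>       return;
>     auto prev = it->global();
>     for (++it; it != pis.end(); ++it) {
>       if (it->global() == prev)
>         DUNE_THROW(Dune::Exception,
>                    "duplicate global index " << it->global() << " in ParallelIndexSet");
>       prev = it->global();
>     }
>   }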
>
> On the other hand, it is not so nice to get an MPI error deep inside
> implementation details, even with all debug outputs and checks
> activated. Maybe at some point one could check the validity of the
> input, e.g. the ParallelIndexSet or the RemoteIndices.
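>
> Such an input check could, for instance, verify that every process rank
> stored in the RemoteIndices is valid for the communicator (a sketch,
> assuming the usual map-like iteration of Dune::RemoteIndices where the
> key is the remote rank):
>
>   #include <mpi.h>
>   #include <dune/common/exceptions.hh>
>
>   template <class RemoteIndices>
>   void assertValidRanks(const RemoteIndices& remote, MPI_Comm comm)
>   {
>     int size = 0;
>     MPI_Comm_size(comm, &size);
>     for (auto it = remote.begin(); it != remote.end(); ++it)
>       if (it->first < 0 || it->first >= size)
>         DUNE_THROW(Dune::Exception,
>                    "invalid process rank " << it->first << " in RemoteIndices");
>   }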
>
> Best,
> Simon
>
> Am 17.06.19 um 12:29 schrieb Simon Praetorius:
>> Hi Markus,
>>
>> a "small" test case is not so easy to create, but I managed to put
>> everything into a separate independent dune project that can be found on
>>
>>> https://gitlab.mn.tu-dresden.de/spraetor/amg_error
>> Simply build with
>>
>>> dunecontrol --current all
>> and run with
>>
>>> ./build-cmake/src/amg_error (WORKS)
>>> mpirun -np 2 ./build-cmake/src/amg_error (ERROR)
>> It uses dune-functions for the assembly and dune-istl for the linear
>> algebra data structures.
>>
>> I have tested the code with dune-2.6 and the latest git version.
>>
>> My system configuration is:
>>
>> Linux Mint 19.1 Tessa 64-bit
>> Kernel Linux 4.15.0-47-generic x86_64
>>
>> GCC/7.3.0
>> OpenMPI/3.1.1
>> CMake/3.11.4
>> binutils/2.30
>> OpenBLAS/0.3.1
>> SuiteSparse/5.1.2
>> SuperLU/5.2.1
>> METIS/5.1.0
>> ParMETIS/4.0.3
>>
>> Best regards,
>> Simon
>>
>> On 17.06.19 08:13, Markus Blatt wrote:
>>> Hi,
>>>
>>> Please provide a test case and tell us what system you are on (Linux version
>>> and MPI version).
>>>
>>> Cheers,
>>>
>>> Markus
>