[Dune] [dune-istl] mpi error due to wrong send rank

Simon Praetorius simon.praetorius at tu-dresden.de
Tue Jun 18 23:21:20 CEST 2019


Hi everyone,

after working another day on that problem I've found the source of the
MPI error by chance: The ParallelIndexSet was not a real set, meaning:
it contained duplicates.

While the name ParallelIndexSET indicates, that internally it is a set
or a map from global IDs to local indices, it actually it not. It is
similar to a std::deque. Insertion of indices does not guarantee its
uniqueness.This was a trap I was falling into. When the ParallelIndexSet
has no unique entries, also the RemoteIndices container contains duplicates.
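
To illustrate: a minimal sketch (the attribute enum and the indices are
made up; I am assuming the usual ParallelIndexSet/ParallelLocalIndex
interface from dune-common):

  #include <dune/common/parallel/indexset.hh>
  #include <dune/common/parallel/plocalindex.hh>

  enum Attribute { owner, overlap };
  using LocalIndex = Dune::ParallelLocalIndex<Attribute>;
  using IndexSet   = Dune::ParallelIndexSet<int, LocalIndex>;

  int main()
  {
    IndexSet indexSet;
    indexSet.beginResize();
    indexSet.add(7, LocalIndex(0, owner, true));    // global ID 7 -> local 0
    indexSet.add(7, LocalIndex(1, overlap, true));  // same global ID: accepted!
    indexSet.endResize();
    // indexSet.size() == 2 -- both entries are kept, nothing complains
  }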

This is probably the reason why the computation of a `setPartition`
gets messed up. It first allocates a vector with
size = parallelIndexSet.size(), where the latter counts all the
duplicate entries, and initializes the vector entries with -1. Then the
vector entries are overwritten with the correct processor rank, but
this only covers the unique entries; the duplicates are never touched,
so several -1 values remain in the vector. Those are used as send ranks
later on, leading to the MPI error.
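
Schematically (this is not the actual dune-istl code, only the pattern
as I understand it), the effect of the duplicates looks like this:

  #include <cstdio>
  #include <vector>

  int main()
  {
    // the index set stores 4 entries, but entries 1 and 2 carry the
    // same global ID (a duplicate)
    std::vector<int> partition(4, -1);  // size() counts the duplicates too
    partition[0] = 0;  // global ID a -> rank 0
    partition[1] = 1;  // global ID b -> rank 1 (slot 2, the duplicate, is skipped)
    partition[3] = 1;  // global ID c -> rank 1
    for (int rank : partition)
      std::printf("%d ", rank);  // prints "0 1 -1 1"; the -1 becomes a send rank
  }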

So, it clearly was my mistake to fill the set with duplicate entries.
However, it is nowhere stated as a strict requirement that the inserted
indices must be unique. Maybe extended documentation in
ParallelIndexSet could help others avoid the same mistake, or, even
better, the class itself could guarantee that the set contains no
duplicates, by (a) internally using a set or map data structure, or (b)
erasing duplicates after sorting the indices in endResize() (which
should be a cheap operation).
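
Option (b) could essentially be the usual sort/unique idiom, as in the
following sketch (the real class keeps its entries in its own internal
container, so the details would differ):

  #include <algorithm>
  #include <vector>

  // remove adjacent duplicates after the entries have been sorted by
  // global index at the end of endResize()
  template <class IndexPair>
  void removeDuplicates(std::vector<IndexPair>& entries)
  {
    auto last = std::unique(entries.begin(), entries.end(),
                            [](const IndexPair& a, const IndexPair& b)
                            { return a.global() == b.global(); });
    entries.erase(last, entries.end());
  }

Since the entries are sorted, duplicates are adjacent and one linear
pass removes them.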

On the other hand, it is not so nice to get an MPI error deep in the
implementation details, even with all debug output and checks
activated. Maybe at some point one could check the validity of the
input, like the ParallelIndexSet or the RemoteIndices.
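
Such a check could be as simple as the following sketch (the typedef
and accessor names are what I remember from dune-common and may not be
exact):

  #include <set>
  #include <stdexcept>

  template <class PIS>
  void checkUniqueGlobalIndices(const PIS& indexSet)
  {
    std::set<typename PIS::GlobalIndex> seen;
    for (const auto& pair : indexSet)           // iterates over the IndexPairs
      if (!seen.insert(pair.global()).second)   // insert fails on a duplicate
        throw std::invalid_argument(
            "ParallelIndexSet contains duplicate global indices");
  }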

Best,
Simon

On 17.06.19 at 12:29, Simon Praetorius wrote:
> Hi Markus,
>
> a "small" test case is not so easy to create, but I managed to put
> everything into a separate independent dune project that can be found on
>
>> https://gitlab.mn.tu-dresden.de/spraetor/amg_error
> Simply build with
>
>> dunecontrol --current all
> and run with
>
>> ./build-cmake/src/amg_error (WORKS)
>> mpirun -np 2 ./build-cmake/src/amg_error (ERROR)
> It uses dune-functions for the assembly and dune-istl for the linear
> algebra data structures.
>
> I have tested the code with dune-2.6 and the latest git version.
>
> My system configuration is:
>
> Linux Mint 19.1 Tessa 64-bit
> Kernel Linux 4.15.0-47-generic x86_64
>
> GCC/7.3.0
> OpenMPI/3.1.1
> CMake/3.11.4
> binutils/2.30
> OpenBLAS/0.3.1
> SuiteSparse/5.1.2
> SuperLU/5.2.1
> METIS/5.1.0
> ParMETIS/4.0.3
>
> Best regards,
> Simon
>
> On 17.06.19 08:13, Markus Blatt wrote:
>> Hi,
>>
>> Please provide a test case and tell us what system you are on (Linux version
>> and MPI version).
>>
>> Cheers,
>>
>> Markus
> _______________________________________________
> Dune mailing list
> Dune at lists.dune-project.org
> https://lists.dune-project.org/mailman/listinfo/dune




