[Dune-devel] CI failure on dune-grid master debian 11 gcc-9-20

Christian Engwer christian.engwer at uni-muenster.de
Mon Sep 28 09:11:44 CEST 2020


Yesterday I had the same issue with a CI run of dune-istl. It seems this is related to MPI.
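
The UCX error in the log quoted below names /dev/shm, so one quick check on a runner is whether that filesystem has room for the allocation that failed. A minimal sketch in Python, assuming /dev/shm is where UCX places its shared-memory segments on the runner; the helper name has_room is made up for illustration, and the byte count is copied from the log:

```python
import os
import shutil

# Size of the failed allocation, copied from the UCX error in the quoted log.
NEEDED_BYTES = 4292720


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    shm = "/dev/shm"  # location named in the UCX error message (assumed default)
    if os.path.isdir(shm):
        print(shm, "has room for one allocation:", has_room(shm, NEEDED_BYTES))
    else:
        print(shm, "does not exist on this system")
```

This only tests a single allocation of that size. Several MPI ranks on the same runner may each request their own segment, so a passing check does not rule out the failure.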

Ciao
Christian

On 28 September 2020 08:55:58 CEST, Oliver Sander <oliver.sander at tu-dresden.de> wrote:
>I just noticed the following in
>
>https://gitlab.dune-project.org/kilian.weishaupt/dune-grid/-/jobs/175885
>
>
>[1601275211.221338] [runner-d307b235-project-936-concurrent-0:11175:0] mm_posix.c:162  UCX  ERROR Not enough memory to write total of 4292720 bytes. Please check that /dev/shm or the directory you specified has more available memory.
>[1601275211.221802] [runner-d307b235-project-936-concurrent-0:11175:0] uct_mem.c:132  UCX  ERROR failed to allocate 4292720 bytes using md posix for mm_recv_desc: Out of memory
>[1601275211.222012] [runner-d307b235-project-936-concurrent-0:11175:0] mpool.c:191  UCX  ERROR Failed to allocate memory pool (name=mm_recv_desc) chunk: Out of memory
>[1601275211.222397] [runner-d307b235-project-936-concurrent-0:11175:0] mm_iface.c:644  UCX  ERROR failed to get the first receive descriptor
>[runner-d307b235-project-936-concurrent-0:11175] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:291  Error: Failed to create UCP worker
>[runner-d307b235-project-936-concurrent-0:11175] [[28993,1],1] selected pml ob1, but peer [[28993,1],0] on runner-d307b235-project-936-concurrent-0 selected pml ucx
>--------------------------------------------------------------------------
>MPI_INIT has failed because at least one MPI process is unreachable
>from another.  This *usually* means that an underlying communication
>plugin -- such as a BTL or an MTL -- has either not loaded or not
>allowed itself to be used.  Your MPI job will now abort.
>You may wish to try to narrow down the problem;
> * Check the output of ompi_info to see which BTL/MTL plugins are
>   available.
> * Run your application with MPI_THREAD_SINGLE
>
>
>@CI_gurus can you please have a look!
>
>Thanks,
>Oliver
>
>
>On 27.09.20 09:10, Oliver Sander wrote:
>> Dear Dune,
>> 
>> the CI system for the master branch seems to fail with the debian 11
>> gcc-9-20 image:
>> 
>>   https://gitlab.dune-project.org/core/dune-grid/-/pipelines/29754
>> 
>> It reports a run-time failure in some YaspGrid-related tests.  Any idea
>> about the possible causes of this?
>> 
>> Best regards,
>> Oliver
>> 
>> 