[Dune-devel] CI failure on dune-grid master debian 11 gcc-9-20

René Heß rene.hk-edv at gmx.de
Mon Sep 28 10:44:45 CEST 2020


Hi,

comparing the two pipelines 

https://gitlab.dune-project.org/core/dune-istl/-/pipelines/29834
https://gitlab.dune-project.org/core/dune-grid/-/pipelines/29754

shows that they fail for the same image. (In dune-grid there are two
images that fail but the second one is apparently not used in
dune-istl).

The images that fail all use Debian and GCC with C++20. The Ubuntu
image with Clang and C++20 works. I'm not sure how to
proceed. Debugging through the images doesn't sound very appealing :(
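
One cheap check that avoids stepping through the images: the UCX error
quoted below asks for 4292720 bytes in /dev/shm, so a CI job could simply
print how much space is free there. A minimal sketch in Python, assuming
that a Python interpreter is available in the image (the helper name
shm_has_room is made up for illustration, and the byte count is copied
from the quoted log). It only tests room for a single chunk of that size;
several MPI ranks would presumably need more.

```python
import os
import shutil

# Size of the chunk UCX failed to allocate, copied from the quoted log.
REQUIRED_BYTES = 4292720


def shm_has_room(path="/dev/shm", required=REQUIRED_BYTES):
    """Return True if the filesystem at `path` reports at least `required` free bytes."""
    usage = shutil.disk_usage(path)
    return usage.free >= required


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        usage = shutil.disk_usage("/dev/shm")
        print(f"/dev/shm: total={usage.total} bytes, free={usage.free} bytes")
        print("room for one chunk of the logged size:", shm_has_room())
    else:
        print("/dev/shm is not present on this system")
```

If this reports too little free space on the failing runner, the problem
would be in the runner's shared-memory setup rather than in the tests.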


Best regards,
René



Christian Engwer <christian.engwer at uni-muenster.de> writes:

> Yesterday I had the same issue with a CI run of dune-istl. It seems this is related to MPI.
>
> Ciao
> Christian
>
> On 28 September 2020 08:55:58 CEST, Oliver Sander <oliver.sander at tu-dresden.de> wrote:
>>I just noticed the following in
>>
>>https://gitlab.dune-project.org/kilian.weishaupt/dune-grid/-/jobs/175885
>>
>>
>>[1601275211.221338] [runner-d307b235-project-936-concurrent-0:11175:0] mm_posix.c:162  UCX  ERROR Not enough memory to write total of 4292720 bytes. Please check that /dev/shm or the directory you specified has more available memory.
>>[1601275211.221802] [runner-d307b235-project-936-concurrent-0:11175:0] uct_mem.c:132  UCX  ERROR failed to allocate 4292720 bytes using md posix for mm_recv_desc: Out of memory
>>[1601275211.222012] [runner-d307b235-project-936-concurrent-0:11175:0] mpool.c:191  UCX  ERROR Failed to allocate memory pool (name=mm_recv_desc) chunk: Out of memory
>>[1601275211.222397] [runner-d307b235-project-936-concurrent-0:11175:0] mm_iface.c:644  UCX  ERROR failed to get the first receive descriptor
>>[runner-d307b235-project-936-concurrent-0:11175] ../../../../../../ompi/mca/pml/ucx/pml_ucx.c:291  Error: Failed to create UCP worker
>>[runner-d307b235-project-936-concurrent-0:11175] [[28993,1],1] selected pml ob1, but peer [[28993,1],0] on runner-d307b235-project-936-concurrent-0 selected pml ucx
>>--------------------------------------------------------------------------
>>MPI_INIT has failed because at least one MPI process is unreachable
>>from another.  This *usually* means that an underlying communication
>>plugin -- such as a BTL or an MTL -- has either not loaded or not
>>allowed itself to be used.  Your MPI job will now abort.
>>You may wish to try to narrow down the problem;
>> * Check the output of ompi_info to see which BTL/MTL plugins are
>>   available.
>> * Run your application with MPI_THREAD_SINGLE
>>
>>
>>@CI_gurus can you please have a look!
>>
>>Thanks,
>>Oliver
>>
>>
>>On 27.09.20 09:10, Oliver Sander wrote:
>>> Dear Dune,
>>> 
>>> the CI system for the master branch seems to fail with the debian 11 gcc-9-20 image:
>>> 
>>>   https://gitlab.dune-project.org/core/dune-grid/-/pipelines/29754
>>> 
>>> It reports a run-time failure in some YaspGrid-related tests.  Any idea
>>> about the possible causes of this?
>>> 
>>> Best regards,
>>> Oliver
>>> 
>>> 
>>> _______________________________________________
>>> Dune-devel mailing list
>>> Dune-devel at lists.dune-project.org
>>> https://lists.dune-project.org/mailman/listinfo/dune-devel
>>> 