[Dune-devel] CI failure on dune-grid master debian 11 gcc-9-20
René Heß
rene.hk-edv at gmx.de
Mon Sep 28 10:44:45 CEST 2020
Hi,
Comparing the two pipelines
https://gitlab.dune-project.org/core/dune-istl/-/pipelines/29834
https://gitlab.dune-project.org/core/dune-grid/-/pipelines/29754
shows that they fail for the same image. (In dune-grid two images
fail, but the second one is apparently not used in dune-istl.)
The failing images all use Debian and gcc with C++20. The Ubuntu
image with clang and C++20 works. I'm not sure how to
proceed. Debugging through the images doesn't sound very appealing :(
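One cheap check before digging into the images: the UCX error quoted further down asks to verify that /dev/shm has enough free space for a 4292720-byte request. A minimal sketch of such a check, which could be run inside the failing image, follows. The function name shm_has_room and its defaults are hypothetical and only for illustration; the byte count is copied from the log, and whether a single chunk of that size is all UCX needs is an assumption.

```python
import shutil

# Size of the allocation that failed, copied from the UCX error in the job log.
UCX_REQUEST_BYTES = 4292720


def shm_has_room(path="/dev/shm", needed=UCX_REQUEST_BYTES):
    """Return (ok, free_bytes) for the filesystem that holds `path`.

    `ok` is True when at least `needed` bytes are free. This only checks
    one allocation of that size; several MPI ranks may each need their own.
    """
    free = shutil.disk_usage(path).free
    return free >= needed, free
```

Calling shm_has_room() in a job step and printing the result would show whether shared memory is the bottleneck. If the runner uses Docker (an assumption, the thread does not say), note that Docker's default /dev/shm size is 64 MB unless --shm-size is set.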
Best regards,
René
Christian Engwer <christian.engwer at uni-muenster.de> writes:
> Yesterday I had the same issue with a CI run of dune-istl. It seems this is related to MPI.
>
> Ciao
> Christian
>
> On 28 September 2020 08:55:58 CEST, Oliver Sander <oliver.sander at tu-dresden.de> wrote:
>>I just noticed the following in
>>
>>https://gitlab.dune-project.org/kilian.weishaupt/dune-grid/-/jobs/175885
>>
>>
>>[1601275211.221338] [runner-d307b235-project-936-concurrent-0:11175:0]
>>mm_posix.c:162 UCX ERROR Not enough memory to write total of 4292720
>>bytes. Please check that /dev/shm or the directory you specified has
>>more available memory.
>>[1601275211.221802] [runner-d307b235-project-936-concurrent-0:11175:0]
>>uct_mem.c:132 UCX ERROR failed to allocate 4292720 bytes using md
>>posix for mm_recv_desc: Out of memory
>>[1601275211.222012] [runner-d307b235-project-936-concurrent-0:11175:0]
>>mpool.c:191 UCX ERROR Failed to allocate memory pool
>>(name=mm_recv_desc) chunk: Out of memory
>>[1601275211.222397] [runner-d307b235-project-936-concurrent-0:11175:0]
>>mm_iface.c:644 UCX ERROR failed to get the first receive descriptor
>>[runner-d307b235-project-936-concurrent-0:11175]
>>../../../../../../ompi/mca/pml/ucx/pml_ucx.c:291 Error: Failed to
>>create UCP worker
>>[runner-d307b235-project-936-concurrent-0:11175] [[28993,1],1]
>>selected pml ob1, but peer [[28993,1],0] on
>>runner-d307b235-project-936-concurrent-0 selected pml ucx
>>--------------------------------------------------------------------------
>>MPI_INIT has failed because at least one MPI process is unreachable
>>from another. This *usually* means that an underlying communication
>>plugin -- such as a BTL or an MTL -- has either not loaded or not
>>allowed itself to be used. Your MPI job will now abort.
>>You may wish to try to narrow down the problem;
>> * Check the output of ompi_info to see which BTL/MTL plugins are
>>   available.
>> * Run your application with MPI_THREAD_SINGLE
>>
>>
>>@CI_gurus can you please have a look!
>>
>>Thanks,
>>Oliver
>>
>>
>>On 27.09.20 09:10, Oliver Sander wrote:
>>> Dear Dune,
>>>
>>> the CI system for the master branch seems to fail with the debian 11
>>> gcc-9-20 image:
>>>
>>> https://gitlab.dune-project.org/core/dune-grid/-/pipelines/29754
>>>
>>> It reports a run-time failure in some YaspGrid-related tests. Any idea
>>> about the possible causes of this?
>>>
>>> Best regards,
>>> Oliver
>>>
>>>
>>> _______________________________________________
>>> Dune-devel mailing list
>>> Dune-devel at lists.dune-project.org
>>> https://lists.dune-project.org/mailman/listinfo/dune-devel
>>>