[Dune] problem with ALUGrid<simplex> in parallel case using Open MPI 1.3.3

Oswald Benedikt Benedikt.Oswald at psi.ch
Tue Nov 3 16:27:58 CET 2009


Dear Dune and, in particular, ALUGrid developers, you may remember our memory leak problem,
which we are now tracking down with valgrind in the parallel case.

In order to be on the safe side we have upgraded as follows:

MPI to Open MPI version 1.3.3
valgrind to version 3.5.0
and recompiled ALUGrid with the corresponding MPI compiler version.

Now, when I run the code on a very simple grid (8 tetrahedra centered around the origin)
and refine it twice, the code aborts at the load balancing stage, cf. the output below.
The same happens with more complicated grids. If I run mpirun with only a single core, it runs
without crashing...
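
For reference, the failing sequence boils down to roughly the following minimal sketch
(the DGF file name is just a placeholder for our 8-tetrahedra macro grid; hades3d itself
does more around these calls):

#include <config.h>

#include <dune/common/mpihelper.hh>
#include <dune/grid/alugrid.hh>
#include <dune/grid/io/file/dgfparser/dgfalu.hh>

int main( int argc, char** argv )
{
  // initialise MPI via the Dune helper
  Dune::MPIHelper::instance( argc, argv );

  typedef Dune::ALUSimplexGrid< 3, 3 > GridType;

  // read the tetrahedral macro grid via the DGF parser
  // ("eight-tets.dgf" is a placeholder name)
  Dune::GridPtr< GridType > gridPtr( "eight-tets.dgf" );
  GridType& grid = *gridPtr;

  // refine the mesh: #[global refinement] = 2
  grid.globalRefine( 2 );

  // load balancing the mesh -- this is where the assertion
  // 'pos != end' in parallel/gitter_pll_sti.h:727 fires
  grid.loadBalance();

  return 0;
}

With mpirun -np 1 this runs through; with two processes (mpirun -np 2) it aborts as shown below.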

Interestingly, the same code runs through fine with Open MPI 1.3 on Mac OS X.

Could you comment on this?


Thanks and have a great day! Benedikt



====================================================================
memLogMessage  1257265271.500000  "create ALUgrid object"  # 2009/11/03 16:21:11.500789
GridParameterBlock: Parameter 'dumpfilename' not specified, not dumping!
GridParameterBlock: Parameter 'dumpfilename' not specified, not dumping!

**WARNING (ignored) could'nt open file < alugrid.cfg > . Using default values: 
0 < [balance] < 1.2   partitioning method "METIS_PartGraphKway"


Created parallel ALUSimplexGrid<3,3> from macro grid file 'ALU3dGrid.qY3XMy'. 

2009-Nov-03 16:21:11.505444 ::: hades3d.cc:  333 ::: PRODUCTION  ::: [ reading the tetrahedral mesh
2009-Nov-03 16:21:11.505512 ::: hades3d.cc:  335 ::: PRODUCTION  ::: refining the mesh ... #[global refinement]=2
2009-Nov-03 16:21:11.506944 ::: hades3d.cc:  340 ::: PRODUCTION  ::: ready
2009-Nov-03 16:21:11.507010 ::: hades3d.cc:  343 ::: PRODUCTION  ::: [ load balancing the mesh
hades3d: parallel/gitter_pll_sti.h:727: void ALUGridSpace::LinkedObject::Identifier::read(__gnu_cxx::__normal_iterator<const int*, std::vector<int, std::allocator<int> > >&, const __gnu_cxx::__normal_iterator<const int*, std::vector<int, std::allocator<int> > >&): Assertion `pos != end' failed.
hades3d: parallel/gitter_pll_sti.h:727: void ALUGridSpace::LinkedObject::Identifier::read(__gnu_cxx::__normal_iterator<const int*, std::vector<int, std::allocator<int> > >&, const __gnu_cxx::__normal_iterator<const int*, std::vector<int, std::allocator<int> > >&): Assertion `pos != end' failed.
[felsim01:22012] *** Process received signal ***
[felsim01:22011] *** Process received signal ***
[felsim01:22012] Signal: Aborted (6)
[felsim01:22012] Signal code:  (-6)
[felsim01:22011] Signal: Aborted (6)
[felsim01:22011] Signal code:  (-6)
[felsim01:22012] [ 0] /lib64/libpthread.so.0 [0x35f8a0de60]
[felsim01:22012] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x35f7e30045]
[felsim01:22012] [ 2] /lib64/libc.so.6(abort+0x110) [0x35f7e31ae0]
[felsim01:22012] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x35f7e29756]
[felsim01:22012] [ 4] ../../../hades3d/hades3d(_ZN12ALUGridSpace8identifyINS_6Gitter5hedgeEEEvNS_14AccessIteratorIT_E6HandleERSt6vectorISt4pairISt4listIS6_SaIS6_EESB_ESaISC_EERKNS_13MpAccessLocalE+0x1645) [0x7967f5]
[felsim01:22012] [ 5] ../../../hades3d/hades3d(_ZN12ALUGridSpace9GitterPll14MacroGitterPll14identificationERNS_13MpAccessLocalE+0x386) [0x7301e6]
[felsim01:22012] [ 6] ../../../hades3d/hades3d(_ZN12ALUGridSpace9GitterPll22notifyMacroGridChangesEv+0x75) [0x70bdf5]
[felsim01:22011] [ 0] /lib64/libpthread.so.0 [0x35f8a0de60]
[felsim01:22011] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x35f7e30045]
[felsim01:22011] [ 2] /lib64/libc.so.6(abort+0x110) [0x35f7e31ae0]
[felsim01:22011] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x35f7e29756]
[felsim01:22011] [ 4] ../../../hades3d/hades3d(_ZN12ALUGridSpace8identifyINS_6Gitter5hedgeEEEvNS_14AccessIteratorIT_E6HandleERSt6vectorISt4pairISt4listIS6_SaIS6_EESB_ESaISC_EERKNS_13MpAccessLocalE+0x1645) [0x7967f5]
[felsim01:22011] [ 5] ../../../hades3d/hades3d(_ZN12ALUGridSpace9GitterPll14MacroGitterPll14identificationERNS_13MpAccessLocalE+0x386) [0x7301e6]
[felsim01:22011] [ 6] ../../../hades3d/hades3d(_ZN12ALUGridSpace9GitterPll22notifyMacroGridChangesEv+0x75) [0x70bdf5]
[felsim01:22011] [ 7] ../../../hades3d/hades3d(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x448) [0x70f5e8]
[felsim01:22011] [ 8] ../../../hades3d/hades3d(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0xd) [0x70903d]
[felsim01:22011] [ 9] ../../../hades3d/hades3d(main+0x1702) [0x5b9102]
[felsim01:22011] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x35f7e1d8a4]
[felsim01:22011] [11] ../../../hades3d/hades3d(_ZNSt8ios_base4InitD1Ev+0x49) [0x5b5929]
[felsim01:22011] *** End of error message ***
[felsim01:22012] [ 7] ../../../hades3d/hades3d(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x448) [0x70f5e8]
[felsim01:22012] [ 8] ../../../hades3d/hades3d(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0xd) [0x70903d]
[felsim01:22012] [ 9] ../../../hades3d/hades3d(main+0x1702) [0x5b9102]
[felsim01:22012] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x35f7e1d8a4]
[felsim01:22012] [11] ../../../hades3d/hades3d(_ZNSt8ios_base4InitD1Ev+0x49) [0x5b5929]
[felsim01:22012] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 22012 on node felsim01 exited on signal 6 (Aborted).
--------------------------------------------------------------------------


------------------------------------------------------------------------------------------------------------------
Benedikt Oswald, Dr. sc. techn., dipl. El. Ing. ETH, www.psi.ch, Computational Accelerator Scientist
Paul Scherrer  Institute (PSI), CH-5232 Villigen, Suisse, benedikt.oswald at psi.ch, +41(0)56 310 32 12
"Passion is required for any great work, and for the Revolution passion and audacity are required in big doses.", 
Ernesto 'Che' Guevara, Letter to his parents.
http://maxwell.psi.ch/amaswiki/index.php/User:BenediktOswald 
------------------------------------------------------------------------------------------------------------------





