[Dune] Segmentation Fault in loadBalance()

Marco Cisternino marco.cisternino at optimad.it
Wed Oct 3 12:58:44 CEST 2012


Good morning,
I'm seeing something strange when calling the loadBalance() method on an 
ALUCubeGrid<3,3>.
I build my coarse grid by reading it from a file on rank 0, and then I 
call loadBalance() a first time to distribute the grid among the other 
processors. This first call causes no problem.
Then I refine the grid locally: I mark the cells to be refined, call 
preAdapt(), map my data to a persistent container, call adapt(), map the 
data back from the persistent container, and call postAdapt().
At the end of this local refinement procedure I call loadBalance() again.
If the refined grid stays balanced (global refinement, or local 
refinement that does not unbalance the grid), loadBalance() works fine.
Let me sketch a four-element example (each element shows the rank of the 
processor owning it):

------ ------
|  0  |   1  |
------ ------    Coarse grid
|  0  |   1  |
------ ------

------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------  Globally Refined
|  0  |   0  |   1  |   1  |
------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------


------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------  Locally Refined (balanced grid)
|              |               |
        0              1
|              |               |
------ ------ ------ ------

But if the refinement leaves the grid unbalanced,

------ ------ ------ ------
|              |   1  |   1  |
        0       ------ ------
|              |   1  |   1  |
------ ------ ------ ------  Locally Refined (unbalanced grid)
|              |   1  |   1  |
        0       ------ ------
|              |   1  |   1  |
------ ------ ------ ------

loadBalance() fails with a segmentation fault; the exact output is:

std::bad_alloc'
   what():  std::bad_alloc
[marco-laptop:27837] *** Process received signal ***
[marco-laptop:27837] *** Process received signal ***
[marco-laptop:27837] Signal: Segmentation fault (11)
[marco-laptop:27837] Signal code: Address not mapped (1)
[marco-laptop:27837] Failing at address: 0x3b
[marco-laptop:27837] [ 0] [0xb77c9410]
[marco-laptop:27837] [ 1] [0xb77c9400]
[marco-laptop:27837] [ 2] /lib/tls/i686/cmov/libc.so.6(abort+0x182) 
[0xb7443a82]
[marco-laptop:27837] [ 3] 
/usr/lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x14f) 
[0xb768d52f]
[marco-laptop:27837] [ 4] /usr/lib/libstdc++.so.6(+0xbd465) [0xb768b465]
[marco-laptop:27837] [ 5] /usr/lib/libstdc++.so.6(+0xbd4a2) [0xb768b4a2]
[marco-laptop:27837] [ 6] /usr/lib/libstdc++.so.6(+0xbd5e1) [0xb768b5e1]
[marco-laptop:27837] [ 7] /usr/lib/libstdc++.so.6(_Znwj+0x7f) [0xb768bc5f]
[marco-laptop:27837] [ 8] /usr/lib/libstdc++.so.6(_Znaj+0x1d) [0xb768bd3d]
[marco-laptop:27837] [ 9] 
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0x537) 
[0x864702d]
[marco-laptop:27837] [10] 
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49) 
[0x8646a11]
[marco-laptop:27837] [11] 
./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36) 
[0x8648f58]
[marco-laptop:27837] [12] 
./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397) 
[0x8634e27]
[marco-laptop:27837] [13] 
./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18) 
[0x864ab0a]
[marco-laptop:27837] [14] 
./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d) 
[0x84f7c48]
[marco-laptop:27837] [15] 
./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11) 
[0x84f000d]
[marco-laptop:27837] [16] ./dune_foo(main+0x957) [0x84dd584]
[marco-laptop:27837] [17] 
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb742cbd6]
[marco-laptop:27837] [18] ./dune_foo() [0x84dc991]
[marco-laptop:27837] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 27837 on node marco-laptop 
exited on signal 11 (Segmentation fault).

If I skip the mapping to and from the persistent container (that is, I 
don't preserve my data), the error is different:

[marco-laptop:28539] *** Process received signal ***
[marco-laptop:28539] Signal: Segmentation fault (11)
[marco-laptop:28539] Signal code: Address not mapped (1)
[marco-laptop:28539] Failing at address: 0xe64e57e4
[marco-laptop:28538] *** Process received signal ***
[marco-laptop:28538] Signal: Segmentation fault (11)
[marco-laptop:28538] Signal code: Address not mapped (1)
[marco-laptop:28538] Failing at address: 0x9b7b000
[marco-laptop:28539] [ 0] [0xb77df410]
[marco-laptop:28539] [ 1] 
./dune_foo(libparmetis__Adaptive_Partition+0x46) [0x8721246]
[marco-laptop:28539] [ 2] ./dune_foo(ParMETIS_V3_AdaptiveRepart+0x273) 
[0x8721993]
[marco-laptop:28539] [ 3] 
./dune_foo(_ZN15ALUGridParMETIS31CALL_ParMETIS_V3_AdaptiveRepartEPiS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S0_S0_S0_PP19ompi_communicator_t+0x81) 
[0x8658d0f]
[marco-laptop:28539] [ 4] 
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0xaf9) 
[0x8642e6f]
[marco-laptop:28539] [ 5] 
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49) 
[0x8642291]
[marco-laptop:28539] [ 6] 
./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36) 
[0x86447d8]
[marco-laptop:28539] [ 7] 
./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397) 
[0x86306a7]
[marco-laptop:28539] [ 8] 
./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18) 
[0x864638a]
[marco-laptop:28539] [ 9] 
./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d) 
[0x84f496d]
[marco-laptop:28539] [10] 
./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11) 
[0x84ed1f1]
[marco-laptop:28539] [11] ./dune_foo(main+0x957) [0x84dafa4]
[marco-laptop:28539] [12] 
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7442bd6]
[marco-laptop:28539] [13] ./dune_foo() [0x84da3b1]
[marco-laptop:28539] *** End of error message ***
[marco-laptop:28538] [ 0] [0xb7715410]
[marco-laptop:28538] [ 1] 
./dune_foo(libparmetis__Adaptive_Partition+0x46) [0x8721246]
[marco-laptop:28538] [ 2] ./dune_foo(ParMETIS_V3_AdaptiveRepart+0x273) 
[0x8721993]
[marco-laptop:28538] [ 3] 
./dune_foo(_ZN15ALUGridParMETIS31CALL_ParMETIS_V3_AdaptiveRepartEPiS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S0_S0_S0_PP19ompi_communicator_t+0x81) 
[0x8658d0f]
[marco-laptop:28538] [ 4] 
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0xaf9) 
[0x8642e6f]
[marco-laptop:28538] [ 5] 
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49) 
[0x8642291]
[marco-laptop:28538] [ 6] 
./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36) 
[0x86447d8]
[marco-laptop:28538] [ 7] 
./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397) 
[0x86306a7]
[marco-laptop:28538] [ 8] 
./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18) 
[0x864638a]
[marco-laptop:28538] [ 9] 
./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d) 
[0x84f496d]
[marco-laptop:28538] [10] 
./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11) 
[0x84ed1f1]
[marco-laptop:28538] [11] ./dune_foo(main+0x957) [0x84dafa4]
[marco-laptop:28538] [12] 
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7378bd6]
[marco-laptop:28538] [13] ./dune_foo() [0x84da3b1]
[marco-laptop:28538] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 28539 on node marco-laptop 
exited on signal 11 (Segmentation fault).



My parameters in alugrid.cfg are 0, 1.2, 14.
Could anyone please help me understand what is happening? Honestly, I 
have no idea!

Thanks a lot for any hint!

Best regards,
Marco