[Dune] Segmentation Fault in loadBalance()

Dedner, Andreas A.S.Dedner at warwick.ac.uk
Wed Oct 3 13:20:58 CEST 2012


It's extremely difficult to tell from the output of the segmentation fault. Is there any way to
reproduce it, for example with the grid-howto code?
Two questions I do have:
- You are calling loadBalance with the dataHandle, I'm assuming? What I'm wondering about is that
   you say that you are
      mapping back from the persistent container and then calling postAdapt
  and then you
     call loadBalance again.
  That will not work because loadBalance also changes the indexSets. You have to think of it
   as part of the grid modification phase, i.e., call it before postAdapt and before moving the data out
   of the persistentContainer. You need to call loadBalance with the dataHandle objects, and the
   dataHandle has to work on the persistentContainer.
- Please try one of the metis methods, e.g., use 0,1.2,11 in alugrid.cfg.
  There might be a problem with the parmetis bindings (at least I do not have that much experience with
   parmetis and alu). Perhaps others have?
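
The call order described in the first point can be sketched as follows. This is only a minimal illustration, not working Dune code: MockGrid, MockDataHandle, and adaptationCycle are hypothetical stand-ins that just log the calls, and no Dune headers are used. The method names mirror the grid interface (preAdapt, adapt, loadBalance with a data handle, postAdapt).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a data handle that packs/unpacks the
// contents of the persistent container during load balancing.
struct MockDataHandle {
    bool migrated = false;
};

// Hypothetical stand-in for a parallel adaptive grid. It does no real
// work and only records the order in which its methods are called.
struct MockGrid {
    std::vector<std::string> log;

    bool preAdapt() { log.push_back("preAdapt"); return false; }
    bool adapt()    { log.push_back("adapt");    return true;  }

    template <class DataHandle>
    bool loadBalance(DataHandle& handle) {
        log.push_back("loadBalance(dataHandle)");
        handle.migrated = true;  // user data travels with the elements
        return true;
    }

    void postAdapt() { log.push_back("postAdapt"); }
};

// One adaptation cycle with loadBalance treated as part of the grid
// modification phase: it runs while the data still lives in the
// persistent container and before postAdapt.
std::vector<std::string> adaptationCycle(MockGrid& grid) {
    MockDataHandle handle;  // would operate on the persistent container

    grid.preAdapt();
    grid.log.push_back("copy data to persistent container");
    grid.adapt();
    grid.loadBalance(handle);  // index sets change here
    grid.log.push_back("copy data back from persistent container");
    grid.postAdapt();
    return grid.log;
}
```

The point of the sketch is only the ordering: loadBalance sits between adapt and the step that moves the data out of the persistent container, so anything indexed by the index sets is rebuilt once, after both grid modifications.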
Andreas

________________________________________
From: dune-bounces+a.s.dedner=warwick.ac.uk at dune-project.org [dune-bounces+a.s.dedner=warwick.ac.uk at dune-project.org] on behalf of Marco Cisternino [marco.cisternino at optimad.it]
Sent: 03 October 2012 11:58
To: dune at dune-project.org
Subject: [Dune] Segmentation Fault in loadBalance()

Good morning,
I'm experiencing something weird when calling the loadBalance() method on an
ALUCubeGrid<3,3>.
I build my coarse grid by reading it from a file on rank 0 and then I call
the first loadBalance() to distribute the grid among the other processors.
In this case loadBalance gives no problem.
Then I refine the grid locally, marking the cells to be refined, calling
preAdapt(), mapping my data to a persistent container, calling adapt(),
mapping back from the persistent container and then calling postAdapt.
At the end of the local refinement procedure I call loadBalance again.
If the refined grid is not unbalanced (globally refining or locally
refining without getting an unbalanced grid) loadBalance works fine.
Let me sketch a four-element example (each element shows the rank of the
processor owning it)

------ ------
|  0  |   1  |
------ ------    Coarse grid
|  0  |   1  |
------ ------

------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------  Globally Refined
|  0  |   0  |   1  |   1  |
------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------


------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------
|  0  |   0  |   1  |   1  |
------ ------ ------ ------  Locally Refined (balanced grid)
|              |               |
        0              1
|              |               |
------ ------ ------ ------

But if I refine the grid getting an unbalanced grid,

------ ------ ------ ------
|              |   1  |   1  |
        0       ------ ------
|              |   1  |   1  |
------ ------ ------ ------  Locally Refined (unbalanced grid)
|              |   1  |   1  |
        0       ------ ------
|              |   1  |   1  |
------ ------ ------ ------

loadBalance yields a segmentation fault, with exactly this output:

std::bad_alloc'
   what():  std::bad_alloc
[marco-laptop:27837] *** Process received signal ***
[marco-laptop:27837] *** Process received signal ***
[marco-laptop:27837] Signal: Segmentation fault (11)
[marco-laptop:27837] Signal code: Address not mapped (1)
[marco-laptop:27837] Failing at address: 0x3b
[marco-laptop:27837] [ 0] [0xb77c9410]
[marco-laptop:27837] [ 1] [0xb77c9400]
[marco-laptop:27837] [ 2] /lib/tls/i686/cmov/libc.so.6(abort+0x182)
[0xb7443a82]
[marco-laptop:27837] [ 3]
/usr/lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x14f)
[0xb768d52f]
[marco-laptop:27837] [ 4] /usr/lib/libstdc++.so.6(+0xbd465) [0xb768b465]
[marco-laptop:27837] [ 5] /usr/lib/libstdc++.so.6(+0xbd4a2) [0xb768b4a2]
[marco-laptop:27837] [ 6] /usr/lib/libstdc++.so.6(+0xbd5e1) [0xb768b5e1]
[marco-laptop:27837] [ 7] /usr/lib/libstdc++.so.6(_Znwj+0x7f) [0xb768bc5f]
[marco-laptop:27837] [ 8] /usr/lib/libstdc++.so.6(_Znaj+0x1d) [0xb768bd3d]
[marco-laptop:27837] [ 9]
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0x537)
[0x864702d]
[marco-laptop:27837] [10]
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
[0x8646a11]
[marco-laptop:27837] [11]
./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
[0x8648f58]
[marco-laptop:27837] [12]
./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
[0x8634e27]
[marco-laptop:27837] [13]
./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
[0x864ab0a]
[marco-laptop:27837] [14]
./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
[0x84f7c48]
[marco-laptop:27837] [15]
./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
[0x84f000d]
[marco-laptop:27837] [16] ./dune_foo(main+0x957) [0x84dd584]
[marco-laptop:27837] [17]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb742cbd6]
[marco-laptop:27837] [18] ./dune_foo() [0x84dc991]
[marco-laptop:27837] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 27837 on node marco-laptop
exited on signal 11 (Segmentation fault).

If I don't care about my data and skip the mapping to/from the
persistent container, the error is different:

[marco-laptop:28539] *** Process received signal ***
[marco-laptop:28539] Signal: Segmentation fault (11)
[marco-laptop:28539] Signal code: Address not mapped (1)
[marco-laptop:28539] Failing at address: 0xe64e57e4
[marco-laptop:28538] *** Process received signal ***
[marco-laptop:28538] Signal: Segmentation fault (11)
[marco-laptop:28538] Signal code: Address not mapped (1)
[marco-laptop:28538] Failing at address: 0x9b7b000
[marco-laptop:28539] [ 0] [0xb77df410]
[marco-laptop:28539] [ 1]
./dune_foo(libparmetis__Adaptive_Partition+0x46) [0x8721246]
[marco-laptop:28539] [ 2] ./dune_foo(ParMETIS_V3_AdaptiveRepart+0x273)
[0x8721993]
[marco-laptop:28539] [ 3]
./dune_foo(_ZN15ALUGridParMETIS31CALL_ParMETIS_V3_AdaptiveRepartEPiS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S0_S0_S0_PP19ompi_communicator_t+0x81)
[0x8658d0f]
[marco-laptop:28539] [ 4]
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0xaf9)
[0x8642e6f]
[marco-laptop:28539] [ 5]
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
[0x8642291]
[marco-laptop:28539] [ 6]
./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
[0x86447d8]
[marco-laptop:28539] [ 7]
./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
[0x86306a7]
[marco-laptop:28539] [ 8]
./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
[0x864638a]
[marco-laptop:28539] [ 9]
./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
[0x84f496d]
[marco-laptop:28539] [10]
./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
[0x84ed1f1]
[marco-laptop:28539] [11] ./dune_foo(main+0x957) [0x84dafa4]
[marco-laptop:28539] [12]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7442bd6]
[marco-laptop:28539] [13] ./dune_foo() [0x84da3b1]
[marco-laptop:28539] *** End of error message ***
[marco-laptop:28538] [ 0] [0xb7715410]
[marco-laptop:28538] [ 1]
./dune_foo(libparmetis__Adaptive_Partition+0x46) [0x8721246]
[marco-laptop:28538] [ 2] ./dune_foo(ParMETIS_V3_AdaptiveRepart+0x273)
[0x8721993]
[marco-laptop:28538] [ 3]
./dune_foo(_ZN15ALUGridParMETIS31CALL_ParMETIS_V3_AdaptiveRepartEPiS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S0_S0_S0_PP19ompi_communicator_t+0x81)
[0x8658d0f]
[marco-laptop:28538] [ 4]
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0xaf9)
[0x8642e6f]
[marco-laptop:28538] [ 5]
./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
[0x8642291]
[marco-laptop:28538] [ 6]
./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
[0x86447d8]
[marco-laptop:28538] [ 7]
./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
[0x86306a7]
[marco-laptop:28538] [ 8]
./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
[0x864638a]
[marco-laptop:28538] [ 9]
./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
[0x84f496d]
[marco-laptop:28538] [10]
./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
[0x84ed1f1]
[marco-laptop:28538] [11] ./dune_foo(main+0x957) [0x84dafa4]
[marco-laptop:28538] [12]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7378bd6]
[marco-laptop:28538] [13] ./dune_foo() [0x84da3b1]
[marco-laptop:28538] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 28539 on node marco-laptop
exited on signal 11 (Segmentation fault).



My parameters in alugrid.cfg are 0,1.2,14.
Could anyone help me understand what is happening, please? Honestly,
I have no idea!

Thanks a lot for any hint!

Best regards,
Marco


