[Dune] Segmentation Fault in loadBalance()

Marco Cisternino marco.cisternino at optimad.it
Wed Oct 3 14:00:06 CEST 2012


I know, Andreas, and I'm sorry about that.
No, I'm not calling loadBalance with the DataHandle, just loadBalance().
I understand that I have to, but I wanted to look at the grid first. I'll do it;
I need it.
Anyway, with Metis the grid is balanced correctly and no segmentation fault
occurs.
I still don't know about the data, but they are probably not balanced
because I don't pass a DataHandle to loadBalance after refinement.
So I think it was a problem with Parmetis, wasn't it?
Thanks a lot, Andreas.
PS: when ALU load balances the grid, can it distribute the children of the
same father to different processors?

Best regards,
Marco
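
For reference, a minimal sketch of the call order Andreas describes in the
quoted message below: loadBalance is called with a data handle before
postAdapt and before the data is moved out of the persistent container.
The method names preAdapt, adapt, postAdapt and loadBalance are the ones
used in this thread. MockGrid, MockDataHandle, storeData, restoreData,
adaptAndBalance and demoCallOrder are hypothetical names introduced only
for illustration. The mock does no real grid work; it only records the
order of the calls, so this shows the sequencing and nothing more.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a Dune grid. It does no real work and only
// records the order in which the grid-modification calls arrive.
struct MockGrid {
    std::vector<std::string> calls;

    bool preAdapt()  { calls.push_back("preAdapt");  return false; }
    bool adapt()     { calls.push_back("adapt");     return true;  }
    void postAdapt() { calls.push_back("postAdapt"); }

    template <class DataHandle>
    bool loadBalance(DataHandle&) {
        calls.push_back("loadBalance(dataHandle)");
        return true;
    }
};

// Hypothetical placeholder for a data handle that would pack and unpack
// the user data held in the persistent container.
struct MockDataHandle {};

// One adaptation cycle in the order suggested in the quoted reply.
// storeData and restoreData are hypothetical user callbacks.
template <class Grid, class DataHandle, class Store, class Restore>
void adaptAndBalance(Grid& grid, DataHandle& handle,
                     Store storeData, Restore restoreData) {
    grid.preAdapt();
    storeData();               // copy user data into the persistent container
    grid.adapt();
    grid.loadBalance(handle);  // treated as part of the grid-modification phase
    restoreData();             // map the data back only after load balancing
    grid.postAdapt();
}

// Runs one cycle on the mock grid and returns the recorded call order.
std::vector<std::string> demoCallOrder() {
    MockGrid grid;
    MockDataHandle handle;
    adaptAndBalance(grid, handle,
        [&grid] { grid.calls.push_back("storeData"); },
        [&grid] { grid.calls.push_back("restoreData"); });
    assert(grid.calls.size() == 6);
    return grid.calls;
}
```

With a real grid, the two callbacks would copy the user data into and out of
the persistent container, and the data handle would be a user-written class
operating on that container.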



On 03/10/2012 13:20, Dedner, Andreas wrote:
> It's extremely difficult to tell from the output of the segmentation fault - is there any way to
> reproduce it, for example with the grid-howto code?
> I do have two questions:
> - You are calling loadBalance with the dataHandle, I'm assuming? What I'm wondering about is that
>     you say that you are
>        mapping back from the persistent container and then calling postAdapt
>    and then you
>       call loadBalance again.
>    That will not work because loadBalance also changes the indexSets (you have to think of it
>     as part of the grid modification phase, i.e., call it before postAdapt and before moving the data out
>     of the persistentContainer). You need to call loadBalance with the dataHandle objects, and that has to
>     work on the persistentContainer.
> - Please try one of the metis methods, e.g., use alugrid.cfg 0,1.2,11
>    There might be a problem with the parmetis bindings (at least I do not have that much experience with
>     parmetis and alu). Perhaps others have?
> Andreas
>
> ________________________________________
> From: dune-bounces+a.s.dedner=warwick.ac.uk at dune-project.org [dune-bounces+a.s.dedner=warwick.ac.uk at dune-project.org] on behalf of Marco Cisternino [marco.cisternino at optimad.it]
> Sent: 03 October 2012 11:58
> To: dune at dune-project.org
> Subject: [Dune] Segmentation Fault in loadBalance()
>
> Good morning,
> I'm experiencing something weird when calling the loadBalance() method on an
> ALUCubeGrid<3,3>.
> I build my coarse grid by reading it from a file on rank 0, and then I call
> loadBalance() a first time to distribute the grid among the other processors.
> In this case loadBalance gives no problem.
> Then I refine the grid locally: marking the cells to be refined, calling
> preAdapt(), mapping my data to a persistent container, calling adapt(),
> mapping back from the persistent container and then calling postAdapt.
> At the end of the local refinement procedure I call loadBalance again.
> If the refined grid is not unbalanced (globally refining or locally
> refining without getting an unbalanced grid) loadBalance works fine.
> Let me sketch a four-element example (each element shows the rank of the
> processor owning it)
>
> ------ ------
> |  0  |   1  |
> ------ ------    Coarse grid
> |  0  |   1  |
> ------ ------
>
> ------ ------ ------ ------
> |  0  |   0  |   1  |   1  |
> ------ ------ ------ ------
> |  0  |   0  |   1  |   1  |
> ------ ------ ------ ------  Globally Refined
> |  0  |   0  |   1  |   1  |
> ------ ------ ------ ------
> |  0  |   0  |   1  |   1  |
> ------ ------ ------ ------
>
>
> ------ ------ ------ ------
> |  0  |   0  |   1  |   1  |
> ------ ------ ------ ------
> |  0  |   0  |   1  |   1  |
> ------ ------ ------ ------  Locally Refined (balanced grid)
> |              |               |
>          0              1
> |              |               |
> ------ ------ ------ ------
>
> But if I refine the grid and end up with an unbalanced grid,
>
> ------ ------ ------ ------
> |              |   1  |   1  |
>          0       ------ ------
> |              |   1  |   1  |
> ------ ------ ------ ------  Locally Refined (unbalanced grid)
> |              |   1  |   1  |
>          0       ------ ------
> |              |   1  |   1  |
> ------ ------ ------ ------
>
> loadBalance fails with a segmentation fault. The exact output is:
>
> std::bad_alloc'
>     what():  std::bad_alloc
> [marco-laptop:27837] *** Process received signal ***
> [marco-laptop:27837] *** Process received signal ***
> [marco-laptop:27837] Signal: Segmentation fault (11)
> [marco-laptop:27837] Signal code: Address not mapped (1)
> [marco-laptop:27837] Failing at address: 0x3b
> [marco-laptop:27837] [ 0] [0xb77c9410]
> [marco-laptop:27837] [ 1] [0xb77c9400]
> [marco-laptop:27837] [ 2] /lib/tls/i686/cmov/libc.so.6(abort+0x182)
> [0xb7443a82]
> [marco-laptop:27837] [ 3]
> /usr/lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x14f)
> [0xb768d52f]
> [marco-laptop:27837] [ 4] /usr/lib/libstdc++.so.6(+0xbd465) [0xb768b465]
> [marco-laptop:27837] [ 5] /usr/lib/libstdc++.so.6(+0xbd4a2) [0xb768b4a2]
> [marco-laptop:27837] [ 6] /usr/lib/libstdc++.so.6(+0xbd5e1) [0xb768b5e1]
> [marco-laptop:27837] [ 7] /usr/lib/libstdc++.so.6(_Znwj+0x7f) [0xb768bc5f]
> [marco-laptop:27837] [ 8] /usr/lib/libstdc++.so.6(_Znaj+0x1d) [0xb768bd3d]
> [marco-laptop:27837] [ 9]
> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0x537)
> [0x864702d]
> [marco-laptop:27837] [10]
> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
> [0x8646a11]
> [marco-laptop:27837] [11]
> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
> [0x8648f58]
> [marco-laptop:27837] [12]
> ./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
> [0x8634e27]
> [marco-laptop:27837] [13]
> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
> [0x864ab0a]
> [marco-laptop:27837] [14]
> ./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
> [0x84f7c48]
> [marco-laptop:27837] [15]
> ./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
> [0x84f000d]
> [marco-laptop:27837] [16] ./dune_foo(main+0x957) [0x84dd584]
> [marco-laptop:27837] [17]
> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb742cbd6]
> [marco-laptop:27837] [18] ./dune_foo() [0x84dc991]
> [marco-laptop:27837] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 27837 on node marco-laptop
> exited on signal 11 (Segmentation fault).
>
> If I ignore my data and skip the mapping to/from the
> persistent container, the error is different:
>
> [marco-laptop:28539] *** Process received signal ***
> [marco-laptop:28539] Signal: Segmentation fault (11)
> [marco-laptop:28539] Signal code: Address not mapped (1)
> [marco-laptop:28539] Failing at address: 0xe64e57e4
> [marco-laptop:28538] *** Process received signal ***
> [marco-laptop:28538] Signal: Segmentation fault (11)
> [marco-laptop:28538] Signal code: Address not mapped (1)
> [marco-laptop:28538] Failing at address: 0x9b7b000
> [marco-laptop:28539] [ 0] [0xb77df410]
> [marco-laptop:28539] [ 1]
> ./dune_foo(libparmetis__Adaptive_Partition+0x46) [0x8721246]
> [marco-laptop:28539] [ 2] ./dune_foo(ParMETIS_V3_AdaptiveRepart+0x273)
> [0x8721993]
> [marco-laptop:28539] [ 3]
> ./dune_foo(_ZN15ALUGridParMETIS31CALL_ParMETIS_V3_AdaptiveRepartEPiS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S0_S0_S0_PP19ompi_communicator_t+0x81)
> [0x8658d0f]
> [marco-laptop:28539] [ 4]
> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0xaf9)
> [0x8642e6f]
> [marco-laptop:28539] [ 5]
> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
> [0x8642291]
> [marco-laptop:28539] [ 6]
> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
> [0x86447d8]
> [marco-laptop:28539] [ 7]
> ./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
> [0x86306a7]
> [marco-laptop:28539] [ 8]
> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
> [0x864638a]
> [marco-laptop:28539] [ 9]
> ./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
> [0x84f496d]
> [marco-laptop:28539] [10]
> ./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
> [0x84ed1f1]
> [marco-laptop:28539] [11] ./dune_foo(main+0x957) [0x84dafa4]
> [marco-laptop:28539] [12]
> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7442bd6]
> [marco-laptop:28539] [13] ./dune_foo() [0x84da3b1]
> [marco-laptop:28539] *** End of error message ***
> [marco-laptop:28538] [ 0] [0xb7715410]
> [marco-laptop:28538] [ 1]
> ./dune_foo(libparmetis__Adaptive_Partition+0x46) [0x8721246]
> [marco-laptop:28538] [ 2] ./dune_foo(ParMETIS_V3_AdaptiveRepart+0x273)
> [0x8721993]
> [marco-laptop:28538] [ 3]
> ./dune_foo(_ZN15ALUGridParMETIS31CALL_ParMETIS_V3_AdaptiveRepartEPiS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S0_S0_S0_PP19ompi_communicator_t+0x81)
> [0x8658d0f]
> [marco-laptop:28538] [ 4]
> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0xaf9)
> [0x8642e6f]
> [marco-laptop:28538] [ 5]
> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
> [0x8642291]
> [marco-laptop:28538] [ 6]
> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
> [0x86447d8]
> [marco-laptop:28538] [ 7]
> ./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
> [0x86306a7]
> [marco-laptop:28538] [ 8]
> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
> [0x864638a]
> [marco-laptop:28538] [ 9]
> ./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
> [0x84f496d]
> [marco-laptop:28538] [10]
> ./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
> [0x84ed1f1]
> [marco-laptop:28538] [11] ./dune_foo(main+0x957) [0x84dafa4]
> [marco-laptop:28538] [12]
> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7378bd6]
> [marco-laptop:28538] [13] ./dune_foo() [0x84da3b1]
> [marco-laptop:28538] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 28539 on node marco-laptop
> exited on signal 11 (Segmentation fault).
>
>
>
> My parameters in alugrid.cfg are 0,1.2,14.
> Could anyone help me understand what is happening, please? Honestly,
> I have no idea!
>
> Thanks a lot for any hint!
>
> Best regards,
> Marco
>
>
>

-- 
Marco Cisternino
Optimad Engineering s.r.l.
www.optimad.it
marco.cisternino at optimad.it
+3901119719782




