[Dune] Segmentation Fault in loadBalance()
Marco Cisternino
marco.cisternino at optimad.it
Wed Oct 3 15:23:59 CEST 2012
Is there a grid implementation able to separate children? UGGrid, for
example?
This could be a strong limitation for cases with strongly local refinement,
couldn't it?
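
To make the concern concrete, here is a minimal sketch. It is not ALUGrid or Dune code: the function names `bestMacroLevelMaxLoad` and `idealMaxLoad` are hypothetical, and I am assuming that the load of a rank is simply the number of leaf elements it owns. It brute-forces the best partition in which every macro element stays on one rank together with all of its children, and compares that with the load a leaf-level partition could reach.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helpers for illustration only; not part of Dune or ALUGrid.
// leavesPerMacro[i] = number of leaf elements below macro element i.

// Smallest achievable maximum per-rank load when each macro element
// (with all of its children) must stay on a single rank.
// Brute force over all assignments, so only meant for tiny examples.
long bestMacroLevelMaxLoad(const std::vector<long>& leavesPerMacro, int ranks)
{
  assert(ranks > 0);
  const std::size_t n = leavesPerMacro.size();
  std::vector<int> owner(n, 0);  // owner[i] = rank holding macro element i
  // Upper bound: everything on one rank.
  long best = std::accumulate(leavesPerMacro.begin(), leavesPerMacro.end(), 0L);
  while (true) {
    std::vector<long> load(ranks, 0);
    for (std::size_t i = 0; i < n; ++i)
      load[owner[i]] += leavesPerMacro[i];
    best = std::min(best, *std::max_element(load.begin(), load.end()));
    // Advance the assignment like an odometer in base `ranks`.
    std::size_t k = 0;
    while (k < n && ++owner[k] == ranks)
      owner[k++] = 0;
    if (k == n)
      break;  // all assignments visited
  }
  return best;
}

// Maximum per-rank load if leaf elements could be distributed freely.
long idealMaxLoad(const std::vector<long>& leavesPerMacro, int ranks)
{
  assert(ranks > 0);
  const long total =
      std::accumulate(leavesPerMacro.begin(), leavesPerMacro.end(), 0L);
  return (total + ranks - 1) / ranks;  // ceiling of total / ranks
}
```

For leaf counts {1, 1, 4, 4} on two ranks both functions return 5, so a macro-level partition is enough there. For {1, 1, 1, 64} (one macro cube refined twice, assuming 8 children per refinement) the macro-level result is 64 against an ideal of 34.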
Any hints on ALUGrid and ParMETIS from the others, please?
Thanks again.
Marco
On 03/10/2012 15:18, Dedner, Andreas wrote:
> ALU can only partition on the macro level - so all children of a macro element are
> on one node.
> ________________________________________
> From: Marco Cisternino [marco.cisternino at optimad.it]
> Sent: 03 October 2012 13:00
> To: Dedner, Andreas
> Cc: dune at dune-project.org
> Subject: Re: [Dune] Segmentation Fault in loadBalance()
>
> I know, Andreas, and I'm sorry for that.
> No, I'm not calling loadBalance with the DataHandle, but just loadBalance().
> I understand I have to, but I was looking at the grid first. I'll do it.
> I need it.
> Anyway, the grid is correctly balanced using Metis and no segmentation fault
> is produced.
> I still don't know about the data, but they are probably not balanced
> because I don't use a DataHandle in loadBalance after refinement.
> So I think it was a ParMETIS problem, wasn't it?
> Thanks a lot, Andreas.
> PS: does ALU load balance the grid by distributing children of the same parent
> to different processors?
>
> Best regards,
> Marco
>
>
>
> On 03/10/2012 13:20, Dedner, Andreas wrote:
>> It's extremely difficult to tell from the output of the segmentation fault - is there any way to
>> reproduce it, for example with the grid-howto code?
>> Two questions I do have:
>> - You are calling loadBalance with the dataHandle, I'm assuming? What I'm wondering about is that
>> you say that you are mapping back from the persistent container, then calling postAdapt,
>> and then calling loadBalance again.
>> That will not work because loadBalance also changes the indexSets (you have to think of it
>> as part of the grid modification phase, i.e., call it before postAdapt and before moving the data out
>> of the persistentContainer). You need to call loadBalance with the dataHandle objects, and that has to
>> work on the persistentContainer.
>> - Please try one of the metis methods, e.g., use alugrid.cfg 0,1.2,11
>> There might be a problem with the parmetis bindings (at least I do not have that much experience with
>> parmetis and alu). Perhaps others have?
>> Andreas
>>
>> ________________________________________
>> From: dune-bounces+a.s.dedner=warwick.ac.uk at dune-project.org [dune-bounces+a.s.dedner=warwick.ac.uk at dune-project.org] on behalf of Marco Cisternino [marco.cisternino at optimad.it]
>> Sent: 03 October 2012 11:58
>> To: dune at dune-project.org
>> Subject: [Dune] Segmentation Fault in loadBalance()
>>
>> Good morning,
>> I'm experiencing something weird when calling the loadBalance() method on an
>> ALUCubeGrid<3,3>.
>> I build my coarse grid reading from a file with rank 0 and then I call
>> the first loadBalance() to distribute the grid among the other processors.
>> In this case loadBalance gives no problem.
>> Then I refine the grid locally, marking the cells to be refined, calling
>> preAdapt(), mapping my data to a persistent container, calling adapt(),
>> mapping back from the persistent container and then calling postAdapt.
>> At the end of local refinement procedure I call loadBalance again.
>> If the refined grid is not unbalanced (globally refining or locally
>> refining without getting an unbalanced grid) loadBalance works fine.
>> Let me sketch a four-element example (each element shows the rank of the
>> processor owning it)
>>
>>  ------ ------
>> |  0   |  1   |
>>  ------ ------      Coarse grid
>> |  0   |  1   |
>>  ------ ------
>>
>>  ------ ------ ------ ------
>> |  0   |  0   |  1   |  1   |
>>  ------ ------ ------ ------
>> |  0   |  0   |  1   |  1   |
>>  ------ ------ ------ ------      Globally Refined
>> |  0   |  0   |  1   |  1   |
>>  ------ ------ ------ ------
>> |  0   |  0   |  1   |  1   |
>>  ------ ------ ------ ------
>>
>>
>>  ------ ------ ------ ------
>> |  0   |  0   |  1   |  1   |
>>  ------ ------ ------ ------
>> |  0   |  0   |  1   |  1   |
>>  ------ ------ ------ ------      Locally Refined (balanced grid)
>> |             |             |
>> |      0      |      1      |
>> |             |             |
>>  ------ ------ ------ ------
>>
>> But if I refine the grid getting an unbalanced grid,
>>
>>  ------ ------ ------ ------
>> |             |  1   |  1   |
>>        0       ------ ------
>> |             |  1   |  1   |
>>  ------ ------ ------ ------      Locally Refined (unbalanced grid)
>> |             |  1   |  1   |
>>        0       ------ ------
>> |             |  1   |  1   |
>>  ------ ------ ------ ------
>>
>> loadBalance yields a segmentation fault, specifically:
>>
>> std::bad_alloc'
>> what(): std::bad_alloc
>> [marco-laptop:27837] *** Process received signal ***
>> [marco-laptop:27837] *** Process received signal ***
>> [marco-laptop:27837] Signal: Segmentation fault (11)
>> [marco-laptop:27837] Signal code: Address not mapped (1)
>> [marco-laptop:27837] Failing at address: 0x3b
>> [marco-laptop:27837] [ 0] [0xb77c9410]
>> [marco-laptop:27837] [ 1] [0xb77c9400]
>> [marco-laptop:27837] [ 2] /lib/tls/i686/cmov/libc.so.6(abort+0x182)
>> [0xb7443a82]
>> [marco-laptop:27837] [ 3]
>> /usr/lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x14f)
>> [0xb768d52f]
>> [marco-laptop:27837] [ 4] /usr/lib/libstdc++.so.6(+0xbd465) [0xb768b465]
>> [marco-laptop:27837] [ 5] /usr/lib/libstdc++.so.6(+0xbd4a2) [0xb768b4a2]
>> [marco-laptop:27837] [ 6] /usr/lib/libstdc++.so.6(+0xbd5e1) [0xb768b5e1]
>> [marco-laptop:27837] [ 7] /usr/lib/libstdc++.so.6(_Znwj+0x7f) [0xb768bc5f]
>> [marco-laptop:27837] [ 8] /usr/lib/libstdc++.so.6(_Znaj+0x1d) [0xb768bd3d]
>> [marco-laptop:27837] [ 9]
>> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0x537)
>> [0x864702d]
>> [marco-laptop:27837] [10]
>> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
>> [0x8646a11]
>> [marco-laptop:27837] [11]
>> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
>> [0x8648f58]
>> [marco-laptop:27837] [12]
>> ./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
>> [0x8634e27]
>> [marco-laptop:27837] [13]
>> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
>> [0x864ab0a]
>> [marco-laptop:27837] [14]
>> ./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
>> [0x84f7c48]
>> [marco-laptop:27837] [15]
>> ./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
>> [0x84f000d]
>> [marco-laptop:27837] [16] ./dune_foo(main+0x957) [0x84dd584]
>> [marco-laptop:27837] [17]
>> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb742cbd6]
>> [marco-laptop:27837] [18] ./dune_foo() [0x84dc991]
>> [marco-laptop:27837] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 27837 on node marco-laptop
>> exited on signal 11 (Segmentation fault).
>>
>> If I ignore my data and skip the mapping to/from the
>> persistent container, the error is different:
>>
>> [marco-laptop:28539] *** Process received signal ***
>> [marco-laptop:28539] Signal: Segmentation fault (11)
>> [marco-laptop:28539] Signal code: Address not mapped (1)
>> [marco-laptop:28539] Failing at address: 0xe64e57e4
>> [marco-laptop:28538] *** Process received signal ***
>> [marco-laptop:28538] Signal: Segmentation fault (11)
>> [marco-laptop:28538] Signal code: Address not mapped (1)
>> [marco-laptop:28538] Failing at address: 0x9b7b000
>> [marco-laptop:28539] [ 0] [0xb77df410]
>> [marco-laptop:28539] [ 1]
>> ./dune_foo(libparmetis__Adaptive_Partition+0x46) [0x8721246]
>> [marco-laptop:28539] [ 2] ./dune_foo(ParMETIS_V3_AdaptiveRepart+0x273)
>> [0x8721993]
>> [marco-laptop:28539] [ 3]
>> ./dune_foo(_ZN15ALUGridParMETIS31CALL_ParMETIS_V3_AdaptiveRepartEPiS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S0_S0_S0_PP19ompi_communicator_t+0x81)
>> [0x8658d0f]
>> [marco-laptop:28539] [ 4]
>> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0xaf9)
>> [0x8642e6f]
>> [marco-laptop:28539] [ 5]
>> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
>> [0x8642291]
>> [marco-laptop:28539] [ 6]
>> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
>> [0x86447d8]
>> [marco-laptop:28539] [ 7]
>> ./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
>> [0x86306a7]
>> [marco-laptop:28539] [ 8]
>> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
>> [0x864638a]
>> [marco-laptop:28539] [ 9]
>> ./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
>> [0x84f496d]
>> [marco-laptop:28539] [10]
>> ./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
>> [0x84ed1f1]
>> [marco-laptop:28539] [11] ./dune_foo(main+0x957) [0x84dafa4]
>> [marco-laptop:28539] [12]
>> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7442bd6]
>> [marco-laptop:28539] [13] ./dune_foo() [0x84da3b1]
>> [marco-laptop:28539] *** End of error message ***
>> [marco-laptop:28538] [ 0] [0xb7715410]
>> [marco-laptop:28538] [ 1]
>> ./dune_foo(libparmetis__Adaptive_Partition+0x46) [0x8721246]
>> [marco-laptop:28538] [ 2] ./dune_foo(ParMETIS_V3_AdaptiveRepart+0x273)
>> [0x8721993]
>> [marco-laptop:28538] [ 3]
>> ./dune_foo(_ZN15ALUGridParMETIS31CALL_ParMETIS_V3_AdaptiveRepartEPiS0_S0_S0_S0_S0_S0_S0_S0_S0_PfS1_S1_S0_S0_S0_PP19ompi_communicator_t+0x81)
>> [0x8658d0f]
>> [marco-laptop:28538] [ 4]
>> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodERSt6vectorIiSaIiEEi+0xaf9)
>> [0x8642e6f]
>> [marco-laptop:28538] [ 5]
>> ./dune_foo(_ZN12ALUGridSpace12LoadBalancer8DataBase11repartitionERNS_14MpAccessGlobalENS1_6methodE+0x49)
>> [0x8642291]
>> [marco-laptop:28538] [ 6]
>> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll20repartitionMacroGridERNS_12LoadBalancer8DataBaseE+0x36)
>> [0x86447d8]
>> [marco-laptop:28538] [ 7]
>> ./dune_foo(_ZN12ALUGridSpace9GitterPll29loadBalancerGridChangesNotifyEv+0x397)
>> [0x86306a7]
>> [marco-laptop:28538] [ 8]
>> ./dune_foo(_ZN12ALUGridSpace13GitterDunePll15duneLoadBalanceEv+0x18)
>> [0x864638a]
>> [marco-laptop:28538] [ 9]
>> ./dune_foo(_ZN4Dune19ALU3dGridCommHelperILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceERNS_9ALU3dGridILS1_7ES3_EE+0x3d)
>> [0x84f496d]
>> [marco-laptop:28538] [10]
>> ./dune_foo(_ZN4Dune9ALU3dGridILNS_20ALU3dGridElementTypeE7EP19ompi_communicator_tE11loadBalanceEv+0x11)
>> [0x84ed1f1]
>> [marco-laptop:28538] [11] ./dune_foo(main+0x957) [0x84dafa4]
>> [marco-laptop:28538] [12]
>> /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7378bd6]
>> [marco-laptop:28538] [13] ./dune_foo() [0x84da3b1]
>> [marco-laptop:28538] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 1 with PID 28539 on node marco-laptop
>> exited on signal 11 (Segmentation fault).
>>
>>
>>
>> My parameters in alugrid.cfg are 0,1.2,14.
>> Could anyone help me understand what is happening, please? Honestly,
>> I have no idea!
>>
>> Thanks a lot for any hint!
>>
>> Best regards,
>> Marco
>>
>>
>>
> --
> Marco Cisternino
> Optimad Engineering s.r.l.
> www.optimad.it
> marco.cisternino at optimad.it
> +3901119719782
>
>
>
--
Marco Cisternino
Optimad Engineering s.r.l.
www.optimad.it
marco.cisternino at optimad.it
+3901119719782