[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)

Shubhangi Gupta sgupta at geomar.de
Thu Jul 25 15:49:58 CEST 2019


Hi Jö,

Thanks again for your reply... I am wondering whether the current master 
version of dune-common also includes these ULFM changes?

Best wishes, Shubhangi


On 24.07.19 11:55, Jö Fahlke wrote:
> On Wed, 24 Jul 2019, 10:36:41 +0200, Shubhangi Gupta wrote:
>>
>> Hi Nils,
>>
>> Thanks a lot! I managed to install blackchannel-ulfm. While building Dune
>> with the following CMake opts:
>>
>> CMAKE_FLAGS="
>> -DCMAKE_C_COMPILER='/usr/bin/gcc'
>> -DCMAKE_CXX_COMPILER='/usr/bin/g++-7'
>> -DCMAKE_Fortran_COMPILER='/usr/bin/gfortran'
>> -DBLACKCHANNEL_INCLUDE_DIR='/usr/local/include'
>> -DBLACKCHANNEL_LIBRARIES='/usr/local/lib'
>> -DCMAKE_CXX_FLAGS_RELEASE='-O3 -DNDEBUG -g0 -Wno-deprecated-declarations
>> -funroll-loops'
>> -DCMAKE_BUILD_TYPE=Release
>> -DDUNE_SYMLINK_TO_SOURCE_TREE=1
>> "
>>
>> I get the following message:
>>
>>    Manually-specified variables were not used by the project:
>>
>>      BLACKCHANNEL_INCLUDE_DIR
>>      BLACKCHANNEL_LIBRARIES
>>
>>
>> How can I check (or force) whether Dune indeed finds blackchannel?
> If those variables were not used, you probably need to switch the dune-common
> branch to that of
> https://gitlab.dune-project.org/core/dune-common/merge_requests/517, or at
> least to something that includes the changes from that MR.
>
> The CMake output during "dunecontrol configure" will probably say something
> about blackchannel (whether it is even looking for it, and whether it was
> found).
>
> Note, however: all that is needed is the revoke functionality from ULFM.  If
> your MPI already includes that, you may not need blackchannel at all (and
> "dunecontrol configure" probably won't look for it).
>
> Regards,
> Jö.
>
>> Thanks again, and warm wishes, Shubhangi
>>
>>
>> On 23.07.19 16:45, Jö Fahlke wrote:
>>> On Tue, 23 Jul 2019, 15:26:39 +0200, Shubhangi Gupta wrote:
>>>> Sorry, I am still struggling with this issue... and my BiCGStab solver is
>>>> freezing a lot more often, so I can't ignore this...
>>>>
>>>> About the ULFM... you sent me the following link:
>>>>
>>>> https://gitlab.dune-project.org/exadune/blackchannel-ulfm
>>> That is a (more-or-less) standard CMake build system, i.e. it works outside of
>>> Dune.  Try something like this (untested, replace the "..." as needed):
>>> ```sh
>>> git clone https://gitlab.dune-project.org/exadune/blackchannel-ulfm
>>> mkdir build
>>> ( cd build && cmake ../blackchannel-ulfm -DCMAKE_INSTALL_PREFIX=... )
>>> make -C build install
>>> ```
>>>
>>> Then, in your Dune opts file, you may need to set
>>> `-DBLACKCHANNEL_INCLUDE_DIR=.../include -DBLACKCHANNEL_LIBRARIES=.../lib` (see
>>> [1]) in the `CMAKE_FLAGS` and Dune should pick the library up when
>>> reconfiguring.
>>>
>>> [1]: https://gitlab.dune-project.org/core/dune-common/blob/edef55ec9ed40617d12648d6ec95cbfc7120c676/cmake/modules/FindBlackChannel.cmake
>>>
>>> Regards,
>>> Jö.
>>>
>>>> Sorry if this is a trivial question, but how should I compile this? With
>>>> dune-build? and how should I include this in my code?
>>>>
>>>> Thanks, and warm wishes, Shubhangi
>>>>
>>>>
>>>> On 12.07.19 13:38, Nils-Arne Dreier wrote:
>>>>> Hi Shubhangi,
>>>>>
>>>>> You have to call the MPIGuard::finalize() method after the point where
>>>>> the exception might be thrown and before the next communication is
>>>>> performed. From the information you provided, I guess that the
>>>>> exception is thrown in the smoother of the AMG, which makes things
>>>>> slightly complicated. Maybe AMG::mgc is a good starting point.
>>>>>
>>>>> By the way: if you use the ULFM functionality I described previously, you
>>>>> can use the MPIGuard on the coarsest level and don't need to call
>>>>> MPIGuard::finalize() after every critical section.
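>>>>>
>>>>> For illustration, here is a minimal, untested sketch of that pattern around
>>>>> the solver call in the time loop (using the names osm, uold, unew and dt from
>>>>> the code outline further down). Note the caveat discussed in this thread:
>>>>> without ULFM revoke support, the surviving ranks only notice the failure once
>>>>> they actually reach finalize(), so a guard at this level does not help while
>>>>> they are still blocked inside BiCGStab.
>>>>> ```c++
>>>>> #include <dune/common/parallel/mpiguard.hh>
>>>>>
>>>>> try
>>>>> {
>>>>>   Dune::MPIGuard guard;              // constructed collectively on all ranks
>>>>>
>>>>>   osm.apply(time, dt, uold, unew);   // may throw (e.g. FMatrixError) on one rank only
>>>>>
>>>>>   // Report success before the next communication. On a rank where apply()
>>>>>   // threw, the guard is destroyed without finalize() during stack unwinding;
>>>>>   // the surviving ranks then get an MPIGuardError from their finalize() call,
>>>>>   // so every rank ends up in the catch block below.
>>>>>   guard.finalize();
>>>>> }
>>>>> catch (Dune::Exception& e)
>>>>> {
>>>>>   // all ranks: reset and retry with a smaller time step
>>>>>   unew = uold;
>>>>>   dt *= 0.5;
>>>>>   osm.getPDESolver().discardMatrix();
>>>>> }
>>>>> ```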
>>>>>
>>>>> Regards
>>>>> Nils
>>>>>
>>>>> On 11.07.19 14:56, Shubhangi Gupta wrote:
>>>>>> Dear Jö and Nils,
>>>>>>
>>>>>> Thanks a lot for your replies.
>>>>>>
>>>>>> I actually tried putting the MPIGuard within the time loop (at the
>>>>>> highest level) just to see what happens... Indeed, the one-step method
>>>>>> now proceeds as it should, but the BiCGStab freezes... So yeah, as Jö
>>>>>> mentioned, the MPIGuard needs to be introduced inside the
>>>>>> ISTL solver... I am not very sure how and where exactly though! Any
>>>>>> ideas?
>>>>>>
>>>>>> Thanks again, and warm wishes, Shubhangi
>>>>>>
>>>>>> On 10.07.19 14:52, Jö Fahlke wrote:
>>>>>>> On Wed, 10 Jul 2019, 14:39:09 +0200, Nils-Arne Dreier wrote:
>>>>>>>> Hi Shubhangi,
>>>>>>>>
>>>>>>>> I just talked to Jö. We guess that the problem is that the exception is
>>>>>>>> only thrown on one rank, say rank X. All other ranks do not know that
>>>>>>>> rank X failed and proceed as usual; at some point all these ranks end up
>>>>>>>> waiting for communication from rank X. That is the deadlock that you see.
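>>>>>>>>
>>>>>>>> Schematically (illustration only, with made-up placeholder names rather
>>>>>>>> than your actual code path), the situation is:
>>>>>>>> ```c++
>>>>>>>> // On rank X only: an element matrix turns out to be singular, the code
>>>>>>>> // throws and rank X leaves the solver early.
>>>>>>>> if (localMatrixIsSingular)   // true on rank X only
>>>>>>>>   DUNE_THROW(Dune::FMatrixError, "matrix is singular");
>>>>>>>>
>>>>>>>> // The other ranks never see that exception. They carry on to the next
>>>>>>>> // collective operation (e.g. a global scalar product inside BiCGStab)
>>>>>>>> // and block there waiting for rank X, which never joins: the deadlock.
>>>>>>>> double globalDefect = gridView.comm().sum(localDefect);
>>>>>>>> ```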
>>>>>>>>
>>>>>>>> You may want to have a look at Dune::MPIGuard in
>>>>>>>> dune/common/parallel/mpiguard.hh. It makes it possible to propagate the
>>>>>>>> error state to all ranks.
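>>>>>>>>
>>>>>>>> Roughly (an untested sketch of the usage pattern documented in mpiguard.hh;
>>>>>>>> do_something() and do_something_else() stand for whatever critical sections
>>>>>>>> you want to guard):
>>>>>>>> ```c++
>>>>>>>> Dune::MPIGuard guard;   // create the guard on all ranks
>>>>>>>>
>>>>>>>> do_something();         // may throw on a single rank only
>>>>>>>>
>>>>>>>> // If any rank failed to reach this point, finalize() throws an
>>>>>>>> // MPIGuardError on the remaining ranks, so all ranks see the error.
>>>>>>>> guard.finalize();
>>>>>>>>
>>>>>>>> guard.reactivate();     // re-arm the guard for the next critical section
>>>>>>>> do_something_else();
>>>>>>>> guard.finalize();
>>>>>>>> ```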
>>>>>>> It should be mentioned that MPIGuard probably cannot be used at a
>>>>>>> high level; it would probably need to be introduced into the ISTL solver
>>>>>>> (BiCGStab, AMG, SSOR) and/or PDELab (the parallel scalar product, Newton)
>>>>>>> for this to work.  Not sure where exactly.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jö.
>>>>>>>
>>>>>>>> There is also a merge request for dune-common which adapts the MPIGuard
>>>>>>>> such that you don't need to check for an error state before communicating,
>>>>>>>> making use of the ULFM proposal for MPI. You can find it here:
>>>>>>>> https://gitlab.dune-project.org/core/dune-common/merge_requests/517
>>>>>>>>
>>>>>>>> If you don't have an MPI implementation that provides a *working* ULFM
>>>>>>>> implementation, you may want to use the blackchannel-ulfm lib:
>>>>>>>> https://gitlab.dune-project.org/exadune/blackchannel-ulfm
>>>>>>>>
>>>>>>>> I hope that helps.
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> Nils
>>>>>>>>
>>>>>>>> On 10.07.19 14:07, Shubhangi Gupta wrote:
>>>>>>>>> Hi Jö,
>>>>>>>>>
>>>>>>>>> So, since you asked about the number of ranks... I tried running the
>>>>>>>>> simulations again on 2 processes and 1 process. I get the same problem
>>>>>>>>> with 2, but not with 1.
>>>>>>>>>
>>>>>>>>> On 10.07.19 13:33, Shubhangi Gupta wrote:
>>>>>>>>>> Hi Jö,
>>>>>>>>>>
>>>>>>>>>> Yes, I am running it MPI-parallel, on 4 ranks.
>>>>>>>>>>
>>>>>>>>>> On 10.07.19 13:32, Jö Fahlke wrote:
>>>>>>>>>>> Are you running this MPI-parallel?  If yes, how many ranks?
>>>>>>>>>>>
>>>>>>>>>>> Regards, Jö.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 10 Jul 2019, 11:55:45 +0200, Shubhangi Gupta wrote:
>>>>>>>>>>>> Dear pdelab users,
>>>>>>>>>>>>
>>>>>>>>>>>> I am currently experiencing a rather strange problem during parallel
>>>>>>>>>>>> solution of my finite volume code. I have written a short outline of my
>>>>>>>>>>>> code below for reference.
>>>>>>>>>>>>
>>>>>>>>>>>> At some point during computation, if Dune throws an error, the code
>>>>>>>>>>>> catches this error, resets the solution vector to the old value, halves
>>>>>>>>>>>> the time step size, and tries to redo the calculation (osm.apply()).
>>>>>>>>>>>>
>>>>>>>>>>>> However, if I get the error "FMatrixError: matrix is singular", the
>>>>>>>>>>>> solver seems to freeze. Even the initial defect is not shown! (See the
>>>>>>>>>>>> terminal output below.) I am not sure why this is so, and I have not
>>>>>>>>>>>> experienced this issue before.
>>>>>>>>>>>>
>>>>>>>>>>>> I will be very thankful if someone can help me figure out a way around
>>>>>>>>>>>> this problem.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, and warm wishes, Shubhangi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *// code layout*
>>>>>>>>>>>>
>>>>>>>>>>>>     // ... UG grid, generated using gmsh, GV, ...
>>>>>>>>>>>>
>>>>>>>>>>>>     typedef Dune::PDELab::QkDGLocalFiniteElementMap<GV::Grid::ctype, double, 0, dim,
>>>>>>>>>>>>         Dune::PDELab::QkDGBasisPolynomial::lagrange> FEMP0;
>>>>>>>>>>>>     FEMP0 femp0;
>>>>>>>>>>>>     typedef Dune::PDELab::GridFunctionSpace<GV, FEMP0,
>>>>>>>>>>>>         Dune::PDELab::P0ParallelConstraints,
>>>>>>>>>>>>         Dune::PDELab::ISTL::VectorBackend<>> GFS0;
>>>>>>>>>>>>     GFS0 gfs0(gv, femp0);
>>>>>>>>>>>>     typedef Dune::PDELab::PowerGridFunctionSpace<GFS0, num_of_vars,
>>>>>>>>>>>>         Dune::PDELab::ISTL::VectorBackend<Dune::PDELab::ISTL::Blocking::fixed>,
>>>>>>>>>>>>         Dune::PDELab::EntityBlockedOrderingTag> GFS_TCH;
>>>>>>>>>>>>
>>>>>>>>>>>>     // ... LocalOperator LOP lop, TimeLocalOperator TOP top, GridOperator GO go,
>>>>>>>>>>>>     //     InstationaryGridOperator IGO igo, ...
>>>>>>>>>>>>
>>>>>>>>>>>>     typedef Dune::PDELab::ISTLBackend_BCGS_AMG_SSOR<IGO> LS;
>>>>>>>>>>>>     LS ls(gfs, 50, 1, false, true);
>>>>>>>>>>>>     typedef Dune::PDELab::Newton<IGO, LS, U> PDESOLVER;
>>>>>>>>>>>>     PDESOLVER pdesolver(igo, ls);
>>>>>>>>>>>>     Dune::PDELab::ImplicitEulerParameter<double> method;
>>>>>>>>>>>>     Dune::PDELab::OneStepMethod<double, IGO, PDESOLVER, U, U> osm(method, igo, pdesolver);
>>>>>>>>>>>>
>>>>>>>>>>>>     // TIME-LOOP
>>>>>>>>>>>>     while (time < t_END - 1e-8) {
>>>>>>>>>>>>         try {
>>>>>>>>>>>>             // PDE-SOLVE
>>>>>>>>>>>>             osm.apply(time, dt, uold, unew);
>>>>>>>>>>>>             exceptionCaught = false;
>>>>>>>>>>>>         } catch (Dune::Exception& e) {
>>>>>>>>>>>>             // RESET
>>>>>>>>>>>>             exceptionCaught = true;
>>>>>>>>>>>>             std::cout << "Catched Error, Dune reported error: " << e << std::endl;
>>>>>>>>>>>>             unew = uold;
>>>>>>>>>>>>             dt *= 0.5;
>>>>>>>>>>>>             osm.getPDESolver().discardMatrix();
>>>>>>>>>>>>             continue;
>>>>>>>>>>>>         }
>>>>>>>>>>>>         uold = unew;
>>>>>>>>>>>>         time += dt;
>>>>>>>>>>>>     }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *// terminal output showing FMatrixError...*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>      time = 162.632 , time+dt = 164.603 , opTime = 180 , dt  : 1.97044
>>>>>>>>>>>>
>>>>>>>>>>>>      READY FOR NEXT ITERATION.
>>>>>>>>>>>> _____________________________________________________
>>>>>>>>>>>>      current opcount = 2
>>>>>>>>>>>> ****************************
>>>>>>>>>>>> TCH HYDRATE:
>>>>>>>>>>>> ****************************
>>>>>>>>>>>> TIME STEP [implicit Euler]     89 time (from):   1.6263e+02 dt:   1.9704e+00 time (to):   1.6460e+02
>>>>>>>>>>>> STAGE 1 time (to):   1.6460e+02.
>>>>>>>>>>>>       Initial defect:   2.1649e-01
>>>>>>>>>>>> Using a direct coarse solver (SuperLU)
>>>>>>>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.2195 seconds.
>>>>>>>>>>>> === BiCGSTABSolver
>>>>>>>>>>>>      12.5        6.599e-11
>>>>>>>>>>>> === rate=0.1733, T=1.152, TIT=0.09217, IT=12.5
>>>>>>>>>>>>       Newton iteration  1.  New defect:   3.4239e-02.  Reduction (this):   1.5816e-01.  Reduction (total):   1.5816e-01
>>>>>>>>>>>> Using a direct coarse solver (SuperLU)
>>>>>>>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.195 seconds.
>>>>>>>>>>>> === BiCGSTABSolver
>>>>>>>>>>>>        17        2.402e-11
>>>>>>>>>>>> === rate=0.2894, T=1.655, TIT=0.09738, IT=17
>>>>>>>>>>>>       Newton iteration  2.  New defect:   3.9906e+00.  Reduction (this):   1.1655e+02.  Reduction (total):   1.8434e+01
>>>>>>>>>>>> Using a direct coarse solver (SuperLU)
>>>>>>>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.8697 seconds.
>>>>>>>>>>>> === BiCGSTABSolver
>>>>>>>>>>>> Catched Error, Dune reported error: FMatrixError [luDecomposition:/home/sgupta/dune_2_6/source/dune/dune-common/dune/common/densematrix.hh:909]: matrix is singular
>>>>>>>>>>>> _____________________________________________________
>>>>>>>>>>>>      current opcount = 2
>>>>>>>>>>>> ****************************
>>>>>>>>>>>> TCH HYDRATE:
>>>>>>>>>>>> ****************************
>>>>>>>>>>>> TIME STEP [implicit Euler]     89 time (from):   1.6263e+02 dt:   9.8522e-01 time (to):   1.6362e+02
>>>>>>>>>>>> STAGE 1 time (to):   1.6362e+02.
>>>>>>>>>>>>
>>>>>>>>>>>> *... nothing happens here... the terminal appears to freeze...*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>
-- 
Dr. Shubhangi Gupta
Marine Geosystems
GEOMAR Helmholtz Center for Ocean Research
Wischhofstraße 1-3,
D-24148 Kiel

Room: 12-206
Phone: +49 431 600-1402
Email: sgupta at geomar.de




