[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)
Shubhangi Gupta
sgupta at geomar.de
Thu Jul 11 14:56:35 CEST 2019
Dear Jö and Nils,
Thanks a lot for your replies.
I actually tried putting the MPIGuard inside the time loop (at the highest
level) just to see what happens... Indeed, the one-step method now proceeds
as it should, but the BiCGStab freezes... So yeah, as Jö mentioned, the
MPIGuard needs to be introduced inside the ISTL solver...
I am not very sure how and where exactly, though! Any ideas?
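For reference, this is roughly what the attempt looks like. It is only a sketch,
reusing the names (osm, uold, unew, dt, t_END, exceptionCaught) from the code
outline quoted further down, plus MPIGuard from dune/common/parallel/mpiguard.hh:

while (time < t_END - 1e-8) {
  try {
    Dune::MPIGuard guard;              // armed before the critical section
    osm.apply(time, dt, uold, unew);   // may throw on a single rank only
    guard.finalize();                  // collective: throws MPIGuardError on the
                                       // other ranks if some rank failed above;
                                       // MPIGuardError is a Dune::Exception, so
                                       // it lands in the same catch block
    exceptionCaught = false;
  } catch (Dune::Exception& e) {
    exceptionCaught = true;
    std::cout << "Caught error, Dune reported: " << e << std::endl;
    unew = uold;                       // roll back to the old solution
    dt *= 0.5;                         // halve the time step and retry
    osm.getPDESolver().discardMatrix();
    continue;
  }
  uold = unew;
  time += dt;
}

The ranks that do not throw never reach guard.finalize(), though: they are
still stuck in the collective communication inside osm.apply() (BiCGStab, AMG),
which seems to be exactly the freeze I see.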
Thanks again, and warm wishes, Shubhangi
On 10.07.19 14:52, Jö Fahlke wrote:
> On Wed, 10 Jul 2019, 14:39:09 +0200, Nils-Arne Dreier wrote:
>> Hi Shubhangi,
>>
>> I just talked to Jö. We suspect that the problem is that the exception is
>> only thrown on one rank, say rank X. All the other ranks do not know that
>> rank X failed and proceed as usual; at some point they all end up waiting
>> for communication from rank X. That is the deadlock that you see.
>>
>> You may want to have a look at Dune::MPIGuard in
>> dune/common/parallel/mpiguard.hh. It makes it possible to propagate the
>> error state to all ranks.
> It should be mentioned that MPIGuard probably cannot be used at a high level;
> it would probably need to be introduced into the ISTL solvers (BiCGStab, AMG,
> SSOR) and/or PDELab (the parallel scalar product, Newton) for this to work.
> Not sure where exactly.
>
> Regards,
> Jö.
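If I understand this correctly, the pattern needed inside those components
would look roughly like the illustrative example below (assuming an MPI build
of dune-common 2.6): each rank finishes, or abandons, its purely local work
under an MPIGuard and exchanges the error state via finalize() right before the
next collective call, so that a local exception becomes an MPIGuardError on
every rank instead of a deadlock. The function guardedParallelDot is a made-up
stand-in, not PDELab's actual scalar product class:

#include <cstddef>
#include <iostream>
#include <vector>
#include <dune/common/parallel/mpihelper.hh>
#include <dune/common/parallel/mpiguard.hh>

// Illustrative stand-in for a parallel scalar product: guard the local part,
// exchange the error state, and only then enter the collective reduction.
double guardedParallelDot(const std::vector<double>& x,
                          const std::vector<double>& y,
                          const Dune::CollectiveCommunication<MPI_Comm>& comm)
{
  Dune::MPIGuard guard;   // uses the world communicator by default

  // Purely local work: this is where something like the singular-matrix
  // FMatrixError would be thrown on a single rank only.
  double local = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    local += x[i] * y[i];

  // Collective check point: if some rank threw before reaching this line,
  // finalize() throws Dune::MPIGuardError on all ranks that did reach it,
  // so nobody is left waiting inside comm.sum() below.
  guard.finalize();

  return comm.sum(local);
}

int main(int argc, char** argv)
{
  Dune::MPIHelper& helper = Dune::MPIHelper::instance(argc, argv);
  auto comm = Dune::MPIHelper::getCollectiveCommunication();
  std::vector<double> x(100, 1.0), y(100, 2.0);
  try {
    std::cout << "dot = " << guardedParallelDot(x, y, comm) << std::endl;
  } catch (Dune::Exception& e) {
    std::cout << "rank " << helper.rank() << " caught: " << e << std::endl;
  }
  return 0;
}

The same check would presumably have to sit in front of every collective call
in BiCGStab, the AMG hierarchy setup and Newton's defect computation, which is
probably why it is not obvious where exactly to put it.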
>
>> There is also a merge request for dune-common, which adapts the MPIGuard
>> such that you don't need to check for an error state before
>> communicating, making use of the ULFM proposal for MPI. You can find it
>> here: https://gitlab.dune-project.org/core/dune-common/merge_requests/517
>>
>> If you don't have an MPI implementation that provides *working* ULFM
>> support, you may want to use the blackchannel-ulfm library:
>> https://gitlab.dune-project.org/exadune/blackchannel-ulfm
>>
>> I hope that helps.
>>
>> Kind regards
>> Nils
>>
>> On 10.07.19 14:07, Shubhangi Gupta wrote:
>>> Hi Jö,
>>>
>>> So, since you asked about the number of ranks... I tried running the
>>> simulations again on 2 processes and 1 process. I get the same problem
>>> with 2, but not with 1.
>>>
>>> On 10.07.19 13:33, Shubhangi Gupta wrote:
>>>> Hi Jö,
>>>>
>>>> Yes, I am running it MPI-parallel, on 4 ranks.
>>>>
>>>> On 10.07.19 13:32, Jö Fahlke wrote:
>>>>> Are you running this MPI-parallel? If yes, how many ranks?
>>>>>
>>>>> Regards, Jö.
>>>>>
>>>>> On Wed, 10 Jul 2019, 11:55:45 +0200, Shubhangi Gupta wrote:
>>>>>> Dear pdelab users,
>>>>>>
>>>>>> I am currently experiencing a rather strange problem during the parallel
>>>>>> solution of my finite volume code. I have written a short outline of my
>>>>>> code below for reference.
>>>>>>
>>>>>> At some point during the computation, if Dune throws an error, the code
>>>>>> catches this error, resets the solution vector to the old value, halves
>>>>>> the time step size, and tries to redo the calculation (osm.apply()).
>>>>>>
>>>>>> However, if I get the error "FMatrixError: matrix is singular", the
>>>>>> solver seems to freeze. Even the initial defect is not shown! (See the
>>>>>> terminal output below.) I am not sure why this is so, and I have not
>>>>>> experienced this issue before.
>>>>>>
>>>>>> I will be very thankful if someone can help me figure out a way around
>>>>>> this problem.
>>>>>>
>>>>>> Thanks, and warm wishes, Shubhangi
>>>>>>
>>>>>>
>>>>>> *// code layout*
>>>>>>
>>>>>> ...UG grid, generated using gmsh, GV, ...
>>>>>>
>>>>>> typedef Dune::PDELab::QkDGLocalFiniteElementMap<
>>>>>>     GV::Grid::ctype, double, 0, dim,
>>>>>>     Dune::PDELab::QkDGBasisPolynomial::lagrange> FEMP0;
>>>>>> FEMP0 femp0;
>>>>>>
>>>>>> typedef Dune::PDELab::GridFunctionSpace<
>>>>>>     GV, FEMP0, Dune::PDELab::P0ParallelConstraints,
>>>>>>     Dune::PDELab::ISTL::VectorBackend<>> GFS0;
>>>>>> GFS0 gfs0(gv, femp0);
>>>>>>
>>>>>> typedef Dune::PDELab::PowerGridFunctionSpace<
>>>>>>     GFS0, num_of_vars,
>>>>>>     Dune::PDELab::ISTL::VectorBackend<Dune::PDELab::ISTL::Blocking::fixed>,
>>>>>>     Dune::PDELab::EntityBlockedOrderingTag> GFS_TCH;
>>>>>>
>>>>>> ... LocalOperator LOP lop, TimeLocalOperator TOP top,
>>>>>>     GridOperator GO go, InstationaryGridOperator IGO igo, ...
>>>>>>
>>>>>> typedef Dune::PDELab::ISTLBackend_BCGS_AMG_SSOR<IGO> LS;
>>>>>> LS ls(gfs, 50, 1, false, true);
>>>>>>
>>>>>> typedef Dune::PDELab::Newton<IGO, LS, U> PDESOLVER;
>>>>>> PDESOLVER pdesolver(igo, ls);
>>>>>>
>>>>>> Dune::PDELab::ImplicitEulerParameter<double> method;
>>>>>> Dune::PDELab::OneStepMethod<double, IGO, PDESOLVER, U, U>
>>>>>>     osm(method, igo, pdesolver);
>>>>>>
>>>>>> // TIME LOOP
>>>>>> while (time < t_END - 1e-8) {
>>>>>>   try {
>>>>>>     // PDE SOLVE
>>>>>>     osm.apply(time, dt, uold, unew);
>>>>>>     exceptionCaught = false;
>>>>>>   } catch (Dune::Exception& e) {
>>>>>>     // RESET: roll back to the old solution, halve dt, and retry
>>>>>>     exceptionCaught = true;
>>>>>>     std::cout << "Catched Error, Dune reported error: " << e << std::endl;
>>>>>>     unew = uold;
>>>>>>     dt *= 0.5;
>>>>>>     osm.getPDESolver().discardMatrix();
>>>>>>     continue;
>>>>>>   }
>>>>>>   uold = unew;
>>>>>>   time += dt;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> *// terminal output showing FMatrixError...*
>>>>>>
>>>>>>
>>>>>> time = 162.632 , time+dt = 164.603 , opTime = 180 , dt : 1.97044
>>>>>>
>>>>>> READY FOR NEXT ITERATION.
>>>>>> _____________________________________________________
>>>>>> current opcount = 2
>>>>>> ****************************
>>>>>> TCH HYDRATE:
>>>>>> ****************************
>>>>>> TIME STEP [implicit Euler] 89 time (from): 1.6263e+02 dt: 1.9704e+00 time (to): 1.6460e+02
>>>>>> STAGE 1 time (to): 1.6460e+02.
>>>>>> Initial defect: 2.1649e-01
>>>>>> Using a direct coarse solver (SuperLU)
>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.2195 seconds.
>>>>>> === BiCGSTABSolver
>>>>>> 12.5 6.599e-11
>>>>>> === rate=0.1733, T=1.152, TIT=0.09217, IT=12.5
>>>>>> Newton iteration 1. New defect: 3.4239e-02. Reduction (this): 1.5816e-01. Reduction (total): 1.5816e-01
>>>>>> Using a direct coarse solver (SuperLU)
>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.195 seconds.
>>>>>> === BiCGSTABSolver
>>>>>> 17 2.402e-11
>>>>>> === rate=0.2894, T=1.655, TIT=0.09738, IT=17
>>>>>> Newton iteration 2. New defect: 3.9906e+00. Reduction (this): 1.1655e+02. Reduction (total): 1.8434e+01
>>>>>> Using a direct coarse solver (SuperLU)
>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.8697 seconds.
>>>>>> === BiCGSTABSolver
>>>>>> Catched Error, Dune reported error: FMatrixError
>>>>>> [luDecomposition:/home/sgupta/dune_2_6/source/dune/dune-common/dune/common/densematrix.hh:909]: matrix is singular
>>>>>> _____________________________________________________
>>>>>> current opcount = 2
>>>>>> ****************************
>>>>>> TCH HYDRATE:
>>>>>> ****************************
>>>>>> TIME STEP [implicit Euler] 89 time (from): 1.6263e+02 dt: 9.8522e-01 time (to): 1.6362e+02
>>>>>> STAGE 1 time (to): 1.6362e+02.
>>>>>>
>>>>>> *... nothing happens here... the terminal appears to freeze...*
>>>>>>
>>>>>>
>>>>>>
>>
>>
--
Dr. Shubhangi Gupta
Marine Geosystems
GEOMAR Helmholtz Center for Ocean Research
Wischhofstraße 1-3,
D-24148 Kiel
Room: 12-206
Phone: +49 431 600-1402
Email: sgupta at geomar.de