[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)
Jö Fahlke
jorrit at jorrit.de
Wed Jul 10 14:52:16 CEST 2019
On Wed, 10 Jul 2019, 14:39:09 +0200, Nils-Arne Dreier wrote:
> Hi Shubhangi,
>
> I just talked to Jö. We guess that the problem is that the exception is
> only thrown on one rank, say rank X. All other ranks do not know that
> rank X failed and proceed as usual; at some point they all end up
> waiting for communication from rank X. That is the deadlock that you see.
>
> You may want to have a look at Dune::MPIGuard in
> dune/common/parallel/mpiguard.hh. It makes it possible to propagate the
> error state to all ranks.
It should be mentioned that MPIGuard probably cannot be used at a high level;
it would probably need to be introduced into the ISTL solvers (BiCGStab, AMG,
SSOR) and/or PDELab (the parallel scalar product, Newton) for this to work.
Not sure where exactly.
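For illustration, here is a minimal, untested sketch of the pattern MPIGuard
provides; do_local_work_that_may_throw() is just a placeholder for whatever
critical step actually throws (here the local LU decomposition). The point is
that guard.finalize() is collective, so it only helps at a spot that every
rank reaches, which is why it would have to live inside the solver rather
than around osm.apply():

    #include <dune/common/parallel/mpiguard.hh>

    Dune::MPIGuard guard;              // start watching for a local error

    bool localOK = true;
    try {
      do_local_work_that_may_throw();  // placeholder for the failing step
    }
    catch (const Dune::Exception&) {
      localOK = false;                 // remember the failure locally only
    }

    // Collective check: combines the flags of all ranks and throws
    // Dune::MPIGuardError on *every* rank if any rank reported a failure.
    guard.finalize(localOK);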
Regards,
Jö.
> There is also a merge request for dune-common, which adapts the MPIGuard
> such that you don't need to check for an error state before
> communicating, making use of the ULFM proposal for MPI. You can find it
> here: https://gitlab.dune-project.org/core/dune-common/merge_requests/517
>
> If you don't have an MPI implementation that provides *working* ULFM
> support, you may want to use the blackchannel-ulfm lib:
> https://gitlab.dune-project.org/exadune/blackchannel-ulfm
>
> I hope that helps.
>
> Kind regards
> Nils
>
> On 10.07.19 14:07, Shubhangi Gupta wrote:
> > Hi Jö,
> >
> > So, since you asked about the number of ranks... I tried running the
> > simulations again on 2 processes and 1 process. I get the same problem
> > with 2, but not with 1.
> >
> > On 10.07.19 13:33, Shubhangi Gupta wrote:
> >> Hi Jö,
> >>
> >> Yes, I am running it MPI-parallel, on 4 ranks.
> >>
> >> On 10.07.19 13:32, Jö Fahlke wrote:
> >>> Are you running this MPI-parallel? If yes, how many ranks?
> >>>
> >>> Regards, Jö.
> >>>
> >>> On Wed, 10 Jul 2019, 11:55:45 +0200, Shubhangi Gupta wrote:
> >>>> Dear pdelab users,
> >>>>
> >>>> I am currently experiencing a rather strange problem during parallel
> >>>> solution of my finite volume code. I have written a short outline
> >>>> of my code
> >>>> below for reference.
> >>>>
> >>>> At some point during computation, if dune throws an error, the code
> >>>> catches
> >>>> this error, resets the solution vector to the old value, halves the
> >>>> time
> >>>> step size, and tries to redo the calculation (osm.apply()).
> >>>>
> >>>> However, if I get the error "FMatrixError: matrix is singular", the
> >>>> solver
> >>>> seems to freeze. Even the initial defect is not shown! (See the
> >>>> terminal
> >>>> output below.) I am not sure why this is so, and I have not
> >>>> experienced this
> >>>> issue before.
> >>>>
> >>>> I will be very thankful if someone can help me figure out a way
> >>>> around this
> >>>> problem.
> >>>>
> >>>> Thanks, and warm wishes, Shubhangi
> >>>>
> >>>>
> >>>> *// code layout*
> >>>>
> >>>> ...UG grid, generated using gmsh, GV, ...
> >>>>
> >>>> typedef Dune::PDELab::QkDGLocalFiniteElementMap< GV::Grid::ctype, double,
> >>>>     0, dim, Dune::PDELab::QkDGBasisPolynomial::lagrange > FEMP0;
> >>>> FEMP0 femp0;
> >>>> typedef Dune::PDELab::GridFunctionSpace< GV, FEMP0,
> >>>>     Dune::PDELab::P0ParallelConstraints,
> >>>>     Dune::PDELab::ISTL::VectorBackend<> > GFS0;
> >>>> GFS0 gfs0(gv,femp0);
> >>>> typedef Dune::PDELab::PowerGridFunctionSpace< GFS0, num_of_vars,
> >>>>     Dune::PDELab::ISTL::VectorBackend<Dune::PDELab::ISTL::Blocking::fixed>,
> >>>>     Dune::PDELab::EntityBlockedOrderingTag > GFS_TCH;
> >>>>
> >>>> ... LocalOperator LOP lop, TimeLocalOperator TOP top,
> >>>> GridOperator GO
> >>>> go, InstationaryGridOperator IGO igo, ...
> >>>>
> >>>> typedef Dune::PDELab::ISTLBackend_BCGS_AMG_SSOR<IGO> LS;
> >>>> LS ls(gfs,50,1,false,true);
> >>>> typedef Dune::PDELab::Newton< IGO, LS, U > PDESOLVER;
> >>>> PDESOLVER pdesolver( igo, ls );
> >>>> Dune::PDELab::ImplicitEulerParameter<double> method;
> >>>>
> >>>> Dune::PDELab::OneStepMethod< double, IGO, PDESOLVER, U, U >
> >>>>     osm( method, igo, pdesolver );
> >>>>
> >>>> //TIME-LOOP
> >>>> while( time < t_END - 1e-8 ){
> >>>>   try{
> >>>>     //PDE-SOLVE
> >>>>     osm.apply( time, dt, uold, unew );
> >>>>     exceptionCaught = false;
> >>>>   }catch( Dune::Exception &e ){
> >>>>     //RESET
> >>>>     exceptionCaught = true;
> >>>>     std::cout << "Catched Error, Dune reported error: " << e << std::endl;
> >>>>     unew = uold;
> >>>>     dt *= 0.5;
> >>>>     osm.getPDESolver().discardMatrix();
> >>>>     continue;
> >>>>   }
> >>>>   uold = unew;
> >>>>   time += dt;
> >>>> }
> >>>>
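As an aside: even once the exception reaches every rank, the retry decision
itself should be collective, otherwise dt diverges between ranks. A rough,
untested sketch of the loop body, assuming all ranks actually return from
osm.apply() and using the grid view's communicator gv.comm():

    int localFail = 0;
    try {
      osm.apply( time, dt, uold, unew );
    }
    catch( Dune::Exception &e ){
      std::cout << "Caught error, Dune reported: " << e << std::endl;
      localFail = 1;
    }

    // logical OR over all ranks via max(): everyone agrees on the outcome
    if( gv.comm().max( localFail ) ){
      unew = uold;
      dt *= 0.5;
      osm.getPDESolver().discardMatrix();
      continue;
    }
    uold = unew;
    time += dt;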
> >>>>
> >>>> *// terminal output showing FMatrixError...*
> >>>>
> >>>>
> >>>> time = 162.632 , time+dt = 164.603 , opTime = 180 , dt : 1.97044
> >>>>
> >>>> READY FOR NEXT ITERATION.
> >>>> _____________________________________________________
> >>>> current opcount = 2
> >>>> ****************************
> >>>> TCH HYDRATE:
> >>>> ****************************
> >>>> TIME STEP [implicit Euler] 89 time (from): 1.6263e+02 dt:
> >>>> 1.9704e+00
> >>>> time (to): 1.6460e+02
> >>>> STAGE 1 time (to): 1.6460e+02.
> >>>> Initial defect: 2.1649e-01
> >>>> Using a direct coarse solver (SuperLU)
> >>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.2195
> >>>> seconds.
> >>>> === BiCGSTABSolver
> >>>> 12.5 6.599e-11
> >>>> === rate=0.1733, T=1.152, TIT=0.09217, IT=12.5
> >>>> Newton iteration 1. New defect: 3.4239e-02. Reduction (this):
> >>>> 1.5816e-01. Reduction (total): 1.5816e-01
> >>>> Using a direct coarse solver (SuperLU)
> >>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.195
> >>>> seconds.
> >>>> === BiCGSTABSolver
> >>>> 17 2.402e-11
> >>>> === rate=0.2894, T=1.655, TIT=0.09738, IT=17
> >>>> Newton iteration 2. New defect: 3.9906e+00. Reduction (this):
> >>>> 1.1655e+02. Reduction (total): 1.8434e+01
> >>>> Using a direct coarse solver (SuperLU)
> >>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.8697
> >>>> seconds.
> >>>> === BiCGSTABSolver
> >>>> Catched Error, Dune reported error: FMatrixError
> >>>> [luDecomposition:/home/sgupta/dune_2_6/source/dune/dune-common/dune/common/densematrix.hh:909]:
> >>>> matrix is singular
> >>>> _____________________________________________________
> >>>> current opcount = 2
> >>>> ****************************
> >>>> TCH HYDRATE:
> >>>> ****************************
> >>>> TIME STEP [implicit Euler] 89 time (from): 1.6263e+02 dt:
> >>>> 9.8522e-01
> >>>> time (to): 1.6362e+02
> >>>> STAGE 1 time (to): 1.6362e+02.
> >>>>
> >>>> *... nothing happens here... the terminal appears to freeze...*
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Dr. Shubhangi Gupta
> >>>> Marine Geosystems
> >>>> GEOMAR Helmholtz Center for Ocean Research
> >>>> Wischhofstraße 1-3,
> >>>> D-24148 Kiel
> >>>>
> >>>> Room: 12-206
> >>>> Phone: +49 431 600-1402
> >>>> Email:sgupta at geomar.de
> >>>>
> >>>
>
>
>
> _______________________________________________
> dune-pdelab mailing list
> dune-pdelab at lists.dune-project.org
> https://lists.dune-project.org/mailman/listinfo/dune-pdelab
--
Jorrit (Jö) Fahlke, Institute for Computational und Applied Mathematics,
University of Münster, Orleans-Ring 10, D-48149 Münster
Tel: +49 251 83 35146 Fax: +49 251 83 32729
[On the topic of GNOME 3]
(15:11:22) marina: linus is on my side