[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)
Nils-Arne Dreier
n.dreier at uni-muenster.de
Wed Jul 10 14:39:09 CEST 2019
Hi Shubhangi,
I just talked to Jö. Our guess is that the exception is thrown on only
one rank, say rank X. The other ranks do not know that rank X failed and
proceed as usual, until at some point they are all waiting for
communication from rank X. That is the deadlock you see.
You may want to have a look at Dune::MPIGuard in
dune/common/parallel/mpiguard.hh. It makes it possible to propagate the
error state to all ranks.
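To illustrate the idea of propagating the error state, here is a minimal
sketch in plain C++ that only simulates the ranks in a vector. It is not
the MPIGuard implementation and uses no MPI or Dune calls. The names
allRanksSucceeded, StepDecision and decideStep are made up for this
illustration. The point is that the decision "accept the step or retry
with dt/2" is taken from the combined flags of all ranks, so every rank
reaches the same decision.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a collective reduction (in a real MPI code
// this would be something like a logical-AND all-reduce): every rank
// contributes its local success flag and all ranks get the same answer.
inline bool allRanksSucceeded(const std::vector<bool>& localSuccess)
{
  return std::all_of(localSuccess.begin(), localSuccess.end(),
                     [](bool ok) { return ok; });
}

// Outcome of one attempted time step, identical on all ranks.
struct StepDecision
{
  bool accepted;  // true only if every rank finished the step
  double nextDt;  // step size to use for the next attempt
};

// Decide collectively: accept the step, or retry with half the step size
// if at least one rank reported a failure (e.g. caught an exception).
inline StepDecision decideStep(const std::vector<bool>& localSuccess, double dt)
{
  if (allRanksSucceeded(localSuccess))
    return {true, dt};
  return {false, 0.5 * dt};
}
```

For example, decideStep({true, false, true, true}, 2.0) rejects the step
and proposes dt = 1.0 for all four simulated ranks, even though only one
of them failed. In a real run this agreement has to be reached before any
rank enters the next communication, otherwise the deadlock described above
occurs.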
There is also a merge request for dune-common, which adapts the MPIGuard
such that you don't need to check for an error state before
communicating, making use of the ULFM proposal for MPI. You can find it
here: https://gitlab.dune-project.org/core/dune-common/merge_requests/517
If you don't have an MPI implementation that provides a *working* ULFM
implementation, you may want to use the blackchannel-ulfm lib:
https://gitlab.dune-project.org/exadune/blackchannel-ulfm
I hope that helps.
Kind regards
Nils
On 10.07.19 14:07, Shubhangi Gupta wrote:
> Hi Jö,
>
> So, since you asked about the number of ranks... I tried running the
> simulations again on 2 processes and 1 process. I get the same problem
> with 2, but not with 1.
>
> On 10.07.19 13:33, Shubhangi Gupta wrote:
>> Hi Jö,
>>
>> Yes, I am running it MPI-parallel, on 4 ranks.
>>
>> On 10.07.19 13:32, Jö Fahlke wrote:
>>> Are you running this MPI-parallel? If yes, how many ranks?
>>>
>>> Regards, Jö.
>>>
>>> Am Mi, 10. Jul 2019, 11:55:45 +0200 schrieb Shubhangi Gupta:
>>>> Dear pdelab users,
>>>>
>>>> I am currently experiencing a rather strange problem during parallel
>>>> solution of my finite volume code. I have written a short outline of
>>>> my code below for reference.
>>>>
>>>> At some point during computation, if dune throws an error, the code
>>>> catches this error, resets the solution vector to the old value, halves
>>>> the time step size, and tries to redo the calculation (osm.apply()).
>>>>
>>>> However, if I get the error "FMatrixError: matrix is singular", the
>>>> solver seems to freeze. Even the initial defect is not shown! (See the
>>>> terminal output below.) I am not sure why this is so, and I have not
>>>> experienced this issue before.
>>>>
>>>> I will be very thankful if someone can help me figure out a way around
>>>> this problem.
>>>>
>>>> Thanks, and warm wishes, Shubhangi
>>>>
>>>>
>>>> *// code layout*
>>>>
>>>> ...UG grid, generated using gmsh, GV, ...
>>>>
>>>> typedef Dune::PDELab::QkDGLocalFiniteElementMap<GV::Grid::ctype, double,
>>>>     0, dim, Dune::PDELab::QkDGBasisPolynomial::lagrange> FEMP0;
>>>> FEMP0 femp0;
>>>> typedef Dune::PDELab::GridFunctionSpace<GV, FEMP0,
>>>>     Dune::PDELab::P0ParallelConstraints,
>>>>     Dune::PDELab::ISTL::VectorBackend<>> GFS0;
>>>> GFS0 gfs0(gv,femp0);
>>>> typedef Dune::PDELab::PowerGridFunctionSpace<GFS0, num_of_vars,
>>>>     Dune::PDELab::ISTL::VectorBackend<Dune::PDELab::ISTL::Blocking::fixed>,
>>>>     Dune::PDELab::EntityBlockedOrderingTag> GFS_TCH;
>>>>
>>>> ... LocalOperator LOP lop, TimeLocalOperator TOP top,
>>>> GridOperator GO go, InstationaryGridOperator IGO igo, ...
>>>>
>>>> typedef Dune::PDELab::ISTLBackend_BCGS_AMG_SSOR<IGO> LS;
>>>> LS ls(gfs,50,1,false,true);
>>>> typedef Dune::PDELab::Newton< IGO, LS, U > PDESOLVER;
>>>> PDESOLVER pdesolver( igo, ls );
>>>> Dune::PDELab::ImplicitEulerParameter<double> method;
>>>>
>>>> Dune::PDELab::OneStepMethod< double, IGO, PDESOLVER, U, U >
>>>>     osm( method, igo, pdesolver );
>>>>
>>>> //TIME-LOOP
>>>> while( time < t_END - 1e-8 ){
>>>>     try{
>>>>         //PDE-SOLVE
>>>>         osm.apply( time, dt, uold, unew );
>>>>         exceptionCaught = false;
>>>>     }catch ( Dune::Exception &e ) {
>>>>         //RESET
>>>>         exceptionCaught = true;
>>>>         std::cout << "Catched Error, Dune reported error: " << e << std::endl;
>>>>         unew = uold;
>>>>         dt *= 0.5;
>>>>         osm.getPDESolver().discardMatrix();
>>>>         continue;
>>>>     }
>>>>     uold = unew;
>>>>     time += dt;
>>>> }
>>>>
>>>>
>>>> *// terminal output showing FMatrixError...*
>>>>
>>>>
>>>> time = 162.632 , time+dt = 164.603 , opTime = 180 , dt : 1.97044
>>>>
>>>> READY FOR NEXT ITERATION.
>>>> _____________________________________________________
>>>> current opcount = 2
>>>> ****************************
>>>> TCH HYDRATE:
>>>> ****************************
>>>> TIME STEP [implicit Euler] 89 time (from): 1.6263e+02 dt: 1.9704e+00 time (to): 1.6460e+02
>>>> STAGE 1 time (to): 1.6460e+02.
>>>> Initial defect: 2.1649e-01
>>>> Using a direct coarse solver (SuperLU)
>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.2195 seconds.
>>>> === BiCGSTABSolver
>>>> 12.5 6.599e-11
>>>> === rate=0.1733, T=1.152, TIT=0.09217, IT=12.5
>>>> Newton iteration 1. New defect: 3.4239e-02. Reduction (this): 1.5816e-01. Reduction (total): 1.5816e-01
>>>> Using a direct coarse solver (SuperLU)
>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.195 seconds.
>>>> === BiCGSTABSolver
>>>> 17 2.402e-11
>>>> === rate=0.2894, T=1.655, TIT=0.09738, IT=17
>>>> Newton iteration 2. New defect: 3.9906e+00. Reduction (this): 1.1655e+02. Reduction (total): 1.8434e+01
>>>> Using a direct coarse solver (SuperLU)
>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.8697 seconds.
>>>> === BiCGSTABSolver
>>>> Catched Error, Dune reported error: FMatrixError [luDecomposition:/home/sgupta/dune_2_6/source/dune/dune-common/dune/common/densematrix.hh:909]: matrix is singular
>>>> _____________________________________________________
>>>> current opcount = 2
>>>> ****************************
>>>> TCH HYDRATE:
>>>> ****************************
>>>> TIME STEP [implicit Euler] 89 time (from): 1.6263e+02 dt: 9.8522e-01 time (to): 1.6362e+02
>>>> STAGE 1 time (to): 1.6362e+02.
>>>>
>>>> *... nothing happens here... the terminal appears to freeze...*
>>>>
>>>>
>>>>
>>>> --
>>>> Dr. Shubhangi Gupta
>>>> Marine Geosystems
>>>> GEOMAR Helmholtz Center for Ocean Research
>>>> Wischhofstraße 1-3,
>>>> D-24148 Kiel
>>>>
>>>> Room: 12-206
>>>> Phone: +49 431 600-1402
>>>> Email:sgupta at geomar.de
>>>>
>>>> _______________________________________________
>>>> dune-pdelab mailing list
>>>> dune-pdelab at lists.dune-project.org
>>>> https://lists.dune-project.org/mailman/listinfo/dune-pdelab
>>>