[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)

Shubhangi Gupta sgupta at geomar.de
Tue Jul 23 15:26:39 CEST 2019


Hi Nils,

Sorry, I am still struggling with this issue... and my BiCGStab solver
is now freezing a lot more often, so I can't ignore this.

About the ULFM... you sent me the following link:

https://gitlab.dune-project.org/exadune/blackchannel-ulfm

Sorry if this is a trivial question, but how should I compile this? With
dune-build? And how should I include it in my code?

Thanks, and warm wishes, Shubhangi


On 12.07.19 13:38, Nils-Arne Dreier wrote:
> Hi Shubhangi,
>
> You have to call the MPIGuard::finalize() method after the point where
> the exception might be thrown and before the next communication is
> performed. From the information you provided, I guess that the
> exception is thrown in the smoother of the AMG, which makes things
> slightly complicated. Maybe AMG::mgc is a good starting point.
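>
> To make the placement concrete, here is a minimal sketch (the function
> names are only placeholders, not the actual AMG code):
>
>     Dune::MPIGuard guard;
>
>     // critical section: the local work that may throw on a single rank,
>     // e.g. the smoother step in which the FMatrixError originates
>     applySmootherLocally();   // placeholder
>
>     // report success *before* the next communication; if another rank
>     // failed, this throws Dune::MPIGuardError here instead of letting
>     // this rank block in the following communication
>     guard.finalize();
>
>     exchangeData();           // placeholder for the next communication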
>
> By the way: if you use the ULFM approach I described previously, you can
> use the MPIGuard at the coarsest level and don't need to call
> MPIGuard::finalize() after every critical section.
>
> Regards
> Nils
>
> On 11.07.19 14:56, Shubhangi Gupta wrote:
>> Dear Jö and Nils,
>>
>> Thanks a lot for your replies.
>>
>> I actually tried putting the MPIGuard within the time loop (at the
>> highest level) just to see what happens... Indeed, the one-step method
>> now proceeds as it should, but BiCGStab freezes... So yeah, as Jö
>> mentioned, the MPIGuard needs to be introduced inside the
>> ISTL solver... I am not very sure how and where exactly, though! Any
>> ideas?
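>>
>> For reference, what I tried looks roughly like the following sketch
>> (simplified, not my exact code), with the guard wrapped around
>> osm.apply() inside the existing try/catch:
>>
>>     while (time < t_END - 1e-8) {
>>         try {
>>             Dune::MPIGuard guard;             // guard the whole time step
>>             osm.apply(time, dt, uold, unew);  // may throw on a single rank only
>>             guard.finalize();                 // meant to propagate the failure to the other ranks
>>         } catch (Dune::Exception& e) {
>>             // reset as before (error message printing omitted)
>>             unew = uold;
>>             dt *= 0.5;
>>             osm.getPDESolver().discardMatrix();
>>             continue;
>>         }
>>         uold = unew;
>>         time += dt;
>>     }
>>
>> As far as I understand, the other ranks never return from osm.apply() at
>> all (they block in the communication inside BiCGStab), so they never reach
>> guard.finalize(), which is presumably why the guard at this level does not
>> help.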
>>
>> Thanks again, and warm wishes, Shubhangi
>>
>> On 10.07.19 14:52, Jö Fahlke wrote:
>>> On Wed, 10 Jul 2019, 14:39:09 +0200, Nils-Arne Dreier wrote:
>>>> Hi Shubhangi,
>>>>
>>>> I just talked to Jö. We guess that the problem is that the exception is
>>>> only thrown on one rank, say rank X. All other ranks do not know that
>>>> rank X failed and proceed as usual; at some point all these ranks are
>>>> waiting for communication from rank X. That is the deadlock that you see.
>>>>
>>>> You may want to have a look at Dune::MPIGuard in
>>>> dune/common/parallel/mpiguard.hh. It makes it possible to propagate the
>>>> error state to all ranks.
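>>>>
>>>> A minimal sketch of the intended usage (as I understand the interface in
>>>> dune/common/parallel/mpiguard.hh; the function name is just a placeholder):
>>>>
>>>>     #include <dune/common/parallel/mpiguard.hh>
>>>>
>>>>     {
>>>>         Dune::MPIGuard guard;   // all ranks enter the guarded section together
>>>>
>>>>         doSomethingThatMayThrowOnOneRank();   // placeholder
>>>>
>>>>         // Report success. If another rank failed (its guard was destroyed
>>>>         // by stack unwinding before finalize() was reached), this throws
>>>>         // Dune::MPIGuardError on the surviving ranks, so every rank leaves
>>>>         // the section with an exception instead of deadlocking.
>>>>         guard.finalize();
>>>>     }
>>>>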
>>> It should be mentioned that MPIGuard probably cannot be used at a high
>>> level; it would probably need to be introduced into the ISTL solvers
>>> (BiCGStab, AMG, SSOR) and/or PDELab (the parallel scalar product, Newton)
>>> for this to work. Not sure where exactly.
>>>
>>> Regards,
>>> Jö.
>>>
>>>> There is also a merge request for dune-common which adapts the MPIGuard
>>>> such that you don't need to check for an error state before communicating,
>>>> making use of the ULFM proposal for MPI. You can find it here:
>>>> https://gitlab.dune-project.org/core/dune-common/merge_requests/517
>>>>
>>>> If you don't have an MPI implementation that provides a *working* ULFM
>>>> implementation, you may want to use the blackchannel-ulfm lib:
>>>> https://gitlab.dune-project.org/exadune/blackchannel-ulfm
>>>>
>>>> I hope that helps.
>>>>
>>>> Kind regards
>>>> Nils
>>>>
>>>> On 10.07.19 14:07, Shubhangi Gupta wrote:
>>>>> Hi Jö,
>>>>>
>>>>> So, since you asked about the number of ranks... I tried running the
>>>>> simulations again on 2 processes and 1 process. I get the same problem
>>>>> with 2, but not with 1.
>>>>>
>>>>> On 10.07.19 13:33, Shubhangi Gupta wrote:
>>>>>> Hi Jö,
>>>>>>
>>>>>> Yes, I am running it MPI-parallel, on 4 ranks.
>>>>>>
>>>>>> On 10.07.19 13:32, Jö Fahlke wrote:
>>>>>>> Are you running this MPI-parallel?  If yes, how many ranks?
>>>>>>>
>>>>>>> Regards, Jö.
>>>>>>>
>>>>>>> On Wed, 10 Jul 2019, 11:55:45 +0200, Shubhangi Gupta wrote:
>>>>>>>> Dear pdelab users,
>>>>>>>>
>>>>>>>> I am currently experiencing a rather strange problem during the parallel
>>>>>>>> solution of my finite volume code. I have written a short outline of my
>>>>>>>> code below for reference.
>>>>>>>>
>>>>>>>> At some point during computation, if Dune throws an error, the code
>>>>>>>> catches this error, resets the solution vector to the old value, halves
>>>>>>>> the time step size, and tries to redo the calculation (osm.apply()).
>>>>>>>>
>>>>>>>> However, if I get the error "FMatrixError: matrix is singular", the
>>>>>>>> solver seems to freeze. Even the initial defect is not shown! (See the
>>>>>>>> terminal output below.) I am not sure why this is so, and I have not
>>>>>>>> experienced this issue before.
>>>>>>>>
>>>>>>>> I will be very thankful if someone can help me figure out a way around
>>>>>>>> this problem.
>>>>>>>>
>>>>>>>> Thanks, and warm wishes, Shubhangi
>>>>>>>>
>>>>>>>>
>>>>>>>> *// code layout*
>>>>>>>>
>>>>>>>>     // ... UG grid, generated using gmsh, GV, ...
>>>>>>>>
>>>>>>>>     typedef Dune::PDELab::QkDGLocalFiniteElementMap<
>>>>>>>>         GV::Grid::ctype, double, 0, dim,
>>>>>>>>         Dune::PDELab::QkDGBasisPolynomial::lagrange> FEMP0;
>>>>>>>>     FEMP0 femp0;
>>>>>>>>     typedef Dune::PDELab::GridFunctionSpace<
>>>>>>>>         GV, FEMP0, Dune::PDELab::P0ParallelConstraints,
>>>>>>>>         Dune::PDELab::ISTL::VectorBackend<>> GFS0;
>>>>>>>>     GFS0 gfs0(gv, femp0);
>>>>>>>>     typedef Dune::PDELab::PowerGridFunctionSpace<
>>>>>>>>         GFS0, num_of_vars,
>>>>>>>>         Dune::PDELab::ISTL::VectorBackend<Dune::PDELab::ISTL::Blocking::fixed>,
>>>>>>>>         Dune::PDELab::EntityBlockedOrderingTag> GFS_TCH;
>>>>>>>>
>>>>>>>>     // ... LocalOperator LOP lop, TimeLocalOperator TOP top,
>>>>>>>>     //     GridOperator GO go, InstationaryGridOperator IGO igo, ...
>>>>>>>>
>>>>>>>>     typedef Dune::PDELab::ISTLBackend_BCGS_AMG_SSOR<IGO> LS;
>>>>>>>>     LS ls(gfs, 50, 1, false, true);
>>>>>>>>     typedef Dune::PDELab::Newton<IGO, LS, U> PDESOLVER;
>>>>>>>>     PDESOLVER pdesolver(igo, ls);
>>>>>>>>     Dune::PDELab::ImplicitEulerParameter<double> method;
>>>>>>>>     Dune::PDELab::OneStepMethod<double, IGO, PDESOLVER, U, U>
>>>>>>>>         osm(method, igo, pdesolver);
>>>>>>>>
>>>>>>>>     // TIME-LOOP
>>>>>>>>     while (time < t_END - 1e-8) {
>>>>>>>>         try {
>>>>>>>>             // PDE-SOLVE
>>>>>>>>             osm.apply(time, dt, uold, unew);
>>>>>>>>             exceptionCaught = false;
>>>>>>>>         } catch (Dune::Exception& e) {
>>>>>>>>             // RESET
>>>>>>>>             exceptionCaught = true;
>>>>>>>>             std::cout << "Catched Error, Dune reported error: " << e << std::endl;
>>>>>>>>             unew = uold;
>>>>>>>>             dt *= 0.5;
>>>>>>>>             osm.getPDESolver().discardMatrix();
>>>>>>>>             continue;
>>>>>>>>         }
>>>>>>>>         uold = unew;
>>>>>>>>         time += dt;
>>>>>>>>     }
>>>>>>>>
>>>>>>>>
>>>>>>>> *// terminal output showing FMatrixError...*
>>>>>>>>
>>>>>>>>
>>>>>>>>     time = 162.632 , time+dt = 164.603 , opTime = 180 , dt  : 1.97044
>>>>>>>>
>>>>>>>>     READY FOR NEXT ITERATION.
>>>>>>>> _____________________________________________________
>>>>>>>>     current opcount = 2
>>>>>>>> ****************************
>>>>>>>> TCH HYDRATE:
>>>>>>>> ****************************
>>>>>>>> TIME STEP [implicit Euler]     89 time (from):   1.6263e+02 dt:   1.9704e+00 time (to):   1.6460e+02
>>>>>>>> STAGE 1 time (to):   1.6460e+02.
>>>>>>>>      Initial defect:   2.1649e-01
>>>>>>>> Using a direct coarse solver (SuperLU)
>>>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.2195 seconds.
>>>>>>>> === BiCGSTABSolver
>>>>>>>>     12.5        6.599e-11
>>>>>>>> === rate=0.1733, T=1.152, TIT=0.09217, IT=12.5
>>>>>>>>      Newton iteration  1.  New defect:   3.4239e-02.  Reduction (this):   1.5816e-01.  Reduction (total):   1.5816e-01
>>>>>>>> Using a direct coarse solver (SuperLU)
>>>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.195 seconds.
>>>>>>>> === BiCGSTABSolver
>>>>>>>>       17        2.402e-11
>>>>>>>> === rate=0.2894, T=1.655, TIT=0.09738, IT=17
>>>>>>>>      Newton iteration  2.  New defect:   3.9906e+00.  Reduction (this):   1.1655e+02.  Reduction (total):   1.8434e+01
>>>>>>>> Using a direct coarse solver (SuperLU)
>>>>>>>> Building hierarchy of 2 levels (inclusive coarse solver) took 0.8697 seconds.
>>>>>>>> === BiCGSTABSolver
>>>>>>>> Catched Error, Dune reported error: FMatrixError [luDecomposition:/home/sgupta/dune_2_6/source/dune/dune-common/dune/common/densematrix.hh:909]: matrix is singular
>>>>>>>> _____________________________________________________
>>>>>>>>     current opcount = 2
>>>>>>>> ****************************
>>>>>>>> TCH HYDRATE:
>>>>>>>> ****************************
>>>>>>>> TIME STEP [implicit Euler]     89 time (from):   1.6263e+02 dt:   9.8522e-01 time (to):   1.6362e+02
>>>>>>>> STAGE 1 time (to):   1.6362e+02.
>>>>>>>>
>>>>>>>> *... nothing happens here... the terminal appears to freeze...*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>
-- 
Dr. Shubhangi Gupta
Marine Geosystems
GEOMAR Helmholtz Center for Ocean Research
Wischhofstraße 1-3,
D-24148 Kiel

Room: 12-206
Phone: +49 431 600-1402
Email: sgupta at geomar.de




