[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)

Jö Fahlke jorrit at jorrit.de
Tue Jul 23 16:45:20 CEST 2019


On Tue, 23 Jul 2019 at 15:26:39 +0200, Shubhangi Gupta wrote:
> Sorry, I am still struggling with this issue... and my BiCGStab solver is
> freezing a lot more often, so I can't ignore this anymore.
> 
> About the ULFM... you sent me the following link:
> 
> https://gitlab.dune-project.org/exadune/blackchannel-ulfm

That is a (more-or-less) standard CMake build system, i.e. it builds outside of
Dune.  Try something like this (untested; replace the "..." as needed):
```sh
# fetch the sources
git clone https://gitlab.dune-project.org/exadune/blackchannel-ulfm
mkdir build
# configure an out-of-source build; set the install prefix as needed
( cd build && cmake ../blackchannel-ulfm -DCMAKE_INSTALL_PREFIX=... )
# build and install into that prefix
make -C build install
```

Then, in your Dune opts file, you may need to set
`-DBLACKCHANNEL_INCLUDE_DIR=.../include -DBLACKCHANNEL_LIBRARIES=.../lib` (see
[1]) in the `CMAKE_FLAGS` and Dune should pick the library up when
reconfiguring.

[1]: https://gitlab.dune-project.org/core/dune-common/blob/edef55ec9ed40617d12648d6ec95cbfc7120c676/cmake/modules/FindBlackChannel.cmake
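
For example, an opts file along these lines might work (untested; the install
prefix and the exact value of `BLACKCHANNEL_LIBRARIES` are placeholders, see
[1] for what the find module expects):
```sh
# blackchannel.opts -- hypothetical file name; adjust the paths to your install prefix
CMAKE_FLAGS="-DBLACKCHANNEL_INCLUDE_DIR=/your/prefix/include -DBLACKCHANNEL_LIBRARIES=/your/prefix/lib"
```
Then reconfigure, e.g. with `./dune-common/bin/dunecontrol --opts=blackchannel.opts all`.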

Regards,
Jö.

> Sorry if this is a trivial question, but how should I compile this? With
> dune-build? And how should I include this in my code?
> 
> Thanks, and warm wishes, Shubhangi
> 
> 
> On 12.07.19 13:38, Nils-Arne Dreier wrote:
> > Hi Shubhangi,
> > 
> > you have to call the MPIGuard::finalize() method after the point where
> > the exception might be thrown and before the next communication is
> > performed. From the information you provided, I guess that the
> > exception is thrown in the smoother of the AMG, which makes things
> > slightly complicated. Maybe AMG::mgc is a good starting point.
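> > 
> > Roughly, the pattern around the critical section would look like this
> > (just a sketch; guardedStep() and criticalSection() are placeholders for
> > whatever code may throw on a single rank, e.g. the smoother step):
> > 
> > ```cpp
> > #include <dune/common/parallel/mpiguard.hh>
> > 
> > void criticalSection();   // placeholder: may throw (e.g. FMatrixError) on one rank only
> > 
> > void guardedStep()
> > {
> >   Dune::MPIGuard guard;   // all ranks enter the critical section together
> >   criticalSection();
> >   guard.finalize();       // communicates the local success flag; if any rank
> >                           // failed, the surviving ranks throw MPIGuardError here,
> >                           // so every rank leaves the section via an exception
> > }
> > ```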
> > 
> > By the way: if you use the ULFM things I described previously, you can
> > use the MPIGuard at the coarsest level and don't need to call
> > MPIGuard::finalize() after every critical section.
> > 
> > Regards
> > Nils
> > 
> > On 11.07.19 14:56, Shubhangi Gupta wrote:
> > > Dear Jö and Nils,
> > > 
> > > Thanks a lot for your replies.
> > > 
> > > I actually tried putting the MPIGuard within the time loop (at the
> > > highest level) just to see what happens... Indeed, the one-step method
> > > now proceeds as it should, but the BiCGStab freezes... So yeah, as Jö
> > > mentioned, the MPIGuard needs to be introduced inside the
> > > ISTL solver... I am not sure how and where exactly, though! Any
> > > ideas?
> > > 
> > > Thanks again, and warm wishes, Shubhangi
> > > 
> > > On 10.07.19 14:52, Jö Fahlke wrote:
> > > > On Wed, 10 Jul 2019 at 14:39:09 +0200, Nils-Arne Dreier wrote:
> > > > > Hi Shubhangi,
> > > > > 
> > > > > I just talked to Jö. We guess that the problem is that the exception
> > > > > is thrown on only one rank, say rank X. All other ranks do not know
> > > > > that rank X failed and proceed as usual; at some point all these ranks
> > > > > end up waiting for communication from rank X. That is the deadlock
> > > > > that you see.
> > > > > 
> > > > > You may want to have a look at Dune::MPIGuard in
> > > > > dune/common/parallel/mpiguard.hh. It makes it possible to propagate the
> > > > > error state to all ranks.
> > > > It should be mentioned that MPIGuard probably cannot be used at a
> > > > high level; it would probably need to be introduced into the ISTL
> > > > solver (BiCGStab, AMG, SSOR) and/or PDELab (the parallel scalar
> > > > product, Newton) for this to work.
> > > > Not sure where exactly.
> > > > 
> > > > Regards,
> > > > Jö.
> > > > 
> > > > > There is also a merge request for dune-common, which adapts the
> > > > > MPIGuard
> > > > > such that you don't need to check for an error state before
> > > > > communicating, making use of the ULFM proposal for MPI. You can find it
> > > > > here:
> > > > > https://gitlab.dune-project.org/core/dune-common/merge_requests/517
> > > > > 
> > > > > If you don't have an MPI implementation that provides a *working* ULFM
> > > > > implementation, you may want to use the blackchannel-ulfm lib:
> > > > > https://gitlab.dune-project.org/exadune/blackchannel-ulfm
> > > > > 
> > > > > I hope that helps.
> > > > > 
> > > > > Kind regards
> > > > > Nils
> > > > > 
> > > > > On 10.07.19 14:07, Shubhangi Gupta wrote:
> > > > > > Hi Jö,
> > > > > > 
> > > > > > So, since you asked about the number of ranks... I tried running the
> > > > > > simulations again on 2 processes and 1 process. I get the same problem
> > > > > > with 2, but not with 1.
> > > > > > 
> > > > > > On 10.07.19 13:33, Shubhangi Gupta wrote:
> > > > > > > Hi Jö,
> > > > > > > 
> > > > > > > Yes, I am running it MPI-parallel, on 4 ranks.
> > > > > > > 
> > > > > > > On 10.07.19 13:32, Jö Fahlke wrote:
> > > > > > > > Are you running this MPI-parallel?  If yes, how many ranks?
> > > > > > > > 
> > > > > > > > Regards, Jö.
> > > > > > > > 
> > > > > > > > On Wed, 10 Jul 2019 at 11:55:45 +0200, Shubhangi Gupta wrote:
> > > > > > > > > Dear pdelab users,
> > > > > > > > > 
> > > > > > > > > I am currently experiencing a rather strange problem during the
> > > > > > > > > parallel solution of my finite volume code. I have written a short
> > > > > > > > > outline of my code below for reference.
> > > > > > > > > 
> > > > > > > > > At some point during the computation, if Dune throws an error, the
> > > > > > > > > code catches this error, resets the solution vector to the old value,
> > > > > > > > > halves the time step size, and tries to redo the calculation
> > > > > > > > > (osm.apply()).
> > > > > > > > > 
> > > > > > > > > However, if I get the error "FMatrixError: matrix is singular", the
> > > > > > > > > solver seems to freeze. Even the initial defect is not shown! (See the
> > > > > > > > > terminal output below.) I am not sure why this is so, and I have not
> > > > > > > > > experienced this issue before.
> > > > > > > > > 
> > > > > > > > > I will be very thankful if someone can help me figure out a way
> > > > > > > > > around this problem.
> > > > > > > > > 
> > > > > > > > > Thanks, and warm wishes, Shubhangi
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > *// code layout*
> > > > > > > > > 
> > > > > > > > >     ... UG grid, generated using gmsh, GV, ...
> > > > > > > > > 
> > > > > > > > >     typedef Dune::PDELab::QkDGLocalFiniteElementMap<
> > > > > > > > >         GV::Grid::ctype, double, 0, dim,
> > > > > > > > >         Dune::PDELab::QkDGBasisPolynomial::lagrange > FEMP0;
> > > > > > > > >     FEMP0 femp0;
> > > > > > > > >     typedef Dune::PDELab::GridFunctionSpace<
> > > > > > > > >         GV, FEMP0, Dune::PDELab::P0ParallelConstraints,
> > > > > > > > >         Dune::PDELab::ISTL::VectorBackend<> > GFS0;
> > > > > > > > >     GFS0 gfs0(gv,femp0);
> > > > > > > > >     typedef Dune::PDELab::PowerGridFunctionSpace<
> > > > > > > > >         GFS0, num_of_vars,
> > > > > > > > >         Dune::PDELab::ISTL::VectorBackend<Dune::PDELab::ISTL::Blocking::fixed>,
> > > > > > > > >         Dune::PDELab::EntityBlockedOrderingTag > GFS_TCH;
> > > > > > > > > 
> > > > > > > > >     ... LocalOperator LOP lop, TimeLocalOperator TOP top,
> > > > > > > > >         GridOperator GO go, InstationaryGridOperator IGO igo, ...
> > > > > > > > > 
> > > > > > > > >     typedef Dune::PDELab::ISTLBackend_BCGS_AMG_SSOR<IGO> LS;
> > > > > > > > >     LS ls(gfs,50,1,false,true);
> > > > > > > > >     typedef Dune::PDELab::Newton< IGO, LS, U > PDESOLVER;
> > > > > > > > >     PDESOLVER pdesolver( igo, ls );
> > > > > > > > >     Dune::PDELab::ImplicitEulerParameter<double> method;
> > > > > > > > >     Dune::PDELab::OneStepMethod< double, IGO, PDESOLVER, U, U > osm( method, igo, pdesolver );
> > > > > > > > > 
> > > > > > > > >     //TIME-LOOP
> > > > > > > > >     while( time < t_END - 1e-8 ){
> > > > > > > > >         try{
> > > > > > > > >             //PDE-SOLVE
> > > > > > > > >             osm.apply( time, dt, uold, unew );
> > > > > > > > >             exceptionCaught = false;
> > > > > > > > >         }catch( Dune::Exception &e ){
> > > > > > > > >             //RESET
> > > > > > > > >             exceptionCaught = true;
> > > > > > > > >             std::cout << "Catched Error, Dune reported error: " << e << std::endl;
> > > > > > > > >             unew = uold;
> > > > > > > > >             dt *= 0.5;
> > > > > > > > >             osm.getPDESolver().discardMatrix();
> > > > > > > > >             continue;
> > > > > > > > >         }
> > > > > > > > >         uold = unew;
> > > > > > > > >         time += dt;
> > > > > > > > >     }
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > *// terminal output showing FMatrixError...*
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >     time = 162.632 , time+dt = 164.603 , opTime = 180 , dt  : 1.97044
> > > > > > > > > 
> > > > > > > > >     READY FOR NEXT ITERATION.
> > > > > > > > > _____________________________________________________
> > > > > > > > >     current opcount = 2
> > > > > > > > > ****************************
> > > > > > > > > TCH HYDRATE:
> > > > > > > > > ****************************
> > > > > > > > > TIME STEP [implicit Euler]     89 time (from):   1.6263e+02 dt:   1.9704e+00 time (to):   1.6460e+02
> > > > > > > > > STAGE 1 time (to):   1.6460e+02.
> > > > > > > > >      Initial defect:   2.1649e-01
> > > > > > > > > Using a direct coarse solver (SuperLU)
> > > > > > > > > Building hierarchy of 2 levels (inclusive coarse solver) took 0.2195 seconds.
> > > > > > > > > === BiCGSTABSolver
> > > > > > > > >     12.5        6.599e-11
> > > > > > > > > === rate=0.1733, T=1.152, TIT=0.09217, IT=12.5
> > > > > > > > >      Newton iteration  1.  New defect:   3.4239e-02.  Reduction (this):   1.5816e-01.  Reduction (total):   1.5816e-01
> > > > > > > > > Using a direct coarse solver (SuperLU)
> > > > > > > > > Building hierarchy of 2 levels (inclusive coarse solver) took 0.195 seconds.
> > > > > > > > > === BiCGSTABSolver
> > > > > > > > >       17        2.402e-11
> > > > > > > > > === rate=0.2894, T=1.655, TIT=0.09738, IT=17
> > > > > > > > >      Newton iteration  2.  New defect:   3.9906e+00.  Reduction (this):   1.1655e+02.  Reduction (total):   1.8434e+01
> > > > > > > > > Using a direct coarse solver (SuperLU)
> > > > > > > > > Building hierarchy of 2 levels (inclusive coarse solver) took 0.8697 seconds.
> > > > > > > > > === BiCGSTABSolver
> > > > > > > > > Catched Error, Dune reported error: FMatrixError [luDecomposition:/home/sgupta/dune_2_6/source/dune/dune-common/dune/common/densematrix.hh:909]: matrix is singular
> > > > > > > > > _____________________________________________________
> > > > > > > > >     current opcount = 2
> > > > > > > > > ****************************
> > > > > > > > > TCH HYDRATE:
> > > > > > > > > ****************************
> > > > > > > > > TIME STEP [implicit Euler]     89 time (from):   1.6263e+02 dt:   9.8522e-01 time (to):   1.6362e+02
> > > > > > > > > STAGE 1 time (to):   1.6362e+02.
> > > > > > > > > 
> > > > > > > > > *... nothing happens here... the terminal appears to freeze...*
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > 
> -- 
> Dr. Shubhangi Gupta
> Marine Geosystems
> GEOMAR Helmholtz Center for Ocean Research
> Wischhofstraße 1-3,
> D-24148 Kiel
> 
> Room: 12-206
> Phone: +49 431 600-1402
> Email: sgupta at geomar.de
> 
> 
> _______________________________________________
> dune-pdelab mailing list
> dune-pdelab at lists.dune-project.org
> https://lists.dune-project.org/mailman/listinfo/dune-pdelab

-- 
Jorrit (Jö) Fahlke, Institute for Computational and Applied Mathematics,
University of Münster, Orleans-Ring 10, D-48149 Münster
Tel: +49 251 83 35146 Fax: +49 251 83 32729

In the beginning the Universe was created.  This has made a lot of
people very angry and been widely regarded as a bad move.
-- Douglas Adams

