[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)

Jö Fahlke jorrit at jorrit.de
Wed Jul 24 11:55:14 CEST 2019


On Wed, 24 Jul 2019, 10:36:41 +0200, Shubhangi Gupta wrote:
> 
> Hi Nils,
> 
> Thanks a lot! I managed to install blackchannel-ulfm. While building Dune
> with the following CMake options:
> 
> CMAKE_FLAGS="
> -DCMAKE_C_COMPILER='/usr/bin/gcc'
> -DCMAKE_CXX_COMPILER='/usr/bin/g++-7'
> -DCMAKE_Fortran_COMPILER='/usr/bin/gfortran'
> -DBLACKCHANNEL_INCLUDE_DIR='/usr/local/include'
> -DBLACKCHANNEL_LIBRARIES='/usr/local/lib'
> -DCMAKE_CXX_FLAGS_RELEASE='-O3 -DNDEBUG -g0 -Wno-deprecated-declarations
> -funroll-loops'
> -DCMAKE_BUILD_TYPE=Release
> -DDUNE_SYMLINK_TO_SOURCE_TREE=1
> "
> 
> I get the following message:
> 
>   Manually-specified variables were not used by the project:
> 
>     BLACKCHANNEL_INCLUDE_DIR
>     BLACKCHANNEL_LIBRARIES
> 
> 
> How can I check (or force) whether Dune indeed finds blackchannel?

If those variables were not used, you probably need to switch your dune-common
branch to that of
https://gitlab.dune-project.org/core/dune-common/merge_requests/517, or at
least to something that includes the changes from that MR.

The CMake output during "dunecontrol configure" will probably say something
about blackchannel (whether it is even looking for it, and whether it was
found).

Note, however: all that is needed is the revoke functionality from ULFM. If
your MPI already includes that, you may not need blackchannel at all (and
"dunecontrol configure" probably won't look for it).

Regards,
Jö.

> Thanks again, and warm wishes, Shubhangi
> 
> 
> On 23.07.19 16:45, Jö Fahlke wrote:
> > On Tue, 23 Jul 2019, 15:26:39 +0200, Shubhangi Gupta wrote:
> > > Sorry, I am still struggling with this issue... and my BiCGStab solver is
> > > freezing a lot more often, so I can't ignore this...
> > > 
> > > About the ULFM... you sent me the following link:
> > > 
> > > https://gitlab.dune-project.org/exadune/blackchannel-ulfm
> > That is a (more-or-less) standard CMake build system, i.e. it works outside of
> > Dune.  Try something like this (untested; replace the "..." as needed):
> > ```sh
> > git clone https://gitlab.dune-project.org/exadune/blackchannel-ulfm
> > mkdir build
> > ( cd build && cmake ../blackchannel-ulfm -DCMAKE_INSTALL_PREFIX=... )
> > make -C build install
> > ```
> > 
> > Then, in your Dune opts file, you may need to set
> > `-DBLACKCHANNEL_INCLUDE_DIR=.../include -DBLACKCHANNEL_LIBRARIES=.../lib` (see
> > [1]) in the `CMAKE_FLAGS` and Dune should pick the library up when
> > reconfiguring.
> > 
> > [1]: https://gitlab.dune-project.org/core/dune-common/blob/edef55ec9ed40617d12648d6ec95cbfc7120c676/cmake/modules/FindBlackChannel.cmake
> > 
> > Regards,
> > Jö.
> > 
> > > Sorry if this is a trivial question, but how should I compile this? With
> > > dune-build? And how should I include this in my code?
> > > 
> > > Thanks, and warm wishes, Shubhangi
> > > 
> > > 
> > > On 12.07.19 13:38, Nils-Arne Dreier wrote:
> > > > Hi Shubhangi,
> > > > 
> > > > You have to call the MPIGuard::finalize() method after the point where
> > > > the exception might be thrown and before the next communication is
> > > > performed. From the information you provided, I guess that the
> > > > exception is thrown in the smoother of the AMG, which makes things
> > > > slightly complicated. Maybe AMG::mgc is a good starting point.
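> > > >
> > > > Schematically it would look something like this (an untested sketch;
> > > > smoother and sp are placeholders for whatever objects live at that
> > > > point in the solver, not literal ISTL code):
> > > > ```c++
> > > > // guard a purely local operation and call finalize() before the
> > > > // next collective operation
> > > > Dune::MPIGuard guard;
> > > >
> > > > smoother.apply(v, d);   // purely local; may throw (e.g. FMatrixError)
> > > >                         // on a single rank only
> > > >
> > > > guard.finalize();       // collective: the surviving ranks throw
> > > >                         // Dune::MPIGuardError here instead of entering
> > > >                         // the scalar product below out of sync
> > > >
> > > > auto alpha = sp.dot(v, d);  // communication is safe again
> > > > ```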
> > > > 
> > > > By the way: if you use the ULFM things I described previously, you can
> > > > use the MPIGuard at the coarsest level and don't need to call
> > > > MPIGuard::finalize() after every critical section.
> > > > 
> > > > Regards
> > > > Nils
> > > > 
> > > > On 11.07.19 14:56, Shubhangi Gupta wrote:
> > > > > Dear Jö and Nils,
> > > > > 
> > > > > Thanks a lot for your replies.
> > > > > 
> > > > > I actually tried putting the MPIGuard within the time loop (at the
> > > > > highest level) just to see what happens... Indeed, the one-step method
> > > > > now proceeds as it should, but the BiCGStab freezes... So yeah, as Jö
> > > > > mentioned, the MPIGuard needs to be introduced inside the
> > > > > ISTL solver... I am not very sure how and where exactly though! Any
> > > > > ideas?
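> > > > >
> > > > > For reference, what I tried looks roughly like this (a sketch, using
> > > > > the names from my code outline further below):
> > > > > ```c++
> > > > > while (time < t_END - 1e-8) {
> > > > >     try {
> > > > >         Dune::MPIGuard guard;            // entered by all ranks
> > > > >         osm.apply(time, dt, uold, unew); // if this throws on one rank,
> > > > >                                          // that rank's guard destructor
> > > > >                                          // reports the failure
> > > > >         guard.finalize();                // the healthy ranks throw
> > > > >                                          // Dune::MPIGuardError if any
> > > > >                                          // rank failed
> > > > >     } catch (Dune::Exception& e) {       // all ranks recover together
> > > > >         unew = uold;
> > > > >         dt *= 0.5;
> > > > >         osm.getPDESolver().discardMatrix();
> > > > >         continue;
> > > > >     }
> > > > >     uold = unew;
> > > > >     time += dt;
> > > > > }
> > > > > ```
> > > > > The deadlock then apparently just moves into the solver's own
> > > > > communication, before guard.finalize() is ever reached.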
> > > > > 
> > > > > Thanks again, and warm wishes, Shubhangi
> > > > > 
> > > > > On 10.07.19 14:52, Jö Fahlke wrote:
> > > > > > On Wed, 10 Jul 2019, 14:39:09 +0200, Nils-Arne Dreier wrote:
> > > > > > > Hi Shubhangi,
> > > > > > > 
> > > > > > > I just talked to Jö. We guess that the problem is that the
> > > > > > > exception is only thrown on one rank, say rank X. All other ranks
> > > > > > > do not know that rank X failed and proceed as usual; at some point
> > > > > > > all these ranks end up waiting for communication from rank X. That
> > > > > > > is the deadlock that you see.
> > > > > > > 
> > > > > > > You may want to have a look at Dune::MPIGuard in
> > > > > > > dune/common/parallel/mpiguard.hh. It makes it possible to propagate the
> > > > > > > error state to all ranks.
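> > > > > > >
> > > > > > > The basic pattern is roughly (untested sketch; rank, failingRank
> > > > > > > and the thrown exception are placeholders for whatever may fail
> > > > > > > locally in your code):
> > > > > > > ```c++
> > > > > > > #include <dune/common/exceptions.hh>
> > > > > > > #include <dune/common/parallel/mpiguard.hh>
> > > > > > >
> > > > > > > {
> > > > > > >     Dune::MPIGuard guard;   // all ranks enter the guarded section
> > > > > > >
> > > > > > >     // purely local critical work; may throw on a single rank only
> > > > > > >     if (rank == failingRank)
> > > > > > >         DUNE_THROW(Dune::Exception, "local failure");
> > > > > > >
> > > > > > >     guard.finalize();       // collective check: if any rank failed,
> > > > > > > }                           // the others throw Dune::MPIGuardError
> > > > > > >                             // here instead of deadlocking in the
> > > > > > >                             // next communication
> > > > > > > ```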
> > > > > > It should be mentioned that MPIGuard probably cannot be used at a
> > > > > > high level; it would probably need to be introduced into the ISTL
> > > > > > solver (BiCGStab, AMG, SSOR) and/or PDELab (the parallel scalar
> > > > > > product, Newton) for this to work. Not sure where exactly.
> > > > > > 
> > > > > > Regards,
> > > > > > Jö.
> > > > > > 
> > > > > > > There is also a merge request for dune-common, which adapts the
> > > > > > > MPIGuard
> > > > > > > such that you don't need to check for an error state before
> > > > > > > communicating, making use of the ULFM proposal for MPI. You can find it
> > > > > > > here:
> > > > > > > https://gitlab.dune-project.org/core/dune-common/merge_requests/517
> > > > > > > 
> > > > > > > If you don't have an MPI implementation that provides a *working*
> > > > > > > ULFM implementation, you may want to use the blackchannel-ulfm lib:
> > > > > > > https://gitlab.dune-project.org/exadune/blackchannel-ulfm
> > > > > > > 
> > > > > > > I hope that helps.
> > > > > > > 
> > > > > > > Kind regards
> > > > > > > Nils
> > > > > > > 
> > > > > > > On 10.07.19 14:07, Shubhangi Gupta wrote:
> > > > > > > > Hi Jö,
> > > > > > > > 
> > > > > > > > So, since you asked about the number of ranks... I tried running the
> > > > > > > > simulations again on 2 processes and 1 process. I get the same problem
> > > > > > > > with 2, but not with 1.
> > > > > > > > 
> > > > > > > > On 10.07.19 13:33, Shubhangi Gupta wrote:
> > > > > > > > > Hi Jö,
> > > > > > > > > 
> > > > > > > > > Yes, I am running it MPI-parallel, on 4 ranks.
> > > > > > > > > 
> > > > > > > > > On 10.07.19 13:32, Jö Fahlke wrote:
> > > > > > > > > > Are you running this MPI-parallel?  If yes, how many ranks?
> > > > > > > > > > 
> > > > > > > > > > Regards, Jö.
> > > > > > > > > > 
> > > > > > > > > > On Wed, 10 Jul 2019, 11:55:45 +0200, Shubhangi Gupta wrote:
> > > > > > > > > > > Dear pdelab users,
> > > > > > > > > > > 
> > > > > > > > > > > I am currently experiencing a rather strange problem during the
> > > > > > > > > > > parallel solution of my finite volume code. I have written a short
> > > > > > > > > > > outline of my code below for reference.
> > > > > > > > > > >
> > > > > > > > > > > At some point during the computation, if Dune throws an error, the
> > > > > > > > > > > code catches this error, resets the solution vector to the old
> > > > > > > > > > > value, halves the time step size, and tries to redo the calculation
> > > > > > > > > > > (osm.apply()).
> > > > > > > > > > >
> > > > > > > > > > > However, if I get the error "FMatrixError: matrix is singular", the
> > > > > > > > > > > solver seems to freeze. Even the initial defect is not shown! (See
> > > > > > > > > > > the terminal output below.) I am not sure why this is so, and I have
> > > > > > > > > > > not experienced this issue before.
> > > > > > > > > > >
> > > > > > > > > > > I will be very thankful if someone can help me figure out a way
> > > > > > > > > > > around this problem.
> > > > > > > > > > >
> > > > > > > > > > > Thanks, and warm wishes, Shubhangi
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > *// code layout*
> > > > > > > > > > >
> > > > > > > > > > > ... UG grid, generated using gmsh, GV, ...
> > > > > > > > > > >
> > > > > > > > > > > typedef Dune::PDELab::QkDGLocalFiniteElementMap<GV::Grid::ctype,
> > > > > > > > > > >     double, 0, dim, Dune::PDELab::QkDGBasisPolynomial::lagrange> FEMP0;
> > > > > > > > > > > FEMP0 femp0;
> > > > > > > > > > > typedef Dune::PDELab::GridFunctionSpace<GV, FEMP0,
> > > > > > > > > > >     Dune::PDELab::P0ParallelConstraints,
> > > > > > > > > > >     Dune::PDELab::ISTL::VectorBackend<>> GFS0;
> > > > > > > > > > > GFS0 gfs0(gv, femp0);
> > > > > > > > > > > typedef Dune::PDELab::PowerGridFunctionSpace<GFS0, num_of_vars,
> > > > > > > > > > >     Dune::PDELab::ISTL::VectorBackend<Dune::PDELab::ISTL::Blocking::fixed>,
> > > > > > > > > > >     Dune::PDELab::EntityBlockedOrderingTag> GFS_TCH;
> > > > > > > > > > >
> > > > > > > > > > > ... LocalOperator LOP lop, TimeLocalOperator TOP top,
> > > > > > > > > > >     GridOperator GO go, InstationaryGridOperator IGO igo, ...
> > > > > > > > > > >
> > > > > > > > > > > typedef Dune::PDELab::ISTLBackend_BCGS_AMG_SSOR<IGO> LS;
> > > > > > > > > > > LS ls(gfs, 50, 1, false, true);
> > > > > > > > > > > typedef Dune::PDELab::Newton<IGO, LS, U> PDESOLVER;
> > > > > > > > > > > PDESOLVER pdesolver(igo, ls);
> > > > > > > > > > > Dune::PDELab::ImplicitEulerParameter<double> method;
> > > > > > > > > > >
> > > > > > > > > > > Dune::PDELab::OneStepMethod<double, IGO, PDESOLVER, U, U>
> > > > > > > > > > >     osm(method, igo, pdesolver);
> > > > > > > > > > >
> > > > > > > > > > > // TIME LOOP
> > > > > > > > > > > while (time < t_END - 1e-8) {
> > > > > > > > > > >     try {
> > > > > > > > > > >         // PDE SOLVE
> > > > > > > > > > >         osm.apply(time, dt, uold, unew);
> > > > > > > > > > >         exceptionCaught = false;
> > > > > > > > > > >     } catch (Dune::Exception &e) {
> > > > > > > > > > >         // RESET
> > > > > > > > > > >         exceptionCaught = true;
> > > > > > > > > > >         std::cout << "Catched Error, Dune reported error: " << e
> > > > > > > > > > >                   << std::endl;
> > > > > > > > > > >         unew = uold;
> > > > > > > > > > >         dt *= 0.5;
> > > > > > > > > > >         osm.getPDESolver().discardMatrix();
> > > > > > > > > > >         continue;
> > > > > > > > > > >     }
> > > > > > > > > > >     uold = unew;
> > > > > > > > > > >     time += dt;
> > > > > > > > > > > }
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > *// terminal output showing FMatrixError...*
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > time = 162.632 , time+dt = 164.603 , opTime = 180 , dt : 1.97044
> > > > > > > > > > >
> > > > > > > > > > > READY FOR NEXT ITERATION.
> > > > > > > > > > > _____________________________________________________
> > > > > > > > > > > current opcount = 2
> > > > > > > > > > > ****************************
> > > > > > > > > > > TCH HYDRATE:
> > > > > > > > > > > ****************************
> > > > > > > > > > > TIME STEP [implicit Euler]     89 time (from):   1.6263e+02 dt:   1.9704e+00 time (to):   1.6460e+02
> > > > > > > > > > > STAGE 1 time (to):   1.6460e+02.
> > > > > > > > > > >       Initial defect:   2.1649e-01
> > > > > > > > > > > Using a direct coarse solver (SuperLU)
> > > > > > > > > > > Building hierarchy of 2 levels (inclusive coarse solver) took 0.2195 seconds.
> > > > > > > > > > > === BiCGSTABSolver
> > > > > > > > > > >      12.5        6.599e-11
> > > > > > > > > > > === rate=0.1733, T=1.152, TIT=0.09217, IT=12.5
> > > > > > > > > > >       Newton iteration  1.  New defect:   3.4239e-02.  Reduction (this):   1.5816e-01.  Reduction (total):   1.5816e-01
> > > > > > > > > > > Using a direct coarse solver (SuperLU)
> > > > > > > > > > > Building hierarchy of 2 levels (inclusive coarse solver) took 0.195 seconds.
> > > > > > > > > > > === BiCGSTABSolver
> > > > > > > > > > >        17        2.402e-11
> > > > > > > > > > > === rate=0.2894, T=1.655, TIT=0.09738, IT=17
> > > > > > > > > > >       Newton iteration  2.  New defect:   3.9906e+00.  Reduction (this):   1.1655e+02.  Reduction (total):   1.8434e+01
> > > > > > > > > > > Using a direct coarse solver (SuperLU)
> > > > > > > > > > > Building hierarchy of 2 levels (inclusive coarse solver) took 0.8697 seconds.
> > > > > > > > > > > === BiCGSTABSolver
> > > > > > > > > > > Catched Error, Dune reported error: FMatrixError
> > > > > > > > > > > [luDecomposition:/home/sgupta/dune_2_6/source/dune/dune-common/dune/common/densematrix.hh:909]:
> > > > > > > > > > > matrix is singular
> > > > > > > > > > > _____________________________________________________
> > > > > > > > > > > current opcount = 2
> > > > > > > > > > > ****************************
> > > > > > > > > > > TCH HYDRATE:
> > > > > > > > > > > ****************************
> > > > > > > > > > > TIME STEP [implicit Euler]     89 time (from):   1.6263e+02 dt:   9.8522e-01 time (to):   1.6362e+02
> > > > > > > > > > > STAGE 1 time (to):   1.6362e+02.
> > > > > > > > > > >
> > > > > > > > > > > *... nothing happens here... the terminal appears to freeze ...*
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> 
> -- 
> Dr. Shubhangi Gupta
> Marine Geosystems
> GEOMAR Helmholtz Center for Ocean Research
> Wischhofstraße 1-3,
> D-24148 Kiel
> 
> Room: 12-206
> Phone: +49 431 600-1402
> Email: sgupta at geomar.de

-- 
Jorrit (Jö) Fahlke, Institute for Computational und Applied Mathematics,
University of Münster, Orleans-Ring 10, D-48149 Münster
Tel: +49 251 83 35146 Fax: +49 251 83 32729

If God had intended Man to Smoke, He would have set him on Fire.
-- fortune