[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)
Shubhangi Gupta
sgupta at geomar.de
Thu Jul 25 17:28:33 CEST 2019
Hi Markus,
Thanks a lot for your advice.
I corrected my use of MPIGuard as you suggested (both in the main time
loop and in ovlpistlsolverbackend.hh). I observe two notable things:
1. The MPIGuard **seems** to work on my local machine... as in, I have
run my simulations for a number of parameter sets, and my linear solver
hasn't frozen *yet*. But the MPIGuard doesn't work on the copy of the
code on our university server!
2. It seems that the MPIGuard is making the code slower... can this be?
Also, yes, I agree that my linear system could be ill-conditioned (or
weird, as you put it). I have a complicated setting with rather extreme
properties taken from the Black Sea cores. But I think the
linear/nonlinear solvers shouldn't fail partially, and communication
failure between processes is certainly not a good sign for the solvers
in general... or? I would expect the solver to simply not converge
overall if the linear system is incorrect... not freeze halfway and stop
communicating.
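
To make that expectation concrete, here is a minimal single-process sketch (not PDELab code) of why the ranks have to agree on a failure. The "ranks" are just entries of a vector, the reduction stands in for whatever collective a guard would perform, and the names allRanksSucceeded, attemptStep and timeStepsConsistent are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a collective "min" reduction over per-rank success flags.
// In a real MPI run this would be a communicator call; here the ranks are
// only entries of a vector so that the sketch runs in a single process.
inline bool allRanksSucceeded(const std::vector<bool>& localSuccess)
{
  return std::all_of(localSuccess.begin(), localSuccess.end(),
                     [](bool ok) { return ok; });
}

// One attempted time step on every simulated rank.
// failingRank marks the rank whose local solve "throws" (-1: nobody fails).
// agree == false: each rank reacts only to its own failure.
// agree == true:  all ranks react to the reduced, global result.
inline void attemptStep(std::vector<double>& dt, int failingRank, bool agree)
{
  assert(failingRank < static_cast<int>(dt.size()));

  std::vector<bool> ok(dt.size(), true);
  if (failingRank >= 0)
    ok[static_cast<std::size_t>(failingRank)] = false;

  const bool globalOk = allRanksSucceeded(ok);
  for (std::size_t r = 0; r < dt.size(); ++r) {
    const bool mustRetry = agree ? !globalOk : !ok[r];
    if (mustRetry)
      dt[r] *= 0.5;  // same recovery as in the time loop: halve the step
  }
}

// True if every simulated rank holds the same time step size.
inline bool timeStepsConsistent(const std::vector<double>& dt)
{
  assert(!dt.empty());
  return std::all_of(dt.begin(), dt.end(),
                     [&dt](double v) { return v == dt.front(); });
}
```

With agree == false only the failing rank halves its dt, so the ranks carry on with different time steps; with agree == true all ranks halve dt together and stay consistent.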
Thanks once again! I really appreciate your help.
best wishes, Shubhangi
On 24.07.19 11:25, Markus Blatt wrote:
> Please always reply to the list. Free consulting is only available there.
>
> The solution to your problems is at the bottom. Please also read the rest,
> as you seem to use MPIGuard the wrong way.
>
> On Wed, Jul 24, 2019 at 09:42:01AM +0200, Shubhangi Gupta wrote:
>> Hi Markus,
>>
>> Thanks a lot for your reply! I am answering your questions below...
>>
>> 1. Does at the highest level mean outside the try clause? That might be wrong as it will throw if something went wrong. It needs to be inside the try clause.
>>
>> By highest level, I meant **inside** the try clause.
> I really have no experience with MPIGuard. Maybe someone else can tell us where
> it throws.
>
> But I think you are using it wrong.
>> Dune::MPIGuard guard;
>>
> This would be outside the try clause. But that might be right as MPIGuard
> throws during finalize.
>
>> bool exceptionCaught = false;
>>
>> while( time < t_END ){
>>
>> try{
>>
> Personally I would have initialized the MPIGuard here, but it seems like
> your approach is valid too, as you reactivate.
>
>> // reactivate the guard for the next critical operation
>> guard.reactivate();
>>
>> osm.apply( time, dt, uold, unew );
>>
>> exceptionCaught = false;
>>
> Here you definitely need to tell it that you passed the critical section:
> guard.finalize();
>
>> }catch ( Dune::Exception &e ) {
>> exceptionCaught = true;
>>
>> // tell the guard that you successfully passed a critical operation
>> guard.finalize();
> This is too late! You have already experienced any exception there might be.
>
>> unew = uold;
>>
>> dt *= 0.5;
>>
>> osm_tch.getPDESolver().discardMatrix();
>>
>> continue;
>> }
>>
>> uold = unew;
>> time += dt;
>> }
>>
>> 2. freezes means deadlock (stopping at an iteration and never finishing)? That will happen in your code if the MPIGuard is before the try clause.
>>
>> Yes, freezes means stopping at the iteration and never finishing it.
>>
>> So first, this was happening right after FMatrixError (singular matrix).
>> The osm froze without initiating Newton solver... After I put the MPIGuard,
>> this problem was solved... Newton solver restarts as it should... But now
>> the freezing happens with the linear solver (BiCGStab, in this case). Nils
>> said to solve this I will have to put the MPIGuard also on lower levels
>> (inside newton and linear solver...). I, on the other hand, prefer to not
>> touch the dune core code and risk introducing more errors along the way...
>>
> That is because in your case different processors will work with different
> timesteps, and that cannot work as the linear system is utterly wrong.
>
>> 3. ....have you tried the poor-man's solution, below? ...
>>
>> Yes, I tried that, but the problem is if the apply step doesn't finish, then
>> nothing really happens...
>>
> Finally I understand. You are using Dune::PDELab::ISTLBackend_BCGS_AMG_SSOR<IGO>.
> You must have a very weird linear system, as this bug can only appear when
> inverting the diagonal block in the application of one step of SSOR. Personally
> I would say that your linear system is incorrect/not very sane.
>
> The bug is in PDELab that does not expect an exception
> during the application of the preconditioner. It has to be fixed there in
> file ovlpistlsolverbackend.hh OverlappingPreconditioner::apply
>
> MPIGuard guard;
> prec.apply(Backend::native(v),Backend::native(dd));
> guard.finalize(true);
>
> and probably many more. In addition, this construct is also needed in the
> constructor of AMG, as it can happen if ILU is used as the smoother.
>
> Please make your patch available afterwards.
>
> HTH
>
> Markus
>
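
For reference, the ordering discussed above (guard created inside the try block, finalize() right after the critical call, dt halved in the catch block) can be sketched in a single process as follows. This is only an illustration under stated assumptions: FakeComm, SketchGuard, LoopResult and runTimeLoop are hypothetical stand-ins, not Dune or PDELab classes, the step callback stands in for osm.apply( time, dt, uold, unew ), and std::exception replaces Dune::Exception. The destructor mimics what, as far as I understand it, Dune::MPIGuard does when a scope is left without reaching finalize().

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical single-process stand-in for the communicator: it only records
// what a real collective would exchange between ranks.
struct FakeComm
{
  bool remoteFailure = false;  // "some other rank failed"
  int failuresReported = 0;    // how often this rank signalled a failure
};

// Hypothetical stand-in for Dune::MPIGuard (not the real class).
class SketchGuard
{
public:
  explicit SketchGuard(FakeComm& comm) : comm_(comm), active_(true) {}
  SketchGuard(const SketchGuard&) = delete;
  SketchGuard& operator=(const SketchGuard&) = delete;

  // Leaving the scope without finalize() means an exception is unwinding:
  // report the failure so that the other ranks are not left waiting.
  ~SketchGuard()
  {
    if (active_)
      ++comm_.failuresReported;
  }

  // Called right after the critical section; throws if any rank failed.
  void finalize(bool success = true)
  {
    active_ = false;
    if (!success)
      ++comm_.failuresReported;
    if (!success || comm_.remoteFailure)
      throw std::runtime_error("error on at least one rank");
  }

private:
  FakeComm& comm_;
  bool active_;
};

struct LoopResult
{
  double time;
  double dt;
  int retries;
};

// Time loop with the guard inside the try block.
inline LoopResult runTimeLoop(double tEnd, double dt, FakeComm& comm,
                              const std::function<void(double, double)>& step)
{
  assert(dt > 0.0);
  double time = 0.0;
  int retries = 0;
  while (time < tEnd) {
    try {
      SketchGuard guard(comm);  // guard lives inside the try block
      step(time, dt);           // critical section, may throw locally
      guard.finalize();         // reached only if the local step succeeded
    } catch (const std::exception&) {
      dt *= 0.5;                // local or remote failure: halve dt and retry
      ++retries;
      continue;
    }
    time += dt;
  }
  return {time, dt, retries};
}
```

A local exception in step and a failure reported by another rank both end up in the same catch block, so the time step is halved in both cases.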
--
Dr. Shubhangi Gupta
Marine Geosystems
GEOMAR Helmholtz Center for Ocean Research
Wischhofstraße 1-3,
D-24148 Kiel
Room: 12-206
Phone: +49 431 600-1402
Email: sgupta at geomar.de