[dune-pdelab] Fwd: Fwd: solver fails to reset correctly after FMatrixError (singular matrix)

Dmitry Mazilkin dmm16 at tu-clausthal.de
Wed Jul 10 15:26:51 CEST 2019


Hello all Dune developers,

Regarding this explanation of the deadlock:

 > I just talked to Jö. We guess that the problem is, that the exception
 > is only thrown on one rank, say rank X. All other ranks do not know
 > that rank X failed and proceed as usual, at some point all these ranks
 > waiting for communication of rank X. That is the deadlock that you see

we see very similar behavior, which is described here:
https://gitlab.dune-project.org/pdelab/dune-pdelab/issues/130

We hit the bug with the following setup:
  ISTLBackend_OVLP_GMRES_ILU0 (linear solver backend)
  Alexander3 (time-stepping scheme)
  Newton (nonlinear solver)

Best regards,
Dmitry


On 10.07.19 15:21, Markus Blatt wrote:
> On Wed, Jul 10, 2019 at 02:39:09PM +0200, Nils-Arne Dreier wrote:
>> I just talked to Jö. We guess that the problem is, that the exception is
>> only thrown on one rank, say rank X. All other ranks do not know that
>> rank X failed and proceed as usual, at some point all these ranks
>> waiting for communication of rank X. That is the deadlock that you see.
>>
>> You may want to have a look at Dune::MPIGuard in
>> dune/common/parallel/mpiguard.hh. It makes it possible to propagate the
>> error state to all ranks.
>>
> 
> One could also argue that if this happens in OneStepMethod of PDELab then
> PDELab (in the long run) should make sure that the behaviour is consistent
> across all processors...
> 
> Just my 2 cents.
> 
> Markus
> 
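
To make the idea of propagating the error state to all ranks concrete,
here is a minimal stand-alone sketch. It is not the Dune::MPIGuard
implementation and does not use MPI: the ranks are simulated by a loop,
the all-reduce by a plain sum, and all names (localStep, collectFlags,
globalFailureCount, allRanksAbortStep) are hypothetical. It only
illustrates the pattern of turning a local exception into a flag that
every rank agrees on before anyone continues.

```cpp
#include <cassert>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical local work: throws on one rank only, similar to a
// singular-matrix error in a local factorisation.
void localStep(int rank, int failingRank) {
    if (rank == failingRank)
        throw std::runtime_error("singular matrix on rank " + std::to_string(rank));
}

// Phase 1: every (simulated) rank runs its local work and records a
// failure flag instead of letting the exception escape on that rank alone.
std::vector<int> collectFlags(int nRanks, int failingRank) {
    assert(nRanks > 0);
    std::vector<int> flags(nRanks, 0);
    for (int r = 0; r < nRanks; ++r) {
        try {
            localStep(r, failingRank);
        } catch (const std::exception&) {
            flags[r] = 1;  // remember the local failure
        }
    }
    return flags;
}

// Phase 2: stand-in for a collective sum over the flags. In a real
// parallel run each rank would obtain this value from an all-reduce.
int globalFailureCount(const std::vector<int>& flags) {
    return std::accumulate(flags.begin(), flags.end(), 0);
}

// Phase 3: every rank takes the same decision from the reduced value,
// so no rank is left waiting for communication with a failed one.
bool allRanksAbortStep(int nRanks, int failingRank) {
    return globalFailureCount(collectFlags(nRanks, failingRank)) > 0;
}
```

With four simulated ranks and a failure on rank 2, allRanksAbortStep(4, 2)
reports that the step has to be abandoned everywhere; with no failing rank
(for example failingRank = -1) it reports that all ranks may proceed.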

-- 
Dmitry Mazilkin
Institut für Mathematik, Raum 314
Erzstraße 1, 38678 Clausthal-Zellerfeld



