[dune-pdelab] Parallel (block?) preconditioners

Christian Engwer christian.engwer at uni-muenster.de
Wed Nov 2 17:00:52 CET 2011


Consider the Gauss-Seidel preconditioner.

mesh:
   x--x--x--x
decomposition with overlap:
 P1 x--x--o
 P2    o--x--x


In a sequential run the result at the last node depends on the
current iterate of the other three nodes; in parallel it depends on
the old iterate of the first two nodes and the current iterate of
the second-to-last node.
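
To make concrete which iterate an update sees, here is a minimal
sketch of one Gauss-Seidel sweep on plain arrays (invented names,
not the ISTL implementation):

  // One Gauss-Seidel sweep: when row i is updated, x[j] for j < i
  // already holds the new iterate, x[j] for j > i still the old one.
  #include <cstddef>
  #include <vector>

  void gauss_seidel_sweep(const std::vector<std::vector<double> >& A,
                          const std::vector<double>& b,
                          std::vector<double>& x)
  {
    for (std::size_t i = 0; i < x.size(); ++i) {
      double r = b[i];
      for (std::size_t j = 0; j < x.size(); ++j)
        if (j != i) r -= A[i][j] * x[j];
      x[i] = r / A[i][i];             // overwrite in place
    }
  }

In the Schwarz-parallel variant each process runs this sweep only on
its local 3x3 subdomain matrix, and the overlap node enters with the
old iterate received from the neighbour before the sweep, which is
exactly the difference described above.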

With ILU it is in principle the same, but more complicated, because
you do not know a priori what ILU will do. In 1D, ILU is a bit
simpler to reason about, because it amounts to a full LU
decomposition there; in 2D the result depends on the values of the
actual matrix.
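
The 1D remark can be seen directly: for a tridiagonal matrix the
exact LU factors are bidiagonal, i.e. they have the same sparsity
pattern as A, so ILU(0) drops nothing and coincides with the full
LU decomposition. A minimal sketch (invented helper, the matrix
stored as three diagonals):

  // ILU(0) of a tridiagonal matrix == exact LU: the elimination
  // creates no fill outside the tridiagonal pattern.
  #include <cstddef>
  #include <vector>

  void ilu0_tridiag(const std::vector<double>& lo, // a(i,i-1), lo[0] unused
                    const std::vector<double>& di, // a(i,i)
                    const std::vector<double>& up, // a(i,i+1)
                    std::vector<double>& l,        // l(i,i-1), unit diagonal
                    std::vector<double>& u)        // u(i,i); u(i,i+1) = up[i]
  {
    const std::size_t n = di.size();
    l.assign(n, 0.0);
    u = di;
    for (std::size_t i = 1; i < n; ++i) {
      l[i] = lo[i] / u[i-1];           // elimination factor
      u[i] = di[i] - l[i] * up[i-1];   // diagonal update
    }
  }

In 2D the elimination does create fill outside the pattern; ILU(0)
throws those entries away, and how much that costs depends on the
values of the actual matrix.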

Christian


On Wed, Nov 02, 2011 at 01:56:51PM +0000, Eike Mueller wrote:
> Sorry, I just realised I made a mistake: with 8 unknowns it should
> of course be 8x8 instead of 10x10.
> 
> Eike Mueller wrote:
> >Hi Christian,
> >
> >many thanks for your reply. I guess in the overlapping case ILU0
> >will consider the couplings A_{ij} if i is local to the processor
> >and j is within the overlap region (in my case I use an overlap of
> >1).
> >
> >So if I have, for example, a 1d problem with 8 unknowns
> >represented by a graph like this:
> >
> >o--o--o--o--o--o--o--o, it will decompose this into
> >
> >o--o--o--o--x (processor 1) and x--o--o--o--o (processor 2)
> >
> >(o denotes unknowns, x denotes overlap elements and -- couplings)
> >
> >the serial matrix will be 10x10 and will be ILU0-decomposed and
> >inverted (whenever the preconditioner is applied), whereas in the
> >parallel case each processor will store a 5x5 matrix, which is
> >ILU0-decomposed and inverted locally (again whenever the
> >preconditioner is applied). The ISTL consistency model enforces
> >that the vector the preconditioner is applied to is in a unique
> >representation before the application and that the result is in a
> >consistent representation (as defined in P. Bastian, M. Blatt: On
> >the Generic Parallelisation of Iterative Solvers for the Finite
> >Element Method, Int. J. Computational Science and Engineering,
> >4(1):56-69, 2008).
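> >
> >(Schematically, I picture the parallel application as something like
> >
> >   seqprec.apply(v,d);       // purely local ILU0 solve
> >   cc.copyOwnerToAll(v,v);   // make the correction consistent again
> >
> >with cc a Dune::OwnerOverlapCopyCommunication as in the paper's
> >setting -- that is just my reading of the consistency model, not
> >necessarily what the PDELab wrapper actually does.)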
> >
> >I can imagine that inverting the two 5x5 ILU0 matrices and
> >converting to a consistent representation will give a different
> >result than inverting the 10x10 ILU0 matrix on one processor (in
> >the 1d case it might not give a different result, but in the 2d/3d
> >equivalent it probably will), although it is not completely obvious
> >to me at the moment. I can see what happens for 5 sweeps of SSOR in
> >overlapping domains as above: if the halos are not swapped between
> >the sweeps, then the result will of course differ from doing 5
> >sweeps on the entire domain.
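> >
> >(To make the SSOR point concrete: schematically I imagine the
> >parallel application as
> >
> >   // halo exchanged once, before the preconditioner is applied
> >   for (int k=0; k<5; ++k)
> >     ssor_sweep_local(A_loc,v,d);  // overlap values stay frozen
> >
> >with ssor_sweep_local a made-up name for one local symmetric
> >Gauss-Seidel sweep -- as opposed to refreshing the halo after every
> >sweep, which is what 5 sweeps on the entire domain would amount to.)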
> >
> >Apologies for thinking aloud...
> >
> >Eike
> >
> >Christian Engwer wrote:
> >>Hi Eike,
> >>
> >>in dune-istl the parallel run is usually different from the
> >>sequential run, especially for ILU. ISTL does not use global
> >>preconditioners, but a Schwarz-style parallelization.
> >>
> >>Christian
> >>
> >>On Wed, Nov 02, 2011 at 12:52:06PM +0000, Eike Mueller wrote:
> >>>Dear dune-pdelab list,
> >>>
> >>>I have modified the overlapping ISTL backend for CG with an SSOR
> >>>preconditioner to use an ILU0 preconditioner instead, i.e. I wrote
> >>>a new
> >>>
> >>>class ISTLBackend_OVLP_CG_ILU0 :
> >>>  public Dune::PDELab::OVLPScalarProductImplementation<GFS>,
> >>>  public Dune::PDELab::LinearResultStorage
> >>>
> >>>[...]
> >>>
> >>>The 'apply' method, where the preconditioner is initialised, looks
> >>>like this:
> >>>
> >>>     void apply(M& A, V& z, W& r, typename V::ElementType reduction)
> >>>     {
> >>>       typedef Dune::PDELab::OverlappingOperator<C,M,V,W> POP;
> >>>       POP pop(c,A);
> >>>       typedef Dune::PDELab::OVLPScalarProduct<GFS,V> PSP;
> >>>       PSP psp(*this);
> >>>       typedef Dune::SeqILU0<M,V,W,1> SeqPrec;
> >>>       // SeqILU0's second constructor argument is the relaxation
> >>>       // factor w, not an iteration count
> >>>       SeqPrec seqprec(A,1.0);
> >>>       typedef Dune::PDELab::OverlappingWrappedPreconditioner<C,GFS,SeqPrec> WPREC;
> >>>       WPREC wprec(gfs,seqprec,c,this->parallelHelper());
> >>>       int verb=0;
> >>>       if (gfs.gridview().comm().rank()==0) verb=verbose;
> >>>       Dune::CGSolver<V> solver(pop,psp,wprec,reduction,maxiter,verb);
> >>>       Dune::InverseOperatorResult stat;
> >>>       solver.apply(z,r,stat);
> >>>       res.converged  = stat.converged;
> >>>       res.iterations = stat.iterations;
> >>>       res.elapsed    = stat.elapsed;
> >>>       res.reduction  = stat.reduction;
> >>>       res.conv_rate  = stat.conv_rate;
> >>>     }
> >>>
> >>>Now, will this create a block preconditioner (i.e., when
> >>>decomposing the matrix A into L and U, will it treat entries
> >>>A_{ij} for which i and j are owned by different processors as
> >>>zero), or will it construct the 'full' incomplete LU
> >>>decomposition? In other words, will the fill structure of the
> >>>preconditioner in a one-process run be identical to that in a
> >>>4-process run (up to rounding errors etc.)?
> >>>I guess the same question can be asked for the SSOR
> >>>preconditioner (ISTLBackend_OVLP_CG_SSOR): if I do 5 SSOR
> >>>smoothing steps, will each processor relax its local domain 5
> >>>times?
> >>>
> >>>Thank you very much,
> >>>
> >>>Eike
> >>>