[Dune] Problem in implementing parallel linear solver

Kumar, Paras paras.kumar at fau.de
Thu Dec 7 14:49:13 CET 2017


Hi Mr. Markus,

Thanks for the reply.

On 2017-12-06 12:27, Markus Blatt wrote:
> Hi,
> 
> On Tue, Dec 05, 2017 at 07:51:34PM +0100, Kumar, Paras wrote:
>> I'm trying to implement a parallel iterative solver for an LSE based
>> on an FE discretization of a time-dependent problem.
>> It was observed that the number of iterations taken by the iterative
>> solver to converge varies with the number of processes. This also
>> affects the accuracy of the computed solution.
>> 
>> 
>> For the purpose of simplification, I developed an MWE in which I solve
>> the 1D Laplacian problem with an FE discretization. The code for this
>> example, along with the output for 4 processes, is attached for
>> reference.
>> The matrix A and the rhs vector b are first computed by the master
>> process and then distributed to all others so that the solution can be
>> computed in parallel.
>> PT-Scotch has been used for the data distribution.
>> Currently, the following preconditioner-solver combination is used:
>> RestartedGMRES + SSOR.
>> Once the parallel solver has finished, the solution is computed in
>> serial at the master process in order to compare the two.
>> 
>> Here are a few observations which might give some clues about the
>> error I make, but I'm not sure:
>> 1. As can be seen in the output file, A is tridiagonal, whereas the
>> distributed matrices are not always symmetric.
> 
> Please check where the unsymmetry happens. This should only happen in
> rows whose index is not marked as owner.
> 
> using OwnerSet = DuneCommunication::OwnerSet;
> using GlobalLookup = DuneCommunication::GlobalLookupIndexSet;
> GlobalLookup global(DuneComm.indexSet());
> const auto* pair = global.pair(index);
> if (!pair || OwnerSet::contains(pair->local().attribute()))
>   std::cerr << "Houston, we have a problem at index " << index
>             << " which is owned." << std::endl;
> 
Could you please explain how we obtain the variable index? It seems to
be the local index (probably w.r.t. *comm_redist in my example code).
Also, do I need to perform this check for all dofs in a loop?
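
To state my understanding, I would try to run your check in a loop over
all local rows, roughly as in the sketch below. This is only a guess at
the intended usage: DuneCommunication stands for the
Dune::OwnerOverlapCopyCommunication type I use (<int,int> in my MWE),
comm_redist is the communication object from my example, and A_local /
checkOwnedRows are hypothetical names for the locally stored matrix and
the helper.

#include <cstddef>
#include <iostream>
#include <dune/istl/owneroverlapcopy.hh>

using DuneCommunication = Dune::OwnerOverlapCopyCommunication<int, int>;

// Report all rows that this rank owns (hypothetical helper for my MWE).
template <class Matrix>
void checkOwnedRows(const DuneCommunication& comm, const Matrix& A_local)
{
  using OwnerSet     = DuneCommunication::OwnerSet;
  using GlobalLookup = DuneCommunication::GlobalLookupIndexSet;

  GlobalLookup global(comm.indexSet());
  for (std::size_t index = 0; index < A_local.N(); ++index) {
    const auto* pair = global.pair(index);
    // A row is owned here if its index is not in the parallel index set
    // at all, or if its attribute is contained in the owner set.
    if (!pair || OwnerSet::contains(pair->local().attribute()))
      std::cerr << "row " << index << " is owned on this rank" << std::endl;
  }
}

Called as checkOwnedRows(*comm_redist, A_local); on every rank -- is that
the intended way to check the unsymmetric rows against the owner test?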


>> 2. The defect for the solver is different for the parallel and the
>> serial cases even before the solver starts.
> 
> Well, in the parallel case you seem to only compute the local defect on
> the master process. You should not use Matrix::mv (or any other Matrix
> method) directly here, but the functions of the linear operator
> (applyscaleadd). See how it is done in e.g. the CGSolver in
> dune/istl/solvers.hh or any other solver.
> 
I computed the defect for the whole system using the complete matrix
available at the master process. I tried using the applyscaleadd
function, but it gives me the local residual for each process. Then I
employed redistribute_backwards() to get the global residual vector, but
its 2-norm seems to be the same as the manual calculation using
Matrix::mv and thus different from the first entry of the defect column
as printed by the GMRES solver.
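
If I read the CGSolver code correctly, the solvers compute the initial
defect roughly like this (just a sketch; op and sp denote the
overlapping Schwarz linear operator and scalar product from my example,
x and b the distributed vectors):

Vector d(b);                   // d = b
op.applyscaleadd(-1.0, x, d);  // d = b - A x via the linear operator
auto def0 = sp.norm(d);        // parallel 2-norm, accumulated over all ranks

So if I want to compare anything against the solver's first defect
entry, it should be this sp.norm(d) rather than a norm computed from the
full matrix on the master process alone -- did I get that right?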


>> 3. The defect differs from the residual (2-norm) [||b - Ax||] computed
>> manually for both the serial and the parallel cases.
>> 
> 
> We can only expect the first defect to be the same, as the convergence
> will differ in parallel. In your case it seems like parallel_solve is
> not using v at all. Maybe a redistribution of that vector is missing?
> 
So should we expect the first defect to be different from the 2-norm of
residual_initial_global in both cases? Does the defect calculation
differ from the residual (b - Ax) calculation?
As can be seen from the output, v is initially 0, and thus it seemed
there was no need to redistribute it to all the processes. The parallel
computed solution parts are later redistributed backwards into v.
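
Just to be sure I understand the redistribution part, is the intended
vector handling roughly the following? This is only a sketch under my
assumptions: ri is the Dune::RedistributeInformation filled during the
matrix redistribution, v is the global vector on the master process,
v_local / b_local / stat are the distributed vectors and the
InverseOperatorResult, and my redistribute_backwards() is meant to do
the last step.

Vector v_local(A_local.N());
ri.redistribute(v, v_local);           // scatter the initial guess to all ranks
solver.apply(v_local, b_local, stat);  // parallel solve
ri.redistributeBackward(v, v_local);   // gather the solution back into v

Even though v is zero at the moment, doing the forward redistribution
explicitly would at least guarantee that the serial and parallel runs
start from exactly the same initial guess.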


> BTW, it also seems like sol_s and sol_p are not initialized to any
> values. Resizing will not touch the content of the memory. At least in
> an optimized build the values will be determined by whatever was
> stored in that memory chunk before.
> 
As visible from the output, both sol_s and sol_p are 0.0 before the
start of the parallel solution process.
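
Nevertheless, to rule that out I will set them explicitly after
resizing, along these lines (N being the global number of dofs in my
example):

sol_s.resize(N);
sol_p.resize(N);
sol_s = 0.0;  // BlockVector allows assignment from a scalar
sol_p = 0.0;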

> 
> HTH
> 
> Markus

With best regards,
Paras Kumar
