[Dune-devel] New assembly mode for BCRSMatrix

Markus Blatt markus at dr-blatt.de
Thu Aug 1 16:01:30 CEST 2013


On Thu, Aug 01, 2013 at 02:37:52PM +0200, Steffen Müthing wrote:
> Hi Markus,
> 
> Am 01.08.2013 um 13:02 schrieb Markus Blatt:
> 
> > On Wed, Jul 24, 2013 at 11:55:02AM +0200, Steffen Müthing wrote:
> >> Hi everybody,
> >> 
> > 
> > What I liked about their approach was:
> > 1. It does not clutter the actual matrix interface. Instead the
> > creation is clearly separated from the rest.
> 
> That might be true - on the other hand, it requires *significantly* more code. Did you look at their implementation?
> The inserter for their CRS matrix is about 500 lines, and a lot of that just has to forward stuff like number of rows / columns
> etc.

No, I did not look at the code and would not want to. If they use 500
lines, then either there is a better way to do it, or they are doing
manual code optimization. Anyway, I do not see why splitting
functionality into an inserter class using friend declarations should
require so much more code. (But that is theoretical, of course.)

> 
> > 2. Code might be reused for other matrices, too. Even using an inserter
> > for a dense matrix to facilitate generic programming is
> > possible. Furthermore, reusing code to set up PETSc matrices seems
> > like a viable further extension.
> 
> Well, there isn't really any inserter code reuse in MTL - that thing
> is specialized for every one of their matrix implementations.

Does that already mean that it is not possible at all? PETSc might
have been a bad example; I might even be wrong. But with the current
approach there is definitely no way to reuse code, whether it would be
possible or not.

> [Snipped PETSc details] 
> 
> Even if we go for an inserter, I really don't like their syntax with the streaming operator<<, where you have to decide beforehand
> what it will do (add, set, subtract). What if you need to do something else with it? That's a very weird interface IMHO - just give me
> a reference and be done with it!
> 

I was not talking about copying the way inserters work in MTL in every
detail. Call the method whatever you like; operator()(size_t, size_t)
might seem handy if operator[][] is not feasible.
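A minimal sketch of the reference-returning interface suggested here (all names are illustrative, not the actual ISTL or MTL API): the inserter hands back a mutable reference, so the caller decides whether to set, add, or subtract, instead of fixing the operation up front as MTL's operator<< does.

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical inserter sketch: operator()(i,j) returns a reference to the
// entry, creating it on demand. Collected entries would be flushed into the
// matrix on destruction (omitted, as it depends on the matrix type).
template<class Matrix>
class Inserter {
public:
  using field_type = typename Matrix::field_type;

  explicit Inserter(Matrix& m) : matrix_(m) {}

  // Mutable access to entry (i,j); default-constructs it on first touch.
  field_type& operator()(std::size_t i, std::size_t j) {
    return entries_[{i, j}];
  }

  ~Inserter() { /* flush entries_ into matrix_ here */ }

private:
  Matrix& matrix_;
  std::map<std::pair<std::size_t, std::size_t>, field_type> entries_;
};

// Minimal stand-in matrix type so the sketch is self-contained.
struct DummyMatrix { using field_type = double; };
```

With this shape, `ins(i,j) = v`, `ins(i,j) += v` and `ins(i,j) -= v` all fall out of one method.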
> > 
> >> Example: You have a matrix with 1 000 rows and 25 entries on average per row (25 000 entries in total). Specifying
> >> an overflow of .05 gives you a total capacity for 25 000 + 1 250 = 26 250 nonzero entries. If you try to add more entries
> >> than that, you get an exception.
> >> 
> > 
> > Actually, I think specifying an overflow should be
> > unnecessary. Can't the user incorporate the overflow into the average
> > number of nonzeros, e.g. use 10.1 instead of 10? Therefore I would
> > prefer skipping it. As floating point numbers for the number of
> > nonzeros per row seem natural, we could also simply ask the user to
> > provide an upper bound on the total number of nonzeros expected.
> 
> The reason for this split approach is that Peter and Dominic have come up with a more memory-efficient way of
> doing the compression step: The new code allocates arrays for the column indices and values of size (N*row_avg*(1+overflow)).
> The overflow area (which is not used initially because overflow entries end up in a map during insertion) is now placed
> *at the beginning* of those arrays, i.e. the entries for row 0 start at index int(N*row_avg*overflow). Then, during compression, those
> entries are moved to the beginning of the arrays (and additional entries from the map are inserted, slowly eating up the margin
> provided by the overflow area). If you have an unstructured grid, the number of entries per row can vary quite a lot, so you want to
> be able to have a large overflow area, but that's not possible if you only specify a single number.

OK. It is needed then.
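The sizing arithmetic described above can be sketched as follows (this is only an illustration of the scheme, not the actual ISTL code): the arrays hold N*row_avg*(1+overflow) entries, and row 0 starts at offset N*row_avg*overflow, so compression can move rows toward the front while consuming the margin.

```cpp
#include <cstddef>

// Illustrative layout computation for the implicit build mode.
struct ImplicitLayout {
  std::size_t capacity;   // total allocated entries
  std::size_t row0_start; // index where the entries of row 0 begin
};

ImplicitLayout computeLayout(std::size_t rows, double row_avg, double overflow) {
  ImplicitLayout l;
  // Overflow area sits *before* the row data; compression shifts rows left.
  l.capacity   = static_cast<std::size_t>(rows * row_avg * (1.0 + overflow));
  l.row0_start = static_cast<std::size_t>(rows * row_avg * overflow);
  return l;
}
```

For the numbers from earlier in the thread (1000 rows, average 25, overflow .05) this gives a capacity of 26250 and a front margin of 1250 entries.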

> The one major disadvantage of this approach is that you permanently waste some memory (if your row average or your overflow
> fraction were too large), but you avoid having to resize the arrays, which temporarily doubles your memory requirements. This short
> spike in memory usage is really the central problem for PDELab because it unnecessarily limits the matrix size you can fit on a node.
> 
> During testing, we found that you can dump quite a few entries into the map without a major performance impact, so a bit of
> twiddling with the two parameters easily gets you up to a used memory fraction of .95+. So the tradeoff is: You have to tweak a bit,
> but that allows you to really get your matrix as big as possible.
> 
> Another interesting observation was that we could speed up the code by *not* sorting the per-row entries during insertion, but only
> at the compression stage.

Shouldn't this depend on the number of nonzeros per row? I would be
interested to know at which count the picture changes.
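The "sort only at compression" idea can be sketched like this (a simplified illustration, not the branch's actual code): column indices are appended in arbitrary order during assembly, and each row is sorted and deduplicated once in compress(), instead of maintaining sorted order on every insert.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Per-row buffer: O(1) amortized insertion, one O(k log k) pass at the end.
struct RowBuffer {
  std::vector<std::size_t> cols;

  void insert(std::size_t j) { cols.push_back(j); }

  void compress() {
    std::sort(cols.begin(), cols.end());
    cols.erase(std::unique(cols.begin(), cols.end()), cols.end());
  }
};
```

For small rows a single sort at the end beats repeated insertion into sorted position; at some row length the picture could flip, which is exactly the crossover asked about above.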
> 
> > 
> > I am unsure whether there should be an alternative to throwing an
> > exception. Think of a production parallel simulation where you experience
> > such an exception after a few days of runtime.
> 
> Well, we could implement that (you'd have to resize the column index and the value array). That said, it should normally not be
> difficult to determine the pattern size in a small model simulation before going big...
> 

Weren't you talking about adaptive unstructured grids? It might be
hard to cover all corner cases with a small model simulation (and I
doubt that users will actually do this). If this is to be the default
mode for PDELab, then it should probably be safe by default.

> > 
> >> Those parameters are currently set via setMymodeParameters(), but
> >> I'm not sure whether that's the best way of doing things.
> > 
> > One more argument for stripping the build mode off into an inserter class.
> > 
> >> After setting the parameters, you set the size of the matrix and can start assembling (there is also a new constructor that
> >> takes the additional parameters for direct matrix setup in a single step).
> >> 
> >> In order to avoid slowing down operator[] (and because the CompressedBlockVectorWindows aren't functional at this stage),
> >> you have to access the matrix entries using entry(row,column) during construction. As long as there are less entries in the
> >> current row than the specified average, new entries will be placed directly in the matrix; otherwise they end up in a std::map
> >> for now.
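The two-tier insertion described above can be sketched per row like this (illustrative only, not the real BCRSMatrix code; for brevity it does not check whether a column is already present): entries below the specified average go straight into a preallocated array, the rest are parked in a std::map until compress() merges them in.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

class ImplicitRow {
public:
  explicit ImplicitRow(std::size_t avg) : avg_(avg) { direct_.reserve(avg); }

  // Direct storage while below the average, std::map overflow beyond it.
  double& entry(std::size_t col) {
    if (direct_.size() < avg_) {
      direct_.push_back({col, 0.0});
      return direct_.back().second;
    }
    return overflow_[col]; // lives in the map until compress()
  }

  std::size_t directCount()   const { return direct_.size(); }
  std::size_t overflowCount() const { return overflow_.size(); }

private:
  std::size_t avg_;
  std::vector<std::pair<std::size_t, double>> direct_;
  std::map<std::size_t, double> overflow_;
};
```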
> >> 
> > 
> > IMHO this is another strong argument for using an inserter class. Having
> > different access methods for different stages seems rather confusing
> > to me. Additionally, it will not be possible to use existing
> > algorithms based on our matrix access scheme.
> 
> I don't quite understand what you mean by existing algorithms. Are you talking about algorithms that operate on the matrix, like a scalar
> product? 

E.g. the old discretization code. It would be handy if I could just
skip the setup of the sparsity pattern and reuse the code that sets up
the values by simply passing it the Inserter instead of the matrix.

> That doesn't work anyway during matrix setup. Apart from that, I really don't like the idea of having to fiddle around with this
> additional inserter object. Frameworks like PDELab have to support repeated assembly into the same matrix. With the current implementation,
> I just have to check the current build stage and call either [][] or entry(). With an inserter, I would have to dynamically allocate and deallocate that
> thing (to call its destructor) and lug it around… I don't like that. 

I do not understand this argument. Why can't you simply do something
like this:

template<class M>
void reassemble(M& matrix)
{
     typedef typename InserterType<M>::type Inserter;
     Inserter ins(matrix);
     // assemble with whatever the method is called, e.g. ins(i,j) = val;
     // or even ins[i][j] = val, which would be most convenient
}
> 
> > 
> >> When you are done with the assembly, you have to finish up the construction process by calling compress(). That call frees the
> >> map used as intermediate storage and optimizes data storage, but it does not reclaim any excess memory requested via the row
> >> average and overflow parameters. After the call, you cannot add additional entries anymore and the matrix works just as if it had
> >> been constructed the traditional way. compress() also returns a struct with some statistics about your matrix: The average and
> >> maximum number of entries per row, the number of entries that had to be cached in the map and the fraction of the allocated
> >> memory that is actually in use.
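The statistics object returned by compress() could look roughly like this (the field names are guesses for illustration; the actual branch should be consulted for the real ones):

```cpp
#include <cstddef>

// Hypothetical shape of the statistics compress() returns.
struct CompressionStatistics {
  double      avg_entries_per_row;  // average nonzeros per row
  std::size_t max_entries_per_row;  // maximum nonzeros in any row
  std::size_t overflow_entries;     // entries that had to go through the map
  double      mem_ratio;            // fraction of allocated memory in use
};
```

A mem_ratio well below 1 would tell the user that the row average or overflow fraction was chosen too generously.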
> > 
> > Again, this seems clearer with an inserter class: there it would simply
> > happen in the destructor. Still, I have to admit that the user needs to
> > make sure that the destructor is called by using correct scoping.
> 
> As I said, in any kind of framework, this involves allocating and deallocating the inserter on the heap. I don't see how that is simpler than
> calling a compress() method...

I do not say that it is simpler per se. But not having methods in the
matrix that only work in a special mode seems much clearer for
users. How should I judge from the method names which ones are meant for
insertion and which ones are not? My fear is simply that we will
get a lot of support requests asking why entry() does not work at a
certain point in a user's algorithm, and that people will no longer see
how a matrix can be set up.

> 
> > 
> >> 
> >> During construction, you cannot copy matrices. If a matrix is copied after construction, the copy constructor only allocates as much
> >> memory as is actually needed by the matrix, throwing away unused overflow space.
> >> 
> >> The one problem with this approach is the fact that matrix entries are created on the fly, and if they are of type FieldMatrix with a
> >> primitive field type, you run into FS1322.
> >> 
> >> We did some benchmarking with this new scheme, and it really improves memory usage and (initial) assembly time quite a bit: In
> >> PDELab (which right now uses a std::vector<std::unordered_set<std::size_t> > to build up the pattern before creating the matrix),
> >> memory usage goes down by a factor of at least 2 (and a lot more if you have a large pattern), and initial assembly is sped up as well
> >> (depending on pattern size, the improvements range from just a few percent up to a reduction by a factor of 2 or 3). Interestingly, the
> >> mode is quite tolerant regarding the overflow area: I've tried with large matrices, and even if you end up with a million entries in the
> >> map, performance only goes down by something like 10-20%. If you are really concerned about matrix size, you can use this to
> >> compensate for the smaller pattern at the domain boundary: Just specify a smaller row average and add enough overflow. ;-)
> >> 
> > 
> > Sounds promising.
> > 
> >> You can find Dominic's code on GitHub (https://github.com/dokempf/dune-istl/tree/feature/new-fast-build-mode), and if you have
> >> PDELab, you can play with the new build mode on the branch
> >> p/muething/new-bcrs-matrix. Check out
> >> dune/pdelab/test/testpoisson.cc
> >> as an example. Just note that you might need the fix from FS1322. The code in ISTL isn't really cleaned up yet, but I thought it might
> >> be a good time to get some initial feedback.
> >> 
> > 
> > I am not sure when I will find the time to test this.
> > 
> >> Any comments / ideas / problems / naming suggestions?
> >> 
> > 
> > Have you thought about exting a sparsity pattern of an already setup
> > matrix. For the AMG for non-overlapping grids Rebecca might take
> > advantage of this.
> 
> I don't quite understand that sentence, looks like your
> autocorrection messed it up. 

Not having any autocorrection probably messed it up ;)

> But if you are talking about multiple BCRSMatrices
> sharing a pattern, that has been implemented since 2010 by Carsten: If you copy a matrix that has been correctly set up, both matrices
> will share the pattern through a shared_ptr.

There are cases where the sparsity pattern of the matrix cannot be set
up from the grid information alone. Therefore, after setting up the
sparsity pattern from the local grid part, parts of this information
need to be communicated, and based on the global information each
process adds additional matrix entries to the pattern.
Having such a nice overflow region, could this probably
also be handled by the new mode?

I am not sure how this is handled currently. Maybe Peter can enlighten
you as Rebecca is not available.

Markus




More information about the Dune-devel mailing list