[Dune-devel] SOCIS 2014: FEM assembly with threads
Jö Fahlke
jorrit at jorrit.de
Thu Jun 26 21:27:50 CEST 2014
On Thu, 26 Jun 2014, 14:41:23 +0000, Agnese, Marco wrote:
> Hi DUNErs,
> as suggested by Markus I have written a very simple code which uses threads to assemble the stiffness matrix and the RHS of a FEM scheme for a 1D Poisson problem.
>
> You can find the code here: https://github.com/magnese/dune-common/blob/threads/dune/common/test/threadstest.cc
>
> The code constructs a vector of grid points. The mesh is "partitioned" among threads. Each thread assembles a local stiffness matrix and then copies the values into the global stiffness matrix A. The only race condition which could happen is for the grid points which lie on the boundaries of the partitions. Therefore, if we are using N threads, the number of "critical" points is N-1. To avoid a race condition I have created a mutex for each "critical" point (which is marked as shared). If an entry of type shared needs to be inserted, assemble() takes the corresponding lock. This choice works in this case because the number of critical points is very small compared to the size of the vector. In general this approach is very inefficient.
>
> Now I am writing a more general class to obtain something similar to what we have for ParallelIndexSet.
> Let's suppose that we have a vector v and we want to spawn several threads which will read from and write to this vector. The class should work like this:
> 1) the user specifies, for each thread, all the positions which can be accessed by that thread
> 2) the class finds which parts of the vector are critical and creates several mutexes to address the problem (computed only once)
> 3) the user gives the class the function to be called by the threads. This function can only read from the global vector and returns a buffer of values which need to be inserted into the global vector
> 4) when the buffers are created, the values are inserted into the vector directly if they aren't in a critical position, otherwise using the mutexes.
> Using the class, the user avoids dealing directly with threads and mutexes.
>
> Obviously the difficult part is to implement an efficient and general lock mechanism. As soon as I have some code I will let you know.
>
> Therefore my objective right now is to implement this class and test it with assemble(). What do you think? I would very much appreciate some feedback to understand whether I am moving in the right direction or not.
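A minimal sketch of the per-critical-point locking described in the quoted message; the names (rhs, criticalLocks, addEntry) are illustrative, not taken from Marco's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <vector>

// Global vector being assembled; only entries at partition
// boundaries ("critical" points) can race between threads.
std::vector<double> rhs;

// One mutex per critical point only: for N threads in 1D there
// are just N-1 of these, so the map stays tiny.
std::map<std::size_t, std::mutex> criticalLocks;

// Add a local contribution to global entry i, locking only when
// i is a shared (critical) point.
void addEntry(std::size_t i, double value) {
  auto it = criticalLocks.find(i);
  if (it == criticalLocks.end()) {
    rhs[i] += value;  // owned by exactly one thread: no lock needed
  } else {
    std::lock_guard<std::mutex> guard(it->second);
    rhs[i] += value;  // shared boundary point: serialize the update
  }
}
```

As the quoted message notes, this is only efficient because the critical set is tiny compared to the vector; with many shared entities a per-entity map lookup and mutex would become a bottleneck.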
This is pretty similar to something I did for EXADUNE, see attachment. Ignore
the global-lock implementation at the top, there is an implementation with one
mutex per element further down.
The difference is: I'm doing this for dune-grid and you're implementing your
own grid ;) But it should not matter greatly whether your mesh entity is some
complicated class or just some integer, or even whether you restrict
yourself to vertices or not.
Since you asked I'm going to dump my thoughts on this matter here:
- A lock manager should not force locking on entities that do not require
locking. My lock manager concept currently fails there, because it must
always return some kind of mutex when asked.
- It should be possible to lock a single entity (for residual assembly). This
is not a big deal, both our concepts allow that.
- It should be possible to lock a pair of entities. This can be necessary for
matrix assembly.
  + It isn't actually necessary in your case (yet): you're locking the entity
    corresponding to row elem+i of your matrix, then you iterate over the
    columns of the matrix, locking each one in turn. Locking the columns is
    not necessary, because no other thread will access the same row, either
    because you locked it, or because it isn't shared.
    If you know your matrix is going to be symmetric and you want to compute
    the update to A[i][j] and A[j][i] only once at the same time, then the
    matter is different.
  + Your implementation depends on the mutex being recursive. Note that
    std::mutex does not guarantee this (std::recursive_mutex does; for the
    TBB mutex flavours see the table at [1]). It would be nice if the lock
    manager could do without such a requirement, although one might wonder
    whether it is worth working around that limitation.
+ My implementation compares the addresses of both mutexes to avoid trying
to double-lock the same mutex (which is quite frankly a hack). The lock
manager should not require such hacks.
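The pair-locking problem and the address-comparison hack mentioned above can be sketched like this; lockPair/unlockPair are hypothetical helper names, not part of the attached lock manager:

```cpp
#include <cassert>
#include <mutex>

// Lock the mutexes of two entities deadlock-free. The address
// comparison is the "hack" discussed above: if both entities map
// to the same (non-recursive) mutex, lock it only once, since
// locking a std::mutex twice from the same thread is undefined.
void lockPair(std::mutex &a, std::mutex &b) {
  if (&a == &b)
    a.lock();          // same mutex: lock once
  else
    std::lock(a, b);   // locks both without risk of deadlock
}

void unlockPair(std::mutex &a, std::mutex &b) {
  a.unlock();
  if (&a != &b)
    b.unlock();
}
```

A lock manager that guarantees distinct mutexes for distinct lock requests would make the address comparison unnecessary.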
- Locking three entities at once and beyond: I cannot think of a case where
this is necessary, so I would ignore this case.
- The lock manager should allow alternate implementations to let you try out
different things.
+ For instance, you may want to try out a global lock for all entities that
need locking, just to see how bad it really is.
+ Or you may want to unconditionally have one lock per entity.
+ Or you may want to use a lock manager that does not actually lock
anything. Your results will be wrong, but you can time your program and
get an idea of the overhead from locking.
+ Beyond 1D, you have many entities shared between multiple partitions, and
storage may become an issue. You may want to use the same lock for
multiple entities (e.g. all entities shared between the same two
partitions). This of course comes at the cost of locking too much, but it
may still be worth it: it is something that happens on the surface of the
partitions, which should be small compared to the interior, so threads
competing for the same lock should not happen too often.
Again this can pose a problem when locking two entities for matrix
assembly: the entities may share the same lock, even though they are
different entities. This needs either detection of this condition or
recursive locks.
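Sharing one lock among multiple entities could be sketched as follows; the class name and the modulo mapping are purely illustrative (a real implementation would map each shared entity to the partition pair it lies between):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>

// Sketch of a lock manager that maps many entities onto a small,
// fixed pool of mutexes, trading storage for occasional
// over-locking when two different entities share a lock.
template<std::size_t nBuckets>
class SharedLockManager {
  std::array<std::mutex, nBuckets> locks_;
public:
  // Map an entity index to its lock; modulo is a stand-in for
  // "all entities between the same two partitions share a lock".
  std::mutex &get(std::size_t entityIndex) {
    return locks_[entityIndex % nBuckets];
  }
};
```

Swapping in a pool of size 1 gives the global-lock variant, and a `get()` returning a do-nothing mutex gives the no-locking variant for measuring overhead, so the alternate implementations listed above all fit the same interface.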
- For ease of programming it would be nice to be able to use guard objects for
locking, which provide for automatic unlocking at end of scope and exception
safety. Examples are std::lock_guard and std::unique_lock.
Here is one approach to achieve this under a general interface:
Write a wrapper mutex that has a list of zero, one, or two (or even more)
underlying mutexes. When the wrapper is locked, it locks all the mutexes in
the list using std::lock(). When it is unlocked, it unlocks all of them.
This wrapper can be used as the mutex in std::lock_guard.
template<class BaseMutex>
class MutexList {
  std::size_t nMutexes;
  typedef BaseMutex *BaseMutexP;
  BaseMutexP mutexList[2];
public:
  MutexList();                        // empty list: nothing to lock
  MutexList(BaseMutex*);              // one mutex
  MutexList(BaseMutex*, BaseMutex*);  // two distinct mutexes
  // usual mutex interface: lock(), unlock(), ...
};
The lock manager can then be queried for the mutexes of one or two entities,
and it will return a wrapper with the list of the requested mutexes. If the
underlying mutexes are not recursive, the lock manager must ensure that the
list does not contain the same mutex twice.
template<class BaseMutex>
struct FooLockManager {
typedef MutexList<BaseMutex> Mutex;
template<class Entity>
Mutex get(const Entity &);
template<class Entity>
Mutex get(const Entity &, const Entity &);
};
Of course, the length of the list in the wrapper must be dynamic at runtime.
This can be a challenge since std::lock() requires each mutex to be locked as
a separate function argument. However, we can limit ourselves to lists of at
most two underlying mutexes, so this should not pose a real problem.
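One way to flesh out the MutexList sketch above (a possible implementation, not the one from the attachment): the length is dynamic at runtime but capped at two, so std::lock() can be called with a fixed argument count.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Wrapper mutex over a list of zero, one, or two underlying
// mutexes; lock() locks all of them, unlock() releases them.
// Satisfies BasicLockable, so it works with std::lock_guard.
template<class BaseMutex>
class MutexList {
  std::size_t nMutexes_;
  BaseMutex *mutexes_[2];
public:
  MutexList() : nMutexes_(0), mutexes_{} {}
  MutexList(BaseMutex *m) : nMutexes_(1), mutexes_{m} {}
  MutexList(BaseMutex *m1, BaseMutex *m2)
    : nMutexes_(2), mutexes_{m1, m2} {}

  void lock() {
    switch (nMutexes_) {
    case 0: break;                                     // nothing to lock
    case 1: mutexes_[0]->lock(); break;
    case 2: std::lock(*mutexes_[0], *mutexes_[1]); break;  // deadlock-free
    }
  }
  void unlock() {
    for (std::size_t i = 0; i < nMutexes_; ++i)
      mutexes_[i]->unlock();
  }
};
```

Usage would then look like `std::lock_guard<MutexList<std::mutex>> guard(list);`, with the lock manager responsible for never putting the same non-recursive mutex into the list twice.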
Regards,
Jö.
[1] https://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/index.htm#reference/reference.htm
--
Jorrit (Jö) Fahlke, Institute for Computational and Applied Mathematics,
University of Münster, Orleans-Ring 10, D-48149 Münster
Tel: +49 251 83 35146 Fax: +49 251 83 32729
featured product: Debian GNU/Linux - http://www.debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lockmanager.hh
Type: text/x-c++hdr
Size: 2321 bytes
Desc: not available
URL: <https://lists.dune-project.org/pipermail/dune-devel/attachments/20140626/560f28a8/attachment.hh>