[dune-pdelab] reg: OpenMP parallelization of assembly

Fri Jun 26 12:30:51 CEST 2020

Dear Linus,

Thanks a lot for your reply...

I am already using parallelization via MPI, and that of course speeds up 
the computation, but the communication starts to get expensive really 
fast.

As I said earlier, matrix assembly is the main bottleneck (very large 
system of PDEs, highly nonlinear...), and I was wondering whether there 
is an easier way to parallelize just the assembly while continuing to use 
one of the direct solvers (SuperLU) instead of the available parallel 
solvers (I am currently using AMG).
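
To make the idea concrete, here is a minimal sketch of the thread-parallel
assembly pattern that Santiago outlines further down in this thread: each
thread keeps its own local container, and only the scatter into the global
container (the "unbind" step) is protected against data races. It is not
PDELab code. GlobalMatrix, assemble_threaded, the dense storage and the 1D
two-node element are all hypothetical stand-ins chosen for illustration,
and it uses plain C++ threads with a mutex rather than OpenMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a global matrix: dense, row-major, n x n.
struct GlobalMatrix {
  std::size_t n;
  std::vector<double> a;
  explicit GlobalMatrix(std::size_t n_) : n(n_), a(n_ * n_, 0.0) {}
  double& operator()(std::size_t i, std::size_t j) { return a[i * n + j]; }
};

// Assemble a 1D stiffness matrix (two-node elements, unit element length)
// with num_threads threads. Element e couples DOFs e and e+1, so adjacent
// elements share a DOF and their scatters can collide.
GlobalMatrix assemble_threaded(std::size_t num_elements, unsigned num_threads) {
  assert(num_threads > 0);
  GlobalMatrix A(num_elements + 1);
  std::mutex scatter_mutex;  // guards every write to A

  auto worker = [&](std::size_t begin, std::size_t end) {
    double local[2][2];  // thread-private local container
    for (std::size_t e = begin; e < end; ++e) {
      // "bind" + "assemble": touches thread-private data only
      local[0][0] = 1.0;  local[0][1] = -1.0;
      local[1][0] = -1.0; local[1][1] = 1.0;

      // "unbind": scatter into shared storage, one thread at a time
      std::lock_guard<std::mutex> lock(scatter_mutex);
      for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
          A(e + i, e + j) += local[i][j];
    }
  };

  // Split the element range into contiguous chunks, one per thread.
  const std::size_t chunk = (num_elements + num_threads - 1) / num_threads;
  std::vector<std::thread> threads;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min<std::size_t>(t * chunk, num_elements);
    const std::size_t end = std::min(begin + chunk, num_elements);
    if (begin < end) threads.emplace_back(worker, begin, end);
  }
  for (auto& th : threads) th.join();
  return A;
}
```

Calling assemble_threaded(8, 4) should give the same matrix as
assemble_threaded(8, 1). A single mutex serializes all scatters, so it only
pays off when the local assembly dominates the cost. Coloring the elements
so that no two elements of one color share a DOF would be one way to drop
the lock, but that is beyond this sketch.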

Best wishes, Shubhangi

On 26.06.20 11:54, Linus Seelinger wrote:
> Hi Shubhangi,
>
> just to make sure you are not heading into the wrong direction, are you sure
> you really want to use OpenMP? Parallelization via MPI is fully integrated in
> PDELab, rather easy to use, and would allow you to scale beyond a single
> machine.
> By using a parallel grid, matrix assembly will immediately scale as well, so
> maybe that would be a better choice for you?
>
> Best,
>
> Linus
>
> Am Freitag, 26. Juni 2020, 09:23:55 CEST schrieb Shubhangi Gupta:
>> Dear Santiago,
>>
>> Thanks a lot for your reply.
>>
>> I was hoping it would be a bit easier than this to get OpenMP working,
>> but I'll give it a shot. If by any chance it works, I'll get back to you :)
>>
>> Warm wishes, Shubhangi
>>
>> On 25.06.20 18:38, Santiago Ospina wrote:
>>> Hi Shubhangi,
>>>
>>> as far as I can tell, the main PDELab is not able to do so. I know
>>> that this was tried out in the EXADUNE project, but I don't know the
>>> outcome of that implementation. Perhaps someone else can comment on
>>> that. In general, each thread needs to own a copy of a
>>> LocalFunctionSpace, an LFSCache, an assembler_engine and an entity
>>> (this may require some modifications to these classes). Once that is
>>> done, most of the assembler loop can run in parallel. The bind, load
>>> and assemble methods should be OK with multiple threads. The
>>> problematic part is the unbind: that is when the local containers
>>> of the assembler_engine are scattered to the global container.
>>> Since adjacent entities are likely to share DOFs or to sit very
>>> close in memory in the global container, data races may appear.
>>> I think most of that is possible with C++ threads, but I might
>>> be wrong.
>>>
>>> Please let us know if you get that working ;-)
>>>
>>> Best,
>>> Santiago Ospina
>>>
>>> On Thu, Jun 25, 2020 at 2:10 PM Shubhangi Gupta <sgupta at geomar.de> wrote:
>>>      Dear all,
>>>      
>>>      The matrix assembly is the main bottleneck for my numerical
>>>      implementation in pdelab. So, I am thinking of parallelizing this
>>>      part using openMP.. I understand that dune-pdelab is already
>>>      capable of doing this...but I don't know where to start and I have
>>>      only very superficial understanding of openMP.
>>>      
>>>      Is there an example that I can look at? Or can someone give me a
>>>      quick outline of how to proceed?
>>>      
>>>      Thanks, and warm wishes, Shubhangi
>>>      
>>>      
>>>      _______________________________________________
>>>      dune-pdelab mailing list
>>>      dune-pdelab at lists.dune-project.org
>>>      <mailto:dune-pdelab at lists.dune-project.org>
>>>      https://lists.dune-project.org/mailman/listinfo/dune-pdelab