[Dune] restart and factory

Jö Fahlke jorrit at jorrit.de
Thu Aug 27 23:31:14 CEST 2015


On Thu, 27 Aug 2015, 17:12:05 +0100, Andreas Dedner wrote:
> Date: Thu, 27 Aug 2015 17:12:05 +0100
> From: Andreas Dedner <a.s.dedner at warwick.ac.uk>
> To: dune at dune-project.org, Marco Cisternino <marco.cisternino at optimad.it>
> Subject: Re: [Dune] restart and factory
> 
> Hi.
> Could you tell me how you measured the memory consumption of each
> process? Did you use some C tool for this, or one of the columns
> provided by top? It is clear that while the grid is being restored,
> process zero needs quite a large chunk of memory to load the full grid,
> but after load balancing this memory should be freed again. Otherwise
> there is a memory leak, which I haven't seen so far with alugrid.

I remember that behaviour from way back.  I think I talked to Robert about it,
and he figured it was something along these lines: the memory does get
freed, but the library cannot return it to the OS because it is fragmented...
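
To see this per process, one option on Linux is to read the resident set
size from /proc/self/status on every rank. The following is only a minimal
sketch under that assumption (Linux, "VmRSS:" line present); the function
names parseVmRSSkB and currentRSSkB are made up for illustration and are not
part of Dune or ALUGrid.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parse the "VmRSS:" line of a /proc/<pid>/status-style stream.
// Returns the resident set size in kB, or -1 if no such line is found
// or the value cannot be read. (Hypothetical helper for this sketch.)
inline long parseVmRSSkB(std::istream& in)
{
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 6, "VmRSS:") == 0) {
      std::istringstream fields(line.substr(6));
      long kB = -1;
      fields >> kB;
      return fields ? kB : -1;
    }
  }
  return -1;
}

// Resident set size of the calling process in kB, or -1 if
// /proc/self/status is unavailable (assumption: Linux only).
inline long currentRSSkB()
{
  std::ifstream status("/proc/self/status");
  return status ? parseVmRSSkB(status) : -1;
}
```

Printing currentRSSkB() on each rank right after reading the grid and again
after the load balance shows whether rank 0 stays larger than the others.
Note that a high value after the load balance does not by itself prove a
leak: freed but fragmented heap memory still counts towards the resident
set.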

Anyway, the workaround was to write the grid out (one file per rank) after
load balancing and then read a new grid from those files in parallel.
However,
- I only did that with an unrefined grid, using writeMacroGrid(), and
- I only tried the reading in a newly invoked program (I did the partitioning
  on an SMP machine with lots of memory, and the reading on a cluster).
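
The "one file per rank" part can be sketched as follows. This only shows
how a per-rank file name might be built; the helper rankFileName and the
"<base>.<rank>" naming scheme are assumptions for illustration, not
something ALUGrid prescribes, and the actual write and read calls are left
out because their signatures are not shown in this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Build the file name for one rank, e.g. base "macrogrid" and rank 3
// give "macrogrid.3". The "<base>.<rank>" scheme is an assumption made
// for this sketch. (Hypothetical helper.)
inline std::string rankFileName(const std::string& base, int rank)
{
  assert(rank >= 0);  // ranks are non-negative
  std::ostringstream name;
  name << base << '.' << rank;
  return name.str();
}
```

In the writing program each rank would pass its own rank (for example from
MPI_Comm_rank) and write its part of the grid to that file; the reading
program, started with the same number of ranks, would open the matching
file on each rank.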

Regards,
Jö.

> > probably it keeps
> > some information in memory (the factory??).
> There is no factory involved in this case, since the backup/restore is
> based on alugrid internal structures and doesn't go through any grid
> factory.
> 
> Best
> Andreas
> 
> On 27/08/15 16:37, Marco Cisternino wrote:
> > Good morning,
> > 
> > I'm facing a problem I would like to share with you.
> > 
> > I'm talking about restart.
> > 
> > I'm using Dune 2.3 and ALUGrid 1.52.
> > 
> > Say I launched my code on 1 process and stopped it, writing a backup
> > of the grid with writeGrid<Dune::xdr> on exit.
> > 
> > Then, I want to launch a new run on 4 processes starting from the grid
> > of the previous run.
> > 
> > I do it using readGrid<Dune::xdr> and then I call a load balance to
> > distribute the grid to the other 3 processes.
> > 
> > Everything works fine, but...
> > 
> > If, at runtime after a successful load balance, I check the memory usage
> > of every process, process 0 uses more memory than the other processes.
> > 
> > It loaded the grid before my call to load balance and probably it keeps
> > some information in memory (the factory??).
> > 
> > Is there a way to free this memory, or must this information live in
> > the memory of process 0?
> > 
> > For large grids, this behaviour can make one node swap, drastically
> > lowering the performance of the code.
> > 
> > 
> > Any hint is really appreciated.
> > 
> > Thanks for your attention.
> > 
> > 
> > Bests,
> > 
> > Marco
> > 
> > 
> > 
> > _______________________________________________
> > Dune mailing list
> > Dune at dune-project.org
> > http://lists.dune-project.org/mailman/listinfo/dune
> > 
> 

-- 
Jorrit (Jö) Fahlke, Institute for Computational and Applied Mathematics,
University of Münster, Orleans-Ring 10, D-48149 Münster
Tel: +49 251 83 35146 Fax: +49 251 83 32729

If you receive something that says "Send this to everyone you know,"
pretend you don't know me.