[Dune] DUNE running out of memory on 65536 cores?
markus at dr-blatt.de
Tue Mar 13 10:14:21 CET 2012
On Sat, Mar 10, 2012 at 11:44:16AM +0000, Eike Mueller wrote:
> Dear DUNE list,
> I'm slowly increasing the core count on Hector... With ParMETIS
> installed, I could extend my weak scaling runs to 32768 processes.
> I'm using 1.7E10 degrees of freedom there, i.e. 0.5E6 dof per
> process. However, when I push this further to 65536 processes (with
> 3.4E10 dof, but the same number of dof per process), my program gets
> killed as it runs out of memory (I get error messages like this:
> '[NID 00134] 2012-03-07 09:42:53 Apid 1756591: OOM killer terminated
> this process.').
You are not using the trunk, are you?
These problems are caused by ParMETIS using a dense matrix structure for
storing the adjacency information. In your case this results in
allocating a 65536x65536 matrix.
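A rough estimate shows why a dense structure fails under weak scaling. The work per process stays fixed, but a P x P matrix grows quadratically with P, so going from 32768 to 65536 processes quadruples its size. The sketch assumes 4-byte integer entries. For the sparse comparison it also assumes a hypothetical 26 neighbours per process. Neither figure comes from the original report, and the helper names are illustrative only.

```python
# Back-of-the-envelope memory estimate for storing the process adjacency
# (communication graph) of P processes.
# ASSUMPTION: each entry takes 4 bytes (a 32-bit integer); the element type
# actually used by the old code is not stated in the email.

BYTES_PER_ENTRY = 4  # assumed 32-bit integer entries
GIB = 1024 ** 3
MIB = 1024 ** 2


def dense_bytes(nprocs, bytes_per_entry=BYTES_PER_ENTRY):
    """Memory for one dense nprocs x nprocs adjacency matrix."""
    return nprocs * nprocs * bytes_per_entry


def csr_bytes(nprocs, neighbours_per_proc, bytes_per_entry=BYTES_PER_ENTRY):
    """Memory for the same graph in compressed sparse row (CSR) form:
    an offset array of length nprocs + 1 plus one column index per edge."""
    return (nprocs + 1 + nprocs * neighbours_per_proc) * bytes_per_entry


if __name__ == "__main__":
    # HYPOTHETICAL: 26 neighbours per process, as in a 3D box decomposition
    # where each subdomain touches neighbours across faces, edges and corners.
    for p in (32768, 65536):
        print(f"P={p:6d}: dense {dense_bytes(p) / GIB:6.1f} GiB, "
              f"CSR {csr_bytes(p, 26) / MIB:5.1f} MiB")
```

Under these assumptions the dense matrix needs 4.0 GiB at 32768 processes and 16.0 GiB at 65536 processes per copy. A sparse layout grows only linearly with P.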
I fixed this in the trunk, but probably forgot to merge the changes
to the 2.1 branch. If I find the time, I will do it at the end of this
week or the beginning of next week.
BTW: The attachment was missing.
Do you need more support with DUNE or HPC in general?
Dr. Markus Blatt - HPC-Simulation-Software & Services http://www.dr-blatt.de
Hans-Bunte-Str. 8-10, 69123 Heidelberg, Germany
Tel.: +49 (0) 160 97590858 Fax: +49 (0)322 1108991658