[Dune] DUNE running out of memory on 65536 cores?

Markus Blatt markus at dr-blatt.de
Tue Mar 13 10:14:21 CET 2012


Hi Eike,

On Sat, Mar 10, 2012 at 11:44:16AM +0000, Eike Mueller wrote:
> Dear DUNE list,
> 
> I'm slowly increasing the core count on Hector... With ParMETIS
> installed, I could extend my weak scaling runs to 32768 processes.
> I'm using 1.7E10 degrees of freedom there, i.e. 0.5E6 dof per
> process. However, when I push this further to 65536 processes (with
> 3.4E10 dof, but the same number of dof per process), my program gets
> killed as it runs out of memory (I get error messages like this:
> '[NID 00134] 2012-03-07 09:42:53 Apid 1756591: OOM killer terminated
> this process.') .

you are not using the trunk, are you?

These problems are due to ParMETIS using a dense matrix structure for
saving the adjacency information. In your case this results in
allocating a 65536x65536 matrix.
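
To get a feeling for the size, here is a rough sketch of the arithmetic.
The helper name denseTableBytes and the entry sizes are assumptions made
for illustration only; the actual entry type of the matrix is not stated
here, so only the entry count (P*P) follows directly from the dimensions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of DUNE or ParMETIS): bytes needed for a
// dense procs x procs table whose entries take entryBytes bytes each.
// entryBytes is an assumption; the real entry type is not given above.
inline std::uint64_t denseTableBytes(std::uint64_t procs,
                                     std::uint64_t entryBytes)
{
  // Keeps procs * procs within 64 bits; entryBytes is expected to be small.
  assert(procs <= 0xFFFFFFFFull);
  return procs * procs * entryBytes;
}
```

For 65536 processes the table has 65536 * 65536 = 4294967296 entries, so
denseTableBytes(65536, 1) is 4 GiB and denseTableBytes(65536, 4) is 16 GiB.
Doubling the process count quadruples the table, which would fit with the
run at 32768 processes still working while the one at 65536 does not.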

I fixed this in the trunk, but probably forgot to merge the changes
to the 2.1 branch. If I find the time I will do it at the end of this
week or the beginning of next week.

BTW: The attachment was missing.

Cheers, 

Markus

-- 
Do you need more support with DUNE or HPC in general? 

Dr. Markus Blatt - HPC-Simulation-Software & Services http://www.dr-blatt.de
Hans-Bunte-Str. 8-10, 69123 Heidelberg, Germany
Tel.: +49 (0) 160 97590858  Fax: +49 (0)322 1108991658 
