[Dune] DUNE running out of memory on 65536 cores?

Eike Mueller E.Mueller at bath.ac.uk
Wed Mar 21 10:50:04 CET 2012


Hi,

I have now tried again with the latest dune-istl trunk (revision 
1542), but it still fails with the same out-of-memory error when I 
run the code on 65536 cores.

Now it does not get beyond the matrix setup/residual assembly phase:

[...]
=== matrix setup (max) 2.89218 s
=== matrix assembly (max) 4.64429 s
=== residual assembly (max) 1.74011 s
Application 1817525 exit signals: Killed
Application 1817525 resources: utime ~0s, stime ~7641s

Cheers,

Eike

Markus Blatt wrote:
> Hi Eike,
> 
> On Sat, Mar 10, 2012 at 11:44:16AM +0000, Eike Mueller wrote:
>> Dear DUNE list,
>>
>> I'm slowly increasing the core count on Hector... With ParMETIS
>> installed, I could extend my weak scaling runs to 32768 processes.
>> I'm using 1.7E10 degrees of freedom there, i.e. 0.5E6 dof per
>> process. However, when I push this further to 65536 processes (with
>> 3.4E10 dof, but the same number of dof per process), my program gets
>> killed as it runs out of memory (I get error messages like this:
>> '[NID 00134] 2012-03-07 09:42:53 Apid 1756591: OOM killer terminated
>> this process.').
> 
> you are not using the trunk, are you?
> 
> These problems are due to parmetis using a dense matrix structure for
> saving the adjacency information. In your case this results in
> allocating a 65536x65536 matrix.
> 
> I fixed this in the trunk, but probably forgot to merge the changes
> to the 2.1 branch. If I find the time I will do it at the end of this
> week or the beginning of next week.
> 
> BTW: The attachment was missing.
> 
> Cheers, 
> 
> Markus
> 
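
For reference, here is a rough estimate of how much memory a dense
65536x65536 structure would need. This is only a sketch under stated
assumptions: the entry sizes of 4 and 8 bytes are guesses, the function
name is made up for illustration, and nothing here is taken from the
ParMETIS or dune-istl sources.

```python
# Back-of-the-envelope size of a dense P x P adjacency structure,
# where P is the number of processes.
# ASSUMPTION: each entry occupies 4 or 8 bytes; the real entry type is
# not stated in this thread.

GIB = 2**30  # bytes per GiB


def dense_adjacency_bytes(num_procs, bytes_per_entry):
    """Bytes needed to store one dense num_procs x num_procs matrix."""
    return num_procs * num_procs * bytes_per_entry


for procs in (32768, 65536):
    for entry_size in (4, 8):
        size = dense_adjacency_bytes(procs, entry_size)
        print(f"{procs:6d} processes, {entry_size} B/entry: "
              f"{size / GIB:.0f} GiB")
```

The storage grows with the square of the process count, so going from
32768 to 65536 processes quadruples it, even though the number of dof
per process stays the same in a weak scaling run.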




