[Dune] Memory leak, valgrind check, Dune

Jö Fahlke jorrit at jorrit.de
Wed Oct 7 17:55:59 CEST 2009


On Wed, 7 Oct 2009, 17:16:42 +0200, Oswald Benedikt wrote:
> Dear Dune, we are currently experiencing a memory leak problem in our code. Below you find
> an excerpt of a valgrind run. I think this problem may be related to FS#556 - GenericReferenceElements leak memory.
> 
> 
> Indeed, we ran into trouble when we used the code on 64 cores for about 8000 timesteps.
> 
> Do you have any idea what we could do? At present we do not use the very latest svn version of dune.
> 
> Would it help to update to the latest revision?
> 
> 
> Thanks for any suggestions since this problem in essence breaks our calculations at a time
> when we are investigating realistically large problems.
> 
> Have a great day, Benedikt

I don't think there are real memory leaks.  The memory issues valgrind shows
you are either connected to the SmallObjectPool or to OpenMPI.

Something might however be hiding behind the issues related to the
SmallObjectPool.  Is there a way to turn the SmallObjectPool off for debugging
purposes?

> ==48833== 
> ==48833== 324 bytes in 1 blocks are definitely lost in loss record 2,039 of 2,077
> ==48833==    at 0xA6E2CD: operator new[](unsigned long) (vg_replace_malloc.c:264)
> ==48833==    by 0x15BCB6: Dune::SmallObjectPool::allocate(unsigned int) (in ../../../hades3d/hades3d)
> ==48833==    by 0x15BCE0: Dune::SmallObject::operator new(unsigned long) (in ../../../hades3d/hades3d)

AFAICT (someone please correct me) the small object pool is an allocator that
was introduced because it is more efficient than standard new/delete under
certain conditions.  When your program tells the small object pool to free
some memory, the pool does not immediately return that memory to the system
but keeps it for later reuse.  Since the system reclaims the memory at program
exit anyway, there is no need for the small object pool to free any memory it
still holds at that point, except (maybe) to keep valgrind happy.
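
To illustrate the idea, here is a minimal sketch of such a pooling allocator.
This is not Dune's SmallObjectPool code: FreeListPool and all of its members
are names made up for this example, and it assumes a single fixed block size
and single-threaded use.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical illustration only; NOT the Dune::SmallObjectPool implementation.
class FreeListPool {
    // A cached block is reinterpreted as a node of a singly linked free list.
    struct Node { Node* next; };

    std::size_t blockSize_;
    Node* head_ = nullptr;
    std::size_t cached_ = 0;

public:
    // Every block must be large enough to hold a free-list node.
    explicit FreeListPool(std::size_t blockSize)
        : blockSize_(blockSize < sizeof(Node) ? sizeof(Node) : blockSize) {}

    // Hand out a cached block if there is one, otherwise ask the system.
    void* allocate() {
        if (head_ != nullptr) {
            Node* n = head_;
            head_ = n->next;
            --cached_;
            return n;
        }
        return ::operator new(blockSize_);
    }

    // "Free" a block by pushing it onto the free list.
    // Nothing is returned to the system here.
    void deallocate(void* p) {
        assert(p != nullptr);
        head_ = new (p) Node{head_};
        ++cached_;
    }

    // Optional cleanup: give all cached blocks back to the system.
    // Without a call to this, the cached blocks stay allocated until
    // the process exits.
    void releaseAll() {
        while (head_ != nullptr) {
            Node* n = head_;
            head_ = n->next;
            ::operator delete(n);
        }
        cached_ = 0;
    }

    // Number of blocks currently held for reuse.
    std::size_t cached() const { return cached_; }
};
```

In this sketch, a block passed to deallocate() is handed out again by the next
allocate() call.  Any blocks still cached at program exit were never passed to
operator delete unless releaseAll() was called first.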

The following three errors seem to be related to OpenMPI.  OpenMPI may
choose not to free some memory it allocated in MPI_Init, for the same reason:
it knows that it will only ever need one instance of that data, and the system
is going to reclaim the memory at program exit anyway, so why bother freeing
it.

> ==48833== 668 bytes in 1 blocks are definitely lost in loss record 2,065 of 2,077
> ==48833==    at 0xA6D416: malloc (vg_replace_malloc.c:195)
> ==48833==    by 0xA80595: ompi_free_list_grow (in /usr/local/openmpi-1.3/lib/libmpi.0.dylib)
> ==48833==    by 0x132BBBF: ???
> ==48833==    by 0xA9E736: ompi_mpi_init (in /usr/local/openmpi-1.3/lib/libmpi.0.dylib)
> ==48833==    by 0xAC471F: MPI_Init (in /usr/local/openmpi-1.3/lib/libmpi.0.dylib)
> ==48833==    by 0xB6DBC: Dune::MPIHelper::MPIHelper(int&, char**&) (in ../../../hades3d/hades3d)
> ==48833==    by 0xB6F1A: Dune::MPIHelper::instance(int&, char**&) (in ../../../hades3d/hades3d)
> ==48833==    by 0x649E: main (in ../../../hades3d/hades3d)
> ==48833== 
> ==48833== 1,380 (4 direct, 1,376 indirect) bytes in 1 blocks are definitely lost in loss record 2,072 of 2,077
> ==48833==    at 0xA6D416: malloc (vg_replace_malloc.c:195)
> ==48833==    by 0xBA53BC: opal_ifinit (in /usr/local/openmpi-1.3/lib/libopen-pal.0.dylib)
> ==48833==    by 0xBA5A81: opal_ifcount (in /usr/local/openmpi-1.3/lib/libopen-pal.0.dylib)
> ==48833==    by 0x12EB6EC: ???
> ==48833==    by 0xB665B8: mca_oob_base_init (in /usr/local/openmpi-1.3/lib/libopen-rte.0.dylib)
> ==48833==    by 0x12DDE29: ???
> ==48833==    by 0xB6C634: orte_rml_base_select (in /usr/local/openmpi-1.3/lib/libopen-rte.0.dylib)
> ==48833==    by 0xB5B444: orte_ess_base_app_setup (in /usr/local/openmpi-1.3/lib/libopen-rte.0.dylib)
> ==48833==    by 0x12E6D15: ???
> ==48833==    by 0xB40FB1: orte_init (in /usr/local/openmpi-1.3/lib/libopen-rte.0.dylib)
> ==48833==    by 0xA9E246: ompi_mpi_init (in /usr/local/openmpi-1.3/lib/libmpi.0.dylib)
> ==48833==    by 0xAC471F: MPI_Init (in /usr/local/openmpi-1.3/lib/libmpi.0.dylib)
> ==48833== 
> ==48833== 1,920 bytes in 1 blocks are definitely lost in loss record 2,073 of 2,077
> ==48833==    at 0xA6D416: malloc (vg_replace_malloc.c:195)
> ==48833==    by 0x1308D4D: ???
> ==48833==    by 0x13088A1: ???
> ==48833==    by 0x132D82B: ???
> ==48833==    by 0xB9B4FD: mca_base_components_open (in /usr/local/openmpi-1.3/lib/libopen-pal.0.dylib)
> ==48833==    by 0xB04B34: mca_pml_base_open (in /usr/local/openmpi-1.3/lib/libmpi.0.dylib)
> ==48833==    by 0xA9E3AD: ompi_mpi_init (in /usr/local/openmpi-1.3/lib/libmpi.0.dylib)
> ==48833==    by 0xAC471F: MPI_Init (in /usr/local/openmpi-1.3/lib/libmpi.0.dylib)
> ==48833==    by 0xB6DBC: Dune::MPIHelper::MPIHelper(int&, char**&) (in ../../../hades3d/hades3d)
> ==48833==    by 0xB6F1A: Dune::MPIHelper::instance(int&, char**&) (in ../../../hades3d/hades3d)
> ==48833==    by 0x649E: main (in ../../../hades3d/hades3d)
> ==48833== 
> ==48833== LEAK SUMMARY:
> ==48833==    definitely lost: 20,919 bytes in 122 blocks
> ==48833==    indirectly lost: 2,912 bytes in 44 blocks
> ==48833==      possibly lost: 536 bytes in 18 blocks
> ==48833==    still reachable: 137,988 bytes in 1,974 blocks
> ==48833==         suppressed: 580 bytes in 14 blocks
> ==48833== Reachable blocks (those to which a pointer was found) are not shown.
> ==48833== To see them, rerun with: --leak-check=full --show-reachable=yes

-- 
In the beginning the Universe was created.  This has made a lot of
people very angry and been widely regarded as a bad move.
-- Douglas Adams