[Dune] problem with mpi_init
Marco Cisternino
marco.cisternino at optimad.it
Fri Jan 17 19:34:48 CET 2014
Hi duners,
I'm building DUNE with ALUGrid on a workstation, but I have a problem running a DUNE project.
I built ALUGrid with:
./configure CC=gcc CXX=g++ MPICC=/opt/openmpi/openmpi-1.6.4-gcc/bin/mpiCC --prefix=/opt/ALUGrid-1.52 --with-metis=/usr/local --with-parmetis=/usr/local CFLAG=-DNDEBUG CPPFLAGS=-DNDEBUG CXXFLAGS=-DNDEBUG CXXFLAGS=-O3 CFLAGS=-O3
I built dune-common, dune-geometry and dune-grid with the following options:
# install to custom directory
CONFIGURE_FLAGS="CC=gcc CXX=g++ MPICC=/opt/openmpi/openmpi-1.6.4-gcc/bin/mpiCC --prefix=/opt/dune-2.3 --enable-parallel -enable-experimental-grid-extensions --disable-documentation --with-metis=/usr/local --with-parmetis=/usr/local --with-alugrid=/opt/ALUGrid-1.52 CFLAGS=\"-O3 -DNDEBUG\" CXXFLAGS=\"-O3 -DNDEBUG\" "
# default target of make to install, then dune is not only built but also installed
#MAKE_FLAGS=install
# the default version of automake and autogen are not sufficient therefore we need to specify what version we use
#AUTOGEN_FLAGS="--ac=2.65 --am=1.11.1"
Everything goes fine: there are no problems either during configuration or during compilation of any DUNE module.
Then I ran duneproject and dunecontrol to build a default project.
If I try to run the default project with /opt/openmpi/openmpi-1.6.4-gcc/bin/mpiexec -np 1 ./myproject, I get this:
[sandrino:06918] [[45031,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file util/nidmap.c at line 398
[sandrino:06918] [[45031,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/ess_base_nidmap.c at line 62
[sandrino:06918] [[45031,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ess_env_module.c at line 173
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_base_build_nidmap failed
--> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[sandrino:06918] [[45031,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_set_name failed
--> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: orte_init failed
--> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)
--------------------------------------------------------------------------
[sandrino:6918] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpiexec has exited due to process rank 0 with PID 6918 on
node sandrino exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
Then I tried a simple MPI code that calls MPI_Init, MPI_Comm_rank, MPI_Comm_size and MPI_Finalize.
Everything works fine.
I cannot understand what is wrong with the DUNE project.
I don't know whether this information is enough for you to help. I know it may look cryptic, but I thought someone might have seen something like this before.
The most surprising thing is that I get no errors when building DUNE. Moreover, I built DUNE on my laptop following the same procedure and it works perfectly there.
I hope someone can help me. It looks as if the MPI compiler wrapper and mpiexec came from different versions of Open MPI, but they are from the same one. Weird!
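One way to test that suspicion is to look at which Open MPI shared libraries the binary actually loads at run time. The following is only a minimal sketch: it assumes ./myproject is dynamically linked, that the Open MPI libraries are named libmpi*, libopen-rte* and libopen-pal*, and that the intended installation is the /opt/openmpi/openmpi-1.6.4-gcc prefix used in the commands above. The function name check_mpi_libs is made up for this illustration.

```shell
#!/bin/sh
# check_mpi_libs PREFIX
# Reads `ldd` output on stdin and reports, for every Open MPI library
# (names assumed: libmpi*, libopen-rte*, libopen-pal*), whether it
# resolves to a path under PREFIX. Returns non-zero on any mismatch.
check_mpi_libs() {
    prefix=$1
    status=0
    # An ldd line looks like: "libmpi.so.1 => /path/libmpi.so.1 (0x...)"
    while read -r name arrow path rest; do
        case $name in
            libmpi*|libopen-rte*|libopen-pal*)
                case $path in
                    "$prefix"/*) echo "ok       $name -> $path" ;;
                    *)           echo "MISMATCH $name -> $path"; status=1 ;;
                esac ;;
        esac
    done
    return $status
}

# Only run the check if the project binary is present.
if [ -x ./myproject ]; then
    ldd ./myproject | check_mpi_libs /opt/openmpi/openmpi-1.6.4-gcc
fi
```

A MISMATCH line would mean that the dynamic linker picks up an Open MPI library from somewhere other than the installation that provides mpiexec, for example a system-wide package found first through LD_LIBRARY_PATH. If every line says ok, the cause has to be looked for elsewhere.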
Thanks for any hint.
Best,
Marco
--
-----------------------------------------------
Marco Cisternino, PhD
OPTIMAD Engineering s.r.l.
Via Giacinto Collegno 18
10143 Torino - Italy
www.optimad.it
marco.cisternino at optimad.it
+39 011 19719782
-----------------------------------------------