
VASP compiled with MPI_CHAIN crashed

Posted: Thu Jul 12, 2007 7:51 am
by pavel
I reported this error in the "Installation problems" section, but it was only partly answered, and there was no reply to my second posting in the same thread. So I am reposting it here with small modifications.

Vasp.4.6.31 was compiled with the preprocessor flags NGZhalf and MPI_CHAIN.
Two errors had to be fixed before the compilation succeeded.
1) The file "symbol.inc" contains the following macro

Code:

#define STOP CALL M_exit; STOP
It is defined in the #elif defined(MPI_CHAIN) section. The macro is recursive: the preprocessor expands the STOP inside its own replacement text again and again.
The solution is simple and is already used in the other section of "symbol.inc":

Code:

#define STOP CALL M_exit; stop

The preprocessor appears to be case-sensitive, so the lower-case stop is not expanded again.
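For clarity, here is a minimal sketch of how the corrected section of symbol.inc then looks (the surrounding lines are reproduced from memory and may differ from the actual file):

Code:

#elif defined(MPI_CHAIN)
! the lower-case stop below is not expanded again, because the
! preprocessor matches the macro name STOP case-sensitively
#define STOP CALL M_exit; stop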
2) Compilation errors occurred in pardens.f. The reason was that some parts of the code that should be compiled when the MPI flag is defined were skipped when only the MPI_CHAIN flag was defined.
Solution: I replaced blocks of the form

Code:

 #ifdef MPI 
 ... 
 #endif 

with

Code:

 #if defined(MPI) || defined(MPI_CHAIN) 
 ... 
 #endif 

In this way the declaration and initialization of some variables are included for both the MPI and the MPI_CHAIN flags.
Was this the right guess?
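As an illustration, a block of this kind now looks like the following sketch (the variable name is only a placeholder, not the actual code from pardens.f):

Code:

#if defined(MPI) || defined(MPI_CHAIN)
! a declaration that was previously seen only by the MPI build
! is now also compiled for MPI_CHAIN (placeholder variable name)
      INTEGER :: NODE_TOTAL
#endif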

Compilation was successful, but the executable built with these flags failed with the following error:

Code:

* 253 Invalid operation PROG=m_alltoallv_z ELN=1047(40005df04)
* 252 Floating-point zero divide PROG=fexcg_ ELN=357(4001e2e80)
...
**** 99 Execution suspended PROG=fexcg_ ELN=357(4001e362c)
Called from xcgrad.fexcg ELN=188(4001deec8)
Called from pot.potlok ELN=269(4002f5d18)
Called from elmin ELN=352(40052a404)
Called from vamp ELN=2337(4000208b0)

It looks like there is a division by zero in fexcg_. Do you have any suggestion for how to fix it?
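I do not know the exact expression at ELN=357, but if the divide is by the charge density or by the gradient norm at some grid point, flooring the denominator would at least confirm the diagnosis. The following is only a standalone sketch with made-up variable names, not the actual fexcg_ code:

Code:

! standalone demonstration of flooring a denominator before dividing;
! RHO, XNUM and TINYDEN are made-up names, not taken from fexcg_
      PROGRAM GUARD_DEMO
      IMPLICIT NONE
      REAL(8), PARAMETER :: TINYDEN = 1D-10
      REAL(8) :: RHO, XNUM, RES
      RHO  = 0D0                ! a grid point with vanishing density
      XNUM = 1D0
      ! clamp RHO away from zero, preserving its sign
      IF (ABS(RHO) < TINYDEN) RHO = SIGN(TINYDEN, RHO)
      RES = XNUM / RHO          ! can no longer trigger a zero divide
      PRINT *, RES
      END PROGRAM GUARD_DEMO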

On the other hand, the executable compiled with NGZhalf and MPI runs the same NEB job without errors; only ISYM=0 had to be set, to avoid problems with one of the intermediate configurations. However, this version is extremely slow in comparison with a simple relaxation job:
The starting and ending points for the NEB were relaxed on 4 CPUs in 1.15 h (convergence reached), so on 1 CPU this would take about 4.6 h. Taking into account the interprocess communication in an NEB job with 4 intermediate configurations (one configuration per CPU), one can estimate the required time at about 6-7 h.
The NEB task has now been running for 3 days and is not finished yet. The intermediate results look fine and the run is converging, but each relaxation step takes about eight times as long: 500 s x 4 CPUs = 2000 s per ionic relaxation step for the simple relaxation, compared with 16000 s per step for the NEB job.
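To make the comparison explicit:

Code:

endpoint relaxation:  1.15 h on 4 CPUs  ->  about 4.6 h on 1 CPU
expected NEB time:    4 images on 4 CPUs + communication  ->  about 6-7 h
simple relaxation:    500 s x 4 CPUs = 2000 s per ionic step
NEB job:              16000 s per ionic step
observed slowdown:    16000 s / 2000 s = 8x per step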
Is this normal behavior? Would you expect the MPI_CHAIN executable to be faster?

VASP compiled with MPI_CHAIN crashed

Posted: Tue Jul 24, 2007 9:45 am
by pavel
Dear head admin,
Could you comment on this?