MPI Crash running QMD calculation in gamma point only configuration

paulfons · #1 Post by **paulfons** » Tue May 14, 2013 3:51 am

Greetings,
I have compiled VASP 5.2.2 for my Linux cluster and am attempting to run an MD calculation. The code has no trouble with any of the example inputs from the VASP tutorials, but it always crashes at the same place when I attempt an MD calculation, regardless of how I vary the parallelization parameters. This leads me to suspect a compilation (Makefile) problem. The Makefile for the gamma-point-only version, together with the KPOINTS, POSCAR, and INCAR files, is included below for reference.

The code was compiled using the ifort (IFORT) 13.1.1 20130313 compiler along with Intel MPI 4.1.0.030.

I have burned up a fair amount of time trying things, but without success. Does anyone have any suggestions as to what to try next?

Best wishes,

Paul Fons

The code is running on one node of a 20-node machine. Each node has 8 Xeon E5620 CPUs with 24 GB of memory. The nodes are connected by 10 GbE interconnects and I would eventually like to run this code in parallel on multiple nodes, but the problems I am reporting here occur on a single node (SMP).

Here is the error message:

mpirun -np 16 vasp_gamma
running on 16 total cores
distrk: each k-point on 16 cores, 1 groups
distr: one band on 8 cores, 2 groups
using from now: INCAR
vasp.5.3.3 18Dez12 (build May 13 2013 15:17:23) gamma-only

POSCAR found : 3 types and 108 ions
scaLAPACK will be used
LDA part: xc-table for Pade appr. of Perdew

-----------------------------------------------------------------------------
| |
| W W AA RRRRR N N II N N GGGG !!! |
| W W A A R R NN N II NN N G G !!! |
| W W A A R R N N N II N N N G !!! |
| W WW W AAAAAA RRRRR N N N II N N N G GGG ! |
| WW WW A A R R N NN II N NN G G |
| W W A A R R N N II N N GGGG !!! |
| |
| VASP found 321 degrees of freedom |
| the temperature will equal 2*E(kin)/ (degrees of freedom) |
| this differs from previous releases, where T was 2*E(kin)/(3 NIONS). |
| The new definition is more consistent |
| |
-----------------------------------------------------------------------------

POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: small aliasing (wrap around) errors must be expected
FFT: planning ...
WAVECAR not read
WARNING: random wavefunctions but no delay for mixing, default for NELMDL
prediction of wavefunctions initialized - no I/O
entering main loop
N E dE d eps ncg rms rms(c)
RMM: 1 0.438327604116E+04 0.43833E+04 -0.94880E+04 346 0.506E+02
RMM: 2 0.127584402674E+04 -0.31074E+04 -0.31882E+04 346 0.152E+02
RMM: 3 0.299986400145E+03 -0.97586E+03 -0.12137E+04 346 0.925E+01
RMM: 4 -0.606823381784E+02 -0.36067E+03 -0.41045E+03 346 0.659E+01
RMM: 5 -0.228060471875E+03 -0.16738E+03 -0.15696E+03 346 0.362E+01
RMM: 6 -0.308261614897E+03 -0.80201E+02 -0.68222E+02 346 0.255E+01
RMM: 7 -0.345624028023E+03 -0.37362E+02 -0.34195E+02 346 0.155E+01
RMM: 8 -0.366489442688E+03 -0.20865E+02 -0.18421E+02 346 0.116E+01
RMM: 9 -0.391069482656E+03 -0.24580E+02 -0.24063E+02 842 0.755E+00
RMM: 10 -0.392923752147E+03 -0.18543E+01 -0.32951E+01 884 0.200E+00
RMM: 11 -0.393354485197E+03 -0.43073E+00 -0.41138E+00 833 0.520E-01
RMM: 12 -0.393407181438E+03 -0.52696E-01 -0.49653E-01 802 0.148E-01 0.950E+00
RMM: 13 -0.390937720994E+03 0.24695E+01 -0.45304E+00 697 0.142E+00 0.601E+00
RMM: 14 -0.390311322970E+03 0.62640E+00 -0.37173E+00 716 0.136E+00 0.276E+00
RMM: 15 -0.390294966246E+03 0.16357E-01 -0.96176E-01 783 0.688E-01 0.135E+00
RMM: 16 -0.390280817461E+03 0.14149E-01 -0.18087E-01 700 0.375E-01 0.504E-01
RMM: 17 -0.390284202366E+03 -0.33849E-02 -0.26515E-02 722 0.159E-01 0.268E-01
RMM: 18 -0.390287549580E+03 -0.33472E-02 -0.92233E-03 724 0.899E-02 0.158E-01
RMM: 19 -0.390289808941E+03 -0.22594E-02 -0.63338E-03 696 0.761E-02 0.924E-02
RMM: 20 -0.390290458142E+03 -0.64920E-03 -0.14378E-03 695 0.422E-02 0.458E-02
RMM: 21 -0.390290916003E+03 -0.45786E-03 -0.10433E-03 599 0.298E-02 0.274E-02
RMM: 22 -0.390290971931E+03 -0.55928E-04 -0.23374E-04 438 0.159E-02
1 T= 600. E= -.38199243E+03 F= -.39029097E+03 E0= -.39022511E+03 EK= 0.82985E+01 SP= 0.00E+00 SK= 0.00E+00
bond charge predicted
N E dE d eps ncg rms rms(c)
RMM: 1 -0.390242713772E+03 0.48202E-01 -0.45969E+00 692 0.226E+00 0.275E-01
RMM: 2 -0.390240986761E+03 0.17270E-02 -0.10085E-01 751 0.279E-01 0.152E-01
RMM: 3 -0.390241059805E+03 -0.73044E-04 -0.90892E-03 811 0.726E-02 0.953E-02
RMM: 4 -0.390240906267E+03 0.15354E-03 -0.10114E-03 675 0.269E-02 0.464E-02
RMM: 5 -0.390240893714E+03 0.12552E-04 -0.28526E-04 438 0.162E-02
2 T= 596. E= -.38199224E+03 F= -.39024089E+03 E0= -.39017463E+03 EK= 0.82487E+01 SP= 0.00E+00 SK= 0.00E+00
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x8051d70, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x8051d70, rcounts=0x75fc7d0, displs=0x77e78f0, MPI_DOUBLE_PRECISION, comm=0xc4010000) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x8051d70 src=0x8051d70 len=96880
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80507c0, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x8038d50, rcounts=0x763d1c0, displs=0x763d210, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80507c0 src=0x80507c0 len=96880
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80875a0, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x8040650, rcounts=0x75ea8f0, displs=0x77a1770, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80875a0 src=0x80875a0 len=96880
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80b7fa0, scount=11764, MPI_DOUBLE_PRECISION, rbuf=0x8013160, rcounts=0x72b1060, displs=0x72b10b0, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80b7fa0 src=0x80b7fa0 len=94112
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x806b190, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x803bcb0, rcounts=0x72b0c00, displs=0x72b0c50, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x806b190 src=0x806b190 len=96880
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x8084d60, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x800e930, rcounts=0x72b1060, displs=0x72b10b0, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x8084d60 src=0x8084d60 len=96880
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x809e0f0, scount=11764, MPI_DOUBLE_PRECISION, rbuf=0x8010250, rcounts=0x72b10b0, displs=0x72b0b70, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x809e0f0 src=0x809e0f0 len=94112
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80703f0, scount=12110, MPI_DOUBLE_PRECISION, rbuf=0x8011a30, rcounts=0x72b10b0, displs=0x779fc20, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80703f0 src=0x80703f0 len=96880
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80ca510, scount=11764, MPI_DOUBLE_PRECISION, rbuf=0x800e730, rcounts=0x72b1100, displs=0x72b1150, MPI_DOUBLE_PRECISION, comm=0xc4010000) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80ca510 src=0x80ca510 len=94112
Fatal error in PMPI_Allgatherv: Internal MPI error!, error stack:
PMPI_Allgatherv(1430).....: MPI_Allgatherv(sbuf=0x80b6ac0, scount=11764, MPI_DOUBLE_PRECISION, rbuf=0x7fe3d40, rcounts=0x72b0c40, displs=0x72b0c90, MPI_DOUBLE_PRECISION, comm=0x84000006) failed
MPIR_Allgatherv_impl(1002):
MPIR_Allgatherv(958)......:
MPIR_Allgatherv_intra(708):
MPIR_Localcopy(381).......: memcpy arguments alias each other, dst=0x80b6ac0 src=0x80b6ac0 len=94112

The Makefile is below:

.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler for Pentium/Athlon/Opteron
# based systems
# we recommend this makefile for both Intel as well as AMD systems
# for AMD based systems appropriate BLAS (libgoto) and fftw libraries are
# however mandatory (whereas they are optional for Intel platforms)
# For Athlon we recommend
# ) to link against libgoto (and mkl as a backup for missing routines)
# ) odd enough link in libfftw3xf_intel.a (fftw interface for mkl)
# feedback is greatly appreciated
#
# The makefile was tested only under Linux on Intel and AMD platforms
# the following compiler versions have been tested:
# - ifc.7.1 works stable somewhat slow but reliably
# - ifc.8.1 fails to compile the code properly
# - ifc.9.1 recommended (both for 32 and 64 bit)
# - ifc.10.1 partially recommended (both for 32 and 64 bit)
# tested build 20080312 Package ID: l_fc_p_10.1.015
# the gamma only mpi version can not be compiles
# using ifc.10.1
# - ifc.11.1 partially recommended (some problems with Gamma only and intel fftw)
# Build 20090630 Package ID: l_cprof_p_11.1.046
# - ifort.12.1 strongly recommended (we use this to compile vasp)
# Version 12.1.5.339 Build 20120612
#
# it might be required to change some of library path ways, since
# LINUX installations vary a lot
#
# Hence check ***ALL*** options in this makefile very carefully
#-----------------------------------------------------------------------
#
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
# retrieve the lapackage from ftp.netlib.org
# and compile the blas routines (BLAS/SRC directory)
# please use g77 or f77 for the compilation. When I tried to
# use pgf77 or pgf90 for BLAS, VASP hang up when calling
# ZHEEV (however this was with lapack 1.1 now I use lapack 2.0)
# 2) more desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 2a) Intels own optimised BLAS (PIII, P4, PD, PC2, Itanium)
# http://developer.intel.com/software/products/mkl/
# this is really excellent, if you use Intel CPU's
#
# 2b) probably fastest SSE2 (4 GFlops on P4, 2.53 GHz, 16 GFlops PD,
# around 30 GFlops on Quad core)
# Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
# http://www.tacc.utexas.edu/resources/software/
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC=ifort -I$(MKL_ROOT)/include/fftw -I$(MKLROOT)/include/mic/lp64 -I$(MKLROOT)/include -mmic
# fortran linker
FCL=$(FC) -static

#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

# this release should be fpp clean
# we now recommend fpp as preprocessor
# if this fails go back to cpp
#CPP_=fpp -f_com=no -free -w0 $*.F $*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# PGF90 work around some for some PGF90 / IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn MD package of Tomas Bucko
#-----------------------------------------------------------------------

CPP = $(CPP_) -DHOST=\"LinuxIFC\" \
-DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DNGZhalf \
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags (there must a trailing blank on this line)
# byterecl is strictly required for ifc, since otherwise
# the WAVECAR file becomes huge
#-----------------------------------------------------------------------

FFLAGS = -FR -names lowercase -assume byterecl -I$(MKLROOT)/include

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK SSE1 optimization, but also generate code executable on all mach.
# xK improves performance somewhat on XP, and a is required in order
# to run the code on older Athlons as well
# -xW SSE2 optimization
# -axW SSE2 optimization, but also generate code executable on all mach.
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------

# ifc.9.1, ifc.10.1 recommended
#OFLAG=-O2 -ip
OFLAG= -xHOST -O3 -ip -static
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)

#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
# we recommend to use mkl, that is simple and most likely
# fastest in Intel based machines
#-----------------------------------------------------------------------

# mkl path for ifc 11 compiler
#MKL_PATH=$(MKLROOT)/lib/em64t

# mkl path for ifc 12 compiler
MKL_PATH=$(MKLROOT)/lib/intel64

MKL_FFTW_PATH=$(MKLROOT)/interfaces/fftw3xf/

# BLAS
# setting -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines usually speeds up program execution
# BLAS= -Wl,--start-group $(MKL_PATH)/libmkl_intel_lp64.a $(MKL_PATH)/libmkl_intel_thread.a $(MKL_PATH)/libmkl_core.a -Wl,--end-group -lguide
# faster linking and available from at least version 11
#BLAS= -lguide -mkl
#BLAS = /home/paulfons/VASP/src/GotoBlas2/libgoto2_nehalemp-r1.13.a
BLAS = $(MKLROOT)/lib/intel64/libmkl_blas95_lp64.a $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a $(MKLROOT)/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_cdft_core.a $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_sequential.a $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lpthread -lm
# LAPACK, use vasp.5.lib/lapack_double

#LAPACK= ../vasp.5.lib/lapack_double.o

# LAPACK from mkl, usually faster and contains scaLAPACK as well

#LAPACK= $(MKL_PATH)/libmkl_intel_lp64.a

# here a tricky version, link in libgoto and use mkl as a backup
# also needs a special line for LAPACK
# this is the best thing you can do on AMD based systems !!!!!!

#BLAS = -Wl,--start-group /opt/libs/libgoto/libgoto.so $(MKL_PATH)/libmkl_intel_thread.a $(MKL_PATH)/libmkl_core.a -Wl,--end-group -liomp5
#LAPACK= /opt/libs/libgoto/libgoto.so $(MKL_PATH)/libmkl_intel_lp64.a

#-----------------------------------------------------------------------

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(BLAS)

# options for linking, nothing is required (usually)
#LINK = -parallel
LINK =

#-----------------------------------------------------------------------
# fft libraries:
# VASP.5.2 can use fftw.3.1.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------

FFT3D = fft3dfurth.o fft3dlib.o

# alternatively: fftw.3.1.X is slighly faster and should be used if available
#FFT3D = fftw3d.o fft3dlib.o /opt/libs/fftw-3.1.2/lib/libfftw3.a

# you may also try to use the fftw wrapper to mkl (but the path might vary a lot)
# it seems this is best for AMD based systems
#FFT3D = fftw3d.o fft3dlib.o $(MKL_FFTW_PATH)/libfftw3xf_intel.a
#INCS = -I$(MKLROOT)/include/fftw

#=======================================================================
# MPI section, uncomment the following lines until
# general rules and compile lines
# presently we recommend OPENMPI, since it seems to offer better
# performance than lam or mpich
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi
#-----------------------------------------------------------------------

#FC=mpif90
FC=mpiifort
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (recommended if mkl is available)
# avoidalloc avoid ALLOCATE if possible
# PGF90 work around some for some PGF90 / IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4, PD
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# tbdyn MD package of Tomas Bucko
#-----------------------------------------------------------------------

#-----------------------------------------------------------------------

#CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
# -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
# -DMPI_BLOCK=262144 -Duse_collective -DscaLAPACK \
# -DRPROMU_DGEMV -DRACCMU_DGEMV
#CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
# -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
# -DMPI_BLOCK=262144 -Duse_collective -DscaLAPACK \
# -DRPROMU_DGEMV -DRACCMU_DGEMV
CPP = $(CPP_) -DMPI -DHOST=\"SiriusMKL_ifort13\" -DIFC \
-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DwNGZhalf -DNGZhalf \
-DMPI_BLOCK=8000 -Duse_collective -DscaLAPACK -Dtbdyn \
-DRPROMU_DGEMV -DRACCMU_DGEMV

# -DMPI_BLOCK=8000 -Duse_collective -DscaLAPACK
#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply leave this section commented out
#-----------------------------------------------------------------------

# usually simplest link in mkl scaLAPACK
#BLACS= -lmkl_blacs_openmpi_lp64
#SCA= $(MKL_PATH)/libmkl_scalapack_lp64.a $(BLACS)
#SCA= -lmkl_scalapack_lp64 -lmkl_core0
#-----------------------------------------------------------------------
# libraries for mpi?
#-----------------------------------------------------------------------

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o \
$(SCA) $(LAPACK) $(BLAS) -L/opt/intel/composer_xe_2013/mkl/lib/intel64/

#-----------------------------------------------------------------------
# parallel FFT
#-----------------------------------------------------------------------

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
#FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o

# alternatively: fftw.3.1.X is slighly faster and should be used if available
#FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o /opt/local/fftw3/lib/libfftw3.a

# you may also try to use the fftw wrapper to mkl (but the path might vary a lot)
# it seems this is best for AMD based systems
FFT3D = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o /opt/intel/composer_xe_2013/mkl/interfaces/fftw3xf/libfftw3xf_intel.a
#INCS = -I$(MKLROOT)/include/fftw

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o mgrid.o xclib.o vdw_nl.o xclib_grad.o \
radial.o pseudo.o gridq.o ebs.o \
mkpoints.o wave.o wave_mpi.o wave_high.o spinsym.o \
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
constrmag.o cl_shift.o relativistic.o LDApU.o \
paw_base.o metagga.o egrad.o pawsym.o pawfock.o pawlhf.o rhfatm.o hyperfine.o paw.o \
mkpoints_full.o charge.o Lebedev-Laikov.o stockholder.o dipol.o pot.o \
dos.o elf.o tet.o tetweight.o hamil_rot.o \
chain.o dyna.o k-proj.o sphpro.o us.o core_rel.o \
aedens.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o hamil_high.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
chgloc.o fast_aug.o fock_multipole.o fock.o mkpoints_change.o sym_grad.o \
mymath.o internals.o npt_dynamics.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
nmr.o pead.o subrot.o subrot_scf.o \
force.o pwlhf.o gw_model.o optreal.o steep.o davidson.o david_inner.o \
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
linear_optics.o \
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
mlwf.o ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o \
local_field.o ump2.o ump2kpar.o fcidump.o ump2no.o \
bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
rmm-diis_mlr.o linear_response_NMR.o wannier_interpol.o linear_response.o

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules have been tested for ifc.11 and ifc.12 only

fft3dlib.o : fft3dlib.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
fftw3d.o : fftw3d.F
$(CPP)
$(FC) -FR -lowercase -O1 $(INCS) -c $*$(SUFFIX)
fftmpi.o : fftmpi.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
fftmpiw.o : fftmpiw.F
$(CPP)
$(FC) -FR -lowercase -O1 $(INCS) -c $*$(SUFFIX)
wave_high.o : wave_high.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)
# the following rules are probably no longer required (-O3 seems to work)
wave.o : wave.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
paw.o : paw.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
cl_shift.o : cl_shift.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
us.o : us.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
LDApU.o : LDApU.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

The KPOINTS file is

Automatic mesh
0
Gamma
1 1 1
0 0 0

While I doubt many will have the initiative to actually try this, here is the POSCAR file

GST
1.00000000000000000
14.9600000139329996 .0000000000000000 .0000000000000000
.0000000000000000 14.9600000139329996 .0000000000000000
.0000000000000000 .0000000000000000 14.9600000139329996
24 24 60
Direct
.31660818 .16397040 .91150332
.61521749 .20036291 .71508096
.62023147 .40785355 .88165868
.29378352 .63772562 .40120737
.12627626 .44207902 .81087827
.34285934 .38670341 .18747060
.10159818 .93323399 .32374412
.82877141 .70432972 .99001450
.08058001 .28303573 .97809234
.37532653 .79836316 .29466270
.83458939 .01869881 .64355125
.88859395 .83157680 .43303218
.87124721 .86436510 .18623569
.37495673 .96168593 .38239403
.84108173 .22395558 .84119214
.10209793 .20371568 .27452571
.34038985 .31134576 .02866281
.32702002 .84711515 .02839654
.06869176 .08360462 .77311011
.35451413 .99168486 .18621186
.83289613 .54872074 .52415234
.86153741 .94256666 .91072097
.60691206 .71647743 .57385213
.52509965 .63048175 .83547539
.57889720 .62566264 .05864894
.11588994 .38830480 .48763218
.35654527 .88446242 .55043027
.59870999 .05536231 .19481401
.58978669 .11253570 .93452451
.32767681 .57638037 .12829553
.84738210 .14652784 .10671199
.56457663 .93226503 .73966134
.64096226 .50446935 .36895893
.07680396 .58357741 .68793555
.11526718 .13893016 .54461004
.09131351 .78721835 .87724750
.58711489 .33708452 .13117109
.85883768 .10801806 .36440269
.36726350 .24663098 .42887574
.84387207 .63547600 .25128607
.86037772 .43342660 .03015163
.08493004 .47128280 .20800038
.83242082 .36204580 .29735051
.40477518 .44431013 .56094614
.11490379 .85492405 .62019716
.82849157 .72513887 .72501036
.11779117 .00020219 .05250735
.61056852 .99820923 .48661875
.71663015 .74980660 .13412289
.98405175 .25157092 .41374221
.49036698 .06007138 .62117695
.19956878 .42705589 .99088464
.50377338 .89291903 .17549071
.21758821 .96177181 .88229775
.50476290 .78738647 .95985937
.51080705 .82403447 .43526173
.46663455 .44973892 .03344508
.70630256 .59024936 .66780682
.72302782 .87088919 .59907605
.96384598 .83367803 .02975898
.36053624 .56226813 .85534130
.17698705 .64902661 .24757785
.53615249 .27957194 .53383474
.47037517 .62866927 .26195175
.23246859 .09732125 .38175209
.18276178 .23951006 .81000132
.74203875 .26452912 .97941217
.22490153 .80742313 .43813254
.96554239 .71612678 .56120765
.21429779 .55698487 .54847053
.50056352 .36962830 .29235143
.72573799 .48812489 .17329829
.97234512 .02837892 .21386922
.25103072 .28355348 .57837250
.25825312 .69479465 .97603137
.72728648 .99951757 .04652408
.47623355 .75360972 .69633461
.49823263 .47426775 .73180232
.21900102 .18021670 .10133348
.50254223 .57835612 .47893965
.97607058 .10729704 .95121138
.99540849 .98159567 .47640783
.72145801 .80230110 .87723741
.48147242 .24139666 .84109662
.97153309 .50929861 .35347440
.72288440 .56052468 .92792778
.22225794 .86772903 .17756939
.96846063 .39536349 .87393784
.22752939 .01236387 .66082481
.98272503 .41121836 .64558349
.99033787 .77729681 .29416039
.71370705 .20539921 .26587813
.98512608 .89747734 .76254533
.96960155 .30738752 .14897121
.75030293 .68145532 .41087870
.21230405 .73269715 .73381512
.73086424 .14354122 .52873405
.47083438 .14891224 .10083536
.73145755 .35613337 .75267260
.75077322 .39041884 .47102357
.95421907 .17904223 .69015451
.94808445 .62898333 .84392584
.72619026 .07340450 .79925147
.97875613 .57729091 .08383257
.74376616 .92479150 .32899117
.45603214 .98658159 .88326486
.49917406 .10237603 .36479508
.21118504 .34854003 .32074150

and the INCAR file

SYSTEM = GST225 !Name of the system
# NSW = 50000 !Number of steps for IOM
NSW = 500 !Number of steps for IOM
IBRION = 0 !Ion motion algorithm: 0 - Molecular Dynamics
SMASS = -1 !Temperature control flag: -1 temperature ramp
POTIM = 3.00 !Time-step for ion-motion
TEBEG = 600 !Initial temperature in K
TEEND = 600 !Final temperature in K
NBLOCK = 100 !Define ionic steps to calculate pcf and DOS. Scale temperature if SMASS=-1.
ISIF = 0 !0 - Calculate forces and relax ions
PREC = Low !determine ENCUT, NGX,Y,Z & ROPT
ENCUT = 175 !Cut-off energy for plane wave basis set in eV
ISYM = 0 !switch off symmetry for MD calculations
EDIFF = 1.0E-04 !SCF energy cutof
ISMEAR = 0 !determines how the partial occupancies are set for each orbital, default 1 (SIGMA 0.2)
SIGMA = 0.1 !For metals a sensible value is usually SIGMA= 0.2
IALGO = 48 !selects the algorithm for electronic minimization, 48 best for parallel
LREAL = T !projection operators in real or in reciprocal space? (small cells F, large cells T)
LPLANE = T !Parallelization, T is always faster for parallel vasp
NPAR = 2 !Parallelization, =1 for 1-16 cores, =2 for 32 cores, =4 for 64 cores
NSIM = 8 !Parallelization, =4 for 1-8 cores, =6 for 16 cores, =8 for 32-64 cores
NELMIN = 2 !minimum number of electronic SC steps, 2-4 for MD
MAXMIX = 50 !maximum number steps stored in Broyden mixer, optimal = 3x steps top converge 1st step
BMIX = 2.0 !cutoff wave vector for Kerker mixing scheme, default 1.0
LWAVE = F !write or not WAVECAR
LCHARG = F !write or not CHGCAR
APACO = 10.0 !distance for P.C. (rdf)

gmodegar · #2 Post by **gmodegar** » Thu Jul 25, 2013 4:45 pm

I had the same problem when running 5.3.3. I found two workarounds: (1) run 5.2.12 instead, or (2) run the serial version of VASP.

paulfons · #3 Post by **paulfons** » Mon Aug 05, 2013 6:34 am

Sorry about the tardy reply. I shall give it a try.

Best wishes,
Paul Fons

bernstei · #4 Post by **bernstei** » Thu Oct 03, 2013 10:48 pm

This problem is caused by calls to mpi_allgatherv in wavpre_noio.F. I'm not sure whether it's properly considered an MPI bug (on an IBM iDataPlex it happens with Intel MPI and IBM PE, but not with OpenMPI), a VASP bug, or just an ambiguity in the MPI standard. Basically, Intel MPI doesn't like the overlapping send and receive buffers. Such overlap is often forbidden by MPI (see MPI_IN_PLACE), but I don't see that issue mentioned in the MPI documentation for mpi_allgatherv specifically.

Anyway, I have a patch that I think fixes the problem, although I'm not quite done testing it. If anyone wants it, let me know.
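
For anyone who wants to look at the calls in question before patching, a minimal sketch for listing them follows. The helper name `find_allgatherv` and the assumption that the command is run from the VASP source directory are mine, not part of VASP; the pattern is matched case-insensitively because Fortran sources mix cases.

```shell
# Hypothetical helper: print line numbers of candidate Allgatherv call sites in a file.
find_allgatherv() {
    grep -n -i 'allgatherv' "$1"
}

# Assumed location: the current directory is the VASP source tree.
src=wavpre_noio.F
if [ -f "$src" ]; then
    find_allgatherv "$src"
else
    echo "$src not found; run this from the VASP source directory" >&2
fi
```

Any call where the same array appears as both the send and the receive argument is a candidate for the aliasing error shown in the log.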

bernstei · #5 Post by **bernstei** » Thu Oct 03, 2013 10:52 pm

In fact, I'd say that http://www.mpi-forum.org/docs/mpi-20-html/node145.htm strongly implies that using the same variable for both send and receive in a collective is forbidden by the MPI standard, and therefore that what VASP does in wavpre_noio.F is invalid.

bernstei · #6 Post by **bernstei** » Wed Oct 09, 2013 4:21 pm

And it's easy to fix with MPI_IN_PLACE (which you pass instead of the sending buffer).

admin · #7 Post by **admin** » Wed Oct 09, 2013 5:12 pm

The problem will be fixed in the next sub-release of vasp.5.

cchang · #8 Post by **cchang** » Thu May 01, 2014 7:04 pm

For Intel MPI 4.1 onwards, you can set the environment variable I_MPI_COMPATIBILITY to 3 or 4 to get around this particular check. I got the same error with VASP 5.3.5 and Intel MPI 4.1.1 and 4.1.3, and verified that the previously failing job ran to completion with the variable set. See https://software.intel.com/en-us/forums/topic/392347 for more.
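
As a rough sketch of how that might look in a job script: the value 4 is one of the two values mentioned above, and the launch line is simply copied from the first post, so the core count and the executable name `vasp_gamma` are assumptions about the local setup. The guard only keeps the snippet harmless on a machine where those commands are not installed.

```shell
# Set the compatibility mode for this shell session only (3 is the other reported option).
export I_MPI_COMPATIBILITY=4

# Launch line taken from the first post; skipped if mpirun or vasp_gamma is not on PATH.
if command -v mpirun >/dev/null 2>&1 && command -v vasp_gamma >/dev/null 2>&1; then
    mpirun -np 16 vasp_gamma
else
    echo "mpirun or vasp_gamma not found; I_MPI_COMPATIBILITY=$I_MPI_COMPATIBILITY is set for later use"
fi
```

Exporting the variable in the job script keeps the workaround local to that job rather than changing the environment for every MPI program on the system.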