Whenever VASP runs slowly, not all MPI ranks are running at 100% CPU
I compiled VASP using OpenMPI 1.8.8 and ifort, linked against MKL without -DscaLAPACK (ScaLAPACK is not available on our cluster). Here is the output of top (my makefile.include follows further down):
[a1692208@r1n25 ~]$ top -n 1 -b | grep a1692208
920 a1692208 20 0 736780 239644 21140 R 105.9 0.2 119:17.42 vasp_std
922 a1692208 20 0 736680 235648 21072 R 105.9 0.2 119:17.60 vasp_std
893 a1692208 20 0 735760 236952 21016 R 100.0 0.2 119:17.61 vasp_std
918 a1692208 20 0 736828 237476 21132 R 100.0 0.2 119:17.39 vasp_std
919 a1692208 20 0 736892 236204 21140 R 100.0 0.2 87:44.20 vasp_std
923 a1692208 20 0 735696 238568 20876 R 100.0 0.2 84:39.87 vasp_std
924 a1692208 20 0 736728 238988 21096 R 100.0 0.2 119:17.56 vasp_std
926 a1692208 20 0 736672 236384 21048 R 100.0 0.2 119:17.66 vasp_std
931 a1692208 20 0 735236 234536 20968 R 100.0 0.2 119:17.19 vasp_std
933 a1692208 20 0 735972 237736 20968 R 100.0 0.2 119:17.62 vasp_std
917 a1692208 20 0 736632 236532 20996 R 94.1 0.2 90:01.27 vasp_std
925 a1692208 20 0 736392 240340 20872 R 94.1 0.2 101:45.09 vasp_std
892 a1692208 20 0 755220 249680 21484 R 52.9 0.2 95:48.53 vasp_std
932 a1692208 20 0 735992 235668 20832 R 52.9 0.2 81:27.10 vasp_std
921 a1692208 20 0 736568 239828 21144 R 47.1 0.2 88:19.16 vasp_std
927 a1692208 20 0 736256 236704 20848 R 47.1 0.2 85:27.73 vasp_std
875 a1692208 20 0 113116 1508 1244 S 0.0 0.0 0:00.00 slurm_script
885 a1692208 20 0 135640 5220 3344 S 0.0 0.0 0:00.09 mpirun
10884 a1692208 20 0 137320 2184 992 S 0.0 0.0 0:00.00 sshd
10885 a1692208 20 0 118552 3240 1680 S 0.0 0.0 0:00.02 bash
10983 a1692208 20 0 132384 2076 1224 R 0.0 0.0 0:00.00 top
10984 a1692208 20 0 112644 964 832 S 0.0 0.0 0:00.00 grep
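To make the lagging ranks easier to spot, the same top output can be filtered with awk. This is only a rough sketch: the helper name slow_ranks and the 90% threshold are my own choices for illustration, and it assumes the default top column layout, where %CPU is the 9th field and the command name is the last one.

```shell
#!/bin/sh
# Hypothetical helper (not part of VASP or top): print the PID and %CPU of
# every vasp_std process whose %CPU is below the threshold passed as $1.
# Assumes the default top batch layout: field 9 is %CPU, last field is COMMAND.
slow_ranks() {
    threshold="$1"
    awk -v t="$threshold" '$NF == "vasp_std" && $9 + 0 < t { print $1, $9 }'
}

# Example with three lines copied from the output above; prints "892 52.9".
slow_ranks 90 <<'EOF'
920 a1692208 20 0 736780 239644 21140 R 105.9 0.2 119:17.42 vasp_std
892 a1692208 20 0 755220 249680 21484 R 52.9 0.2 95:48.53 vasp_std
885 a1692208 20 0 135640 5220 3344 S 0.0 0.0 0:00.09 mpirun
EOF
```

On a live node this would be run as top -n 1 -b | slow_ranks 90. A single top snapshot is noisy, so it is probably worth repeating it a few times before drawing conclusions.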
In my makefile.include (below), I altered FC and FCL according to the Intel MKL Link Line Advisor, removed the -DscaLAPACK preprocessor flag, and changed CACHE_SIZE to 16000 as suggested in the VASP manual.
# Precompiler options
CPP_OPTIONS= -DMPI -DHOST=\"IFC91_ompi_phoenix\" -DIFC \
-DCACHE_SIZE=16000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=8000 -Duse_collective \
-DnoAugXCmeta -Duse_bse_te \
-Duse_shmem -Dtbdyn \
-DnoSTOPCAR
CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)
FC = mpifort -qopenmp -I${MKLROOT}/include
FCL = mpifort -mkl -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a \
${MKLROOT}/lib/intel64/libmkl_core.a \
${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group
FREE = -free -names lowercase
FFLAGS = -assume byterecl -heap-arrays 64
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0
MKL_PATH = $(MKLROOT)/lib/intel64
BLAS =
LAPACK =
BLACS =
SCALAPACK =
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o \
/home/a1692208/vasp5.4/fftw3xf/libfftw3xf_intel.a
INCS =-I$(MKLROOT)/include/fftw
LLIBS = $(SCALAPACK) $(LAPACK) $(BLAS) -lpthread -lm -ldl
OBJECTS_O1 += fft3dfurth.o fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = icc
CFLAGS_LIB = -O
OBJECTS_LIB= linpack_double.o getshmem.o
# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin
I'm not sure whether the random performance loss is due to my compilation or to the cluster configuration, because I couldn't find anything similar on Google. Please share your experience; any kind of help is welcome. Thank you very much.