Random performance loss
Posted: Mon Jun 27, 2016 2:48 am
I'm having random performance loss running VASP 5.4.1 (fully patched) on our HPC cluster. The problem is not consistent: sometimes VASP runs fine, but in most cases it slows down by about 100x.
Whenever VASP runs slowly, top shows that not all MPI ranks are running at 100% CPU:
[a1692208@r1n25 ~]$ top -n 1 -b | grep a1692208
920 a1692208 20 0 736780 239644 21140 R 105.9 0.2 119:17.42 vasp_std
922 a1692208 20 0 736680 235648 21072 R 105.9 0.2 119:17.60 vasp_std
893 a1692208 20 0 735760 236952 21016 R 100.0 0.2 119:17.61 vasp_std
918 a1692208 20 0 736828 237476 21132 R 100.0 0.2 119:17.39 vasp_std
919 a1692208 20 0 736892 236204 21140 R 100.0 0.2 87:44.20 vasp_std
923 a1692208 20 0 735696 238568 20876 R 100.0 0.2 84:39.87 vasp_std
924 a1692208 20 0 736728 238988 21096 R 100.0 0.2 119:17.56 vasp_std
926 a1692208 20 0 736672 236384 21048 R 100.0 0.2 119:17.66 vasp_std
931 a1692208 20 0 735236 234536 20968 R 100.0 0.2 119:17.19 vasp_std
933 a1692208 20 0 735972 237736 20968 R 100.0 0.2 119:17.62 vasp_std
917 a1692208 20 0 736632 236532 20996 R 94.1 0.2 90:01.27 vasp_std
925 a1692208 20 0 736392 240340 20872 R 94.1 0.2 101:45.09 vasp_std
892 a1692208 20 0 755220 249680 21484 R 52.9 0.2 95:48.53 vasp_std
932 a1692208 20 0 735992 235668 20832 R 52.9 0.2 81:27.10 vasp_std
921 a1692208 20 0 736568 239828 21144 R 47.1 0.2 88:19.16 vasp_std
927 a1692208 20 0 736256 236704 20848 R 47.1 0.2 85:27.73 vasp_std
875 a1692208 20 0 113116 1508 1244 S 0.0 0.0 0:00.00 slurm_script
885 a1692208 20 0 135640 5220 3344 S 0.0 0.0 0:00.09 mpirun
10884 a1692208 20 0 137320 2184 992 S 0.0 0.0 0:00.00 sshd
10885 a1692208 20 0 118552 3240 1680 S 0.0 0.0 0:00.02 bash
10983 a1692208 20 0 132384 2076 1224 R 0.0 0.0 0:00.00 top
10984 a1692208 20 0 112644 964 832 S 0.0 0.0 0:00.00 grep
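In case process placement is part of the problem, here is a minimal sketch of how I could launch the job with explicit core binding and have OpenMPI report where each rank ends up (the node and rank counts are placeholders, and the module names are only examples, not necessarily what our cluster uses):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16              # placeholder: one MPI rank per physical core
#SBATCH --time=02:00:00

module load openmpi/1.8.8 intel  # example module names

# --bind-to core pins each rank to its own core; --report-bindings prints the
# resulting placement to stderr so I can see whether ranks end up sharing cores
mpirun -np 16 --bind-to core --report-bindings ./vasp_std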
I compiled VASP using OpenMPI 1.8.8 and ifort, and linked against MKL without -DscaLAPACK (ScaLAPACK is not available on our cluster). Basically I altered FC and FCL according to the Intel MKL Link Line Advisor, removed the -DscaLAPACK preprocessor flag, and changed CACHE_SIZE to 16000 as suggested in the VASP manual. Here is my makefile.include:

# Precompiler options
CPP_OPTIONS= -DMPI -DHOST=\"IFC91_ompi_phoenix\" -DIFC \
             -DCACHE_SIZE=16000 -DPGF90 -Davoidalloc \
             -DMPI_BLOCK=8000 -Duse_collective \
             -DnoAugXCmeta -Duse_bse_te \
             -Duse_shmem -Dtbdyn \
             -DnoSTOPCAR
CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)
FC = mpifort -qopenmp -I${MKLROOT}/include
FCL = mpifort -mkl -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a \
      ${MKLROOT}/lib/intel64/libmkl_core.a \
      ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group
FREE = -free -names lowercase
FFLAGS = -assume byterecl -heap-arrays 64
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0
MKL_PATH = $(MKLROOT)/lib/intel64
BLAS =
LAPACK =
BLACS =
SCALAPACK =
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o \
          /home/a1692208/vasp5.4/fftw3xf/libfftw3xf_intel.a
INCS =-I$(MKLROOT)/include/fftw
LLIBS = $(SCALAPACK) $(LAPACK) $(BLAS) -lpthread -lm -ldl
OBJECTS_O1 += fft3dfurth.o fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = icc
CFLAGS_LIB = -O
OBJECTS_LIB= linpack_double.o getshmem.o
# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin
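One thing I notice is that FCL links the threaded MKL (libmkl_intel_thread) and FC carries -qopenmp, so each MPI rank could spawn extra MKL/OpenMP threads. A minimal sketch of what I could add to the job script to rule out thread oversubscription (these are the standard Intel MKL/OpenMP environment variables; the values are just a first guess):

# force one thread per MPI rank so the threaded MKL cannot oversubscribe cores
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export MKL_DYNAMIC=FALSE

mpirun -np 16 ./vasp_std         # placeholder rank count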
I'm not sure whether the random performance loss is due to my compilation or to the configuration of the cluster, because I couldn't find anything similar on Google. Please share your experience; any kind of help is welcome. Thank you very much.