Hi Andreas!
Thanks a lot for having a look at this!!!
I can confirm that the graph in my previous post shows the same system, just with a longer runtime, as you suggested.
I have also done more tests on my side using the same system but with a simplified makefile.include (attached) that includes only the changes necessary to compile on our machine: a small local HPC cluster with 20 GPU nodes, each with two Nvidia K80 GPUs. This is the system I'm trying to compile the VASP ACC version for. I have also seen this memory leak on a different system with Nvidia V100 GPUs running a different (bigger) calculation, so my guess is that it is not related to our specific GPUs or to my specific calculation.
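Just for context, the toolchain is NVHPC (OpenACC + OpenMP) together with MKL and Open MPI. Roughly speaking, the relevant part of the setup looks like this; this is only a simplified illustration with standard NVHPC flags, not a copy of the attached makefile.include:
Code:
# simplified illustration only, not the attached makefile.include
FC    = mpif90 -acc -mp        # NVHPC Fortran with OpenACC and OpenMP enabled
FCL   = mpif90 -acc -mp
LLIBS = -Mmkl -L${MKLROOT}/lib/intel64 \
        -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64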
However, I did find out that the memory leak is related to the use of OpenMP parallelization.
With OMP_NUM_THREADS=1 the memory usage is stable, whereas with OMP_NUM_THREADS=8 it keeps growing, as seen in the figure.
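The two runs are otherwise identical; only the thread count differs, roughly like this (the mpirun options and executable name are just an example, not copied from my job script):
Code:
# stable memory usage: pure MPI
export OMP_NUM_THREADS=1
mpirun -np 4 vasp_std

# growing memory usage: 8 OpenMP threads per MPI rank
export OMP_NUM_THREADS=8
mpirun -np 4 vasp_std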
This, together with other problems I have experienced with the NVHPC+MKL+OpenMP combination, makes me think it's related to OpenMP.
For example, linking MKL as follows
Code:
-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_openmpi_lp64 -liomp5 -lpthread -lm -ldl
results in a segmentation fault, but linking with the following
Code:
-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_pgi_thread -lmkl_core -lmkl_blacs_openmpi_lp64 -pgf90libs -mp -lpthread -lm -ldl
or
Code:
-Mmkl -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64
works fine (except for the memory leak).
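My guess, though I have not verified it in detail, is that the segfaulting variant mixes two OpenMP runtimes: mkl_intel_thread expects Intel's libiomp5, while the NVHPC-compiled code pulls in NVHPC's own OpenMP runtime, and the mkl_pgi_thread/-Mmkl variants avoid that mix. A quick way to check which runtimes actually end up in the executable (bin/vasp_std is just the default build location):
Code:
# list the OpenMP and MKL runtime libraries the binary is linked against
ldd bin/vasp_std | grep -Ei 'omp|mkl'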
It would be very nice if we could solve this problem, since running without OpenMP parallelization costs a lot of performance: throughput is around 50% higher with OMP_NUM_THREADS=8 than with OMP_NUM_THREADS=1.
/Daniel