Hi Andreas!
Thanks a lot for having a look at this!!!
I can confirm that the graph in my previous post shows the same system, just with a longer runtime, as you suggested.
I have also done more tests on my side using the same system but with a simplified makefile.include (attached) that includes only the changes necessary to compile on our machine: a small local HPC cluster with 20 GPU nodes, each with two Nvidia K80 GPUs. This is the system I'm trying to compile the VASP ACC version for. I have also seen this memory leak on a different system with Nvidia V100 GPUs running a different (bigger) calculation, so my guess is that it is not related to our specific GPUs or to my specific calculation.
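Just for context, the toolchain is NVHPC (OpenACC + OpenMP) together with MKL and Open MPI. Roughly speaking, the relevant part of the setup looks like this; this is only a simplified illustration with standard NVHPC flags, not a copy of the attached makefile.include:
Code:
# simplified illustration only, not the attached makefile.include
FC    = mpif90 -acc -mp        # NVHPC Fortran with OpenACC and OpenMP enabled
FCL   = mpif90 -acc -mp
LLIBS = -Mmkl -L${MKLROOT}/lib/intel64 \
        -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64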
However, I did find out that the memory leak is related to the use of OpenMP parallelization.
With OMP_NUM_THREADS=1 the memory usage is stable, whereas with OMP_NUM_THREADS=8 it keeps growing, as seen in the figure.
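The two runs are otherwise identical; only the thread count differs, roughly like this (the mpirun options and executable name are just an example, not copied from my job script):
Code:
# stable memory usage: pure MPI
export OMP_NUM_THREADS=1
mpirun -np 4 vasp_std

# growing memory usage: 8 OpenMP threads per MPI rank
export OMP_NUM_THREADS=8
mpirun -np 4 vasp_std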
This, together with other problems I have experienced with the NVHPC+MKL+OpenMP combination, makes me think it's related to OpenMP.
For example, linking MKL as follows
Code:
-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_openmpi_lp64 -liomp5 -lpthread -lm -ldl
results in a segmentation fault, but linking with the following
Code:
-L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_pgi_thread -lmkl_core -lmkl_blacs_openmpi_lp64 -pgf90libs -mp -lpthread -lm -ldl
or
Code:
-Mmkl -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64
works fine (except for the memory leak).
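My guess, though I have not verified it in detail, is that the segfaulting variant mixes two OpenMP runtimes: mkl_intel_thread expects Intel's libiomp5, while the NVHPC-compiled code pulls in NVHPC's own OpenMP runtime, and the mkl_pgi_thread/-Mmkl variants avoid that mix. A quick way to check which runtimes actually end up in the executable (bin/vasp_std is just the default build location):
Code:
# list the OpenMP and MKL runtime libraries the binary is linked against
ldd bin/vasp_std | grep -Ei 'omp|mkl'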
It would be very nice if we could solve this problem, since running without OpenMP parallelization costs a lot of performance: throughput is around 50% higher with OMP_NUM_THREADS=8 than with OMP_NUM_THREADS=1.
/Daniel