VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes

guyohad
Newbie
Posts: 14
Joined: Mon Feb 15, 2021 9:42 am


#1 Post by guyohad » Tue Nov 30, 2021 3:25 pm

Dear VASP developers,
Here is a minimal calculation that causes VASP 6.2.1 to hang sporadically in the middle of an SCF iteration. The more nodes we use, the more likely the calculation is to hang (with 4 nodes and a total of 96 MPI processes, it hangs 80% of the time). When it does not hang, it converges nicely. The calculation is a 2x2x2 supercell of GaAs with a k-grid consisting of just the Gamma point, using HSE and reading in a PBE WAVECAR as a starting point.

We think this is related to hybrid functionals because we do not see the problem when using PBE. We have tried various Intel compilers (including Intel 2019); this changes the percentage of calculations that hang but never fully removes the problem. We have included our makefile.include in the zip file. Additionally, we have run the test suite, and all calculations passed.
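For anyone who wants to put a number on "hangs X% of the time" when comparing builds, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `run_once` and `hang_rate` are hypothetical helper names, the launch command is whatever your site uses, and a timeout cannot tell a true hang from a slow run, so the limit should sit well above the normal runtime.

```python
import subprocess
import sys


def run_once(cmd, timeout_s):
    """Launch cmd once and classify the outcome as 'ok', 'failed', or 'hung'.

    'hung' only means the run exceeded timeout_s; pick a limit well above
    the usual wall time of a converging calculation.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout. MPI ranks started
        # by a launcher may need separate cleanup on a real cluster.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


def hang_rate(cmd, trials, timeout_s):
    """Repeat the launch and return (fraction of hung runs, list of outcomes)."""
    outcomes = [run_once(cmd, timeout_s) for _ in range(trials)]
    return outcomes.count("hung") / trials, outcomes


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with your own
    # launcher line (for example an mpirun call), which is site-specific.
    rate, outcomes = hang_rate([sys.executable, "-c", "pass"], trials=3, timeout_s=30)
    print(f"hang rate: {rate:.0%}", outcomes)
```

Running the same number of trials for each compiler or MPI build gives comparable fractions, although with only a handful of trials the estimate is rough.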

We appreciate any help identifying the source of this problem.
Sincerely,
Guy

henrique_miranda
Global Moderator
Posts: 483
Joined: Mon Nov 04, 2019 12:41 pm

#2 Post by henrique_miranda » Thu Dec 09, 2021 7:00 am

Thank you for the bug report.
We are trying to reproduce this issue on our side, but it is unlikely that we will see it.

In the meantime, there are a couple of things you could try that might help us narrow down where the problem is:
1. Compile the code with "-g -traceback -debug extended", run it, kill it when it hangs, and post the traceback here.
2. Compile with openmpi and check whether the problem persists.

#3 Post by guyohad » Tue Dec 14, 2021 11:52 am

Hi Henrique,
I compiled with openmpi and I still get the same problem. Attached are the tracebacks for both the openmpi version and the mpi only version. As you can see, they hang in the same location. Are you able to reproduce the error?
Best,
Guy

#4 Post by henrique_miranda » Sat Dec 18, 2021 3:46 pm

Thank you for the traceback.
This makes it clearer where the problem possibly is.

We have encountered issues with non-blocking communication in some MPI versions.
There is a toggle in mpi.F that you can uncomment to check whether it solves your problem:
!#define MPI_avoid_bcast
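
If you would rather script the change than edit by hand, the following Python sketch removes the leading `!` from such a line. It assumes the toggle appears exactly in the form shown above; `enable_define` is a hypothetical helper name and not part of VASP.

```python
import re


def enable_define(source_text, macro):
    """Uncomment lines of the form '!#define <macro>'.

    Returns (new_text, number_of_lines_changed). Lines that are already
    active are left alone, so calling this twice is harmless.
    """
    pattern = re.compile(
        r"^([ \t]*)![ \t]*(#define[ \t]+" + re.escape(macro) + r")\b",
        re.MULTILINE,
    )
    return pattern.subn(r"\1\2", source_text)


if __name__ == "__main__":
    sample = "!#define MPI_avoid_bcast\n"
    new_text, changed = enable_define(sample, "MPI_avoid_bcast")
    print(changed, new_text, end="")

    # Hypothetical usage on a real tree; adjust the path to your VASP source
    # and keep a backup of the original file before writing:
    # from pathlib import Path
    # path = Path("src/mpi.F")
    # new_text, changed = enable_define(path.read_text(), "MPI_avoid_bcast")
    # if changed:
    #     path.write_text(new_text)
```

After the edit, the code has to be recompiled for the toggle to take effect.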

#5 Post by guyohad » Sun Dec 19, 2021 1:49 pm

Hi Henrique,
I uncommented #define MPI_avoid_bcast; however, VASP still hangs 80% of the time. The traceback indicates that the calculation gets stuck at the same location in the code. What is the next thing we can try?
Thanks,
Guy

andreas.singraber
Global Moderator
Posts: 236
Joined: Mon Apr 26, 2021 7:40 am


#6 Post by andreas.singraber » Mon Dec 20, 2021 11:42 am

Hello Guy,

I could reproduce the hang-ups for VASP 6.2.1 on a machine with 44 cores and with the latest Intel compiler (2021.4). It seems the problem is coming from non-blocking MPI communication for which we had similar issues before. In the past the "culprit" was MPI_Ibcast and we could even write a little reproducer code snippet which strongly indicates that there is a problem with the Intel compiler/MPI. In your case there seems to be a similar issue with MPI_Ireduce... anyway, the upcoming VASP version will avoid these calls by default (usually without loss of performance) and we will re-evaluate at a later time whether non-blocking global communication calls work reliably.

So, at this point I have two potential solutions for you:

(1) Either you wait a few more days until the upcoming release and try directly with the newest version of VASP,

(2) or you copy the attached mpi.F into your VASP 6.2.1 src directory and recompile the whole code with the Intel compiler.

In my case both options worked, I hope it will solve the issue for you too! Could you please test it and report back, thanks!

All the best,

Andreas Singraber

#7 Post by guyohad » Tue Dec 21, 2021 2:39 pm

Thank you very much, Andreas! The new mpi.F file fixed the problem, and we even see a ~10% speedup.
Best,
Guy

#8 Post by andreas.singraber » Tue Dec 21, 2021 3:00 pm

Hi!

Great :-), thank you for reporting back!

Best,
Andreas
