Not sure if this is a "bug" or if I am not setting a parameter correctly, but we encountered an issue when running VASP on many cores (58 nodes, 16 cores/node) on a big system. The final wave function file (WAVECAR) is about 102 GB, but a LOT of the time seems to be spent writing this file. This is from the OUTCAR:
General timing and accounting informations for this job:
========================================================
Total CPU time used (sec): 8211.169
User time (sec): 6686.418
System time (sec): 1524.751
Elapsed time (sec): 49008.315
In fileio.F it looks like multiple MPI processes write into the same record of the WAVECAR file; could this be the cause of the significant performance degradation? Did I make a mistake here, or are you aware of this? Is there a way to improve the I/O performance? Any help or hint would be greatly appreciated!
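To illustrate what I mean about the record writes: as far as I can tell the WAVECAR is an unformatted direct-access file, so every write targets an explicit record number. A rough sketch of such a write (not the actual fileio.F code; the unit number, record length, and record index below are made-up placeholders):

```fortran
program wavecar_write_sketch
  ! Rough illustration of a direct-access record write as used for WAVECAR.
  ! Not the actual fileio.F code; all sizes and indices are placeholders.
  implicit none
  integer, parameter :: irecl = 1048576   ! record length (RECL units are compiler-dependent; bytes assumed)
  complex(8)         :: cw(irecl/16)      ! one record worth of plane-wave coefficients
  integer            :: irec

  cw   = (0.0d0, 0.0d0)
  irec = 3                                ! e.g. the record assigned to one band/k-point

  open(unit=12, file='WAVECAR', access='direct', &
       form='unformatted', recl=irecl, status='unknown')
  write(12, rec=irec) cw                  ! each MPI rank addresses "its" records like this
  close(12)
end program wavecar_write_sketch
```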
- Newbie | Posts: 3 | Joined: Wed Oct 28, 2015 5:08 am | License Nr.: 5-1851
- Administrator | Posts: 2921 | Joined: Tue Aug 03, 2004 8:18 am | License Nr.: 458
Re: Low WAVECAR writing performance
If the large files are not necessary, one can simply switch off writing them.
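In the INCAR this corresponds to setting LWAVE = .FALSE. (and, if the charge-density files are not needed either, LCHARG = .FALSE.).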
- Newbie | Posts: 3 | Joined: Wed Oct 28, 2015 5:08 am | License Nr.: 5-1851
Re: Low WAVECAR writing performance
Thanks for this suggestion! It would work around the issue for cases where the wave functions are "easily" computed again from scratch (or from the charge density). Nevertheless, it is not desirable to spend such a huge amount of time writing a single file that is not THAT large.
So I guess my question is: Is this a known issue/known behavior, or an effect that only we see on our machine? If the former is the case, is there a fix planned?
- Newbie | Posts: 3 | Joined: Wed Oct 28, 2015 5:08 am | License Nr.: 5-1851
Re: Low WAVECAR writing performance
With the help of Victor Anisimov at NCSA we actually found a way to fix this issue: if no WAVECAR exists and a new file is written at the end of the job, rather than overwriting an existing WAVECAR, the slow-down is not observed. Hence, changing fileio.F such that the file written by VASP is not called WAVECAR (the name of the file read at the beginning of the job) fixed this problem for us.
Direct-access writing into an existing WAVECAR file on the Lustre high-performance file system causes every write command to request metadata from the MDS server, which significantly slows down the write operation. No such bottleneck exists when writing into a new file. When a large-scale VASP calculation is performed on a Lustre file system, it is therefore necessary to write the wave functions into a new file in order to avoid this metadata bottleneck.
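For illustration, the essence of the change is just to open a freshly created file for the direct-access writes instead of the pre-existing WAVECAR. A rough sketch (not the actual fileio.F patch; the file name WAVECAR.NEW, unit number, and record length are placeholders):

```fortran
program wavecar_newfile_sketch
  ! Rough illustration of the workaround, not the actual fileio.F patch.
  ! Writing into a freshly created file avoids the per-write metadata
  ! requests to the Lustre MDS that occur when overwriting an existing WAVECAR.
  implicit none
  integer, parameter :: irecl = 1048576   ! placeholder record length
  complex(8)         :: cw(irecl/16)      ! one record worth of plane-wave coefficients
  integer            :: irec

  cw   = (0.0d0, 0.0d0)
  irec = 3

  ! status='replace' (or 'new') guarantees that a brand-new file is created
  open(unit=12, file='WAVECAR.NEW', access='direct', &
       form='unformatted', recl=irecl, status='replace')
  write(12, rec=irec) cw
  close(12)
  ! after the job, the new file can be renamed back to WAVECAR for restarts
end program wavecar_newfile_sketch
```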