
HSE06 calculation stuck at 5th electronic step.

Posted: Thu Apr 17, 2025 1:23 pm
by hszhao.cn@gmail.com

Hello,

I'm running an HSE06 calculation for a Ga-O system (20 atoms total) using VASP 6.5.0 compiled with Intel oneAPI 2024.2.0. The calculation appears to be stuck at the 5th electronic step: it completes four steps normally but then hangs without producing further output. See below for more details:

Code: Select all

werner@x13dai-t:~/Desktop/hse$ module load vasp
Notice: Generated new FI_PSM3_UUID: 102917a7-c52f-4801-941f-9483e72ae628
Loads the hdf5/1.14.4_3-oneapi.2024.2.0 environment.
Loads the wannier90/v3.1.0-serial-oneapi.2024.2.0 environment.

werner@x13dai-t:~/Desktop/hse$ mpirun -n 36 vasp_std
 running   36 mpi-ranks, on    1 nodes
 distrk:  each k-point on   36 cores,    1 groups
 distr:  one band on    1 cores,   36 groups
 vasp.6.5.0 16Dec24 (build Dec 29 2024 16:03:31) complex                        
  
 POSCAR found type information on POSCAR GaO 
 POSCAR found :  2 types and      20 ions
 Reading from existing POTCAR
 scaLAPACK will be used
 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|     For optimal performance we recommend to set                             |
|       NCORE = 2 up to number-of-cores-per-socket                            |
|     NCORE specifies how many cores store one orbital (NPAR=cpu/NCORE).      |
|     This setting can greatly improve the performance of VASP for DFT.       |
|     The default, NCORE=1 might be grossly inefficient on modern             |
|     multi-core architectures or massively parallel machines. Do your        |
|     own testing! More info at https://www.vasp.at/wiki/index.php/NCORE      |
|     Unfortunately you need to use the default for GW and RPA                |
|     calculations (for HF NCORE is supported but not extensively tested      |
|     yet).                                                                   |
|                                                                             |
 -----------------------------------------------------------------------------

 Reading from existing POTCAR
 -----------------------------------------------------------------------------
|                                                                             |
|               ----> ADVICE to this user running VASP <----                  |
|                                                                             |
|     You have a (more or less) 'large supercell' and for larger cells it     |
|     might be more efficient to use real-space projection operators.         |
|     Therefore, try LREAL= Auto in the INCAR file.                           |
|     Mind: For very accurate calculation, you might also keep the            |
|     reciprocal projection scheme (i.e. LREAL=.FALSE.).                      |
|                                                                             |
 -----------------------------------------------------------------------------

 LDA part: xc-table for (Slater+PW92), standard interpolation
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1    -0.239317225891E+04   -0.23932E+04   -0.94132E+04 42192   0.311E+03
DAV:   2    -0.395436549122E+04   -0.15612E+04   -0.15127E+04 43092   0.637E+02
DAV:   3    -0.412481440963E+04   -0.17045E+03   -0.16981E+03 47340   0.161E+02
DAV:   4    -0.412954497748E+04   -0.47306E+01   -0.47290E+01 46116   0.306E+01

werner@x13dai-t:~/Desktop/hse$ grep LOOP OUTCAR 
      LOOP:  cpu time      7.7931: real time      7.8180
      LOOP:  cpu time      7.8538: real time      7.8782
      LOOP:  cpu time      9.0585: real time      9.0875
      LOOP:  cpu time      8.4907: real time      8.5186

It then seems to hang at the 5th step; I've waited several hours with no progress.

Any advice would be greatly appreciated.

Thank you!
Zhao


Re: HSE06 calculation stuck at 5th electronic step.

Posted: Fri Apr 18, 2025 11:24 am
by michael_wolloch

Dear Zhao,

I am pretty sure that your calculation is not "stuck" but is taking a very long time due to your very tight computational settings.

Your k-point mesh is very dense for an insulator (PBE gives nearly a 2 eV gap), and your ENCUT value is 130% of the largest ENMAX in your POTCAR file. While increasing ENCUT is recommended for most relaxations, it might not be necessary here.
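
As a quick check, the ENMAX of each species is printed in the POTCAR itself, so a plain grep shows where your cutoff sits:

Code: Select all

grep ENMAX POTCAR   # one "ENMAX = ...; ENMIN = ... eV" line per species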

The first four steps still complete quickly because they are not yet self-consistent: the Hamiltonian is not updated during the initial delay (NELMDL defaults to -5 for ALGO = Normal). Afterwards, the computational effort increases by orders of magnitude because the exact exchange must now be evaluated.
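
If you ever need to adjust that delay, it is a single INCAR tag; a minimal sketch (the -5 merely restates the default mentioned above):

Code: Select all

NELMDL = -5   ! freeze the Hamiltonian for the first 5 electronic steps
              ! negative sign: apply the delay only once at startup, not after every ionic step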

You are also not using the parallelization options that could speed up your calculation considerably. Please read the wiki section on parallelization, especially the articles on KPAR and NCORE (see also the comments in the INCAR below).

Additionally, you can downsample the k-point grid used for the Hartree-Fock operator via NKRED (see the note in the INCAR below).

I would recommend starting with the following KPOINTS and INCAR files, but adapt the parallelization settings to your core count:

Code: Select all

reduced K-mesh
0 
Gamma
   2  8  4
   0  0  0

This reduces the total number of k-points from 171 to 30!

Code: Select all

# I/O
ISTART = 0
ICHARG = 2
PREC = N
LREAL = .FALSE.

# Electronic Relaxation
ENCUT = 400
ALGO = N
NELM = 500
EDIFF = 1E-06

# Ionic Relaxation
IBRION = -1 
NSW = 0

# DOS related values
ISMEAR = 0
SIGMA = 0.1
LORBIT = 11
NEDOS = 2000
LMAXMIX = 4

# HSE06
LHFCALC = .TRUE.
HFSCREEN = 0.2
AEXX = 0.25
ALDAX = 0.75
NKRED = 2
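# note: NKRED must divide each k-mesh subdivision; 2 divides all of (2 8 4)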

# Parallelization
NCORE = 7
KPAR = 4
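# with 56 MPI ranks this factorizes as 56 = KPAR (4) x NPAR (2) x NCORE (7):
# 4 k-point groups of 14 cores, each split into 2 band groups of 7 cores per orbital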

These settings allowed me to finish the calculation in slightly less than 30 minutes on 56 cores, using VASP 6.5.0 and a very similar toolchain (oneAPI 2024.2.1).

Once you have finished this computation, you can carefully increase settings (increase ENCUT, remove NKRED, increase KPOINTS density) one by one and monitor the changes, and thus converge your results.
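
If helpful, such a sweep is easy to script; a minimal bash sketch (the cutoff values and directory names are only placeholders):

Code: Select all

for e in 450 500 550; do
  mkdir -p encut_$e
  cp INCAR KPOINTS POSCAR POTCAR encut_$e/
  sed -i "s/^ENCUT = .*/ENCUT = $e/" encut_$e/INCAR   # swap in the new cutoff
  (cd encut_$e && mpirun -n 56 vasp_std > stdout.log)
done
grep TOTEN encut_*/OUTCAR   # compare the final total energies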

Cheers, Michael


Re: HSE06 calculation stuck at 5th electronic step.

Posted: Fri Apr 18, 2025 1:57 pm
by hszhao.cn@gmail.com
michael_wolloch wrote: Fri Apr 18, 2025 11:24 am

I am pretty sure that your calculation is not "stuck" but is taking a very long time due to your very tight computational settings. […]

Thank you very much for your thorough analysis and tips. Below are my results using the settings you suggested:

Code: Select all

werner@x13dai-t:~/Desktop/hse$ module load vasp
Notice: Generated new FI_PSM3_UUID: 59ac24c1-d1f1-4aac-8331-c8f8a8d1ef08
Loads the hdf5/1.14.4_3-oneapi.2024.2.0 environment.
Loads the wannier90/v3.1.0-serial-oneapi.2024.2.0 environment.
Loads the dftd4/devhub-oneapi.2024.2.0 environment.
Loads the vasp/6.5.0-oneapi.2024.2.0 environment.

werner@x13dai-t:~/Desktop/hse$ mpirun -n 56 vasp_std
 running   56 mpi-ranks, on    1 nodes
 distrk:  each k-point on   14 cores,    4 groups
 distr:  one band on    7 cores,    2 groups
 vasp.6.5.0 16Dec24 (build Dec 29 2024 16:03:31) complex                        
  
 POSCAR found type information on POSCAR GaO 
 POSCAR found :  2 types and      20 ions
 Reading from existing POTCAR
 scaLAPACK will be used
 Reading from existing POTCAR
 -----------------------------------------------------------------------------
|                                                                             |
|               ----> ADVICE to this user running VASP <----                  |
|                                                                             |
|     You have a (more or less) 'large supercell' and for larger cells it     |
|     might be more efficient to use real-space projection operators.         |
|     Therefore, try LREAL= Auto in the INCAR file.                           |
|     Mind: For very accurate calculation, you might also keep the            |
|     reciprocal projection scheme (i.e. LREAL=.FALSE.).                      |
|                                                                             |
 -----------------------------------------------------------------------------

 LDA part: xc-table for (Slater+PW92), standard interpolation
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1    -0.116334229804E+04   -0.11633E+04   -0.61405E+04  7436   0.274E+03
DAV:   2    -0.341301201384E+04   -0.22497E+04   -0.20892E+04  6424   0.836E+02
DAV:   3    -0.406429467335E+04   -0.65128E+03   -0.61989E+03  8406   0.280E+02
DAV:   4    -0.412539612383E+04   -0.61101E+02   -0.60185E+02  7232   0.928E+01
DAV:   5    -0.171634078867E+03    0.39538E+04   -0.52210E+01  9368   0.666E+01    0.262E+01
DAV:   6    -0.148185133112E+03    0.23449E+02   -0.70506E+01 10248   0.533E+01    0.772E+00
DAV:   7    -0.151119114007E+03   -0.29340E+01   -0.16852E+01  7276   0.156E+01    0.754E+00
DAV:   8    -0.150820657329E+03    0.29846E+00   -0.13795E+00  7824   0.956E+00    0.418E+00
DAV:   9    -0.150702586055E+03    0.11807E+00   -0.69428E-01  7024   0.341E+00    0.170E+00
DAV:  10    -0.150705976033E+03   -0.33900E-02   -0.10245E-01  9344   0.219E+00    0.904E-01
DAV:  11    -0.150698363444E+03    0.76126E-02   -0.17251E-02 10360   0.105E+00    0.535E-01
DAV:  12    -0.150697664817E+03    0.69863E-03   -0.66222E-03  8792   0.468E-01    0.343E-01
DAV:  13    -0.150697948600E+03   -0.28378E-03   -0.21277E-03  7424   0.276E-01    0.216E-01
DAV:  14    -0.150698081118E+03   -0.13252E-03   -0.35218E-04  8720   0.215E-01    0.127E-01
DAV:  15    -0.150698190249E+03   -0.10913E-03   -0.19821E-04  8048   0.747E-02    0.755E-02
DAV:  16    -0.150698249882E+03   -0.59633E-04   -0.37937E-05  9160   0.471E-02    0.458E-02
DAV:  17    -0.150698281681E+03   -0.31799E-04   -0.21923E-05  7680   0.238E-02    0.287E-02
DAV:  18    -0.150698298342E+03   -0.16662E-04   -0.75384E-06  6776   0.169E-02    0.181E-02
DAV:  19    -0.150698306514E+03   -0.81722E-05   -0.34125E-06  5632   0.112E-02    0.115E-02
DAV:  20    -0.150698310428E+03   -0.39134E-05   -0.11467E-06  4576   0.106E-02    0.720E-03
DAV:  21    -0.150698312332E+03   -0.19046E-05   -0.10347E-06  4280   0.694E-03    0.459E-03
DAV:  22    -0.150698313278E+03   -0.94598E-06   -0.54720E-07  3896   0.708E-03
   1 F= -.15069831E+03 E0= -.15069831E+03  d E =-.127582E-10
 writing wavefunctions
 
werner@x13dai-t:~/Desktop/hse$ grep LOOP OUTCAR 
      LOOP:  cpu time      0.4527: real time      0.4632
      LOOP:  cpu time      0.3757: real time      0.3830
      LOOP:  cpu time      0.4518: real time      0.4518
      LOOP:  cpu time      0.3996: real time      0.3996
      LOOP:  cpu time     52.5836: real time     52.5990
      LOOP:  cpu time     52.1835: real time     52.1927
      LOOP:  cpu time     52.4087: real time     52.4192
      LOOP:  cpu time     52.3937: real time     52.4080
      LOOP:  cpu time     52.4973: real time     52.5091
      LOOP:  cpu time     52.5205: real time     52.5314
      LOOP:  cpu time     52.6457: real time     52.6567
      LOOP:  cpu time     52.3471: real time     52.3573
      LOOP:  cpu time     52.0123: real time     52.0232
      LOOP:  cpu time     52.1988: real time     52.2110
      LOOP:  cpu time     51.5490: real time     51.5606
      LOOP:  cpu time     52.3595: real time     52.3743
      LOOP:  cpu time     51.9696: real time     51.9854
      LOOP:  cpu time     51.0769: real time     51.0914
      LOOP:  cpu time     51.1291: real time     51.1442
      LOOP:  cpu time     51.2979: real time     51.3119
      LOOP:  cpu time     51.3331: real time     51.3473
      LOOP:  cpu time     51.5223: real time     51.5365
     LOOP+:  cpu time   1001.6969: real time   1001.9695
     
werner@x13dai-t:~/Desktop/hse$ inxi -CM
Machine:
  Type: Unknown System: Supermicro product: Super Server v: 0123456789
    serial: 0123456789
  Mobo: Supermicro model: X13DAI-T v: 1.01 serial: WM23AS002622
    UEFI: American Megatrends LLC. v: 2.1 date: 12/14/2023
CPU:
  Info: 2x 48-core model: Intel Xeon Platinum 8488C bits: 64 type: MCP SMP
    cache: L2: 2x 96 MiB (192 MiB)
  Speed (MHz): avg: 904 min/max: 800/3800 cores: 1: 800 2: 3800 3: 800
    4: 800 5: 800 6: 800 7: 800 8: 800 9: 800 10: 800 11: 800 12: 800 13: 800
    14: 800 15: 800 16: 800 17: 800 18: 800 19: 800 20: 800 21: 800 22: 800
    23: 800 24: 800 25: 800 26: 800 27: 800 28: 800 29: 783 30: 800 31: 800
    32: 800 33: 800 34: 800 35: 800 36: 800 37: 800 38: 800 39: 800 40: 765
    41: 799 42: 800 43: 800 44: 800 45: 800 46: 800 47: 795 48: 800 49: 800
    50: 800 51: 800 52: 1400 53: 800 54: 800 55: 800 56: 1132 57: 794 58: 795
    59: 800 60: 2401 61: 2400 62: 3700 63: 800 64: 800 65: 800 66: 800 67: 800
    68: 800 69: 800 70: 800 71: 800 72: 800 73: 800 74: 800 75: 800 76: 800
    77: 800 78: 800 79: 800 80: 885 81: 800 82: 800 83: 800 84: 800 85: 800
    86: 800 87: 800 88: 800 89: 800 90: 800 91: 800 92: 800 93: 800 94: 800
    95: 800 96: 800
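
For the record, the LOOP real times above can be totaled directly (field 7 of each line is the real time):

Code: Select all

grep ' LOOP:' OUTCAR | awk '{sum += $7} END {printf "SCF real time: %.1f s over %d steps\n", sum, NR}'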

michael_wolloch wrote: Fri Apr 18, 2025 11:24 am

Once you have finished this computation, you can carefully increase settings (increase ENCUT, remove NKRED, increase KPOINTS density) one by one and monitor the changes, and thus converge your results.

Cheers, Michael

Specifically, what changes should I monitor?

Regards,
Zhao