Memory issue with TDDFT calculation
Posted: Fri May 02, 2025 10:20 am
by shweta_choudhary
Dear support team,
I am trying to run TDDFT calculations for a system of 264 atoms and I am facing the following error:
Code:
min. memory requirement per mpi rank 579.8 MB, per node 27828.9 MB
-----------------------------------------------------------------------------
| |
| EEEEEEE RRRRRR RRRRRR OOOOOOO RRRRRR ### ### ### |
| E R R R R O O R R ### ### ### |
| E R R R R O O R R ### ### ### |
| EEEEE RRRRRR RRRRRR O O RRRRRR # # # |
| E R R R R O O R R |
| E R R R R O O R R ### ### ### |
| EEEEEEE R R R R OOOOOOO R R ### ### ### |
| |
| Could not allocate body of response function on mpi rank 0 of size: |
| 0 MB. Reducing NOMEGAPAR or using more computing nodes might solve |
| this problem. |
| |
| ----> I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <---- |
| |
-----------------------------------------------------------------------------
However, I am requesting enough memory in my job script:
Code:
#!/bin/bash
#SBATCH -N 3
#SBATCH --ntasks-per-node=48
#SBATCH --job-name=k
#SBATCH --error=error.%J.err
#SBATCH --output=output.%J.out
#SBATCH --time=00-01:00:00
#SBATCH --partition=debug
#SBATCH --mem=672GB
source /home/VASP/vasp_var.sh
mpirun -np $SLURM_NTASKS /home/vasp.6.3.2/bin/vasp_std
Following is the INCAR:
Code:
ISTART = 1; ICHARG = 0; LWAVE=T; LCHARG=F
LREAL=A; ENCUT = 600; GGA= PS
ISMEAR = 0; SIGMA = 0.01
EDIFF = 1E-8; ISIF=2; NSW = 0; IBRION = -1
ALGO=TDHF; NBANDS = 1776; ANTIRES=0
IBSE=0; NBANDSO = 20 ; NBANDSV = 20; LORBITALREAL=T
LFXC =T; LHARTREE =T; LADDER=F; NOMEGAPAR=1
Could anyone please help with this?
Thanks,
Shweta
Re: Memory issue with TDDFT calculation
Posted: Fri May 02, 2025 12:27 pm
by fabien_tran1
Hi.
How much memory is available on each node of the cluster? The option --mem specifies the memory requirement per node (https://slurm.schedmd.com/archive/slurm ... batch.html), and the maximum possible value depends of course on what is available on the node. According to the error message, each node should provide at least 28 GB. Help regarding memory requirements can be found at wiki/index.php/Category:Memory.
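As a sanity check on the numbers in the error message, the per-node figure should simply be the per-rank figure multiplied by the number of MPI ranks on a node. A minimal Python sketch (the function name is illustrative and not part of VASP or Slurm):

```python
def ranks_per_node(per_node_mb: float, per_rank_mb: float) -> int:
    """Infer how many MPI ranks share a node from the two reported minima."""
    return round(per_node_mb / per_rank_mb)

# Values copied from the error message in the first post.
per_rank_mb = 579.8
per_node_mb = 27828.9

n = ranks_per_node(per_node_mb, per_rank_mb)
print(f"ranks per node implied by the message: {n}")
print(f"reported minimum per node: {per_node_mb / 1000:.1f} GB")
```

The result is 48, which matches --ntasks-per-node=48 in the job script. The reported minimum of about 28 GB per node is far below the 672 GB requested with --mem.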
Re: Memory issue with TDDFT calculation
Posted: Fri May 02, 2025 8:12 pm
by shweta_choudhary
Hi,
Each node has 750 GB available. I have already tried reducing NTAUPAR and NOMEGAPAR to 1 and NOMEGA to 30, but nothing seems to work. I do not understand why this memory issue is happening. Is VASP not able to read the --mem option?
Re: Memory issue with TDDFT calculation
Posted: Fri May 02, 2025 8:56 pm
by fabien_tran1
Can you please upload the files slurm_xxx.log and OUTCAR? Have you tried more nodes? (It seems that you are using 3 nodes.)
Re: Memory issue with TDDFT calculation
Posted: Sat May 03, 2025 8:00 am
by shweta_choudhary
Hi,
I tried more nodes as well, but the issue remains the same. Somehow, this forum is not letting me upload the OUTCAR and .out files, so I am writing the initial lines of the OUTCAR and log files here.
Code:
vasp.6.3.2 27Jun22 (build Mar 17 2025 15:04:40) complex
executed on LinuxIFC date 2025.05.02 15:40:50
running on 144 total cores
distrk: each k-point on 144 cores, 1 groups
distr: one band on NCORE= 1 cores, 144 groups
--------------------------------------------------------------------------------------------------------
INCAR:
ISTART = 1
ICHARG = 0
LWAVE = T
LCHARG = F
LREAL = A
ENCUT = 600
GGA = PS
ISMEAR = 0
SIGMA = 0.01
EDIFF = 1E-8
ISIF = 2
NSW = 0
IBRION = -1
ALGO = TDHF
NBANDS = 1776
ANTIRES = 0
IBSE = 0
NBANDSO = 20
NBANDSV = 20
LORBITALREAL = T
LFXC = T
LHARTREE = T
LADDER = F
NOMEGAPAR = 1
POTCAR: PAW_PBE Bi_d_GW 14Apr2014
POTCAR: PAW_PBE Br_GW 20Mar2012
POTCAR: PAW_PBE O_GW 28Sep2005
Code:
values below the HOMO (VB) or above the LUMO (CB) will cause erroneous energies
E-fermi : -0.4650
-----------------------------------------------------------------------------
WAVEDER not read: bands not compatible 1780 1872
-----------------------------------------------------------------------------
| |
| W W AA RRRRR N N II N N GGGG !!! |
| W W A A R R NN N II NN N G G !!! |
| W W A A R R N N N II N N N G !!! |
| W WW W AAAAAA RRRRR N N N II N N N G GGG ! |
| WW WW A A R R N NN II N NN G G |
| W W A A R R N N II N N GGGG !!! |
| |
| The derivative of the wavefunctions with respect to k (WAVEDER) can |
| not be found. You should redo the groundstate calculation using |
| LOPTICS=.TRUE. in order to write the WAVEDER file. However for |
| metals, the present setting is ok. |
| |
-----------------------------------------------------------------------------
the WAVEDER file was not read
energies w=
responsefunction array rank= 229968
LDA part: xc-table for Pade appr. of Perdew
min. memory requirement per mpi rank 579.8 MB, per node 27828.9 MB
allocating 0 responsefunctions rank=229968
-----------------------------------------------------------------------------
| |
| EEEEEEE RRRRRR RRRRRR OOOOOOO RRRRRR ### ### ### |
| E R R R R O O R R ### ### ### |
| E R R R R O O R R ### ### ### |
| EEEEE RRRRRR RRRRRR O O RRRRRR # # # |
| E R R R R O O R R |
| E R R R R O O R R ### ### ### |
| EEEEEEE R R R R OOOOOOO R R ### ### ### |
| |
| Could not allocate body of response function on mpi rank 0 of size: |
| 0 MB. Reducing NOMEGAPAR or using more computing nodes might solve |
| this problem. |
| |
| ----> I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <---- |
| |
-----------------------------------------------------------------------------
Re: Memory issue with TDDFT calculation
Posted: Thu May 08, 2025 8:42 am
by fabien_tran1
Sorry for the late answer. If possible, could you please provide all input files (INCAR, KPOINTS and POSCAR), and give more info about the computer cluster that you are using?
Re: Memory issue with TDDFT calculation
Posted: Sun May 11, 2025 3:34 pm
by shweta_choudhary
Dear sir,
INCAR:
Code:
ISTART = 1; ICHARG = 0; LWAVE=T; LCHARG=F
ENCUT = 600; GGA= PS
ISMEAR = 0; SIGMA = 0.01
EDIFF = 1E-8; ISIF=2; NSW = 0; IBRION = -1
ALGO=TDHF; NBANDS = 1872; ANTIRES=0
IBSE=0; NBANDSO = 20 ; NBANDSV = 20; LORBITALREAL=T
LFXC =T; LHARTREE =T; LADDER=F
KPOINTS:
POSCAR.tar
Re: Memory issue with TDDFT calculation
Posted: Mon May 12, 2025 8:27 am
by fabien_tran1
Is an upper limit for the virtual memory set on your nodes? Can you show what "ulimit -a" produces on the command line?
Re: Memory issue with TDDFT calculation
Posted: Mon May 12, 2025 8:35 am
by shweta_choudhary
Hi,
Code:
[shweta_cy.iitr@login02 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1541472
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1541472
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
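Note that the output above was taken on a login node (login02), and the limits inside a batch job on a compute node can differ. A minimal Python sketch, assuming a Linux system with Python available on the compute nodes, that prints the corresponding limits from within a job (for example, when called from the job script before the mpirun line):

```python
import resource

def describe_limit(label: str, which: int) -> str:
    """Return the soft and hard value of one resource limit as text."""
    def fmt(value: int) -> str:
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)
    soft, hard = resource.getrlimit(which)
    return f"{label}: soft={fmt(soft)} hard={fmt(hard)}"

# Python reports these limits in bytes, whereas "ulimit -a" prints kbytes.
LIMITS = [
    ("virtual memory (bytes)", resource.RLIMIT_AS),
    ("data seg size (bytes)", resource.RLIMIT_DATA),
    ("stack size (bytes)", resource.RLIMIT_STACK),
]

for label, which in LIMITS:
    print(describe_limit(label, which))
```

If any of these values is finite inside the job while "ulimit -a" on the login node shows "unlimited", the limit is being set by the batch environment rather than by the user shell.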
Re: Memory issue with TDDFT calculation
Posted: Mon May 12, 2025 9:53 am
by fabien_tran1
The virtual memory was set to unlimited, which is fine. Now, I would like to repeat your calculation, but I need all details:
-The steps of the calculation (DFT followed by TDDFT, etc.)
-INCAR files for all steps.
-OUTCAR and .out files for all steps (if they are too large then compress them with zip).
Re: Memory issue with TDDFT calculation
Posted: Thu May 15, 2025 5:52 am
by shweta_choudhary
Dear Sir,
I have attached the ZIP file.
Many thanks,
Shweta
Re: Memory issue with TDDFT calculation
Posted: Fri May 16, 2025 3:08 pm
by fabien_tran1
Hi,
Your calculation requires much more memory than what is indicated in the output files. Your system is quite large, and my colleague Alexey Tal (a specialist in BSE/TDDFT) will give you recommendations for reducing the memory requirement so that the calculation is hopefully feasible on your computer cluster.
However, before that, we would like you to provide us the correct input files, since you probably made some mistakes:
-The POTCAR file in the folder step3 is different from the other folders.
-The POSCAR files slightly differ.
-The value of NBANDS in step3 is different from step2, leading to the message "WAVEDER not read: bands not compatible 1780 1872".
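One possible origin of the NBANDS mismatch, stated here as an assumption that this thread does not confirm: in parallel runs VASP rounds NBANDS up to a multiple of the number of band groups (MPI ranks divided by NCORE, for KPAR=1), so the same INCAR value can yield different band counts when the steps run on different numbers of cores. A minimal Python sketch of that rounding (the function name is hypothetical):

```python
def effective_nbands(nbands: int, n_ranks: int, ncore: int = 1) -> int:
    """Round nbands up to the next multiple of the number of band groups."""
    groups = n_ranks // ncore
    return -(-nbands // groups) * groups  # ceiling division, then scale back

# NBANDS = 1776 from the posted INCAR; 144 ranks and NCORE = 1 from the OUTCAR header.
print(effective_nbands(1776, 144))
```

This gives 1872, the second number in the "bands not compatible" message. Under this assumption, running the ground-state step with an NBANDS value that is already a multiple of the group count used in the TDDFT step keeps the two consistent.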
Re: Memory issue with TDDFT calculation
Posted: Fri May 16, 2025 4:05 pm
by shweta_choudhary
Dear sir,
I used the GW POTCAR in step 3, as recommended in the VASP tutorial. Also, I have copied WAVECAR, CHGCAR and CONTCAR from the previous steps. Please confirm the correct procedure for this. Please also let me know how to resolve this memory issue for a large system. One node has around 700 GB of memory in our cluster. I could use up to 5 or 6 nodes.
Many thanks,
Shweta
Re: Memory issue with TDDFT calculation
Posted: Fri May 16, 2025 4:51 pm
by alexey.tal
Dear Shweta,
shweta_choudhary wrote: "I used the GW POTCAR in step 3, as recommended in the VASP tutorial. Also, I have copied WAVECAR, CHGCAR and CONTCAR from the previous steps. Please confirm the correct procedure for this."
It is necessary that all input files (WAVECAR, WAVEDER) in your TDDFT (ALGO=TDHF) calculation are produced with the same POTCAR file.
shweta_choudhary wrote: "Please also let me know how to resolve this memory issue for a large system. One node has around 700 GB of memory in our cluster. I could use up to 5 or 6 nodes."
Indeed, you are trying to perform a large calculation and the memory is likely to be an issue. However, there are a few things you could try to fit this calculation on your system.
The main bottleneck so far is the memory required to store the exchange-correlation kernel \(f_{xc}(G,G')\). As reported in your OUTCAR, the basis set for the response function is:
maximum number of plane-waves: 229856
The kernel storage then amounts to \(229856^2 \times 16 \times 10^{-9} \approx 845\) GB of memory. This array is not distributed, so with 700 GB per node, increasing the number of nodes will not help: you need to provide more memory to the first MPI rank. One way to reduce the size of this array is to reduce the basis set size of the response function, i.e., reduce ENCUTGW. Keep in mind, however, that the calculation needs to be thoroughly converged with respect to ENCUTGW.
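The kernel estimate above can be reproduced, and inverted to find a basis size that would fit in a given amount of memory, with a short Python sketch. It assumes 16 bytes per matrix element (one double-precision complex number, inferred from the factor 16E-9) and that the number of plane waves grows as ENCUTGW to the power 3/2 at fixed cell volume; the function names are illustrative.

```python
import math

BYTES_PER_ELEMENT = 16  # assumed: one double-precision complex number

def kernel_gb(n_pw: int) -> float:
    """Memory in GB (10^9 bytes) for an n_pw x n_pw kernel matrix."""
    return n_pw ** 2 * BYTES_PER_ELEMENT * 1e-9

def max_planewaves(mem_gb: float) -> int:
    """Largest n_pw whose kernel matrix fits in mem_gb."""
    return math.isqrt(int(mem_gb * 1e9) // BYTES_PER_ELEMENT)

def cutoff_scale(n_target: int, n_current: int) -> float:
    """Factor to apply to ENCUTGW, assuming n_pw grows as ENCUTGW**1.5."""
    return (n_target / n_current) ** (2.0 / 3.0)

n_now = 229856                 # from the OUTCAR line quoted above
n_fit = max_planewaves(700.0)  # 700 GB: approximate memory of one node
print(f"kernel now: {kernel_gb(n_now):.0f} GB")
print(f"largest basis for 700 GB: {n_fit} plane waves")
print(f"ENCUTGW scale factor: {cutoff_scale(n_fit, n_now):.2f}")
```

Since 700 GB for the kernel alone would leave no room for anything else on the node, a realistic target is well below that, and the result still has to be checked for convergence with respect to ENCUTGW.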
The number of bands included in the TDHF kernel calculation is very small in your OUTCAR, i.e., NBANDSO=NBANDSV=20. Since you have 2688 electrons, or 1344 occupied bands, you only account for about 1.5% of the occupied bands in your calculation, which is too few. For such a system you likely need hundreds or thousands of occupied and unoccupied bands. The rank of the Casida equation in the TDHF algorithm is NBANDSO x NBANDSV x NKPTS, so you will need to solve a really large matrix to get a reasonably converged spectrum.
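To get a feeling for how quickly the Casida problem grows with the number of bands, here is a rough Python sketch. It assumes NKPTS=1 (the KPOINTS file is not shown in this thread) and a dense matrix of double-precision complex numbers; the larger band counts are purely illustrative, not recommendations.

```python
def casida_rank(nbandso: int, nbandsv: int, nkpts: int) -> int:
    """Rank of the Casida matrix: NBANDSO x NBANDSV x NKPTS."""
    return nbandso * nbandsv * nkpts

def dense_matrix_gb(rank: int, bytes_per_element: int = 16) -> float:
    """Memory in GB for a dense rank x rank matrix (assumed complex double)."""
    return rank ** 2 * bytes_per_element * 1e-9

occupied = 2688 // 2  # 2688 electrons, two per band
print(f"fraction of occupied bands with NBANDSO=20: {20 / occupied:.1%}")

# (NBANDSO, NBANDSV) pairs: the posted setting, then two illustrative larger ones.
for nbo, nbv in [(20, 20), (200, 200), (500, 500)]:
    rank = casida_rank(nbo, nbv, nkpts=1)
    print(f"NBANDSO={nbo} NBANDSV={nbv}: rank={rank}, "
          f"dense storage={dense_matrix_gb(rank):.1f} GB")
```

Because the storage grows with the square of the rank, and the rank grows with the product of the two band counts, the band counts have to be converged carefully against the available memory.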
In VASP we have an alternative approach for the TDDFT calculation (ALGO=TIMEEV) which is discussed in detail on our wiki. This approach is much faster for calculations when the electron-hole interaction is not taken into account as in your case (LADDER=.FALSE.) and it requires much less memory.
Re: Memory issue with TDDFT calculation
Posted: Fri May 16, 2025 5:06 pm
by shweta_choudhary
Dear Alexey,
Thank you very much for the detailed response.
1. I will use the GW POTCAR throughout the whole procedure.
2. Could you please comment on whether BSE@GW, or TDHF with parameters from GW to account for the electron-hole interaction, is feasible at all for such large systems with our HPC configuration? I faced a similar memory issue during GW band structure calculations.
So, is optimizing ENCUTGW the only way? How can I utilize the full memory of the nodes? I could decrease ntasks per node down to 24.
3. Could you please comment on how to select NBANDSO and NBANDSV?
I really appreciate your help.