VASP linear response problem keeps failing. Out of memory?

Posted: Sat Nov 09, 2013 5:55 pm
by bakkedal
I'm running linear response problem III on a spin-polarized system of 42 atoms (Fe36N6 supercell). The computation keeps failing without indicating exactly what went wrong.

I suspect that the job is running out of memory. What puzzles me, however, is that this happens only after 12 hours or more, and that there is no indication of memory problems in the form of error messages. The job just fails with a segmentation fault. I would expect that if the job failed to allocate the memory it requested, it would detect that and print an error in the output.

I'm running it on my local workstation (32 GB memory) and on a cluster (single node, 8 processors, 32 GB memory).

Any ideas on how to debug this problem?

I'm not specifying parallelization (NPAR), as that's not supported with linear response problems. VASP fails if I try that.

[INCAR]
ISMEAR = 1
VOSKOWN = 1
ISPIN = 2
MAGMOM = 36*3 6*0.5
PREC = HIGH
EDIFF = 1E-05
LCHARG = .FALSE.
LWAVE = .FALSE.
RANDOM_SEED = 1
IBRION = 8

[KPOINTS]
K-Points
0
Auto
45 ! Length

[Console output]
running on 6 total cores
distrk: each k-point on 6 cores, 1 groups
distr: one band on 1 cores, 6 groups
using from now: INCAR
vasp.5.3.3 18Dez12 (build Aug 09 2013 13:42:53) complex

POSCAR found type information on POSCAR Fe N
POSCAR found : 2 types and 42 ions
scaLAPACK will be used


WARNING: for PREC=h ENMAX is automatically increase by 25 %
this was not the case for versions prior to vasp.4.4
WARNING: for PREC=h ENMAX is automatically increase by 25 %
this was not the case for versions prior to vasp.4.4
LDA part: xc-table for Pade appr. of Perdew
generate k-points for: 6 6 5
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ...
WAVECAR not read
entering main loop
N E dE d eps ncg rms rms(c)
DAV: 1 0.304592402548E+04 0.30459E+04 -0.12490E+05 26568 0.160E+03
DAV: 2 0.987209509215E+02 -0.29472E+04 -0.27687E+04 26568 0.371E+02
DAV: 3 -0.337402588496E+03 -0.43612E+03 -0.35865E+03 28626 0.169E+02
DAV: 4 -0.389066717707E+03 -0.51664E+02 -0.47665E+02 39024 0.567E+01
DAV: 5 -0.390885138501E+03 -0.18184E+01 -0.17954E+01 35520 0.126E+01 0.729E+01
DAV: 6 -0.357217883508E+03 0.33667E+02 -0.44892E+02 32604 0.985E+01 0.532E+01
DAV: 7 -0.341369795827E+03 0.15848E+02 -0.67836E+01 31938 0.436E+01 0.239E+01
DAV: 8 -0.343553785063E+03 -0.21840E+01 -0.14914E+01 33168 0.782E+00 0.125E+01
DAV: 9 -0.342857664658E+03 0.69612E+00 -0.21096E+00 37692 0.554E+00 0.351E+00
DAV: 10 -0.342966148829E+03 -0.10848E+00 -0.66436E-01 30054 0.247E+00 0.145E+00
DAV: 11 -0.342967212784E+03 -0.10640E-02 -0.70143E-02 34548 0.695E-01 0.491E-01
DAV: 12 -0.342972814370E+03 -0.56016E-02 -0.30386E-02 33516 0.474E-01 0.414E-01
DAV: 13 -0.342972013272E+03 0.80110E-03 -0.10009E-03 34116 0.103E-01 0.245E-01
DAV: 14 -0.342971841916E+03 0.17136E-03 -0.14985E-03 37620 0.752E-02 0.763E-02
DAV: 15 -0.342971872052E+03 -0.30136E-04 -0.15219E-04 27000 0.398E-02 0.347E-02
DAV: 16 -0.342971875239E+03 -0.31864E-05 -0.85503E-06 16662 0.850E-03
1 F= -.34297188E+03 E0= -.34297948E+03 d E =0.228175E-01 mag= 87.5534
Linear response reoptimize wavefunctions to high precision
DAV: 1 -0.342971877135E+03 -0.18958E-05 -0.61269E-06 36432 0.698E-03
DAV: 2 -0.342971877147E+03 -0.12173E-07 -0.12102E-07 26946 0.136E-03
DAV: 3 -0.342971877147E+03 -0.18190E-09 -0.10727E-09 14460 0.995E-05
Linear response DOF= 4
Linear response progress:
Degree of freedom: 1/ 4
generate k-points for: 6 6 5
N E dE d eps ncg rms rms(c)
RMM: 1 -0.171116802305E+00 -0.17112E+00 -0.12199E-01172816 0.754E-01
RMM: 2 -0.164793427051E+00 0.63234E-02 -0.41623E-03 99925 0.281E-01 0.829E-01
RMM: 3 -0.169079119482E+00 -0.42857E-02 -0.72983E-03117291 0.252E-01 0.111E+00
RMM: 4 -0.171211599611E+00 -0.21325E-02 -0.85577E-03 93259 0.366E-01 0.122E+00
RMM: 5 -0.164667327575E+00 0.65443E-02 -0.21559E-03 92782 0.191E-01 0.236E-01
RMM: 6 -0.164582709631E+00 0.84618E-04 -0.26977E-04 94502 0.645E-02 0.141E-01
RMM: 7 -0.164622748492E+00 -0.40039E-04 -0.89592E-05 99835 0.384E-02 0.136E-01
RMM: 8 -0.164585936436E+00 0.36812E-04 -0.21273E-06 96321 0.310E-02 0.488E-02
RMM: 9 -0.164633815294E+00 -0.47879E-04 0.43418E-05110349 0.151E-02 0.511E-02
RMM: 10 -0.164632097392E+00 0.17179E-05 0.55945E-05 98532 0.105E-02 0.128E-02
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
vasp 0000000000E62448 Unknown Unknown Unknown
vasp 000000000113FBD7 Unknown Unknown Unknown
vasp 0000000000473791 Unknown Unknown Unknown
vasp 00000000004420DC Unknown Unknown Unknown
libc.so.6 00002B6BA955AEAD Unknown Unknown Unknown
vasp 0000000000441FB9 Unknown Unknown Unknown
--------------------------------------------------------------------------
mpiexec noticed that process rank 3 with PID 13725 on node wheezy2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Posted: Mon Nov 11, 2013 5:11 pm
by alex
Hi,

I would try

ALGO = N

in the INCAR. But I'm not sure.

Cheers,

alex

Posted: Mon Dec 02, 2013 11:20 am
by bakkedal
Hi,

That is the default value (http://cms.mpi.univie.ac.at/vasp/vasp/ALGO_tag.html), so it should already be enabled for this run. However, I now have reason to believe that the job was actually running out of memory. I collected some memory statistics in the background, and the machine was indeed running very low on memory:

[Before the job crashed]

Sat Nov 9 15:15:01 CET 2013
             total       used       free     shared    buffers     cached
Mem:         32169      31677        492          0        155        759
-/+ buffers/cache:      30761       1407
Swap:            0          0          0

[Just after the job crashed]

Sat Nov 9 15:16:01 CET 2013
             total       used       free     shared    buffers     cached
Mem:         32169       3876      28292          0        155        760
-/+ buffers/cache:       2960      29208
Swap:            0          0          0
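For anyone who wants to collect similar statistics, here is a minimal sketch of a background memory logger. It assumes a Linux machine where /proc/meminfo is readable; the function names (parse_meminfo, summarize, log_memory) are made up for this illustration and are not part of VASP or of the commands used for the snapshots above. The "used" figure is computed like the "-/+ buffers/cache" row, i.e. total minus free, buffers and cached.

```python
import time
from datetime import datetime


def parse_meminfo(text):
    """Map /proc/meminfo field names to values in MiB (kB entries only)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        # Skip entries without a kB unit (e.g. HugePages counters).
        if len(fields) == 2 and fields[1] == "kB":
            values[name.strip()] = int(fields[0]) // 1024
    return values


def summarize(mem):
    """Return total, used and available-like memory in MiB."""
    free_like = mem["MemFree"] + mem.get("Buffers", 0) + mem.get("Cached", 0)
    return {
        "total": mem["MemTotal"],
        "used": mem["MemTotal"] - free_like,
        "free": free_like,
    }


def log_memory(path="/proc/meminfo", interval=60, samples=None):
    """Print one timestamped summary every `interval` seconds.

    With samples=None the loop runs until the process is interrupted.
    """
    count = 0
    while samples is None or count < samples:
        with open(path) as handle:
            summary = summarize(parse_meminfo(handle.read()))
        print(datetime.now().isoformat(timespec="seconds"), summary, flush=True)
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)


# Example (not executed here): log once a minute until interrupted.
# log_memory(interval=60)
```

Started in a separate terminal or as a background process before the VASP run, with its output redirected to a file, this leaves a record that can be compared with the time of the crash.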

I'm still puzzled why it fails with a segmentation fault. If the process tries to allocate memory and the operating system can't deliver any more, the process should be able to diagnose that condition and show a proper error message. Maybe this is a bug? This is a linear response DFPT job.
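One thing that may be worth checking, offered as an assumption rather than a diagnosis: on Linux an allocation can appear to succeed and only fail later when the memory is actually used, and the kernel may then terminate a process on its own. If the kernel's OOM killer was involved, it usually leaves a trace in the kernel log. The sketch below scans log text (for example the output of `dmesg`) for typical OOM-killer phrases. The exact wording differs between kernel versions, so the patterns are assumptions, and the function names are hypothetical.

```python
import re
import subprocess

# Assumed phrases; kernel versions word these messages differently.
OOM_PATTERNS = [
    re.compile(r"invoked oom-killer", re.IGNORECASE),
    re.compile(r"out of memory: kill(ed)? process", re.IGNORECASE),
]


def find_oom_events(log_text):
    """Return the log lines that look like OOM-killer messages."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]


def read_kernel_log():
    """Return the kernel ring buffer as text (may need elevated privileges)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True)
    return result.stdout


# Example (not executed here):
# for line in find_oom_events(read_kernel_log()):
#     print(line)
```

An empty result does not rule out a memory problem; it only means that no line matching these patterns was found in the text that was scanned.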

Posted: Mon Dec 02, 2013 6:49 pm
by admin
I suppose the diagnostic should be written by the OS (libc.so.6) rather than by VASP itself.

Posted: Sun Jan 12, 2014 3:06 pm
by abalone
I am having the same problem. The program just gets stuck for two days without reporting any problem. It occurs after the first DOF of the linear response is finished.

Posted: Mon Jan 13, 2014 10:05 am
by alex
Hi there again,

I would try

ALGO = N

in the INCAR. It looks like DFPT uses the fast algorithm. Check for 'RMM' in the cycles.
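A quick way to check this is to count the step labels in the console output. The sketch below assumes the output format shown in the first post, where each electronic step starts with "DAV:" or "RMM:" and the line "Linear response progress:" marks the start of the response loop. The function name and the phase labels are made up for this illustration.

```python
import re
from collections import Counter

# Electronic step lines as they appear in the console output, e.g. "RMM: 3 ...".
STEP_LINE = re.compile(r"^\s*(DAV|RMM):\s*\d+")


def count_steps_by_phase(output_text, marker="Linear response progress"):
    """Count DAV/RMM step lines before and after the marker line."""
    counts = {"before": Counter(), "after": Counter()}
    phase = "before"
    for line in output_text.splitlines():
        if marker in line:
            phase = "after"
            continue
        match = STEP_LINE.match(line)
        if match:
            counts[phase][match.group(1)] += 1
    return counts


# Example (not executed here), assuming the console output was saved
# to a file with the hypothetical name "vasp.out":
# with open("vasp.out") as handle:
#     print(count_steps_by_phase(handle.read()))
```

For the output in the first post, this would report only DAV steps before the marker and only RMM steps after it.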

Cheers,

alex

Posted: Sat Jan 18, 2014 2:38 pm
by salina
You mean Pulay stress? Could you explain more?