VASP linear response problem keeps failing. Out of memory?

bakkedal
Newbie
Posts: 3
Joined: Mon Jul 15, 2013 12:11 pm

#1 Post by bakkedal » Sat Nov 09, 2013 5:55 pm

I'm running linear response problem III on a spin-polarized system of 42 atoms (Fe36N6 supercell). The computation keeps failing without reporting exactly what went wrong.

I suspect that the job is running out of memory. What puzzles me, however, is that this happens only after 12 hours or more, and that there is no indication of memory problems in the form of error messages. The job just fails with a segmentation fault. I would expect that if the job failed to allocate the memory it requested, it would detect that and print a message in the output.

I'm running it on my local workstation (32 GB memory) and on a cluster (single node, 8 processors, 32 GB memory).

Any ideas on how to debug this problem?

I'm not specifying parallelization (NPAR), as that's not supported with linear response problems. VASP fails if I try that.

[INCAR]
ISMEAR = 1
VOSKOWN = 1
ISPIN = 2
MAGMOM = 36*3 6*0.5
PREC = HIGH
EDIFF = 1E-05
LCHARG = .FALSE.
LWAVE = .FALSE.
RANDOM_SEED = 1
IBRION = 8

[KPOINTS]
K-Points
0
Auto
45 ! Length

[Console output]
running on 6 total cores
distrk: each k-point on 6 cores, 1 groups
distr: one band on 1 cores, 6 groups
using from now: INCAR
vasp.5.3.3 18Dez12 (build Aug 09 2013 13:42:53) complex

POSCAR found type information on POSCAR Fe N
POSCAR found : 2 types and 42 ions
scaLAPACK will be used


WARNING: for PREC=h ENMAX is automatically increase by 25 %
this was not the case for versions prior to vasp.4.4
WARNING: for PREC=h ENMAX is automatically increase by 25 %
this was not the case for versions prior to vasp.4.4
LDA part: xc-table for Pade appr. of Perdew
generate k-points for: 6 6 5
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ...
WAVECAR not read
entering main loop
N E dE d eps ncg rms rms(c)
DAV: 1 0.304592402548E+04 0.30459E+04 -0.12490E+05 26568 0.160E+03
DAV: 2 0.987209509215E+02 -0.29472E+04 -0.27687E+04 26568 0.371E+02
DAV: 3 -0.337402588496E+03 -0.43612E+03 -0.35865E+03 28626 0.169E+02
DAV: 4 -0.389066717707E+03 -0.51664E+02 -0.47665E+02 39024 0.567E+01
DAV: 5 -0.390885138501E+03 -0.18184E+01 -0.17954E+01 35520 0.126E+01 0.729E+01
DAV: 6 -0.357217883508E+03 0.33667E+02 -0.44892E+02 32604 0.985E+01 0.532E+01
DAV: 7 -0.341369795827E+03 0.15848E+02 -0.67836E+01 31938 0.436E+01 0.239E+01
DAV: 8 -0.343553785063E+03 -0.21840E+01 -0.14914E+01 33168 0.782E+00 0.125E+01
DAV: 9 -0.342857664658E+03 0.69612E+00 -0.21096E+00 37692 0.554E+00 0.351E+00
DAV: 10 -0.342966148829E+03 -0.10848E+00 -0.66436E-01 30054 0.247E+00 0.145E+00
DAV: 11 -0.342967212784E+03 -0.10640E-02 -0.70143E-02 34548 0.695E-01 0.491E-01
DAV: 12 -0.342972814370E+03 -0.56016E-02 -0.30386E-02 33516 0.474E-01 0.414E-01
DAV: 13 -0.342972013272E+03 0.80110E-03 -0.10009E-03 34116 0.103E-01 0.245E-01
DAV: 14 -0.342971841916E+03 0.17136E-03 -0.14985E-03 37620 0.752E-02 0.763E-02
DAV: 15 -0.342971872052E+03 -0.30136E-04 -0.15219E-04 27000 0.398E-02 0.347E-02
DAV: 16 -0.342971875239E+03 -0.31864E-05 -0.85503E-06 16662 0.850E-03
1 F= -.34297188E+03 E0= -.34297948E+03 d E =0.228175E-01 mag= 87.5534
Linear response reoptimize wavefunctions to high precision
DAV: 1 -0.342971877135E+03 -0.18958E-05 -0.61269E-06 36432 0.698E-03
DAV: 2 -0.342971877147E+03 -0.12173E-07 -0.12102E-07 26946 0.136E-03
DAV: 3 -0.342971877147E+03 -0.18190E-09 -0.10727E-09 14460 0.995E-05
Linear response DOF= 4
Linear response progress:
Degree of freedom: 1/ 4
generate k-points for: 6 6 5
N E dE d eps ncg rms rms(c)
RMM: 1 -0.171116802305E+00 -0.17112E+00 -0.12199E-01172816 0.754E-01
RMM: 2 -0.164793427051E+00 0.63234E-02 -0.41623E-03 99925 0.281E-01 0.829E-01
RMM: 3 -0.169079119482E+00 -0.42857E-02 -0.72983E-03117291 0.252E-01 0.111E+00
RMM: 4 -0.171211599611E+00 -0.21325E-02 -0.85577E-03 93259 0.366E-01 0.122E+00
RMM: 5 -0.164667327575E+00 0.65443E-02 -0.21559E-03 92782 0.191E-01 0.236E-01
RMM: 6 -0.164582709631E+00 0.84618E-04 -0.26977E-04 94502 0.645E-02 0.141E-01
RMM: 7 -0.164622748492E+00 -0.40039E-04 -0.89592E-05 99835 0.384E-02 0.136E-01
RMM: 8 -0.164585936436E+00 0.36812E-04 -0.21273E-06 96321 0.310E-02 0.488E-02
RMM: 9 -0.164633815294E+00 -0.47879E-04 0.43418E-05110349 0.151E-02 0.511E-02
RMM: 10 -0.164632097392E+00 0.17179E-05 0.55945E-05 98532 0.105E-02 0.128E-02
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
vasp 0000000000E62448 Unknown Unknown Unknown
vasp 000000000113FBD7 Unknown Unknown Unknown
vasp 0000000000473791 Unknown Unknown Unknown
vasp 00000000004420DC Unknown Unknown Unknown
libc.so.6 00002B6BA955AEAD Unknown Unknown Unknown
vasp 0000000000441FB9 Unknown Unknown Unknown
--------------------------------------------------------------------------
mpiexec noticed that process rank 3 with PID 13725 on node wheezy2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
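A note for reading the tail of the log above: the number in the mpiexec message is a signal number. The following is only a minimal sketch, assuming a Linux machine with Python available; the helper names `describe_signal` and `signals_in_log` are made up for illustration and are not part of VASP or the MPI launcher. It shows that signal 11 is SIGSEGV, while the `forrtl` line reports SIGTERM, which presumably comes from ranks that were terminated after the first one died.

```python
import re
import signal


def describe_signal(number):
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"


def signals_in_log(log_text):
    """Collect (number, name) pairs from 'exited on signal N' messages."""
    found = []
    for match in re.finditer(r"exited on signal (\d+)", log_text):
        number = int(match.group(1))
        found.append((number, describe_signal(number)))
    return found


# The mpiexec line quoted from the console output above.
log = ("mpiexec noticed that process rank 3 with PID 13725 on node wheezy2 "
       "exited on signal 11 (Segmentation fault).")
print(signals_in_log(log))
print(describe_signal(15))
```

The signal name alone does not say why the process faulted; it only confirms which rank failed first.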

alex
Hero Member
Posts: 577
Joined: Tue Nov 16, 2004 2:21 pm
License Nr.: 5-67
Location: Germany

#2 Post by alex » Mon Nov 11, 2013 5:11 pm

Hi,

I would try

ALGO = N

in the INCAR. But I'm not sure.

Cheers,

alex

bakkedal
Newbie
Newbie
Posts: 3
Joined: Mon Jul 15, 2013 12:11 pm

#3 Post by bakkedal » Mon Dec 02, 2013 11:20 am

Hi,

That is the default value (http://cms.mpi.univie.ac.at/vasp/vasp/ALGO_tag.html), so it should already be enabled for this run. However, I now have reason to believe that it was actually running out of memory. I collected some memory statistics in the background, and the machine was indeed running very low on memory:

[Before the job crashed]

Sat Nov 9 15:15:01 CET 2013
             total       used       free     shared    buffers     cached
Mem:         32169      31677        492          0        155        759
-/+ buffers/cache:      30761       1407
Swap:            0          0          0

[Just after the job crashed]

Sat Nov 9 15:16:01 CET 2013
             total       used       free     shared    buffers     cached
Mem:         32169       3876      28292          0        155        760
-/+ buffers/cache:       2960      29208
Swap:            0          0          0
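For anyone who wants to collect similar statistics, here is a minimal sketch of a background logger, assuming a Linux system that exposes `/proc/meminfo`. The function names are hypothetical, and the sample text is illustrative: its values are simply the "free", "buffers" and "cached" figures from the first snapshot above converted from MB to kB, so the rounding differs slightly from what `free` printed.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_mb(info):
    """Free memory plus buffers and page cache, in whole MB."""
    kb = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    return kb // 1024


def log_memory(interval_s=60, samples=10, path="/proc/meminfo"):
    """Print a timestamped reading every interval_s seconds."""
    for _ in range(samples):
        with open(path) as handle:
            info = parse_meminfo(handle.read())
        print(time.strftime("%c"), f"available: {available_mb(info)} MB", flush=True)
        time.sleep(interval_s)


# Illustrative sample only (not real /proc/meminfo output from the thread).
sample = """MemTotal:       32941056 kB
MemFree:          503808 kB
Buffers:          158720 kB
Cached:           777216 kB
"""
print(available_mb(parse_meminfo(sample)), "MB available in the sample")
```

Calling `log_memory()` in a separate terminal, or redirecting its output to a file while the job runs, gives a record that can be lined up against the time of the crash.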

I'm still puzzled as to why it fails with a segmentation fault. If the process tries to allocate memory and the operating system can't deliver any more, the process should be able to diagnose that condition and show a proper error message. Maybe this is a bug? This is a linear response DFPT job.

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#4 Post by admin » Mon Dec 02, 2013 6:49 pm

I suppose the diagnosis should be written by the OS (libc.so.6) rather than by VASP itself.
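If the operating system did log something, on Linux the kernel log would be one place to look. The following is a rough sketch, assuming a Linux kernel whose out-of-memory messages contain the substrings "out of memory" or "oom-killer"; the exact wording varies between kernel versions. The function name and the sample lines are invented for illustration and are not taken from the machines in this thread.

```python
import re

# Substrings the Linux kernel typically logs when its OOM killer acts.
# Assumption: wording differs between kernel versions, so match loosely.
OOM_HINT = re.compile(r"out of memory|oom-killer", re.IGNORECASE)


def find_oom_lines(kernel_log):
    """Return the kernel log lines that mention the OOM killer."""
    return [line.strip() for line in kernel_log.splitlines()
            if OOM_HINT.search(line)]


# Hypothetical sample, made up for this example.
sample_log = """\
[100.000000] eth0: link up
[200.000000] vasp invoked oom-killer: gfp_mask=0x201da, order=0
[200.000100] Out of memory: Kill process 1234 (vasp) score 900 or sacrifice child
"""
for line in find_oom_lines(sample_log):
    print(line)
```

In practice the input would be the text printed by `dmesg`. An empty result does not rule out memory exhaustion, since a process can also fault on its own before the kernel steps in.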

abalone
Newbie
Posts: 15
Joined: Tue Jan 08, 2008 7:58 pm

#5 Post by abalone » Sun Jan 12, 2014 3:06 pm

I am having the same problem. The program just hangs for two days without reporting any problem. It occurs after the first DOF of the linear response is finished.

alex
Hero Member
Posts: 577
Joined: Tue Nov 16, 2004 2:21 pm
License Nr.: 5-67
Location: Germany

#6 Post by alex » Mon Jan 13, 2014 10:05 am

Hi there again,

I would try

ALGO = N

in the INCAR. It looks like DFPT uses the fast algorithm. Check the 'RMM' in the cycles.

Cheers,

alex
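The check suggested above can be done by eye, but here is a small sketch that counts the step prefixes before and after the linear response part starts. It assumes the console output looks like the log in post #1, with "DAV:" and "RMM:" prefixes and a "Linear response progress" line; the function name is hypothetical, and the sample lines are copied from that log.

```python
import re
from collections import Counter

STEP_LINE = re.compile(r"^\s*(DAV|RMM):\s+\d+\s")
RESPONSE_MARKER = "Linear response progress"


def count_steps(console_output):
    """Count DAV/RMM steps separately for the two phases of the run."""
    counts = {"ground state": Counter(), "linear response": Counter()}
    phase = "ground state"
    for line in console_output.splitlines():
        if RESPONSE_MARKER in line:
            phase = "linear response"
            continue
        match = STEP_LINE.match(line)
        if match:
            counts[phase][match.group(1)] += 1
    return counts


# A few lines taken from the console output in post #1.
sample = """\
DAV: 16 -0.342971875239E+03 -0.31864E-05 -0.85503E-06 16662 0.850E-03
Linear response reoptimize wavefunctions to high precision
DAV: 1 -0.342971877135E+03 -0.18958E-05 -0.61269E-06 36432 0.698E-03
Linear response progress:
Degree of freedom: 1/ 4
RMM: 1 -0.171116802305E+00 -0.17112E+00 -0.12199E-01172816 0.754E-01
RMM: 2 -0.164793427051E+00 0.63234E-02 -0.41623E-03 99925 0.281E-01 0.829E-01
"""
counts = count_steps(sample)
print(dict(counts["ground state"]), dict(counts["linear response"]))
```

On the sample, the ground-state steps are all DAV and the linear response steps are all RMM, which matches what the full log shows.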

salina
Newbie
Newbie
Posts: 1
Joined: Sat Jan 18, 2014 2:36 pm
License Nr.: NO LICENSE

#7 Post by salina » Sat Jan 18, 2014 2:38 pm

You mean Pulay stress? Could you explain more?