Problems running VASP in parallel mode with 5 nodes / 64 cores per node / 250 GB RAM per node

Posted: Mon May 27, 2013 11:20 am
by 5-1051
Hi,

I am running VASP 5.3 successfully on a "vacuum + monolayer + substrate" system of about 1500 atoms, using 4 nodes with 64 cores and 250 GB of RAM per node (ARCH: lx26-amd64; the cluster uses Mellanox InfiniBand QDR 40 Gb/s for both parallel communication and file-system access). In that setup I am using almost 200 GB per node during execution. So far everything seems to be fine, but I am close to running out of memory, and I will need to increase the number of atoms.

It would therefore be useful to add a fifth node to my calculations to reduce the memory load per node. However, I cannot get a stable run with 5 nodes. I have tried different combinations of the parameters KPAR, NCORE (or NPAR), NSIM and LPLANE, but nothing seems to work. The execution always breaks during the first iteration in EDDAV, after POTLOK and SETDIJ.

Am I at the limit of how much RAM VASP 5.3 can handle, or is it a limitation of my cluster?

If VASP 5.3 can handle any amount of memory regardless of the number of nodes, can anyone help me configure VASP to run on 5 nodes? Or should I use an even number of nodes instead?
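
One thing worth ruling out when going from 4 to 5 nodes is whether the rank layout still divides evenly among KPAR, NCORE and the nodes. The bash function below is a minimal sketch of that arithmetic, using only the relation NPAR = ranks / (KPAR x NCORE). The function name `check_layout` is hypothetical, and the rule that NCORE should divide the ranks per node (so a band group stays on one node) is a common performance heuristic assumed here, not a check VASP itself performs.

```shell
#!/bin/bash
# Hypothetical helper: check_layout NODES RANKS_PER_NODE NCORE KPAR
# Checks that a VASP rank layout divides evenly and prints the implied NPAR.
check_layout() {
  local nodes=$1 rpn=$2 ncore=$3 kpar=$4
  local total=$(( nodes * rpn ))

  # All ranks must split evenly into KPAR k-point groups.
  if (( total % kpar != 0 )); then
    echo "total ranks $total not divisible by KPAR=$kpar"
    return 1
  fi

  # Each k-point group must split evenly into band groups of NCORE ranks.
  local per_k=$(( total / kpar ))
  if (( per_k % ncore != 0 )); then
    echo "ranks per k-point group $per_k not divisible by NCORE=$ncore"
    return 1
  fi

  # Heuristic (assumption): keep each band group inside a single node.
  if (( rpn % ncore != 0 )); then
    echo "NCORE=$ncore does not divide ranks per node $rpn"
    return 1
  fi

  echo "ranks=$total KPAR=$kpar NCORE=$ncore -> NPAR=$(( per_k / ncore ))"
}

# Run directly with four arguments, e.g.: ./check_layout.sh 5 32 32 1
if (( $# == 4 )); then
  check_layout "$@"
fi
```

For example, `check_layout 5 32 32 1` (5 nodes, 32 ranks per node, NCORE = 32, KPAR = 1) prints `ranks=160 KPAR=1 NCORE=32 -> NPAR=5`, so at least arithmetically an odd node count is not ruled out.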

My script for running VASP is:

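#!/bin/bash
#
#$ -cwd
# -j y merges stderr into the stdout file job.out
#$ -o job.out -j y
# request 256 slots (4 nodes x 64 cores) in the mp64 parallel environment
#$ -pe mp64 256
## Create rank file
./mkrnkfile.sh
# launch 128 MPI ranks placed according to the generated rank file
mpirun -np 128 --rankfile rank.$JOB_ID --bind-to-core vasp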
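
Since the contents of `mkrnkfile.sh` are not shown, here is a hypothetical sketch of what such a rank-file generator could look like. It assumes the SGE `$PE_HOSTFILE` format (one line per host: hostname, slot count, queue, processor range) and the Open MPI rankfile syntax `rank N=host slot=core`. The function name `make_rankfile` and the choice of pinning ranks to cores 0..R-1 on every host are illustrative assumptions, not the poster's script.

```shell
#!/bin/bash
# Hypothetical rank-file generator (NOT the poster's mkrnkfile.sh).
# Usage: make_rankfile HOSTFILE RANKS_PER_NODE OUTFILE
# Reads an SGE-style hostfile, writes an Open MPI rankfile placing
# RANKS_PER_NODE ranks on each host (cores 0..RANKS_PER_NODE-1),
# and prints the total number of ranks written.
make_rankfile() {
  local hostfile=$1 ranks_per_node=$2 out=$3
  local rank=0 host slots rest core

  : > "$out"   # truncate/create the rank file
  while read -r host slots rest; do
    [ -z "$host" ] && continue   # skip blank lines
    for (( core = 0; core < ranks_per_node; core++ )); do
      echo "rank $rank=$host slot=$core" >> "$out"
      rank=$(( rank + 1 ))
    done
  done < "$hostfile"

  echo "$rank"
}
```

Inside the job script it might be used as `n=$(make_rankfile "$PE_HOSTFILE" 32 "rank.$JOB_ID")` followed by `mpirun -np "$n" --rankfile "rank.$JOB_ID" ...`, which keeps the `-np` value consistent with the rank file when the number of nodes changes.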

and my INCAR is:

ISTART = 0; ICHARG = 2
GGA = PE
PREC = High
AMIN = 0.01
# general:
SYSTEM = (110)system vacuum
LWAVE = .FALSE.
LCHARG = .FALSE.
LREAL = Auto
ISMEAR = 1; SIGMA = 0.2
ALGO = Fast
NGX = 194; NGY = 316; NGZ = 382

# linux:
LSCALAPACK = .TRUE.
NCORE = 32
KPAR = 1
LSCALU = .FALSE.
LPLANE = .TRUE.
NSIM = 1
LREAL = Auto
# non-magnetic:
ISPIN = 1
# dynamics:
NSW = 0
IBRION = 0

I am only using 3 irreducible k-points.

Thank you for your attention.

Posted: Mon May 27, 2013 3:50 pm
by alex
Hi,

Is there any error message?

Cheers,

alex

Posted: Mon May 27, 2013 7:55 pm
by 5-1051
Not really.

I found some errors in the InfiniBand log.

The job seems to hang, waiting indefinitely for something.
[Edited Tue May 28 2013, 08:01 AM]

Posted: Tue May 28, 2013 8:35 am
by alex
Please check whether you can log in from the executing host (normally the first in the run list) to the others without giving a password. Since you are using 256 cores, it looks like you have at least four physical machines...

Hth

alex
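
The check suggested above can be run against every host of a job at once. The sketch below loops over a hostfile whose first column is the hostname (for example the SGE `$PE_HOSTFILE`) and uses `ssh -o BatchMode=yes`, which makes ssh fail instead of prompting for a password; `-n` stops ssh from swallowing the loop's input. The function name `check_ssh` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: check_ssh HOSTFILE
# Verifies passwordless SSH from the current host to every host listed
# in HOSTFILE (first column = hostname). Returns 1 if any host fails.
check_ssh() {
  local hostfile=$1 failed=0 host rest

  while read -r host rest; do
    [ -z "$host" ] && continue   # skip blank lines
    # BatchMode=yes: fail instead of asking for a password.
    # -n: do not read stdin, so the while-loop input is not consumed.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "OK     $host"
    else
      echo "FAILED $host"
      failed=1
    fi
  done < "$hostfile"

  return "$failed"
}
```

Running `check_ssh "$PE_HOSTFILE"` from inside a job should print OK for every host; a FAILED line points to a host that still asks for a password or cannot be reached.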

Posted: Thu May 30, 2013 3:53 pm
by 5-1051
Hi,

Many thanks! I will, but...

I tried to repeat my last successful calculation (4 nodes, 1500 atoms), and now it does not run either. Something must be wrong with my cluster. Anyway, thank you for your comments.

Cheers,
Cesar