I've been testing VASP 5.3's k-point parallelism with a 513-atom surface slab model. The binary was built with the Intel 13 compilers against MKL and OpenMPI 1.4.2. Each node is a dual quad-core Nehalem, and to avoid knocking nodes over I limit the available virtual memory to 10000000 kB via ulimit. The job is configured to place only 4 MPI processes on each SMP node.
The INCAR contains, in part:
GGA=PE
ENCUT=400.0
NELMIN=4
EDIFF=1E-6
LREAL=.TRUE.
ISMEAR=1 ; SIGMA=0.1
NCORE=4
KPAR=2
NSIM=4
LPLANE=.TRUE.
IALGO=48
LWAVE=.FALSE.
LCHARG=.FALSE.
LSCALAPACK=.TRUE.
LSCALU=.FALSE.
Maximum memory used (kb): 1387044.
Average memory used (kb): 0.
So, assuming that figure is per process, utilization on each node may be up to 4*1387044/(1024^2) = 5.3 GB. Good so far.
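For anyone who wants to check the arithmetic, here is a minimal Python sketch of that estimate. It assumes the OUTCAR figure is per process and in kB; the function and variable names are purely illustrative and not part of VASP.

```python
# Rough per-node memory estimate from the OUTCAR "Maximum memory used" line.
# Assumption: the reported figure is per MPI process and in kB.

KB_PER_GB = 1024 ** 2  # binary units, matching the 1024^2 divisor above


def node_memory_gb(max_kb_per_process: int, processes_per_node: int) -> float:
    """Upper-bound memory estimate for one node, in GB."""
    return max_kb_per_process * processes_per_node / KB_PER_GB


max_kb_per_process = 1387044   # from OUTCAR
processes_per_node = 4         # MPI processes placed on each SMP node
ulimit_kb = 10000000           # virtual memory limit set via ulimit

estimate_gb = node_memory_gb(max_kb_per_process, processes_per_node)
limit_gb = ulimit_kb / KB_PER_GB

print(f"estimated use per node: {estimate_gb:.1f} GB")
print(f"ulimit setting:         {limit_gb:.1f} GB")
```

Changing `processes_per_node` or the OUTCAR figure makes it easy to redo the estimate for other job layouts.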
A test with two irreducible k-points fails right after the electronic relaxation completes (the last line of OUTCAR reads "aborting loop because EDIFF is reached"), and stdout contains the message "forrtl: severe (41): insufficient virtual memory" once per MPI process.
- Does the maximum memory usage per MPI process scale linearly with the number of k-points in the IBZ?
- Is there some data distribution that would permit this job to complete by, e.g., doubling the total number of available SMP nodes and MPI processes, or is data fully replicated across processes?
Thanks