Very strange problem when running parallel vasp5

Posted: Wed Mar 23, 2011 3:28 am
by stshcs
I used the latest vasp5.2.11 to run calculations on a B4C crystal.
First I submitted a job on 4 nodes with 2 CPUs each, so the job ran 8 parallel processes. Everything ran normally.

Then I submitted another VASP job, also on 4 nodes. This time the job was not accepted. The error message was:
vasp5.2.11: error while loading shared libraries: libmkl_lapack.so: cannot open shared object file: No such file or directory

It's a very strange problem. I compiled vasp5.2.11 with Intel Fortran/C 9.1 (em64t) and MKL 9.1, and I also added the library path to LD_LIBRARY_PATH.
================== Here is the intel fortran information========
Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 9.1 Build 20071016 Package ID: l_fc_c_9.1.052
Copyright (C) 1985-2007 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY
==============end of intel fortran information==========
I want to know whether the "NON-COMMERCIAL" Intel Fortran and MKL license limits the number of concurrent processes.

These are the options I used when compiling parallel vasp5.2.11:
BLAS= -L/opt/intel/mkl/9.1.023/lib/em64t -lmkl_em64t -lguide -lpthread

LAPACK= -L/opt/intel/mkl/9.1.023/lib/em64t -lmkl_lapack

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(BLAS)

LINK = -L/opt/intel/fce/9.1.052/lib/ -lsvml

~:(

Posted: Wed Mar 23, 2011 8:05 am
by alex
It looks like your nodes have different .profile or .cshrc files. The error comes from the variable LD_LIBRARY_PATH (two underscores).
Figure out which node vasp is started on, check that LD_LIBRARY_PATH is set there so that MKL can be found, and retry.
Or: use NFS to have the same $HOME everywhere (much preferred).

Hth

alex

Posted: Wed Mar 23, 2011 9:53 am
by stshcs
Thanks, alex, but that is not the case here. Our cluster uses NFS, so every node has the same profile when a user logs in.

Posted: Thu Mar 24, 2011 8:22 am
by alex
Hm, maybe your MKL isn't installed where the loader looks for it. Submit a job like the one that did not work and, inside it, try
ldd path_to_vasp.exe
and check if all libraries are found.
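
To make that check easier to read, the ldd output can be filtered down to the unresolved libraries. This is only a minimal sketch: the function name missing_libs is made up for illustration, and it assumes ldd marks unresolved libraries with the text "not found", as it does on common Linux systems.

```shell
# Print the names of shared libraries that ldd could not resolve.
# Reads ldd output on stdin, so it can be used in a pipe.
# missing_libs is a hypothetical helper name, not part of VASP or ldd.
missing_libs() {
    # Unresolved entries look like "libfoo.so => not found";
    # the first field is the library name.
    awk '/not found/ { print $1 }'
}

# Usage inside the failing job (path_to_vasp.exe is the placeholder used above):
#   ldd path_to_vasp.exe | missing_libs
```

Run it inside the submitted job, or on the node where the job fails, because the result depends on the environment of that node. An empty result means every library was found there.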

Hth

alex

Posted: Tue Mar 29, 2011 1:32 am
by stshcs
Thanks, alex. I have found the problem. It was LD_LIBRARY_PATH, as you said. The /etc/profile on some nodes is different, so those nodes did not set up the right environment.
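
For anyone who hits the same error, a per-node comparison of LD_LIBRARY_PATH is a quick way to find the nodes with a different /etc/profile. The following is a rough sketch, not a tested cluster tool: the function names and the node names in the usage line are hypothetical, the MKL directory is the one from the makefile options above, and it assumes passwordless ssh and bash on every node.

```shell
# Return success if directory $2 is an entry of the colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# For each node given as an argument, report whether the MKL directory is on
# LD_LIBRARY_PATH in a login shell. check_nodes is a hypothetical helper.
check_nodes() {
    mkl_dir=/opt/intel/mkl/9.1.023/lib/em64t   # path taken from the makefile above
    for node in "$@"; do
        # bash -l starts a login shell, which reads /etc/profile on that node.
        remote_path=$(ssh "$node" 'bash -lc "echo \$LD_LIBRARY_PATH"')
        if path_contains "$remote_path" "$mkl_dir"; then
            echo "$node: ok"
        else
            echo "$node: MKL directory missing from LD_LIBRARY_PATH"
        fi
    done
}

# Example with made-up node names:
#   check_nodes node01 node02 node03 node04
```

A login shell is used on purpose: a plain non-interactive ssh command may skip /etc/profile, which would hide exactly the difference being looked for.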