Signal code: Address not mapped (1)

Problems running VASP: crashes, internal errors, "wrong" results.


Moderators: Global Moderator, Moderator

midair77
Newbie
Posts: 11
Joined: Mon Apr 02, 2007 11:32 pm

#1 Post by midair77 » Thu Apr 24, 2008 11:40 pm

Dear all, I encountered this error and can reproduce it. It looks like my VASP program hits a segmentation fault (memory violation), but I do not know how to interpret the MPI part of the output.

Our system is Rocks 4.3 x86_64 with openmpi-1.2.5 and scalapack-1.8.0, on Barcelona processors with Gigabit interconnects.

# cat 2156.jupiter.mynetwork.com.out | wc -l
614
# cat 2089.jupiter.mynetwork.com.out | wc -l
157

Interestingly, the same job ran on different nodes and failed with the same error, but at different iterations. Job 2156 ran much longer before failing (614 lines of output), whereas job 2089 failed earlier (157 lines).


[test@Jupiter ]$ cat Co0001.e2089
[compute-1-1:14557] *** Process received signal ***
[compute-1-1:14557] Signal: Segmentation fault (11)
[compute-1-1:14557] Signal code: Address not mapped (1)
[compute-1-1:14557] Failing at address: (nil)
[compute-1-1:14557] [ 0] /lib64/tls/libpthread.so.0 [0x3db530c4f0]
[compute-1-1:14557] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-1:14557] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-1:14557] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a]
[compute-1-1:14557] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
[compute-1-1:14557] [ 5] /usr/local/bin/vaspopenmpi_scala(main+0xe) [0x6ed9ee]
[compute-1-1:14557] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3db441c3fb]
[compute-1-1:14557] [ 7] /usr/local/bin/vaspopenmpi_scala [0x410a2a]
[compute-1-1:14557] *** End of error message ***
mpiexec noticed that job rank 0 with PID 14557 on node compute-1-1.local exited on signal 11 (Segmentation fault).


[test@Jupiter ]$ cat Co0001.e2156
[compute-1-2:03847] *** Process received signal ***
[compute-1-2:03847] Signal: Segmentation fault (11)
[compute-1-2:03847] Signal code: Address not mapped (1)
[compute-1-2:03847] Failing at address: (nil)
[compute-1-2:03847] [ 0] /lib64/tls/libpthread.so.0 [0x3984e0c4f0]
[compute-1-2:03847] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-2:03847] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-2:03847] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a]
[compute-1-2:03847] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
[compute-1-2:03847] [ 5] /usr/local/bin/vaspopenmpi_scala(main+0xe) [0x6ed9ee]
[compute-1-2:03847] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3983f1c3fb]
[compute-1-2:03847] [ 7] /usr/local/bin/vaspopenmpi_scala [0x410a2a]
[compute-1-2:03847] *** End of error message ***
mpiexec noticed that job rank 0 with PID 3847 on node compute-1-2.local exited on signal 11 (Segmentation fault).

Could somebody tell me what causes this type of error?

Thank you very much for your help.

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#2 Post by admin » Tue Apr 29, 2008 10:07 am

From the .e files you show, the failing address seems to be primarily in one of the routines of the pthread library. Please check whether the errors are due to parallelization (i.e., whether a single-processor job crashes as well).
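
A note for readers comparing the two .e files: frames [1] to [5] are identical in both traces, symbol and offset included, and only the host, PID and shared-library addresses differ. Frame [0] in libpthread is often just the signal-handler frame rather than the faulting code, though that is an assumption worth verifying on the system in question. Below is a minimal Python sketch, not part of the original posts, that extracts the frames from such a trace and reports the first frame inside the VASP binary. The helper names `parse_frames` and `first_application_frame` are hypothetical, and the regular expression only assumes the line layout visible in the traces above.

```python
import re

# One Open MPI backtrace frame, in the layout seen in the traces above, e.g.
# [compute-1-1:14557] [ 1] /path/to/binary(symbol+0x1e4) [0x4dd884]
# The "(symbol+offset)" part is optional (frames [0] and [7] lack it).
FRAME_RE = re.compile(
    r"\[(?P<host>[^\]:]+):(?P<pid>\d+)\]\s+"
    r"\[\s*(?P<index>\d+)\]\s+"
    r"(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^()+]+)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]"
)


def parse_frames(text):
    """Return the stack frames found in an Open MPI signal report.

    finditer is used so that two frames fused onto one line are still
    picked up as separate frames.
    """
    frames = []
    for match in FRAME_RE.finditer(text):
        frames.append({
            "index": int(match.group("index")),
            "module": match.group("module"),
            "symbol": match.group("symbol"),   # None if not present
            "offset": match.group("offset"),   # None if not present
        })
    return frames


def first_application_frame(frames, binary_name):
    """Return the first frame that has a symbol inside the given binary."""
    for frame in frames:
        if frame["module"].endswith(binary_name) and frame["symbol"]:
            return frame
    return None


# Excerpt copied from the job 2089 trace, including the fused line.
SAMPLE = """\
[compute-1-1:14557] *** Process received signal ***
[compute-1-1:14557] [ 0] /lib64/tls/libpthread.so.0 [0x3db530c4f0]
[compute-1-1:14557] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-1:14557] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-1:14557] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a][compute-1-1:14557] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
"""

frames = parse_frames(SAMPLE)
top = first_application_frame(frames, "vaspopenmpi_scala")
print(top["symbol"], top["offset"])  # prints: __dfast__cnorma 0x1e4
```

Running the same function over both .e files and comparing the (symbol, offset) pairs is a quick way to confirm that the two jobs stopped at the same place in the code.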

dinhloc1984

#3 Post by dinhloc1984 » Fri Sep 18, 2009 1:40 am

Hi,

I get the same error. It happens when I use ultrasoft pseudopotentials (USPP).
When I use PAW instead of USPP, it runs fine.

Can anyone explain this strange behavior? I want to use USPP, but I cannot get it to run.

Thanks for all your help.

Sincerely,
Loc.

Sonny
Newbie
Posts: 24
Joined: Wed Feb 17, 2010 11:34 pm
License Nr.: 1118

#4 Post by Sonny » Sat Jul 31, 2010 6:56 am

Hello,

I get this error with IBRION=0 calculations.
It occurs in both serial and parallel runs on a quad-core Xeon, compiled with gfortran.


Regards,

Sonny


[ Edited Fri Aug 06 2010, 06:16AM ]
