Dear all, I encountered this error and was able to reproduce it. It looks like my VASP program hit a segmentation fault (memory violation), but I do not know how to interpret the MPI part of the output.
Our system is Rocks 4.3 x86_64 with openmpi-1.2.5 and scalapack-1.8.0, on Barcelona CPUs with Gigabit interconnects.
# cat 2156.jupiter.mynetwork.com.out | wc -l
614
# cat 2089.jupiter.mynetwork.com.out | wc -l
157
The interesting part is that the same job ran on different nodes and hit the same error, but at different iterations: job 2089 failed early (157 lines of output), while job 2156 ran much longer before failing (614 lines).
[test@Jupiter ]$ cat Co0001.e2089
[compute-1-1:14557] *** Process received signal ***
[compute-1-1:14557] Signal: Segmentation fault (11)
[compute-1-1:14557] Signal code: Address not mapped (1)
[compute-1-1:14557] Failing at address: (nil)
[compute-1-1:14557] [ 0] /lib64/tls/libpthread.so.0 [0x3db530c4f0]
[compute-1-1:14557] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-1:14557] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-1:14557] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a]
[compute-1-1:14557] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
[compute-1-1:14557] [ 5] /usr/local/bin/vaspopenmpi_scala(main+0xe) [0x6ed9ee]
[compute-1-1:14557] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3db441c3fb]
[compute-1-1:14557] [ 7] /usr/local/bin/vaspopenmpi_scala [0x410a2a]
[compute-1-1:14557] *** End of error message ***
mpiexec noticed that job rank 0 with PID 14557 on node compute-1-1.local exited on signal 11 (Segmentation fault).
[test@Jupiter ]$ cat Co0001.e2156
[compute-1-2:03847] *** Process received signal ***
[compute-1-2:03847] Signal: Segmentation fault (11)
[compute-1-2:03847] Signal code: Address not mapped (1)
[compute-1-2:03847] Failing at address: (nil)
[compute-1-2:03847] [ 0] /lib64/tls/libpthread.so.0 [0x3984e0c4f0]
[compute-1-2:03847] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-2:03847] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-2:03847] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a]
[compute-1-2:03847] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
[compute-1-2:03847] [ 5] /usr/local/bin/vaspopenmpi_scala(main+0xe) [0x6ed9ee]
[compute-1-2:03847] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3983f1c3fb]
[compute-1-2:03847] [ 7] /usr/local/bin/vaspopenmpi_scala [0x410a2a]
[compute-1-2:03847] *** End of error message ***
mpiexec noticed that job rank 0 with PID 3847 on node compute-1-2.local exited on signal 11 (Segmentation fault).
Could somebody tell me what causes this type of error?
Thank you very much for your help.
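For anyone who wants to compare traces like the two above mechanically, here is a minimal Python sketch. It assumes the Open MPI frame format exactly as it appears in these .e files; the names `FRAME_RE`, `parse_frames`, `symbolic_frames` and `SAMPLE` are made up for illustration and are not part of VASP or Open MPI. It uses `finditer`, so two frames that ended up fused on one line are still picked up separately.

```python
import re

# Frame format assumed from the .e files in this thread, e.g.
# [compute-1-1:14557] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
FRAME_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+"      # [host:pid]
    r"\[\s*(?P<index>\d+)\]\s+"                   # [ N] frame index
    r"(?P<binary>[^\s(\[]+)"                      # path of binary or library
    r"(?:\((?P<symbol>[^)+]+)\+(?P<offset>0x[0-9a-fA-F]+)\))?"  # optional (symbol+offset)
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]"         # [absolute address]
)


def parse_frames(text):
    """Return (index, binary, symbol, offset) for every frame found in text.

    symbol and offset are None for frames printed without a symbol name.
    """
    return [
        (int(m.group("index")), m.group("binary"),
         m.group("symbol"), m.group("offset"))
        for m in FRAME_RE.finditer(text)
    ]


def symbolic_frames(frames):
    """Keep only (symbol, offset) pairs, which do not depend on host or PID."""
    return [(sym, off) for _, _, sym, off in frames if sym is not None]


# Excerpt copied from Co0001.e2089 above (frames 3 and 4 share one line).
SAMPLE = """\
[compute-1-1:14557] [ 0] /lib64/tls/libpthread.so.0 [0x3db530c4f0]
[compute-1-1:14557] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-1:14557] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-1:14557] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a][compute-1-1:14557] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
"""

frames = parse_frames(SAMPLE)
```

To compare the two jobs, read both files (for example `open("Co0001.e2089").read()`) and check whether `symbolic_frames(parse_frames(...))` gives the same list for each. If it does, both runs stopped at the same symbol and offset even though the hosts, PIDs and iteration counts differ.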
Signal code: Address not mapped (1)
- Newbie
- Posts: 11
- Joined: Mon Apr 02, 2007 11:32 pm
Signal code: Address not mapped (1)
Last edited by midair77 on Thu Apr 24, 2008 11:40 pm, edited 1 time in total.
- Administrator
- Posts: 2921
- Joined: Tue Aug 03, 2004 8:18 am
- License Nr.: 458
Signal code: Address not mapped (1)
From the .e files you show, the failing address seems to be primarily in one of the routines of the pthread library. Please check whether the errors are due to parallelization (i.e. whether a single-processor job crashes as well).
Last edited by admin on Tue Apr 29, 2008 10:07 am, edited 1 time in total.
Signal code: Address not mapped (1)
Hi,
I get the same error. This happens when I use ultrasoft pseudopotentials (USPP).
When I use PAW instead of USPP, it runs well.
Can anyone explain this strange case? I want to use USPP, but I cannot get it to run.
Thanks for all your help.
Sincerely,
Loc.
Last edited by dinhloc1984 on Fri Sep 18, 2009 1:40 am, edited 1 time in total.
- Newbie
- Posts: 24
- Joined: Wed Feb 17, 2010 11:34 pm
- License Nr.: 1118
Signal code: Address not mapped (1)
Hello;
I get this error with IBRION=0 calculations.
It occurs in both serial and parallel runs on a quad Xeon, compiled with gfort.
regards;
Sonny
[ Edited Fri Aug 06 2010, 06:16AM ]
Last edited by Sonny on Sat Jul 31, 2010 6:56 am, edited 1 time in total.