ML_ISTART = 1 doesn't work with different element types - v6.4.1

Problems running VASP: crashes, internal errors, "wrong" results.


john_martirez1
Newbie
Posts: 3
Joined: Mon Jun 05, 2023 2:01 pm

ML_ISTART = 1 doesn't work with different element types - v6.4.1

#1 Post by john_martirez1 » Wed Jun 28, 2023 2:26 pm

My ML_AB file contains H and O, and I am training for a system with H, O, and C. The run just terminates early; not even a single SCF step is completed.
Everything is OK with ML_ISTART = 0.

ferenc_karsai
Global Moderator
Posts: 460
Joined: Mon Nov 04, 2019 12:44 pm

Re: ML_ISTART = 1 doesn't work with different element types - v6.4.1

#2 Post by ferenc_karsai » Wed Jun 28, 2023 2:35 pm

Please send all files necessary to run and check the calculation:
POSCAR, POTCAR, KPOINTS, INCAR, OUTCAR, ML_AB, ML_LOGFILE, and stdout.

john_martirez1
Newbie
Posts: 3
Joined: Mon Jun 05, 2023 2:01 pm

Re: ML_ISTART = 1 doesn't work with different element types - v6.4.1

#3 Post by john_martirez1 » Wed Jun 28, 2023 2:53 pm

Thanks for the quick reply. See the attached files.

john_martirez1
Newbie
Posts: 3
Joined: Mon Jun 05, 2023 2:01 pm

Re: ML_ISTART = 1 doesn't work with different element types - v6.4.1

#4 Post by john_martirez1 » Wed Jun 28, 2023 3:29 pm

I found the reason: there is a significant jump in memory requirements when going from ML_ISTART = 0 to ML_ISTART = 1.
I increased the memory per CPU to 9200 MB, and then it worked.

Hopefully the memory allocation can be improved in the future for ML_ISTART = 1? Or am I missing something?

ferenc_karsai
Global Moderator
Posts: 460
Joined: Mon Nov 04, 2019 12:44 pm

Re: ML_ISTART = 1 doesn't work with different element types - v6.4.1

#5 Post by ferenc_karsai » Mon Jul 03, 2023 9:25 am

It's hard to do anything about the memory allocation.
Here are some explanations:
At the moment we have to allocate the memory statically at the beginning, mainly because of the use of shared-memory MPI. We have seen several times that problems arise if shared-memory MPI needs to be reallocated, and I don't know if this will ever be solved for all compilers.

So how can the memory grow so much in your case?
1) New element types entered the calculation. We use multidimensional allocatable arrays in Fortran, so the local-reference dimension is allocated with the same maximum for all element types. Ideally one wants the same number of local reference configurations for every element type, but this is often hard to achieve for dopants, where only a few atoms from the training structures are available as local-reference candidates. In that case some memory is wasted; your case might belong to that category (see the sketch after this list).
2) It is a continuation run, and if you don't specify anything, then min(1500, NSW) is added on top of the already available data. Please see the documentation of ML_MB and ML_MCONF (https://www.vasp.at/wiki/index.php/ML_MB and https://www.vasp.at/wiki/index.php/ML_MCONF). This default has worked quite nicely so far, but if it turns out to be problematic for the majority of users, we will change it.
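
A minimal sketch of the allocation pattern described in point 1. This is not VASP source code; the array name, dimensions, and numbers are made up for illustration. The idea is that every element type reserves space for the maximum number of local reference configurations, so a dopant with only a few candidates still costs as much memory as the most abundant species.

Code:
program alloc_sketch
  implicit none
  ! Illustrative only: a rectangular allocatable array means every element
  ! type reserves space for the same maximum number of local references.
  integer, parameter :: ntypes  = 3      ! e.g. H, O and the newly added C
  integer, parameter :: max_lrc = 2000   ! max local references over all types (made-up value)
  integer, parameter :: ndesc   = 500    ! descriptor length per reference (made-up value)
  real(8), allocatable :: lrc(:,:,:)

  allocate(lrc(ndesc, max_lrc, ntypes))  ! size is independent of how many references C really has
  print '(A,F8.2,A)', 'allocated ~', ndesc*max_lrc*ntypes*8.0/1024.0**2, ' MB'
  deallocate(lrc)
end program alloc_sketch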

What can you do?
1) Check whether you compiled with shared-memory MPI ("-Duse_shmem").
2) Adjust ML_MB and ML_MCONF (see the INCAR sketch below).
3) Go to a larger number of compute nodes, since the design matrix, which needs the most memory, is distributed linearly over the number of cores.
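
As a concrete illustration of point 2, the array sizes can be capped explicitly in the INCAR of the continuation run. The tags are the ones documented above; the values here are purely illustrative and have to be chosen to fit your actual training data.

Code:
ML_LMLFF  = .TRUE.   ! machine-learned force field on
ML_ISTART = 1        ! continue on-the-fly training from the existing ML_AB
ML_MB     = 2000     ! max local reference configurations per element type (illustrative value)
ML_MCONF  = 1000     ! max number of training structures (illustrative value)

Lowering these two tags reduces the size of the statically allocated arrays, at the price of limiting how much new data can be added during the continuation run.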
