Any short introduction to Python is necessarily incomplete. However, in my experience many people entering computational materials science lack the programming background to contribute immediately. The purpose of this section is to provide some direction on setting up an environment similar to mine. The following cheat sheets are useful references:
NumPy Cheat Sheet: Data Analysis in Python, by Datacamp
Scikit-Learn Cheat Sheet: Python Machine Learning, by Datacamp
The next step is to install the Anaconda distribution of Python, as Anaconda provides a wide range of up-to-date Python packages for scientific computing. Setting up Python from a base installation can be a difficult process, particularly if you are working from a shell account. I have found that the following installation instructions work well, even when you do not have administrative rights on a machine.
Download the Anaconda for Windows installer and run it. Choose the Python 3 installer.
Download Anaconda for Linux. Choose Python 3.6 or higher.
This should download a file named something like Anaconda3-4.4.0-Linux-x86_64.sh. Upload this file to your home directory.
Run the following command: bash Anaconda3-4.4.0-Linux-x86_64.sh. This will create an ~/Anaconda3 directory containing a full Python installation with many of the standard tools used in this group.
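If the installer does not update your shell startup file, you may need to put the new installation on your PATH yourself. A minimal addition to ~/.bashrc, assuming the default ~/Anaconda3 install location above, would be:

```shell
# make the Anaconda python the default on your PATH
# (adjust the path if you chose a different install directory)
export PATH="$HOME/Anaconda3/bin:$PATH"
```

You can confirm which Python is active with which python.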
There is a pre-built module on Pitzer. You can simply load it with
module load python/3.6-conda5.2
conda install numpy pandas scikit-learn scipy
conda install -c conda-forge ase
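After the conda installs above finish, a quick sanity check is to verify that each package can be found by Python. This sketch only reports availability; note that scikit-learn is imported under the name sklearn:

```python
# Check that the packages installed above can be found by Python.
import importlib.util

# import names for the conda packages above; scikit-learn imports as "sklearn"
packages = ["numpy", "pandas", "sklearn", "scipy", "ase"]
found = {name: importlib.util.find_spec(name) is not None for name in packages}
for name, ok in found.items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```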
I do a lot of development on my Apple laptop, so it is necessary to get mpi4py working there. These directions are currently for macOS Mojave.
1. Download and install Xcode from the App Store.
2. Install the command line tools with the command
$ xcode-select --install
These directions change quite often as the supported toolchain on HiPerGator evolves. They were last updated on 19 JAN 2019.
Compilation should be done on a development node.
$ module load ufrc
$ srundev --time=04:00:00
It is necessary to maintain consistent toolchains when using MPI. On HiPerGator, for compatibility and consistency with VASP and LAMMPS, I currently prefer the following toolchain:
$ pip uninstall mpi4py
$ module load intel/2018.1.163
$ module load openmpi/3.1.2
$ pip install mpi4py
For the batch submission script on HiPerGator, you need the following lines:
# load the necessary modules
module load intel/2018.1.163
module load openmpi/3.1.2
# changes to the Open MPI environment variables so it can run on HiPerGator
export OMPI_MCA_pml=^ucx
export OMPI_MCA_mpi_warn_on_fork=0
# need the pmi2 libraries because the pmix_v2 libraries don't work for some reason
srun --mpi=pmi2 python mc_iterative_sampler.py
where mc_iterative_sampler.py is a Python program which uses mpi4py. PYPOSPACK depends on forks to run LAMMPS; to suppress the resulting warnings, we set the environment variable OMPI_MCA_mpi_warn_on_fork to 0.
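Putting the module loads, environment variables, and launch line together, a complete submission script might look like the sketch below. The job name and resource values (--ntasks, --mem-per-cpu, --time) are placeholders you would replace with your own:

```shell
#!/bin/bash
#SBATCH --job-name=mc_sampler   # placeholder job name
#SBATCH --ntasks=8              # number of MPI ranks (placeholder)
#SBATCH --mem-per-cpu=6gb
#SBATCH --time=04:00:00

# load the necessary modules
module load intel/2018.1.163
module load openmpi/3.1.2

# changes to the Open MPI environment variables so it can run on HiPerGator
export OMPI_MCA_pml=^ucx
export OMPI_MCA_mpi_warn_on_fork=0

# need the pmi2 libraries because the pmix_v2 libraries don't work
srun --mpi=pmi2 python mc_iterative_sampler.py
```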
You can check whether mpi4py runs as follows (I have not tested whether mpiexec works on development nodes, but srun clearly does not):
srun -p hpg2-dev -n 8 --mem-per-cpu 6gb -t 60:00 --pty -u bash -i
mpiexec -n 8 python demo/helloworld.py
or
srundev -n 8 --mem-per-cpu 6gb -t 60:00
module load intel/2018.1.163
module load openmpi/3.1.2
mpiexec -n 8 python demo/helloworld.py
References:
Dalcín, Lisandro, et al. "Parallel distributed computing using Python." Advances in Water Resources 34.9 (2011): 1124-1139.
Dalcín, Lisandro, et al. "MPI for Python: Performance improvements and MPI-2 extensions." Journal of Parallel and Distributed Computing 68.5 (2008): 655-662.
Dalcín, Lisandro, Rodrigo Paz, and Mario Storti. "MPI for Python." Journal of Parallel and Distributed Computing 65.9 (2005): 1108-1115.