Singularity

Singularity [1] enables users to have full control of their environment. This means that a non-privileged user can "swap out" the operating system on the host for one they control. So if the host system is running RHEL7 but your application runs in Ubuntu, you can create an Ubuntu image, install your applications into that image, copy the image to another host, and run your application on that host in its native Ubuntu environment!

Important Notes
Running Singularity in HPC

Singularity is installed on the HPC cluster:

ssh <CaseID>@rider.case.edu

Singularity Commands

Request a compute node:

srun --pty bash

Load the Singularity module:

module load singularity

Run singularity:

singularity

output:

USAGE: singularity [global options...] <command> [command options...] ...

GLOBAL OPTIONS:
    -d --debug     Print debugging information
    -h --help      Display usage summary
    -q --quiet     Only print errors
       --version   Show application version
    -v --verbose   Increase verbosity +1
    -x --sh-debug  Print shell wrapper debugging information

GENERAL COMMANDS:
    help       Show additional help for a command

CONTAINER USAGE COMMANDS:
    exec       Execute a command within container
    run        Launch a runscript within container

View the container exec command usage:

singularity exec --help

output:

USAGE: singularity [...] exec [exec options...] <container path> <command>
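The exec grammar above always follows the same shape: global options, the subcommand, subcommand options, then the image path and the command to run inside it. The following stdlib-only sketch composes that argv; build_exec_argv is a hypothetical helper for illustration, not part of Singularity.

```python
# Sketch: composing the `singularity exec` command line shown above.
# build_exec_argv is a hypothetical convenience function, not Singularity API.
def build_exec_argv(image, command, *args, options=()):
    """Compose: singularity exec [exec options...] <container path> <command> [args...]"""
    return ["singularity", "exec", *options, image, command, *args]

# e.g. the TensorFlow invocation used later in this guide
argv = build_exec_argv("tensorflow.img", "python", "helloTensor.py")
print(" ".join(argv))
```

The resulting list could be handed to subprocess.run() on a node where the singularity module is loaded.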
Tensorflow

The TensorFlow deep learning software has been installed as an image. Copy the tensorflow files to your home directory and cd to it:

cp -r /usr/local/doc/SINGULARITY/singularity/tensorflow .
cd tensorflow

Interactive Job

Request a GPU K40 node:

srun -p gpu -C gpuk40 --gres=gpu:1 --pty bash

Load the singularity module, which defines the TensorFlow image path ($TENSORFLOW) in Rider:

module load singularity

Note: In RedCat, load the tensorflow module instead.

Run:

singularity exec $TENSORFLOW python

Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> sess.run(hello)
'Hello, TensorFlow!'
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> sess.run(a+b)
42

List the modules available in the image:

singularity exec $TENSORFLOW python3 -c 'help("modules")'

output:

...
_locale    copy      numpy     tempfile
_lsprof    copyreg   opcode    tensorboard
_lzma      crypt     operator  tensorflow
...

Execute a python script from the terminal:

singularity exec $TENSORFLOW python helloTensor.py

Batch Job

Single GPU: The job script (tensor.slurm) in the tensorflow directory looks like this:

tensor.slurm:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH -p gpu -C gpuk40 --gres=gpu:1

cp -r log-device-placement.py helloTensor.py $TMPDIR
cd $TMPDIR
module load tensorflow
echo "You are:"
singularity exec $TENSORFLOW whoami
echo "get the sum from python script using tensor module"
singularity exec $TENSORFLOW python helloTensor.py
echo "Run Tensorflow Job:"
singularity exec $TENSORFLOW python log-device-placement.py
cp -ru * $SLURM_SUBMIT_DIR

Submit the job:

sbatch tensor.slurm

Check the output file:

cat slurm-<jobid>.out

...
2017-04-25 14:31:16.719309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2017-04-25 14:31:16.719347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
...
e: 0, name: Tesla K40m, pci bus id: 0000:03:00.0)
Hello, TensorFlow!
42
...
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40m, pci bus id: 0000:03:00.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]

Multiple GPUs in a Single Node: The job script "tensor-multiple-gpu.slurm" in the tensorflow directory looks similar to the one for a single GPU, except that two GPUs in a node are requested using:

#SBATCH -p gpu -C gpuk40 --gres=gpu:2  #Each node has only 2 GPUs

In the python script, log-device-placement-multiple-gpu.py [8], multiple GPUs are used as:

for d in ['/gpu:0', '/gpu:1']:
  with tf.device(d):

Submit the job:

sbatch tensor-multiple-gpu.slurm

Check the output file:

cat slurm-<jobid>.out

...
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40m, pci bus id: 0000:03:00.0
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K40m, pci bus id: 0000:82:00.0
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/gpu:1
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
AddN: (AddN): /job:localhost/replica:0/task:0/cpu:0
Const_3: (Const): /job:localhost/replica:0/task:0/gpu:1
Const_2: (Const): /job:localhost/replica:0/task:0/gpu:1
Const_1: (Const): /job:localhost/replica:0/task:0/gpu:0
Const: (Const): /job:localhost/replica:0/task:0/gpu:0
[[  44.   56.]
 [  98.  128.]]

Multilayer CNN [9]: Here, the basic TensorFlow implementation with lower accuracy has been improved through weight initialization, convolution and pooling, dropout, etc. Find the script "mnist_softmax.py" in the same tensorflow directory. You will also find the job script "tensor-mnist.slurm". Submit the job:

sbatch tensor-mnist.slurm

Check the output file:

cat slurm-<jobid>.out

...
Test accuracy with a very simple model: 0.9162
step 0, training accuracy 0.02
step 100, training accuracy 0.76
...
step 19900, training accuracy 1
test accuracy with deep convolutional MNIST classifier 0.9915

Note that the accuracy has been bumped from 91.62% to 99.15%.

OpenMPI

Copy the following content into your job file "ompi.slurm". For details, see the section "Singularity Image" below. This file is also available at /usr/local/doc/SINGULARITY/singularity/ompi.

ompi.slurm:

#!/bin/bash
#SBATCH --nodes=1 -n 4
#SBATCH --cpus-per-task=1
#SBATCH --time=2:00:00

module load singularity
singularity test /usr/local/doc/SINGULARITY/singularity/ompi/Centos7-ompi.img

Submit the job:

sbatch ompi.slurm

See the output at slurm-<jobid>:

+ /usr/local/bin/mpirun --allow-run-as-root /usr/bin/mpi_ring
Process 0 sending 10 to 1, tag 201 (4 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
Process 1 exiting
Process 2 exiting
Process 3 exiting

Installing Singularity & Managing Images on Your PC (Linux)

tar xzvf singularity-2.2.tar.gz
cd singularity-2.2
./configure --prefix=/usr/local/singularity/<version>
make
sudo make install

Very important: You need admin access on your PC. You can't install or manipulate your image on HPC; you can only run it there.
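The mpi_ring test run by ompi.slurm above passes a token around the ranks: process 0 injects the value 10, each process forwards it, and process 0 decrements it on every complete pass until it reaches 0. A minimal pure-Python simulation of that message flow (no MPI involved, purely illustrative of the logic behind the output shown):

```python
# Pure-Python sketch of the mpi_ring token flow seen in the ompi.slurm output.
# No MPI is used; this only models what the transcript above reports.
def ring(nprocs=4, start=10):
    events = []
    value = start
    events.append(f"Process 0 sending {value} to 1, tag 201 ({nprocs} processes in ring)")
    while value > 0:
        # token travels rank 0 -> 1 -> ... -> nprocs-1 -> back to 0,
        # where it is decremented and sent around again
        value -= 1
        events.append(f"Process 0 decremented value: {value}")
    return events

for line in ring():
    print(line)
```

The printed lines mirror the "sending" and "decremented value" messages in the job output; in the real test each rank is a separate MPI process exchanging messages, not a loop in one process.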
Singularity Image

Singularity images are single files which physically contain the container [3]. Having all files exist virtually within a single image greatly simplifies sharing, copying, branching, and other management tasks. It also means that standard file system ACLs govern access and permissions on the container (e.g. you can give read-only access to a colleague, or block access completely with a simple chmod command).

Create an Image

Create an image container of 2GiB for CentOS 7 with the latest Open MPI from the GitHub master branch (defined in the definition file centos7-ompi_master.def at /usr/local/doc/SINGULARITY/singularity):

sudo /usr/local/singularity/<ver>/bin/singularity create --size 2048 Centos7-ompi.img

Creating a new image with a maximum size of 2048MiB...
Executing image create helper
Formatting image with ext3 file system
Done.

Bootstrapping

Bootstrapping [4] is the process where we install an operating system and then configure it appropriately for a specified need. To do this we use a bootstrap definition file, which is a recipe for how to build the container. For this purpose, use the sample definition file centos7-ompi_master.def at /usr/local/doc/SINGULARITY/singularity. Bootstrapping from a Docker container is also possible; see the def file "opensees.def" [11] for the OpenSees package.

sudo /usr/local/singularity/bin/singularity bootstrap Centos7-ompi.img centos7-ompi_master.def

....
make[2]: Leaving directory `/tmp/git/ompi'
make[1]: Leaving directory `/tmp/git/ompi'
+ /usr/local/bin/mpicc examples/ring_c.c -o /usr/bin/mpi_ring
+ cd /
+ rm -rf /tmp/git
+ exit 0
+ /usr/local/bin/mpirun --allow-run-as-root /usr/bin/mpi_ring
Process 0 sending 10 to 1, tag 201 (8 processes in ring)
Process 0 sent to 1
Process 1 exiting
Process 2 exiting
Process 3 exiting
Process 4 exiting
Process 5 exiting
Process 6 exiting
Process 7 exiting
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
Done.

Running Commands from the Container

Run the test as a user:

singularity exec ./Centos7-ompi.img whoami
root

Run the test as included in the %test section of the definition file centos7-ompi_master.def:

singularity exec ./Centos7-ompi.img /usr/local/bin/mpirun --allow-run-as-root /usr/bin/mpi_ring

or

singularity test Centos7-ompi.img

output:

Process 0 sending 10 to 1, tag 201 (8 processes in ring)
Process 0 sent to 1
...
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting

Shell into the Image

singularity shell Centos7-ompi.img

Singularity: Invoking an interactive shell within container...

You are now inside the container. Check the OS; it should be CentOS:

Singularity.Centos7-ompi.img> cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

Exit from the container:

exit

To execute a command without shelling in, use the exec command:

singularity exec Centos7-ompi.img cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

Run Python:

singularity exec Centos7-ompi.img python

Python 2.7.5 (default, Nov 6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Since python is invoked in the %runscript section of the definition file, you can also get the python prompt with the run command:

singularity run Centos7-ompi.img

singularity run Centos7-ompi.img --version

output:

Arguments received: --version
Python 2.7.5

Updating an Existing Container

It is possible that you may need to make changes to a container after it has been bootstrapped [5]. For that, let's repeat the Singularity mantra: "A user inside a Singularity container is the same user as outside the container". This means that if you want to make changes to your container, you must be root inside your container, which means you must first become root outside your container. Additionally, you will need to tell Singularity that you wish to mount the container as --writable so you can change its contents.

singularity exec Centos7-ompi.img which ls

output:

/.exec: line 3: exec: which: not found

Let's install the missing package to make it work:

sudo /usr/local/singularity/bin/singularity exec --writable Centos7-ompi.img yum install which

Loaded plugins: fastestmirror
Repodata is over 2 weeks old. Install yum-cron?
Or run: yum makecache fast
base                                         | 3.6 kB  00:00:00
extras                                       | 3.4 kB  00:00:00
updates                                      | 3.4 kB  00:00:00
updates/7/x86_64/primary_db                  | 1.3 MB  00:00:00
Determining fastest mirrors
 * base: mirror.cs.pitt.edu
 * extras: mirror.netdepot.com
 * updates: mirror.trouble-free.net
Resolving Dependencies
--> Running transaction check
---> Package which.x86_64 0:2.20-7.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package        Arch            Version             Repository          Size
================================================================================
Installing:
 which          x86_64          2.20-7.el7          base                41 k

Transaction Summary
================================================================================
Install  1 Package

Total download size: 41 k
Installed size: 75 k
Is this ok [y/d/N]: y
Downloading packages:
which-2.20-7.el7.x86_64.rpm                  | 41 kB   00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : which-2.20-7.el7.x86_64                                    1/1
  Verifying  : which-2.20-7.el7.x86_64                                    1/1

Installed:
  which.x86_64 0:2.20-7.el7

Complete!

Test if it works now:

singularity exec Centos7-ompi.img which ls
/bin/ls

Binding Paths and File Sharing

http://singularity.lbl.gov/docs-mount

Singularity and Docker

Note: Not all Docker images can be imported into Singularity the way TensorFlow is. In that case, search for a Singularity definition file (e.g. opensees) for that software.

Importing an Image from Docker

Import a Docker image into a Singularity image [6].

Case Study: Tensorflow Deep Learning Package

Create the image:

sudo /usr/local/singularity/2.2.1/bin/singularity create --size 4000 tensorflow.img

Creating a new image with a maximum size of 4000MiB...
Executing image create helper
Formatting image with ext3 file system
Done.
Check the size of the image:

ll -h
total 6.7G
-rwxr-xr-x 1 root root 4.0G Dec 21 15:46 tensorflow.img

Import the Docker image:

sudo /usr/local/singularity/2.2.1/bin/singularity import tensorflow.img docker://tensorflow/tensorflow:latest

tensorflow/tensorflow:latest
Downloading layer: sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Downloading layer: sha256:c5c31e51e344c0b705aa0b296672699554aa15a26d153646a565da55ee4be788
Downloading layer: sha256:dba465f7224bea8a58c12ee01d12597f712145c03d5a85266c3223318efa4d20
Downloading layer: sha256:516dcff0ccd19f4bc5b060d0e4c804408522eae590c0add4c281cac867fdf410
Downloading layer: sha256:41a12187ef13118ad918ebc1ce1e87f6092ea5ac7be5b2663def8fd857c5bac5
Downloading layer: sha256:4731f558e970477ed38d4bd53a05f6d41700b0dc8ae642faa08a3199fc456a5d
Downloading layer: sha256:5841945d3549d84bbd7758e04d046ae5b1b6d6de7e89061cac51b0f7914ca499
Downloading layer: sha256:ae36573a6a20cd285f2593d4f07392718c30c887ec28d3a946e7615bfb86a514
Downloading layer: sha256:51900bc9e720db035e12f6c425dd9c06928a9d1eb565c86572b3aab93d24cfca
Downloading layer: sha256:f8419ea7c1b5d667cf26c2c5ec0bfb3502872e5afc6aa85caf2b8c7650bdc8d9
Downloading layer: sha256:3eed5ff20a90a40b0cb7909e79128740f1320d29bec2ae9e025a1d375555db15
Downloading layer: sha256:6c953ac5d795ea26fd59dc5bdf4d335625c69f8bcfbdd8307d6009c2e61779c9
Adding Docker CMD as Singularity runscript...
Bootstrap initialization
No bootstrap definition passed, updating container
Executing Prebootstrap module
Executing Postbootstrap module
Done.
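The docker:// URI above names a repository and an optional tag on Docker Hub; when the tag is omitted, "latest" is implied. A small illustrative parser (parse_docker_ref is hypothetical, not Singularity API, and it does not handle registry hosts with ports):

```python
# Sketch: what a docker://repo:tag reference denotes.
# parse_docker_ref is a hypothetical illustration, not part of Singularity.
def parse_docker_ref(ref):
    assert ref.startswith("docker://")
    name = ref[len("docker://"):]
    repo, _, tag = name.partition(":")   # naive: assumes no registry:port prefix
    return repo, (tag or "latest")

print(parse_docker_ref("docker://tensorflow/tensorflow:latest"))
```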
Case Study: OpenSees for Earthquake Engineering Simulation Using a Def File

Create the image:

sudo /usr/local/singularity/2.3.1/bin/singularity create --size 2048 opensees.img

Get the definition file (opensees.def) from opensees.

opensees.def:

Bootstrap: docker
From: sorcerer01/opensees

%runscript
    exec /bin/sh -c /code/bin/OpenSees "$@"

%post
    mkdir /code
    mv /home/ubuntu/* /code

Bootstrap:

sudo /usr/local/singularity/2.3.1/bin/singularity bootstrap opensees.img opensees.def
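The %runscript above relies on /bin/sh -c semantics: the words that follow the command string become the positional parameters $0, $1, ... inside the -c script. This can be demonstrated without a container; the sketch below uses echo in place of the real /code/bin/OpenSees binary and an invented argument name (model.tcl):

```python
# Demonstrates /bin/sh -c positional-parameter handling, as used by the
# opensees.def %runscript. "OpenSees" and "model.tcl" are placeholder values.
import subprocess

out = subprocess.run(
    ["/bin/sh", "-c", 'echo "script name: $0, args: $@"', "OpenSees", "model.tcl"],
    capture_output=True, text=True,
).stdout.strip()
print(out)
```

Note that the first word after the command string lands in $0, not $1, which is worth keeping in mind when writing a runscript that forwards "$@" to an application.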
For Tensorflow-GPU installation, visit https://github.com/sxg125/singularity/tree/tensorflow

Container GPU Support at Run Time

Since the release of Singularity 2.3, it is no longer necessary to install NVIDIA drivers into your Singularity container to access the GPU on a host node [10]. Now, you can simply use the --nv option to grant your containers GPU support at runtime. See the --nv option:

singularity exec --help

USAGE: singularity [...] exec [exec options...] <container path> <command>
...
-n/--nv    Enable experimental Nvidia support
...

Running Tensorflow with GPU at Runtime [10]

Tensorflow GPU runtime job: Download the tensorflow models into your HPC home directory:

git clone https://github.com/tensorflow/models.git

Make sure you are in the directory where you downloaded the models directory, and run the job:

sbatch tensor-runtime.slurm

Note the singularity run-time command line in the slurm script. If you want to run python3, find the appropriate image from the Docker registry [12]:

singularity exec --nv docker://tensorflow/tensorflow:latest-gpu-py3 python3 ./models/tutorials/image/mnist/convolutional.py

Find the output in the slurm-<jobid>.out file as shown:

2017-07-05 17:13:00.959072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-05 17:13:00.959083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
...
Step 8500 (epoch 9.89), 12.4 ms
Minibatch loss: 1.613, learning rate: 0.006302
Minibatch error: 1.6%
Validation error: 0.9%
Test error: 0.8%

If you run the job on a compute node without a GPU, you will notice a significantly larger computation time per step (537.4 ms vs. 12.4 ms), as shown:

...
Step 8500 (epoch 9.89), 537.4 ms
Minibatch loss: 1.603, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Test error: 0.7%
...

Case Study: Natural Machine Translation

This tutorial [17] gives readers a full understanding of seq2seq models and shows how to build a competitive seq2seq model from scratch.
We focus on the task of Neural Machine Translation (NMT), which was the very first testbed for seq2seq models, with wild success. Copy the nmt directory from /usr/local/doc/SINGULARITY/singularity to your working directory and change directory to it:

cp -r /usr/local/doc/SINGULARITY/singularity/nmt .

Find the job file, nmt.slurm, in the directory. Now, you can submit the job:

sbatch nmt.slurm

See the log file slurm-<jobid>.out.

Testing Singularity

Check the version:

singularity exec tensorflow.img cat /etc/os-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"

Interactive job:

singularity exec tensorflow.img python

Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> sess.run(hello)
'Hello, TensorFlow!'
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> sess.run(a+b)
42

Execute a python script from the terminal:

singularity exec tensorflow.img python helloTensor.py

GPU: Check the CUDA version:

singularity exec tensorflow.img /usr/local/cuda/bin/nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Test TensorFlow with GPU (see the Tensorflow section above):

singularity exec tensorflow.img python log-device-placement.py

For a batch job, see the section "Batch Job" above [7].

Theano/Tensorflow Backend for Keras

Change the backend in the last line of ~/.keras/keras.json from tensorflow to theano (see https://keras.io/backend/):

cat ~/.keras/keras.json
{
    "epsilon": 1e-07,
    "floatx": "float32",
    "image_data_format": "channels_last",
    "backend": "theano"
}

Test:

singularity exec $TENSORFLOW python -c "import keras"
Using Theano backend.
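The keras.json edit above can also be scripted. A small stdlib-only sketch that rewrites the "backend" key; it writes to a temporary file for illustration rather than touching the real ~/.keras/keras.json:

```python
# Sketch: flipping the Keras backend field in a keras.json-style file.
# Uses a temp file as a stand-in for ~/.keras/keras.json.
import json, os, tempfile

cfg = {"epsilon": 1e-07, "floatx": "float32",
       "image_data_format": "channels_last", "backend": "tensorflow"}

path = os.path.join(tempfile.mkdtemp(), "keras.json")
with open(path, "w") as f:
    json.dump(cfg, f)

# read, switch the backend to theano, and write it back
with open(path) as f:
    cfg = json.load(f)
cfg["backend"] = "theano"
with open(path, "w") as f:
    json.dump(cfg, f, indent=4)

with open(path) as f:
    print(json.load(f)["backend"])
```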
Alternatively, you can just change the KERAS_BACKEND environment variable as shown:

export KERAS_BACKEND=tensorflow
singularity exec $TENSORFLOW python -c "import keras"

Note: If you are running on the TensorFlow backend, your code will automatically run on the GPU if any available GPU is detected. If you are running on the Theano backend, you can use one of the following methods:

Method 1: use Theano flags.

THEANO_FLAGS=device=gpu,floatX=float32 python my_keras_script.py

Install the Anaconda Package from the Source File

Get the source file in the /opt directory.

Expand the image size to 2GB, as the container size was not sufficient:

sudo /usr/local/singularity/2.2.1/bin/singularity expand --size 2048 tensorflow.img

Install Anaconda in /usr/local:

sudo /usr/local/singularity/2.2.1/bin/singularity exec --writable tensorflow.img /bin/bash /opt/Anaconda2-4.3.1-Linux-x86_64.sh -b -p /usr/local/anaconda

Set the environment path:

sudo /usr/local/singularity/2.2.1/bin/singularity exec --writable tensorflow.img vi /environment
sudo /usr/local/singularity/2.2.1/bin/singularity exec --writable tensorflow.img cat /environment

# Define any environment init code here
if test -z "$SINGULARITY_INIT"; then
    PATH=$PATH:/usr/local/anaconda/bin:/usr/local/cuda/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
    CUDA_HOME=/usr/local/cuda
    LD_LIBRARY_PATH="/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
    PS1="Singularity.$SINGULARITY_CONTAINER> $PS1"
    SINGULARITY_INIT=1
    export LD_LIBRARY_PATH CUDA_HOME PATH PS1 SINGULARITY_INIT
fi

More Operations:

Copy files from container to container:

sudo /usr/local/singularity/2.2.1/bin/singularity exec --writable tensorflow-gpu.img cp /usr/local/cuda-8.0/targets/x86_64-linux/lib/stubs/libcuda.so /usr/local/cuda/lib64

Install a package:

sudo /usr/local/singularity/2.2.1/bin/singularity exec --writable tensorflow-gpu.img apt install nvidia-361-dev

Install the python modules h5py and cython:

sudo /usr/local/singularity/2.2.1/bin/singularity exec --writable tensorflow.img python3 -m pip install h5py   # for PYTHON3; or pip3
sudo /usr/local/singularity/2.2.1/bin/singularity exec --writable $TENSORFLOW python -m pip install cython    # for PYTHON2; or pip

Copy a file from the host to the container:

sudo /usr/local/singularity/2.2.1/bin/singularity copy tensorflow-gpu.img /usr/lib64/libcuda.so /usr/local/cuda/lib64

Running Your Own Image in HPC

Transfer the image you created on your PC to your home directory at HPC and run it using the Singularity installed in HPC, following the section "Running Singularity in HPC" above.

Troubleshooting

1. Issue: "E: Failed to fetch" error when installing packages.

Solution:

sudo apt-get update

2. Issue:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_US.UTF-8"
are supported and installed on your system.

Solution:

apt-get install language-pack-en-base

Then purge the installed package:

apt-get purge <package>

References:

[1] Singularity Home
[9] Tensorflow - MNIST Tutorial
[10] HPC @ NIH
[11] OpenSees def file
[13] Anaconda/Keras: https://github.com/Drunkar/dockerfiles/tree/master/anaconda-tensorflow-gpu-keras
[14] All-in-One Docker Image: https://github.com/floydhub/dl-docker
[15] Singularity Docker: http://singularity.lbl.gov/docs-docker
[16] All Available Docker Images for Tensorflow: https://hub.docker.com/r/tensorflow/tensorflow/tags/