TensorFlow
TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
URL: https://www.tensorflow.org/
Important Notes
This guide covers our "native" installation of TensorFlow. It is not a guide on how to use TensorFlow in a Singularity container.
tensorflow-1.4.0 is installed in a minimal Python virtual environment, available for both Python 2 (py2) and Python 3 (py3). For extra packages, read the section Installing Missing Python Packages at the end of this page.
tensorflow-1.10.0 is installed as a Python package in the python/3.6.6 module.
For a TensorFlow container solution, visit the HPC Guide to Singularity.
Using TensorFlow
Our native version of TensorFlow works with our K40 and P100 GPUs. Before loading any module, reserve a GPU node. For the default configuration, run:
srun -p gpu -C gpuk40 --gres=gpu:1 --pty /bin/bash
Loading the Module
To use TensorFlow on our HPC cluster, load the corresponding module. To see which modules TensorFlow depends on, run:
module spider tensorflow
Output:
----------------------------------------------------------------------------
tensorflow:
----------------------------------------------------------------------------
Description:
TensorFlow™ is an open source software library for numerical
computation using data flow graphs.
Versions:
tensorflow/1.4.0-py2
tensorflow/1.4.0-py3
----------------------------------------------------------------------------
For detailed information about a specific "tensorflow" module (including how to load the modules) use the module's full name.
For example:
module spider tensorflow/1.4.0-py3
Observe that the Python 2 version has the version name ending with -py2, and the Python 3 version has the version name ending -py3.
If we run the specific example given by the output of the spider command:
module spider tensorflow/1.4.0-py3
We get the following output:
----------------------------------------------------------------------------
tensorflow: tensorflow/1.4.0-py3
----------------------------------------------------------------------------
Description:
TensorFlow™ is an open source software library for numerical
computation using data flow graphs.
You will need to load all module(s) on any one of the lines below before the "tensorflow/1.4.0-py3" module is available to load.
intel/17 openmpi/2.0.1
If we try the Python 2 version:
module spider tensorflow/1.4.0-py2
We get the following output:
----------------------------------------------------------------------------
tensorflow: tensorflow/1.4.0-py2
----------------------------------------------------------------------------
Description:
TensorFlow™ is an open source software library for numerical
computation using data flow graphs.
You will need to load all module(s) on any one of the lines below before the "tensorflow/1.4.0-py2" module is available to load.
gcc/6.3.0 openmpi/2.0.1
For this example, we will use the GCC version with Python 2. The Intel version with Python 3 works with the same commands once its own prerequisites (intel/17 and openmpi/2.0.1) are loaded.
Since the requirements are to load gcc/6.3.0 and openmpi/2.0.1, we will run:
module load gcc/6.3.0 openmpi/2.0.1
And then
module load tensorflow/1.4.0-py2
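The two variants differ only in their prerequisite modules, so the load sequence can be summarized as a small lookup. The Python sketch below is illustrative only: the PREREQUISITES table and the load_commands helper are hypothetical names, and the module lists are copied from the module spider output above. It builds the command strings without running them.

```python
# Prerequisites copied from the `module spider` output for each variant.
PREREQUISITES = {
    "tensorflow/1.4.0-py2": ["gcc/6.3.0", "openmpi/2.0.1"],
    "tensorflow/1.4.0-py3": ["intel/17", "openmpi/2.0.1"],
}


def load_commands(tf_module):
    """Return the shell commands that load tf_module, prerequisites first."""
    prerequisites = PREREQUISITES[tf_module]
    return [
        "module load " + " ".join(prerequisites),
        "module load " + tf_module,
    ]


if __name__ == "__main__":
    # Print the commands for the Intel / Python 3 variant.
    for command in load_commands("tensorflow/1.4.0-py3"):
        print(command)
```

Keeping the TensorFlow module as the final command matches the load order this guide recommends.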
To confirm that TensorFlow has been loaded, run:
module list
Output:
Currently Loaded Modules:
1) StdEnv 3) cuda/8.0 5) base/8.0 7) fftw/3.3.6-pl2 9) tensorflow/1.4.0-py2
2) gcc/6.3.0 4) openmpi/2.0.1 6) python2/2.7.13 8) MKL/17
Very Important Note: If you need to load other modules, load the TensorFlow module last.
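One way to check the load order from inside a script is to inspect the LOADEDMODULES environment variable. This is a minimal sketch, assuming the module system exports LOADEDMODULES as a colon-separated list in load order; tensorflow_loaded_last is a hypothetical helper name, not part of any installed tool.

```python
import os


def tensorflow_loaded_last(loaded_modules):
    """Return True if the last entry of a colon-separated module list
    is a tensorflow module."""
    names = [name for name in loaded_modules.split(":") if name]
    return bool(names) and names[-1].startswith("tensorflow/")


if __name__ == "__main__":
    # Assumption: LOADEDMODULES is set by the module system and
    # lists the loaded modules in the order they were loaded.
    current = os.environ.get("LOADEDMODULES", "")
    print("tensorflow loaded last:", tensorflow_loaded_last(current))
```

If the check prints False after loading additional modules, reloading the TensorFlow module at the end restores the recommended order.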
TensorFlow as a Python Module
Request a GPU node:
srun -p gpu -C gpup100 --gres=gpu:1 --pty /bin/bash
Load modules:
module load gcc/6.3.0 openmpi/2.0.1
module load python/3.6.6
module load cuda/9.0
Start Python and import the package:
python
>>> import tensorflow
>>>
To check the version of the tensorflow package, exit Python and run:
pip show tensorflow
Output:
Name: tensorflow
Version: 1.10.0
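The same check can be made from inside the interpreter. The sketch below assumes only that the package exposes a __version__ attribute, as TensorFlow does; package_version is a hypothetical helper name. It returns None instead of raising when the package cannot be imported, which is convenient on a node where the modules have not been loaded.

```python
import importlib


def package_version(name):
    """Return the package's __version__ string, or None if the package
    is not importable or does not define __version__."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    # Prints None when tensorflow is not available in this environment.
    print(package_version("tensorflow"))
```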
Running a GPU Example (interactive)
The following example is borrowed from TensorFlow's website [1] and has been adapted to our system configuration:
Request a GPU node:
srun -p gpu -C gpup100 --gres=gpu:1 --pty --time=30 /bin/bash
Load TensorFlow:
module load gcc/6.3.0 openmpi/2.0.1
module load tensorflow/1.4.0-py2
Run Python:
python
Copy and paste the following code:
# Import TensorFlow
import tensorflow as tf
# Creates a graph whose operations are pinned to the first GPU.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
This command should produce an output similar to this one:
2017-12-20 10:42:26.299260: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-12-20 10:42:26.681573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:82:00.0
totalMemory: 11.91GiB freeMemory: 11.63GiB
2017-12-20 10:42:26.681619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0
2017-12-20 10:42:26.783950: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0
Copy and paste the following code:
# Runs the op.
print(sess.run(c))
This command should produce the following output:
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2017-12-20 10:43:36.108687: I tensorflow/core/common_runtime/placer.cc:874] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2017-12-20 10:43:36.108705: I tensorflow/core/common_runtime/placer.cc:874] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2017-12-20 10:43:36.108712: I tensorflow/core/common_runtime/placer.cc:874] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[ 22. 28.]
[ 49. 64.]]
Notice that the placement lines assign every operation (MatMul, a, and b) to device:GPU:0, which confirms that the computation ran on the GPU.
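For longer logs, the placement lines can be checked programmatically. This is a rough sketch, assuming the log uses the "name: (Type): device" format shown in the output above; other TensorFlow versions may format these lines differently. The names op_devices and all_on_gpu are hypothetical helpers.

```python
import re

# Matches placement lines such as:
#   MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
PLACEMENT = re.compile(r"^(?P<op>\S+): \((?P<kind>\w+)\): (?P<device>/job:\S+)$")


def op_devices(log_text):
    """Map each operation name to the device it was placed on."""
    placements = {}
    for line in log_text.splitlines():
        match = PLACEMENT.match(line.strip())
        if match:
            placements[match.group("op")] = match.group("device")
    return placements


def all_on_gpu(placements):
    """Return True if at least one op was found and all ops are on a GPU."""
    return bool(placements) and all(
        "/device:GPU:" in device for device in placements.values()
    )


SAMPLE = """MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0"""

if __name__ == "__main__":
    print(all_on_gpu(op_devices(SAMPLE)))
```

Timestamped lines and the printed matrix do not match the pattern, so they are skipped rather than misread as placements.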
Installing Missing Python Packages
Keras is included in the TensorFlow module.
Our Python virtual environment is a minimal installation containing only what TensorFlow needs to work, so you may want extra packages to complement it.
To check which packages are available in the TensorFlow environment, use the command:
pip freeze
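To test for specific packages without reading the whole list, a short script can report which ones are not importable. This is a minimal sketch for the Python 3 environments only (importlib.util.find_spec is not available in Python 2); missing_packages is a hypothetical helper, and it expects top-level import names, which can differ from the names pip reports.

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that cannot be found
    in the current environment."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Example list; replace it with the packages your code imports.
    print(missing_packages(["tensorflow", "keras", "numpy"]))
```

An empty list means every listed package was found.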
If you need a package that is missing, we would be happy to provide it for you. Please send us an email at hpc-supportATcase.edu
Refer to HPC Guide to Deep Learning & HPC Software Guide for more information.
TensorFlow from a Singularity Container
Please visit our HPC Guide to Singularity.
References
[1] TensorFlow's website: https://www.tensorflow.org