TensorFlow

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.

URL: https://www.tensorflow.org/

Important Notes

Using TensorFlow

Our native version of TensorFlow works with our K40 and P100 GPUs. Before loading any module, reserve a GPU node. For the default configuration (a K40 node), run:

srun -p gpu -C gpuk40 --gres=gpu:1 --pty /bin/bash

Loading the Module

To use TensorFlow on our HPC cluster, load the corresponding module. To see which modules TensorFlow depends on, run:

module spider tensorflow

Output:

----------------------------------------------------------------------------

  tensorflow:

----------------------------------------------------------------------------

    Description:

      TensorFlow™ is an open source software library for numerical

      computation using data flow graphs.

     Versions:

        tensorflow/1.4.0-py2

        tensorflow/1.4.0-py3

----------------------------------------------------------------------------

  For detailed information about a specific "tensorflow" module (including how to load the modules) use the module's full name.

  For example:

 module spider tensorflow/1.4.0-py3

Note that the Python 2 version name ends with -py2 and the Python 3 version name ends with -py3.

If we run the specific example given by the output of the spider command:

module spider tensorflow/1.4.0-py3

We get the following output:

----------------------------------------------------------------------------

  tensorflow: tensorflow/1.4.0-py3

----------------------------------------------------------------------------

    Description:

      TensorFlow™ is an open source software library for numerical

      computation using data flow graphs.

    You will need to load all module(s) on any one of the lines below before the "tensorflow/1.4.0-py3" module is available to load.

      intel/17  openmpi/2.0.1

If we try the Python 2 version:

module spider tensorflow/1.4.0-py2

We get the following output:

----------------------------------------------------------------------------

  tensorflow: tensorflow/1.4.0-py2

----------------------------------------------------------------------------

    Description:

      TensorFlow™ is an open source software library for numerical

      computation using data flow graphs.

    You will need to load all module(s) on any one of the lines below before the "tensorflow/1.4.0-py2" module is available to load.

      gcc/6.3.0  openmpi/2.0.1

For this example, we will use the GCC version with Python 2. The Intel version with Python 3 is loaded the same way, using intel/17 in place of gcc/6.3.0.

Since the requirements are to load gcc/6.3.0 and openmpi/2.0.1, we will run:

module load gcc/6.3.0 openmpi/2.0.1

Then load TensorFlow:

module load tensorflow/1.4.0-py2

To confirm that TensorFlow has been loaded, run:

module list

Output:

Currently Loaded Modules:

  1) StdEnv      3) cuda/8.0        5) base/8.0         7) fftw/3.3.6-pl2   9) tensorflow/1.4.0-py2

  2) gcc/6.3.0   4) openmpi/2.0.1   6) python2/2.7.13   8) MKL/17

Very Important Note: If you have to load other modules, load the TensorFlow module last.
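The load order can also be checked programmatically. The following is a minimal sketch, not part of the cluster tooling: the helper names loaded_modules and tensorflow_is_last are hypothetical, and it assumes that the numbers printed by module list reflect the order in which modules were loaded. It parses the sample output shown above.

```python
import re


def loaded_modules(module_list_output):
    """Return module names ordered by the index printed by `module list`."""
    # Each entry looks like "9) tensorflow/1.4.0-py2"; entries are printed
    # column by column, so sort them by their numeric index.
    entries = re.findall(r"(\d+)\)\s+(\S+)", module_list_output)
    return [name for _, name in sorted(entries, key=lambda e: int(e[0]))]


def tensorflow_is_last(module_list_output):
    """True if the highest-numbered module is a tensorflow module."""
    modules = loaded_modules(module_list_output)
    return bool(modules) and modules[-1].startswith("tensorflow/")


# Sample text copied from the `module list` output above.
sample = """Currently Loaded Modules:
  1) StdEnv      3) cuda/8.0        5) base/8.0         7) fftw/3.3.6-pl2   9) tensorflow/1.4.0-py2
  2) gcc/6.3.0   4) openmpi/2.0.1   6) python2/2.7.13   8) MKL/17
"""

print(loaded_modules(sample))
print(tensorflow_is_last(sample))
```

In practice the text would come from running module list in your session rather than from a pasted sample.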

TensorFlow as a Python module

Request a GPU node:

srun -p gpu -C gpup100 --gres=gpu:1 --pty /bin/bash

Load modules:

module load gcc/6.3.0 openmpi/2.0.1

module load python/3.6.6

module load cuda/9.0

Start Python and import TensorFlow:

python

>>> import tensorflow

>>>

To check the version of the tensorflow module, exit Python and run the following in the shell:

pip show tensorflow

Output:

Name: tensorflow

Version: 1.10.0
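The version can also be read from inside Python. This is a small sketch: the helper name tensorflow_version is hypothetical, and it relies only on the __version__ attribute that the tensorflow package exposes. It returns None when TensorFlow cannot be imported, for example when the modules above have not been loaded.

```python
def tensorflow_version():
    """Return the TensorFlow version string, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not available in the current environment.
        return None
    return tf.__version__


print(tensorflow_version())
```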

Running a GPU Example (interactive)

The following example is borrowed from the TensorFlow website [1] and has been adapted to our system configuration.

Request a GPU node:

srun -p gpu -C gpup100 --gres=gpu:1 --pty --time=30 /bin/bash

Load TensorFlow and the modules it requires:

module load gcc/6.3.0 openmpi/2.0.1

module load tensorflow/1.4.0-py2

Run Python:

python

Copy and paste the following code:

# Import Tensorflow

import tensorflow as tf

# Creates a graph.

with tf.device('/gpu:0'):

  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')

  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')

  c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Creating the session should produce output similar to the following:

2017-12-20 10:42:26.299260: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

2017-12-20 10:42:26.681573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:

name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285

pciBusID: 0000:82:00.0

totalMemory: 11.91GiB freeMemory: 11.63GiB

2017-12-20 10:42:26.681619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0)

Device mapping:

/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0

2017-12-20 10:42:26.783950: I tensorflow/core/common_runtime/direct_session.cc:299] Device mapping:

/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, compute capability: 6.0
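If you capture this log in a file, the device mapping lines can be scanned to confirm which GPU the session sees. The sketch below is illustrative only: the helper name mapped_gpus is hypothetical, and the pattern assumes the line format shown in the output above, which may differ in other TensorFlow versions.

```python
import re


def mapped_gpus(log_text):
    """Return (device path, GPU name) pairs found in device mapping lines."""
    pattern = re.compile(
        r"^(/job:\S+/device:GPU:\d+) -> device: \d+, name: ([^,]+),",
        re.MULTILINE,
    )
    return pattern.findall(log_text)


# One mapping line copied from the output above.
sample_log = (
    "/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, "
    "name: Tesla P100-PCIE-12GB, pci bus id: 0000:82:00.0, "
    "compute capability: 6.0\n"
)

for device, name in mapped_gpus(sample_log):
    print(device, "->", name)
```

An empty result would suggest that no GPU was mapped, in which case it is worth checking that the job was started on a GPU node.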

Copy and paste the following code:

# Runs the op.

print(sess.run(c))

This command should produce output similar to the following:

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0

2017-12-20 10:43:36.108687: I tensorflow/core/common_runtime/placer.cc:874] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0

b: (Const): /job:localhost/replica:0/task:0/device:GPU:0

2017-12-20 10:43:36.108705: I tensorflow/core/common_runtime/placer.cc:874] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0

a: (Const): /job:localhost/replica:0/task:0/device:GPU:0

2017-12-20 10:43:36.108712: I tensorflow/core/common_runtime/placer.cc:874] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0

[[ 22.  28.]

 [ 49.  64.]]

Notice the lines that mention device:GPU:0. They confirm that the constants a and b and the MatMul operation were placed on the GPU.
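The printed matrix can be verified without a GPU. The following sketch recomputes the product in plain Python; the matmul helper is written here for illustration and is not TensorFlow's implementation. The values of a and b are the constants from the example, laid out row by row according to their shapes.

```python
# a has shape [2, 3] and b has shape [3, 2], as in the example above.
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


c = matmul(a, b)
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```

For instance, the top-left entry is 1*1 + 2*3 + 3*5 = 22, which matches the first value printed by sess.run(c).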

Installing Missing Python Packages

Keras is included in the TensorFlow module.

Our Python virtual environment is a minimal installation, containing only what TensorFlow needs to work, so you might want extra packages to complement it.

To list the packages installed in the TensorFlow environment, use the command

pip freeze

If you need a package that is missing, we would be happy to provide it for you. Please send us an email at hpc-supportATcase.edu
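Before writing to us, it can help to check exactly which packages your code needs but cannot import. The sketch below is one way to do that, assuming a Python 3 interpreter; the helper name missing_packages and the list of wanted names are illustrative only. Note that the import name of a package can differ from the name shown by pip freeze.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Illustrative list; replace it with the imports your own code uses.
wanted = ["tensorflow", "keras", "numpy", "scipy"]
print(missing_packages(wanted))
```

Any names printed are the ones to mention in your request.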

Refer to the HPC Guide to Deep Learning and the HPC Software Guide for more information.

TensorFlow from a Singularity Container

Please visit our HPC Guide to Singularity.

References

[1] TensorFlow website: https://www.tensorflow.org