Intro:
https://drive.google.com/open?id=0B5mjl2eagJoWYjVZZzdneTRKeVE
1. Caffe with GPUs
https://sites.google.com/a/ku.th/gpu/caffe
Caffe with Spark
https://github.com/yahoo/CaffeOnSpark
2. Torch with GPUs
https://sites.google.com/a/ku.th/gpu/torch
3. Theano with GPUs
https://sites.google.com/a/ku.th/gpu/theano
4. TensorFlow with GPUs
Pip Installation
https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#pip-installation
https://sites.google.com/a/ku.th/gpu/tensorflow
Distributed TensorFlow
https://sites.google.com/a/ku.th/gpu/tensorflow/distributed-tensorflow
https://sites.google.com/a/ku.th/gpu/tensorflow/distributed-tensorflow2
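For orientation, here is a minimal sketch of the classic between-graph setup those pages walk through, using the pre-2.x tf.train.ClusterSpec / tf.train.Server API (host names, ports, and the toy variable are placeholders of mine):

import tensorflow as tf

# Describe the cluster: one parameter server and two workers
# (hypothetical host names and ports).
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# Each process starts a server for its own job name and task index.
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Pin variables to the parameter server, computation to this worker.
with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:0", cluster=cluster)):
    w = tf.Variable(tf.zeros([10]), name="w")
    update = w.assign_add(tf.ones([10]))

# The ps process must also be running, or this session will block.
with tf.Session(server.target) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(update))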
TensorFlow with MPI (build from source with the MPI option)
https://arxiv.org/abs/1603.02339
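The paper describes routing TensorFlow's tensor exchanges through MPI primitives. As a rough stand-in for the core idea, here is gradient averaging via allreduce sketched with mpi4py and NumPy (the MPI build of TensorFlow wires the equivalent into its runtime):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Each rank computes its own local gradient (random stand-in here).
local_grad = np.random.randn(1000).astype(np.float32)

# Sum the gradients across all ranks, then divide to average.
avg_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, avg_grad, op=MPI.SUM)
avg_grad /= comm.Get_size()

# Launch with e.g.: mpirun -np 4 python allreduce_demo.py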
Compare with NCCL:
https://www.tensorflow.org/api_docs/python/tf/contrib/nccl
http://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
https://devblogs.nvidia.com/parallelforall/fast-multi-gpu-collectives-nccl/
https://images.nvidia.com/events/sc15/pdfs/NCCL-Woolley.pdf
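A small sketch of the contrib API from the first link, summing one tensor per GPU with a single NCCL all-reduce (TF 1.x; assumes at least two visible GPUs):

import tensorflow as tf
from tensorflow.contrib import nccl

# One tensor per GPU tower.
towers = []
for i in range(2):
    with tf.device("/gpu:%d" % i):
        towers.append(tf.random_normal([1000]))

# all_sum returns one output per input, each holding the elementwise
# sum and living on the corresponding device.
summed = nccl.all_sum(towers)

# NCCL ops must all be executed together, which running the full
# list in one step does.
with tf.Session() as sess:
    out0, out1 = sess.run(summed)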
Baidu allreduce
https://github.com/baidu-research/baidu-allreduce
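The point of Baidu's patch is the bandwidth-optimal ring allreduce. Below is a toy single-process NumPy simulation of its two phases (rank count and chunk sizes are arbitrary choices of mine); sequential in-place updates are safe here because, at each step, the chunk a rank sends is never the chunk it receives.

import numpy as np

P = 4                                    # simulated ranks
N = 8                                    # elements per rank (divisible by P)
data = [np.random.randn(N) for _ in range(P)]
expected = sum(data)

buf = [np.split(d.copy(), P) for d in data]   # P chunks per rank

# Phase 1: reduce-scatter. After P-1 steps, rank r holds the full
# sum in chunk (r+1) % P.
for s in range(P - 1):
    for r in range(P):
        c = (r - s - 1) % P                   # chunk arriving at rank r
        buf[r][c] += buf[(r - 1) % P][c]      # add left neighbour's copy

# Phase 2: allgather. Fully reduced chunks circulate around the ring.
for s in range(P - 1):
    for r in range(P):
        c = (r - s) % P
        buf[r][c] = buf[(r - 1) % P][c].copy()

assert all(np.allclose(np.concatenate(b), expected) for b in buf)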
5. Nervana: 16-bit fixed-point multiplication
https://sites.google.com/a/ku.th/parallel-computing/gpus/nervana
https://github.com/NervanaSystems/nervanagpu
https://github.com/NervanaSystems/neon.git
pip install nervananeon
. .venv/bin/activate
python examples/mnist_mlp.py -b gpu
Then install ngraph:
https://ngraph.nervanasys.com/docs/latest/walk_throughs.html
Try a 32-bit GEMM operation with NumPy int8, uint8, float16, and float32,
and compare with Nervana GPU int8, uint8, fp16, and fp32.
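As a starting point for the NumPy half of that comparison, a timing sketch (matrix sizes and the int32 widening are my choices; NumPy would otherwise multiply int8/uint8 in the input dtype and overflow):

import time
import numpy as np

m, n, k = 1024, 1024, 1024
for dtype in (np.int8, np.uint8, np.float16, np.float32):
    if np.issubdtype(dtype, np.integer):
        # Draw small integers, then widen so the GEMM accumulates in 32 bit.
        a = np.random.randint(0, 100, size=(m, k), dtype=dtype).astype(np.int32)
        b = np.random.randint(0, 100, size=(k, n), dtype=dtype).astype(np.int32)
    else:
        a = np.random.randn(m, k).astype(dtype)
        b = np.random.randn(k, n).astype(dtype)
    t0 = time.time()
    c = a @ b
    print(np.dtype(dtype).name, "->", c.dtype, "%.3f s" % (time.time() - t0))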
Here is a full example of a basic GEMM operation using 16-bit floats.
(Note: newer versions use ngraph instead, so the library calls below need updating.)
import numpy as np
import pycuda.autoinit
from nervanagpu import NervanaGPU

# initialize factory class
ng = NervanaGPU(stochastic_round=False)

m, n, k = 10, 20, 10
dtype = np.float16

# create matrices on host
cpuA = np.random.randn(k, m)
cpuB = np.random.randn(k, n)

# transfer to device
devA = ng.array(cpuA, dtype=dtype)
devB = ng.array(cpuB, dtype=dtype)
devC = ng.empty((m, n), dtype=dtype)

# do GEMM operation: C = A.T @ B
ng.dot(devA.T, devB, devC, relu=False)

# get result back from device
cpuC = devC.get()
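A note on the design choice above: stochastic_round=False uses ordinary round-to-nearest. nervanagpu also offers stochastic rounding, which is meant to reduce the bias that builds up when many small 16-bit updates are rounded the same way; it makes little difference for a one-off GEMM like this, but is worth trying for 16-bit training runs.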
6. NVIDIA Docker & Docker
https://sites.google.com/a/ku.th/gpu/nvidia
7. DIGITS
https://sites.google.com/a/ku.th/gpu/digits
My slides:
https://drive.google.com/open?id=1Sm6xL0ZfPu2D0q8UsmvMYY49FFP-6Nsy