

To put things into perspective, we were running an Inception3 architecture with a sample of 18 thousand documents on a single 12GB Tesla K80 GPU. Each epoch took about 30 minutes. With Horovod and an upgraded instance with four 12GB Tesla K80 GPUs, each epoch dropped to about 5–6 minutes.

● TensorFlow for Machine Intelligence (TFFMI)
● Hands-On Machine Learning with Scikit-Learn and TensorFlow. Chapter 9: Up and running with TensorFlow
● Fundamentals of Deep Learning. Chapter 3: Implementing Neural Networks in TensorFlow (FODL)
TensorFlow is constantly updated, so books can become outdated fast.

TF Learn simple example
import tensorflow as tf
# sklearn.cross_validation is the pre-0.20 module; newer scikit-learn moved train_test_split to sklearn.model_selection.
from sklearn import cross_validation, metrics
# Load dataset.
iris = tf.contrib.learn.datasets.load_dataset('iris')
x_train, x_test, y_train, y_test = cross_validation.train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
# Build 3 layer DNN with 10, 20, 10 units respectively.
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(x_train)
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns, hidden_units=[10, 20, 10], n_classes=3)
# Fit and predict.
classifier.fit(x_train, y_train, steps=200)
predictions = list(classifier.predict(x_test, as_iterable=True))
score = metrics.accuracy_score(y_test, predictions)
print('Accuracy: {0:f}'.format(score))

What’s a tensor? 
An n-dimensional array 
0-d tensor: scalar (number) 
1-d tensor: vector 
2-d tensor: matrix 
and so on 
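
As a rough illustration of what "n-dimensional" means here, the sketch below uses plain nested Python lists as stand-ins for tensors. The helper `rank` is hypothetical (it is not a TensorFlow function) and assumes non-empty, rectangular nesting.

```python
def rank(value):
    """Count the nesting depth of a nested list; a bare number has rank 0."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # assumes non-empty, rectangular nesting
    return depth

scalar = 3                        # 0-d tensor
vector = [1, 2, 3]                # 1-d tensor
matrix = [[1, 2], [3, 4]]         # 2-d tensor
cube = [[[1], [2]], [[3], [4]]]   # 3-d tensor

print(rank(scalar), rank(vector), rank(matrix), rank(cube))
```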

nodes are operators, variables, constants
edges are actual tensors

Data Flow -> Tensor Flow (I know, mind=blown)

import tensorflow as tf 
a = tf.add(3, 5) 
print(a)
>> Tensor("Add:0", shape=(), dtype=int32)    # (Not 8)

How to get the value of a? 
Create a session, assign it to variable sess so we can call it later 
Within the session, evaluate the graph to fetch the value of a 

import tensorflow as tf
a = tf.add(3, 5)
sess = tf.Session()
print(sess.run(a))    # >> 8
sess.close()
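
The split between building a graph and running it can be imitated in a few lines of plain Python. This is only a toy sketch, not how TensorFlow is implemented: `Node`, `constant`, `add`, and `run` are hypothetical names invented for the illustration. It shows why printing a node gives a description of the op rather than the number 8.

```python
class Node:
    """A hypothetical graph node: an operation plus the nodes feeding it."""
    def __init__(self, op, inputs=(), name=""):
        self.op = op
        self.inputs = inputs
        self.name = name

    def __repr__(self):
        return "Node(%r)" % self.name

def constant(value):
    # A node with no inputs that returns a fixed value when evaluated.
    return Node(lambda: value, name="Const")

def add(a, b):
    # Building the node does not compute anything yet.
    return Node(lambda x, y: x + y, inputs=(a, b), name="Add")

def run(node):
    """Evaluate a node by first evaluating the nodes it depends on."""
    args = [run(i) for i in node.inputs]
    return node.op(*args)

a = add(constant(3), constant(5))
print(a)       # a description of the node, not its value
print(run(a))  # the value, computed only now
```

In this toy, `run` plays the role that a session plays in TensorFlow: nothing is computed until a node is explicitly evaluated.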

x = 2 
y = 3 
add_op = tf.add(x, y) 
mul_op = tf.multiply(x, y)    # tf.mul was renamed to tf.multiply in TensorFlow 1.0
useless = tf.multiply(x, add_op)
pow_op = tf.pow(add_op, mul_op)
with tf.Session() as sess:
    z, not_useless = sess.run([pow_op, useless])    # pass all the tensors whose values you want as a list in fetches

Run part of a graph on a specific GPU or CPU (for parallel computation)

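with tf.device('/gpu:2'):
    # matmul needs rank-2 inputs, so give the constants explicit 2 x 3 and 3 x 2 shapes
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# log_device_placement=True logs which device each op runs on
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))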

● Multiple graphs require multiple sessions, and each session will try to use all available resources by default
● You can't pass data between graphs without routing it through Python/NumPy, which doesn't work in a distributed setting
● It’s better to have disconnected subgraphs within one graph

$ vim
import tensorflow as tf 
a = tf.constant(2) # name='a' 
b = tf.constant(3) # name='b'
x = tf.add(a, b) # name='add'
with tf.Session() as sess: 
  writer = tf.summary.FileWriter('./graphs', sess.graph) 
  print(sess.run(x))
# close the writer when you’re done using it
writer.close()

$ python
$ tensorboard --logdir="./graphs" 

constant types
tf.constant(value, dtype=None, shape=None, name='Const', verify_shape=False)    # e.g. b = tf.constant([[0, 1], [2, 3]], name="b")
tf.zeros([2, 3], tf.int32)    # [[0, 0, 0], [0, 0, 0]]
tf.zeros_like(input_tensor)    # for a 3 x 2 input_tensor: [[0, 0], [0, 0], [0, 0]]
tf.ones(shape, dtype=tf.float32, name=None)
tf.ones_like(input_tensor)    # for a 3 x 2 input_tensor: [[1, 1], [1, 1], [1, 1]]
tf.fill(dims, value, name=None)    # e.g. tf.fill([2, 3], 8) gives [[8, 8, 8], [8, 8, 8]]
tf.linspace(10.0, 13.0, 4, name="linspace")    # a sequence of num evenly-spaced values: [10.0 11.0 12.0 13.0]
tf.range(start, limit, delta)    # with start=3, limit=1, delta=-0.5: [3, 2.5, 2, 1.5]

Unlike NumPy or Python sequences, TensorFlow sequences are not iterable.
for _ in np.linspace(0, 10, 4): # OK 
for _ in tf.linspace(0, 10, 4): # TypeError("'Tensor' object is not iterable.") 

for _ in range(4): # OK 
for _ in tf.range(4): # TypeError("'Tensor' object is not iterable.")

Generate random constants from certain distributions.
tf.random_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)
tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)
tf.random_uniform(shape, minval=0, maxval=None, dtype=tf.float32, seed=None, name=None)
tf.random_shuffle(value, seed=None, name=None)
tf.random_crop(value, size, seed=None, name=None)
tf.multinomial(logits, num_samples, seed=None, name=None)
tf.random_gamma(shape, alpha, beta=None, dtype=tf.float32, seed=None, name=None)

a = tf.constant([3, 6]) 
b = tf.constant([2, 2]) 
tf.add(a, b) # >> [5 8] 
tf.add_n([a, b, b]) # >> [7 10]. Equivalent to a + b + b 
tf.multiply(a, b) # >> [6 12] because multiply is element-wise (it was called tf.mul before TensorFlow 1.0)
tf.matmul(a, b) # >> ValueError 
tf.matmul(tf.reshape(a, shape=[1, 2]), tf.reshape(b, shape=[2, 1])) # >> [[18]] 
tf.div(a, b) # >> [1 3] 
tf.mod(a, b) # >> [1 0]
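
To make the difference between the element-wise product and the matrix product concrete, here is a plain-Python sketch with the same numbers. It does not use TensorFlow; the `matmul` helper is a hypothetical stand-in written only for this illustration.

```python
a = [3, 6]
b = [2, 2]

# Element-wise product: pair up entries position by position.
elementwise = [x * y for x, y in zip(a, b)]

def matmul(m, n):
    """Matrix product of two rank-2 nested lists (rows of m times columns of n)."""
    return [[sum(m[i][k] * n[k][j] for k in range(len(n)))
             for j in range(len(n[0]))]
            for i in range(len(m))]

row = [a]                  # reshape a to 1 x 2
col = [[v] for v in b]     # reshape b to 2 x 1

print(elementwise)         # [6, 12]
print(matmul(row, col))    # [[18]]
```

The matrix product only makes sense once both operands have two dimensions, which is why the rank-1 call fails and the reshaped call succeeds.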

A graph's definition is stored as a protobuf. Protobuf stands for protocol buffer, “Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.”
import tensorflow as tf
my_const = tf.constant([1.0, 2.0], name="my_const")
print(tf.get_default_graph().as_graph_def())

node {
  name: "my_const"
  op: "Const"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: 2
          }
        }
        tensor_content: "\000\000\200?\000\000\000@"
      }
    }
  }
}
versions {
  producer: 17
}

b = tf.Variable([2, 3], name="vector")
W = tf.Variable(tf.zeros([784,10]))     #  create variable W as 784 x 10 tensor, filled with zeros

tf.Variable holds several ops. For a variable x = tf.Variable(...):
x.initializer # init op
x.value() # read op
x.assign(...) # write op

You have to initialize variables before using them.
Initialize all variables at once:
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)    # Note that sess.run() here runs the initializer; it does not fetch any value.

Initialize only a subset of variables with a list of variables to initialize
init_ab = tf.variables_initializer([a, b], name="init_ab")
with tf.Session() as sess:
    sess.run(init_ab)

Initialize each variable separately using tf.Variable.initializer
# create variable W as 784 x 10 tensor, filled with zeros
W = tf.Variable(tf.zeros([784,10]))
with tf.Session() as sess:
    sess.run(W.initializer)

Print a variable
W = tf.Variable(tf.truncated_normal([700, 10]))
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W)           # Tensor("Variable/read:0", shape=(700, 10), dtype=float32)
    print(W.eval())    # actual variable value

Assign to a variable
W = tf.Variable(10)
W.assign(100)
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W.eval())    # >> 10
Why 10 and not 100? W.assign(100) doesn’t assign the value 100 to W; it creates an assign op that will do so. For the op to take effect, we have to run it in a session.

W = tf.Variable(10)
assign_op = W.assign(100)
with tf.Session() as sess:
    sess.run(assign_op)
    print(W.eval())    # >> 100
Note that we don’t have to initialize W in this case, because assign() does it for us. In fact, the initializer op is the assign op that assigns the variable’s initial value to the variable itself.

Interesting example: 
a = tf.Variable(2, name="scalar")    # create a variable whose original value is 2 
a_times_two = a.assign(a * 2)        # assign a * 2 to a and call that op a_times_two 
init = tf.global_variables_initializer() 
with tf.Session() as sess:
    sess.run(init)           # have to initialize a, because a_times_two op depends on the value of a
    sess.run(a_times_two)    # >> 4
    sess.run(a_times_two)    # >> 8
    sess.run(a_times_two)    # >> 16

TensorFlow assigns a * 2 to a every time a_times_two is fetched, so each run starts from the value left by the previous run.

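Increment and decrement: tf.Variable.assign_add() and tf.Variable.assign_sub(). Unlike assign(), these ops need the current value of the variable, so they cannot initialize it for you.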

Because TensorFlow sessions maintain values separately, each Session can have its own current value for a variable defined in a graph.
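W = tf.Variable(10)
sess1 = tf.Session()
sess2 = tf.Session()
sess1.run(W.initializer)
sess2.run(W.initializer)
print(sess1.run(W.assign_add(10)))     # >> 20
print(sess2.run(W.assign_sub(2)))      # >> 8
print(sess1.run(W.assign_add(100)))    # >> 120
print(sess2.run(W.assign_sub(50)))     # >> -42
sess1.close()
sess2.close()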

Declare a variable that depends on other variables
# W is a random 700 x 10 tensor
W = tf.Variable(tf.truncated_normal([700, 10]))
U = tf.Variable(W * 2)    # works, but does not guarantee that W is initialized first

U = tf.Variable(W.initialized_value() * 2)    # use initialized_value() to make sure that W is initialized before its value is used to initialize U.

tf.InteractiveSession makes itself the default session so you can call run() or eval() without explicitly passing the session. This is convenient in interactive shells and IPython notebooks, as it avoids having to pass an explicit Session object to run ops.

sess = tf.InteractiveSession() 
a = tf.constant(5.0) 
b = tf.constant(6.0) 
c = a * b 
print(c.eval())   # We can just use 'c.eval()' without passing 'sess' 

Control Dependencies
# your graph g has 5 ops: a, b, c, d, e
with g.control_dependencies([a, b, c]):
    # `d` and `e` will only run after `a`, `b`, and `c` have executed.
    d = ...
    e = ...

Use tf.Variable for trainable variables such as the weights (W) and biases (B) of your model. Use tf.placeholder to feed actual training examples.
The difference is that with tf.Variable you have to provide an initial value when you declare it. With tf.placeholder you don't have to provide an initial value; you specify it at run time with the feed_dict argument of Session.run.
To save or rebuild a model, only the Variables need to be saved or restored. Placeholders are mostly holders for the different datasets.

# create a placeholder of type float 32-bit, shape is a vector of 3 elements
a = tf.placeholder(tf.float32, shape=[3])
# create a constant of type float 32-bit, shape is a vector of 3 elements
b = tf.constant([5, 5, 5], tf.float32)
# use the placeholder as you would a constant or a variable
c = a + b # Short for tf.add(a, b)
# If we try to fetch c, we will run into error.
with tf.Session() as sess:
    print(sess.run(c))    # >> InvalidArgumentError: a has no value yet, so it must be fed

with tf.Session() as sess:
    # feed [1, 2, 3] to placeholder a via the dict {a: [1, 2, 3]}
    # fetch value of c
    print(sess.run(c, {a: [1, 2, 3]}))

with tf.Session() as sess:
    # list_of_a_values is any list of 3-element vectors you want to feed one at a time
    for a_value in list_of_a_values:
        print(sess.run(c, {a: a_value}))

# create Operations, Tensors, etc (using the default graph)
a = tf.add(2, 5)
b = tf.multiply(a, 3)
# start up a `Session` using the default graph
sess = tf.Session()
# define a dictionary that says to replace the value of `a` with 15
replace_dict = {a: 15}
# Run the session, passing in `replace_dict` as the value to `feed_dict`
sess.run(b, feed_dict=replace_dict)    # returns 45

feed_dict can be extremely useful to test your model. When you have a large graph and just want to test out certain parts, you can provide dummy values so TensorFlow won’t waste time doing unnecessary computations.

The trap of lazy loading
One of the most common TensorFlow non-bug bugs I see (and I used to commit) is what my friend Danijar and I call “lazy loading”. Lazy loading is a term that refers to a programming pattern when you defer declaring/initializing an object until it is loaded. In the context of TensorFlow, it means you defer creating an op until you need to compute it. For example, this is normal loading: you create the op z when you assemble the graph.

x = tf.Variable(10, name='x') 
y = tf.Variable(20, name='y') 
z = tf.add(x, y) 
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(z)

This is what happens when someone decides to be clever and use lazy loading to save one line of code: 

x = tf.Variable(10, name='x') 
y = tf.Variable(20, name='y') 
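with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./graphs', sess.graph)    # log directory name is an assumption
    for _ in range(10):
        sess.run(tf.add(x, y))    # create the op add only when you need to compute it
    writer.close()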

Let’s see the graphs for them on TensorBoard. The normal loading graph looks just like we expected.
With lazy loading, the node “Add” is missing, which is understandable since we added the node “Add” after we’d written the graph to the FileWriter. This makes it harder to read the graph but it’s not a bug. So, what’s the big deal? Let’s look at the graph definition. Remember that to print out the graph definition, we use:
print(tf.get_default_graph().as_graph_def())
The protobuf for the graph in normal loading has only 1 node “Add”.
On the other hand, the protobuf for the graph in lazy loading has 10 copies of the node “Add”. It adds a new node “Add” every time you want to compute z.

You probably think: “This is stupid. Why would I want to compute the same value more than once?” and think that it’s a bug that nobody will ever commit. It happens more often than you think. For example, you might want to compute the same loss function or make some prediction after a certain number of training samples. Before you know it, you’ve computed it thousands of times and added thousands of unnecessary nodes to your graph. Your graph definition becomes bloated, slow to load and expensive to pass around.
There are two ways to avoid this bug. First, always separate the definition of ops and their execution when you can. But when it is not possible because you want to group related ops into classes, you can use Python property to ensure that your function is only loaded once when it’s first called. This is not a Python course so I won’t dig into how to do it. But if you want to know, check out this wonderful blog post by Danijar Hafner.
print(sess.graph.as_graph_def())
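
The property-based approach mentioned above can be sketched in plain Python. This is a toy under stated assumptions, not the code from the blog post: the list `graph` stands in for a graph definition, and `make_add_node` and `Model` are hypothetical names. The point is only that a cached property creates the node once, no matter how often it is accessed.

```python
graph = []  # stand-in for a graph definition: one entry per created node

def make_add_node():
    """Hypothetical op constructor: every call appends a new node."""
    graph.append("Add")
    return len(graph) - 1  # handle to the node just created

class Model:
    def __init__(self):
        self._z = None

    @property
    def z(self):
        # Build the node on first access only; reuse it afterwards.
        if self._z is None:
            self._z = make_add_node()
        return self._z

# Lazy loading: a new node is created on every iteration.
for _ in range(10):
    make_add_node()
lazy_count = len(graph)

# Cached property: the node is created once and then reused.
graph.clear()
model = Model()
for _ in range(10):
    model.z
cached_count = len(graph)

print(lazy_count, cached_count)
```

The same idea carries over when the property body builds real ops: the graph grows on first access and stays the same size afterwards.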

We also add some extra evidence called a bias. Basically, we want to be able to say that some things are more likely independent of the input.

But it's often more helpful to think of softmax the first way: exponentiating its inputs and then normalizing them. The exponentiation means that one unit more evidence increases the weight given to any hypothesis multiplicatively. And conversely, having one less unit of evidence means that a hypothesis gets a fraction of its earlier weight.
Softmax then normalizes these weights, so that they add up to one, forming a valid probability distribution.
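
The "exponentiate, then normalize" description can be written out directly. This is a minimal stdlib-only sketch, not TensorFlow's implementation; subtracting the maximum before exponentiating is a common numerical-stability step added here as an assumption, and it does not change the result.

```python
import math

def softmax(evidence):
    """Exponentiate each input, then normalize so the results sum to 1."""
    shift = max(evidence)  # stability trick; cancels out in the ratio
    weights = [math.exp(e - shift) for e in evidence]
    total = sum(weights)
    return [w / total for w in weights]

probs = softmax([1.0, 2.0, 3.0])
print(probs)
print(sum(probs))
# One extra unit of evidence multiplies the weight by e.
print(probs[1] / probs[0])
```

The last line shows the multiplicative effect described above: each additional unit of evidence scales a hypothesis's weight by the same factor.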

x = tf.placeholder("float", [None, 784])
x is a placeholder, a value that we'll input when we ask TensorFlow to run a computation. (Here None means that a dimension can be of any length.)

A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. Model parameters are generally Variables.

In order to train our model, we need to define what it means for the model to be good. Well, actually, in machine learning we typically define what it means for a model to be bad, called the cost or loss, and then try to minimize how bad it is. But the two are equivalent.

The cross-entropy is H_{y'}(y) = -\sum_i y'_i \log(y_i), where y is our predicted probability distribution and y' is the true distribution (the one-hot vector we'll input). In some rough sense, the cross-entropy is measuring how inefficient our predictions are for describing the truth. Going into more detail about cross-entropy is beyond the scope of this tutorial, but it's well worth understanding.
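
As a small worked example, assuming the usual definition of cross-entropy as minus the sum over classes of y'_i * log(y_i), the sketch below compares a confident correct prediction with a poor one. The function name and the sample distributions are made up for illustration.

```python
import math

def cross_entropy(y_true, y_pred):
    """-sum_i y'_i * log(y_i); classes with y'_i == 0 contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0.0, 1.0, 0.0]     # one-hot: the true class is index 1
good = [0.05, 0.90, 0.05]    # most of the probability on the true class
bad = [0.60, 0.20, 0.20]     # most of the probability on a wrong class

print(cross_entropy(y_true, good))
print(cross_entropy(y_true, bad))
```

With a one-hot target, the loss reduces to minus the log of the probability assigned to the true class, so the better prediction gets the smaller loss.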

What TensorFlow actually does here, behind the scenes, is it adds new operations to your graph which implement backpropagation and gradient descent. Then it gives you back a single operation which, when run, will do a step of gradient descent training, slightly tweaking your variables to reduce the cost.
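
To give a feel for what one such training step does, here is a hand-written sketch on a toy one-parameter cost. Everything in it is an assumption for illustration: the cost (w - 3)^2, the learning rate, and the function names. TensorFlow derives the gradient for you via backpropagation, whereas here it is written by hand.

```python
def cost(w):
    # Toy cost with its minimum at w = 3.
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the toy cost, written by hand.
    return 2.0 * (w - 3.0)

def train_step(w, learning_rate=0.1):
    """One step of gradient descent: move against the gradient."""
    return w - learning_rate * grad(w)

w = 0.0
costs = [cost(w)]
for _ in range(5):
    w = train_step(w)
    costs.append(cost(w))

print(w)
print(costs)
```

Each call to `train_step` nudges the parameter slightly and the recorded cost shrinks from one step to the next, which is the behaviour the paragraph above describes.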
