Theano


On a Mac, install the NVIDIA CUDA driver first, then configure Theano to use the GPU:
$ vim ~/.theanorc
[global]
device = gpu
floatX = float32

[nvcc]
fastmath = True

[cuda]
root=/usr/local/cuda



To list the graphics cards on Linux/Ubuntu:
$ nvidia-smi


http://www.joyofdata.de/blog/gpu-powered-deeplearning-with-nvidia-digits/

$ python -c 'import theano; print theano.config'
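
To double-check that work actually lands on the GPU, a quick test (not part of the original notes) is to compile a tiny function and inspect the ops in its optimized graph; with device = gpu they show up as Gpu* nodes:

import numpy
import theano
import theano.tensor as T
x = theano.shared(numpy.random.rand(1000).astype(theano.config.floatX))
f = theano.function([], T.exp(x))
print f.maker.fgraph.toposort()    # look for GpuElemwise / HostFromGpu nodes when the GPU is in use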

types
iscalar int32
dscalar float64
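The prefix letter encodes the dtype (e.g. b int8, i int32, l int64, f float32, d float64) and the rest encodes the dimensionality (scalar, vector, matrix, tensor3, tensor4). A couple of quick checks:

import theano.tensor as T
print T.iscalar().dtype     # 'int32'
print T.dscalar().dtype     # 'float64'
print T.fmatrix().ndim      # 2 (with dtype 'float32')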
import theano.tensor as T
from theano import function
x = T.dscalar('x')          # 0-dimensional array (scalar) of doubles (d)
y = T.dscalar('y')
z = x + y
f = function([x, y], z)    # slight delay on the first call: behind the scenes, f is compiled into C code

f(2,3) # array(5.0)

f(16.3, 12.1)    # array(28.4)
from theano import pp        # pretty print
print pp(z)     # (x + y)

Add two matrices
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
f = function([x, y], z)

f([[1, 2], [3, 4]], [[10, 20], [30, 40]])     # array([[ 11.,  22.], [ 33.,  44.]])
f(numpy.array([[1, 2], [3, 4]]), numpy.array([[10, 20], [30, 40]]))    # NumPy arrays can be passed directly

Vector
import theano
a = theano.tensor.vector() # declare variable
out = a + a ** 10               # build symbolic expression
f = theano.function([a], out)   # compile function
print f([0, 1, 2])  # prints `array([0, 2, 1026])`

Converting from Python Objects

Another way of creating a TensorVariable (a TensorSharedVariable to be precise) is by calling shared()

import numpy
x = theano.shared(numpy.random.randn(3, 4))


t = theano.tensor.arange(9).reshape((3,3))
t.eval()
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]], dtype=int8)
t[(t > 4).nonzero()].eval()
array([5, 6, 7, 8], dtype=int8)
t.reshape((1, 9)).eval()
array([[0, 1, 2, 3, 4, 5, 6, 7, 8]], dtype=int8)

element-wise logistic function
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = function([x], s)
logistic([[0, 1], [-1, -2]])
array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
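
The same function is also available as a built-in (T.nnet.sigmoid), and can be written via the identity s(x) = (1 + tanh(x/2)) / 2; a quick check:

s2 = (1 + T.tanh(x / 2)) / 2
logistic2 = function([x], s2)
logistic2([[0, 1], [-1, -2]])     # same values as above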

Computing More than one Thing at the Same Time
a, b = T.dmatrices('a', 'b')     # dmatrices is a shortcut for allocating several symbolic variables at once
diff = a - b
abs_diff = abs(diff)
diff_squared = diff**2
f = function([a, b], [diff, abs_diff, diff_squared])   # a single call returns all three outputs

f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
[array([[ 1.,  0.],
        [-1., -2.]]),
 array([[ 1.,  0.],
        [ 1.,  2.]]),
 array([[ 1.,  0.],
        [ 1.,  4.]])]

Default parameters

As with Python functions, inputs with default values must follow inputs without default values.

from theano import Param
x, y = T.dscalars('x', 'y')
z = x + y
f = function([x, Param(y, default=1)], z)
f(33)   # array(34.0)
f(33, 2)      # array(35.0)

x, y, w = T.dscalars('x', 'y', 'w')
z = (x + y) * w
f = function([x, Param(y, default=1), Param(w, default=2, name='w_by_name')], z)
f(33)      # array(68.0)
f(33, 2)   # array(70.0)
f(33, 0, 1) # array(33.0)
f(33, w_by_name=1)   # array(34.0)    We override the symbolic variable’s name attribute with a name to be used for this function.
f(33, w_by_name=1, y=0)    # array(33.0)

Shared Variables

A shared variable can be used across different functions and across many function calls, e.g. as a call counter or an accumulator.
state = theano.shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)])
The updates parameter of function must be supplied with a list of pairs of the form (shared-variable, new expression), or with a dictionary whose keys are shared variables and whose values are the new expressions.
Shared variables can be used in symbolic expressions just like the objects returned by dmatrices(...) but they also have an internal value that defines the value taken by this symbolic variable in all the functions that use it.
The internal value is accessed and modified with the .get_value() and .set_value() methods.

A Theano shared variable's broadcast pattern defaults to False for each dimension. Because a shared variable's size can change over time, the shape cannot be used to infer the broadcastable pattern. If you want a different pattern, pass it explicitly: theano.shared(..., broadcastable=(True, False)).
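For example, a small sketch (not in the original notes) of a shared 1x3 bias that should broadcast over the rows of its input:

b = theano.shared(numpy.zeros((1, 3)), broadcastable=(True, False))
m = T.dmatrix('m')
add_bias = function([m], m + b)     # b is broadcast along the first dimension
add_bias(numpy.ones((2, 3)))        # 2x3 result; without the broadcastable flag this raises a dimension mismatch

Back to the accumulator: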
state.get_value()        # array(0)
accumulator(1)           # array(0)
state.get_value()        # array(1)
accumulator(300)       # array(1)
state.get_value()        # array(301)
state.set_value(-1)      # to reset the state. Just use the .set_value() method

As we mentioned above, you can define more than one function to use the same shared variable. These functions can all update the value.
decrementor = function([inc], state, updates=[(state, state-inc)])
decrementor(2)           # array(-1)    returns the previous state (-1 after the reset above)
state.get_value()        # array(-3)

why shared variables?

You might be wondering why the updates mechanism exists. You can always achieve a similar result by returning the new expressions, and working with them in NumPy as usual. The updates mechanism can be a syntactic convenience, but it is mainly there for efficiency. Updates to shared variables can sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix updates). Also, Theano has more control over where and how shared variables are allocated, which is one of the important elements of getting good performance on the GPU.
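
For comparison, the same accumulator without updates would look roughly like this sketch (not from the original notes); the new value comes back to Python and has to be written into the shared variable by hand, instead of being updated in place inside the compiled function:

new_state = state + inc
add = function([inc], new_state)    # no updates: just compute and return the new value
state.set_value(add(10))            # copy the result back into the shared variable manually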

Skipping the use of a shared variable in a function that uses it
Use the givens parameter of function, which replaces a particular node in the graph for the purpose of one particular function.
fn_of_state = state * 2 + inc      
foo = T.scalar(dtype=state.dtype)           # The type of foo must match the shared variable we are replacing with the ``givens``
skip_shared = function([inc, foo], fn_of_state, givens=[(state, foo)])
skip_shared(1, 3)           # array(7)          we're using 3 for the state, not state.value
state.get_value()  # array(0)        old state still there, but we didn't use it

The givens parameter can be used to replace any symbolic variable, not just a shared variable; you can replace constants and expressions in general.
Be careful, though, not to let the expressions introduced by a givens substitution be co-dependent: the order of substitution is not defined, so the substitutions must work in any order.
In practice, a good way to think about givens is as a mechanism that lets you replace any part of your formula with a different expression that evaluates to a tensor of the same shape and dtype.
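
A small sketch (not in the original notes) of replacing an ordinary symbolic variable, rather than a shared one, with an expression via givens:

a = T.dscalar('a')
b = T.dscalar('b')
expr = a * 10 + b
g = function([b], expr, givens={a: b + 1})   # substitute the expression (b + 1) for a
g(2)     # array(32.0)    a -> 3, so expr = 30 + 2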

Random Numbers

The way to think about putting randomness into Theano's computations is to put random variables in your graph. Theano will allocate a NumPy RandomStream object (a random number generator) for each such variable and draw from it as necessary. We will call this sort of sequence of random numbers a random stream. Random streams are, at their core, shared variables, so the observations about shared variables above hold here as well. Theano's random objects are defined and implemented in RandomStreams and, at a lower level, in RandomStreamsBase.
from theano.tensor.shared_randomstreams import RandomStreams
from theano import function
rs = RandomStreams(seed=234)
rv_u = rs.uniform((2,2))      # a random stream of 2x2 matrices of draws from a uniform distribution.
rv_n = rs.normal((2,2))
f = function([], rv_u)                                      # different numbers on every call
g = function([], rv_n, no_default_updates=True)             # rv_n.rng is not updated, so every call returns the same numbers
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)   #  a random variable is drawn at most once during any single function execution.

As usual for shared variables, the random number generators used for random variables are common between functions. So our nearly_zeros function will update the state of the generators used in function f above.
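
A short sketch of that sharing in action (not from the original notes):

v1 = f()            # fresh draws; advances rv_u's generator
v2 = f()            # different from v1
nearly_zeros()      # uses (and advances) the same generator as f
v3 = f()            # so this is not the value that would have followed v2 otherwise
g(); g()            # identical draws each time, since g does not update its generator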

Set the seed
for each RV
rng_val = rv_u.rng.get_value(borrow=True)   # Get the rng for rv_u
rng_val.seed(89234)                         # seeds the generator
rv_u.rng.set_value(rng_val, borrow=True)    # Assign back seeded rng
for all RVs
rs.seed(902340)  # seeds rv_u and rv_n with different seeds each

Single-Layer Logistic Regression


import numpy
import theano
import theano.tensor as T
rng = numpy.random

# generate a random dataset: D = (input_values, target_class)
N = 400                     # number of examples
feats = 784                 # number of input features
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))

training_steps = 10000

# Declare symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")
print "Initial model:"
print w.get_value(), b.get_value()

# Construct expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()  # The cost to minimize (with L2 regularization on w)
gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                          # (we shall return to this in a
                                          # following section of this tutorial)
# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)


# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])       # each call updates w and b via the updates rule

print "Final model:"
print w.get_value(), b.get_value()
print "target values for D:", D[1]
print "prediction on D:", predict(D[0])


Below: derivatives, the Theano function graph, and how function execution is optimized, along with drawing and printing a function graph.

Derivative

import theano
from theano import tensor as T
from theano import pp
from theano import *
x = T.dscalar('x')
y = x ** 2
pp(y)                                 # '(x ** TensorConstant{2})'
gy = T.grad(y, x)             # for any scalar expression s, T.grad(s, w) provides the Theano expression for computing  ∂s/∂w.
gyf = function([x], gy)
pp(gyf.maker.fgraph.outputs[0])
       # '(TensorConstant{2.0} * x)'
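
Evaluating the compiled gradient at a point confirms d(x**2)/dx = 2x:

gyf(4)     # array(8.0)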

logistic function gradient
x = T.dscalar('x')
s = T.sum(1 / (1 + T.exp(-x)))
gs = T.grad(s, x)
gsf = function([x], gs)
pp(gsf.maker.fgraph.outputs[0])


x = T.dscalar('x')
s = 1 / (1 + T.exp(-x))
gs = T.grad(s, x)
gsf = function([x], gs)
pp(gsf.maker.fgraph.outputs[0])
pp(gs)   
# '(-(((-(fill((TensorConstant{1} / (TensorConstant{1} + exp((-x)))), TensorConstant{1.0}) * TensorConstant{1})) / ((TensorConstant{1} + exp((-x))) * (TensorConstant{1} + exp((-x))))) * exp((-x))))'
# '(-(((-(fill((1 / (1 + exp((-x)))), 1) * 1)) / ((1 + exp((-x))) * (1 + exp((-x))))) * exp((-x))))'
# after dropping the fill() and the constants: exp(-x) / ((1 + exp(-x)) * (1 + exp(-x)))
# =======>   e^x / (1 + e^x)^2
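
A quick numeric check of the simplified form:

gsf(0)     # array(0.25)      e^0 / (1 + e^0)^2 = 1/4
gsf(2)     # about 0.105      e^2 / (1 + e^2)^2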

If the second argument to T.grad is a list of parameters, the output is a list of gradients w.r.t. those parameters.
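For example, a small sketch (not in the original notes):

x, y = T.dscalars('x', 'y')
z = x * y + y ** 2
gx, gy = T.grad(z, [x, y])            # one gradient expression per parameter
function([x, y], [gx, gy])(2, 3)      # [array(3.0), array(8.0)]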

Computing the Jacobian: the matrix of first-order partial derivatives of a vector-valued function w.r.t. its inputs.
Computing the Hessian of a scalar function: the Hessian is the square matrix of second-order partial derivatives of a scalar-valued function w.r.t. all its parameters.
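
Theano ships helpers for both; a short sketch (assuming the theano.gradient.jacobian and theano.gradient.hessian helpers):

x = T.dvector('x')
y = x ** 2
J = theano.gradient.jacobian(y, x)       # matrix of dy_i / dx_j
theano.function([x], J)([4, 4])          # array([[ 8.,  0.], [ 0.,  8.]])
cost = (x ** 2).sum()
H = theano.gradient.hessian(cost, x)     # second derivatives of the scalar cost
theano.function([x], H)([4, 4])          # array([[ 2.,  0.], [ 0.,  2.]])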


if/else in functions, loops (scan), and sparse matrices are covered elsewhere in the Theano documentation.

debugprint()
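theano.printing.debugprint prints a textual view of a symbolic graph or of a compiled function's optimized graph, e.g.:

theano.printing.debugprint(gy)      # the symbolic gradient graph
theano.printing.debugprint(gyf)     # the optimized graph of the compiled function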


Python Memory Management















