Torch



AdversarialNetsPapers 

https://github.com/soumith/cvpr2015

Code and data for a bunch of recent papers (special thanks to Jiwei Li and Alexander Miller):

J. Weston. Dialog-based Language Learning.
http://arxiv.org/abs/1604.06045
https://github.com/facebook/MemNN/blob/master/DBLL

J. Li, A. H. Miller, S. Chopra, M.'A. Ranzato, J. Weston. Dialogue Learning With Human-in-the-Loop.
https://arxiv.org/abs/1611.09823
https://github.com/facebook/MemNN/blob/master/HITL

J. Li, A. H. Miller, S. Chopra, M.'A. Ranzato, J. Weston. Learning through Dialogue Interactions.
https://arxiv.org/abs/1612.04936
https://github.com/facebo…/MemNN/blob/master/AskingQuestions

A. H. Miller, A. Fisch, J. Dodge, A. Karimi, A. Bordes, J. Weston. Key-Value Memory Networks for Directly Reading Documents.
https://arxiv.org/abs/1606.03126
https://github.com/facebook/MemNN/blob/master/KVmemnn

https://github.com/poolio/unrolled_gan

https://github.com/liuzhuang13/DenseNet

https://github.com/carpedm20/awesome-torch


Monday, May 2nd: 11:50 am – 12:10 pm

The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations
Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston

Monday, May 2nd - 2:00 – 5:00 pm

Alternative structures for character-level RNNs
Piotr Bojanowski, Armand Joulin, Tomas Mikolov

Tuesday, May 3rd – 2:00 – 5:00 pm

Stacked What-Where Auto-encoders
Jake Zhao, Michael Mathieu, Ross Goroshin, Yann LeCun

Universum Prescription: Regularization using Unlabeled Data
Xiang Zhang, Yann LeCun

Wednesday, May 4th – 2:00 – 5:00 pm

Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems
Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston

Particular object retrieval with integral max-pooling of CNN activations
Giorgos Tolias, Ronan Sicre, Hervé Jegou

Predicting distributions with Linearizing Belief Networks
Yann N. Dauphin, David Grangier

Sequence Level Training with Recurrent Neural Networks
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Jason Weston, Antoine Bordes, Sumit Chopra, Sasha Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov

Unifying distillation and privileged information
David Lopez-Paz, Leon Bottou, Bernhard Schölkopf, Vladimir Vapnik

Deep Multi-Scale Video Prediction Beyond Mean Square Error
Michael Mathieu, Camille Couprie, Yann LeCun

Metric Learning with Adaptive Density Discrimination
Oren Rippel, Manohar Paluri, Piotr Dollar, Lubomir Bourdev

Super-resolution with deep convolutional sufficient statistics
Joan Bruna Estrach, Pablo Sprechmann, Yann LeCun

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Alec Radford, Luke Metz, Soumith Chintala

Better Computer Go Player with Neural Network and Long-term Prediction
Yuandong Tian, Yan Zhu


After the release of DCGAN (and its code), a neural network that can generate natural images (a collaboration between Alec Radford, Luke Metz and me), people from the community (Eugene Kogan and others) have generated their own manga characters, flowers, and now their own Chinese characters learned from reading Chinese books. Fun stuff. All these generations are the hallucinations of a neural network and are not real.

Our paper: http://arxiv.org/abs/1511.06434
Our code: 
https://github.com/Newmu/dcgan_code (Theano)
https://github.com/soumith/dcgan.torch (Torch)




Torch nn documentation (overview):
Module
Module is an abstract class which defines the fundamental methods necessary for training a neural network. 
Criterion

Criterions are helpful to train a neural network. Given an input and a target, they compute a gradient according to a given loss function.
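As a rough sketch of that contract (illustrative plain Python, not the Torch API; this MSECriterion is a hand-rolled stand-in): forward maps (input, target) to a scalar loss, backward to the gradient of that loss with respect to the input.

```python
# Hypothetical sketch of the Criterion contract: forward returns a scalar
# loss, backward returns d(loss)/d(input).
class MSECriterion:
    def forward(self, input, target):
        # mean squared error: (1/n) * sum_i (input_i - target_i)^2
        n = len(input)
        return sum((x - t) ** 2 for x, t in zip(input, target)) / n

    def backward(self, input, target):
        # d(loss)/d(input_i) = 2 * (input_i - target_i) / n
        n = len(input)
        return [2 * (x - t) / n for x, t in zip(input, target)]

crit = MSECriterion()
loss = crit.forward([1.0, 2.0], [0.0, 0.0])   # (1 + 4) / 2 = 2.5
grad = crit.backward([1.0, 2.0], [0.0, 0.0])  # [1.0, 2.0]
```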

Complex neural networks are easily built using container classes:

  • Container : abstract class inherited by containers ;
    • Sequential : plugs layers in a feed-forward fully connected manner ;
    • Parallel : applies its ith child module to the ith slice of the input Tensor ;
    • Concat : concatenates in one layer several modules along dimension dim ;
      • DepthConcat : like Concat, but adds zero-padding when non-dim sizes don't match;
    • Bottle : allows any dimensionality of input to be forwarded through a module ;

See also the Table Containers for manipulating tables of Tensors.

Transfer Function Layers
Transfer functions are normally used to introduce a non-linearity after a parameterized layer like Linear and SpatialConvolution. Non-linearities allow for dividing the problem space into more complex regions than what a simple logistic regressor would permit.
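A one-dimensional toy example of why this matters (plain Python, purely illustrative): two stacked linear layers collapse into a single linear layer, while a ReLU between them breaks that equivalence.

```python
# Two 1-d linear layers compose into one linear layer; a non-linearity
# between them does not. (Toy illustration only.)
def linear(w, b, x):
    return w * x + b

def relu(x):
    return max(0.0, x)

w1, b1, w2, b2 = 2.0, 1.0, 3.0, -1.0

y = linear(w2, b2, linear(w1, b1, 5.0))        # 3*(2*5 + 1) - 1 = 32
y_single = linear(w2 * w1, w2 * b1 + b2, 5.0)  # same map: w=6, b=2 -> 32
y_nl = linear(w2, b2, relu(linear(w1, b1, -3.0)))  # relu(-5) = 0 -> -1
```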

Convolution
A convolution is an integral that expresses the amount of overlap of one function g as it is shifted over another function f. It therefore "blends" one function with another. 
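In the discrete 'valid' 1-d case this can be sketched in a few lines of plain Python (an illustration, not the nn API; note that nn's convolution modules actually compute a cross-correlation, i.e. the kernel is not flipped):

```python
# Naive 1-d "valid" cross-correlation: slide kernel g over signal f.
# Output length is len(f) - len(g) + 1.
def conv1d(f, g):
    k = len(g)
    return [sum(f[i + j] * g[j] for j in range(k))
            for i in range(len(f) - k + 1)]

out = conv1d([1, 2, 3, 4], [1, 1])  # [3, 5, 7]
```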
Table Layers
This set of modules allows the manipulation of tables through the layers of a neural network, which makes it possible to build very rich architectures.

For those who want to implement their own modules, we suggest using the nn.Jacobian class for testing the derivatives of their class, together with the torch.Tester class. The sources of the nn package contain many examples of such tests.

Training a neural network is easy with a simple for loop. Typically, however, we would use the optim package, which implements some cool functionality, like SGD with Nesterov momentum, adagrad and adam.
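As a sketch of the training-loop idea only (plain Python, not the optim API): gradient descent with classical momentum, of which Nesterov momentum, adagrad and adam are more refined relatives.

```python
# SGD with classical momentum on a toy 1-d problem (illustration only).
def sgd_momentum_step(w, grad, v, lr=0.1, mu=0.9):
    v = mu * v - lr * grad  # velocity accumulates past gradients
    return w + v, v

# minimise f(w) = w^2 (gradient 2w), starting from w = 5
w, v = 5.0, 0.0
for _ in range(300):
    w, v = sgd_momentum_step(w, 2 * w, v)
# w is now very close to the minimum at 0
```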





=======================
require 'nn'
foo = nn.TemporalConvolution(9,20,5)   -- 9 is input dim, 20 is output dim (# of kernels), 5 is window width
foo.weight:size()    -- 20x45: if the input were 1-dimensional the weights would be just 20*5, but because it is 9-d each dim gets its own weights (5*9 = 45)
   20
   45
input1 = torch.rand(100,9)
foo:forward(input1)   -- 100 - 5 + 1 = 96 output frames
    [torch.DoubleTensor of size 96x20]

Usually followed by max pooling (usually with a 2x2 pooling window, although 4x4 can be necessary for large input images).
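The frame counts in these snippets can be checked with plain shape arithmetic (a Python sketch; it assumes stride-1 'valid' convolution and a non-overlapping pooling window, which is how I read the TemporalConvolution and TemporalMaxPooling defaults):

```python
# Output lengths for temporal layers (illustration only).
def conv_out_len(n_frames, kw, dw=1):
    # 'valid' convolution: floor((n - kW) / dW) + 1 frames survive
    return (n_frames - kw) // dw + 1

def pool_out_len(n_frames, kw):
    # non-overlapping pooling window of width kW (stride = kW)
    return (n_frames - kw) // kw + 1

a = conv_out_len(100, 5)                   # 96, as in the 100x9 example
b = pool_out_len(conv_out_len(200, 3), 2)  # 200 -> 198 -> 99
```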
-------------------------
tc1 = nn.TemporalConvolution(10,200,3)
mp = nn.TemporalMaxPooling(2)
relu1 = nn.ReLU()
w1 = torch.rand(200,200)
b1 = torch.rand(200)
w2 = torch.rand(2,200)
b2 = torch.rand(2)

x = torch.Tensor(200,10) -- 200*10: 200 frames, each a 10-dim feature vector
tc1_out = tc1:forward(x) -- 198 * 200
mp_out = mp:forward(tc1_out)  -- 99*200
relu1_out = relu1:forward(mp_out) -- 99*200
max_out = torch.max(relu1_out,1)   -- 1*200
reshape_out = torch.select(max_out, 1, 1) -- 200              Returns a new Tensor which is a tensor slice at the given index in the dimension dim
fc1 = torch.tanh(w1 *reshape_out + b1)    -- 200       a fully connected layer
fc2 = w2 * fc1 + b2    -- 2
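The torch.select step above can be pictured with plain Python lists (illustrative only; remember Torch tensors are 1-indexed):

```python
# select(t, dim, index) drops a dimension by picking one slice; for a
# 1xN tensor like max_out, select along dim 1 yields the N-vector inside.
max_out = [[3, 5, 7]]          # stand-in for a 1x3 tensor
reshape_out = max_out[1 - 1]   # select(max_out, 1, 1) -> the row itself
```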

==============================
==============================
==============================
------------------------------ New architecture

nn = require 'nn'
-- define network


w1 = torch.rand(200,200)
b1 = torch.rand(200)
w2 = torch.rand(2,200)
b2 = torch.rand(2)

------------------------------- pass data through

x = torch.Tensor(200,10) -- 200*10

tc1 = nn.TemporalConvolution(10, 200,3)
tc1_out = tc1:forward(x);           print('-----tc1_out') ; print(tc1_out:size()) -- 198 * 200    

r1 = nn.ReLU()
r1_out = r1:forward(tc1_out);    print('-----r1_out') ; print(r1_out:size()) -- 198 * 200 (ReLU keeps the size)

temporalConvolutionDimLoss = math.floor(3/2)  * 2

tc2 = nn.TemporalConvolution(200, 200 - temporalConvolutionDimLoss  ,3)
tc2_out = tc2:forward(r1_out);    print('-----tc2_out') ; print(tc2_out:size()) -- 196 * 198

r2 = nn.ReLU()
r2_out = r2:forward(tc2_out);    print('-----r2_out') ; print(r2_out:size()) -- 196 * 198

dr = nn.Dropout(0.5)
dr_out = dr:forward(r2_out);     print('-----dr_out') ; print(dr_out:size()) -- 196 * 198

nnMax = nn.Max(2);      
nnMax_out = nnMax:forward(dr_out);      print('-----nnMax_out') ; print(nnMax_out:size()) -- 196

nnReshape = nn.Reshape(196)  -- must match the 196 elements produced by nn.Max above
nnReshape_out  = nnReshape:forward(nnMax_out);      print('-----nnReshape_out') ; print(nnReshape_out:size()) -- 196


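The temporalConvolutionDimLoss arithmetic used above can be sanity-checked in plain Python (illustration only): a width-k 'valid' convolution drops k - 1 frames, and for odd k that equals floor(k/2) * 2.

```python
# floor(k/2) * 2 == k - 1 for odd kernel widths k, which is what the
# Lua expression math.floor(3/2) * 2 computes.
k = 3
dim_loss = (k // 2) * 2            # 2
frames_in = 198
frames_out = frames_in - (k - 1)   # 196, matching the tc2_out comment
```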


 nn = require 'nn'
vectorSize = 200
conv = nn.Sequential()
conv:add(nn.TemporalConvolution(vectorSize, vectorSize, 3))
conv:add(nn.ReLU())
conv:add(nn.TemporalConvolution(vectorSize, vectorSize, 3))
conv:add(nn.ReLU())
conv:add(nn.Dropout())
conv:add(nn.Max(2))
conv:add(nn.Reshape(vectorSize))

model = nn.Sequential()
hiddenLayerSize = 100
model:add(nn.Linear(vectorSize, hiddenLayerSize))
   model:add(nn.ReLU())
   model:add(nn.Dropout())

-- added in eznn.DeepNeuralNet
l = nn.Linear(opt.inputs, #opt.classes)
 x = torch.Tensor(1,10, 200)  -- 10 rows each with 200 dim features
vectorSize = 200
k1 = nn.TemporalConvolution(vectorSize, vectorSize, 3)
ok1 = k1:forward(x)
k2 = nn.ReLU()
ok2 = k2:forward(ok1)
k3 = nn.TemporalConvolution(vectorSize, vectorSize, 3)
ok3 = k3:forward(ok2)
k4 = nn.ReLU()
ok4 = k4:forward(ok3)
k5 = nn.Dropout()
ok5 = k5:forward(ok4)
k6 = nn.Max(2) -- max over the time dimension (dim 2 of the batched input)
ok6 = k6:forward(ok5)
k7 = nn.Reshape(vectorSize)
ok7 = k7:forward(ok6)

hiddenLayerSize = 100
k8 = nn.Linear(vectorSize, hiddenLayerSize)
ok8 = k8:forward(ok7)
k9 = nn.ReLU()
ok9 = k9:forward(ok8)
k10 = nn.Dropout()
ok10 = k10:forward(ok9)

numClasses = 2
k11 = nn.Linear(hiddenLayerSize, numClasses)
ok11 = k11:forward(ok10)
k12 = nn.LogSoftMax()
ok12 = k12:forward(ok11)

loss = nn.ClassNLLCriterion()
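As a sketch of what LogSoftMax and ClassNLLCriterion compute together (plain Python, not the Torch API): the criterion simply negates the log-probability assigned to the target class.

```python
import math

# LogSoftMax turns raw scores into log-probabilities; the NLL loss is
# the negated log-probability of the target class. (Illustration only.)
def log_softmax(scores):
    z = math.log(sum(math.exp(s) for s in scores))
    return [s - z for s in scores]

def class_nll(log_probs, target):
    return -log_probs[target]

lp = log_softmax([1.0, 1.0])   # uniform: each log-prob is -log(2)
loss = class_nll(lp, 0)        # log(2), about 0.693
```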
nn = require 'nn'
d = require 'autograd'
d.optimize(true)

hiddenLayerSize = 100
numClasses = 2

vectorSize = 200
-- network
  k1, k1params = d.nn.TemporalConvolution(vectorSize, vectorSize, 3)
  k2 = d.nn.ReLU()
  k3, k3params = d.nn.TemporalConvolution(vectorSize, vectorSize, 3)
  k4 = d.nn.ReLU()
  k5 = d.nn.Dropout(0.5)
  k6 = d.nn.Max(2)
  k7 = d.nn.Reshape(vectorSize)
  k8, k8params = d.nn.Linear(vectorSize, hiddenLayerSize)
  k9 = d.nn.ReLU()
  k10 = d.nn.Dropout()
  k11, k11params = d.nn.Linear(hiddenLayerSize, numClasses)
  k12 = d.nn.LogSoftMax()
  
  params = { -- weight initialization?
    k1params = k1params,
    k3params = k3params,
    k8params = k8params,
    k11params = k11params
  }


function l(params, y)
    lossf = 'cross-entropy'
    x = params.x;                   print('------x'); print(x)
    ok1 = k1(params.k1params,x);    print('------ok1'); print(ok1)
    ok2 = k2(ok1);                  print('------ok2'); print(ok2)
    ok3 = k3(params.k3params,ok2);  print('------ok3'); print(ok3)
    ok4 = k4(ok3);                  print('------ok4'); print(ok4)
    ok5 = k5(ok4);                  print('------ok5'); print(ok5)
    ok6 = k6(ok5);                  print('------ok6'); print(ok6)
    ok7 = k7(ok6);                  print('------ok7'); print(ok7)
    ok8 = k8(params.k8params, ok7);  print('------ok8'); print(ok8)
    ok9 = k9(ok8);                  print('------ok9'); print(ok9)
    ok10 = k10(ok9);                print('------ok10'); print(ok10)
    ok11 = k11(params.k11params, ok10);  print('------ok11'); print(ok11)
    ok12 = k12(ok11);               print('------ok12'); print(ok12)
    -- calculate loss
    if lossf == 'cross-entropy' then
      loss, yhat = d.loss.binaryCrossEntropy(ok12, y)
    elseif lossf == 'margin' then
      loss, yhat = d.loss.margin(ok12, y)
    end
    print('---- yhat and loss calculated')
    return loss  -- return loss, pred
  end

x = torch.Tensor(1,10, 200) 
params.x = x
dl = d(l)
y=torch.FloatTensor({1,0})
dl(params, y)









  local output = {
    modelParams = {
      type = "tweet2concepts.model.cnn-softmax-ckoiareplica",
      wordEmbeddings = glove,
      params = params,
      labels = opt.labels,
    },
    model = l,
    vectorizer = vectorizer,
    hyperParams = hyperParams,
  }
  return output
end


y=torch.rand(5,2,1)
x = torch.Tensor(5,10, 200)

Forward-pass trace (colour codes stripped; each step's module and output size):

------ok1: nn.TemporalConvolution (inputFrameSize 200, outputFrameSize 200, kW 3, dW 1; weight 200x600) -- output: FloatTensor 5x8x200
------ok2: nn.ReLU -- output: FloatTensor 5x8x200
------ok3: nn.TemporalConvolution (inputFrameSize 200, outputFrameSize 200, kW 3, dW 1; weight 200x600) -- output: FloatTensor 5x6x200
------ok4: nn.ReLU -- output: FloatTensor 5x6x200
------ok5: nn.Dropout (p 0.5, train) -- output: FloatTensor 5x6x200
------ok6: nn.Max (dimension 2) -- output: FloatTensor 5x200
------ok7: nn.Reshape (nelement 200) -- output: FloatTensor 5x200
------ok8: nn.Linear (weight 100x200, bias 100) -- output: FloatTensor 5x100
------ok9: nn.ReLU -- output: FloatTensor 5x100
------ok10: nn.Dropout (p 0.5, train) -- output: FloatTensor 5x100
------ok11: nn.Linear (weight 2x100, bias 2) -- output: FloatTensor 5x2
------ok12: nn.LogSoftMax -- output: FloatTensor 5x2
----loss calculated
6.6006166040897




















=========  original torch example
require "nn"
mlp = nn.Sequential();  -- make a multi-layer perceptron
inputs = 3; outputs = 1; HUs = 20; -- parameters
mlp:add(nn.Linear(inputs, HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs))
input = torch.randn(inputs)
mlp:forward(input)
===========
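For comparison, a pure-Python sketch of what mlp:forward computes for that network (Linear(3,20) -> Tanh -> Linear(20,1)); the weights here are random placeholders, just as Torch initializes its Linear layers randomly.

```python
import math
import random

# Affine map: y_i = sum_j W[i][j] * x[j] + b[i]
def linear(W, b, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

inputs, HUs, outputs = 3, 20, 1
random.seed(0)
W1 = [[random.gauss(0, 1) for _ in range(inputs)] for _ in range(HUs)]
b1 = [0.0] * HUs
W2 = [[random.gauss(0, 1) for _ in range(HUs)] for _ in range(outputs)]
b2 = [0.0] * outputs

x = [random.gauss(0, 1) for _ in range(inputs)]
h = [math.tanh(v) for v in linear(W1, b1, x)]  # hidden activations in (-1, 1)
y = linear(W2, b2, h)                          # single output value
```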