Defining your own Neural Net Module

In this tutorial, we will see how to extend what we have done so far, mainly by creating your own new modules and testing them.

Code for this tutorial is provided on GitHub, on this page.

So far we've used existing modules. In this section, we'll define two new modules, and see how simple it is to do so.

Modules are bricks to build neural networks. A Module is a neural network by itself, but it can be combined with other networks using container classes to create complex neural networks. Module is an abstract class which defines the fundamental methods necessary for training a neural network. All modules are serializable.

Modules contain two state variables: output and gradInput. Here we review the set of basic functions that a Module has to implement:

[output] forward(input)

Takes an input object, and computes the corresponding output of the module. In general input and output are Tensors. However, some special sub-classes like table layers might expect something else. Please, refer to each module specification for further information.

After a forward(), the output state variable should have been updated to the new value.

It is not advised to override this function. Instead, one should implement the updateOutput(input) function. The forward() method in the abstract parent class Module will call updateOutput(input).
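
For example, with an existing module such as nn.Linear, a forward pass looks like this (a minimal sketch; the layer and the sizes are arbitrary, not part of the tutorial code):

require 'nn'

-- a minimal sketch: forward() on an existing module
m = nn.Linear(10, 5)          -- any module would do
x = torch.rand(10)
y = m:forward(x)              -- internally dispatches to m:updateOutput(x)
print(y)                      -- the result is also kept in m.output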

[gradInput] backward(input, gradOutput)

Performs a backpropagation step through the module, with respect to the given input. In general this method makes the assumption that forward(input) has been called before, with the same input. This is necessary for optimization reasons. If you do not respect this rule, backward() will compute incorrect gradients.

In general input, gradOutput and gradInput are Tensors. However, some special sub-classes like table layers might expect something else. Please, refer to each module specification for further information.

A backpropagation step consists in computing two kinds of gradients at input given gradOutput (the gradients with respect to the output of the module). This function simply performs this task using two function calls:

  • A function call to updateGradInput(input, gradOutput).
  • A function call to accGradParameters(input, gradOutput).

It is not advised to override this function in custom classes. It is better to override the updateGradInput(input, gradOutput) and accGradParameters(input, gradOutput) functions.
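
Continuing the small nn.Linear sketch above (an illustration, not part of the tutorial code), a backward pass looks like this:

-- gradOutput must have the same size as the module's output
gradOutput = torch.rand(5)
gradInput = m:backward(x, gradOutput)  -- calls updateGradInput(), then accGradParameters()
print(gradInput)                       -- gradients w.r.t. the input, also kept in m.gradInput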

=========================================================================================
[output] updateOutput(input)

When defining a new module, this method should be overloaded.

Computes the output using the current parameter set of the class and input. This function returns the result which is stored in the output field.

[gradInput] updateGradInput(input, gradOutput)

When defining a new module, this method should be overloaded.

Computes the gradient of the module with respect to its own input. This is returned in gradInput, and the gradInput state variable is updated accordingly.

[gradInput] accGradParameters(input, gradOutput)

When defining a new module, this method may need to be overloaded, if the module has trainable parameters.

Computes the gradient of the module with respect to its own parameters. Many modules do not perform this step as they do not have any parameters. The state variable name for the parameters is module dependent. The module is expected to accumulate the gradients with respect to the parameters in some variable.

Zeroing this accumulation is achieved with zeroGradParameters(), and updating the parameters according to this accumulation is done with updateParameters().

reset()

This method defines how the trainable parameters are reset, i.e. initialized before training.
=========================================================================================

Modules provide a few other methods that you might want to define, if you are not planning to use the optim package. These methods help zero() the parameters, and update them using very basic techniques.
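
For instance, a bare-bones training step without optim could look like the following sketch (model, criterion, input and target are assumed to be defined elsewhere, e.g. as in the supervised training tutorial):

-- a rough sketch of one manual training step, without the optim package
model:zeroGradParameters()                      -- reset the accumulated gradients
output = model:forward(input)
loss = criterion:forward(output, target)
gradOutput = criterion:backward(output, target)
model:backward(input, gradOutput)               -- accumulate gradients w.r.t. the parameters
model:updateParameters(0.01)                    -- vanilla SGD step, learning rate 0.01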

In terms of code structure, Torch provides a class model, which we use for inheritance, and in general for the definition of all the modules in nn. Here is an empty holder for a typical new class:

local NewClass, Parent = torch.class('nn.NewClass', 'nn.Module')
 
function NewClass:__init()
   Parent.__init(self)
end
 
function NewClass:updateOutput(input)
end
 
function NewClass:updateGradInput(input, gradOutput)
end
 
function NewClass:accGradParameters(input, gradOutput)
end
 
function NewClass:reset()
end

When defining a new class, all we need to do is fill in these empty functions. Note that when defining the constructor __init(), we always call the parent's constructor first.

Let's see some practical examples now.

Dropout Activation Units

This week we heard about dropout activation units. The idea there is to perturb the activations of hidden units, by randomly zeroing some of these units.

Such a class could be defined like this:

local Dropout, Parent = torch.class('nn.Dropout', 'nn.Module')
 
function Dropout:__init(percentage)
   Parent.__init(self)
   self.p = percentage or 0.5
   if self.p > 1 or self.p < 0 then
      error('<Dropout> illegal percentage, must be 0 <= p <= 1')
   end
end
 
function Dropout:updateOutput(input)
   self.noise = torch.rand(input:size()) -- uniform noise between 0 and 1
   self.noise:add(1 - self.p):floor()  -- binarize: each unit is kept with probability (1 - p)
   self.output:resizeAs(input):copy(input)
   self.output:cmul(self.noise)
   return self.output
end
 
function Dropout:updateGradInput(input, gradOutput)
   self.gradInput:resizeAs(gradOutput):copy(gradOutput)
   self.gradInput:cmul(self.noise) -- simply mask the gradients with the noise vector
   return self.gradInput
end

The file is provided in this directory, in Dropout.lua. The script 1_dropout.lua demonstrates how to create an instance of this module, and test it on some data (lena):

-- in this file, we test the dropout module we've defined:
require 'nn'
require 'Dropout'
require 'image'
 
-- define a dropout object:
n = nn.Dropout(0.5)
 
-- load an image:
i = image.lena()
 
-- process the image:
result = n:forward(i)
 
-- display results:
image.display{image=i, legend='original image'}
image.display{image=result, legend='dropout-processed image'}
 
-- some stats:
mse = i:dist(result)
print('mse between original image and dropout-processed image: ' .. mse)

When writing modules with gradient estimation, it's always very important to test your implementation. This can be easily done using the Jacobian class provided in nn, which compares the implementation of the gradient methods (updateGradInput() and accGradParameters()) with the Jacobian matrix obtained by finite differences (perturbing the input of the module, and estimating the deltas on the output). This can be done like this:

-- parameters
local precision = 1e-5
local jac = nn.Jacobian
 
-- define inputs and module
local ini = math.random(10,20)
local inj = math.random(10,20)
local ink = math.random(10,20)
local percentage = 0.5
local input = torch.Tensor(ini,inj,ink):zero()
local module = nn.Dropout(percentage)
 
-- test backprop, with Jacobian
local err = jac.testJacobian(module,input)
print('==> error: ' .. err)
if err<precision then
   print('==> module OK')
else
   print('==> error too large, incorrect implementation')
end

One slight issue with the Jacobian class is the fact that it assumes that the outputs of a module are deterministic with respect to the inputs. This is not the case for that particular module, so for the purpose of these tests we need to freeze the noise generation, i.e. do it only once:

-- we overload the updateOutput() function to generate noise only
-- once for the whole test.
function nn.Dropout.updateOutput(self, input)
   if not self.noise then
      self.noise = torch.rand(input:size()) -- uniform noise between 0 and 1
      self.noise:add(1 - self.p):floor()    -- binarize it, once and for all
   end
   self.output:resizeAs(input):copy(input)
   self.output:cmul(self.noise)
   return self.output
end

Exercise

Well, at this stage, a natural exercise would be to try to integrate this module into the previous tutorials we have done. I would try the following:

  • insert this Dropout module on the input of an autoencoder: that will give you a denoising autoencoder (see the sketch after this list)

  • more interestingly: insert this Dropout module into the convolutional network defined in the supervised training tutorial: does it help generalization?
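
As a starting point for the first exercise, prepending the module to an existing model is a one-liner (a hypothetical sketch; autoencoder stands for whatever model you built in the unsupervised tutorial):

-- a hypothetical sketch: corrupt the input of an existing autoencoder
denoising = nn.Sequential()
denoising:add(nn.Dropout(0.3))   -- randomly zero about 30% of the input
denoising:add(autoencoder)       -- then reconstruct from the corrupted input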

Multiscale/Pyramid Networks

Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image.

I recently proposed a model that can parse a wide variety of scenes in an extremely small amount of time (about half a second on an i7-based computer). The model relies on a multiscale extension of convolutional networks, where weights are not only shared in space, but also across scales.

More information and results can be found here and in our paper. This was joint work with Camille Couprie, Laurent Najman and Yann LeCun.

The typical results produced by our system look like this:


The backbone of the system is the multiscale convolutional network. In this tutorial, I am just going to give you a pointer to some code that essentially takes any trainable model as an argument, and trains it on a pyramid version of the original input image. The code for this module can be found here. It is fairly generic: given an input image, it does the following (a rough sketch of the pipeline is given after the list):

  • creates a multiscale pyramid
  • applies the given trainable model on each scale
  • upsamples the predictions and concatenates them to produce a dense feature map
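
Here is a rough sketch of that pipeline using only the image package (an illustration of the idea, not the actual module; model stands for any trainable module and img for a 3D input image):

-- a rough sketch of the multiscale pipeline (not the actual module)
scales = {1, 1/2, 1/4}
preds = {}
for i, s in ipairs(scales) do
   local w = math.floor(img:size(3) * s)
   local h = math.floor(img:size(2) * s)
   local scaled = image.scale(img, w, h)                -- one level of the pyramid
   local p = model:forward(scaled):clone()              -- apply the shared model at this scale
   preds[i] = image.scale(p, img:size(3), img:size(2))  -- upsample the predictions back
end
-- preds now holds one dense prediction map per scale; concatenating them
-- along the feature dimension yields the multiscale feature map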

This is depicted in this figure:


One thing that is not done in this module is weight sharing, which is left to the user. This is quite easy to do in Torch though: given an existing module, we can create replicas of it, which all share the same trainable parameters but each have their own output states. This is done like this:

module = nn.Sequential()
-- fill this module with anything...
 
-- create a replica of the first module, with no sharing:
module2 = module:clone()
 
-- create a replica, which shares all its trainable parameters:
module3 = module:clone('weight','bias','gradWeight','gradBias')
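
To convince yourself that the sharing works, here is a quick check (a sketch, filling the container with a single nn.Linear layer):

-- a quick sketch to verify parameter sharing
module = nn.Sequential()
module:add(nn.Linear(10, 10))
module3 = module:clone('weight','bias','gradWeight','gradBias')
module:get(1).weight:fill(1)          -- modify the original's weights in place
print(module3:get(1).weight[1][1])    -- prints 1: the weight storage is shared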