As an alternative, Chainer provides an evaluation mode of forward computation that does not store the computation history. It is enabled by passing the volatile flag to all input variables. Such variables are called volatile variables.

Single-GPU usage is very simple. All you have to do is transfer the FunctionSet and the input arrays to the GPU beforehand.

A FunctionSet object can be transferred to the specified GPU using the to_gpu() method. Make sure to give the parameters and gradients of the GPU version to the optimizer:

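model = FunctionSet(
    l1 = F.Linear(784, 100),
    l2 = F.Linear(100, 100),
    l3 = F.Linear(100,  10),
).to_gpu()  # move parameters and gradients to the current GPU

optimizer = optimizers.SGD()
# Hand the GPU-side parameters and gradients to the optimizer.
optimizer.setup(model.collect_parameters())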

Note that to_gpu() returns the FunctionSet itself. The device specifier can be omitted, in which case the current device is used.

Then, all we have to do is transfer each minibatch to the GPU:

batchsize = 100
datasize = len(x_train)
for epoch in range(20):
    print('epoch %d' % epoch)
    indexes = np.random.permutation(datasize)
    for i in range(0, datasize, batchsize):
        x_batch = cuda.to_gpu(x_train[indexes[i : i + batchsize]])
        y_batch = cuda.to_gpu(y_train[indexes[i : i + batchsize]])

        loss, accuracy = forward(x_batch, y_batch)

This is almost identical to the code of the original example; we just inserted a call to the cuda.to_gpu() function on each minibatch array.
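
The index arithmetic of the minibatch loop can be tried on the CPU with NumPy alone. The following is a minimal sketch, assuming a tiny made-up dataset (x_train here is hypothetical toy data, not the tutorial's dataset); it leaves out the GPU transfer and only shows how the shuffled indexes are sliced into minibatches.

```python
import numpy as np

# Hypothetical toy data: 10 samples with one feature each.
x_train = np.arange(10, dtype=np.float32).reshape(10, 1)
batchsize = 4
datasize = len(x_train)

indexes = np.random.permutation(datasize)
batch_sizes = []
seen = []
for i in range(0, datasize, batchsize):
    # Fancy indexing copies the selected rows into a new array.
    x_batch = x_train[indexes[i:i + batchsize]]
    # In the GPU version, cuda.to_gpu() would be applied to x_batch here.
    batch_sizes.append(len(x_batch))
    seen.extend(x_batch.ravel().tolist())
```

Because datasize is not a multiple of batchsize in this sketch, the last minibatch is smaller than the others, and every sample is still visited exactly once per epoch.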


When you call a function with an array of an invalid type, you sometimes receive no error but get an unexpected result because of broadcasting. When you use CUDA with an array of an illegal type, it causes memory corruption, and you get a serious error. These bugs are hard to fix.
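
The silent broadcasting case can be reproduced with plain NumPy. The sketch below uses made-up arrays (predictions and targets are illustrative names, not part of the tutorial) to show how a shape mismatch yields a plausible-looking but wrong value instead of an error.

```python
import numpy as np

predictions = np.array([1.0, 2.0, 3.0])    # shape (3,)
targets = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1), probably unintended

# No error is raised: the operands are broadcast to shape (3, 3).
diff = predictions - targets
wrong_loss = (diff ** 2).mean()

# With matching shapes, the elementwise difference is what was intended.
right_loss = ((predictions - targets.ravel()) ** 2).mean()
```

Here the two arrays hold the same values, so the intended loss is zero, yet the broadcast version is nonzero because it compares every prediction with every target.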

Each implementation of Function has a method for type checking, check_type_forward(). This method is called just before the forward() method of the Function class. You can override it to check conditions on the types and shapes of the arguments.

check_type_forward() takes a single argument, in_types:

def check_type_forward(self, in_types):
    ...
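
To illustrate the idea of checking types and shapes before the forward computation, here is a minimal sketch in plain NumPy. It does not use Chainer's type-checking machinery; check_inputs and linear_forward are hypothetical helpers written only for this illustration.

```python
import numpy as np


def check_inputs(x, W):
    """Hypothetical precondition check; not Chainer's API."""
    if x.dtype != np.float32:
        raise TypeError('x must be float32, got %s' % x.dtype)
    if x.ndim != 2:
        raise ValueError('x must be 2-dimensional, got ndim=%d' % x.ndim)
    if x.shape[1] != W.shape[1]:
        raise ValueError('x.shape[1] (%d) must equal W.shape[1] (%d)'
                         % (x.shape[1], W.shape[1]))


def linear_forward(x, W):
    # Run the checks just before the actual computation.
    check_inputs(x, W)
    return x.dot(W.T)


x = np.ones((2, 3), dtype=np.float32)
W = np.ones((4, 3), dtype=np.float32)
y = linear_forward(x, W)
```

Failing early with a descriptive message is the point of the design: a float64 input or a mismatched shape is rejected before it can trigger broadcasting or reach GPU code.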

When a function gets volatile variables as its inputs, the output variables do not hold references to the function. This acts like unchaining on every function application.
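
The effect of the volatile flag can be pictured with a toy model. The sketch below is not Chainer's implementation; ToyVariable and ToyDouble are hypothetical classes that only mimic the behavior described above, where an output keeps a reference to the function that created it unless the input is volatile.

```python
class ToyVariable:
    """Hypothetical stand-in for a variable; not Chainer's Variable."""

    def __init__(self, data, volatile=False):
        self.data = data
        self.volatile = volatile
        self.creator = None  # function that produced this variable, if recorded


class ToyDouble:
    """Hypothetical function that doubles its input."""

    def __call__(self, x):
        y = ToyVariable(x.data * 2, volatile=x.volatile)
        if not x.volatile:
            # Normal mode: record the history so it could be traversed later.
            self.inputs = (x,)
            y.creator = self
        # Volatile mode: nothing is recorded, so the output is unchained.
        return y


f = ToyDouble()
normal_out = f(ToyVariable(3.0))
volatile_out = f(ToyVariable(3.0, volatile=True))
```

In this toy model, normal_out.creator refers to f, while volatile_out.creator stays None, so no history accumulates across function applications in volatile mode.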