Evolutionary supervision of a dynamical neural network allows learning with on-going weights

The Concept

The paper presents a neural network with two important properties: first, it updates its synaptic weights dynamically, and second, it retains previous learning even though the synaptic weights keep changing over time. The key idea of the paper is that the synaptic weights, which are the substrate of learning, can retain past memories while changing dynamically, provided the update scheme for these weights makes use of previous knowledge.

The motivation behind the work is the dynamic nature of the brain's components. Contemporary artificial neural networks use static models at the neuronal and synaptic levels and assume that each action potential carries the pertinent information [1]. Based on this hypothesis, the synaptic weights are computed in a preliminary learning phase and then kept constant while the network operates in the generalization phase. However, some studies [2,3] have shown the influence of the causal timeline of the stimulus on the synaptic weights. This suggests that the synaptic weights should change continuously, based on the exact times at which events occur and the time differences between them. On the other hand, this poses a problem for the retention of past learning. This contradiction between retaining past memories and changing the weights continuously is often referred to as the stability/plasticity dilemma [4].

Drawing on the theory of dynamical systems [5], the paper proposes using the current input as well as past learning to update the synaptic weights. This can be implemented as retroaction, i.e., feeding the previous neuronal output back as one of the inputs. In this manner, the previous sequence of outputs is used together with the current input to determine and update the synaptic weights. Thus, although the neuron model itself is linear, the network can be non-linear and adaptive. Further, to keep the dynamic neural network stable and regulated, an evolutionary algorithm mimicking natural selection is used to supervise the synaptic weights.
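
To make the retroaction idea concrete, here is a minimal sketch (not the paper's implementation; the learning rate, Hebbian-style update, and weight normalization are illustrative assumptions): a linear unit whose previous output is fed back as an extra input, so that each weight update depends on the history of outputs and not only on the current stimulus.

    import numpy as np

    rng = np.random.default_rng(0)
    n_inputs = 3
    # One extra weight for the feedback (retroaction) input.
    w = rng.normal(scale=0.1, size=n_inputs + 1)
    y_prev = 0.0   # previous output, fed back as an input
    eta = 0.01     # learning rate (assumed)

    for t in range(100):
        x = rng.normal(size=n_inputs)      # current stimulus
        z = np.append(x, y_prev)           # augment input with the past output
        y = float(w @ z)                   # linear neuron model
        # Hebbian-style update: the past output is part of z, so previous
        # learning shapes how the weights change now.
        w += eta * y * z
        w /= max(1.0, np.linalg.norm(w))   # crude bounding, standing in for
                                           # the paper's evolutionary regulation
        y_prev = y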

The Example: Virtual Prey-Predator Experiment [6]

The paper presents a virtual prey-predator experiment (a virtual zoo) [6] as an example of the abilities of a neural network with this dynamic, evolutionarily supervised synaptic update scheme. In the experiment, a robot equipped with such a network is placed in a stationary virtual environment in which animals that are either prey of the robot, its predators, or neutral towards it are distributed randomly. The environment also contains gums (small food items). The robot begins with some life points, which increase or decrease over subsequent executions of the neural network, and it dies if its life points are exhausted. The aim of the robot is to remain alive for as long as possible.

The robot can see and hear the animals within a limited region around it. It has to learn whether they represent a threat or an opportunity and decide its move accordingly. Further, it can also learn through punishment/pain (when caught by a predator) or reward (a small reward for eating a gum, a large reward for catching a prey). The net performance of one execution of the neural network is computed as the change in life points since the previous execution; a minimal sketch of this accounting follows the list below. Note that the structure of punishment and reward provides not only the fitness function but also drives learning, in the following ways:

1. Exploration: The robot trying to eat as many gums as possible indicates that it is exploring the environment.

2. Preference for prey: The robot trying to reach prey indicates that it prefers larger food sources to gums, thus increasing its life points.

3. Predator avoidance: A long life span, sustained by eating gums and catching prey, is itself one indication of predator avoidance. Being punished on encountering predators, learning their locations, and avoiding them in the future is another way to learn predator avoidance.
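
As referenced above, here is a minimal sketch of the life-point accounting; the numeric reward and punishment values are illustrative assumptions, not taken from the paper.

    GUM_REWARD = 1          # small reward for eating a gum (assumed value)
    PREY_REWARD = 10        # large reward for catching a prey (assumed value)
    PREDATOR_PENALTY = 15   # punishment when caught by a predator (assumed value)

    def update_life_points(life, events):
        """Apply the events of one execution and return the new life total."""
        for event in events:
            if event == "gum":
                life += GUM_REWARD
            elif event == "prey":
                life += PREY_REWARD
            elif event == "predator":
                life -= PREDATOR_PENALTY
        return life

    # Net performance of one execution: the change in life points.
    life_before = 50
    life_after = update_life_points(life_before, ["gum", "gum", "predator"])
    performance = life_after - life_before   # here: 1 + 1 - 15 = -13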

The neural network that forms the robot's brain starts with no initial knowledge of the environment and must learn to adapt and maximize its life span under the evolutionary algorithm. Although the robot keeps learning throughout its lifetime, and a longer life is itself a marker of learning, a test environment is also used to check the robot's performance explicitly. A prey or a predator is placed randomly in the test environment and the robot is left to explore it. The robot performs well if it grabs the prey, or exits the test environment in the presence of a predator; it performs poorly otherwise. This test environment provides a way to monitor the innate knowledge gained over successive generations. Notably, the punishment and reward inputs are disabled during the test, so that the robot's present knowledge is evaluated rather than further trained.

The Robot’s Brain:

The robot's brain is a neural network in which assemblies of neurons are linked to each other by unidirectional projections (all-to-all synaptic links between the neurons of the two assemblies). Projections can be either excitatory or inhibitory. There are 100 internal assemblies of 25 neurons each, and each internal assembly receives projections from about 6 assemblies and sends projections to about 6 assemblies on average. There are 9 input assemblies (5 for vision, 2 for the auditory sense, and 1 each for reward and punishment) and 4 motor outputs (2 for body movement and 2 for head movement).
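
A minimal sketch of how such a topology might be represented (the data layout, field names, and excitatory/inhibitory ratio are assumptions, not the paper's code); each projection links two assemblies all-to-all, with one weight per (pre, post) neuron pair.

    import random

    random.seed(0)
    N_ASSEMBLIES = 100        # internal assemblies
    NEURONS_PER_ASSEMBLY = 25
    AVG_PROJECTIONS = 6       # average projections out of each assembly

    projections = []
    for src in range(N_ASSEMBLIES):
        for dst in random.sample(range(N_ASSEMBLIES), AVG_PROJECTIONS):
            excitatory = random.random() < 0.5   # projection sign (assumed ratio)
            sign = 1.0 if excitatory else -1.0
            projections.append({
                "src": src,
                "dst": dst,
                "excitatory": excitatory,
                # All-to-all links: a 25 x 25 weight matrix per projection,
                # initialized to +/-0.5 as in the evolutionary setup below.
                "weights": [[0.5 * sign] * NEURONS_PER_ASSEMBLY
                            for _ in range(NEURONS_PER_ASSEMBLY)],
            })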

The neuron model is the spike response model [7] with a simplified PSP kernel based on the Dirac delta function. This keeps the computation simple while retaining the fundamental temporal synchrony of the neurons [2,3]. A spike-timing-dependent plasticity (STDP) model [3] is used for the synapses, in which a synaptic weight is updated every time the pre-synaptic or post-synaptic neuron emits a spike. An excitatory synapse whose PSP contributes to a spike emission is potentiated (pre-to-post causal order), while a synapse whose PSP arrives just after a spike is depressed. In this way, the synapse retains information about the post-synaptic neuron's firings via back-propagated spikes. Inhibitory synapses are updated using a temporal window based on spike correlation and play a role in regulating post-synaptic activity (rather than shunting, and thus forgetting, everything, they retain the highly correlated spikes and thus the maximum information from the past).
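
The following is a sketch of a pair-based STDP rule of this kind for an excitatory synapse; the amplitudes, time constant, and hard bounds are illustrative assumptions, not the paper's exact temporal window.

    import math

    A_PLUS, A_MINUS = 0.05, 0.06   # potentiation/depression amplitudes (assumed)
    TAU = 20.0                     # time constant of the window, in ms (assumed)

    def stdp_update(w, t_pre, t_post):
        """Update an excitatory weight from one pre/post spike-time pair."""
        dt = t_post - t_pre
        if dt > 0:
            # Pre before post (causal order): the PSP may have caused the
            # spike, so the synapse is potentiated.
            w += A_PLUS * math.exp(-dt / TAU)
        else:
            # PSP arriving after the spike (anti-causal): depressed.
            w -= A_MINUS * math.exp(dt / TAU)
        return max(0.0, min(w, 1.0))   # keep the weight in a bounded range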

Evolutionary Algorithm:

The neuron model is kept constant throughout the evolution. The synaptic weights are initialized to 0.5 for excitatory synapses and -0.5 for inhibitory synapses. The randomly chosen synaptic delays are also kept constant throughout the evolution. The result of applying the temporal window is scaled and smoothed to restrict the range of the weights.
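
A minimal sketch of this initialization and bounding follows; the smooth squashing function is an assumption, since the paper says only that the window's result is scaled and smoothed to restrict the weight range.

    import math

    def init_weight(excitatory):
        # Initial weights: 0.5 for excitatory, -0.5 for inhibitory synapses.
        return 0.5 if excitatory else -0.5

    def bound_weight(w, w_min=-1.0, w_max=1.0):
        # Smooth squashing rather than hard clipping, so that updates near
        # the bounds are attenuated instead of cut off abruptly.
        mid = 0.5 * (w_min + w_max)
        half = 0.5 * (w_max - w_min)
        return mid + half * math.tanh((w - mid) / half)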

The chromosomes (i.e., individuals) used in the evolution encode the synapses and their corresponding weights. For each generation, the test experiment is executed before and after the generation for all individuals, in order to measure the innate knowledge brought into the generation and the knowledge learnt during it. The synaptic weights are updated during the evolution but kept unchanged during the tests.

A classical genetic algorithm is used for the evolution, with its three main phases realised as follows. In the evaluation phase, the robot is equipped with the network corresponding to the individual, and the fitness function is evaluated twice independently; the lower of the two values is taken as the evaluation output, so that the score reflects reproducible performance rather than chance. In the selection phase, a fixed number of individuals compete in a tournament and the best competitor is copied into the next population. The variation phase applies mutation and crossover, with one mutation per chromosome and one crossover per pair of chromosomes.
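
A sketch of this evolutionary loop follows; the population size, tournament size, fitness stub, and chromosome encoding (a flat list of weights) are illustrative assumptions.

    import random

    POP_SIZE, TOURNAMENT_SIZE, GENERATIONS = 20, 3, 50   # assumed parameters
    CHROMOSOME_LENGTH = 10                               # assumed encoding size

    def run_lifetime(individual):
        # Hypothetical stand-in for running the robot in the virtual zoo and
        # returning its life span; replace with the real simulation.
        return sum(individual) + random.gauss(0.0, 1.0)

    def evaluate(individual):
        # Evaluate twice independently and keep the lower value, so the score
        # reflects reproducible performance rather than chance.
        return min(run_lifetime(individual), run_lifetime(individual))

    def tournament(population, fitness):
        contenders = random.sample(range(len(population)), TOURNAMENT_SIZE)
        return population[max(contenders, key=lambda i: fitness[i])]

    def mutate(individual):
        child = list(individual)
        child[random.randrange(len(child))] += random.gauss(0.0, 0.1)
        return child                       # one mutation per chromosome

    def crossover(a, b):
        cut = random.randrange(1, len(a))  # one crossover per pair
        return a[:cut] + b[cut:], b[:cut] + a[cut:]

    def evolve(population):
        for _ in range(GENERATIONS):
            fitness = [evaluate(ind) for ind in population]
            selected = [tournament(population, fitness) for _ in range(POP_SIZE)]
            population = []
            for a, b in zip(selected[0::2], selected[1::2]):
                c, d = crossover(a, b)
                population += [mutate(c), mutate(d)]
        return population

    final = evolve([[random.choice([0.5, -0.5]) for _ in range(CHROMOSOME_LENGTH)]
                    for _ in range(POP_SIZE)])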

Inferences from the Experiment

1. The results show that the evolutionary algorithm selects individuals that learn better and adapt more readily.

2. The innate knowledge of the individuals is very low at the beginning of the evolution; the individuals gain a higher capability to learn after meeting the animals and exploring the environment.

3. The synaptic weights change continuously over the evolution. However, some synaptic weights vary more than others, depending upon the spike trains of the connected neurons.

4. The evolution initially favours individuals that are able to move. In later stages, when all individuals can move, it favours those that adapt their behaviour correctly.

Conclusion

This paper successfully demonstrates dynamic neural networks with spike-timing-dependent plasticity. The synaptic weights can remain plastic, learning continuously from the causally structured input, while the knowledge of the past is retained through back-propagated spikes and the overall network is regulated by natural selection (the evolutionary algorithm). The feasibility and performance of the concept are demonstrated in an ecologically plausible experiment that requires learning and adaptivity. Although the performance of the individuals in such an environment does not explicitly depend on their ability to learn, it is shown that the individuals indeed perform better after an evolutionary run. Though there is no innate knowledge at the beginning of the algorithm (since the synaptic weights are not initialized with any knowledge of the environment), the individuals eventually develop innate knowledge and perform better, avoiding predators, approaching prey, and living longer.

References:

[1] S. J. Thorpe, D. Fize, and C. Marlot. Speed of processing in the human visual system. Nature, 381:520–522, 1996.

[2] C. M. Gray and W. Singer. Stimulus specific neuronal oscillations in orientation columns of cat visual cortex. Proc. Natl. Acad. Sci., 86:1698–1702, 1989.

[3] G. Bi and M. Poo. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci., 18:10464–10472, 1998.

[4] G.A. Carpenter and S. Grossberg. Pattern Recognition by Self-Organizing Neural Networks. MIT Press, Cambridge, Mass., 1991.

[5] J. Demongeot, M. Kauffman, and R. Thomas. Positive feedback circuits and memory. Comptes Rendus de l'Académie des Sciences de Paris, Sciences de la Vie, 323:69–79, 2000.

[6] E. Reynaud and D. Puzenat. A multisensory identification system for robotics. In IJCNN 2001, pages 2924–2929. International Joint Conference on Neural Networks, 2001.

[7] W. Gerstner and W. Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.