Within the umbrella of randomized Neural Network approaches [1], Reservoir Computing (RC) is an extremely efficient paradigm for the design and training of Recurrent Neural Networks (RNNs), and it is considered a de facto state-of-the-art approach for learning in temporal and sequential domains.
The RC paradigm has been instantiated in several essentially equivalent forms in the literature, among which the Echo State Network (ESN) model is probably the best known in the neuro-computing area. Simply put, an ESN operates on sequential data by recurrently encoding the external input into its reservoir state, which provides the system with a memory of the past input history. At each time step, the output is computed from the activations of the reservoir units by a layer of linear units called the readout. The weights of the connections pointing to the readout are the only ones that are trained, while the input-to-reservoir and the recurrent reservoir connections are left untrained; this is what makes training algorithms for ESNs extremely efficient.
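To make the setup concrete, the following is a minimal sketch of an ESN in Python/NumPy. The sizes, weight scalings, and the ridge-regression readout are illustrative assumptions for this sketch, not a reference implementation of the model.

```python
import numpy as np

rng = np.random.default_rng(42)
n_in, n_res, n_out = 1, 100, 1

# Untrained weights: input-to-reservoir and recurrent reservoir connections.
W_in = rng.uniform(-0.1, 0.1, size=(n_res, n_in))
W_hat = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
# Rescale the recurrent matrix to spectral radius < 1 (a common setting related to the Echo State Property).
W_hat *= 0.9 / max(abs(np.linalg.eigvals(W_hat)))

def run_reservoir(u):
    """Recurrently encode the input sequence u of shape (T, n_in) into reservoir states of shape (T, n_res)."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W_hat @ x)
        states.append(x)
    return np.array(states)

def train_readout(states, targets, reg=1e-6):
    """Train only the linear readout, here by ridge regression on the collected states."""
    A = states.T @ states + reg * np.eye(n_res)
    return np.linalg.solve(A, states.T @ targets)   # W_out: (n_res, n_out)

# Toy usage: one-step-ahead prediction of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, 2000)).reshape(-1, 1)
states = run_reservoir(u[:-1])
W_out = train_readout(states, u[1:])
predictions = states @ W_out
```

Note that only W_out results from training; W_in and W_hat stay fixed after their random initialization.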
ESNs are nowadays widely used to approach a large variety of learning problems emerging in diverse real-world application domains. At the same time, some fundamental questions, mainly concerning the true nature of their operation, still stimulate the research effort in this area, as discussed in [1].
[1] C. Gallicchio, J. D. Martin-Guerrero, A. Micheli, E. Soria-Olivas, "Randomized Machine Learning Approaches: Recent Developments and Challenges", Proceedings of the 25th European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 26-28 April 2017, i6doc.com, pp. 77-86, ISBN: 978-287587038-4, 2017
Why do ESNs work? Understanding the bias of using fixed randomized weights.
The reservoir of an ESN implements a discrete-time dynamical system computed by means of a randomized basis expansion. The initialization conditions used to fix the recurrent weights in the reservoir, in accordance with the Echo State Property, constrain the system dynamics towards a Markovian characterization of the state space, as shown in [2]. According to this characterization, even without training, the reservoir is already able to develop state representations that distinguish between different input histories in a suffix-based fashion. This means (i) that input sequences sharing a common suffix are encoded into states whose closeness grows with the length of the shared suffix, and (ii) that sequences with different suffixes are encoded into (possibly very) distant states. This intrinsic differentiation among input histories, together with the typically high dimensionality of the reservoir states, makes the problem amenable to linear regression in the state space performed by the readout. Thereby, whenever the learning task at hand is compliant with this Markovian characterization, ESNs provide an extremely efficient way of successfully approaching it.
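As a small illustrative sketch of this suffix-based organization (the input sequences and network sizes below are hypothetical), one can drive an untrained reservoir with different sequences and compare the final states it reaches:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 100
W_in = rng.uniform(-0.5, 0.5, size=(n_res, 1))
W_hat = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
W_hat *= 0.9 / max(abs(np.linalg.eigvals(W_hat)))  # contractive regime

def final_state(seq):
    """Drive the (untrained) reservoir with a scalar input sequence and return the last state."""
    x = np.zeros(n_res)
    for u_t in seq:
        x = np.tanh(W_in.ravel() * u_t + W_hat @ x)
    return x

# Three sequences: a and b share a long suffix, c differs in its last symbol.
a = [0.3, -0.8, 0.1, 0.5, -0.2, 0.7]
b = [-0.9, 0.4, 0.1, 0.5, -0.2, 0.7]   # same last four symbols as a
c = [0.3, -0.8, 0.1, 0.5, -0.2, -0.7]  # different suffix

print(np.linalg.norm(final_state(a) - final_state(b)))  # small: shared suffix
print(np.linalg.norm(final_state(a) - final_state(c)))  # larger: different suffix
```

With a contractive recurrent matrix, the distance between the final states of a and b is dominated by the shared suffix, while a and c remain clearly separated, which is the suffix-based behavior described above.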
Further information on the suffix-based characterization of the reservoir state space organization can be found in [2].
[2] C. Gallicchio, A. Micheli, "Architectural and Markovian factors of echo state networks", Neural Networks, Elsevier, vol. 24(5), pp. 440-456, DOI: 10.1016/j.neunet.2011.02.002, ISSN: 0893-6080, 2011.
Extending the paradigm to structured data with Tree Echo State Networks.
The possibility of extending neural network methodologies to learning in domains of highly structured data representations, such as trees, graphs and networks, opens the way to a broad range of exciting real-world applications in fields such as Cheminformatics, Computational Toxicology, Document Processing and Social Network Analysis, to name a few. At the same time, however, extending neural network approaches to deal natively with structured data comes with the major downside of exploding training costs. Hence, the advantage of efficient methodologies for learning in structured domains is even clearer and more appealing than in the case of temporal data processing.
Recently, the Tree Echo State Network (TreeESN) model [3] has been proposed as an extension of the ESN approach to hierarchical structures. Specifically, the reservoir layer of a TreeESN implements a stable dynamical system on trees, through which the input structure is encoded into an isomorphic structured state representation. As with standard ESNs, training a TreeESN involves only the adaptation of a linear readout layer, and it is therefore far more efficient than other neuro-computing approaches for structured data, in which all of the model's parameters must be trained. A preliminary extension of the same ideas to graph-structured data is provided by the Graph Echo State Network model, proposed in [4].
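The following is a minimal sketch of the bottom-up state computation on a tree, in the spirit of the TreeESN encoding. A single shared recurrent matrix and a plain sum over children states are simplifying assumptions of this sketch; the actual model in [3] also specifies state mapping functions and the conditions ensuring stability.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res = 3, 50
W_in = rng.uniform(-0.1, 0.1, size=(n_res, n_in))
W_hat = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
W_hat *= 0.5 / max(abs(np.linalg.eigvals(W_hat)))  # small scaling, aiming at stable tree dynamics

def encode(node):
    """Bottom-up state computation: a node's state depends on its label and on its children's states.
    `node` is a (label, children) pair, where label is a length-n_in vector."""
    label, children = node
    child_sum = sum((encode(c) for c in children), np.zeros(n_res))
    return np.tanh(W_in @ np.asarray(label) + W_hat @ child_sum)

# Toy tree: a root with two leaves.
leaf1 = ([1.0, 0.0, 0.0], [])
leaf2 = ([0.0, 1.0, 0.0], [])
root = ([0.0, 0.0, 1.0], [leaf1, leaf2])

# The root state (or a mean over all node states) can then feed a linear readout,
# trained e.g. by ridge regression as in standard ESNs.
x_root = encode(root)
```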
[3] C. Gallicchio, A. Micheli, "Tree Echo State Networks", Neurocomputing (2013), vol. 101, pp. 319-337, Elsevier, DOI: 10.1016/j.neucom.2012.08.017, ISSN: 0925-2312.
[4] C. Gallicchio, A. Micheli, "Graph Echo State Networks", Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18-23 July 2010, IEEE, pp. 1-8, DOI: 10.1109/IJCNN.2010.5596796, ISBN: 978-1-4244-6916-1, 2010.
Deep Echo State Networks.
Recently, an intriguing research direction has been devoted to the extension of Deep Learning principles to the processing of temporal information by means of RNNs. Deep Neural Networks are based on the hierarchical composition of many non-linear hidden layers. Through learning, such architectures are able to develop distributed, progressively more abstract representations of the input information, sparsely encoded across the activations of the units in the deeper layers. Moving this paradigm to time-series processing is extremely appealing, as it would make it possible to naturally learn hierarchical representations of temporal data and thereby approach a large variety of problems (especially in the cognitive area) in a natural fashion.
In this context, the introduction of the deep Echo State Network (deepESN) model [5] promises to open the way to novel recurrent models for time-series processing that can deal with multiple time-scale dynamics in the input while retaining the extreme training efficiency typical of RC. On the theoretical side, studies of the deepESN model from the dynamical-systems viewpoint [6][7] shed fresh light on the real importance of layering per se as a major design factor in deep recurrent networks. At the same time, investigations into the characterization of deepESN multi-layered dynamics allow a more in-depth understanding of the true merits of learning in the development of hierarchical representations of time.
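A minimal sketch of a stacked (deep) reservoir along these lines is given below. The number of layers, the scalings, and the choice of feeding the readout with the concatenation of all layers' states are illustrative assumptions of this sketch rather than the exact deepESN specification of [5].

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_res, n_layers = 1, 100, 3

# One untrained reservoir per layer; layer 0 is fed by the external input,
# layer l > 0 is fed by the state of layer l-1 (a stacked, hierarchical architecture).
W_in = [rng.uniform(-0.1, 0.1, size=(n_res, n_in if l == 0 else n_res)) for l in range(n_layers)]
W_hat = []
for l in range(n_layers):
    W = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    W_hat.append(0.9 / max(abs(np.linalg.eigvals(W))) * W)

def run_deep_reservoir(u):
    """Return the per-layer state trajectories for the input sequence u of shape (T, n_in)."""
    x = [np.zeros(n_res) for _ in range(n_layers)]
    trajectories = [[] for _ in range(n_layers)]
    for u_t in u:
        layer_input = u_t
        for l in range(n_layers):
            x[l] = np.tanh(W_in[l] @ layer_input + W_hat[l] @ x[l])
            trajectories[l].append(x[l])
            layer_input = x[l]          # deeper layers see progressively filtered dynamics
    return [np.array(t) for t in trajectories]

# The readout can then be trained (e.g. by ridge regression) on the concatenation
# of the states of all layers, or on the states of the last layer only.
u = np.sin(np.linspace(0, 8 * np.pi, 500)).reshape(-1, 1)
states_per_layer = run_deep_reservoir(u)
all_states = np.concatenate(states_per_layer, axis=1)   # shape (500, n_layers * n_res)
```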
The current state of research on this topic suggests that the hierarchical composition of recurrent layers in deep RNNs is the key ingredient of what can be regarded as a deep learning approach for time series. Learning and, as recently observed, even the non-linearities in the hidden layers seem to play a less important role in the emerging structure of temporal data representations.
[5] C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing (2017), DOI: 10.1016/j.neucom.2016.12.089
[6] C. Gallicchio, A. Micheli, "Echo State Property of Deep Reservoir Computing Networks", Cognitive Computation (2017), DOI: 10.1007/s12559-017-9461-9
[7] C. Gallicchio, A. Micheli, L. Silvestri, "Local Lyapunov Exponents of Deep RNN", Proceedings of the 25th European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 26-28 April 2017, i6doc.com, pp. 559-564, ISBN: 978-287587038-4, 2017