Tutorial on Deep Randomized Neural Networks

Welcome to the AAAI-21 tutorial on Deep Randomized Neural Networks! Here you can find up-to-date material, including the slides and the reference paper.

Contacts

Claudio Gallicchio (University of Pisa) gallicch@di.unipi.it

Description of the tutorial

Deep Neural Networks (DNNs) are a fundamental tool in the modern development of Machine Learning. Beyond the merits of the training algorithms, a great part of DNNs success is due to the inherent properties of their layered architectures, i.e., to the introduced architectural biases. In this tutorial, we explore recent classes of DNN models wherein the majority of connections are randomized or more generally fixed according to some specific heuristic.

Limiting the training algorithms to operate on a reduced set of weights implies intriguing features. Among them, the extreme efficiency of the learning processes is undoubtedly a striking advantage with respect to fully trained architectures. Besides, despite the involved simplifications, randomized neural systems possess remarkable properties both in practice, achieving state-of-the-art results in multiple domains, and theoretically, allowing to analyze intrinsic properties of neural architectures.

This tutorial covers all the major aspects regarding Deep Randomized Neural Networks, from feed-forward and convolutional neural networks, to dynamically recurrent deep neural systems for structures. The tutorial is targeted at both researchers and practitioners, from academia or industry, who are interested in developing DNNs that can be trained efficiently, and possibly embedded into low-power devices.

Requirements: basics of Machine Learning and DNNs.

Content

Randomization can enter the design of Deep Neural Nets in several guises, including training algorithms, regularization, etc. We focus on “Random-weights” neural architectures, where part of the weights are left untrained after initialization.

Because we touch on a number of different fields, we do not aim at a comprehensive survey of the literature. Rather, we highlight general ideas and concepts by a careful selection of papers and results, trying to convey the widest perspective. When possible, we also highlight points that in our opinion have been under-explored in the literature, and possible open areas of research. Finally, we consider a variety of types of data, ranging from vectors to images and graph-based datasets.

The tutorial is structured as follows.

  • Introduction: Preliminaries on Deep Learning, embedded applications, complexity/accuracy trade-off, randomization in Neural Networks.

  • Randomization in Feed-forward Neural Networks: Historical notes, Random Projections, Intrinsic dimensionality, Random kitchen sinks, Random Vector Functional Link.

  • Deep Random-weights Neural Networks: Deep Randomized Networks for analysis and with random gaussian weights, Randomized autoencoders, Semi-random features, Deep Image Prior, Direct Feedback Alignment.

  • Randomization in Dynamical Recurrent Neural Networks: Motivations and preliminaries on Recurrent Neural Networks and Neuromorphic Neural Networks, Reservoir Computing, Stability Properties in Randomized Neural Networks, Universality, Fundamental topics in Reservoir Computing (e.g., reservoir topology and the edge of chaos), Applications.

  • Deep Reservoir Computing: Deep Recurrent Neural Networks, Deep Echo State Networks, Properties of Deep ESNs, Intrinsic richness in deep recurrent neural systems, Applications, Fast and Deep Neural Networks for Graphs.

  • Conclusions: Take home messages and hints on future research directions.


Slides

Have a look at the slides used in the tutorial.


Reference paper

The content of this tutorial is mostly taken from:

Gallicchio C., Scardapane S. (2020) Deep Randomized Neural Networks. In: Oneto L., Navarin N., Sperduti A., Anguita D. (eds) Recent Trends in Learning From Data. Studies in Computational Intelligence, vol 896. Springer, Cham. https://doi.org/10.1007/978-3-030-43883-8_3

Pre-print on arXiv: https://arxiv.org/abs/2002.12287

Other resources and Links

A super-short introduction to Randomization in (Deep) Neural Networks

Motivations - Deep Neural Networks (DNNs) have achieved tremendous success and are extremely popular. However, in practice, there are cases in which accuracy is not the only requirement. For example, you might have constraints in terms of time and efficiency, i.e., constraints on the number of tunable parameters in the neural architecture. Do we really need a fully trainable DNN in these cases? Example application areas are 5G and distributed intelligence applications (e.g., autonomous driving), where you want DNN solutions at scale, featuring learning algorithms that can also run on low-power edge devices.

Deep Learning = Architectural Biases + Training algorithms.

When we deal with Randomized DNNs we remove the training algorithms from the equation and see how far we can go by relying almost exclusively on the architectural biases. Essentially, these models represent an advantageous trade-off between accuracy and complexity. They can achieve much higher accuracy than linear models and other standard machine learning algorithms, by exploiting the architectural bias of deep neural networks, but at the same time they have much lower complexity than fully trainable DNNs, because they avoid backpropagation-based training as much as possible.

Ali Rahimi's test of time award talk at NeurIPS 2017.

Randomization is computationally cheaper than optimization

Despite their simplicity, learning models based on random features can give you strong baselines in just a few seconds.
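
To make this concrete, below is a minimal NumPy sketch of the random kitchen sinks idea: random Fourier features approximating an RBF kernel without any training of the feature map. All sizes and constants are illustrative choices, not taken from the tutorial material.

```python
# Minimal sketch: random Fourier features (random kitchen sinks)
# approximating an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
import numpy as np

rng = np.random.default_rng(0)
n, d, D, gamma = 200, 10, 2000, 0.5   # samples, input dim, random features, kernel width

X = rng.standard_normal((n, d))

# Random projection directions and phases (fixed, never trained).
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)          # feature map z(x)

K_exact = np.exp(-gamma * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
K_approx = Z @ Z.T                                 # z(x)^T z(y) approximates k(x, y)
print("mean absolute kernel error:", np.abs(K_exact - K_approx).mean())
```

Feeding such fixed random features to any linear model (e.g., ridge regression) gives exactly the kind of fast, strong baseline mentioned above.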


Randomization in Feed-forward Neural Networks - The idea of randomization in the design of NN architectures is quite old, and dates back to the work on the Perceptron (between the retina and the classifier there was a "hidden" randomized layer). Another popular 'classical' example is given by random projection, an efficient alternative to dimensionality reduction by, e.g., PCA. Interestingly, this idea has been used to get insight into the intrinsic dimensionality of a learning problem (see the paper).
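
As a concrete illustration of random projection (a sketch with illustrative dimensions, not code from the tutorial): a fixed Gaussian matrix maps high-dimensional points to a much lower-dimensional space while approximately preserving pairwise distances, in the spirit of the Johnson-Lindenstrauss lemma.

```python
# Minimal sketch: random projection as a cheap alternative to PCA-style
# dimensionality reduction (Johnson-Lindenstrauss flavour).
import numpy as np

rng = np.random.default_rng(42)
n, d, k = 100, 5000, 300          # points, original dim, projected dim (illustrative)

X = rng.standard_normal((n, d))
R = rng.standard_normal((d, k)) / np.sqrt(k)   # fixed random matrix, never trained
Y = X @ R                                      # projected data

def pairwise_dist(A):
    sq = np.sum(A ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * A @ A.T, 0.0))

D_orig, D_proj = pairwise_dist(X), pairwise_dist(Y)
mask = ~np.eye(n, dtype=bool)
ratio = D_proj[mask] / D_orig[mask]
print("distance ratios: mean %.3f, std %.3f" % (ratio.mean(), ratio.std()))
```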

Mathematically, the fundamental idea is to decompose the computation into a representation function and a readout function. The representation function is computed by a hidden layer: it is the inner projection of the data, and it is fixed after random initialization. The outer projection (the readout) is learned, typically by using closed-form solutions for l2-regularized least squares (ridge regression). More generally, since the readout is often linear, anything from the literature on linear models can be exploited.
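
A minimal sketch of this decomposition, under illustrative choices (a random tanh hidden layer and a squared loss): the inner projection is random and fixed, and only the linear readout is obtained in closed form by ridge regression.

```python
# Minimal sketch: fixed random hidden layer + ridge-regression readout.
import numpy as np

rng = np.random.default_rng(0)
n, d, H, lam = 500, 8, 300, 1e-2     # samples, input dim, hidden units, ridge strength

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # toy regression target

# Representation function: random, untrained projection followed by tanh.
W_in = rng.uniform(-1, 1, size=(d, H))
b_in = rng.uniform(-1, 1, size=H)
Phi = np.tanh(X @ W_in + b_in)

# Readout: closed-form l2-regularized least squares (ridge regression).
W_out = np.linalg.solve(Phi.T @ Phi + lam * np.eye(H), Phi.T @ y)

y_hat = Phi @ W_out
print("training MSE:", np.mean((y - y_hat) ** 2))
```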

Overall, this general approach has a number of nice properties: it is amenable to clean mathematical analysis (hence, the resulting models are simpler to understand); it leads to good approximation properties; implementations are very simple; training algorithms are super-fast compared to fully trained architectures. The other side of the coin is that in complex applications the inner dimensionality can blow up exponentially (thus nullifying the computational advantage). Also notice that this idea has been proposed many times in the literature, under several names (hence a bit of confusion, and not always 'friendly' relations between the respective groups). Fundamental examples are given by Random Vector Functional Links, Extreme Learning Machines, Stochastic Configuration Networks, No-Prop, and Random Kitchen Sinks. Have a look at this nice survey on randomization in neural networks by Simone Scardapane.

Randomization in Deep Feed-forward Neural Networks - The simplest setting generalizes the feed-forward NN architecture described in the previous section: the representation function is computed by a stack of randomized and untrained hidden layers, and the readout layer is the only trained component. Irrespective of the accuracy of such a model, an analysis of its theoretical properties is interesting because it corresponds to investigating the behavior of a deep network in a small subspace around its random initialization. Several works show interesting connections with the fields of kernel methods and Gaussian processes. Other interesting connections are possible, for instance with:

  • Metric learning: in this paper it is shown how randomization-based DNNs can perform a distance preserving embedding of the data.

  • Autoencoders: the DNN is progressively built by using autoencoders, where the hidden layer is randomly fixed, see e.g., this paper.

  • Convolutional architectures: see e.g., this paper, which explores the performance of CNN-based networks in which some convolutional layers are randomized, or with partially untrained filters.

  • Transformer Networks: see, e.g., this paper, using fixed layers interleaved with trainable ones, and this paper on efficient alternatives called Performers.

  • Priors in image tasks: a CNN architecture already contains enough structural information that, even in the absence of training, it can be used successfully in many image processing tasks (e.g., artifact removal, inpainting, super-resolution). See the Deep Image Prior paper and the corresponding webpage.

  • Training algorithms: an intriguing example is given by Direct Feedback Alignment, showing that BP-like algorithms can be effective even if the weight matrices used to backpropagate the loss are randomized (and different from those used in the forward pass); see the sketch after this list.

  • The lottery ticket hypothesis, see e.g. this paper, and neural architecture search, see e.g. this paper (see also the weight-agnostic webpage).
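
As a concrete illustration of the point on training algorithms above, here is a minimal Direct Feedback Alignment sketch on a toy two-hidden-layer regression network; the architecture, data, and step size are illustrative. The output error is sent back to each hidden layer through fixed random matrices rather than through the transposed forward weights.

```python
# Minimal sketch: Direct Feedback Alignment (DFA) on a toy regression task.
import numpy as np

rng = np.random.default_rng(0)
n, d, h1, h2, lr = 256, 10, 64, 64, 0.01

X = rng.standard_normal((n, d))
y = np.sin(X[:, :1])                       # toy target, shape (n, 1)

# Trainable forward weights.
W1, W2, W3 = [rng.standard_normal(s) * 0.1 for s in [(d, h1), (h1, h2), (h2, 1)]]
# Fixed random feedback matrices (never trained, not tied to W2, W3).
B1 = rng.standard_normal((1, h1)) * 0.1
B2 = rng.standard_normal((1, h2)) * 0.1

for epoch in range(200):
    # Forward pass.
    a1 = np.tanh(X @ W1)
    a2 = np.tanh(a1 @ W2)
    y_hat = a2 @ W3
    e = y_hat - y                          # output error (squared-loss gradient)

    # DFA: the output error reaches each hidden layer through a fixed random matrix.
    d2 = (e @ B2) * (1 - a2 ** 2)
    d1 = (e @ B1) * (1 - a1 ** 2)

    W3 -= lr * a2.T @ e / n
    W2 -= lr * a1.T @ d2 / n
    W1 -= lr * X.T @ d1 / n

print("final MSE:", np.mean(e ** 2))
```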

Deep Neural Networks are commonly over-parametrized. Is it possible to find simple, randomized networks that achieve performance similar to that of large, fully trainable ones?


Randomization in Recurrent Neural Networks: Reservoir Computing - Randomization is very useful also in the field of dynamical neural systems, where it is nowadays popular under the name of Reservoir Computing. Essentially, the recurrent hidden layer computes a representation function that is now a dynamical system. This part of the architecture can be left untrained after initialization, provided that a stability property, such as the Echo State Property, is in place. As in the previous sections, the only trainable part is the output readout layer. Overall the approach is very simple, yet it enables reaching very good performance in several types of tasks (e.g., in embedded intelligence scenarios).
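
A minimal Echo State Network sketch, under common heuristic choices (the recurrent matrix is rescaled to a spectral radius below 1 as a simple proxy for the Echo State Property; all sizes and constants are illustrative):

```python
# Minimal sketch: Echo State Network (Reservoir Computing) for a time-series task.
import numpy as np

rng = np.random.default_rng(0)
T, N, rho, lam = 1000, 200, 0.9, 1e-6      # time steps, reservoir size, spectral radius, ridge

u = np.sin(0.1 * np.arange(T + 1))          # toy input signal
y = u[1:]                                   # target: one-step-ahead prediction
u = u[:-1]

# Untrained reservoir: random input and recurrent weights, rescaled for stability.
W_in = rng.uniform(-1, 1, size=N)
W = rng.uniform(-1, 1, size=(N, N))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius set to rho < 1

# Run the dynamical system and collect states.
states = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W_in * u[t] + W @ x)
    states[t] = x

# Trained part: linear readout via ridge regression (discard a short washout).
washout = 100
S, Y = states[washout:], y[washout:]
W_out = np.linalg.solve(S.T @ S + lam * np.eye(N), S.T @ Y)
print("training MSE:", np.mean((S @ W_out - Y) ** 2))
```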

There are several online resources on Reservoir Computing.

Deep Reservoir Computing...& Beyond - The Reservoir Computing approach can be extended towards DNNs by considering an untrained deep recurrent NN as the hidden representation component of the architecture. The approach is flexible and can be applied to several forms of structured data, ranging from time-series to graphs!
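
Building on the ESN sketch above, here is a minimal deep (stacked) variant under the same illustrative assumptions: each reservoir layer is driven by the states of the layer below, all reservoirs stay untrained, and the readout is trained on the concatenation of the states of all layers.

```python
# Minimal sketch: Deep Echo State Network with L stacked, untrained reservoirs.
import numpy as np

rng = np.random.default_rng(1)
T, N, L, rho, lam = 1000, 100, 3, 0.9, 1e-6

u = np.sin(0.1 * np.arange(T + 1))
y, u = u[1:], u[:-1]                         # one-step-ahead prediction task

def make_reservoir(in_dim):
    W_in = rng.uniform(-1, 1, size=(N, in_dim))
    W = rng.uniform(-1, 1, size=(N, N))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

layers = [make_reservoir(1)] + [make_reservoir(N) for _ in range(L - 1)]

states = np.zeros((T, L * N))
x = [np.zeros(N) for _ in range(L)]
for t in range(T):
    inp = np.array([u[t]])                   # first layer is driven by the external input
    for l, (W_in, W) in enumerate(layers):
        x[l] = np.tanh(W_in @ inp + W @ x[l])
        inp = x[l]                           # deeper layers are driven by the layer below
    states[t] = np.concatenate(x)

# Readout trained on the concatenation of all layer states.
washout = 100
S, Y = states[washout:], y[washout:]
W_out = np.linalg.solve(S.T @ S + lam * np.eye(L * N), S.T @ Y)
print("training MSE:", np.mean((S @ W_out - Y) ** 2))
```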

Have a look at my recent webinar on Deep Reservoir Computing, organized by LightOn

TEACHING (A computing Toolkit for building Efficient Autonomous appliCations leveraging Humanistic INtelliGence) is an EU-funded project that designs a computing platform and the associated software toolkit supporting the development and deployment of autonomous, adaptive and dependable CPSoS applications, allowing them to exploit sustainable human feedback to drive, optimize and personalize the provisioning of their services.

We are currently using the concepts of Deep Randomized Neural Networks in the development of distributed and federated Artificial Intelligence as a Service for Cyber-physical Systems of Systems applications.

Take a look at the project website https://www.teaching-h2020.eu