Deep neural networks (DNNs) are considered capable of learning high-level features with more complexity and abstraction than shallower NNs due to their larger number of hidden layers. Defining the network architecture and the training routine are two interdependent problems that must be addressed when solving a problem with NNs in order to achieve high predictive accuracy [Goodfellow 2016] [Lisa 2015] [Schmidhuber 2015]. Defining the network architecture involves setting fine-grained details such as the activation functions (e.g. hyperbolic tangent, rectified linear unit (ReLU), maxout) and the types of layers (e.g. fully connected, dropout, batch normalization, convolutional, pooling), as well as the overall topology of the network. Defining the training routine involves setting the learning rate schedule (e.g. stepwise, exponential), the learning rule (e.g. stochastic gradient descent (SGD), SGD with momentum, root mean square propagation (RMSprop), Adam), the loss function (e.g. MSE, categorical cross-entropy), the regularization techniques (e.g. L1/L2 weight decay, early stopping), and the hyper-parameter optimization strategy (e.g. grid search, random search, Bayesian-guided search). Some common DL architectures are: