Deep neural networks (DNNs) are considered capable of learning high-level features with more complexity and abstraction than shallower NNs due to their larger number of hidden layers. Defining the network architecture and the training routine are two interdependent problems that must be addressed when solving a problem with NNs in order to achieve high predictive accuracy [Goodfellow 2016] [Lisa 2015] [Schmidhuber 2015]. Defining the network architecture involves setting fine-grained details such as the activation functions (e.g. hyperbolic tangent, rectified linear unit (ReLU), maxout) and the types of layers (e.g. fully connected, dropout, batch normalization, convolutional, pooling), as well as the overall topology of the network. Defining the training routine involves setting the learning rate schedule (e.g. stepwise, exponential), the learning rule (e.g. stochastic gradient descent (SGD), SGD with momentum, root mean square propagation (RMSprop), Adam), the loss function (e.g. MSE, categorical cross-entropy), the regularization techniques (e.g. L1/L2 weight decay, early stopping), and the hyper-parameter optimization strategy (e.g. grid search, random search, Bayesian-guided search). Some common DL architectures are: