Ordinal Regression (also called "ordinal classification") is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. Some examples of ordinal regression problems are predicting human preferences (strongly disagree to strongly agree), predicting a temperature (Hot, Mild, Cold), and predicting book/movie ratings (1 to 5).
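One common way to frame an ordinal problem is to recode the target into cumulative binary questions ("is the rating greater than k?"). A minimal NumPy sketch of that encoding, with made-up ratings - this is one standard trick, not the only approach:

import numpy as np

# Hypothetical 1-5 book ratings (illustrative data, not from the text)
ratings = np.array([1, 3, 5, 2])
num_classes = 5

# One binary column per threshold k = 1..4, answering "rating > k?"
cumulative = (ratings[:, None] > np.arange(1, num_classes)[None, :]).astype(int)
print(cumulative)
# rating 3 becomes [1, 1, 0, 0]: greater than 1 and 2, not greater than 3 or 4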
Activation Function - a mathematical equation that determines the output of a neural network node given its inputs.
ReLU (Rectified Linear Unit), y = max(0, x) - Non-linear. Looks like a linear function, but ReLU has a derivative and allows for backpropagation. Its disadvantage is 'Dying ReLU': when inputs are negative, the gradient of the function becomes zero, so the affected units no longer contribute to backpropagation and stop learning. This is most likely to happen when the learning rate is too high or there is a large negative bias.
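A minimal NumPy sketch of ReLU and its gradient (illustrative only, not tied to any particular library):

import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise - the flat zero region
    # is what produces the 'Dying ReLU' effect described above
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 1. 1.]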
Leaky ReLU - Has a small slope for negative values instead of zero, which fixes the 'Dying ReLU' problem. It speeds up training, is more balanced, and may therefore learn faster. However, the results are not consistent for negative input values (this and the next few ReLU variants are compared in the sketch after the CReLU entry below).
ELU (Exponential Linear Unit) - Uses a smooth exponential curve for negative values instead of a straight line. Combines the good parts of ReLU and Leaky ReLU. It saturates for large negative values, allowing them to be essentially inactive.
Parametric ReLU (PReLU) - A type of Leaky ReLU in which the negative slope is learned. Unlike Leaky ReLU, this function takes the slope of the negative part as a parameter, so it is possible to perform backpropagation and learn the most appropriate value of α. Unfortunately, it may perform differently for different problems.
Concatenated ReLU (CReLU) - Produces TWO outputs, ReLU(x) and a ReLU of the negated input, ReLU(-x), concatenated together, doubling the output dimension.
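To make the differences between these variants concrete, here is a small NumPy sketch. The α values are common illustrative defaults, not fixed constants; PReLU has the same shape as Leaky ReLU but learns α during training, so it is not shown separately:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Fixed small slope alpha for negative inputs (PReLU learns alpha instead)
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Exponential curve that saturates to -alpha for large negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def crelu(x):
    # Concatenates ReLU(x) and ReLU(-x), doubling the output size
    return np.concatenate([np.maximum(0, x), np.maximum(0, -x)], axis=-1)

x = np.array([-2.0, -0.1, 0.5])
print(leaky_relu(x))  # [-0.02  -0.001  0.5  ]
print(elu(x))         # [-0.865 -0.095  0.5  ]
print(crelu(x))       # [0.  0.  0.5 2.  0.1 0. ]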
Softmax is able to handle multiple classes, whereas the other activation functions handle only one. Softmax squashes the output for each class to a value between 0 and 1 by exponentiating it and dividing by the sum over all classes, giving the probability of the input value belonging to a specific class. Typically softmax is used only in the output layer, for neural networks that predict multiple categories.
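A numerically stable softmax in NumPy (subtracting the max is a standard stability trick; it cancels out in the ratio):

import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))  # shift for numerical stability
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))
# [0.659 0.242 0.099] - each output is in (0, 1) and they sum to 1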
train_test_split - a function that splits data into two subsets, one for training and one for testing.
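Typical usage with scikit-learn (the 20% test size and the random_state value are illustrative choices):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)                 # 10 labels

# Hold out 20% for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)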
Keras - open-source neural-network library written in Python.
Loss Function - Maps an event or the values of one or more variables onto a real number. Machine learning learns by means of a loss function: it is a method of evaluating how well a specific algorithm models the given data. If predictions deviate too much from the actual results, the loss function will produce a very large number.
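For example, mean squared error, a common loss function, sketched in NumPy (the sample values are made up):

import numpy as np

def mse(y_true, y_pred):
    # Average squared deviation; large errors are penalized quadratically
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.0])
print(mse(y_true, np.array([2.9, 5.1, 2.0])))   # ~0.007 - close predictions, small loss
print(mse(y_true, np.array([10.0, 0.0, 9.0])))  # 41.0   - far-off predictions, large loss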
BERT (Bidirectional Encoder Representations from Transformers) - A language representation model developed by Google and released in late 2018. Designed to pre-train deep bidirectional representations from unlabeled text.
TensorFlow is an open-source machine learning framework for all developers, used for implementing machine learning and deep learning applications. Google created TensorFlow to develop and research ideas in artificial intelligence. Its primary API is in Python, hence it is considered an easy-to-understand framework.
WordPiece tokenization breaks a sentence or group of words down into sub-word units.
from transformers import BertTokenizer  # assuming the Hugging Face WordPiece tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokenizer.tokenize('Hi my name is Dima')
# OUTPUT
['hi', 'my', 'name', 'is', 'dim', '##a']
Adaptive Moment Estimation (Adam) is an adaptive learning rate method: it computes individual learning rates for different parameters. As its name suggests, Adam uses estimates of the first and second moments of the gradient to adapt the learning rate for each weight of the neural network.
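A single Adam update step sketched in NumPy (the hyperparameter values are the commonly cited defaults, used here as assumptions):

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return w, m, v

w = np.array([0.5, -0.3])
m, v = np.zeros_like(w), np.zeros_like(w)
w, m, v = adam_step(w, grad=np.array([0.2, -0.1]), m=m, v=v, t=1)
print(w)  # [ 0.499 -0.299] - each weight moved by roughly lr in the gradient direction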
Optimizers are extended classes that include the added information (such as the learning rate and internal state) needed to train a specific model.
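In Keras, the optimizer and the loss function come together in model.compile (the layer sizes below are arbitrary illustrative choices):

from tensorflow import keras

# Tiny illustrative model: the optimizer carries its own state (learning rate,
# moment estimates) while the loss function drives the weight updates
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(3, activation='softmax'),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])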