LSTM is not a new data science tool, and it has gained good traction. However, due to its limitations, many newer models omit LSTM for tasks where it was previously used. For example, the transformer model in NLP became popular while, unlike its predecessor NMT models, not using LSTM at all. This document discusses the current status of LSTM in data science. Any comments or corrections are welcome.
This is notable because neural networks are known to be able to learn complex non-linear relationships, and the LSTM is perhaps the most successful type of recurrent neural network, being capable of directly supporting multivariate sequence prediction problems.
LSTM has been used in many fields, including NLP. However, due to the challenges below, many newer models exclude LSTM from their architecture:
High processing and memory requirements
Difficulty retaining information over very long sequences
Considering this, it is worth evaluating the role of LSTM in current data science practice.
An RNN is based on the mathematical idea of a recurrence relation (see the diagram below). A recurrent neural network can therefore be thought of as multiple copies of the same network, each passing a message to a successor.
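This recurrence can be sketched in a few lines of numpy. The weights, sizes, and input sequence below are illustrative assumptions, not values from the text; the point is that the same network (the same weights) is applied at every time step, passing its hidden state forward.

```python
import numpy as np

# Minimal sketch of the RNN recurrence h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t + b).
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))  # hidden-to-hidden weights
W_xh = rng.normal(scale=0.1, size=(hidden, inputs))  # input-to-hidden weights
b = np.zeros(hidden)

def rnn_step(h_prev, x):
    # The same weights are reused at every step -- the "copies of the same network".
    return np.tanh(W_hh @ h_prev + W_xh @ x + b)

h = np.zeros(hidden)
for x in rng.normal(size=(5, inputs)):  # a toy sequence of 5 input vectors
    h = rnn_step(h, x)
print(h.shape)  # (4,)
```

Each step's output depends on the entire history only through the hidden state `h`, which is exactly the "message passed to a successor".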
RNNs suffer from the vanishing gradient problem, which causes learning to stop prematurely. The recurrence in the RNN formula is the cause: backpropagation multiplies gradients by the same factors at every time step, so over long sequences the gradient can shrink toward zero.
LSTM is a special kind of RNN that stores both long-term and short-term memory. Each cell contains a forget gate, an input (memory) gate, and an output gate.
LSTM mitigates the vanishing gradient problem even though its formula still contains a recurrence. This works because the cell state is updated additively, which prevents the derivative from collapsing to zero.
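The additive update can be seen in a minimal sketch of one LSTM step. All names, sizes, and weights here are illustrative assumptions; the key line is the cell-state update `c = f * c_prev + i * g`, where the gradient flows through a "+" gated by `f`, rather than through repeated matrix multiplications.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden, inputs = 4, 3
# One weight matrix per gate: forget (f), input (i), candidate (g), output (o).
W = {k: rng.normal(scale=0.1, size=(hidden, hidden + inputs)) for k in "figo"}
b = {k: np.zeros(hidden) for k in "figo"}

def lstm_step(h_prev, c_prev, x):
    z = np.concatenate([h_prev, x])
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate
    i = sigmoid(W["i"] @ z + b["i"])   # input gate
    g = np.tanh(W["g"] @ z + b["g"])   # candidate memory
    o = sigmoid(W["o"] @ z + b["o"])   # output gate
    c = f * c_prev + i * g             # additive update: gradient w.r.t. c_prev is just f
    h = o * np.tanh(c)
    return h, c
```

Because the derivative of `c` with respect to `c_prev` is simply the forget-gate activation `f`, gradients along the cell state need not vanish the way they do in a plain RNN.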
Time-series forecasting is a natural use-case, for example the prediction of stock prices. This write-up discusses Covid-19 prediction using LSTM.
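A common first step in such forecasting tasks is to frame the series as a supervised problem: slide a fixed-length window over the values (e.g. daily case counts or closing prices) and use the next value as the target. The toy series and window length below are illustrative assumptions.

```python
import numpy as np

def make_windows(series, window=3):
    # Each sample is `window` consecutive values; the target is the next value.
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window])
    return np.array(X), np.array(y)

series = np.array([10, 12, 15, 20, 26, 33, 41], dtype=float)
X, y = make_windows(series, window=3)
print(X.shape, y.shape)  # (4, 3) (4,)
```

The resulting `X` can then be reshaped to `(samples, timesteps, features)` and fed to an LSTM layer in any deep learning framework.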
In the medical world, confounders are variables that are related to or associated with both the exposure and the outcome. As a hypothetical example, say a correlation is identified between music festivals and increases in skin rashes during any given year. Music festivals do not directly cause skin rashes; one possible confounding variable is outdoor heat, as music festivals tend to run outdoors when the temperature is high, and heat is a known cause of rashes. There may be many others, such as dust, the age of festival attendees, and so on. When working with real-world data, the confounders could number in the thousands. LSTM is well-suited to finding patterns amid the complexity of potentially thousands of confounders.
Text classification is the process of assigning tags or categories to text according to its content. For example,
news articles can be organized by topic;
support tickets can be organized by urgency;
chat conversations can be organized by language;
brand mentions can be organized by sentiment; and so on.
LSTM combined with a CNN (Convolutional Neural Network) addresses this use-case.
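A minimal numpy sketch of this CNN + LSTM combination for text: token embeddings, a 1-D convolution over windows of tokens, then an LSTM over the convolved features, and a softmax over the final hidden state. All sizes, weights, and the toy token sequence are illustrative assumptions, not a production architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab, emb, kernel, filters, hidden, classes = 10, 6, 3, 5, 4, 2
E = rng.normal(scale=0.1, size=(vocab, emb))                     # embedding table
Wc = rng.normal(scale=0.1, size=(filters, kernel * emb))         # conv filters
Wl = rng.normal(scale=0.1, size=(4 * hidden, hidden + filters))  # LSTM weights (f,i,o,g)
Wo = rng.normal(scale=0.1, size=(classes, hidden))               # output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(token_ids):
    x = E[token_ids]                                             # (T, emb)
    # 1-D convolution: one ReLU feature vector per window of `kernel` tokens.
    conv = np.array([np.maximum(Wc @ x[t:t + kernel].ravel(), 0.0)
                     for t in range(len(token_ids) - kernel + 1)])
    h, c = np.zeros(hidden), np.zeros(hidden)
    for v in conv:                                               # LSTM over conv features
        z = Wl @ np.concatenate([h, v])
        f, i, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
    logits = Wo @ h
    p = np.exp(logits - logits.max())
    return p / p.sum()                                           # class probabilities

probs = classify(np.array([1, 4, 2, 7, 3]))
print(probs.shape)  # (2,)
```

The convolution captures local n-gram-like patterns, while the LSTM aggregates them across the whole sequence; in practice this would be built with a framework's `Conv1D` and `LSTM` layers and trained end to end.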
https://www.yuthon.com/post/tutorials/notes-for-cs231n-rnn/
https://www.psychologytoday.com/ca/blog/the-future-brain/202101/ai-deep-learning-finds-label-uses-fda-approved-drugs
https://datascience.stackexchange.com/questions/27124/which-neural-network-topology-to-learn-correlations-between-time-series
https://machinelearningmastery.com/lstm-model-architecture-for-rare-event-time-series-forecasting/
https://medium.com/datadriveninvestor/how-do-lstm-networks-solve-the-problem-of-vanishing-gradients-a6784971a577
https://monkeylearn.com/text-classification/