It is generally not a good idea to train a very large DNN from scratch: instead, you should always try to find an existing neural network that accomplishes a task similar to the one you are trying to tackle. Below is a quote from Andrew Ng:
“Transfer learning will become a key driver of machine learning success in the industry.”
–Andrew Ng, 2016 Conference on Neural Information Processing Systems
Transfer learning (also known as knowledge transfer) is a method wherein a model developed for one task is used as the starting point for another task. By model here, we mean a neural network trained on data, carrying the knowledge gained while solving one problem. For example, the knowledge gained in learning to recognize crocodiles can be used to recognize alligators, because the two share many features.
Lorien Pratt published the first known paper on transfer learning in 1993. Since then, there has been a lot of research in this space.
A pre-trained model is a model that someone else has already trained to solve a problem similar in nature to ours.
Pre-trained models usually do not solve the new problem perfectly, but they serve as a good starting point and save the time and effort of training from scratch.
The flow below shows which weights are kept fixed and which are trained on the target problem domain.
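As a concrete sketch (assuming TensorFlow/Keras; the Xception base network, the 10-class output, and the data names are illustrative assumptions, not from the original text), the pre-trained layers are frozen and only a new output layer is trained:

```python
# A minimal transfer-learning sketch in Keras.
import tensorflow as tf

# Reuse a network pre-trained on ImageNet, without its original output layer.
base = tf.keras.applications.Xception(weights="imagenet",
                                      include_top=False, pooling="avg")
base.trainable = False  # fixed weights: keep the pre-trained feature detectors

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),  # trained on the target domain
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=5)  # X_train/y_train: your target-domain data
```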
Suppose you want to tackle a complex task for which you don’t have much labeled training data, but unfortunately you cannot find a model trained on a similar task. Don’t lose all hope!
First, you should of course try to gather more labeled training data, but if this is too hard or too expensive, you may still be able to perform unsupervised pre-training.
If you have plenty of unlabeled training data, you can try to train the layers one by one, starting with the lowest layer and then going up, using an unsupervised feature-detector algorithm. Each layer is trained on the output of the previously trained layers (all layers except the one being trained are frozen).
Examples of such algorithms are restricted Boltzmann machines and autoencoders.
Once all layers have been trained this way, you can fine-tune the network using supervised learning (i.e., with backpropagation); see the sketch below for an example flow.
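Here is a minimal sketch of this greedy layer-wise pre-training with autoencoders (the layer sizes, epoch counts, and placeholder data are illustrative assumptions):

```python
# Greedy layer-wise pre-training: each hidden layer is trained as an
# autoencoder on the output of the previously trained, frozen layers,
# then the whole stack is fine-tuned with supervised learning.
import numpy as np
import tensorflow as tf

X_unlabeled = np.random.rand(1000, 784).astype("float32")  # placeholder unlabeled data

trained_layers = []
inputs = X_unlabeled
for units in (256, 64):                          # train layers one by one, lowest first
    hidden = tf.keras.layers.Dense(units, activation="relu")
    ae = tf.keras.Sequential([hidden, tf.keras.layers.Dense(inputs.shape[1])])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(inputs, inputs, epochs=5, verbose=0)  # reconstruct the layer's own input
    hidden.trainable = False                     # freeze before training the next layer
    inputs = hidden(inputs).numpy()              # feed its codings to the next layer
    trained_layers.append(hidden)

# Fine-tune the whole stack with labels (unfreeze and add an output layer).
for layer in trained_layers:
    layer.trainable = True
classifier = tf.keras.Sequential(
    trained_layers + [tf.keras.layers.Dense(10, activation="softmax")])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# classifier.fit(X_labeled, y_labeled, epochs=10)  # your (smaller) labeled set
```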
Example problem: using a model trained on a particular language to summarize paragraphs.
Example problem: using a trained model for wake-word recognition; for example, Amazon Alexa wakes up when it hears the word “Alexa”.
Example problem: using a trained model for radiology diagnosis.
Two well-known pre-trained NLP models:
Embeddings from Language Models (ELMo)
Bidirectional Encoder Representations from Transformers (BERT)
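As a quick sketch of reusing such a model (assuming the Hugging Face transformers package and PyTorch are installed; the model name and sentence are illustrative):

```python
# Extract contextual embeddings from a pre-trained BERT model.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Crocodiles and alligators share many features.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```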
Refer to the paper linked in the references below for transfer learning with logistic regression.
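The paper's exact method is not reproduced here; a common, related pattern is to use a frozen pre-trained network as a fixed feature extractor and train a logistic regression classifier on top. A sketch under those assumptions, with placeholder data:

```python
# Frozen pre-trained network as feature extractor + logistic regression on top.
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

extractor = tf.keras.applications.MobileNetV2(weights="imagenet",
                                              include_top=False, pooling="avg")
extractor.trainable = False

X_images = np.random.rand(100, 224, 224, 3).astype("float32")  # placeholder images
y = np.random.randint(0, 2, size=100)                          # placeholder labels

features = extractor.predict(X_images)                # fixed pre-trained features
clf = LogisticRegression(max_iter=1000).fit(features, y)
print(clf.score(features, y))
```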
Transfer learning works well only if the inputs have similar low-level features. For example, if the input pictures for your new task don’t have the same size as the ones used in the original task, you will have to add a preprocessing step to resize them to the size expected by the original model.
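For instance (a minimal sketch; the 224x224 target size is an illustrative assumption about the original model):

```python
# Resize new-task images to the input size the original model expects.
import tensorflow as tf

def preprocess(images):
    images = tf.image.resize(images, [224, 224])  # match the original model's input
    return images / 255.0                         # scale to the range it was trained on

batch = tf.random.uniform([8, 300, 500, 3], maxval=255.0)  # differently sized images
print(preprocess(batch).shape)  # (8, 224, 224, 3)
```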
https://www.amazon.in/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
https://images.app.goo.gl/SUM97pC8QH7P4rfM9
https://www.aismartz.com/blog/an-introduction-to-transfer-learning/
https://www.topbots.com/transfer-learning-in-nlp/
https://images.app.goo.gl/g5fQ1tQLMz5iSwu86
https://images.app.goo.gl/nK8DBtWVPRyQ1QcX9
https://youtu.be/yofjFQddwHE?t=455
https://images.app.goo.gl/paKS1XqjD4yTvNRC7
https://youtu.be/_GRvjpJVr5A
https://www.researchgate.net/publication/283184357_TRANSFER_LEARNING_BASED_ON_LOGISTIC_REGRESSION