A Deep Dive into Knowledge Distillation: Distilling BERT
Poster: A Deep Dive Into Knowledge Distillation: Distilling BERT
The project aims to enhance sentiment analysis by applying knowledge distillation to BERT models on the SST2 dataset, with a focus on producing compact yet capable student models. It explores several BERT configurations, such as the 6-6-510-BERT with 28.6 million parameters and the more compact 4-4-256-BERT with 11.17 million parameters, each illustrating the trade-off between training accuracy and validation performance. Other models, such as the 8-6-384-1536-BERT and the 6-6-384-BERT, show the same balance between model complexity and performance. The project faces limitations such as dataset dependency, the potential loss of the complexity captured by the larger BERT model, sensitivity to hyperparameter tuning, high computational demands, and limited interpretability; future improvements could include integrating knowledge graph distillation. That approach promises better semantic understanding and model explainability, though it introduces challenges such as the need for high-quality knowledge graphs and increased computational complexity. Overall, these results suggest that further optimization of model architecture and hyperparameters could yield better sentiment analysis performance.
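As a rough illustration of the distillation setup described above, the sketch below pairs a BERT-base teacher with a small student configured along the lines of the 4-4-256-BERT naming (read here as layers-heads-hidden size) and trains the student on a blend of soft teacher targets and hard SST2 labels. The temperature, loss weight, intermediate size, and the layers-heads-hidden reading of the model names are assumptions for illustration, not settings taken from the poster.

```python
# A minimal sketch of response-based knowledge distillation for SST2 sentiment
# classification. The student dimensions (4 layers, 4 heads, 256 hidden units)
# and the loss weights are illustrative assumptions, not the project's exact settings.
import torch
import torch.nn.functional as F
from transformers import BertConfig, BertForSequenceClassification

# Teacher: a BERT-base classifier, assumed already fine-tuned on SST2.
teacher = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
teacher.eval()

# Student: a much smaller BERT, e.g. 4 layers, 4 attention heads, 256 hidden units.
student_config = BertConfig(
    num_hidden_layers=4,
    num_attention_heads=4,
    hidden_size=256,
    intermediate_size=1024,
    num_labels=2,
)
student = BertForSequenceClassification(student_config)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the soft-target KL term (scaled by T^2) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def train_step(batch, optimizer):
    """One illustrative training step on a batch of tokenized SST2 examples."""
    with torch.no_grad():
        teacher_logits = teacher(input_ids=batch["input_ids"],
                                 attention_mask=batch["attention_mask"]).logits
    student_logits = student(input_ids=batch["input_ids"],
                             attention_mask=batch["attention_mask"]).logits
    loss = distillation_loss(student_logits, teacher_logits, batch["labels"])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

The same loop applies to the larger student variants (e.g. 6-6-510-BERT) by changing only the `BertConfig` dimensions; the loss and training step stay the same.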
With our respected Prof. Dr. Rajesh Palit,
Chair of the Department of Electrical and Computer Engineering, NSU.
My thesis group:
Kazi Hafiz Md. Asad, Anjana Tameem, and myself.
At our booth at the Capstone Project Showcase,
Summer 2023.