Undergraduate Thesis
A Generative Bangla Question Answering System using Large Language Model for University Admission Related Frequently Asked Questions
Led the development of a generative Bangla question answering system for university admission support in Bangladesh, addressing the lack of reliable, accessible admission resources for Bangladeshi students. The research applies large language models to a custom Bangla QA dataset to provide admission guidance in one of the world's most widely spoken languages.
Tackled complex admission queries by building an automated Bangla QA platform rooted in real student needs.
Compiled and curated a novel dataset of 5,000 authentic question-answer pairs from diverse local sources.
Fine-tuned several models; DeepSeek and Llama outperformed the others on specific portions of the task, as validated with multiple evaluation metrics.
Delivered an efficient and reliable AI system for students, contributing to low-resource language technology.
Compared retrieval-augmented generation (RAG) and fine-tuning approaches for QA, and tested different prompt formats to improve results.
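As a rough illustration of the retrieval half of a RAG setup, the sketch below picks the stored FAQ entry closest to a question and places it in a prompt. The token-overlap scoring, the `FAQ` entries, and the prompt template are all hypothetical stand-ins, not the thesis's actual retriever, dataset, or prompt format.

```python
# Minimal retrieval-then-prompt sketch. Everything here (FAQ entries,
# Jaccard scoring, prompt template) is an illustrative assumption.

FAQ = [  # hypothetical placeholder entries, not from the thesis dataset
    ("What is the application deadline?",
     "Deadlines are listed in the admission circular."),
    ("How much is the application fee?",
     "Fees are listed in the admission circular."),
]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings, split on whitespace."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(question: str, faq=FAQ):
    """Return the (question, answer) pair most similar to the query."""
    return max(faq, key=lambda pair: jaccard(question, pair[0]))


def build_prompt(question: str) -> str:
    """Put the retrieved pair in front of the user question."""
    ctx_q, ctx_a = retrieve(question)
    return (
        f"Context question: {ctx_q}\n"
        f"Context answer: {ctx_a}\n"
        f"Question: {question}\n"
        f"Answer:"
    )
```

Calling `build_prompt("What is the deadline?")` would place the first FAQ entry in the context. A fine-tuned model, by contrast, would receive only the question, which is the difference the comparison examines.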
Development of an IoT-Based Pipe Water Quality Monitoring and Control System for Smart City
IoT-based smart water quality monitoring and automatic control system designed for urban pipeline networks. The system continuously measures four critical water quality parameters—pH, temperature, turbidity, and Total Dissolved Solids (TDS)—using sensors connected to an Arduino-based setup. Sensor data is sent to a Firebase real-time cloud database, visualized on a custom web dashboard, and monitored remotely by users and authorities.
If any water parameter exceeds WHO-recommended safety limits, the system instantly:
Stops water supply by closing a solenoid valve
Sends alert SMS through a GSM module
Displays real-time warnings on the website
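The alert logic in the list above can be sketched as a simple range check. The numeric limits below are placeholders chosen for illustration, since the text only refers to WHO-recommended limits without giving values, and the action names stand in for the real valve, GSM, and dashboard calls.

```python
# Sketch of the out-of-range check. The numeric limits are placeholders
# chosen for illustration; the source does not list the actual values.

LIMITS = {  # parameter: (low, high); hypothetical values
    "ph": (6.5, 8.5),
    "temperature_c": (0.0, 35.0),
    "turbidity_ntu": (0.0, 5.0),
    "tds_mg_l": (0.0, 1000.0),
}


def out_of_range(reading: dict, limits: dict = LIMITS) -> list:
    """Return the names of parameters that fall outside their limits."""
    bad = []
    for name, (low, high) in limits.items():
        value = reading[name]
        if value < low or value > high:
            bad.append(name)
    return bad


def respond(reading: dict) -> list:
    """Return the actions the system would take for one reading.

    The action names are stand-ins for the hardware and web calls
    (solenoid valve, GSM module, dashboard), which are not shown here.
    """
    if not out_of_range(reading):
        return []
    return ["close_valve", "send_sms", "show_web_warning"]
```

For example, a reading with `ph` set to 9.2 and the other values in range would trigger all three actions, while a fully in-range reading triggers none.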
To evaluate performance, the authors collected 10,000+ sensor data samples, labeled them as safe/unsafe, and trained multiple machine learning models. A Decision Tree classifier achieved 99% accuracy, showing strong potential for predictive, automated water safety assessment.
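The classification step could be sketched as follows. To stay dependency-free, this fits a depth-1 decision tree (a decision stump) rather than the full Decision Tree classifier the study reports, and the six samples are synthetic values invented for illustration, not the collected sensor data.

```python
# Depth-1 decision tree (decision stump) on synthetic readings.
# Features per sample: [pH, turbidity]; label 1 = unsafe, 0 = safe.

SAMPLES = [  # synthetic, illustrative only
    ([7.0, 1.0], 0),
    ([7.2, 2.0], 0),
    ([6.8, 1.5], 0),
    ([7.1, 8.0], 1),
    ([6.9, 9.5], 1),
    ([7.3, 7.0], 1),
]


def fit_stump(samples):
    """Find the (feature, threshold) split with the fewest training errors.

    The stump predicts 1 when the feature value is above the threshold,
    else 0. Only this one direction is searched, to keep the sketch short.
    """
    best = None  # (errors, feature_index, threshold)
    n_features = len(samples[0][0])
    for f in range(n_features):
        values = sorted({x[f] for x, _ in samples})
        # Candidate thresholds: midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum((x[f] > t) != bool(y) for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def predict(stump, x):
    """Classify one sample with a fitted (feature, threshold) stump."""
    feature, threshold = stump
    return int(x[feature] > threshold)


def accuracy(stump, samples):
    """Fraction of samples the stump labels correctly."""
    correct = sum(predict(stump, x) == y for x, y in samples)
    return correct / len(samples)
```

On this toy data the stump splits on turbidity, because pH alone does not separate the two labels. A full decision tree repeats the same split search recursively on each side.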
The study concludes that this low-cost smart system can significantly reduce waterborne diseases and improve water supply management in smart cities. Future improvements include mobile app integration, leakage detection, in-pipeline purification, and disease prediction from sensor data.
MSM_CUET@DravidianLangTech 2025: XLM-BERT and MuRIL Based Transformer Models for Detection of Abusive Tamil and Malayalam Text Targeting Women on Social Media
Transformer-based models for detecting abusive Tamil and Malayalam text targeting women on social media, a challenging task for low-resource Dravidian languages. The work evaluates several multilingual transformers—XLM-R, MuRIL, IndicBERT, mBERT, and ensemble methods—and identifies the most effective models for each language.
Using the official DravidianLangTech@NAACL 2025 dataset, the study finds that:
XLM-RoBERTa performs best for Tamil, achieving an F1 score of 0.7873, ranking 2nd in the shared task.
MuRIL performs best for Malayalam, achieving an F1 score of 0.6812, ranking 10th.
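For reference, F1 is the harmonic mean of precision and recall. The sketch below computes per-class F1 and its unweighted (macro) average; whether the reported scores are macro-averaged is an assumption here, and the label names are made up for the example.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 = 2PR / (P + R) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes in y_true."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
```

With true labels `["abusive", "abusive", "non", "non"]` and predictions `["abusive", "non", "non", "non"]`, the per-class scores are about 0.667 and 0.8, so the macro F1 is about 0.733.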
The results highlight that multilingual transformers can handle abusive text in low-resource languages, but challenges remain in handling sarcasm, dialect variations, and contextual ambiguity. The paper contributes an effective approach for gender-targeted abuse detection and offers insights for improving NLP models in Dravidian languages.
Dataset of Natural Habitat Leaves of Traditional Medicinal Plants for Recognition and Classification
This work introduces a high-quality dataset of 6,419 images of six traditional medicinal plant species collected from the natural jungle habitat of Guadanga, Phulpur, Mymensingh, Bangladesh. The dataset includes original, annotated, and augmented leaf images, captured using Google Pixel 5 and 6a smartphones under controlled lighting. All images were preprocessed to 640×640 resolution and manually annotated with bounding boxes in YOLO format.
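A YOLO-format text label stores one line per box: the class index followed by the box center, width, and height, each normalized by the image size. The sketch below converts a pixel-space box on a 640×640 image into such a line; the function name, class index, and box coordinates are hypothetical.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w=640, img_h=640):
    """Convert a pixel-space box to a YOLO label line.

    YOLO text labels store: class x_center y_center width height,
    with the four box values normalized to [0, 1] by image size.
    """
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"
```

For instance, `to_yolo_line(0, 160, 160, 480, 480)` gives `0 0.500000 0.500000 0.500000 0.500000`, a centered box covering half the image in each dimension.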
The dataset features six widely used medicinal plants (Centella asiatica, Coccinia grandis, Eclipta prostrata, Mikania micrantha, Murraya koenigii, and Stephania japonica), providing realistic morphological diversity rarely found in existing datasets. To validate quality, a YOLOv11 model was trained on the dataset and achieved an mAP@50 of 99%, confirming its suitability for leaf classification and object detection tasks.
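The "@50" in mAP@50 refers to an intersection-over-union (IoU) threshold of 0.5: a predicted box counts as a match only if it overlaps a ground-truth box at least that much. The sketch below shows only the IoU calculation and the threshold test, not the full mAP computation; the function names and example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give no intersection area.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, true_box):
    """True when IoU reaches 0.5 (the exact comparison varies by tool)."""
    return iou(pred_box, true_box) >= 0.5
```

Two 100×100 boxes offset horizontally by 50 pixels have an IoU of one third, so they would not count as a match at this threshold.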
Publicly available on Mendeley Data, this dataset serves as a valuable benchmark for computer vision, deep learning, ethnobotany, agricultural informatics, and AI-based medicinal plant identification, supporting future research in digital herbariums and biodiversity documentation.