Visual Information Processing (VIP) Research Group
Department of Electrical, Electronic and Communication Engineering (EECE)
Military Institute of Science and Technology (MIST)
ICEEICT 2021 (www.iceeict2021mist.org)
Paper Title: BLPnet: A new DNN model and Bengali OCR engine for Automatic License Plate Recognition [2022]
Authors: Md. Saif Hassan Onim; Hussain Nyeem; Koushik Roy; Mahmudul Hasan; Abtahi Ishmam; Md. Akiful Hoque Akif; Tareque Bashar Ovi
This paper reports a computationally efficient and reasonably accurate Automatic License Plate Recognition (ALPR) system for Bengali characters with a new end-to-end DNN model that we call the Bengali License Plate Network (BLPnet). With a new Convolutional Neural Network (CNN) based Bengali OCR engine and a word-mapping process, the model is invariant to character rotation and can readily extract, detect and output the complete license plate number of a vehicle. Processing real-time video footage at 17 frames per second (fps), the model can detect a vehicle with a Mean Squared Error (MSE) of 0.0152 and a mean license plate character recognition accuracy of 95%. Compared with other models, BLPnet recorded improvements of 5% in number-plate detection accuracy over a prominent YOLO-based ALPR model and 20% in time requirement over the Tesseract model.
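The word-mapping step mentioned above assembles the OCR engine's per-character predictions into the final plate string. A minimal sketch of that idea, assuming a class-index-to-character lookup; the class list and function names here are illustrative, not taken from the paper:

```python
# Hypothetical class list for the Bengali OCR engine's output: each OCR
# prediction is a class index that maps to a Bengali glyph or word.
BENGALI_CLASSES = ["ঢাকা", "মেট্রো", "গ", "০", "১", "২", "৩", "৪", "৫"]

def map_plate(pred_indices):
    """Map a sequence of OCR class indices to the plate text (sketch only)."""
    return "".join(BENGALI_CLASSES[i] for i in pred_indices)
```

A real mapping would also insert the hyphenation and spacing conventions of Bengali number plates, which this sketch omits.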
Paper Title: Modelling Lips-State Detection Using CNN for Non-Verbal Communications [2021]
Authors: Abtahi Ishmam; Mahmudul Hasan; Md. Saif Hassan Onim; Koushik Roy; Md. Akiful Hoque Akif; Hussain Nyeem
This paper reports two new Convolutional Neural Network (CNN) models for lips-state detection. Building upon two prominent lips landmark detectors, DLIB and MediaPipe, we simplify the lips-state model with a set of six key landmarks and use their distances for lips-state classification. Both models are thereby developed to count the openings and closings of the lips, and thus can classify a symbol from the total count. Our early experimental results demonstrate that the model with DLIB is relatively slower, averaging 6 frames per second (FPS), but achieves a higher average detection accuracy of 95.25%. In contrast, the model with MediaPipe offers faster landmark detection, with an average of 20 FPS and a detection accuracy of 94.4%. Both models could thus effectively interpret the lips state for translating non-verbal semantics into a natural language.
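The distance-and-count logic described above can be sketched in a few lines: landmark pairs are reduced to a mouth-opening distance, thresholded into open/closed states, and closed-to-open transitions are counted per symbol. The coordinates, threshold value, and function names are assumptions for illustration, not the paper's implementation:

```python
import math

def classify_state(upper, lower, threshold=0.05):
    """Classify lips as open/closed from the distance between an upper and
    a lower lip landmark, each given as an (x, y) tuple (sketch only)."""
    return "open" if math.dist(upper, lower) > threshold else "closed"

def count_openings(states):
    """Count closed -> open transitions over a sequence of per-frame states."""
    return sum(1 for a, b in zip(states, states[1:])
               if a == "closed" and b == "open")
```

In the paper's setting, the per-frame landmarks would come from DLIB or MediaPipe rather than being supplied directly.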
Paper Title: Traffic Surveillance using Vehicle License Plate Detection and Recognition in Bangladesh [2020]
Authors: Md. Saif Hassan Onim; Muhaiminul Islam Akash; Mahmudul Haque; Raiyan Ibne Hafiz
This paper presents a YOLOv4 object detection model in which the Convolutional Neural Network (CNN) is trained and tuned to detect the license plates of vehicles in Bangladesh and to recognize characters from the detected plates using Tesseract. We also present a Graphical User Interface (GUI) based on Tkinter, a Python package. The license plate detection model achieves a mean average precision (mAP) of 90.50% and runs on a single Tesla T4 GPU at an average of 14 frames per second (fps) on real-time video footage.
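Detection metrics such as the mAP reported above are built on Intersection over Union (IoU) between predicted and ground-truth plate boxes. A minimal, self-contained IoU sketch (not the paper's evaluation code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection typically counts as a true positive when its IoU with a ground-truth box exceeds a chosen threshold (commonly 0.5), and mAP averages precision over recall levels and classes.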
Paper Title: LULC classification by semantic segmentation of satellite images using FastFCN [2020]
Authors: Md. Saif Hassan Onim; Aiman Rafeed Bin Ehtesham; Amreen Anbar; A. K. M. Nazrul Islam; A. K. M. Mahbubur Rahman
We proposed a fast, efficient and automated LULC classification method that can replace the arduous manual method, along with the manual surveys and associated costs, without any significant loss of accuracy. We also recorded how a well-known segmentation method, FastFCN, performs on satellite image segmentation. The results of our analysis of FastFCN's performance can serve as reference information for future studies related to LULC and/or semantic segmentation.
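Semantic segmentation results like those above are commonly scored with per-class Intersection over Union computed pixel-wise between predicted and ground-truth label masks. A minimal sketch over flattened label lists, assuming integer class labels; this is illustrative, not the paper's evaluation code:

```python
def class_iou(pred, gt, cls):
    """Pixel-wise IoU for one land-cover class over flat label sequences."""
    inter = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
    union = sum(1 for p, g in zip(pred, gt) if p == cls or g == cls)
    # If the class appears in neither mask, score it as a perfect match.
    return inter / union if union else 1.0
```

Averaging `class_iou` over all LULC classes gives the mean IoU typically reported for segmentation models such as FastFCN.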