Md Hafizur Rahman - Publications

PUBLICATIONS

Published Article

For the latest updates of my publications: Google Scholar

Citations: 455

h-index: 9

i10-index: 8

1. Intelligent Layer Sharing (ILASH): A Predictive Neural Architecture Search Framework for Multi-Task Applications

Authors: Md Hafizur Rahman, Zafaryab Haider, Md Mashfiq Rizvee, Sumaiya Shomaji & Prabuddha Chakraborty
Journal: IEEE Access [IF:3.6; Q1]
DOI: 10.1109/ACCESS.2025.3592039

Abstract :

Artificial intelligence (AI) is widely used in various fields including healthcare, autonomous vehicles, robotics, traffic monitoring, and agriculture. Many modern AI applications in these fields are multi-tasking in nature (i.e. perform multiple analyses on the same data) and are deployed on resource-constrained edge devices, requiring the AI models to be efficient across different metrics such as power, frame rate, and size. For these specific use-cases, we propose a new paradigm of intelligent neural network architecture search framework (ILASH) that leverages a layer sharing concept for minimizing power utilization, increasing frame rate, and reducing model size. ILASH utilizes a data-driven intelligent approach to make the search efficient in terms of energy, time, and frames per second (FPS). We perform extensive evaluations of the proposed layer shared architecture paradigm and the ILASH framework using three open-source datasets (UTKFace, MTFL, and CelebA). We compare ILASH with two different neural architecture search libraries that support multi-task applications (LibMTL and AutoKeras). We also evaluate ILASH against two standard neural architecture search frameworks (DARTS and ENAS). ILASH was able to surpass state-of-the-art performance across most comparison metrics (e.g. task accuracy, search/inference energy, and fps).

2. TRIM: AI Guided Random Number Generation for Resource-Constrained IoT Systems

Authors: Tasneem Suha, Md Hafizur Rahman, Ashley E Rice, Colin Smith, Rima Asmar Awad, & Prabuddha Chakraborty
Journal: IEEE Access [IF:3.6; Q1]
DOI: 10.1109/ACCESS.2025.3586177

Abstract :

Random numbers often serve as the backbone for many security solutions in diverse domains such as cryptography, side channel leakage prevention, and moving target defense. However, generating true random numbers requires a physical source of entropy (e.g. hardware, quantum, environmental phenomenon) making it difficult to realize at a large scale and at a low cost. On the flip side, pseudorandom number generators (easy to implement) following a specific distribution (e.g. Gaussian) can be easily compromised given a sufficient amount of traces. In this work, we have developed a machine learning-guided generative approach that can be used to create portable, resource-efficient, and cost-effective random number generators with high throughput and true randomness characteristics. We implement the proposed approach as a highly parameterized framework and perform extensive evaluation for different settings. The framework was able to learn from true random sources such as irrational numbers and environmental audio noise and imitate those sources towards generating new good quality random numbers on demand. We have generated more than 1 billion bits and observed robust performance in terms of true randomness metrics obtained from NIST SP 800-22 and FIPS 140-1 randomness test suites achieving a throughput of up to 142.85 Mbps. Compared to the state-of-the-art (SOTA) technique, the iso-cost setup of our framework can achieve more than 500 Mbps in a distributed setting. We have evaluated the efficacy of running the true randomness imitation AI models on target edge devices such as Raspberry Pi 4 (Model B), Nvidia Jetson Nano, Nvidia Jetson Orin Nano and Nvidia Jetson Xavier. We have also looked at the security of the TRIM framework itself against different adversarial threat models.

3. An automated multi parameter neural architecture discovery framework using ChatGPT in the backend

Authors: Md Hafizur Rahman, Zafaryab HaiderAshley E Rice, Colin Smith, Rima Asmar Awad, & Prabuddha Chakraborty
Journal: Scientific Report by Nature [IF:4.6; Q1]
DOI: 10.1038/s41598-025-97378-5

Abstract :

Building efficient neural network architectures for a given dataset can be a time-consuming task requiring extensive expert knowledge. This task becomes particularly challenging for edge artificial intelligence (AI) because one has to consider additional parameters such as power consumption during inferencing, model size, and inferencing speed. In this article, we introduce a novel framework designed to automatically discover new neural network architectures based on user-defined parameters, an expert system, and an LLM trained on a large amount of open-domain knowledge. The proposed framework (LEMONADE) can be easily used by non-AI experts, does not require a predetermined neural architecture search space, and considers a large set of edge AI parameters. We implement and validate this proposed neural architecture discovery framework using CIFAR-10, CIFAR-100, ImageNet16-120, EuroSAT, Malaria Parasite, and IMDb datasets while primarily using ChatGPT-4o as the LLM component. We have also explored the possibilities of using Gemini-Pro as the LLM component. Neural networks generated using LEMONADE for CIFAR-10 ( test accuracy) and CIFAR-100 ( test accuracy) demonstrated state-of-the-art performance in terms of final model accuracy. We have also observed near state-of-the-art performance (in terms of accuracy) for the ImageNet16-120 dataset. Moreover LEMONADE was able to generate effective neural networks, satisfying different edge AI requirements across additional datasets such as EuroSAT.

4. A framework for mitigating malicious RLHF feedback in LLM training using consensus based reward

Authors: Zafaryab Haider, Md Hafizur Rahman, Vijay Devabhaktuni, Shane Moeykens & Prabuddha Chakraborty
Journal: Scientific Report by Nature [IF:4.6; Q1]
DOI: 10.1038/s41598-025-92889-7

Abstract :

Large Language models (LLMs) have demonstrated impressive capabilities in natural language processing and understanding. LLMs are being rapidly adopted in major industry sectors including mobile computing, healthcare, finance, government, and education driven by technology giants such as NVIDIA, OpenAI, Microsoft, Apple, Meta, Google, Broadcom, AMD, and IBM. However, due to the emerging nature of this technology, many security/privacy challenges remain unresolved that we must tackle before rolling out LLMs to critical applications (e.g. Healthcare, Legal). In this article, we focus on the Reinforcement Learning via Human Feedback (RLHF) process that is widely used for training LLMs giving them the human-like feel most applications value. The RLHF process involves employing human experts to generate feedback based on an LLM’s query-response pairs and using this feedback to then retrain (fine-tune) the model. However, RLHF can also expose the LLM to malicious feedback generated by one or more individuals in the process leading to degraded performance of the LLM and harmful responses. Most state-of-the-art (SOTA) solutions to this problem involve utilizing a KL-Divergence-based brute-force update-rejection approach that can render the whole RLHF process completely useless (model quality is not improved) in the presence of malicious entities in the process. We propose the COnsensus-Based RewArd framework (COBRA), a consensus-based technique that can effectively negate the malicious noise generated by a certain segment of the RLHF human-expert pool, leading to improved LLM training performance in a mixed-trust scenario. We have evaluated COBRA for two separate LLM use cases, Sentiment Analysis and Conversational Task. We have experimented with a wide range of LLM models (e.g. GPT-2 XL - 1.5B parameters). COBRA outperformed the standard unprotected reward generation scheme by ≈30% for the generative conversational task and by ≈40% for the sentiment analysis task. We have also quantitatively compared COBRA with Coste et al. and observed state-of-the-art performance, particularly when a lower number of reward models are used (≈23% increased reward accuracy at DF=10).

5. In-situ particle analysis with heterogeneous background: a machine learning approach

Authors: Adeeb Ibne Alam, Md Hafizur Rahman, Akhter Zia, Nate Lowry, Prabuddha Chakraborty, Md Rafiul Hassan & Bashir Khoda
Journal: Scientific Report by Nature [IF:4.6; Q1]
DOI: 10.1038/s41598-024-59558-7

Abstract :

We propose a novel framework that combines state-of-the-art deep learning approaches with pre- and post-processing algorithms for particle detection in complex/heterogeneous backgrounds common in the manufacturing domain. Traditional methods, like size analyzers and those based on dilution, image processing, or deep learning, typically excel with homogeneous backgrounds. Yet, they often fall short in accurately detecting particles against the intricate and varied backgrounds characteristic of heterogeneous particle–substrate (HPS) interfaces in manufacturing. To address this, we’ve developed a flexible framework designed to detect particles in diverse environments and input types. Our modular framework hinges on model selection and AI-guided particle detection as its core, with preprocessing and postprocessing as integral components, creating a four-step process. This system is versatile, allowing for various preprocessing, AI model selections, and post-processing strategies. We demonstrate this with an entrainment-based particle delivery method, transferring various particles onto substrates that mimic the HPS interface. By altering particle and substrate properties (e.g., material type, size, roughness, shape) and process parameters (e.g., capillary number) during particle entrainment, we capture images under diferent ambient lighting conditions, introducing a range of HPS background complexities. In the preprocessing phase, we apply image enhancement and sharpening techniques to improve detection accuracy. Specifcally, image enhancement adjusts the dynamic range and histogram, while sharpening increases contrast by combining the high pass flter output with the base image. We introduce an image classifer model (based on the type of heterogeneity), employing Transfer Learning with MobileNet as a Model Selector, to identify the most appropriate AI model (i.e., YOLO model) for analyzing each specifc image, thereby enhancing detection accuracy across particle–substrate variations. Following image classifcation based on heterogeneity, the relevant YOLO model is employed for particle identifcation, with a distinct YOLO model generated for each heterogeneity type, improving overall classifcation performance. In the post-processing phase, domain knowledge is used to minimize false positives. Our analysis indicates that the AI-guided framework maintains consistent precision and recall across various HPS conditions, with the harmonic mean of these metrics comparable to those of individual AI model outcomes. This tool shows potential for advancing in-situ process monitoring across multiple manufacturing operations, including high-density powder-based 3D printing, powder metallurgy, extreme environment coatings, particle categorization, and semiconductor manufacturing.

6. Real-time face mask position recognition system based on MobileNet model

Authors: Md Hafizur Rahman, Mir Kanon Ara Jannat, Md Shafiqul Islam, Giuliano Grossi, Sathya Bursic, Md Aktaruzzaman
Journal: Smart Health [IF:2.70; Q2]
DOI: 10.1016/j.smhl.2023.100382

Abstract :

COVID-19 is a highly contagious disease that was first identified in 2019, and has since taken more than six million lives world wide till date, while also causing considerable economic, social, cultural and political turmoil. As a way to limit its spread, the World Health Organization and medical experts have advised properly wearing face masks, social distancing and hand sanitization, besides vaccination. However, people wear masks sometimes uncovering their mouths and/or noses consciously or unconsciously, thereby lessening the effectiveness of the protection they provide. A system capable of automatic recognition of face mask position could alert and ensure that an individual is wearing a mask properly before entering a crowded public area and putting themselves and others at risk. We first develop and publicly release a dataset of face mask images, which are collected from 391 individuals of different age groups and gender. Then, we study six different architectures of pre-trained deep learning models, and finally propose a model developed by fine tuning the pre-trained state of the art MobileNet model. We evaluate the performance (accuracy, F1-score, and Cohen’s Kappa) of this model on the proposed dataset and MaskedFace-Net, a publicly available synthetic dataset created by image editing. Its performance is also compared to other existing methods. The proposed MobileNet is found as the best model providing an accuracy, F1-score, and Cohen’s Kappa of 99.23%, 99.22%, and 99.19%, respectively for face mask position recognition. It outperforms the accuracy of the best existing model by about 2%. Finally, an automatic face mask position recognition system has been developed, which can recognize if an individual is wearing a mask correctly or incorrectly. The proposed model performs very well with no drop in recognition accuracy from real images captured by a camera.

7. RATNet: A deep learning model for Bengali handwritten characters recognition.

Authors: Md Shafiqul Islam, Md Moklesur Rahman Md Hafizur Rahman, Massimo Walter Rivolta, and Md Aktaruzzaman
Journal: Multimedia Tools and Applications [IF:2.757; Q2]
DOI: 10.1007/s11042-022-12070-4

Abstract :

The Bengali language is based on a set of symbols for basic characters, modifiers, compound characters, and numerals. The recognition rates of handwritten basic characters and numerals are very high. However, the recognition rates of compound characters and modifiers are still poor. This might be due to their large class size with huge writing styles, much similarity, and unavailability of sufficient data for deep learning. In fact, there are some compound characters which appear very rare in practice. A proper selection of frequently used characters may reduce class size, and hence improving the accuracy. In this study, we performed a statistics on the frequency of compound characters, we developed two datasets for modifiers and compound characters, and finally we proposed a heterogeneous deep learning model (RATNet) for characters recognition. A statistics was performed on two daily Bengali newspapers, and characters with frequency ≥ 5% were selected. The handwriting of selected characters was collected from 130 writers of different ages and professions. The performance of RATNet model was evaluated on the proposed datasets and also three other existing datasets (i.e., ISI, CMATERdb, BanglaLekha-Isolated). In addition, the performance of RATNet was also compared with LeNet-5, VGG-16, ResNet-50, and DenseNet-121 models. We selected 87 out of 107 compound characters. The proposed RATNet model outperforms other models providing 99.66%, 99.27%, 98.78%, and 97.70% accuracy, respectively for the recognition of numerals, basic characters, modifiers, and compound characters on the CMATERdb dataset while keeping the number of parameters relatively low likely due to layer heterogeneity.

8. CSI-IANet: An Inception Attention Network for Human-Human Interaction Recognition Based on CSI Signal.

Authors: M. Humayun Kabir, M. Hafizur Rahman and W. Shin
Journal: IEEE Access [IF:3.367; Q1]
DOI: 10.1109/ACCESS.2021.3134794.

Abstract :

In recent years, Wi-Fi infrastructures have become ubiquitous, providing device-free passive-sensing features. Wi-Fi signals can be affected by their reflection, refraction, and absorption by moving objects in their path. The channel state information (CSI), a signal property indicator, of the Wi-Fi signal can be analyzed for human activity recognition (HAR). Deep learning-based HAR models can enhance performance and accuracy without sacrificing computational efficiency. However, to save computational power, an inception network, which uses a variety of techniques to boost speed and accuracy, can be adopted. In contrast, the concept of spatial attention can be applied to obtain refined features. In this paper, we propose a human–human interaction (HHI) classifier, CSI-IANet, which uses a modified inception CNN with a spatial-attention mechanism. The CSI-IANet consists of three steps: i) data processing, ii) feature extraction, and iii) recognition. The data processing layer first uses the second-order Butterworth low-pass filter to denoise the CSI signal and then segment it before feeding it to the model. The feature extraction layer uses a multilayer modified inception CNN with an attention mechanism that uses spatial attention in an intense structure to extract features from captured CSI signals. Finally, the refined features are exploited by the recognition section to determine HHIs correctly. To validate the performance of the proposed CSI-IANet, a publicly available HHIs CSI dataset with a total of 4800 trials of 12 interactions was used. The performance of the proposed model was compared to those of existing state-of-the-art methods. The experimental results show that CSI-IANet achieved an average accuracy of 91.30%, which is better than that of the existing best method by 5%.

Conference Proceedings

1. Understanding the Innovations Required for a Green & Secure Artificial Intelligence Paradigm.

Authors: Md Mashfiq Rizvee, Md Hafizur Rahman, Prabuddha Chakraborty, Sumaiya Shomaji
Conference: 2023 IEEE 16th Dallas Circuits and Systems Conference (DCAS)
DOI: 10.1109/DCAS57389.2023.10130257

Abstract:

Artificial Intelligence (AI) is being widely used in diverse domains such as industrial automation, traffic control, precision agriculture, and smart cities for major heavy lifting in terms of data analysis and decision making. However, the AI life-cycle is a major source of greenhouse gas (GHG) emission leading to devastating environmental impact. This is due to expensive neural architecture searches, training of countless number of models per day across the world, in-field AI processing of data in billions of edge devices, and advanced security measures across the AI life-cycle. In this work, we explore the impact of reckless AI computation on the environment for every stage of the AI life-cycle with deeper dives into specific algorithms via case studies. We also propose a systematic knowledge-guided AI system design framework that leverages past design experiences towards limiting GHG emissions during future AI system design.

2. A Study of Real-Time Physical Activity Recognition from Motion Sensors via Smartphone Using Deep Neural Network.

Authors: Mohammad Helal Uddin, Jannat Mir Kanon Ara, Md Hafizur Rahman, S.H. Yang
Conference: 2021 5th International Conference on Electrical Information Communication Technology (EICT)
DOI: 10.1109/EICT54103.2021.9733607

Abstract:

Recognition of activities by the use of the sensor is one of the context-conscious studies fascinated by many research teams. This paper explains how different types of physical activities are recognized using cell phone’s motion sensors data. Three types of motion sensors (accelerometer, gravity, linear acceleration) data were used for this work. An effective and efficient deep neural network model was proposed for this work that is able to recognize human physical activities from tri-axial motion sensor data in real-time which was later implemented on android-based smartphones using android studio. The sequential model was implemented for this work where LSTM, flatten and the dense layer was added. The model is able to classify seven types of human activities in the real-time data feed. The model achieves 98.8% of classification accuracy while training and testing. Later on, the model has been converted to a smartphone-compatible model using Tensorflow, as the initial deep learning model isn’t compatible to insert into smartphones. The model was successfully converted and implemented to an android based smartphone.

3. Neural network pruning: An effective way to reduce the initial network for deep learning-based human activity recognition (2021).

Authors: Mohammad Helal Uddin, Jannat Mir Kanon Ara, Md Hafizur Rahman, S.H. Yang
Conference: 2021 International Conference on Electronics, Communications and Information Technology (ICECIT)
DOI: 10.1109/ICECIT54077.2021.9641226

Abstract:

Neural network pruning methods can decrease the parameter counts of trained neural networks along with improving the computational performance of inference without compromising the accuracy. This work has demonstrated the effectiveness of the neural network pruning methodology, which was applied to deep learning-based human activity recognition. A long short-term memory (LSTM) architecture was used to recognize daily human activities from smartphone-based accelerometer data. An android-based application has been developed to collect the accelerometer data from smartphones. Data has been preprocessed before training the LSTM network. The accuracy of the proposed LSTM model was 97.2% while recognizing the human activities. Neural network pruning was applied to the model once the model was trained. After 70% of weight pruning on the trained/initial network, the Pruned network accuracy was similar (97.2%) as the original one. However, the accuracy of the pruned network was better than the initial model, while the network has been pruned by 25% (accuracy 97.26%), 50% (accuracy 97.29%), and 60% (accuracy 97.22%). The initial model started to slightly decline the accuracy (from 97.2% to 96.72%), while the model was pruned by 80%. In terms of neural/unit pruning, slightly declination in the accuracy (97.2% to 97.09%) occurred after 50% of pruning.

4. A deep learning-based multi-model ensemble method for eye state recognition from EEG (2021).

Authors: Md Shafiqul Islam, Md Moklesur Rahman, Md Hafizur Rahman, Md Robiul Hoque, Arman Kheirati Roonizi, Md Aktaruzzaman
Conference: 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC)
DOI: 10.1109/CCWC51732.2021.9376084

Abstract:

Eye state recognition plays an important role in biomedical informatics e.g., smart home devices controlling, drowsy driving detection, etc. The change in cognitive states is reflected by the changing in electroencephalogram (EEG) signals. There are some works for eye state recognition using traditional shallow neural networks and manually extracted features. The useful features extraction from EEG and the selection of appropriate classifiers are challenging tasks due to the variable nature of EEG signals. The deep learning algorithms automatically extracts features and often reported better performance than traditional classifiers in some recognition and recognition tasks. In this paper, we have proposed three architectures of a deep learning model using ensemble technique: convolution neural network, gated recurrent unit, and long short term memory for eye state recognition (open or close) from EEG directly. The study has been performed on a freely available public EEG eye state dataset of 14980 samples. The individual performance of each classifier has been observed, and also performance of recognition performance of the ensemble networks has also been compared with the existing prominent methods. The average accuracy 99.86% was obtained by the proposed method, and it is the highest performance ever reported in the literature.

5. EyeNet: An Improved Eye States Classification System using Convolutional Neural Network(2020).

Authors: Md Moklesur Rahman, Md Shafiqul Islam, Mir Kanon Ara Jannat, Md Hafizur Rahman, Md Arifuzzaman, Roberto Sassi, Md Aktaruzzaman
Conference: 2020 22nd International Conference on Advanced Communication Technology (ICACT)
DOI: 10.23919/ICACT48636.2020.9061472

Abstract:

The classification of eye states (open or closed) has numerous potential applications such as fatigue detection, psychological state analysis, smart home devices controlling, etc. Due to its importance, there are a number of works already reported in the literature using traditional shallow neural networks or support vector machines, which have reported good accuracy (about 96%). However, there is still enough space to improve the accuracy of existing systems using proper classification methods. The major problem with traditional classifiers is that they depend on manual selection of features and that is very challenging to select meaningful features for such classifiers. Convolutional neural networks (CNNs) have become popular for computer vision and pattern recognition problems with better performance than traditional methods. In this study, we proposed a model of CNN (EyeNet) for eye states classification and tested it on three datasets (CEW, ZJU, and MRL Eye). The quality (or diversity) of a recently proposed larger dataset (MRL Eye) has been compared with two other existing datasets with respect to the sufficient training of the model. The model shows very high performance (about 99% accuracy for classification of eye states on the test set of data when it is trained by the training samples of the same dataset. The proposed model improves the accuracy of the best existing method by about 3%. The performance of the model for classification of samples coming from different datasets is reduced when it is trained with the MRL Eye dataset. This concludes that even though, the MRL Eye has a large number of samples compared to other datasets, but diversity still lacks in the MRL Eye samples to sufficiently train the model.

6. A New Benchmark on American Sign Language Recognition using Convolutional Neural Network (2019).

Authors: Md Moklesur Rahman, Md Shafiqul Islam, Md Hafizur Rahman, Roberto Sassi, Massimo W Rivolta, Md Aktaruzzaman
Conference: International Conference on Sustainable Technologies for Industry 4.0 (STI)
DOI: 10.1109/STI47673.2019.9067974

Abstract:

The listening or hearing impaired (deaf/dumb) people use a set of signs, called sign language instead of speech for communication among them. However, it is very challenging for non-sign language speakers to communicate with this community using signs. It is very necessary to develop an application to recognize gestures or actions of sign languages to make easy communication between the normal and the deaf community. The American Sign Language (ASL) is one of the mostly used sign languages in the World, and considering its importance, there are already existing methods for recognition of ASL with limited accuracy. The objective of this study is to propose a novel model to enhance the accuracy of the existing methods for ASL recognition. The study has been performed on the alphabet and numerals of four publicly available ASL datasets. After preprocessing, the images of the alphabet and numerals were fed to a newly proposed convolutional neural network (CNN) model, and the performance of this model was evaluated to recognize the numerals and alphabet of these datasets. The proposed CNN model significantly (9%) improves the recognition accuracy of ASL reported by some existing prominent methods.

7. Recognition Bangla sign language using convolutional neural network (2019).

Authors: Md Shafiqul Islam, Md Moklesur Rahman, Md Hafizur Rahman, Md Arifuzzaman, Roberto Sassi, Md Aktaruzzaman
Conference: 2019 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT)
DOI: 10.1109/3ICT.2019.8910301

Abstract:

The sign language, considered as the main language for deaf and hard of hearing, uses manual communication and body language to convey expressions and plays a major role in developing an identity. Nowadays, sign language recognition is an emerging field of research to improve interaction with the deaf community. The automatic recognition of American, British, and French sign languages with high accuracy has been reported in the literature. Even though, Bangla is one of the mostly spoken languages in the world, no significant research on Bangla sign language recognition can be found in the literature. The main reason for this lagging might be due to the unavailability of a Bangla sign language dataset. In this study, we have presented a large dataset of Bangla sign language consisting of both alphabets and numerals. The dataset was composed of 7052 samples representing 10 numerals and 23864 samples correspond to the 35 basic characters of the alphabet. Finally, the performance of a convolutional neural network in the recognition of numerals and alphabet separately, and in mixing of them, has been evaluated on the developed dataset using 10-fold cross-validation. The proposed method provided an average recognition accuracy of 99.83%, 100%, and 99.80%, respectively for basic characters, numerals, and for their combined usage.

Contact

Email: md.hafizur.rahman@maine.edu
hafizur.iueee@gmail.com

Page updated

Google Sites

Report abuse