My project portfolio reflects a focused effort to bridge theoretical foundations with real-world applications in artificial intelligence, with a primary emphasis on speech emotion recognition, brain–computer interfaces (BCI), and AI-driven mental health systems. My work lies at the intersection of machine learning, signal processing, and human-centered intelligent systems, aiming to develop technologies that can interpret and respond to human cognitive and emotional states.
The projects presented here span academic research, applied system development, and interdisciplinary experimentation. Core contributions include developing deep learning models for speech emotion recognition, designing EEG-based BCI systems for cognitive training, and exploring data-driven approaches for behavioral and psychological analysis. In parallel, I have worked on time-series forecasting, IoT-based data systems, and real-time analytics, strengthening my ability to handle complex, multimodal datasets.
Each project follows a rigorous research-oriented methodology, including data preprocessing, feature extraction (e.g., MFCC, spectrograms, physiological signals), model development, optimization, and evaluation. This reflects both strong engineering discipline and a commitment to reproducible, scalable research practices.
Collectively, this portfolio demonstrates my ability to integrate AI techniques with real-world constraints, with a growing focus on neurotechnology, affective computing, and intelligent healthcare applications. These projects form the foundation of my progression toward doctoral-level research in artificial intelligence, brain–computer interfaces, and human-centered intelligent systems.
This ongoing research focuses on developing a data-driven framework for analyzing cognitive and attentional states using electroencephalography (EEG) signals. The project aims to contribute toward scalable brain–computer interface (BCI) systems and the creation of high-quality datasets for behavioural and mental health research, particularly in the context of attention-related conditions such as ADHD.
Develop methodologies for capturing and analyzing EEG signals for cognitive state assessment
Design and curate a structured EEG dataset for attention and behavioural studies
Explore feature extraction and representation learning techniques for neural signals
Investigate machine learning and deep learning models for classification of cognitive states
EEG signal acquisition and preprocessing (noise filtering, artifact removal)
Feature extraction from time-domain and frequency-domain representations (e.g., power spectral density, band analysis)
Exploration of advanced representations for neural data (e.g., time-frequency transformations)
Model development using machine learning and deep learning approaches for classification and pattern recognition
Iterative experimental design for improving data quality and model performance
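A minimal sketch of the frequency-domain step listed above (power spectral density and band analysis), assuming a single EEG channel sampled at a fixed rate. The band edges, the 128 Hz sampling rate, and the synthetic test signal are illustrative assumptions, not values from this project; a Hann-windowed periodogram stands in for whichever PSD estimator the pipeline finally uses.

```python
import numpy as np

# Conventional band edges in Hz (assumed; exact edges vary between studies).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_powers(signal, fs):
    """Absolute power per band from a Hann-windowed periodogram."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                  # remove DC offset
    window = np.hanning(x.size)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum = np.fft.rfft(x * window)
    psd = np.abs(spectrum) ** 2 / (fs * np.sum(window ** 2))
    # Fold negative frequencies into a one-sided spectrum
    # (the DC bin and, for even lengths, the Nyquist bin are not doubled).
    psd[1:] *= 2.0
    if x.size % 2 == 0:
        psd[-1] /= 2.0
    df = freqs[1] - freqs[0]
    return {
        name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
        for name, (lo, hi) in BANDS.items()
    }


# Synthetic stand-in for one EEG channel: a strong 10 Hz rhythm plus weak 20 Hz activity.
fs = 128                                   # Hz, assumed sampling rate
t = np.arange(0, 4.0, 1.0 / fs)
demo = np.sin(2 * np.pi * 10 * t) + 0.2 * np.sin(2 * np.pi * 20 * t)
powers = band_powers(demo, fs)
total = sum(powers.values())
relative = {name: value / total for name, value in powers.items()}
print(relative)
```

Relative band power is often preferred over absolute power as a classifier input because it is less sensitive to differences in electrode contact between sessions.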
Python (NumPy, Pandas, SciPy)
Signal Processing Techniques
Machine Learning / Deep Learning Frameworks
EEG Data Processing Pipelines
A structured EEG dataset for cognitive and behavioural analysis
Insights into neural patterns associated with attention and cognitive states
Foundations for developing intelligent BCI-based systems for mental health applications
This project aligns with ongoing research in affective computing and neurotechnology, contributing toward the development of intelligent systems capable of interpreting human cognitive and emotional states. It provides a strong foundation for future work in brain–computer interfaces, mental health diagnostics, and human-centered AI.
This project focuses on the development of a deep learning-based Speech Emotion Recognition (SER) system capable of identifying human emotions from audio signals, independent of semantic content. The work was later published in Springer Lecture Notes in Computer Science (LNCS), highlighting its research contribution to affective computing and human–computer interaction.
Develop a robust model for emotion classification from speech signals
Analyze the effectiveness of different audio features for emotion recognition
Evaluate deep learning architectures for improving classification performance
Contribute toward real-world applications such as mental health monitoring and human–machine interaction
Dataset: RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song)
Data preprocessing and augmentation (noise injection, pitch shifting, time stretching)
Feature Extraction: Mel Frequency Cepstral Coefficients (MFCC), Mel Spectrogram, Chroma Features, Zero Crossing Rate (ZCR), Root Mean Square Energy (RMS)
Model Development: Convolutional Neural Network (CNN) architecture with multiple convolutional and pooling layers, regularized with dropout and refined through hyperparameter tuning
Evaluation using accuracy, precision, F1-score, and confusion matrix
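The noise-injection augmentation and two of the simpler features above (RMS energy and zero crossing rate) can be sketched with NumPy alone. This is an illustrative sketch, not the project's Librosa-based code: the function names, the 16 kHz sampling rate, the frame sizes, and the synthetic tone are all assumptions, and pitch shifting, time stretching, and MFCC extraction are not shown.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Inject white Gaussian noise scaled to the requested signal-to-noise ratio."""
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(signal.size)
    target_noise_power = np.mean(signal ** 2) / (10 ** (snr_db / 10))
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise


def frame_signal(signal, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (no padding at the end)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (signal.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def rms_energy(frames):
    """Root mean square energy of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


# Synthetic one-second 440 Hz tone standing in for a speech clip.
sr = 16_000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = add_noise(tone, snr_db=20, rng=np.random.default_rng(0))
frames = frame_signal(noisy, frame_len=1024, hop=512)
features = np.column_stack([rms_energy(frames), zero_crossing_rate(frames)])
print(features.shape)
```

Each row of `features` describes one frame; in a CNN pipeline such frame-level features would typically be stacked alongside MFCC and Mel spectrogram matrices.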
Python (Librosa, NumPy, Pandas)
TensorFlow / Keras
Signal Processing & Audio Feature Engineering
Achieved ~72% classification accuracy on selected emotion classes
Demonstrated effectiveness of CNN-based architectures for audio-based emotion recognition
Identified limitations related to dataset size and generalization
Published in Springer Lecture Notes in Computer Science (LNCS), 2023
This work contributes to the field of affective computing by demonstrating a scalable deep learning approach for emotion recognition from speech. It highlights the importance of feature engineering and model optimization in handling complex audio signals.
Integration of multimodal data (e.g., EEG + speech) for improved emotion recognition
Expansion to larger and cross-cultural datasets
Application in mental health monitoring and adaptive human–computer interaction systems
This project focused on designing and developing an EEG-based cognitive training system aimed at supporting individuals with Attention Deficit Hyperactivity Disorder (ADHD). The system integrates brain–computer interface (BCI) technology with interactive gaming to monitor and enhance user attention in real time.
Phase 1: System Development
Development of three interactive attention-training games tailored to different age groups, incorporating EEG-based feedback mechanisms.
Phase 2: Experimental Evaluation
Session-based experimental analysis to evaluate user engagement and attention levels using EEG signals.
Participants:
Three age groups (approximately 10–40 years)
Design a BCI-enabled interactive system for attention training
Analyze EEG-based cognitive responses during gameplay
Evaluate engagement and attention dynamics across different age groups
Unity 3D, Visual Studio Code
Emotiv Insight EEG Headset & SDK
MATLAB (signal analysis)
C#, Blender
EEG signal acquisition during gameplay sessions
Signal preprocessing and band-power feature extraction
Session-level analysis of attention and engagement patterns
Comparative analysis across age groups
Successfully developed a functional EEG-based BCI training prototype
Observed measurable variations in attention-related engagement across sessions
Demonstrated feasibility of integrating EEG signals into interactive cognitive training systems
Category: Research & Prototype Development
Status: Extended into ongoing research work
Developed an advanced Retrieval-Augmented Generation (RAG)-based AI agent for technical support automation. The system ingests historical support tickets, builds a hybrid knowledge base (dense vector + sparse BM25 retrieval), and answers new queries using state-of-the-art LLMs. Designed for high scalability, multilingual support, and seamless integration with web and chat platforms.
Automate technical support ticket triage and response generation
Enable continuous knowledge base updates as new tickets arrive
Improve ticket categorization and escalation workflows
Ensure compatibility with multiple languages and large-scale datasets
Developed and maintained using VS Code and Cursor for efficient, modern coding workflows
Data ingestion and parsing from raw support transcripts
Hybrid retrieval: dense embeddings (sentence-transformers) + sparse BM25
LLM-based answer generation (Gemini 3.0, Claude Opus 4.6)
Escalation to human agents for unresolved queries
Dockerised backend for cloud deployment; FastAPI for web API
Integration with chatbots (e.g., Lark bot via OpenClaw)
CLI and web-based frontend for testing and demonstration
Advanced analytics for support operations and user feedback:
Users can provide feedback on AI-generated answers, enabling the system to rank and improve reference tickets for greater accuracy in future responses.
Enhanced human-in-the-loop workflows for continuous learning:
The system continuously learns from newly resolved tickets—whether solved by the AI or escalated to human agents. Resolved cases are automatically processed and incorporated into the knowledge base, ensuring ongoing self-improvement and adaptation to new issues.
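One common way to merge a dense ranking and a BM25 ranking is reciprocal rank fusion (RRF). The description above does not state which fusion rule the agent uses, so RRF is an assumption here, as are the ticket IDs and the constant k = 60; the sketch only shows how two ranked lists can be combined without calibrating their raw scores against each other.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["T-104", "T-221", "T-017"]    # hypothetical IDs from vector search
sparse_hits = ["T-221", "T-350", "T-104"]   # hypothetical IDs from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['T-221', 'T-104', 'T-350', 'T-017']
```

Tickets found by both retrievers rise to the top, which is the behaviour a hybrid knowledge base is meant to exploit.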
Python (sentence-transformers, ChromaDB, rank-bm25, OpenAI SDK)
FastAPI, Docker, YAML for deployment
Gemini 3.0, Claude Opus 4.6 for LLM tasks
Multilingual NLP pipelines
Visual Studio Code (VS Code) and Cursor (AI coding assistant)
Achieved robust ticket categorization, including for previously uncategorized data
Scalable to millions of tickets with efficient retrieval and storage
Multilingual support for global applicability
Automated escalation for unresolved or novel issues
Customizable skills layer for adaptation to other support domains
Continuous self-improvement through user feedback and new ticket ingestion
This project demonstrates a production-ready, scalable AI support agent architecture, addressing common limitations in ticketing systems (e.g., poor categorisation, lack of scalability, and language barriers). The modular skills layer enables rapid adaptation to new domains and continuous improvement as new data arrives.
Integration with additional LLMs and retrieval strategies
Expansion to other domains (e.g., customer service, IT helpdesk)
This dissertation was conducted as part of the SAFI (Statistical Analysis for Industry) research initiative, a funded industry-focused project, under the supervision of senior researchers from the AI Research (AIRE) Group at the University of Bradford. The project contributes to ongoing research in data-driven energy analytics, smart grids, and large-scale industrial data science applications.
The work addresses real-world challenges in consumer energy behavior analysis and load forecasting, leveraging large-scale smart meter data to support intelligent energy management and decision-making systems.
Model and analyze large-scale smart meter data for consumer behavior understanding
Segment users based on energy consumption patterns
Improve forecasting accuracy using hybrid clustering + time-series approaches
Support scalable, data-driven solutions for smart energy systems
Data Sources: Smart meter data, weather variables, and UK bank holiday indicators
Clustering: K-Means with Euclidean distance; number of clusters selected from the within-cluster sum of squares (WCSS) using the elbow method
Forecasting: SARIMAX applied on clustered data
Feature Engineering: Normalization, scaling, and dimensionality reduction
Experimental Design: Five experimental configurations; hybrid clustering–forecasting approach yielded best performance
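The normalization and clustering steps above can be sketched as follows. This is a small NumPy illustration on synthetic daily load profiles, not the dissertation's code or data: the helper names, the three synthetic peak patterns, and the range of k values are assumptions. Per-cluster SARIMAX fitting is not shown.

```python
import numpy as np


def min_max_rows(X):
    """Scale each consumer's profile to [0, 1] so shape matters, not magnitude."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)    # guard against constant profiles
    return (X - lo) / span


def kmeans_wcss(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm with Euclidean distance; returns (labels, centroids, WCSS)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every profile to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    wcss = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, wcss


# Synthetic households with morning, midday, or evening peaks at very different scales.
rng = np.random.default_rng(42)
hours = np.arange(24)
profiles = np.vstack([
    np.exp(-((hours - peak) ** 2) / 8.0) * rng.uniform(0.5, 5.0, size=(20, 1))
    + rng.normal(0.0, 0.02, size=(20, 24))
    for peak in (8, 13, 19)
])
normalised = min_max_rows(profiles)
wcss_by_k = {k: kmeans_wcss(normalised, k)[2] for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 2))
```

Plotting `wcss_by_k` against k and looking for the point where the curve flattens is the elbow heuristic; without the row-wise scaling, high-consumption households would dominate the Euclidean distances.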
Consumer segmentation significantly improved forecasting performance
External factors (weather, occupancy, holidays) strongly influence energy usage
Normalization was essential for handling heterogeneous consumption scales
Cluster-based modeling enhanced both interpretability and predictive accuracy
This project demonstrates the integration of unsupervised learning and time-series modeling in a real-world, large-scale industrial context. Conducted within a funded research environment, it reflects experience working with:
high-dimensional, real-world datasets
industry-relevant problem settings
scalable analytical methodologies
The work aligns with ongoing research in:
smart grid analytics
sustainable energy systems
applied machine learning for societal impact
Supports energy-efficient decision-making and demand optimization
Enables consumer-level behavioral insights
Contributes to scalable forecasting solutions for smart grid systems
Deep learning models (LSTM, CNN) for sequential forecasting
Cross-domain data integration (e.g., socio-economic signals)
Advanced cluster validation and interpretability techniques
Note: Conducted under the AI Research (AIRE) Group, led by senior researchers with extensive funded research experience across EPSRC, BBSRC, and industry collaborations.
This project involves end-to-end data analysis, time-series forecasting, and real-time data streaming using environmental pollution datasets. It combines data science, machine learning, IoT communication, and stream processing to analyze and monitor air quality data.
Description:
Performed data analysis on two real-world air pollution datasets collected in Aarhus, Denmark.
Methods:
Data cleaning (missing/null value detection)
Descriptive statistics using pandas describe()
Correlation analysis (heatmaps)
Data visualization: pair plots, histograms, density plots, box plots, scatter matrix
Python (Google Colab)
Pandas, Matplotlib, Seaborn
Identified relationships between pollutants and gained insights into data distribution and patterns.
Description:
Applied ARIMA model to forecast particulate matter levels over time.
Methods:
Data split: 60% training, 40% testing
Time-series transformation (daily frequency)
Stationarity testing using ADF test
Model parameter selection using ACF & PACF
Model evaluation using residual diagnostics and the Box-Ljung test
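Three of the steps above can be illustrated without any modeling library: the chronological 60/40 split, the sample autocorrelation that underlies an ACF plot, and a forecast error metric. The original analysis was done in R; this Python sketch uses made-up daily values, a naive last-value baseline instead of ARIMA, and MAPE as an assumed error metric.

```python
def chronological_split(series, train_fraction=0.6):
    """Split a time-ordered sequence without shuffling."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag (the quantity plotted in an ACF)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily particulate matter readings.
daily_pm = [12.0, 15.0, 14.0, 18.0, 20.0, 17.0, 16.0, 19.0, 21.0, 22.0]
train, test = chronological_split(daily_pm)
baseline = [train[-1]] * len(test)      # naive forecast: repeat the last training value
baseline_mape = mape(test, baseline)
print(len(train), len(test), round(baseline_mape, 2))
print(round(autocorrelation(train, 1), 3))
```

A fitted ARIMA model is only worth keeping if it beats a naive baseline like this one on the held-out 40%.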
RStudio
ARIMA Modeling
Best model: ARIMA(1,0,1)
Successfully forecasted pollution levels with reliable accuracy
Description:
Developed a real-time data communication system using MQTT protocol.
Implementation:
Created two publishers for streaming pollution data
Developed a subscriber to receive and display real-time data
Configured Mosquitto MQTT broker
Python (PyCharm)
Mosquitto MQTT Broker
Paho-MQTT Library
Successfully simulated a real-time IoT-based environmental monitoring system.
Description:
Implemented Complex Event Processing (CEP) for real-time pollution monitoring.
Methods:
Processed streaming data using Apache Flink
Defined thresholds using mean and standard deviation
Generated alerts and warnings based on patterns
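The thresholding logic above can be sketched in plain Python. The project itself used Apache Flink with Java, so this is not Flink API code; the multipliers (one and two standard deviations), the run length of three, and the readings are all illustrative assumptions.

```python
from statistics import mean, stdev


def make_thresholds(history, warn_k=1.0, alert_k=2.0):
    """Derive warning and alert levels from the mean and standard deviation."""
    mu, sigma = mean(history), stdev(history)
    return mu + warn_k * sigma, mu + alert_k * sigma


def detect_events(stream, warn_level, alert_level, run_length=3):
    """Yield (index, kind) events from a sequence of readings.

    ALERT: a single reading above alert_level.
    WARNING: run_length consecutive readings above warn_level.
    """
    run = 0
    for i, value in enumerate(stream):
        if value > alert_level:
            yield i, "ALERT"
        run = run + 1 if value > warn_level else 0
        if run == run_length:
            yield i, "WARNING"


history = [10, 12, 11, 13, 9, 10, 12, 11]     # hypothetical past readings
warn_level, alert_level = make_thresholds(history)
events = list(detect_events([11, 13, 12.5, 13, 20, 10], warn_level, alert_level))
print(events)  # [(3, 'WARNING'), (4, 'ALERT')]
```

The consecutive-reading rule is a simple example of a pattern over events rather than a per-reading filter, which is the distinguishing idea behind CEP.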
Apache Flink
Java & XML
Apache NetBeans
Built a real-time alert system capable of detecting abnormal pollution levels using streaming analytics.
This project demonstrates a complete data pipeline, including:
Data analysis and visualization
Predictive modeling (time-series forecasting)
Real-time IoT data streaming
Stream processing and event detection
It highlights the integration of data science, machine learning, and distributed systems for smart environmental monitoring applications.
This project investigated the application of supervised machine learning techniques for automated medical diagnosis, focusing on the classification of diabetic retinopathy (DR) using structured features derived from retinal imaging data. The work emphasizes the role of machine learning in early disease detection and clinical decision support systems, aligning with broader research in biomedical signal and data analysis.
Develop a classification framework for detecting diabetic retinopathy from structured clinical features
Evaluate and compare multiple supervised learning algorithms on medical diagnostic data
Analyze the impact of feature optimization on classification performance and computational efficiency
Languages & Tools: Python, Google Colab, PyCharm
Supervised Machine Learning (Binary Classification)
Feature analysis and statistical validation
Hyperparameter tuning and cross-validation
Model evaluation using accuracy, precision, recall, and F1-score
Models Implemented: Support Vector Machine (SVM), Random Forest, AdaBoost, K-Nearest Neighbors (KNN), Gaussian Naïve Bayes, Gaussian Process Classifier (GPC), Decision Tree
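The evaluation metrics above all derive from the four confusion-matrix counts. A minimal sketch for the binary DR vs. Non-DR setting follows; the labels are invented for illustration and are not results from this project.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = DR, 0 = Non-DR.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
report = binary_metrics(y_true, y_pred)
print(report)
```

In a screening context recall on the DR class deserves particular attention, since a false negative means a missed case.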
Debrecen Diabetic Retinopathy dataset (UCI Repository)
1150 instances with features representing microaneurysms and exudates extracted from retinal images
Binary classification: DR vs. Non-DR
Achieved highest performance with Gaussian Process Classifier (~77%) and SVM (~76.5%)
Demonstrated that feature optimization improves model performance and stability
Identified key limitations in medical datasets, including labeling consistency and dataset size
Highlighted feasibility of lightweight ML models for clinical screening systems
This work demonstrates the applicability of classical machine learning methods in medical data classification tasks, particularly in scenarios with limited computational resources. The project also provides foundational insights relevant to biomedical signal processing and pattern recognition, which are critical in domains such as EEG analysis and brain–computer interfaces (BCI).
Category: Applied Machine Learning in Healthcare
Focus Area: Medical Data Analysis | Pattern Recognition | Computational Diagnostics
This project focuses on applying machine learning and feature selection techniques to predict student academic performance using educational data mining. The study explores how different demographic, behavioral, and academic attributes influence learning outcomes, while addressing challenges such as high-dimensional data, feature correlation, and model generalisation.
Predict student academic performance using machine learning classification models
Identify key contributing features influencing student outcomes
Evaluate the impact of feature selection on model accuracy and interpretability
Languages & Tools: Python, data visualisation libraries
Data preprocessing (cleaning, handling missing values, feature selection)
Exploratory Data Analysis (EDA) and statistical visualization
Supervised Machine Learning for classification
Feature correlation analysis and dimensionality reduction
Models Used: Logistic Regression, Random Forest
Evaluation Metrics: Accuracy, Precision, Confusion Matrix, Comparative Model Performance
Student Performance Dataset (demographics, parental background, test preparation, academic scores)
Mixed-type structured data (categorical and numerical features)
Real-world educational data with noise, imbalance, and feature dependencies
Identified strong correlations between parental education, test preparation, and academic performance
Demonstrated that feature selection significantly improves model performance and interpretability
Observed that Random Forest outperformed Logistic Regression in predictive accuracy on smaller datasets
Highlighted challenges of high-dimensional feature spaces and data imbalance in real-world datasets
This work builds foundational expertise in feature selection, classification, and pattern discovery in complex datasets, which are critical in domains such as neural signal processing. Similar challenges arise in EEG data analysis, where identifying relevant features from high-dimensional, noisy signals is essential for accurate prediction of cognitive states and brain–computer interface (BCI) applications. The experience gained in optimising feature spaces and evaluating model performance directly supports research in computational neuroscience and neuroinformatics.
Category: Machine Learning & Data Mining
Focus Area: Feature Selection | Classification | Predictive Modeling
This project presents a comprehensive study and critical analysis of machine learning approaches for text classification, focusing on the challenges of processing large-scale, high-dimensional textual data. The work explores the full text classification pipeline, including preprocessing, feature extraction, model selection, and evaluation, with applications spanning domains such as healthcare, legal systems, and social media analytics.
Analyze key machine learning algorithms for text classification tasks
Evaluate strengths and limitations of different classification approaches on textual data
Investigate the role of preprocessing, feature extraction, and dimensionality reduction in improving model performance
Languages & Tools: Python (conceptual/implementation level), NLP frameworks
Text preprocessing (stopword removal, noise reduction, tokenization)
Feature extraction and dimensionality reduction
Supervised Machine Learning for text classification
Comparative evaluation using precision, accuracy, F1-score, and confusion matrix
Models Analysed: Logistic Regression, Random Forest, K-Nearest Neighbors (KNN)
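The preprocessing and feature extraction steps above can be illustrated with a small TF-IDF sketch. TF-IDF is assumed here as the feature representation, since the study does not name one; the stopword list is deliberately tiny and the three headlines are invented stand-ins for news text.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; real pipelines use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [tokenize(doc) for doc in documents]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    "The striker scored in the final",
    "Shares fell on the stock market",
    "The final market report",
]
vectors = tfidf(docs)
print(vectors[0])
```

Terms that appear in every document receive a weight of zero under this formula, which is one reason preprocessing choices have such a visible effect on downstream classifiers.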
BBC News text dataset for multi-class text classification
High-dimensional textual data with unstructured features
Consideration of real-world text sources (social media, healthcare records, legal documents)
Demonstrated that model performance is highly dependent on feature representation and preprocessing quality
Identified trade-offs between model interpretability (Logistic Regression) and complexity/performance (Random Forest, KNN)
Highlighted computational challenges such as high dimensionality, memory requirements, and scalability
Provided a comparative framework for selecting appropriate models based on dataset characteristics
This work strengthens understanding of pattern recognition in high-dimensional data spaces, which is a fundamental challenge shared across domains such as natural language processing and neural signal analysis. The insights gained from feature extraction, dimensionality reduction, and classification are directly transferable to EEG signal processing, cognitive state decoding, and brain–computer interface (BCI) systems, where complex temporal and high-dimensional data must be efficiently modeled and interpreted.
Category: Machine Learning & Natural Language Processing
Focus Area: Pattern Recognition | High-Dimensional Data | Classification Systems
This project focuses on the visual analysis and trend exploration of solar flare observations using a real-world dataset containing flare event records spanning multiple decades. Solar flares are intense bursts of radiation from the Sun that vary in frequency and intensity, and understanding their temporal patterns contributes to space weather research and long-term solar activity analysis.
Perform data cleaning and preprocessing of raw solar flare records
Visualize yearly flare occurrences across different flare classes
Analyze correlations among major flare types to understand their temporal relationships
Investigate distinct patterns in flare frequency and class distribution
Languages & Tools: Python, data preprocessing libraries, visualisation libraries
Data preprocessing to standardize date and flare classification fields
Line graphs, bar charts, stacked area plots, and heat maps for trend visualization
Correlation analysis between flare types over time
Exploratory data analysis to reveal long-term patterns in solar activity
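A minimal sketch of the standardization step, assuming raw records arrive as date and class strings in inconsistent formats. The three date formats, the regular expression, and the sample rows are hypothetical; the actual dataset's formats are not described here.

```python
import re
from collections import Counter
from datetime import datetime

# Matches a flare class letter followed by its magnitude, e.g. "X1.0" or "m 2.5".
FLARE_CLASS = re.compile(r"\b([ABCMX])\s*(\d+(?:\.\d+)?)", re.IGNORECASE)
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")   # assumed input formats


def parse_record(raw_date, raw_class):
    """Return (year, class_letter), or None when either field cannot be parsed."""
    match = FLARE_CLASS.search(raw_class.strip())
    if match is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            year = datetime.strptime(raw_date.strip(), fmt).year
        except ValueError:
            continue
        return year, match.group(1).upper()
    return None


raw_rows = [
    ("1981-05-16", "X1.0"),
    ("16/05/1981", " m 2.5"),
    ("19820712", "C4"),
    ("1982-07-13", "c 1.1"),
    ("n/a", "B2.0"),              # unparseable date: dropped
    ("1982-08-01", "unknown"),    # unparseable class: dropped
]
parsed = [parse_record(d, c) for d, c in raw_rows]
counts = Counter(record for record in parsed if record is not None)
print(sorted(counts.items()))
```

The resulting (year, class) counts are the table that yearly line graphs, stacked area plots, and correlation heat maps are drawn from.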
Solar flare observational dataset (flare events from 1981–2017)
Includes flare occurrences classified into standard flare types (A, B, C, M, X)
Dataset required preprocessing due to inconsistent formats and missing values
Flare event records like these are maintained in databases that integrate observations across instruments and missions for scientific analysis of space weather data trends.
C-class flares were the most frequently occurring flare type across the dataset, followed by B- and M-class flares, while X- and A-class events were comparatively rare.
Visualizations demonstrated how flare frequencies vary year to year, revealing patterns consistent with known solar activity cycles.
Correlation analysis using heat maps and stacked charts showed statistical relationships between C-class flares and both M- and X-class events.
Data preprocessing techniques such as regex-based cleanup and date/time standardization were essential for reliable analysis.
Solar flare datasets like this one reflect long-term observational records used by the heliophysics community to study solar activity and its impact on space weather conditions. Scientific resources and catalogs maintained by organisations such as NASA and international data centers provide structured flare event archives for research and modeling.
This project demonstrates skills in handling real observational data, cleaning noisy datasets, and producing domain-relevant visual insights, which are valuable in research areas involving signals and time-series data — a key methodological overlap with challenges in EEG signal analysis and longitudinal biomedical signal studies.
Category: Data Visualization & Analytics
Focus Area: Time-Series Patterns | Signal Trends | Environmental Data
This project explored the role of big data technologies in modern smart grid systems, focusing on the challenges and opportunities associated with analyzing large-scale energy datasets from renewable sources such as solar power plants. The work emphasizes scalable data processing, real-time analytics, and the integration of data-driven approaches for improving energy efficiency and grid reliability.
Analyze challenges associated with large-scale data in smart grid and renewable energy systems
Investigate big data solutions for handling high-volume, high-velocity energy datasets
Explore data-driven approaches for improving energy forecasting, monitoring, and optimization
Languages & Tools: Python, Apache Flink, SQL, Big Data frameworks
Distributed data processing and stream analytics
Time-series analysis of energy consumption and generation data
Data pipeline design for large-scale energy systems
Literature-driven analytical study of smart grid architectures
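Stream analytics on energy data usually starts with windowed aggregation. The sketch below shows a tumbling (non-overlapping) window mean in plain Python as a conceptual illustration; it is not Flink code, and the event format, window size, and readings are assumptions.

```python
from collections import defaultdict


def tumbling_window_mean(events, window_s):
    """Average readings per fixed, non-overlapping time window.

    events: iterable of (timestamp_seconds, value) pairs.
    Returns {window_start_seconds: mean_value}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for timestamp, value in events:
        start = (timestamp // window_s) * window_s
        sums[start][0] += value
        sums[start][1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


# Hypothetical solar generation readings: (seconds since start, kW).
events = [(0, 2.0), (30, 4.0), (65, 6.0), (119, 2.0), (120, 5.0)]
per_minute = tumbling_window_mean(events, window_s=60)
print(per_minute)  # {0: 3.0, 60: 4.0, 120: 5.0}
```

A distributed engine applies the same idea per key and per partition, and additionally has to handle late or out-of-order events, which this sketch ignores.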
Large-scale smart grid and renewable energy datasets (solar and green energy systems)
Focus on high-volume, high-velocity, and heterogeneous data sources
Consideration of real-time data streams from IoT-enabled energy infrastructures
Identified key challenges in big data for energy systems, including data volume, velocity, variety, and scalability
Analyzed limitations of traditional data processing methods in handling smart grid data
Proposed scalable architectures and big data solutions for efficient energy analytics
Highlighted importance of real-time processing for grid stability and energy optimization
This work provides insights into the integration of big data analytics with cyber-physical energy systems, highlighting parallels with other large-scale data domains such as biomedical signal processing. The project strengthens foundational knowledge in handling complex, high-dimensional data streams, which is directly relevant to research areas like EEG signal processing, real-time neural data analysis, and brain–computer interface systems.
Category: Big Data Systems & Energy Analytics
Focus Area: Distributed Systems | Time-Series Data | Real-Time Analytics
This project presents a comprehensive review of data security challenges in IoT-enabled smart automotive systems. With the rise of connected and autonomous vehicles, the study explores critical privacy and security risks such as remote hijacking, malware attacks, and data breaches. It evaluates both traditional and emerging security solutions, with a particular focus on blockchain-based architectures.
Analyze major security challenges in automotive IoT systems
Review conventional and distributed security frameworks
Evaluate blockchain as a potential solution for secure data management
Identify future research directions in automotive cybersecurity
Literature review of existing IoT and automotive security systems
Comparative analysis of centralized vs. distributed security approaches
Conceptual evaluation of blockchain-based architectures
Qualitative assessment of system resilience against cyber threats
Internet of Things (IoT)
Cyber-Physical Systems (CPS)
Blockchain Technology
Vehicle-to-Vehicle (V2V) & Vehicle-to-Infrastructure (V2I) Communication
Distributed Security Systems
Traditional centralized systems lack scalability, privacy, and user control
IoT-enabled vehicles are highly vulnerable to malware, hacking, and cloud-based threats
Blockchain offers enhanced security through decentralization, transparency, and data integrity
Integration of AI with blockchain can further strengthen automotive security frameworks
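The data integrity property attributed to blockchain above comes from hash chaining: each block commits to the hash of its predecessor, so altering an earlier record invalidates the chain. The sketch below illustrates only that property, using invented vehicle telemetry; it has no consensus, networking, or decentralization and is not a design from the reviewed literature.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = {"index": index, "payload": payload, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_block(index, payload, prev_hash):
    return {
        "index": index,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    }


def verify_chain(chain):
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["index"], block["payload"], block["prev_hash"]):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Hypothetical telemetry records appended in order.
chain = []
prev = "0" * 64
for i, reading in enumerate([{"speed_kmh": 52}, {"speed_kmh": 55}, {"speed_kmh": 49}]):
    block = make_block(i, reading, prev)
    chain.append(block)
    prev = block["hash"]

tampered = [dict(b) for b in chain]
tampered[1]["payload"] = {"speed_kmh": 10}     # modify a stored record after the fact
print(verify_chain(chain), verify_chain(tampered))  # True False
```

In a real deployment the chain would be replicated across many nodes, so an attacker would need to rewrite every later block on a majority of them rather than on a single copy.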
The study highlights blockchain as a promising solution for automotive data security while acknowledging challenges such as implementation complexity and lack of standards. Future work includes integrating machine learning and deep learning techniques to develop intelligent, adaptive security systems for connected vehicles.
In addition to the core projects listed above, I have completed several academic and independent projects covering machine learning, deep learning, natural language processing, and real-time data systems. These projects further strengthened my skills in handling diverse datasets and applying scalable analytical techniques.
Speech Emotion Recognition Using Deep Learning — Published (Springer, LNCS)
Multimodal Emotion Recognition (Audio, Text, Behavioral Data)
Time-Series Forecasting using ARIMA (MAPE-based evaluation)
MQTT-based Publisher–Subscriber Systems (Mosquitto)
Real-Time Stream Processing using Apache Flink (CEP)
Sentiment Analysis using NLP (Disaster Tweets and Product Reviews)
House Price Prediction using Regression Models
Fraud Detection in Financial Transactions (Imbalanced Data Handling)