Helping hospitals detect stroke risk early through data mining
MOHAMMAD JAVAN SAMBOEPUTRA HERLAMBANG
YANG ZIXUN
MUHAMMAD HANIF MURTAZA
CAI ZESHUO
Stroke is one of the leading causes of death and disability worldwide. Early detection and prevention are essential to reduce health risks and save lives.
This project applies machine learning techniques to predict stroke risk based on health attributes such as age, BMI, glucose levels, and medical history. Using real-world health data, we aim to assist hospitals and clinics in identifying high-risk individuals for early intervention.
The Ministry of Health Malaysia (MOH) is responsible for delivering comprehensive healthcare services to the nation’s population. As part of its mandate, the MOH is actively pursuing digital health transformation, which includes data-driven approaches to disease prevention and patient care.
The MOH aims to improve the nation’s health through better care and disease prevention. This project supports that aim by using data mining to predict stroke risk early, so that doctors can take preventive action before a stroke happens.
By using this system, MOH and public hospitals can:
Identify high-risk patients earlier for faster intervention.
Allocate hospital resources, such as staff and screening tools, more efficiently.
Save costs by avoiding expensive emergency care.
Create smarter health policies using real data.
This project also reflects the growing role of AI and digital tools in healthcare—making Malaysia’s healthcare system more modern and proactive.
Stroke is one of the top causes of death and disability in Malaysia, including among younger people. Many cases are detected only at a late stage, when treatment is less effective and more costly.
Although stroke risk factors such as high blood pressure, diabetes, and smoking are widely known, hospitals often lack automated tools to spot high-risk patients early. Manual screening does not scale to large populations.
This project uses machine learning to turn patient data into early warnings. By predicting stroke risk before symptoms appear, healthcare providers can shift from late treatment to early prevention—saving lives and reducing costs.
This project aims to build an effective stroke prediction system using patient health data. To support that aim, we set the following data- and model-specific objectives:
Understand and clean the dataset, and select meaningful features.
Explore data patterns through exploratory data analysis (EDA).
Preprocess the data and address class imbalance.
Train and compare multiple machine learning models.
Select the best-performing model based on metrics such as accuracy, recall, F1 score, and AUC.
Improve recall in the selected models to capture more stroke cases without sacrificing interpretability or usability.
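The selection metrics named above can all be computed from a model's predicted scores. The sketch below is a minimal plain-Python illustration, not the project's actual pipeline: the labels and scores are made-up toy values, and the function names and the two thresholds (0.5 and 0.4) are assumptions chosen for the example. It shows why accuracy alone can look good on an imbalanced dataset while recall stays low, and how lowering the decision threshold trades precision for recall.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = stroke)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict_with_threshold(scores, threshold=0.5):
    """Label a patient as a predicted stroke case when score >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy data (illustrative only): 2 stroke cases among 10 patients.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.20, 0.30, 0.35, 0.45, 0.40, 0.70]

default_metrics = classification_metrics(
    y_true, predict_with_threshold(scores, threshold=0.5))
lowered_metrics = classification_metrics(
    y_true, predict_with_threshold(scores, threshold=0.4))
auc = pairwise_auc(y_true, scores)

print("threshold 0.5:", default_metrics)
print("threshold 0.4:", lowered_metrics)
print("AUC:", auc)
```

On this toy data, accuracy is 0.9 at both thresholds, but recall rises from 0.5 to 1.0 when the threshold drops to 0.4, while precision falls from 1.0 to about 0.67. This is the kind of trade-off the recall objective refers to; the right threshold for real patients would have to be chosen on validation data.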
Ultimately, we aim to:
Predict whether a patient is likely to experience a stroke, with a focus on catching actual cases.
Identify individuals at highest risk to support early medical intervention.
Help healthcare providers classify patients into low-, medium-, and high-risk groups for better-targeted care.
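One simple way to turn a model's predicted stroke probability into the low, medium, and high groups mentioned above is to apply two cutoffs. The sketch below assumes this approach; the function name `risk_tier`, the cutoff values (0.10 and 0.30), and the patient IDs are hypothetical placeholders, not values from this project, and real cutoffs would need to be set with clinical input.

```python
def risk_tier(probability, medium_cutoff=0.10, high_cutoff=0.30):
    """Map a predicted stroke probability to 'low', 'medium', or 'high'.

    The default cutoffs are illustrative placeholders only.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if medium_cutoff >= high_cutoff:
        raise ValueError("medium_cutoff must be below high_cutoff")
    if probability >= high_cutoff:
        return "high"
    if probability >= medium_cutoff:
        return "medium"
    return "low"


# Hypothetical predicted probabilities for three patients.
predicted = {"P001": 0.04, "P002": 0.18, "P003": 0.62}
tiers = {patient_id: risk_tier(p) for patient_id, p in predicted.items()}
print(tiers)
```

Keeping the cutoffs as parameters lets a hospital tighten or loosen the tiers, for example lowering `high_cutoff` when missing a stroke case is considered far more costly than an unnecessary follow-up.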