REDCap–DHIS2–Power BI Interoperability Project
Indiana University Luddy School of Informatics, Computing, and Engineering | Jan 2024 – May 2025
Clinical Data Integration Specialist (REDCap–DHIS2–Power BI Interoperability Project)
Developed an end-to-end data pipeline enabling interoperability between REDCap, DHIS2, and Power BI to support community health analytics.
Cleaned and transformed survey datasets to meet REDCap formatting standards, including recoding categorical values, resolving missing entries, and preparing for structured import.
Designed and uploaded a custom REDCap Data Dictionary with validated field types (radio, dropdown, yes/no, date, text) and successfully imported survey records using the Data Import Tool.
Enabled secure API token access (PID: 40481), built live Power BI connections using M code and JSON payloads, and created interactive dashboards for real-time public health insights.
Developed Python scripts to convert REDCap records into DHIS2-compliant JSON payloads for trackedEntityInstances, enrollments, and events, supporting future integration.
Conducted a literature review of REDCap–DHIS2 interoperability approaches (FHIR, ETL pipelines, OpenHIM middleware, and API-based synchronization) to inform system design and implementation.
Developed an end-to-end integration pipeline between REDCap, DHIS2, and Power BI to automate public health data monitoring and research reporting workflows.
Connected REDCap API to Power BI using M language, JSON, and ETL workflows, enabling real-time, interactive dashboards for faculty and health researchers.
Transformed REDCap survey datasets using Python (pandas) and Excel; standardized data via a custom Data Dictionary and executed structured imports using the Data Import Tool.
Designed FHIR-aligned, DHIS2-compliant JSON payloads for trackedEntityInstances, events, and enrollments, ensuring secure and interoperable health data exchange.
Authored data governance documentation, metadata templates, and standard operating protocols to improve data consistency and cross-team alignment.
Published REDCap-connected datasets to Power BI Service, enabling cloud-based dashboarding, Quick Insights, and automated visualization generation.
Leveraged Q&A Visuals and Smart Narrative in Power BI to produce natural language summaries and on-demand visual explanations for institutional stakeholders.
Built custom M code queries via Advanced Editor using Web.Contents, automating live API connections and recurring dataset refresh cycles.
Designed a streamlined data publishing workflow from Power BI Desktop to Power BI Service, enabling scheduled refreshes, interactive filtering, and cross-platform data exploration.
Created natural language–driven analytics using Power BI Copilot and Auto Insights (where enabled), improving dashboard usability and storytelling with REDCap data.
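The REDCap-to-DHIS2 conversion described above can be illustrated with a minimal Python sketch. It is not the project's actual script. Every UID, the field mapping, and the sample record are hypothetical placeholders. The nested layout (tracked entity instance, then enrollments, then events) is an assumption about the DHIS2 tracker import format and should be checked against the target DHIS2 version.

```python
import json

# Hypothetical DHIS2 UIDs; real values come from the target instance's metadata.
TRACKED_ENTITY_TYPE = "TeType00001"
PROGRAM = "Program0001"
PROGRAM_STAGE = "PrStage0001"
ORG_UNIT = "OrgUnit0001"

# Hypothetical mapping from REDCap field names to DHIS2 UIDs.
ATTRIBUTE_MAP = {"first_name": "AttrFirst01", "dob": "AttrBirth01"}
DATA_ELEMENT_MAP = {"smoker": "DeSmoker001", "visit_notes": "DeNotes0001"}


def map_fields(record, mapping, key_name):
    """Build a list of {key_name: uid, "value": ...}, skipping empty answers."""
    return [
        {key_name: uid, "value": record[field]}
        for field, uid in mapping.items()
        if record.get(field) not in (None, "")
    ]


def redcap_to_dhis2(record):
    """Convert one flat REDCap record into a tracked entity instance dict."""
    event = {
        "program": PROGRAM,
        "programStage": PROGRAM_STAGE,
        "orgUnit": ORG_UNIT,
        "eventDate": record["visit_date"],
        "status": "COMPLETED",
        "dataValues": map_fields(record, DATA_ELEMENT_MAP, "dataElement"),
    }
    enrollment = {
        "program": PROGRAM,
        "orgUnit": ORG_UNIT,
        "enrollmentDate": record["visit_date"],
        "incidentDate": record["visit_date"],
        "events": [event],
    }
    return {
        "trackedEntityType": TRACKED_ENTITY_TYPE,
        "orgUnit": ORG_UNIT,
        "attributes": map_fields(record, ATTRIBUTE_MAP, "attribute"),
        "enrollments": [enrollment],
    }


# Made-up record in the flat shape a REDCap JSON export typically has.
sample = {
    "record_id": "1",
    "first_name": "Ada",
    "dob": "1990-01-01",
    "smoker": "1",
    "visit_notes": "",
    "visit_date": "2024-03-01",
}
payload = {"trackedEntityInstances": [redcap_to_dhis2(sample)]}
print(json.dumps(payload, indent=2))
```

Keeping the field mapping in plain dictionaries means a new REDCap field can be added without touching the conversion logic. Empty answers are dropped rather than sent as blank values.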
GitHub Link: https://github.com/Mdsameer9656/Academic-Projects/tree/main/PROJECT%203%20-%20ESOPHAGEAL%20CANCER
Esophageal Cancer Data Management and Analytics System | August 2024 – May 2024 | Healthcare Analytics Developer: Designed a normalized MySQL database for esophageal cancer data with analytics and ML integration, enabling visualizations, predictive modeling, and insights into survival trends and treatment outcomes to support a future CDSS.
Advanced Healthcare Database System for Esophageal Cancer Treatment
Jan 2024 – Apr 2024 | MySQL, Python, Plotly, Random Forest, Logistic Regression, PHPMyAdmin, MySQL Workbench
Designed and implemented a normalized relational database to manage esophageal cancer patient data using MySQL.
Structured data across multiple entities (patients, visits, diagnosis, treatment, outcome, country) with 3NF and BCNF adherence for optimized querying.
Developed SQL queries to analyze treatment effectiveness, survival trends, and high-risk regions based on Karnofsky scores.
Created geographic and demographic visualizations to illustrate disparities in cancer incidence and survival outcomes.
Applied logistic regression and random forest models to predict patient survival outcomes; identified key predictors such as tumor stage, age, and treatment type.
Outlined plans for a future clinical decision support system (CDSS) and machine learning extensions to enhance clinical decision-making.
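As a rough illustration of the normalized design and the treatment-effectiveness queries described above, the sketch below uses Python's built-in sqlite3 module in place of MySQL so that it runs without a database server. The table names, columns, and rows are hypothetical and far simpler than the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: one fact per table, linked by patient_id.
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    age        INTEGER,
    country    TEXT
);
CREATE TABLE treatment (
    treatment_id    INTEGER PRIMARY KEY,
    patient_id      INTEGER NOT NULL REFERENCES patient(patient_id),
    treatment_type  TEXT NOT NULL,
    karnofsky_score INTEGER
);
CREATE TABLE outcome (
    outcome_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    survived   INTEGER NOT NULL  -- 1 = alive at follow-up, 0 = deceased
);
""")

# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, 61, "IN"), (2, 55, "US"), (3, 70, "IN"), (4, 66, "US")],
)
conn.executemany(
    "INSERT INTO treatment VALUES (?, ?, ?, ?)",
    [(1, 1, "surgery", 80), (2, 2, "surgery", 90),
     (3, 3, "chemo", 60), (4, 4, "chemo", 70)],
)
conn.executemany(
    "INSERT INTO outcome VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 1), (3, 3, 0), (4, 4, 1)],
)

# Survival rate and mean Karnofsky score per treatment type.
query = """
SELECT t.treatment_type,
       COUNT(*)               AS n_patients,
       AVG(o.survived)        AS survival_rate,
       AVG(t.karnofsky_score) AS mean_karnofsky
FROM treatment AS t
JOIN outcome   AS o ON o.patient_id = t.patient_id
GROUP BY t.treatment_type
ORDER BY t.treatment_type
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same query shape should carry over to MySQL largely unchanged, although the `CREATE TABLE` statements would need MySQL column types.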
GitHub Link: https://github.com/Mdsameer9656/Academic-Projects/tree/main/PROJECT%202%20-%20FOOD%20SECURITY
Impact of Climate Change on Global Food Security
Team role: Lead Data Analyst and Methodology Specialist
Overview: Explored how climate change affects the food security index in various countries, aiming to understand the multifaceted relationship between climatic changes and food systems.
Data Source: Global Food Security Index covering 113 countries
Methodology:
Correlation Analysis: Identified climate change indicators that significantly correlate with the food security index.
Simple and Multiple Linear Regression Analysis: Examined the strength and significance of relationships between climate change indicators and food security components.
ANOVA, Shapiro-Wilk, Kruskal-Wallis Tests: Checked normality with the Shapiro-Wilk test and compared groups using ANOVA and the non-parametric Kruskal-Wallis test.
Data Visualization: Used scatter plots, line graphs, heat maps, and histograms to interpret relationships between variables.
Outcomes: Provided insights into how climate change impacts food security, crucial for developing strategies to mitigate adverse effects on global food systems.
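The correlation and simple linear regression steps listed in the methodology can be sketched in plain Python. The two data series are invented for illustration and are not Global Food Security Index values. The variable names (`temp_anomaly`, `food_index`) are hypothetical.

```python
from math import sqrt


def centered_sums(xs, ys):
    """Return means and the sums of squares/cross-products around them."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    _, _, sxy, sxx, syy = centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y, sxy, sxx, _ = centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example: a climate indicator against a food security score.
temp_anomaly = [0.5, 1.0, 1.5, 2.0]
food_index = [80.0, 70.0, 60.0, 50.0]

r = pearson_r(temp_anomaly, food_index)
slope, intercept = fit_line(temp_anomaly, food_index)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

In practice a statistics package would also report p-values and confidence intervals, which this sketch leaves out.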
GitHub Link: https://github.com/Mdsameer9656/Academic-Projects/tree/main/PROJECT%201%20-%20LUNG%20CANCER
Project Experience 2
Lung Cancer Factors and Predictive Model
Team Role: Data Analyst
Overview: Worked on analyzing factors contributing to lung cancer risk and developing a predictive model to assess individual patient risk levels. This project aimed to identify key risk factors and provide actionable insights for early detection and intervention.
Data Source: Patient health records and publicly available datasets related to lung cancer statistics.
Methodology:
Exploratory Data Analysis (EDA): Uncovered patterns and trends in the data, identifying significant risk factors such as smoking history, age, and exposure to pollutants.
Feature Engineering: Created new features from raw data, such as the interaction between smoking history and occupational exposure, to enhance model accuracy.
Predictive Modeling: Developed and validated multiple machine learning models, including logistic regression, random forests, and support vector machines, to predict lung cancer risk.
Model Evaluation: Assessed model performance using metrics like accuracy, precision, recall, and AUC-ROC, ensuring the model’s robustness and reliability.
Data Visualization: Utilized bar charts, ROC curves, and confusion matrices to present findings and model performance to stakeholders.
Outcomes: Successfully identified critical risk factors and developed a predictive model that can assist healthcare professionals in early lung cancer detection and personalized treatment planning.
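The evaluation metrics named in the methodology (accuracy, precision, recall) all derive from the four confusion-matrix counts. The sketch below computes them from scratch on invented labels. It is an illustration of the metric definitions, not the project's evaluation code. AUC-ROC is left out because it requires predicted probabilities rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = high risk)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels: 1 = high lung cancer risk, 0 = low risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(y_true, y_pred)
print(counts, scores)
```

For a screening use case, recall is usually the metric to watch most closely, since a false negative means a missed high-risk patient.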
GitHub Link: https://github.com/Mdsameer9656/Academic-Projects/tree/main/PROJECT%204%20-%20INSULIN%20DEVICES
Project Experience
Research Study on Effectiveness of Current Remote Insulin Devices
Team Role: Project Manager
Overview: Led a comprehensive research study to evaluate the effectiveness and reliability of current remote insulin monitoring devices. The study aimed to provide evidence-based recommendations for improving diabetes management through advanced technology.
Data Source: Collected data from clinical trials, patient feedback, and device usage reports.
Methodology:
Project Planning: Designed the study framework, including participant selection criteria, data collection methods, and timeline management.
Data Collection and Analysis: Conducted surveys and collected data on device performance, patient satisfaction, and glycemic control outcomes. Analyzed the data using statistical techniques such as paired t-tests and regression analysis to determine device efficacy.
Protocol Development: Established rigorous testing protocols to ensure the accuracy and consistency of data across different devices and patient demographics.
Stakeholder Collaboration: Worked closely with healthcare providers, patients, and device manufacturers to gather comprehensive insights and ensure the study’s relevance to real-world applications.
Data Visualization and Reporting: Created detailed reports and visualizations, including trend analysis and comparative studies, to effectively communicate findings to stakeholders.
Outcomes: The study provided critical insights into the performance of remote insulin devices, identifying areas for improvement and contributing to the development of more effective diabetes management tools. Recommendations from this study are being considered for future enhancements in remote monitoring technology.
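The paired t-test mentioned in the methodology compares the same patients before and after an intervention. A minimal sketch of the test statistic follows, using only the standard library. The HbA1c readings are invented for illustration and are not study data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on before - after."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample stdev / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up HbA1c values (%) for four patients, before and after device use.
hba1c_before = [8.0, 7.5, 9.0, 8.5]
hba1c_after = [7.0, 7.0, 8.0, 7.5]

t_stat, dof = paired_t_statistic(hba1c_before, hba1c_after)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

Turning the statistic into a p-value requires the t distribution, which a statistics library such as SciPy provides.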
ePortfolio Effectiveness Study
Led research and authored a paper: Assessing the Effectiveness of ePortfolio for Streamlined Assignment Submission and Management.
Collected student feedback and analyzed usability, accessibility, and preferences using R.
Collaborated with Dr. Zeyana Hamid to publish the findings.