Machine Learning Identification of CpG DNA Methylation Biomarkers for Early Prediction of Gestational Diabetes
Status: Completed - Biomedical Data Science Project (2026)
Institution: Madras Diabetes Research Foundation (MDRF)
Role: Machine Learning Analyst / Biostatistician
Description:
This project applied supervised machine learning techniques to DNA methylation data measured during the first trimester of pregnancy to identify CpG biomarkers associated with gestational diabetes mellitus (GDM).
The analysis aimed to determine which methylation markers consistently contribute to early GDM prediction across multiple machine learning algorithms while ensuring reproducible and transparent analytical workflows.
Objective:
To evaluate the predictive value of 12 candidate CpG methylation sites and identify the most informative biomarkers for early detection of gestational diabetes using multiple machine learning models.
Methods:
Supervised machine learning modelling in R
Logistic Regression baseline classifier
LASSO regularized regression for embedded feature selection
K-Nearest Neighbours (KNN)
Support Vector Machine (RBF kernel)
Naïve Bayes classification
Random Forest ensemble learning
Extreme Gradient Boosting (XGBoost)
10-fold cross-validation for robust model evaluation
Model discrimination assessed using ROC curves and AUC
Model-specific and permutation-based feature importance analysis
Consensus biomarker ranking across algorithms
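The consensus ranking step can be illustrated with a minimal sketch. The project itself was carried out in R; the Python below is only an illustration of one common aggregation rule (mean rank across models), which is an assumption and not necessarily the rule used in the repository. The feature names (`cg_A`, `cg_B`, `cg_C`) and importance scores are hypothetical placeholders, not study results.

```python
from statistics import mean

# Hypothetical importance scores (higher = more important); illustration only.
importances = {
    "random_forest": {"cg_A": 0.40, "cg_B": 0.35, "cg_C": 0.25},
    "svm":           {"cg_A": 0.50, "cg_B": 0.20, "cg_C": 0.30},
    "xgboost":       {"cg_A": 0.30, "cg_B": 0.45, "cg_C": 0.25},
}


def ranks(scores):
    """Rank features within one model: 1 = most important."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def consensus_ranking(model_importances):
    """Average each feature's rank across models; lowest mean rank comes first."""
    per_model = [ranks(scores) for scores in model_importances.values()]
    features = list(per_model[0])
    mean_rank = {f: mean(r[f] for r in per_model) for f in features}
    return sorted(mean_rank.items(), key=lambda item: item[1])


consensus = consensus_ranking(importances)
for feature, avg_rank in consensus:
    print(f"{feature}: mean rank {avg_rank:.2f}")
```

Ranking within each model before averaging puts importance scores that live on different scales (for example, Gini importance versus permutation importance) on a common footing.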
Key Findings:
Several CpG sites showed consistent importance across multiple algorithms, suggesting robust biomarker potential.
Random Forest and SVM models achieved the highest discriminative performance (AUC ≈ 0.84-0.85).
Consensus ranking identified a subset of CpG markers that may contribute to early metabolic dysregulation during pregnancy.
Impact:
This project demonstrates how epigenetic biomarkers combined with machine learning approaches can support early identification of high-risk pregnancies.
Such approaches could contribute to precision medicine strategies, enabling earlier screening and targeted preventive interventions for gestational diabetes.
The project also highlights the importance of reproducible statistical workflows in biomedical data science.
Tools & Skills:
R • Machine Learning • Biomarker Discovery • Epigenetics • ROC Analysis • Feature Selection • Reproducible Research • Biomedical Data Science
Outputs:
GitHub Repository
https://github.com/Samadou-Tchakondo/GDM-CpG-Feature-Selection
Status: Completed - Applied Biostatistics Project (2026)
Institution: SRM Institute of Science and Technology, School of Public Health
Context: Two-Day Online Certificate FDP (Workshop-I, 14 & 21 February 2026)
Role: Principal Analyst / Author
Description:
This project developed a reproducible statistical workflow to evaluate training effectiveness using pre-post binary assessment data. The analysis demonstrates how simple correct/incorrect questionnaire responses can be transformed into rigorous, publication-ready quantitative evidence through appropriate non-parametric methodology.
Objective:
To design a transparent analytical framework for assessing within-subject learning change while addressing the methodological challenges of bounded, non-normally distributed scores derived from binary items.
Methods:
Construction of total and domain-specific scores from binary questionnaire items
Reliability assessment using Cronbach’s alpha
Distributional analysis through graphical diagnostics
Normality testing using the Shapiro-Wilk test
Paired comparison using the Wilcoxon signed-rank test
Effect size estimation (r = |Z| / √N) to quantify magnitude of change
Reproducible reporting using R Markdown and version-controlled workflow (Git/GitHub)
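As a worked illustration of the effect size r = |Z| / √N, the sketch below derives Z from the normal approximation to the Wilcoxon signed-rank statistic. The project's analysis was done in R; this Python version is only an illustration. It assumes that N is the number of non-zero paired differences (conventions differ) and applies no tie or continuity correction. The scores are hypothetical, not the workshop data.

```python
import math


def wilcoxon_effect_size(pre, post):
    """Return (z, r) for paired scores using the normal approximation.

    Assumptions: zero differences are dropped, N = number of non-zero pairs,
    tied absolute differences get average ranks, no tie/continuity correction.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the block of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return z, abs(z) / math.sqrt(n)


# Hypothetical pre/post totals for eight participants.
pre_scores = [3, 4, 2, 5, 3, 4, 2, 3]
post_scores = [5, 6, 5, 6, 6, 7, 4, 6]
z, r = wilcoxon_effect_size(pre_scores, post_scores)
print(f"z = {z:.3f}, r = {r:.3f}")
```

By the usual rule of thumb, r near 0.1, 0.3, and 0.5 marks small, medium, and large effects respectively.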
Key Findings:
Statistically significant improvement was observed between pre- and post-training scores (p < 0.001).
Median increase of approximately 2.5 points on the total assessment scale.
Large overall effect size (r ≈ 0.80) indicating substantial practical impact.
The strongest gains were observed in statistical reasoning, followed by reporting & publishing, with meaningful improvements also in health knowledge.
Impact:
Demonstrates how educational and training evaluation data can be analyzed using robust statistical reasoning rather than default parametric approaches.
The project provides a scalable template for evaluating capacity-building interventions in public health using reproducible, open-science methods.
Tools & Skills:
R • R Markdown • Non-Parametric Statistics • Biostatistics • Educational Data Analysis • Reproducible Research • Git/GitHub • Public Health Analytics
Outputs:
RPubs Report:
https://rpubs.com/Samadou_Tchakondo/prepost-binary-data-analysis
GitHub Repository:
https://github.com/Samadou-Tchakondo/prepost-binary-data-analysis
Forecasting Monthly Confirmed Malaria Cases in Togo Using ARIMA, SARIMA, and ARIMAX Models
Status: Completed - Academic Research Project (2026)
Institution: SRM Institute of Science and Technology
Role: Principal Analyst / Author
Description:
This project applied advanced time series modelling techniques to routine malaria surveillance data collected between 2022 and 2024 to understand temporal dynamics and generate short-term forecasts for public health planning.
Objective:
To model seasonal malaria trends and produce forecasts for 2025–2026 that can inform preparedness, intervention timing, and resource allocation.
Methods:
Box-Jenkins time series methodology
ARIMA, SARIMA, and ARIMAX modelling in R
Stationarity assessment using Augmented Dickey-Fuller test
ACF/PACF diagnostics for model identification
Seasonal decomposition and residual analysis
Model comparison using AIC/BIC and forecast accuracy metrics
Out-of-sample validation for predictive performance
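The out-of-sample validation step can be sketched as follows. The project's models were fitted in R; this Python example only illustrates the holdout idea, with a seasonal naive baseline standing in for a fitted SARIMA model (a deliberate simplification). The monthly counts are synthetic, and the 12-month holdout is an assumption.

```python
import math


def seasonal_naive_forecast(train, horizon, period=12):
    """Forecast each future month with the value from the same month one season earlier."""
    start = len(train) - period
    return [train[start + (h % period)] for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Synthetic series: a fixed 12-month pattern plus a yearly step of +10 cases.
pattern = [120, 110, 130, 160, 210, 280, 340, 360, 300, 230, 170, 140]
series = [value + 10 * year for year in range(3) for value in pattern]

train, test = series[:24], series[24:]  # hold out the final 12 months
forecast = seasonal_naive_forecast(train, horizon=len(test))

print(f"MAE  = {mae(test, forecast):.1f}")
print(f"RMSE = {rmse(test, forecast):.1f}")
```

A candidate model is worth keeping only if its holdout error beats a simple baseline of this kind.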
Key Findings:
Identified strong and consistent annual malaria seasonality.
SARIMA(0,0,1)(1,1,0)₁₂ provided the best statistical and predictive fit.
Climatic covariates did not significantly improve model performance.
Forecasts indicate continued seasonal peaks, supporting anticipatory malaria control strategies.
Impact:
Demonstrates how routinely collected surveillance data can be transformed into actionable epidemiological intelligence using reproducible statistical workflows.
The approach can support early-warning systems, optimize intervention timing, and strengthen data-driven malaria control planning in resource-limited settings.
Tools & Skills:
R • Forecasting • Epidemiological Modelling • Time Series Analysis • Reproducible Research • Public Health Analytics
Outputs:
GitHub Repository
https://github.com/Samadou-Tchakondo/malaria-time-series-analysis
DOI (Zenodo Archive)
https://doi.org/10.5281/zenodo.18637204
Determinants and Spatial Patterns of Insecticide-Treated Net (ITN) Utilization Among Women of Reproductive Age in Togo: A Bayesian and Spatial Analysis of the 2017 TMIS Data
Status: Completed – Submitted for M.Sc. Dissertation
2025-10 | SRM Institute of Science and Technology
Role: Principal Investigator / Author
Description:
Objective: Investigate multi-level determinants and spatial patterns of ITN utilization among women of reproductive age in Togo.
Method: Analyzed 2017 Togo Malaria Indicator Survey data using Bayesian multilevel logistic regression and spatial autocorrelation techniques (Moran’s I, LISA) in R. Examined individual, household, and community-level predictors.
Impact: Revealed significant geographic disparities and key determinants such as age, marital status, education, exposure to ITN messages, and household headship. Identified “coldspots” of low ITN use in Lomé Commune and “hotspots” in Plateaux. Findings support targeted, equity-based malaria prevention strategies for women and urban populations.
Status: Prototype completed; platform under development
2025-08-26
Role: Co-Developer / Researcher
Description: An interactive platform that brings together the essential steps of biostatistical and epidemiological analysis, featuring ready-to-use code (Python, R, SPSS, SAS), step-by-step guides, visualizations, and preloaded datasets for hands-on practice. It is designed to simplify research and save time for students, researchers, and public health professionals.
Status: Completed - MSc Dissertation (Defended)
Period: 2020-2022
Institution: Higher School of Biological and Food Techniques (ESTBA), University of Lomé
Degree: MSc in Developmental Biology - Specialization in Microbial and Cellular Biotechnology
Role: Principal Investigator / Lead Author
Methods:
An ethnopharmacological survey was conducted among 37 traditional medicine practitioners in the Maritime Region of Togo to document therapeutic uses of C. procera. Hydroethanolic extraction of leaves was performed, followed by qualitative phytochemical screening to identify major secondary metabolites.
In vitro antioxidant activity was assessed using phosphomolybdate reduction and FRAP assays. Acute oral toxicity was evaluated in male and female Sprague-Dawley rats following OECD guideline 423.
Antimicrobial activity was tested using the agar well diffusion method against reference and clinical strains of urogenital pathogens (Escherichia coli, Staphylococcus aureus, Pseudomonas aeruginosa, Neisseria gonorrhoeae, and Candida albicans).
Impact / Key Findings:
The study confirmed the widespread traditional use of C. procera for infectious and inflammatory conditions and demonstrated the presence of bioactive compounds, including alkaloids, terpenoids, and coumarins, alongside measurable antioxidant activity. Acute toxicity testing indicated a high safety margin (LD₅₀ > 5000 mg/kg).
Despite these properties, the extract showed no significant antimicrobial activity against the tested urogenital pathogens, even at high concentrations. This highlights the gap between traditional use and laboratory efficacy and underscores the need for rigorous validation of plant-based therapies in antimicrobial resistance research.
📊 Publications
Socio-demographic and contextual drivers of insecticide-treated net (ITN) utilization among women of reproductive age in Togo: a Bayesian multilevel analysis of the 2017 Togo Malaria Indicator Survey
Status: Under peer review at BMC Public Health
2025-10
Role: Lead Author / Researcher
Description: Objective: Identify the individual, household, and community determinants of ITN utilization among women of reproductive age in Togo. Method: Analyzed 4,225 women from the nationally representative 2017 Togo Malaria Indicator Survey (TMIS) using weighted descriptive statistics and Rao–Scott chi-square tests. Determinants were identified using a Bayesian multilevel logistic regression model (brms R package), with performance assessed by LOOIC and Bayesian R². Impact: Found that overall ITN use was relatively high (72.8%) yet uneven. Key negative predictors included female-headed and medium-sized households, leading to the recommendation to strengthen health communication and target these vulnerable groups to accelerate malaria control.
Geographic variation in malaria prevalence among children under five in coastal and inland counties of Liberia: analysis of the 2022 Malaria Indicator Survey
Status: Published in BMC Malaria Journal
2026-01-15 | Journal Article
Role: Co-author / Research Collaborator
Description: Objective: Assess socioeconomic, demographic, and behavioral determinants of malaria prevalence among children under five, focusing on spatial disparities between coastal and inland regions of Liberia. Method: Conducted secondary data analysis using the nationally representative 2022 Liberia Malaria Indicator Survey (LMIS) dataset, including 2,189 children aged 6–59 months. Descriptive statistics and a two-proportion Z-test assessed regional differences in malaria prevalence, while logistic regression identified key determinants such as ITN use, anemia status, household wealth, age, and residence. Impact: The study uncovered significant regional disparities in malaria burden, with inland regions showing higher prevalence. It highlighted anemia, poverty, non-use of ITNs, and rural residence as major risk factors, emphasizing the need for geographically targeted malaria control strategies and resource allocation to protect vulnerable under-five populations in Liberia.
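For readers unfamiliar with the two-proportion Z-test mentioned above, here is a minimal Python sketch of the pooled-variance version. It ignores survey weights and the complex sampling design, which is a simplification. The counts are hypothetical and are not the LMIS results.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: positive tests / children tested in two regions.
z_stat, p_value = two_proportion_z_test(x1=120, n1=400, x2=90, n2=400)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```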
In vitro phytochemical and biological studies of the hydroethanolic extract of Anchomanes difformis used in phytotherapy in Togo
Status: Published in Scientific Reports
2025-08-26 | Journal Article
Role: Co-author / Research Collaborator
Description: Objective: Evaluate the antimicrobial activity of A. difformis organs in the context of antibiotic resistance. Method: Employed hydroethanolic extraction, phytochemical analysis, flavonoid quantification, and solid-medium diffusion antimicrobial tests. Impact: The study did not support the plant's local use against the tested microbial strains but confirmed high in vitro antioxidant activity, suggesting other therapeutic avenues.
Phytochemical and biological studies of the hydroethanolic extract of Pteleopsis suberosa and Piliostigma thonningii used in herbal medicine in Togo
Status: Under review at a peer-reviewed journal
2025-07-22 | Preprint
Role: Co-author / Research Collaborator
Description: Objective: Evaluate the antimicrobial efficacy of two Togolese medicinal plants against human pathogens. Method: Used hydroethanolic extracts, phytochemical screening, antioxidant, and antimicrobial assays. Impact: The study supported the traditional use of the plants, notably identifying strong activity of P. thonningii trunk bark against C. albicans.
Status: Under review at a peer-reviewed journal
2025-06-09 | Preprint
Role: Lead Co-author / Researcher
Description: Objective: Determine the phytochemical composition, acute toxicity, and antimicrobial properties of C. procera leaves against urogenital infection strains. Method: Conducted phytochemical screening, acute toxicity tests (OECD 423), and agar-well diffusion assays. Impact: Established a safe toxicological profile (up to 5,000 mg/kg b.w.) and recommended further research on optimizing extraction techniques, given the limited antimicrobial activity against the tested strains.
Synthesis of Current Data on Typhoid Fever
Status: Published in Open Access Journal of Science
2024-02-27 | Journal article
Role: Co-author / Research Collaborator
Description: Objective: To conduct a comprehensive literature synthesis on the causative agent, pathophysiology, and modern laboratory management of typhoid fever. Method: Performed an extensive literature review across scientific databases using targeted keywords. Impact: The work serves as an up-to-date reference for medical biology professionals on isolation and identification techniques for Salmonella in endemic settings.