Data Analytics and Topological Methods for Biological and Machine Learning

Dr. Mark Lexter D. De Lara, Dr. Ranzivelle Marianne Roxas-Villanueva

University of the Philippines Los Baños, Laguna, Philippines

ABSTRACT

Day 1. Machine Learning Foundations and Applications in Mathematical Biology
Machine learning (ML) has become a central computational paradigm for analyzing complex, high-dimensional, and nonlinear systems—features inherent to modern biological data. From disease classification and ecological forecasting to behavioral modeling and genomic signal detection, ML offers flexible mathematical frameworks for uncovering patterns directly from data. This lecture introduces foundational ML concepts, core algorithmic families, and biologically relevant applications, preparing participants for later hands-on sessions.

The lecture begins with a conceptual overview of ML, highlighting how data-driven inference differs from traditional rule-based systems. Key supervised learning terms are formally defined, followed by examples such as regression for predicting continuous biological variables and classification for identifying discrete phenotypic states. The discussion covers the mathematical structure and assumptions underlying linear models, logistic regression, decision trees, support vector machines, k-nearest neighbors, and ensemble methods, which improve predictive performance by aggregating multiple models.
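As a rough illustration of one of the classifiers named above, the following is a minimal k-nearest neighbors sketch using only the Python standard library. The function name `knn_predict`, the toy measurements, and the "healthy"/"diseased" labels are hypothetical and are not taken from the workshop materials.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point with its label and sort by Euclidean distance to the query.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two measurements per sample, two phenotype labels.
train_X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["healthy", "healthy", "healthy", "diseased", "diseased", "diseased"]

prediction = knn_predict(train_X, train_y, (2.9, 3.1), k=3)
print(prediction)
```

The query point lies inside the second group of samples, so its three nearest neighbors all carry the same label. In practice the features would usually be rescaled first, since Euclidean distance is sensitive to measurement units.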

Unsupervised methods—particularly clustering and dimensionality reduction—are then presented as essential tools for exploring unlabeled biological datasets. Their mathematical foundations are explained to show how algorithmic choices shape biological interpretation.
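To make the clustering idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is one possible clustering method, chosen for illustration; the abstract does not say which algorithms the lecture uses. The function name `k_means`, the toy points, and the choice of initial centroids are assumptions.

```python
import math


def k_means(points, centroids, n_iter=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabeled samples forming two visually separate groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]

# Assumed initialization: the first and last points serve as starting centroids.
labels, centroids = k_means(points, [points[0], points[-1]])
print(labels)
```

The result depends on the starting centroids and on the number of clusters requested, which is one way an algorithmic choice can shape the biological interpretation of the groups found.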

Cross-validation, hyperparameter tuning, and model generalization are discussed within the framework of statistical learning theory, emphasizing the need for robust models when working with naturally variable biological systems. Participants will gain insights into balancing model complexity with interpretability, especially when dealing with noisy or limited data.
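The interplay between cross-validation and hyperparameter tuning can be sketched as follows: each candidate value of a hyperparameter (here the number of neighbors in a small k-nearest neighbors classifier) is scored by its held-out accuracy across folds. This is a simplified standard-library sketch, not the workshop's own code; the function names, the interleaved fold scheme, the candidate values, and the toy data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to `query`."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return Counter(label for _, label in neighbors[:k]).most_common(1)[0][0]


def k_fold_accuracy(X, y, k_neighbors, n_folds=3):
    """Average held-out accuracy over `n_folds` interleaved folds."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        # Hold out every n_folds-th sample, starting at index `fold`.
        test_idx = set(range(fold, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k_neighbors) == y[i]:
                correct += 1
    return correct / n


# Hypothetical toy data: six samples per class, two features each.
X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (0.8, 0.8), (1.3, 1.1),
     (3.0, 3.0), (3.2, 2.9), (2.9, 3.1), (3.1, 3.2), (2.8, 2.8), (3.3, 3.1)]
y = ["A"] * 6 + ["B"] * 6

# Score each candidate hyperparameter value and keep the best one.
candidates = [1, 3, 5]
scores = {k: k_fold_accuracy(X, y, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
print(scores, best_k)
```

Because every sample is scored only by a model that never saw it during fitting, the accuracy estimates how the classifier generalizes rather than how well it memorizes. With noisier or smaller datasets the scores would differ across candidates, and a separate final test set is typically kept aside from the tuning loop.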

The session concludes with guided hands-on activities where participants implement supervised and unsupervised ML workflows using real or simulated biological datasets. Exercises include data preprocessing, feature engineering, model training and evaluation, clustering, and hyperparameter optimization. By the end of Day 1, participants will be equipped with both the theoretical grounding and practical skills needed to integrate ML techniques into their own research in mathematical biology.

Day 2. Topological Data Analysis for Machine Learning Classification
This workshop provides a practical introduction to Topological Data Analysis (TDA) for uncovering structural patterns in complex datasets. Designed for biologists, mathematicians, and data scientists, it covers essential concepts in computational topology and demonstrates how topological features can enhance real-world data analysis.

Participants will learn to compute and interpret persistence diagrams, extract meaningful topological features, and visualize multiscale data structures. The session highlights the Persistent Homology Classification Algorithm (PHCA), a supervised method that uses topological signatures for classification tasks. Through an interactive, notebook-based workflow, attendees will implement PHCA and apply it to sample datasets.
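For readers new to persistence diagrams, the sketch below computes the simplest case: the birth and death of connected components (degree-0 homology, H0) in a Vietoris-Rips filtration of a point cloud. It is not an implementation of PHCA, whose details are not given in this abstract, and it uses a union-find structure from scratch rather than a TDA library. The function name `h0_persistence`, the sample points, and the convention that an edge enters the filtration at its full length are assumptions.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components in a Vietoris-Rips filtration."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale: one of them dies here.
            parent[root_i] = root_j
            pairs.append((0.0, length))
    # The last surviving component never dies.
    pairs.append((0.0, math.inf))
    return pairs


# Hypothetical point cloud: a tight group of three points and a distant pair.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
diagram = h0_persistence(points)
print(diagram)
```

Every component is born at scale 0. Three components die at scale 1.0 as nearby points connect, while one survives until scale 9.0, when the two groups finally merge. That long-lived bar is the topological signature of two well-separated clusters. Higher-dimensional features such as loops require a full persistent homology computation, typically done with a dedicated library.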

By the end of the workshop, participants will be equipped to integrate TDA and PHCA into research pipelines involving pattern recognition, signal and image analysis, or the exploration of complex biological systems.
