Jairo Diaz-Rodriguez
jdiazrod (at) yorku.ca
Ph.D., M.Sc., B.Eng.
Assistant Professor
York University
Toronto, Canada
I hold a degree from Universidad Industrial de Santander (Colombia) and earned my Ph.D. in Mathematics with a specialization in Statistics from the University of Geneva (Switzerland) in 2018, under the guidance of Professor Sylvain Sardy. Following my doctoral studies, I began my academic career as an Assistant Professor at Universidad del Norte (Colombia) in 2019. In 2021, I joined York University as a faculty member in the Department of Mathematics and Statistics.
My research lies at the intersection of statistics, machine learning, and artificial intelligence, focusing on developing robust theoretical frameworks and translating them into practical solutions across diverse fields. Beyond research, I am passionate about data science education and mentoring, having designed and implemented a variety of courses and programs to foster learning in this ever-evolving discipline.
In addition to my academic roles, I bring extensive experience as a data science consultant and machine learning engineer. My expertise spans the entire data science lifecycle, including data collection, preprocessing, exploration, visualization, model development, and the deployment of production-ready systems. This blend of academic insight and hands-on industry experience enables me to bridge theory and practice effectively.
Current Ph.D. students
Yilin Chen (starting September 2025, co-supervised with Divya Sharma)
Amy Jia (since 2023) - Currently working on self-attention models.
Nicolas Ewen (since 2023, co-supervised with Kelly Ramsay) - Currently working on filter and channel regularization for convolutional neural networks.
Current Master's students
Blanca Fernandez Mendez
Ting-Jie Liao
Elina Lisenko
Current Undergraduate students
Johnathan Channer
Previous Master's students
Zayeeda Shahreen Labiba - Currently a Statistical Analyst at the Population Health Research Institute (Canada).
Chenyi Yu
Previous Undergraduate students
Steven Zheng (USRA) - Currently a Master's student at the University of Toronto (Canada).
Nicolas Yaya (Universidad del Norte) - Currently a Business Operation Specialist at Mincka Engineering (Australia).
Education
Ph.D. in Mathematics, Université de Genève. Geneva, Switzerland. 2018 (link)
M.Sc. in Mathematics and Computer Science, Université de Genève. Geneva, Switzerland. 2014 (link)
B.Eng. in Electronics, Universidad Industrial de Santander. Bucaramanga, Colombia. 2010
Research Interests
Data Science
Machine learning and Artificial Intelligence
High dimensional statistics
Inverse problems
Optimization
Activities
Peer-reviewed
J. Diaz-Rodriguez, J. P. Gomez, J. P. Orange, N. D. Burkett-Cadena, S. M. Wisely, J. K. Blackburn and S. Sardy. "Tomographic reconstruction of a disease transmission landscape via GPS recorded random paths". The Annals of Applied Statistics. To appear. 2025 (link)
K. Ramsay, J. Diaz-Rodriguez. "Differentially Private Boxplots". International Conference on Machine Learning. 2025 (link)
E. Nino-Ruiz, J. Diaz-Rodriguez. "Adjoint-Free 4D-Var Methods For Non-Linear Data Assimilation Via Line Search Optimization". Atmosphere. 2024 (link)
S. Sardy, C. Giacobino, J. Diaz-Rodriguez. "Thresholding tests based on affine LASSO to achieve non-asymptotic nominal level and high power under sparse and dense alternatives in high dimension". Computational Statistics & Data Analysis. 2022 (link)
J. Diaz-Rodriguez, S. Sardy, D. Eckert. "Nonparametric estimation of galaxy cluster's emissivity and point source detection in astrophysics with two lasso penalties". Journal of the American Statistical Association. 2020 (link)
D. Hug-Peter, S. Sardy, J. Diaz-Rodriguez, E. Castella, V. Slaveykova. "Modeling whole body trace metal concentrations in aquatic invertebrate communities: A trait-based approach". Environmental Pollution. 2018 (link)
C. Giacobino, S. Sardy, J. Diaz-Rodriguez, N. Hengartner. "Quantile Universal Threshold". Electronic Journal of Statistics. 2017 (link)
J. Diaz-Rodriguez, S. Sardy. "A composite lasso penalty with an application in Cosmology". IEEE Intl Conference on Computational Science and Engineering (CSE). 2016 (link)
Preprints
J. Diaz-Rodriguez. "k-LLMmeans: Summaries as Centroids for Interpretable and Scalable LLM-Based Text Clustering". (link)
M. Jia*, J. Diaz-Rodriguez. "Dynamics of Spontaneous Topic Changes in Next Token Prediction with Self-Attention". (link)
In preparation
N. Ewen*, K. Ramsay, J. Diaz-Rodriguez. "Structured Output Regularization".
M. Jia*, J. Diaz-Rodriguez. "Detecting Topic Shifts by Exploiting LLMs’ Struggle to Transition".
*supervised student
Software and libraries
k-LLMmeans: LLM-based centroids for text clustering. 2025. Python (link)
DPBoxplot: Differentially private boxplots. 2024. Python (link)
TVqut: Total variation with the Quantile Universal Threshold. 2024. MATLAB (link)
ALIAS: Astrophysics Lasso Inverse Abel Solver. 2020. MATLAB (link) & C++ (link)
qut: Quantile Universal Threshold. R package. 2017 (link)
Data Science projects and apps
COVID-19 projects (video - in Spanish)
covidBAQ: COVID-19 tracking and prediction in the city of Barranquilla
RtColombia: Real-time calculation of the reproduction number of the COVID-19 pandemic in every town of Colombia
Coronability: Calculation of coronavirus probabilities for awareness-raising purposes
Real-time modeling of accident probabilities according to traffic light locations (video - in Spanish)
Awards and grants
2024 - New investigators best presentation award. Winner. Statistical Society of Canada. (link)
2023 - New investigators best presentation award. Runner up. Statistical Society of Canada.
2021 - NSERC: Discovery Grant.
2019 - Latin American Swiss Center (CLS-HSG): Swiss Latin American seed grant.
2019 - Colombian Ministry of Information Technologies and Communications: Top 10 Data Science projects in Colombia.
2016 - Best conference paper. International Conference on Information Complexity and Statistical Modeling in High Dimensions with Applications.
2012 - Fellow. COLFUTURO: Foundation for the future of Colombia.
2009 - Bronze Medal. Iberoamerican Math Competition for University Students.
2004 - Gold Medal. Colombian National Math Olympiad.
Current courses (2023- )
MATH 1130: Introduction to Data Science (undergraduate)
MATH 2130: Principles and Techniques of Data Science (undergraduate)
MATH 6650: Introduction to Statistical Data Science (graduate)
Old courses
MATH 4939: Statistical data analysis in SAS and R (undergraduate)
NATS 1130: Statistical reasoning (undergraduate)
Old courses (at Universidad del Norte)
Statistics I, Statistics II (undergraduate)
Calculus I (undergraduate)
Mathematical Statistics (undergraduate)
Mathematical Statistics (graduate)
Computational Statistics (graduate)
Machine Learning (graduate)
Advanced topics in Statistics (graduate)
Current students (Summer 2025)
With my Ph.D. advisor and friend, Sylvain Sardy, in 2015 at the top of Sandia Mountain in New Mexico (US)
My family...