Project III (MATH 3382)
Jeffrey Giansiracusa - email: jeffrey.giansiracusa@durham.ac.uk
Darwin first introduced the idea of an evolutionary tree. Life forms evolve and differentiate into distinct species, and this process can be mapped and depicted with mathematical trees. Phylogenetics is the study of these trees, and working with them involves some fascinating mathematics that draws on combinatorics, probability theory, algorithms, and statistics.
With modern gene sequencing technology, we can collect a large number of gene sequences, and one of the basic problems is to reconstruct the evolutionary relationships from the genetic data. Applications range from tracking the spread and evolution of viruses (think of the various strains of COVID or flu) to digital humanities and the analysis of the many different versions of historic texts such as Chaucer's Canterbury Tales.
In this project we'll explore a variety of topics in the mathematics, algorithms, and statistics of working with phylogenetic trees.
For the group component of this project, we will work through a selection of chapters from the book by Allman and Rhodes. Topics we will learn include:
Basics of trees and metrics on trees
Probabilistic models for DNA mutation, and measures of distance based on these
Methods for inferring phylogenetic trees, such as maximum parsimony, distance-based methods, and maximum likelihood.
The geometry of the space of trees.
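To give a flavour of the second topic, here is a minimal sketch of one model-based distance between two aligned DNA sequences. It assumes the Jukes-Cantor model, a standard choice that the list above does not name explicitly, under which the observed proportion p of differing sites is corrected to d = -(3/4) ln(1 - 4p/3) expected substitutions per site. The function names and the example sequences are hypothetical and chosen only for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance: expected substitutions per site.

    Assumption: all four bases are equally likely and all substitutions
    occur at the same rate. Returns infinity when the sequences are so
    different (p >= 3/4) that the correction is undefined.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Two made-up aligned sequences differing at one site out of eight.
    a = "ACGTACGT"
    b = "ACGTACGA"
    print(p_distance(a, b))
    print(jukes_cantor_distance(a, b))
```

The corrected distance is always at least as large as the raw proportion of differences, because it accounts for sites that may have mutated more than once. Distances of this kind are the input to the distance-based tree inference methods mentioned in the list.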
The project will primarily emphasise learning through reading. Evidence of learning will consist of demonstrating understanding through working exercises, oral and written communication, and synthesis.
The individual project offers a choice of directions, depending on your preferences and interests. Some possible directions include:
A deeper exploration of the space of phylogenetic trees, comparing the Billera-Holmes-Vogtmann model and the Wald space model
Bayesian inference of trees
A deeper exploration of Markov models on trees
Methods of gene sequence alignment
Applications of phylogenetics
Some of these directions will emphasise reading, while others will involve a mixture of reading and computational work (coding and data analysis).
The individual component of this project is flexible and offers a choice of either a focus on reading and theoretical aspects, or a focus on coding (in any language you choose) to implement models and algorithms, and analyse data sets.
Evidence of learning will depend entirely on which mode you pursue. For a reading-based approach, the evidence of learning will include exploring examples and theoretical applications of the material, and synthesising ideas, methods, and results from the literature. For a computational mode, evidence of learning will include producing code that implements methods from computational topology and/or applies them to relevant and possibly novel data sets (which can either be existing publicly available data sets, or the result of a simulation process), and analysis comparing computational results with theory. In both cases, evidence will also consist of clearly communicating in both written and oral formats.
Algebra II (MATH 2581)
Discrete Mathematics (MATH 1031)
Statistics (MATH 1617) might be beneficial, but is not required.
Lecture Notes: The Mathematics of Phylogenetics
Elizabeth S. Allman, John A. Rhodes
https://jarhodesuaf.github.io/PhyloBook.pdf
The Mathematics of Phylogenomics
Lior Pachter, Bernd Sturmfels
https://arxiv.org/abs/math/0409132
Charles Semple, M. A. Steel
Available in Bill Bryson library: https://discover.durham.ac.uk/permalink/44DUR_INST/k3s6qp/alma991004973859707366