Research
Research interests
My research lies in the application of mathematical and statistical methods in computational and evolutionary biology, particularly phylogenetics. In this field, we utilise the massive amount of genomic data that is being generated today (courtesy of rapid developments in sequencing technology) to infer evolutionary history. This is represented in the form of phylogenetic trees and networks, mathematical objects which depict the evolution of a family of species or genes through time, starting from their common ancestor. The ever-advancing scale of this problem requires the development of sophisticated and efficient algorithms, underpinned by a large variety of mathematical and statistical tools, such as dynamic programming, graph-theoretic methods, maximum-likelihood or Bayesian methods, hidden Markov models - the list goes on!
My research themes include reticulate evolution in phylogenetics and population genetics (the inference and analysis of ancestral recombination graphs and phylogenetic networks) and gene tree-species tree problems (reconciliation inference and analysis, and species tree inference).
Projects that I am currently working on, with collaborators, include:
Species tree inference under gene tree recombination;
Population genetics analyses based on the ancestral recombination graph;
Sequence analyses of the antigen-encoding var genes of the malaria parasite;
Inferring incomplete lineage sorting and copy number hemiplasy in gene tree-species tree relationships;
Methods to build phylogenetic networks for duplicate gene recombination, and to reconcile them to a species tree or network.
I am also interested in enumerative combinatorics and statistical mechanics. I seek to analyse the properties of models of physical systems such as magnets, gases or polymers, either by searching for an exact solution or by using numerical tools such as series expansions or Monte Carlo simulations. I am interested in combinatorial methods used in pursuit of these goals, in particular corner transfer matrix-related methods, which I have developed significantly. I am further interested in using these and other tools for the efficient enumeration of combinatorial objects such as walks or animals, which can be used to model polymers or knots.
Other areas of research that I have explored include statistics (classification and clustering methods for high-dimensional datasets, as well as specific applications related to protein imaging analysis and mining) and operations research (with a focus on specific applications - mining and wireless communications). I am interested in possible cross-disciplinary applications of such work.
If you are interested in my work, please do not hesitate to contact me.
Publications
He, W., Scornavacca, C., and Chan, Y. (2024). The accuracy of species tree inference under gene tree dependence. Submitted to PLoS Comp. Biol.
Huang, Z., Kelleher, J., Chan, Y., and Balding, D. J. (2024). Estimating evolutionary and demographic parameters via ARG-derived IBD. Submitted to PLoS Comp. Biol.
Chan, Y. (2024). An efficient algorithm for the reconciliation of a gene network and species tree. WABI 2024, to appear.
Tan, M. H., Tiedje, K. E., Feng, Q., ..., Shim, H., Chan, Y., and Day, K. P. (2023). A paradoxical population structure of var DBLα types in Africa. Submitted to PLoS Pathogens.
He, M., Chan, Y., and Hautphenne, S. (2023). Approximate Bayesian computation for Markovian binary trees in phylogenetics. Submitted to Bioinformatics.
Li, Q., Chan, Y., Galtier, N., and Scornavacca, C. (2024). The effect of copy number hemiplasy on gene family evolution. Syst. Biol. 73(2), 355-374.
Tan, M. H., Shim, H., Chan, Y., and Day, K. P. (2022). Unravelling var complexity: Relationship between DBLα types and var genes in Plasmodium falciparum. Frontiers in Parasitology 1, 12.
Chan, Y., Li, Q., and Scornavacca, C. (2022). The large-sample asymptotic behaviour of quartet-based summary methods for species tree inference. J. Math. Biol. 85(3), 1-22.
Mahmoudi, A., Koskela, J., Kelleher, J., Chan, Y., and Balding, D. J. (2022). Bayesian inference of ancestral recombination graphs. PLoS Comp. Biol. 18(3), e1009960.
Feng, Q., Tiedje, K., Ruybal-Pesántez, S., Tonkin-Hill, G. Q., Duffy, M., Day, K. P., Shim, H., and Chan, Y. (2022). An accurate method for identifying recombinants from unaligned sequences. Bioinformatics 38(7), 1823-1829.
Li, Q., Scornavacca, C., Galtier, N., and Chan, Y. (2021). The multilocus multispecies coalescent: a flexible new model of gene family evolution. Syst. Biol. 70(4), 822-837.
Tonkin-Hill, G. Q., ..., Chan, Y., and Day, K. P. (2021). Evolutionary analyses of the major variant surface antigen-encoding genes reveal population structure of Plasmodium falciparum within and between continents. PLoS Genet. 17(2), e1009269.
Chan, Y. and Robin, C. (2019). Reconciliation of a gene network and species tree. J. Theor. Biol. 472, 54-66.
Chan, Y. and Rechnitzer, A. (2018). Upper bounds on the growth rates of independent sets in two dimensions via corner transfer matrices. Linear Algebra Appl. 555, 139-156.
Burt, C. N., Caccetta, L., and Chan, Y. (2018). Utilisation-based equipment selection. In Burt, C. N. and Caccetta, L. (eds.), Equipment selection for mining: with case studies (pp. 115-143), Studies in Systems, Decision and Control 150. Springer, Cham.
Burt, C. N. and Chan, Y. (2018). Accurate costing of mining equipment. In Burt, C. N. and Caccetta, L. (eds.), Equipment selection for mining: with case studies (pp. 145-152), Studies in Systems, Decision and Control 150. Springer, Cham.
Chan, Y., Ranwez, V., and Scornavacca, C. (2017). Inferring incomplete lineage sorting, duplications, transfers and losses with reconciliations. J. Theor. Biol. 432, 1-13.
Bernard, G., Chan, C. X., Chan, Y., Chua, X.-Y., Cong, Y., Hogan, J. M., Maetschke, S. R., and Ragan, M. A. (2017). Alignment-free inference of hierarchical and reticulate phylogenomic relationships. Brief. Bioinform. 20(2), 426-435.
Cong, Y., Chan, Y., Phillips, C. A., Langston, M. A., and Ragan, M. A. (2017). Robust inference of genetic exchange communities from microbial genomes using TF-IDF. Front. Microbiol. 8, 21.
Cong, Y., Chan, Y., and Ragan, M. A. (2016). Exploring lateral genetic transfer among microbial genomes using TF-IDF. Sci. Rep. 6, 29319.
Cong, Y., Chan, Y., and Ragan, M. A. (2016). A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF. Sci. Rep. 6, 30308.
Chan, Y. (2015). Upper bounds on the growth rates of hard squares and related models via corner transfer matrices. In Disc. Math. and Theoret. Comp. Sci. – 27th International Conference on Formal Power Series and Algebraic Combinatorics (FPSAC 2015) (pp. 793-804).
Chan, Y., Ranwez, V., and Scornavacca, C. (2014). Exploring the space of gene/species reconciliations with transfers. J. Math. Biol. 71(5), 1179-1209.
Chan, Y. and Rechnitzer, A. (2014). Accurate lower bounds on 2-D constraint capacities from corner transfer matrices. IEEE Trans. Info. Theory 60(7), 3845-3858.
Chan, Y., Ranwez, V., and Scornavacca, C. (2013). Reconciliation-based detection of co-evolving gene families. BMC Bioinformatics 14, 332-340.
Chan, Y., Marckert, J.-F., and Selig, T. (2013). A natural stochastic extension of the sandpile model on a graph. J. Comb. Theor. A 120(7), 1913-1928.
Chan, Y. (2013). Series expansions from the corner transfer matrix renormalization group method: II. Asymmetry and high-density hard squares. J. Phys. A: Math. Theor. 46, 125009.
Chan, Y. and Rechnitzer, A. (2012). A Monte Carlo study of non-trapped self-avoiding walks. J. Phys. A: Math. Theor. 45, 405004.
Chan, Y. (2012). Series expansions from the corner transfer matrix renormalization group method: the hard squares model. J. Phys. A: Math. Theor. 45, 085001.
Chan, Y., Guttmann, A. J., Nickel, B. G., and Perk, J. H. H. (2011). The Ising susceptibility scaling function. J. Stat. Phys. 145, 549-590.
Chan, Y. and Hall, P. (2010). Using evidence of mixed populations to select variables for clustering very high dimensional data. J. Am. Stat. Assoc. 105(490), 798-809.
Burt, C., Chan, Y., and Sonenberg, N. (2009). Exact models for the k-connected minimum energy problem. In J. Zheng et al. (eds.), International Conference on Ad Hoc Networks (pp. 392-406), LNICST 28. Springer, Berlin, Heidelberg.
Chan, Y. and Hall, P. (2009). Robust nearest-neighbour methods for classifying high-dimensional data. Ann. Stat. 37(6A), 3186-3203.
Burt, C. and Chan, Y. (2009). Accurate costing in mixed integer utilisation mining models. In R. Braddock et al. (eds.), 18th IMACS World Congress - MODSIM09 International Congress on Modelling and Simulation (pp. 204-210). Modelling and Simulation Society of Australia and New Zealand.
Chan, Y. and Hall, P. (2009). Scale adjustments for classifiers in high-dimensional, low sample size settings. Biometrika 96(2), 469-478.
Chan, Y., Owczarek, A. L., Rechnitzer, A., and Slade, G. (2007). Mean unknotting times of random knots and embeddings. J. Stat. Mech.: Theory and Experiment 2007(05), P05004.
Chan, Y. (2005). Selected problems in lattice statistical mechanics. PhD Thesis, The University of Melbourne.
Chan, Y. and Guttmann, A. J. (2003). Some results for directed lattice walkers in a strip. Disc. Math. and Theoret. Comp. Sci. (AC), 27-38.
Research supervision
I am currently supervising or have supervised the following postdocs:
Zhendong Huang. 2022 - present. (Co-supervised with David Balding.)
Angad Johar. 2021 - 2022. (Co-supervised with David Balding.)
I am currently supervising or have supervised the following students in their PhD research projects (at the University of Melbourne unless otherwise stated).
Wanting He. Species tree inference with dependent gene trees, 2023 - present (co-supervised with Celine Scornavacca).
Qian Feng, Analysing malaria var genes with hidden Markov models, 2017 - present (co-supervised with Heejung Shim).
Fatemeh Rezaei (MPhil). Inference of genealogical history with multiple populations, 2021 - 2024 (co-supervised with David Balding).
Qiuyi Li, A unified model of gene family evolution, 2018 - 2023 (co-supervised with Celine Scornavacca).
Ali Mahmoudi, Inference under the coalescent with recombination, 2017 - 2021 (co-supervised with David Balding).
Yingnan Cong, Constructing genetic exchange communities among bacteria and archaea, The University of Queensland, 2014 - 2016 (co-supervised with Mark Ragan).
I am also supervising or have supervised the following students in their Masters research projects at the University of Melbourne (unless otherwise stated).
Zhuochen Jiang. 2024 - present.
Luzhi Liang. 2024 - present.
Danielle Jayanthy. Bayesian phylogenetic inference with Markovian binary trees. 2023 - present (co-supervised with Sophie Hautphenne).
Wanting He. Species tree inference under dependency. 2021 - 2022.
Mingqi He, Markovian binary trees in phylogenetics. 2020 - 2021 (co-supervised with Sophie Hautphenne).
Chenxing Hao. Phylogenetic networks. 2021.
Xiangyu Li, Species tree estimation under dependency. 2020 - 2021.
Wenzhou Zhang, Mixing parsimony and probabilistic approaches to reconcile gene trees with species trees. 2020 - 2021.
Mingqi He (vacation scholar), Markovian binary trees in phylogenetics. 2019 - 2020 (co-supervised with Sophie Hautphenne).
Sonam Rani, Adversarial attacks on neural network policies. 2018 - 2019.
Manu Yadav, Assessing reconstructability of ancestral character states using different phylogenetic methods. 2018 - 2019.
Qian Zhang, The estimation of species trees using Astral. 2018 - 2019.
Shuai Zhang, Introduction and modifications to an alignment-free method for detection of lateral gene transfer. 2017 - 2018.
Lei Yu, Discrete methods for constructing ancestral recombination graphs. 2017 - 2018.
Rachit Parasher, A new rain rule for one day international cricket. 2017 - 2018.
Ziyi Cong, Confidence levels for lateral genetic transfer detection with TF-IDF. 2016 - 2017.
Qiuyi Li (AMSI Vacation scholar). 2016 - 2017 (co-supervised with Nathan Clisby).
Miao Liu, An exploration of motorcycle race data via cointegration of multivariate time series. 2016.
Series data
For those of you who are interested (and for my own reference), here are some series I have generated using corner transfer matrix methods. You can also find them in the relevant papers.
Hard squares model
Ising model
These series are in variables which are functions of K = J/(k_B T). They are in Maple format.