Seminar 31/5: Sequential Monte Carlo Mini-Symposium

posted May 22, 2017, 2:12 AM by Allison Hsiang

On Wednesday, May 31, we will be hosting a mini-symposium on Sequential Monte Carlo (SMC) at the museum. We will have two speakers, Fredrik Lindsten and Lawrence Murray, both from the Department of Information Technology at Uppsala University. The talks will take place in the morning, followed by lunch and afternoon discussion for all interested parties.


Title: Divide-and-Conquer Sequential Monte Carlo for Inference in Probabilistic Graphical Models
Speaker: Fredrik Lindsten
Date: May 31, 2017
Time: 10:00-11:00
Place: Naturhistoriska riksmuseet, Vintergatan conference room, 5th floor (near Cosmonova)

Abstract: Probabilistic graphical models (PGMs) are widely used to represent and to reason about underlying structure in high-dimensional probability distributions. We develop a framework for using sequential Monte Carlo (SMC) methods for inference and learning in general PGMs. Structural information from the PGM is used to decompose the graph into a collection of subgraphs that can be organized in a tree. Based on this we develop a new class of SMC samplers, Divide-and-Conquer SMC, for performing inference over the tree. We will see how this method extends the standard chain-based SMC framework to a method that naturally runs on trees. We illustrate empirically that these approaches can outperform standard methods in terms of estimation accuracy. They also open up novel parallel implementation options and the possibility of concentrating the computational effort on the most challenging parts of the problem at hand.
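
For attendees new to the topic, the "standard chain-based SMC framework" mentioned in the abstract can be illustrated with a bootstrap particle filter. The sketch below is only a minimal illustration and is not the speaker's Divide-and-Conquer method. It assumes a toy linear-Gaussian state-space model, and all names and parameters (`bootstrap_particle_filter`, `a`, `q`, `r`) are hypothetical choices made for this example.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              a=0.9, q=1.0, r=1.0, seed=0):
    """Bootstrap particle filter for an assumed toy model:
        x_t = a * x_{t-1} + N(0, q),   y_t = x_t + N(0, r),   x_0 ~ N(0, q).
    Returns the filtered means and an estimate of the log marginal likelihood.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, math.sqrt(q)) for _ in range(n_particles)]
    means = []
    log_evidence = 0.0
    for y in observations:
        # 1. Propagate every particle through the transition model.
        particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # 2. Weight each particle by the likelihood of the observation.
        log_w = [-0.5 * math.log(2 * math.pi * r) - 0.5 * (y - x) ** 2 / r
                 for x in particles]
        m = max(log_w)  # subtract the maximum for numerical stability
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        log_evidence += m + math.log(total / n_particles)
        w = [wi / total for wi in w]
        means.append(sum(wi * xi for wi, xi in zip(w, particles)))
        # 3. Resample (multinomial) so that low-weight particles die out.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, log_evidence


means, log_evidence = bootstrap_particle_filter([0.5, 1.0, 1.5])
```

The filter moves along a chain of time steps, one observation at a time; the talk concerns how the same propagate, weight and resample idea can be organized on a tree instead.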


Title: Software and Sequential Monte Carlo
Speaker: Lawrence Murray
Date: May 31, 2017
Time: 11:00-12:00
Place: Naturhistoriska riksmuseet, Vintergatan conference room, 5th floor (near Cosmonova)

Abstract: I will give a brief introduction to two software projects: LibBi and Birch. LibBi is used for state-space modelling on parallel and distributed computing hardware, such as multicore CPUs, GPUs and clusters, using Sequential Monte Carlo (SMC) and particle Markov chain Monte Carlo (PMCMC) methods. It is particularly effective for models with complex nonlinear dynamics, including continuous-time dynamics. Birch is its newer incarnation, currently in development. The idea of Birch is to broaden the class of models and inference methods that can be supported, beyond state-space models and SMC, moving from a model specification language to a general-purpose probabilistic programming language. I will focus in particular on the ability of Birch to solve analytically-tractable substructure in complex models.


Afternoon Discussion Session
Date: May 31, 2017
Time: 13:00-16:00
Place: Naturhistoriska riksmuseet, Sirius conference room, 5th floor (near Cosmonova)

Seminar 12/4: Gene transfers, like fossils, can date the tree of life (G. Szöllösi)

posted Apr 7, 2017, 2:51 AM by Johan Nylander

Welcome to the following seminar.

Time and location: April 12, 15:00, in the Pascal room, Gamma 6,  Science for Life Laboratory.

Speaker: Gergely J. Szöllösi, Eötvös University, Hungary

Gene transfers, like fossils, can date the tree of life

The geological record provides the only source of absolute time information to date the tree of life. But most life is microbial, and most microbes do not fossilize, leading to major uncertainties about the ages of microbial groups and the timing of some of the earliest and most important events in life's evolutionary history. I discuss our recent results, which show that patterns of lateral gene transfer deduced from analysis of modern genomes encode a novel and highly informative source of information about the temporal coexistence of lineages throughout the history of life. We use new phylogenetic methods to reconstruct the history of thousands of gene families and show that dates implied by gene transfers are strongly correlated with estimates from relaxed molecular clocks in Bacteria, Archaea and Eukaryotes. A comparison with mammalian fossils shows that gene transfer in microbes is potentially as informative for dating the tree of life as the geological record in macroorganisms.

Seminar 23/2: Michael Landis

posted Feb 14, 2017, 7:08 AM by Allison Hsiang   [ updated Feb 14, 2017, 7:23 AM ]

Please join us for the next SPG seminar, which will be given by Michael Landis. Michael is a Donnelly Postdoctoral Fellow in the Department of Ecology and Evolutionary Biology at Yale University with Michael Donoghue.

Title: Inference of phylogenetic biogeography using models of range evolution
Date: February 23, 2017
Time: 15:00-16:00
Place: KÖL Lunch Room (Frescativägen 54)

Abstract: The spatial distribution of modern biodiversity was generated by evolutionary processes acting over deep time. Because these historical processes cannot be observed directly, phylogenetic models of biogeographic evolution have been employed to reconstruct ancestral species ranges. This family of models, however, is young and the limits of their usefulness and scalability are still being explored. First, to their usefulness, I will discuss how biogeographic processes may be used to date speciation times when conditioning on dated empirical models of paleogeography. Second, to their scalability, I will describe how the unusual features of biogeographic character evolution complicate the use of standard phylogenetic marginalization techniques--such as the pruning algorithm--and demonstrate how Markov chain Monte Carlo may be configured to numerically marginalize over the space of complex evolutionary histories.

Seminar 9/2: Chi Zhang

posted Feb 2, 2017, 5:52 AM by Allison Hsiang

Our first 2017 SPG seminar will be given by Chi Zhang, former postdoc with Fredrik Ronquist and current postdoc with Tanja Stadler at D-BSSE, ETH Zürich.

All are invited and encouraged to attend!

Title:  Bayesian inference of species networks from multilocus data
Date:  February 9, 2017
Time:  15:00-16:00
Place:  KÖL Lunch Room (Frescativägen 54)

Abstract:  Hybridization plays an important role during speciation in certain animals and plants. The process leaves “fingerprints” in the genomic data, yet there is a lack of models and tools for analyzing such data.  We developed a Bayesian method to infer species networks from multilocus sequence data. The method applied the multispecies network coalescent (MSNC) model for the gene trees embedded in the species network, which accounts for gene tree discordance due to incomplete lineage sorting and reticulate species evolution such as hybridization or introgression. We also applied novel MCMC operators to sample the species networks and gene trees along with other parameters.
The method will be available as a package for BEAST2.

PhD Defence: Probabilistic Modelling of Domain and Gene Evolution

posted Sep 22, 2016, 2:16 AM by Johan Nylander

Sayyed Auwn Muhammad's PhD defence, Monday, 26 September, in the Air room at SciLifeLab

Title: Probabilistic Modelling of Domain and Gene Evolution

Opponent: Bastien Boussau, Biometry and Evolutionary Biology Lab, CNRS, Lyon, France

Location: Conference room Air (Gamma building) SciLifeLab, Tomtebodavägen 23A, Solna

Starting Time: 9 AM

Date: 2016-09-26

Jens Lagergren

Course: Phylogenetics, Phylogenomics, Citizen science and Entrepreneurship

posted Sep 19, 2016, 12:27 AM by Johan Nylander

BIG 4 Fall Workshop 2016

Phylogenetics, Phylogenomics, Citizen science and Entrepreneurship

Dates: Mon Oct 10 - Tue Oct 18, 2016

Venue: Tovetorp research station


The BIG4 fall workshop 2016 has two main themes. One theme is focused on phylogenetics, phylogenomics and bioinformatics, and is intended to provide students with an introduction to these topics, focused on skills that will be useful in their PhD research projects. There will be a mixture of lectures and practicals; students are encouraged to bring their own datasets and analysis problems. The other theme is citizen science and entrepreneurship, and this part of the course will provide broad introductions, exercises, and inspiration for students interested in pursuing these subjects as part of their current or future activities.

The workshop is primarily aimed at BIG 4 students but (except for the Friday morning session) is open to outside participants as well. We have room for up to 10 additional outside participants on a first come first served basis. Outside participants may choose to come for only part of the course, if they wish.

Course registration and fee

Register for the course no later than September 28 using the form at this web site. Student participation will be confirmed on September 29. If you cancel after that date, we may have to charge you the course fee anyway.

There will be a fee for the course, accommodation and food. We expect the total cost to be approximately 2,000 SEK/day for students participating for part of the course, and approximately 15,000 SEK in total for those students taking the entire workshop. Students will receive an invoice after the course is finished.

Schedule Overview

Monday October 10
Fredrik Ronquist. Introduction to statistical inference. Bayesian phylogenetic inference and Markov chain Monte Carlo simulation.
Fredrik Ronquist. Stochastic models of evolution. Introduction to MrBayes. Exercises using MrBayes.

Tuesday October 11
Fredrik Ronquist. Divergence time estimation: node dating and total-evidence dating. Statistical biogeography. Model testing, model averaging and other special topics.
Fredrik Ronquist. Phylogenetic graphical models. Introduction to RevBayes. Exercises using MrBayes and RevBayes.

Wednesday October 12
Erik Gobbo. De novo genome sequencing: next generation sequencing and genome assembly.
Chris Wheat. Gene finding, genome annotation and phylogenomic inference.
Chris Wheat, Erik Gobbo. Genome assembly, gene finding, genome annotation and phylogenomics exercises. Questions, discussion.

Thursday October 13
Rodrigo Esparza-Salas. Introduction to metabarcoding.
Daniel Marquina. Metabarcoding: special topics.
Rodrigo Esparza-Salas, Daniel Marquina. Metabarcoding exercises.

Friday October 14
BIG 4 Midterm Review [Only BIG 4 participants].
BIG 4 Midterm Review [Only BIG 4 participants].

Saturday October 15
Johan Nylander. Handling bioinformatics data in a unix environment. Bioinformatics exercises. Bring your own data and questions!

Sunday October 16
Free day for social activities. Food will be served as usual.

Monday October 17
Inga von Sydow. Why you should become an entrepreneur.
Inga von Sydow, Lyubomir Penev, Eduardo Pareja, Karin Carlsson. Round table discussion on entrepreneurship in biosystematics, informatics and genetics.
Karin Karlsson (Savantic AB). Training session: Steps in the entrepreneurship process.

Tuesday October 18
Karoly Makonyi. A personal view on citizen science.
Karin Carlsson. Gamification and viral effects: a case study.
Jakob Jönsson. Citizen science in cosmology.
Devin Sullivan. [Title TBA].
Erik Thorelli. Bionote - A citizen science platform for logging and sharing species sightings.
Miroslav Valan. A brief history of citizen science and its potential in the era of big data. 
Miroslav Valan, Karoly Makonyi. Roundtable discussion on citizen

PhD defence, Raja Hashim Ali, 25 Feb

posted Feb 24, 2016, 4:39 AM by Johan Nylander   [ updated Feb 24, 2016, 4:40 AM ]


Raja Hashim Ali will defend his thesis on Thursday 25 Feb at 14:00, in room “Fire”, bottom floor of the Gamma building at Science for Life Laboratory.

The title of the thesis is “From genomes to post-processing of Bayesian inference of phylogeny” and Dannie Durand from CMU is the opponent.



Life is extremely complex and amazingly diverse; it has taken billions of years of evolution to attain the level of complexity we observe in nature now, which ranges from single-celled prokaryotes to multicellular human beings. With the availability of molecular sequence data, algorithms inferring homology and gene families have emerged, and similarity in gene content between two genes has been the major signal utilized for homology inference. Recently there has been a significant rise in the number of species with fully sequenced genomes, which provides an opportunity to investigate and infer homologs with greater accuracy and in a more informed way. Phylogenetic analysis explains the relationship between member genes of a gene family in a simple, graphical and plausible way using a tree representation. Bayesian phylogenetic inference is a probabilistic method used to infer gene phylogenies and posteriors of other evolutionary parameters. The Markov chain Monte Carlo (MCMC) algorithm, in particular with the Metropolis-Hastings sampling scheme, is the most commonly employed algorithm to determine the evolutionary history of genes. There are many software packages available that process results from each MCMC run and explore the parameter posterior, but there is a need for interactive software that can analyse both discrete and real-valued parameters, and which has convergence assessment and burn-in estimation diagnostics specifically designed for Bayesian phylogenetic inference.

In this thesis, a synteny-aware approach for gene homology inference, called GenFamClust (GFC), is proposed that uses gene content and gene order conservation to infer homology. The feature which distinguishes GFC from earlier homology inference methods is that local synteny has been combined with gene similarity to infer homologs, without inferring homologous regions. GFC was validated for accuracy on a simulated dataset. Gene families were computed by applying clustering algorithms on homologs inferred from GFC, and compared for accuracy, dependence and similarity with gene families inferred from other popular gene family inference methods on a eukaryotic dataset. Gene families in fungi obtained from GFC were evaluated against pillars from Yeast Gene Order Browser. Genome-wide gene families for some eukaryotic species are computed using this approach.

Another topic addressed in this thesis is the processing of MCMC traces for Bayesian phylogenetic inference. We introduce a new software package, VMCMC, which simplifies post-processing of MCMC traces. VMCMC can be used both as a GUI-based application and as a convenient command-line tool. VMCMC supports interactive exploration, is suitable for automated pipelines and can handle both real-valued and discrete parameters observed in an MCMC trace. We propose and implement joint burn-in estimators that are specifically applicable to Bayesian phylogenetic inference. These methods have been compared for similarity with some other popular convergence diagnostics. We show that Bayesian phylogenetic inference and VMCMC can be applied to infer valuable evolutionary information for a biological case: the evolutionary history of the FERM domain.
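
To make the idea of post-processing an MCMC trace concrete, here is a minimal sketch of a random-walk Metropolis-Hastings sampler followed by burn-in removal. It is not VMCMC and does not reproduce the thesis's burn-in estimators. The toy target (a standard normal), the fixed 25% burn-in fraction, and the function names `metropolis_hastings` and `discard_burnin` are all assumptions made for illustration.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings on a single real-valued parameter."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    trace = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        lp_proposal = log_target(proposal)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, lp_proposal - lp)):
            x, lp = proposal, lp_proposal
        trace.append(x)
    return trace


def discard_burnin(trace, fraction=0.25):
    """Drop the first `fraction` of the trace (a simple fixed rule, assumed here)."""
    start = int(len(trace) * fraction)
    return trace[start:]


# Start far from the mode so that the early samples are unrepresentative.
trace = metropolis_hastings(lambda x: -0.5 * x * x, x0=10.0, n_steps=5000)
kept = discard_burnin(trace)
posterior_mean = sum(kept) / len(kept)
```

A fixed fraction is the crudest possible choice; dedicated estimators instead try to decide from the trace itself where the chain has reached its stationary distribution.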

Seminar 25/11: Understanding the History of Life Using Morphology and Fossils: New Computational Approaches

posted Nov 19, 2015, 8:06 AM by Johan Nylander

Seminar at the Swedish Museum of Natural History on Wednesday November 25, 16:15–17:00 in the lunch/seminar room of the KÖL building, level 2:

Understanding the History of Life Using Morphology and Fossils: New Computational Approaches

Allison Hsiang

Postdoctoral Associate, Yale University

Abstract. Fossils represent a unique and indispensable source of data for studying macroevolutionary processes and dynamics, as they provide us with direct glimpses of how life actually evolved on Earth. Although stratigraphic data from fossils have been used extensively in divergence time calibration, morphology remains the primary type of data that can be extracted from fossils for use in phylogenetic inference. In general, however, molecular sequence data is more widely used and trusted for phylogenetic analyses (e.g., 106,255 vs. 33,473 publications matching the topic of “molecular phylogeny” vs. “morphological phylogeny” on the Web of Science between 2010-2015). This is due both to the relative ease of generating large amounts of genetic data as well as concerns about subjectivity and the acquisition of morphological data. As a result, when phylogenetic incongruence arises between morphological/paleontological and molecular datasets, the latter is often viewed as more robust, even when the divergences being inferred occurred in deep time. Molecular data, however, are not infallible, and the meaningful influence of fossil data in phylogeny estimation and comparative analyses is well established. Therefore, efforts directed towards improving the methodological process of generating and analyzing morphological data are a priority.

In this presentation, I will discuss computational approaches that illustrate that: 1) systematic biases and misleading signal may have a profound effect on molecular phylogenetic analyses; 2) the inclusion of phenomic-scale datasets in combined analyses can affect phylogenetic inference and comparative methods, even when morphological characters are vastly outnumbered; and 3) morphological data extraction can potentially be automated and scaled up effectively and efficiently. To demonstrate these points, I use three case studies, respectively: 1) the position of turtles within the amniote tree of life; 2) the evolutionary history and origin of snakes; and 3) the evolution of shape across North Atlantic communities in planktonic foraminifera. These studies set the groundwork for future work aiming to improve computational methods for analyzing morphological and paleontological data, both in terms of data extraction and data interpretation/analysis.


Fredrik Ronquist

Molecular-Clock Dating Using MrBayes - Seminar and Workshop

posted Apr 1, 2015, 2:16 AM by Johan Nylander   [ updated Apr 10, 2015, 1:02 AM ]

Molecular-Clock Dating Using MrBayes - Seminar and Workshop

22 April - Stockholm: 09:30-11:30, Rum 540, Institutionen för ekologi, miljö och botanik (Lilla Frescati), Stockholms universitet
23 April - Uppsala: 13:15-15:15, Lärosal 4, Evolutionsbiologiskt Centrum, Uppsala universitet

Chi Zhang*, Swedish Museum of Natural History, Stockholm
Johan Nylander, BILS/Swedish Museum of Natural History, Stockholm

MrBayes - the most often used software for Bayesian phylogenetic analysis - has 
included many new features since version 3.2. In this seminar, we will 
highlight some newly implemented functionality, with focus on the 
molecular-clock dating capacities of the current version (v.3.2.4).
The seminar will consist of two parts: a presentation* giving 
the necessary background information, followed by a hands-on tutorial where 
participants are encouraged to bring their own data (and computers).

There are two approaches to dating using molecular data: node dating and 
total-evidence dating. Node dating calibrates the internal nodes of the tree by 
assigning distributions using information from external sources, such as the 
fossil record. Total-evidence dating uses the morphological data from the fossil 
record together with morphological and sequence data from recent organisms to 
infer the dates. Several steps are involved in a Bayesian dating analysis, including 
data partitioning, node or fossil age calibration, and setting priors for the 
tree and the molecular clock model. I will describe the available calibration 
probability distributions, clock tree priors (especially the fossilized 
birth-death prior for total-evidence dating), and relaxed clock models, through 
a step-by-step tutorial of MrBayes.

The program (MrBayes v.3.2.5) is available from (alternatively

Participants in the practical part are encouraged to bring their own computers 
with the software installed from the above-mentioned URLs.


SUPERSMART - talk at NRM Stockholm

posted Feb 17, 2015, 3:56 AM by Johan Nylander

Speakers: Alexandre Antonelli (University of Gothenburg),
Hannes Hettling and Rutger Vos (Naturalis Biodiversity Center,
the Netherlands)

Title: SUPERSMART: Ecology and evolution in the era of big data

Time: March 24 at 15:00-16:00

Venue: Room 525, Naturhistoriska riksmuseet, Stockholm

Host: Fredrik Ronquist

Abstract: Rapidly growing biological data volumes - including
molecular sequences and fossil records - hold an unprecedented
potential to reveal how evolutionary processes generate and
maintain biodiversity. However, most studies integrating these
data use an idiosyncratic step-by-step approach for the
reconstruction of time-calibrated phylogenies. We will present a
novel conceptual framework, termed SUPERSMART (Self-Updating
Platform for Estimating Rates of Speciation and Migration, Ages,
and Relationships of Taxa), and present our proof of concept for
dealing with the moving targets of biodiversity research. This
framework reconstructs dated phylogenies based on the assembly
of molecular and genomic datasets. The data handled for each step
are continuously updated as databases accumulate new records. We
believe that this emerging framework will provide an invaluable
tool for a wide range of hypothesis-driven research questions in
systematics and evolution. For more information please see
