Modelling speciation by integrating traits into polymorphism-aware models


Efforts to understand the speciation history of taxa have been hampered by incongruence among phylogenetic trees inferred from different genomic regions. Several biological processes can cause incongruence: horizontal gene transfer or hybridization, gene duplication and loss, and incomplete lineage sorting (ILS). ILS has received considerable attention from a theoretical point of view (Degnan & Rosenberg, 2006). It occurs when gene lineages fail to coalesce in the ancestral population immediately preceding a speciation event and instead coalesce in deeper ancestral populations. As a result, gene copies from one species may cluster with sequences from a sister species rather than with those from their own species. This project aims to see the "wood for the trees".

In my group, we have developed an approach called Polymorphism-aware phylogenetic Models (PoMo), which is based on allele frequencies and therefore naturally accounts for incomplete lineage sorting. Standard models treat substitutions as instantaneous events, but PoMo describes them as a process: a substitution starts as a mutation to a new, low-frequency allele, which then undergoes a series of changes in allele frequency. These changes are modelled by a continuous-time Markov chain that combines a DNA substitution model (introduction of variation by mutation) with the continuous-time Moran model (removal of variation by genetic drift and natural selection). In this PhD project, the approach will be extended to trait evolution (see Figure 1 below).
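To make the allele-frequency process concrete, the sketch below builds the rate matrix of a two-allele, PoMo-style Markov chain. It is a minimal illustration under stated assumptions, not the published PoMo implementation: the helper name `pomo_rate_matrix` and the parameters `n` (virtual population size), `mu_ab` and `mu_ba` (mutation rates) are hypothetical, drift is strictly neutral (no selection), and mutation rates out of the fixed states are left unscaled, whereas published PoMo variants may scale them or add fitness terms.

```python
import numpy as np


def pomo_rate_matrix(n: int, mu_ab: float, mu_ba: float) -> np.ndarray:
    """Generator of a two-allele, neutral PoMo-style chain (hypothetical helper).

    State i (0..n) counts copies of allele a in a virtual population of size n.
    States i = n (all a) and i = 0 (all b) are the fixed, monomorphic states;
    states 0 < i < n are polymorphic.
    """
    q = np.zeros((n + 1, n + 1))
    # Mutation: only a fixed state can gain a new, low-frequency allele.
    q[n, n - 1] = mu_ab  # all a -> one copy of b
    q[0, 1] = mu_ba      # all b -> one copy of a
    # Drift: neutral Moran model, allele count moves up or down by one
    # copy, each at rate i * (n - i) / n.
    for i in range(1, n):
        drift = i * (n - i) / n
        q[i, i + 1] = drift
        q[i, i - 1] = drift
    # Diagonal entries make every row sum to zero (CTMC generator).
    # The sum is taken while the diagonal is still zero.
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


q = pomo_rate_matrix(4, mu_ab=0.01, mu_ba=0.02)
print(q.round(3))
```

Transition probabilities along a branch of length t would then follow from the matrix exponential, for example `scipy.linalg.expm(q * t)`. Keeping mutation confined to the fixed states is what lets the chain represent a substitution as a sojourn through polymorphic states rather than as an instantaneous jump.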
The project will develop the PoMo approach in a Bayesian framework with the following objectives:
(i) Integrate trait evolution into the model, so that the method can be used to study genotype and binary phenotype data in a unified analysis. We will use self-incompatibility in plants as an application, as it is well studied and can serve as a proof of principle.
(ii) Expand the approach from binary traits to multiple discrete states. Together with Andreanna Welch's group, we will work on applications to seabirds (order Procellariiformes) using traits such as flight and foraging.
(iii) Treat time-series gene expression data (RNA-Seq) as continuous, function-valued traits in the context of species trees.


The implementation of polymorphism-aware trait evolution in a Bayesian framework provides a new, flexible way to model evolutionary processes and to obtain reliable estimates of biological parameters. The PhD project will couple the approach with numerical methods, such as Markov chain Monte Carlo (MCMC), for approximating the posterior probability distribution of parameters. Bayesian inference methods can be extremely powerful and have revolutionized the range of evolutionary questions that can be tackled. In particular, the Bayesian framework allows us to integrate different types of data: molecular sequence data and, importantly, phenotype/trait data. As Bayesian implementations are far from trivial, we will collaborate with Sebastian Höhna (Ludwig Maximilian University Munich, Germany) and Tracy Heath (Iowa State University, US). Both are internationally known for their contributions to the RevBayes software project (Höhna et al., 2016). Both collaborators will provide training for the student in this field.
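As a toy illustration of how MCMC approximates a posterior distribution, the sketch below runs a random-walk Metropolis sampler for a single rate parameter. It is not a phylogenetic model and does not use RevBayes: the Poisson likelihood, the Exponential(1) prior, the made-up counts and the function names `log_posterior` and `metropolis` are all hypothetical choices made so that the answer is known analytically. With this conjugate setup, the exact posterior is Gamma(1 + sum of counts, 1 + number of counts), which gives a posterior mean of 21/6 = 3.5 for the counts used here.

```python
import math
import random


def log_posterior(rate: float, counts: list[int]) -> float:
    """Unnormalised log posterior: Exponential(1) prior, Poisson likelihood."""
    if rate <= 0:
        return -math.inf  # outside the support of the prior
    log_prior = -rate
    log_lik = sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)
    return log_prior + log_lik


def metropolis(counts: list[int], n_iter: int = 20000, step: float = 0.5,
               seed: int = 1) -> list[float]:
    """Random-walk Metropolis sampler; returns draws after a 20% burn-in."""
    rng = random.Random(seed)
    rate = 1.0
    current = log_posterior(rate, counts)
    samples = []
    for _ in range(n_iter):
        proposal = rate + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        samples.append(rate)
    return samples[n_iter // 5:]


draws = metropolis([3, 5, 4, 6, 2])
print(sum(draws) / len(draws))  # should lie close to the exact mean of 3.5
```

In a real analysis the single rate would be replaced by tree topology, branch lengths and model parameters, and the likelihood by one computed over the tree, but the accept/reject logic is the same.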

Project Timeline

Year 1

Develop new Bayesian tools for polymorphic binary traits and select plant groups for proof of principle. Extract RNA to be sent away for sequencing.

Year 2

Extend the software to multiple discrete traits, test it, and apply it to seabird data sets. Clean and process the raw RNA-Seq data.

Year 3

Integrate continuous RNA-Seq expression data into the Bayesian approach. Analyse the RNA-Seq data.

Year 3.5

Integrate the approach into a user-friendly environment and write up the thesis.

Training & Skills

The student will receive training in (1) development and programming of Bayesian approaches for phylogenomics; (2) analysis of genome-wide data sets; (3) molecular methods for next-generation sequencing.
Training in the development and programming of phylogenomics software and its genome-wide application will take place in the Kosiol lab, training in Bayesian approaches in the labs of Sebastian Höhna and Tracy Heath, and training on bird traits, in particular the underlying knowledge of bird morphology, in the Welch lab.

References & further reading

Borges, R., Szöllősi, G. J., and Kosiol, C. (2019). Quantifying GC-Biased Gene Conversion in Great Ape Genomes Using Polymorphism-Aware Models. Genetics 212:1321-1336.

Degnan, J. H., and Rosenberg, N. A. (2006). Discordance of species trees with their most likely gene trees. PLoS Genetics 2: e68.

De Maio, N., Schrempf, D., and Kosiol, C. (2015). PoMo: An allele frequency-based approach for species tree estimation. Systematic Biology 64: 1018-1031.

Estandía, A., Chesser, R. T., James, H. F., Levy, M., Ferrer-Obiol, J., Bretagnolle, V., González-Solís, J., and Welch, A. J. Substitution rate variation in a robust Procellariiform seabird phylogeny is not solely explained by body mass, flight efficiency, population size or life history traits. Preprint published on bioRxiv at

Höhna, S., Landis, M.J., Heath, T.A., Boussau, B., Lartillot, N., Moore, B.R., Huelsenbeck, J.P. and Ronquist, F. (2016). RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Systematic Biology 65: 726-

Further Information

This project is in competition with others for funding, and success will depend on the quality of applicants relative to those for competing projects. Funding includes a tuition fee waiver for the University of St Andrews, a competitive stipend, and research support.

To express interest in applying, or for further information, please contact Dr. Carolin Kosiol at  In your email include a few sentences detailing your reasons for applying and how your experiences fit with the project.

Dr Carolin Kosiol
Greenside Place
KY16 9TH
University of St Andrews
Tel +44-(0)1334-463895
