Integrating traits in polymorphism-aware trees to better model speciation


Efforts to understand the speciation history of taxa have been hampered by incongruence among phylogenetic trees from different genomic regions. Three biological processes can cause this incongruence: horizontal gene transfer, gene duplication and loss, and incomplete lineage sorting (ILS). Horizontal gene transfer plays a major role in bacterial evolution, and gene duplication and loss are common throughout the tree of life. ILS has received considerable attention from a theoretical point of view (Degnan & Rosenberg, 2006). It occurs when genes coalesce not in extant species, but in the ancestral populations that gave rise to them. As a result, some genes from a species may cluster with sequences from a sister species rather than with those of their own species. By developing a new method, this project will combine recent, powerful polymorphism-aware phylogenetic models with trait evolution models in a total-evidence approach, bringing the study of speciation to a new level.

In the Kosiol group, we have developed an approach called Polymorphism-aware phylogenetic Models (PoMo), which is based on allele frequencies and so overcomes the limitations posed by such incongruence. Standard phylogenetic models treat substitutions as instantaneous events, but PoMo describes them as a process: substitutions start as mutations to new, low-frequency alleles, then experience a series of changes in allele frequency. The changes in allele frequency are modelled by a continuous-time Markov chain based on DNA models (introduction of variation due to mutations) and the continuous Moran model (removal of variation due to genetic drift and natural selection). So far, this approach has greatly improved the estimation of species trees (De Maio et al., 2015; Rogers et al., 2019) and the detection of allelic selection (Borges et al., 2019). While this approach has substantially enhanced the tools available to study speciation, a major gap remains: traits, which are key to many speciation events (Servedio et al., 2011), have not yet been implemented in these models.
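To make the description above concrete, the following is a minimal sketch of a PoMo-style rate matrix, not the group's implementation. It assumes a virtual population of `n` individuals, a single mutation rate `mu` shared by all nucleotide pairs, mutations arising only in monomorphic states, and neutral Moran drift with rate i(n-i)/n between neighbouring allele counts. The function names and the state encoding are illustrative choices.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def normalise(a, i, b, n):
    """Canonical label for a state with i copies of allele a and n - i of b."""
    if i == n:
        return (a,)          # fixed for a
    if i == 0:
        return (b,)          # fixed for b
    if a < b:
        return (a, i, b)
    return (b, n - i, a)


def pomo_states(n):
    """Four monomorphic states plus n - 1 biallelic states per nucleotide pair."""
    states = [(a,) for a in NUCLEOTIDES]
    for a, b in combinations(NUCLEOTIDES, 2):
        states += [(a, i, b) for i in range(1, n)]
    return states


def pomo_rate_matrix(n, mu):
    """Rate matrix of the simplified chain (assumes n >= 2 and one shared mu)."""
    states = pomo_states(n)
    index = {s: k for k, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]

    # Mutation: a monomorphic population gains one copy of a new allele.
    for a in NUCLEOTIDES:
        for b in NUCLEOTIDES:
            if a != b:
                q[index[(a,)]][index[normalise(a, n - 1, b, n)]] += mu

    # Drift: neutral Moran steps change the allele count by one.
    for a, b in combinations(NUCLEOTIDES, 2):
        for i in range(1, n):
            src = index[(a, i, b)]
            rate = i * (n - i) / n
            q[src][index[normalise(a, i + 1, b, n)]] += rate
            q[src][index[normalise(a, i - 1, b, n)]] += rate

    # Diagonal entries make every row sum to zero.
    for k in range(len(states)):
        q[k][k] = -sum(q[k])
    return states, q


states, q = pomo_rate_matrix(n=5, mu=0.01)
print(len(states), "states")
```

With `n = 5` the chain has 4 monomorphic and 6 × 4 polymorphic states. A fuller model would replace the single `mu` with a DNA substitution model and add selection coefficients to the drift rates.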

In this PhD project, the student will fill this gap by extending the PoMo approach to trait evolution, in order to build more realistic models of speciation (Figure 1, below).

Specifically, the project will develop the PoMo approach in a Bayesian framework with the following objectives:

Objective 1: Integrate trait evolution into the model, so that the method can be used to study genotype and binary phenotype data in a unified analysis. We will use self-incompatibility (SI) in plants as an application. In particular, SI in Brassicaceae species (Park et al., 2010) is well understood and will be used as a proof of principle.

Objective 2: Expand the approach from binary traits to multiple discrete states. We will work on examples of plant-pollinator mutualism with the Chomicki group, using large unpublished data sets on orchid pollination and new phylogenomic data sets. For example, this will allow the study of pollination syndromes: suites of floral traits that evolve in a concerted fashion due to the preferences of pollen vectors, either biotic (birds, bees or flies) or abiotic (wind and water). Because pollinator shifts may be associated with speciation events (e.g. Johnson, 2006), they should be analysed with a model that incorporates selection on traits.
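As an illustration of the step from binary to multi-state traits, the sketch below computes transition probabilities under an equal-rates k-state Markov model. This is a standard textbook model used here as a stand-in; it is not the trait model the project will develop, and it includes neither polymorphism nor selection. The function name, the rate and the branch length are hypothetical.

```python
import math


def equal_rates_probability(k, rate, time, same_state):
    """Transition probability for a k-state model with one shared rate.

    Every off-diagonal entry of the rate matrix equals `rate`, so the
    probability has the closed form used below.
    """
    decay = math.exp(-k * rate * time)
    if same_state:
        return 1.0 / k + (k - 1.0) / k * decay
    return (1.0 - decay) / k


# Binary trait (k = 2), e.g. self-incompatible versus self-compatible.
p_binary_change = equal_rates_probability(2, rate=0.5, time=1.0, same_state=False)

# Five pollen vectors (k = 5): birds, bees, flies, wind and water.
p_stay = equal_rates_probability(5, rate=0.5, time=1.0, same_state=True)
p_shift = equal_rates_probability(5, rate=0.5, time=1.0, same_state=False)
print(p_binary_change, p_stay, p_shift)
```

For any `k`, the probability of staying plus `k - 1` times the probability of a specific shift equals one, which is a quick check that the rows of the transition matrix are valid distributions.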

Objective 3: Tackle time-series data of gene expression (RNA-Seq) as continuous function-valued traits in the context of species trees.
A pollination syndrome is characterized not only by the pollen vector itself. The traits involved include flower shape, size, colour, odour, reward type and amount, nectar composition, and timing of flowering. Many of them are developmental and can be considered continuous traits. We will focus on a sizeable clade of plants in the family Plantaginaceae, which includes the snapdragon (Antirrhinum majus) model, and in which transitions in floral morphology and associated shifts in pollination syndrome are well understood from a genetic standpoint (Preston et al., 2011). Herbaceous species in this group can easily be cultivated and will provide a tractable system for our objective. Gene expression data in the form of RNA-Seq can measure specific phases of development, resulting in function-valued traits (i.e. we consider gene expression data to be a phenotype).

Ultimately, the outcome will be an implementation of all three objectives in one novel software package, which will be of considerable value to researchers in evolution and ecology.

Image Captions

Figure 1: Individual nucleotide sites evolve through point mutations. Each gene evolves according to duplication, loss and transfer events. Species and their genomes evolve according to a diversification process. When attempting to infer a species tree, only a fraction of the existing species and their genomes can be sampled. The PhD project will allow the use of phenotypic/trait data. Traits can be binary, discrete, continuous or even function-valued.


The implementation of polymorphism-aware trait evolution in a Bayesian framework provides a new, flexible way to model evolutionary processes and obtain reliable estimates of biological parameters. The PhD project will couple the PoMo approach with numerical methods, such as Markov chain Monte Carlo (MCMC), for approximating the posterior probability distribution of parameters. Bayesian inference methods can be extremely powerful and have revolutionized the range of evolutionary questions that can be tackled. In particular, the Bayesian framework allows us to integrate different types of data: the molecular sequence data and (importantly) the phenotypic/trait data. As the Bayesian implementations are anything but trivial, we will collaborate with Sebastian H (Ludwig Maximilian University Munich, Germany) and Tracy Heath (Iowa State University, US). Both are internationally known for their contributions to the RevBayes software project (H et al., 2016), and will provide training for the student in this field.
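The idea of approximating a posterior distribution by MCMC can be illustrated with a small Metropolis sampler. The sketch below is a toy example and does not use RevBayes or the PoMo likelihood: it assumes a symmetric two-state trait model, an Exponential(1) prior on the transition rate, and a made-up data set of sister-species pairs (`PAIRS`) recording divergence time and whether the trait differs. All names and values are hypothetical.

```python
import math
import random

# Hypothetical data: (divergence time, trait differs between the two species).
PAIRS = [(0.5, False), (1.0, True), (2.0, True), (0.2, False), (1.5, False)]


def log_posterior(rate, pairs):
    """Unnormalised log posterior of the transition rate."""
    if rate <= 0:
        return -math.inf                 # outside the support of the prior
    log_prior = -rate                    # Exponential(1) prior, assumed
    log_lik = 0.0
    for time, differs in pairs:
        p_diff = 0.5 * (1.0 - math.exp(-2.0 * rate * time))
        log_lik += math.log(p_diff if differs else 1.0 - p_diff)
    return log_prior + log_lik


def metropolis(pairs, n_iter=5000, step=0.3, seed=1):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    rate = 1.0
    current = log_posterior(rate, pairs)
    samples = []
    for _ in range(n_iter):
        proposal = rate + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, pairs)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        samples.append(rate)
    return samples


samples = metropolis(PAIRS)
kept = samples[1000:]                    # discard an arbitrary burn-in
print("posterior mean rate:", sum(kept) / len(kept))
```

In a joint analysis, the log likelihood would combine sequence and trait data, which is where the Bayesian framework makes it natural to integrate both data types.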

Project Timeline

Year 1

Develop new Bayesian tools for polymorphic binary traits and select plant groups as a proof of principle. Extract DNA and RNA samples and send them to a NERC sequencing centre (e.g. the NERC Environmental Omics Facility (NEOF)).

Year 2

Further develop the software to handle multiple discrete traits, test the software, and apply it to plant and pollinator data sets. Clean and process the raw RNA-Seq data.

Year 3

Integrate continuous RNA-Seq expression data into the Bayesian approach. Analyse the RNA-Seq data.

Year 3.5

Integrate the approach into a user-friendly environment and write up the thesis.

Training & Skills

The student will receive training in (1) development and programming of Bayesian approaches for phylogenomics; (2) analysis of genome-wide next-generation sequencing data sets; and (3) plant trait database building.

References & further reading

Borges, R., …’si, G. J. & Kosiol, C. (2019). Quantifying GC-Biased Gene Conversion in Great Ape Genomes Using Polymorphism-Aware Models. Genetics 212:1321-1336.

Degnan, J.H. & Rosenberg, N.A. (2006). Discordance of species trees with their most likely gene trees. PLoS Genetics 2: e68.

De Maio, N., Schrempf, D. & Kosiol, C. (2015). PoMo: An allele frequency-based approach for species tree estimation. Systematic Biology 64: 1018-1031.

S., Landis, M.J., Heath, T.A., Boussau, B., Lartillot, N., Moore, B.R., Huelsenbeck, J.P. & Ronquist, F. (2016). RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Systematic Biology 65: 726-736.

Johnson, S.D. (2006). Pollinator-driven speciation in plants. In: Ecology and evolution of flowers, pp. 295-310. Oxford: Oxford University Press.

Park, S. et al. (2010). Genome-wide discovery of DNA polymorphism in Brassica rapa. Molecular Genetics and Genomics 283: 135-145.

Preston, J.C., Martinez, C.C. & Hileman, L.C. (2011). Gradual disintegration of the floral symmetry gene network is implicated in the evolution of a wind-pollination syndrome. Proceedings of the National Academy of Sciences of the USA 108: 2343-2348.

Rogers, J., Raveendran, M., Harris, R.A., Mailund, T., Lepp, K., Athanasiadis, G., Schierup, M.H., Cheng, J., Munch, K., Walker, J.A. & Konkel, M.K. (2019). The comparative genomics and complex population history of Papio baboons. Science Advances 5: eaau6947.

Servedio, M.R., Van Doorn, G.S., Kopp, M., Frame, A.M. & Nosil, P. (2011). Magic traits in speciation: 'magic' but not rare? Trends in Ecology & Evolution 26: 389-397.
