We develop software for population genetics, including tools for simulation, inference, and infrastructure.


We’re core members of the community building tools for working with tree sequences, an exciting and highly efficient format for storing and working with genome sequences together with the underlying genealogical trees. All of this work is led by Jerome Kelleher at Oxford. See the project website for more about the community and the tools; in particular, see:

  • tskit: manipulate, plot, and analyse tree sequences
  • pyslim: interface with SLiM’s tree sequences
  • msprime: a coalescent simulator


We are also core members of the PopSim Consortium. The main product (so far) of the consortium is stdpopsim, which aims to make it easier for you to run reproducible, bug-free simulations of genetic datasets from published demographic histories, genetic maps, and more.


locator is a supervised machine learning method for predicting the geographic origin of a sample from genotype or sequencing data.


ReLERNN uses deep learning to infer the genome-wide landscape of recombination from as few as four individually sequenced chromosomes, or from allele frequencies inferred by pooled sequencing.


popVAE fits a variational autoencoder (VAE) to a set of genotypes and outputs the latent space. A manuscript describing popVAE’s methods and testing it on several empirical datasets is published in G3.


lostruct is a method that describes how relatedness varies along the genome, by running PCA in windows along the genome and visualizing similarities between the results. The most important use for it might be to look for segregating inversions.
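
The windowed-PCA idea can be sketched in a few lines of NumPy. This is a simplified illustration, not lostruct's actual implementation: the function names are hypothetical, the genotypes are random stand-in data, and the specific choices (non-overlapping windows, top two principal components, Frobenius distance between normalized low-rank covariance approximations, classical MDS for visualization) are assumptions made for the example.

```python
import numpy as np

def window_lowrank(window, k=2):
    """Rank-k approximation of the sample covariance for one window.

    window: (n_sites, n_samples) array of genotypes.
    """
    # Center each site across samples, then form the sample-by-sample covariance.
    x = window - window.mean(axis=1, keepdims=True)
    cov = x.T @ x / (x.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    top = np.argsort(vals)[::-1][:k]          # indices of the k largest eigenvalues
    vals, vecs = vals[top], vecs[:, top]
    vals = vals / np.abs(vals).sum()          # normalize so windows are comparable
    return (vecs * vals) @ vecs.T

def window_distances(genotypes, window_size, k=2):
    """Pairwise Frobenius distances between windows' low-rank approximations."""
    n_windows = genotypes.shape[0] // window_size
    approx = [
        window_lowrank(
            genotypes[i * window_size:(i + 1) * window_size].astype(float), k
        )
        for i in range(n_windows)
    ]
    dist = np.zeros((n_windows, n_windows))
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            dist[i, j] = dist[j, i] = np.linalg.norm(approx[i] - approx[j])
    return dist

def classical_mds(dist, dims=2):
    """Embed windows in a few dimensions so similar windows sit close together."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dims]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))

# Stand-in data: 1000 sites x 20 samples of random genotypes (0, 1, or 2).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(1000, 20))

dist = window_distances(genotypes, window_size=100)
coords = classical_mds(dist)
print(coords.shape)  # (10, 2): one point per window
```

With real data, windows whose points cluster away from the rest in the MDS plot are the ones where relatedness patterns differ, which is the kind of signal a segregating inversion could produce.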


diploS/HIC uses a deep convolutional neural network to identify hard and soft selective sweeps in population genomic data.


FILET (Finding Introgressed Loci using Extra-Trees)


discoal is a coalescent simulation program capable of simulating models with recombination, selective sweeps, and demographic changes including population splits and admixture events. Most of its functionality has since been incorporated into msprime, which is much faster.


The Soft/Hard Inference through Classification tool (S/HIC). Superseded by diploS/HIC.


A tool for writing and compiling reports in R+markdown.


An R package for simulating populations on landscapes. (Out of date: use SLiM!)