Software

We develop software for population genetics, including tools for simulation, inference, and infrastructure.

tskit

We’re core members of the community building tools for working with tree sequences, an exciting and highly efficient format for storing and working with genome sequences as well as the underlying genealogical trees. All of this work is led by Jerome Kelleher at Oxford. See tskit.dev for more about the community and the tools; in particular, see:

  • tskit: manipulate, plot, and analyse tree sequences
  • pyslim: interface with SLiM’s tree sequences
  • msprime: a coalescent simulator

stdpopsim

We are also core members of the PopSim Consortium. The main product (so far) of the consortium is stdpopsim, which aims to make it easier for you to run reproducible, bug-free simulations of genetic datasets from published demographic histories, genetic maps, and more.

locator

locator is a supervised machine learning method for predicting the geographic origin of a sample from genotype or sequencing data.

ReLERNN

ReLERNN uses deep learning to infer the genome-wide landscape of recombination from as few as four individually sequenced chromosomes, or from allele frequencies inferred by pooled sequencing.

disperseNN2

A deep learning toolset for estimating the mean per-generation dispersal distance from georeferenced SNPs.

popvae

popVAE fits a variational autoencoder (VAE) to a set of genotypes and outputs the latent space. A manuscript describing popVAE’s methods and testing it on several empirical datasets is published in G3 (link).

lostruct

lostruct is a method that describes how relatedness varies along the genome, by running PCA on genomic windows and visualizing how similar the results are from window to window. Its most important use might be to look for segregating inversions.
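The windowed-PCA idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not lostruct's actual code or API: the function names (window_pca, low_rank, window_distances), the per-site centering, the unit-norm scaling, and the synthetic genotypes are all hypothetical choices made for the example. Each window is summarized by a low-rank approximation of its sample covariance matrix, and windows are then compared by the distance between those summaries.

```python
import numpy as np


def window_pca(genotypes, k=2):
    """Top-k eigenvalues and eigenvectors of the sample covariance in one window.

    genotypes: array of shape (n_sites, n_samples) holding allele counts.
    """
    # Center each site so the covariance reflects similarity between samples.
    centered = genotypes - genotypes.mean(axis=1, keepdims=True)
    cov = centered.T @ centered / max(genotypes.shape[0] - 1, 1)
    vals, vecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]


def low_rank(vals, vecs):
    """Rank-k covariance approximation, scaled to unit Frobenius norm.

    The scaling is an assumption of this sketch; it keeps windows comparable
    even when their overall amount of variation differs.
    """
    approx = (vecs * vals) @ vecs.T
    norm = np.linalg.norm(approx)
    return approx / norm if norm > 0 else approx


def window_distances(genotypes, window_size, k=2):
    """Pairwise Frobenius distances between per-window PCA summaries."""
    n_windows = genotypes.shape[0] // window_size
    summaries = [
        low_rank(*window_pca(genotypes[w * window_size:(w + 1) * window_size], k))
        for w in range(n_windows)
    ]
    dist = np.zeros((n_windows, n_windows))
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            dist[i, j] = dist[j, i] = np.linalg.norm(summaries[i] - summaries[j])
    return dist


# Synthetic data (made up for illustration): three windows in which samples
# are structured one way, followed by three windows structured another way.
rng = np.random.default_rng(0)
n_samples, window_size = 20, 200
split_a = np.repeat([0, 1], n_samples // 2)  # first half vs second half
split_b = np.tile([0, 1], n_samples // 2)    # alternating samples


def simulate_window(groups):
    # Allele frequency is 0.1 in one group and 0.9 in the other.
    freqs = np.where(groups == 1, 0.9, 0.1)
    return rng.binomial(2, freqs, size=(window_size, n_samples))


genotypes = np.vstack(
    [simulate_window(split_a) for _ in range(3)]
    + [simulate_window(split_b) for _ in range(3)]
)
dist = window_distances(genotypes, window_size)
print(np.round(dist, 2))
```

In the printed matrix, windows that share the same underlying structure should sit close together and the two blocks should sit far apart. A low-dimensional embedding of such a distance matrix (for example with multidimensional scaling) is one way to visualize where along the genome the patterns of relatedness change.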

diploS/HIC

diploS/HIC uses a deep convolutional neural network to identify hard and soft selective sweeps in population genomic data.

FILET

FILET (Finding Introgressed Loci using Extra-Trees) is a machine learning method for finding introgressed loci using an Extra-Trees classifier.

discoal

discoal is a coalescent simulation program capable of simulating models with recombination, selective sweeps, and demographic changes including population splits and admixture events. Most of its functionality has since been incorporated into msprime, which is much faster.

shIC

The Soft/Hard Inference through Classification tool (S/HIC). Superseded by diploS/HIC.

templater

A tool for writing and compiling reports in R+markdown.

landsim

An R package for simulating populations on landscapes. (Out of date: use SLiM!)