Software
We develop software for population genetics, including tools for simulation, inference, and infrastructure.
tskit
We’re core members of the community building tools for working with tree sequences, an exciting and highly efficient format for storing and working with genome sequences as well as the underlying genealogical trees. All of this work is led by Jerome Kelleher at Oxford. See tskit.dev for more about the community and the tools; in particular, see:
- tskit: manipulate, plot, and analyse tree sequences
- pyslim: interface with SLiM’s tree sequences
- msprime: a coalescent simulator
stdpopsim
We are also core members of the PopSim Consortium. The main product (so far) of the consortium is stdpopsim, which aims to make it easier for you to run reproducible, bug-free simulations of genetic datasets from published demographic histories, genetic maps, and more.
- find the code on github: github.com/popsim-consortium/stdpopsim
- and the documentation: stdpopsim.readthedocs.io/en/latest/introduction.html
locator
locator is a supervised machine learning method for predicting the geographic origin of a sample from genotype or sequencing data.
- find it on github: github.com/kr-colab/locator
ReLERNN
ReLERNN uses deep learning to infer the genome-wide landscape of recombination from as few as four individually sequenced chromosomes, or from allele frequencies inferred by pooled sequencing.
- find it on github: github.com/kr-colab/ReLERNN
disperseNN2
A deep learning toolset for estimating the mean per-generation dispersal distance from georeferenced SNPs.
- find it on github: github.com/kr-colab/disperseNN
popvae
popVAE fits a variational autoencoder (VAE) to a set of genotypes and outputs the latent space. A manuscript describing popVAE’s methods and testing it on several empirical datasets is published in G3.
- find it on github: github.com/kr-colab/popvae
lostruct
lostruct is a method that describes how relatedness varies along the genome, by running PCA on windows of the genome and visualizing similarities among the results. Its most important use might be to look for segregating inversions.
- find it on github: github.com/petrelharp/local_pca
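The windowed-PCA idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not lostruct's actual code or interface: the function names (`window_lowrank`, `window_distances`, `classical_mds`), the random genotype matrix, the window size, and the choice of a Frobenius distance between rank-k approximations are all made up for the example.

```python
import numpy as np

def window_lowrank(genotypes, k=2):
    """Rank-k summary of sample relatedness in one window.

    genotypes: (n_snps, n_samples) array of allele counts (hypothetical input).
    """
    # Center each SNP across samples, then form a sample-by-sample covariance.
    centered = genotypes - genotypes.mean(axis=1, keepdims=True)
    cov = centered.T @ centered / (genotypes.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]             # indices of the k largest
    vals, vecs = eigvals[top], eigvecs[:, top]
    approx = (vecs * vals) @ vecs.T                 # rank-k approximation of cov
    return approx / np.linalg.norm(approx)          # scale so windows are comparable

def window_distances(genotypes, window_size, k=2):
    """Pairwise Frobenius distances between the per-window PCA summaries."""
    n_windows = genotypes.shape[0] // window_size
    summaries = [
        window_lowrank(genotypes[i * window_size:(i + 1) * window_size], k)
        for i in range(n_windows)
    ]
    dist = np.zeros((n_windows, n_windows))
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            dist[i, j] = dist[j, i] = np.linalg.norm(summaries[i] - summaries[j])
    return dist

def classical_mds(dist, dims=2):
    """Embed windows in a few dimensions so similar windows sit close together."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dims]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))

# Toy data: 1000 SNPs by 20 samples of random genotypes (made up for illustration).
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(1000, 20)).astype(float)
dist = window_distances(genotypes, window_size=100)
coords = classical_mds(dist)   # one row per window, ready to plot
```

On real data, windows whose coordinates cluster away from the rest of the genome are the ones worth a closer look, for example as candidate inversions.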
diploS/HIC
diploS/HIC uses a deep convolutional neural network to identify hard and soft selective sweeps in population genomic data.
- find it on github: github.com/kr-colab/diploSHIC
FILET
FILET (Finding Introgressed Loci using Extra-Trees) identifies introgressed loci using an Extra-Trees classifier.
- find it on github: github.com/kr-colab/FILET
discoal
discoal is a coalescent simulation program capable of simulating models with recombination, selective sweeps, and demographic changes including population splits and admixture events. Most of its functionality has since been incorporated into msprime, which is much faster.
- find it on github: github.com/kr-colab/discoal
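For readers new to coalescent simulation, here is a minimal sketch of the basic idea in plain Python. It is not how discoal or msprime are implemented: it covers only the simplest neutral, constant-size coalescent with no recombination, sweeps, or demography, and the function name `simulate_coalescent_times` is made up for this example.

```python
import random

def simulate_coalescent_times(n_samples, rng):
    """Waiting times between coalescence events for n_samples lineages.

    Times are in coalescent units (2N generations for a diploid population
    of size N). While k lineages remain, the next coalescence happens after
    an exponential waiting time with rate k * (k - 1) / 2.
    """
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times

rng = random.Random(42)
n_samples = 10
replicates = 2000

# Time to the most recent common ancestor (TMRCA) for each replicate.
tmrcas = [sum(simulate_coalescent_times(n_samples, rng)) for _ in range(replicates)]
mean_tmrca = sum(tmrcas) / replicates

# Theory: the expected TMRCA is 2 * (1 - 1/n) in the same units.
expected_tmrca = 2 * (1 - 1 / n_samples)
print(f"simulated mean TMRCA: {mean_tmrca:.3f}, theoretical: {expected_tmrca:.3f}")
```

The simulated mean should land close to the theoretical value; for real work, use msprime or discoal, which also handle recombination, selection, and changing demography.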
shIC
The Soft/Hard Inference through Classification tool (S/HIC). Superseded by diploS/HIC.
- find it on github: github.com/kr-colab/shIC
templater
A tool for writing and compiling reports in R+markdown.
- find it on github: github.com/petrelharp/templater
landsim
An R package for simulating populations on landscapes. (Out of date: use SLiM!)
- find it on github: github.com/petrelharp/landsim