{ "cells": [ { "cell_type": "markdown", "id": "224b2831", "metadata": {}, "source": [ "# Genetic diversity, with trees" ] }, { "cell_type": "markdown", "id": "f7376884", "metadata": {}, "source": [ "Previously, we measured genetic diversity\n", "using expected heterozygosity,\n", "which is the proportion of sites that differ between two randomly chosen genomes.\n", "For a genome of length $L$, with allele frequency $p_i$ at the $i^\\text{th}$ site, this is\n", "\n", "$$\n", " \\pi = \\frac{1}{L}\\sum_{i=1}^L 2 p_i (1-p_i) .\n", "$$\n", "\n", "We'll now derive this in a different way.\n", "First, let's think about where the differences between two genomes come from.\n", "As we saw before, they come from mutation:\n", "concretely, from mutations that happened\n", "somewhere on the paths leading from the two genomes\n", "back up to their common ancestor.\n", "(At a given site, if there weren't any mutations on those paths, then the genomes would be identical;\n", "if there was exactly one mutation, then they differ;\n", "and if there was more than one mutation, then it depends,\n", "but this is rare and we mostly ignore it.)\n", "\n", "Let's have a look at this in a small example."
] }, { "cell_type": "code", "execution_count": 1, "id": "1c8d84e1", "metadata": {}, "outputs": [], "source": [ "%load_ext slim_magic\n", "\n", "import tskit\n", "import pyslim\n", "import pandas as pd\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "from IPython.display import display, SVG" ] }, { "cell_type": "code", "execution_count": 2, "id": "554183e0", "metadata": {}, "outputs": [], "source": [ "%%slim_ts --out ts\n", "initialize()\n", "{\n", "    setSeed(123);\n", "    initializeTreeSeq();\n", "    initializeMutationRate(7e-8);\n", "    // m1: neutral mutations (fixed selection coefficient of 0)\n", "    initializeMutationType(\"m1\", 0.5, \"f\", 0.0);\n", "    initializeGenomicElementType(\"g1\", c(m1), c(1.0));\n", "    // a single genomic element covering positions 0 to 99999 (100 kb)\n", "    initializeGenomicElement(g1, 0, 99999);\n", "    initializeRecombinationRate(1e-8);\n", "    suppressWarnings(T);\n", "}\n", "\n", "// one population of 500 diploid individuals\n", "1 {\n", "    sim.addSubpop(\"p1\", 500);\n", "}\n", "\n", "// after 3000 generations, write out the tree sequence and stop\n", "3000 late() {\n", "    sim.treeSeqOutput(\"tmp.trees\");\n", "    sim.simulationFinished();\n", "}" ] }, { "cell_type": "markdown", "id": "5a56a644", "metadata": {}, "source": [ "What we get here is a *tree sequence*:\n", "see [this tutorial](https://tskit.dev/tutorials/intro.html) for an introduction,\n", "and [the documentation](https://tskit.dev/tskit/docs/stable/introduction.html)\n", "for what you can do with them." ] }, { "cell_type": "code", "execution_count": 3, "id": "719c6633", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n",
"| | |\n",
"|---|---|\n",
"| Trees | 15 |\n",
"| Sequence Length | 100000.0 |\n",
"| Time Units | generations |\n",
"| Sample Nodes | 1000 |\n",
"| Total Size | 208.9 KiB |\n",
"| Metadata | SLiM: file_version: 0.7; generation: 3000; model_type: WF; nucleotide_based: False; separate_sexes: False; spatial_dimensionality: ; spatial_periodicity: ; stage: late |\n",
"\n",
"| Table | Rows | Size | Has Metadata |\n",
"|---|---|---|---|\n",
"| Edges | 1855 | 58.0 KiB | |\n",
"| Individuals | 500 | 50.6 KiB | ✅ |\n",
"| Migrations | 0 | 8 Bytes | |\n",
"| Mutations | 129 | 8.5 KiB | ✅ |\n",
"| Nodes | 1818 | 68.1 KiB | ✅ |\n",
"| Populations | 2 | 2.4 KiB | ✅ |\n",
"| Provenances | 1 | 2.1 KiB | |\n",
"| Sites | 129 | 3.0 KiB | |\n",
"
\n",
"| | |\n",
"|---|---|\n",
"| Trees | 6 |\n",
"| Sequence Length | 100000.0 |\n",
"| Time Units | generations |\n",
"| Sample Nodes | 6 |\n",
"| Total Size | 112.7 KiB |\n",
"| Metadata | SLiM: file_version: 0.7; generation: 3000; model_type: WF; nucleotide_based: False; separate_sexes: False; spatial_dimensionality: ; spatial_periodicity: ; stage: late |\n",
"\n",
"| Table | Rows | Size | Has Metadata |\n",
"|---|---|---|---|\n",
"| Edges | 25 | 808 Bytes | |\n",
"| Individuals | 3 | 2.1 KiB | ✅ |\n",
"| Migrations | 0 | 8 Bytes | |\n",
"| Mutations | 39 | 3.3 KiB | ✅ |\n",
"| Nodes | 14 | 1.2 KiB | ✅ |\n",
"| Populations | 1 | 2.3 KiB | ✅ |\n",
"| Provenances | 2 | 2.5 KiB | |\n",
"| Sites | 39 | 991 Bytes | |\n",
"