Probabilities of fixation¶
The total influx of mutations into a population scales with the population size, since each new individual brings a chance of a new mutation. Concretely, in a diploid population of size \(N\) with a genome of length \(L\) and mutation rate \(\mu\) per base per generation, an average of \(2N \mu L\) new mutations enter the population each generation. However, we showed that a neutral mutation has probability \(p_f(0) = 1/(2N)\) of reaching fixation: so, the rate of fixation of new, neutral mutations does not depend on population size; it is just \(\mu\) per base pair per generation. This makes sense, because if you trace back the lineage of an individual through time, then any mutations that occurred on this lineage farther back than the population’s TMRCA will be fixed, and this lineage only passes through one genome per generation.
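To make the arithmetic concrete, here is a quick sketch with made-up illustrative parameter values (none of these numbers come from the text):

```python
# Illustrative, made-up parameter values
N = 10_000          # diploid population size
L = 100_000_000     # genome length in bp
mu = 1e-8           # mutation rate per bp per generation

new_mutations = 2 * N * L * mu          # expected new mutations per generation
p_fix = 1 / (2 * N)                     # fixation probability of one neutral mutation
fixation_rate = new_mutations * p_fix   # expected fixations per generation

print(new_mutations)   # 20000.0
print(fixation_rate)   # 1.0 -- equals mu * L, with no dependence on N
```

Doubling \(N\) doubles the influx of new mutations but halves each one’s chance of fixing, so the neutral substitution rate stays at \(\mu L\) per generation.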
But, what about natural selection? Since a beneficial allele’s frequency tends to increase, positive selection ought to increase the chance of fixation. Here, we’ll work out the probability of fixation of a newly mutated allele with additive selection coefficient \(s\).
A branching process¶
To do this, we’ll start with a simple calculation that ignores many of the details. Forget about the structure of the population (especially diploidy!) and imagine a single “thing” of some kind that reproduces (by itself) and dies, and then each of its offspring does the same. The “thing” could be a viral infection (so “reproduction” is infecting someone else), or an individual of a species just arrived in a new habitat, or a newly mutated, selected allele. As the thing and its offspring reproduce, its family will either die out or grow indefinitely (i.e., until something else happens, like herd immunity). Let’s call \(p_f\) the probability that the family grows indefinitely.
Suppose the original thing has \(x\) offspring. This thing’s family dies out if and only if the families of all \(x\) offspring die out. If the fates of those \(x\) families are independent of each other, then this happens with probability
$$(1 - p_f)^x .$$
(Note that this works with \(x=0\), in which case the family dies out immediately.) But, of course, the number of offspring is random: let’s call \(X\) the random number of offspring. Averaging over the value of \(X\), we get that
$$1 - p_f = \mathbb{E}\left[ (1 - p_f)^X \right] .$$
Hey! That’s an equation that we can (maybe) solve for \(p_f\)! But to do it, we’ll need to figure out what the distribution of \(X\) is.
Wright-Fisher with selection¶
Let’s apply this to the selected allele. As usual, it’s much easier to work with a haploid model, so that when we say “individual” really we mean “individual copy of a chromosome within a diploid individual”.
One way to motivate the Wright-Fisher model is to suppose that (a) every individual produces a very large number of gametes, but there’s only room in the population for exactly \(2N\) adults, so (b) the next generation is formed by a random draw of \(2N\) of those gametes. Recall the useful fact that if each of a large number of independent trials succeeds with small probability, then the total number of successes is approximately Poisson distributed. Here, each of the \(2N\) spots in the next generation is filled by a given individual’s gamete with probability \(1/(2N)\), so the number of offspring of that individual is Binomial\((2N, 1/(2N))\), which has mean 1, and a good approximation to this is Poisson(1).
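For instance (a quick numerical sketch, with an arbitrary choice of \(N\)), the Binomial\((2N, 1/(2N))\) and Poisson(1) offspring distributions agree closely even for moderate \(N\):

```python
from math import comb, exp, factorial

N = 100  # arbitrary illustrative population size
n, p = 2 * N, 1 / (2 * N)

def binom_pmf(k):
    # P(k offspring) under Binomial(2N, 1/(2N))
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam=1.0):
    # P(k offspring) under the Poisson(1) approximation
    return exp(-lam) * lam**k / factorial(k)

for k in range(5):
    print(k, f"{binom_pmf(k):.5f}", f"{poisson_pmf(k):.5f}")
```

Already at \(N = 100\) the two columns agree to within about \(10^{-3}\), and the agreement improves as \(N\) grows.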
Now, suppose that the beneficial allele contributes a larger proportion of the gamete pool: each copy of the beneficial allele makes \(1+s\) times as many gametes as the original allele does. Then, if there are \(k\) copies of the beneficial allele, the chance that a given offspring is drawn from this pool of beneficial alleles is
$$\begin{aligned} \frac{ k (1+s)}{ k (1+s) + (2N - k) } &= \frac{ k }{2N + ks} (1 + s) \\ &\approx \frac{k}{2N} (1 + s) . \end{aligned}$$
In other words, *each* beneficial allele has a Binomial\((2N, (1+s)/(2N))\) number of offspring -- approximately, a Poisson\((1+s)\) number.
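As a preview of where this is heading, here is a small Monte Carlo sketch (all parameter choices are arbitrary, not from the text) that simulates the branching process with Poisson\((1+s)\) offspring numbers and estimates the survival probability \(p_f\) directly:

```python
import random
from math import exp

random.seed(42)

def poisson(lam):
    # Knuth's method for drawing a Poisson(lam) random variate (stdlib only)
    L = exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def survives(s, max_gens=500, cap=100):
    """One branching process started from a single copy, with Poisson(1+s)
    offspring numbers.  Returns True if the family escapes extinction
    (reaching `cap` copies, after which dying out is vanishingly unlikely)."""
    k = 1
    for _ in range(max_gens):
        if k == 0:
            return False
        if k >= cap:
            return True
        # the sum of k independent Poisson(1+s) draws is Poisson(k*(1+s))
        k = poisson(k * (1 + s))
    return k > 0

s = 0.05
trials = 10_000
p_hat = sum(survives(s) for _ in range(trials)) / trials
print(p_hat)
```

With these settings the estimate lands near 0.09 for \(s = 0.05\), in line with the approximation worked out in the next section.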
Putting it together¶
Now let’s try to solve our equation for \(p_f\),
$$1 - p_f = \mathbb{E}\left[ (1 - p_f)^X \right] ,$$
in the case that \(X\) is Poisson with mean \(1+s\). Writing this out, and recognizing the power series for the exponential, \(\exp(x) = \sum_{k \ge 0} x^k / k!\),
$$\begin{aligned} \mathbb{E}\left[ (1 - p_f)^X \right] &= \sum_{k \ge 0} e^{-(1+s)} \frac{(1+s)^k}{k!} (1 - p_f)^k \\ &= e^{-(1+s)} \exp\big( (1+s)(1 - p_f) \big) \\ &= e^{-(1+s) p_f} . \end{aligned}$$
(The other way to evaluate this sum would be to look up the probability generating function of the Poisson distribution.)
Plugging this in to our equation, we have that
$$1 - p_f = e^{-(1+s) p_f} .$$
Unfortunately, there’s no simple way to write down the solution to this equation. We could solve it numerically, but so that we have something that’s easy to think about and remember, let’s approximate: since \(\exp(x) \approx 1 + x + x^2 / 2\),
$$1 - p_f \approx 1 - (1+s) p_f + \frac{(1+s)^2 p_f^2}{2} .$$
Rearranging, this is
$$p_f \left( s - \frac{(1+s)^2 p_f}{2} \right) \approx 0 .$$
This has two solutions: \(p_f = 0\) or \( p_f = \frac{2s}{(1+s)^2}\). We have already discarded all but the leading power of \(s\) a few times, so we might as well continue:
$$p_f \approx 2s .$$
We’ve shown that in a Wright-Fisher population of constant size, a new mutation with additive effect \(s\) on fitness has probability approximately \(2s\) of becoming common in the population (for small \(s > 0\)). Notice that this doesn’t depend on \(N\)!
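To see how good these approximations are, here is a short numerical check (a sketch, not from the text) that solves \(1 - p_f = e^{-(1+s) p_f}\) by fixed-point iteration and compares the root to \(2s/(1+s)^2\) and \(2s\):

```python
from math import exp

def survival_prob(s, tol=1e-12, max_iter=100_000):
    """Solve 1 - p = exp(-(1+s) p) for the positive root by iterating
    p -> 1 - exp(-(1+s) p), starting from p = 1/2."""
    p = 0.5
    for _ in range(max_iter):
        p_new = 1.0 - exp(-(1 + s) * p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

for s in (0.01, 0.05, 0.1):
    exact = survival_prob(s)
    print(f"s={s}: exact={exact:.4f}, 2s/(1+s)^2={2*s/(1+s)**2:.4f}, 2s={2*s:.4f}")
```

For \(s = 0.05\), the numerical root is about 0.094, while \(2s/(1+s)^2 \approx 0.091\) and \(2s = 0.1\): the \(2s\) rule slightly overestimates, but captures the scaling.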
Exercise: We showed that \(p_f\) is a fixed point of the function \(\phi(u) = 1 - \exp(-u(1+s))\), i.e., \(\phi(p_f) = p_f\). Pick a value of \(s\) and make a plot of this function for \(0 \le u \le 1\). Add the \(y=x\) line: fixed points are where the plot of \(\phi\) intersects this line. You can find the fixed point numerically by iterating: start with \(u = 1/2\) and apply \(\phi\) until convergence, so that \(p_f = \phi(\cdots \, \phi(\phi(1/2))\cdots )\). Add this to the plot. Do the same thing for a negative value of \(s\). What happens?