The Wright-Fisher Generation Cycle
The escapement of the forge: one tick, one generation.
The Wright-Fisher model is the oldest and simplest model of a finite population. Each generation, \(N\) offspring are produced by drawing parents at random (with replacement) from the current population. Add mutations, recombination, and fitness-proportional parent selection, and you have the engine that drives SLiM.
In this chapter, we build a minimal Wright-Fisher simulator from scratch. It will not be fast – SLiM’s C++ implementation is heavily optimized with mutation runs, fitness caching, and template specialization – but it will be correct, and every line of code will correspond to a line of math.
Step 1: The Population and Its Genomes
A diploid individual carries two haplosomes (chromosome copies). Each haplosome is a list of mutations, where each mutation has a position on the chromosome, a selection coefficient \(s\), and a dominance coefficient \(h\).
import numpy as np
from dataclasses import dataclass, field
@dataclass
class Mutation:
"""A single mutation on a haplosome.
Each mutation has a genomic position, a selection coefficient s,
and a dominance coefficient h. The mutation also records when
(which generation) and where (which individual) it arose.
"""
position: int # base-pair position on the chromosome
s: float # selection coefficient (>0 beneficial, <0 deleterious)
h: float # dominance coefficient (0.5 = codominant)
origin_tick: int = 0 # generation when the mutation arose
@dataclass
class Individual:
"""A diploid individual carrying two haplosomes.
Each haplosome is a sorted list of Mutation objects. Fitness
is computed from the combined effects of all mutations.
"""
haplosome_1: list = field(default_factory=list)
haplosome_2: list = field(default_factory=list)
fitness: float = 1.0
This is the data model. Now for the dynamics.
Step 2: Fitness Calculation
An individual’s fitness is the product of the effects of all its mutations. The key subtlety is dominance: a mutation’s effect depends on whether the individual carries it on one haplosome (heterozygous) or both (homozygous).
For a mutation with selection coefficient \(s\) and dominance coefficient \(h\), the fitness contribution is
\[
f(m) = \begin{cases} 1 + hs & \text{if heterozygous} \\ 1 + s & \text{if homozygous} \end{cases}
\]
The individual’s total fitness is the product over all mutations:
\[
w = \prod_{m} f(m)
\]
where \(f(m)\) is the fitness contribution of mutation \(m\), taking into account its zygosity.
Probability Aside – Why multiplicative fitness?
Multiplicative fitness means that the effects of mutations are independent on a log scale: \(\log w = \sum \log f(m)\). This is the standard model in population genetics (it is the simplest assumption, and SLiM uses it by default). Alternative models include additive fitness (\(w = 1 + \sum s_m\)) and epistatic fitness (where the effect of one mutation depends on which other mutations are present). Multiplicative fitness is a good approximation when \(|s|\) is small, since \(\prod(1 + s_i) \approx 1 + \sum s_i\) for small \(s_i\).
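The small-\(|s|\) approximation is easy to check numerically. A quick sketch (the selection coefficients are arbitrary illustrative values):

```python
import numpy as np

# Arbitrary small selection coefficients for illustration
s_values = [0.01, -0.005, 0.02, -0.01]

multiplicative = np.prod([1 + s for s in s_values])
additive = 1 + sum(s_values)

print(f"multiplicative: {multiplicative:.6f}")
print(f"additive:       {additive:.6f}")
# The two agree closely because every |s| is small; the gap grows
# with the products s_i * s_j, which are second-order terms.
```
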
Closing a confusion gap – Dominance
Dominance coefficient \(h\) interpolates between recessive (\(h = 0\)) and dominant (\(h = 1\)):
\(h = 0\): The mutation has no effect in heterozygotes (fully recessive). Only homozygous carriers are affected.
\(h = 0.5\): The heterozygote effect is half the homozygote effect (codominant/additive). This is the most common default.
\(h = 1\): The heterozygote has the full effect (fully dominant).
In SLiM’s source code, the cached fitness for a heterozygous mutation
is 1 + h * s, and for a homozygous mutation it is 1 + s. The
fitness value is clamped to a minimum of 0 (an individual cannot have
negative fitness).
def calculate_fitness(individual):
"""Compute fitness as the product of all mutation effects.
For each mutation, we check whether the individual is homozygous
(mutation on both haplosomes) or heterozygous (mutation on one).
This mirrors SLiM's core fitness calculation in population.cpp.
"""
w = 1.0
# Collect positions of mutations on each haplosome
positions_1 = {m.position for m in individual.haplosome_1}
positions_2 = {m.position for m in individual.haplosome_2}
# Mutations on haplosome 1
for m in individual.haplosome_1:
if m.position in positions_2:
# Homozygous: mutation present on both haplosomes
w *= max(0.0, 1.0 + m.s)
else:
# Heterozygous: mutation only on haplosome 1
w *= max(0.0, 1.0 + m.h * m.s)
# Mutations on haplosome 2 that are NOT on haplosome 1
for m in individual.haplosome_2:
if m.position not in positions_1:
# Heterozygous: mutation only on haplosome 2
w *= max(0.0, 1.0 + m.h * m.s)
# Clamped to zero (SLiM also does this: fitness cannot be negative)
individual.fitness = max(0.0, w)
return individual.fitness
# Example: one beneficial heterozygous mutation
ind = Individual()
ind.haplosome_1 = [Mutation(position=500, s=0.1, h=0.5)]
print(f"Fitness with one het beneficial (s=0.1, h=0.5): "
f"{calculate_fitness(ind):.4f}")
# Expected: 1 + 0.5 * 0.1 = 1.05
# Two deleterious mutations, one homozygous
ind2 = Individual()
m1 = Mutation(position=100, s=-0.05, h=0.5)
m2 = Mutation(position=100, s=-0.05, h=0.5) # same position = homozygous
ind2.haplosome_1 = [m1]
ind2.haplosome_2 = [m2]
print(f"Fitness with one hom deleterious (s=-0.05): "
f"{calculate_fitness(ind2):.4f}")
# Expected: 1 + (-0.05) = 0.95
Step 3: Parent Selection
Parents are drawn with probability proportional to their fitness. This is the mechanism by which selection acts: fitter individuals are more likely to be chosen as parents, so their alleles are over-represented in the next generation.
Formally, the probability that individual \(i\) is chosen as a parent is
\[
P(\text{parent} = i) = \frac{w_i}{\sum_{j=1}^{N} w_j}.
\]
This is fitness-proportional selection, also called roulette-wheel selection in the evolutionary computation literature.
Probability Aside – Selection as a biased coin
Under neutrality (\(s = 0\) for all mutations), all individuals have fitness \(w = 1\), and parent selection reduces to uniform random sampling – the standard Wright-Fisher model. Selection introduces a bias: if individual \(i\) has fitness \(1 + s\) and everyone else has fitness \(1\), the probability of choosing \(i\) is \((1 + s) / (N - 1 + (1 + s)) = (1 + s) / (N + s) \approx (1 + s) / N\) for large \(N\). The bias is small per generation but compounds over many generations – this compounding is how selection shapes allele frequencies.
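Plugging in numbers makes the size of this per-draw bias concrete. A sketch with arbitrary values of \(N\) and \(s\):

```python
N = 1000
s = 0.01

# One individual with fitness 1 + s among N - 1 with fitness 1
p_exact = (1 + s) / (N - 1 + (1 + s))   # = (1 + s) / (N + s)
p_approx = (1 + s) / N                  # large-N approximation
p_neutral = 1 / N                       # uniform sampling baseline

print(f"neutral: {p_neutral:.6f}")
print(f"exact:   {p_exact:.6f}")
print(f"approx:  {p_approx:.6f}")
# The per-draw advantage over neutrality is roughly s / N -- tiny,
# but it acts every generation.
```
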
In SLiM’s C++ implementation, fitness-proportional selection uses a GSL
discrete lookup table for \(O(1)\) sampling after \(O(N)\) setup.
Our Python version uses numpy.random.choice with weights:
def select_parents(population):
"""Draw two parents with probability proportional to fitness.
Returns indices into the population list. The two parents
may be the same individual (selfing is allowed in the basic
Wright-Fisher model).
"""
fitnesses = np.array([ind.fitness for ind in population])
total = fitnesses.sum()
if total == 0:
# All individuals have zero fitness -- population is dead.
# This can happen with very strong purifying selection.
raise RuntimeError("Population extinction: all fitnesses are zero")
probs = fitnesses / total
# Draw two parents independently (with replacement)
parent_indices = np.random.choice(len(population), size=2, p=probs)
return parent_indices[0], parent_indices[1]
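A quick empirical check of the roulette wheel (a standalone sketch using numpy.random directly, with arbitrary fitness values): an individual 10% fitter than the rest should be drawn roughly 10% more often than the neutral \(1/N\).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
fitnesses = np.ones(N)
fitnesses[0] = 1.1                    # one individual 10% fitter
probs = fitnesses / fitnesses.sum()   # fitness-proportional weights

draws = rng.choice(N, size=200_000, p=probs)
freq = np.mean(draws == 0)
print(f"chosen fraction: {freq:.5f}  (neutral would be {1 / N:.5f})")
# Expected fraction is 1.1 / 100.1, about 10% above 1/N.
```
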
Step 4: Recombination
Each parent contributes one haplosome to the offspring. That haplosome is assembled by recombining the parent’s two haplosomes: crossover points are drawn, and the haplosome alternates between the two parental copies at each crossover.
The number of crossovers along a chromosome of length \(L\) with per-bp recombination rate \(r\) is Poisson-distributed:
\[
n_{\text{crossovers}} \sim \text{Poisson}(rL)
\]
Each crossover point is placed uniformly at random along the chromosome. The offspring haplosome starts by copying from one randomly chosen parental haplosome, then switches to the other at each crossover point.
def recombine(parent, r, L):
"""Generate one recombinant haplosome from a diploid parent.
This is the core of meiosis: start with one randomly chosen
parental haplosome, then switch at each crossover point.
Parameters
----------
parent : Individual
The diploid parent with two haplosomes.
r : float
Per-bp, per-generation recombination rate.
L : int
Chromosome length in base pairs.
Returns
-------
child_haplosome : list of Mutation
The recombinant haplosome for the child.
"""
# Draw number of crossovers from Poisson distribution
n_crossovers = np.random.poisson(r * L)
if n_crossovers == 0:
# No recombination: child gets a complete copy of one haplosome
if np.random.random() < 0.5:
return list(parent.haplosome_1)
else:
return list(parent.haplosome_2)
# Draw crossover positions (sorted)
breakpoints = sorted(np.random.randint(1, L, size=n_crossovers))
# Start with a randomly chosen haplosome
if np.random.random() < 0.5:
sources = [parent.haplosome_1, parent.haplosome_2]
else:
sources = [parent.haplosome_2, parent.haplosome_1]
    # Build the recombinant haplosome. A mutation from sources[0] is
    # inherited when an even number of breakpoints lie at or before its
    # position; a mutation from sources[1] when that count is odd.
    # (Testing membership with `m in sources[...]` would miscount
    # homozygous mutations, because equal Mutation objects compare
    # equal under dataclass equality.)
    child_haplosome = []
    for m in sources[0]:
        n_breaks_before = sum(1 for bp in breakpoints if bp <= m.position)
        if n_breaks_before % 2 == 0:
            child_haplosome.append(m)
    for m in sources[1]:
        n_breaks_before = sum(1 for bp in breakpoints if bp <= m.position)
        if n_breaks_before % 2 == 1:
            child_haplosome.append(m)
    # Sort by position for consistency
    child_haplosome.sort(key=lambda m: m.position)
    return child_haplosome
A cleaner (and faster) implementation that mirrors SLiM more closely:
def recombine_v2(parent, r, L):
"""Generate one recombinant haplosome (cleaner version).
Walk along the chromosome, switching between parental haplosomes
at each breakpoint. Collect mutations from whichever haplosome
is currently active.
"""
n_crossovers = np.random.poisson(r * L)
breakpoints = sorted(np.random.randint(1, L, size=n_crossovers)) if n_crossovers > 0 else []
breakpoints.append(L + 1) # sentinel: end of chromosome
# Choose starting haplosome randomly
if np.random.random() < 0.5:
hap_a, hap_b = parent.haplosome_1, parent.haplosome_2
else:
hap_a, hap_b = parent.haplosome_2, parent.haplosome_1
    child = []
    bp_idx = 0
# Merge mutations from both haplosomes in position order
all_muts_a = [(m.position, 'a', m) for m in hap_a]
all_muts_b = [(m.position, 'b', m) for m in hap_b]
all_events = sorted(all_muts_a + all_muts_b, key=lambda x: x[0])
using_a = True # start with hap_a
for pos, source, mut in all_events:
# Advance through breakpoints
while bp_idx < len(breakpoints) and breakpoints[bp_idx] <= pos:
using_a = not using_a
bp_idx += 1
# Keep mutation if it comes from the currently active haplosome
if (using_a and source == 'a') or (not using_a and source == 'b'):
child.append(mut)
return child
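The toggle-at-breakpoint rule is easy to verify in isolation. A standalone sketch of the position-to-haplosome mapping that the loop in recombine_v2 implements (bisect is just a faster way to count the breakpoints at or before a position):

```python
import bisect

def source_at(position, breakpoints):
    """Which parental haplosome (0 or 1) a position is copied from,
    given sorted crossover breakpoints and starting from haplosome 0.
    A position exactly at a breakpoint belongs to the switched-to
    haplosome, matching the `breakpoints[bp_idx] <= pos` test above."""
    return bisect.bisect_right(breakpoints, position) % 2

breakpoints = [300, 700]
for pos in [100, 300, 500, 700, 900]:
    print(pos, "->", source_at(pos, breakpoints))
# 100 -> 0 (before the first crossover)
# 300 -> 1 (the first crossover switches haplosomes)
# 500 -> 1
# 700 -> 0 (the second crossover switches back)
# 900 -> 0
```
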
Step 5: Adding New Mutations
After recombination, new mutations are added to the offspring’s haplosomes. The number of new mutations on each haplosome is Poisson-distributed:
\[
n_{\text{new}} \sim \text{Poisson}(\mu L)
\]
Each mutation is placed at a uniformly random position along the chromosome. Its selection coefficient \(s\) is drawn from the distribution of fitness effects (DFE), which defines the spectrum of possible fitness effects.
Probability Aside – Common DFE models
The DFE is the probability distribution of \(s\) for new mutations. Common choices:
Fixed: \(s\) is a constant (e.g., \(s = 0\) for neutral mutations, \(s = -0.01\) for weakly deleterious).
Exponential: \(s \sim \text{Exp}(\beta)\). Used for beneficial mutations, where most have small effect and few have large effect.
Gamma: \(|s| \sim \text{Gamma}(\alpha, \beta)\). The most commonly used DFE for deleterious mutations (Eyre-Walker & Keightley, 2007). The shape parameter \(\alpha\) controls how concentrated effects are: \(\alpha < 1\) gives an L-shaped distribution (many nearly-neutral, few strongly deleterious); \(\alpha > 1\) gives a bell-shaped distribution.
Normal: \(s \sim \mathcal{N}(\mu_s, \sigma_s^2)\). Rarely used for purifying selection but sometimes for stabilizing selection models.
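The shape-parameter distinction can be seen directly by sampling. A sketch comparing an L-shaped and a bell-shaped Gamma DFE (the shape and scale values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

for shape in (0.3, 3.0):
    s = rng.gamma(shape, scale=0.05, size=n)   # |s| values
    frac_tiny = np.mean(s < 0.1 * s.mean())    # mass piled near zero
    print(f"shape={shape}: mean |s| = {s.mean():.4f}, "
          f"fraction below 10% of mean: {frac_tiny:.3f}")
# shape < 1: L-shaped -- a large fraction of mutations nearly neutral
# shape > 1: bell-shaped -- effects concentrated around the mean
```
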
def add_mutations(haplosome, mu, L, tick, dfe='neutral', dfe_params=None):
"""Add new mutations to a haplosome.
Parameters
----------
haplosome : list of Mutation
The haplosome to mutate.
mu : float
Per-bp, per-generation mutation rate.
L : int
Chromosome length.
tick : int
Current generation (for recording origin time).
dfe : str
Distribution of fitness effects: 'neutral', 'fixed',
'exponential', or 'gamma'.
dfe_params : dict
Parameters for the DFE (e.g., {'s': -0.01} for fixed,
{'mean': 0.01} for exponential,
{'shape': 0.3, 'scale': 0.05} for gamma).
Returns
-------
haplosome : list of Mutation
The haplosome with new mutations added.
"""
# Number of new mutations: Poisson(mu * L)
n_new = np.random.poisson(mu * L)
if dfe_params is None:
dfe_params = {}
for _ in range(n_new):
# Random position along the chromosome
position = np.random.randint(0, L)
# Draw selection coefficient from the DFE
if dfe == 'neutral':
s = 0.0
elif dfe == 'fixed':
s = dfe_params.get('s', 0.0)
elif dfe == 'exponential':
s = np.random.exponential(dfe_params.get('mean', 0.01))
        elif dfe == 'gamma':
            s = -np.random.gamma(
                dfe_params.get('shape', 0.3),
                dfe_params.get('scale', 0.05)
            )  # negative because deleterious
        else:
            raise ValueError(f"Unknown DFE: {dfe!r}")
        h = dfe_params.get('h', 0.5)  # default: codominant
haplosome.append(Mutation(position=position, s=s, h=h,
origin_tick=tick))
# Keep sorted by position
haplosome.sort(key=lambda m: m.position)
return haplosome
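The expected number of new mutations per haplosome per generation is simply \(\mu L\). A quick standalone check of the Poisson draw (the rate is human-ish, roughly \(10^{-8}\) per bp, and the chromosome length is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, L = 1e-8, 100_000_000   # illustrative: ~1e-8 per bp, 100 Mb

counts = rng.poisson(mu * L, size=50_000)
print(f"expected per haplosome: {mu * L:.2f}")
print(f"observed mean:          {counts.mean():.2f}")
print(f"observed variance:      {counts.var():.2f}")
# For a Poisson distribution, the mean and the variance both
# equal mu * L -- a useful sanity check on any mutation machinery.
```
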
Step 6: The Complete Generation Cycle
Now we assemble the parts into a complete Wright-Fisher generation. This is the heart of SLiM – the function that ticks once per generation:
def wright_fisher_generation(population, N, L, mu, r, tick,
dfe='neutral', dfe_params=None):
"""Run one Wright-Fisher generation.
This is the complete cycle:
1. Calculate fitness for all individuals
2. Generate N offspring by:
a. Selecting parents proportional to fitness
b. Recombining parental haplosomes
c. Adding new mutations
3. Replace the old population with the new one
Parameters
----------
population : list of Individual
Current generation (N diploid individuals).
N : int
Population size (constant).
L : int
Chromosome length.
mu : float
Per-bp, per-generation mutation rate.
r : float
Per-bp, per-generation recombination rate.
tick : int
Current generation number.
dfe : str
Distribution of fitness effects.
dfe_params : dict
DFE parameters.
Returns
-------
new_population : list of Individual
The next generation.
"""
# Step 1: Recalculate fitness for all individuals
for ind in population:
calculate_fitness(ind)
# Step 2: Generate N offspring
new_population = []
for _ in range(N):
# Select two parents (fitness-proportional)
p1_idx, p2_idx = select_parents(population)
parent1 = population[p1_idx]
parent2 = population[p2_idx]
# Recombine: each parent contributes one haplosome
child_hap1 = recombine_v2(parent1, r, L)
child_hap2 = recombine_v2(parent2, r, L)
# Add new mutations
child_hap1 = add_mutations(child_hap1, mu, L, tick, dfe, dfe_params)
child_hap2 = add_mutations(child_hap2, mu, L, tick, dfe, dfe_params)
new_population.append(Individual(
haplosome_1=child_hap1,
haplosome_2=child_hap2
))
return new_population
Step 7: The Full Simulation
Now we can run a complete forward simulation. We initialize a population, run for \(T\) generations, and track statistics along the way.
def simulate(N, L, mu, r, T, dfe='neutral', dfe_params=None,
track_every=100):
"""Run a complete Wright-Fisher forward simulation.
Parameters
----------
N : int
Diploid population size.
L : int
Chromosome length (bp).
mu : float
Per-bp, per-generation mutation rate.
r : float
Per-bp, per-generation recombination rate.
T : int
Number of generations to simulate.
dfe : str
Distribution of fitness effects.
dfe_params : dict
DFE parameters.
track_every : int
Record statistics every this many generations.
Returns
-------
population : list of Individual
Final population.
stats : dict
Time series of population statistics.
"""
# Initialize: N individuals with empty haplosomes
population = [Individual() for _ in range(N)]
stats = {
'generation': [],
'mean_fitness': [],
'num_segregating': [],
'mean_mutations_per_individual': [],
}
for tick in range(T):
population = wright_fisher_generation(
population, N, L, mu, r, tick, dfe, dfe_params
)
# Track statistics periodically
if tick % track_every == 0 or tick == T - 1:
fitnesses = [ind.fitness for ind in population]
n_muts = [len(ind.haplosome_1) + len(ind.haplosome_2)
for ind in population]
            # Count distinct mutated positions (an upper bound on
            # segregating sites: positions fixed in every individual
            # are counted too)
all_positions = set()
for ind in population:
for m in ind.haplosome_1 + ind.haplosome_2:
all_positions.add(m.position)
stats['generation'].append(tick)
stats['mean_fitness'].append(np.mean(fitnesses))
stats['num_segregating'].append(len(all_positions))
stats['mean_mutations_per_individual'].append(np.mean(n_muts))
print(f"Gen {tick:>6d}: mean_w={np.mean(fitnesses):.4f}, "
f"seg_sites={len(all_positions):>5d}, "
f"muts/ind={np.mean(n_muts):.1f}")
return population, stats
# Example: small neutral simulation
np.random.seed(42)
pop, stats = simulate(
N=100, # small population for speed
L=10000, # 10 kb chromosome
mu=1e-4, # elevated rate (for tractability)
r=1e-4, # same as mutation rate
T=500, # 500 generations
dfe='neutral',
track_every=100
)
Step 8: Verifying Against Theory
Under neutrality (no selection), we can verify our simulator against known results from coalescent theory.
Expected number of segregating sites. For a diploid population of size \(N\), the expected number of segregating sites at mutation-drift equilibrium is
\[
E[S] = \theta \sum_{k=1}^{2N-1} \frac{1}{k} \approx \theta \left( \ln(2N) + \gamma \right), \qquad \theta = 4N\mu L,
\]
where \(\gamma \approx 0.577\) is the Euler-Mascheroni constant and the sum runs to \(2N - 1\) because there are \(2N\) haploid genomes in a diploid population of size \(N\).
Expected heterozygosity. The expected per-site heterozygosity (the probability that two randomly chosen haplosomes differ at a site) is
\[
E[H] = \frac{4N\mu}{1 + 4N\mu} \approx 4N\mu
\]
for small \(\mu\).
def verify_neutrality(pop, N, L, mu):
    """Check that our neutral simulation matches coalescent predictions.

    Compares the observed number of mutated positions to the expected
    number of segregating sites, E[S] = theta * sum_{k=1}^{2N-1} 1/k.
    """
theta = 4 * N * mu * L # population-scaled mutation rate
    # Count distinct mutated positions (fixed sites are counted too)
all_positions = set()
for ind in pop:
for m in ind.haplosome_1 + ind.haplosome_2:
all_positions.add(m.position)
S_observed = len(all_positions)
# Harmonic number for 2N - 1
harmonic = sum(1.0 / k for k in range(1, 2 * N))
S_expected = theta * harmonic
print(f"Segregating sites:")
print(f" Observed: {S_observed}")
print(f" Expected: {S_expected:.1f}")
print(f" Theta: {theta:.1f}")
# After a long burn-in, the observed S should be close to expected
# (The short simulation above won't have reached equilibrium;
# a proper test would run for ~10*N generations)
The Wright-Fisher cycle is complete. You have built the escapement. In the next chapter, we put it to work with three recipes that demonstrate the power of forward simulation: phenomena that the coalescent cannot easily model.