Overview of PHLASH
Before assembling the watch, lay out every part and understand what it does.
PHLASH at a glance. The four pillars of the PHLASH inference pipeline: composite SFS likelihood connecting data to demography, the RBF kernel measuring inter-particle similarity for SVGD, particle evolution under Stein Variational Gradient Descent, and random time discretisation for debiased gradient estimation.
What Does PHLASH Do?
Given whole-genome sequencing data from one or more diploid individuals, PHLASH infers the posterior distribution over the population size history \(N(t)\).
Unlike PSMC, which returns a single best estimate of \(N(t)\), PHLASH returns a distribution over possible histories – a cloud of plausible trajectories rather than a single line. Each trajectory in the cloud is consistent with the data; the spread of the cloud tells you how certain or uncertain the inference is.
Think of it this way. PSMC is a watch that shows you a single reading: “It is 3:17.” PHLASH is a watch that shows you a distribution: “It is 3:17 give or take two minutes, and here is a plot showing the probability density over all possible times.” Both tell time; PHLASH also tells you how well it is keeping time.
Why a Successor to PSMC?
PSMC (Timepiece I) demonstrated that a single diploid genome encodes deep demographic history. But it has three well-known limitations:
No uncertainty quantification. PSMC uses the EM algorithm, which finds a point estimate (a single maximum-likelihood history). The standard approach to uncertainty – bootstrapping over genomic blocks – is computationally expensive and can underestimate true uncertainty because blocks are not truly independent.
Discretization bias. PSMC divides time into fixed intervals and estimates a constant population size within each interval. The choice of intervals affects the result: too few intervals miss real demographic events, while the spacing pattern (typically log-spaced) introduces systematic bias because the integral approximation errors do not cancel across a fixed grid.
Limited data use. PSMC analyzes one diploid genome at a time. When multiple genomes are available, it cannot easily combine their information. The pairwise approach (PSMC’) extends to pairs of individuals but does not scale to many samples.
PHLASH addresses all three:
| Limitation | PSMC | PHLASH |
|---|---|---|
| Uncertainty | Point estimate + bootstrap | Full posterior via SVGD |
| Discretization | Fixed grid, systematic bias | Random grids, bias cancels on average |
| Data sources | One diploid (pairwise HMM only) | SFS (many individuals) + pairwise HMM |
| Optimization | EM (local maximum, slow convergence) | SVGD (posterior sampling, GPU-parallel) |
| Gradients | Not needed (EM uses closed-form updates) | Score function algorithm, \(O(LM^2)\) |
How PHLASH Extends the Coalescent HMM
At its core, PHLASH still uses the same coalescent HMM as PSMC. Recall from Timepiece I: the hidden states are discretized coalescence time intervals, the observations are heterozygous/homozygous bins, and the transition matrix encodes how the TMRCA changes between adjacent positions under the SMC approximation.
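The HMM machinery PHLASH inherits can be sketched with a generic forward algorithm in log space. This is a toy stand-in, not PHLASH's implementation: in the coalescent HMM, the hidden states would be the discretized TMRCA intervals, `log_A` the SMC transition matrix, and `obs` the heterozygous/homozygous bins. All names here are illustrative.

```python
import numpy as np

def hmm_loglik(log_pi, log_A, log_B, obs):
    """Forward algorithm in log space for a generic HMM.

    log_pi: (K,) log initial state probabilities
    log_A:  (K, K) log transition matrix, rows = previous state
    log_B:  (K, V) log emission matrix
    obs:    sequence of observed symbol indices
    """
    alpha = log_pi + log_B[:, obs[0]]  # log P(first obs, state)
    for o in obs[1:]:
        # stabilized logsumexp over the previous state for each current state
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_B[:, o]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())
```

A single-state chain emitting each symbol with probability 0.5 should give a log-likelihood of `T * log(0.5)` for `T` observations, which makes a convenient sanity check.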
PHLASH inherits this entire machinery but wraps it in a larger framework:
The coalescent HMM likelihood is computed for each pair of haplotypes (or each diploid individual), exactly as in PSMC. This gives a likelihood \(\ell_{\text{HMM}}(\eta)\) for a demographic history \(\eta\).
The SFS likelihood is computed from the allele frequency distribution across all sampled individuals. This gives a second likelihood \(\ell_{\text{SFS}}(\eta)\) that captures information from many individuals simultaneously.
The two likelihoods are combined into a composite likelihood:
\[\ell_{\text{comp}}(\eta) = \ell_{\text{SFS}}(\eta) + \ell_{\text{HMM}}(\eta)\]
(on the log scale, so the composite log-likelihood is a sum). This composite likelihood is then used as the basis for Bayesian inference, combined with a smoothness prior on \(\eta\).
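In code, the combination is nothing more than a sum of log terms, plus the log-prior, to form the (unnormalized) log-posterior. A minimal sketch, with hypothetical quadratic stand-ins for the real SFS and HMM terms:

```python
import numpy as np

def log_posterior(eta, loglik_sfs, loglik_hmm, log_prior):
    """Composite log-posterior up to a constant: the two data terms are
    added on the log scale, then combined with the smoothness prior.
    The three callables are stand-ins for PHLASH's actual terms."""
    return loglik_sfs(eta) + loglik_hmm(eta) + log_prior(eta)

# toy example: quadratic "likelihoods" and a flat prior
eta = np.array([1.0, 2.0, 4.0])
lp = log_posterior(
    eta,
    loglik_sfs=lambda e: -0.5 * np.sum((e - 2.0) ** 2),
    loglik_hmm=lambda e: -0.5 * np.sum((e - 1.0) ** 2),
    log_prior=lambda e: 0.0,
)
```

The point of the sketch is only the structure: because both terms are log-likelihoods, combining them requires no reweighting, and their gradients simply add, which is what the score function algorithm and SVGD exploit downstream.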
The composite likelihood is the mainspring of PHLASH – the energy source that drives the entire mechanism. But turning the mainspring requires computing gradients efficiently, which is where the score function algorithm comes in. And the gradients must be computed on a discretized time grid, which is where random discretization cancels the bias. And the gradients drive SVGD, which produces the posterior. Every gear depends on the one before it.
The Demographic Model
PHLASH parameterizes the population size history as a piecewise-constant function on a set of time intervals, just as PSMC does:

\[N(t) = \eta_k \quad \text{for } t \in [t_k, t_{k+1}), \qquad k = 0, \ldots, M - 1.\]
But unlike PSMC, the time breakpoints \(t_0 < t_1 < \cdots < t_M\) are not fixed across the analysis. Instead, PHLASH randomly samples the breakpoints for each gradient computation. Different random grids yield different discretization errors, but these errors have zero mean: they cancel when averaged across draws.
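The zero-mean cancellation can be seen in a toy Monte Carlo experiment (PHLASH's actual randomized discretization differs in detail, but the principle is the same). Below, a fixed left-endpoint grid systematically overestimates the integral of a decreasing function, while random evaluation points give an error that averages away:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda t: np.exp(-t)            # stand-in for a decreasing function of time
a, b = 0.0, 5.0
true = 1.0 - np.exp(-5.0)           # exact value of the integral

# Fixed grid, left endpoints: every interval overestimates a decreasing f,
# so the errors all have the same sign and never cancel.
M = 8
grid = np.linspace(a, b, M + 1)
fixed = np.sum(f(grid[:-1]) * np.diff(grid))

# Random evaluation points: each draw is still noisy, but the error is
# zero-mean, so averaging over draws recovers the true integral.
draws = np.array(
    [(b - a) * f(rng.uniform(a, b, M)).mean() for _ in range(20000)]
)
```

Here `fixed` overshoots `true` by a fixed amount no matter how often it is recomputed, while `draws.mean()` converges to `true` as more random grids are averaged: the systematic error has been traded for zero-mean noise.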
The vector of population sizes \(\boldsymbol{\eta} = (\eta_0, \ldots, \eta_{M-1})\) lives in a high-dimensional space. PHLASH places a smoothness prior on \(\boldsymbol{\eta}\) – a Gaussian process prior that penalizes histories with large jumps between adjacent epochs. This prior acts as a regularizer, preventing overfitting to noise while allowing the data to drive the inference.
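A simple stand-in for such a prior is a Gaussian random walk on \(\log \boldsymbol{\eta}\): each jump between adjacent epochs is penalized quadratically. This is a sketch of the idea only; PHLASH's actual Gaussian process prior and its scale parameter (here the hypothetical `tau`) may differ.

```python
import numpy as np

def log_smoothness_prior(log_eta, tau=0.5):
    """Gaussian random-walk log-prior (up to a constant) on log population
    sizes: penalizes large jumps between adjacent epochs. tau is a
    hypothetical scale controlling how much fluctuation is tolerated."""
    return -0.5 * np.sum(np.diff(log_eta) ** 2) / tau ** 2

smooth = np.log(np.array([1e4, 1.1e4, 1.2e4]))  # gentle change between epochs
jumpy = np.log(np.array([1e4, 1e5, 1e3]))       # order-of-magnitude swings
```

The smooth history receives a much higher prior score than the jumpy one, which is exactly the regularizing pressure described above: the data must supply strong evidence before the inferred history is allowed to oscillate.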
Parameters
| Parameter | Symbol | Physical meaning |
|---|---|---|
| Population sizes | \(\eta_0, \ldots, \eta_{M-1}\) | Effective population size in each time interval. The primary quantities of interest – the posterior distribution over these values is the output. |
| Mutation rate | \(\theta\) | Population-scaled mutation rate, inherited from PSMC's parameterization. Controls the density of heterozygous sites per unit coalescence time. |
| Recombination rate | \(\rho\) | Population-scaled recombination rate. Controls how frequently the genealogy changes between adjacent positions. |
| Time breakpoints | \(t_0, \ldots, t_M\) | Boundaries of the piecewise-constant intervals. Randomly sampled in each gradient computation step; not treated as fixed parameters. |
| Number of particles | \(J\) | The number of candidate histories maintained by SVGD. More particles give a better approximation to the posterior but cost more computation. |
| Smoothness prior | \(\Sigma\) | The covariance structure of the Gaussian process prior on \(\log \boldsymbol{\eta}\). Controls how much the inferred history is allowed to fluctuate between adjacent time intervals. |
The Flow in Detail
```
Step 1: Data preparation
    From whole-genome data, compute:
    (a) SFS from all individuals
    (b) Heterozygosity sequences for each diploid individual
        (same binning as PSMC)

Step 2: Initialize particles
    Draw J initial demographic histories from the prior.
    Each particle is a vector eta = (eta_0, ..., eta_{M-1}).

Step 3: For each SVGD iteration:
    (a) Sample a random time discretization t_0 < t_1 < ... < t_M
    (b) For each particle j:
        - Compute SFS log-likelihood given eta^(j)
        - Compute coalescent HMM log-likelihood given eta^(j)
        - Sum to get composite log-likelihood
        - Compute gradient via score function algorithm
        - Add gradient of log-prior (smoothness penalty)
    (c) Compute SVGD update:
        - Attraction: gradient of log-posterior pushes particles
          toward high-probability regions
        - Repulsion: kernel gradient pushes particles apart
          to maintain diversity
    (d) Update all particles in parallel (on GPU)

Step 4: After convergence:
    The J particles approximate the posterior distribution.
    Compute credible intervals, posterior mean, uncertainty bands.
```
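The attraction/repulsion update in Step 3(c) can be sketched in one dimension. Here the target is a standard normal (so \(\nabla \log p(x) = -x\)) rather than a composite posterior over demographic histories, and the RBF bandwidth and step size are arbitrary fixed constants rather than PHLASH's tuned values:

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_log_p(x):
    # target: standard normal, so grad log p(x) = -x
    return -x

def svgd_step(x, step=0.1, h=0.5):
    """One SVGD update for 1-D particles x with an RBF kernel."""
    diff = x[:, None] - x[None, :]       # diff[i, j] = x_i - x_j
    k = np.exp(-diff**2 / (2 * h))       # RBF kernel k(x_i, x_j)
    grad_k = -diff / h * k               # d/dx_i k(x_i, x_j): repulsion term
    # phi_j = mean_i [ k(x_i, x_j) * grad log p(x_i) + grad_{x_i} k(x_i, x_j) ]
    phi = (k * grad_log_p(x)[:, None] + grad_k).mean(axis=0)
    return x + step * phi

x = rng.normal(5.0, 0.1, size=50)        # particles start far from the target
for _ in range(500):
    x = svgd_step(x)
```

After iterating, the particle cloud sits near the target mean with nonzero spread: the attraction term alone would collapse all particles onto the mode, and the kernel-gradient repulsion is what keeps them spread out so that the cloud approximates the full distribution, not just its peak.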
Ready to Build
The next four chapters disassemble PHLASH gear by gear:
The Composite Likelihood – The mainspring: how the SFS likelihood and the coalescent HMM likelihood are computed and combined into a single composite objective.
Random Time Discretization – The tourbillon: how random time breakpoints cancel discretization bias, turning a systematic error into zero-mean noise that vanishes under averaging.
The Score Function Algorithm – The gear train: the \(O(LM^2)\) algorithm that computes the gradient of the log-likelihood 30–90x faster than autodiff, making SVGD practical at genomic scale.
Stein Variational Gradient Descent (SVGD) – The winding mechanism: how SVGD uses gradients and a repulsive kernel to push particles toward the posterior distribution, producing uncertainty estimates on a GPU.
Each chapter derives the math, explains the intuition, and connects back to the PSMC foundations from Timepiece I.
Let us start with the foundation: the composite likelihood.