Complication II: Escapement

Simulation-Free Deep Coalescent Inference via Variational Genealogies

The Mechanism at a Glance

Mainspring learns to invert simulations: train on millions of msprime outputs, then hope real data looks like the training distribution. This is amortized inference – fast at test time, but fundamentally limited by the simulation fidelity gap. If the real biological process differs from the training simulations (and it always does), the network fails silently.

Escapement takes the opposite approach: no simulations at all. Instead of learning a simulator-to-inference mapping, it uses the coalescent likelihood itself – the same equations derived in every Timepiece – as a differentiable loss function. The network trains directly on the observed data.

Every Timepiece in this book derives two things:

  1. A prior: \(P(\text{genealogy} \mid N_e)\) from coalescent theory

  2. A likelihood: \(P(\text{data} \mid \text{genealogy}, \mu)\) from the mutation model

These are analytical. You don’t need to simulate – you can evaluate them in closed form for any proposed genealogy. The intractable part is the posterior:

\[P(\text{genealogy} \mid \text{data}, N_e, \mu) \propto P(\text{data} \mid \text{genealogy}, \mu) \cdot P(\text{genealogy} \mid N_e)\]
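For a fixed genealogy, both terms really are closed-form. Here is a minimal sketch (toy code under simplifying assumptions – a single non-recombining tree, constant \(N_e\), all names invented for illustration, not Escapement's actual implementation) that scores a three-sample tree under the Kingman prior and a Poisson mutation count:

```python
import math

def log_coalescent_prior(coal_times, N_e):
    """Kingman prior: with k lineages extant, the waiting time to the
    next coalescence is Exponential with rate k*(k-1)/2 / (2*N_e)."""
    n = len(coal_times) + 1            # number of sampled lineages
    logp, prev, k = 0.0, 0.0, n
    for t in coal_times:               # coalescence times, increasing
        rate = k * (k - 1) / 2 / (2 * N_e)
        logp += math.log(rate) - rate * (t - prev)
        prev, k = t, k - 1
    return logp

def log_mutation_likelihood(n_mutations, total_branch_length, mu):
    """Poisson likelihood: mutations fall on the tree at rate mu per
    unit of branch length, so the count is Poisson(mu * L_total)."""
    lam = mu * total_branch_length
    return n_mutations * math.log(lam) - lam - math.lgamma(n_mutations + 1)

# Toy 3-sample genealogy: coalescences at 500 and 1500 generations,
# so total branch length is 3*500 + 2*1000 generations.
times = [500.0, 1500.0]
prior = log_coalescent_prior(times, N_e=10_000)
like = log_mutation_likelihood(n_mutations=4,
                               total_branch_length=3 * 500 + 2 * 1000,
                               mu=1e-3)
print(prior + like)   # unnormalized log-posterior of this genealogy
```

No simulation appears anywhere: both functions are direct transcriptions of the formulas above, evaluated for one proposed genealogy.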

The space of genealogies is combinatorial and enormous. Classical methods handle this differently: PSMC discretizes and uses an HMM. ARGweaver and SINGER sample with MCMC. tsinfer finds a point estimate via heuristics. tsdate uses variational approximation with hand-derived updates.

Escapement introduces a new option: learn the variational posterior with a neural network, optimized against the true coalescent likelihood.

The name

The escapement is the only part of a mechanical watch that needs no external calibration – it generates its own rhythm from first principles. The geometry of the escape wheel and pallet fork, combined with the physics of the balance spring, produces a precise oscillation without reference to any external standard. Similarly, Escapement generates its inference from the mathematical principles of the coalescent, without reference to any external simulation standard.

The four modules of Escapement:

  1. The Genealogy Encoder (the escape wheel) – A Transformer that processes the genotype matrix, producing per-sample, per-position latent vectors. The same architecture as Mainspring’s encoder, but optimized against a different loss.

  2. The Variational Tree Posterior (the pallet fork) – Produces a distribution over tree sequences: soft parent assignments via Gumbel-softmax (topology), reparameterized gamma distributions over node times (as in tsdate and Gamma-SMC), and Bernoulli breakpoints (from the SMC recombination model).

  3. The Differentiable Likelihood (the balance spring) – Pure math, no neural networks. Given a sampled genealogy, computes the Poisson mutation likelihood (from tsdate), the Kingman coalescent prior (from msprime/PSMC), and the entropy of the variational posterior. These three terms form the ELBO.

  4. Demographic Inference (the regulator) – \(N_e(t)\) parameterized as a neural spline, Gaussian process, or piecewise-constant function. Optimized jointly with the variational posterior by maximizing the ELBO.
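The pallet fork's three reparameterizations can be sketched in a few lines of NumPy (illustrative only – a real implementation needs a framework that differentiates through these samples, and the function names here are invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    """Soft one-hot parent assignment: approaches argmax as tau -> 0,
    stays differentiable for tau > 0 (Gumbel-softmax / Concrete)."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + g) / tau
    z -= z.max()                       # numerical stability
    e = np.exp(z)
    return e / e.sum()

def gamma_reparam(alpha, beta):
    """Node-time sample from Gamma(shape=alpha, rate=beta). In a real
    model, implicit reparameterization lets gradients flow through
    alpha and beta."""
    return rng.gamma(alpha) / beta

def relaxed_bernoulli(logit, tau=0.5):
    """Soft recombination-breakpoint indicator (binary Concrete)."""
    u = rng.uniform()
    return 1 / (1 + np.exp(-(logit + np.log(u) - np.log(1 - u)) / tau))

parent = gumbel_softmax(np.array([2.0, 0.1, -1.0]))  # soft parent choice
t = gamma_reparam(alpha=3.0, beta=0.002)             # one node time
b = relaxed_bernoulli(logit=-2.0)                    # breakpoint in (0, 1)
```

All three samplers push randomness through deterministic transforms of the variational parameters – that is what makes the expected ELBO differentiable with respect to \(\varphi\).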

Observed genotype matrix D ∈ {0,1}^{n × L}
                   |
                   v
         +--------------------------+
         |  GENEALOGY ENCODER       |
         |  Transformer over        |
         |  genomic windows         |
         +--------------------------+
                   |
                   v
         +--------------------------+
         |  VARIATIONAL POSTERIOR   |
          |  q(τ | D, φ)             |
         |                          |
         |  Topology: Gumbel-softmax|
         |  Times: Gamma(α, β)      |
         |  Breaks: Bernoulli(σ)    |
         +--------------------------+
                   |
                   v (sample τ ~ q)
         +--------------------------+
         |  DIFFERENTIABLE          |
         |  LIKELIHOOD              |
         |  (pure math, no NN)      |
         |                          |
         |  log P(D | τ, μ)         |
         |  + log P(τ | N_e, ρ)     |
         |  + H[q]                  |
         |  = ELBO                  |
         +--------------------------+
                   |
                   v (maximize ELBO)
         +--------------------------+
         |  DEMOGRAPHIC INFERENCE   |
         |  N_e(t): neural spline   |
         |  or GP in log-space      |
         +--------------------------+
                   |
                   v
         Posterior over genealogies
         + N_e(t) with uncertainty
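To see why maximizing the ELBO is a sound target, here is a toy version of the full loop for the simplest possible genealogy – two samples, a single coalescence time – where the evidence is available in closed form, so the Monte Carlo ELBO can be checked against it. The variational family, parameter values, and function names are placeholders for illustration:

```python
import numpy as np
from math import lgamma, log

rng = np.random.default_rng(1)

# Toy model: two samples, one coalescence time t.
#   Prior:      t ~ Exponential(rate=r), r = 1/(2*N_e)   (Kingman, k=2)
#   Likelihood: m mutations ~ Poisson(2*mu*t)            (branch length 2t)
N_e, mu, m = 10_000.0, 1e-4, 3
r = 1.0 / (2 * N_e)

def log_joint(t):
    lam = 2 * mu * t
    log_lik = m * np.log(lam) - lam - lgamma(m + 1)
    log_prior = log(r) - r * t
    return log_lik + log_prior

def elbo(alpha, beta, n_samples=50_000):
    """Monte Carlo ELBO = E_q[log p(m, t) - log q(t)],
    with q(t) = Gamma(shape=alpha, rate=beta)."""
    t = rng.gamma(alpha, size=n_samples) / beta
    log_q = (alpha * log(beta) - lgamma(alpha)
             + (alpha - 1) * np.log(t) - beta * t)
    return float(np.mean(log_joint(t) - log_q))

# Exact log evidence (t integrates out analytically in this toy model):
log_evidence = log(r) + m * log(2 * mu) - (m + 1) * log(2 * mu + r)

print(elbo(4.0, 2e-4), "<=", log_evidence)
```

The ELBO sits below the log evidence by exactly the KL divergence from \(q\) to the true posterior, so pushing it up with gradient steps on the variational parameters (and on \(N_e\)) tightens the approximation – the same mechanics Escapement runs at genome scale.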

Prerequisites for this Complication

Escapement synthesizes ideas from many Timepieces. Before starting, you should have worked through:

  • PSMC – the SMC factorization that makes the prior tractable

  • tsdate – variational gamma posteriors for coalescence times

  • tsinfer – attention as the copying model

  • Gamma-SMC – continuous time, no grid

  • msprime – the coalescent prior

  • Probabilistic Inference – variational inference and the ELBO

Familiarity with variational autoencoders and the reparameterization trick is assumed.

Chapters