Comparison and Limitations

The master watchmaker keeps three timepieces on her bench: a marine chronometer for absolute accuracy, a minute repeater for darkness, and a split-seconds chronograph for measuring intervals. She does not ask which is best. She asks which is right for the task at hand. The answer depends on what you need to measure, how quickly, and how much uncertainty you can tolerate.

This chapter places Balance Wheel in the full landscape of neural and classical methods for population-genetic inference. We compare it against the other two Complications, trace its connections to every relevant Timepiece, enumerate its limitations honestly, and provide a decision tree for choosing the right tool.

The Three Complications

Each Complication operates at a different level of the data hierarchy and serves a different purpose. They are not competing approaches – they are complementary tools for different questions.

The three Complications compared

| Property | Mainspring | Escapement | Balance Wheel |
|---|---|---|---|
| Metaphor | Learn to assemble the watch from blueprints | Understand the physics of timekeeping | Feel the aggregate pressure of evolution |
| Data input | Raw genotypes \(\mathbf{D} \in \{0,1\}^{n \times L}\) | Raw genotypes \(\mathbf{D} \in \{0,1\}^{n \times L}\) | Site Frequency Spectrum \(\mathbf{D} \in \mathbb{Z}_{\geq 0}^{n-1}\) |
| Training signal | Simulated ARGs (msprime) | Coalescent likelihood (analytical) | Exact SFS from moments / dadi |
| Simulations needed | Yes (millions of ARGs) | No | No (just moments evaluations) |
| What it infers | Full ARG + \(N_e(t)\) | Genealogy + \(N_e(t)\) | Demographic parameters only |
| Resolution | Per-site, per-sample | Per-site, per-sample | Population-level (SFS) |
| Speed at inference | ~1 second (forward pass) | ~10–30 minutes (ELBO optimization) | ~0.1 ms per SFS + seconds for posterior |
| Closest Timepiece | tsinfer + tsdate | PSMC + ARGweaver | dadi + moments |
| Best for | High-throughput ARG inference | Deep analysis of one dataset | Demographic model fitting + comparison |

Each operates at a different level of the data hierarchy:

  • Mainspring: sequence \(\to\) ARG \(\to\) demography (most detailed, needs simulations)

  • Escapement: sequence \(\to\) coalescent times \(\to\) demography (no simulations, per-dataset)

  • Balance Wheel: SFS \(\to\) demography (fastest, most practical for demographic inference)

The information hierarchy

Moving from Mainspring to Balance Wheel, we trade information for speed. The raw genotype matrix contains all the information: haplotype structure, LD, spatial patterns, allele frequencies. The SFS retains only the allele frequencies. This is a massive compression – yet under the Poisson Random Field model, which treats sites as unlinked, the SFS is a sufficient statistic for demographic inference, so nothing relevant is lost. But for questions about recombination, selection, or genealogy structure, the SFS is insufficient. Choose the Complication that matches the question, not the one that processes the most data.
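
This compression step is simple to state in code. A minimal sketch (NumPy, illustrative only) that collapses a binary genotype matrix to its SFS:

```python
import numpy as np

def sfs_from_genotypes(D):
    """Collapse an n x L binary genotype matrix to its site frequency spectrum.

    D[i, j] = 1 if sample i carries the derived allele at site j.
    Returns counts of sites with derived-allele count 1..n-1
    (monomorphic sites carry no information and are dropped).
    """
    n = D.shape[0]
    counts = D.sum(axis=0)                       # derived-allele count per site
    polymorphic = counts[(counts > 0) & (counts < n)]
    return np.bincount(polymorphic, minlength=n)[1:n]

rng = np.random.default_rng(0)
D = rng.integers(0, 2, size=(10, 500))           # toy 10-sample, 500-site matrix
sfs = sfs_from_genotypes(D)
print(sfs.shape)                                 # (9,): one entry per count 1..9
```

Everything about which samples carry which alleles is discarded in the `D.sum(axis=0)` step; that single line is the entire information loss discussed above.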

Connection to Every Timepiece

Balance Wheel does not exist in isolation. Every major design decision traces to a mathematical insight from a Timepiece.

Design principles and their Timepiece origins

| Timepiece | What Balance Wheel borrows | How it appears |
|---|---|---|
| dadi | The Wright-Fisher diffusion PDE | Balance Wheel learns to approximate the PDE solution. The teacher (dadi) solves the PDE exactly; the student (neural network) reproduces the result without solving the PDE. The diffusion theory provides the mathematical foundation that makes the SFS a smooth function of \(\Theta\). |
| moments | The ODE system for SFS entries | moments is the primary teacher during training. Its ODE integrator computes the exact expected SFS for each training example. Balance Wheel distills this computation into a neural network. |
| PSMC | Piecewise-constant \(N_e(t)\) parameterization | Balance Wheel extends PSMC’s piecewise-constant representation to continuous \(N_e(t)\) via neural splines, while retaining the piecewise-constant option as the simplest case. |
| momi2 | Coalescent SFS computation for multi-population models | momi2 serves as an alternative teacher for complex multi-population topologies where moments becomes expensive. Its tensor machinery inspires the factored output approach. |
| Gamma-SMC | Gamma distributions for parameter posteriors | Posterior inference via HMC often yields parameter posteriors that are well approximated by gamma distributions, echoing Gamma-SMC’s analytical gamma posteriors. |
| phlash | Differentiable inference engine | The core idea of replacing a classical inference engine with a differentiable neural alternative. phlash pioneered this for the SMC likelihood; Balance Wheel does it for the SFS computation. |
| SMC++ | Continuous-time demographic inference | The motivation for continuous \(N_e(t)\) comes from SMC++’s demonstration that piecewise-constant models are limiting. |
| msprime | Kingman coalescent theory | The prior distributions on demographic parameters are grounded in coalescent expectations (e.g., expected TMRCA for a pair of lineages). |
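
The PSMC and SMC++ entries above both concern how \(N_e(t)\) is represented. A minimal sketch of the two parameterizations (NumPy; the function names and values are illustrative, not Balance Wheel’s API):

```python
import numpy as np

def ne_piecewise(t, boundaries, sizes):
    """Piecewise-constant N_e(t): sizes[k] holds on [boundaries[k], boundaries[k+1])."""
    k = np.searchsorted(boundaries, t, side="right") - 1
    return sizes[np.clip(k, 0, len(sizes) - 1)]

def ne_log_spline(t, knots, log_sizes):
    """Continuous N_e(t): linear interpolation of log N_e between knots."""
    return np.exp(np.interp(t, knots, log_sizes))

boundaries = np.array([0.0, 0.1, 0.5, 2.0])   # epoch start times, coalescent units
sizes = np.array([1e4, 5e3, 2e4, 1e4])

t = np.array([0.05, 0.2, 1.0])
print(ne_piecewise(t, boundaries, sizes))      # steps: 10000, 5000, 20000
print(ne_log_spline(t, boundaries, np.log(sizes)))
```

Interpolating in log space keeps \(N_e(t)\) strictly positive; the piecewise-constant curve is recovered as the limiting case of a spline with flat segments.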

What Balance Wheel Cannot Do

Four fundamental limitations, stated without euphemism.

1. Inherits the SFS’s Limitations

The SFS discards:

  • Linkage disequilibrium (LD). Two-locus statistics, haplotype blocks, and LD decay patterns are invisible in the SFS. Recent selective sweeps that leave strong LD signatures but modest SFS distortions will be missed.

  • Haplotype structure. The SFS counts allele frequencies but not which alleles co-occur on the same haplotype. Admixture events that create characteristic haplotype patterns (e.g., long blocks of introgressed sequence) cannot be detected from the SFS alone.

  • Spatial information. The genomic positions of SNPs are irrelevant to the SFS. Recombination rate variation, which produces spatial patterns of diversity, is invisible.

\[\text{SFS} = f(\text{allele frequencies}) \neq g(\text{haplotype structure})\]

For questions about selection, recombination, or genealogy structure, use Escapement or Mainspring.
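
The loss of haplotype structure is easy to demonstrate: two toy genotype matrices with different haplotype configurations but identical column sums produce identical spectra (NumPy, illustrative):

```python
import numpy as np

# In A, all three derived alleles sit on one haplotype (sample 0) --
# the signature of a long introgressed block. In B they are spread
# across samples. The column sums, and hence the SFS, are identical.
A = np.array([[1, 1, 1],
              [0, 0, 0],
              [0, 0, 0],
              [0, 0, 0]])
B = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])

def sfs(D):
    n = D.shape[0]
    return np.bincount(D.sum(axis=0), minlength=n)[1:n]

print(np.array_equal(sfs(A), sfs(B)))   # True: the SFS cannot tell them apart
```

Any SFS-based method, neural or classical, assigns these two datasets exactly the same likelihood under every demographic model.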

2. Teacher Quality Ceiling

Balance Wheel’s accuracy is bounded by the teacher’s accuracy. If moments has numerical issues for extreme parameter values – very large or very small population sizes, very short epochs, very high migration rates – the neural network will inherit those issues or extrapolate unpredictably.

Teacher accuracy in boundary cases

| Scenario | Teacher behavior | Neural network behavior |
|---|---|---|
| \(N_e < 100\) | moments may lose precision (drift dominated) | May extrapolate poorly |
| \(N_e > 10^6\) | moments may be slow (many ODE steps) | May interpolate well if trained in range |
| Epoch duration < 0.001 coalescent units | moments may miss the effect | May smooth over the epoch |
| Migration rate \(m > 1\) | moments may be numerically unstable | Will reflect the instability |

Mitigation. Validate the neural predictions against the teacher in the specific parameter regime of interest. If moments gives unreliable results in some regime, filter those examples from the training set and acknowledge the limitation.
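
The mitigation can be made concrete as a comparison loop over the regime of interest. A sketch with hypothetical stand-ins — in practice `teacher_sfs` would call moments and `neural_sfs` the trained network; here both are dummies so the loop runs:

```python
import numpy as np

def teacher_sfs(theta, n=20):
    # Hypothetical stand-in for the exact moments computation.
    return theta / np.arange(1, n)

def neural_sfs(theta, n=20):
    # Hypothetical stand-in for the trained network:
    # the teacher's answer plus a small approximation error.
    noise = np.random.default_rng(1).standard_normal(n - 1)
    return teacher_sfs(theta, n) * (1.0 + 0.002 * noise)

def validate(thetas, tol=0.01):
    """Flag parameter draws where student and teacher disagree by more than tol."""
    flagged = []
    for theta in thetas:
        exact, approx = teacher_sfs(theta), neural_sfs(theta)
        rel_err = float(np.max(np.abs(approx - exact) / exact))
        if rel_err > tol:
            flagged.append((theta, rel_err))
    return flagged

# Probe the boundary regimes from the table: small, medium, large scaled sizes.
print(validate([50.0, 500.0, 5000.0]))
```

Any flagged region either gets filtered from the training set or is acknowledged as outside the network’s validated range.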

3. Generalization to Unseen Topologies

The multi-population version is trained on a distribution of population tree topologies. If the true topology is:

  • More complex than the training distribution (e.g., 6 populations when training covered up to 4).

  • Structurally different (e.g., reticulate admixture when training used only tree-like splits).

  • Extreme (e.g., very asymmetric topologies not represented in the prior).

the network may produce inaccurate joint SFS predictions. Unlike moments, which can compute the SFS for any topology that can be specified, the neural network is limited to the topologies it has seen during training.

Mitigation. Ensure the training topology distribution covers the models of interest. For novel topologies, generate a focused training set and fine-tune the network. Always validate against moments for the specific topology being analyzed.

4. Not a Replacement for Full-Likelihood

For a single dataset analyzed once with a well-specified model, running moments directly is more trustworthy than running Balance Wheel. The classical solver gives the exact expected SFS; the neural approximation introduces an error (typically < 1%, but still an error). Balance Wheel’s advantages emerge only when:

  • You need thousands of likelihood evaluations (HMC, model comparison, bootstrap).

  • The model has \(k \geq 3\) populations (moments becomes slow).

  • You need continuous \(N_e(t)\) (moments requires piecewise-constant epochs).

  • You need full posterior distributions (moments gives only the MLE).

For a two-population model analyzed once to find the MLE, moments is simpler, more transparent, and more trustworthy.
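
The quantity evaluated thousands of times in those workflows is the Poisson Random Field composite log-likelihood: each SFS entry is modeled as an independent Poisson draw whose mean is the (scaled) expected SFS supplied by the forward model — moments exactly, or the neural surrogate. A minimal sketch:

```python
import math
import numpy as np

def prf_loglik(observed, expected):
    """Poisson Random Field composite log-likelihood of an observed SFS.

    sum_i [ o_i * log(e_i) - e_i - log(o_i!) ], one Poisson term per entry.
    """
    return sum(o * math.log(e) - e - math.lgamma(o + 1)
               for o, e in zip(observed, expected))

expected = 100.0 / np.arange(1, 20)   # neutral 1/i expectation, theta = 100
observed = np.round(expected)         # a typical dataset under that model
print(prf_loglik(observed, expected))
```

Each HMC step, bootstrap replicate, or model-comparison candidate needs this quantity (and, for HMC, its gradient with respect to \(\Theta\)) anew, which is why a ~100x faster forward model changes what analyses are feasible.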

When to Use Which Complication

A decision tree for choosing the right tool:

START
  |
  v
What is your data?
  |
  ├── Raw genotypes (VCF, genotype matrix)
  |     |
  |     v
  |   Do you need the full ARG?
  |     |
  |     ├── Yes ──▶ Do you have GPU + training budget?
  |     |            |
  |     |            ├── Yes ──▶ MAINSPRING
  |     |            └── No  ──▶ tsinfer + tsdate
  |     |
  |     └── No ──▶ Do you need per-site uncertainty?
  |                 |
  |                 ├── Yes ──▶ ESCAPEMENT
  |                 └── No  ──▶ Compute SFS, then ──▶ (see SFS path)
  |
  └── Site Frequency Spectrum (SFS)
        |
        v
      How many populations?
        |
        ├── 1 or 2, simple model ──▶ moments / dadi (classical)
        |
        ├── 1 or 2, need posterior ──▶ BALANCE WHEEL + HMC
        |
        ├── 3+, any analysis ──▶ BALANCE WHEEL (only viable option)
        |
        └── Model comparison needed ──▶ BALANCE WHEEL + marginal likelihood
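
The same tree can be encoded as a small function, useful for sanity-checking a planned analysis (illustrative; the argument names are mine, not an API from any of the tools):

```python
def choose_tool(data, need_arg=False, gpu=False, per_site_uncertainty=False,
                n_pops=2, need_posterior=False, model_comparison=False):
    """Mirror the decision tree above, branch for branch."""
    if data == "genotypes":
        if need_arg:
            return "Mainspring" if gpu else "tsinfer + tsdate"
        if per_site_uncertainty:
            return "Escapement"
        # Otherwise compute the SFS and fall through to the SFS path.
    if n_pops >= 3:
        return "Balance Wheel"
    if model_comparison:
        return "Balance Wheel + marginal likelihood"
    return "Balance Wheel + HMC" if need_posterior else "moments / dadi"

print(choose_tool("genotypes", need_arg=True, gpu=False))  # tsinfer + tsdate
print(choose_tool("sfs", n_pops=3))                        # Balance Wheel
print(choose_tool("sfs", need_posterior=True))             # Balance Wheel + HMC
```
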

Summary: when to use each approach

| Scenario | Recommended approach |
|---|---|
| Simple 2-pop split model, find MLE | moments directly |
| 2-pop model, need posterior uncertainty | Balance Wheel + HMC |
| 3+ populations, any analysis | Balance Wheel (moments too slow) |
| Continuous \(N_e(t)\) inference | Balance Wheel (moments needs piecewise-constant) |
| Model comparison (2-epoch vs. 3-epoch) | Balance Wheel + marginal likelihood |
| Need full ARG for downstream analysis | Mainspring |
| Need per-site genealogy uncertainty | Escapement |
| Screening 1,000 genomic windows for \(N_e(t)\) | Mainspring (speed) |
| Single dataset, maximal statistical rigor | Hybrid: Mainspring \(\to\) Escapement |
| Teaching and understanding demographic inference | dadi + moments (Timepieces) |
| No GPU available | moments or dadi |

The honest summary

Balance Wheel occupies a specific niche: fast, differentiable SFS computation that enables Bayesian posterior inference and handles multi-population models where the classical solvers become impractically slow. It is not a replacement for dadi or moments – it is a neural accelerator for the computation they perform. When you need the MLE of a simple model, use moments. When you need the posterior of a complex model, use Balance Wheel. When you need the full ARG, use Mainspring or Escapement.

The three Complications form a hierarchy:

  • Balance Wheel: SFS \(\to\) demographic parameters (fastest, least detailed)

  • Escapement: genotypes \(\to\) genealogy + demography (per-dataset, principled)

  • Mainspring: genotypes \(\to\) full ARG + demography (amortized, most detailed)

And the Timepieces remain the foundation. Every neural approach in this book is built on the mathematical insights of the classical methods. Balance Wheel without dadi’s diffusion theory and moments’ ODE system would have no teacher. Escapement without the coalescent likelihood would have no loss function. Mainspring without msprime would have no training data.

Use the simplest tool that answers your question. And always check the results against a method you trust.

Full Comparison Against Classical Methods

For completeness, we place Balance Wheel alongside all relevant Timepieces:

Balance Wheel vs. related classical and neural methods

| Property | dadi | moments | momi2 | Balance Wheel | phlash | SMC++ |
|---|---|---|---|---|---|---|
| Data input | SFS | SFS | SFS | SFS | Sequence pairs | Sequence pairs |
| Forward model | PDE solver | ODE integrator | Coalescent tensor | Neural MLP | SMC likelihood | SMC likelihood |
| Per-eval cost | ~100 ms | ~10 ms | ~5 ms | ~0.1 ms | ~50 ms | ~100 ms |
| Gradient | Finite diff | AD (ODE) | Analytic | Backprop | Score function | AD (ODE) |
| Multi-pop | Up to 3 | Up to 3 | Up to ~5 | Up to 5+ | 2 | 2 |
| Continuous \(N_e\) | No | No | No | Yes | Yes (spline) | Yes (spline) |
| Posterior | No (MLE only) | No (MLE only) | No (MLE only) | Yes (HMC) | Yes (SVGD) | No (MLE only) |
| Training cost | None | None | None | One-time | None | None |
| Needs GPU | No | No | No | Yes | Yes | No |

The place of each method

dadi and moments are the gold standard for SFS-based demographic inference. They are classical, well-tested, and require no special hardware. For simple models (\(k \leq 2\)), they remain the best choice for finding the MLE.

Balance Wheel extends their reach: faster evaluations for complex models, Bayesian posteriors via HMC, continuous demography, and multi-population scaling. It trades the certainty of exact numerical computation for the speed of neural approximation.

phlash and SMC++ operate on sequence-level data (not the SFS) and use the SMC likelihood. They capture LD information that the SFS discards. For single-population or two-population inference where LD is informative, they may be more powerful than any SFS-based method.

The right choice depends on the question, the data, and the resources available.