.. _mainspring_overview:

========================
Overview of Mainspring
========================

   *The master watchmaker does not discard the tools of their apprenticeship. They
   build a single instrument that embodies every lesson learned at the bench.*

Every Timepiece in this book makes a bargain. :ref:`PSMC <psmc_timepiece>` trades
multi-sample information for the elegance of a two-haplotype HMM.
:ref:`SMC++ <smcpp_timepiece>` gains samples but discretizes the coalescent into an
ODE system. :ref:`ARGweaver <argweaver_timepiece>` is exact under the discrete
sequential Markov coalescent but costs :math:`O(S^2)` per site and hours of MCMC per
kilobase. :ref:`tsinfer <tsinfer_timepiece>` scales to millions of samples but
surrenders posterior inference entirely, producing a single point estimate with no
uncertainty. :ref:`tsdate <tsdate_timepiece>` adds dates to the tree sequence but
treats the topology as fixed. :ref:`dadi <dadi_timepiece>` and
:ref:`moments <moments_timepiece>` collapse the genome to a frequency histogram,
discarding linkage information. :ref:`Gamma-SMC <gamma_smc_timepiece>` maintains full
posterior uncertainty over coalescence times but processes only two haplotypes at a
time. :ref:`phlash <phlash_timepiece>` achieves scalable Bayesian demography but
relies on composite likelihood.

Each Timepiece sits at a different point on the Pareto frontier between accuracy,
scalability, and biological realism. No classical method occupies the corner where
all three are maximized -- the computational cost of exact inference under the full
coalescent is simply too high.

Mainspring is an attempt to break this frontier. Not by inventing new population
genetics -- every equation in this book remains valid -- but by **compiling** the
structural insights of every Timepiece into a single neural architecture that learns
to perform approximate posterior inference in a single forward pass.


What Mainspring Does
=====================

Given a genotype matrix :math:`\mathbf{D} \in \{0,1\}^{n \times L}` (where :math:`n`
is the number of haploid samples and :math:`L` is the number of segregating sites),
Mainspring produces:

1. A **full ancestral recombination graph** (ARG) in tskit format -- topology,
   breakpoints, and node times.
2. A **posterior distribution** over effective population size trajectories
   :math:`N_e(t)`.

Both outputs emerge from a single forward pass through the network. No MCMC sampling,
no EM iterations, no iterative optimization at inference time.

.. math::

   f_\theta : \mathbf{D} \in \{0,1\}^{n \times L} \;\longrightarrow\;
   \bigl(\widehat{\text{ARG}},\; q(N_e(t))\bigr)

The function :math:`f_\theta` is a neural network with parameters :math:`\theta`
trained on millions of simulated datasets from :ref:`msprime <msprime_timepiece>`.
At training time, we have access to the true ARG and true demography for each
simulation. At inference time, we have only the genotype matrix.

.. admonition:: Amortized inference

   The term **amortized** means that the computational cost of inference is paid
   once, during training, and then **amortized** across all future datasets.
   Training Mainspring takes days on a GPU cluster. But once trained, inference on
   a new dataset takes seconds. This is the same economics as compiling a program:
   the compiler is slow, but the compiled binary is fast. The simulations are the
   source code; the trained network is the compiled binary.

   For a thorough introduction to amortized inference, see
   :ref:`amortized_inference`.


Why "Structure-Aware" Matters
==============================

Neural approaches to population genetics are not new. pg-gan (Wang et al. 2021)
trained a GAN discriminator to match simulated genotype matrices to real data.
ImaGene (Torada et al. 2019) applied
a CNN to genotype matrices for selection detection. ReLERNN (Adrion et al. 2020)
used an RNN for recombination rate estimation. These methods treat the genotype matrix
as a generic image or sequence, applying off-the-shelf architectures without encoding
any domain knowledge.

Mainspring is different in a specific, measurable way: **every architectural choice
corresponds to a mathematical property of the coalescent**.

.. list-table:: Generic vs. structure-aware neural inference
   :header-rows: 1
   :widths: 35 30 35

   * - Property
     - Generic approach
     - Mainspring
   * - Positional structure
     - Flat convolution or full attention
     - Sliding-window causal attention (sequential Markov property from
       :ref:`PSMC <psmc_timepiece>`)
   * - Sample exchangeability
     - Fixed sample ordering
     - Set Transformer, permutation-equivariant (from
       :ref:`SMC++ <smcpp_timepiece>`)
   * - Output format
     - Scalar summary statistics
     - Full ARG in tskit format (from
       :ref:`ARGweaver <argweaver_timepiece>` /
       :ref:`SINGER <singer_timepiece>`)
   * - Tree dating
     - Regression to point estimate
     - GNN message passing with gamma posteriors (from
       :ref:`tsdate <tsdate_timepiece>` / :ref:`Gamma-SMC <gamma_smc_timepiece>`)
   * - Haplotype relationships
     - Implicit in convolution filters
     - Cross-attention as Li & Stephens copying (from
       :ref:`tsinfer <tsinfer_timepiece>` / :ref:`lshmm <lshmm_timepiece>`)
   * - Demographic output
     - Point estimate of :math:`N_e`
     - Normalizing flow posterior :math:`q(N_e(t))` (from
       :ref:`phlash <phlash_timepiece>`)
   * - Physics regularization
     - None
     - Differentiable SFS loss (from :ref:`dadi <dadi_timepiece>` /
       :ref:`moments <moments_timepiece>` / :ref:`momi2 <momi2_timepiece>`)

The result is not that Mainspring is merely "better" in some generic sense -- it is
that the network **converges faster**, **generalizes better to out-of-distribution
demographies**, and **produces calibrated uncertainty estimates**, because the
architecture encodes the right inductive biases.
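Two of the biases in the table can be made concrete in a few lines of PyTorch. The sketch below is illustrative, not Mainspring's actual code: a Deep Sets-style layer that is exactly permutation-equivariant over the sample axis, and a causal sliding-window self-attention along the site axis that encodes the sequential Markov property.

```python
import torch
import torch.nn as nn

class SetEquivariantLayer(nn.Module):
    """Permutation-equivariant over the sample axis: each sample's output
    depends on its own features plus a pooled summary of all samples, so
    permuting the samples permutes the outputs identically."""
    def __init__(self, dim):
        super().__init__()
        self.self_proj = nn.Linear(dim, dim)
        self.pool_proj = nn.Linear(dim, dim)

    def forward(self, h):                      # h: (batch, n, sites, dim)
        pooled = h.mean(dim=1, keepdim=True)   # invariant summary over samples
        return torch.relu(self.self_proj(h) + self.pool_proj(pooled))

class WindowedCausalAttention(nn.Module):
    """Self-attention along the site axis where site i attends only to the
    previous `window` sites -- a neural analogue of the sequential Markov
    property (bounded left context along the genome)."""
    def __init__(self, dim, heads, window):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.window = window

    def forward(self, h):                      # h: (batch, sites, dim)
        L = h.size(1)
        rel = torch.arange(L).unsqueeze(1) - torch.arange(L).unsqueeze(0)
        banned = ~((rel >= 0) & (rel < self.window))  # True = may not attend
        out, _ = self.attn(h, h, h, attn_mask=banned)
        return out
```

Swapping two samples in the input swaps the corresponding outputs of ``SetEquivariantLayer`` exactly; this is the exchangeability property the table attributes to the SMC++ insight, enforced by construction rather than learned.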


The Key Insight: Population Genetics Compiled into Architecture
================================================================

The central thesis of Mainspring can be stated in one sentence:

.. admonition:: Core thesis

   The mathematical structure of population genetics -- the sequential Markov
   property, permutation invariance of exchangeable samples, the Li & Stephens
   copying model, message-passing on trees, gamma-distributed coalescence times,
   and the site frequency spectrum as a sufficient statistic for demography -- can
   be compiled into neural network architecture rather than hard-coded into
   likelihood functions.

What does "compiled" mean here? Consider the analogy to a mechanical watch. Each
Timepiece in this book hand-crafts a specific gear train: PSMC builds a transition
matrix, tsdate implements belief propagation, ARGweaver constructs an MCMC sampler.
These are **interpreted** approaches -- they execute the population-genetic equations
step by step at inference time.

Mainspring takes a different approach. It **compiles** the equations into a fixed
neural circuit during training. The circuit does not execute the equations -- it
has learned to shortcut them. But the circuit's wiring diagram (the architecture)
mirrors the structure of the equations, so the shortcuts are faithful.

This is why Mainspring is a Complication, not a replacement. A complication in
horology adds functionality to the basic movement without altering the movement
itself. The Timepieces are the movement. Mainspring is a complication that adds
speed -- but only because the movement is sound.


What Each Timepiece Trades Away
================================

To appreciate what Mainspring attempts to unify, we must be precise about what each
Timepiece sacrifices.

.. list-table:: The Pareto frontier of classical methods
   :header-rows: 1
   :widths: 15 30 30 25

   * - Timepiece
     - Core mechanism
     - What it trades away
     - Mainspring's response
   * - :ref:`PSMC <psmc_timepiece>`
     - HMM on two haplotypes
     - Multi-sample information; piecewise-constant :math:`N_e(t)`
     - Process all samples jointly; continuous :math:`N_e(t)` via normalizing flow
   * - :ref:`SMC++ <smcpp_timepiece>`
     - ODE-based HMM with distinguished lineage
     - ARG topology; limited to :math:`\sim 200` undistinguished samples
     - Output full ARG; permutation-equivariant encoder handles arbitrary :math:`n`
   * - :ref:`ARGweaver <argweaver_timepiece>`
     - MCMC sampling of full ARG
     - Speed (:math:`O(S^2 K)` per site); limited scalability
     - Single forward pass; linear-time sliding-window attention
   * - :ref:`SINGER <singer_timepiece>`
     - Gibbs sampling of ARG with GP prior
     - Speed (hours for megabase regions); sequential processing
     - Parallel across genomic windows; batched inference
   * - :ref:`tsinfer <tsinfer_timepiece>`
     - Li & Stephens ancestor matching
     - Posterior uncertainty; dates; demographic inference
     - Posterior via dropout/ensemble; GNN dating; demographic decoder
   * - :ref:`tsdate <tsdate_timepiece>`
     - Inside-outside on fixed topology
     - Topology uncertainty; demographic inference
     - Joint topology and dating; demographic decoder
   * - :ref:`dadi <dadi_timepiece>`
     - Diffusion equation for SFS
     - Linkage information; limited to :math:`\sim 3` populations
     - Full sequence input; population structure as latent variable
   * - :ref:`moments <moments_timepiece>`
     - ODE system for SFS moments
     - Same as dadi (different numerical approach)
     - SFS used as auxiliary loss, not sole input
   * - :ref:`Gamma-SMC <gamma_smc_timepiece>`
     - Gamma-distributed coalescence-time posteriors
     - Two haplotypes only; no ARG topology
     - Gamma output heads on multi-sample GNN
   * - :ref:`phlash <phlash_timepiece>`
     - SVGD over :math:`N_e(t)` with composite likelihood
     - Composite likelihood approximation; pre-computed pairs
     - End-to-end training; full likelihood via simulation
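The "gamma output heads" row admits a short sketch. The names ``GammaHead`` and ``gamma_nll`` are invented for illustration, and the node embeddings are assumed to come from some upstream GNN:

```python
import torch
import torch.nn as nn

class GammaHead(nn.Module):
    """Maps a node embedding to the shape and rate of a Gamma distribution
    over that node's age, echoing Gamma-SMC's gamma-distributed
    coalescence-time posteriors."""
    def __init__(self, dim):
        super().__init__()
        self.out = nn.Linear(dim, 2)

    def forward(self, h):                       # h: (num_nodes, dim)
        log_shape, log_rate = self.out(h).unbind(dim=-1)
        return log_shape.exp(), log_rate.exp()  # exp keeps both positive

def gamma_nll(shape, rate, true_times):
    """Training loss: negative log-likelihood of the simulated true node
    times under the predicted per-node Gamma posteriors."""
    return -torch.distributions.Gamma(shape, rate).log_prob(true_times).mean()
```

At inference time the Gamma mean (shape/rate) serves as a point estimate of each node's age while the full distribution carries the uncertainty.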


Honest Limitations
====================

Mainspring is not a universal solution. It has fundamental limitations that no amount
of engineering can fully resolve.

**1. The simulation fidelity gap.** Mainspring is only as good as its training
simulations. If the real data-generating process includes features absent from
msprime -- gene conversion, structural variants, sequencing error, population
structure not captured by the demographic model -- the network may produce
confidently wrong answers. This is the Achilles' heel of all simulation-based
inference: the posterior is conditioned on the simulator being correct.

**2. No statistical guarantees.** Unlike MCMC methods (which are asymptotically
exact given enough iterations) or variational methods (which provide a lower bound
on the evidence), Mainspring's posterior approximation has no formal guarantees. The
network may be miscalibrated, especially in regions of parameter space poorly
represented in the training set.
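Miscalibration can at least be detected. Simulation-based calibration draws parameters from the prior, simulates data, and checks that the rank of the true parameter among posterior draws is uniform. The sketch below demonstrates the diagnostic on a toy conjugate-Gaussian model, not on Mainspring itself; all names are illustrative:

```python
import numpy as np

def sbc_ranks(prior_sample, simulate, posterior_sample, n_sims=200, n_post=99):
    """Simulation-based calibration: if the posterior approximation is
    calibrated, the rank of the true parameter among posterior draws is
    uniform on {0, ..., n_post}."""
    ranks = []
    for _ in range(n_sims):
        theta = prior_sample()                  # theta ~ prior
        data = simulate(theta)                  # data ~ p(data | theta)
        draws = posterior_sample(data, n_post)  # draws ~ q(theta | data)
        ranks.append(int(np.sum(draws < theta)))
    return np.array(ranks)

# Toy check with the EXACT posterior, so ranks should look uniform:
# theta ~ N(0, 1), data = theta + N(0, 1), posterior = N(data / 2, 1 / 2).
rng = np.random.default_rng(0)
ranks = sbc_ranks(
    prior_sample=lambda: rng.normal(),
    simulate=lambda th: th + rng.normal(),
    posterior_sample=lambda x, k: rng.normal(x / 2, np.sqrt(0.5), size=k),
)
```

Systematic deviation from uniformity (e.g., ranks piling up at the extremes, indicating overconfidence) flags exactly the failure mode described above, and the same diagnostic applies to any amortized posterior.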

**3. Extrapolation.** Neural networks extrapolate poorly. If the true demography
lies outside the prior used to generate training data -- a bottleneck more severe
than any in the training set, a population size larger than any simulated -- the
network will struggle. The training prior must be chosen carefully and validated
against held-out scenarios.

**4. Interpretability.** A Timepiece's likelihood function can be inspected term by
term. The PSMC transition matrix tells you exactly how recombination and population
size interact. Mainspring's learned representations are opaque. We can probe them
with attention maps and ablation studies, but we cannot point to a specific neuron
and say "this computes the coalescence rate."

**5. Training cost.** Training Mainspring requires millions of msprime simulations
(each producing a full ARG), hundreds of GPU-hours, and careful hyperparameter
tuning. This is a one-time cost, but it is substantial. A lab without GPU resources
may find classical methods more practical.

**6. Recombination map dependency.** Mainspring assumes a known recombination map.
Errors in the recombination map propagate into errors in the inferred ARG and
demography. This limitation is shared with ARGweaver and SINGER but not with
methods that operate on summary statistics (dadi, moments).

.. admonition:: When to use Mainspring vs. a Timepiece

   Use a Timepiece when you need **interpretable, guaranteed inference** on a
   well-characterized problem (e.g., estimating :math:`N_e(t)` from a single diploid
   genome with PSMC). Use Mainspring when you need **fast, approximate inference on
   many datasets** and are willing to validate against classical methods on a subset.

   The best practice is to use both: Mainspring for rapid screening, a Timepiece for
   careful analysis of interesting cases. This is the hybrid pipeline described in
   :ref:`mainspring_comparison`.


The Road Ahead
===============

The remaining chapters of this Complication build Mainspring from the ground up:

1. :ref:`mainspring_design_principles` -- Ten design principles, one from each
   Timepiece. These are the architectural decisions that distinguish Mainspring from
   a generic neural network.

2. :ref:`mainspring_architecture` -- The four-stage architecture in full detail:
   genomic encoder, topology decoder, dating GNN, and demographic decoder. PyTorch
   pseudocode for every component.

3. :ref:`mainspring_training` -- The simulation engine, the composite loss function,
   the curriculum training strategy, and the complete training loop.

4. :ref:`mainspring_comparison` -- A systematic comparison against every Timepiece,
   honest limitations revisited, and the hybrid pipeline that combines Mainspring
   with :ref:`Escapement <escapement_complication>` for principled refinement.

Each chapter follows the book's rhythm: motivation, math, code, verification. But
the "math" here is architecture design -- the translation of population-genetic
structure into neural network components. And the "verification" is not analytical
but empirical: we check that the network recovers known truths from simulated data.

A minimal example showing Mainspring's interface (assuming a trained model):

.. code-block:: python

   import torch
   import msprime

   # Simulate a test dataset
   ts = msprime.sim_ancestry(20, sequence_length=1e5, recombination_rate=1e-8,
                              population_size=10_000, random_seed=42)
   ts = msprime.sim_mutations(ts, rate=1.25e-8, random_seed=42)
   D = torch.tensor(ts.genotype_matrix().T, dtype=torch.float32).unsqueeze(0)

   # Run inference (single forward pass)
   model = Mainspring.load_pretrained("mainspring_v1.pt")
   model.eval()
   with torch.no_grad():
       results = model(D, hard=True)

   predicted_arg = results['topology']      # inferred ARG topology
   ne_posterior = results['ne_posterior']   # samples from q(N_e(t))
   node_times = results['times']            # gamma posterior means of node ages

Let us begin with the design principles that make Mainspring more than a black box.
