{% if text_output %}
{{ text_output }}
{% endif %}

Analysis Comparison Report

UMAP projection of theme embeddings

Theme Network

▼ Theme Quick Reference -- short labels generated by the LLM and used throughout this report
{% for analysis_name, themes in comparison.embedded_strings.items() %} {% set set_letter = themes[0].set_letter if themes else 'X' %}
{{ set_letter }} {{ analysis_name }} {{ themes | length }} themes
{% endfor %}

Pairwise Comparisons

Select a pair to view detailed comparison metrics.

{% for key, comp in comparison.by_comparisons().items() %} {% set pair_idx = loop.index %}

{{ comp.a.name }} vs {{ comp.b.name }}

Unbalanced Optimal Transport

Many-to-many alignment that allows themes to remain unmatched when no good match exists.

Shared mass
{% if comp.stats.alignment_scree %}
Alignment
{% endif %} {% if comp.stats.splits_joins_scree %}
Splits/Joins
{% endif %}

Green = paraphrase ceiling, Red = word-salad floor.

{% for k_val in comp.stats.k_values if k_val >= 0.1 %} {% set ot_k = comp.stats.ot_by_k[k_val] %}
K = {{ k_val }} {% if k_val == comp.stats.diminishing_k %}diminishing returns{% endif %} {% if k_val == comp.stats.default_k %}default{% endif %}
Shared Mass: {{ "%.1f"|format(ot_k.ot.shared_mass * 100) }}% {% if ot_k.ot.shared_mass_pct_of_ceiling is defined %} ({{ "%.1f"|format(ot_k.ot.shared_mass_pct_of_ceiling * 100) }}% of ceiling) {% endif %}
Alignment: {% if ot_k.ot.alignment_observed is defined %}{{ "%.1f"|format(ot_k.ot.alignment_observed) }}{% else %}{{ "%.1f"|format(1 - ot_k.ot.avg_cost) }}{% endif %}
Effect: {{ "%.1f"|format(ot_k.ot.shared_mass_effect) }} MADs
{% if ot_k.ot.filter_stats is defined and ot_k.ot.filter_stats.filtering_enabled %} Splits: {{ "%.1f"|format(ot_k.ot.filtered_split_join_stats.splits_from_a.mean) }} avg Joins: {{ "%.1f"|format(ot_k.ot.filtered_split_join_stats.joins_to_b.mean) }} avg (filtered) {% else %} Splits: {{ "%.1f"|format(ot_k.split_join_stats.splits_from_a.mean) }} avg Joins: {{ "%.1f"|format(ot_k.split_join_stats.joins_to_b.mean) }} avg {% endif %}
{% if ot_k.ot.filter_stats is defined and ot_k.ot.filter_stats.filtering_enabled %}
Transport filtered at {{ "%.0f"|format(ot_k.ot.filter_stats.threshold * 100) }}% threshold: {{ "%.1f"|format(ot_k.ot.filter_stats.mass_retained_pct) }}% mass retained, {{ ot_k.ot.filter_stats.edges_filtered }}/{{ ot_k.ot.filter_stats.edges_original }} edges
{% endif %} {% if ot_k.ot.cost_structure is defined %}
Theme preferences: {{ "%.0f"|format(ot_k.ot.cost_structure.relative_preference * 100) }}% relative preference -- {{ ot_k.ot.cost_structure.interpretation }}
{% endif %} {% if ot_k.ot.paraphrase_upper_bound is defined or ot_k.ot.null_shared_mass_mean is defined or ot_k.ot.cost_structure is defined %}
Shared Mass Baselines:
    {% if ot_k.ot.paraphrase_upper_bound is defined %}
  • ▲ Paraphrase ceiling: {{ "%.1f"|format(ot_k.ot.paraphrase_upper_bound * 100) }}%
    {% endif %} {% if ot_k.ot.null_shared_mass_mean is defined %}
  • ▼ Word-salad floor: {{ "%.1f"|format(ot_k.ot.null_shared_mass_mean * 100) }}%
    {% endif %}
{% if ot_k.ot.cost_structure is defined %} Theme Preference Structure:
  • Relative preference: {{ "%.0f"|format(ot_k.ot.cost_structure.relative_preference * 100) }}%
  • Preference strength: {{ "%.3f"|format(ot_k.ot.cost_structure.avg_preference_strength) }}
  • Spikiness: {{ "%.2f"|format(ot_k.ot.cost_structure.spikiness) }}
  • {{ ot_k.ot.cost_structure.interpretation }}
{% else %} Alignment:
    {% if ot_k.ot.alignment_paraphrase_ceiling is defined %}
  • ▲ Paraphrase ceiling: {{ "%.1f"|format(ot_k.ot.alignment_paraphrase_ceiling) }}
    {% endif %} {% if ot_k.ot.alignment_null_floor is defined %}
  • ▼ Word-salad floor: {{ "%.1f"|format(ot_k.ot.alignment_null_floor) }}
    {% endif %}
{% endif %}
{% endif %}
Transport Flow
Transport Matrix
Transport heatmap
Many-to-Many Matches

{{ comp.a.name }} →

{% for match in ot_k.best_matches_a_to_b[:5] %} {% set theme_a = comp.embedded_a[match.theme_a_index] %} {% set theme_b = comp.embedded_b[match.theme_b_index] %}
{{ theme_a.set_letter }}{{ theme_a.theme_index }}: {{ theme_a.short_label | default(theme_a.theme_name[:12]) }} → {{ theme_b.set_letter }}{{ theme_b.theme_index }}: {{ theme_b.short_label | default(theme_b.theme_name[:12]) }} {{ "%.0f"|format(match.mass_total * 100) }}% {% endfor %}

{{ comp.b.name }} →

{% for match in ot_k.best_matches_b_to_a[:5] %} {% set theme_b = comp.embedded_b[match.theme_b_index] %} {% set theme_a = comp.embedded_a[match.theme_a_index] %}
{{ theme_b.set_letter }}{{ theme_b.theme_index }}: {{ theme_b.short_label | default(theme_b.theme_name[:12]) }} → {{ theme_a.set_letter }}{{ theme_a.theme_index }}: {{ theme_a.short_label | default(theme_a.theme_name[:12]) }} {{ "%.0f"|format(match.mass_total * 100) }}% {% endfor %}
{% endfor %}

Best Matches

Coverage & Hit Rates

Thematic analysis doesn't have ground truth, so traditional precision/recall don't apply. Instead, we measure coverage (did themes find matches?) and fidelity (how close are the best matches?). Based on {{ comp.stats.similarity_metric }} similarity. {% if comp.stats.calibration_path %} Using calibrated values {% endif %}

Coverage (Hit Rates)

Proportion of themes with at least one match above threshold ({{ comparison.config.threshold }})

  • Hit Rate A: {{ "%.1f"|format(comp.stats.hit_rate_a * 100) }}%
  • Hit Rate B: {{ "%.1f"|format(comp.stats.hit_rate_b * 100) }}%
  • Pair Match Rate: {{ "%.3f"|format(comp.stats.jaccard) }} (pairs above threshold / total pairs)

High hit rates indicate both analyses found similar conceptual territory.

Fidelity (Match Quality)

How close are the best matches? (Mean of each theme's best match similarity)

  • A→B: {{ "%.3f"|format(comp.stats.mean_max_sim_a_to_b) }}
  • B→A: {{ "%.3f"|format(comp.stats.mean_max_sim_b_to_a) }}
  • Fidelity: {{ "%.3f"|format(comp.stats.fidelity) }}

Fidelity is the harmonic mean of directional scores. Higher = tighter semantic alignment.


Hungarian Matching (1:1)

The Hungarian algorithm finds the optimal one-to-one pairing that maximizes total similarity. Each theme maps to at most one theme in the other set -- no reuse allowed. {% if comp.stats.calibration_path %} Using calibrated values {% endif %}

Intuition: "If I had to explain set B's themes to someone who only knew set A, which single theme in A would each B theme correspond to, with no reuse?"

What this enables: Hungarian matching removes ambiguity by assigning each theme to at most one partner. Coverage metrics show what proportion of each set found a good match.

Limitation: This penalises legitimate theme refinement (splitting one theme into two is treated as unmatched). Use OT if you want to allow many-to-many alignment.
Mean Matched Similarity (primary metric)

{{ "%.3f"|format(comp.stats.hungarian.soft_metrics.soft_precision) }}

Average similarity of optimal pairs

Interpretation: "How good are the best one-to-one correspondences?" Higher = tighter semantic alignment between the two theme sets.

{% if comp.stats.hungarian.distribution.n_pairs > 0 %}

Distribution of {{ comp.stats.hungarian.distribution.n_pairs }} optimal pairs:

  • Median: {{ "%.3f"|format(comp.stats.hungarian.distribution.median) }}   (Q1: {{ "%.3f"|format(comp.stats.hungarian.distribution.q1) }}, Q3: {{ "%.3f"|format(comp.stats.hungarian.distribution.q3) }})
  • Range: {{ "%.3f"|format(comp.stats.hungarian.distribution.min) }} -- {{ "%.3f"|format(comp.stats.hungarian.distribution.max) }}
{% endif %}
Coverage & Set Overlap

Based on {{ comp.stats.hungarian.distribution.n_pairs }} matched pairs above threshold ({{ comparison.config.threshold }})

{% set cov_a = comp.stats.hungarian.thresholded_metrics.recall %} {% set cov_b = comp.stats.hungarian.thresholded_metrics.precision %} {% set mean_cov = (cov_a + cov_b) / 2 %}

{{ "%.0f"|format(cov_a * 100) }}%

Coverage A

(A themes matched)

{{ "%.0f"|format(cov_b * 100) }}%

Coverage B

(B themes matched)

{{ "%.3f"|format(comp.stats.hungarian.thresholded_metrics.true_jaccard) }}

Jaccard Index

(set overlap)


Jaccard Index = matched / (|A| + |B| - matched). Measures overlap between theme sets after 1:1 assignment. Higher = more themes found good partners.

Optimal Matched Pairs ({{ comp.stats.hungarian.all_pairs|length }})
{% if comp.stats.hungarian.all_pairs|length > 0 %}

Hungarian algorithm finds the optimal one-to-one assignment. {% if comp.stats.calibration_path %} Similarity values are calibrated. {% endif %}

Theme in {{ comp.a.name }} Theme in {{ comp.b.name }} {% if comp.stats.calibration_path %}Calibrated Raw Angular{% else %}Angular Similarity{% endif %}
{% for pair_data in comp.stats.hungarian.all_pairs %} {% set i = pair_data[0] %} {% set j = pair_data[1] %} {% set similarity = pair_data[2] %} {% set similarity_raw = pair_data[3] if pair_data|length > 3 else similarity %} {% set theme_a = comp.embedded_a[i] %} {% set theme_b = comp.embedded_b[j] %}
{{ theme_a.set_letter }}{{ theme_a.theme_index }}: {{ theme_a.short_label | default(theme_a.theme_name[:18]) }} {{ theme_b.set_letter }}{{ theme_b.theme_index }}: {{ theme_b.short_label | default(theme_b.theme_name[:18]) }} {{ "%.3f"|format(similarity) }} {% if comp.stats.calibration_path %}{{ "%.3f"|format(similarity_raw) }}{% endif %}
{% endfor %}
{% else %}

No optimal pairs found.

{% endif %}

Similarity Metrics

Various distance and similarity metrics between theme embeddings.

{% if comp.stats.calibration_path %}
Primary Calibrated Similarity

All statistics (fidelity, coverage, thresholding, Hungarian matching) use these calibrated values. Similarities transformed using a calibration model trained on paraphrases.

Continuous Values (Calibrated)
{% if comp.plots.heatmaps_rescaled %} Calibrated similarity heatmap {% endif %}
Binary Match (threshold={{ comparison.config.threshold }})
{% if comp.plots.heatmaps_rescaled_thresholded %} Calibrated thresholded heatmap {% endif %}
{% for j in range(comp.stats.selected_similarity_matrix[0]|length) %} {{ j+1 }} {% endfor %}
{% for i, row in enumerate(comp.stats.selected_similarity_matrix) %} {{ i+1 }} {% for val in row %} {{ "%.2f"|format(val) }} {% endfor %} {% endfor %}
{% endif %}
{% if comp.stats.calibration_path %} Raw {% endif %} Angular Similarity

Angular distance uses the angle between embedding vectors (arccos of cosine), normalised to [0,1]. Unlike cosine, it satisfies the triangle inequality, making it a proper metric. {% if comp.stats.calibration_path %}These are RAW values before calibration.{% endif %}

Continuous Values (Raw)
Angular distance heatmap
{% for j in range(comp.stats.angle_similarity_matrix[0]|length) %} {{ j+1 }} {% endfor %}
{% for i, row in enumerate(comp.stats.angle_similarity_matrix) %} {{ i+1 }} {% for val in row %} {{ "%.3f"|format(val) }} {% endfor %} {% endfor %}
{% if comp.stats.calibration_path %} Raw {% endif %} Cosine Similarity

Raw cosine similarity between embedding vectors. Not a proper metric (doesn't satisfy triangle inequality). {% if comp.stats.calibration_path %}These are RAW values before calibration.{% endif %}

Continuous Values (Raw)
Cosine similarity heatmap
{% for j in range(comp.stats.similarity_matrix[0]|length) %} {{ j+1 }} {% endfor %}
{% for i, row in enumerate(comp.stats.similarity_matrix) %} {{ i+1 }} {% for val in row %} {{ "%.3f"|format(val) }} {% endfor %} {% endfor %}
Alternative Distance Metrics
Shepard Similarity

Exponential decay on angular distance (k={{ comp.stats.shepard_k_value }}). Cognitively realistic similarity function.

Shepard similarity heatmap
{% endfor %}

Full text of embedded themes for each analysis

The actual strings that were embedded for similarity comparison. Labels are used in plots; embedded strings are used for calculating similarity.

{% for analysis_name, themes in comparison.embedded_strings.items() %}
{{ analysis_name }} -- Embedded Themes ({{ themes | length }})
ID Theme Name Short Label Embedded String
{% for item in themes %}
{{ item.set_letter }}{{ item.theme_index }} {{ item.theme_name }} {{ item.set_letter }}{{ item.theme_index }}: {{ item.short_label | default('Theme ' ~ item.theme_index) }} {{ item.embedded_string }}
{% endfor %}
{% endfor %}

Paraphrase ceiling and word-salad floor calibration data

Synthetic baselines help calibrate alignment scores by providing reference points for "best case" (paraphrase ceiling) and "worst case" (word-salad floor) scenarios.

{% for key, comp in comparison.by_comparisons().items() %}
{{ comp.a.name }} vs {{ comp.b.name }}
{% if comp.stats.paraphrase_baseline %}
Paraphrase Ceiling -- Best-case reference

LLM-generated paraphrases of each theme establish what alignment looks like when themes have identical meaning but different wording. This represents the best achievable similarity between semantically equivalent analyses.

  • Mean self-similarity: {{ "%.3f"|format(comp.stats.paraphrase_baseline.paraphrase_similarity_mean) if comp.stats.paraphrase_baseline.paraphrase_similarity_mean is not none else "N/A" }}
  • Model: {{ comp.stats.paraphrase_baseline.metadata.model }}
  • Paraphrases per theme: {{ comp.stats.paraphrase_baseline.metadata.n_paraphrases }}
{{ comp.a.name }} samples:
{% for sample in comp.stats.paraphrase_baseline.samples_a[:3] %}
Original:
{{ sample.original[:100] }}{% if sample.original|length > 100 %}...{% endif %}
Paraphrases: {% for para in sample.paraphrases[:2] %}
• {{ para[:80] }}{% if para|length > 80 %}...{% endif %}
{% endfor %}
{% endfor %}
{{ comp.b.name }} samples:
{% for sample in comp.stats.paraphrase_baseline.samples_b[:3] %}
Original:
{{ sample.original[:100] }}{% if sample.original|length > 100 %}...{% endif %}
Paraphrases: {% for para in sample.paraphrases[:2] %}
• {{ para[:80] }}{% if para|length > 80 %}...{% endif %}
{% endfor %}
{% endfor %}
{% endif %} {% if comp.stats.word_salad_samples %}
Word Salad Floor -- Random baseline

Word salad is generated by randomly shuffling words from themes, destroying semantic meaning. This represents what you'd expect from random text with similar vocabulary -- a floor below which alignment cannot meaningfully fall.

  • Samples generated: {{ comp.stats.word_salad_samples|length }}
  • Method: Words randomly shuffled while preserving theme length
  {% if comp.stats.word_salad_self_similarity_raw is not none %}
  • Self-similarity: {{ "%.3f"|format(comp.stats.word_salad_self_similarity_raw) }} (rescaled: {{ "%.3f"|format(comp.stats.word_salad_self_similarity) }})
  {% endif %}
{% for sample in comp.stats.word_salad_samples[:3] %}
Sample {{ loop.index }}
{% for text in sample[:5] %}
{{ loop.index }}. {{ text[:100] }}{% if text|length > 100 %}...{% endif %}
{% endfor %} {% if sample|length > 5 %}
... and {{ sample|length - 5 }} more
{% endif %}
{% endfor %}
{% endif %} {% if comp.stats.color_scale_info %}
Colour Scale -- Sankey diagram interpretation

{{ comp.stats.color_scale_info.description }}

  • Green Strong match (low cost, well below K threshold)
  • Amber Moderate match
  • Red Marginal match (cost approaching K, close to being dropped)

Baselines: word-salad = {{ "%.3f"|format(comp.stats.color_scale_info.word_salad_self_similarity) }}, paraphrase = {{ "%.3f"|format(comp.stats.color_scale_info.paraphrase_self_similarity) }}

{% endif %}
{% endfor %}

Notes & Explanations

Reference guide for interpreting the metrics and visualisations in this report.

UMAP Interpretation

UMAP is a non-linear dimensionality-reduction method that prioritises preserving local neighbourhood structure rather than global variance. Nearby points can be interpreted as closely related themes, while larger-scale distances and cluster shapes should be interpreted qualitatively rather than metrically. This plot is intended as an exploratory visualisation of thematic relationships, not as a quantitative evaluation.

Angular Distance

Angular distance is the angle between embedding vectors (the arccos of their cosine similarity), normalised to [0,1]. Unlike cosine similarity, it satisfies the triangle inequality, making it a proper metric for averaging and comparison. The report works with angular similarity (1 minus the distance), so 1.0 indicates identical vectors and 0.0 indicates orthogonal vectors.
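A minimal sketch of the conversion (the function name and the pi/2 normalisation cap are illustrative assumptions, not the report's exact implementation):

```python
import math

def angular_similarity(u, v, max_angle=math.pi / 2):
    """Angular similarity: 1 - angle/max_angle.

    With max_angle = pi/2 (an assumption here), identical vectors give
    1.0 and orthogonal vectors give 0.0. The report's normalisation
    constant may differ.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp for float safety
    angle = math.acos(cosine)
    return 1.0 - min(angle, max_angle) / max_angle
```

Identical vectors map to 1.0 and orthogonal vectors to 0.0 regardless of vector magnitude, which is why this is preferred over raw dot products for comparing embeddings.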

Optimal Transport

Unbalanced Optimal Transport (OT) finds the most efficient way to "transport" mass from one theme set to another while allowing some mass to remain unmatched. Unlike standard matching algorithms, OT can express many-to-many relationships -- a single theme can contribute to multiple themes in the other set, and vice versa. This is appropriate when themes may be split, merged, or refined across analyses.
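The idea can be sketched with a toy entropic unbalanced Sinkhorn solver (hypothetical data and parameter values; the report's actual solver and cost matrix may differ — `reg_m` plays the role of the K parameter):

```python
import numpy as np

def unbalanced_sinkhorn(a, b, cost, reg=0.05, reg_m=1.0, n_iter=200):
    """Entropic unbalanced OT: a KL penalty of strength reg_m on the
    marginals lets mass stay unmatched instead of forcing a full match."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    fi = reg_m / (reg_m + reg)  # softens the marginal constraints
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    return u[:, None] * K * v[None, :]  # transport plan

# Two themes per set: pairs 0<->0 and 1<->1 are near-identical (cost 0),
# cross pairs are dissimilar (cost 1), so mass flows along the diagonal.
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
plan = unbalanced_sinkhorn(a, b, cost)
shared_mass = plan.sum()  # proportion of thematic mass matched
```

Each cell of `plan` is the mass flowing from an A theme to a B theme; a row with several non-trivial cells is a split, a column with several is a join.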

K Parameter

The K parameter (reg_m) controls the penalty for leaving mass unmatched in optimal transport. Lower K = more selective matching (themes must be very similar to match). Higher K = more permissive matching (more mass is transported). The scree plots help identify the elbow where increasing K provides diminishing returns.

Shared Mass

Shared mass is the proportion of total thematic content that was matched between the two sets. A value of 100% means all themes found partners; lower values indicate some themes were left unmatched (novel or missing concepts).

Semantic Alignment

Alignment measures the quality of theme-to-theme matches (computed as 1 - transport cost). Higher alignment means matched themes are more semantically similar. This is the "how good" complement to shared mass's "how much".

Coverage & Hit Rates

Coverage measures what proportion of themes found at least one match above the similarity threshold. Hit Rate A = proportion of A themes with matches; Hit Rate B = proportion of B themes with matches. High hit rates indicate both analyses found similar conceptual territory.
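For example, with a hypothetical 3×3 similarity matrix and a 0.5 threshold:

```python
# Rows = themes in A, columns = themes in B (illustrative values).
sim = [
    [0.92, 0.31, 0.12],
    [0.28, 0.84, 0.22],
    [0.18, 0.25, 0.40],  # A3's best match falls below the threshold
]
threshold = 0.5
n_a, n_b = len(sim), len(sim[0])

best_a = [max(row) for row in sim]                         # best match per A theme
best_b = [max(row[j] for row in sim) for j in range(n_b)]  # best match per B theme

hit_rate_a = sum(s >= threshold for s in best_a) / n_a     # 2/3
hit_rate_b = sum(s >= threshold for s in best_b) / n_b     # 2/3
pair_match_rate = sum(v >= threshold for row in sim for v in row) / (n_a * n_b)  # 2/9
```

Note the asymmetry in general: hit rate A only asks whether each A theme found some partner, so A and B rates can differ when the sets have different sizes or structure.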

Fidelity

Fidelity is the harmonic mean of directional best-match scores (A→B and B→A). It measures "how close are the best matches on average?" Higher fidelity = tighter semantic alignment between theme sets.
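A worked sketch with a hypothetical 2×3 similarity matrix:

```python
# Rows = 2 themes in A, columns = 3 themes in B (illustrative values).
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.7, 0.2],
]
a_to_b = sum(max(row) for row in sim) / len(sim)                # (0.9 + 0.7) / 2 = 0.8
b_to_a = sum(max(row[j] for row in sim) for j in range(3)) / 3  # (0.9 + 0.7 + 0.2) / 3 = 0.6
fidelity = 2 * a_to_b * b_to_a / (a_to_b + b_to_a)              # harmonic mean
```

The harmonic mean punishes imbalance: here the weaker B→A direction (0.6) drags fidelity to about 0.686 rather than the arithmetic mean of 0.7.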

Hungarian Algorithm

The Hungarian algorithm finds the optimal one-to-one pairing that maximises total similarity. Each theme maps to at most one partner -- no reuse allowed. This removes ambiguity but penalises legitimate theme refinement (splitting one theme into two appears as unmatched).
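In practice a library routine such as `scipy.optimize.linear_sum_assignment` would be used; for a small hypothetical example the same optimum can be found by brute force over permutations:

```python
from itertools import permutations

# Rows = themes in A, columns = themes in B (illustrative values).
sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.2, 0.1, 0.7],
]

# Try every 1:1 assignment of rows to columns and keep the one that
# maximises total similarity -- no column may be reused.
best = max(permutations(range(3)),
           key=lambda p: sum(sim[i][p[i]] for i in range(3)))
mean_matched_similarity = sum(sim[i][best[i]] for i in range(3)) / 3
```

Brute force is O(n!) and only viable for tiny sets; the Hungarian algorithm reaches the same assignment in O(n³).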

Paraphrase Ceiling

LLM-generated paraphrases of each theme establish what alignment looks like when themes have identical meaning but different wording. If observed alignment reaches this level, the analyses are semantically equivalent.

Word Salad Floor

Word salad is generated by randomly shuffling words from themes, destroying semantic meaning. This represents what you'd expect from random text with similar vocabulary. The further above this baseline, the more genuine the semantic similarity.
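The generation step amounts to shuffling words within each theme (a minimal sketch; the function name and seed are illustrative):

```python
import random

def word_salad(themes, seed=42):
    """Shuffle the words within each theme text: vocabulary and length
    are preserved, semantic meaning is destroyed."""
    rng = random.Random(seed)
    salads = []
    for text in themes:
        words = text.split()
        rng.shuffle(words)
        salads.append(" ".join(words))
    return salads
```

Because the shuffled text keeps the original vocabulary, any residual similarity to the real themes reflects word overlap alone, which is exactly what the floor is meant to capture.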

Theme Preference Structure

This measures whether themes have preferred partners that enable meaningful alignment, or are largely interchangeable (diffuse overlap).

When preferences are weak (<8%), the two analyses describe overlapping conceptual territory but themes don't map neatly onto each other -- they're interchangeable from the OT perspective. This isn't necessarily a problem; it characterises the relationship as diffuse overlap rather than one-to-one correspondence.

Effect Size (MADs)

Effect size shows how many MADs (median absolute deviations) above the word-salad baseline the observed value falls. This measures how distinct the observed alignment is from random text -- higher values indicate more meaningful semantic similarity. MAD is a robust measure less sensitive to outliers than standard deviation.
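The computation, sketched with hypothetical null samples:

```python
import statistics

def effect_in_mads(observed, null_samples):
    """How many median absolute deviations the observed value sits
    above the null (word-salad) distribution."""
    med = statistics.median(null_samples)
    mad = statistics.median([abs(x - med) for x in null_samples])
    return (observed - med) / mad

# Illustrative null shared-mass values from word-salad runs:
null = [0.10, 0.12, 0.14, 0.16, 0.18]
effect = effect_in_mads(0.24, null)  # (0.24 - 0.14) / 0.02 = 5.0
```

Using median and MAD instead of mean and standard deviation means a single aberrant null sample cannot distort the effect size.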

Sankey Colour Scale

The Sankey diagram supports two colour modes, selectable via the toggle button:

1. Similarity-based (default): Uses absolute similarity thresholds for consistent interpretation across different K values.

2. K-relative: Shows how close each match is to being dropped at the current K threshold.

In K-relative mode, the scale adapts to each K value. Links with cost near K appear red because the OT algorithm is nearly indifferent between keeping them or destroying the mass. This mode is useful for understanding the sensitivity of matches to the K parameter.

Transport Heatmap

The transport heatmap shows how mass flows from A themes (rows) to B themes (columns). Cell values show percentage of total transported mass. Darker cells indicate stronger transport links.

Splits & Joins

Splits measure how many B themes each A theme connects to (1.0 = perfect 1:1). Joins measure how many A themes each B theme receives from. Higher values indicate more many-to-many relationships in the transport plan.
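One simple counting version (the report may instead weight connections by mass; the epsilon cutoff here is an assumption):

```python
# Hypothetical transport plan: rows = A themes, columns = B themes.
plan = [
    [0.40, 0.10, 0.00],  # A1 splits across B1 and B2
    [0.00, 0.00, 0.45],  # A2 maps cleanly to B3
]
eps = 0.01  # ignore numerically negligible flows

splits = [sum(1 for m in row if m > eps) for row in plan]           # per A theme
joins = [sum(1 for row in plan if row[j] > eps) for j in range(3)]  # per B theme
avg_splits = sum(splits) / len(splits)  # 1.5
avg_joins = sum(joins) / len(joins)     # 1.0
```

An average of 1.0 on both axes would indicate a clean 1:1 correspondence; values well above 1.0 signal that themes were refined or merged between the analyses.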

Shepard Similarity

Shepard similarity applies exponential decay to angular distance, providing a cognitively realistic similarity function where small distance differences matter more for nearby items than for distant ones.
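The decay function itself is a one-liner (the default decay rate k here is a hypothetical value, not the report's configured one):

```python
import math

def shepard_similarity(angular_distance, k=5.0):
    """Shepard's universal law of generalisation: perceived similarity
    decays exponentially with psychological distance."""
    return math.exp(-k * angular_distance)
```

At zero distance similarity is exactly 1.0, and the exponential shape means a small increase in distance costs far more similarity near 0 than it does between already-distant items.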


Configuration

{{ comparison.config | tojson(indent=2) }}
{% if calibration_info %}

Calibration Model

Similarity scores are transformed using a calibration model trained on paraphrases of varying semantic similarity.

Training Information
Embedding Model {{ calibration_info.metadata.embedding_model }}
Template {{ calibration_info.metadata.embedding_template }}
{% if calibration_info.metadata.validation %}
Training Samples {{ calibration_info.metadata.validation.n_train }}
Test Samples {{ calibration_info.metadata.validation.n_test }}
Category Accuracy {{ "%.1f" | format(calibration_info.metadata.validation.category_accuracy * 100) }}%
{% endif %}
Target Values
Category Target Raw Mean Raw Range
{% for cat, target in calibration_info.metadata.targets.items() %}
{{ cat }} {{ "%.2f" | format(target) }} {% if calibration_info.metadata.category_stats[cat] %}{{ "%.3f" | format(calibration_info.metadata.category_stats[cat].mean) }} {{ "%.3f" | format(calibration_info.metadata.category_stats[cat].min) }} - {{ "%.3f" | format(calibration_info.metadata.category_stats[cat].max) }}{% else %}- -{% endif %}
{% endfor %}
Calibration Curve

Raw angular similarity (x-axis) mapped to calibrated semantic similarity (y-axis). Coloured points show training data for each paraphrase category.

{% if calibration_info.plot_base64 %} Calibration curve {% else %} (Calibration curve plot unavailable) {% endif %}
Sample Transformations
Raw Calibrated
{% for pt in calibration_info.transformation %} {% if loop.index0 % 4 == 0 %}
{{ "%.3f" | format(pt.raw) }} {{ "%.3f" | format(pt.calibrated) }}
{% endif %} {% endfor %}
{% endif %}