{{ text_output }}
Analysis Comparison Report
Theme Network
▼ Theme Quick Reference: Short labels generated by the LLM and used throughout this report
Pairwise Comparisons
Select a pair to view detailed comparison metrics.
{{ comp.a.name }} vs {{ comp.b.name }}
Unbalanced Optimal Transport
Many-to-many alignment, but allows themes to remain unmatched when no good match exists.
Green = paraphrase ceiling, Red = word-salad floor.
Best Matches
Coverage & Hit Rates
Thematic analysis doesn't have ground truth, so traditional precision/recall don't apply. Instead, we measure coverage (did themes find matches?) and fidelity (how close are the best matches?). Based on {{ comp.stats.similarity_metric }} similarity. {% if comp.stats.calibration_path %} Using calibrated values {% endif %}
Proportion of themes with at least one match above threshold ({{ comparison.config.threshold }})
- Hit Rate A: {{ "%.1f"|format(comp.stats.hit_rate_a * 100) }}%
- Hit Rate B: {{ "%.1f"|format(comp.stats.hit_rate_b * 100) }}%
- Pair Match Rate: {{ "%.3f"|format(comp.stats.jaccard) }} (pairs above threshold / total pairs)
High hit rates indicate both analyses found similar conceptual territory.
How close are the best matches? (Mean of each theme's best match similarity)
- A→B: {{ "%.3f"|format(comp.stats.mean_max_sim_a_to_b) }}
- B→A: {{ "%.3f"|format(comp.stats.mean_max_sim_b_to_a) }}
- Fidelity: {{ "%.3f"|format(comp.stats.fidelity) }}
Fidelity is the harmonic mean of directional scores. Higher = tighter semantic alignment.
Hungarian Matching (1:1)
The Hungarian algorithm finds the optimal one-to-one pairing that maximizes total similarity. Each theme maps to at most one theme in the other set -- no reuse allowed. {% if comp.stats.calibration_path %} Using calibrated values {% endif %}
What this enables: Hungarian matching removes ambiguity by assigning each theme to at most one partner. Coverage metrics show what proportion of each set found a good match.
Limitation: This penalises legitimate theme refinement (splitting one theme into two is treated as unmatched). Use OT if you want to allow many-to-many alignment.
{{ "%.3f"|format(comp.stats.hungarian.soft_metrics.soft_precision) }}
Average similarity of optimal pairs
Interpretation: "How good are the best one-to-one correspondences?" Higher = tighter semantic alignment between the two theme sets.
{% if comp.stats.hungarian.distribution.n_pairs > 0 %}Distribution of {{ comp.stats.hungarian.distribution.n_pairs }} optimal pairs:
- Median: {{ "%.3f"|format(comp.stats.hungarian.distribution.median) }} (Q1: {{ "%.3f"|format(comp.stats.hungarian.distribution.q1) }}, Q3: {{ "%.3f"|format(comp.stats.hungarian.distribution.q3) }})
- Range: {{ "%.3f"|format(comp.stats.hungarian.distribution.min) }} -- {{ "%.3f"|format(comp.stats.hungarian.distribution.max) }}
Based on {{ comp.stats.hungarian.distribution.n_pairs }} matched pairs above threshold ({{ comparison.config.threshold }})
{% set cov_a = comp.stats.hungarian.thresholded_metrics.recall %} {% set cov_b = comp.stats.hungarian.thresholded_metrics.precision %} {% set mean_cov = (cov_a + cov_b) / 2 %}{{ "%.0f"|format(cov_a * 100) }}%
Coverage A
(A themes matched)
{{ "%.0f"|format(cov_b * 100) }}%
Coverage B
(B themes matched)
{{ "%.3f"|format(comp.stats.hungarian.thresholded_metrics.true_jaccard) }}
Jaccard Index
(set overlap)
Jaccard Index = matched / (|A| + |B| - matched). Measures overlap between theme sets after 1:1 assignment. Higher = more themes found good partners.
Hungarian algorithm finds the optimal one-to-one assignment. {% if comp.stats.calibration_path %} Similarity values are calibrated. {% endif %}
| Theme in {{ comp.a.name }} | Theme in {{ comp.b.name }} | {% if comp.stats.calibration_path %}Calibrated | Raw Angular | {% else %}Angular Similarity | {% endif %}
|---|---|---|{% if comp.stats.calibration_path %}---|{% endif %}
| {{ theme_a.set_letter }}{{ theme_a.theme_index }}: {{ theme_a.short_label | default(theme_a.theme_name[:18]) }} | {{ theme_b.set_letter }}{{ theme_b.theme_index }}: {{ theme_b.short_label | default(theme_b.theme_name[:18]) }} | {% if comp.stats.calibration_path %}{{ "%.3f"|format(similarity) }} | {{ "%.3f"|format(similarity_raw) }} | {% else %}{{ "%.3f"|format(similarity) }} | {% endif %}
{% else %}No optimal pairs found.
{% endif %}Similarity Metrics
Various distance and similarity metrics between theme embeddings.
{% if comp.stats.calibration_path %}Primary Calibrated Similarity
All statistics (fidelity, coverage, thresholding, Hungarian matching) use these calibrated values. Similarities are transformed using a calibration model trained on paraphrases.
| | {% for j in range(comp.stats.selected_similarity_matrix[0]|length) %}{{ j+1 }} | {% endfor %}
|---|{% for j in range(comp.stats.selected_similarity_matrix[0]|length) %}---|{% endfor %}
| {{ i+1 }} | {% for val in row %}{{ "%.2f"|format(val) }} | {% endfor %}
{% if comp.stats.calibration_path %} Raw {% endif %} Angular Similarity
Angular distance uses the angle between embedding vectors (arccos of cosine), normalised to [0,1]. Unlike cosine, it satisfies the triangle inequality, making it a proper metric. {% if comp.stats.calibration_path %}These are RAW values before calibration.{% endif %}
| | {% for j in range(comp.stats.angle_similarity_matrix[0]|length) %}{{ j+1 }} | {% endfor %}
|---|{% for j in range(comp.stats.angle_similarity_matrix[0]|length) %}---|{% endfor %}
| {{ i+1 }} | {% for val in row %}{{ "%.3f"|format(val) }} | {% endfor %}
{% if comp.stats.calibration_path %} Raw {% endif %} Cosine Similarity
Raw cosine similarity between embedding vectors. Not a proper metric (doesn't satisfy triangle inequality). {% if comp.stats.calibration_path %}These are RAW values before calibration.{% endif %}
| | {% for j in range(comp.stats.similarity_matrix[0]|length) %}{{ j+1 }} | {% endfor %}
|---|{% for j in range(comp.stats.similarity_matrix[0]|length) %}---|{% endfor %}
| {{ i+1 }} | {% for val in row %}{{ "%.3f"|format(val) }} | {% endfor %}
Alternative Distance Metrics
Shepard Similarity
Exponential decay on angular distance (k={{ comp.stats.shepard_k_value }}), a cognitively realistic similarity function.
▶ Themes & Embeddings: Full text of embedded themes for each analysis
The actual strings that were embedded for similarity comparison. Labels are used in plots; embedded strings are used for calculating similarity.
{% for analysis_name, themes in comparison.embedded_strings.items() %}| ID | Theme Name | Short Label | Embedded String |
|---|---|---|---|
| {{ item.set_letter }}{{ item.theme_index }} | {{ item.theme_name }} | {{ item.set_letter }}{{ item.theme_index }}: {{ item.short_label | default('Theme ' ~ item.theme_index) }} | {{ item.embedded_string }} |
▶ Synthetic Baselines: Paraphrase ceiling and word-salad floor calibration data
Synthetic baselines help calibrate alignment scores by providing reference points for "best case" (paraphrase ceiling) and "worst case" (word-salad floor) scenarios.
{% for key, comp in comparison.by_comparisons().items() %}{{ comp.a.name }} vs {{ comp.b.name }}
{% if comp.stats.paraphrase_baseline %}LLM-generated paraphrases of each theme establish what alignment looks like when themes have identical meaning but different wording. This represents the best achievable similarity between semantically equivalent analyses.
- Mean self-similarity: {{ "%.3f"|format(comp.stats.paraphrase_baseline.paraphrase_similarity_mean) if comp.stats.paraphrase_baseline.paraphrase_similarity_mean is not none else "N/A" }}
- Model: {{ comp.stats.paraphrase_baseline.metadata.model }}
- Paraphrases per theme: {{ comp.stats.paraphrase_baseline.metadata.n_paraphrases }}
{{ comp.a.name }} samples:
{% for sample in comp.stats.paraphrase_baseline.samples_a[:3] %}- {{ sample }}
{% endfor %}{{ comp.b.name }} samples:
{% for sample in comp.stats.paraphrase_baseline.samples_b[:3] %}- {{ sample }}
{% endfor %}Word salad is generated by randomly shuffling words from themes, destroying semantic meaning. This represents what you'd expect from random text with similar vocabulary -- a floor below which alignment cannot meaningfully fall.
- Samples generated: {{ comp.stats.word_salad_samples|length }}
- Method: Words randomly shuffled while preserving theme length {% if comp.stats.word_salad_self_similarity_raw is not none %}
- Self-similarity: {{ "%.3f"|format(comp.stats.word_salad_self_similarity_raw) }} (rescaled: {{ "%.3f"|format(comp.stats.word_salad_self_similarity) }}) {% endif %}
{{ comp.stats.color_scale_info.description }}
- Green: Strong match (low cost, well below K threshold)
- Amber: Moderate match
- Red: Marginal match (cost approaching K, close to being dropped)
Baselines: word-salad = {{ "%.3f"|format(comp.stats.color_scale_info.word_salad_self_similarity) }}, paraphrase = {{ "%.3f"|format(comp.stats.color_scale_info.paraphrase_self_similarity) }}
Notes & Explanations
Reference guide for interpreting the metrics and visualisations in this report.
UMAP Interpretation
UMAP is a non-linear dimensionality-reduction method that prioritises preserving local neighbourhood structure rather than global variance. Nearby points can be interpreted as closely related themes, while larger-scale distances and cluster shapes should be interpreted qualitatively rather than metrically. This plot is intended as an exploratory visualisation of thematic relationships, not as a quantitative evaluation.
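As a rough illustration of how such a projection can be produced (a hedged sketch: the library, parameter values, and input path below are assumptions, not necessarily what generated this plot):

```python
# Hypothetical sketch: project theme embeddings to 2-D for exploratory plotting.
# Assumes the umap-learn package; parameter values and the input path are illustrative.
import numpy as np
import umap

embeddings = np.load("theme_embeddings.npy")   # placeholder: (n_themes, dim) array

reducer = umap.UMAP(
    n_neighbors=10,    # small neighbourhoods emphasise local structure
    min_dist=0.1,
    metric="cosine",   # angle-based geometry, matching the similarity metrics used here
    random_state=42,
)
coords = reducer.fit_transform(embeddings)     # (n_themes, 2) coordinates for the plot
```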
Angular Distance
Angular distance uses the angle between embedding vectors (arccos of cosine), normalised to [0,1]. Unlike cosine similarity, it satisfies the triangle inequality, making it a proper metric for averaging and comparison. A similarity of 1.0 indicates identical vectors; values fall towards 0.0 as the angle between vectors grows.
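A minimal sketch of this calculation (assuming the angle is normalised by pi, one common convention; the report's implementation may normalise differently):

```python
import numpy as np

def angular_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Angle-based similarity in [0, 1]: highest at zero angle, lower as the angle grows."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.arccos(np.clip(cos, -1.0, 1.0))   # clip guards against floating-point drift
    return 1.0 - angle / np.pi                    # assumed normalisation by pi
```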
Optimal Transport
Unbalanced Optimal Transport (OT) finds the most efficient way to "transport" mass from one theme set to another while allowing some mass to remain unmatched. Unlike standard matching algorithms, OT can express many-to-many relationships -- a single theme can contribute to multiple themes in the other set, and vice versa. This is appropriate when themes may be split, merged, or refined across analyses.
K Parameter
The K parameter (reg_m) controls the penalty for leaving mass unmatched in optimal transport. Lower K = more selective matching (themes must be very similar to match). Higher K = more permissive matching (more mass is transported). The scree plots help identify the elbow where increasing K provides diminishing returns.
Shared Mass
Shared mass is the proportion of total thematic content that was matched between the two sets. A value of 100% means all themes found partners; lower values indicate some themes were left unmatched (novel or missing concepts).
Semantic Alignment
Alignment measures the quality of theme-to-theme matches (computed as 1 - transport cost). Higher alignment means matched themes are more semantically similar. This is the "how good" complement to shared mass's "how much".
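The quantities above can be sketched from an unbalanced transport plan. The snippet below uses the POT library's unbalanced Sinkhorn solver; the regularisation value, uniform masses, and the shared-mass/alignment bookkeeping are assumptions for illustration only.

```python
# Hypothetical sketch of unbalanced OT between two theme sets, using the POT library.
import numpy as np
import ot

cost = np.random.rand(5, 7)    # placeholder: angular distance between A themes and B themes
a = np.full(5, 1 / 5)          # uniform mass on the 5 themes of A
b = np.full(7, 1 / 7)          # uniform mass on the 7 themes of B

reg = 0.05                     # entropic regularisation (assumed value)
K = 0.5                        # marginal relaxation (reg_m): penalty for leaving mass unmatched

plan = ot.unbalanced.sinkhorn_unbalanced(a, b, cost, reg, K)

shared_mass = plan.sum()                                   # proportion of mass actually transported
mean_cost = (plan * cost).sum() / max(plan.sum(), 1e-12)   # average cost of the transported mass
alignment = 1.0 - mean_cost                                # "1 - transport cost" reading of alignment
print(f"shared mass {shared_mass:.2f}, alignment {alignment:.2f}")
```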
Coverage & Hit Rates
Coverage measures what proportion of themes found at least one match above the similarity threshold. Hit Rate A = proportion of A themes with matches; Hit Rate B = proportion of B themes with matches. High hit rates indicate both analyses found similar conceptual territory.
Fidelity
Fidelity is the harmonic mean of directional best-match scores (A→B and B→A). It measures "how close are the best matches on average?" Higher fidelity = tighter semantic alignment between theme sets.
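A minimal sketch of the coverage and fidelity calculations from a pairwise similarity matrix (variable names and the threshold value are placeholders):

```python
import numpy as np

def coverage_and_fidelity(sim: np.ndarray, threshold: float = 0.5):
    """sim[i, j] = similarity between theme i of A and theme j of B."""
    best_a_to_b = sim.max(axis=1)                  # each A theme's best match in B
    best_b_to_a = sim.max(axis=0)                  # each B theme's best match in A

    hit_rate_a = (best_a_to_b >= threshold).mean() # proportion of A themes with a match
    hit_rate_b = (best_b_to_a >= threshold).mean() # proportion of B themes with a match

    mean_max_a = best_a_to_b.mean()
    mean_max_b = best_b_to_a.mean()
    fidelity = 2 * mean_max_a * mean_max_b / (mean_max_a + mean_max_b)  # harmonic mean

    return hit_rate_a, hit_rate_b, fidelity
```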
Hungarian Algorithm
The Hungarian algorithm finds the optimal one-to-one pairing that maximises total similarity. Each theme maps to at most one partner -- no reuse allowed. This removes ambiguity but penalises legitimate theme refinement (splitting one theme into two appears as unmatched).
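A minimal sketch of the 1:1 assignment using SciPy's `linear_sum_assignment`; the thresholding and Jaccard step mirror the metrics reported above (details are assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_pairs(sim: np.ndarray, threshold: float = 0.5):
    """Optimal 1:1 pairing maximising total similarity, then drop pairs below the threshold."""
    rows, cols = linear_sum_assignment(sim, maximize=True)
    pairs = [(i, j, sim[i, j]) for i, j in zip(rows, cols) if sim[i, j] >= threshold]

    matched = len(pairs)
    n_a, n_b = sim.shape
    jaccard = matched / (n_a + n_b - matched)      # Jaccard index after 1:1 assignment
    return pairs, jaccard
```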
Paraphrase Ceiling
LLM-generated paraphrases of each theme establish what alignment looks like when themes have identical meaning but different wording. If observed alignment reaches this level, the analyses are semantically equivalent.
Word Salad Floor
Word salad is generated by randomly shuffling words from themes, destroying semantic meaning. This represents what you'd expect from random text with similar vocabulary. The further above this baseline, the more genuine the semantic similarity.
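One plausible way to generate such a baseline (the exact procedure used for this report is not shown; this sketch simply shuffles each theme's words):

```python
import random

def word_salad(theme_text: str, rng: random.Random) -> str:
    """Shuffle a theme's words: vocabulary and length are preserved, meaning is destroyed."""
    words = theme_text.split()
    rng.shuffle(words)
    return " ".join(words)

rng = random.Random(0)
print(word_salad("participants valued flexible working arrangements", rng))
```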
Theme Preference Structure
This measures whether themes have preferred partners that enable meaningful alignment, or are largely interchangeable (diffuse overlap).
- Relative preference: How much better is each theme's best match compared to its average match, as a percentage of mean cost. Higher values (>15%) indicate themes have clear preferred partners.
- Preference strength: The absolute difference between best and average match costs. Larger values mean stronger preferences.
- Spikiness: How varied the cost matrix is (std/range). Higher values (>0.2) mean some pairs are much more similar than others.
When preferences are weak (<8%), the two analyses describe overlapping conceptual territory but themes don't map neatly onto each other -- they're interchangeable from the OT perspective. This isn't necessarily a problem; it characterises the relationship as diffuse overlap rather than one-to-one correspondence.
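A minimal sketch of how these three diagnostics could be computed from a cost matrix (the report's exact definitions may differ in detail):

```python
import numpy as np

def preference_structure(cost: np.ndarray):
    """cost[i, j] = transport cost between theme i of A and theme j of B (lower = more similar)."""
    best = cost.min(axis=1)    # each A theme's cheapest (best) partner
    mean = cost.mean(axis=1)   # its average cost across all partners

    preference_strength = (mean - best).mean()                   # absolute best-vs-average gap
    relative_preference = preference_strength / cost.mean()      # quoted as a % of mean cost
    spikiness = cost.std() / (cost.max() - cost.min() + 1e-12)   # std / range

    return relative_preference, preference_strength, spikiness
```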
Effect Size (MADs)
Effect size shows how many MADs (median absolute deviations) above the word-salad baseline the observed value falls. This measures how distinct the observed alignment is from random text -- higher values indicate more meaningful semantic similarity. MAD is a robust measure less sensitive to outliers than standard deviation.
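A minimal sketch of this effect size, given an observed alignment value and a sample of word-salad baseline values (names are placeholders):

```python
import numpy as np

def effect_size_mads(observed: float, baseline_samples: np.ndarray) -> float:
    """How many MADs the observed value sits above the word-salad baseline median."""
    median = np.median(baseline_samples)
    mad = np.median(np.abs(baseline_samples - median))
    return (observed - median) / (mad + 1e-12)
```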
Sankey Colour Scale
The Sankey diagram supports two colour modes, selectable via the toggle button:
1. Similarity-based (default): Uses absolute similarity thresholds for consistent interpretation across different K values.
- Green: similarity >= {{ comparison.config.color_green | default(0.75) }} (strong semantic match)
- Amber: moderate similarity
- Red: similarity <= {{ comparison.config.color_red | default(0.4) }} (weak match)
2. K-relative: Shows how close each match is to being dropped at the current K threshold.
- Green: cost < {{ comparison.config.k_color_green | default(0.2) }}×K (strong match, well below threshold)
- Amber: moderate cost
- Red: cost > {{ comparison.config.k_color_red | default(1.1) }}×K (marginal match, close to being dropped)
In K-relative mode, the scale adapts to each K value. Links with cost near K appear red because the OT algorithm is nearly indifferent between keeping them or destroying the mass. This mode is useful for understanding the sensitivity of matches to the K parameter.
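A minimal sketch of the two colouring rules (the thresholds mirror the defaults quoted above and are configurable):

```python
def link_colour(similarity: float, cost: float, K: float, mode: str = "similarity") -> str:
    """Classify a Sankey link as green / amber / red under either colouring mode."""
    if mode == "similarity":
        if similarity >= 0.75:
            return "green"
        return "red" if similarity <= 0.4 else "amber"
    # K-relative mode: compare the link's transport cost to the K (reg_m) threshold
    if cost < 0.2 * K:
        return "green"
    return "red" if cost > 1.1 * K else "amber"
```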
Transport Heatmap
The transport heatmap shows how mass flows from A themes (rows) to B themes (columns). Cell values show percentage of total transported mass. Darker cells indicate stronger transport links.
Splits & Joins
Splits measure how many B themes each A theme connects to (1.0 = perfect 1:1). Joins measure how many A themes each B theme receives from. Higher values indicate more many-to-many relationships in the transport plan.
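One plausible way to count these connections from a transport plan, treating a partner as connected when it receives a non-trivial share of a theme's mass (the share cutoff is an assumption):

```python
import numpy as np

def splits_and_joins(plan: np.ndarray, min_share: float = 0.05):
    """plan[i, j] = mass moved from A theme i to B theme j."""
    row_share = plan / (plan.sum(axis=1, keepdims=True) + 1e-12)   # share of each A theme's mass
    col_share = plan / (plan.sum(axis=0, keepdims=True) + 1e-12)   # share of each B theme's mass

    splits = (row_share > min_share).sum(axis=1).mean()   # average B partners per A theme
    joins = (col_share > min_share).sum(axis=0).mean()    # average A partners per B theme
    return splits, joins
```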
Shepard Similarity
Shepard similarity applies exponential decay to angular distance, providing a cognitively realistic similarity function where small distance differences matter more for nearby items than for distant ones.
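A minimal sketch of the decay (the exponential form follows Shepard's law of generalisation; k is taken from the report configuration, and the default below is a placeholder):

```python
import numpy as np

def shepard_similarity(angular_distance: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Exponential decay of similarity with angular distance; larger k decays faster."""
    return np.exp(-k * angular_distance)
```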
Configuration
{{ comparison.config | tojson(indent=2) }}
Calibration Model
Similarity scores are transformed using a calibration model trained on paraphrases of varying semantic similarity.
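The calibration model's form isn't detailed in this report. As a hedged sketch, one monotone option is isotonic regression from raw angular similarity to per-category target values (the model family, training similarities, and targets below are assumptions):

```python
# Hypothetical sketch: fit a monotone map from raw angular similarity to target
# semantic-similarity values defined for each paraphrase category.
import numpy as np
from sklearn.isotonic import IsotonicRegression

raw_sims = np.array([0.62, 0.71, 0.80, 0.88, 0.95])   # placeholder training similarities
targets  = np.array([0.10, 0.30, 0.50, 0.75, 0.95])   # placeholder per-category targets

calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_sims, targets)

calibrated = calibrator.predict(np.array([0.84]))      # transform a new raw similarity
```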
Training Information
| Embedding Model | {{ calibration_info.metadata.embedding_model }} |
| Template | {{ calibration_info.metadata.embedding_template }} |
| Training Samples | {{ calibration_info.metadata.validation.n_train }} |
| Test Samples | {{ calibration_info.metadata.validation.n_test }} |
| Category Accuracy | {{ "%.1f" | format(calibration_info.metadata.validation.category_accuracy * 100) }}% |
Target Values
| Category | Target | Raw Mean | Raw Range |
|---|---|---|---|
| {{ cat }} | {{ "%.2f" | format(target) }} | {% if calibration_info.metadata.category_stats[cat] %}{{ "%.3f" | format(calibration_info.metadata.category_stats[cat].mean) }} | {{ "%.3f" | format(calibration_info.metadata.category_stats[cat].min) }} - {{ "%.3f" | format(calibration_info.metadata.category_stats[cat].max) }} | {% else %}- | - | {% endif %}
Calibration Curve
Raw angular similarity (x-axis) mapped to calibrated semantic similarity (y-axis). Coloured points show training data for each paraphrase category.
{% if calibration_info.plot_base64 %}Sample Transformations
| Raw | Calibrated |
|---|---|
| {{ "%.3f" | format(pt.raw) }} | {{ "%.3f" | format(pt.calibrated) }} |