Example: Cross-dataset Comparison

Comparing datasets that describe different populations.

Scenario

What someone tries to do:

Compare employment statistics from two surveys
Chart trends across different census releases

What they expect:

Direct comparison of values from both datasets

Why it's wrong (or risky)

Each dataset describes a universe: the population it covers. Universes often differ between datasets, and comparing values across incompatible universes produces misleading results because the values are computed over different populations.

Example:

Dataset	Universe	Employment Rate
Survey A	All adults (18+)	65%
Survey B	Working-age adults (18-64)	72%

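These rates are not directly comparable. Survey B's universe excludes adults aged 65 and over, many of whom are retired, so its rate is computed over a smaller population with a higher share of employed people.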
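To make the denominator effect concrete, here is a minimal sketch in Python. The counts are hypothetical, chosen only so that the two rates match the table above; they do not come from either survey. It shows that a single underlying population can yield both 65% and 72%, depending on which universe the rate is computed over.

```python
# Hypothetical counts (not from either survey), chosen to reproduce
# the two rates shown in the table.
population = {
    "18-64": {"people": 850, "employed": 612},
    "65+": {"people": 150, "employed": 38},
}


def employment_rate(age_groups):
    """Return the employed share of everyone in the listed age groups."""
    people = sum(population[g]["people"] for g in age_groups)
    employed = sum(population[g]["employed"] for g in age_groups)
    return employed / people


# Survey A's universe: all adults (18+)
rate_all_adults = employment_rate(["18-64", "65+"])
# Survey B's universe: working-age adults (18-64)
rate_working_age = employment_rate(["18-64"])

print(f"All adults (18+): {rate_all_adults:.0%}")
print(f"Working-age adults (18-64): {rate_working_age:.0%}")
```

The seven-point gap here comes entirely from the choice of universe, not from any difference in the labor market being measured.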

What Invariant detects

Claim violated: Datasets have different universe definitions
Evidence: Universe mismatch between survey_a and survey_b
Rule: UniverseComparabilityRule

Acknowledge required

Datasets have different universes. Comparison requires explicit acknowledgment.

Warn

For partial comparability, results include disclosure about universe differences.
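The acknowledgment requirement can be pictured with a small sketch. This is not Invariant's API or implementation: `Dataset`, `Comparison`, `AcknowledgmentRequired`, and `compare` are hypothetical names invented for illustration, and universes are compared as plain strings for simplicity. The sketch covers only the "Acknowledge required" path and does not model the partial-comparability "Warn" outcome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset record; not an Invariant type."""
    name: str
    universe: str          # free-text universe definition
    employment_rate: float


@dataclass(frozen=True)
class Comparison:
    difference: float
    disclosure: Optional[str]  # None when the universes match


class AcknowledgmentRequired(Exception):
    """Raised when universes differ and the caller has not acknowledged it."""


def compare(a: Dataset, b: Dataset, acknowledged: bool = False) -> Comparison:
    difference = b.employment_rate - a.employment_rate
    if a.universe == b.universe:
        return Comparison(difference, None)

    evidence = f"Universe mismatch between {a.name} and {b.name}"
    if not acknowledged:
        # Block the comparison until the caller explicitly opts in.
        raise AcknowledgmentRequired(evidence)

    # Acknowledged: return the result, carrying the caveat alongside it.
    disclosure = f"{evidence}: '{a.universe}' vs '{b.universe}'"
    return Comparison(difference, disclosure)


survey_a = Dataset("survey_a", "All adults (18+)", 0.65)
survey_b = Dataset("survey_b", "Working-age adults (18-64)", 0.72)

try:
    compare(survey_a, survey_b)
except AcknowledgmentRequired as exc:
    blocked_reason = str(exc)

result = compare(survey_a, survey_b, acknowledged=True)
```

The design point is that the caveat travels with the result: a caller who acknowledges the mismatch still receives the disclosure text rather than a bare number.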

Typical remediations

Acknowledge the difference — Accept the comparison with disclosed caveats
Filter to common universe — Restrict both datasets to comparable populations
Use separate visualizations — Show datasets side-by-side with clear labels
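As an illustration of the "filter to common universe" remediation, the following sketch restricts two sets of respondent records to the age range both surveys cover before computing rates. The records, the `restrict` helper, and the 18 to 64 bounds are assumptions made for this example. It also presumes respondent-level data with an age field, which published aggregates may not provide.

```python
# Hypothetical respondent records as (age, employed) pairs.
survey_a = [(25, True), (40, True), (58, False), (67, False), (72, False), (80, True)]
survey_b = [(19, True), (33, True), (45, False), (61, True)]

# Common universe assumed for this example: ages 18 through 64 inclusive.
COMMON_MIN_AGE, COMMON_MAX_AGE = 18, 64


def restrict(records, min_age=COMMON_MIN_AGE, max_age=COMMON_MAX_AGE):
    """Keep only respondents whose age falls inside the common universe."""
    return [(age, employed) for age, employed in records if min_age <= age <= max_age]


def employment_rate(records):
    """Share of respondents who are employed."""
    if not records:
        raise ValueError("no respondents in the selected universe")
    return sum(1 for _, employed in records if employed) / len(records)


a_common = restrict(survey_a)
b_common = restrict(survey_b)

print(f"Survey A, unfiltered: {employment_rate(survey_a):.0%}")
print(f"Survey A, ages 18-64: {employment_rate(a_common):.0%}")
print(f"Survey B, ages 18-64: {employment_rate(b_common):.0%}")
```

After filtering, both rates share the same denominator definition, so any remaining difference reflects the surveys rather than their universes.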

What to do next

Concepts: Universe — What universes are and why they matter
Integration: Query Lifecycle — How acknowledgment flows work