Appendix: Sample Project

Census Explorer is a complete working example that demonstrates Invariant in action.

Overview

The sample project demonstrates:

  • Implementing CatalogStore — loading metadata from JSON
  • Implementing QueryEngine — executing queries with DuckDB
  • Building a CLI — validating and querying with semantic rules
  • Validation in practice — seeing queries allowed, warned, or blocked

Location

examples/sample-project/

Installation

cd examples/sample-project
pip install -e ../../  # Install invariant
pip install -e .       # Install census-explorer

Quick commands

# List available data products
census-explorer list-data-products

# Validate a query (allowed)
census-explorer validate aa0e8400-e29b-41d4-a716-446655440001 \
    -m population:SUM -d geography_code

# Validate a query (blocked - can't sum an indicator)
census-explorer validate aa0e8400-e29b-41d4-a716-446655440002 \
    -m unemployment_rate:SUM -d geography_code

# Execute a query
census-explorer query aa0e8400-e29b-41d4-a716-446655440001 \
    -m population:SUM -d geography_code -d sex

File layout

examples/sample-project/
├── pyproject.toml                 # Package configuration
├── README.md                      # Detailed usage guide
├── data/
│   ├── catalog.json               # Metadata definitions
│   ├── census_demographics.parquet    # Population data (306 rows)
│   └── labour_force.parquet           # Employment data (9 rows)
└── src/census_explorer/
    ├── cli.py                     # CLI commands (Typer)
    └── infrastructure/
        ├── json_catalog.py        # CatalogStore implementation
        └── duckdb_engine.py       # QueryEngine implementation

Data products

ID                                   | Name                                     | Kind      | Description
aa0e8400-e29b-41d4-a716-446655440001 | Population by Geography and Demographics | FACT      | Census population counts (can SUM)
aa0e8400-e29b-41d4-a716-446655440002 | Labour Force Indicators                  | INDICATOR | Employment rates (cannot SUM)
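The "can SUM" and "cannot SUM" distinction can be sketched as a small rule. This is an illustration only: the Kind enum, the Decision class and check_aggregation are hypothetical names, not Invariant's actual API. Only the ALLOW and BLOCK statuses and the INDICATOR_AGG_NOT_ALLOWED issue code appear elsewhere on this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    FACT = "FACT"
    INDICATOR = "INDICATOR"


@dataclass(frozen=True)
class Decision:
    # Hypothetical result type; the real validator returns richer results.
    status: str                      # "ALLOW" or "BLOCK"
    issue_code: Optional[str] = None


def check_aggregation(kind: Kind, aggregation: str) -> Decision:
    """Hypothetical rule: block SUM on indicators, allow everything else."""
    if kind is Kind.INDICATOR and aggregation.upper() == "SUM":
        return Decision("BLOCK", "INDICATOR_AGG_NOT_ALLOWED")
    return Decision("ALLOW")


print(check_aggregation(Kind.FACT, "SUM").status)       # ALLOW
print(check_aggregation(Kind.INDICATOR, "SUM").status)  # BLOCK
```

The real validation is driven by catalog metadata rather than a hard-coded rule; the sketch only mirrors the two outcomes shown for the sample data products.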

What to inspect

data/catalog.json

The metadata file defines:

  • 2 studies (Census 2021, Labour Force Survey 2023)
  • 2 universes (population definitions)
  • 2 datasets
  • 2 data products with variables
  • 1 indicator definition (unemployment_rate with NOT_AGGREGATABLE policy)

infrastructure/json_catalog.py

Shows how to implement the CatalogStore port:

  • Load entities from JSON
  • Build a CatalogSnapshot for validation
  • Provide list and retrieve operations
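A minimal sketch of a JSON-backed store, covering the load, list and retrieve steps above. The method names, the DataProduct fields and the "data_products" key are assumptions made for illustration; the real CatalogStore port and the layout of catalog.json may differ, and building a CatalogSnapshot is left out.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class DataProduct:
    # Hypothetical subset of fields.
    id: str
    name: str
    kind: str


class CatalogStore(Protocol):
    """Assumed shape of the port; the real interface may differ."""

    def list_data_products(self) -> list[DataProduct]: ...
    def get_data_product(self, product_id: str) -> DataProduct: ...


class JsonCatalogStore:
    """Keeps data products from a parsed JSON document in memory."""

    def __init__(self, raw: dict) -> None:
        # Assumed layout: {"data_products": [{"id": ..., "name": ..., "kind": ...}]}
        self._products = {
            item["id"]: DataProduct(item["id"], item["name"], item["kind"])
            for item in raw.get("data_products", [])
        }

    @classmethod
    def from_file(cls, path: str) -> "JsonCatalogStore":
        return cls(json.loads(Path(path).read_text(encoding="utf-8")))

    def list_data_products(self) -> list[DataProduct]:
        return list(self._products.values())

    def get_data_product(self, product_id: str) -> DataProduct:
        try:
            return self._products[product_id]
        except KeyError:
            raise LookupError(f"Unknown data product: {product_id}") from None
```

Parsing the file once in the constructor keeps lookups cheap, which suits a small, read-only catalog like the one in this sample.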

infrastructure/duckdb_engine.py

Shows how to implement the QueryEngine port:

  • Translate QueryPlan to SQL
  • Execute against parquet files
  • Return typed results
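The translate-and-execute steps can be sketched as follows. The QueryPlan and Measure classes here are hypothetical stand-ins, not the real types, and the sketch assumes the parquet column names match the variable names used on the command line. The DuckDB calls used are duckdb.connect(), execute() and fetchall(), plus the read_parquet table function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    column: str
    aggregation: str  # e.g. "SUM"


@dataclass(frozen=True)
class QueryPlan:
    """Hypothetical stand-in for the real QueryPlan."""
    parquet_path: str
    measures: tuple[Measure, ...]
    dimensions: tuple[str, ...]


ALLOWED_AGGREGATIONS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_sql(plan: QueryPlan) -> str:
    """Translate a plan into a single SELECT over one parquet file."""
    if not plan.measures and not plan.dimensions:
        raise ValueError("A plan needs at least one measure or dimension")
    dims = [quote_ident(d) for d in plan.dimensions]
    aggs = []
    for m in plan.measures:
        agg = m.aggregation.upper()
        if agg not in ALLOWED_AGGREGATIONS:
            raise ValueError(f"Unsupported aggregation: {m.aggregation}")
        alias = quote_ident(f"{m.column}_{agg.lower()}")
        aggs.append(f"{agg}({quote_ident(m.column)}) AS {alias}")
    path = plan.parquet_path.replace("'", "''")  # escape single quotes
    sql = f"SELECT {', '.join(dims + aggs)} FROM read_parquet('{path}')"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql


def execute(plan: QueryPlan) -> list[tuple]:
    """Run the plan on an in-memory DuckDB connection."""
    import duckdb  # lazy import: build_sql works without DuckDB installed

    con = duckdb.connect()
    try:
        return con.execute(build_sql(plan)).fetchall()
    finally:
        con.close()
```

Whitelisting aggregations and quoting identifiers keeps user-supplied names out of the raw SQL text; the real engine may handle this differently.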

cli.py

Shows the integration pattern:

  • Wire up catalog store and validator
  • Build QueryRequest from CLI args
  • Call ValidateQueryUseCase
  • Display results with Rich tables
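The "build QueryRequest from CLI args" step can be sketched in isolation. The classes below are hypothetical stand-ins for the real QueryRequest, and the parsing simply mirrors the NAME:AGGREGATION form used by the -m option in the commands on this page. Typer wiring, the ValidateQueryUseCase call and Rich output are omitted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MeasureSpec:
    variable: str
    aggregation: str


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical stand-in; the real QueryRequest may have other fields."""
    data_product_id: str
    measures: tuple[MeasureSpec, ...]
    dimensions: tuple[str, ...]


def parse_measure(spec: str) -> MeasureSpec:
    """Split a 'NAME:AGGREGATION' argument such as 'population:SUM'."""
    variable, sep, aggregation = spec.partition(":")
    if not sep or not variable or not aggregation:
        raise ValueError(f"Expected NAME:AGGREGATION, got {spec!r}")
    return MeasureSpec(variable, aggregation.upper())


def build_request(
    product_id: str, measures: Iterable[str], dimensions: Iterable[str]
) -> QueryRequest:
    """Collect raw CLI values into one immutable request object."""
    return QueryRequest(
        data_product_id=product_id,
        measures=tuple(parse_measure(m) for m in measures),
        dimensions=tuple(dimensions),
    )


request = build_request(
    "aa0e8400-e29b-41d4-a716-446655440001",
    ["population:SUM"],
    ["geography_code", "sex"],
)
print(request.measures[0])
```

Keeping the parsing separate from the CLI framework makes it easy to unit-test without invoking the command itself.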

How it maps to concepts

Sample file               | Concept
catalog.json (studies)    | Data Products
catalog.json (variables)  | Variables
catalog.json (universes)  | Universe
Blocked SUM on indicator  | Indicator Aggregation
cli.py (validation flow)  | Query Lifecycle

Expected output

Valid query (SUM a measure)

census-explorer validate aa0e8400-e29b-41d4-a716-446655440001 \
    -m population:SUM -d geography_code

Query ID: q-a1b2c3d4
Status: ALLOW
Can Execute: Yes

Invalid query (SUM an indicator)

census-explorer validate aa0e8400-e29b-41d4-a716-446655440002 \
    -m unemployment_rate:SUM -d geography_code

Query ID: q-e5f6g7h8
Status: BLOCK
Can Execute: No

Issues:
  [INDICATOR_AGG_NOT_ALLOWED] Cannot aggregate indicator 'unemployment_rate' with SUM
    -> Define numerator/denominator so the system can recompute safely
    -> Use NONE (display as-is) or a safe aggregation
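A short worked example of why summing a rate is blocked, and what recomputing from a numerator and denominator gives instead. The figures are invented for illustration and are not taken from labour_force.parquet.

```python
# Hypothetical figures for two regions (not from the sample data).
regions = [
    {"unemployed": 50, "labour_force": 1000},   # rate 5.0%
    {"unemployed": 300, "labour_force": 2000},  # rate 15.0%
]

rates = [100 * r["unemployed"] / r["labour_force"] for r in regions]

# Summing the rates yields a number that is not a rate of anything.
summed_rate = sum(rates)

# Recomputing from the underlying counts gives the combined rate.
total_unemployed = sum(r["unemployed"] for r in regions)
total_labour_force = sum(r["labour_force"] for r in regions)
recomputed_rate = 100 * total_unemployed / total_labour_force

print(summed_rate)                # 20.0
print(round(recomputed_rate, 2))  # 11.67
```

Note that the unweighted mean of the two rates (10.0) also differs from the recomputed value, because the regions have different labour force sizes.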