Architecture¶
Invariant follows Clean Architecture principles with a clear separation between domain logic and infrastructure concerns.
Layer Overview¶
┌─────────────────────────────────────────────────────────────────┐
│ Your Application │
│ (CLI, API, Web UI) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Application Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Use Cases │ │ DTOs │ │ Ports │ │
│ │ (orchestr.) │ │ (in/out) │ │ (interfaces) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Domain Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Models │ │ Services │ │ Value Objects │ │
│ │ (entities) │ │ (rules) │ │ (immutable data) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Infrastructure Layer │
│ (YOU IMPLEMENT) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Catalog │ │ Query │ │ Other │ │
│ │ Store │ │ Engine │ │ Adapters │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Domain Layer¶
The domain layer contains pure business logic with no external dependencies. It defines the domain models and domain services described below.
Domain Models (by component)¶
Entities and value objects are organized by bounded context:
| Component | Module | Contents |
|---|---|---|
| catalog | domain/entities/ | Study, Dataset, DataProduct, Variable |
| identity | domain/entities/ | Concept, Universe, ComparabilityRules |
| semantic | domain/entities/ | Metric, Dimension, GeoHierarchy, SemanticCatalog |
| query | domain/value_objects/ | QuerySpec, FilterSpec, GroupBySpec |
| query | application/planning/ | QueryPlan (internal) |
| validation | domain/value_objects/ | Issue, Severity, Disclosure |
| validation | domain/services/ | Validator, SemanticValidator |
| reference | domain/entities/ | ReferenceSystem, Crosswalk |
| shared | contracts/ | Typed IDs, shared enums, boundary types |
All models are:
- Frozen dataclasses (immutable)
- Self-validating (invariants checked in __post_init__)
- Pure Python (no I/O, no external dependencies)
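As a minimal sketch of these three properties, the following uses a simplified, hypothetical `Variable` with two assumed fields (`name`, `unit`); the real entity will have its own fields and invariants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical, simplified entity used only for illustration."""

    name: str
    unit: str

    def __post_init__(self) -> None:
        # Invariants are checked once, at construction time.
        if not self.name.strip():
            raise ValueError("Variable.name must not be empty")


population = Variable(name="population", unit="persons")

try:
    Variable(name="  ", unit="persons")  # violates the invariant
except ValueError as exc:
    rejected = str(exc)
```

Because the check runs in `__post_init__`, an invalid instance never exists, so code that receives a `Variable` does not need to re-validate it.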
Domain Services (by component)¶
Domain services encode business rules and are organized by bounded context:
| Component | Service | Purpose |
|---|---|---|
| validation | Validator | Runs validation rules against QueryPlan |
| validation | SemanticValidator | Validates semantic queries |
| validation | IndicatorAggregationRule | Blocks naive indicator aggregation |
| identity | ComparabilityResolver | Assesses dataset comparability |
| semantic | MetricGraph | Resolves metric dependencies |
Services are stateless and operate on domain models.
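To illustrate what "stateless" can look like in practice, here is a sketch of a rule in the spirit of IndicatorAggregationRule. The `MetricUse` and `Issue` types, their fields, and the `INDICATOR_SUM` code are assumptions made for this example, not the library's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricUse:
    """Hypothetical stand-in for the part of a plan that a rule inspects."""

    variable: str
    is_indicator: bool  # e.g. a rate or percentage
    aggregation: str    # "sum", "mean", ...


@dataclass(frozen=True)
class Issue:
    """Hypothetical, reduced Issue value object."""

    code: str
    message: str


def check_indicator_aggregation(use: MetricUse) -> list[Issue]:
    """Stateless rule: the same input always yields the same issues."""
    if use.is_indicator and use.aggregation == "sum":
        return [Issue("INDICATOR_SUM", f"Cannot sum indicator '{use.variable}'")]
    return []


blocked = check_indicator_aggregation(MetricUse("poverty_rate", True, "sum"))
allowed = check_indicator_aggregation(MetricUse("population", False, "sum"))
```

The rule keeps no state between calls, so it can be exercised with plain values and no setup.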
Application Layer¶
The application layer orchestrates domain logic and defines contracts for external systems.
Use Cases (invariant.application.use_cases)¶
Each use case represents a single user action:
| Use Case | Purpose |
|---|---|
| ValidateQueryUseCase | Validate a QueryRequest, return issues |
| ExecuteQueryUseCase | Execute a validated plan |
| CreateStudyUseCase | Register a new study |
| AcknowledgeIssuesUseCase | Accept warnings and proceed |
Use cases:
- Take DTOs as input
- Return DTOs as output
- Depend on ports (not concrete implementations)
- Orchestrate domain services
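The shape described above can be sketched as follows. The DTO fields, the port method names (`new_id`, `save_study`), and the two stand-in adapters are assumptions for illustration; only the class names `CreateStudyUseCase`, `CreateStudyRequest`, `StudyDTO`, `CatalogStore`, and `IdGenerator` come from this page.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CreateStudyRequest:  # input DTO (field is an assumption)
    name: str


@dataclass(frozen=True)
class StudyDTO:  # output DTO (fields are assumptions)
    study_id: str
    name: str


class IdGenerator(Protocol):
    def new_id(self) -> str: ...  # assumed method name


class CatalogStore(Protocol):
    def save_study(self, study_id: str, name: str) -> None: ...  # assumed


class CreateStudyUseCase:
    """Depends on ports only; concrete adapters are injected."""

    def __init__(self, catalog: CatalogStore, id_gen: IdGenerator) -> None:
        self._catalog = catalog
        self._id_gen = id_gen

    def execute(self, request: CreateStudyRequest) -> StudyDTO:
        study_id = self._id_gen.new_id()
        self._catalog.save_study(study_id, request.name)
        return StudyDTO(study_id=study_id, name=request.name)


class FixedIdGenerator:  # stand-in adapter for the example
    def new_id(self) -> str:
        return "study-1"


class DictCatalogStore:  # stand-in adapter for the example
    def __init__(self) -> None:
        self.studies: dict[str, str] = {}

    def save_study(self, study_id: str, name: str) -> None:
        self.studies[study_id] = name


store = DictCatalogStore()
use_case = CreateStudyUseCase(store, FixedIdGenerator())
result = use_case.execute(CreateStudyRequest(name="Census"))
```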
DTOs (invariant.application.dto)¶
Data Transfer Objects for crossing boundaries:
| Module | Purpose |
|---|---|
| query_request.py | QueryRequest, MetricRequest, FilterRequest |
| catalog_write.py | CreateStudyRequest, CreateDatasetRequest |
| catalog_read.py | StudyDTO, DatasetDTO (if needed) |
| validation_dto.py | ValidationResultDTO, IssueDTO, DisclosureDTO |
| results_dto.py | QueryResultDTO |
DTOs are frozen dataclasses with primitive types only.
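One practical consequence of restricting DTOs to primitive types is that they serialize without custom encoders. The sketch below assumes three fields on `IssueDTO` and an illustrative severity value of `"BLOCK"`; neither is confirmed by this page.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IssueDTO:
    # Field names are assumptions for illustration.
    code: str
    severity: str  # a plain string, not a domain Severity object
    message: str


issue = IssueDTO(code="INDICATOR_SUM", severity="BLOCK", message="Cannot sum a rate")

# Primitive-only fields mean asdict() output is directly JSON-serializable.
payload = json.dumps(asdict(issue), sort_keys=True)
```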
Ports (invariant.application.ports)¶
Interfaces that the infrastructure must implement:
| Port | Responsibility |
|---|---|
| CatalogStore | Load/save catalog entities |
| QueryEngine | Execute validated query plans |
| CrosswalkService | Provide reference system mappings |
| IndicatorEngine | Handle indicator recomputation |
| SuppressionEngine | Apply suppression policies |
| AuditLog | Record queries and acknowledgments |
| IdGenerator | Generate entity IDs |
| Clock | Provide current time (for testing) |
Ports use Python's Protocol for structural typing.
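A sketch of what a Protocol-based port might look like, using `Clock` as the example. The method name `now()` and both implementations are assumptions; the point is that neither class inherits from the port.

```python
from datetime import datetime, timezone
from typing import Protocol


class Clock(Protocol):
    def now(self) -> datetime: ...  # method name is an assumption


class SystemClock:
    """Production-style implementation; note there is no base class."""

    def now(self) -> datetime:
        return datetime.now(timezone.utc)


class FixedClock:
    """Test implementation that always returns the same instant."""

    def __init__(self, instant: datetime) -> None:
        self._instant = instant

    def now(self) -> datetime:
        return self._instant


def stamp(clock: Clock) -> str:
    # Accepts anything with a matching now() method.
    return clock.now().isoformat()


frozen = FixedClock(datetime(2024, 1, 1, tzinfo=timezone.utc))
stamped = stamp(frozen)
```

A static type checker accepts both classes wherever `Clock` is expected because their method signatures match.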
Infrastructure Layer¶
You implement this layer. It provides concrete implementations of ports.
Example implementations in the sample project:
| Implementation | Port | Technology |
|---|---|---|
| JsonCatalogStore | CatalogStore | JSON file |
| DuckDBQueryEngine | QueryEngine | DuckDB + Parquet |
Other possible implementations:
| Implementation | Port | Technology |
|---|---|---|
| PostgresCatalogStore | CatalogStore | PostgreSQL |
| BigQueryEngine | QueryEngine | BigQuery |
| InMemoryAuditLog | AuditLog | Dict |
| CloudLoggingAuditLog | AuditLog | GCP Logging |
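As an illustration of how small an adapter can be, here is a sketch of a dict-backed `InMemoryAuditLog`. The `record` and `get` methods and their signatures are assumptions; check the actual `AuditLog` port for the methods it requires.

```python
from typing import Optional


class InMemoryAuditLog:
    """Dict-backed adapter; method names are assumed for this sketch."""

    def __init__(self) -> None:
        self._entries: dict[str, dict] = {}

    def record(self, entry_id: str, entry: dict) -> None:
        # Copy so later caller-side mutation cannot alter the stored record.
        self._entries[entry_id] = dict(entry)

    def get(self, entry_id: str) -> Optional[dict]:
        return self._entries.get(entry_id)


log = InMemoryAuditLog()
event = {"action": "acknowledge", "issue": "INDICATOR_SUM"}
log.record("q-1", event)
event["action"] = "tampered"  # does not affect the stored copy
```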
Data Flow¶
Here's how a query flows through the system:
1. User submits query
│
▼
2. QueryRequest DTO created
│
▼
3. ValidateQueryUseCase.execute(request)
│
├── Load data products from CatalogStore
│
├── Build QueryPlan from request
│
├── Run Validator with rules
│
└── Return ValidationResultDTO
│
▼
4. If can_execute:
│
▼
5. ExecuteQueryUseCase.execute(plan)
│
├── Call QueryEngine.execute(plan)
│
├── Apply SuppressionEngine (if configured)
│
└── Return QueryResultDTO with disclosures
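The gating in steps 3 to 5 can be sketched with plain functions standing in for the two use cases. Everything here (the `ValidationResult` shape, the dict-based request, the return values) is a simplified assumption; only the `can_execute` check mirrors the flow above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    can_execute: bool
    issues: tuple[str, ...] = ()


def validate(request: dict) -> ValidationResult:
    # Stand-in for ValidateQueryUseCase.execute(request).
    if not request.get("metrics"):
        return ValidationResult(False, ("no metrics requested",))
    return ValidationResult(True)


def execute(request: dict) -> dict:
    # Stand-in for ExecuteQueryUseCase.execute(plan).
    return {"rows": [], "disclosures": []}


def run_query(request: dict) -> dict:
    result = validate(request)
    if not result.can_execute:
        # Execution is never reached for a rejected request.
        return {"status": "rejected", "issues": list(result.issues)}
    return {"status": "ok", **execute(request)}


rejected = run_query({"metrics": []})
accepted = run_query({"metrics": ["population"]})
```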
Dependency Rule¶
Dependencies point inward:
- Domain has no dependencies
- Application depends only on Domain
- Infrastructure depends on Application and Domain
This means:
- Domain can be tested with no setup
- Application can be tested with fake ports
- Infrastructure can be swapped without touching business logic
Testing Strategy¶
| Layer | Test Type | Dependencies |
|---|---|---|
| Domain | Unit tests | None |
| Application | Unit tests | Fake ports |
| Infrastructure | Integration tests | Real databases/files |
Example test setup:
```python
# Unit test for ValidateQueryUseCase.
# FakeCatalogStore and FakeIdGenerator come from tests/unit/application/fakes.py;
# make_test_data_product() and make_test_request() are test helpers defined elsewhere.
def test_validate_query():
    # Use fake implementations
    catalog = FakeCatalogStore()
    id_gen = FakeIdGenerator()

    # Set up test data
    catalog.save_data_product(make_test_data_product())

    # Run use case
    use_case = ValidateQueryUseCase(catalog, id_gen)
    result = use_case.execute(make_test_request())

    # Assert on result
    assert result.status == "ALLOW"
```
The tests/unit/application/fakes.py file provides fake implementations for all ports.
Package Structure¶
src/invariant/
├── catalog/ # Catalog component
│ ├── domain/entities/ # Study, Dataset, DataProduct, Variable
│ └── application/
├── identity/ # Identity component
│ ├── domain/entities/ # Concept, Universe, VariableSemantics
│ ├── domain/services/ # ComparabilityResolver
│ └── application/
├── semantic/ # Semantic component
│ ├── domain/entities/ # Metric, Dimension, GeoHierarchy, SemanticCatalog
│ ├── domain/services/ # MetricGraph
│ └── application/
├── query/ # Query component
│ ├── domain/ir/ # PlanNode, ScanNode, FilterNode, etc.
│ ├── domain/services/ # QueryPlanner, PostgresCompiler
│ ├── domain/value_objects/ # QuerySpec
│ └── application/
├── validation/ # Validation component
│ ├── domain/entities/ # RulesetPack
│ ├── domain/services/ # Validator, SemanticValidator
│ ├── domain/value_objects/ # Issue, Disclosure, Severity, Attribution
│ └── application/
├── reference/ # Reference component
│ ├── domain/entities/ # ReferenceSystem, Crosswalk
│ ├── domain/value_objects/ # Geography
│ └── application/
├── shared/ # Shared contracts
│ ├── contracts/ # Ids, Enums, ValueObjects (boundary types)
│ └── _adapters/ # Internal adapters
├── kernel/ # Facade entry point
│ └── invariant_kernel.py
├── domain/ # Legacy re-exports (backward compatibility)
│ ├── model/ # Re-exports to new locations
│ └── services/ # Re-exports to new locations
└── application/ # Shared application layer
├── dto/
├── ports/
├── use_cases/
└── services/
Key Design Decisions¶
Typed IDs¶
Every entity has a typed ID (e.g., StudyId, DatasetId) rather than raw UUIDs. This prevents accidentally passing the wrong ID type.
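One way to get this behavior is a small frozen wrapper per ID type. This is a sketch under the assumption that IDs wrap a `UUID` in a `value` field; the library may implement its typed IDs differently.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class StudyId:
    value: UUID  # field name is an assumption


@dataclass(frozen=True)
class DatasetId:
    value: UUID  # field name is an assumption


raw = uuid4()
study_id = StudyId(raw)
dataset_id = DatasetId(raw)

# Same underlying UUID, but distinct types never compare equal.
same_type_equal = study_id == StudyId(raw)
cross_type_equal = study_id == dataset_id
```

A static type checker will also reject passing a `DatasetId` to a function annotated to take a `StudyId`, which is where most of the protection comes from.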
Frozen Dataclasses¶
All domain models are frozen (immutable). This:
- Prevents accidental mutation
- Makes objects hashable
- Simplifies reasoning about state
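These effects can be seen directly with the standard library. The `Dimension` class below is a one-field, hypothetical stand-in for a domain model.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Dimension:
    name: str  # simplified; the real model will have more fields


region = Dimension("region")

try:
    region.name = "county"  # assignment on a frozen instance raises
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# Frozen dataclasses with default equality are hashable, so they work as dict keys.
labels = {region: "Region"}

# A "change" produces a new object and leaves the original untouched.
county = replace(region, name="county")
```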
Protocol-Based Ports¶
Ports use typing.Protocol for structural (duck) typing rather than ABC inheritance. This keeps infrastructure decoupled from the kernel.
Query Plans Use IDs¶
QueryPlan uses VariableId references rather than variable names. This provides stability when variables are renamed.
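A sketch of why ID references survive a rename. The `VariableId` and `QueryPlan` shapes here are heavily simplified assumptions, and the `names` dict stands in for the catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableId:
    value: str  # assumed representation


@dataclass(frozen=True)
class QueryPlan:
    variables: tuple[VariableId, ...]  # simplified plan shape


# Stand-in for the catalog: maps each ID to its current display name.
names = {VariableId("var-001"): "pop_total"}
plan = QueryPlan(variables=(VariableId("var-001"),))

# The variable is renamed in the catalog; the plan itself is not touched.
names[VariableId("var-001")] = "population_total"

resolved = [names[v] for v in plan.variables]
```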
Validation Before Execution¶
Queries must be validated before execution. The QueryEngine.execute() method assumes the plan is already valid.