ai-agents-metrics

Warehouse Layering

What this document is: The rules that govern how data flows through the warehouse and what each layer is allowed to contain. Named layers (raw_*, normalized_*, classified derived_*, aggregate derived_*) exist to keep interpretation out of raw storage, to let each layer be rebuilt from the one below without loss, and to make classifier rules versionable without touching primary data.

When to read this:

Related docs:


The four layers

| Layer | Purpose | Built from | Can be rebuilt losslessly |
| --- | --- | --- | --- |
| 1. raw_* | Byte-perfect capture of source files | The filesystem | Yes — from source files |
| 2. normalized_* | Deterministic parsing of raw content | raw_* only | Yes — from raw_* |
| 3. Classified derived_* | Per-item interpretation (one row per normalized item) | normalized_* only | Yes — from normalized_* |
| 4. Aggregate derived_* | Rollups, cross-item relationships, goal/attempt shape | Classified + normalized | Yes — from lower layers |

Every layer can be dropped and rebuilt from the layers below it. This is a non-negotiable invariant: it is the only reason the pipeline is debuggable.


Layer 1 — Raw

Purpose: faithfully mirror what is on disk. Downstream layers may change behavior; raw must not.

Allowed in a raw row:

Forbidden in a raw row:

Test of the invariant: deleting the raw layer and re-ingesting from the source files produces byte-identical rows (modulo ingested_at / ingest_run_id). If the new rows differ in any other column, something is doing interpretation and belongs in a later layer.
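The rebuild test above can be expressed as a row-fingerprint comparison. A minimal sketch, assuming an illustrative `raw_files` table; the table name, column names, and the set of volatile bookkeeping columns are assumptions, not the real schema:

```python
import hashlib
import sqlite3

# Columns allowed to differ between ingest passes (per the invariant above).
VOLATILE = {"ingested_at", "ingest_run_id"}

def row_fingerprint(db, table):
    """Hash every row of `table`, excluding the volatile bookkeeping columns."""
    cols = [c[1] for c in db.execute(f"PRAGMA table_info({table})")]
    stable = [c for c in cols if c not in VOLATILE]
    rows = db.execute(
        f"SELECT {', '.join(stable)} FROM {table} ORDER BY {stable[0]}"
    ).fetchall()
    return hashlib.sha256(repr(rows).encode()).hexdigest()

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE raw_files (path TEXT, content BLOB, ingested_at TEXT, ingest_run_id TEXT);
    INSERT INTO raw_files VALUES ('a.jsonl', X'01', '2026-04-19', 'run-1');
""")
before = row_fingerprint(db, "raw_files")

# Simulate a re-ingest: identical bytes, new bookkeeping values.
db.executescript("""
    DELETE FROM raw_files;
    INSERT INTO raw_files VALUES ('a.jsonl', X'01', '2026-04-20', 'run-2');
""")
after = row_fingerprint(db, "raw_files")
assert before == after, "re-ingest changed a non-bookkeeping column"
```

If the assertion fails, some column is recording interpretation rather than capture, and that logic belongs in a later layer.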


Layer 2 — Normalized

Purpose: turn raw content into stable, typed, deduplicated rows that every later stage can rely on. Still no semantics — a human tagger, a regex classifier, and an LLM classifier must all see the same normalized rows.

Allowed in a normalized row:

Forbidden in a normalized row:

Test of the invariant: rebuilding the normalized layer from the raw layer produces identical rows. If changing a regex config changes normalized output, the regex belongs in Layer 3.
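One way to state the Layer 2 test in code: the parse step is a pure function of the raw bytes, so two rebuilds from the same raw input must produce identical rows. The `normalize` function, table name, and raw line shape below are illustrative assumptions:

```python
import json
import sqlite3

def normalize(raw_line: str) -> tuple:
    """Deterministic parse of one raw JSONL line into a typed normalized row.
    No config, no heuristics: the same input always yields the same row."""
    obj = json.loads(raw_line)
    return (obj["id"], obj["role"], obj["text"].strip())

raw_lines = ['{"id": "m1", "role": "user", "text": " hello "}']

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE normalized_messages (id TEXT PRIMARY KEY, role TEXT, text TEXT)")

def rebuild():
    """Drop and rebuild the normalized layer from raw input."""
    db.execute("DELETE FROM normalized_messages")
    db.executemany("INSERT INTO normalized_messages VALUES (?, ?, ?)",
                   [normalize(line) for line in raw_lines])
    return db.execute("SELECT * FROM normalized_messages ORDER BY id").fetchall()

# The invariant: rebuilding twice from the same raw layer is a no-op diff.
assert rebuild() == rebuild()
```

Note that `normalize` takes no configuration argument at all; the moment a regex config parameter appears in its signature, the function has become a Layer 3 classifier.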


Layer 3 — Classified derived_*

Purpose: per-item interpretation. One row per normalized item, annotated with what the item IS. This is where regex, heuristics, and LLM classification live.

This layer is new as of 2026-04-19. Previously, per-item interpretation was mixed into Layer 4 aggregates, which made it hard to re-classify without rebuilding aggregates and hard to compare two classifier versions against each other.

Allowed in a classified row:

Forbidden in a classified row:

Versioning rule: every row carries the exact classifier_version that produced it. Bumping the version does NOT retroactively rewrite older rows — it writes a new set alongside them. Queries that want “current” classification filter by the latest version; comparison queries keep access to history.

Test of the invariant: running the same classifier version over the same normalized input produces identical rows (idempotent). Two different versions produce two co-existing sets of rows for the same items.
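A sketch of the version co-existence rule, using a toy keyword classifier and an illustrative `derived_message_class` table (all names and rules here are assumptions; the composite primary key on `(message_id, classifier_version)` is what lets two versions co-exist):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE derived_message_class (
    message_id TEXT, label TEXT, classifier_version TEXT,
    PRIMARY KEY (message_id, classifier_version))""")

def classify(text: str, version: str) -> str:
    # v1: naive keyword rule; v2: tightened rule. Both are toy stand-ins.
    if version == "v1":
        return "question" if "?" in text else "statement"
    return "question" if text.rstrip().endswith("?") else "statement"

messages = {"m1": "deploy now? maybe later.", "m2": "ship it"}

# Idempotent per version: INSERT OR REPLACE makes a re-run a no-op.
for version in ("v1", "v2"):
    db.executemany(
        "INSERT OR REPLACE INTO derived_message_class VALUES (?, ?, ?)",
        [(mid, classify(txt, version), version) for mid, txt in messages.items()])

# Bumping to v2 wrote a new row set; the v1 rows are untouched.
versions = sorted(r[0] for r in db.execute(
    "SELECT DISTINCT classifier_version FROM derived_message_class"))
assert versions == ["v1", "v2"]

# "Current" classification = filter by the latest version.
current = dict(db.execute(
    "SELECT message_id, label FROM derived_message_class WHERE classifier_version = 'v2'"))
```

A comparison query is then a self-join on `message_id` across two version values, which is exactly what the pre-2026-04-19 layout (interpretation mixed into aggregates) made impossible.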

Session-level and message-level classification live in separate tables. Verified on the Claude Code warehouse (2026-04-19, see docs/private/product-hypotheses/H-040.md):

The two tables must not be collapsed into one: they describe different kinds of items (files vs messages), they use different signals (filename vs content), and they have different confidence profiles (filename is 100% deterministic; message content needs regex and may need LLM fallback). The classifier-version axis is also independent — a new skill added to the tool does not change session classification.


Layer 4 — Aggregate derived_*

Purpose: rollups, cross-item relationships, and goal/attempt shape. This is the “metrics” layer and the main read path for reports.

Allowed in an aggregate row:

Forbidden in an aggregate row:

Test of the invariant: aggregates equal exactly what their source rows say. E.g. SUM(derived_projects.total_tokens) == SUM(derived_session_usage.total_tokens). These equalities are existing tests; new aggregates must add similar equality tests.
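The equality above is checkable with a single query. A minimal sketch, assuming illustrative schemas for the two tables named in the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE derived_session_usage (session_id TEXT, project TEXT, total_tokens INTEGER);
    INSERT INTO derived_session_usage VALUES ('s1', 'p1', 100), ('s2', 'p1', 50), ('s3', 'p2', 25);

    -- Layer 4 rollup, built only from the rows above.
    CREATE TABLE derived_projects AS
        SELECT project, SUM(total_tokens) AS total_tokens
        FROM derived_session_usage GROUP BY project;
""")

# The invariant as one query: the rollup total equals the source total.
diff = db.execute("""
    SELECT (SELECT SUM(total_tokens) FROM derived_projects)
         - (SELECT SUM(total_tokens) FROM derived_session_usage)
""").fetchone()[0]
assert diff == 0, "aggregate drifted from its source rows"
```

A nonzero `diff` means the aggregate is either double-counting, dropping rows, or sourcing a number from somewhere other than its declared inputs.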


Cross-layer invariants

These should be enforced by tests in tests/public/ and by the validation code that runs at the end of each pipeline stage. Every invariant below is expressed so it can be checked with a single SQL query or a deterministic rebuild.

Raw ↔ filesystem

Normalized ↔ raw

Classified ↔ normalized

Aggregate ↔ classified + normalized

Idempotency (across all layers)

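The idempotency invariant has the same shape at every layer, so it can be written once as a generic harness. The stage builder below is a toy stand-in; `assert_idempotent` itself is the pattern the per-stage tests follow:

```python
import sqlite3

def assert_idempotent(build, db, table, key):
    """Run a pipeline stage twice against the same input and require
    byte-identical output rows (the cross-layer idempotency invariant)."""
    build(db)
    first = db.execute(f"SELECT * FROM {table} ORDER BY {key}").fetchall()
    build(db)
    second = db.execute(f"SELECT * FROM {table} ORDER BY {key}").fetchall()
    assert first == second, f"{table} stage is not idempotent"

def build_normalized(db):
    # Toy stage: delete-and-rebuild, which is idempotent by construction.
    db.executescript("""
        CREATE TABLE IF NOT EXISTS normalized_items (id TEXT PRIMARY KEY, value INTEGER);
        DELETE FROM normalized_items;
        INSERT INTO normalized_items VALUES ('a', 1), ('b', 2);
    """)

db = sqlite3.connect(":memory:")
assert_idempotent(build_normalized, db, "normalized_items", "id")
```

A stage that appends instead of rebuilding, or that stamps rows with `datetime('now')` in a non-bookkeeping column, fails this harness immediately.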

Versioning policy

Three version axes, one column per axis on the relevant layer:

| Column | Lives in | Changes when |
| --- | --- | --- |
| ingest_run_id | raw | A new ingest pass starts (usually every run) |
| (implicit: schema version) | all layers | Table schema changes — bump via migration |
| classifier_version | classified | Classifier rules change (new template, new model) |
| pipeline_version | aggregate | Aggregate logic changes |

Rule: bumping a downstream version never requires upstream data to be rewritten. Layer 1 never cares about classifier versions; Layer 3 never cares about aggregate versions.

Rules that consume external config (regex files, pricing tables) embed the config hash in the version string — e.g. classifier_version = 'v1-skills-' + sha8(skill_templates.yaml). Identical inputs → identical version → identical outputs.
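The `sha8` helper in the example might be sketched as below; this is an illustrative implementation (the real one may hash differently), but it shows the property the rule depends on: the version string is a pure function of the config bytes.

```python
import hashlib
import tempfile
from pathlib import Path

def sha8(path: Path) -> str:
    """First 8 hex chars of the SHA-256 of the config file's bytes (sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:8]

def classifier_version(config: Path) -> str:
    return "v1-skills-" + sha8(config)

with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "skill_templates.yaml"
    cfg.write_text("skills:\n  - code-review\n")
    v1 = classifier_version(cfg)

    # Identical config bytes -> identical version string (idempotent reruns).
    assert classifier_version(cfg) == v1

    # Any config change changes the version, forcing a new classified row set.
    cfg.write_text("skills:\n  - code-review\n  - refactor\n")
    v2 = classifier_version(cfg)
    assert v2 != v1
```

This closes the loop with the Layer 3 idempotency test: a config edit cannot silently overwrite rows produced under the old config, because it cannot produce the old version string.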


What happens when rules are violated

The most common violations are “fields leaked upstream”:

Known current violations of these rules (to be fixed as a separate refactor, not as part of new feature work) are tracked in architecture/ARCH-018-layer-separation-cleanup.md.


Adding a new layer-crossing feature

Checklist when a change spans layers:

  1. Identify the layer where each new fact belongs. A classifier’s verdict is Layer 3, not Layer 2; a rollup is Layer 4, not Layer 3.
  2. Add or extend the relevant table; do not append to a table in a different layer just because the column “fits”.
  3. Wire versioning: classifier_version if new classification, pipeline_version if new aggregate.
  4. Add a cross-layer invariant test that would fail if the new column leaks upstream.
  5. Update history-pipeline.md if the table catalog changed.