ai-agents-metrics

History Pipeline

What this document is: How ai-agents-metrics extracts metrics from raw AI agent session data. The history pipeline is the primary analysis layer — it reads existing conversation history files and produces retry pressure, token cost, and session timelines without any manual instrumentation. This document describes the current adapters and the pipeline stages.

Retry pressure definition: retry_count is the number of user messages sent after the first one in a thread — each additional user message represents a clarifying or corrective input that the agent required before completing the task. has_retry_pressure = retry_count > 0. This definition is source-agnostic and works identically for Codex and Claude sessions.
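The definition above can be sketched in a few lines. This is an illustrative sketch, not the project's actual implementation; the `(thread_id, role)` input shape is an assumption for the example.

```python
# Illustrative sketch of the retry-pressure definition above (not the
# project's actual code): retry_count is the number of user messages
# after the first one in each thread.
from collections import Counter

def retry_counts(messages):
    """messages: iterable of (thread_id, role) pairs in thread order."""
    user_totals = Counter(t for t, role in messages if role == "user")
    return {t: n - 1 for t, n in user_totals.items()}

msgs = [
    ("t1", "user"), ("t1", "assistant"),
    ("t1", "user"), ("t1", "assistant"),  # corrective follow-up: a retry
    ("t2", "user"), ("t2", "assistant"),  # single-shot thread: no retry
]
counts = retry_counts(msgs)
has_retry_pressure = {t: n > 0 for t, n in counts.items()}
```

Because the count depends only on user-message order within a thread, the same function works for Codex and Claude sessions alike.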

When to read this:

Related docs:


Scope and limitations

This pipeline supports three --source values:

Both transcript sources (Codex and Claude Code) feed the same ingest → normalize → derive pipeline and produce the same warehouse table shapes. The history pipeline is the primary product flow — run it to get retry pressure, token cost, and session timelines from existing history files with no prior setup. The NDJSON ledger (events.ndjson) is a complementary opt-in layer for explicit goal boundaries and outcome judgements.

Note on Codex vs Claude: For Codex, each session file maps 1:1 to a thread — there is no multi-session threading. Retry pressure is still measured correctly for Codex through user message counts, not session counts.

sync-usage is a separate lightweight cost-backfill adapter (reads ~/.claude telemetry) and is not part of this transcript pipeline.

The product model (goals, attempts, outcomes, cost) is source-agnostic. New sources would follow the same pattern but with different raw tables and source readers.


Pipeline overview

Raw sources (~/.codex, Codex sessions; ~/.claude/projects, Claude Code sessions)
  ↓  ingest       → raw warehouse tables (SQLite)
  ↓  normalize    → cleaned, stable rows
  ↓  classify     → session kinds (main/subagent) + practice events
  ↓  derive       → goal/attempt/timeline marts
  ↓  compare      → diff against metrics ledger (events.ndjson)

Raw sources (~/.claude, Claude Code telemetry) — cost/token only (lightweight backfill)
  ↓  sync-usage   → goal cost fields in events.ndjson (no transcript)

Stage Module What it does
Ingest history/ingest/ (package: warehouse.py, codex.py, claude.py, __init__.py) Reads raw Codex + Claude sources into the warehouse
Normalize history/normalize.py Cleans and stabilises raw rows
Classify history/classify.py Labels session kinds (main vs subagent) and extracts practice events
Derive history/derive.py + derive_build.py + derive_insert.py + derive_schema.py Builds goal, attempt, and timeline marts
Compare history/compare.py Diffs derived goals against the NDJSON ledger

Run in order: ingest → normalize → classify → derive → compare.
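The ordering constraint can be made explicit in a small sketch. Stage names come from this document; the dependency check itself is illustrative, not the project's runner.

```python
# Minimal sketch of the stage-order constraint: each stage consumes tables
# that the previous stage wrote, so running a later stage first should fail
# fast. The dependency check is illustrative, not the project's code.
PIPELINE = ["ingest", "normalize", "classify", "derive", "compare"]

def run_stages(requested, order=PIPELINE):
    done = set()
    for stage in requested:
        upstream = order[: order.index(stage)]
        missing = [s for s in upstream if s not in done]
        if missing:
            raise RuntimeError(f"{stage} requires {missing} to run first")
        done.add(stage)  # a real runner would invoke the stage module here
    return done
```

Running the full pipeline in order succeeds; asking for derive on its own fails fast because ingest, normalize, and classify have not written their tables yet.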


Raw Sources

Codex (~/.codex)

Source Content
state_5.sqlite Thread-level metadata: model, cwd, title, archival state
sessions/**/rollout-*.jsonl Full event stream per session: messages, token counts, task events
archived_sessions/*.jsonl Same format as active sessions, post-archival
logs_1.sqlite Runtime log side-channel: task boundaries, debug traces, model hints

Claude Code (~/.claude/projects/)

Source Content
<encoded-cwd>/*.jsonl Full session JSONL: messages, token usage, metadata
<encoded-cwd>/subagents/agent-*.jsonl Subagent session files (same format, share sessionId with parent)

Thread identity is derived from sessionId in the session events. Thread metadata (model, cwd, version) is extracted from the session file itself — there is no separate state DB for Claude Code. Subagent files share the parent’s sessionId, so the parent’s raw_threads row is preserved when subagents are imported.
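The sessionId-based identity described above can be sketched as follows. The `sessionId` field name comes from this document; the surrounding event shapes are assumptions for illustration.

```python
# Illustrative sketch of thread-identity extraction for Claude Code session
# files: the first event carrying a sessionId names the thread, and a
# subagent file resolves to the same thread as its parent because the two
# files share a sessionId.
import json

def thread_id_for(jsonl_lines):
    for line in jsonl_lines:
        event = json.loads(line)
        if event.get("sessionId"):
            return event["sessionId"]
    return None

parent = ['{"sessionId": "abc-123", "type": "user"}']
subagent = ['{"type": "meta"}', '{"sessionId": "abc-123", "type": "task"}']
same_thread = thread_id_for(parent) == thread_id_for(subagent)
```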


Relationship Map

flowchart LR
    subgraph Sources["~/.codex sources"]
        ST["state_5.sqlite"]
        SS["sessions/**/rollout-*.jsonl"]
        AS["archived_sessions/*.jsonl"]
        LG["logs_1.sqlite"]
    end

    subgraph Raw["Raw warehouse"]
        RT["raw_threads"]
        RS["raw_sessions"]
        RSE["raw_session_events"]
        RM["raw_messages"]
        RTU["raw_token_usage"]
        RL["raw_logs"]
    end

    subgraph Norm["Normalized warehouse"]
        NT["normalized_threads"]
        NS["normalized_sessions"]
        NM["normalized_messages"]
        NU["normalized_usage_events"]
        NL["normalized_logs"]
        NP["normalized_projects"]
    end

    subgraph Derived["Derived marts"]
        DG["derived_goals"]
        DA["derived_attempts"]
        DTE["derived_timeline_events"]
        DMF["derived_message_facts"]
        DRC["derived_retry_chains"]
        DUS["derived_session_usage"]
        DP["derived_projects"]
    end

    subgraph Ops["Operational store"]
        EV["events.sqlite.events"]
    end

    ST --> RT
    SS --> RS
    SS --> RSE
    SS --> RM
    SS --> RTU
    LG --> RL

    RT --> NT
    RS --> NS
    RSE --> NM
    RTU --> NU
    RL --> NL

    NT --> DG
    NS --> DA
    NM --> DTE
    NU --> DUS
    NL --> DTE
    NM --> DMF
    NU --> DMF
    NP --> DP
    DG --> DRC
    DG --> DP
    DA --> DUS
    DA --> DTE

events.sqlite.events is a separate operational audit store written by observability.py — not part of the transcript warehouse.


Raw Warehouse Tables

raw_threads

Thread-level metadata from state_5.sqlite.

raw_sessions

One row per rollout session file.

raw_session_events

Every JSONL event in session order.

raw_messages

Parsed developer, user, and assistant text extracted from response_item.

raw_token_usage

Parsed token usage rows from token_count events.

raw_logs

Runtime/log side-channel rows from logs_1.sqlite.


Normalized Tables

normalized_threads

Thread metadata after normalization.

normalized_sessions

Session metadata after normalization.

normalized_messages

Message rows after normalization.

normalized_usage_events

Token usage rows after normalization.

normalized_logs

Log rows after normalization.

normalized_projects

Project-level aggregate after normalization.


Derived Tables

derived_goals

One row per thread/goal.

derived_attempts

One row per session attempt inside a thread.

derived_timeline_events

Compressed timeline view of the full thread.

derived_message_facts

Message-level OLAP fact table with token usage attributed to the nearest assistant turn, the resolved model, and a derived message date.
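The "nearest assistant turn" attribution can be sketched as a single ordered pass. The event field names here are assumptions for illustration, not the real schema.

```python
# Sketch of attributing token usage to the nearest preceding assistant
# turn: walk events in order, remember the last assistant message seen,
# and assign each token_count row to it. Field names are assumptions.
def attribute_usage(events):
    facts = []
    last_assistant = None
    for ev in events:
        if ev.get("kind") == "message" and ev.get("role") == "assistant":
            last_assistant = ev["id"]
        elif ev.get("kind") == "token_count":
            facts.append((last_assistant, ev["total_tokens"]))
    return facts

events = [
    {"kind": "message", "role": "user", "id": "m1"},
    {"kind": "message", "role": "assistant", "id": "m2"},
    {"kind": "token_count", "total_tokens": 120},
]
facts = attribute_usage(events)
```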

derived_retry_chains

Retry-chain summary per thread.

derived_session_usage

Session-level usage aggregates.

derived_projects

Project-level aggregate after derivation.


Event Store

events.sqlite.events is the audit log of ai-agents-metrics CLI operations and goal mutations — written by observability.py. It is not part of the transcript/history warehouse and is not produced by this pipeline.


Stable Identifiers

Identifier Meaning
thread_id Top-level conversation/thread identity
session_path Source rollout file for one session
turn_id Per-turn identifier from task_started and turn_context events
event_index Raw JSONL line order within a session
message_id Deterministic id for a parsed message row
usage_event_id Deterministic id for a token usage event
timeline_event_id Deterministic id for derived timeline rows
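A deterministic id in this sense is one derived purely from a row's identifying fields. The sketch below is hypothetical — the actual scheme and hash inputs in the ingest modules may differ — but it shows the property that matters: re-ingesting the same row always derives the same id.

```python
# Hypothetical sketch of a deterministic id: hash the identifying fields
# so the same inputs always yield the same id. The real scheme in the
# ingest modules may use different inputs or encoding.
import hashlib

def deterministic_id(*parts):
    key = "\x1f".join(str(p) for p in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

a = deterministic_id("thread-1", "sessions/rollout-x.jsonl", 42)
b = deterministic_id("thread-1", "sessions/rollout-x.jsonl", 42)
# same identifying fields derive the same id, so re-ingest is idempotent
```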

Practical Search Workflow

For Codex or Claude Code sessions (full transcript + cost):

  1. Ingest sources into the raw warehouse:
    • All sources: history-ingest (reads ~/.codex + ~/.claude/projects/, default)
    • Codex only: history-ingest --source codex (reads ~/.codex)
    • Claude Code only: history-ingest --source claude (reads ~/.claude/projects/)
  2. Normalize raw rows into stable message, usage, and project tables.
  3. Derive higher-level goal, attempt, and timeline marts.
  4. Search raw_messages or normalized_messages for transcript text.
  5. Use thread_id, session_path, and turn_id to group related context.
  6. Use raw_token_usage or normalized_usage_events for cost or token questions.
  7. Use derived_message_facts for message-level OLAP analysis or token spend by date.
  8. Use derived_goals and derived_projects for project-level comparison.

For cost-only backfill (Claude Code, lightweight):

  1. Run sync-usage to backfill cost and token totals from ~/.claude telemetry into the NDJSON ledger.
  2. This does not ingest transcripts — use history-ingest (or --source claude) above for full conversation history.
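A cost-only backfill into an NDJSON ledger amounts to rewriting each line with cost fields filled in where they are missing. The sketch below is illustrative: the `goal_id` and `total_cost_usd` field names are assumptions, and the real sync-usage adapter defines its own fields.

```python
# Illustrative sketch of a cost-only backfill into an NDJSON ledger.
# Field names ("goal_id", "total_cost_usd") are assumptions for the
# example; existing cost values are never overwritten.
import json

def backfill_costs(ledger_lines, costs_by_goal):
    out = []
    for line in ledger_lines:
        event = json.loads(line)
        goal = event.get("goal_id")
        if goal in costs_by_goal:
            event.setdefault("total_cost_usd", costs_by_goal[goal])
        out.append(json.dumps(event))
    return out

ledger = ['{"goal_id": "g1"}', '{"goal_id": "g2", "total_cost_usd": 0.5}']
updated = backfill_costs(ledger, {"g1": 1.25, "g2": 9.99})
```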

Useful Query Shapes

Search transcript text:

SELECT thread_id, session_path, role, text
FROM raw_messages
WHERE text LIKE '%keyword%'
ORDER BY thread_id, session_path, event_index, message_index;

Find usage for a turn or session:

SELECT thread_id, session_path, input_tokens, cached_input_tokens, output_tokens, total_tokens
FROM raw_token_usage
WHERE thread_id = ? OR session_path = ?
ORDER BY event_index;

Inspect message-level token facts:

SELECT message_date, model, role, text, total_tokens
FROM derived_message_facts
WHERE thread_id = ?
ORDER BY message_timestamp, event_index, message_index;

Inspect derived project slice:

SELECT project_cwd, thread_count, attempt_count, message_count, usage_event_count, total_tokens
FROM derived_projects
ORDER BY thread_count DESC, attempt_count DESC, project_cwd ASC;

Extending the pipeline

To add a new data source (e.g. a different agent or log format):

  1. Add a new ingest module that reads from the new source and writes into the same raw warehouse schema, or a new schema alongside it.
  2. Add normalize and derive stages that produce the same output table shapes (derived_goals, derived_attempts, etc.) so downstream comparison and analysis remain unchanged.
  3. The compare stage (history/compare.py) works against the NDJSON ledger and is source-agnostic — it does not need to change for new sources.

The current raw table names (raw_threads, raw_sessions, etc.) are Codex-specific. New adapters should introduce their own raw table namespace rather than reusing these names.
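A new adapter following that advice might look like the skeleton below. Everything here is hypothetical — the table, columns, and function name are illustrative, not the project's API — but it shows the pattern: a source-specific raw namespace written idempotently into the same SQLite warehouse.

```python
# Hypothetical skeleton of a new source adapter with its own raw table
# namespace. Table and column names are illustrative only.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_newsource_sessions (
    session_path TEXT PRIMARY KEY,
    thread_id    TEXT,
    started_at   TEXT
);
"""

def ingest_newsource(con, records):
    """records: iterable of (session_path, thread_id, started_at) tuples."""
    con.executescript(SCHEMA)
    # INSERT OR REPLACE keyed on session_path keeps re-ingest idempotent
    con.executemany(
        "INSERT OR REPLACE INTO raw_newsource_sessions VALUES (?, ?, ?)",
        records,
    )
    con.commit()

con = sqlite3.connect(":memory:")
ingest_newsource(con, [("s1.jsonl", "t1", "2024-01-01T00:00:00Z")])
n = con.execute("SELECT COUNT(*) FROM raw_newsource_sessions").fetchone()[0]
```

Normalize and derive stages for the new source would then map these rows into the shared output shapes (derived_goals, derived_attempts, etc.).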


Notes