What this document is: How ai-agents-metrics extracts metrics from raw AI agent session data. The history pipeline is the primary analysis layer — it reads existing conversation history files and produces retry pressure, token cost, and session timelines without any manual instrumentation. This document describes the current adapters and the pipeline stages.
Retry pressure definition: retry_count is the number of user messages sent after the first one in a thread — each additional user message represents a clarifying or corrective input that the agent required before completing the task. has_retry_pressure = retry_count > 0. This definition is source-agnostic and works identically for Codex and Claude sessions.
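The definition above can be sketched directly. This is a minimal illustration (the function name and message-dict shape are assumptions, not the project's actual code); it only relies on the `user`/`assistant` role convention used throughout the message tables:

```python
def retry_metrics(messages):
    """Compute retry-pressure fields from an ordered list of message dicts.

    retry_count: user messages after the first one in the thread.
    has_retry_pressure: True if any retry occurred.
    attempt_count: total user turns in the thread (retry_count + 1
    whenever at least one user turn exists).
    """
    user_turns = sum(1 for m in messages if m["role"] == "user")
    retry_count = max(user_turns - 1, 0)
    return {
        "retry_count": retry_count,
        "has_retry_pressure": retry_count > 0,
        "attempt_count": user_turns,
    }

thread = [
    {"role": "user", "text": "add tests"},
    {"role": "assistant", "text": "added unit tests"},
    {"role": "user", "text": "no, cover the error path too"},  # corrective input
    {"role": "assistant", "text": "done"},
]
print(retry_metrics(thread))  # retry_count=1, has_retry_pressure=True, attempt_count=2
```

Because the count is taken over user messages only, the metric is unaffected by how many sessions or assistant turns a source produces, which is what makes it source-agnostic.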
When to read this:
Related docs:
This pipeline supports three `--source` values:

- `--source all` (the default): reads from both `~/.codex` and `~/.claude`; sources that don’t exist are skipped silently.
- `--source codex`: reads from `~/.codex` only — full transcript, token usage, and thread metadata.
- `--source claude`: reads from `~/.claude/projects/` only — full transcript and token usage from session JSONL files, including subagent sessions.

Both sources feed the same ingest → normalize → derive pipeline and produce the same warehouse table shapes. The history pipeline is the primary product flow — run it to get retry pressure, token cost, and session timelines from existing history files with no prior setup. The NDJSON ledger (`events.ndjson`) is a complementary opt-in layer for explicit goal boundaries and outcome judgements.
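The source-selection behaviour described above can be sketched as follows. The helper name and `SOURCE_ROOTS` mapping are hypothetical; the real wiring lives in the `history/ingest/` package:

```python
from pathlib import Path

# Hypothetical mapping from --source value to the history roots it reads.
SOURCE_ROOTS = {
    "codex": [Path.home() / ".codex"],
    "claude": [Path.home() / ".claude" / "projects"],
}
SOURCE_ROOTS["all"] = SOURCE_ROOTS["codex"] + SOURCE_ROOTS["claude"]

def roots_to_ingest(source: str = "all") -> list[Path]:
    """Return the history roots to read; missing roots are skipped silently."""
    if source not in SOURCE_ROOTS:
        raise ValueError(f"unknown --source: {source}")
    return [root for root in SOURCE_ROOTS[source] if root.exists()]
```

The "skipped silently" behaviour means `--source all` works on machines that only have one of the two tools installed.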
Note on Codex vs Claude: For Codex, each session file maps 1:1 to a thread — there is no multi-session threading. Retry pressure is still measured correctly for Codex through user message counts, not session counts.
sync-usage is a separate lightweight cost-backfill adapter (reads ~/.claude telemetry) and is not part of this transcript pipeline.
The product model (goals, attempts, outcomes, cost) is source-agnostic. New sources would follow the same pattern but with different raw tables and source readers.
```text
Raw sources (~/.codex, Codex sessions)
  ↓ ingest     → raw warehouse tables (SQLite)
  ↓ normalize  → cleaned, stable rows
  ↓ classify   → session kinds (main/subagent) + practice events
  ↓ derive     → goal/attempt/timeline marts
  ↓ compare    → diff against metrics ledger (events.ndjson)

Raw sources (~/.claude/projects, Claude Code sessions)
  ↓ ingest     → raw warehouse tables (SQLite)
  ↓ normalize  → cleaned, stable rows
  ↓ classify   → session kinds (main/subagent) + practice events
  ↓ derive     → goal/attempt/timeline marts
  ↓ compare    → diff against metrics ledger (events.ndjson)

Raw sources (~/.claude, Claude Code telemetry) — cost/token only (lightweight backfill)
  ↓ sync-usage → goal cost fields in events.ndjson (no transcript)
```
| Stage | Module | What it does |
|---|---|---|
| Ingest | `history/ingest/` (package: `warehouse.py`, `codex.py`, `claude.py`, `__init__.py`) | Reads raw Codex + Claude sources into the warehouse |
| Normalize | `history/normalize.py` | Cleans and stabilises raw rows |
| Classify | `history/classify.py` | Labels session kinds (main vs subagent) and extracts practice events |
| Derive | `history/derive.py` + `derive_build.py` + `derive_insert.py` + `derive_schema.py` | Builds goal, attempt, and timeline marts |
| Compare | `history/compare.py` | Diffs derived goals against the NDJSON ledger |
Run in order: ingest → normalize → classify → derive → compare.
### Codex raw sources (`~/.codex`)

| Source | Content |
|---|---|
| `state_5.sqlite` | Thread-level metadata: model, cwd, title, archival state |
| `sessions/**/rollout-*.jsonl` | Full event stream per session: messages, token counts, task events |
| `archived_sessions/*.jsonl` | Same format as active sessions, post-archival |
| `logs_1.sqlite` | Runtime log side-channel: task boundaries, debug traces, model hints |
### Claude Code raw sources (`~/.claude/projects/`)

| Source | Content |
|---|---|
| `<encoded-cwd>/*.jsonl` | Full session JSONL: messages, token usage, metadata |
| `<encoded-cwd>/subagents/agent-*.jsonl` | Subagent session files (same format, share `sessionId` with parent) |
Thread identity is derived from sessionId in the session events. Thread metadata (model, cwd, version) is extracted from the session file itself — there is no separate state DB for Claude Code. Subagent files share the parent’s sessionId, so the parent’s raw_threads row is preserved when subagents are imported.
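A sketch of pulling thread identity and metadata out of a Claude Code session JSONL. The field names `sessionId`, `cwd`, and `version` follow the description above; the exact event shape and the first-occurrence-wins policy are assumptions for illustration:

```python
import json

WANTED = ("sessionId", "cwd", "version")

def thread_metadata(jsonl_text: str) -> dict:
    """Scan session JSONL events; the first occurrence of each field wins.

    There is no separate state DB for Claude Code, so everything comes
    from the session file itself.
    """
    meta = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        for field in WANTED:
            if field not in meta and field in event:
                meta[field] = event[field]
    return meta

sample = "\n".join([
    json.dumps({"sessionId": "abc-123", "cwd": "/home/me/proj", "type": "user"}),
    json.dumps({"sessionId": "abc-123", "version": "1.0.0", "type": "assistant"}),
])
print(thread_metadata(sample))  # {'sessionId': 'abc-123', 'cwd': '/home/me/proj', 'version': '1.0.0'}
```

Since subagent files carry the parent's `sessionId`, scanning them with the same logic resolves to the parent thread rather than creating a new one.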
```mermaid
flowchart LR
  subgraph Sources["~/.codex sources"]
    ST["state_5.sqlite"]
    SS["sessions/**/rollout-*.jsonl"]
    AS["archived_sessions/*.jsonl"]
    LG["logs_1.sqlite"]
  end
  subgraph Raw["Raw warehouse"]
    RT["raw_threads"]
    RS["raw_sessions"]
    RSE["raw_session_events"]
    RM["raw_messages"]
    RTU["raw_token_usage"]
    RL["raw_logs"]
  end
  subgraph Norm["Normalized warehouse"]
    NT["normalized_threads"]
    NS["normalized_sessions"]
    NM["normalized_messages"]
    NU["normalized_usage_events"]
    NL["normalized_logs"]
    NP["normalized_projects"]
  end
  subgraph Derived["Derived marts"]
    DG["derived_goals"]
    DA["derived_attempts"]
    DTE["derived_timeline_events"]
    DMF["derived_message_facts"]
    DRC["derived_retry_chains"]
    DUS["derived_session_usage"]
    DP["derived_projects"]
  end
  subgraph Ops["Operational store"]
    EV["events.sqlite.events"]
  end
  ST --> RT
  SS --> RS
  SS --> RSE
  SS --> RM
  SS --> RTU
  LG --> RL
  RT --> NT
  RS --> NS
  RSE --> NM
  RTU --> NU
  RL --> NL
  NT --> DG
  NS --> DA
  NM --> DTE
  NU --> DUS
  NL --> DTE
  NM --> DMF
  NU --> DMF
  NP --> DP
  DG --> DRC
  DG --> DP
  DA --> DUS
  DA --> DTE
```
`events.sqlite.events` is a separate operational audit store written by `observability.py` — not part of the transcript warehouse.
### `raw_threads`
Thread-level metadata from `state_5.sqlite`.
- Key: `thread_id`
- Join on `thread_id` to get tokens or pricing

### `raw_sessions`
One row per rollout session file.
- Key: `session_path`
- Joins via `thread_id`

### `raw_session_events`
Every JSONL event in session order.
- Key: `event_id`
- Event types: `task_started`, `turn_context`, `response_item`, or `token_count`

### `raw_messages`
Parsed developer, user, and assistant text extracted from `response_item`.
- Key: `message_id`
- Joins via `thread_id`, `session_path`, `event_index`

### `raw_token_usage`
Parsed token usage rows from `token_count` events.
- Key: `token_event_id`
- Joins via `thread_id`, `session_path`, `event_index`
- `model` is not populated here; use `raw_threads.model` or `normalized_threads.model` for model attribution

### `raw_logs`
Runtime/log side-channel rows from `logs_1.sqlite`.
- Key: `(source_path, row_id)`
- Joins via `thread_id`

### `normalized_threads`
Thread metadata after normalization.
- Key: `thread_id`

### `normalized_sessions`
Session metadata after normalization.
- Key: `session_path`

### `normalized_messages`
Message rows after normalization.
- Key: `message_id`

### `normalized_usage_events`
Token usage rows after normalization.
- Key: `usage_event_id`

### `normalized_logs`
Log rows after normalization.
- Key: `(source_path, row_id)`

### `normalized_projects`
Project-level aggregate after normalization.
- Key: `project_cwd`

### `derived_goals`
One row per thread/goal.
- Key: `thread_id`
- Fields include `attempt_count`, `retry_count`, `message_count`, `usage_event_count`, `timeline_event_count`, `model`

### `derived_attempts`
One row per session attempt inside a thread.
- Key: `attempt_id`
- Joins via `thread_id`, `session_path`

### `derived_timeline_events`
Compressed timeline view of the full thread.
- Key: `timeline_event_id`

### `derived_message_facts`
Message-level OLAP fact table with token usage attributed to the nearest assistant turn, the resolved model, and a derived message date.
- Key: `message_id`
- Joins via `thread_id`, `session_path`, `event_index`

### `derived_retry_chains`
Retry-chain summary per thread.
- Key: `thread_id`
- Fields: `retry_count`, `has_retry_pressure`, `attempt_count`
- `retry_count`: number of user messages after the first one — each is a clarifying or corrective input the user had to add before the task was done
- `has_retry_pressure`: 1 if `retry_count > 0`, else 0
- `attempt_count`: `retry_count + 1` — total user turns in the thread

### `derived_session_usage`
Session-level usage aggregates.
- Key: `session_usage_id`
- Joins via `thread_id`, `session_path`, `attempt_index`

### `derived_projects`
Project-level aggregate after derivation.
- Key: `project_cwd`

`events.sqlite.events` is the audit log of ai-agents-metrics CLI operations and goal mutations — written by `observability.py`. It is not part of the transcript/history warehouse and is not produced by this pipeline.
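The `derived_retry_chains` aggregation can be reproduced against a toy warehouse. The schema below is stripped down to the two columns the aggregation needs; it is an illustration, not the project's real DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE normalized_messages (thread_id TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO normalized_messages VALUES (?, ?)",
    [("t1", "user"), ("t1", "assistant"), ("t1", "user"),   # one retry
     ("t2", "user"), ("t2", "assistant")],                  # no retries
)
rows = conn.execute("""
    SELECT thread_id,
           SUM(role = 'user') - 1         AS retry_count,
           (SUM(role = 'user') - 1) > 0   AS has_retry_pressure,
           SUM(role = 'user')             AS attempt_count
    FROM normalized_messages
    GROUP BY thread_id
    ORDER BY thread_id
""").fetchall()
print(rows)  # [('t1', 1, 1, 2), ('t2', 0, 0, 1)]
```

SQLite evaluates `role = 'user'` to 0/1, so `SUM` over it counts user turns directly, matching the retry pressure definition at the top of this document.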
| Identifier | Meaning |
|---|---|
| `thread_id` | Top-level conversation/thread identity |
| `session_path` | Source rollout file for one session |
| `turn_id` | Per-turn identifier from `task_started` and `turn_context` events |
| `event_index` | Raw JSONL line order within a session |
| `message_id` | Deterministic id for a parsed message row |
| `usage_event_id` | Deterministic id for a token usage event |
| `timeline_event_id` | Deterministic id for derived timeline rows |
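One common way to build deterministic ids like these is to hash the natural key. This is an illustration of the idea only, not the project's actual id scheme:

```python
import hashlib

def deterministic_id(*parts) -> str:
    """Stable id from a natural key: same inputs always hash to the same id."""
    raw = "\x1f".join(str(p) for p in parts)  # unit separator avoids join collisions
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

# e.g. a message id keyed the way raw_messages joins:
# (thread_id, session_path, event_index, message_index)
a = deterministic_id("thread-1", "sessions/2024/rollout-1.jsonl", 42, 0)
b = deterministic_id("thread-1", "sessions/2024/rollout-1.jsonl", 42, 0)
assert a == b  # re-ingesting the same file yields the same ids
```

Determinism is what makes re-ingest idempotent: re-reading a source file produces identical ids, so upserts replace rows instead of duplicating them.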
For Codex or Claude Code sessions (full transcript + cost):

1. Run `history-ingest` (reads `~/.codex` + `~/.claude`, default), `history-ingest --source codex` (reads `~/.codex`), or `history-ingest --source claude` (reads `~/.claude/projects/`).
2. Query `raw_messages` or `normalized_messages` for transcript text.
3. Use `thread_id`, `session_path`, and `turn_id` to group related context.
4. Query `raw_token_usage` or `normalized_usage_events` for cost or token questions.
5. Use `derived_message_facts` for message-level OLAP analysis or token spend by date.
6. Use `derived_goals` and `derived_projects` for project-level comparison.

For cost-only backfill (Claude Code, lightweight):

1. Run `sync-usage` to backfill cost and token totals from `~/.claude` telemetry into the NDJSON ledger.
2. Use `history-ingest` (or `--source claude`) above for full conversation history.

Search transcript text:
```sql
SELECT thread_id, session_path, role, text
FROM raw_messages
WHERE text LIKE '%keyword%'
ORDER BY thread_id, session_path, event_index, message_index;
```
Find usage for a turn or session:
```sql
SELECT thread_id, session_path, input_tokens, cached_input_tokens, output_tokens, total_tokens
FROM raw_token_usage
WHERE thread_id = ? OR session_path = ?
ORDER BY event_index;
```
Inspect message-level token facts:
```sql
SELECT message_date, model, role, text, total_tokens
FROM derived_message_facts
WHERE thread_id = ?
ORDER BY message_timestamp, event_index, message_index;
```
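The "token usage attributed to the nearest assistant turn" behaviour behind `derived_message_facts` can be sketched as a single ordered pass: attach each usage event to the most recent assistant message seen so far. This is a simplification for illustration, not the actual derive implementation:

```python
def attribute_usage(events):
    """events: ordered dicts with kind 'message' or 'usage'.

    Returns total_tokens per assistant-message position, attributing each
    usage event to the nearest preceding assistant turn.
    """
    totals = {}
    current = None  # position of the last assistant message seen
    for i, ev in enumerate(events):
        if ev["kind"] == "message" and ev["role"] == "assistant":
            current = i
        elif ev["kind"] == "usage" and current is not None:
            totals[current] = totals.get(current, 0) + ev["total_tokens"]
    return totals

stream = [
    {"kind": "message", "role": "user"},
    {"kind": "message", "role": "assistant"},
    {"kind": "usage", "total_tokens": 120},
    {"kind": "message", "role": "user"},
    {"kind": "message", "role": "assistant"},
    {"kind": "usage", "total_tokens": 80},
]
print(attribute_usage(stream))  # {1: 120, 4: 80}
```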
Inspect derived project slice:
```sql
SELECT project_cwd, thread_count, attempt_count, message_count, usage_event_count, total_tokens
FROM derived_projects
ORDER BY thread_count DESC, attempt_count DESC, project_cwd ASC;
```
To add a new data source (e.g. a different agent or log format):
- Produce the same derived mart shapes (`derived_goals`, `derived_attempts`, etc.) so downstream comparison and analysis remain unchanged.
- The compare stage (`history/compare.py`) works against the NDJSON ledger and is source-agnostic — it does not need to change for new sources.

The current raw table names (`raw_threads`, `raw_sessions`, etc.) are Codex-specific. New adapters should introduce their own raw table namespace rather than reusing these names.
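A new adapter could be shaped around a small reader interface like the one below. All names here are hypothetical; the real ingest package may organise this differently:

```python
from dataclasses import dataclass
from typing import Iterable, Protocol

@dataclass
class RawMessage:
    thread_id: str
    role: str
    text: str

class SourceReader(Protocol):
    """What an ingest adapter must provide to feed the shared pipeline."""
    def read_messages(self) -> Iterable[RawMessage]: ...

class DummyAgentReader:
    """Example adapter for a hypothetical agent with its own raw namespace."""
    def __init__(self, records):
        self.records = records

    def read_messages(self):
        for r in self.records:
            yield RawMessage(r["thread"], r["role"], r["text"])

reader = DummyAgentReader([{"thread": "t1", "role": "user", "text": "hi"}])
msgs = list(reader.read_messages())
```

Once a reader emits rows in this shared shape, the normalize, classify, and derive stages can run unchanged, which is the point of keeping the product model source-agnostic.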
`raw_messages` is the first place to look for transcript text.