ai-agents-metrics

Data Schema

What this document is: The full field reference for the in-memory data model — GoalRecord, AttemptEntryRecord, and the summary block.

When to read this:

Related docs:


Summary

The primary store is metrics/events.ndjson — an append-only NDJSON event log. The in-memory state is reconstructed by replaying events at read time. The three top-level keys are goals, entries, and summary.


Core concepts


Source of truth


Top-level structure

{
  "summary": { ... },
  "goals": [ ... ],
  "entries": [ ... ]
}
Key Type Description
summary object Aggregated statistics. Computed in-memory on every load_metrics call; never stored in events.ndjson.
goals array List of GoalRecord objects in chronological order.
entries array List of AttemptEntryRecord objects — one per attempt within a goal.

The tasks key is a legacy alias for goals. It is normalised to goals in-memory during event replay and is never stored in events.ndjson.


GoalRecord

One task (goal). Corresponds to a Linear issue or a retrospective.

{
  "goal_id": "2026-04-06-001",
  "title": "Add public boundary verification",
  "goal_type": "product",
  "supersedes_goal_id": null,
  "status": "success",
  "attempts": 2,
  "started_at": "2026-04-06T10:00:00+00:00",
  "finished_at": "2026-04-06T11:30:00+00:00",
  "cost_usd": 1.234567,
  "input_tokens": 12000,
  "cached_input_tokens": 340000,
  "output_tokens": 4500,
  "tokens_total": 980000,
  "failure_reason": null,
  "notes": "Implemented and tested. Pre-push hook integration verified.",
  "result_fit": "exact_fit",
  "agent_name": null,
  "model": "gpt-5.4-mini"
}

GoalRecord fields

Field Type Required Description
goal_id string Unique ID in YYYY-MM-DD-NNN format (three digits, zero-padded)
title string Short task description
goal_type string One of: product / retro / meta
supersedes_goal_id string|null   ID of the goal this one replaces (used in retry chains)
status string One of: in_progress / success / fail
attempts int Number of attempts. 0 if never closed. Closed goals require at least 1.
started_at string|null   ISO 8601 with timezone. Example: 2026-04-06T10:00:00+00:00. In-memory Python type: datetime \| None (parsed by serde.py).
finished_at string|null   ISO 8601 with timezone. Must be null when status=in_progress. In-memory Python type: datetime \| None.
cost_usd float|null   Cost in USD. null means no data available.
input_tokens int|null   Non-cached input tokens. null means no data.
cached_input_tokens int|null   Tokens served from cache. null means no data.
output_tokens int|null   Output tokens. null means no data.
tokens_total int|null   Total tokens (may include reasoning tokens). null means no data.
failure_reason string|null   Required when status=fail. See allowed values below.
notes string|null   Free text. What was done, relevant context.
result_fit string|null   Only for goal_type=product. One of: exact_fit / partial_fit / miss
agent_name string|null   Name of the agent that executed the task.
model string|null   Model name. Examples: gpt-5.4-mini, claude-sonnet-4-6

Allowed values

goal_type: product, retro, meta

status: in_progress, success, fail

failure_reason (required when status=fail):

result_fit (only for goal_type=product):


AttemptEntryRecord

One attempt within a goal. A goal with attempts=2 has two records in entries.

{
  "entry_id": "2026-04-06-001-attempt-001",
  "goal_id": "2026-04-06-001",
  "entry_type": "product",
  "inferred": false,
  "status": "fail",
  "started_at": "2026-04-06T10:00:00+00:00",
  "finished_at": "2026-04-06T10:45:00+00:00",
  "cost_usd": 0.456789,
  "input_tokens": 5000,
  "cached_input_tokens": 120000,
  "output_tokens": 1800,
  "tokens_total": 400000,
  "failure_reason": "validation_failed",
  "notes": null,
  "agent_name": null,
  "model": "gpt-5.4-mini"
}

AttemptEntryRecord fields

Field Type Required Description
entry_id string Format: {goal_id}-attempt-{NNN}
goal_id string Reference to the parent GoalRecord.goal_id
entry_type string Attempt type. Typically matches the parent goal’s goal_type.
inferred bool true if the record was created automatically during history reconstruction
status string One of: in_progress / success / fail
started_at string|null   ISO 8601 with timezone
finished_at string|null   ISO 8601 with timezone. null when status=in_progress.
cost_usd float|null   Cost of this attempt
input_tokens int|null    
cached_input_tokens int|null    
output_tokens int|null    
tokens_total int|null    
failure_reason string|null   Required when status=fail and inferred=false
notes string|null    
agent_name string|null    
model string|null    

summary (aggregate)

Recomputed automatically on every load_metrics call. Do not edit manually.

Top-level summary fields

Field Description
closed_tasks Number of goals with status=success or fail
successes / fails Breakdown by outcome
total_attempts Sum of attempts across all goals
total_cost_usd Total cost
total_input_tokens / total_cached_input_tokens / total_output_tokens / total_tokens Summed token counts
success_rate successes / closed_tasks. null if no closed goals.
attempts_per_closed_task total_attempts / closed_tasks. null if no closed goals.
known_cost_successes Number of successful goals that have cost data
known_cost_per_success_usd Average cost per success based on known data
complete_cost_per_covered_success_usd Same, but only goals with complete data
cost_per_success_usd Average cost across all successes. null if any data is missing.
model_summary_goals Number of goals with model data
model_complete_goals Number of goals where all attempts used the same model
mixed_model_goals Number of goals where attempts used different models

Nested summary blocks

Key Description
by_goal_type Same metrics broken down by meta, product, retro
by_task_type Legacy alias for by_goal_type
by_model Same metrics grouped by model name
entries Aggregate over AttemptEntryRecord (not goals): closed_entries, success_rate, failure_reasons, tokens

Formats and constraints

ID format:

Timestamps: ISO 8601, always with timezone (+00:00 or Z). No microseconds.

Tokens: tokens_total may exceed input + cached + output — the difference is reasoning/tool tokens not broken down separately. When all four fields are present: tokens_total >= input + cached + output.

Cost: rounded to 6 decimal places via round_usd.

For the full set of validation rules that apply to these fields, see data-invariants.md.


Warehouse: derived_projects token coverage columns

The derived_projects table (Layer 4 aggregate in the history warehouse) holds per-project rollups of session token counts. Because some sessions have no usage events at all, raw token sums must be accompanied by coverage denominators so consumers can distinguish “zero tokens” from “no data”.

Column Type Description
input_tokens INTEGER | NULL Sum of input tokens across sessions that had usage data. NULL if no sessions had usage data.
output_tokens INTEGER | NULL Sum of output tokens across covered sessions. NULL if uncovered.
total_tokens INTEGER | NULL Sum of total tokens across covered sessions. NULL if uncovered.
input_tokens_covered_sessions INTEGER Count of sessions that contributed a non-NULL input_tokens sum. 0 if no session had usage data.
output_tokens_covered_sessions INTEGER Count of sessions that contributed a non-NULL output_tokens sum.
total_tokens_covered_sessions INTEGER Count of sessions that contributed a non-NULL total_tokens sum.

Usage pattern: To compute a meaningful per-session token average, use total_tokens / total_tokens_covered_sessions rather than total_tokens / attempt_count. Sessions without usage data are excluded from both the numerator and denominator.

raw_json: All six fields above are included in derived_projects.raw_json for downstream consumers that do not query the warehouse directly.