All notable changes to ai-agents-metrics will be recorded here.
gpt-5.4 → gpt-5 / gpt-5.4-mini → gpt-5-mini alias from resolve_pricing_model_alias. gpt-5.4* models now have their own entries in the bundled model_pricing.json with correct pricing ($2.5/$15 per MTok input/output, vs the old aliased $1.25/$10). User impact: if you have a custom workspace model_pricing.json that predates this release and lacks gpt-5.4* entries, Codex gpt-5.4 sessions will no longer resolve a price — cost_usd falls back to None instead of being computed against gpt-5 rates. Fix: copy the new entries from the bundled file, or delete your override and let the bundled pricing be usedclaude-opus-4-7 pricing entry in model_pricing.json ($5/$0.5/$6.25/$25 per MTok — same family as 4.6). Previously, opus-4-7 usage was silently dropped from render-html Chart 3 “Cost by Model” even though it counted toward the summary total — ~34% of the author’s cost was missing from the chartrender-html warehouse-state callout: when the warehouse file is absent, schema-outdated, or has zero rows matching the current project, the HTML now shows an amber banner above Session History with the actionable hint Run: ai-agents-metrics history-update. Four detected states (ok / missing_file / schema_outdated / empty_for_cwd) map to specific messagesrender-html --warehouse-path PATH flag exposed (was read via getattr fallback but never registered in the parser, effectively dead code); brings render-html in line with the 8 other warehouse-consuming subcommandsrender-html --cwd PATH override: replaces Path.cwd() for warehouse filtering. Enables queries against cross-machine warehouses (e.g. a Mac-imported snapshot queried on Linux, where absolute paths otherwise never match)main_attempt_count (H-040 classifier) instead of raw retry_count, so Claude subagent spawns are no longer counted as main-agent retries. On pre-fix data, retry_count > 0 flagged 40.9% of threads as “under retry pressure” while the H-040-corrected signal was 0% — the chart was reading subagent-usage growth as quality degradation. Subtitle, legend, and the green “no retries” plaque updated to say “main-agent session” / “subagent spawns excluded” so the semantics are explicitshow header, markdown report header, bootstrap scaffold, and the packaged-policy section of AGENTS.md. The codex-metrics CLI entrypoint alias stays in pyproject.toml for backward compatibility; references to the OpenAI Codex agent (--source=codex, ~/.codex) are unchangedrender-html Chart 3 “Cost by Model” Y-axis no longer compresses bars to 24× below their true values on skewed cost distributions. The smartMax clip cap is now floored at rawMax/5, so outliers are at most 5× above the visible axis and non-outlier bars remain distinguishablerender-html Chart 4 outlier labels now stack across up to 3 vertical rows when adjacent outliers would otherwise overlap horizontally. Top margin reserves the vertical space; diamonds stay anchored at the clip linerender-html Session History badge is source-aware: reads WAREHOUSE only when both Chart 2 and Chart 3 are warehouse-sourced, LEDGER otherwise. Previously the badge was hard-coded to WAREHOUSE even when the underlying data had silently fallen back to ledgerrender-html Chart 2 no longer shows “No data available” when the retry signal is genuinely zero — the empty-state guard now distinguishes “no buckets / no data” from “all values are a real zero”. A real all-zero retry pattern renders with the green “no retries” plaque instead of being hiddenrender-html no longer displays a false-positive “No retries” green plaque when Chart 2 data fell back to the ledger source; the plaque only renders on warehouse-sourced signals where it reflects a real main-agent retry measurementrender-html Chart 5: “Practice Events by Name” — top 15 practice names from derived_practice_events stacked by kind (Agent / Skill / Other). Hidden automatically on warehouses without practice events. Summary line shows long-tail count when more than 15 distinct names existhistory-update on an empty machine (no ~/.claude, no ~/.codex) now prints actionable guidance: the default paths it checked, three concrete next steps (--claude-root, --codex-state-path, --source), instead of the terse “nothing to normalize” stub that left fresh users unsure whether the tool was broken or misconfiguredai-agents-metrics --help now shows only the 9 primary-flow commands in the main listing. Pipeline stages, audit/debug, manual-tracking adjuncts, and maintenance commands (17 total) are listed by name in four grouped sections in the epilog — still fully callable via <cmd> --helpusage: header shortened from a 26-name {...} blob to <command> via an add_subparsers metavarhistory-classify command: new pipeline stage between history-normalize and history-derive that runs a deterministic classifier over raw session data. Writes structural session-kind labels (main vs subagent) and practice-event rows to the warehouse. Automatically invoked by history-update (H-040)derived_practice_events table: one row per Agent tool_use / Skill invocation / slash-command detected in session transcripts, with practice_name, practice_family, and source coordinates. Enables per-skill analysis of token compression and practice usage patterns (H-040 Phase 2)role='user' pollution, size-confounded practice splits, within-thread compression, per-skill compression rankingdocs/warehouse-layering.md: rules for what each warehouse layer (raw_* / normalized_* / classified derived_* / aggregate derived_*) is allowed to contain — source of truth for layer-boundary decisionsshow history-signals block now reports structural retry pressure (main_attempt > 1 count) and subagent spawn count instead of the naive file-per-attempt ratio. The old ratio conflated user retries with internal delegation — see F-001render-html: Chart 3 now stacks by model (one color per model, deterministic palette, unknown pinned last in slate) instead of by input/cached/output tokens — answers “where is my money going?” directly (ARCH-017)render-html: summary strip gained a “Total Cost” card (sum across all closed goals, success + fail) between Successes and Avg Cost / Success (ARCH-017)derived_projects coverage invariant corrected to a one-way relationpython -m ai_agents_metrics --version now displays python3.X -m ai_agents_metrics <version> instead of __main__.py <version>model column propagated to derived_session_usage, derived_attempts, and derived_goals during derive; unlocks per-model cost analysis without joining back to normalized_usage_events (ARCH-016)history/ subpackage gains classify.py; derive.py, derive_schema.py, derive_insert.py updated to consume classified inputshistory-update and history-ingest now default to --source all: reads both ~/.codex and ~/.claude automatically — Claude Code users no longer need --source claude explicitly (CODEX-67)--source all is now visible in --help with per-choice descriptions — was previously an undocumented internal value (CODEX-67)history-update no longer errors when all sources are absent or skipped — exits cleanly with a summary (CODEX-67)show now prints an actionable hint when the warehouse does not exist yet: Run 'ai-agents-metrics history-update' to extract retry pressure and token cost from your agent history files.show now prints a Tip when showing all-project fallback: run history-update from your project directory to get per-project signals.attempt_count floor corrected: single-session threads no longer inflate retry counts (CODEX-67)history_derive.py split into a history/ subpackage (derive.py, derive_build.py, derive_insert.py, derive_schema.py) for maintainability (CODEX-68)history_audit.py, history_compare.py, history_ingest.py, history_normalize.py moved into the history/ subpackage — import paths unchangedshow --json warning now routes to stderr, preventing JSON parse failures in non-git directories (CODEX-62)completion zsh header corrected from codex-metrics to ai-agents-metrics — zsh completion now registers under the correct binary name (CODEX-63)history-compare, history-normalize, history-derive now include an actionable hint when the warehouse is missing: Run 'ai-agents-metrics history-update' first. (CODEX-64)show now falls back to all-project warehouse data when run outside a tracked project directory — Quick Start no longer silently shows zeroscodex_raw_history.sqlite to warehouse.db — matches the path documented in READMEpipx install is now the primary recommendation; pip install is documented for use inside a virtualenvhistory-update command: runs the full history pipeline (ingest → normalize → derive) in one step — the primary entry point for history-first users--source claude support in history-update for Claude Code history (~/.claude)start-task / finish-task) is now an explicit opt-in enhancement layerai-agents-metrics-policy.md) updated with Two-Tier Model section to reflect the primary/opt-in splitdocs/ai-agents-metrics-policy.md at build time — single source of truthrender-html command: self-contained interactive HTML report with four Canvas-rendered trend charts (Successful Tasks, Retry Pressure, Token Cost Breakdown, Cost per Success)First public release.
start-task, continue-task, finish-task, show, bootstrap commandsmake verify-public-boundary)llms.txt for AI engine discoverability