Priority: medium | Complexity: medium | Status: done
The ingest → normalize → derive pipeline hands data from stage to stage through SQLite. The schema at each boundary exists only in column names and SQL queries, not in Python types:
history_ingest.py → SQLite (raw) → history_normalize.py
history_normalize.py → SQLite (normalized) → history_derive.py
history_derive.py → GoalRecord / AttemptEntryRecord
If normalize renames a column, derive raises a KeyError, but only at runtime: mypy cannot catch it because the rows carry no static type. Likewise, a field added in one stage is not reflected in the next stage’s function signatures.
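A minimal reproduction of this failure mode, assuming a hypothetical normalized_messages table and a hypothetical derive_message_ids function (neither name comes from the actual modules). Normalize has renamed message_id to msg_id, derive still reads the old name, and nothing complains until the query runs:

```python
import sqlite3

# Hypothetical scenario: normalize now writes msg_id instead of message_id.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE normalized_messages "
    "(session_id TEXT, msg_id TEXT, role TEXT, timestamp TEXT)"
)
conn.execute("INSERT INTO normalized_messages VALUES ('s1', 'm1', 'user', NULL)")


def derive_message_ids(conn: sqlite3.Connection) -> list[str]:
    """Hypothetical derive step that still expects the old column name."""
    rows = conn.execute("SELECT * FROM normalized_messages")
    # dict(row) is dict[str, Any], so mypy accepts any key here.
    # (Indexing the sqlite3.Row directly would raise IndexError instead.)
    return [dict(row)["message_id"] for row in rows]


try:
    derive_message_ids(conn)
except KeyError as exc:
    print(f"KeyError at runtime: {exc}")
```

The type checker sees nothing wrong with this code; the mismatch is only discovered when real data flows through it.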
Add TypedDict (or dataclasses) for rows at stage boundaries:
# history_normalize.py
from typing import TypedDict

class NormalizedMessageRow(TypedDict):
    session_id: str
    message_id: str
    role: str
    timestamp: str | None
    ...

# history_derive.py accepts Iterable[NormalizedMessageRow]
SQLite remains the storage medium, but Python types serve as a documented contract. Schema mismatches become visible at type-check time rather than at runtime.
Acceptance criteria:
- history_normalize.py and history_derive.py use these types in function signatures
- No # type: ignore suppressions
- make verify passes