Schema Overview
opentraces uses a training-first JSONL schema where each line is one complete agent trace. The schema is a superset of ATIF v1.6, informed by ADP and field patterns from existing HF datasets.
Design Principles
- Training / SFT - Clean message sequences with role labels, tool use as `tool_call`/`tool_result` pairs, and outcome signals.
- RL / RLHF - Trajectory-level reward signals, step-level annotations, decision point identification.
- Telemetry - Token counts, latency, model identifiers, cache hit rates, cost estimates.
- Cross-agent - Represents traces from Claude Code, Cursor, Cline, Codex, and future agents without agent-specific fields.
Top-Level Structure
```json
{
  "schema_version": "0.3.0",
  "trace_id": "uuid",
  "session_id": "uuid",
  "content_hash": "<sha256-hex>",
  "timestamp_start": "ISO8601",
  "timestamp_end": "ISO8601",
  "execution_context": "devtime",
  "task": { },
  "agent": { },
  "environment": { },
  "system_prompts": { },
  "tool_definitions": [ ],
  "steps": [ ],
  "outcome": { },
  "dependencies": [ ],
  "metrics": { },
  "security": { },
  "attribution": { },
  "lifecycle": "provisional",
  "git_links": [ ],
  "generation_index": 0,
  "metadata": { }
}
```
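Because each line is one complete trace, a file can be read with nothing more than the standard library. The sketch below is a minimal reader that relies only on the top-level keys shown above; the helper name `iter_traces` and the sample values are hypothetical, and a real loader would more likely validate records through the schema package.

```python
import json
from typing import Iterable, Iterator


def iter_traces(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one trace dict per non-empty JSONL line (hypothetical helper)."""
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        trace = json.loads(raw)
        # Each trace carries its own schema_version, so mixed-version files are detectable.
        if "schema_version" not in trace:
            raise ValueError(f"line {lineno}: missing schema_version")
        yield trace


# Hypothetical two-trace file contents; values are placeholders, not real data.
sample = [
    '{"schema_version": "0.3.0", "trace_id": "t1", "session_id": "s1", "steps": [{}, {}]}',
    '{"schema_version": "0.3.0", "trace_id": "t2", "session_id": "s1", "steps": [{}]}',
]

for trace in iter_traces(sample):
    print(trace["trace_id"], trace["session_id"], len(trace["steps"]))
```

With a real file, pass the open file object (`with open(path, encoding="utf-8") as f: ...`) instead of the list, since iterating a text file yields its lines.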
Key Design Decisions
| Decision | Rationale |
|---|---|
| `steps` not `turns` | Each step is one LLM API call, not a conversational turn. Aligns with ATIF's TAO loop. |
| `role: "agent"` not `"assistant"` | Follows the ATIF role convention (`system`, `user`, `agent`). |
| Tool calls separated from observations | Preserves the call/result separation that training pipelines depend on. |
| System prompt dedup | Prompts are stored once in a hash-keyed lookup table instead of being repeated per step; copying a 20K-token prompt into every step would be wasteful. |
| `parent_step` per step | Gives a precise parent-child tree for sub-agents instead of a flat trace-level array. |
| `content_hash` | Two scopes, two algorithms, by design. The top-level `TraceRecord.content_hash` is the SHA-256 of the serialized record, providing cryptographic collision resistance for cross-contributor dedup at upload time. `AttributionRange.content_hash` is `murmur3:<32-hex>`, used for fast cross-tool matching of specific line ranges per Agent Trace v0.1.0. The `murmur3` prefix (added in 0.3.0) replaces the earlier truncated-MD5 form and applies only to attribution-range hashes. |
| `reasoning_content` | Explicit chain-of-thought field. Improved SWE-bench results by ~3 points (Cognition data). |
| `outcome.committed` | Whether the trace's changes were committed: a cheap, deterministic quality signal. |
| `attribution` | Embedded Agent Trace block. Bridges the trajectory (process) with code attribution (output). |
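The two SHA-256-based ideas in the table (system prompt dedup and the top-level `content_hash`) can be illustrated with the standard library. This is a minimal sketch, not the opentraces implementation. Several details are assumptions: the canonical serialization (sorted keys, compact separators, `content_hash` removed before hashing), the use of SHA-256 as the system-prompt table key, and the helper names `dedup_system_prompt` and `record_content_hash`, which are hypothetical.

```python
import hashlib
import json


def dedup_system_prompt(table: dict[str, str], prompt: str) -> str:
    """Store a prompt once in a hash-keyed table and return its key.

    Assumption: keys are SHA-256 hex digests of the UTF-8 prompt text.
    """
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    table.setdefault(key, prompt)  # a repeated prompt reuses the existing entry
    return key


def record_content_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON serialization of a trace record.

    Assumptions: keys are sorted, separators are compact, and any existing
    content_hash field is dropped first so the hash never covers itself.
    """
    body = {k: v for k, v in record.items() if k != "content_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# A long system prompt sent on every step is stored only once.
system_prompts: dict[str, str] = {}
long_prompt = "You are a coding agent. " * 1000
keys = [dedup_system_prompt(system_prompts, long_prompt) for _ in range(3)]
print(len(system_prompts))  # 1

# Identical content hashes identically regardless of key order,
# which is what makes the hash usable for dedup at upload time.
a = {"trace_id": "t1", "steps": [], "schema_version": "0.3.0"}
b = {"schema_version": "0.3.0", "steps": [], "trace_id": "t1"}
a["content_hash"] = record_content_hash(a)
print(record_content_hash(b) == a["content_hash"])  # True
```

The attribution-range `murmur3:<32-hex>` form is not shown, since MurmurHash3 is not part of the Python standard library.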
Schema Package
The schema is a standalone Python package:
```bash
pip install opentraces-schema
```

```python
from opentraces_schema import TraceRecord, SCHEMA_VERSION

record = TraceRecord(
    trace_id="abc-123",
    session_id="sess-456",
    agent={"name": "claude-code", "version": "1.0.32"},
)
# Serialize the record as one line of a JSONL file
line = record.to_jsonl_line()
```
See TraceRecord, Steps, and Outcome & Attribution for field-level detail.