Schema Overview

opentraces uses a training-first JSONL schema where each line is one complete agent trace. The schema is a superset of ATIF v1.6, informed by ADP and field patterns from existing HF datasets.

Design Principles

Training / SFT - Clean message sequences with role labels, tool-use as tool_call/tool_result pairs, outcome signals.
RL / RLHF - Trajectory-level reward signals, step-level annotations, decision point identification.
Telemetry - Token counts, latency, model identifiers, cache hit rates, cost estimates.
Cross-agent - Represents traces from Claude Code, Cursor, Cline, Codex, and future agents without agent-specific fields.

Top-Level Structure

{
  "schema_version": "0.7.0",
  "trace_id": "uuid",
  "session_id": "uuid",
  "content_hash": "<sha256-hex>",
  "timestamp_start": "ISO8601",
  "timestamp_end": "ISO8601",
  "execution_context": "devtime",
  "task": { },
  "agent": { },
  "environment": { },
  "system_prompts": { },
  "tool_definitions": [ ],
  "steps": [ ],
  "outcome": { },
  "dependencies": [ ],
  "metrics": { },
  "security": { },
  "attribution": { },
  "lifecycle": "provisional",
  "patches": [ ],
  "git_links": [ ],
  "context_tree_summary": { },
  "generation_index": 0,
  "metadata": { }
}

Key Design Decisions

Decision	Rationale
`steps` not `turns`	Each step is an LLM API call, not a conversational turn. Aligns with ATIF's TAO loop.
`role: "agent"` not `"assistant"`	Follows ATIF convention (`system`, `user`, `agent`).
Tool calls separated from observations	Preserves call/result separation training pipelines depend on.
System prompt dedup	Hash-based lookup table. A 20K-token prompt repeated across steps would be wasteful.
`parent_step` per step	Precise parent-child tree for sub-agents, not a flat trace-level array.
`content_hash`	Two scopes, two algorithms by design. Top-level `TraceRecord.content_hash` is SHA-256 of the serialized record — cryptographic collision resistance for cross-contributor dedup at upload time. `AttributionRange.content_hash` is `murmur3:<32-hex>` — fast cross-tool matching of specific line ranges, per Agent Trace v0.1.0. The murmur3 prefix (added 0.3.0) replaces the prior md5-truncated form and only applies to attribution-range hashes.
`reasoning_content`	Explicit chain-of-thought field. Improved SWE-Bench by ~3 pts (Cognition data).
`outcome.committed`	Did the trace's changes get committed? Cheap, deterministic quality signal.
`patches[]`	Authoritative dev-time output set. One `Patch` per tool-produced change/hunk, joined to Trace Trails for full patch history.
`context_tree_summary`	Compact roll-up of Context Tree capture so consumers can tell whether "what the agent saw" evidence exists.
`attribution`	Embedded Agent Trace block. Bridges trajectory (process) with code attribution (output).

Schema Package

The schema is a standalone Python package:

pip install opentraces-schema

from opentraces_schema import TraceRecord, SCHEMA_VERSION

record = TraceRecord(
    trace_id="abc-123",
    session_id="sess-456",
    agent={"name": "claude-code", "version": "1.0.32"},
)
line = record.to_jsonl_line()

See TraceRecord, Steps, and Outcome & Attribution for field-level detail.

●HUMAN ○MACHINE