# opentraces

> Open-source CLI for repo-local agent trace capture, review, and upload to Hugging Face Hub. React inbox, terminal inbox, and structured JSONL schema.

## Links

- Documentation: https://opentraces.ai/docs
- GitHub: https://github.com/jayfarei/opentraces
- Explorer: https://opentraces.ai/explorer
- Schema: https://opentraces.ai/schema

## Full Documentation

---

# opentraces

Open schema + CLI for agent trace capture, review, and upload to Hugging Face Hub.

Every coding session with an AI agent produces action trajectories, tool-use sequences, and reasoning chains. Together they form the most valuable dataset nobody is collecting in the open. opentraces captures them automatically, scans for secrets, and publishes structured JSONL datasets to Hugging Face Hub. Private by default. You control what leaves your machine.

## What you get

**As a developer.** Share traces, get analytics back. Cost per session, cache hit rates, tool usage patterns, success rates. Your Spotify Wrapped for coding agents. Traces are searchable by dependency, so framework maintainers and researchers can find sessions relevant to their stack.

**As an ML team.** Real workflows, not synthetic benchmarks. Outcome signals for RL. Tool sequences for SFT. Sub-agent hierarchy for orchestration research. One dataset, many consumers, no vendor lock-in.

**As a team lead.** Commit traces automatically to a shared private dataset. Run downstream analytics, fine-tuning, or evaluation jobs on top of it. All on Hugging Face, all standard tooling.

## Schema designed for downstream use

The [schema](/docs/schema/overview) is built for the people who consume traces, not just the tools that produce them. It is a superset of ATIF, informed by ADP and Agent Trace, and works across Claude Code, Cursor, Cline, Codex, and future agents without agent-specific fields.

- **Training / SFT** — Clean message sequences with role labels, tool-use as tool_call/tool_result pairs, outcome signals.
- **RL / RLHF** — Trajectory-level reward signals, step-level annotations, decision point identification via sub-agent hierarchy. - **Telemetry** — Token counts, latency, model identifiers, cache hit rates, cost estimates per step. - **Code attribution** *(experimental)* — File and line-level attribution linking each edit back to the agent step that produced it. Confidence varies by session complexity. ## Docs | Section | What's inside | |---------|---------------| | **[Installation](/docs/getting-started/installation)** | Install, verify, upgrade | | **[Authentication](/docs/getting-started/authentication)** | Hugging Face login and credentials | | **[Quick Start](/docs/getting-started/quickstart)** | Init, inbox, commit, push | | **[Commands](/docs/cli/commands)** | Public and hidden CLI surface | | **[Security Modes](/docs/security/tiers)** | Review policy, security pipeline | | **[Schema](/docs/schema/overview)** | TraceRecord, steps, outcome, attribution | | **[Workflow](/docs/workflow/parsing)** | Parse, review, assess, push, consume | | **[CI/CD](/docs/integration/ci-cd)** | Headless automation and token auth | | **[Contributing](/docs/contributing/development)** | Local dev and schema changes | --- # Installation ## pipx ```bash pipx install opentraces ``` ## brew ```bash brew install JayFarei/opentraces/opentraces ``` ## skills.sh ```bash npx skills add jayfarei/opentraces ``` Installs the opentraces skill via [skills.sh](https://skills.sh) so your coding agent can drive the full workflow (init, review, push) conversationally. Works with Claude Code, Cursor, Codex, and any agent that supports skills. `opentraces init` also auto-installs the skill when you initialize a project. ## Copy to your agent Paste this into your coding agent (Claude Code, Cursor, Codex, etc.): ``` {{AGENT_PROMPT}} ``` The agent installs the CLI, authenticates, and initializes. `init` handles the skill installation automatically. 
After that the agent uses the skill file for everything else. ## From Source ```bash git clone https://github.com/jayfarei/opentraces cd opentraces python3 -m venv .venv source .venv/bin/activate pip install -e packages/opentraces-schema pip install -e ".[dev]" ``` ## Verify Installation ```bash opentraces --version ``` ## System Requirements | Platform | Status | |----------|--------| | macOS (ARM64, x86_64) | Supported | | Linux (x86_64, ARM64) | Supported | | Windows (WSL) | Supported via Linux binary | Python 3.10 or later is required. ## Upgrading The preferred in-project upgrade path is: ```bash opentraces upgrade ``` Auto-detects whether you installed via pipx, brew, or pip and upgrades accordingly. Also refreshes the skill file and session hook in the current project. If you are outside a project context, use the direct package manager command instead: ```bash pip install --upgrade opentraces ``` ## Uninstalling ```bash pip uninstall opentraces ``` To also remove local data and credentials: ```bash rm -rf ~/.opentraces ``` --- # Authentication opentraces publishes to HuggingFace Hub. You need an HF account. ## Token Login (Recommended) ```bash opentraces login --token ``` Prompts for a HuggingFace access token. Generate one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) with **write** scope. This is required for creating datasets and pushing traces. ## Browser Login ```bash opentraces login ``` Opens a browser-based OAuth device code flow, similar to `gh auth login`. You'll see a short code to enter at huggingface.co. > **Note:** OAuth tokens can write to existing dataset repos but cannot create new ones. If your dataset repo doesn't exist yet, use `opentraces login --token` with a write-access personal access token to create it. ## Environment Variable ```bash export HF_TOKEN=hf_... ``` The CLI checks for `HF_TOKEN` automatically. Useful in CI pipelines where interactive login isn't available. ## Auth Precedence 1. 
`HF_TOKEN` environment variable 2. Stored credentials from `opentraces login` ## Verify ```bash opentraces whoami ``` Shows your authenticated HuggingFace username. ## Logout ```bash opentraces logout ``` Clears stored credentials from `~/.opentraces/credentials.json`. --- # Quick Start From local inbox to published dataset. ## 1. Install ```bash pipx install opentraces ``` ## 2. Authenticate ```bash opentraces login --token ``` Paste a HuggingFace access token with **write** scope from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). Use `HF_TOKEN` instead if you are running headless. ## 3. Initialize the Project ```bash opentraces init --review-policy review --start-fresh ``` This creates `.opentraces/config.json`, `.opentraces/staging/`, the agent session hook, and installs the opentraces skill into `.agents/skills/opentraces/`. If you omit the flags, `opentraces init` will prompt for the same choices interactively. If your agent already has session logs for this repo, pass `--import-existing` to pull that backlog into the inbox now. Use `--start-fresh` if you only want capture from your next connected session onward. ## 4. Open the Inbox ### Web inbox ```bash opentraces web ``` The browser inbox shows a timeline of each session's steps, tool calls, and reasoning. Switch to the review view to see context items grouped by source. ![Web inbox - timeline view](/docs/assets/web-timeline.png) ![Web inbox - review view](/docs/assets/web-review.png) ### Terminal inbox ```bash opentraces tui ``` The TUI shows sessions, summary, and detail in a three-panel layout. Use keyboard shortcuts to navigate, commit, reject, or discard traces. ![Terminal inbox](/docs/assets/tui.png) Use `session list`, `session commit`, `session reject`, and `session redact` if you prefer direct CLI control. ## 5. Commit and Push ```bash opentraces commit --all opentraces push ``` `commit` moves inbox traces to the committed stage. 
`push` uploads committed traces to `{username}/opentraces` on Hugging Face Hub as sharded JSONL and updates the dataset card. ## What Happens Next Your traces are available as a Hugging Face dataset: ```python from datasets import load_dataset ds = load_dataset("your-name/opentraces") ``` ## Next Steps - [Security Modes](/docs/security/tiers) - Review policy and security pipeline - [CLI Reference](/docs/cli/commands) - Full command reference - [Schema Overview](/docs/schema/overview) - What is stored in a trace record --- # Commands Complete reference for the current opentraces CLI surface. ## Public Commands | Command | Description | |---------|-------------| | `opentraces login` | Authenticate with Hugging Face Hub | | `opentraces logout` | Clear stored credentials | | `opentraces whoami` | Print the active Hugging Face identity | | `opentraces auth` | Authentication subcommands (`login`, `logout`, `status`) | | `opentraces init` | Initialize the current project inbox | | `opentraces remove` | Remove the local inbox from the current project | | `opentraces status` | Show inbox status and counts | | `opentraces remote` | Manage the configured dataset remote | | `opentraces session` | Inspect and edit staged traces | | `opentraces commit` | Commit inbox traces for upload | | `opentraces push` | Upload committed traces to Hugging Face Hub | | `opentraces assess` | Run quality assessment on committed traces or a remote dataset | | `opentraces web` | Open the browser inbox UI | | `opentraces tui` | Open the terminal inbox UI | | `opentraces stats` | Show aggregate inbox statistics | | `opentraces context` | Return machine-readable project context | | `opentraces config show` | Display current config | | `opentraces config set` | Update config values | | `opentraces import-hf` | Import traces from a HuggingFace dataset | | `opentraces hooks install` | Install Claude Code session capture hooks | | `opentraces log` | List uploaded traces grouped by date | | `opentraces 
upgrade` | Upgrade CLI and refresh project skill file | ## Authentication ### `opentraces login` Authenticate with Hugging Face Hub. ```bash opentraces login --token opentraces login ``` | Flag | Default | Description | |------|---------|-------------| | `--token` | off | Paste a personal access token (required for pushing) | > **Recommended:** Use `opentraces login --token` with a write-access PAT from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). The browser OAuth flow (`opentraces login` without `--token`) authenticates your identity but cannot create or push to dataset repos. ### `opentraces logout` Clear stored Hugging Face credentials. ### `opentraces auth` Authentication subcommands: ```bash opentraces auth status opentraces auth login opentraces auth logout ``` ## Project Setup ### `opentraces init` Initialize opentraces in the current project directory. Creates `.opentraces/config.json`, `.opentraces/staging/`, and the Claude Code hook. If Claude Code already has session files for this repo, the interactive flow can import that backlog into the inbox immediately. ```bash opentraces init opentraces init --review-policy review --start-fresh opentraces init --review-policy auto --import-existing opentraces init --review-policy review --remote your-name/opentraces --start-fresh ``` | Flag | Default | Description | |------|---------|-------------| | `--agent` | detected interactively | Agent runtime to connect | | `--review-policy` | prompt | `review` or `auto` | | `--import-existing / --start-fresh` | prompt when backlog exists | Whether to import existing Claude Code sessions for this repo during init | | `--remote` | unset | HF dataset repo (`owner/name`) | | `--no-hook` | off | Skip Claude Code hook installation | | `--private / --public` | private | Dataset visibility when creating the remote repo | `--mode` is a legacy alias kept for compatibility. 
`init` also installs the opentraces skill into `.agents/skills/opentraces/` and symlinks it into the selected agent's skill directory (e.g., `.claude/commands/opentraces/` for Claude Code). ### `opentraces remove` Remove the local `.opentraces/` inbox and Claude Code hook from the current project. ### `opentraces upgrade` Upgrade the CLI and refresh the skill file and session hook in the current project. ```bash opentraces upgrade # upgrade CLI + refresh skill and hook opentraces upgrade --skill-only # just refresh the skill file and hook ``` | Flag | Default | Description | |------|---------|-------------| | `--skill-only` | off | Skip CLI upgrade, only refresh the skill file and hook | Detects the install method (pipx, brew, pip, source) and runs the appropriate upgrade command. Then re-copies the latest skill file into `.agents/skills/opentraces/` and updates the session hook. ### `opentraces config show` Display the current user config with secrets masked. ### `opentraces config set` Update configuration values. ```bash opentraces config set --exclude /path/to/client-project opentraces config set --redact "INTERNAL_API_KEY" ``` | Flag | Description | |------|-------------| | `--exclude` | Append a project path to the exclusion list | | `--redact` | Append a literal custom redaction string | | `--pricing-file` | Override token pricing table | | `--classifier-sensitivity` | `low`, `medium`, or `high` | ## Inbox and Review ### `opentraces web` Open the browser inbox UI. This serves the React viewer from `web/viewer/` through the local Flask app. ```bash opentraces web opentraces web --port 8080 opentraces web --no-open ``` | Flag | Default | Description | |------|---------|-------------| | `--port` | `5050` | Local port | | `--no-open` | off | Do not auto-open the browser | ### `opentraces tui` Open the terminal inbox UI. ```bash opentraces tui opentraces tui --fullscreen ``` ### `opentraces session` Fine-grained review commands for staged traces. 
```bash opentraces session list opentraces session show opentraces session show --verbose opentraces session commit opentraces session reject opentraces session reset opentraces session redact --step 3 opentraces session discard --yes ``` `session list` accepts `--stage inbox|committed|pushed|rejected`, `--model`, `--agent`, and `--limit`. `session show` truncates step content to 500 chars in human output by default to protect context windows. Pass `--verbose` to see full content, or use `opentraces --json session show ` to get the complete record as JSON (never truncated). ## Upload ### `opentraces commit` Commit inbox traces into a commit group for upload. ```bash opentraces commit --all opentraces commit -m "Fix parser and update schema" ``` ### `opentraces push` Upload committed traces to Hugging Face Hub as sharded JSONL files. ```bash opentraces push opentraces push --private opentraces push --public opentraces push --publish opentraces push --gated opentraces push --assess opentraces push --repo user/custom-dataset ``` | Flag | Default | Description | |------|---------|-------------| | `--private` | off | Force private visibility | | `--public` | off | Force public visibility | | `--publish` | off | Publish an existing private dataset | | `--gated` | off | Enable gated access on the dataset | | `--assess` | off | Run quality assessment after upload and embed scores in dataset card | | `--repo` | `{username}/opentraces` | Target HF dataset repo | `--approved-only` is not part of the current CLI. The public path is `commit -> push`. ### `opentraces assess` Run quality assessment on committed traces or a full remote dataset. 
```bash opentraces assess opentraces assess --judge opentraces assess --judge --judge-model sonnet opentraces assess --limit 50 opentraces assess --all-staged opentraces assess --compare-remote opentraces assess --dataset user/my-traces ``` | Flag | Default | Description | |------|---------|-------------| | `--judge / --no-judge` | off | Enable LLM judge for qualitative scoring | | `--judge-model` | `haiku` | Model for LLM judge: `haiku`, `sonnet`, or `opus` | | `--limit` | `0` (all) | Maximum number of traces to assess | | `--compare-remote` | off | Fetch the remote dataset's `quality.json` and show score deltas | | `--all-staged` | off | Assess all staged traces instead of COMMITTED-only | | `--dataset TEXT` | unset | Assess a full remote HF dataset (e.g. `user/my-traces`). Downloads all shards, runs assessment, and updates `README.md` and `quality.json` on the dataset repo. Does not require hf-mount. | By default, `assess` targets only **committed** traces, matching the population that `push` would upload. Use `--all-staged` to include traces that are staged but not yet committed. `--dataset` is independent of the local inbox. It downloads shards from the specified HF dataset repo and updates that repo's dataset card and `quality.json` sidecar in place, without requiring a new push. ### opentraces import-hf Import traces from a HuggingFace dataset into your local inbox. ```bash opentraces import-hf DATASET_ID [OPTIONS] ``` | Flag | Description | |------|-------------| | `DATASET_ID` | HuggingFace dataset ID (e.g. `user/my-traces`) | | `--parser` | Parser to use: `hermes` or `generic` (default: `hermes`) | | `--subset` | Dataset subset/config name | | `--split` | Dataset split (default: `train`) | | `--limit` | Maximum number of traces to import | | `--auto` | Commit imported traces immediately, skip inbox | | `--dry-run` | Preview import without writing any files | Exit codes: `0` success, `1` partial failure (some traces rejected by quality gate). 
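In scripted imports, the exit codes above let a CI wrapper distinguish a clean run from a partially gated one. A minimal Python sketch of branching on them (the subprocess invocation is illustrative and assumes `opentraces` is on `PATH`; `classify_import` is a hypothetical helper, not part of the CLI):

```python
import subprocess

def classify_import(code: int) -> str:
    # Documented exit codes for `opentraces import-hf`:
    # 0 = success, 1 = partial failure (some traces rejected by the quality gate).
    if code == 0:
        return "ok"
    if code == 1:
        return "partial"  # worth inspecting, but not necessarily a hard CI failure
    return "error"

# In CI (commented out here; requires opentraces to be installed):
# proc = subprocess.run(["opentraces", "import-hf", "user/my-traces", "--dry-run"])
# print(classify_import(proc.returncode))
print(classify_import(0))  # -> ok
```

Treating exit code `1` as a warning rather than a failure keeps imports flowing while still surfacing rejected traces for review.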
### opentraces hooks install Install Claude Code session capture hooks into the current project. Hooks run automatically at session end (`on_stop`) and after context compaction (`on_compact`) to enrich traces with session metadata. ```bash opentraces hooks install ``` Run this once per project after `opentraces init`. ### `opentraces remote` Manage the configured dataset remote. ```bash opentraces remote opentraces remote set owner/dataset opentraces remote set owner/dataset --private opentraces remote set owner/dataset --public opentraces remote remove ``` ### `opentraces status` Show the current project inbox, counts, review policy, agents, and remote. ### `opentraces log` List uploaded traces grouped by date. Shows trace IDs, timestamps, models used, and step counts for traces that have been pushed to the remote. ```bash opentraces log ``` ### `opentraces stats` Show aggregate counts, token totals, estimated cost, model distribution, and stage counts for the current inbox. Useful for understanding your contribution volume and cost breakdown. ```bash opentraces stats ``` ### `opentraces context` The agent's "what should I do next?" command. Returns project config, auth status, counts per stage, and a `suggested_next` command. Start here when resuming work or when uncertain about state. ```bash opentraces context opentraces --json context ``` ## Machine-Readable Output Add `--json` to any command to suppress human-readable text and get structured JSON only: ```bash opentraces --json context opentraces --json session list --stage inbox opentraces --json push ``` JSON is emitted after the sentinel line `---OPENTRACES_JSON---`. When parsing programmatically, split on this sentinel and parse the text that follows. 
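The sentinel split takes only a few lines of Python; the sample payload below is hypothetical, kept to fields documented for JSON responses:

```python
import json

SENTINEL = "---OPENTRACES_JSON---"

def parse_opentraces_json(raw: str) -> dict:
    """Return the JSON payload that follows the sentinel line."""
    _, sep, tail = raw.partition(SENTINEL)
    if not sep:
        raise ValueError("no OPENTRACES JSON sentinel in output")
    return json.loads(tail)

raw = 'parsed 2 traces\n---OPENTRACES_JSON---\n{"status": "ok", "next_command": "opentraces push"}'
print(parse_opentraces_json(raw)["next_command"])  # -> opentraces push
```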
Every JSON response includes: | Field | Description | |-------|-------------| | `status` | `"ok"`, `"error"`, or `"needs_action"` | | `next_steps` | Array of suggested next actions (human-readable) | | `next_command` | The single most likely next command to run | ### CI / headless / agent mode When `stdout` is not a TTY, bare `opentraces` prints help text instead of launching the TUI. You can also force this explicitly: ```bash OPENTRACES_NO_TUI=1 opentraces # always prints help, never opens TUI ``` `HF_TOKEN` is also respected as the highest-priority credential source, so CI pipelines can authenticate without running `opentraces login`. ## Hidden and Internal Commands These commands exist for automation, compatibility, or diagnostics and are hidden from normal help output: | Command | Purpose | |---------|---------| | `opentraces discover` | List available agent sessions across all projects | | `opentraces parse` | Parse agent sessions into enriched JSONL traces (global mode) | | `opentraces review` | Legacy alias for `web`/`tui`/`session` | | `opentraces export` | Export traces to other formats (e.g., `--format atif`) | | `opentraces migrate` | Check schema version and run migrations | | `opentraces capabilities --json` | Machine-discoverable feature list, supported agents, versions | | `opentraces introspect` | Full API schema and TraceRecord JSON schema for automation | | `opentraces _capture` | Invoked by the Claude Code SessionEnd hook to auto-capture sessions | | `opentraces _assess-remote` | Force quality assessment on a remote dataset via hf-mount (automation only) | ## Exit Codes | Code | Meaning | |------|---------| | `0` | Success | | `2` | Usage error (bad flags, conflicting options) | | `3` | Auth/config error (not authenticated, not initialized) | | `4` | Network or upload error | | `5` | Data corruption / invalid state | | `6` | Not found (trace ID, project, or resource) | | `7` | Lock contention / busy state | --- # Supported Agents opentraces 
currently ships with two live parsers: Claude Code and Hermes. ## Current Support | Agent | Identifier | Category | Status | |-------|-----------|----------|--------| | Claude Code | `claude-code` | dev-time | Supported | | Hermes | `hermes` | run-time | Supported | | Cursor | `cursor` | dev-time | Planned | | Codex | `codex` | dev-time | Planned | | OpenCode | `opencode` | dev-time | Planned | | OpenClaw | `openclaw` | run-time | Planned | | NemoClaw | `nemoclaw` | run-time | Planned | ## How Detection Works The parser registry is discovered at runtime from `src/opentraces/parsers/`. ```python from opentraces.parsers import get_parsers supported = list(get_parsers().keys()) ``` `opentraces init --agent ...` uses the same registry to validate agent selection. ## What Parsers Extract All parsers normalize agent sessions into the opentraces schema with: - user / agent / system steps - tool calls and observations - system prompt deduplication - snippets from edit/write activity - per-step token usage - sub-agent hierarchy when present ## Adapter Contract New parsers implement the `SessionParser` protocol: ```python class SessionParser(Protocol): agent_name: str def discover_sessions(self, projects_path: Path) -> Iterator[Path]: ... def parse_session(self, session_path: Path, byte_offset: int = 0) -> TraceRecord | None: ... ``` That keeps the parser surface small and lets new agents plug in without changing the review or upload workflow. --- # Troubleshooting ## First Check ```bash opentraces status ``` `status` shows the inbox summary, review policy, agents, remote, and stage counts. ## Common Issues ### "No HF token found" Run: ```bash opentraces login --token ``` Paste a write-access token from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). Or export `HF_TOKEN` in your shell: ```bash export HF_TOKEN=hf_... ``` ### "Not initialized" Run `opentraces init` in the project directory. 
That creates `.opentraces/config.json` and `.opentraces/staging/`. ### "No sessions found" Claude Code session files live under `~/.claude/projects/`. If there are no session files, start a Claude Code conversation first. `opentraces discover` is a hidden diagnostic command if you need to inspect the raw session directories. ### Parse Errors If a specific trace looks wrong: ```bash opentraces session list opentraces session show opentraces session redact --step 3 ``` ### Push Fails With 403 Your token does not have write access. OAuth device tokens (from `opentraces login`) cannot create or write to dataset repos. Re-authenticate with a personal access token: ```bash opentraces login --token ``` Paste a token with **write** scope from [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens). ### Resetting Local State ```bash rm -rf .opentraces/ opentraces init ``` To clear credentials as well: ```bash opentraces logout ``` --- # Parsing Parsing is the ingestion step that turns raw agent session logs into staged `TraceRecord` JSONL files. ## What Runs Automatically When `opentraces init` installs the agent session hook, the hidden `_capture` command runs after each session ends. That capture path: 1. Finds new session files under `~/.claude/projects/` 2. Parses the raw session into a `TraceRecord` 3. Filters out trivial sessions with fewer than 2 steps or no tool calls 4. Runs the enrichment and security pipeline 5. 
Writes the result to `.opentraces/staging/.jsonl` ## Enrichment Pipeline Every parsed trace is enriched before staging: | Step | What it does | Example output | |------|-------------|----------------| | Git signals | Detects repo state, extracts commit info | `committed: true`, `commit_sha: "a3f9..."`, `branch: "main"` | | Attribution | Maps Edit and Write tool calls to file/line ranges | `auth.py L42-67` attributed to step 4 | | Dependencies | Extracts from manifests and install commands | `["flask", "pydantic"]` from `pyproject.toml` | | Metrics | Aggregates token counts, cost, cache rates | `cache_hit_rate: 0.91`, `estimated_cost_usd: 3.21` | | Security scan | Regex + entropy scan, tiered redaction | API key in Bash output replaced with `[REDACTED]` | | Classification | Tier 2 heuristic flagging for review | Internal hostname `*.corp` flagged for manual review | | Anonymization | Strips usernames and home paths | `/Users/alice/project/` becomes `/~/project/` | ## Review Policy Interaction `review_policy` controls where a parsed trace lands: | Policy | Result | |--------|--------| | `review` | Trace lands in `Inbox` for manual review | | `auto` | Clean traces are committed and pushed automatically | The review surface still exists either way. `auto` just reduces the amount of manual triage needed, and traces with scan hits still land in the inbox. ## Parsing Existing Sessions To import sessions that were recorded before you ran `opentraces init`, pass `--import-existing` at init time: ```bash opentraces init --import-existing ``` This runs a one-off batch parse of all existing Claude Code sessions for the current project directory, applying the same enrichment and security pipeline as the hook. 
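The security-scan step above pairs known-pattern regexes with an entropy heuristic for unknown secrets. A minimal sketch of the entropy side (the threshold, token pattern, and function names are illustrative, not opentraces internals):

```python
import math
import re

def shannon_entropy(s: str) -> float:
    # Bits per character: random key-like strings score high, ordinary words low.
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

TOKEN_RE = re.compile(r"[A-Za-z0-9_\-]{20,}")  # only long, unbroken tokens are candidates

def redact_high_entropy(text: str, threshold: float = 4.0) -> str:
    def repl(match: re.Match) -> str:
        token = match.group()
        return "[REDACTED]" if shannon_entropy(token) >= threshold else token
    return TOKEN_RE.sub(repl, text)

print(redact_high_entropy("export API_KEY=sk9fQ2xR7tUvWyZa1bC3dE5gH8jK"))
# -> export API_KEY=[REDACTED]
```

The two signals are complementary: regexes catch known key formats, while the entropy check flags opaque tokens that match no known pattern.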
### Internal Batch Commands `discover` and `parse` are hidden commands available for diagnostics and manual batch processing: ```bash opentraces discover # list all Claude Code projects with session files opentraces parse # parse all unparsed sessions into staging opentraces parse --auto # parse and auto-approve (skip review) opentraces parse --limit 10 # parse at most 10 sessions ``` These bypass the hook path and write directly to `.opentraces/staging/`. The user-facing path for ongoing collection is the hook plus inbox workflow. ## What Gets Filtered - Sessions with fewer than 2 steps - Sessions with zero tool calls - Duplicate sessions by `content_hash` ## Next Step ```bash opentraces web ``` Use the browser inbox or `opentraces tui` to review the staged traces before committing them. --- # Inbox The inbox is where you inspect and edit staged traces before committing. Use the web inbox, the terminal inbox, or the `session` CLI subcommands. ## Web Inbox ```bash opentraces web opentraces web --port 8080 ``` This serves the React viewer from `web/viewer/` through the local Flask app. The timeline view shows each step with tool calls and token counts. The review view groups context items by source (user input, filesystem, external, LLM output). ![Web inbox - timeline view](/docs/assets/web-timeline.png) ![Web inbox - review view](/docs/assets/web-review.png) ## Terminal Inbox ```bash opentraces tui opentraces tui --fullscreen ``` Three-panel layout: sessions list, summary, and detail. Keyboard shortcuts for navigation, commit, reject, and discard. 
![Terminal inbox](/docs/assets/tui.png) ## CLI ```bash opentraces session list opentraces session show opentraces session show --verbose opentraces session commit opentraces session reject opentraces session reset opentraces session redact --step 3 opentraces session discard --yes ``` `commit` moves a trace directly to `Committed`, `reject` keeps it local only, `reset` sends it back to `Inbox`, and `redact` rewrites the staged JSONL in place. `session show --verbose` prints the full step-level detail including raw tool inputs and outputs. ## Stage Vocabulary | Stage | Meaning | |-------|---------| | `inbox` | Needs review | | `committed` | Bundled for upload | | `pushed` | Published upstream | | `rejected` | Kept local only | ## What To Look For - Secrets that escaped redaction - Internal hostnames and collaboration URLs - Customer names, paths, or identifiers - Traces that are too short or too trivial - Tool outputs that should be redacted before sharing ## Inbox Flow ```bash opentraces session commit opentraces commit --all opentraces push ``` If you need the old compatibility entry point, `opentraces review` still exists as a hidden alias, but `web`, `tui`, and `session` are the current surfaces. --- # Push `opentraces push` uploads committed traces to Hugging Face Hub as sharded JSONL files. Only committed traces are uploaded — run `opentraces commit` first if needed. 
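Each push writes a new shard rather than rewriting existing files. A sketch of generating a timestamped shard path in that style (the exact naming scheme is an assumption based on the layout shown on this page, not opentraces internals):

```python
import uuid
from datetime import datetime, timezone

def new_shard_path() -> str:
    # e.g. data/traces_20260401T091500Z_e5f6a7b8.jsonl (format assumed for illustration)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"data/traces_{stamp}_{uuid.uuid4().hex[:8]}.jsonl"

print(new_shard_path())
```

Because every shard name embeds a timestamp and a random suffix, concurrent pushes from different contributors cannot collide.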
## Options

```bash
opentraces push --private
opentraces push --public
opentraces push --publish
opentraces push --gated
opentraces push --assess
opentraces push --repo user/custom-dataset
```

| Flag | Default | Description |
|------|---------|-------------|
| `--private` | off | Force private visibility |
| `--public` | off | Force public visibility |
| `--publish` | off | Change an existing private dataset to public |
| `--gated` | off | Enable gated access on the dataset |
| `--assess` | off | Run quality assessment after upload and embed scores in dataset card |
| `--repo` | `{username}/opentraces` | Target HF dataset repo |

`--approved-only` is not part of the current CLI. The supported path is `commit -> push`.

## How Upload Works

Each push creates a new JSONL shard. Existing data is never overwritten or appended to.

```text
data/
  traces_20260329T142300Z_a1b2c3d4.jsonl
  traces_20260401T091500Z_e5f6a7b8.jsonl   <- new shard from this push
```

That means:

- Each push is atomic
- No merge conflicts between contributors
- Dataset history grows by shard

## Dataset Card

`push` generates or updates a `README.md` dataset card on every successful upload. The card aggregates statistics across **all** shards in the repo, not just the current batch, so counts are always accurate.

The card records:

- schema version
- trace counts, steps, and tokens
- model and agent distribution
- date range
- average cost and success rate (when available)

A machine-readable JSON block is embedded in the card for programmatic consumers.

### Quality scorecard (`--assess`)

`opentraces push --assess` runs quality scoring after upload and embeds the results in the dataset card.
Here's what it looks like on a live dataset: [![Overall Quality 78.1%](https://img.shields.io/badge/Overall_Quality-78.1%25-ffc107)](https://opentraces.ai) [![Gate FAILING](https://img.shields.io/badge/Gate-FAILING-dc3545)](https://opentraces.ai) ![Conformance 88.4%](https://img.shields.io/badge/Conformance-88.4%25-28a745) ![Training 89.0%](https://img.shields.io/badge/Training-89.0%25-28a745) ![RL 73.4%](https://img.shields.io/badge/RL-73.4%25-ffc107) ![Analytics 55.7%](https://img.shields.io/badge/Analytics-55.7%25-fd7e14) ![Domain 84.1%](https://img.shields.io/badge/Domain-84.1%25-28a745) The scorecard embeds per-persona scores as shields.io badges, a breakdown table with PASS / WARN / FAIL per rubric, and a `quality.json` sidecar for machine consumers. See [Assess](/docs/workflow/quality) for scoring details. ## Visibility | Setting | Who Can See | Use Case | |---------|-------------|----------| | Private | Only you | Sensitive code or private experiments | | Public | Anyone | Open-source contributions | | Gated | Anyone who requests access | Controlled sharing | ## Push Behavior by Mode In `review` mode, you commit and push manually. In `auto` mode, clean traces are committed and pushed automatically after capture. ## Export Export to other formats is not part of the public workflow yet. The CLI exposes a hidden stub for future automation: ```bash opentraces export --format atif # not yet public ``` The schema package documents ATIF, ADP, and OTel field mappings in `packages/opentraces-schema/FIELD-MAPPINGS.md`. If you need to write a converter now, start from the `TraceRecord` / `Step` model definitions there. --- # Assess `opentraces assess` scores committed traces against five consumer-facing rubrics. Run it after committing, before you push: ```bash opentraces assess ``` Scores are printed to the terminal. Low-scoring traces show which checks failed so you can decide whether to fix or push anyway. 
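Downstream jobs can gate on the `quality.json` sidecar. A sketch of a threshold check (the `overall` key and the 80% bar are assumptions for illustration; inspect your dataset's actual `quality.json` for the real schema):

```python
import json

def gate_passes(quality: dict, minimum: float = 80.0) -> bool:
    # "overall" is an assumed key; check your own quality.json for the real field names.
    return float(quality.get("overall", 0.0)) >= minimum

sample = json.loads('{"overall": 78.1}')  # hypothetical sidecar content
print(gate_passes(sample))  # -> False: 78.1 is below the illustrative 80.0 bar
```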
Assessment only runs against committed traces — run `opentraces commit` first if your inbox isn't empty. You can also score and push in one step with `opentraces push --assess`, which uploads and embeds the scorecard in the HuggingFace dataset card. See [Push](/docs/workflow/pushing) for details. ## How scoring works Assessment is **deterministic by default**: every check is a Python function over the `TraceRecord` fields. No LLM calls, no external requests, no randomness. The same trace always produces the same score. Each trace is scored against all five personas. Per-persona score is a weighted average of its individual checks (0-100%). Batch score is the average across traces. ### The five personas | Persona | What it checks | Who uses it | |---------|----------------|-------------| | **Conformance** | Schema validity: trace IDs, content hash, timestamps, steps present, security scanned | Anyone ingesting opentraces data | | **Training** | SFT readiness: alternating roles, tool_call/observation pairing, reasoning coverage | Model fine-tuners | | **RL** | Outcome signals: committed flag or terminal_state, signal confidence, cost, model ID | RLHF / reward modeling | | **Analytics** | Observability: cache hit rate, cost, duration, per-step timestamps | Infra / cost dashboards | | **Domain** | Discoverability: language ecosystem, dependencies, task description, VCS info | Dataset search and filtering | ### Conformance Structural checks that apply to every trace regardless of agent type: | Check | Description | |-------|-------------| | C1: schema_version | Matches current schema version | | C2: trace_id format | Valid UUID-like string (≥32 chars with dashes) | | C3: content_hash | 64-character hex, present | | C4: agent name | Non-empty agent identifier | | C5: timestamps | Both timestamp_start and timestamp_end present | | C6: steps present | At least one step recorded | | C7: security scanned | `security.scanned = True` | ### Training Grounded in ADP (Agent Data 
Protocol) empirical requirements for SFT pipelines: | Check | Description | |-------|-------------| | T1: alternating roles | user/agent steps alternate ≥90% of transitions (≥50% for conversation-turn sources) | | T2: tool_call pairing | Every tool_call_id has a matching observation | | T3: reasoning coverage | `reasoning_content` present on agent steps | | T4: data cleanliness | No redaction markers in step content | ### RL Checks the reward proxy signal appropriate to the agent's execution context: | Check | Description | |-------|-------------| | RL1: outcome signal | `committed=True` for devtime agents; `terminal_state` or `reward` for runtime agents | | RL2: signal confidence | `signal_confidence` is `derived` or `annotated` (not default) | | RL3: cost signal | `estimated_cost_usd > 0` (differentiates traces for cost-aware RL) | | RL4: model identified | `agent.model` populated (needed for per-model policy training) | ### Analytics Observability checks that differentiate opentraces from trace-level-only sources. Checks that require per-step data are automatically skipped for `conversation_turn` fidelity sources (e.g. 
Hermes imports), which only have session-level timestamps: | Check | Description | |-------|-------------| | A1: cache_hit_rate | Computed and in [0.0, 1.0] (skipped for runtime) | | A2: estimated_cost | `estimated_cost_usd > 0` | | A3: total_duration | `total_duration_s > 0` (skipped for runtime) | | A4: step timestamps | Timestamps on >80% of steps (skipped for conversation_turn) | | A5: token breakdown | Per-step `input_tokens` and `output_tokens` present | | A6: token consistency | Step-sum ≈ session total (within 10%) | ### Domain Checks that enable HuggingFace dataset discovery and filtering: | Check | Description | |-------|-------------| | D1: language_ecosystem | Populated (skipped for runtime with no code-writing tool calls) | | D2: dependencies | At least one dependency when language detected | | D3: task description | Meaningful task description (>10 chars) | | D4: VCS info | `environment.vcs.base_commit` present (skipped for runtime) | | D5: code snippets | At least one snippet captured (skipped for runtime) | | D6: attribution | Attribution data present | | D7: agent identity | Agent name + version OR name alone for runtime sources | ## Fidelity-aware scoring Some sources (like Hermes imports) provide conversation turns rather than individual API calls. Checks that require call-level data are automatically marked `skipped` for these sources and excluded from the weighted average. This prevents penalizing community datasets for structural limitations of the source format. The `step_fidelity` field on each trace records this: `"individual_api_call"` (devtime) vs `"conversation_turn"` (Hermes, other community imports). 
## Gate thresholds

The gate reports `FAILING` when any persona falls below its threshold:

| Persona | Min (any trace) | Min (batch average) |
|---------|-----------------|---------------------|
| Conformance | 70% | 80% |
| Training | 40% | 45% |
| RL | — | 40% |
| Analytics | 60% | 70% |
| Domain | 45% | 55% |

Gate `FAILING` does not block `push` by default. It's a signal, not a hard stop — you can push a failing batch and the gate status will be visible in the dataset card. Use `--gate` to enforce hard blocking (coming soon).

## Dataset card integration

When you push with `--assess`, scores are embedded in the HuggingFace dataset card as badges and a scorecard table, and written to YAML frontmatter as searchable keys. See [Push](/docs/workflow/pushing) for details.

---

# Consume

Once traces are pushed to Hugging Face Hub, how you load them depends on what you're building.

## For developers and ML teams

Use the [HuggingFace `datasets` library](https://huggingface.co/docs/datasets/en/loading) when you want structured access, pandas integration, or a PyTorch DataLoader.

=== "pandas"

    ```python
    from datasets import load_dataset

    ds = load_dataset("your-org/agent-traces")
    df = ds["train"].to_pandas()
    ```

=== "PyTorch"

    ```python
    from datasets import load_dataset

    ds = load_dataset("your-org/agent-traces")
    train = ds["train"].with_format("torch")
    ```

=== "Streaming"

    ```python
    from datasets import load_dataset

    # Stream without downloading the full dataset
    ds = load_dataset("your-org/agent-traces", streaming=True)
    for trace in ds["train"]:
        print(trace["session_id"])
    ```

## For agents

Use [hf-mount](https://github.com/huggingface/hf-mount) to expose the dataset as a virtual filesystem. The dataset becomes a directory of JSONL files — no Python library required, no full download, readable with any file tool call.
```bash hf-mount your-org/agent-traces /mnt/traces ``` ```python import json import pathlib traces = [ json.loads(line) for p in pathlib.Path("/mnt/traces").glob("*.jsonl") for line in p.read_text().splitlines() if line.strip() ] ``` The mount approach suits agents because it avoids library overhead and works with standard file reads — the same way the agent reads any local path. ## Schema reference Each JSONL line is a `TraceRecord`. See the [schema overview](/docs/schema/overview) for field definitions, and [outcome & attribution](/docs/schema/outcome-attribution) for the reward signals used in RL workflows. --- # Schema Overview opentraces uses a training-first JSONL schema where each line is one complete agent session. The schema is a superset of ATIF v1.6, informed by ADP and field patterns from existing HF datasets. ## Design Principles 1. **Training / SFT** - Clean message sequences with role labels, tool-use as tool_call/tool_result pairs, outcome signals. 2. **RL / RLHF** - Trajectory-level reward signals, step-level annotations, decision point identification. 3. **Telemetry** - Token counts, latency, model identifiers, cache hit rates, cost estimates. 4. **Cross-agent** - Represents traces from Claude Code, Cursor, Cline, Codex, and future agents without agent-specific fields. ## Top-Level Structure ```json { "schema_version": "0.2.0", "trace_id": "uuid", "session_id": "uuid", "content_hash": "sha256-hex", "timestamp_start": "ISO8601", "timestamp_end": "ISO8601", "execution_context": "devtime", "task": { }, "agent": { }, "environment": { }, "system_prompts": { }, "tool_definitions": [ ], "steps": [ ], "outcome": { }, "dependencies": [ ], "metrics": { }, "security": { }, "attribution": { }, "metadata": { } } ``` ## Key Design Decisions | Decision | Rationale | |----------|-----------| | `steps` not `turns` | Each step is an LLM API call, not a conversational turn. Aligns with ATIF's TAO loop. 
| | `role: "agent"` not `"assistant"` | Follows ATIF convention (`system`, `user`, `agent`). | | Tool calls separated from observations | Preserves call/result separation training pipelines depend on. | | System prompt dedup | Hash-based lookup table. A 20K-token prompt repeated across steps would be wasteful. | | `parent_step` per step | Precise parent-child tree for sub-agents, not a flat session-level array. | | `content_hash` | SHA-256 for dedup at upload time. | | `reasoning_content` | Explicit chain-of-thought field. Improved SWE-Bench by ~3 pts (Cognition data). | | `outcome.committed` | Did the session's changes get committed? Cheap, deterministic quality signal. | | `attribution` | Embedded Agent Trace block. Bridges trajectory (process) with code attribution (output). | ## Schema Package The schema is a standalone Python package: ```bash pip install opentraces-schema ``` ```python from opentraces_schema import TraceRecord, SCHEMA_VERSION record = TraceRecord( trace_id="abc-123", session_id="sess-456", agent={"name": "claude-code", "version": "1.0.32"}, ) line = record.to_jsonl_line() ``` See [TraceRecord](/docs/schema/trace-record), [Steps](/docs/schema/steps), and [Outcome & Attribution](/docs/schema/outcome-attribution) for field-level detail. --- # TraceRecord The top-level record. One per JSONL line, one per agent session. ## Identification | Field | Type | Required | Description | |-------|------|----------|-------------| | `schema_version` | string | yes | Schema version, e.g. `"0.2.0"` | | `trace_id` | string (UUID) | yes | Unique identifier for this trace | | `session_id` | string | yes | Agent session reference | | `content_hash` | string | no | SHA-256 of the serialized record, populated when written | | `execution_context` | string | no | `"devtime"` (code-editing agent) or `"runtime"` (action-trajectory / RL agent). Null for pre-0.2 traces. 
| ## Timestamps | Field | Type | Required | Description | |-------|------|----------|-------------| | `timestamp_start` | string (ISO8601) | no | Session start time | | `timestamp_end` | string (ISO8601) | no | Session end time | ## Task ```json { "task": { "description": "Fix the failing test in src/parser.ts", "source": "user_prompt", "repository": "owner/repo", "base_commit": "abc123def456..." } } ``` ## Agent ```json { "agent": { "name": "claude-code", "version": "1.0.83", "model": "anthropic/claude-sonnet-4-20250514" } } ``` Model identifiers follow the `provider/model-name` convention. ## Environment ```json { "environment": { "os": "darwin", "shell": "zsh", "vcs": { "type": "git", "base_commit": "abc123...", "branch": "main", "diff": "unified diff string or null" }, "language_ecosystem": ["typescript", "python"] } } ``` ## System Prompts Deduplicated into a top-level lookup table. Steps reference prompts by hash. ```json { "system_prompts": { "sp_a1b2c3": "You are Claude Code..." } } ``` ## Tool Definitions The session-level tool schema list. ## Dependencies Package names referenced during the session. Extracted from manifest files or tool calls. ```json { "dependencies": ["stripe", "prisma", "next"] } ``` ## Metrics ```json { "metrics": { "total_steps": 42, "total_input_tokens": 1800000, "total_output_tokens": 34000, "total_duration_s": 780, "cache_hit_rate": 0.92, "estimated_cost_usd": 2.4 } } ``` ## Security ```json { "security": { "scanned": true, "flags_reviewed": 3, "redactions_applied": 1, "classifier_version": "0.1.0" } } ``` ## Metadata Open-ended object for future extensions. ## Notes - `content_hash` is filled in when the record is serialized with `to_jsonl_line()` - `task`, `environment`, `steps`, and the nested blocks all have defaults in the Python model - `security.scanned` confirms the security pipeline (scan, redact, classify) was applied --- # Steps The `steps` array contains the conversation as a sequence of LLM API calls. 
Each step follows the TAO (Thought-Action-Observation) pattern from ATIF. ## Step Structure ```json { "step_index": 2, "role": "agent", "content": "I'll investigate the failing test...", "reasoning_content": "The user wants me to...", "model": "anthropic/claude-sonnet-4-20250514", "system_prompt_hash": "sp_a1b2c3", "agent_role": "main", "parent_step": null, "call_type": "main", "tools_available": ["bash", "read", "edit", "glob", "grep", "write", "agent"], "tool_calls": [], "observations": [], "snippets": [], "token_usage": {}, "timestamp": "ISO8601" } ``` ## Fields | Field | Type | Required | Description | |-------|------|----------|-------------| | `step_index` | integer | yes | Sequential step number | | `role` | string | yes | `"system"`, `"user"`, or `"agent"` | | `content` | string | no | Message content; may be empty for pure tool or warmup steps | | `reasoning_content` | string | no | Thinking content | | `model` | string | no | Model used (`provider/model-name`) | | `system_prompt_hash` | string | no | Reference to `system_prompts` lookup table | | `agent_role` | string | no | `"main"`, `"explore"`, `"plan"`, etc. 
| | `parent_step` | integer | no | Step index of parent (for sub-agents) | | `call_type` | string | no | `"main"`, `"subagent"`, or `"warmup"` | | `tools_available` | string[] | no | Tools available at this step | | `tool_calls` | ToolCall[] | no | Tool invocations made in the step | | `observations` | Observation[] | no | Tool results linked back by `source_call_id` | | `snippets` | Snippet[] | no | Extracted code blocks | | `token_usage` | TokenUsage | no | Per-step token usage breakdown | | `timestamp` | string | no | ISO8601 timestamp | ### `call_type` Values | Value | Description | |-------|-------------| | `main` | Primary agent step | | `subagent` | Sub-agent invocation | | `warmup` | Cache priming call with no useful output | ## Tool Calls ```json { "tool_calls": [ { "tool_call_id": "tc_001", "tool_name": "bash", "input": { "command": "npm test -- --grep parser" }, "duration_ms": 3400 } ] } ``` Tool calls carry a `tool_call_id`. Observations link back via `source_call_id`. ## Observations ```json { "observations": [ { "source_call_id": "tc_001", "content": "FAIL src/parser.test.ts...", "output_summary": "1 test failed: parser.test.ts line 42 assertion error", "error": null } ] } ``` `output_summary` is a lightweight preview so consumers can assess relevance without downloading full multi-KB outputs. ## Snippets Code blocks extracted from tool results and agent responses: ```json { "snippets": [ { "file_path": "src/parser.ts", "start_line": 42, "end_line": 55, "language": "typescript", "text": "function parseToken(input: string)..." 
} ] } ``` ## Token Usage Per-step token breakdown: ```json { "token_usage": { "input_tokens": 12400, "output_tokens": 890, "cache_read_tokens": 11200, "cache_write_tokens": 1200, "prefix_reuse_tokens": 11200 } } ``` ## Sub-Agent Hierarchy Sub-agent steps use `parent_step` to link back to the invoking step: ```json { "step_index": 5, "role": "agent", "agent_role": "explore", "parent_step": 3, "call_type": "subagent", "content": "Searching for related parser implementations..." } ``` Sub-agent transcripts are linked by `session_id` reference to separate trajectory records, not embedded. --- # Outcome & Attribution ## Outcome The `outcome` object captures the session-level result and the confidence of the signal that set it: Outcome fields are split by `execution_context`. Devtime agents (code-editing) use `committed` as the primary reward proxy. Runtime agents (action-trajectory / RL) use `terminal_state` and `reward`. **Devtime example:** ```json { "outcome": { "success": true, "signal_source": "deterministic", "signal_confidence": "derived", "description": "Test passes after fix", "patch": "unified diff string", "committed": true, "commit_sha": "def789abc..." } } ``` **Runtime example:** ```json { "outcome": { "terminal_state": "goal_reached", "reward": 1.0, "reward_source": "rl_environment", "signal_confidence": "derived" } } ``` ### Fields | Field | Type | Required | Description | |-------|------|----------|-------------| | `success` | boolean | no | Did the task succeed? 
| | `signal_source` | string | no | Current implementation uses `deterministic` | | `signal_confidence` | string | no | `derived`, `inferred`, or `annotated` | | `description` | string | no | Human-readable outcome description | | `patch` | string | no | Unified diff produced by the session | | `committed` | boolean | no | Whether changes were committed to git (devtime) | | `commit_sha` | string | no | The specific commit, if committed (devtime) | | `terminal_state` | string | no | `goal_reached`, `interrupted`, `error`, or `abandoned` (runtime, added 0.2.0) | | `reward` | float | no | Numeric reward signal from an RL environment or evaluator (runtime, added 0.2.0) | | `reward_source` | string | no | Canonical: `rl_environment`, `judge`, `human_annotation`, `orchestrator` (added 0.2.0) | ### Committed as a Quality Signal For devtime agents, a session that results in a commit is higher-signal than one abandoned or reverted. The commit hash gives a deterministic anchor for replaying the patch and comparing later revisions. For runtime agents, `terminal_state` and `reward` serve the equivalent role — ground truth from the environment. ## Attribution The `attribution` block records which files and line ranges were produced by the agent session. ```json { "attribution": { "files": [ { "path": "src/parser.ts", "conversations": [ { "contributor": { "type": "ai", "model_id": "anthropic/claude-sonnet-4-20250514" }, "url": "opentraces://trace/step_2", "ranges": [ { "start_line": 42, "end_line": 55, "content_hash": "9f2e8a1b" } ] } ] } ] } } ``` ### How Attribution Is Constructed Attribution is built deterministically from trace data: 1. Edit and Write tool calls provide file paths and line ranges 2. `outcome.patch` provides the unified diff for cross-checking 3. Snippets provide extracted code blocks with file positions These are synthesized into Agent Trace-compatible `attribution` records. 
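Step 1 of that synthesis can be sketched as follows. This is a flattened illustration of the full attribution shape shown above, and the tool-call `input` field names (`file_path`, `start_line`, `end_line`, `content`) are assumptions for the sketch:

```python
# Illustrative sketch: derive attribution ranges from Edit/Write tool calls.
# The real pipeline also cross-checks outcome.patch and snippets; the
# field names inside call["input"] are assumed here for illustration.
import hashlib

def ranges_from_steps(steps: list[dict]) -> dict:
    files: dict[str, list[dict]] = {}
    for step in steps:
        for call in step.get("tool_calls", []):
            if call["tool_name"] not in ("edit", "write"):
                continue
            inp = call["input"]
            files.setdefault(inp["file_path"], []).append({
                "start_line": inp.get("start_line", 1),
                "end_line": inp.get("end_line", 1),
                # short stable hash over the written content
                "content_hash": hashlib.sha256(
                    inp.get("content", "").encode()
                ).hexdigest()[:8],
                # link the range back to the producing step
                "url": f"opentraces://trace/step_{step['step_index']}",
            })
    return {"files": [{"path": p, "ranges": r} for p, r in files.items()]}

steps = [{"step_index": 2, "tool_calls": [{
    "tool_name": "edit",
    "input": {"file_path": "src/parser.ts", "start_line": 42,
              "end_line": 55, "content": "function parseToken..."},
}]}]
attribution = ranges_from_steps(steps)
```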
### The Bridge This field bridges trajectory (process) and attribution (output): - `conversation.url` links each attributed range back to the step that produced it - `content_hash` is a short stable hash for tracking attribution across refactors - Sessions that produce no code changes have `attribution: null` ### Why Embed, Not Link Embedding keeps the record self-contained. An opentraces record can say "here is the full conversation that produced these lines, including the reasoning, tool calls, and final diff." ## Reserved RL Fields The schema leaves room for: - token ID sequences for RL training - token log probabilities - step-level reward annotations --- # Standards Alignment opentraces sits at the intersection of four public standards. It adopts what works from each, and bridges the gap between trajectory (process) and attribution (output). ## ATIF / Harbor (v1.6) [github.com/laude-institute/harbor](https://github.com/laude-institute/harbor/blob/main/docs/rfcs/0001-trajectory-format.md) A training trajectory serialization format for agent research. Defines the step-based TAO (Thought-Action-Observation) loop, with fields for token IDs, logprobs, and reward signals designed for RL and SFT pipelines. **Relationship:** opentraces is a superset of ATIF. We adopt the step-based model, role conventions (`system | user | agent`), and field patterns. We add attribution blocks, per-step token breakdowns, environment metadata, dependency tracking, and security metadata. The downstream field mappings live in `packages/opentraces-schema/FIELD-MAPPINGS.md`; the public export workflow is still experimental. ## ADP (Agent Data Protocol) [arxiv.org/abs/2410.10762](https://arxiv.org/abs/2410.10762) An interlingua for normalizing diverse agent trace formats into a common structure for training. Proposes a universal adapter layer so each dataset and each agent only needs one converter, O(D+A), instead of pairwise mappings, O(D*A). 
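The adapter-count claim is plain arithmetic: pairwise conversion needs one converter per (dataset, agent) pair, while a shared interlingua needs one adapter per side. A two-line illustration:

```python
# Converter counts with and without an interlingua (illustrative arithmetic).
def pairwise_converters(datasets: int, agents: int) -> int:
    return datasets * agents   # one converter per (dataset, agent) pair

def interlingua_adapters(datasets: int, agents: int) -> int:
    return datasets + agents   # one adapter per dataset plus one per agent
```

For 20 datasets and 10 agent formats, that is 200 pairwise converters versus 30 adapters.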
**Relationship:** opentraces' adapter-based normalization follows the same pattern. Per-agent parsers are ADP-style adapters outputting the enriched schema. ## Agent Trace (Cursor/community, v0.1.0 RFC) [github.com/cursor/agent-trace](https://github.com/cursor/agent-trace) A code attribution spec (CC BY 4.0) that records which lines of code came from which agent conversation, at file/line granularity. Backed by 10+ sponsors (Cloudflare, Vercel, Google Jules, Cognition). **Relationship:** opentraces embeds Agent Trace attribution blocks directly in the trace record. Agent Trace focuses on _output_ (code attribution), opentraces bridges that with _process_ (trajectory). ## OTel GenAI Semantic Conventions [opentelemetry.io/docs/specs/semconv/gen-ai](https://opentelemetry.io/docs/specs/semconv/gen-ai/) OpenTelemetry's GenAI semantic conventions define standardized span attributes for LLM calls in observability pipelines, covering model names, token counts, and request metadata. **Relationship:** opentraces' per-step token usage and model fields align with OTel GenAI conventions, enabling cross-referencing between observability spans and training trajectories. ## The Core Insight Agent Trace preserves _which_ lines came from AI. ATIF/ADP preserve _how_ the agent reasoned. Neither alone tells the complete story. opentraces connects the full conversation trajectory to the specific code output at line granularity. ## Message Taxonomy opentraces adopts a training-oriented message taxonomy: | Role | Description | |------|-------------| | `system` | System prompt (deduplicated by hash) | | `user` | User message / prompt | | `agent` | Agent response, tool calls, or thinking | Agent steps are further classified by `call_type` (`main`, `subagent`, `warmup`) and `agent_role` (`main`, `explore`, `plan`). --- # Schema Versioning The opentraces schema follows semantic versioning. 
The version lives in `packages/opentraces-schema/src/opentraces_schema/version.py` as the single source of truth. ## Version Policy | Change Type | Version Bump | Example | |-------------|--------------|---------| | New optional field | Minor | Adding `metrics.p95_latency_ms` | | New optional model | Minor | Adding a `debugging` block | | Field rename | Major | Renaming `steps` to `turns` | | Field removal | Major | Removing `metadata` | | Type change | Major | Changing `success` from boolean to string | | Bug fix / docs | Patch | Fixing a validation regex | ## Current Version ```text 0.2.0 ``` The `0.x` series means breaking changes may still land between minor versions until `1.0.0`. ## Version Checks There is no public migration workflow today. Version checks happen when configs are normalized and when `TraceRecord` JSONL is loaded. A hidden `opentraces migrate` command still exists for diagnostics, but it only reports the current config and schema versions. ## Rationale Documents Each schema version ships with a rationale document and a changelog entry in the schema package. See [`VERSION-POLICY.md`](https://github.com/JayFarei/opentraces/blob/main/packages/opentraces-schema/VERSION-POLICY.md) for the full versioning policy and [`CHANGELOG.md`](https://github.com/JayFarei/opentraces/blob/main/packages/opentraces-schema/CHANGELOG.md) for the release history. ## Field Mappings The repository keeps downstream mapping tables in `packages/opentraces-schema/FIELD-MAPPINGS.md`. --- # Security Modes opentraces applies the same security pipeline to every trace: scan, redact, and classify. The review policy controls whether traces require manual approval before publishing. 
## Review Policy `opentraces init` sets a project-level review policy: | Policy | Values | What it controls | |--------|--------|------------------| | `review_policy` | `review`, `auto` | Whether traces require manual review or are committed automatically | ```bash opentraces init --review-policy review opentraces init --review-policy auto ``` In `review` mode, all traces land in the inbox for manual commit and push. In `auto` mode, clean traces are committed automatically, while traces with scan hits still land in the inbox for review. ## Security Pipeline Every trace goes through the full security pipeline: 1. **Scan** - Two-pass secret detection (field-level + serialized) 2. **Redact** - Automatic replacement of detected secrets 3. **Classify** - Heuristic flagging of internal hostnames, AWS ARNs, database URIs, deep file paths 4. **Anonymize** - Path anonymization to remove usernames The `security.scanned` field on each trace confirms the pipeline ran. `security.redactions_applied` and `security.flags_reviewed` record what was found. ## Review Flow ```text Trace captured -> parsed, scanned, redacted, classified -> Inbox (review mode) or auto-committed (auto mode) -> session commit / reject / redact -> push ``` ```bash opentraces web opentraces tui opentraces session list --stage inbox opentraces session commit opentraces commit --all opentraces push ``` ## Changing Settings ```bash opentraces config set --redact "ACME_INTERNAL_TOKEN" opentraces config set --classifier-sensitivity high ``` See [Security Configuration](/docs/security/configuration) for the config file shape and [Scanning & Redaction](/docs/security/scanning) for the field-by-field security pipeline. --- # Security Configuration Security settings are split between the user config in `~/.opentraces/config.json` and the per-project inbox config in `.opentraces/config.json`. 
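How the two layers combine at load time is not specified here; a minimal sketch under the assumption that project settings override user defaults (the file locations are the documented ones, the merge behavior is illustrative):

```python
# Sketch: read user defaults, then let per-project settings override them.
# The project-overrides-user precedence is an assumption for illustration.
import json
from pathlib import Path

def load_config(project_dir: Path) -> dict:
    merged: dict = {}
    for path in (Path.home() / ".opentraces" / "config.json",
                 project_dir / ".opentraces" / "config.json"):
        if path.exists():
            merged.update(json.loads(path.read_text()))
    return merged
```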
## User Config The user config stores defaults shared across projects: - `excluded_projects` - `custom_redact_strings` - `classifier_sensitivity` - `dataset_visibility` View it with: ```bash opentraces config show ``` ## Project Config Each project keeps its inbox settings in `.opentraces/config.json`: ```json { "review_policy": "review", "agents": ["claude-code"], "remote": "your-name/opentraces", "visibility": "private" } ``` ## Per-Project Setup ```bash cd ~/project-a opentraces init --review-policy review cd ~/project-b opentraces init --review-policy auto ``` ## Exclusions Exclude whole projects from trace collection: ```bash opentraces config set --exclude /path/to/client-project opentraces config set --exclude /path/to/another-sensitive-project ``` Excluded projects are skipped during capture and batch parsing. ## Custom Redaction Strings Add literal strings that should always be redacted: ```bash opentraces config set --redact "ACME_INTERNAL_TOKEN" opentraces config set --redact "corp-api-prefix-" ``` ## Classifier Sensitivity ```bash opentraces config set --classifier-sensitivity low opentraces config set --classifier-sensitivity medium opentraces config set --classifier-sensitivity high ``` Higher sensitivity adds more heuristic flags (internal hostnames, deep file paths, identifier density). --- # Scanning & Redaction The security pipeline is context-aware and runs in two passes: 1. Scan the trace record field-by-field using the field type to decide whether entropy analysis is enabled. 2. Scan the final serialized JSONL bytes to catch anything introduced during enrichment or serialization. 
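The two-pass shape can be sketched like this. The single regex and the entropy threshold are illustrative stand-ins, not the shipped rule set:

```python
# Two-pass scanning sketch: a field-level pass with optional entropy
# analysis, then a final pass over the serialized record. The regex and
# the 4.0-bit entropy threshold are illustrative, not the real detectors.
import json
import math
import re

SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{20,}")  # illustrative detector

def entropy(s: str) -> float:
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def scan_field(text: str, use_entropy: bool) -> bool:
    if SECRET_RE.search(text):
        return True
    if use_entropy:
        tokens = [t for t in text.split() if len(t) >= 24]
        return any(entropy(t) > 4.0 for t in tokens)
    return False

def scan_record(record: dict) -> bool:
    # Pass 1: field-by-field; reasoning / tool-result fields skip entropy.
    hits = any(
        scan_field(step.get("content", "") or "", use_entropy=True)
        or scan_field(step.get("reasoning_content", "") or "", use_entropy=False)
        for step in record.get("steps", [])
    )
    # Pass 2: the serialized bytes, regex only.
    return hits or bool(SECRET_RE.search(json.dumps(record)))
```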
## What Gets Scanned | Field | Context | Notes | |-------|---------|-------| | `system_prompts` | General | Full scan | | `task.description` | General | Full scan | | `steps[].content` | General | Full scan | | `steps[].reasoning_content` | Reasoning | Regex only, no entropy | | `steps[].tool_calls[].input` | Tool input | Full scan for input-like tools, regex-only for result-like tools | | `steps[].observations[].content` | Tool result | Regex only, no entropy | | `steps[].observations[].output_summary` | Tool result | Regex only, no entropy | | `steps[].observations[].error` | Tool result | Regex only, no entropy | | `steps[].snippets[].text` | General | Full scan | | `outcome.patch` | General | Full scan | | `environment.vcs.diff` | General | Full scan, truncated before storage when the repo diff is very large | The scanner also applies a second pass over the serialized JSONL output so redaction does not depend on field shape alone. ## What Gets Redacted Detected secrets and path fragments are replaced with `[REDACTED]` or hashed path segments, depending on the detector: ```text Before: export OPENAI_API_KEY=sk-abc123... After: export OPENAI_API_KEY=[REDACTED] ``` ```text Before: /Users/jay/src/project/... After: /Users/[REDACTED]/src/project/... ``` The staged JSONL is rewritten in place. Raw session files on disk are not modified. ## Tier 2 Classifier Tier 2 adds a heuristic classifier on top of scanning and redaction. It flags: - internal hostnames - AWS account IDs in ARNs - database connection strings - internal collaboration URLs - dense UUID / hash sequences - deep file paths that may reveal internal structure ## Custom Redaction ```bash opentraces config set --redact "INTERNAL_API_KEY" opentraces config set --redact "corp-secret-prefix-" ``` Custom redaction strings are treated as literal matches wherever they appear in trace content. --- # Agent Setup opentraces is designed to be driven by agents as well as by humans. 
## What The CLI Emits

Most commands emit structured JSON with `next_steps` and `next_command`, so an agent can chain the workflow without parsing free-form text.

## Setup by Agent

`opentraces init` installs the session hook for your agent and copies the bundled skill into `.agents/skills/opentraces/`. Pass `--agent` to specify which harness to connect to.

### Claude Code

```bash
opentraces login
opentraces init --agent claude-code --review-policy review --start-fresh
```

If the repo already has existing session logs and you want them in the inbox immediately, switch `--start-fresh` to `--import-existing`.

### Hook Enrichment

For richer trace metadata, install the Claude Code session hooks:

```bash
opentraces hooks install
```

This registers two hooks in `.claude/settings.json`:

- **`on_stop`** — runs at session end, captures context window state, token usage, and project metadata
- **`on_compact`** (PostCompact event) — captures context compaction events for long sessions

Hooks fire automatically on every session — no further action needed after `hooks install`.

### Hermes

```bash
opentraces login
opentraces init --agent hermes --review-policy review --start-fresh
```

After setup, the current surfaces are:

- `opentraces web` for the browser inbox
- `opentraces tui` for the terminal inbox
- `opentraces session list` for machine-readable review
- `opentraces status` for the current inbox summary
- `opentraces context` for a compact project snapshot

## Hidden Capture Command

The hook calls the hidden `_capture` command with a specific session directory:

```bash
opentraces _capture --session-dir /path/to/session --project-dir /path/to/project
```

That is the internal batch entry point for agent automation and tests.

## Machine Discovery

Hidden commands are still available for automation:

```bash
opentraces capabilities --json
opentraces introspect
```

They expose versioning, feature discovery, and the current `TraceRecord` schema.
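The structured JSON described above (`next_steps`, `next_command`) is what lets an agent chain commands. An agent-side consumer might look like this; the only documented keys are `next_steps` and `next_command`, everything else about the output shape is an assumption:

```python
# Sketch: an agent loop that chains CLI commands from structured output.
# Assumes each command prints a JSON object that may carry "next_command".
import json
import subprocess

def run_chain(first: list[str], max_hops: int = 5) -> list[str]:
    ran, cmd = [], first
    for _ in range(max_hops):
        ran.append(" ".join(cmd))
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        try:
            payload = json.loads(out)
        except json.JSONDecodeError:
            break  # non-JSON output ends the chain
        nxt = payload.get("next_command")
        if not nxt:
            break
        cmd = nxt.split()
    return ran
```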
--- # CI/CD & Automation Headless environments can use the same inbox and upload model, but the commands should be run explicitly. ## Authentication ```bash export HF_TOKEN=hf_... ``` or: ```bash opentraces login --token ``` `HF_TOKEN` is the preferred path in CI. ## Recommended Pattern If you are running opentraces in automation, keep the steps explicit: ```bash opentraces init --review-policy review --remote my-org/opentraces --no-hook opentraces session list opentraces session commit opentraces commit --all opentraces push --private ``` Your CI script should call `commit` and `push` directly. ## GitHub Actions ```yaml - name: Install opentraces run: pip install opentraces - name: Authenticate with Hugging Face env: HF_TOKEN: ${{ secrets.HF_TOKEN }} run: opentraces login --token - name: Commit and push traces env: HF_TOKEN: ${{ secrets.HF_TOKEN }} run: | opentraces commit --all opentraces push --private ``` ## Notes - Use `--private` for proprietary codebases - Use `--repo owner/dataset` if you want a shared team dataset - If you need to capture a specific session directory yourself, wire the hidden `_capture` command into your own hook or runner --- # Development ## Setup ```bash git clone https://github.com/jayfarei/opentraces cd opentraces python3 -m venv .venv source .venv/bin/activate pip install -e packages/opentraces-schema pip install -e ".[dev]" ``` ## Optional Dependencies ```bash pip install -e ".[web,tui]" # Web and TUI inbox clients pip install -e ".[release]" # Build and publish tools (build, twine) ``` ## Running Tests ```bash make test # or: ./.venv/bin/pytest -q make lint # ruff check ``` Some tests require real Claude Code session data and are skipped by default. To run them, set the env var pointing to your project's sessions directory: ```bash export OPENTRACES_TEST_PROJECT_DIR=~/.claude/projects/ make test ``` The repository also has frontend test suites under `web/viewer/` and buildable docs under `web/site/`. 
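Tests gated on that env var typically use a skip guard; a pytest-style sketch (the marker and test names are illustrative, not copied from the repo):

```python
# Illustrative skip guard: these tests only run when
# OPENTRACES_TEST_PROJECT_DIR points at real session data.
import os
import pytest

requires_sessions = pytest.mark.skipif(
    not os.environ.get("OPENTRACES_TEST_PROJECT_DIR"),
    reason="set OPENTRACES_TEST_PROJECT_DIR to run tests against real sessions",
)

@requires_sessions
def test_sessions_dir_exists():
    path = os.path.expanduser(os.environ["OPENTRACES_TEST_PROJECT_DIR"])
    assert os.path.isdir(path)
```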
## Building and Releasing

The `Makefile` orchestrates the full build and release pipeline:

```bash
make build          # Build viewer, schema, and CLI packages
make version-check  # Show current versions
make release        # Full pipeline: test, lint, build, publish to PyPI, tag
```

The CLI version lives in `src/opentraces/__init__.py` (single source of truth). The schema version lives in `packages/opentraces-schema/src/opentraces_schema/version.py`.

## Project Structure

```
packages/opentraces-schema/   Schema package (Pydantic v2 models)
src/opentraces/               CLI package
  cli.py                      Click-based CLI entry point
  clients/                    TUI and Flask inbox clients
  parsers/                    Agent session parsers
  security/                   Secret scanning and anonymization
  enrichment/                 Git signals, attribution, metrics
  quality/                    Trace quality assessment and rubrics
  upload/                     Hugging Face upload helpers
tests/                        Python test suite
web/viewer/                   React inbox viewer (bundled in pip package)
web/site/                     Next.js docs and marketing site
Makefile                      Build, test, and release orchestration
Formula/                      Homebrew formula template
```

## Key Files

- `src/opentraces/cli.py` - CLI commands and hidden automation hooks
- `src/opentraces/clients/web_server.py` - Flask inbox server that serves the React viewer
- `src/opentraces/clients/tui.py` - Textual inbox client
- `src/opentraces/pipeline.py` - Enrichment and security pipeline
- `packages/opentraces-schema/src/opentraces_schema/models.py` - Pydantic schema models

## Adding a Parser

1. Create `src/opentraces/parsers/your_agent.py`
2. Implement the `SessionParser` protocol defined in `src/opentraces/parsers/base.py`
3. Register it in `src/opentraces/parsers/__init__.py`
4. Add tests under `tests/`

## Notes

- The only parser currently shipped is the Claude Code parser
- The inbox workflow is `web/tui/session -> commit/reject/redact -> push`
- Hidden commands still exist for compatibility and automation, but the public docs should use `web`, `tui`, `session`, `commit`, and `push`
- The React viewer (`web/viewer/dist`) is bundled into the pip package via `force-include` in `pyproject.toml`

---

# Schema Changes

The opentraces schema is open source. Feedback, questions, and proposals are welcome via [GitHub Issues](https://github.com/jayfarei/opentraces/issues).

## How to Propose a Change

When suggesting a schema change, include:

1. **What** field or model you would add, change, or remove
2. **Why** it matters for your use case (training, analytics, attribution, etc.)
3. **How** it relates to existing standards (ATIF, Agent Trace, ADP, OTel), if applicable

## What Counts as Breaking

| Change | Version Bump |
|--------|--------------|
| New optional field | Minor |
| New optional model | Minor |
| Field rename | Major |
| Field removal | Major |
| Type change | Major |

See [Versioning](/docs/schema/versioning) for the full policy.

## Adapter Contributions

To add support for a new agent (e.g., Cursor, Codex), implement the `BaseParser` interface:

```python
from pathlib import Path

from opentraces.parsers.base import BaseParser


class CursorParser(BaseParser):
    agent_name = "cursor"

    def can_parse(self, path: Path) -> bool:
        # Return True if this path contains Cursor sessions
        ...

    def parse_session(self, path: Path) -> TraceRecord:
        # Parse a single session into a TraceRecord
        ...

    def discover_sessions(self) -> list[SessionInfo]:
        # Find all Cursor sessions on the system
        ...
```

Register the parser in `src/opentraces/parsers/__init__.py`. See `claude_code.py` for the reference implementation.
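When triaging a proposal, the "What Counts as Breaking" table above can be encoded as a small helper. This is an illustrative sketch only; the change-type labels are hypothetical names, not schema or CLI identifiers.

```python
# Maps change types from the "What Counts as Breaking" table to semver bumps.
# The string keys are hypothetical labels chosen for this example.
BUMP_FOR_CHANGE = {
    "new_optional_field": "minor",
    "new_optional_model": "minor",
    "field_rename": "major",
    "field_removal": "major",
    "type_change": "major",
}


def required_bump(changes: list[str]) -> str:
    """Return the largest version bump a set of schema changes requires."""
    bumps = {BUMP_FOR_CHANGE[c] for c in changes}
    return "major" if "major" in bumps else "minor"


print(required_bump(["new_optional_field", "field_rename"]))  # -> major
```

A single major-level change (rename, removal, or type change) forces a major bump regardless of how many additive changes accompany it.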
## Review Process

- Schema changes are reviewed by the maintainers
- Breaking changes require a new rationale document
- All changes are documented in the [CHANGELOG](https://github.com/jayfarei/opentraces/blob/main/packages/opentraces-schema/CHANGELOG.md)