
# Consume

How you load traces depends on what you're building.

## Agents

hf-mount exposes any HuggingFace dataset as a virtual filesystem. The dataset appears as a directory of JSONL files — no library required, no full download. An agent can ls, grep, and read individual files the same way it would explore any local directory, which makes it well-suited for discovery: browsing shards, sampling traces, or writing code against the data without knowing its structure upfront.

Install:

```bash
curl -fsSL https://raw.githubusercontent.com/huggingface/hf-mount/main/install.sh | sh
```

Mount and explore:

```bash
hf-mount start repo datasets/your-org/agent-traces /mnt/traces
ls /mnt/traces/data/
# traces_20240101_abc123.jsonl  traces_20240102_def456.jsonl  ...
```

Once mounted, read a single record to understand the schema:

```bash
head -n 1 /mnt/traces/data/traces_20240101_abc123.jsonl | python3 -m json.tool | head -40
```

This returns a TraceRecord. A representative subset of its fields looks like this:

```json
{
  "schema_version": "0.2.0",
  "trace_id": "tr_01abc...",
  "session_id": "sess_xyz...",
  "execution_context": "devtime",
  "agent": { "name": "claude-code", "model": "anthropic/claude-sonnet-4-20250514" },
  "task": { "description": "Fix failing tests in auth module", "repository": "org/repo" },
  "outcome": { "success": true, "committed": true, "commit_sha": "a1b2c3d" },
  "metrics": { "total_steps": 14, "total_input_tokens": 48200, "estimated_cost_usd": 0.031 },
  "steps": [ "..." ]
}
```

Stream shards line by line — don't slurp whole files into memory:

```python
import json
import pathlib

for path in pathlib.Path("/mnt/traces/data").glob("traces_*.jsonl"):
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            outcome = record.get("outcome") or {}
            if outcome.get("success") and record.get("execution_context") == "devtime":
                print(record["trace_id"], record["metrics"]["total_steps"])
```
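The same pattern extends to aggregation. A minimal sketch that tallies a per-model success rate across all shards (field names follow the sample record shown earlier; the helper name `success_rate_by_model` is illustrative, not part of any library):

```python
import json
import pathlib
from collections import Counter

def success_rate_by_model(data_dir: str) -> dict:
    """Tally success rate per agent model across every traces_*.jsonl shard."""
    totals, wins = Counter(), Counter()
    for path in pathlib.Path(data_dir).glob("traces_*.jsonl"):
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                record = json.loads(line)
                model = (record.get("agent") or {}).get("model", "unknown")
                totals[model] += 1
                if (record.get("outcome") or {}).get("success"):
                    wins[model] += 1
    return {model: wins[model] / totals[model] for model in totals}

# e.g. rates = success_rate_by_model("/mnt/traces/data")
```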

For private or gated datasets, authenticate first:

```bash
huggingface-cli login
```

Unmount when done:

```bash
hf-mount stop /mnt/traces
```

## Developers and ML teams

Use the HuggingFace `datasets` library for structured access from pandas, PyTorch, or streaming pipelines.

=== "pandas"

    ```python
    from datasets import load_dataset

    ds = load_dataset("your-org/agent-traces")
    df = ds["train"].to_pandas()

    # Filter to successful devtime traces; outcome is a dict column, so guard for nulls
    good = df[
        (df["execution_context"] == "devtime")
        & df["outcome"].apply(lambda o: bool(o) and bool(o.get("success")))
    ].copy()
    ```

=== "PyTorch"

    ```python
    from datasets import load_dataset

    ds = load_dataset("your-org/agent-traces")
    # Note: nested fields like steps and outcome are not tensors.
    # Extract the scalar signals you need before formatting.
    flat = ds["train"].map(
        lambda x: {"success": bool((x["outcome"] or {}).get("success", False))}
    )
    flat = flat.with_format("torch", columns=["success"])
    ```

=== "Streaming"

    ```python
    from datasets import load_dataset

    ds = load_dataset("your-org/agent-traces", streaming=True)
    for trace in ds["train"]:
        print(trace["trace_id"], trace["metrics"]["total_steps"])
    ```

## Choosing an approach

Use hf-mount for free-form exploration or when the consumer reads files with standard tool calls. Use the datasets library for notebooks or training pipelines.

## Schema reference

Each JSONL line is a TraceRecord. See the schema overview for field definitions, and outcome & attribution for RL reward signals.