Security Tools

Security in opentraces is a flat registry of optional tools. No tool is enabled by default for per-record sanitization. A bucket flow or workflow opts in to the tools it wants, either explicitly with --tools or by enabling tools in config and running with --use-config.

Current pipeline version: SECURITY_VERSION = 0.6.0.

opentraces security tools list
opentraces security tools info regex --json
opentraces doctor --security

Registered Tools

| Tool | Kind | Default | Notes |
| --- | --- | --- | --- |
| regex | detector | off | Built-in token/key patterns |
| entropy | detector | off | High-entropy secret-like strings |
| trufflehog | detector | off | Optional deep secret detector, configured by opentraces setup trufflehog |
| privacy_filter | detector | off | Optional openai/privacy-filter NER detector, configured by opentraces setup privacy-filter |
| llm_pii | detector | off | Advanced per-field LLM PII detector; configure security.llm_pii directly before enabling |
| business_logic | detector | off | Promotes internal-hostname / internal-url / db-connection-string / aws-account-id heuristics to redactable spans; always-on inside the capsule REDACTION_FLOOR |
| path_anonymizer | transformer | off | Rewrites usernames in filesystem paths |
| capsule_scope | transformer | off | Applies field-path exclusion of prompt-bearing fields (the "this never leaves" guarantee); default-disabled in normal ingest |
| classifier | judge | off | Heuristic sensitivity verdict without mutating the record |

Canonical order is deterministic:

regex -> entropy -> trufflehog -> privacy_filter -> llm_pii -> business_logic -> path_anonymizer -> capsule_scope -> classifier

Explicit --tools requests are sorted into that order so a judge never sees pre-redaction text because of caller ordering.
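The reordering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the opentraces implementation: the canonicalize helper is hypothetical, and both the deduplication and the rejection of unknown names are assumptions rather than documented behavior.

```python
# Canonical tool order, copied from the list in this document.
CANONICAL_ORDER = [
    "regex", "entropy", "trufflehog", "privacy_filter", "llm_pii",
    "business_logic", "path_anonymizer", "capsule_scope", "classifier",
]


def canonicalize(requested):
    """Hypothetical helper: return the requested tools in canonical order.

    Assumptions (not confirmed by the source): duplicates are collapsed
    and unknown tool names raise an error.
    """
    unknown = sorted(set(requested) - set(CANONICAL_ORDER))
    if unknown:
        raise ValueError(f"unknown tools: {', '.join(unknown)}")
    wanted = set(requested)
    # Walk the canonical list so caller ordering never affects the result.
    return [name for name in CANONICAL_ORDER if name in wanted]


print(canonicalize(["classifier", "path_anonymizer", "regex"]))
# ['regex', 'path_anonymizer', 'classifier']
```

Because the result is derived by walking the canonical list, a judge such as classifier always lands after the detectors and transformers, whatever order the caller typed.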

Running Sanitization

security sanitize reads JSON from stdin and requires a selection mode, either --tools or --use-config:

printf '%s\n' '{"text":"OPENAI_API_KEY=sk-demo"}' \
  | opentraces security sanitize --tools regex

printf '%s\n' '{"row":{"path":"/Users/alice/project"}}' \
  | opentraces security sanitize --tools path_anonymizer

printf '%s\n' '{"record":{...}}' \
  | opentraces security sanitize --use-config

Payload shapes:

| Shape | Behavior |
| --- | --- |
| {"text": "..."} | Runs detector tools over one string and returns sanitized text + findings |
| {"row": {...}} | Runs detector tools over string leaves in a JSON row |
| {"record": {...TraceRecord...}} | Runs the full record path and stamps metadata |
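To make "string leaves in a JSON row" concrete, here is a minimal sketch of a leaf walk. It is not taken from opentraces: map_string_leaves and toy_redact are hypothetical names, the [REDACTED] placeholder is an assumption, and the toy detector only stands in for a real registered tool.

```python
def map_string_leaves(value, fn):
    """Return a copy of a JSON-like value with fn applied to every string leaf."""
    if isinstance(value, str):
        return fn(value)
    if isinstance(value, dict):
        # Keys are left alone; only the values are walked.
        return {key: map_string_leaves(item, fn) for key, item in value.items()}
    if isinstance(value, list):
        return [map_string_leaves(item, fn) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


def toy_redact(text):
    # Stand-in detector for illustration only, not one of the registered tools.
    return text.replace("sk-demo", "[REDACTED]")


row = {"path": "/Users/alice/project", "env": ["OPENAI_API_KEY=sk-demo", 3]}
print(map_string_leaves(row, toy_redact))
# {'path': '/Users/alice/project', 'env': ['OPENAI_API_KEY=[REDACTED]', 3]}
```

The walk returns a new structure instead of mutating the input, which keeps the original row available for comparison against the sanitized one.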

Metadata Contract

Sanitized records get a compact tool report under metadata.security:

{
  "metadata": {
    "security": {
      "tools_applied": ["regex", "entropy"],
      "tools": {
        "regex": { "findings": 1 },
        "entropy": { "findings": 0 }
      }
    }
  }
}

If no tools run, tools_applied is [] and stale per-tool metadata is cleared.
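The contract above can be sketched as a small stamping function. The helper name stamp_security_metadata and its input shape are hypothetical, and leaving an empty tools mapping when nothing runs is an assumption; the source only states that tools_applied is [] and that stale per-tool metadata is cleared.

```python
def stamp_security_metadata(record, findings_by_tool):
    """Hypothetical helper: write a compact tool report under metadata.security.

    findings_by_tool maps tool name -> number of findings, in the order
    the tools ran.
    """
    metadata = record.setdefault("metadata", {})
    # Replacing the whole block drops stale per-tool entries from earlier runs.
    metadata["security"] = {
        "tools_applied": list(findings_by_tool),
        "tools": {
            name: {"findings": count} for name, count in findings_by_tool.items()
        },
    }
    return record


# A record that still carries a report from an earlier run.
record = {
    "metadata": {
        "security": {"tools_applied": ["regex"], "tools": {"regex": {"findings": 2}}}
    }
}
stamp_security_metadata(record, {})
print(record["metadata"]["security"])
# {'tools_applied': [], 'tools': {}}
```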

Setup Commands

opentraces setup trufflehog
opentraces setup trufflehog --enable
opentraces setup trufflehog --disable
opentraces setup privacy-filter --enable --install-deps
opentraces setup privacy-filter --disable
opentraces setup llm-review
opentraces setup llm-review --disable

llm-review is intentionally separate from the inline tool registry. It is a session or row reviewer used by dataset publication gates, not a per-record sanitize tool.

Bucket Security Policies

bucket security is a scoped front-end that bundles the registry tools above into a few named policies for the private bucket. security tools list|info and security sanitize stay the generic registry surface; bucket security just applies a named bundle of the same cfg.security.<tool>.enabled flags.

opentraces bucket security status
opentraces bucket security policy --policy recommended
opentraces bucket security policy --tool regex --enable
opentraces bucket security policy --tool entropy --disable

| Policy | Tools |
| --- | --- |
| off | (nothing) |
| basic | regex, entropy |
| recommended | regex, entropy, business_logic, path_anonymizer, classifier |
| strict | regex, entropy, trufflehog, privacy_filter, business_logic, path_anonymizer, classifier |

bucket security status inspects the active policy and tools. bucket security policy --policy accepts only off|basic|recommended|strict. This bucket policy vocabulary is unrelated to the dataset run --privacy-tier off|low|medium|high field.
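As a rough model of how a named policy could expand into per-tool enabled flags, consider the sketch below. The bundles are copied from the policy table in this section; the enabled_flags helper is hypothetical, and treating every tool outside the bundle as disabled is an assumption, since the source does not say what a policy does to tools it leaves out.

```python
ALL_TOOLS = [
    "regex", "entropy", "trufflehog", "privacy_filter", "llm_pii",
    "business_logic", "path_anonymizer", "capsule_scope", "classifier",
]

# Bundles as listed in the policy table of this document.
POLICY_TOOLS = {
    "off": [],
    "basic": ["regex", "entropy"],
    "recommended": [
        "regex", "entropy", "business_logic", "path_anonymizer", "classifier",
    ],
    "strict": [
        "regex", "entropy", "trufflehog", "privacy_filter",
        "business_logic", "path_anonymizer", "classifier",
    ],
}


def enabled_flags(policy):
    """Hypothetical helper: map a policy name to {tool: enabled} flags.

    Assumption: tools outside the bundle are reported as disabled.
    """
    if policy not in POLICY_TOOLS:
        raise ValueError(f"policy must be one of: {'|'.join(POLICY_TOOLS)}")
    bundle = set(POLICY_TOOLS[policy])
    return {tool: tool in bundle for tool in ALL_TOOLS}


flags = enabled_flags("recommended")
print([tool for tool, on in flags.items() if on])
# ['regex', 'entropy', 'business_logic', 'path_anonymizer', 'classifier']
```

In this model llm_pii and capsule_scope are never switched on by a policy, which matches the table: neither tool appears in any of the four bundles.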

Review And Publication

Dataset review remains explicit:

opentraces dataset review my-dataset --json
opentraces dataset review approve my-dataset <row-id>
opentraces dataset publish my-dataset --check-only
opentraces dataset publish my-dataset

Rows that have not been approved are not published. Workflows that require additional privacy checks should run security sanitize or configure llm-review before approval.