augur-sdk — normative spec¶
Status: Draft, v0.1. Audience: Implementers of the Augur SDK and anyone integrating a CUA with it. Purpose: Pin down the contract the SDK exposes — public surface, behavioural guarantees, and version boundaries — so coding agents, framework owners, and downstream tools can rely on it without reading the source.
This document uses RFC-2119 keywords (MUST, SHOULD, MAY).
The umbrella product spec (architecture, the full debugger surface, non-goals) is maintained alongside the hosted Augur service and is not publicly mirrored. This file is the SDK-only slice — the contract the public client exposes.
1. What the SDK is¶
A Python library that an authoring CUA imports to:
- Capture a per-run trace at a configurable level of detail.
- Redact sensitive material before it lands on disk or hits the network.
- Bundle captured records onto disk in a path-stable, schema-validated layout.
- Stream captured records to an Augur server (Sentry-style DSN), when configured. Streaming is observe-only on top of (1)–(3) — the local bundle is the canonical artefact; the network sink is an optimisation.
The SDK does NOT:
- Provide a UI (that's the Augur viewer, upstream).
- Provide a CLI (that's
augur-cli, upstream). - Make decisions for the CUA. The SDK never reads back from the server during a run to influence the agent's behaviour. (See §6.)
2. CUA contract preservation¶
The SDK MUST NOT promote DOM-derived coordinates to runtime inputs. Per
the umbrella spec §4, runtime action selection is screenshot-grounded.
Adapters MAY emit DOM/CDP records as provenance: "dom" for diagnostic
evidence, but the SDK will not interpret them as the agent's real
target. The diagnostic-rules engine (augur_sdk.diagnostics) flags any
rule that treats a non-screenshot provenance as a runtime input.
3. Public surface (the supported API)¶
These are the symbols the SDK considers public. Stable across patch versions; may break across minor versions while pre-1.0; strictly additive across minors post-1.0.
Top-level (augur_sdk)¶
from augur_sdk import (
DebugSession,
CaptureMode, resolve_capture_mode,
DefaultRedactionPolicy, RedactionPolicy, DEFAULT_POLICY_ID,
LocalFSStore, S3Store, Store,
Adapter,
ModelApiAdapterBase, # since 0.1.7
SUPPORTED_SCHEMA_RANGE, __version__,
)
DebugSession public methods:
attach_observation, record_step, record_event, set_status,
add_tag, set_capture_mode (since 0.1.3), append_log (since 0.1.3),
attach_verifier (since 0.1.5), set_score / set_costs /
set_step_costs / record_modelio (since 0.1.8), close.
Models (augur_sdk.models)¶
Every record type the SDK accepts or produces is a TypedDict:
DebugSession,StepTrace,Observation,DecisionEvent,ReplayFixture,DiagnosticFinding,BundleManifest,BundleTrace.
Enum types are Literals — implementers can use them at the boundary
without runtime imports.
Diagnostics (augur_sdk.diagnostics)¶
BundleContext,Rule,RuleResult,RulesEngineload_pack(name: str) -> list[Rule]rule(rule_id: str, *, severity: str = "medium")decoratorCUA_RULES— the generic rule pack
Streaming (augur_sdk.streaming)¶
DSN— parserStreamingSink— best-effort POSTs
Anything not listed here is private. Names starting with _ are
private regardless.
4. Behavioural guarantees¶
The SDK MUST behave as follows. These are the only invariants downstream consumers may rely on.
4.1 Bundle writes are atomic per file¶
Every file in a bundle is written via <file>.tmp + os.replace. A
reader observing the bundle mid-run never sees a partial file.
4.2 The bundle is always written¶
If capture_mode != "off", the bundle is written to out_dir even when:
- Streaming is disabled.
- The network is unreachable.
- The DSN's api_key is rejected.
- The Augur server returns 4xx/5xx.
The bundle is the source of truth; streaming is observe-only.
4.3 Streaming failures are never fatal¶
The SDK MUST NOT raise on network errors. The streaming sink runs on background threads and logs failures at DEBUG; never WARN or higher. This means an instrumented CUA cannot be made to crash by toggling a firewall rule.
4.4 Heartbeat cadence¶
When dsn is configured, the SDK MUST emit one initial
session_opened heartbeat as soon as the StreamingSink is constructed
(i.e. on DebugSession(dsn=…)), so the server's connection list shows
the client before any step has been recorded. While the session is
open, the SDK then heartbeats at intervals of 15 seconds ± 1 s.
Operators set the server's heartbeat window via
AUGUR_HEARTBEAT_WINDOW_S (default 60 s).
4.5 No reads from the server during a run¶
The SDK MUST NOT issue HTTP GET against the Augur server while a
session is open. Live attach (Phase 4) introduces SSE/WS reads in a
viewer-side path; the SDK side stays write-only.
4.6 Redaction runs before bytes leave the SDK¶
Both local bundle writes and streaming POSTs run through the active
RedactionPolicy before serialization. There is no opt-out path.
4.7 Schema validation on close¶
session.close() MUST validate every record against the canonical
schemas under augur_sdk._schema/json/. Validation failure raises;
this is the one place the SDK is allowed to be loud (the alternative
is to ship a broken bundle).
4.8 Path stability¶
Every file in a bundle is at a path predictable from the step index (or constant for top-level files). See [docs/concepts/bundle-layout.md].
4.9 No side-effecting imports¶
import augur_sdk MUST NOT open files, sockets, or threads. Side
effects begin at DebugSession(...) construction (network heartbeat
and thread spawn, when dsn is set) and at DebugSession.__enter__
(bundle directory creation, periodic heartbeat loop).
4.10 Thread safety¶
DebugSession.record_step, record_event, and attach_observation
MUST be safe to call from multiple threads on the same session. The SDK
uses an internal threading.Lock around the EventRecorder.
4.11 Mid-run capture-mode override (since 0.1.3)¶
DebugSession.set_capture_mode(mode) MUST NOT mutate the manifest's
capture_mode. Instead, every subsequent record_step gets an
explicit capture_mode field stamped on the StepTrace, per the
optional field on step_trace.schema.json. A capture_mode already
present on the StepTrace passed by the caller MUST NOT be overwritten.
The override persists for the rest of the session unless changed again
by another set_capture_mode call.
4.12 append_log is streaming-only (since 0.1.3)¶
DebugSession.append_log(text, …) MUST be a no-op when streaming is
disabled — the bundle on disk is the source of truth for local logs,
and producers that want them write to logs/ directly via the
configured Store. With streaming enabled, the SDK POSTs to
/api/v1/runs/<run_id>/logs with a JSON body; the server routes the
chunk to logs/<name>.log (or logs/step-<idx>.log when
step_index is set).
4.13 attach_verifier post-hoc verdict (since 0.1.5)¶
DebugSession.attach_verifier(step_index, status=…, …) MUST replace
the target step's verdict field in-place. The previous verdict is
discarded — precedence is documented as
native verdict > attach_verifier > inferred default, i.e. an
external harness opting in to attach_verifier is asserting authority
over the producer's classification. When streaming is enabled, the
SDK MUST echo the patched step through the live sink (so viewers see
the updated verdict without waiting for close()); when streaming is
off, the patched verdict is written to disk on close(). Raises
ValueError if no step exists at step_index.
4.14 Training-data substrate schemas (since 0.1.6)¶
The vendored JSON Schemas gain four additive surfaces so a single bundle can serve both debugging and SFT/DPO/RL training pipelines:
modelio.schema.json(new short namemodelio): canonical one-model-call record (request,response,usage,prompt_hash, …). Producers MUST write these under<bundle>/modelio/<step_index:04d>-<layer>-<seq>.jsonwhencapture_modeismodel_ioorfull; lower modes MAY omit the directory entirely.step_trace.costs+step_trace.latency(optional): per-step token counts, USD breakdown, and per-layer wall-clock.debug_session.costs(optional): session-level rollup of the above.verdict.{score, score_components, comparator}(optional): continuous reward signal (0..1) alongside the categoricalstatus. Default mapping whenscoreis absent:passed → 1.0,recoverable → 0.5,failed|skipped|unknown → 0.0.preference.schema.json(new short namepreference): DPO / RLHF preference record stored at<bundle>/preferences/<step_index:04d>.json. Decoupled from the immutable trace so raters MAY add comparisons after the run completes; the schema requirespreferred_action+alternatives[](each with anactionand optionalreward_estimate) and accepts acomparatorofverifier | model-judge | human-rater | replay-diff.prior_steps.schema.json(new short nameprior_steps, since 0.1.9): sibling ofreplay_fixturewritten to<bundle>/replay/<step_index:04d>.prior.jsonwhen a fixture'sprior_stepsis non-null. Compact list of the steps that preceded the fixture's target — intentionally a thin subset ofstep_trace(action + verdict + intent only) so the replay agent sees the same context the original run had without inheriting observations/costs. Resolves a gap surfaced during the upstream augur#55 v1.0 freeze review.
Every field is additive and optional; all 0.1.x bundles produced by
prior SDK versions continue to validate. Producer-side helpers
(attach_costs, attach_score, record_modelio) ship in 0.2.0;
until then, adapters MAY populate the new fields directly on the
StepTrace dict they pass to record_step().
4.16 Training-data producer-side helpers (since 0.1.8)¶
The 0.1.6 schema substrate (modelio, costs, verdict.score) gains producer-side SDK helpers so adapters don't have to mutate session internals to emit the new fields:
DebugSession.set_costs(...)andset_step_costs(step_index, ...)MUST be merge-on-call (later kwargs win; unset dimensions are preserved).set_costssurfaces on both the session record and the manifest (manifest.json#/costs);set_step_costspatches the step'scostsobject on disk.DebugSession.set_score(step_index, score, *, comparator, components)MUST merge into the existing verdict (the categoricalstatusis preserved);scoreMUST be clamped to[0.0, 1.0];comparator, when set, MUST be one ofverifier | model-judge | exact-match | human. Default status→score mapping for absent score remains the consumer's responsibility (passed→1.0, recoverable→0.5, failed/skipped/ unknown→0.0).DebugSession.record_modelio(record, *, step_index, layer, validate=True)MUST validaterecordagainstmodelio.schema.json(Draft 2020-12) unlessvalidate=False, apply the session'sRedactionPolicybefore persistence (stampingredaction_applied: true), and be idempotent on the record'sprompt_hash— repeat calls with the same hash MUST return the existing bundle-relative path without staging a duplicate file. Path convention:modelio/<step_index:04d>-<layer>-<seq>.jsonfor step-scoped calls,modelio/run-<layer>-<seq>.jsonfor run-scoped (step_index=None);<seq>is a monotonic per-(step, layer) counter starting at 0.
Step → modelio linkage is path-based, not field-based:
step_trace.schema.json carries no modelio_refs field. Consumers
locate the model calls for step N by globbing modelio/N*.json.
4.15 ModelApiAdapterBase scaffolding (since 0.1.7)¶
augur_sdk.ModelApiAdapterBase is shared scaffolding for adapters
whose native format is a message log (OpenAI Responses, Anthropic
Messages, similar). The base provides
bundle_from_input(input_path, output_dir, *, screens_dir=None, run_id=None)
— the canonical entry point the augur.adapters group looks for —
and resolves it through two subclass hooks:
iter_tool_calls(messages) -> Iterator[(turn_idx, tc_idx, action_dict)](MUST override): walks the message log in dispatch order and yields one canonical-shaped action per tool call.load_messages(path) -> (messages, metadata)(default reads a single JSON file with{"messages": [...], "metadata": {...}}): override for vendor-specific layouts (jsonl, split metadata, etc.).
Sidecar screenshots are looked up via find_screenshot(screens_dir,
step_index, kind); the default template is
{step_index:04d}_{kind}.png. Steps the base writes default to
status="succeeded" with verdict={"status": "unknown", "reason":
"no native verifier"} — external harnesses MAY refine this via
DebugSession.attach_verifier (§4.13).
The base MUST produce a bundle that passes validate_bundle() for
every conforming subclass. Reference subclasses ship in separate
user-org packages (augur-adapter-openai-cua,
augur-adapter-anthropic-cua) to keep this SDK vendor-free.
5. Version policy¶
- The SDK follows semver per
MAJOR.MINOR.PATCH. - Pre-1.0: minors MAY break public surface but MUST ship a migration
note in
CHANGELOG.md. - Post-1.0: minors are strictly additive; only majors break.
__version__andSUPPORTED_SCHEMA_RANGEare introspectable at runtime.- The bundled JSON Schemas under
augur_sdk._schema/json/track the upstream schemas at the version pinned inSUPPORTED_SCHEMA_RANGE.
6. Non-goals (don't ask the SDK for these)¶
The SDK does not:
- Render the UI (use the Augur viewer, upstream).
- Provide a CLI (use
augur-cli, upstream). - Implement an HTTP server (use
augur-server, upstream). - Manage tenants, users, or DSNs (those are operator concerns; mint
DSNs via
augur admin dsn-issueupstream). - Run a coding agent (use Claude Code / Cursor; this SDK produces the bundle they read).
- Schedule, retry, or replay runs (those are framework concerns).
- Validate adapter code (see the contract test suite upstream).
7. Status¶
- Schema version:
0.1 - SDK version:
0.1.9 - Supported Python: 3.11, 3.12, 3.13
- Runtime deps:
jsonschema,referencing,urllib3
Anything beyond 0.1.x may break this spec; check CHANGELOG.md.