Reference

| Page | What it covers |
| --- | --- |
| HTTP API | Every endpoint, request body, response shape, and error code |
| Architecture | Internal architecture: brains, runner, env, gym, verification |
| CUA models | Available `cua_model` backends (Holo3, Fara, EvoCUA, OpenCUA, Gemma4-CUA, Claude), with action-space caveats |
| Environment variables | Server-side env knobs (caps, paths, log format, model routing) |
| Glossary | Quick definitions of project-specific terms |
| Predicate grammar | World-model verification predicates emitted by brains and evaluated by the runner |
| Done-acceptance gate | Deterministic predicates the runner applies before accepting `done(success=True)` |
| Form controller | Single object owning runtime form-filling state: pending values, used regions, submit latch, director hooks |
| Adaptive settle | Replaces fixed `time.sleep(settle_time)` with frame-stability / network-idle gates |
| Chrome session reuse | Container-scoped cache that reuses live Xvfb + Chrome across `/v1/cua` requests |
| Speculative inference | Overlaps `brain.think()` with the post-action settle to remove the serial inference cost |
| Perceptual diff verifier | Detects silent failures on high-risk actions by comparing pre/post frame hashes |
| Loop recovery policy | Forces action-class transitions (Tab / Return / Type) when the brain loops on a no-effect class |
| Step recovery | Multi-layer recovery chain when a required step exhausts retries: handler escalation → `intent_rewriter` (Opus) → `agentic_recovery` (Haiku/Opus, four modes) |
| Adaptive loop windows | Per-call adjustment of the soft/hard loop-detection windows based on recent action diversity and state progress |
| Adaptive click tolerance | Drift-loop tolerance scaled by screen DPI and per-action element class (button / link / dropdown) |
| Ablation harness | Single-deploy paired ON/OFF A/B against `/v1/cua`; required for quality-touching PRs |
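
As a rough illustration of the frame-stability idea behind the Adaptive settle entry, the sketch below polls frames until several consecutive captures hash identically, instead of sleeping for a fixed interval. It is not taken from the project's code: the function name `wait_until_stable`, the `grab_frame` callback, and all parameter names and defaults are hypothetical, and the real gate may also consult network-idle signals.

```python
import hashlib
import time
from typing import Callable, Optional


def wait_until_stable(
    grab_frame: Callable[[], bytes],  # hypothetical: returns raw bytes of the current frame
    *,
    stable_frames: int = 3,           # assumed default: identical frames required in a row
    poll_interval: float = 0.05,      # assumed default: seconds between captures
    timeout: float = 2.0,             # assumed default: upper bound on the wait
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once `stable_frames` consecutive frames hash identically.

    Returns False if the deadline passes first, so the caller can decide
    whether to proceed anyway or treat the page as still changing.
    """
    deadline = clock() + timeout
    last_hash: Optional[str] = None
    streak = 0
    while True:
        digest = hashlib.sha256(grab_frame()).hexdigest()
        if digest == last_hash:
            streak += 1
        else:
            # The frame changed: restart the streak from this capture.
            last_hash = digest
            streak = 1
        if streak >= stable_frames:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

An exact hash only matches byte-identical frames; a caller wanting tolerance to cursor blink or minor animation would presumably swap in a perceptual hash, which this sketch does not attempt.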