Compute Client: unified contract across both planes
Status: Foundation (PR 1 of #785). PR 2 wires the toggle through the executor; PRs 3-4 add the extension surface on Browser-Use Plane.
Owners: TBD
Tracks issue: #785 (browser-use epic)
Summary
Mantis runs two compute planes:
- Computer Plane (#696): Xvfb + Chrome + xdotool. CUA-pure: screenshot + key/mouse only. Stealth-capable.
- Browser-Use Plane (#785): Chrome under Playwright/CDP-native control. DOM-aware: state.*, tabs.*, links.* extensions.
Both planes implement the same ComputeClient base contract. Plane selection is a runtime flag (compute_backend), not a fork in the brain plane or handler set. DOM verbs are capability-gated extensions advertised at session_init and enforced via a per-executor CapabilityAllowlist.
This document is the umbrella spec. Plane-specific docs:
- docs/reference/computer-plane.md: Computer Plane (existing).
- docs/reference/browser-use-plane.md: Browser-Use Plane (lands with PR 2).
Goals
- Single contract. A plan + handler set drives either plane. Switching is compute_backend: computer_plane | browser_use_plane at submit time.
- Capability gating. DOM-aware extensions are advertised at session_init and only consumed by handlers whose executor allowlist permits them. Pure-CUA executors fail loud if they consume an extension; quiet degradation is explicitly avoided (feedback_cua_no_dom_access.md).
- Uniform profile + proxy contracts. Same (tenant_id, profile_id) identity on both planes. Same ProxyConfig accepted at session_init. Storage is per-plane.
Non-goals
- Cross-plane profile handoff. v1 ships per-plane profiles; cross-plane handoff (package → release → mount) is a deferred follow-up gated on real demand (#785).
- Stealth parity across planes. Browser-Use Plane on CF-protected sites is not a v1 requirement; pure-CUA on Computer Plane stays the path for stealth-sensitive harvesting.
- Shared persistent volume across planes. Forces same-region/provider; ruled out unless mid-plan plane-switching becomes a hot path.
Two-plane layout

                   ┌─────────────────────────┐
                   │       Brain Plane       │
                   │ (executors + handlers)  │
                   └────────┬────────────────┘
                            │  one ComputeClient contract
               ┌────────────┴────────────┐
               │                         │
               ▼                         ▼
    ┌──────────────────────┐  ┌──────────────────────┐
    │  Computer Plane      │  │  Browser-Use Plane   │
    │  (CUA-pure)          │  │  (DOM-aware)         │
    │                      │  │                      │
    │  Xvfb + Chrome +     │  │  Playwright / CDP-   │
    │  xdotool             │  │  native Chrome       │
    │                      │  │                      │
    │  capabilities:       │  │  capabilities:       │
    │    dom_aware=False   │  │    dom_aware=True    │
    │    stealth=True      │  │    stealth=False(v1) │
    └──────────────────────┘  └──────────────────────┘

Base surface (both planes implement)
| Method | Notes |
|---|---|
| session_init(profile_id, proxy_config) → (token, Capabilities) | Bind tenant + profile + run. Advertises capabilities. |
| session_close(token) | Tear down. |
| screenshot() → (png, viewport) | PNG + scroll_y + captured_at_ms. |
| dispatch(action) | Uniform action verb: click(x,y) / key / type / scroll. Each plane translates to its native primitive (xdotool on Computer Plane, Playwright page.mouse/keyboard on Browser-Use Plane). |
| health() | Liveness + last-action timestamp. |
The base surface is enforced by the umbrella. Adding a Computer-Plane-only verb that isn't a no-op stub on Browser-Use Plane breaks the toggle promise — file a separate issue if a divergence is genuinely needed.
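
One way to picture the base surface is as a `typing.Protocol` that both plane clients satisfy. The sketch below is illustrative only: `Action`, `FakePlaneClient`, `run_step`, and the exact signatures and return shapes are assumptions made for the example, not the types defined in the contract module.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class Action:
    """Hypothetical uniform action; kind is click / key / type / scroll."""
    kind: str
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


@runtime_checkable
class ComputeClient(Protocol):
    """Assumed shape of the five base methods; real signatures may differ."""
    def session_init(self, profile_id: str, proxy_config: Optional[dict]) -> tuple: ...
    def session_close(self, token: str) -> None: ...
    def screenshot(self) -> tuple: ...
    def dispatch(self, action: Action) -> None: ...
    def health(self) -> dict: ...


class FakePlaneClient:
    """In-memory stand-in that records actions instead of driving a browser."""

    def __init__(self) -> None:
        self.actions: list = []

    def session_init(self, profile_id, proxy_config):
        return "token-1", {"dom_aware": False, "stealth": True}

    def session_close(self, token):
        self.actions.clear()

    def screenshot(self):
        return b"", {"scroll_y": 0, "captured_at_ms": 0}

    def dispatch(self, action):
        # A real plane would translate here (xdotool or Playwright).
        self.actions.append(action)

    def health(self):
        return {"alive": True, "last_action_ms": None}


def run_step(client: ComputeClient, action: Action) -> None:
    """Handler code sees only the base contract, never the concrete plane."""
    client.dispatch(action)


client = FakePlaneClient()
run_step(client, Action(kind="click", x=10, y=20))
```

Because `run_step` depends only on the protocol, swapping the fake for either plane's client would not change handler code, which is the toggle promise described above.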
Extension surface (Browser-Use Plane only, capability-gated)
| Verb | Issue | Capability |
|---|---|---|
| state.current_url(), state.tabs(), state.focused_element(), state.clipboard(), state.page_load() | #778 | dom_aware |
| tabs.open_in_new(), tabs.close(), tabs.activate() | #779 | dom_aware |
| links.peek_target(selector) | #780 | dom_aware |
Computer Plane's client refuses these methods at the contract level. Handlers gate on isinstance(client, SupportsBrowserState) AND allowlist.enforce("dom_aware") BEFORE each call.
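
The double gate can be sketched as follows. This is a minimal illustration under stated assumptions: `SupportsBrowserState` is modelled as a runtime-checkable protocol with a single `current_url()` method, and `StubAllowlist`, `StubBrowserClient`, `StubCuaClient`, and `read_url` are hypothetical names invented for the example.

```python
from typing import Protocol, runtime_checkable


class CapabilityNotAllowed(Exception):
    """Stand-in for the contract's exception of the same name."""


@runtime_checkable
class SupportsBrowserState(Protocol):
    """Assumed minimal shape: only current_url() is modelled here."""
    def current_url(self) -> str: ...


class StubAllowlist:
    """Hypothetical stand-in for CapabilityAllowlist."""

    def __init__(self, allowed: frozenset) -> None:
        self._allowed = allowed

    def enforce(self, capability: str) -> None:
        if capability not in self._allowed:
            raise CapabilityNotAllowed(capability)


class StubBrowserClient:
    def current_url(self) -> str:
        return "https://example.com/"


class StubCuaClient:
    """Exposes no DOM verbs, so the isinstance gate rejects it."""


def read_url(client, allowlist) -> str:
    # Gate 1: the client must structurally implement the extension.
    if not isinstance(client, SupportsBrowserState):
        raise CapabilityNotAllowed("client does not implement state.*")
    # Gate 2: the executor must be allowed to consume it.
    allowlist.enforce("dom_aware")
    return client.current_url()
```

Both checks run before the extension call, so a pure-CUA executor fails even when it happens to be handed a DOM-capable client.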
Capabilities
Capabilities (src/mantis_agent/gym/compute_contract.py) is a frozen dataclass advertised at session_init:

    @dataclass(frozen=True)
    class Capabilities:
        dom_aware: bool = False
        stealth: bool = True
        supports_cdp: bool = False
        backend: ComputeBackend = ComputeBackend.COMPUTER_PLANE

Computer Plane returns Capabilities.for_computer_plane(enable_cdp=...). Browser-Use Plane returns Capabilities.for_browser_use_plane(). On the wire, Capabilities is serialized as a dict in SessionInitResponse.capabilities (Phase-0/Phase-1 servers that don't populate the field are interpreted as Computer-Plane CUA-pure via SessionInitResponse.resolved_capabilities()).
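
The backward-compatible fallback could look roughly like this. It is a sketch, not the module's code: the enum's string values, the `token` field, and the choice to ignore unknown wire keys are assumptions for illustration.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class ComputeBackend(str, Enum):
    # String values are assumed to match the compute_backend flag.
    COMPUTER_PLANE = "computer_plane"
    BROWSER_USE_PLANE = "browser_use_plane"


@dataclass(frozen=True)
class Capabilities:
    dom_aware: bool = False
    stealth: bool = True
    supports_cdp: bool = False
    backend: ComputeBackend = ComputeBackend.COMPUTER_PLANE


@dataclass
class SessionInitResponse:
    token: str
    capabilities: Optional[dict] = None  # absent on Phase-0/Phase-1 servers

    def resolved_capabilities(self) -> Capabilities:
        # Older servers send nothing: treat them as Computer-Plane CUA-pure.
        if not self.capabilities:
            return Capabilities()
        # Assumption: unknown keys are dropped rather than rejected.
        known = {f.name for f in fields(Capabilities)}
        raw = {k: v for k, v in self.capabilities.items() if k in known}
        if "backend" in raw:
            raw["backend"] = ComputeBackend(raw["backend"])
        return Capabilities(**raw)


legacy = SessionInitResponse(token="t0").resolved_capabilities()
modern = SessionInitResponse(
    token="t1",
    capabilities={"dom_aware": True, "stealth": False, "backend": "browser_use_plane"},
).resolved_capabilities()
```

Defaulting to the dataclass defaults keeps the field additive: a client built against this contract can talk to a server that predates it without special-casing.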
CapabilityAllowlist
CapabilityAllowlist is the enforcement seam. It is per-executor, configured at startup, and immutable for the lifetime of a run.

    # Pure-CUA executor: NO DOM-aware extensions.
    allowlist = CapabilityAllowlist.pure_cua(executor="run_holo3")

    # Browser-use executor: DOM-aware extensions OK.
    allowlist = CapabilityAllowlist.browser_use(executor="run_browser_use_claude")

    # Inside a handler:
    allowlist.enforce("dom_aware")  # raises CapabilityNotAllowed for pure-CUA executors

Consuming a non-allowed capability raises CapabilityNotAllowed. Fail-loud is deliberate — quiet degradation is what feedback_cua_no_dom_access warns against.
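
A minimal sketch of how such an allowlist might be built, assuming a frozen dataclass holding a `frozenset` of capability names. The `allowed` field and the error message are illustrative assumptions; only the factory names, `enforce`, and `CapabilityNotAllowed` come from the text above.

```python
from dataclasses import dataclass, FrozenInstanceError


class CapabilityNotAllowed(Exception):
    """Raised when an executor consumes a capability it is not allowed."""


@dataclass(frozen=True)
class CapabilityAllowlist:
    executor: str
    allowed: frozenset = frozenset()  # assumed internal representation

    @classmethod
    def pure_cua(cls, executor: str) -> "CapabilityAllowlist":
        return cls(executor=executor, allowed=frozenset())

    @classmethod
    def browser_use(cls, executor: str) -> "CapabilityAllowlist":
        return cls(executor=executor, allowed=frozenset({"dom_aware"}))

    def enforce(self, capability: str) -> None:
        # Fail loud: no silent fallback to a degraded code path.
        if capability not in self.allowed:
            raise CapabilityNotAllowed(
                f"executor {self.executor!r} may not consume {capability!r}"
            )


cua = CapabilityAllowlist.pure_cua(executor="run_holo3")
browser = CapabilityAllowlist.browser_use(executor="run_browser_use_claude")
browser.enforce("dom_aware")  # passes silently
```

Freezing the dataclass is one way to get the "immutable for the lifetime of a run" property: reassigning `allowed` after construction raises `FrozenInstanceError`.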
compute_backend selection
A plan declares which plane it runs on under its runtime block:

    runtime:
      compute_backend: computer_plane        # default: Xvfb + Chrome + xdotool, CUA-pure
      # or
      # compute_backend: browser_use_plane   # Playwright/CDP-native, DOM-aware extensions

The same field is also accepted as a submission-time argument (HTTP body / CLI flag). Precedence — highest wins:
- Plan runtime.compute_backend (the plan author's choice, most local; HN URL-harvest plans set browser_use_plane).
- Submission-time compute_backend (operator override, e.g. running an existing plan on the other plane for A/B).
- Global default: computer_plane (Xvfb + xdotool stays the path for stealth-sensitive harvesting; pure-CUA is the safe default).
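
The precedence order above can be sketched as a small resolver. The function name `resolve_compute_backend` and its parameters are hypothetical; the actual wiring lands in PR 2 and may look different.

```python
from typing import Optional

VALID_BACKENDS = ("computer_plane", "browser_use_plane")
DEFAULT_BACKEND = "computer_plane"


def resolve_compute_backend(
    plan_value: Optional[str], submit_value: Optional[str]
) -> str:
    """Return the first value that is set: plan, then submission, then default."""
    chosen = next(
        value
        for value in (plan_value, submit_value, DEFAULT_BACKEND)
        if value is not None
    )
    if chosen not in VALID_BACKENDS:
        raise ValueError(f"unknown compute_backend: {chosen!r}")
    return chosen
```

With this ordering, `resolve_compute_backend("browser_use_plane", "computer_plane")` keeps the plan author's choice, and a plan that omits the field falls through to the submission-time value or the global default.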
The wiring through executor + client factory lands in PR 2. At session start, the executor's CapabilityAllowlist is checked against the advertised Capabilities; mismatched runs fail fast at that boundary (not mid-plan).
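
A rough sketch of that session-start check follows. It assumes "mismatched" means the executor's allowlist permits a capability that the plane does not advertise; the text does not spell this out, so treat the rule, the `CapabilityMismatch` exception, and `check_session` as hypothetical.

```python
class CapabilityMismatch(Exception):
    """Hypothetical error type used only in this sketch."""


def check_session(allowed: frozenset, advertised: dict) -> None:
    """Fail fast if the executor expects capabilities the plane lacks.

    `allowed` is the executor's allowlist; `advertised` is the capability
    dict returned at session_init. Both shapes are assumptions.
    """
    missing = sorted(cap for cap in allowed if not advertised.get(cap, False))
    if missing:
        raise CapabilityMismatch(
            "plane does not advertise: " + ", ".join(missing)
        )


# A pure-CUA executor is compatible with either plane under this rule.
check_session(frozenset(), {"dom_aware": False})
check_session(frozenset(), {"dom_aware": True})
# A browser-use executor needs a plane that advertises dom_aware.
check_session(frozenset({"dom_aware"}), {"dom_aware": True})
```

Running the check once at the session boundary means a misconfigured run stops before any action is dispatched, rather than partway through a plan.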
When to pick which
| Pick computer_plane if … | Pick browser_use_plane if … |
|---|---|
| Target site is CF-protected / Turnstile-fronted | Plan needs DOM-aware reads (state.current_url, tab management) |
| Plan works with screenshot + key/mouse only | Plan needs to read anchor href before click |
| You don't know: leave it out (default is computer_plane) | List page has visually-similar links (semantic role disambiguation) |
What lands in PR 1 vs later
| PR | What |
|---|---|
| PR 1 (this) | Contract types — Capabilities, ComputeBackend, CapabilityAllowlist, extension Protocols, CapabilityNotAllowed. Wire-model SessionInitResponse.capabilities field (additive). Umbrella docs. No behavior change. |
| PR 2 | Browser-Use Plane scaffold — Playwright host + BrowserUsePlaneClient base impl + compute_backend toggle wired through executor + allowlist enforcement. |
| PR 3 | state.* extensions on Browser-Use Plane (#778). |
| PR 4 | tabs.* + links.* + target_role (#779, #780, #781). |
References
- Epic: #785 (browser-use support)
- Sibling spec: docs/reference/computer-plane.md (#696)
- Memory anchors: feedback_cua_no_dom_access.md, feedback_browser_infra_rpc_surface.md, project_user_feedback_hn_url_collection_2026_06_07.md