Browser-Use Plane: DOM-aware companion to Computer Plane
Status: Scaffold landed (PR 2 of #785). PR 3-4 add the DOM-aware extension surface (state.*, tabs.*, links.*).
Owners: TBD
Tracks issue: #785
Summary
Browser-Use Plane is the second Mantis compute plane — the DOM-aware companion to Computer Plane. Chrome runs under Playwright / CDP-native control (no Xvfb, no xdotool). Both planes implement the same base ComputeClient contract; what differs is the dispatch primitives and the extension verbs Browser-Use Plane admits.
| | Computer Plane | Browser-Use Plane |
|---|---|---|
| Driver | Xvfb + xdotool | Playwright (headless Chromium by default) |
| Capabilities | dom_aware=False, stealth=True | dom_aware=True, stealth=False (v1) |
| Dispatch | raw xdotool argv | structured action verbs (click/key/type/scroll) |
| Profile storage | Chrome --user-data-dir blob | Playwright userDataDir + storageState |
| CF / Turnstile parity | yes | non-goal at v1 |
Pick computer_plane (the default) for stealth-sensitive harvesting; pick browser_use_plane when the plan needs DOM-aware reads (tab management, anchor href peek, semantic click role disambiguation).
Wire contract
Mirrors docs/reference/computer-plane.md in shape; differs in the dispatch verb.
Base surface (both planes implement)
| Method | Path | Required | Notes |
|---|---|---|---|
| POST | /session/init | yes | Bind (tenant_id, profile_id, run_id). Advertises Capabilities (PR 1). Idempotent on run_id. |
| POST | /session/close | yes | Tear down browser context + Playwright runtime. |
| POST | /screenshot | yes | PNG base64 + viewport metadata. |
| POST | /dispatch | yes | Structured action verb (click/key/type/scroll) + step_id. Server keeps a TTL-bounded LRU and returns deduplicated=true on retry. |
| GET | /health | yes | Liveness + last-action timestamp. |
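The /dispatch retry behavior can be illustrated with a minimal sketch of a TTL-bounded LRU keyed by step_id. The class name `StepDedupCache`, the `dispatch` helper, and the default sizes are assumptions made for illustration; the source only states that the server keeps such a cache and returns deduplicated=true on retry.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional, Tuple


class StepDedupCache:
    """TTL-bounded LRU keyed by step_id (hypothetical helper, not the server's class)."""

    def __init__(self, max_entries: int = 256, ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_entries = max_entries
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: "OrderedDict[str, Tuple[float, Dict[str, Any]]]" = OrderedDict()

    def get(self, step_id: str) -> Optional[Dict[str, Any]]:
        hit = self._entries.get(step_id)
        if hit is None:
            return None
        stored_at, result = hit
        if self._clock() - stored_at > self._ttl_s:
            del self._entries[step_id]  # expired: treat as a fresh step
            return None
        self._entries.move_to_end(step_id)  # mark as most recently used
        return result

    def put(self, step_id: str, result: Dict[str, Any]) -> None:
        self._entries[step_id] = (self._clock(), result)
        self._entries.move_to_end(step_id)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry


def dispatch(cache: StepDedupCache, step_id: str, action: Dict[str, Any],
             execute: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Dict[str, Any]:
    """Run the action once per step_id; a retry replays the stored result."""
    cached = cache.get(step_id)
    if cached is not None:
        return {**cached, "deduplicated": True}
    result = execute(action)
    cache.put(step_id, result)
    return {**result, "deduplicated": False}
```

Injecting the clock keeps expiry testable without sleeping; a retry that arrives after the TTL or after eviction executes the action again, which is the trade-off a bounded cache accepts.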
state.* extensions (PR 3, #778): Browser-Use Plane only
Capability-gated behind dom_aware. Computer Plane does NOT expose these endpoints.
| Method | Path | Notes |
|---|---|---|
| GET | /state/current_url | Active tab URL. |
| GET | /state/tabs | Tab list with id/title/url/is_active. |
| GET | /state/focused_element | tag/role/aria_label/text/href; null when nothing focused. |
| GET | /state/clipboard | Best-effort navigator.clipboard.readText(); empty string on permission denial. |
| GET | /state/page_load | document.readyState (loading/interactive/complete) + last_resource_ms. |
| POST | /state/safe_back | History pop with overshoot guard against a pinned_origin pattern. Idempotent on step_id. |
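One way to read the safe_back overshoot guard is sketched below. It assumes that pinned_origin is a glob-style URL pattern and that an overshoot is undone by navigating forward again; neither detail is confirmed by the source, and the response keys are hypothetical. The `page` argument is duck-typed to the Playwright sync `Page` surface (`url`, `go_back()`, `go_forward()`); step_id idempotency is left out.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def safe_back(page: Any, pinned_origin: str) -> Dict[str, Any]:
    """Pop one history entry, undoing the pop if it leaves the pinned origin.

    Assumption: pinned_origin is a glob such as "https://news.ycombinator.com/*".
    """
    page.go_back()
    landed = page.url
    if fnmatchcase(landed, pinned_origin):
        return {"url": landed, "overshoot": False}
    # The pop left the pinned origin (for example back to about:blank),
    # so step forward again to stay on the site the plan is pinned to.
    page.go_forward()
    return {"url": page.url, "overshoot": True}
```

`fnmatchcase` is used instead of `fnmatch` so the comparison does not depend on the host operating system's path normalization.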
tabs.* extensions (PR 4, #779): Browser-Use Plane only
Capability-gated behind dom_aware. Each method is idempotent on step_id.
| Method | Path | Notes |
|---|---|---|
| POST | /tabs/open_in_new | Open new tab via url (direct) or via_selector (modifier-aware click → popup race). Returns stable tab_id. |
| POST | /tabs/close | Close by tab_id. Reaps the active page; falls back to another open page so sess.page stays valid. |
| POST | /tabs/activate | page.bring_to_front() + set sess.page. Returns the activated tab's URL. |
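The tab bookkeeping described in the table can be sketched as a small registry that hands out stable tab_id values and keeps an active page after a close. `TabRegistry`, the `tab-N` id format, and the choice to make a newly registered tab active are assumptions for illustration; pages are duck-typed to the Playwright `Page` methods `bring_to_front()`, `close()`, and the `url` property.

```python
import itertools
from typing import Any, Dict, Optional


class TabRegistry:
    """Maps stable tab_id strings to pages (hypothetical sketch)."""

    def __init__(self) -> None:
        self._pages: Dict[str, Any] = {}
        self._ids = itertools.count(1)
        self.active_id: Optional[str] = None

    def register(self, page: Any) -> str:
        """Assign a tab_id that never changes for the lifetime of the page."""
        tab_id = f"tab-{next(self._ids)}"
        self._pages[tab_id] = page
        self.active_id = tab_id  # assumption: a freshly opened tab becomes active
        return tab_id

    def activate(self, tab_id: str) -> str:
        """Bring the tab to the front and return its URL."""
        page = self._pages[tab_id]
        page.bring_to_front()
        self.active_id = tab_id
        return page.url

    def close(self, tab_id: str) -> Optional[str]:
        """Close a tab; if it was active, fall back to the most recently opened one."""
        page = self._pages.pop(tab_id)
        page.close()
        if self.active_id == tab_id:
            self.active_id = next(reversed(self._pages), None)
        return self.active_id
```

Ids come from a counter rather than a list index, so closing one tab never renumbers the others.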
links.peek_target (PR 4, #780): Browser-Use Plane only
| Method | Path | Notes |
|---|---|---|
| POST | /links/peek_target | Read anchor href without clicking. Accepts selector (CSS) or bbox ([x1,y1,x2,y2], vision-grounded). Walks up to nearest <a> for bbox hits. Returns {href, target, tag}. |
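The bbox path of peek_target can be sketched as follows, assuming the hit point is the center of the box and that the walk up to the nearest `<a>` is done in the page with `document.elementFromPoint` and `Element.closest`. The helper names are hypothetical; `page.evaluate(expression, arg)` is the Playwright Python call, and `page` is duck-typed here so the sketch can run without a browser.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

# In-page function: find the element under the point, then walk up to the nearest <a>.
_PEEK_JS = (
    "([x, y]) => {"
    "  const el = document.elementFromPoint(x, y);"
    "  const a = el ? el.closest('a') : null;"
    "  return a ? {href: a.href, target: a.target, tag: a.tagName.toLowerCase()} : null;"
    "}"
)


def bbox_center(bbox: Sequence[float]) -> Tuple[float, float]:
    """Center point of an [x1, y1, x2, y2] box in viewport coordinates."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def peek_target_by_bbox(page: Any, bbox: Sequence[float]) -> Optional[Dict[str, str]]:
    """Return {href, target, tag} for the anchor under the bbox center, or None."""
    cx, cy = bbox_center(bbox)
    return page.evaluate(_PEEK_JS, [cx, cy])
```

Nothing is clicked or focused, so the read has no navigation side effects.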
Semantic click disambiguation (PR 4, #781)
Plan-level target_role field maps the intended click target (title / comment_count / author / ...) to a stable CSS selector via a per-site recipe. See docs/recipes/news_ycombinator_com.md for the HN reference recipe and the canonical "collect outbound URLs" plan that exercises capture_link_in_new_tab.
When target_role resolves to no recipe entry, the click falls back to vision grounding by design: the plan author's intent stays observable, and pure-CUA-only plans do not crash.
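The lookup and its fallback can be sketched as a pure function. The `RECIPES` table, its selectors, and the returned dictionary shape are illustrative placeholders, not the contents of the real HN recipe.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Placeholder recipe table; the selectors are illustrative, not the shipped recipe.
RECIPES: Dict[str, Dict[str, str]] = {
    "news.ycombinator.com": {
        "title": "span.titleline > a",
        "author": "a.hnuser",
    },
}


def resolve_click_target(url: str, target_role: Optional[str]) -> Dict[str, Optional[str]]:
    """Map target_role to a CSS selector, or signal the vision fallback."""
    host = urlparse(url).hostname or ""
    selector = RECIPES.get(host, {}).get(target_role) if target_role else None
    if selector is None:
        # No recipe entry: keep the declared role visible and let vision ground the click.
        return {"mode": "vision", "selector": None, "target_role": target_role}
    return {"mode": "selector", "selector": selector, "target_role": target_role}
```

Returning a mode instead of raising is what lets a plan with an unknown role keep running on a pure-CUA executor.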
Pydantic wire models
Defined in src/mantis_agent/gym/browser_use_wire.py:
- BrowserUseSessionInitRequest / BrowserUseSessionInitResponse
- BrowserUseSessionCloseRequest / BrowserUseSessionCloseResponse
- BrowserUseScreenshotResponse
- DispatchActionRequest / DispatchActionResponse
- BrowserUseHealthResponse
Capabilities
session/init returns Capabilities.for_browser_use_plane():
Capabilities(
    dom_aware=True,
    stealth=False,  # explicit non-goal at v1
    supports_cdp=True,  # Playwright IS CDP under the hood
    backend=ComputeBackend.BROWSER_USE_PLANE,
)
Architecture
┌──────────────────────────────────────────────────────┐
│ Modal app "mantis-browser-use" (NEW, this PR) │
│ │
│ ┌──────────── BROWSER-USE PLANE ────────────┐ │
│ │ Browser-Use Agent FastAPI: │ │
│ │ POST /session/init │ │
│ │ POST /session/close │ │
│ │ POST /screenshot │ │
│ │ POST /dispatch │ │
│ │ GET /health │ │
│ │ Playwright + Chromium (headless) │ │
│ └───────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
▲
│ HTTPS
│
┌──────────────────────────────────────────────────────┐
│ Modal app "mantis-cua-server" (UNCHANGED) │
│ │
│ Brain plane → run_browser_use executor → │
│ BrowserUsePlaneClient ───────────────┘ │
└──────────────────────────────────────────────────────┘
Two Modal apps. Independent deploy cadence — redeploying the brain doesn't redeploy Browser-Use Plane and vice versa.
Image (Modal)
mcr.microsoft.com/playwright-python:v1.49.0-jammy — bundles Chromium + the Playwright Python SDK pinned to the same version. CPU-only (no GPU, no Xvfb, no xdotool). ~2 GB.
Locale + TZ matched to Computer Plane so screenshot-comparison and date-format-sensitive sites behave identically across planes.
Deploy
Reads the function URL after deploy:
Brain-plane integration
src/mantis_agent/run_browser_use.py exports run_browser_use_executor(...) — the executor entry point. PR 2 leaves it un-wired from modal_cua_server.py's cua_model dispatch table; PR 3 wires it once the DOM-aware extensions make the surface useful for plan authors.
Select the backend per plan via runtime.compute_backend:
runtime:
  compute_backend: browser_use_plane
steps:
  - intent: "Open Hacker News"
    type: navigate
    url: https://news.ycombinator.com
The resolver in src/mantis_agent/gym/compute_backend_resolver.py reads this; the factory in src/mantis_agent/gym/compute_factory.py dispatches to the right client. Default remains computer_plane.
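The resolver and factory split can be sketched as two small functions. The function names, the `ValueError` on an unknown backend, and the factory-map shape are assumptions; only the plan key runtime.compute_backend, the two backend names, and the computer_plane default come from the text above.

```python
from typing import Any, Callable, Mapping

DEFAULT_BACKEND = "computer_plane"
KNOWN_BACKENDS = ("computer_plane", "browser_use_plane")


def resolve_compute_backend(plan: Mapping[str, Any]) -> str:
    """Read runtime.compute_backend from a parsed plan, defaulting to computer_plane."""
    runtime = plan.get("runtime") or {}
    backend = runtime.get("compute_backend") or DEFAULT_BACKEND
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown compute_backend: {backend!r}")
    return backend


def make_client(backend: str, factories: Mapping[str, Callable[[], Any]]) -> Any:
    """Dispatch to the client constructor registered for the resolved backend."""
    return factories[backend]()


# Usage with stand-in constructors (hypothetical; real clients take connection settings).
plan = {"runtime": {"compute_backend": "browser_use_plane"}, "steps": []}
client = make_client(
    resolve_compute_backend(plan),
    {"computer_plane": lambda: "ComputerPlaneClient",
     "browser_use_plane": lambda: "BrowserUsePlaneClient"},
)
```

Keeping resolution separate from construction means a plan with no runtime block still resolves to the default without touching any client code.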
Capability enforcement
run_browser_use configures CapabilityAllowlist.browser_use(executor="run_browser_use") — admits dom_aware + supports_cdp. Pure-CUA executors (run_claude_cua, run_holo3, etc.) use CapabilityAllowlist.pure_cua() and will raise CapabilityNotAllowed if a handler tries to consume DOM-aware extensions against them even when the client speaks Browser-Use Plane.
The mismatch check runs at session start (run_browser_use._validate_executor_compat) — failures here happen before any browser action, not mid-plan.
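The enforcement described above can be sketched with a frozen allowlist and a check that runs before any browser action. This is a stand-in that reuses the names from the text, not the project's implementation; in particular, the assumption that `pure_cua()` admits neither dom_aware nor supports_cdp, and the `validate_executor_compat` signature, are guesses.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


class CapabilityNotAllowed(Exception):
    """Raised when a handler consumes a capability the executor does not admit."""


@dataclass(frozen=True)
class CapabilityAllowlist:
    executor: str
    allowed: FrozenSet[str]

    @classmethod
    def browser_use(cls, executor: str) -> "CapabilityAllowlist":
        return cls(executor, frozenset({"dom_aware", "supports_cdp"}))

    @classmethod
    def pure_cua(cls, executor: str = "pure_cua") -> "CapabilityAllowlist":
        # Assumption: pure-CUA executors admit no DOM-aware or CDP capability.
        return cls(executor, frozenset())

    def require(self, capability: str) -> None:
        if capability not in self.allowed:
            raise CapabilityNotAllowed(
                f"{self.executor} may not use capability {capability!r}"
            )


def validate_executor_compat(allowlist: CapabilityAllowlist,
                             needed: Iterable[str]) -> None:
    """Session-start check: fail before any browser action, not mid-plan."""
    for capability in needed:
        allowlist.require(capability)
```

The check depends only on the executor's allowlist, so a pure-CUA executor is rejected even when the client it holds speaks Browser-Use Plane.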
Profile + proxy at v1
Profiles are per-plane. The same (tenant_id, profile_id) identity exists on both planes but storage is independent. Layout (this plane):
Computer Plane keeps its existing layout at /data/chrome-profile/.... Both volumes mounted; no shared bytes. Cross-plane profile handoff is the deferred follow-up gated on real demand (#785).
Proxy is passed at session/init and forwarded to Playwright's launch({proxy: {server}}). Same PrivateProxy creds as Computer Plane.
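In the Playwright Python API the proxy is passed as the `proxy` keyword of `launch`, a dictionary with a `server` key and optional `username` and `password`. A minimal sketch of building those keyword arguments follows; the helper name and its parameters are hypothetical, and how session/init actually carries the credentials is not specified above.

```python
from typing import Any, Dict, Optional


def build_launch_kwargs(proxy_server: Optional[str],
                        username: Optional[str] = None,
                        password: Optional[str] = None,
                        headless: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for Playwright's chromium.launch()."""
    kwargs: Dict[str, Any] = {"headless": headless}
    if proxy_server:
        proxy: Dict[str, str] = {"server": proxy_server}
        if username:
            proxy["username"] = username
        if password:
            proxy["password"] = password
        kwargs["proxy"] = proxy
    return kwargs
```

With the sync API this would be used as `playwright.chromium.launch(**build_launch_kwargs("http://proxy.example:8080"))`, where the proxy URL is a placeholder.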
Non-goals (v1)
- Stealth on CF-protected sites — use Computer Plane for those.
- Cross-plane profile handoff — deferred follow-up.
- Concurrent sessions per container — single-session pinned at v1, matching Computer Plane's posture.
Open questions
- Async vs sync Playwright. v1 uses sync_playwright for simplicity. Switching to async_api is a non-trivial refactor; defer until concurrency pressure makes it worth it.
- Session pool sizing. Same as Computer Plane: pin to one container at v1, revisit after a real workload.
References
- Umbrella contract: docs/reference/compute-client.md
- Sibling plane spec: docs/reference/computer-plane.md
- Epic: #785
- PR 1 (foundation): #786