
Browser-Use Plane — DOM-aware companion to Computer Plane

Status: Scaffold landed (PR 2 of #785). PR 3-4 add the DOM-aware extension surface (state.*, tabs.*, links.*).
Owners: TBD
Tracks issue: #785

Summary

Browser-Use Plane is the second Mantis compute plane — the DOM-aware companion to Computer Plane. Chrome runs under Playwright / CDP-native control (no Xvfb, no xdotool). Both planes implement the same base ComputeClient contract; what differs is the dispatch primitives and the extension verbs Browser-Use Plane admits.

| | Computer Plane | Browser-Use Plane |
| --- | --- | --- |
| Driver | Xvfb + xdotool | Playwright (headless Chromium by default) |
| Capabilities | dom_aware=False, stealth=True | dom_aware=True, stealth=False (v1) |
| Dispatch | raw xdotool argv | structured action verbs (click/key/type/scroll) |
| Profile storage | Chrome --user-data-dir blob | Playwright userDataDir + storageState |
| CF / Turnstile parity | yes | non-goal at v1 |

Pick computer_plane (the default) for stealth-sensitive harvesting; pick browser_use_plane when the plan needs DOM-aware reads (tab management, anchor href peek, semantic click role disambiguation).

Wire contract

Mirrors docs/reference/computer-plane.md in shape; differs in the dispatch verb.

Base surface (both planes implement)

| Method | Path | Required | Notes |
| --- | --- | --- | --- |
| POST | /session/init | yes | Bind (tenant_id, profile_id, run_id). Advertises Capabilities (PR 1). Idempotent on run_id. |
| POST | /session/close | yes | Tear down browser context + Playwright runtime. |
| POST | /screenshot | yes | PNG base64 + viewport metadata. |
| POST | /dispatch | yes | Structured action verb (click/key/type/scroll) + step_id. Server keeps a TTL-bounded LRU and returns deduplicated=true on retry. |
| GET | /health | yes | Liveness + last-action timestamp. |
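
The /dispatch retry behavior can be pictured with a small in-memory cache. This is a minimal sketch, not the server's actual code: the class name StepDedupCache, its defaults (256 entries, 300 s TTL), and the injectable clock are all assumptions made for illustration. Only the step_id key and the deduplicated flag come from the table above.

```python
import time
from collections import OrderedDict
from typing import Any, Callable


class StepDedupCache:
    """TTL-bounded LRU keyed by step_id (hypothetical helper for illustration)."""

    def __init__(self, max_entries: int = 256, ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_entries = max_entries
        self._ttl_s = ttl_s
        self._clock = clock
        # step_id -> (time stored, result payload), oldest entry first
        self._entries: OrderedDict = OrderedDict()

    def run_once(self, step_id: str,
                 action: Callable[[], dict[str, Any]]) -> dict[str, Any]:
        now = self._clock()
        hit = self._entries.get(step_id)
        if hit is not None and now - hit[0] <= self._ttl_s:
            # Retry inside the TTL: replay the stored result, skip the action.
            self._entries.move_to_end(step_id)
            return {**hit[1], "deduplicated": True}
        # First sighting (or expired entry): run the action and remember it.
        result = action()
        self._entries[step_id] = (now, result)
        self._entries.move_to_end(step_id)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return {**result, "deduplicated": False}
```

A retried step_id replays the stored result without re-running the click or keystroke, so a client can safely resend /dispatch after a timeout as long as the retry arrives inside the TTL.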

state.* extensions (PR 3, #778) — Browser-Use Plane only

Capability-gated behind dom_aware. Computer Plane does NOT expose these endpoints.

| Method | Path | Notes |
| --- | --- | --- |
| GET | /state/current_url | Active tab URL. |
| GET | /state/tabs | Tab list with id/title/url/is_active. |
| GET | /state/focused_element | tag/role/aria_label/text/href; null when nothing focused. |
| GET | /state/clipboard | Best-effort navigator.clipboard.readText(); empty string on permission denial. |
| GET | /state/page_load | document.readyState (loading/interactive/complete) + last_resource_ms. |
| POST | /state/safe_back | History pop with overshoot guard against a pinned_origin pattern. Idempotent on step_id. |
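
One plausible reading of the safe_back overshoot guard is sketched below: the history pop is refused when the previous entry's origin does not match pinned_origin. The function names, the glob-style matching, and the list-based model of history are assumptions for illustration; the real handler may match patterns differently.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def safe_back(history: list[str], index: int,
              pinned_origin: str) -> tuple[int, bool]:
    """Return (new_index, moved); refuse a pop that would leave pinned_origin."""
    if index <= 0:
        return index, False  # nothing to go back to
    candidate = history[index - 1]
    if not fnmatchcase(origin_of(candidate), pinned_origin):
        return index, False  # overshoot: previous entry is off the pinned site
    return index - 1, True
```

Under this model, going back from an item page to the site's front page succeeds, while going back from the front page to about:blank is refused and the session stays where it is.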

tabs.* extensions (PR 4, #779) — Browser-Use Plane only

Capability-gated behind dom_aware. Each method is idempotent on step_id.

| Method | Path | Notes |
| --- | --- | --- |
| POST | /tabs/open_in_new | Open new tab via url (direct) or via_selector (modifier-aware click → popup race). Returns stable tab_id. |
| POST | /tabs/close | Close by tab_id. Reaps the active page; falls back to another open page so sess.page stays valid. |
| POST | /tabs/activate | page.bring_to_front() + set sess.page. Returns the activated tab's URL. |

links.peek_target (PR 4, #780) — Browser-Use Plane only

| Method | Path | Notes |
| --- | --- | --- |
| POST | /links/peek_target | Read anchor href without clicking. Accepts selector (CSS) or bbox ([x1,y1,x2,y2], vision-grounded). Walks up to nearest <a> for bbox hits. Returns {href, target, tag}. |
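
The bbox path can be sketched with Playwright's page.evaluate plus two standard DOM calls, document.elementFromPoint and Element.closest. This is a sketch under assumptions: the helper names are hypothetical, the bbox is assumed to be in CSS-pixel viewport coordinates, and the real handler may pick a hit point other than the box center.

```python
from typing import Any, Optional, Sequence

# Runs in the page: hit-test the point, then walk up to the nearest <a>.
PEEK_JS = """
([x, y]) => {
  const hit = document.elementFromPoint(x, y);
  const a = hit ? hit.closest('a') : null;
  return a ? {href: a.href, target: a.target, tag: a.tagName.toLowerCase()} : null;
}
"""


def bbox_center(bbox: Sequence[float]) -> tuple[float, float]:
    """Center of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = bbox
    return (x1 + x2) / 2, (y1 + y2) / 2


def peek_target_by_bbox(page: Any, bbox: Sequence[float]) -> Optional[dict]:
    """Read the anchor under the bbox center without clicking it.

    `page` is expected to be a sync Playwright Page; it is typed as Any so
    the sketch stays importable without Playwright installed.
    """
    x, y = bbox_center(bbox)
    return page.evaluate(PEEK_JS, [x, y])
```

Because closest('a') also matches the hit element itself, a bbox that lands directly on the anchor and one that lands on a nested span both resolve to the same link; a hit with no anchor ancestor yields None.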

Semantic click disambiguation (PR 4, #781)

Plan-level target_role field maps the intended click target (title / comment_count / author / ...) to a stable CSS selector via a per-site recipe. See docs/recipes/news_ycombinator_com.md for the HN reference recipe and the canonical "collect outbound URLs" plan that exercises capture_link_in_new_tab.

When target_role resolves to no recipe entry, the click falls back to vision on purpose: this keeps the plan author's intent observable instead of crashing pure-CUA-only plans.
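
The lookup-then-fallback behavior can be sketched as a plain dictionary resolution. Everything named here is hypothetical: the RECIPES table, the example selectors, and the ClickResolution type are illustrations only; the actual selectors are defined by the per-site recipe files.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical recipe table; the selectors are examples, not the real recipe.
RECIPES: dict[str, dict[str, str]] = {
    "news.ycombinator.com": {
        "title": "span.titleline > a",
        "author": "a.hnuser",
    },
}


@dataclass(frozen=True)
class ClickResolution:
    mode: str                 # "selector" or "vision"
    selector: Optional[str]   # set only when mode == "selector"


def resolve_click(host: str, target_role: Optional[str]) -> ClickResolution:
    """Map target_role to a selector, or fall back to vision grounding."""
    selector = RECIPES.get(host, {}).get(target_role) if target_role else None
    if selector is None:
        # No recipe entry: do not raise, let the vision path handle the click.
        return ClickResolution(mode="vision", selector=None)
    return ClickResolution(mode="selector", selector=selector)
```

Returning a "vision" resolution instead of raising means a plan that names an unknown role, or no role at all, still runs on executors that have no DOM access.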

Pydantic wire models

Defined in src/mantis_agent/gym/browser_use_wire.py:

  • BrowserUseSessionInitRequest / BrowserUseSessionInitResponse
  • BrowserUseSessionCloseRequest / BrowserUseSessionCloseResponse
  • BrowserUseScreenshotResponse
  • DispatchActionRequest / DispatchActionResponse
  • BrowserUseHealthResponse

Capabilities

session/init returns Capabilities.for_browser_use_plane():

Capabilities(
    dom_aware=True,
    stealth=False,           # explicit non-goal at v1
    supports_cdp=True,       # Playwright IS CDP under the hood
    backend=ComputeBackend.BROWSER_USE_PLANE,
)

Architecture

┌──────────────────────────────────────────────────────┐
│  Modal app "mantis-browser-use"  (NEW, this PR)      │
│                                                       │
│  ┌──────────── BROWSER-USE PLANE ────────────┐       │
│  │ Browser-Use Agent FastAPI:                │       │
│  │   POST /session/init                      │       │
│  │   POST /session/close                     │       │
│  │   POST /screenshot                        │       │
│  │   POST /dispatch                          │       │
│  │   GET  /health                            │       │
│  │ Playwright + Chromium (headless)          │       │
│  └───────────────────────────────────────────┘       │
└──────────────────────────────────────────────────────┘
                       │ HTTPS
┌──────────────────────────────────────────────────────┐
│  Modal app "mantis-cua-server"  (UNCHANGED)          │
│                                                       │
│  Brain plane → run_browser_use executor →            │
│    BrowserUsePlaneClient ───────────────┘            │
└──────────────────────────────────────────────────────┘

Two Modal apps. Independent deploy cadence — redeploying the brain doesn't redeploy Browser-Use Plane and vice versa.

Image (Modal)

mcr.microsoft.com/playwright-python:v1.49.0-jammy — bundles Chromium + the Playwright Python SDK pinned to the same version. CPU-only (no GPU, no Xvfb, no xdotool). ~2 GB.

Locale and time zone are matched to Computer Plane so screenshot-comparison and date-format-sensitive sites behave identically across planes.

Deploy

modal deploy deploy/modal/browser_use_plane.py

Read the function URL after deploy:

import modal
url = modal.Function.from_name("mantis-browser-use", "browser_use").get_web_url()

Brain-plane integration

src/mantis_agent/run_browser_use.py exports run_browser_use_executor(...) — the executor entry point. PR 2 leaves it un-wired from modal_cua_server.py's cua_model dispatch table; PR 3 wires it once the DOM-aware extensions make the surface useful for plan authors.

Select the plane per plan by setting runtime.compute_backend to browser_use_plane:

runtime:
  compute_backend: browser_use_plane

steps:
  - intent: "Open Hacker News"
    type: navigate
    url: https://news.ycombinator.com

The resolver in src/mantis_agent/gym/compute_backend_resolver.py reads this; the factory in src/mantis_agent/gym/compute_factory.py dispatches to the right client. Default remains computer_plane.
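
The resolution rule described above (read runtime.compute_backend, default to computer_plane) can be sketched as follows. The enum string values and the function name resolve_compute_backend are assumptions; only the ComputeBackend.BROWSER_USE_PLANE member name and the two backend strings appear in this document.

```python
from enum import Enum
from typing import Any, Mapping


class ComputeBackend(str, Enum):
    # Values are assumed to mirror the strings used in the plan YAML.
    COMPUTER_PLANE = "computer_plane"
    BROWSER_USE_PLANE = "browser_use_plane"


def resolve_compute_backend(plan: Mapping[str, Any]) -> ComputeBackend:
    """Pick the backend named in the plan, defaulting to Computer Plane."""
    runtime = plan.get("runtime") or {}
    raw = runtime.get("compute_backend", ComputeBackend.COMPUTER_PLANE.value)
    try:
        return ComputeBackend(raw)
    except ValueError:
        # Fail loudly on typos rather than silently using the default.
        raise ValueError(f"unknown compute_backend: {raw!r}") from None
```

A plan without a runtime block resolves to computer_plane, which matches the stated default; a misspelled backend is rejected rather than quietly routed to the wrong plane.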

Capability enforcement

run_browser_use configures CapabilityAllowlist.browser_use(executor="run_browser_use"), which admits dom_aware and supports_cdp. Pure-CUA executors (run_claude_cua, run_holo3, etc.) use CapabilityAllowlist.pure_cua() and raise CapabilityNotAllowed if a handler tries to consume DOM-aware extensions against them, even when the client speaks Browser-Use Plane.

The mismatch check runs at session start (run_browser_use._validate_executor_compat) — failures here happen before any browser action, not mid-plan.
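
The allowlist behavior can be illustrated with a small stand-in. The class and exception names come from this section, but the fields, the require method, and the default executor label are assumptions; the real classes may be shaped differently.

```python
from dataclasses import dataclass


class CapabilityNotAllowed(Exception):
    """Raised when an executor reaches for a capability it was not granted."""


@dataclass(frozen=True)
class CapabilityAllowlist:
    executor: str
    allowed: frozenset

    @classmethod
    def browser_use(cls, executor: str) -> "CapabilityAllowlist":
        # Admits the two capabilities named in the text above.
        return cls(executor, frozenset({"dom_aware", "supports_cdp"}))

    @classmethod
    def pure_cua(cls, executor: str = "pure_cua") -> "CapabilityAllowlist":
        # Pure-CUA executors get no DOM-aware capabilities at all.
        return cls(executor, frozenset())

    def require(self, capability: str) -> None:
        """Hypothetical check a handler would call before using an extension."""
        if capability not in self.allowed:
            raise CapabilityNotAllowed(
                f"{self.executor} may not use {capability}"
            )
```

The check depends on the executor's allowlist rather than on what the connected plane advertises, which is why a pure-CUA executor is refused even when it is pointed at a Browser-Use Plane client.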

Profile + proxy at v1

Profiles are per-plane. The same (tenant_id, profile_id) identity exists on both planes but storage is independent. Layout (this plane):

/data/browser-use-profile/<tenant>__<profile_id>/   ← Playwright userDataDir blob

Computer Plane keeps its existing layout at /data/chrome-profile/.... Both volumes are mounted, and they share no bytes. Cross-plane profile handoff is the deferred follow-up gated on real demand (#785).
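
The per-plane directory layout above can be derived with a small helper. This is a sketch: profile_dir and its validation rules are assumptions added so the double-underscore separator stays unambiguous; only the root path and the <tenant>__<profile_id> shape come from the layout line.

```python
from pathlib import PurePosixPath

PROFILE_ROOT = PurePosixPath("/data/browser-use-profile")


def profile_dir(tenant_id: str, profile_id: str) -> PurePosixPath:
    """Build /data/browser-use-profile/<tenant>__<profile_id>."""
    for part in (tenant_id, profile_id):
        # Assumed guardrails: reject anything that could escape the root
        # or make the "__" separator ambiguous.
        if not part or "/" in part or "__" in part or part in {".", ".."}:
            raise ValueError(f"invalid profile path component: {part!r}")
    return PROFILE_ROOT / f"{tenant_id}__{profile_id}"
```

Keeping the root distinct from Computer Plane's /data/chrome-profile/... root is what guarantees the two planes never read each other's profile data for the same (tenant_id, profile_id).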

Proxy is passed at session/init and forwarded to Playwright's launch({proxy: {server}}). Same PrivateProxy creds as Computer Plane.
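
In the Playwright Python SDK the proxy setting is a dictionary with a server key and optional username and password keys. The helper below is a hypothetical sketch of how session/init fields might be turned into that dictionary; the function name and its parameters are assumptions, not the plane's actual code.

```python
from typing import Optional


def playwright_proxy(server: str, username: Optional[str] = None,
                     password: Optional[str] = None) -> dict[str, str]:
    """Build the proxy settings dict that Playwright's launch calls accept."""
    if not server:
        raise ValueError("proxy server must be a non-empty string")
    proxy = {"server": server}
    # Credentials are optional; omit the keys entirely when not supplied.
    if username is not None:
        proxy["username"] = username
    if password is not None:
        proxy["password"] = password
    return proxy
```

The resulting dictionary can be passed as the proxy argument to Playwright's launch or launch_persistent_context call.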

Non-goals (v1)

  • Stealth on CF-protected sites — use Computer Plane for those.
  • Cross-plane profile handoff — deferred follow-up.
  • Concurrent sessions per container — single-session pinned at v1, matching Computer Plane's posture.

Open questions

  1. Async vs sync Playwright. v1 uses sync_playwright for simplicity. Switching to async_api is a non-trivial refactor — defer until concurrency pressure makes it worth it.
  2. Session pool sizing. Same as Computer Plane: pin to one container at v1, revisit after a real workload.

References

  • Umbrella contract: docs/reference/compute-client.md
  • Sibling plane spec: docs/reference/computer-plane.md
  • Epic: #785
  • PR 1 (foundation): #786