Glossary¶
Quick definitions for project-specific terms. For deeper explanations, follow the links into the relevant section.
| Term | Definition |
|---|---|
| Action | A unit of agent behavior the env can execute. One of CLICK, DOUBLE_CLICK, TYPE, KEY_PRESS, SCROLL, DRAG, WAIT, DONE. |
| Brain | An inference client. Today: BrainHolo3 (llama.cpp / OpenAI-compat), BrainClaude (Anthropic API). |
| CUA | Computer Use Agent. The category Mantis sits in. |
| Detached run | A run started with detached: true. The server returns a run_id immediately and continues work in the background. Caller polls / waits for a webhook. |
| Gate | A micro-plan step with gate: true. Claude verifies a free-text condition; if it fails, the run halts. |
| GymEnvironment | The abstract env interface: reset(), step(action), screenshot(), close(). XdotoolGymEnv is the concrete production impl. |
| Holo3 | Holo3-35B-A3B, the small specialist multimodal model used for click / scroll / type / drag. Runs on a single GPU as GGUF. |
| Idempotency-Key | Header-based dedup key. Same key → same run_id returned. 24 h cache. |
| MicroPlan | A list of micro-step objects: [{intent, type, section, ...}, ...]. Reliable for high-volume workflows. |
| MicroPlanRunner | The orchestrator. Walks the micro-plan, runs each step, handles retries / gates / loops / checkpoint resume. |
| Plan text | Plain-English instruction submitted via plan_text; the server decomposes via Claude (cached). |
| Polished video | The composed walkthrough (title card → captioned run with action overlays → outro card) produced when record_video: true. |
| Raw recording | The unprocessed Xvfb screencast. Available via ?raw=1 on the video endpoint. |
run_id |
Server-generated identifier for a detached run. Format: <YYYYMMDD>_<HHMMSS>_<random_hex>. |
state_key |
Caller-chosen identifier for a workflow. Server prefixes it with tenant_id to isolate. Controls browser-profile reuse and checkpoint resume. |
| Step | One element of a micro-plan. Has intent, type, optional section, gate, verify, loop_target, etc. |
| Task suite | Multi-task plan format ({tasks: [...]}). Each task has its own verify predicate. |
| Tenant | A unit of isolation. Each tenant has its own token, scopes, caps, browser profile dir, run dir, and optionally its own Anthropic key. |
| TenantConfig | The resolved configuration for one request: (tenant_id, scopes, max_*, anthropic_secret_name, allowed_domains, webhook_*). |
| Tier 1 / Tier 2 | The hardening tiers. Tier 1: multi-tenant safe (auth, caps, isolation). Tier 2: production-quality (rate limits, idempotency, webhooks, allowlist, metrics). |
| xdotool | The X11 input-injection tool used to send clicks / keys to Chrome inside Xvfb. Fingerprint-free (no DevTools / CDP signal). |
| Xvfb | X virtual framebuffer. The headless display the agent's browser paints to and ffmpeg captures. |
X-Mantis-Token |
The container-level auth header. Custom header (not Authorization: Bearer) so it doesn't collide with platform-gateway auth. |
register_tool |
Host-provided tool registration on MicroPlanRunner. Lets the embedding host expose its own tools — auth_credentials, request_user_input, bash, etc. — to the brain mid-plan. JSON-schema input definition. See issue #71. |
PauseRequested |
Exception a registered tool raises when it needs out-of-band input (OTP, 2FA, confirmation). The runner catches it, snapshots state, and returns a serializable PauseState. Resume by calling runner.resume(state, user_input=..., plan=plan). See issue #73. |
PauseState |
JSON-safe snapshot of a paused run: step index, pending tool + arguments, replayed step results, loop counters. Designed to round-trip through Postgres JSONB. |
RunnerResult |
Rich return type from MicroPlanRunner.run_with_status(plan) and runner.resume(...). Carries steps, status (completed / halted / cancelled / paused), cancelled bool, and optional pause_state. |
cancel_event |
External cancellation hook on MicroPlanRunner. Pass any object with .is_set() (e.g. threading.Event) or a no-arg callable. Checked at every step boundary. ECS deploy SIGTERM is the canonical use case. See issue #76. |
step_callback |
Per-step observability hook: Callable[[idx, intent, action_or_none, ok], None]. Errors are logged, never raised — observability bugs cannot break a run. See issue #74. |
LAUNCH_APP |
An ActionType that lets a plan start a desktop binary on demand (chromium, custom wrapper script, ...). Closes the symmetry gap with the Claude backend's bash tool. See issue #72. |
scale_brain_to_display |
Helper that maps brain-space pixel coordinates to display-space pixels when a host resizes screenshots before inference. The contract that prevents the 1.5× click-offset bug class. See reference/coordinate-spaces.md and issue #75. |
fill_field |
Step type for a labelled text input. params={"label": "User ID", "value": "alice"}. The runner clicks the field, clears any pre-filled value, and types value. Uses find_form_target (single labelled element), not find_all_listings. See issue #80. |
submit |
Step type for a single labelled button — Login / Save / Update / Submit / Continue. params={"label": "Login", "aliases"?: ["Sign In", "Continue"]}. One click, then waits 2.5 s. The runner scrolls Page_Down up to 4 times when the button isn't in viewport (issue #89 §2); aliases covers copy variation across products (e.g. "Update Lead" / "Save" / "Save Changes"). |
select_option |
Step type for a dropdown / <select> choice. params={"dropdown_label": "Industry Vertical", "option_label": "Space Exploration"}. Two-phase: click dropdown → screenshot the open menu → click option. |
find_form_target |
ClaudeExtractor method that locates one labelled element on a non-listings page. Counterpart to find_all_listings for forms. Returns {x, y, action, value, label}. |