Glossary

Quick definitions for project-specific terms. For deeper explanations, follow the links into the relevant section.

Term Definition
Action A unit of agent behavior the env can execute. One of CLICK, DOUBLE_CLICK, TYPE, KEY_PRESS, SCROLL, DRAG, WAIT, DONE.
Brain An inference client. Current implementations: BrainHolo3 (llama.cpp / OpenAI-compatible) and BrainClaude (Anthropic API).
CUA Computer Use Agent. The category Mantis sits in.
Detached run A run started with detached: true. The server returns a run_id immediately and continues work in the background. The caller then polls for status or waits for a webhook.
Gate A micro-plan step with gate: true. Claude verifies a free-text condition; if it fails, the run halts.
GymEnvironment The abstract env interface: reset(), step(action), screenshot(), close(). XdotoolGymEnv is the concrete production impl.
Holo3 Holo3-35B-A3B, the small specialist multimodal model used for click / scroll / type / drag. Runs on a single GPU as GGUF.
Idempotency-Key Header-based deduplication key. Resubmitting with the same key returns the same run_id. Keys are cached for 24 h.
MicroPlan A list of micro-step objects: [{intent, type, section, ...}, ...]. Reliable for high-volume workflows.
MicroPlanRunner The orchestrator. Walks the micro-plan, runs each step, handles retries / gates / loops / checkpoint resume.
Plan text Plain-English instruction submitted via plan_text; the server decomposes via Claude (cached).
Polished video The composed walkthrough (title card → captioned run with action overlays → outro card) produced when record_video: true.
Raw recording The unprocessed Xvfb screencast. Available via ?raw=1 on the video endpoint.
run_id Server-generated identifier for a detached run. Format: <YYYYMMDD>_<HHMMSS>_<random_hex>.
profile_id (#341) Caller-chosen Chrome user-data-dir identity. Server prefixes with tenant_id to isolate. Sticky across plan revisions — same id ⇒ same cookies / logged-in sessions (reusing a login needs only the same profile_id, not resume_state). Two concurrent runs on one profile_id get a 409 (Chrome locks the user-data-dir); distinct ids parallelize. Defaults to "default". See Profiles & login reuse.
workflow_id (#341) Caller-chosen checkpoint identity. Server prefixes with tenant_id to isolate. Rotate when the plan definition changes; pair with resume_state: true to pick up where the last run with this id stopped. Defaults to plan_signature[:12].
state_key Legacy single-field identity. When set alone, the server routes it to both profile_id and workflow_id for back-compat. Prefer the split fields in new code; see #341. Concurrency rule: because a key resolves to a Chrome user-data-dir and a checkpoint, two in-flight calls must not share one state_key — they would race on the same profile + checkpoint. Distinct keys run in parallel (Modal scales out via @modal.concurrent). Fan out under this rule with StateKeyDispatcher — see Continuous-improvement architecture §6 and experiments/holdout/state_key_dispatcher.py.
Step One element of a micro-plan. Has intent, type, optional section, gate, verify, loop_target, etc.
Task suite Multi-task plan format ({tasks: [...]}). Each task has its own verify predicate.
Tenant A unit of isolation. Each tenant has its own token, scopes, caps, browser profile dir, run dir, and optionally its own Anthropic key.
TenantConfig The resolved configuration for one request: (tenant_id, scopes, max_*, anthropic_secret_name, allowed_domains, webhook_*).
Tier 1 / Tier 2 The hardening tiers. Tier 1: multi-tenant safe (auth, caps, isolation). Tier 2: production-quality (rate limits, idempotency, webhooks, allowlist, metrics).
xdotool The X11 input-injection tool used to send clicks / keys to Chrome inside Xvfb. Fingerprint-free (no DevTools / CDP signal).
Xvfb X virtual framebuffer. The headless display the agent's browser paints to and ffmpeg captures.
X-Mantis-Token The container-level auth header. Custom header (not Authorization: Bearer) so it doesn't collide with platform-gateway auth.
register_tool Host-provided tool registration on MicroPlanRunner and GymRunner (#285). Lets the embedding host expose its own tools (auth_credentials, request_user_input, bash, etc.) to the brain mid-run. Each tool's input is defined by a JSON schema. See issue #71.
PauseRequested Exception a registered tool raises when it needs out-of-band input (OTP, 2FA, confirmation). Both runners catch it, snapshot state, and return a serializable PauseState. Resume by calling runner.resume(state, user_input=..., plan=plan) on MicroPlanRunner or runner.resume(state, user_input=...) on GymRunner. See issues #73, #285.
PauseState JSON-safe snapshot of a paused run. MicroPlanRunner fills step_results + loop_counters + listings_on_page; GymRunner fills trajectory_steps + task + task_id instead. The shared fields (step_index, pending_tool, prompt, timestamp) are populated by both. Designed to round-trip through Postgres JSONB.
ActionType.TOOL_CALL Action type a brain emits to invoke a registered host tool: Action(TOOL_CALL, params={"name": str, "args": dict}). The runner short-circuits env.step and routes the call through tool_channel.invoke. Handler return values land in the trajectory's feedback field; a raised PauseRequested produces a RunResult(paused=True, ...). See issue #285.
RunnerResult Rich return type from MicroPlanRunner.run_with_status(plan) and runner.resume(...). Carries steps, status (completed / halted / cancelled / paused), cancelled bool, and optional pause_state.
cancel_event External cancellation hook on MicroPlanRunner and GymRunner (#288). Pass any object with .is_set() (e.g. threading.Event) or a no-arg callable. Checked at every step boundary. On a positive check, GymRunner returns RunResult(paused=True, pause_state=..., termination_reason="cancelled") with pending_reason="cancelled" — same snapshot shape as the PauseRequested path, so resume() rehydrates via the same code path. ECS / k8s SIGTERM is the canonical use case. See issues #76, #288.
step_callback Per-step observability hook: Callable[[idx, intent, action_or_none, ok], None]. Errors are logged, never raised — observability bugs cannot break a run. See issue #74.
LAUNCH_APP An ActionType that lets a plan start a desktop binary on demand (chromium, custom wrapper script, ...). Closes the symmetry gap with the Claude backend's bash tool. See issue #72.
scale_brain_to_display Helper that maps brain-space pixel coordinates to display-space pixels when a host resizes screenshots before inference. The contract that prevents the 1.5× click-offset bug class. See reference/coordinate-spaces.md and issue #75.
fill_field Step type for a labelled text input. params={"label": "User ID", "value": "alice"}. The runner clicks the field, clears any pre-filled value, and types value. Uses find_form_target (single labelled element), not find_all_listings. See issue #80.
submit Step type for a single labelled button — Login / Save / Update / Submit / Continue. params={"label": "Login", "aliases"?: ["Sign In", "Continue"]}. One click, then waits 2.5 s. The runner scrolls Page_Down up to 4 times when the button isn't in viewport (issue #89 §2); aliases covers copy variation across products (e.g. "Update Lead" / "Save" / "Save Changes").
select_option Step type for a dropdown / <select> choice. params={"dropdown_label": "Industry Vertical", "option_label": "Space Exploration"}. Two-phase: click dropdown → screenshot the open menu → click option.
find_form_target ClaudeExtractor method that locates one labelled element on a non-listings page. Counterpart to find_all_listings for forms. Returns {x, y, action, value, label}.
GroundingCache TTL+LRU cache that short-circuits Claude grounding calls when the same (perceptual_hash(crop), description_hash) pair recurs. Constructed once per Runtime and shared across ClaudeGrounding instances. Hit/miss/eviction/size emitted on Prometheus. See issue #117.
prefer_som_grounding SiteConfig flag. When True (and GymRunner has a page_discovery), the runner promotes Set-of-Mark dispatch (DOM-discovered candidates → brain choice) ABOVE the direct executor. On SoM-friendly sites this skips the Claude grounding round-trip. Default False. See issue #117.
Set-of-Mark / SoM The dispatch pattern proven in VWA: overlay numbered boxes on DOM-derived candidates, ask the brain "which N?", click the chosen element via DOM reference. Implemented in gym/page_discovery.py + _try_discovery_execution.
predicted_outcome / observed_outcome Fields on TrajectoryStep (#120). The brain emits Predicted: <delta> describing what its action will cause; the runner records the runtime feedback as the observed delta. Reward functions compute Jaccard similarity to derive a per-step world-model accuracy signal — see mantis_agent.rewards.world_model_accuracy_reward and PlanAdherenceReward.world_model_weight.
LoopDetector Detector with three signals: is_repeat_loop (byte-equal), is_drift_loop (coordinate-near), is_state_loop (URL/screenshot frozen). Both GymRunner and StreamingCUA call is_any_loop(window=...) after recording each action. Reset at plan-step boundaries (#116) so cross-step false positives clear.
BrainLadder Wraps a primary + fallback Brain. force_fallback() routes the next think() call to the fallback (one-shot or sticky). Annotates results with brain_used. Emits mantis_brain_escalation_total per call.
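
The run_id entry above gives the format <YYYYMMDD>_<HHMMSS>_<random_hex>. A client that polls detached runs may want to validate or sort ids locally. The sketch below shows one way to parse that shape. The helper name parse_run_id is hypothetical, and two details are assumptions the glossary does not state: the hex suffix is lowercase, and its length is not fixed. The timezone of the timestamp is also unspecified, so the result is returned as a naive datetime.

```python
import re
from datetime import datetime

# Assumption: lowercase hex suffix of unspecified length.
RUN_ID_RE = re.compile(r"^(\d{8})_(\d{6})_([0-9a-f]+)$")


def parse_run_id(run_id: str) -> tuple[datetime, str]:
    """Split a run_id into (timestamp, random suffix).

    Hypothetical client-side helper; the server remains the only
    component that generates run ids.
    """
    match = RUN_ID_RE.match(run_id)
    if match is None:
        raise ValueError(f"not a run_id: {run_id!r}")
    date_part, time_part, suffix = match.groups()
    # strptime also rejects impossible dates such as month 13.
    started = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return started, suffix


if __name__ == "__main__":
    print(parse_run_id("20240131_235959_deadbeef"))
```

Because the date and time components are zero-padded and come first, ids in this format also sort chronologically as plain strings.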
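
The cancel_event entry above accepts either an object with .is_set() or a no-arg callable, checked at every step boundary. The following is a minimal sketch of how a host might normalize both forms into one check and wire it to SIGTERM. The names as_cancel_check, run_steps, and install_sigterm_handler are hypothetical, and the toy loop stands in for a real runner; it does not reproduce the snapshot or resume behavior described in the entry.

```python
import signal
import threading
from typing import Any, Callable, Iterable


def as_cancel_check(cancel_event: Any) -> Callable[[], bool]:
    """Normalize the two accepted hook shapes into a no-arg predicate."""
    if cancel_event is None:
        return lambda: False
    is_set = getattr(cancel_event, "is_set", None)
    if callable(is_set):  # e.g. threading.Event
        return lambda: bool(is_set())
    if callable(cancel_event):  # plain no-arg callable
        return lambda: bool(cancel_event())
    raise TypeError("cancel_event needs .is_set() or must be callable")


def run_steps(steps: Iterable[Any], cancel_event: Any = None):
    """Toy loop (not the real runner): check the hook before each step."""
    should_cancel = as_cancel_check(cancel_event)
    done = []
    for step in steps:
        if should_cancel():
            return done, "cancelled"
        done.append(step)
    return done, "completed"


def install_sigterm_handler(event: threading.Event) -> None:
    """Set the event on SIGTERM. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, lambda signum, frame: event.set())
```

Checking only at step boundaries means a cancel never interrupts an action halfway through, at the cost of waiting for the current step to finish.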
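
The step_callback entry above states that callback errors are logged and never raised. One way to get that guarantee is to route every invocation through a guard like the one sketched here. The function name notify_step and the logger name are hypothetical; only the four-argument callback shape (idx, intent, action_or_none, ok) comes from the glossary.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("step_callback_sketch")  # hypothetical logger name

StepCallback = Callable[[int, str, Optional[Any], bool], None]


def notify_step(
    callback: Optional[StepCallback],
    idx: int,
    intent: str,
    action: Optional[Any],
    ok: bool,
) -> None:
    """Invoke the observability hook without letting it break the run."""
    if callback is None:
        return
    try:
        callback(idx, intent, action, ok)
    except Exception:
        # Log with traceback and continue; the run must not fail here.
        logger.exception("step_callback failed at step %d", idx)


if __name__ == "__main__":
    def flaky(idx, intent, action, ok):
        raise RuntimeError("metrics backend down")

    notify_step(flaky, 0, "open login page", None, True)
    print("run continues")
```

Catching Exception rather than BaseException lets KeyboardInterrupt and SystemExit still propagate, which is usually the desired behavior for a long-running worker.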
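
The scale_brain_to_display entry above describes mapping brain-space pixel coordinates to display-space pixels when screenshots are resized before inference. The arithmetic can be sketched as a per-axis ratio. The function below is an illustration only: its name and signature are hypothetical and are not the real helper's API, which is documented in reference/coordinate-spaces.md.

```python
def scale_point_sketch(
    x: float,
    y: float,
    brain_size: tuple[int, int],
    display_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a point from the resized screenshot back to the real display.

    brain_size:   (width, height) of the image the brain saw.
    display_size: (width, height) of the actual display.
    """
    brain_w, brain_h = brain_size
    display_w, display_h = display_size
    if brain_w <= 0 or brain_h <= 0:
        raise ValueError("brain_size must be positive")
    # Scale each axis independently so non-uniform resizes also work.
    return round(x * display_w / brain_w), round(y * display_h / brain_h)


if __name__ == "__main__":
    # A 1920x1080 display downscaled to 1280x720 gives a 1.5x ratio.
    print(scale_point_sketch(100, 200, (1280, 720), (1920, 1080)))  # (150, 300)
```

Skipping this step and clicking at (100, 200) on the 1920x1080 display would land the click at two thirds of the intended position on each axis, which is the 1.5× offset the entry refers to.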
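
The predicted_outcome / observed_outcome entry above says reward functions compute Jaccard similarity between the two deltas. As a rough illustration of that measure, the sketch below compares the two strings as sets of lowercase word tokens. The tokenization and the value returned for two empty strings are assumptions made for this example; the actual logic lives in mantis_agent.rewards.world_model_accuracy_reward and may differ.

```python
import re


def _tokens(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric word tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(predicted: str, observed: str) -> float:
    """|A ∩ B| / |A ∪ B| over the token sets of the two outcome strings."""
    a, b = _tokens(predicted), _tokens(observed)
    if not a and not b:
        # Assumption: two empty outcomes count as a perfect match.
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(jaccard_similarity("cart shows one item", "cart shows two items"))
```

In the usage line, the two strings share 2 of 6 distinct tokens ("cart" and "shows"), so the score is 1/3. Set-based comparison ignores word order and repetition, which makes it cheap but coarse.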