Mantis CUA

mercurialsolo/mantis

Environment variables¶

Reference for every server-side env knob. Set on the container (Baseten / k8s Deployment / docker run -e ...) — never in client-side code.

Auth¶

Var	Default	Effect
`MANTIS_API_TOKEN`	unset	Single-tenant mode: any caller with this token gets `DEFAULT_TENANT` permissions. Ignored if `MANTIS_TENANT_KEYS_PATH` is set.
`MANTIS_TENANT_KEYS_PATH`	unset	Path to JSON keys file. When set, the server runs in multi-tenant mode.
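
The precedence between the two auth variables can be sketched as below. This is a minimal illustration, not the server's actual code: `resolve_auth_mode` is a hypothetical helper, and the `"unconfigured"` result for the case where neither variable is set is an assumption, since this page does not say what the server does then.

```python
import os


def resolve_auth_mode(env=os.environ):
    """Return which auth mode the two variables select (hypothetical helper)."""
    # The keys file wins: MANTIS_API_TOKEN is ignored when it is set.
    if env.get("MANTIS_TENANT_KEYS_PATH"):
        return "multi-tenant"
    if env.get("MANTIS_API_TOKEN"):
        return "single-tenant"
    # Assumption: behavior with neither variable set is not documented here.
    return "unconfigured"
```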

Caps¶

Var	Default	Effect
`MANTIS_MAX_STEPS_PER_PLAN`	200	Rejects plans with more steps than this with HTTP 400
`MANTIS_MAX_LOOP_ITERATIONS`	50	Silently clamp `loop_count` in micro-plans
`MANTIS_MAX_RUNTIME_MINUTES`	60	Hard wall-time cap on every run
`MANTIS_MAX_COST_USD`	25.0	Hard cost cap on every run

These are global hard caps; tenant config can be tighter, never looser.
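
The "tighter, never looser" rule amounts to taking the minimum of the global cap and the tenant value. A minimal sketch, assuming a hypothetical `effective_cap` helper and a tenant value already parsed from tenant config:

```python
import os

# Defaults copied from the table above.
GLOBAL_CAP_DEFAULTS = {
    "MANTIS_MAX_STEPS_PER_PLAN": 200,
    "MANTIS_MAX_LOOP_ITERATIONS": 50,
    "MANTIS_MAX_RUNTIME_MINUTES": 60,
    "MANTIS_MAX_COST_USD": 25.0,
}


def effective_cap(name, tenant_value=None, env=os.environ):
    """Combine the global cap with an optional tenant cap (hypothetical helper)."""
    default = GLOBAL_CAP_DEFAULTS[name]
    # Parse the env value with the same type as the default (int or float).
    global_cap = type(default)(env.get(name, default))
    if tenant_value is None:
        return global_cap
    # A tenant may lower the cap but can never raise it past the global one.
    return min(global_cap, tenant_value)
```

For example, a tenant asking for a 100 USD cost cap under the default global cap still gets 25.0.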

Paths¶

Var	Default	Effect
`MANTIS_DATA_DIR`	`/workspace/mantis-data`	Top-level data volume. Per-tenant subtree at `tenants/<tenant_id>/`.
`MANTIS_REPO_ROOT`	`/workspace/cua-agent`	Where `task_file` / `micro` paths are resolved from.
`MANTIS_DEBUG_DIR`	`<MANTIS_DATA_DIR>/screenshots/claude_debug`	Where Claude extraction prompt + screenshot debug bundles land.
`MANTIS_IDEMPOTENCY_DIR`	`<MANTIS_DATA_DIR>/idempotency`	Sidecar files for idempotency cache.
`MANTIS_CHROME_PROFILE_DIR`	set per-request by handler	Chrome profile dir used by the Xvfb env. The handler overrides this per tenant + `profile_id` (#341; falls back to legacy `state_key` when `profile_id` isn't set).
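
The defaults that hang off `MANTIS_DATA_DIR` can be derived as in the sketch below. `resolve_paths` is a hypothetical helper written for illustration; only the variable names and default locations come from the table.

```python
import os
from pathlib import PurePosixPath


def resolve_paths(tenant_id, env=os.environ):
    """Derive the data-volume layout from the environment (hypothetical helper)."""
    data_dir = PurePosixPath(env.get("MANTIS_DATA_DIR", "/workspace/mantis-data"))
    return {
        "data_dir": data_dir,
        # Per-tenant subtree lives under tenants/<tenant_id>/.
        "tenant_dir": data_dir / "tenants" / tenant_id,
        # Explicit overrides win; otherwise fall back under the data dir.
        "debug_dir": PurePosixPath(
            env.get("MANTIS_DEBUG_DIR", data_dir / "screenshots" / "claude_debug")
        ),
        "idempotency_dir": PurePosixPath(
            env.get("MANTIS_IDEMPOTENCY_DIR", data_dir / "idempotency")
        ),
    }
```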

Inference¶

Var	Default	Effect
`MANTIS_LLAMA_PORT`	18080	Internal port the in-pod llama.cpp server binds to. The `/v1/chat/completions` proxy forwards here.
`MANTIS_BRAIN`	`holo3`	Brain backend selector. One of `holo3`, `claude`, `opencua`, `llamacpp`, `gemma4`, `agent-s`, `mock`. Wins over the legacy `MANTIS_MODEL`. `mock` is a deterministic always-DONE stub for plan authoring without GPU / API cost (#274).
`MANTIS_MODEL`	(set by Truss)	Legacy alias of `MANTIS_BRAIN`, kept for one minor release. `gemma4-cua` aliases to `gemma4`.
`MANTIS_HOLO3_MODEL_DIR`	`/models/holo3`	Where Holo3 GGUF weights are mounted.
`ANTHROPIC_API_KEY`	unset	Default Anthropic key. Per-tenant `anthropic_secret_name` overrides per request.
`MANTIS_PROMPTS_DIR`	unset	Override directory for prompt files. When set, the loader reads `<dir>/<name>.txt` before falling back to the in-tree constant — lets a tenant tune wording without forking the wheel. Names: `system_v1`, `gemma4_system`, `holo3_system`, `claude_system`, `opencua_system`, `llamacpp_system`.
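
The selector precedence and the prompt override lookup described in the table could look roughly like this. Both functions are hypothetical sketches. Applying the `gemma4-cua` alias to either variable and raising on an unknown name are assumptions, not documented behavior.

```python
import os
from pathlib import Path

VALID_BRAINS = {"holo3", "claude", "opencua", "llamacpp", "gemma4", "agent-s", "mock"}
LEGACY_ALIASES = {"gemma4-cua": "gemma4"}


def resolve_brain(env=os.environ):
    """Pick the brain backend: MANTIS_BRAIN wins over the legacy MANTIS_MODEL."""
    raw = env.get("MANTIS_BRAIN") or env.get("MANTIS_MODEL") or "holo3"
    name = LEGACY_ALIASES.get(raw, raw)
    if name not in VALID_BRAINS:
        # Assumption: the real server may handle unknown names differently.
        raise ValueError(f"unknown brain backend: {raw!r}")
    return name


def load_prompt(name, builtin, env=os.environ):
    """Read <MANTIS_PROMPTS_DIR>/<name>.txt, else return the in-tree constant."""
    override_dir = env.get("MANTIS_PROMPTS_DIR")
    if override_dir:
        candidate = Path(override_dir) / f"{name}.txt"
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    return builtin
```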

Proxy (IPRoyal)¶

Var	Default	Effect
`PROXY_URL`	unset	`host:port` for the upstream IPRoyal proxy
`PROXY_USER`	unset	session id
`PROXY_PASS`	unset	password
`MANTIS_PROXY_CITY`	unset	Default proxy geo override (caller can override per request)
`MANTIS_PROXY_STATE`	unset	Default proxy state override; as with `MANTIS_PROXY_CITY`, the caller can override it per request

Webhooks¶

Var	Default	Effect
`MANTIS_WEBHOOK_SECRET_DEFAULT`	unset	Fallback HMAC signing secret when a tenant's `webhook_secret_name` doesn't resolve
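
The fallback order can be illustrated as follows. This page does not specify the signature scheme, so the HMAC-SHA256 hex digest here is an assumption, and both helper names are hypothetical.

```python
import hashlib
import hmac
import os


def pick_webhook_secret(tenant_secret, env=os.environ):
    """Use the tenant's resolved secret, else the deployment-wide fallback."""
    return tenant_secret or env.get("MANTIS_WEBHOOK_SECRET_DEFAULT")


def sign_payload(body: bytes, secret: str) -> str:
    """Sign a webhook body. Assumption: HMAC-SHA256, hex-encoded."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
```

A receiver would recompute the digest over the raw body and compare it with `hmac.compare_digest` to avoid timing leaks.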

Cost model (#122)¶

Var	Default	Effect
`MANTIS_COST_GPU_HOURLY_USD`	`3.25`	GPU compute, $/hour. Used by `CostConfig.gpu_cost`.
`MANTIS_COST_CLAUDE_CALL_USD`	`0.003`	Per-Claude-API-call rate. Multiplied by `claude_extract` + `claude_grounding` counters.
`MANTIS_COST_PROXY_PER_GB_USD`	`5.00`	Egress proxy bandwidth $/GB.
`MANTIS_COST_GPU_SECONDS_PER_STEP`	`3.0`	Per-step GPU seconds when the runner doesn't measure exact wall time.
`MANTIS_COST_PROXY_MB_PER_NAV`	`5.0`	Estimated proxy MB per page load.
`MANTIS_COST_PROXY_MB_PER_SCROLL`	`0.5`	Estimated proxy MB per scroll.

See operations/cost.md for the full rate-tuning workflow.
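
As a rough illustration of how these rates might combine into a per-run estimate, consider the sketch below. It is not `CostConfig` itself: `estimate_cost_usd` is a hypothetical function, the estimated-GPU-seconds path is used throughout, and the 1024 MB per GB conversion is an assumption.

```python
import os


def _rate(env, name, default):
    """Read one rate from the environment as a float."""
    return float(env.get(name, default))


def estimate_cost_usd(steps, claude_extract, claude_grounding, navs, scrolls,
                      env=os.environ):
    """Estimate run cost from step and call counters (hypothetical sketch)."""
    # GPU: estimated seconds per step, billed at the hourly rate.
    gpu_seconds = steps * _rate(env, "MANTIS_COST_GPU_SECONDS_PER_STEP", 3.0)
    gpu = _rate(env, "MANTIS_COST_GPU_HOURLY_USD", 3.25) * gpu_seconds / 3600.0
    # Claude: flat per-call rate times both counters.
    claude = (claude_extract + claude_grounding) * _rate(
        env, "MANTIS_COST_CLAUDE_CALL_USD", 0.003)
    # Proxy: estimated MB per navigation and per scroll, billed per GB.
    proxy_mb = (navs * _rate(env, "MANTIS_COST_PROXY_MB_PER_NAV", 5.0)
                + scrolls * _rate(env, "MANTIS_COST_PROXY_MB_PER_SCROLL", 0.5))
    proxy = proxy_mb / 1024.0 * _rate(env, "MANTIS_COST_PROXY_PER_GB_USD", 5.00)
    return {"gpu": gpu, "claude": claude, "proxy": proxy,
            "total": gpu + claude + proxy}
```

With the defaults, 1200 steps is one estimated GPU hour, so the GPU component comes to 3.25 USD.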

Trace export (#155)¶

Var	Default	Effect
`MANTIS_TRACE_EXPORT_DIR`	unset	Enable per-run trace export. When set, every completed / halted / cancelled / paused run writes `<dir>/<tenant_id>/<run_id>.json` with the full step list, costs, status, and predicted/observed outcomes. Empty `tenant_id` falls back to `__shared__/`. Off by default — feature flag for the continual-fine-tuning pipeline.
`MANTIS_TRACE_INCLUDE_SCREENSHOTS`	unset	When truthy (`1`/`true`/`yes`/`on`) and trace export is enabled, also persists per-step PNG screenshots to `<dir>/<tenant_id>/<run_id>_screens/<step:04d>.png`. Off by default because screenshots take roughly 100× the disk space of the trace itself.
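
The path layout above can be sketched like this. `trace_paths` and `screenshot_name` are hypothetical helpers, and treating the truthy values case-insensitively is an assumption.

```python
import os
from pathlib import PurePosixPath

TRUTHY = {"1", "true", "yes", "on"}


def trace_paths(tenant_id, run_id, env=os.environ):
    """Return where a run's trace would be written, or None when export is off."""
    root = env.get("MANTIS_TRACE_EXPORT_DIR")
    if not root:
        return None
    # An empty tenant id falls back to the shared subtree.
    base = PurePosixPath(root) / (tenant_id or "__shared__")
    include = env.get("MANTIS_TRACE_INCLUDE_SCREENSHOTS", "").strip().lower() in TRUTHY
    return {
        "trace": base / f"{run_id}.json",
        "screens_dir": (base / f"{run_id}_screens") if include else None,
    }


def screenshot_name(step):
    """File name for one step's screenshot, zero-padded to four digits."""
    return f"{step:04d}.png"
```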

Augur observability (#509)¶

Active only when the `augur-sdk` package is importable; install it via `pip install 'mantis-agent[observability]'`. See Augur integration for the full contract.

Var	Default	Effect
`AUGUR_DSN`	unset	Sentry-style DSN. When set, the SDK opens a streaming sink to the workspace alongside the on-disk bundle. When unset, only the bundle is written.
`AUGUR_CAPTURE_MODE`	`screenshots`	One of `off` / `metadata` / `trace` / `screenshots` / `video` / `model_io` / `dispatch` / `replay` / `full`. Controls what the SDK captures.
`MANTIS_AUGUR_DIR`	unset	Override the root directory where bundles are written. Run id is still appended. Falls back to `<MANTIS_DATA_DIR>/augur/`.
`MANTIS_AUGUR_DISABLED`	unset	Truthy (`1`/`true`/`yes`/`on`) → adapter is a no-op even with the SDK installed. Useful for tests / CI.
`MANTIS_VERSION`	unset	Surfaced as `client.version` on the bundle manifest — useful for bisecting which build produced a bundle.
`MANTIS_GIT_SHA`	unset	Surfaced as `client.git_sha` on the bundle manifest.
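
The kill switch and the bundle directory fallback might be resolved as in this sketch. `augur_bundle_dir` is a hypothetical helper; it takes SDK availability as a parameter because the SDK's import name is not given on this page.

```python
import os
from pathlib import PurePosixPath

TRUTHY = {"1", "true", "yes", "on"}


def augur_bundle_dir(run_id, sdk_available, env=os.environ):
    """Return the bundle directory for a run, or None when the adapter is a no-op."""
    if not sdk_available:
        return None
    # Assumption: truthy values are matched case-insensitively.
    if env.get("MANTIS_AUGUR_DISABLED", "").strip().lower() in TRUTHY:
        return None
    data_dir = env.get("MANTIS_DATA_DIR", "/workspace/mantis-data")
    root = env.get("MANTIS_AUGUR_DIR") or str(PurePosixPath(data_dir) / "augur")
    # The run id is appended even when the root is overridden.
    return PurePosixPath(root) / run_id
```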

Logging¶

Var	Default	Effect
`LOG_LEVEL`	`INFO`	Standard Python logging level
`MANTIS_LOG_FORMAT`	`json`	`json` (default) emits one-line JSON per record with `tenant_id` enrichment; `plain` reverts to the ad-hoc format
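
A one-line JSON formatter with `tenant_id` enrichment could be built on the standard `logging` module roughly as follows. The class name, the field names other than `tenant_id`, and the plain format string are all illustrative assumptions.

```python
import json
import logging
import os


class TenantJsonFormatter(logging.Formatter):
    """Emit one JSON object per record, enriched with the current tenant id."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Read at format time, since the handler sets this per request.
            "tenant_id": os.environ.get("MANTIS_TENANT_ID", ""),
        }
        return json.dumps(payload)


def build_formatter(env=os.environ):
    """Choose the formatter from MANTIS_LOG_FORMAT (json is the default)."""
    if env.get("MANTIS_LOG_FORMAT", "json") == "plain":
        return logging.Formatter("%(levelname)s %(name)s %(message)s")
    return TenantJsonFormatter()
```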

Runner / verification¶

Var	Default	Effect
`MANTIS_PREDICATE_VERIFY`	`enabled`	Per-step world-model verification (#291). When the brain emits a structured prediction (`{"expected": [...]}` or `Predicted: ...`), the runner parses, evaluates, and writes per-predicate booleans into the trajectory plus a `world_model_error` reward component. Set to `disabled` to ablate — `predicted_outcome` is still recorded for distillation, but no evaluation runs. See Predicate grammar.
`MANTIS_DONE_GATE`	`enabled`	Deterministic done-acceptance gate (#303). Runs cheap predicates (empty summary, plan steps incomplete, pending form values, etc.) before the model-based `verify_done`. Set to `disabled` to ablate — the runner falls through to the existing model verifier and `done_rejections_by_reason` stays empty. See Done-acceptance gate.
`MANTIS_FORM_CONTROLLER`	`enabled`	First-class runtime form controller (#301) owning pending-values / used-regions / submit-latch state. Set to `disabled` to ablate — the runner falls back to the legacy scattered `force_fill_*` locals; `runner.form_controller` is `None`. See Form controller.
`MANTIS_ADAPTIVE_SETTLE`	`enabled`	Replaces post-action `time.sleep(settle_time)` (#294) with a frame-stability gate (xdotool path) or `wait_for_load_state("networkidle")` gate (Playwright path), capped at the legacy budget. Set to `disabled` to ablate — both gates short-circuit back to a fixed sleep without a redeploy. See Adaptive settle.
`MANTIS_CHROME_REUSE`	`enabled`	Container-scoped Xvfb + Chrome session reuse (#311). Successive `/v1/cua` requests with the same `(profile_dir, proxy_key)` reuse the live browser instead of paying the ~10 s launch tax. Set to `disabled` to ablate. Per-request opt-out: `payload["reuse_session"]=false`. See Chrome session reuse.
`MANTIS_SPECULATIVE_INFERENCE`	`disabled`	Wraps the inner brain in `SpeculativeBrain` (#118) so `think()` overlaps with the post-action settle. Default OFF because the E2E ablation on Holo3 Q8 + single-llama.cpp showed a wall-time regression (GPU contention between speculative + sync requests, 55.6% hit rate → +52% wall). Quality is preserved by the strict validator. Enable on multi-GPU backends where the two `think()` requests don't serialize. See Speculative inference.
`MANTIS_PERCEPTUAL_VERIFY`	`enabled`	Perceptual-diff verifier (#293) for high-risk actions (submit, confirm, buy, send, delete, login, save). Compares pre/post frame hashes, both global and for a 200×200 region around the click, and emits `action_effect_observed: bool` per step. When no effect is observed, a WARNING line is injected into the next step's feedback. Observational only: it never blocks or substitutes the action. Set to `disabled` to ablate. See Perceptual diff verifier.
`MANTIS_LOOP_RECOVERY`	`enabled`	Action-class-transition policy (#302) that forces TAB / TYPE / RETURN when the brain loops on a no-effect click. Runs after the existing substitution chain (force-fill, force-submit, claude-director, top-click-guard) — the last gate before dispatch. Per-reason count surfaces on `RunResult.loop_recoveries_by_reason`. Set to `disabled` to ablate. See Loop recovery policy.
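
All of these switches share the same `enabled` / `disabled` convention, so they could be read with one helper, as sketched below. `runner_flags` is hypothetical, and falling back to the default on an unrecognized value is an assumption.

```python
import os

# Defaults copied from the table above.
RUNNER_FLAG_DEFAULTS = {
    "MANTIS_PREDICATE_VERIFY": "enabled",
    "MANTIS_DONE_GATE": "enabled",
    "MANTIS_FORM_CONTROLLER": "enabled",
    "MANTIS_ADAPTIVE_SETTLE": "enabled",
    "MANTIS_CHROME_REUSE": "enabled",
    "MANTIS_SPECULATIVE_INFERENCE": "disabled",
    "MANTIS_PERCEPTUAL_VERIFY": "enabled",
    "MANTIS_LOOP_RECOVERY": "enabled",
}


def runner_flags(env=os.environ):
    """Map each runner switch to a boolean (hypothetical helper)."""
    flags = {}
    for name, default in RUNNER_FLAG_DEFAULTS.items():
        value = env.get(name, default).strip().lower()
        if value not in ("enabled", "disabled"):
            # Assumption: unrecognized values fall back to the default.
            value = default
        flags[name] = value == "enabled"
    return flags
```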

API documentation surface¶

Var	Default	Effect
`MANTIS_ENABLE_DOCS_UI`	`1`	Serve `/docs` (Swagger) and `/redoc` (Redoc) over the FastAPI app. Set to `0` / `false` / `no` / `off` on production tenant fleets that don't want the interactive UIs exposed publicly. `/openapi.json` is served regardless.
`MANTIS_GIT_SHA`	unset	Surfaced verbatim in `GET /v1/version` so clients can pin to a specific build. Typically populated by the deploy pipeline.
`MANTIS_BUILD_TIME`	unset	Surfaced verbatim in `GET /v1/version`. Populated by the deploy pipeline.
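
One way to wire `MANTIS_ENABLE_DOCS_UI` into a FastAPI app is to compute the constructor's `docs_url` / `redoc_url` arguments from it, since passing `None` for either disables that UI. The helper below is a hypothetical sketch and deliberately does not import FastAPI, so it runs on its own.

```python
import os

FALSY = {"0", "false", "no", "off"}


def docs_ui_kwargs(env=os.environ):
    """Build keyword arguments for FastAPI(...) from MANTIS_ENABLE_DOCS_UI."""
    enabled = env.get("MANTIS_ENABLE_DOCS_UI", "1").strip().lower() not in FALSY
    return {
        "docs_url": "/docs" if enabled else None,
        "redoc_url": "/redoc" if enabled else None,
        # The schema itself stays available either way.
        "openapi_url": "/openapi.json",
    }

# Usage, assuming FastAPI is installed:
#   app = FastAPI(**docs_ui_kwargs())
```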

Context (set per request, not per deployment)¶

The handler sets these on every `/v1/predict` request so downstream code (the runtime, the JSON log formatter) can read them via `os.environ`. Don't rely on them being set at deployment time.

MANTIS_TENANT_ID — current request's tenant id
MANTIS_CHROME_PROFILE_DIR — per-tenant per-profile_id Chrome user-data-dir for this run (#341)
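
A handler could scope these two variables to a single request with a context manager like the one below. This is a sketch under assumptions: `request_context` is a hypothetical name, and restoring the previous values afterwards is not something this page documents. Note that `os.environ` is process-wide, so this pattern only isolates requests that run one at a time within a process.

```python
import os
from contextlib import contextmanager


@contextmanager
def request_context(tenant_id, chrome_profile_dir):
    """Set the per-request variables, then restore whatever was there before."""
    updates = {
        "MANTIS_TENANT_ID": tenant_id,
        "MANTIS_CHROME_PROFILE_DIR": chrome_profile_dir,
    }
    previous = {key: os.environ.get(key) for key in updates}
    os.environ.update(updates)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```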

See also¶

Operations / Tenant keys — multi-tenant config
Hosting — platform-specific deploy paths