Embedding MicroPlanRunner in a host application

This is the reference for hosts that import the mantis-agent library and drive MicroPlanRunner in their own process. If you only call the HTTP /v1/predict endpoint, you don't need this doc; see Sending plans instead.

Companion reading: the any-agent integration playbook covers the runtime contract and pre-flight checklist your host's env wrapper must satisfy.


Install

The orchestrator surface is gated behind a pip extra so heavy GPU / browser deps stay out of host processes:

pip install "mantis-agent[orchestrator]"

The [orchestrator] extra pulls only requests and pydantic. There is a CI test (tests/test_orchestrator_surface.py) that fails the build if a PR sneaks torch, vllm, transformers, pyautogui, or playwright into the orchestrator import chain — so pinning to a tagged release in your host's pyproject.toml is safe.
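A guard of this kind can be written as an import-and-inspect check. This is a sketch only; the real test in tests/test_orchestrator_surface.py may be structured differently:

```python
import importlib
import sys

# The heavy deps the CI guard described above keeps out of the chain.
FORBIDDEN = {"torch", "vllm", "transformers", "pyautogui", "playwright"}

def leaked_heavy_deps(module_name: str = "mantis_agent") -> list[str]:
    """Import module_name and report which forbidden deps it dragged in.

    Importing a module populates sys.modules with everything it pulls
    transitively, so a post-import set intersection is enough.
    """
    importlib.import_module(module_name)
    return sorted(FORBIDDEN & set(sys.modules))
```

A CI test would assert `leaked_heavy_deps() == []` and fail the build with the leaked names otherwise.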

Public surface

Everything a host needs is re-exported at the top level:

from mantis_agent import (
    MicroPlanRunner,
    RunnerResult,
    PauseRequested,
    PauseState,
    StepResult,
    MicroPlan,
    MicroIntent,
    PlanDecomposer,
    scale_brain_to_display,
)

All of these are lazy-loaded — import mantis_agent itself is cheap and side-effect-free, and submodules are pulled in on first attribute access.

Minimal end-to-end shape

from mantis_agent import MicroPlanRunner, MicroPlan
from mantis_agent.brain_holo3 import Holo3Brain
from mantis_agent.extraction import ClaudeExtractor
from mantis_agent.grounding import ClaudeGrounding

# 1. Wire the brain. extra_headers is the integration knob hosts use to
#    talk to a deployed Mantis service through any host (Baseten / Modal /
#    EKS / GKE).
brain = Holo3Brain(
    base_url=f"{settings.mantis_endpoint}/v1",
    extra_headers={
        "X-Mantis-Token": settings.mantis_api_token,
        # Optional — only when an upstream gateway demands it. Sent verbatim:
        **({"Authorization": settings.mantis_gateway_authorization}
           if settings.mantis_gateway_authorization else {}),
    },
    timeout=180,
)

# 2. Claude helpers go DIRECT to Anthropic. The host already has a key.
extractor = ClaudeExtractor(api_key=settings.anthropic_api_key)
grounding = ClaudeGrounding(api_key=settings.anthropic_api_key)

# 3. Build the runner. The env is host-supplied — typically a
#    GymEnvironment subclass that adapts the host's existing Xvfb desktop.
runner = MicroPlanRunner(
    brain=brain,
    env=my_gym_env,
    grounding=grounding,
    extractor=extractor,
    session_name="my_workflow",
    max_cost=5.0,
    max_time_minutes=30,
)

# 4. Run.
plan = MicroPlan.from_dict(my_plan_payload)  # or PlanDecomposer().decompose_text(prompt)
result = runner.run_with_status(plan)

The four host-integration knobs

These are the four primitives a host integration typically reaches for. Each is opt-in — runs that don't set them see no change in behaviour.

1. step_callback — per-step observability (#74)

def on_step(idx: int, intent: str, action, ok: bool) -> None:
    log.info("step %d %s: %s", idx, "ok" if ok else "fail", intent)

runner = MicroPlanRunner(..., step_callback=on_step)

Every StepResult also carries screenshot_png: bytes | None — the encoded PNG of the post-step display. Hosts can feed those bytes into their own sidecar (e.g. browser-context extraction) without parsing message structure. keep_screenshots=N caps retention to the most-recent N steps to bound memory on long plans.
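For example, a host might archive the retained screenshots after a run. This helper is a sketch against the contract above (a steps list whose items expose screenshot_png); the function name and output layout are made up:

```python
from pathlib import Path

def archive_screenshots(steps, out_dir="run_artifacts"):
    """Persist each retained post-step screenshot to out_dir.

    Steps whose screenshot was dropped by keep_screenshots (None) are
    skipped. Returns the list of written paths, in step order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for idx, step in enumerate(steps):
        png = getattr(step, "screenshot_png", None)
        if png is None:
            continue
        path = out / f"step_{idx:03d}.png"
        path.write_bytes(png)
        written.append(path)
    return written
```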

2. cancel_event — clean SIGTERM exit (#76)

import threading

shutdown = threading.Event()
runner = MicroPlanRunner(..., cancel_event=shutdown)
# ... another thread sets shutdown when SIGTERM arrives.

result = runner.run_with_status(plan)
if result.cancelled:
    # state already persisted in the checkpoint — host returns the
    # equivalent of CUALoopResult(shutdown_requested=True, ...)
    persist_state(result.steps[-1] if result.steps else None)

Accepts any object with .is_set() or a plain callable. Checked at every step boundary.

3. register_tool — host tools to the brain (#71)

for tool in extra_anthropic_tools:
    runner.register_tool(
        name=tool.name,
        schema=tool.to_params(),  # JSON-schema; matches GenericToolAdapter shape
        handler=lambda kwargs, _t=tool: _t(**kwargs),
    )

Errors raised by handlers surface as success=False step results with a diagnostic data string ("tool:NAME:error:TypeName:msg") — never silently swallowed. The one exception to this catch-and-report behaviour is PauseRequested, which the runner turns into a clean pause (next section).

4. PauseRequested + runner.resume() — OTP / 2FA / human-in-the-loop (#73)

def request_user_input(args):
    staged = runner.consume_pause_input(default=None)
    if staged is None:
        # First call — pause the run.
        raise PauseRequested(reason="user_input", prompt=args["prompt"])
    # Resumed call — staged is the user's reply.
    return staged

runner.register_tool(
    "request_user_input",
    {"type": "object", "properties": {"prompt": {"type": "string"}}},
    request_user_input,
)

result = runner.run_with_status(plan)
if result.paused:
    state_blob = result.pause_state.to_dict()  # JSON-safe; store anywhere
    save_to_db(state_blob)
    return  # release the worker; resume on a fresh request

# Later, when the user replies:
state = PauseState.from_dict(load_from_db())
result = runner.resume(state, user_input="123456", plan=plan)

PauseState round-trips through json.dumps — Postgres JSONB friendly. Plan-signature mismatch on resume raises ValueError: you can't resume a different plan than the one that paused.
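The persistence step can be as small as a JSON round-trip. A sketch, with a plain dict standing in for the host's store (a JSONB column works the same way); the helper names are illustrative:

```python
import json

def save_pause(db, run_id, state_blob):
    # state_blob is the dict from PauseState.to_dict(); JSON-safe by
    # contract, so json.dumps never needs a custom encoder here.
    db[run_id] = json.dumps(state_blob)

def load_pause(db, run_id):
    # Feed the returned dict to PauseState.from_dict() before calling
    # runner.resume(state, user_input=..., plan=plan).
    return json.loads(db[run_id])
```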

Coordinate-space contract

The brain emits (x, y) in the same pixel space as the screenshot it saw. The env dispatches in display pixels. When those two spaces differ (host pre-resizes screenshots before inference), the host adapter is responsible for the scaling — never push it onto the brain.

Use the helper:

from mantis_agent import scale_brain_to_display

x_disp, y_disp = scale_brain_to_display(
    x_brain=action.params["x"],
    y_brain=action.params["y"],
    brain_size=brain_image.size,         # (w, h) of the image fed to inference
    display_size=desktop.viewport_size,  # (w, h) of the dispatch target
)

Full contract + worked examples + the bug-class history: reference/coordinate-spaces.md.
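For intuition, the helper's signature suggests plain per-axis linear scaling. The sketch below is an illustrative re-implementation under that assumption (no letterboxing or aspect-ratio padding); real hosts should call the library's scale_brain_to_display:

```python
def scale_point(x_brain, y_brain, brain_size, display_size):
    # Map a point from the brain's screenshot pixel space to display
    # pixels by scaling each axis independently. Assumes both spaces
    # cover the same content edge-to-edge.
    bw, bh = brain_size
    dw, dh = display_size
    return round(x_brain * dw / bw), round(y_brain * dh / bh)
```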

LAUNCH_APP action (#72)

A new ActionType.LAUNCH_APP lets a plan start a desktop binary explicitly — symmetric with the Claude backend's bash tool. Hosts that want browser launch on demand implement the dispatch in their GymEnvironment.step():

case ActionType.LAUNCH_APP:
    subprocess.Popen(
        [params["name"], *params.get("args", [])],
        env={**self._desktop_env, **params.get("env", {})},
    )

Failure to launch surfaces as a step error rather than a runner crash — the next screenshot is what the plan checks.
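One way a host's step() can degrade a failed launch into a step error instead of an exception. A sketch: the (ok, detail) return shape and the error-string format are illustrative, not the library's API:

```python
import subprocess

def dispatch_launch_app(params, desktop_env):
    """Launch params["name"] with merged env; report failure as data.

    Returns (True, "") on a successful spawn, or (False, detail) when
    the binary is missing or the params are malformed, so the caller
    can emit a failed step and let the next screenshot tell the story.
    """
    try:
        subprocess.Popen(
            [params["name"], *params.get("args", [])],
            env={**desktop_env, **params.get("env", {})},
        )
        return True, ""
    except (OSError, KeyError) as exc:
        name = params.get("name", "?")
        return False, f"launch:{name}:error:{type(exc).__name__}:{exc}"
```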

Backwards-compat invariants

A host integration must preserve everything that worked before — these invariants are non-negotiable:

  • The orchestrator surface is purely additive. No constructor argument to MicroPlanRunner is required for the new behaviour; defaults are None everywhere.
  • StepResult.to_dict() excludes screenshot_png and last_action, so the existing checkpoint JSON shape is byte-identical.
  • mantis_agent import does not pull torch, vllm, transformers, pyautogui, or playwright. A CI test (tests/test_orchestrator_surface.py) enforces this.
  • Existing MicroPlanRunner.run(plan) callers receive list[StepResult] unchanged. Use run_with_status(plan) only when you need the RunnerResult shape.

When your host carries an existing CUA backend that you want to keep working alongside Mantis, the typical invariants to preserve are:

  • Don't refactor your existing backend's class — add Mantis as a sibling selected by an env flag.
  • Don't change the agent-state blob shape for the existing backend's runs; just add a new key for Mantis runs alongside.
  • Default the env flag to the existing backend so nothing flips without an explicit opt-in.
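The sibling-backend selection those bullets describe can be sketched as follows; the CUA_BACKEND flag name and the factory callables are hypothetical:

```python
import os

def select_backend(make_legacy, make_mantis, env=None):
    """Pick a backend factory by env flag, defaulting to the legacy one.

    Defaulting to make_legacy preserves the invariant that nothing
    flips to Mantis without an explicit opt-in.
    """
    env = os.environ if env is None else env
    if env.get("CUA_BACKEND", "legacy").lower() == "mantis":
        return make_mantis()
    return make_legacy()
```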

Sharing this with another host

If you're integrating Mantis into a fresh host, the relevant docs, in reading order, are:

  1. Integrating any agent — runtime contract + pre-flight checklist + the integration mistakes to avoid.
  2. This doc — what to import + the four host-integration knobs.
  3. reference/coordinate-spaces.md — required reading before you write a step() method.
  4. reference/glossary.md — quick definitions for terms used throughout.
  5. reference/env-vars.md — server-side env vars (only relevant if you self-host the Mantis service rather than calling Baseten).