Use cases — what Mantis is good at

Mantis is a generic computer-use agent. Anything a human does with a keyboard, a mouse, and a browser, Mantis can drive, provided the steps can be expressed as a structured plan. This page is a tour of the patterns that show up most often in production.

For copy-paste plans you can adapt, jump to Recipes.


Read flows — pull structured data out of a UI

Mantis reads what's on the screen and returns a JSON row per item. The schema is plan-driven; the same runtime handles vehicle listings, job postings, news articles, and real-estate inventory without a code change.

| Pattern | Real-world target | Output |
| --- | --- | --- |
| Marketplace listings | Vehicle / boat / RV listings; consumer-goods classifieds | {year, make, model, price, phone, seller, url} per listing |
| Job postings | Greenhouse, Lever, Workday, Ashby, public ATS pages | {title, team, location, department, url} per role |
| Real-estate | Zillow, Redfin, Trulia, MLS-public listings | {address, price, beds, baths, sqft, agent, url} per home |
| Product catalog | Amazon, Shopify storefronts, Walmart, Etsy | {name, price, rating, availability, brand, url} per product |
| News / content | Newsroom indices, blog archives, RSS-less sites | {headline, byline, published, summary, url} per article |
| Admin user lookup | Internal admin consoles ("find user by email") | {email, full_name, plan, signup_date, last_login_at, billing_status, url} |
| Compliance / audit screens | Console event logs, settings pages | Snapshot or row-by-row export, often with a recorded screencast |

→ See Recipes 1–4 + 9 for working plans.
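
To make "the schema is plan-driven" concrete, here is a minimal Python sketch of how a consumer might check returned rows against the schema named in a plan. The plan shape (the "steps", "type", and "schema" keys) and the sample rows are assumptions made up for illustration, not Mantis's documented plan format; see Plan formats for the real fields.

```python
import json

# Hypothetical plan shape: the keys "steps", "type", and "schema" are
# illustrative assumptions, not the documented Mantis plan format.
job_postings_plan = {
    "name": "job-postings",
    "steps": [
        {
            "type": "extract_data",
            "schema": ["title", "team", "location", "department", "url"],
        },
    ],
}


def missing_fields(row, schema):
    """Return the schema keys that are absent from one extracted row."""
    return [key for key in schema if key not in row]


def split_rows(rows, schema):
    """Partition rows into (complete, incomplete) against the plan's schema."""
    complete, incomplete = [], []
    for row in rows:
        (incomplete if missing_fields(row, schema) else complete).append(row)
    return complete, incomplete


# Made-up rows standing in for the JSON a run might return.
sample_rows = json.loads("""[
  {"title": "Staff Engineer", "team": "Platform", "location": "Remote",
   "department": "Engineering", "url": "https://example.com/jobs/1"},
  {"title": "Designer", "location": "Berlin",
   "url": "https://example.com/jobs/2"}
]""")

schema = job_postings_plan["steps"][0]["schema"]
complete, incomplete = split_rows(sample_rows, schema)
print(len(complete), "complete;", len(incomplete), "incomplete")
```

Switching from job postings to real-estate listings would then mean changing only the schema list in the plan; the checking code stays the same.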


Write flows — change state in a SaaS UI

Same agent, different verbs. Form-shaped plans (fill_field / select_option / submit) drive logged-in workflows on systems that have no reliable public API for the target field.

| Pattern | Real-world target | Action |
| --- | --- | --- |
| CRM record edit | Salesforce / HubSpot / Zoho / Pipedrive / custom CRMs | Open lead → edit field (status / industry / owner) → Save |
| Contact upsert | Same; webhook-driven sync from your own DB | New Contact → fill name/email/phone/owner → Save |
| Stage moves | ATS pipelines, deal stages, ticket statuses | Open record → change stage dropdown → confirm |
| Refund / chargeback | Stripe / Shopify / Square admin | Search by order ID → open payment → Refund → confirm amount |
| Inventory adjust | Shopify / Woo / NetSuite | Open product → set stock → save |
| Customer-support reply | Zendesk / Intercom / Front | Open ticket → paste templated reply → assign macro → submit |
| Settings / config | OAuth apps, billing portals, IAM consoles | Toggle setting → confirm dialog → snapshot post-state |

→ See Recipes 5, 9, 10, 11.
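
As a rough illustration of a form-shaped plan, the Python sketch below turns one record from your own database into an ordered list of steps for the "Contact upsert" pattern. Only the step names (click, fill_field, select_option, submit) come from this page; the dictionary keys ("type", "field", "value", "target") and the helper name are hypothetical, so treat this as a shape to adapt rather than the actual plan schema.

```python
def contact_upsert_steps(contact):
    """Build form-shaped steps for a 'New Contact' flow from one DB record.

    The step dictionaries use assumed keys; check Plan formats for the
    real field names before adapting this.
    """
    steps = [{"type": "click", "target": "New Contact"}]
    # Free-text inputs map to fill_field steps.
    for field in ("name", "email", "phone"):
        steps.append(
            {"type": "fill_field", "field": field, "value": contact[field]}
        )
    # The owner is picked from a dropdown, so it maps to select_option.
    steps.append(
        {"type": "select_option", "field": "owner", "value": contact["owner"]}
    )
    steps.append({"type": "submit", "target": "Save"})
    return steps


# Made-up record for illustration.
record = {
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "phone": "+1-555-0100",
    "owner": "Sales West",
}

for step in contact_upsert_steps(record):
    print(step)
```

Generating steps from data keeps the plan itself declarative: the same builder can be called once per record in a webhook-driven sync.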


Authenticated multi-step workflows

The agent is plan-driven, so end-to-end flows that span login, navigation, search, edit, and verify all live in one plan. Examples:

  • Sales operations — log into CRM → pull all leads in stage X → for each, update owner and post a Slack note via webhook
  • Recruiting hygiene — log into ATS → walk pipeline → close stale candidates, move qualified to next stage
  • Customer-success motion — log into product admin → snapshot feature-usage page → email customer the export
  • Bookkeeping — log into bank/payments dashboard → reconcile a date range against your accounting system

Each step is one of submit, fill_field, select_option, click, or extract_data. Failures inside a step are isolated; a failed submit halts the plan before it can cause downstream damage.
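
The halting behaviour can be pictured with a toy runner. This is not Mantis's runtime: the function names, the step dictionaries, and the exact rule (record every step's outcome, stop at the first failed submit) are assumptions chosen to mirror the sentence above.

```python
def run_plan(steps, execute):
    """Run steps in order and stop at the first failed submit.

    `execute` is any callable that takes a step and returns True on
    success. Each outcome is logged on its own, so a failure is
    attributed to exactly one step.
    """
    log = []
    for index, step in enumerate(steps):
        ok = execute(step)
        log.append((index, step["type"], "ok" if ok else "failed"))
        if not ok and step["type"] == "submit":
            break  # nothing after a failed submit is attempted
    return log


steps = [
    {"type": "fill_field", "field": "status", "value": "Qualified"},
    {"type": "submit", "target": "Save"},
    {"type": "click", "target": "Notify owner"},
]


def failing_save(step):
    """Stand-in executor in which every submit fails."""
    return step["type"] != "submit"


log = run_plan(steps, failing_save)
for entry in log:
    print(entry)
```

In this run the final click never executes, which is the point: the owner is not notified about a save that did not happen.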


Social / publishing

The same form-flow vocabulary drives any web-based composer. Logging in through the UI sidesteps the need for platform API access in small-volume use cases.

| Pattern | Use case |
| --- | --- |
| LinkedIn post | Weekly product update, hiring announcement, employee shout-out |
| Reddit submission | Subreddit digest, AMA invitation, release note |
| Instagram feed post | Scheduled photo + caption + hashtags |
| Twitter / X reply | Customer-support response, thread continuation |

→ See Recipes 6, 7, 8.

Heads-up. Action recipes carry real-world consequences. Always include a gate: true extract step that verifies the outcome: the post went live, the refund cleared, the lead saved. Without that gate, a plan can "succeed" while the underlying action silently failed. See the safety note at the bottom of Recipes.
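
The idea behind a gate can be sketched in a few lines of Python: after the action, read the state back from the screen and pass only if it matches what the action was supposed to produce. The step dictionary below (with "gate": True standing in for gate: true) and the helper gate_passes are hypothetical illustrations, not Mantis's own gate implementation.

```python
# Hypothetical gate step: an extract that reads the post-action state back.
refund_gate = {
    "type": "extract_data",
    "gate": True,
    "schema": ["order_id", "refund_status", "refund_amount"],
}


def gate_passes(expected, extracted):
    """Pass only if every expected field was read back with the expected value."""
    return all(extracted.get(key) == value for key, value in expected.items())


expected = {"order_id": "A-1001", "refund_status": "Refunded", "refund_amount": "25.00"}

# Made-up readings: one where the refund cleared, one where it silently did not.
cleared = {"order_id": "A-1001", "refund_status": "Refunded", "refund_amount": "25.00"}
stuck = {"order_id": "A-1001", "refund_status": "Pending", "refund_amount": "25.00"}

print(gate_passes(expected, cleared))
print(gate_passes(expected, stuck))
```

The second reading is the case the heads-up warns about: every click succeeded, yet the refund is still pending, and only the read-back catches it.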


Desktop tasks (Xvfb, not just browser)

The agent's runtime uses xdotool, which can drive any X application. The browser is the most common target, but the same pipeline drives:

  • File manager — open a folder, drag-drop into archive, rename a batch
  • Terminal — run a command, capture stdout, paste into another app
  • LibreOffice / Office 365 — apply a style, fix headings, regenerate a ToC
  • Image / PDF tools — crop, annotate, export

These need a desktop environment in the runner (Xvfb + window manager). The Baseten and Modal images both ship with Xvfb, Chrome, and xdotool; add libreoffice-core to the image to cover the LibreOffice tasks.
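
For readers unfamiliar with xdotool, the Python sketch below builds the kind of command lines it accepts for pointer and keyboard input. It only constructs and prints the argument lists; it says nothing about how Mantis invokes xdotool internally, and the helper names are made up for this example.

```python
import shlex


def click_at(x, y, button=1):
    """Move the pointer to (x, y), then click the given mouse button."""
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def type_text(text, delay_ms=50):
    """Type a string with a per-keystroke delay in milliseconds."""
    return ["xdotool", "type", "--delay", str(delay_ms), text]


def press(keys):
    """Send a key combination such as ctrl+s."""
    return ["xdotool", "key", keys]


# A made-up sequence: focus a filename box, type a name, save.
commands = [
    click_at(640, 400),
    type_text("quarterly-report"),
    press("ctrl+s"),
]

for argv in commands:
    print(shlex.join(argv))
```

To run these for real, each list could be passed to subprocess.run(argv, check=True) in a process whose DISPLAY environment variable points at the Xvfb display.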


Adversarial / anti-bot targets

Mantis drives a real Chrome via xdotool — no Playwright fingerprints, no WebDriver flag, real user-agent. Sites with bot detection (Cloudflare, PerimeterX, DataDome) usually let it through, especially when paired with a residential proxy (PROXY_URL env var). Examples that have worked:

  • Listing sites with Cloudflare Bot Fight
  • E-commerce checkouts behind reCAPTCHA invisible
  • Banking dashboards with TLS fingerprinting

Mantis does not solve captchas itself. When one blocks the flow, it surfaces a gate_failed result so you can hand off to a captcha-solver step or bail early.
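
A caller might wire this up roughly as follows. The gate_failed value and the PROXY_URL variable come from this page; everything else (the "status" key on the result, the helper names, the action labels) is a hypothetical sketch of the calling side, not a documented Mantis interface.

```python
import os


def runner_env(proxy_url=None):
    """Copy the current environment, optionally setting PROXY_URL."""
    env = dict(os.environ)
    if proxy_url:
        env["PROXY_URL"] = proxy_url
    return env


def next_action(result, solver_available):
    """Decide what to do after a run.

    Assumes the run result is a dict whose "status" key may hold
    "gate_failed"; that shape is an assumption for illustration.
    """
    if result.get("status") == "gate_failed":
        return "solve_captcha" if solver_available else "abort"
    return "continue"


env = runner_env("http://proxy.example.com:8080")  # placeholder proxy address
print(env["PROXY_URL"])
print(next_action({"status": "gate_failed"}, solver_available=True))
print(next_action({"status": "gate_failed"}, solver_available=False))
print(next_action({"status": "ok"}, solver_available=False))
```

Deciding up front whether a solver is available keeps the failure path cheap: without one, the run aborts instead of spending more agent minutes on a page it cannot pass.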


When NOT to use Mantis

The agent costs roughly $0.50 per minute, covering GPU time and Claude calls. If a job has a clean public API or a stable Playwright path, use that instead. Mantis shines when:

  • The target has no API, or the API doesn't expose the field you need
  • The UI changes shape often enough that XPath/CSS selectors keep breaking
  • The flow spans multiple apps and you want one plan, not a chain of glue
  • A human can describe the task in 5–10 sentences

If the workflow is "fetch a row from a database" or "POST to a known endpoint", do that — don't pay GPU time to render a screen and read it back.
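
A quick back-of-the-envelope calculation helps with that decision. The sketch below uses the ~$0.50 per minute figure from this section; the workload numbers (200 items at 30 seconds each) are invented for the example, and real per-item times will vary by target.

```python
RATE_PER_MINUTE = 0.50  # approximate figure quoted in this section, in USD


def estimate_cost(items, seconds_per_item, rate_per_minute=RATE_PER_MINUTE):
    """Estimate the cost in USD of processing `items` UI items."""
    minutes = items * seconds_per_item / 60
    return round(minutes * rate_per_minute, 2)


# Hypothetical workload: 200 listings at about 30 seconds of agent time each.
print(estimate_cost(200, 30))
```

Here 200 items at 30 seconds each come to 100 agent-minutes, or about $50 per run. If the same data is one API call away, that difference is the whole argument.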


Next

  • Recipes — copy-paste plans for the 11 most common patterns
  • Concepts — the runtime model: plans, sections, gates, loops
  • Plan formats — every step type, every field
  • Quickstart — run your first plan in 5 minutes