Use cases — what Mantis is good at¶

Mantis is a generic computer-use agent. Anything a human does with a keyboard, a mouse, and a browser, Mantis can drive — provided the steps are explainable in a structured plan. This page is a tour of the patterns that show up most often in production.

For copy-paste plans you can adapt, jump to Recipes.

Read flows — pull structured data out of a UI¶

Mantis reads what's on the screen and returns a JSON row per item. The schema is plan-driven; the same runtime handles vehicle listings, job postings, news articles, and real-estate inventory without a code change.

| Pattern | Real-world target | Output |
| --- | --- | --- |
| Marketplace listings | Vehicle / boat / RV listings; consumer-goods classifieds | `{year, make, model, price, phone, seller, url}` per listing |
| Job postings | Greenhouse, Lever, Workday, Ashby, public ATS pages | `{title, team, location, department, url}` per role |
| Real-estate | Zillow, Redfin, Trulia, MLS-public listings | `{address, price, beds, baths, sqft, agent, url}` per home |
| Product catalog | Amazon, Shopify storefronts, Walmart, Etsy | `{name, price, rating, availability, brand, url}` per product |
| News / content | Newsroom indices, blog archives, RSS-less sites | `{headline, byline, published, summary, url}` per article |
| Admin user lookup | Internal admin consoles ("find user by email") | `{email, full_name, plan, signup_date, last_login_at, billing_status, url}` |
| Compliance / audit screens | Console event logs, settings pages | Snapshot or row-by-row export, often with a recorded screencast |

→ See Recipes 1–4 and 9 for working plans.
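
Because the output is one JSON row per item, a cheap downstream check can catch rows where a field could not be read from the screen. The sketch below is plain Python and not part of Mantis. `split_rows` and `JOB_POSTING_FIELDS` are hypothetical names, the sample rows are made up, and the field list is copied from the Job postings row above.

```python
from typing import Any

# Field names taken from the "Job postings" row; the constant name is hypothetical.
JOB_POSTING_FIELDS = ("title", "team", "location", "department", "url")


def split_rows(
    rows: list[dict[str, Any]], required: tuple[str, ...]
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition rows into (complete, incomplete) by required-field presence."""
    complete, incomplete = [], []
    for row in rows:
        # A field counts as missing when it is absent, None, or an empty string.
        missing = [f for f in required if row.get(f) in (None, "")]
        (incomplete if missing else complete).append(row)
    return complete, incomplete


# Made-up sample rows for illustration only.
rows = [
    {"title": "Staff Engineer", "team": "Platform", "location": "Remote",
     "department": "Engineering", "url": "https://example.com/jobs/1"},
    {"title": "Designer", "team": "", "location": "Berlin",
     "department": "Design", "url": "https://example.com/jobs/2"},
]
complete, incomplete = split_rows(rows, JOB_POSTING_FIELDS)
print(len(complete), len(incomplete))  # 1 1
```

Incomplete rows can be logged or re-queued rather than silently written to the destination table.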

Write flows — change state in a SaaS UI¶

Same agent, different verbs. Form-shaped plans (fill_field / select_option / submit) drive logged-in workflows on systems that have no reliable public API for the target field.

| Pattern | Real-world target | Action |
| --- | --- | --- |
| CRM record edit | Salesforce / HubSpot / Zoho / Pipedrive / custom CRMs | Open lead → edit field (status / industry / owner) → Save |
| Contact upsert | Same; webhook-driven sync from your own DB | New Contact → fill name/email/phone/owner → Save |
| Stage moves | ATS pipelines, deal stages, ticket statuses | Open record → change stage dropdown → confirm |
| Refund / chargeback | Stripe / Shopify / Square admin | Search by order ID → open payment → Refund → confirm amount |
| Inventory adjust | Shopify / Woo / NetSuite | Open product → set stock → save |
| Customer-support reply | Zendesk / Intercom / Front | Open ticket → paste templated reply → assign macro → submit |
| Settings / config | OAuth apps, billing portals, IAM consoles | Toggle setting → confirm dialog → snapshot post-state |

→ See Recipes 5, 9, 10, 11.
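
To make the form-shaped vocabulary concrete, here is one way the Contact upsert pattern could be written down as data. This is an illustrative Python structure only: the real plan schema lives in Plan formats, and the keys `type`, `label`, `value`, and the helper `unknown_step_types` are assumptions. Only the step names and the `gate` flag come from this page.

```python
# Step names mentioned on this page; the set name is hypothetical.
KNOWN_STEP_TYPES = {"fill_field", "select_option", "submit", "click", "extract_data"}

# Hypothetical plan for "Contact upsert". Keys and values are illustrative,
# not the actual Mantis plan format.
contact_upsert_plan = [
    {"type": "click", "label": "New Contact"},
    {"type": "fill_field", "label": "Name", "value": "Ada Lovelace"},
    {"type": "fill_field", "label": "Email", "value": "ada@example.com"},
    {"type": "select_option", "label": "Owner", "value": "Sales Team A"},
    {"type": "submit", "label": "Save"},
    {"type": "extract_data", "label": "Saved contact", "gate": True},
]


def unknown_step_types(plan: list[dict]) -> list[str]:
    """Return step types in the plan that are not in the known vocabulary."""
    return sorted({step["type"] for step in plan} - KNOWN_STEP_TYPES)


print(unknown_step_types(contact_upsert_plan))  # []
```

A check like this is useful as a lint before a plan is submitted, since a misspelled step name is cheaper to catch locally than on a GPU runner.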

Authenticated multi-step workflows¶

The agent is plan-driven, so end-to-end flows that span login, navigation, search, edit, and verify all live in one plan. Examples:

Sales operations — log into CRM → pull all leads in stage X → for each, update owner and post a Slack note via webhook
Recruiting hygiene — log into ATS → walk pipeline → close stale candidates, move qualified to next stage
Customer-success motion — log into product admin → snapshot feature-usage page → email customer the export
Bookkeeping — log into bank/payments dashboard → reconcile a date range against your accounting system

Each step is one of `submit`, `fill_field`, `select_option`, `click`, or `extract_data`. Failures are isolated to the step where they occur; a failed `submit` halts the plan before it can cause downstream damage.
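
The halt-on-failure behaviour can be pictured as a loop that stops at the first failing step. The sketch below is a toy model in Python, not the Mantis runtime; `StepResult`, `run_plan`, and `make_step` are hypothetical names, and each step is reduced to a callable that returns success or failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    name: str
    ok: bool


def run_plan(steps: list[tuple[str, Callable[[], bool]]]) -> list[StepResult]:
    """Run steps in order and stop at the first failure."""
    results: list[StepResult] = []
    for name, action in steps:
        ok = action()
        results.append(StepResult(name, ok))
        if not ok:
            break  # later steps never run, so they cannot act on bad state
    return results


executed: list[str] = []


def make_step(name: str, ok: bool) -> tuple[str, Callable[[], bool]]:
    """Build a fake step that records its execution and returns a fixed outcome."""
    def action() -> bool:
        executed.append(name)
        return ok
    return name, action


results = run_plan([
    make_step("fill_field", True),
    make_step("submit", False),       # simulated failure
    make_step("extract_data", True),  # never reached
])
print([r.name for r in results])  # ['fill_field', 'submit']
```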

The same form-flow vocabulary drives any web-based composer, including social-media post editors. Logging in through the UI removes the need for platform API access in small-volume use cases.

| Pattern | Use case |
| --- | --- |
| LinkedIn post | Weekly product update, hiring announcement, employee shout-out |
| Reddit submission | Subreddit digest, AMA invitation, release note |
| Instagram feed post | Scheduled photo + caption + hashtags |
| Twitter / X reply | Customer-support response, thread continuation |

→ See Recipes 6, 7, 8.

Heads-up. Action recipes carry real-world consequences. Always include an extract step with `gate: true` that verifies the outcome: the post went live, the refund cleared, the lead saved. Without that gate, a plan can "succeed" while the underlying action silently failed. See the safety note at the bottom of Recipes.
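
The idea behind a gate can be sketched as a comparison between what the verification step read back and what the action was supposed to produce. The Python below is an illustration under assumed data shapes: `gate_passed`, the field names, and the sample values are all hypothetical, and the actual gate semantics are defined in Concepts and Plan formats.

```python
def gate_passed(extracted: dict, expected: dict) -> bool:
    """True only if every expected field was read back with the expected value."""
    return all(extracted.get(key) == value for key, value in expected.items())


# Hypothetical refund check: what the post-action screen should show.
expected = {"order_id": "A-1001", "status": "refunded"}

read_back_ok = {"order_id": "A-1001", "status": "refunded", "amount": "19.99"}
read_back_bad = {"order_id": "A-1001", "status": "pending"}

print(gate_passed(read_back_ok, expected))   # True
print(gate_passed(read_back_bad, expected))  # False
```

Extra fields in the read-back are ignored; a missing or mismatched expected field fails the gate.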

Desktop tasks (Xvfb, not just browser)¶

The agent's runtime uses xdotool, which can drive any X application. The browser is the most common target, but the same pipeline drives:

File manager — open a folder, drag-drop into archive, rename a batch
Terminal — run a command, capture stdout, paste into another app
LibreOffice / Office 365 — apply a style, fix headings, regenerate a ToC
Image / PDF tools — crop, annotate, export

These need a desktop environment in the runner (Xvfb + window manager). The Baseten and Modal images both ship with Xvfb + Chrome + xdotool; adding libreoffice-core to the image extends them.
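
For a sense of what driving an X application through xdotool looks like at the lowest level, the sketch below builds a few xdotool command lines in Python. It does not reflect Mantis internals: the helper names and coordinates are hypothetical, and the commands are only constructed, not executed. Running them would need xdotool and an X display such as Xvfb.

```python
import shlex


def click_at(x: int, y: int) -> list[str]:
    """Move the pointer to (x, y) and press the left mouse button."""
    return ["xdotool", "mousemove", str(x), str(y), "click", "1"]


def type_text(text: str, delay_ms: int = 50) -> list[str]:
    """Type text into the focused window with a per-keystroke delay."""
    return ["xdotool", "type", "--delay", str(delay_ms), text]


def press(keys: str) -> list[str]:
    """Send a key combination such as 'ctrl+s' to the focused window."""
    return ["xdotool", "key", keys]


# Hypothetical sequence: click into a document, type a title, save.
commands = [click_at(640, 360), type_text("Quarterly report"), press("ctrl+s")]
for argv in commands:
    print(shlex.join(argv))
```

Keeping each command as an argument list (rather than a shell string) avoids quoting problems if it is later passed to `subprocess.run`.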

Adversarial / anti-bot targets¶

Mantis drives a real Chrome via xdotool — no Playwright fingerprints, no WebDriver flag, real user-agent. Sites with bot detection (Cloudflare, PerimeterX, DataDome) usually let it through, especially when paired with a residential proxy (PROXY_URL env var). Examples that have worked:

Listing sites with Cloudflare Bot Fight
E-commerce checkouts behind reCAPTCHA invisible
Banking dashboards with TLS fingerprinting

Mantis does not solve captchas itself. When one blocks the flow, it surfaces a `gate_failed` result so you can hand off to a captcha-solver step or bail early.
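
Caller-side handling of these two pieces might look like the sketch below. Only the `PROXY_URL` variable name and the `gate_failed` result come from this page. The result shape (a dict with a `status` key), the `"ok"` value, and both function names are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def proxy_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return PROXY_URL if it is set and non-empty, otherwise None."""
    return env.get("PROXY_URL") or None


def next_action(result: Mapping[str, str]) -> str:
    """Decide what to do with a run result (hypothetical result shape)."""
    status = result.get("status")
    if status == "gate_failed":
        return "handoff"   # e.g. pass to a captcha-solver step, or bail early
    if status == "ok":     # "ok" is an assumed success value
        return "continue"
    return "abort"


print(proxy_from_env({"PROXY_URL": "http://proxy.example:8080"}))
print(next_action({"status": "gate_failed"}))  # handoff
```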

When NOT to use Mantis¶

The agent costs ~$0.50 per minute of GPU + Claude calls. If a job has a clean public API or a stable Playwright path, use that instead. Mantis shines when:

The target has no API, or the API doesn't expose the field you need
The UI changes shape often enough that XPath/CSS selectors keep breaking
The flow spans multiple apps and you want one plan, not a chain of glue
A human can describe the task in 5–10 sentences

If the workflow is "fetch a row from a database" or "POST to a known endpoint", do that — don't pay GPU time to render a screen and read it back.
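
A back-of-the-envelope estimate helps with that decision. The sketch below applies the approximate $0.50 per minute figure quoted above; the function name and the example workload (3 minutes per run, 200 runs) are hypothetical, and actual costs will vary.

```python
COST_PER_MINUTE_USD = 0.50  # approximate figure quoted on this page


def estimated_cost(minutes_per_run: float, runs: int) -> float:
    """Rough spend in USD for a batch of runs at the quoted per-minute rate."""
    return round(COST_PER_MINUTE_USD * minutes_per_run * runs, 2)


# Hypothetical workload: 200 runs of about 3 minutes each.
print(estimated_cost(3, 200))  # 300.0
```

If the same data is available from an API call, that comparison usually settles the question.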

Next¶

Recipes — copy-paste plans for the 11 most common patterns
Concepts — the runtime model: plans, sections, gates, loops
Plan formats — every step type, every field
Quickstart — run your first plan in 5 minutes