Quickstart: five minutes to a real extraction

If someone has already deployed Mantis (or you're using the public Baseten reference deployment), all you need is curl, your tenant token, and a plan. This page walks through an end-to-end run that visits a public listings page and returns three structured rows. Adapt the plan path to whatever shape you actually need (jobs, products, real estate, CRM record edits; see Use cases).

Prerequisites

curl, jq
A Baseten API key (BASETEN_API_KEY) for the gateway
A Mantis tenant token (MANTIS_API_TOKEN) issued by your operator — see Tenant keys if you're the operator

export BASETEN_API_KEY="<your baseten api key>"
export MANTIS_API_TOKEN="<your mantis tenant token>"
# Baseten forwards /sync/<any path> to the container's FastAPI app. The
# legacy /predict route still works (gateway treats it as /sync/predict).
export ENDPOINT="https://model-qvvgkneq.api.baseten.co/production/sync"
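
Before submitting anything, it can help to fail early when a variable or tool from the prerequisites is missing. The following is a minimal sketch; `require_env` and `require_cmd` are illustrative helper names, not part of Mantis or Baseten.

```shell
# Preflight sketch (hypothetical helpers, not shipped with Mantis).

# require_env NAME...: returns non-zero if any named variable is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# require_cmd CMD...: returns non-zero if any command is not on PATH.
require_cmd() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `require_cmd curl jq && require_env BASETEN_API_KEY MANTIS_API_TOKEN ENDPOINT || exit 1` at the top of a script.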

1. Submit the plan

RESP=$(curl -fsS -X POST "$ENDPOINT/v1/predict" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "X-Mantis-Token: $MANTIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "detached": true,
    "micro": "plans/example/extract_listings.json",
    "state_key": "first-quickstart",
    "max_cost": 2,
    "max_time_minutes": 20,
    "record_video": true
  }')

RUN_ID=$(echo "$RESP" | jq -r .run_id)
echo "run_id: $RUN_ID"

Expected response:

{
  "status": "queued",
  "model": "holo3",
  "mode": "detached",
  "run_id": "20260428_021432_076255ef",
  ...
}
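
If the response carries no `run_id` key (for example, an error body), `jq -r .run_id` prints the literal string `null`, and the later steps would poll for a run called "null". A small guard, sketched here with the hypothetical helper name `valid_run_id`, catches that before polling starts.

```shell
# valid_run_id ID: succeeds only if ID is non-empty and not jq's "null".
# Hypothetical helper; it checks presence only, not the ID format.
valid_run_id() {
  case "$1" in
    ""|null) return 1 ;;
    *)       return 0 ;;
  esac
}
```

Used right after the submit call: `valid_run_id "$RUN_ID" || { echo "submit failed: $RESP" >&2; exit 1; }`.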

2. Poll for completion

while true; do
  STATUS=$(curl -fsS -X POST "$ENDPOINT/v1/predict" \
    -H "Authorization: Api-Key $BASETEN_API_KEY" \
    -H "X-Mantis-Token: $MANTIS_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"action\":\"status\",\"run_id\":\"$RUN_ID\"}" \
    | jq -r .status)
  echo "$(date '+%H:%M:%S')  $STATUS"
  case "$STATUS" in succeeded|failed|cancelled) break ;; esac
  sleep 30
done

A successful run completes in about 10 minutes. If the instance has scaled to zero, the cold-start image build adds roughly 50 minutes, but the instance is typically already warm.
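
The loop in this step runs until it sees a terminal status, so a request that keeps failing would spin forever. A bounded variant is sketched below. `poll_until_done` is a hypothetical helper that takes the status-fetching command as arguments; that command is assumed to print one status word, as the `curl ... | jq -r .status` pipeline does.

```shell
# poll_until_done MAX_ATTEMPTS SLEEP_SECONDS COMMAND...
# Runs COMMAND repeatedly; COMMAND must print the current run status.
# Returns 0 on succeeded, 1 on failed/cancelled, 2 when attempts run out.
poll_until_done() {
  max_attempts=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failing command (e.g. a transient HTTP error) counts as "no status yet".
    run_status=$("$@") || run_status=""
    case "$run_status" in
      succeeded)        echo "$run_status"; return 0 ;;
      failed|cancelled) echo "$run_status"; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timeout"
  return 2
}

# Example (get_status is a hypothetical wrapper around the curl call in this step):
#   poll_until_done 40 30 get_status
```

With 40 attempts at 30 seconds each, the loop gives up after about 20 minutes, which lines up with the `max_time_minutes` value submitted in step 1.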

3. Fetch the leads

curl -fsS -X POST "$ENDPOINT/v1/predict" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "X-Mantis-Token: $MANTIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"action\":\"result\",\"run_id\":\"$RUN_ID\"}" \
  | jq .result.leads

You should see something like:

[
  "VIABLE | Year: <YYYY> | Make: <Make> | Model: <Model> | Price: <Price> | Phone: <Phone or 'none'>",
  "VIABLE | Year: <YYYY> | Make: <Make> | Model: <Model> | Price: <Price> | Phone: none",
  "VIABLE | Year: <YYYY> | Make: <Make> | Model: <Model> | Price: <Price> | Phone: none"
]

The exact field set depends on the extraction_schema you submit (see ExtractionSchema in the recipes).
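
For downstream tools, the pipe-delimited rows are often easier to handle as tab-separated values. The sketch below assumes each row has the `VERDICT | Key: value | Key: value` shape shown in the sample output; `leads_to_tsv` is an illustrative helper name, and a different extraction_schema may need a different parser.

```shell
# leads_to_tsv: reads one lead row per line on stdin, writes TSV on stdout.
# Column 1 is the verdict (e.g. VIABLE); the rest are values with the
# "Key: " prefix removed, in the order they appear in the row.
leads_to_tsv() {
  awk -F ' [|] ' '
    {
      line = $1
      for (i = 2; i <= NF; i++) {
        value = $i
        sub(/^[^:]*: */, "", value)   # strip everything up to the first colon
        line = line "\t" value
      }
      print line
    }'
}
```

To feed it from the result call, replace the final `jq .result.leads` with `jq -r '.result.leads[]' | leads_to_tsv`.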

4. Download the screencast

curl -fsS -o demo.mp4 \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "X-Mantis-Token: $MANTIS_API_TOKEN" \
  "$ENDPOINT/v1/runs/$RUN_ID/video"
open demo.mp4  # macOS; on Linux use xdg-open

You'll see the three-segment walkthrough:

Title card (Mantis CUA · plan name · tenant · run id)
The actual browser navigation, with per-step captions and overlays for every action (click ripples, scroll arrows, key chord badges, type captions)
Outro card with the result summary (3 viable rows extracted, $0.42, 9.5 min)

Pass ?raw=1 to fetch the un-overlaid screen capture instead.
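
If you script both downloads, a small URL builder keeps the two variants consistent. This is a sketch only: `video_url` and its optional `raw` argument are illustrative conventions, and the path is the one used in the download command of this step.

```shell
# video_url RUN_ID [raw]: prints the video URL for a run.
# Passing "raw" as the second argument appends ?raw=1.
video_url() {
  # ${ENDPOINT%/} drops a trailing slash, if any, to avoid "//" in the URL.
  url="${ENDPOINT%/}/v1/runs/$1/video"
  if [ "${2:-}" = "raw" ]; then
    url="$url?raw=1"
  fi
  echo "$url"
}
```

For example, `curl -fsS -o raw.mp4 ... "$(video_url "$RUN_ID" raw)"` would fetch the un-overlaid capture, with the same headers as the command above.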

What just happened

your curl                         Baseten gateway          Mantis container         Holo3 GPU
──────────                        ────────────────         ────────────────         ─────────
POST /v1/predict ──────────────►  auth: Api-Key  ─────────► auth: X-Mantis-Token
                                                            │
                                                            ▼
                                                     MicroPlanRunner
                                                     loads the 3-listing plan
                                                            │
                                                            ▼
                                                     for each step:
                                                       ↓
                                                     Holo3 ◄──────────────────────► /v1/chat/completions
                                                       ↓                            (action proposal)
                                                     ClaudeGrounding ◄───────────►  Anthropic API
                                                       ↓                            (refined click)
                                                     xdotool click on Xvfb
                                                       ↓
                                                     ClaudeExtractor ◄───────────►  Anthropic API
                                                       ↓                            (lead row)
                                                     checkpoint to volume
                                                            ▼
                                                     ffmpeg captures screencast in parallel
                                                            ▼
GET /v1/runs/<id>/video ──────────►                  polished video composed

You ran one orchestrated extraction against a live website, using both Holo3 (cheap GPU clicks) and Claude (click grounding and extraction reasoning), and got a polished video walkthrough at the end.

Next steps

Read the Concepts page so you understand state_key, max_cost, gates, and sections before designing your own plan.
Browse the Plan formats to pick the right shape for your workflow.
Want to host your own instance? Go to Hosting.
Building an integration? Start with Client → Authentication.