Hosting

Pick the platform that matches your existing infrastructure.

| Platform | Best for | Cost model | Provisioned by |
|---|---|---|---|
| Baseten | Fastest path to prod; managed autoscaling; nothing to operate | $/hour of active GPU + per-call fee ($0.08–$0.15 per request) | `truss push` |
| Modal | Detached batch runs; scale-to-zero; GPUs on demand | $/second of active GPU | `modal run --detach` |
| AWS (EKS) | Existing AWS estate; control over networking and node pools | EC2 g6e.2xlarge ~$1.86/hr + EFS + ECR | Terraform + `kubectl apply` |
| GCP (GKE) | Existing GCP estate; A100 spot pricing | a2-highgpu-1g ~$3.67/hr + Filestore + Artifact Registry | Terraform + `kubectl apply` |
| Local (Docker) | Development or a single machine; you bring the GPU | Your own hardware | `docker run` |
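
To compare the always-on options, the sketch below turns the table's hourly rates into a rough monthly figure. It assumes a node that runs around the clock (about 730 hours per month) and ignores storage and registry charges, spot or committed-use discounts, and Baseten's active-GPU hours; the helper names are illustrative only.

```shell
#!/usr/bin/env bash
# Back-of-envelope monthly cost, using the hourly rates from the hosting table.
# Assumes an always-on node (~730 h/month); storage/registry costs are ignored.
HOURS_PER_MONTH=730

# $1 = hourly rate in USD; prints the monthly total with two decimals.
monthly_node_cost() {
  awk -v rate="$1" -v hours="$HOURS_PER_MONTH" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# Baseten per-call component only (active-GPU time is billed on top).
# $1 = number of requests, $2 = per-request price in USD.
per_call_cost() {
  awk -v n="$1" -v price="$2" 'BEGIN { printf "%.2f\n", n * price }'
}

echo "EKS g6e.2xlarge:       \$$(monthly_node_cost 1.86)/month"
echo "GKE a2-highgpu-1g:     \$$(monthly_node_cost 3.67)/month"
echo "Baseten, 1000 calls:   \$$(per_call_cost 1000 0.08) to \$$(per_call_cost 1000 0.15) + GPU hours"
```

With the listed rates this comes to roughly $1,358/month for the EKS node and $2,679/month for the GKE node before storage, while 1,000 Baseten calls add $80–$150 on top of active GPU time.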

What you provision regardless of platform

Every deployment needs the same five secrets, named the way the container expects (the platform-specific pages walk through how to set them):

| Secret name | Purpose | How to get one |
|---|---|---|
| `mantis_api_token` (single-tenant) or `mantis_tenant_keys` (multi-tenant) | Container auth | `openssl rand -hex 32` for single-tenant; a JSON keys file for multi-tenant (see Tenant keys) |
| `anthropic_api_key` | Claude grounding, extraction, and gates | console.anthropic.com → API keys |
| `proxy_url` | IPRoyal proxy `host:port` | iproyal.com → residential plan |
| `proxy_user` | IPRoyal session ID | Same IPRoyal account |
| `proxy_pass` | IPRoyal password | Same IPRoyal account |
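
Before loading these into a platform's secret store, a quick local check can catch a missing or misnamed secret. The sketch below is hypothetical (Mantis does not ship it) and assumes you have exported each secret as an environment variable under the exact name shown in the table; `new_token` simply wraps the `openssl` command above.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check, not part of Mantis. Assumes each secret is
# exported as an environment variable with the same name the container expects.

# Single-tenant token: 32 random bytes, hex-encoded (64 characters).
new_token() {
  openssl rand -hex 32
}

# Print every required secret that is unset or empty; return 1 if any are.
# Pass mantis_tenant_keys as $1 for a multi-tenant deployment.
missing_secrets() {
  local auth_secret="${1:-mantis_api_token}" status=0 name
  for name in "$auth_secret" anthropic_api_key proxy_url proxy_user proxy_pass; do
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, run `export mantis_api_token="$(new_token)"`, export the other four values, and `missing_secrets` then prints nothing and exits 0; for a multi-tenant deployment, call `missing_secrets mantis_tenant_keys` instead.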

For multi-tenant deployments, add one `anthropic_api_key_<tenant>` secret per tenant; the keys file routes each tenant to its own key.
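
A short hypothetical helper can confirm that every tenant has its key before you deploy. It assumes the secrets are staged locally as exported environment variables using the `anthropic_api_key_<tenant>` naming above; pass it the tenant IDs from your keys file and it prints each secret that is still missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Mantis. For each tenant ID given as an
# argument, print anthropic_api_key_<tenant> if that variable is unset or empty.
# printenv is used so tenant IDs that are not valid shell names do not error.
missing_tenant_keys() {
  local tenant name status=0
  for tenant in "$@"; do
    name="anthropic_api_key_${tenant}"
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

For example, with hypothetical tenants `acme` and `globex`, `missing_tenant_keys acme globex` prints `anthropic_api_key_globex` and exits non-zero when only `anthropic_api_key_acme` is set.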

Smoke test (any platform)

Once your deployment is up, these two requests validate the whole chain end to end:

```shell
ENDPOINT="https://your-mantis-host.example.com"
TOKEN="<your tenant token>"

# Health check (no auth required)
curl -fsS "$ENDPOINT/health"

# Auth + plan submission; -f makes curl exit non-zero on HTTP 4xx/5xx
RESP=$(curl -fsS -X POST "$ENDPOINT/v1/predict" \
  -H "X-Mantis-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "detached": true,
    "micro": "plans/example/extract_listings.json",
    "profile_id": "deploy-smoke",
    "workflow_id": "deploy-smoke-v1",
    "max_cost": 2,
    "max_time_minutes": 20
  }')
echo "$RESP" | jq .
```

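Expected: a queued response containing a `run_id`, returned in under a second. Because of `-f`, an HTTP error makes curl exit non-zero and report the status code instead of printing a body: 401 means the token is wrong; 503 means auth is not configured on the server; 429 means the tenant is rate-limited (or above its concurrency cap).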
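
If you want to see the response body on failure as well as the status code, you can drop `-f` and capture the status with curl's `-w '%{http_code}'`. The sketch below does that and maps the three codes listed above to their likely causes; the function names are hypothetical, and the request body is whatever JSON you used in the smoke test.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Mantis.
# explain_status maps the documented smoke-test status codes to a likely cause.
explain_status() {
  case "$1" in
    2??) echo "ok" ;;
    401) echo "token wrong" ;;
    503) echo "server auth not configured" ;;
    429) echo "tenant rate-limited or above its concurrency cap" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Submit a plan WITHOUT -f so both the status and the body are kept:
# -o writes the body to a temp file, -w prints only the HTTP status code.
# $1 = endpoint, $2 = token, $3 = JSON request body.
smoke_submit() {
  local body code
  body=$(mktemp) || return 1
  code=$(curl -sS -o "$body" -w '%{http_code}' -X POST "$1/v1/predict" \
    -H "X-Mantis-Token: $2" \
    -H "Content-Type: application/json" \
    -d "$3") || { rm -f "$body"; return 1; }   # connection-level failure
  echo "HTTP $code: $(explain_status "$code")"
  cat "$body"
  echo
  rm -f "$body"
  # Succeed only on a 2xx response.
  [ "${code#2}" != "$code" ]
}
```

Call it as `smoke_submit "$ENDPOINT" "$TOKEN" "$PAYLOAD"`, where `PAYLOAD` holds the same JSON as the `-d` argument above; it returns non-zero on anything other than a 2xx response.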

Common operational tasks

Once the deployment works:

- Provision tenants → Tenant keys
- Wire up monitoring → Metrics
- Cap the blast radius → Rate limits, URL allowlist
- Get notified when runs complete → Webhooks
- Make retries safe → Idempotency