Cost model (#122)
Per-resource pricing for the CUA runtime lives in
src/mantis_agent/cost_config.py as a CostConfig dataclass. Defaults
match the historical hardcoded rates so existing deployments see no
change without an env-var override.
Overridable knobs
All env vars are parsed as floats. A value that does not parse fails fast, so operators see the error instead of a silent fallback.
| Env var | Default | Purpose |
|---|---|---|
| MANTIS_COST_GPU_HOURLY_USD | 3.25 | GPU compute, $/hour. Multiply by gpu_seconds / 3600 for the run's GPU bill. |
| MANTIS_COST_CLAUDE_CALL_USD | 0.003 | Per-Claude-API-call rate. Applied to claude_extract + claude_grounding counters. |
| MANTIS_COST_PROXY_PER_GB_USD | 5.00 | Egress proxy bandwidth, $/GB. Multiply by proxy_mb / 1024. |
| MANTIS_COST_GPU_SECONDS_PER_STEP | 3.0 | Per-step GPU budget used when the runner doesn't measure exact seconds. |
| MANTIS_COST_PROXY_MB_PER_NAV | 5.0 | Estimated proxy MB consumed by one page load. |
| MANTIS_COST_PROXY_MB_PER_SCROLL | 0.5 | Estimated proxy MB per scroll. |
Set these once at deploy time (Modal secret, k8s ConfigMap,
docker run -e ..., Baseten Truss environment_variables). Per-tenant
overrides happen at the MicroPlanRunner constructor (cost_config=).
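As a rough illustration of the env-override behavior described above, the following sketch parses the six variables into a dataclass and raises on a bad value. It is not the actual CostConfig: the class name CostConfigSketch, the field names, and the rule that maps a field to its MANTIS_COST_* variable are assumptions made for this example.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CostConfigSketch:
    """Illustrative stand-in for CostConfig; field names are assumptions."""

    gpu_hourly_usd: float = 3.25
    claude_call_usd: float = 0.003
    proxy_per_gb_usd: float = 5.00
    gpu_seconds_per_step: float = 3.0
    proxy_mb_per_nav: float = 5.0
    proxy_mb_per_scroll: float = 0.5

    @classmethod
    def from_env(cls, environ=None):
        """Build a config from MANTIS_COST_* variables; unset ones keep defaults."""
        env = os.environ if environ is None else environ
        overrides = {}
        for f in fields(cls):
            # Assumed mapping: gpu_hourly_usd -> MANTIS_COST_GPU_HOURLY_USD, etc.
            var = "MANTIS_COST_" + f.name.upper()
            raw = env.get(var)
            if raw is None:
                continue
            try:
                overrides[f.name] = float(raw)
            except ValueError as exc:
                # Fail fast: surface the bad value rather than fall back silently.
                raise ValueError(f"{var}={raw!r} is not a valid float") from exc
        return cls(**overrides)
```

With a config built this way, a per-tenant variant could be derived with dataclasses.replace(base, gpu_hourly_usd=...) before handing it to the runner; whether the real CostConfig is frozen or supports this is not stated here.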
Wiring
CostConfig.from_env() → CostMeter(cost_config=...) → MicroPlanRunner
                                                     │
                                                     └─ runs cost_meter.totals(),
                                                        emits inflight + final
                                                        gauges/histograms on Prometheus
The CostMeter rolls up per-resource counters from the runner's costs
dict (gpu_steps, gpu_seconds, claude_extract, claude_grounding,
proxy_mb) and produces the four-tuple (gpu, claude, proxy, total)
in USD.
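The roll-up arithmetic can be sketched as follows, using the default rates from the env-var table. This is a minimal illustration, not the CostMeter implementation: the function name totals_sketch, the CostTotals tuple, and the rule of falling back to gpu_steps only when gpu_seconds is absent from the dict are assumptions.

```python
from typing import Mapping, NamedTuple

# Default rates taken from the env-var table.
GPU_HOURLY_USD = 3.25
CLAUDE_CALL_USD = 0.003
PROXY_PER_GB_USD = 5.00
GPU_SECONDS_PER_STEP = 3.0


class CostTotals(NamedTuple):
    gpu: float
    claude: float
    proxy: float
    total: float


def totals_sketch(costs: Mapping[str, float]) -> CostTotals:
    """Roll per-resource counters up into (gpu, claude, proxy, total) in USD."""
    # Assumption: use measured seconds when present, else estimate from steps.
    gpu_seconds = costs.get("gpu_seconds")
    if gpu_seconds is None:
        gpu_seconds = costs.get("gpu_steps", 0) * GPU_SECONDS_PER_STEP
    gpu = GPU_HOURLY_USD * gpu_seconds / 3600

    # Both Claude counters are billed at the same per-call rate.
    calls = costs.get("claude_extract", 0) + costs.get("claude_grounding", 0)
    claude = CLAUDE_CALL_USD * calls

    # Proxy traffic is counted in MB and priced per GB.
    proxy = PROXY_PER_GB_USD * costs.get("proxy_mb", 0) / 1024

    return CostTotals(gpu, claude, proxy, gpu + claude + proxy)
```

For example, a run with 3600 measured GPU seconds, 200 Claude calls, and 1024 MB of proxy traffic comes to 3.25 + 0.60 + 5.00 = 8.85 USD at the default rates.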
Prometheus
Two Prometheus surfaces ship with the cost model — see Metrics for the full table.
| Metric | Type | Labels |
|---|---|---|
| mantis_run_cost_usd | histogram | tenant_id, model, status |
| mantis_run_cost_usd_inflight | gauge | tenant_id, component (gpu / claude / proxy / total) |
The histogram captures terminal cost per detached run; the inflight gauge updates on every progress log so live runs show up on dashboards without waiting for terminal observation.
Tuning the rates
When upstream prices change (Modal A100 hourly, Anthropic per-token, proxy provider), operators can:
- Update the env var on the deployment (no code change).
- Confirm the new rate is live by hitting /metrics and reading mantis_run_cost_usd_inflight; the next progress emission reflects the override.
- For tenant-specific rates, which are out of scope for this surface, pass a custom CostConfig instance to MicroPlanRunner(cost_config=...) inside the per-tenant request handler.
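The confirmation step can be scripted. The sketch below pulls the inflight gauge samples out of a /metrics response body in the Prometheus text exposition format; the function name inflight_costs is hypothetical, and fetching the body is left to whatever HTTP client the deployment already uses.

```python
import re

# Matches sample lines such as:
#   mantis_run_cost_usd_inflight{tenant_id="acme",component="gpu"} 0.65
_SAMPLE = re.compile(r'^mantis_run_cost_usd_inflight\{([^}]*)\}\s+(\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def inflight_costs(metrics_text: str) -> dict:
    """Map (tenant_id, component) to the gauge value found in a /metrics body."""
    out = {}
    for line in metrics_text.splitlines():
        m = _SAMPLE.match(line)
        if not m:
            continue  # skips "# HELP" / "# TYPE" comments and other metrics
        labels = dict(_LABEL.findall(m.group(1)))
        key = (labels.get("tenant_id"), labels.get("component"))
        out[key] = float(m.group(2))
    return out
```

Comparing the gpu, claude, and proxy components before and after a rate change, for runs with similar counters, gives a quick check that the override took effect.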
Useful queries
# Per-tenant cost burn rate ($/hour averaged over the last hour)
sum by (tenant_id) (
increase(mantis_run_cost_usd_sum[1h])
)
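# Cost split by component, averaged over the last 24h
# (the inflight metric is a gauge, so avg_over_time applies rather than rate)
sum by (component) (
avg_over_time(mantis_run_cost_usd_inflight[24h])
)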
# p95 cost per detached run (catches bug-runs blowing past the cap)
histogram_quantile(0.95,
sum by (tenant_id, le) (
rate(mantis_run_cost_usd_bucket[1h])
)
)
Tests
tests/test_cost_config.py covers env-override parsing + the three
cost computation helpers. tests/test_cost_meter.py covers the
roll-up + Prometheus emission surface.