Cost model (#122)¶

Per-resource pricing for the CUA runtime lives in src/mantis_agent/cost_config.py as a CostConfig dataclass. Defaults match the historical hardcoded rates so existing deployments see no change without an env-var override.

Overridable knobs¶

All env vars are interpreted as floats. Bad values fail fast (operators see the error instead of silent fallback).

Env var	Default	Purpose
`MANTIS_COST_GPU_HOURLY_USD`	`3.25`	GPU compute, $/hour. Multiply by `gpu_seconds / 3600` for the run's GPU bill.
`MANTIS_COST_CLAUDE_CALL_USD`	`0.003`	Per-Claude-API-call rate. Applied to `claude_extract` + `claude_grounding` counters.
`MANTIS_COST_PROXY_PER_GB_USD`	`5.00`	Egress proxy bandwidth, $/GB. Multiply by `proxy_mb / 1024`.
`MANTIS_COST_GPU_SECONDS_PER_STEP`	`3.0`	Per-step GPU budget used when the runner doesn't measure exact seconds.
`MANTIS_COST_PROXY_MB_PER_NAV`	`5.0`	Estimated proxy MB consumed by one page load.
`MANTIS_COST_PROXY_MB_PER_SCROLL`	`0.5`	Estimated proxy MB per scroll.

Set these once at deploy time (Modal secret, k8s ConfigMap, docker run -e ..., Baseten Truss environment_variables). Per-tenant overrides happen at the MicroPlanRunner constructor (cost_config=).

Wiring¶

CostConfig.from_env()  →  CostMeter(cost_config=...)  →  MicroPlanRunner
                                       │
                                       └─ runs cost_meter.totals(),
                                          emits inflight + final
                                          gauges/histograms on Prometheus

The CostMeter rolls up per-resource counters from the runner's costs dict (gpu_steps, gpu_seconds, claude_extract, claude_grounding, proxy_mb) and produces the four-tuple (gpu, claude, proxy, total) in USD.

Prometheus¶

Two Prometheus surfaces ship with the cost model — see Metrics for the full table.

Metric	Type	Labels
`mantis_run_cost_usd`	histogram	`tenant_id`, `model`, `status`
`mantis_run_cost_usd_inflight`	gauge	`tenant_id`, `component` (`gpu` / `claude` / `proxy` / `total`)

The histogram captures terminal cost per detached run; the inflight gauge updates on every progress log so live runs show up on dashboards without waiting for terminal observation.

Tuning the rates¶

When upstream prices change (Modal A100 hourly, Anthropic per-token, proxy provider) operators can:

Update the env var on the deployment (no code change).
Confirm the new rate is live by hitting /metrics and reading mantis_run_cost_usd_inflight — the next progress emission shows the override.
Tenant-specific rates are out of scope for this surface — pass a custom CostConfig instance to MicroPlanRunner(cost_config=...) inside the per-tenant request handler.

Useful queries¶

# Per-tenant cost burn rate ($/hour averaged over the last hour)
sum by (tenant_id) (
  increase(mantis_run_cost_usd_sum[1h])
)

# Cost split by component, last 24h
sum by (component) (
  rate(mantis_run_cost_usd_inflight[24h])
)

# p95 cost per detached run (catches bug-runs blowing past the cap)
histogram_quantile(0.95,
  sum by (tenant_id, le) (
    rate(mantis_run_cost_usd_bucket[1h])
  )
)

Tests¶

tests/test_cost_config.py covers env-override parsing + the three cost computation helpers. tests/test_cost_meter.py covers the roll-up + Prometheus emission surface.