GKE
Mantis is self-hosted on GKE Standard, not Autopilot, because Autopilot doesn't yet support custom GPU node pools with the taint configuration Mantis needs. The full Terraform, Kubernetes manifests, and runbook live in deploy/gke/.
Architecture
GCLB ingress (HTTPS, managed cert)
   ↓
GKE Standard cluster
   • GPU NodePool (a2-highgpu-1g, 1× A100 40 GB)
   • System NodePool
   ↓
Filestore (1 TB Standard, RWX: runs, checkpoints, profiles, recordings)
Artifact Registry (mantis-holo3-server image)
Secret Manager (anthropic_api_key, proxy_*, mantis_api_token)
Secret Manager CSI driver → in-pod tmpfs → envFrom Secret
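The GPU node pool in the diagram could be created by hand with `gcloud` roughly as follows. This is a minimal sketch and is not the Terraform in deploy/gke/. The pool name `gpu-pool`, the cluster name, and the zone are hypothetical placeholders. The Mantis-specific taint is omitted because its key and value aren't given here. With `DRY_RUN=1`, the helper prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: add a 1x A100 40 GB node pool to an existing GKE Standard cluster.
# Pool name, cluster name, and zone are hypothetical placeholders.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_gpu_pool CLUSTER ZONE [SPOT=true|false]
create_gpu_pool() {
  local cluster="$1" zone="$2" spot="${3:-false}"
  # gpu-driver-version=default asks GKE to install the NVIDIA driver on new nodes.
  local args=(
    gcloud container node-pools create gpu-pool
    "--cluster=${cluster}"
    "--zone=${zone}"
    --machine-type=a2-highgpu-1g
    --accelerator=type=nvidia-tesla-a100,count=1,gpu-driver-version=default
    --num-nodes=1
  )
  # Spot VMs cost less but can be preempted at any time.
  if [[ "$spot" == "true" ]]; then
    args+=(--spot)
  fi
  run "${args[@]}"
}
```

For example, `DRY_RUN=1 create_gpu_pool mantis-prod us-central1-a true` prints the full command with `--spot` appended. Omitting the third argument produces an on-demand pool.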
Footprint
| Resource | Type | Cost (us-central1) |
|---|---|---|
| GPU node | a2-highgpu-1g (A100 40 GB) | ~$3.67/hr on-demand |
| | a2-ultragpu-1g (A100 80 GB) | ~$4.61/hr (more headroom) |
| | a3-highgpu-1g (H100 80 GB) | ~$11.06/hr (high throughput) |
| | a2-highgpu-1g Spot | ~$1.10/hr (preemptible) |
| Filestore Standard | 1 TB | ~$200/month fixed |
| GCLB | LB + per-rule | ~$18/month + traffic |
For Holo3 Q8_0 (~34 GB VRAM), a2-highgpu-1g (A100 40 GB) is the sweet spot. It is the cheapest listed machine type that fits the model and leaves roughly 6 GB to spare. Use Spot if you can tolerate preemption.
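Here is a quick sketch for turning the table's hourly prices into monthly budgets. It assumes a 730-hour month (8,760 hours / 12) and the list prices quoted above. Actual bills will vary with Spot price changes and region.

```bash
#!/usr/bin/env bash
# Back-of-envelope cost and VRAM arithmetic for the Footprint table.
# Assumes a 730-hour month and the list prices quoted in the table.
set -euo pipefail

HOURS_PER_MONTH=730

# monthly_cost HOURLY_USD: prints the 730-hour monthly cost with 2 decimals.
monthly_cost() {
  awk -v h="$1" -v m="$HOURS_PER_MONTH" 'BEGIN { printf "%.2f\n", h * m }'
}

# savings_pct ON_DEMAND_USD SPOT_USD: percent saved by Spot, rounded.
savings_pct() {
  awk -v od="$1" -v sp="$2" 'BEGIN { printf "%.0f\n", (1 - sp / od) * 100 }'
}

# vram_headroom GPU_GB MODEL_GB: GB left over after loading the weights.
vram_headroom() {
  awk -v g="$1" -v m="$2" 'BEGIN { printf "%d\n", g - m }'
}
```

Here are some example outputs:
- `monthly_cost 3.67` prints `2679.10`.
- `monthly_cost 1.10` prints `803.00`.
- `savings_pct 3.67 1.10` prints `70`, which matches the ~70% Spot savings quoted under Operational notes.
- `vram_headroom 40 34` prints `6`.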
End-to-end deploy
The detailed runbook is in deploy/gke/README.md. High-level:
- Enable required APIs
- Install cluster add-ons
  - Secret Manager CSI driver (Secret Manager → tmpfs)
  - NVIDIA GPU driver installer DaemonSet (or let GKE auto-install drivers at node pool creation)
- Build + push image to Artifact Registry
- Provision infra
- Populate Secret Manager
- Pre-warm Holo3 weights
- Deploy the workload
- Smoke-test
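    # DOMAIN is a placeholder for the hostname on the managed cert, which
    # doesn't cover the raw LB IP; --resolve pins DOMAIN to that IP.
    DOMAIN=mantis.example.com
    HOST=$(kubectl get ingress mantis-holo3-server \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    TOK=$(gcloud secrets versions access latest --secret=mantis-prod-mantis_api_token)
    curl -fsS -X POST --resolve "$DOMAIN:443:$HOST" "https://$DOMAIN/v1/predict" \
      -H "X-Mantis-Token: $TOK" \
      -H "Content-Type: application/json" \
      -d '{"detached": true, "micro": "plans/example/extract_listings.json", "state_key": "smoke"}'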
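Several of the steps above map onto standard `gcloud`, `docker`, and `kubectl` commands. The sketch below covers enabling APIs, building and pushing the image, populating secrets, and applying the manifests. deploy/gke/README.md remains the authoritative sequence.

Some details here are assumptions rather than values from the runbook:
- The project ID, the Artifact Registry repo name (`mantis`), the image tag, and the Docker build context are hypothetical placeholders.
- The API list is inferred from the services in the architecture diagram.
- The `mantis-prod-<name>` secret naming is taken from the smoke-test command.

```bash
#!/usr/bin/env bash
# Sketch of the API / image / secret / deploy steps. PROJECT, REPO, the image
# tag, and BUILD_CONTEXT are hypothetical placeholders.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

PROJECT="${PROJECT:-my-project}"
REGION="${REGION:-us-central1}"
REPO="${REPO:-mantis}"
BUILD_CONTEXT="${BUILD_CONTEXT:-.}"
IMAGE="${REGION}-docker.pkg.dev/${PROJECT}/${REPO}/mantis-holo3-server:latest"

enable_apis() {
  run gcloud services enable --project="$PROJECT" \
    container.googleapis.com file.googleapis.com \
    artifactregistry.googleapis.com secretmanager.googleapis.com
}

build_and_push() {
  # Let docker authenticate to the regional Artifact Registry host.
  run gcloud auth configure-docker "${REGION}-docker.pkg.dev" --quiet
  run docker build -t "$IMAGE" "$BUILD_CONTEXT"
  run docker push "$IMAGE"
}

# put_secret NAME VALUE: create the secret (any error, such as "already
# exists", is ignored), then add VALUE as a new version read from stdin.
# The value is never echoed, even in dry-run mode.
put_secret() {
  local name="$1" value="$2"
  run gcloud secrets create "$name" --project="$PROJECT" \
    --replication-policy=automatic || true
  run gcloud secrets versions add "$name" --project="$PROJECT" \
    --data-file=- < <(printf '%s' "$value")
}

deploy_workload() {
  run kubectl apply -f deploy/gke/k8s/
}
```

For example, you could run `put_secret mantis-prod-mantis_api_token "$(openssl rand -hex 32)"` to generate and store a fresh API token. Use `printf '%s'` rather than `echo` or a here-string so that no trailing newline ends up in the stored secret.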
Operational notes
- Spot pricing: set `use_spot_gpus = true` for ~70% savings. Plan for occasional preemption. Runs survive a replica restart because their state lives on Filestore.
- Workload Identity: the Terraform creates a GSA bound to the `default/mantis-holo3-server` KSA, so the pod reads Secret Manager without a JSON key on disk.
- Filestore is fixed-cost: 1 TB Standard costs ~$200/month regardless of usage. For dev or lower-volume deployments, keep `capacity_gb = 1024`, which is already the floor for the Standard tier.
- Region: A100s are scarce in some regions; `us-central1`, `us-west1`, and `europe-west4` are the safest bets.
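The Workload Identity note can be made concrete with the standard GKE bindings for this pattern:
- The KSA is allowed to impersonate the GSA through `roles/iam.workloadIdentityUser`.
- The KSA is annotated with `iam.gke.io/gcp-service-account`.

This sketch assumes the cluster has Workload Identity enabled (workload pool `PROJECT.svc.id.goog`). The GSA name and project are hypothetical, and the Terraform may name them differently or scope the Secret Manager grant more narrowly. `list_a100_zones` lists the zones that offer `nvidia-tesla-a100`, which helps with the region note. A zone appearing in that list doesn't guarantee capacity is available.

```bash
#!/usr/bin/env bash
# Sketch of Workload Identity wiring for the mantis-holo3-server KSA, plus a
# zone check for A100 availability. PROJECT and GSA are hypothetical placeholders.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

PROJECT="${PROJECT:-my-project}"
GSA="${GSA:-mantis-holo3-server@${PROJECT}.iam.gserviceaccount.com}"
KSA_NS="default"
KSA_NAME="mantis-holo3-server"

bind_workload_identity() {
  # Allow the Kubernetes SA to impersonate the Google SA.
  run gcloud iam service-accounts add-iam-policy-binding "$GSA" \
    --project="$PROJECT" \
    --role=roles/iam.workloadIdentityUser \
    --member="serviceAccount:${PROJECT}.svc.id.goog[${KSA_NS}/${KSA_NAME}]"
  # Map the KSA to the GSA so pods using it get the GSA's credentials.
  run kubectl annotate serviceaccount "$KSA_NAME" --namespace="$KSA_NS" \
    "iam.gke.io/gcp-service-account=${GSA}" --overwrite
  # Project-wide secret read access, for brevity; per-secret grants
  # (gcloud secrets add-iam-policy-binding) are tighter.
  run gcloud projects add-iam-policy-binding "$PROJECT" \
    --role=roles/secretmanager.secretAccessor \
    --member="serviceAccount:${GSA}"
}

# Print the zone names that list the A100 40 GB accelerator type.
list_a100_zones() {
  run gcloud compute accelerator-types list \
    --filter="name=nvidia-tesla-a100" \
    --format="value(zone.basename())"
}
```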
Status
Like the AWS path, this is starter Terraform. It passes terraform validate locally but has not been deployed end-to-end against a live GCP project on this branch. Review it against your VPC and IAM setup before running terraform apply.
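A pre-apply review along those lines might look like the sketch below. It assumes the .tf files live directly in deploy/gke/; that location is an assumption, so adjust `TF_DIR` if they sit in a subdirectory. The plan is saved so that the later apply runs exactly what was reviewed.

```bash
#!/usr/bin/env bash
# Sketch: review the starter Terraform before applying it.
# TF_DIR is an assumption; point it at the directory holding the .tf files.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

TF_DIR="${TF_DIR:-deploy/gke}"

review_plan() {
  run terraform -chdir="$TF_DIR" init -input=false
  # Exits non-zero (stopping the script under set -e) if files need reformatting.
  run terraform -chdir="$TF_DIR" fmt -check
  run terraform -chdir="$TF_DIR" validate
  # Writes the plan to $TF_DIR/tfplan so a later apply uses exactly this plan.
  run terraform -chdir="$TF_DIR" plan -input=false -out=tfplan
}
```

After reading the plan output, apply the saved plan with `terraform -chdir=deploy/gke apply tfplan`.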
See also
- deploy/gke/README.md: full runbook
- deploy/gke/k8s/: manifests
- Tenant keys