Skip to content

GKE

Self-hosted on GKE Standard (not Autopilot — Autopilot doesn't yet support custom GPU node pools with the taint config Mantis needs). Full Terraform + k8s manifests + runbook live at deploy/gke/.

Architecture

GCLB ingress (HTTPS, managed cert)
GKE Standard cluster
  • GPU NodePool (a2-highgpu-1g — 1× A100 40 GB)
  • System NodePool
Filestore (1 TB Standard, RWX — runs, checkpoints, profiles, recordings)
Artifact Registry (mantis-holo3-server image)
Secret Manager (anthropic_api_key, proxy_*, mantis_api_token)
Secret Manager CSI driver → in-pod tmpfs → envFrom Secret

Footprint

Resource Type Cost (us-central1)
GPU node a2-highgpu-1g (A100 40 GB) ~$3.67/hr on-demand
a2-ultragpu-1g (A100 80 GB) ~$4.61/hr (more headroom)
a3-highgpu-1g (H100 80 GB) ~$11.06/hr (high throughput)
a2-highgpu-1g Spot ~$1.10/hr (preemptible)
Filestore Standard 1 TB ~$200/month fixed
GCLB LB + per-rule ~$18/month + traffic

For Holo3 Q8_0 (~34 GB VRAM), a2-highgpu-1g (A100 40 GB) is the sweet spot. Use Spot if you can tolerate preemption.

End-to-end deploy

The detailed runbook is in deploy/gke/README.md. High-level:

  1. Enable required APIs

    gcloud services enable container.googleapis.com \
      artifactregistry.googleapis.com \
      secretmanager.googleapis.com
    

  2. Install cluster add-ons

  3. Secret Manager CSI driver for Secret Manager → tmpfs
  4. GPU device plugin DaemonSet (or auto-install on node pool creation)

  5. Build + push image to Artifact Registry

    docker build --platform linux/amd64 -f docker/server.Dockerfile \
      -t "$REGION-docker.pkg.dev/$PROJECT/$REPO/mantis-holo3-server:$(git rev-parse --short HEAD)" .
    docker push "..."
    

  6. Provision infra

    cd deploy/gke/terraform
    terraform init
    terraform apply -var "project_id=$PROJECT" \
                    -var "cluster_name=mantis-prod" \
                    -var "image_tag=$(git rev-parse --short HEAD)" \
                    -var "use_spot_gpus=false"
    

  7. Populate Secret Manager

    for k in anthropic_api_key proxy_url proxy_user proxy_pass mantis_api_token; do
      read -rp "$k: " v
      printf "%s" "$v" | gcloud secrets versions add "mantis-prod-$k" --data-file=-
    done
    

  8. Pre-warm Holo3 weights

    kubectl apply -f deploy/gke/k8s/init-holo3-weights.yaml
    

  9. Deploy the workload

    kubectl apply -f deploy/gke/k8s/secret-provider-class.yaml
    kubectl apply -f deploy/gke/k8s/deployment.yaml
    kubectl apply -f deploy/gke/k8s/service.yaml
    kubectl apply -f deploy/gke/k8s/ingress.yaml
    

  10. Smoke-test

    HOST=$(kubectl get ingress mantis-holo3-server \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    TOK=$(gcloud secrets versions access latest --secret=mantis-prod-mantis_api_token)
    curl -fsS -X POST "https://$HOST/v1/predict" \
      -H "X-Mantis-Token: $TOK" \
      -d '{"detached": true, "micro": "plans/example/extract_listings.json", "state_key": "smoke"}'
    

Operational notes

  • Spot pricing: set use_spot_gpus = true for ~70 % savings. Plan for occasional preemption — runs survive a replica restart because state lives on Filestore.
  • Workload Identity: the Terraform creates a GSA bound to default/mantis-holo3-server KSA so the pod reads Secret Manager without a JSON key on disk.
  • Filestore is fixed-cost: 1 TB Standard is $200/month regardless of usage. For dev / lower-volume, drop to the minimum (capacity_gb = 1024 is the floor for Standard).
  • Region: A100s are scarce in some regions; us-central1, us-west1, and europe-west4 are the safest bets.

Status

Like the AWS path, this is a starter Terraform — tested locally with terraform validate but not deployed end-to-end against a live GCP project on this branch. Review for your VPC/IAM before terraform apply.

See also