Production Deployment

This guide covers the hardened production setup. To run on a laptop, use the dev overlay instead.

Prerequisites

  • Kubernetes 1.29+ with a CNI that enforces NetworkPolicy (Cilium, Calico, …).
  • agent-sandbox v0.4.5+ installed.
  • A dedicated sandbox node pool labeled vibed.dev/sandbox-node: "true", running containerd + containerd-shim-kata-v2, with KVM available (bare metal, *.metal on AWS, or nested-virt images on GCP).
  • A Kata RuntimeClass: kata-fc (Firecracker, needs KVM) or kata-qemu.
  • S3 or MinIO for source tarballs.
  • A DNS-01 capable DNS provider for Caddy's wildcard cert on *.<your-domain>.
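
The node-pool label and RuntimeClass items above can be checked before installing. This is a minimal sketch, assuming kubectl is on PATH and points at the target cluster; the function names are hypothetical and not part of vibeD.

```shell
# Preflight sketch (function names are hypothetical, not part of vibeD).

# Succeeds if at least one node carries the sandbox-pool label.
has_sandbox_nodes() {
  kubectl get nodes -l 'vibed.dev/sandbox-node=true' -o name | grep -q .
}

# Succeeds if the named RuntimeClass exists, e.g. kata-fc or kata-qemu.
has_runtime_class() {
  kubectl get runtimeclass "$1" >/dev/null 2>&1
}
```

For example, `has_sandbox_nodes && has_runtime_class kata-fc || echo "prerequisites missing"` prints a warning when either check fails. KVM availability and the CNI's NetworkPolicy enforcement are not covered by this sketch.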

1. Container images

CI builds and pushes all component images to GHCR: vibed, vibed-controller, vibed-router, and the five template images. Pin a released tag (never latest) in your values.
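
One way to enforce the pinning rule in CI is a small grep over the values file. The sketch below is an assumption about how you might do this, not something the chart ships; `tags_are_pinned` is a hypothetical helper, and it only catches a literal `latest` tag, not a missing one.

```shell
# Succeeds only if no "tag:" entry in the given values file is "latest"
# (quoted or unquoted). Hypothetical helper, not part of vibeD.
tags_are_pinned() {
  ! grep -Eq 'tag:[[:space:]]*"?latest"?([[:space:],}]|$)' "$1"
}
```

Usage: `tags_are_pinned values-production.yaml || { echo "unpinned image tag"; exit 1; }`.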

2. Production values

image: { repository: ghcr.io/vibed-project/vibed, tag: "v0.4.1", pullPolicy: IfNotPresent }
controller: { image: { tag: "v0.4.1" }, domain: apps.example.com }
router: { image: { tag: "v0.4.1" } }

# Sandbox isolation
runtime:
  defaultClass: kata-fc
  nodeSelector: { vibed.dev/sandbox-node: "true" }
  sandboxNetworkPolicy: Unmanaged # vibeD owns the policy (see below)

networkPolicy:
  enabled: true # default-deny + control-plane->sandbox + DNS/S3 egress

# Source store — MUST be s3 in production
config:
  storage:
    tarball:
      backend: s3
      s3: { bucket: vibed-sources, region: us-east-1, presignTTL: "15m" }
  server: { logFormat: json }

# Wildcard TLS
caddy:
  tls:
    dns01: { enabled: true, provider: cloudflare, tokenSecret: vibed-cloudflare }

auth:
  enabled: true
  mode: oidc # or apikey
  oidc: { issuer: "https://idp.example.com/...", audience: vibed, adminRole: vibed-admin }
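
Before installing, it can help to render the chart locally with these values so that YAML or templating mistakes surface early. This is a minimal sketch, assuming the values above are saved as values-production.yaml and that you run it from the repository root; `render_check` is a hypothetical wrapper around the standard `helm lint` and `helm template` commands.

```shell
# Lint and render the chart with the production values; no cluster access needed.
# render_check is a hypothetical helper, not part of vibeD.
render_check() {
  values="${1:-values-production.yaml}"
  helm lint deploy/helm/vibed/ -f "$values" &&
  helm template vibed deploy/helm/vibed/ -n vibed-system -f "$values" >/dev/null
}
```

A non-zero exit status from `render_check` means the chart did not lint or render with the given values. A clean render does not prove the values are semantically right for your cluster.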

3. The network model (important)

agent-sandbox gives sandbox pods no cluster-internal egress and no cluster DNS once a NetworkPolicy is enforced — this is where enterprise data-egress controls live. Two consequences:

  • Source must come from s3. A pre-signed S3/MinIO URL is reachable over the sandbox's allowed public egress; the in-cluster served backend is not. served is dev-only.
  • vibeD must own the NetworkPolicy. agent-sandbox's Managed mode allows ingress only from an app: sandbox-router pod it doesn't ship, and blocks all cluster egress — which breaks the controller's probe and Caddy's proxy. Set runtime.sandboxNetworkPolicy: Unmanaged and networkPolicy.enabled: true; vibeD then ships a policy that allows exactly: control-plane → sandbox :8080/:9000, DNS, and public egress (for the S3 pull), with cluster-internal CIDRs denied.
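
After installation you can confirm that a policy is actually present where sandboxes run. This is a rough sketch, assuming sandboxes live in the vibed-apps namespace (the namespace queried in the install step); the function name is hypothetical and the names of the policies vibeD ships are not assumed here.

```shell
# Print the NetworkPolicies in the sandbox namespace; fail if there are none.
# Assumption: sandboxes run in vibed-apps unless another namespace is passed.
list_sandbox_policies() {
  kubectl get networkpolicy -n "${1:-vibed-apps}" -o name | grep .
}
```

An empty result suggests that networkPolicy.enabled is not set. To see the ingress and egress rules of a policy, run `kubectl describe networkpolicy <name> -n vibed-apps`.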

4. Install

helm install vibed deploy/helm/vibed/ -n vibed-system --create-namespace -f values-production.yaml
kubectl rollout status deploy/vibed -n vibed-system
kubectl get sandboxwarmpool -n vibed-apps

5. Expose Caddy

Front the vibed-caddy Service with a LoadBalancer or Ingress, and point your wildcard DNS record *.apps.example.com at it. Caddy obtains the wildcard cert via DNS-01 and routes <id>.apps.example.com to each app's per-app Service.
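
A quick way to confirm the wildcard record is in place is to resolve an arbitrary subdomain. The sketch below assumes `dig` is installed; `wildcard_resolves` and the `vibed-dns-check` label are hypothetical, and any unused label would do.

```shell
# Succeeds if an arbitrary subdomain of the app domain resolves to something.
# Hypothetical helper; "vibed-dns-check" is just an unused label.
wildcard_resolves() {
  dig +short "vibed-dns-check.$1" | grep -q .
}
```

Usage: `wildcard_resolves apps.example.com || echo "wildcard DNS record missing"`. This only checks DNS. It does not verify that Caddy has obtained the certificate.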

Upgrading

helm upgrade vibed deploy/helm/vibed/ -n vibed-system -f values-production.yaml

CRD upgrades are not automatic

Helm installs CRDs from crds/ only on first install and never updates them on upgrade. If the release changes the VibedApp schema, apply the updated CRD by hand; otherwise new status fields are silently dropped:

kubectl apply -f deploy/helm/vibed/crds/vibed.dev_vibedapps.yaml
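
The two upgrade commands can be wrapped so the CRD step is never forgotten. This is a sketch under one assumption: that applying the CRD before the chart upgrade is the right order for your release (the guide does not state an order). `upgrade_vibed` is a hypothetical helper.

```shell
# Apply the VibedApp CRD, then upgrade the release. Stops if the CRD apply fails.
# Assumption: CRD-first ordering; upgrade_vibed is not part of vibeD.
upgrade_vibed() {
  kubectl apply -f deploy/helm/vibed/crds/vibed.dev_vibedapps.yaml &&
  helm upgrade vibed deploy/helm/vibed/ -n vibed-system -f "${1:-values-production.yaml}"
}
```

To preview the schema change first, run `kubectl diff -f deploy/helm/vibed/crds/vibed.dev_vibedapps.yaml`.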

Software supply chain (SBOMs & attestations)

Every tagged release ships a Software Bill of Materials so you can audit what you're deploying:

  • In-registry attestations — each image carries an SBOM attestation generated by BuildKit at push time. Inspect it without downloading the image:

    docker buildx imagetools inspect ghcr.io/vibed-project/vibed:<version> --format '{{ json .SBOM }}'
  • Downloadable SPDX files — the GitHub Release attaches an SPDX SBOM per image (<image>.spdx.json) plus a source SBOM (vibed-source.spdx.json), suitable for ingesting into a vulnerability scanner or compliance pipeline.

Both are produced automatically by the release workflow (.github/workflows/release.yaml); no action is needed beyond tagging a release.
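
To pull the SPDX files for a release into a local directory, the GitHub CLI can filter release assets by pattern. This is a minimal sketch, assuming `gh` is installed and authenticated; `fetch_sboms` is a hypothetical helper, and the repository is passed as an argument because this guide does not name it.

```shell
# Download every *.spdx.json asset attached to a release.
# $1 = release tag, $2 = GitHub repo as OWNER/NAME, $3 = output dir (default: sbom)
# Hypothetical helper, not part of vibeD.
fetch_sboms() {
  gh release download "$1" --repo "$2" --pattern '*.spdx.json' --dir "${3:-sbom}"
}
```

The downloaded files can then be handed to whichever scanner or compliance pipeline you use.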

Governance (multi-tenant installs)

For shared/enterprise installs, enable the governance controls (each has a dedicated page):

  • Egress control — per-app outbound allow-lists enforced by a forward proxy.
  • Deploy quotas — per-owner / per-department caps on concurrent apps.
  • Audit trail — who deployed/deleted/rolled back what (persistent on the SQLite store).

Security checklist

  • auth.enabled: true before exposing the API
  • Image tags pinned to a release (not latest)
  • runtime.sandboxNetworkPolicy: Unmanaged + networkPolicy.enabled: true
  • storage.tarball.backend: s3 (never served)
  • Sandbox node pool isolated + Kata RuntimeClass in place
  • Caddy wildcard DNS-01 TLS enabled with the provider token in a Secret
  • controller.domain is a real domain (not localhost)
  • CRD re-applied after any schema-changing upgrade
  • (Multi-tenant) egressControl.enabled + quotas.enabled reviewed; audit trail on a persistent store
  • Release SBOMs reviewed / ingested into your scanner