
Container Sandboxing

Why Sandboxing?

Traditional containers share the host kernel. A kernel vulnerability exploited from within one container can compromise the entire host and all other containers. Runtime sandboxing adds an additional isolation layer between the container and the host kernel.

CKS Exam Relevance

The CKS exam tests your ability to:

  • Understand the differences between gVisor, Kata Containers, and standard runtimes
  • Create and configure RuntimeClass resources
  • Assign pods to use sandboxed runtimes
  • Understand the security vs performance tradeoffs

Container Isolation Comparison

| Aspect | runc (Traditional) | gVisor (runsc) | Kata Containers |
|---|---|---|---|
| Isolation | Namespaces + cgroups | User-space kernel | Lightweight VM |
| Kernel sharing | Shared host kernel | Intercepted syscalls | Dedicated guest kernel |
| Overhead | Minimal | Low-moderate | Moderate |
| Startup time | Fast (ms) | Fast (ms) | Slower (seconds) |
| Compatibility | Full | Most workloads | Full |
| Attack surface | Largest | Reduced | Smallest |
| Resource usage | Lowest | Low-moderate | Higher (VM overhead) |

gVisor (runsc)

What Is gVisor?

gVisor is an application kernel written in Go that implements a substantial portion of the Linux system call interface. It runs in user space and intercepts system calls from the containerized application, providing a strong isolation boundary without the overhead of a full virtual machine.

How gVisor Works

Key components (both run as host processes, as shown below):

  • Sentry -- Intercepts and handles application syscalls in user space, implementing more than 200 Linux syscalls without passing them to the host kernel.
  • Gofer -- A file system proxy that handles file operations on behalf of the sandbox, running as a separate process with minimal privileges.
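
On a node with gVisor installed, both components run as ordinary host processes. A quick check, assuming the runsc binary is on the node's PATH (process names can vary by gVisor version):

bash
# Confirm the runsc binary is installed and report its version
runsc --version

# With a sandboxed pod running, the Sentry and Gofer appear as host processes
# (typically named runsc-sandbox and runsc-gofer)
ps aux | grep runsc | grep -v grep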

gVisor Limitations

  • Not all syscalls are supported (some applications may not work)
  • Higher CPU overhead due to syscall interception
  • No GPU support
  • Networking performance impact
  • Some /proc and /sys entries may differ

Kata Containers

What Are Kata Containers?

Kata Containers run workloads inside lightweight virtual machines using a hypervisor (QEMU/Cloud Hypervisor). Each container gets its own guest kernel, providing VM-level isolation with container-like usability.

How Kata Containers Work

  • Each pod runs inside a dedicated lightweight VM
  • Uses hardware virtualization (VT-x/AMD-V), which the node must support (see the check after this list)
  • Compatible with OCI runtime standard
  • Integrates with Kubernetes via CRI
  • Guest kernel is minimal and purpose-built
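
Because Kata depends on hardware virtualization, it is worth confirming the node supports it before configuring the runtime. A quick check, assuming the kata-runtime binary is installed (older releases use kata-runtime kata-check instead of check):

bash
# Count CPU flags for Intel VT-x (vmx) or AMD-V (svm); 0 means no hardware virtualization
grep -cE 'vmx|svm' /proc/cpuinfo

# Kata's built-in host environment check
kata-runtime check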

Kata Containers Tradeoffs

Advantages:

  • Strongest isolation (hardware-enforced)
  • Full Linux kernel compatibility
  • Each container has its own kernel

Disadvantages:

  • Requires hardware virtualization support
  • Higher memory overhead (guest OS per container)
  • Slower startup than gVisor or runc
  • Not suitable for all environments (nested virtualization issues)

RuntimeClass Resource

The RuntimeClass is the Kubernetes mechanism for selecting which container runtime a pod should use. This is the key resource for the CKS exam.

Creating a RuntimeClass

yaml
# RuntimeClass for gVisor
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                    # Must match the handler name configured
                                  # on the container runtime (containerd/CRI-O)
yaml
# RuntimeClass for Kata Containers
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-runtime             # Handler name for Kata

Handler Name

The handler field must match the runtime handler name configured in the container runtime (containerd or CRI-O) on the nodes. This is a node-level configuration, not a Kubernetes configuration.

containerd Configuration for gVisor

On each node, the container runtime must be configured to know about the sandboxed runtime. For containerd:

toml
# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"

After modifying, restart containerd:

bash
sudo systemctl restart containerd
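
To confirm containerd registered the new handler without creating a pod, you can inspect the CRI runtime configuration. A minimal check, assuming crictl is installed and pointed at the containerd socket (the exact JSON layout varies by containerd version):

bash
# Dump the CRI runtime configuration and look for the runsc handler entry
crictl info | grep -A 3 '"runsc"'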

Using RuntimeClass in Pods

Assigning a Pod to gVisor

yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor        # Reference the RuntimeClass
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80

Assigning a Deployment to Kata

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      runtimeClassName: kata      # All pods use Kata Containers
      containers:
      - name: app
        image: myregistry/secure-app:v1.0

Exam Pattern

A typical CKS exam question (a sketch of the first two steps follows the list):

  1. Create a RuntimeClass named gvisor with handler runsc
  2. Modify a pod or deployment to use this RuntimeClass
  3. Verify the pod is running with the sandboxed runtime
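
A compact command-line sketch of steps 1 and 2; the deployment name is a placeholder for whatever object the question specifies:

bash
# Step 1: create the RuntimeClass (there is no imperative create for RuntimeClass)
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF

# Step 2: point an existing deployment at the RuntimeClass
kubectl patch deployment <deployment-name> --type merge \
  -p '{"spec":{"template":{"spec":{"runtimeClassName":"gvisor"}}}}'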

Verifying the Runtime

bash
# Check which RuntimeClass a pod is using
kubectl get pod sandboxed-pod -o jsonpath='{.spec.runtimeClassName}'
# gvisor

# Verify the RuntimeClass exists
kubectl get runtimeclass
# NAME     HANDLER   AGE
# gvisor   runsc     5m

# Check pod is running
kubectl get pod sandboxed-pod
# NAME            READY   STATUS    RESTARTS   AGE
# sandboxed-pod   1/1     Running   0          1m

# Inside the pod, check the kernel (gVisor has a different kernel version)
kubectl exec sandboxed-pod -- uname -r
# 4.4.0 (gVisor uses its own reported version)
kubectl exec sandboxed-pod -- dmesg | head -1
# gVisor will show different boot messages

RuntimeClass with Scheduling

RuntimeClass supports scheduling to ensure pods land on nodes that have the sandboxed runtime installed:

yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    sandbox: gvisor               # Only schedule on nodes with this label
  tolerations:
  - key: "sandbox"
    operator: "Equal"
    value: "gvisor"
    effect: "NoSchedule"

This ensures that pods requesting the gvisor RuntimeClass are only scheduled on nodes that actually have gVisor installed.
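
For this scheduling to take effect, the matching label and taint must already be present on the gVisor-capable nodes. A sketch, using node01 as a placeholder node name:

bash
# Label the node so it matches the RuntimeClass nodeSelector
kubectl label node node01 sandbox=gvisor

# Taint the node so only pods that tolerate it (via the RuntimeClass) land there
kubectl taint node node01 sandbox=gvisor:NoSchedule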


RuntimeClass with Overhead

RuntimeClass can declare resource overhead to account for the additional resources the sandbox consumes:

yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-runtime
overhead:
  podFixed:
    memory: "160Mi"               # Kata VM overhead
    cpu: "250m"

This overhead is automatically added to the pod's resource accounting by the scheduler and kubelet.
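
Because the admission controller copies the declared overhead into each pod's spec.overhead, you can observe it directly on a running pod:

bash
# Inspect the overhead recorded on a pod that uses the kata RuntimeClass
kubectl get pod <pod-name> -o jsonpath='{.spec.overhead}'
# Example output: {"cpu":"250m","memory":"160Mi"}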


When to Use Sandboxing

| Scenario | Recommended Runtime | Reason |
|---|---|---|
| CI/CD pipeline execution | gVisor | Untrusted build code |
| Multi-tenant SaaS | Kata Containers | Strong tenant isolation |
| Running user-submitted code | gVisor | Unknown code safety |
| Compliance requirements | Kata Containers | Hardware-level isolation |
| Standard microservices | runc + SecurityContext | Performance, compatibility |
| Network-intensive workloads | runc + SecurityContext | gVisor network overhead |
| Database workloads | runc + SecurityContext | I/O performance |

Security Comparison Summary

| Security Feature | runc | gVisor | Kata |
|---|---|---|---|
| Kernel isolation | No | Partial (user-space) | Full (VM) |
| Syscall filtering | Seccomp only | Built-in interception | Guest kernel |
| Filesystem isolation | Mount namespaces | Gofer proxy | VM boundary |
| Network isolation | Network namespaces | Netstack | VM network |
| Hardware-level isolation | No | No | Yes (hypervisor) |
| Container escape difficulty | Moderate | Hard | Very Hard |

Quick Reference

bash
# List all RuntimeClasses
kubectl get runtimeclass

# There is no imperative command to create a RuntimeClass; apply a manifest instead
kubectl apply -f runtimeclass.yaml

# Check which runtime a pod is using
kubectl get pod <name> -o jsonpath='{.spec.runtimeClassName}'

# Describe RuntimeClass details
kubectl describe runtimeclass gvisor

# Verify gVisor is running inside a pod
kubectl exec <pod> -- dmesg 2>&1 | head -5
kubectl exec <pod> -- uname -r

# Check containerd runtime configuration
grep -A5 runsc /etc/containerd/config.toml
