# Container Sandboxing

## Why Sandboxing?
Traditional containers share the host kernel, so a kernel exploit launched from one container can compromise the host and every other container on it. Runtime sandboxing adds an extra isolation layer between the container and the host kernel.
## CKS Exam Relevance
The CKS exam tests your ability to:
- Understand the differences between gVisor, Kata Containers, and standard runtimes
- Create and configure RuntimeClass resources
- Assign pods to use sandboxed runtimes
- Understand the security vs performance tradeoffs
## Container Isolation Comparison
| Aspect | runc (Traditional) | gVisor (runsc) | Kata Containers |
|---|---|---|---|
| Isolation | Namespaces + cgroups | User-space kernel | Lightweight VM |
| Kernel sharing | Shared host kernel | Intercepted syscalls | Dedicated guest kernel |
| Overhead | Minimal | Low-moderate | Moderate |
| Startup time | Fast (ms) | Fast (ms) | Slower (seconds) |
| Compatibility | Full | Most workloads | Full |
| Attack surface | Largest | Reduced | Smallest |
| Resource usage | Lowest | Low-moderate | Higher (VM overhead) |
## gVisor (runsc)

### What Is gVisor?
gVisor is an application kernel written in Go that implements a substantial portion of the Linux system call interface. It runs in user space and intercepts system calls from the containerized application, providing a strong isolation boundary without the overhead of a full virtual machine.
### How gVisor Works
Key components:
- Sentry -- Intercepts and handles application syscalls in user space, implementing more than 200 Linux syscalls without passing them to the host kernel (demonstrated in the sketch after this list).
- Gofer -- A file system proxy that handles file operations on behalf of the sandbox, running as a separate process with minimal privileges.
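You can watch the Sentry at work outside Kubernetes by running a container directly under runsc. A minimal sketch, assuming Docker has already been configured with a `runsc` runtime per gVisor's installation docs:

```bash
# Run a throwaway container under gVisor's runsc runtime
# (requires runsc to be registered as a runtime in Docker's daemon.json)
docker run --rm --runtime=runsc alpine dmesg | head -3
# Expect gVisor's own boot lines, e.g. "Starting gVisor..."
```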
### gVisor Limitations
- Not all syscalls are supported (some applications may not work)
- Higher CPU overhead due to syscall interception
- No GPU support
- Networking performance impact
- Some `/proc` and `/sys` entries may differ (see the sketch below)
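The last point is easy to observe from inside a sandboxed pod. A quick sketch (the pod name is illustrative, and exact output depends on the gVisor version):

```bash
# gVisor reports its own kernel identity, not the host's
kubectl exec gvisor-pod -- uname -r

# Emulated /proc entries can be sparser than in a runc pod
kubectl exec gvisor-pod -- ls /proc/sys/kernel
```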
## Kata Containers

### What Are Kata Containers?
Kata Containers run workloads inside lightweight virtual machines using a hypervisor (QEMU/Cloud Hypervisor). Each container gets its own guest kernel, providing VM-level isolation with container-like usability.
### How Kata Containers Work
- Each pod runs inside a dedicated lightweight VM
- Uses hardware virtualization (VT-x/AMD-V); see the support check after this list
- Compatible with OCI runtime standard
- Integrates with Kubernetes via CRI
- Guest kernel is minimal and purpose-built
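Because Kata depends on hardware virtualization, it is worth confirming node support before installing it. A sketch, assuming the `kata-runtime` CLI is present (its self-check subcommand has changed names across releases):

```bash
# CPU must expose VT-x (vmx) or AMD-V (svm) for Kata's hypervisor;
# a non-zero count means the extensions are available
grep -E -c '(vmx|svm)' /proc/cpuinfo

# If Kata is already installed, let it verify the host itself
# (older releases used `kata-runtime kata-check`)
kata-runtime check
```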
### Kata Containers Tradeoffs
Advantages:
- Strongest isolation (hardware-enforced)
- Full Linux kernel compatibility
- Each container has its own kernel
Disadvantages:
- Requires hardware virtualization support
- Higher memory overhead (guest OS per container)
- Slower startup than gVisor or runc
- Not suitable for all environments (nested virtualization issues)
## RuntimeClass Resource

RuntimeClass is the Kubernetes mechanism for selecting which container runtime a pod should use. It is the key resource for this topic on the CKS exam.
### Creating a RuntimeClass
```yaml
# RuntimeClass for gVisor
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc  # Must match the handler name configured on
                # the container runtime (containerd/CRI-O)
```

```yaml
# RuntimeClass for Kata Containers
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-runtime  # Handler name for Kata
```

### Handler Name
The `handler` field must match the runtime handler name configured in the container runtime (containerd or CRI-O) on the nodes. This is a node-level configuration, not a Kubernetes configuration.
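You can confirm which handlers a node's runtime actually registers. A sketch using crictl against containerd (the exact JSON layout may vary between containerd versions):

```bash
# Dump the CRI runtime config and list the registered handlers
crictl info | jq '.config.containerd.runtimes | keys'
# ["kata-runtime", "runc", "runsc"]   <- example output
```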
### containerd Configuration for gVisor
On each node, the container runtime must be configured to know about the sandboxed runtime. For containerd:
```toml
# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
```

After modifying, restart containerd:

```bash
sudo systemctl restart containerd
```

## Using RuntimeClass in Pods
### Assigning a Pod to gVisor
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod
spec:
  runtimeClassName: gvisor  # Reference the RuntimeClass
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
```

### Assigning a Deployment to Kata
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      runtimeClassName: kata  # All pods use Kata Containers
      containers:
      - name: app
        image: myregistry/secure-app:v1.0
```

## Exam Pattern
A typical CKS exam question:
- Create a RuntimeClass named `gvisor` with handler `runsc`
- Modify a pod or deployment to use this RuntimeClass
- Verify the pod is running with the sandboxed runtime (a worked sketch follows)
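One way the whole task might be worked end to end (the deployment name `secure-app` reuses the example above):

```bash
# Step 1: create the RuntimeClass
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF

# Step 2: point the workload's pod template at it (triggers a rollout)
kubectl patch deployment secure-app --type merge \
  -p '{"spec":{"template":{"spec":{"runtimeClassName":"gvisor"}}}}'

# Step 3: verify (see also the next section)
kubectl get pods -l app=secure-app \
  -o jsonpath='{.items[*].spec.runtimeClassName}'
```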
### Verifying the Runtime
```bash
# Check which RuntimeClass a pod is using
kubectl get pod sandboxed-pod -o jsonpath='{.spec.runtimeClassName}'
# gvisor

# Verify the RuntimeClass exists
kubectl get runtimeclass
# NAME     HANDLER   AGE
# gvisor   runsc     5m

# Check the pod is running
kubectl get pod sandboxed-pod
# NAME            READY   STATUS    RESTARTS   AGE
# sandboxed-pod   1/1     Running   0          1m

# Inside the pod, check the kernel (gVisor reports a different kernel version)
kubectl exec sandboxed-pod -- uname -r
# 4.4.0 (gVisor uses its own reported version)

kubectl exec sandboxed-pod -- dmesg | head -1
# gVisor will show different boot messages
```

## RuntimeClass with Scheduling
RuntimeClass supports scheduling fields that ensure pods land only on nodes that have the sandboxed runtime installed:
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
scheduling:
  nodeSelector:
    sandbox: gvisor  # Only schedule on nodes with this label
  tolerations:
  - key: "sandbox"
    operator: "Equal"
    value: "gvisor"
    effect: "NoSchedule"
```

This ensures that pods requesting the `gvisor` RuntimeClass are only scheduled on nodes that actually have gVisor installed.
## RuntimeClass with Overhead
RuntimeClass can declare resource overhead to account for the additional resources the sandbox consumes:
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-runtime
overhead:
  podFixed:
    memory: "160Mi"  # Kata VM overhead
    cpu: "250m"
```

This overhead is automatically added to the pod's resource accounting by the scheduler and kubelet.
## When to Use Sandboxing

### Recommended Use Cases
| Scenario | Recommended Runtime | Reason |
|---|---|---|
| CI/CD pipeline execution | gVisor | Untrusted build code |
| Multi-tenant SaaS | Kata Containers | Strong tenant isolation |
| Running user-submitted code | gVisor | Unknown code safety |
| Compliance requirements | Kata Containers | Hardware-level isolation |
| Standard microservices | runc + SecurityContext | Performance, compatibility |
| Network-intensive workloads | runc + SecurityContext | gVisor network overhead |
| Database workloads | runc + SecurityContext | I/O performance |
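For the rows recommending `runc + SecurityContext`, a hardened baseline might look like the following sketch (the image is a placeholder; a real workload must tolerate running as non-root with a read-only root filesystem):

```yaml
# Illustrative hardened pod on the default runc runtime
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  containers:
  - name: app
    image: registry.example.com/app:v1  # placeholder image
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
      seccompProfile:
        type: RuntimeDefault
```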
## Security Comparison Summary
| Security Feature | runc | gVisor | Kata |
|---|---|---|---|
| Kernel isolation | No | Partial (user-space) | Full (VM) |
| Syscall filtering | Seccomp only | Built-in interception | Guest kernel |
| Filesystem isolation | Mount namespaces | Gofer proxy | VM boundary |
| Network isolation | Network namespaces | Netstack | VM network |
| Hardware-level isolation | No | No | Yes (hypervisor) |
| Container escape difficulty | Moderate | Hard | Very Hard |
## Quick Reference
```bash
# List all RuntimeClasses
kubectl get runtimeclass

# There is no imperative command to create a RuntimeClass;
# apply a manifest instead
kubectl apply -f runtimeclass.yaml

# Check which runtime a pod is using
kubectl get pod <name> -o jsonpath='{.spec.runtimeClassName}'

# Describe RuntimeClass details
kubectl describe runtimeclass gvisor

# Verify gVisor is running inside a pod
kubectl exec <pod> -- dmesg 2>&1 | head -5
kubectl exec <pod> -- uname -r

# Check the containerd runtime configuration on a node
grep -A5 runsc /etc/containerd/config.toml
```