Container Hardening Best Practices
Overview
Container hardening reduces the attack surface of your containerized applications. A hardened container limits what an attacker can do even if they compromise the application. This is a defense-in-depth strategy that complements SecurityContexts and admission controllers.
CKS Exam Relevance
Container hardening questions test your ability to:
- Choose appropriate base images
- Understand multi-stage Dockerfile builds
- Configure non-root containers
- Set up read-only filesystems with writable temp directories
- Apply resource limits to prevent DoS attacks
- Combine multiple hardening techniques in a single pod spec
Container Hardening Layers
Minimal Base Images
Why Base Image Choice Matters
| Base Image | Size | Packages | Shell | Attack Surface |
|---|---|---|---|---|
| ubuntu:22.04 | ~77 MB | Many | Yes | Large |
| debian:bookworm-slim | ~74 MB | Some | Yes | Medium-Large |
| alpine:3.19 | ~7 MB | Minimal | Yes (ash) | Small |
| gcr.io/distroless/static | ~2 MB | None | No | Minimal |
| scratch | 0 MB | None | No | Zero |
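To compare these locally, you can pull a couple of the base images and list their sizes; the exact numbers vary by tag and platform, so treat the figures in the table as approximate.
# Pull two of the base images and compare their on-disk sizes
docker pull alpine:3.19
docker pull gcr.io/distroless/static:nonroot
docker image ls --format '{{.Repository}}:{{.Tag}} {{.Size}}' | grep -E 'alpine|distroless'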
Distroless Images
Google's distroless images contain only the application and its runtime dependencies. No package manager, no shell, no utilities.
# Distroless for a Go application
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER 65532:65532
ENTRYPOINT ["/myapp"]# Distroless for a Java application
FROM gcr.io/distroless/java17-debian11:nonroot
COPY --from=builder /app/target/app.jar /app.jar
USER 65532:65532
ENTRYPOINT ["java", "-jar", "/app.jar"]Security Advantage
With no shell in distroless images, an attacker who gains code execution cannot open an interactive shell, install tools, or explore the filesystem. This dramatically limits post-exploitation capabilities.
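A quick way to confirm this is to attempt to exec a shell into a running distroless container (the pod name distroless-app is a placeholder for illustration):
# Fails because the image contains no shell binary at all
kubectl exec -it distroless-app -- sh
# error: ... exec: "sh": executable file not found in $PATH (exact wording varies by runtime)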
Alpine Images
Alpine Linux uses musl libc and BusyBox, resulting in a very small footprint:
FROM alpine:3.19
RUN apk add --no-cache ca-certificates && \
adduser -D -u 10001 appuser
COPY --from=builder /app/myapp /myapp
USER 10001
ENTRYPOINT ["/myapp"]Scratch Images
The scratch base is completely empty -- suitable for statically linked binaries:
FROM scratch
COPY --from=builder /app/myapp /myapp
USER 10001:10001
ENTRYPOINT ["/myapp"]Scratch Limitations
scratch has no CA certificates, no timezone data, no /etc/passwd, and no shell. Your binary must be statically compiled and include all needed data.
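Before building FROM scratch, it is worth confirming that the binary really is statically linked. A minimal check on the build host, assuming the binary is named myapp, looks like this:
# A static binary reports "statically linked" / "not a dynamic executable"
file myapp
ldd myapp
# For Go, building with CGO_ENABLED=0 (as in the multi-stage examples below) produces a static binary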
Multi-Stage Builds
Multi-stage builds ensure that build tools, source code, and intermediate artifacts never appear in the final image.
Standard Multi-Stage Build
# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o myapp .
# Stage 2: Runtime (minimal image)
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER 65532:65532
EXPOSE 8080
ENTRYPOINT ["/myapp"]Why This Matters for Security
| Component | Build Stage | Final Image |
|---|---|---|
| Go compiler | Present | Absent |
| Source code | Present | Absent |
| Build dependencies | Present | Absent |
| Git history | Present | Absent |
| Test files | Present | Absent |
| Final binary | Present | Present |
Attack surface reduction: The final image contains only the compiled binary. No compilers, no package managers, no debug tools that an attacker could leverage.
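One way to verify this is to build the builder stage and the final image separately and compare them (the image names here are illustrative):
# Build the builder stage alone, then the final image
docker build --target builder -t myapp:builder .
docker build -t myapp:final .
docker image ls | grep myapp                  # the final image is a fraction of the builder's size
docker run --rm --entrypoint sh myapp:final   # fails: the distroless final image has no shell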
Python Multi-Stage Example
# Stage 1: Build dependencies
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
# Stage 2: Runtime
FROM python:3.12-slim
RUN groupadd -r appgroup && useradd -r -g appgroup -u 10001 appuser
WORKDIR /app
COPY --from=builder /app/deps /usr/local/lib/python3.12/site-packages/
COPY --chown=appuser:appgroup . .
USER 10001
EXPOSE 8000
CMD ["python", "app.py"]Non-Root Containers
Running containers as non-root is one of the most effective hardening techniques.
In the Dockerfile
FROM alpine:3.19
# Create a non-root user
RUN addgroup -g 10001 -S appgroup && \
adduser -u 10001 -S appuser -G appgroup
# Set ownership
COPY --chown=appuser:appgroup . /app
WORKDIR /app
# Switch to non-root user
USER 10001
CMD ["/app/myapp"]In the Pod Spec (Defense in Depth)
Even if the Dockerfile sets USER, always enforce it in the pod spec:
apiVersion: v1
kind: Pod
metadata:
name: non-root-app
spec:
securityContext:
runAsUser: 10001
runAsGroup: 10001
runAsNonRoot: true # Reject if image tries to run as root
fsGroup: 10001
containers:
- name: app
image: myregistry/myapp:v1.0
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
        - ALL
Double Protection
Setting the user in both the Dockerfile and the pod spec provides defense in depth:
- Dockerfile USER: Default user when no securityContext is set
- Pod runAsUser: Overrides the image USER, ensures enforcement
- Pod runAsNonRoot: Kubernetes-level rejection of root containers
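To see the Kubernetes-level rejection in action, force runAsNonRoot: true onto an image that defaults to root (nginx does). The following is a sketch using an inline override:
# nginx:1.25 runs as root by default, so the kubelet refuses to start the container
kubectl run root-test --image=nginx:1.25 \
  --overrides='{"apiVersion":"v1","spec":{"securityContext":{"runAsNonRoot":true}}}'
kubectl get pod root-test        # STATUS: CreateContainerConfigError
kubectl describe pod root-test | grep -i runasnonroot
# Error: container has runAsNonRoot and image will run as root
kubectl delete pod root-test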
Read-Only Filesystems
A read-only root filesystem prevents attackers from writing malicious files, scripts, or binaries to the container.
Basic Configuration
apiVersion: v1
kind: Pod
metadata:
name: readonly-app
spec:
containers:
- name: app
image: nginx:1.25
securityContext:
readOnlyRootFilesystem: true
volumeMounts:
- name: tmp
mountPath: /tmp
- name: var-cache-nginx
mountPath: /var/cache/nginx
- name: var-run
mountPath: /var/run
volumes:
- name: tmp
emptyDir:
sizeLimit: 64Mi # Limit temp storage
- name: var-cache-nginx
emptyDir:
sizeLimit: 128Mi
- name: var-run
emptyDir:
      sizeLimit: 1Mi
Finding Writable Paths
When making a container read-only, you need to identify which paths the application writes to:
# Run the container without read-only first, so the application starts and writes to its usual paths
kubectl run test-app --image=nginx:1.25
# Find writable directories and recent file modifications
kubectl exec test-app -- find / -writable -type d 2>/dev/null
kubectl exec test-app -- find / -newer /etc/hostname -type f 2>/dev/null
# Common writable paths by application:
# nginx: /var/cache/nginx, /var/run, /tmp
# node: /tmp, /home/node/.npm
# python: /tmp, /app/__pycache__
# java: /tmp, /app/logs
Exam Pattern
A common CKS exam question provides a pod that needs readOnlyRootFilesystem: true. You must:
- Add readOnlyRootFilesystem: true to the security context
- Identify the writable paths needed by the application
- Add emptyDir volumes mounted at those paths
- Optionally set sizeLimit on the emptyDir volumes
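After making those changes, a quick check against the readonly-app pod above confirms the root filesystem is locked down while the emptyDir mounts remain writable:
# Writing to the root filesystem fails; writing to the emptyDir-backed /tmp succeeds
kubectl exec readonly-app -- touch /test-write       # touch: cannot touch '/test-write': Read-only file system
kubectl exec readonly-app -- touch /tmp/test-write   # no error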
Resource Limits
Resource limits prevent containers from consuming excessive CPU and memory, which protects against DoS attacks, fork bombs, and resource starvation.
Setting Limits
apiVersion: v1
kind: Pod
metadata:
name: limited-app
spec:
containers:
- name: app
image: myapp:v1.0
resources:
requests:
memory: "64Mi"
cpu: "125m"
limits:
memory: "128Mi" # Hard cap -- OOMKilled if exceeded
cpu: "500m" # Throttled if exceeded
        ephemeral-storage: "256Mi" # Evicted if exceeded
What Happens When Limits Are Exceeded
| Resource | Behavior When Exceeded |
|---|---|
| Memory limit | Container is OOMKilled (restarted) |
| CPU limit | Container is throttled (not killed) |
| Ephemeral storage | Pod is evicted |
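To confirm these behaviors on the limited-app pod above: the last termination reason records an OOM kill, while CPU throttling only shows up in usage metrics (kubectl top requires metrics-server).
# Reason for the last container termination -- OOMKilled if the memory limit was hit
kubectl get pod limited-app -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# CPU throttling never kills the container; check usage instead
kubectl top pod limited-app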
Why Limits Matter for Security
| Threat | Without Limits | With Limits |
|---|---|---|
| Fork bomb | Consumes all node CPU/memory | Contained to limit |
| Memory leak | Crashes other pods | Only this pod OOMKilled |
| Crypto mining | Uses all available CPU | Throttled to limit |
| Log flooding | Fills node disk | Ephemeral storage limit |
| Resource starvation | Starves other workloads | Isolated |
Always Set Resource Limits
A container without resource limits can consume all available resources on a node, affecting every other workload. This is both a security and reliability concern.
LimitRange (Namespace Default Limits)
Enforce default resource limits for all pods in a namespace:
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: production
spec:
limits:
- default: # Default limits
cpu: "500m"
memory: "256Mi"
defaultRequest: # Default requests
cpu: "100m"
memory: "64Mi"
max: # Maximum allowed
cpu: "2"
memory: "1Gi"
min: # Minimum required
cpu: "50m"
memory: "32Mi"
    type: Container
Complete Hardened Container Example
This combines all hardening techniques into a single, production-ready configuration:
Dockerfile
# Stage 1: Build
FROM golang:1.22-alpine AS builder
RUN apk add --no-cache git ca-certificates
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -ldflags="-w -s" -o /app/server .
# Stage 2: Minimal runtime
FROM gcr.io/distroless/static:nonroot
# Copy only the binary and CA certs
COPY --from=builder /app/server /server
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Non-root user (distroless:nonroot uses 65532)
USER 65532:65532
EXPOSE 8080
ENTRYPOINT ["/server"]Pod Specification
apiVersion: v1
kind: Pod
metadata:
name: hardened-app
namespace: production
labels:
app: hardened-app
spec:
automountServiceAccountToken: false
securityContext:
runAsUser: 65532
runAsGroup: 65532
runAsNonRoot: true
fsGroup: 65532
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: gcr.io/my-company/hardened-app:v1.0@sha256:abc123...
ports:
- containerPort: 8080
protocol: TCP
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "250m"
ephemeral-storage: "64Mi"
volumeMounts:
- name: tmp
mountPath: /tmp
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 3
periodSeconds: 5
volumes:
- name: tmp
emptyDir:
medium: Memory
      sizeLimit: 32Mi
Hardening Checklist Applied
| Technique | Applied? | Detail |
|---|---|---|
| Minimal base image | Yes | distroless/static:nonroot |
| Multi-stage build | Yes | Builder stage discarded |
| Non-root user | Yes | UID 65532, runAsNonRoot: true |
| Read-only filesystem | Yes | readOnlyRootFilesystem: true |
| Writable temp via emptyDir | Yes | Memory-backed, 32Mi limit |
| Resource limits | Yes | CPU, memory, ephemeral storage |
| Drop all capabilities | Yes | drop: ["ALL"] |
| No privilege escalation | Yes | allowPrivilegeEscalation: false |
| No SA token mount | Yes | automountServiceAccountToken: false |
| Seccomp profile | Yes | RuntimeDefault |
| Image digest | Yes | @sha256:abc123... |
| Health probes | Yes | Liveness + readiness |
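A few spot checks tie the checklist back to observable behavior on the running hardened-app pod (assuming it was deployed as shown above):
# No shell: exec fails because the distroless image ships nothing besides /server
kubectl exec -n production hardened-app -- sh
# Pod-level settings: runAsNonRoot and the disabled SA token automount
kubectl get pod hardened-app -n production -o jsonpath='{.spec.securityContext.runAsNonRoot}{" "}{.spec.automountServiceAccountToken}{"\n"}'
# Container-level settings: read-only root filesystem, dropped capabilities
kubectl get pod hardened-app -n production -o jsonpath='{.spec.containers[0].securityContext}{"\n"}'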
Additional Hardening Techniques
Use Image Digests Instead of Tags
# BAD - Tag can be overwritten with different content
image: nginx:1.25
# GOOD - Digest is immutable
image: nginx@sha256:6926dd802f40e5e7257fded83e0d8030039642e4e10c4a98a6478e9c6be0f536
Disable Service Account Token Auto-Mount
# Pod level
spec:
automountServiceAccountToken: false
# Or ServiceAccount level
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-sa
automountServiceAccountToken: false
Use Seccomp Profiles
securityContext:
seccompProfile:
type: RuntimeDefault # Default Docker/containerd profile
# type: Localhost # Custom profile
    # localhostProfile: profiles/my-profile.json
Network-Level Isolation
# Deny all traffic, then allow only what's needed
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
  - Egress
Scanning Images for Vulnerabilities
While not always a hands-on CKS task, understanding image scanning is important:
# Scan with Trivy
trivy image nginx:1.25
# Scan with specific severity
trivy image --severity HIGH,CRITICAL nginx:1.25
# Scan and fail if vulnerabilities found (CI/CD)
trivy image --exit-code 1 --severity CRITICAL nginx:1.25
Key Principle
Container hardening follows the principle of least privilege: give the container the absolute minimum it needs to function, and nothing more. Every additional capability, package, or permission is a potential attack vector.
Quick Reference
# Check if a container runs as root
kubectl exec <pod> -- id
kubectl exec <pod> -- whoami
# Check if root filesystem is writable
kubectl exec <pod> -- touch /test-write 2>&1
# Check resource limits
kubectl describe pod <pod> | grep -A5 Limits
# Check capabilities
kubectl exec <pod> -- cat /proc/1/status | grep Cap
# Check if SA token is mounted
kubectl exec <pod> -- ls /var/run/secrets/kubernetes.io/serviceaccount/ 2>&1
# List installed packages (if shell is available)
kubectl exec <pod> -- apk list --installed 2>/dev/null # Alpine
kubectl exec <pod> -- dpkg -l 2>/dev/null # Debian
# Check image details
kubectl get pod <pod> -o jsonpath='{.spec.containers[0].image}'
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].imageID}'