Container Hardening Best Practices

Overview

Container hardening reduces the attack surface of your containerized applications. A hardened container limits what an attacker can do even if they compromise the application. This is a defense-in-depth strategy that complements SecurityContexts and admission controllers.

CKS Exam Relevance

Container hardening questions test your ability to:

  • Choose appropriate base images
  • Understand multi-stage Dockerfile builds
  • Configure non-root containers
  • Set up read-only filesystems with writable temp directories
  • Apply resource limits to prevent DoS attacks
  • Combine multiple hardening techniques in a single pod spec

Container Hardening Layers


Minimal Base Images

Why Base Image Choice Matters

Base Image               | Size   | Packages | Shell     | Attack Surface
ubuntu:22.04             | ~77 MB | Many     | Yes       | Large
debian:bookworm-slim     | ~74 MB | Some     | Yes       | Medium-Large
alpine:3.19              | ~7 MB  | Minimal  | Yes (ash) | Small
gcr.io/distroless/static | ~2 MB  | None     | No        | Minimal
scratch                  | 0 MB   | None     | No        | Zero

Distroless Images

Google's distroless images contain only the application and its runtime dependencies: no package manager, no shell, and no general-purpose utilities.

dockerfile
# Distroless for a Go application
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER 65532:65532
ENTRYPOINT ["/myapp"]

dockerfile
# Distroless for a Java application
FROM gcr.io/distroless/java17-debian11:nonroot
COPY --from=builder /app/target/app.jar /app.jar
USER 65532:65532
ENTRYPOINT ["java", "-jar", "/app.jar"]

Security Advantage

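Because distroless images ship no shell, an attacker who gains code execution cannot simply spawn an interactive shell or use a package manager to install tools. Anything they do must go through the application's own runtime or through binaries they manage to bring in, which significantly limits post-exploitation options.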

Alpine Images

Alpine Linux uses musl libc and BusyBox, resulting in a very small footprint:

dockerfile
FROM alpine:3.19
RUN apk add --no-cache ca-certificates && \
    adduser -D -u 10001 appuser
COPY --from=builder /app/myapp /myapp
USER 10001
ENTRYPOINT ["/myapp"]

Scratch Images

The scratch base is completely empty -- suitable for statically linked binaries:

dockerfile
FROM scratch
COPY --from=builder /app/myapp /myapp
USER 10001:10001
ENTRYPOINT ["/myapp"]

Scratch Limitations

scratch has no CA certificates, no timezone data, no /etc/passwd, and no shell. Your binary must be statically compiled and include all needed data.


Multi-Stage Builds

Multi-stage builds ensure that build tools, source code, and intermediate artifacts never appear in the final image.

Standard Multi-Stage Build

dockerfile
# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o myapp .

# Stage 2: Runtime (minimal image)
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER 65532:65532
EXPOSE 8080
ENTRYPOINT ["/myapp"]

Why This Matters for Security

Component          | Build Stage | Final Image
Go compiler        | Present     | Absent
Source code        | Present     | Absent
Build dependencies | Present     | Absent
Git history        | Present     | Absent
Test files         | Present     | Absent
Final binary       | Present     | Present

Attack surface reduction: The final image contains only the compiled binary on top of the minimal base image. No compilers, no package managers, no debug tools that an attacker could leverage.

Python Multi-Stage Example

dockerfile
# Stage 1: Build dependencies
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt

# Stage 2: Runtime
FROM python:3.12-slim
RUN groupadd -r appgroup && useradd -r -g appgroup -u 10001 appuser
WORKDIR /app
COPY --from=builder /app/deps /usr/local/lib/python3.12/site-packages/
COPY --chown=appuser:appgroup . .
USER 10001
EXPOSE 8000
CMD ["python", "app.py"]

Non-Root Containers

Running containers as non-root is one of the most effective hardening techniques.

In the Dockerfile

dockerfile
FROM alpine:3.19

# Create a non-root user
RUN addgroup -g 10001 -S appgroup && \
    adduser -u 10001 -S appuser -G appgroup

# Set ownership
COPY --chown=appuser:appgroup . /app
WORKDIR /app

# Switch to non-root user
USER 10001

CMD ["/app/myapp"]

In the Pod Spec (Defense in Depth)

Even if the Dockerfile sets USER, always enforce it in the pod spec:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-app
spec:
  securityContext:
    runAsUser: 10001
    runAsGroup: 10001
    runAsNonRoot: true            # Reject if image tries to run as root
    fsGroup: 10001
  containers:
  - name: app
    image: myregistry/myapp:v1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL

Double Protection

Setting the user in both the Dockerfile and the pod spec provides defense in depth:

  • Dockerfile USER: Default user when no securityContext is set
  • Pod runAsUser: Overrides the image USER, ensures enforcement
  • Pod runAsNonRoot: Kubernetes-level rejection of root containers
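
The precedence described above can be sketched in a few lines of Python. This is a simplified model, not the kubelet's actual code: the function names are hypothetical, and it assumes the documented behavior that a container-level securityContext overrides the pod-level one, that runAsUser overrides the image USER, and that runAsNonRoot rejects both UID 0 and a non-numeric image user it cannot verify.

```python
from typing import Optional, Tuple, Union

User = Union[int, str]


def parse_image_user(image_user: Optional[str]) -> User:
    """Return the UID from a Dockerfile USER value, or the name if it is non-numeric."""
    if not image_user:
        return 0  # an image without a USER instruction runs as root
    user = image_user.split(":", 1)[0]  # drop an optional ":group" suffix
    return int(user) if user.isdigit() else user


def effective_user(image_user: Optional[str],
                   pod_sc: Optional[dict] = None,
                   container_sc: Optional[dict] = None) -> Tuple[User, bool]:
    """Return (user the container would run as, whether it would be allowed to start)."""
    pod_sc = pod_sc or {}
    container_sc = container_sc or {}

    # Container-level settings take precedence over pod-level settings.
    run_as_user = container_sc.get("runAsUser", pod_sc.get("runAsUser"))
    non_root = container_sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False))

    # runAsUser, when set, replaces whatever the image declares.
    user = run_as_user if run_as_user is not None else parse_image_user(image_user)

    # runAsNonRoot blocks UID 0 and any user name that cannot be checked numerically.
    if non_root and (user == 0 or isinstance(user, str)):
        return user, False
    return user, True


# Image declares a named user and the pod only sets runAsNonRoot: rejected.
print(effective_user("appuser", {"runAsNonRoot": True}))
# Adding a numeric runAsUser makes the same image acceptable.
print(effective_user("appuser", {"runAsNonRoot": True, "runAsUser": 10001}))
```

This is one reason the Dockerfile examples in this section use numeric UIDs in USER: a numeric value can be verified by runAsNonRoot even when the pod spec sets no runAsUser.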

Read-Only Filesystems

A read-only root filesystem prevents attackers from writing malicious files, scripts, or binaries to the container.

Basic Configuration

yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-app
spec:
  containers:
  - name: app
    image: nginx:1.25
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: var-cache-nginx
      mountPath: /var/cache/nginx
    - name: var-run
      mountPath: /var/run
  volumes:
  - name: tmp
    emptyDir:
      sizeLimit: 64Mi             # Limit temp storage
  - name: var-cache-nginx
    emptyDir:
      sizeLimit: 128Mi
  - name: var-run
    emptyDir:
      sizeLimit: 1Mi

Finding Writable Paths

When making a container read-only, you need to identify which paths the application writes to:

bash
# Run the container without read-only first, keeping its normal entrypoint
# so the application starts and creates its runtime files
kubectl run test-app --image=nginx:1.25

# Find writable directories and files modified since the container started
kubectl exec test-app -- find / -writable -type d 2>/dev/null
kubectl exec test-app -- find / -newer /etc/hostname -type f 2>/dev/null

# Common writable paths by application:
# nginx:  /var/cache/nginx, /var/run, /tmp
# node:   /tmp, /home/node/.npm
# python: /tmp, /app/__pycache__
# java:   /tmp, /app/logs

Exam Pattern

A common CKS exam question provides a pod that needs readOnlyRootFilesystem: true. You must:

  1. Add readOnlyRootFilesystem: true to the security context
  2. Identify the writable paths needed by the application
  3. Add emptyDir volumes mounted at those paths
  4. Optionally set sizeLimit on the emptyDir volumes
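
The four steps above can be sketched as a small Python helper that patches a pod manifest held as a dictionary. The function names, the volume-naming scheme, and the sample pod are assumptions made for illustration. In the exam you would edit the YAML by hand, and the list of writable paths still has to come from inspecting the application.

```python
import copy
import re
from typing import List, Optional


def volume_name(path: str) -> str:
    """Derive a lowercase, dash-separated volume name from a mount path."""
    name = re.sub(r"[^a-z0-9]+", "-", path.lower()).strip("-")
    return name[:63].strip("-") or "root"


def make_read_only(pod: dict, container_name: str,
                   writable_paths: List[str],
                   size_limit: Optional[str] = None) -> dict:
    """Return a copy of the pod with a read-only root and emptyDir mounts added."""
    patched = copy.deepcopy(pod)  # leave the caller's manifest untouched
    spec = patched["spec"]
    container = next(c for c in spec["containers"] if c["name"] == container_name)

    # Step 1: turn on the read-only root filesystem.
    container.setdefault("securityContext", {})["readOnlyRootFilesystem"] = True

    mounts = container.setdefault("volumeMounts", [])
    volumes = spec.setdefault("volumes", [])
    for path in writable_paths:  # Step 2: paths identified by the operator
        if any(m["mountPath"] == path for m in mounts):
            continue  # something is already mounted there
        name = volume_name(path)
        # Steps 3 and 4: one emptyDir per path, with an optional size limit.
        empty_dir = {"sizeLimit": size_limit} if size_limit else {}
        volumes.append({"name": name, "emptyDir": empty_dir})
        mounts.append({"name": name, "mountPath": path})
    return patched


pod = {"spec": {"containers": [{"name": "app", "image": "nginx:1.25"}]}}
hardened = make_read_only(pod, "app", ["/tmp", "/var/cache/nginx", "/var/run"], "64Mi")
print([m["name"] for m in hardened["spec"]["containers"][0]["volumeMounts"]])
```

The sketch does not guard against two paths that collapse to the same volume name (for example /var/run and /var-run); a real tool would need to de-duplicate names.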

Resource Limits

Resource limits prevent containers from consuming excessive CPU and memory, which protects against DoS attacks, fork bombs, and resource starvation.

Setting Limits

yaml
apiVersion: v1
kind: Pod
metadata:
  name: limited-app
spec:
  containers:
  - name: app
    image: myapp:v1.0
    resources:
      requests:
        memory: "64Mi"
        cpu: "125m"
      limits:
        memory: "128Mi"           # Hard cap -- OOMKilled if exceeded
        cpu: "500m"               # Throttled if exceeded
        ephemeral-storage: "256Mi" # Evicted if exceeded
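
The quantity strings in the manifest above use Kubernetes units: "m" means millicores for CPU and "Mi" means mebibytes for memory. As a rough aid for reading them, the following Python sketch converts the common suffixes and checks that each request does not exceed its limit. The function names are hypothetical and only a subset of the quantity syntax is handled (no exponent notation, for example).

```python
# Binary suffixes are listed first so "Mi" is matched before "M".
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def memory_bytes(quantity: str) -> int:
    """Convert a memory or storage quantity such as '128Mi' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(float(quantity))  # a plain number is already in bytes


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def requests_within_limits(resources: dict) -> bool:
    """Return True if every request that has a matching limit does not exceed it."""
    parsers = {
        "cpu": cpu_millicores,
        "memory": memory_bytes,
        "ephemeral-storage": memory_bytes,
    }
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for name, parse in parsers.items():
        if name in requests and name in limits:
            if parse(requests[name]) > parse(limits[name]):
                return False
    return True


resources = {
    "requests": {"memory": "64Mi", "cpu": "125m"},
    "limits": {"memory": "128Mi", "cpu": "500m", "ephemeral-storage": "256Mi"},
}
print(memory_bytes("128Mi"), cpu_millicores("500m"), requests_within_limits(resources))
```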

What Happens When Limits Are Exceeded

Resource          | Behavior When Exceeded
Memory limit      | Container is OOMKilled (restarted)
CPU limit         | Container is throttled (not killed)
Ephemeral storage | Pod is evicted

Why Limits Matter for Security

Threat              | Without Limits               | With Limits
Fork bomb           | Consumes all node CPU/memory | Contained to the container's limits
Memory leak         | Crashes other pods           | Only this pod is OOMKilled
Crypto mining       | Uses all available CPU       | Throttled to the CPU limit
Log flooding        | Fills node disk              | Pod evicted at the ephemeral-storage limit
Resource starvation | Starves other workloads      | Isolated to this container

Always Set Resource Limits

A container without resource limits can consume all available resources on a node, affecting every other workload. This is both a security and reliability concern.

LimitRange (Namespace Default Limits)

Enforce default resource limits for all pods in a namespace:

yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:                      # Default limits
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:                # Default requests
      cpu: "100m"
      memory: "64Mi"
    max:                          # Maximum allowed
      cpu: "2"
      memory: "1Gi"
    min:                          # Minimum required
      cpu: "50m"
      memory: "32Mi"
    type: Container

Complete Hardened Container Example

This combines all hardening techniques into a single, production-ready configuration:

Dockerfile

dockerfile
# Stage 1: Build
FROM golang:1.22-alpine AS builder
RUN apk add --no-cache git ca-certificates
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-w -s" -o /app/server .

# Stage 2: Minimal runtime
FROM gcr.io/distroless/static:nonroot

# Copy only the binary and CA certs
COPY --from=builder /app/server /server
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Non-root user (distroless:nonroot uses 65532)
USER 65532:65532

EXPOSE 8080
ENTRYPOINT ["/server"]

Pod Specification

yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
  namespace: production
  labels:
    app: hardened-app
spec:
  automountServiceAccountToken: false
  securityContext:
    runAsUser: 65532
    runAsGroup: 65532
    runAsNonRoot: true
    fsGroup: 65532
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: gcr.io/my-company/hardened-app:v1.0@sha256:abc123...
    ports:
    - containerPort: 8080
      protocol: TCP
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
    resources:
      requests:
        memory: "64Mi"
        cpu: "100m"
      limits:
        memory: "128Mi"
        cpu: "250m"
        ephemeral-storage: "64Mi"
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 5
  volumes:
  - name: tmp
    emptyDir:
      medium: Memory
      sizeLimit: 32Mi

Hardening Checklist Applied

Technique                 | Applied? | Detail
Minimal base image        | Yes      | distroless/static:nonroot
Multi-stage build         | Yes      | Builder stage discarded
Non-root user             | Yes      | UID 65532, runAsNonRoot: true
Read-only filesystem      | Yes      | readOnlyRootFilesystem: true
Writable temp via emptyDir| Yes      | Memory-backed, 32Mi limit
Resource limits           | Yes      | CPU, memory, ephemeral storage
Drop all capabilities     | Yes      | drop: ["ALL"]
No privilege escalation   | Yes      | allowPrivilegeEscalation: false
No SA token mount         | Yes      | automountServiceAccountToken: false
Seccomp profile           | Yes      | RuntimeDefault
Image digest              | Yes      | @sha256:abc123...
Health probes             | Yes      | Liveness + readiness
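
Most of the pod-level rows in this checklist can be checked mechanically. The following Python sketch audits a pod manifest held as a dictionary and returns a list of findings. The function name and the wording of the findings are assumptions, it only sees what is in the pod spec (a token disabled on the ServiceAccount, for instance, is invisible to it), and it ignores initContainers.

```python
def audit_pod(pod: dict) -> list:
    """Return human-readable findings for hardening settings missing from a pod spec."""
    findings = []
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext") or {}

    if spec.get("automountServiceAccountToken") is not False:
        findings.append("pod: service account token may be auto-mounted")

    for c in spec.get("containers", []):
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext") or {}

        # These two fields may be set at either level; the container level wins.
        non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}

        drops = (sc.get("capabilities") or {}).get("drop") or []
        limits = (c.get("resources") or {}).get("limits") or {}

        checks = [
            (non_root is True, "runAsNonRoot is not true"),
            (sc.get("readOnlyRootFilesystem") is True, "root filesystem is writable"),
            (sc.get("allowPrivilegeEscalation") is False, "privilege escalation is allowed"),
            ("ALL" in drops, "capabilities are not dropped with ALL"),
            (seccomp.get("type") in ("RuntimeDefault", "Localhost"), "no seccomp profile"),
            ("cpu" in limits, "no CPU limit"),
            ("memory" in limits, "no memory limit"),
            ("@sha256:" in c.get("image", ""), "image is not referenced by digest"),
        ]
        findings.extend(f"{name}: {msg}" for ok, msg in checks if not ok)
    return findings


bare = {"spec": {"containers": [{"name": "web", "image": "nginx:1.25"}]}}
for finding in audit_pod(bare):
    print(finding)
```

An empty result only means these particular fields are set. It says nothing about the image contents, which is why the base-image and multi-stage rows of the checklist still need a look at the Dockerfile.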

Additional Hardening Techniques

Use Image Digests Instead of Tags

yaml
# BAD - Tag can be overwritten with different content
image: nginx:1.25

# GOOD - Digest is immutable
image: nginx@sha256:6926dd802f40e5e7257fded83e0d8030039642e4e10c4a98a6478e9c6be0f536
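
To check a batch of manifests for tag-only references, a small helper can test whether an image string ends in a full SHA-256 digest. This is a minimal sketch: the function name is hypothetical, and the example digest is a placeholder of 64 repeated characters, not a real image.

```python
import re

# A full digest is "sha256:" followed by exactly 64 lowercase hex characters.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """Return True if the image reference ends with a complete sha256 digest."""
    return bool(_DIGEST.search(image))


placeholder_digest = "a" * 64  # placeholder only, not a real image digest
print(is_digest_pinned("nginx:1.25"))                             # tag only
print(is_digest_pinned("nginx@sha256:" + placeholder_digest))     # digest only
print(is_digest_pinned("nginx:1.25@sha256:" + placeholder_digest))  # tag plus digest
```

A reference that carries both a tag and a digest is still resolved by the digest; the tag is kept only as a human-readable hint.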

Disable Service Account Token Auto-Mount

yaml
# Pod level
spec:
  automountServiceAccountToken: false

---
# Or ServiceAccount level
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-sa
automountServiceAccountToken: false

Use Seccomp Profiles

yaml
securityContext:
  seccompProfile:
    type: RuntimeDefault           # Container runtime's default profile
    # type: Localhost              # Custom profile
    # localhostProfile: profiles/my-profile.json

Network-Level Isolation

yaml
# Deny all traffic, then allow only what's needed
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Scanning Images for Vulnerabilities

While not always a hands-on CKS task, understanding image scanning is important:

bash
# Scan with Trivy
trivy image nginx:1.25

# Scan with specific severity
trivy image --severity HIGH,CRITICAL nginx:1.25

# Exit non-zero if CRITICAL vulnerabilities are found (CI/CD gate)
trivy image --exit-code 1 --severity CRITICAL nginx:1.25
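
When a pipeline needs more than an exit code, the scan can be written as JSON (for example with trivy image --format json) and summarized afterwards. The Python sketch below assumes the report layout of a top-level "Results" list whose entries carry an optional "Vulnerabilities" list with a "Severity" field; check this against the output of your Trivy version. The function names are hypothetical and the sample report is made up for illustration.

```python
def count_by_severity(report: dict) -> dict:
    """Count vulnerabilities per severity across all results in a report."""
    counts = {}
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            counts[severity] = counts.get(severity, 0) + 1
    return counts


def should_fail(report: dict, blocking=("CRITICAL",)) -> bool:
    """Return True if the report contains any vulnerability of a blocking severity."""
    counts = count_by_severity(report)
    return any(counts.get(severity, 0) > 0 for severity in blocking)


# Made-up report in the assumed layout; the IDs are placeholders, not real CVEs.
sample_report = {
    "Results": [
        {
            "Target": "example-image (debian)",
            "Vulnerabilities": [
                {"VulnerabilityID": "EXAMPLE-0001", "Severity": "CRITICAL"},
                {"VulnerabilityID": "EXAMPLE-0002", "Severity": "LOW"},
            ],
        },
        {"Target": "app/binary"},  # clean target with no Vulnerabilities key
    ]
}
print(count_by_severity(sample_report), should_fail(sample_report))
```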

Key Principle

Container hardening follows the principle of least privilege: give the container the absolute minimum it needs to function, and nothing more. Every additional capability, package, or permission is a potential attack vector.


Quick Reference

bash
# Check if a container runs as root
kubectl exec <pod> -- id
kubectl exec <pod> -- whoami

# Check if root filesystem is writable
kubectl exec <pod> -- touch /test-write 2>&1

# Check resource limits
kubectl describe pod <pod> | grep -A5 Limits

# Check capabilities
kubectl exec <pod> -- cat /proc/1/status | grep Cap

# Check if SA token is mounted
kubectl exec <pod> -- ls /var/run/secrets/kubernetes.io/serviceaccount/ 2>&1

# List installed packages (only works if the image ships a package manager)
kubectl exec <pod> -- apk list --installed 2>/dev/null  # Alpine
kubectl exec <pod> -- dpkg -l 2>/dev/null               # Debian

# Check image details
kubectl get pod <pod> -o jsonpath='{.spec.containers[0].image}'
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].imageID}'
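
The Cap lines printed by the capabilities check above (CapInh, CapPrm, CapEff, CapBnd, CapAmb) are hexadecimal bitmasks, which are hard to read by eye. The following Python sketch decodes a mask into capability names using the bit numbers from the Linux capability list. The function name is hypothetical, and bits above the listed range are ignored.

```python
# Capability names indexed by bit number, as defined by the Linux kernel.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list:
    """Return the capability names whose bits are set in a /proc status mask."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


# A container with "drop: ALL" and no added capabilities should show an empty set.
print(decode_caps("0000000000000000"))
# A mask commonly seen for a root container using the runtime's default set.
print(decode_caps("00000000a80425fb"))
```

For a hardened pod, CapEff and CapPrm are the lines to look at first: both should decode to an empty list when all capabilities are dropped and the process runs as non-root.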

Released under the MIT License.