Container Hardening Best Practices
Overview
Container hardening reduces the attack surface of your containerized applications. A hardened container limits what an attacker can do even if they compromise the application. This is a defense-in-depth strategy that complements SecurityContexts and admission controllers.
CKS Exam Relevance
Container hardening questions test your ability to:
- Choose appropriate base images
- Understand multi-stage Dockerfile builds
- Configure non-root containers
- Set up read-only filesystems with writable temp directories
- Apply resource limits to prevent DoS attacks
- Combine multiple hardening techniques in a single pod spec
Container Hardening Layers
Minimal Base Images
Why Base Image Choice Matters
| Base Image | Size | Packages | Shell | Attack Surface |
|---|---|---|---|---|
| ubuntu:22.04 | ~77 MB | Many | Yes | Large |
| debian:bookworm-slim | ~74 MB | Some | Yes | Medium-Large |
| alpine:3.19 | ~7 MB | Minimal | Yes (ash) | Small |
| gcr.io/distroless/static | ~2 MB | None | No | Minimal |
| scratch | 0 MB | None | No | Zero |
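To compare these locally, you can pull a couple of the base images and list their sizes; the exact numbers vary by tag and platform, so treat the figures in the table as approximate.
# Pull two of the base images and compare their on-disk sizes
docker pull alpine:3.19
docker pull gcr.io/distroless/static:nonroot
docker image ls --format '{{.Repository}}:{{.Tag}} {{.Size}}' | grep -E 'alpine|distroless'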
Distroless Images
Google's distroless images contain only the application and its runtime dependencies. No package manager, no shell, no utilities.
# Distroless for a Go application
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER 65532:65532
ENTRYPOINT ["/myapp"]# Distroless for a Java application
FROM gcr.io/distroless/java17-debian11:nonroot
COPY --from=builder /app/target/app.jar /app.jar
USER 65532:65532
ENTRYPOINT ["java", "-jar", "/app.jar"]Security Advantage
With no shell in distroless images, an attacker who gains code execution cannot open an interactive shell, install tools, or explore the filesystem. This dramatically limits post-exploitation capabilities.
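A quick way to confirm this is to attempt to exec a shell into a running distroless container (the pod name distroless-app is a placeholder for illustration):
# Fails because the image contains no shell binary at all
kubectl exec -it distroless-app -- sh
# error: ... exec: "sh": executable file not found in $PATH (exact wording varies by runtime)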
Alpine Images
Alpine Linux uses musl libc and BusyBox, resulting in a very small footprint:
FROM alpine:3.19
RUN apk add --no-cache ca-certificates && \
adduser -D -u 10001 appuser
COPY --from=builder /app/myapp /myapp
USER 10001
ENTRYPOINT ["/myapp"]Scratch Images
The scratch base is completely empty -- suitable for statically linked binaries:
FROM scratch
COPY --from=builder /app/myapp /myapp
USER 10001:10001
ENTRYPOINT ["/myapp"]Scratch Limitations
scratch has no CA certificates, no timezone data, no /etc/passwd, and no shell. Your binary must be statically compiled and include all needed data.
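Before building FROM scratch, it is worth confirming that the binary really is statically linked. A minimal check on the build host, assuming the binary is named myapp, looks like this:
# A static binary reports "statically linked" / "not a dynamic executable"
file myapp
ldd myapp
# For Go, building with CGO_ENABLED=0 (as in the multi-stage examples below) produces a static binary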
Multi-Stage Builds
Multi-stage builds ensure that build tools, source code, and intermediate artifacts never appear in the final image.
Standard Multi-Stage Build
# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o myapp .
# Stage 2: Runtime (minimal image)
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/myapp /myapp
USER 65532:65532
EXPOSE 8080
ENTRYPOINT ["/myapp"]Why This Matters for Security
| Component | Build Stage | Final Image |
|---|---|---|
| Go compiler | Present | Absent |
| Source code | Present | Absent |
| Build dependencies | Present | Absent |
| Git history | Present | Absent |
| Test files | Present | Absent |
| Final binary | Present | Present |
Attack surface reduction: The final image contains only the compiled binary. No compilers, no package managers, no debug tools that an attacker could leverage.
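One way to verify this is to build the builder stage and the final image separately and compare them (the image names here are illustrative):
# Build the builder stage alone, then the final image
docker build --target builder -t myapp:builder .
docker build -t myapp:final .
docker image ls | grep myapp                  # the final image is a fraction of the builder's size
docker run --rm --entrypoint sh myapp:final   # fails: the distroless final image has no shell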
Python Multi-Stage Example
# Stage 1: Build dependencies
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/app/deps -r requirements.txt
# Stage 2: Runtime
FROM python:3.12-slim
RUN groupadd -r appgroup && useradd -r -g appgroup -u 10001 appuser
WORKDIR /app
COPY --from=builder /app/deps /usr/local/lib/python3.12/site-packages/
COPY --chown=appuser:appgroup . .
USER 10001
EXPOSE 8000
CMD ["python", "app.py"]Non-Root Containers
Running containers as non-root is one of the most effective hardening techniques.
In the Dockerfile
FROM alpine:3.19
# Create a non-root user
RUN addgroup -g 10001 -S appgroup && \
adduser -u 10001 -S appuser -G appgroup
# Set ownership
COPY --chown=appuser:appgroup . /app
WORKDIR /app
# Switch to non-root user
USER 10001
CMD ["/app/myapp"]In the Pod Spec (Defense in Depth)
Even if the Dockerfile sets USER, always enforce it in the pod spec:
apiVersion: v1
kind: Pod
metadata:
name: non-root-app
spec:
securityContext:
runAsUser: 10001
runAsGroup: 10001
runAsNonRoot: true # Reject if image tries to run as root
fsGroup: 10001
containers:
- name: app
image: myregistry/myapp:v1.0
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
        - ALL
Double Protection
Setting the user in both the Dockerfile and the pod spec provides defense in depth:
- Dockerfile USER: Default user when no securityContext is set
- Pod runAsUser: Overrides the image USER, ensures enforcement
- Pod runAsNonRoot: Kubernetes-level rejection of root containers
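To see the Kubernetes-level rejection in action, force runAsNonRoot: true onto an image that defaults to root (nginx does). The following is a sketch using an inline override:
# nginx:1.25 runs as root by default, so the kubelet refuses to start the container
kubectl run root-test --image=nginx:1.25 \
  --overrides='{"apiVersion":"v1","spec":{"securityContext":{"runAsNonRoot":true}}}'
kubectl get pod root-test        # STATUS: CreateContainerConfigError
kubectl describe pod root-test | grep -i runasnonroot
# Error: container has runAsNonRoot and image will run as root
kubectl delete pod root-test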
Read-Only Filesystems
A read-only root filesystem prevents attackers from writing malicious files, scripts, or binaries to the container.
Basic Configuration
apiVersion: v1
kind: Pod
metadata:
name: readonly-app
spec:
containers:
- name: app
image: nginx:1.25
securityContext:
readOnlyRootFilesystem: true
volumeMounts:
- name: tmp
mountPath: /tmp
- name: var-cache-nginx
mountPath: /var/cache/nginx
- name: var-run
mountPath: /var/run
volumes:
- name: tmp
emptyDir:
sizeLimit: 64Mi # Limit temp storage
- name: var-cache-nginx
emptyDir:
sizeLimit: 128Mi
- name: var-run
emptyDir:
      sizeLimit: 1Mi
Finding Writable Paths
When making a container read-only, you need to identify which paths the application writes to:
# Run the container without read-only first, so the application starts and writes to its usual paths
kubectl run test-app --image=nginx:1.25
# Find writable directories and recent file modifications
kubectl exec test-app -- find / -writable -type d 2>/dev/null
kubectl exec test-app -- find / -newer /etc/hostname -type f 2>/dev/null
# Common writable paths by application:
# nginx: /var/cache/nginx, /var/run, /tmp
# node: /tmp, /home/node/.npm
# python: /tmp, /app/__pycache__
# java: /tmp, /app/logs
Exam Pattern
A common CKS exam question provides a pod that needs readOnlyRootFilesystem: true. You must:
- Add readOnlyRootFilesystem: true to the security context
- Identify the writable paths needed by the application
- Add emptyDir volumes mounted at those paths
- Optionally set sizeLimit on the emptyDir volumes
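After making those changes, a quick check against the readonly-app pod above confirms the root filesystem is locked down while the emptyDir mounts remain writable:
# Writing to the root filesystem fails; writing to the emptyDir-backed /tmp succeeds
kubectl exec readonly-app -- touch /test-write       # touch: cannot touch '/test-write': Read-only file system
kubectl exec readonly-app -- touch /tmp/test-write   # no error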
Resource Limits
Resource limits prevent containers from consuming excessive CPU and memory, which protects against DoS attacks, fork bombs, and resource starvation.
Setting Limits
apiVersion: v1
kind: Pod
metadata:
name: limited-app
spec:
containers:
- name: app
image: myapp:v1.0
resources:
requests:
memory: "64Mi"
cpu: "125m"
limits:
memory: "128Mi" # Hard cap -- OOMKilled if exceeded
cpu: "500m" # Throttled if exceeded
        ephemeral-storage: "256Mi" # Evicted if exceeded
What Happens When Limits Are Exceeded
| Resource | Behavior When Exceeded |
|---|---|
| Memory limit | Container is OOMKilled (restarted) |
| CPU limit | Container is throttled (not killed) |
| Ephemeral storage | Pod is evicted |
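To confirm these behaviors on the limited-app pod above: the last termination reason records an OOM kill, while CPU throttling only shows up in usage metrics (kubectl top requires metrics-server).
# Reason for the last container termination -- OOMKilled if the memory limit was hit
kubectl get pod limited-app -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# CPU throttling never kills the container; check usage instead
kubectl top pod limited-app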
Why Limits Matter for Security
| Threat | Without Limits | With Limits |
|---|---|---|
| Fork bomb | Consumes all node CPU/memory | Contained to limit |
| Memory leak | Crashes other pods | Only this pod OOMKilled |
| Crypto mining | Uses all available CPU | Throttled to limit |
| Log flooding | Fills node disk | Ephemeral storage limit |
| Resource starvation | Starves other workloads | Isolated |
Always Set Resource Limits
A container without resource limits can consume all available resources on a node, affecting every other workload. This is both a security and reliability concern.
LimitRange (Namespace Default Limits)
Enforce default resource limits for all pods in a namespace:
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: production
spec:
limits:
- default: # Default limits
cpu: "500m"
memory: "256Mi"
defaultRequest: # Default requests
cpu: "100m"
memory: "64Mi"
max: # Maximum allowed
cpu: "2"
memory: "1Gi"
min: # Minimum required
cpu: "50m"
memory: "32Mi"
    type: Container
Complete Hardened Container Example
This combines all hardening techniques into a single, production-ready configuration:
Dockerfile
# Stage 1: Build
FROM golang:1.22-alpine AS builder
RUN apk add --no-cache git ca-certificates
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -ldflags="-w -s" -o /app/server .
# Stage 2: Minimal runtime
FROM gcr.io/distroless/static:nonroot
# Copy only the binary and CA certs
COPY --from=builder /app/server /server
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Non-root user (distroless:nonroot uses 65532)
USER 65532:65532
EXPOSE 8080
ENTRYPOINT ["/server"]Pod Specification
apiVersion: v1
kind: Pod
metadata:
name: hardened-app
namespace: production
labels:
app: hardened-app
spec:
automountServiceAccountToken: false
securityContext:
runAsUser: 65532
runAsGroup: 65532
runAsNonRoot: true
fsGroup: 65532
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: gcr.io/my-company/hardened-app:v1.0@sha256:abc123...
ports:
- containerPort: 8080
protocol: TCP
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "250m"
ephemeral-storage: "64Mi"
volumeMounts:
- name: tmp
mountPath: /tmp
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 3
periodSeconds: 5
volumes:
- name: tmp
emptyDir:
medium: Memory
      sizeLimit: 32Mi
Hardening Checklist Applied
| Technique | Applied? | Detail |
|---|---|---|
| Minimal base image | Yes | distroless/static:nonroot |
| Multi-stage build | Yes | Builder stage discarded |
| Non-root user | Yes | UID 65532, runAsNonRoot: true |
| Read-only filesystem | Yes | readOnlyRootFilesystem: true |
| Writable temp via emptyDir | Yes | Memory-backed, 32Mi limit |
| Resource limits | Yes | CPU, memory, ephemeral storage |
| Drop all capabilities | Yes | drop: ["ALL"] |
| No privilege escalation | Yes | allowPrivilegeEscalation: false |
| No SA token mount | Yes | automountServiceAccountToken: false |
| Seccomp profile | Yes | RuntimeDefault |
| Image digest | Yes | @sha256:abc123... |
| Health probes | Yes | Liveness + readiness |
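A few spot checks tie the checklist back to observable behavior on the running hardened-app pod (assuming it was deployed as shown above):
# No shell: exec fails because the distroless image ships nothing besides /server
kubectl exec -n production hardened-app -- sh
# Pod-level settings: runAsNonRoot and the disabled SA token automount
kubectl get pod hardened-app -n production -o jsonpath='{.spec.securityContext.runAsNonRoot}{" "}{.spec.automountServiceAccountToken}{"\n"}'
# Container-level settings: read-only root filesystem, dropped capabilities
kubectl get pod hardened-app -n production -o jsonpath='{.spec.containers[0].securityContext}{"\n"}'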
Additional Hardening Techniques
Use Image Digests Instead of Tags
# BAD - Tag can be overwritten with different content
image: nginx:1.25
# GOOD - Digest is immutable
image: nginx@sha256:6926dd802f40e5e7257fded83e0d8030039642e4e10c4a98a6478e9c6be0f536
Disable Service Account Token Auto-Mount
# Pod level
spec:
automountServiceAccountToken: false
# Or ServiceAccount level
apiVersion: v1
kind: ServiceAccount
metadata:
name: my-sa
automountServiceAccountToken: false
Use Seccomp Profiles
securityContext:
seccompProfile:
type: RuntimeDefault # Default Docker/containerd profile
# type: Localhost # Custom profile
    # localhostProfile: profiles/my-profile.json
Network-Level Isolation
# Deny all traffic, then allow only what's needed
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
  - Egress
Scanning Images for Vulnerabilities
While not always a hands-on CKS task, understanding image scanning is important:
# Scan with Trivy
trivy image nginx:1.25
# Scan with specific severity
trivy image --severity HIGH,CRITICAL nginx:1.25
# Scan and fail if vulnerabilities found (CI/CD)
trivy image --exit-code 1 --severity CRITICAL nginx:1.25
Key Principle
Container hardening follows the principle of least privilege: give the container the absolute minimum it needs to function, and nothing more. Every additional capability, package, or permission is a potential attack vector.
Quick Reference
# Check if a container runs as root
kubectl exec <pod> -- id
kubectl exec <pod> -- whoami
# Check if root filesystem is writable
kubectl exec <pod> -- touch /test-write 2>&1
# Check resource limits
kubectl describe pod <pod> | grep -A5 Limits
# Check capabilities
kubectl exec <pod> -- cat /proc/1/status | grep Cap
# Check if SA token is mounted
kubectl exec <pod> -- ls /var/run/secrets/kubernetes.io/serviceaccount/ 2>&1
# List installed packages (if shell is available)
kubectl exec <pod> -- apk list --installed 2>/dev/null # Alpine
kubectl exec <pod> -- dpkg -l 2>/dev/null # Debian
# Check image details
kubectl get pod <pod> -o jsonpath='{.spec.containers[0].image}'
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].imageID}'