
Host Namespaces and Privileges

Overview

Linux namespaces are one of the core isolation mechanisms that make containers possible, working alongside cgroups, capabilities, and seccomp and LSM profiles. Each namespace type gives a process its own view of one system resource, such as network interfaces, process IDs, or IPC objects. Kubernetes allows pods to opt out of this isolation and share namespaces with the host -- a powerful but extremely dangerous configuration.

This section covers the risks of sharing host namespaces, running privileged containers, and the security controls you must enforce to prevent these attack vectors.
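
Each namespace a process belongs to appears as a symbolic link under /proc/<pid>/ns/. The link target (for example net:[4026531840]) identifies the namespace instance, so two processes share a namespace exactly when those targets match. The Python sketch below reads and compares them. The function names and the proc_root parameter are illustrative and are not part of any Kubernetes tool.

```python
import os


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target such as "net:[4026531840]" into (type, inode)."""
    ns_type, _, rest = target.partition(":")
    return ns_type, int(rest.strip("[]"))


def read_namespaces(pid: str = "self", proc_root: str = "/proc") -> dict[str, str]:
    """Read /proc/<pid>/ns/* and return {namespace type: link target}."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    links = {}
    for entry in sorted(os.listdir(ns_dir)):
        try:
            links[entry] = os.readlink(os.path.join(ns_dir, entry))
        except OSError:
            continue  # links of other processes may need extra privileges
    return links


def shared_namespaces(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Namespace types whose identifiers match, i.e. the ones a and b share."""
    return sorted(t for t in a if t in b and parse_ns_link(a[t]) == parse_ns_link(b[t]))
```

If you run this on the node as root, `shared_namespaces(read_namespaces(container_pid), read_namespaces(1))` would include `net` for a hostNetwork pod. Here `container_pid` is a hypothetical placeholder for the host PID of a process inside the container.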

Linux Namespaces in Containers

hostNetwork

When hostNetwork: true is set, the pod uses the host's network namespace instead of getting its own.

What It Enables

  • Pod sees all network interfaces on the host (eth0, docker0, cni0, etc.)
  • Pod can bind to any host port directly
  • Pod can see all network traffic on the host
  • Pod has the same IP address as the host node
  • Pod can access node-level services listening on localhost
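
You can see which localhost-only services a hostNetwork pod can reach by reading /proc/net/tcp. Inside such a pod, that file reflects the host's network namespace. The Python sketch below parses the file's text and returns the loopback listeners. It assumes the IPv4 table format on a little-endian node (x86-64 or arm64), where address bytes are printed in host order. It ignores /proc/net/tcp6.

```python
import ipaddress

TCP_LISTEN = "0A"  # state code for LISTEN in the "st" column of /proc/net/tcp


def loopback_listeners(proc_net_tcp: str) -> list[tuple[str, int]]:
    """Return (ip, port) for LISTEN sockets bound to 127.0.0.0/8.

    local_address is "HEXIP:HEXPORT"; on a little-endian host the IP bytes
    are reversed, so "0100007F" is 127.0.0.1.
    """
    found = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        ip_hex, port_hex = fields[1].split(":")
        ip = ipaddress.IPv4Address(int.from_bytes(bytes.fromhex(ip_hex), "little"))
        if ip.is_loopback:
            found.append((str(ip), int(port_hex, 16)))
    return found
```

Call it with the contents of /proc/net/tcp. In a hostNetwork pod, the result would typically include the kubelet's health endpoint (127.0.0.1:10248 by default).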

The Risk

yaml
# DANGEROUS: Pod with host network access
apiVersion: v1
kind: Pod
metadata:
  name: host-network-pod
spec:
  hostNetwork: true   # Shares the host's network namespace
  containers:
  - name: attacker
    image: nicolaka/netshoot
    command: ["sleep", "infinity"]

Why hostNetwork Is Dangerous

A container with hostNetwork: true can:

  • Sniff all network traffic on the node, including other pods' traffic (packet capture needs CAP_NET_RAW, which common runtimes grant by default)
  • Access the kubelet API on localhost:10250
  • Access the metadata service (cloud provider instance metadata)
  • Bind to any port on the host, potentially impersonating services
  • Bypass NetworkPolicies (which operate on pod IPs, not host IPs)
  • Access etcd if running on a control plane node (localhost:2379)

Legitimate Use Cases

Only a few workloads genuinely need hostNetwork:

  • CNI plugins (Calico, Cilium, Flannel) -- they configure the host network
  • kube-proxy -- manages iptables rules on the host
  • Ingress controllers -- sometimes need direct host port access
  • Monitoring agents -- node-level network metrics

hostPID

When hostPID: true is set, the pod shares the host's PID namespace.

What It Enables

  • Pod can see all processes running on the host
  • Pod can see processes in other containers
  • Pod can send signals to host processes (with appropriate capabilities)
  • Pod can read /proc/<pid>/ of host processes

The Risk

yaml
# DANGEROUS: Pod with host PID namespace
apiVersion: v1
kind: Pod
metadata:
  name: host-pid-pod
spec:
  hostPID: true   # Shares the host's PID namespace
  containers:
  - name: attacker
    image: busybox
    command: ["sleep", "infinity"]

Why hostPID Is Dangerous

A container with hostPID: true can:

  • List all host processes: ps aux shows everything on the node
  • Read environment variables of other processes: cat /proc/<pid>/environ (may contain secrets)
  • Read process memory (with SYS_PTRACE capability)
  • Send signals to host processes: kill -9 <host-pid>
  • Access /proc filesystem entries for the kubelet, the container runtime (containerd, CRI-O), and etcd

Demonstrating the Risk

bash
# Inside a pod with hostPID: true
# See ALL host processes
ps aux

# Read environment variables of process 1 (systemd/init)
cat /proc/1/environ | tr '\0' '\n'

# See kubelet's command line arguments (may expose tokens)
cat /proc/$(pgrep kubelet)/cmdline | tr '\0' ' '
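
The same inspection can be scripted. This Python sketch does what `tr '\0' '\n'` does above and flags variable names that look credential-related. The SECRET_HINT pattern is an illustrative heuristic, not a complete secret detector.

```python
import re

# Illustrative heuristic: variable names that often hold credentials.
SECRET_HINT = re.compile(r"(TOKEN|SECRET|PASSWORD|PASSWD|KEY|CREDENTIAL)", re.IGNORECASE)


def parse_environ(raw: bytes) -> dict[str, str]:
    """Split /proc/<pid>/environ (NUL-separated KEY=VALUE pairs) into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # also skips the empty string after the trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def parse_cmdline(raw: bytes) -> list[str]:
    """Split /proc/<pid>/cmdline (NUL-separated argv); drops empty arguments."""
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def secret_like_vars(env: dict[str, str]) -> list[str]:
    """Names of variables whose names look credential-related."""
    return sorted(name for name in env if SECRET_HINT.search(name))
```

Pass in the bytes of /proc/<pid>/environ or /proc/<pid>/cmdline, read in binary mode. Reading another process's environ generally requires the same UID or CAP_SYS_PTRACE.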

hostIPC

When hostIPC: true is set, the pod shares the host's IPC namespace.

What It Enables

  • Pod can access host shared memory segments
  • Pod can access host semaphores and message queues
  • Pod can communicate with host processes via System V IPC
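
With hostIPC: true, /proc/sysvipc/shm inside the pod lists the host's System V shared memory segments, because that file follows the caller's IPC namespace. The Python sketch below parses it. In this file, perms is printed in octal and the other columns in decimal. The size threshold in large_segments is an arbitrary, illustrative choice.

```python
def parse_sysvipc_shm(text: str) -> list[dict[str, int]]:
    """Parse /proc/sysvipc/shm into one dict per segment, keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    segments = []
    for line in lines[1:]:
        row = {}
        for name, value in zip(header, line.split()):
            # "perms" is printed in octal; every other column is decimal.
            row[name] = int(value, 8) if name == "perms" else int(value)
        segments.append(row)
    return segments


def large_segments(segments: list[dict[str, int]], min_bytes: int) -> list[dict[str, int]]:
    """Segments of at least min_bytes, e.g. database buffer pools."""
    return [s for s in segments if s["size"] >= min_bytes]
```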

The Risk

yaml
# DANGEROUS: Pod with host IPC namespace
apiVersion: v1
kind: Pod
metadata:
  name: host-ipc-pod
spec:
  hostIPC: true   # Shares the host's IPC namespace
  containers:
  - name: attacker
    image: busybox
    command: ["sleep", "infinity"]

Why hostIPC Is Dangerous

A container with hostIPC: true can:

  • Read shared memory of host processes (may contain sensitive data)
  • Interfere with host IPC mechanisms
  • Access databases that use shared memory (PostgreSQL, Oracle)
  • Enable side-channel attacks through shared memory inspection

Privileged Containers

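Setting privileged: true is the most dangerous configuration possible. It disables nearly every isolation control. The container still gets its own namespaces (unless host namespaces are also shared), but with all capabilities and full device access it can leave them trivially.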

What Privileged Mode Grants

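| Feature | Normal Container | Privileged Container |
|---|---|---|
| Capabilities | Runtime default set (14 in Docker/containerd) | ALL capabilities |
| Device access | Minimal set (/dev/null, /dev/zero, /dev/urandom, etc.) | ALL host devices |
| AppArmor | Runtime default profile (where AppArmor is enabled) | Disabled |
| Seccomp | Unconfined unless RuntimeDefault/Localhost is set or the kubelet's seccompDefault is enabled | Disabled |
| /proc | Sensitive paths masked | Full access |
| /sys | Read-only | Read-write |
| SELinux | Confined (where SELinux is enabled) | Unconfined |
| cgroup filesystem | Read-only | Read-write |
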
yaml
# EXTREMELY DANGEROUS: Privileged container
apiVersion: v1
kind: Pod
metadata:
  name: privileged-pod
spec:
  containers:
  - name: root-access
    image: ubuntu:22.04
    securityContext:
      privileged: true   # Full host access
    command: ["sleep", "infinity"]
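
You can check the Capabilities row of the table from inside a running container by decoding the CapEff line of /proc/self/status (`capsh --decode=<mask>` does the same in a shell). The Python sketch below counts the bits and names a few high-risk capabilities. Bit numbers come from linux/capability.h. The full count for a privileged container depends on the kernel version: it is 41 on kernels that define CAP_CHECKPOINT_RESTORE.

```python
# Bit numbers from linux/capability.h for capabilities discussed in this section.
NOTABLE_CAPS = {
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    16: "CAP_SYS_MODULE",
    17: "CAP_SYS_RAWIO",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def parse_cap_mask(status_text: str, field: str = "CapEff") -> int:
    """Extract a hex capability mask such as CapEff from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name == field:
            return int(value.strip(), 16)
    raise KeyError(field)


def describe_caps(mask: int) -> tuple[int, list[str]]:
    """Return (number of capability bits set, notable capabilities present)."""
    count = bin(mask).count("1")
    notable = [name for bit, name in sorted(NOTABLE_CAPS.items()) if mask >> bit & 1]
    return count, notable
```

A commonly seen default mask, 00000000a80425fb, decodes to 14 capabilities, of which only CAP_NET_RAW appears in the notable list.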

Why Privileged Containers Are Effectively Root on the Host

A privileged container can:

  • Mount the host filesystem: mount /dev/sda1 /mnt -- read/write everything on the host
  • Load kernel modules: insmod malicious.ko -- run code in the kernel
  • Access all devices: /dev/mem, /dev/sda -- raw disk and memory access
  • Modify iptables: change firewall rules, redirect traffic
  • Escape the container: trivially break out to the host
  • Compromise the entire cluster: use the node's kubelet credentials, and secrets mounted into other pods on the node, to pivot to the API server and other nodes

Outside of node-level system components (CNI agents, CSI node plugins, and similar), there is almost never a legitimate reason to run a privileged container in production.

Container Escape from Privileged Pod

This demonstrates why privileged containers are so dangerous:

bash
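# Inside a privileged container -- escape to host filesystem
# (the root device name varies: /dev/sda1, /dev/nvme0n1p1, /dev/xvda1; check fdisk -l)
mkdir -p /mnt/host
mount /dev/sda1 /mnt/host

# Now you can read/write the entire host filesystem
cat /mnt/host/etc/shadow
cat /mnt/host/etc/kubernetes/admin.conf     # kubeadm control plane nodes only
cat /mnt/host/etc/kubernetes/kubelet.conf   # kubelet credentials on kubeadm nodes

# Or use nsenter to get a host shell. This also requires hostPID: true;
# otherwise PID 1 is the container's own init and no namespace changes.
nsenter --target 1 --mount --uts --ipc --net --pid -- /bin/bash
# You are now running as root on the host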

ProcMount Settings

The /proc filesystem exposes kernel and process information. By default (procMount: Default), the container runtime masks sensitive paths within /proc and makes others read-only.

Default vs Unmasked

yaml
# Default (masked) - safe; procMount is a container-level securityContext field
securityContext:
  procMount: Default
  # Masked paths (exact list is runtime-dependent): /proc/acpi, /proc/kcore,
  # /proc/keys, /proc/latency_stats, /proc/timer_list, /proc/timer_stats,
  # /proc/sched_debug, /proc/scsi
---
# Unmasked - dangerous, exposes all of /proc
securityContext:
  procMount: Unmasked

WARNING

procMount: Unmasked should almost never be used. It exposes sensitive kernel information that can aid in container escapes and privilege escalation.
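
One way to confirm that masking is in effect is to look for extra mounts under /proc in /proc/self/mountinfo. Runtimes typically mask files by bind-mounting /dev/null over them and mask directories with an empty read-only tmpfs, so those paths show up as separate mounts. With Unmasked, they are absent. The Python sketch below lists them. The exact set depends on the runtime.

```python
def proc_overmounts(mountinfo_text: str) -> list[str]:
    """Return mount points below /proc listed in /proc/self/mountinfo text.

    The fifth whitespace-separated field of each mountinfo line is the mount
    point. Masked paths (e.g. /proc/kcore) and read-only paths (e.g. /proc/bus)
    both appear here as separate mounts.
    """
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[4].startswith("/proc/"):
            points.add(fields[4])
    return sorted(points)
```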

Read-Only Root Filesystem

Setting readOnlyRootFilesystem: true prevents the container from writing to its root filesystem.

Why It Matters

  • Prevents attackers from modifying binaries in the container
  • Blocks web shell drops and malware installation on the root filesystem (mounted writable volumes stay writable)
  • Forces applications to use designated writable volumes (emptyDir, etc.)
  • Enforces immutable infrastructure principles

yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-pod
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: cache
      mountPath: /var/cache/nginx
    - name: run
      mountPath: /var/run
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: cache
    emptyDir: {}
  - name: run
    emptyDir: {}
  - name: tmp
    emptyDir: {}

Handling Read-Only Filesystem

When readOnlyRootFilesystem: true is set, applications that need to write temporary files will fail. The solution is to mount emptyDir volumes at the paths where the application needs to write (typically /tmp, /var/run, /var/cache).
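
When you retrofit readOnlyRootFilesystem onto an existing image, it helps to check from inside the running container which of the application's write paths are actually writable. The Python sketch below does that. The candidate paths in the usage line are the nginx ones from the example above; other images need their own list.

```python
import os
import tempfile


def root_is_read_only() -> bool:
    """True if the filesystem mounted at / carries the read-only mount flag."""
    return bool(os.statvfs("/").f_flag & os.ST_RDONLY)


def writable_dirs(candidates: list[str]) -> list[str]:
    """Return the directories in which a temporary file can actually be created."""
    writable = []
    for path in candidates:
        try:
            with tempfile.TemporaryFile(dir=path):
                pass  # the file is removed when closed
        except OSError:
            continue
        writable.append(path)
    return writable
```

For example, in the pod above, `writable_dirs(["/", "/tmp", "/var/run", "/var/cache/nginx"])` should leave out "/" once the emptyDir volumes are mounted.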

Running as Non-Root

Running containers as non-root is one of the most important security practices.

runAsNonRoot

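This tells the kubelet to verify, before starting each container, that it will not run as root (UID 0). If the effective runAsUser is 0, or runAsUser is unset and the image runs as root, the kubelet refuses to start the container and it stays in CreateContainerConfigError. An image whose USER is a name rather than a numeric UID is also rejected, because the kubelet cannot verify it.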

yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-pod
spec:
  securityContext:
    runAsNonRoot: true    # Reject if UID is 0
  containers:
  - name: app
    image: nginx:1.27     # This will FAIL -- nginx runs as root by default
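
The failure above comes from the kubelet comparing the effective user against the image metadata. The Python sketch below models that decision under simplifying assumptions. It takes the image's USER string directly, whereas the real kubelet gets the UID or username from the container runtime. The error strings are illustrative, not the kubelet's exact messages.

```python
from typing import Optional


def effective(field: str, pod_sc: dict, container_sc: dict):
    """Container-level securityContext fields override pod-level ones."""
    if container_sc.get(field) is not None:
        return container_sc[field]
    return pod_sc.get(field)


def run_as_non_root_error(pod_sc: dict, container_sc: dict,
                          image_user: str) -> Optional[str]:
    """Simplified model of the runAsNonRoot check; None means the container may start.

    image_user is the image's USER value ("" if the image sets none).
    """
    if effective("runAsNonRoot", pod_sc, container_sc) is not True:
        return None
    uid = effective("runAsUser", pod_sc, container_sc)
    if uid is not None:
        return "runAsUser is 0" if uid == 0 else None
    user = image_user.split(":")[0]  # USER may be "uid:gid" or "name:group"
    if user == "":
        return "image runs as root (no USER set)"
    if not user.isdigit():
        return f"image has non-numeric user ({user}), cannot verify it is non-root"
    return "image runs as root (UID 0)" if int(user) == 0 else None
```

The nginx:1.27 case corresponds to `run_as_non_root_error({"runAsNonRoot": True}, {}, "")`. Setting a non-zero runAsUser, as in the next example, makes the check pass regardless of the image.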

runAsUser and runAsGroup

Explicitly set the UID and GID:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: specific-user-pod
spec:
  securityContext:
    runAsUser: 1000       # Run as UID 1000
    runAsGroup: 1000      # Run as GID 1000
    fsGroup: 1000         # Supplemental group; supported volumes are owned by this GID
    runAsNonRoot: true    # Additional validation
  containers:
  - name: app
    image: python:3.12-slim
    command: ["python", "-m", "http.server", "8080"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL

allowPrivilegeEscalation

This controls whether a process can gain more privileges than its parent:

yaml
securityContext:
  allowPrivilegeEscalation: false

When set to false:

  • Setuid binaries cannot escalate privileges
  • no_new_privs flag is set on the process
  • The process cannot gain capabilities beyond what it started with
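
Inside the container, allowPrivilegeEscalation: false shows up as NoNewPrivs: 1 in /proc/self/status (kernels 4.10 and later). The same file has a Seccomp field: 0 means disabled, 1 strict, 2 filter. The Python sketch below reads both from the file's text. The summary keys are illustrative names.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def status_fields(status_text: str) -> dict[str, str]:
    """Parse /proc/<pid>/status "Name:<tab>value" lines into a dict."""
    fields = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name] = value.strip()
    return fields


def hardening_summary(status_text: str) -> dict[str, object]:
    """Report the no_new_privs flag and seccomp mode of a process."""
    fields = status_fields(status_text)
    return {
        "no_new_privs": fields.get("NoNewPrivs") == "1",
        "seccomp": SECCOMP_MODES.get(int(fields.get("Seccomp", "0")), "unknown"),
    }
```

Usage: `hardening_summary(open("/proc/self/status").read())`.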

Always Set to false

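Unless your application specifically requires setuid binaries (extremely rare), always set allowPrivilegeEscalation: false. This is required by the Restricted Pod Security Standard. Note that escalation is always allowed for a container that is privileged or has CAP_SYS_ADMIN, so the setting only helps when those are absent.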

Host Namespace Sharing Risks

Pod Security Standards Summary

Pod Security Standards define three levels that restrict these dangerous configurations. This is covered in detail in the Pod Security Standards section.

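| Setting | Privileged | Baseline | Restricted |
|---|---|---|---|
| hostNetwork | Allowed | Denied | Denied |
| hostPID | Allowed | Denied | Denied |
| hostIPC | Allowed | Denied | Denied |
| privileged | Allowed | Denied | Denied |
| hostPorts | Allowed | Denied (or a known list) | Denied (or a known list) |
| runAsNonRoot | Not required | Not required | Required |
| allowPrivilegeEscalation | Allowed | Allowed | Must be false |
| capabilities | Any | Adds limited to the default set | Must drop ALL; may add only NET_BIND_SERVICE |
| readOnlyRootFilesystem | Not required | Not required | Not required* |
| seccompProfile | Any | Any except Unconfined | RuntimeDefault or Localhost |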

Note: *readOnlyRootFilesystem is not checked at any Pod Security Standard level; it is a recommended hardening step on top of Restricted.
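
The Baseline column of this table can be approximated with a small checker over a pod manifest, such as one item from kubectl get pods -o json. The Python sketch below covers only the rows in this table. It skips other Baseline rules (hostPath volumes, AppArmor, SELinux, sysctls, procMount) and is not a replacement for the Pod Security Admission controller. Capability names are compared as written in the manifest.

```python
# Capabilities Baseline lets a container add (the runtime default set, minus NET_RAW).
BASELINE_ADDABLE = {
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "MKNOD",
    "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID", "SYS_CHROOT",
}


def all_containers(spec: dict) -> list[dict]:
    """Regular, init, and ephemeral containers are all subject to the checks."""
    return (spec.get("containers", []) + spec.get("initContainers", [])
            + spec.get("ephemeralContainers", []))


def baseline_violations(pod: dict) -> list[str]:
    """Check a pod manifest (as a dict) against the Baseline rows of the table."""
    spec = pod.get("spec", {})
    problems = [f"{ns} is true" for ns in ("hostNetwork", "hostPID", "hostIPC")
                if spec.get(ns)]
    pod_seccomp = (spec.get("securityContext") or {}).get("seccompProfile") or {}
    if pod_seccomp.get("type") == "Unconfined":
        problems.append("pod seccompProfile is Unconfined")
    for c in all_containers(spec):
        name, sc = c.get("name", "?"), c.get("securityContext") or {}
        if sc.get("privileged"):
            problems.append(f"{name}: privileged")
        extra = set((sc.get("capabilities") or {}).get("add") or []) - BASELINE_ADDABLE
        if extra:
            problems.append(f"{name}: adds {sorted(extra)}")
        if any(p.get("hostPort") for p in c.get("ports") or []):  # 0 counts as unset
            problems.append(f"{name}: uses hostPort")
        if (sc.get("seccompProfile") or {}).get("type") == "Unconfined":
            problems.append(f"{name}: seccompProfile is Unconfined")
    return problems
```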

Complete Hardened Pod Example

This example combines all the host-level restrictions:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: fully-hardened-pod
  labels:
    app: secure-app
spec:
  # Pod-level security
  securityContext:
    runAsUser: 10001
    runAsGroup: 10001
    runAsNonRoot: true
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  
  # Explicitly deny host namespace sharing
  hostNetwork: false
  hostPID: false
  hostIPC: false
  
  # Prevent service account token automounting
  automountServiceAccountToken: false
  
  containers:
  - name: app
    image: python:3.12-slim
    command: ["python", "-m", "http.server", "8080"]
    ports:
    - containerPort: 8080
    
    # Container-level security
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
    
    # Resource limits (prevent DoS)
    resources:
      limits:
        memory: "128Mi"
        cpu: "250m"
      requests:
        memory: "64Mi"
        cpu: "100m"
    
    # Writable volumes for app needs
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  
  volumes:
  - name: tmp
    emptyDir:
      sizeLimit: "50Mi"
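
A quick way to confirm that a manifest like this one meets the Restricted-only rows of the Pod Security Standards table is a checker that walks each container and gives container-level fields precedence over pod-level ones. The Python sketch below covers only those rows. For an authoritative check, apply the manifest in a namespace labeled for the restricted level.

```python
def _effective(field: str, pod_sc: dict, sc: dict):
    """Container-level securityContext fields override pod-level ones."""
    return sc[field] if sc.get(field) is not None else pod_sc.get(field)


def restricted_violations(pod: dict) -> list[str]:
    """Check a pod manifest (as a dict) against the Restricted-only table rows."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext") or {}
    problems = []
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        name, sc = c.get("name", "?"), c.get("securityContext") or {}
        if _effective("runAsNonRoot", pod_sc, sc) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if _effective("runAsUser", pod_sc, sc) == 0:
            problems.append(f"{name}: runAsUser must not be 0")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        caps = sc.get("capabilities") or {}
        if "ALL" not in (caps.get("drop") or []):
            problems.append(f"{name}: must drop ALL capabilities")
        if set(caps.get("add") or []) - {"NET_BIND_SERVICE"}:
            problems.append(f"{name}: may only add NET_BIND_SERVICE")
        seccomp = (_effective("seccompProfile", pod_sc, sc) or {}).get("type")
        if seccomp not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems
```

Run against the hardened pod above, converted to a dict, this returns an empty list. A bare container with no securityContext produces four findings.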

Identifying Risky Pods

Finding Pods with Dangerous Settings

bash
# Find pods with hostNetwork
kubectl get pods -A -o json | \
  jq -r '.items[] | select(.spec.hostNetwork==true) | 
  "\(.metadata.namespace)/\(.metadata.name)"'

# Find pods with hostPID
kubectl get pods -A -o json | \
  jq -r '.items[] | select(.spec.hostPID==true) | 
  "\(.metadata.namespace)/\(.metadata.name)"'

# Find privileged containers (any container or init container; each pod listed once)
kubectl get pods -A -o json | \
  jq -r '.items[] | select(any((.spec.containers[], .spec.initContainers[]?); .securityContext.privileged == true)) |
  "\(.metadata.namespace)/\(.metadata.name)"'

# Find pods that do not enforce runAsNonRoot (at pod level or on every container);
# these may run as root
kubectl get pods -A -o json | \
  jq -r '.items[] | select(
    (.spec.securityContext.runAsNonRoot == true or
     all(.spec.containers[]; .securityContext.runAsNonRoot == true)) | not) |
  "\(.metadata.namespace)/\(.metadata.name)"'

# Combined: find all pods with any dangerous setting (each pod listed once)
kubectl get pods -A -o json | jq -r '
  .items[] |
  select(
    .spec.hostNetwork == true or
    .spec.hostPID == true or
    .spec.hostIPC == true or
    any((.spec.containers[], .spec.initContainers[]?); .securityContext.privileged == true)
  ) |
  "\(.metadata.namespace)/\(.metadata.name)"'

Quick Reference

Exam Speed Reference

Deny all host access:

yaml
spec:
  hostNetwork: false
  hostPID: false
  hostIPC: false
  containers:
  - name: app
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]

Find risky pods:

bash
# Privileged pods
kubectl get pods -A -o json | jq '.items[] | select(any(.spec.containers[]; .securityContext.privileged == true)) | .metadata.name'

# Host network pods
kubectl get pods -A -o json | jq '.items[] | select(.spec.hostNetwork==true) | .metadata.name'

Key Exam Takeaways

  1. hostNetwork, hostPID, hostIPC should be false for all workloads unless absolutely necessary
  2. privileged: true is the most dangerous setting -- it removes all container isolation
  3. readOnlyRootFilesystem: true prevents filesystem modification attacks
  4. runAsNonRoot: true ensures the container does not run as root
  5. allowPrivilegeEscalation: false prevents gaining additional privileges
  6. Pod Security Admission enforces these settings per namespace, at the level set in the namespace's pod-security.kubernetes.io/enforce label
  7. On the exam, you may need to identify and fix pods with dangerous settings

Released under the MIT License.