Linux Capabilities

What Are Linux Capabilities?

Traditionally, Linux divided processes into two categories:

- Privileged (root, UID 0): Can do everything -- bypasses all kernel permission checks
- Unprivileged (all other users): Subject to full permission checking

This binary model is too coarse-grained. A web server needs to bind to port 80 (a privileged operation), but it does not need to load kernel modules or reboot the system.

Linux Capabilities decompose the monolithic "root privilege" into distinct units that can be independently granted or revoked. Instead of giving a process full root access, you grant only the specific capabilities it needs.

Key Insight

Capabilities turn the question from "does this process run as root?" to "which specific privileges does this process actually need?" This is the foundation of least-privilege security in containers.

Capabilities vs Running as Root

Linux Capability Categories

All Linux Capabilities

Capability	Purpose	Risk Level
`CAP_AUDIT_WRITE`	Write to the kernel audit log	Low
`CAP_CHOWN`	Change file ownership	Medium
`CAP_DAC_OVERRIDE`	Bypass file read/write/execute permission checks	High
`CAP_DAC_READ_SEARCH`	Bypass file read and directory search permissions	High
`CAP_FOWNER`	Bypass permission checks on operations requiring file ownership	Medium
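`CAP_FSETID`	Keep setuid/setgid bits when a file is modified; set the setgid bit on files whose group the process does not belong to	Medium
`CAP_KILL`	Bypass permission checks when sending signals (signal processes owned by other users)	Medium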
`CAP_MKNOD`	Create special device files	Medium
`CAP_NET_ADMIN`	Network configuration (interfaces, routing, firewall)	High
`CAP_NET_BIND_SERVICE`	Bind to privileged ports (<1024)	Low
`CAP_NET_RAW`	Use raw and packet sockets (ping, ARP)	Medium
`CAP_SETFCAP`	Set file capabilities	High
`CAP_SETGID`	Manipulate process GID	Medium
`CAP_SETPCAP`	Modify process capabilities	High
`CAP_SETUID`	Manipulate process UID	Medium
`CAP_SYS_ADMIN`	Broad admin ops: mount, namespace, syslog, etc.	Critical
`CAP_SYS_BOOT`	Reboot the system	High
`CAP_SYS_CHROOT`	Use chroot()	Medium
`CAP_SYS_MODULE`	Load/unload kernel modules	Critical
`CAP_SYS_NICE`	Set process scheduling priority	Low
`CAP_SYS_PTRACE`	Trace arbitrary processes (debug/inspect)	Critical
`CAP_SYS_RAWIO`	Raw I/O port access	Critical
`CAP_SYS_RESOURCE`	Override resource limits	Medium
`CAP_SYS_TIME`	Set system clock	Medium
`CAP_SYSLOG`	Perform privileged syslog(2) operations	Low

Default Capabilities in Containers

By default, the container runtime (Docker/containerd) grants containers a limited subset of the full capability set, chosen so that most applications work without full root privileges:

Default Container Capabilities

Capability	Why It's Included
`CAP_AUDIT_WRITE`	Writing audit logs
`CAP_CHOWN`	Changing file ownership during init
`CAP_DAC_OVERRIDE`	Reading, writing, and executing files regardless of permissions
`CAP_FOWNER`	Operating on files owned by others
`CAP_FSETID`	Preserving setuid/setgid bits when files are modified
`CAP_KILL`	Sending signals to processes owned by other users
`CAP_MKNOD`	Creating device files
`CAP_NET_BIND_SERVICE`	Binding to ports below 1024
`CAP_NET_RAW`	Raw network sockets (ping)
`CAP_SETFCAP`	Setting file capabilities
`CAP_SETGID`	Switching GID
`CAP_SETPCAP`	Modifying capability sets
`CAP_SETUID`	Switching UID
`CAP_SYS_CHROOT`	Using chroot

Even Defaults Are Too Permissive

The default set includes capabilities like NET_RAW (allows ARP spoofing) and DAC_OVERRIDE (bypasses file permissions). For hardened workloads, you should drop ALL and add back only what's needed.

Viewing Current Capabilities

bash

# Inside a container -- check current process capabilities
grep Cap /proc/1/status

# Decode capability hex values
capsh --decode=00000000a80425fb

# On the host -- list capabilities explicitly added to or dropped from a running container
docker inspect --format '{{.HostConfig.CapAdd}}' <container>
docker inspect --format '{{.HostConfig.CapDrop}}' <container>
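
If `capsh` is not installed, the hex mask can also be decoded with plain shell arithmetic. The following is a minimal sketch rather than a replacement for `capsh`: the helper name `decode_caps` is hypothetical, and the name list assumes the kernel's capability numbering, where bit 0 is `cap_chown` and bit 40 is `cap_checkpoint_restore`.

```bash
# Capability names in kernel bit order (assumed numbering: bit 0 .. bit 40).
CAP_NAMES="cap_chown cap_dac_override cap_dac_read_search cap_fowner
cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable
cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock
cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace
cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource
cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write
cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog
cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf
cap_checkpoint_restore"

# decode_caps HEX: print a comma-separated list of the capabilities set in HEX.
decode_caps() (
  mask=$((0x$1))
  bit=0
  out=""
  for name in $CAP_NAMES; do
    # Test one bit of the mask per capability name
    if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}$name"
    fi
    bit=$(($bit + 1))
  done
  printf '%s\n' "$out"
)

decode_caps 0000000000000400   # prints cap_net_bind_service
```

Passing the value from the example above, `decode_caps 00000000a80425fb`, lists the 14 default container capabilities.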

Dangerous Capabilities

Some capabilities are especially dangerous and should almost never be granted to containers:

CAP_SYS_ADMIN -- The "New Root"

CAP_SYS_ADMIN

SYS_ADMIN is the most dangerous capability. It grants a vast collection of administrative powers:

- Mount and unmount filesystems
- Perform clone() with new namespaces
- Use setns() to join namespaces
- Configure kernel parameters via sysctl()
- Perform operations on trusted and security extended attributes
- And many more...

Granting SYS_ADMIN to a container is almost equivalent to running it in privileged mode. It is one of the most common container escape vectors.

CAP_NET_ADMIN

Allows:
- Modify routing tables
- Configure network interfaces
- Modify firewall rules (iptables)
- Set network QoS parameters
- Configure network bridging

Risk: Network-level attacks, ARP poisoning, traffic interception

CAP_SYS_PTRACE

Allows:
- Trace any process using ptrace()
- Read and modify process memory
- Inject code into running processes
- Bypass seccomp filters on traced processes

Risk: Credential theft from other processes; container escape by tracing host processes when the container shares the host PID namespace

CAP_SYS_MODULE

Allows:
- Load arbitrary kernel modules
- Unload kernel modules

Risk: Full kernel compromise by loading malicious modules
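
One way to audit for the capabilities described in this section is to test their bits in a `CapEff` or `CapBnd` mask read from `/proc/<pid>/status`. This is a rough sketch: `flag_dangerous` is a hypothetical helper name, and the bit numbers (12 for NET_ADMIN, 16 for SYS_MODULE, 19 for SYS_PTRACE, 21 for SYS_ADMIN) assume the kernel's standard capability numbering.

```bash
# flag_dangerous HEX: print each high-risk capability found in HEX, one per line.
flag_dangerous() (
  mask=$((0x$1))
  # name:bit pairs (assumed kernel bit numbers)
  for entry in cap_net_admin:12 cap_sys_module:16 cap_sys_ptrace:19 cap_sys_admin:21; do
    name=${entry%%:*}
    bit=${entry##*:}
    if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
      printf '%s\n' "$name"
    fi
  done
)

flag_dangerous 00000000a80425fb   # default container mask: prints nothing
flag_dangerous 0000000000200000   # prints cap_sys_admin
```

A mask with every bit set, such as `000001ffffffffff`, would report all four names.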

Configuring Capabilities in Kubernetes

Capabilities are managed through the securityContext.capabilities field in a container spec.

Dropping ALL Capabilities

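The most secure starting point -- drop everything and add back only what's needed. The pod below shows the pattern; an nginx image that starts as root typically also needs CHOWN, SETGID, and SETUID, as in the web server configuration later in this document: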

yaml

apiVersion: v1
kind: Pod
metadata:
  name: minimal-caps
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE

Dropping Specific Dangerous Capabilities

If dropping ALL is too restrictive for your application, drop at least the most dangerous capabilities:

yaml

apiVersion: v1
kind: Pod
metadata:
  name: safer-pod
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      capabilities:
        drop:
          - SYS_ADMIN
          - NET_ADMIN
          - SYS_PTRACE
          - NET_RAW
          - SYS_MODULE
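
It can help to see what this drop list changes in practice. The sketch below assumes the container starts from the default mask `00000000a80425fb` used in this document's examples (other runtimes may ship a different default) and uses a hypothetical helper, `drop_bits`, with assumed kernel bit numbers.

```bash
# drop_bits HEX BIT...: clear the given capability bits and print the new mask.
drop_bits() (
  mask=$((0x$1))
  shift
  for bit in "$@"; do
    mask=$(( $mask & ~(1 << $bit) ))
  done
  printf '%016x\n' "$mask"
)

# SYS_ADMIN=21, NET_ADMIN=12, SYS_PTRACE=19, NET_RAW=13, SYS_MODULE=16
drop_bits 00000000a80425fb 21 12 19 13 16   # prints 00000000a80405fb
```

Under that assumption only the NET_RAW bit changes, because the other four capabilities are not in the default set to begin with. Listing them still documents intent, but `drop: ALL` remains the stronger baseline.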

Common Capability Configurations by Workload Type

Web Server (nginx, Apache)

yaml

securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE    # Bind to port 80/443
      - CHOWN               # Set file ownership
      - SETGID              # Switch group
      - SETUID              # Switch user (worker process)

Application Container (Node.js, Python, Java)

yaml

securityContext:
  capabilities:
    drop:
      - ALL
    # No capabilities needed for most apps on non-privileged ports

Network Tool / Debug Container

yaml

securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_RAW            # ping, traceroute
      - NET_ADMIN          # network configuration (use cautiously)

Capability Hierarchy and Inheritance

Complete Examples

Example 1: Hardened Nginx Pod

yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx-hardened
spec:
  containers:
  - name: nginx
    image: nginx:1.27
    ports:
    - containerPort: 80
    securityContext:
      runAsNonRoot: false        # nginx master needs root initially
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE     # Port 80
          - CHOWN                # File ownership
          - SETGID               # Worker process GID
          - SETUID               # Worker process UID
          - DAC_OVERRIDE         # Access config files
    volumeMounts:
    - name: cache
      mountPath: /var/cache/nginx
    - name: run
      mountPath: /var/run
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: cache
    emptyDir: {}
  - name: run
    emptyDir: {}
  - name: tmp
    emptyDir: {}

Example 2: Minimal Application Pod

yaml

apiVersion: v1
kind: Pod
metadata:
  name: app-minimal
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    runAsNonRoot: true
  containers:
  - name: app
    image: python:3.12-slim
    command: ["python", "-m", "http.server", "8080"]
    ports:
    - containerPort: 8080
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}

Example 3: Deployment with Capability Restrictions

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-api
  template:
    metadata:
      labels:
        app: secure-api
    spec:
      securityContext:
        runAsUser: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: api
        image: myapp:latest
        ports:
        - containerPort: 8080
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
              - ALL
        resources:
          limits:
            memory: "256Mi"
            cpu: "500m"
          requests:
            memory: "128Mi"
            cpu: "250m"

Verifying Capabilities

bash

# Check capabilities of a running container
kubectl exec <pod> -- cat /proc/1/status | grep Cap

# Example output:
# CapInh: 0000000000000000
# CapPrm: 00000000a80425fb
# CapEff: 00000000a80425fb
# CapBnd: 00000000a80425fb
# CapAmb: 0000000000000000

# Decode the hex value
# On the host:
capsh --decode=00000000a80425fb

# Output:
# 0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,...

# Check if a specific capability is present
kubectl exec <pod> -- grep Cap /proc/1/status

# For a pod with drop ALL + add NET_BIND_SERVICE:
# CapPrm: 0000000000000400
# capsh --decode=0000000000000400
# 0x0000000000000400=cap_net_bind_service
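
To know which mask to expect before checking a pod, the calculation can be run in the other direction. This is a sketch under stated assumptions: `expected_mask` is a hypothetical helper, the names are Kubernetes-style (no `CAP_` prefix), and the list order assumes the kernel's capability numbering starting at bit 0.

```bash
# Kubernetes-style capability names in assumed kernel bit order (bit 0 .. bit 40).
K8S_CAPS="CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID
SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW
IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT
SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD LEASE
AUDIT_WRITE AUDIT_CONTROL SETFCAP MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM
BLOCK_SUSPEND AUDIT_READ PERFMON BPF CHECKPOINT_RESTORE"

# expected_mask NAME...: print the mask for "drop ALL" plus the given add list.
expected_mask() (
  mask=0
  for want in "$@"; do
    bit=0
    found=0
    for name in $K8S_CAPS; do
      if [ "$name" = "$want" ]; then
        mask=$(( $mask | (1 << $bit) ))
        found=1
      fi
      bit=$(($bit + 1))
    done
    # Fail loudly on typos instead of silently ignoring them
    [ "$found" -eq 1 ] || { echo "unknown capability: $want" >&2; exit 1; }
  done
  printf '%016x\n' "$mask"
)

expected_mask NET_BIND_SERVICE                       # prints 0000000000000400
expected_mask NET_BIND_SERVICE CHOWN SETGID SETUID   # prints 00000000000004c1
```

The result can be compared against the `CapBnd` line from `/proc/1/status` in the pod.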

Best Practices for Minimal Capabilities

Capability Hardening Checklist

- Always start with drop: ALL -- then add back only what's needed
- Never add SYS_ADMIN -- it's nearly equivalent to running privileged
- Avoid NET_RAW unless the app genuinely needs raw sockets (ping)
- Set allowPrivilegeEscalation: false -- prevents gaining new capabilities via setuid binaries
- Run as non-root (runAsNonRoot: true) -- most apps don't need root
- Use readOnlyRootFilesystem: true -- prevents filesystem modification
- Test incrementally -- add one capability at a time until the app works
- Document why each added capability is needed

How to Determine Required Capabilities

If you are unsure which capabilities your application needs:

1. Start with drop: ALL and no additions
2. Run the pod and check if it works
3. If it fails, check the error messages:
   - "Permission denied" on bind() -> needs NET_BIND_SERVICE
   - "Operation not permitted" on chown() -> needs CHOWN
   - "Operation not permitted" when opening a raw socket -> needs NET_RAW
4. Add the minimum capability needed and repeat
5. Alternatively, use strace or auditd to trace which operations fail
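
For the strace route, one possible filter is sketched below. It reads strace output on stdin and lists the syscalls that failed with EPERM or EACCES, which are the usual signs of a missing capability. `failed_syscalls` is a hypothetical helper, and the sample lines are hand-written illustrations in strace's usual output format, not captured logs.

```bash
# failed_syscalls: read strace output on stdin, print the unique syscall names
# that returned EPERM or EACCES.
failed_syscalls() {
  grep -E '= -1 (EPERM|EACCES) ' \
    | sed -E -e 's/^\[pid +[0-9]+\] //' -e 's/\(.*//' \
    | sort -u
}

# Hand-written sample input (illustrative only)
failed_syscalls <<'EOF'
bind(3, {sa_family=AF_INET, sin_port=htons(80)}, 16) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3
[pid    42] chown("/var/cache/nginx", 101, 0) = -1 EPERM (Operation not permitted)
EOF
```

For this sample the filter prints `bind` and `chown`, which map to NET_BIND_SERVICE and CHOWN in the list above.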

Quick Reference

Exam Speed Reference

Drop ALL and add specific:

yaml

securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE

Key facts:

- Capabilities use uppercase names without CAP_ prefix in Kubernetes
- drop and add are lists of strings
- drop: ["ALL"] drops all capabilities
- Capabilities are container-level, not pod-level
- allowPrivilegeEscalation: false prevents gaining new capabilities
- The ALL keyword is special -- it means all capabilities
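
Because `capsh --decode` prints lowercase names with the `cap_` prefix, its output needs converting before it can go into an `add` or `drop` list. A small sketch, with `to_k8s_caps` as a hypothetical helper name:

```bash
# to_k8s_caps "cap_a,cap_b": print one Kubernetes-style name per line.
to_k8s_caps() {
  printf '%s\n' "$1" | tr ',' '\n' | tr '[:lower:]' '[:upper:]' | sed 's/^CAP_//'
}

to_k8s_caps "cap_chown,cap_net_bind_service"
# prints:
# CHOWN
# NET_BIND_SERVICE
```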

Key Exam Takeaways

- Always drop ALL capabilities and add back only what's needed
- Capabilities in Kubernetes use uppercase names without the CAP_ prefix
- SYS_ADMIN is essentially root -- never grant it
- Set allowPrivilegeEscalation: false alongside capability restrictions
- Capabilities are set at the container level, not pod level
- Default containers get ~14 capabilities -- far more than most apps need
- Combine capability restrictions with runAsNonRoot and readOnlyRootFilesystem
