Linux Capabilities

What Are Linux Capabilities?

Traditionally, Linux divided processes into two categories:

  • Privileged (root, UID 0): Can do everything -- bypasses all kernel permission checks
  • Unprivileged (all other users): Subject to full permission checking

This binary model is too coarse-grained. A web server needs to bind to port 80 (a privileged operation), but it does not need to load kernel modules or reboot the system.

Linux Capabilities decompose the monolithic "root privilege" into distinct units that can be independently granted or revoked. Instead of giving a process full root access, you grant only the specific capabilities it needs.

Key Insight

Capabilities shift the question from "does this process run as root?" to "which specific privileges does this process actually need?" This is the foundation of least-privilege security in containers.

Capabilities vs Running as Root

Linux Capability Categories

Commonly Used Linux Capabilities

| Capability | Purpose | Risk Level |
| --- | --- | --- |
| CAP_AUDIT_WRITE | Write to the kernel audit log | Low |
| CAP_CHOWN | Change file ownership | Medium |
| CAP_DAC_OVERRIDE | Bypass file read/write/execute permission checks | High |
| CAP_DAC_READ_SEARCH | Bypass file read and directory search permissions | High |
| CAP_FOWNER | Bypass permission checks on operations requiring file ownership | Medium |
| CAP_FSETID | Preserve setuid/setgid bits when a file is modified | Medium |
| CAP_KILL | Send signals to any process | Medium |
| CAP_MKNOD | Create special device files | Medium |
| CAP_NET_ADMIN | Network configuration (interfaces, routing, firewall) | High |
| CAP_NET_BIND_SERVICE | Bind to privileged ports (<1024) | Low |
| CAP_NET_RAW | Use raw and packet sockets (ping, ARP) | Medium |
| CAP_SETFCAP | Set file capabilities | High |
| CAP_SETGID | Manipulate process GID | Medium |
| CAP_SETPCAP | Modify process capabilities | High |
| CAP_SETUID | Manipulate process UID | Medium |
| CAP_SYS_ADMIN | Broad admin ops: mount, namespace, syslog, etc. | Critical |
| CAP_SYS_BOOT | Reboot the system | High |
| CAP_SYS_CHROOT | Use chroot() | Medium |
| CAP_SYS_MODULE | Load/unload kernel modules | Critical |
| CAP_SYS_NICE | Set process scheduling priority | Low |
| CAP_SYS_PTRACE | Trace arbitrary processes (debug/inspect) | Critical |
| CAP_SYS_RAWIO | Raw I/O port access | Critical |
| CAP_SYS_RESOURCE | Override resource limits | Medium |
| CAP_SYS_TIME | Set system clock | Medium |
| CAP_SYSLOG | Perform syslog(2) operations | Low |

Default Capabilities in Containers

By default, the container runtime (Docker/containerd) grants a limited set of capabilities to containers. These are a subset of root capabilities, chosen to allow most applications to function without full root:

Default Container Capabilities

| Capability | Why It's Included |
| --- | --- |
| CAP_AUDIT_WRITE | Writing audit logs |
| CAP_CHOWN | Changing file ownership during init |
| CAP_DAC_OVERRIDE | Accessing files regardless of permission bits |
| CAP_FOWNER | Operating on files owned by others |
| CAP_FSETID | Preserving setuid/setgid bits when files are modified |
| CAP_KILL | Sending signals to processes owned by other users |
| CAP_MKNOD | Creating device files |
| CAP_NET_BIND_SERVICE | Binding to ports below 1024 |
| CAP_NET_RAW | Raw network sockets (ping) |
| CAP_SETFCAP | Setting file capabilities |
| CAP_SETGID | Switching GID |
| CAP_SETPCAP | Modifying capability sets |
| CAP_SETUID | Switching UID |
| CAP_SYS_CHROOT | Using chroot |
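
As a quick sanity check on the table above, the number of capabilities in a mask can be found by counting its set bits. The sketch below is plain POSIX shell; count_caps is a hypothetical helper name, and 00000000a80425fb is the default-set mask that appears in the examples in this guide.

```bash
# count_caps HEX: print how many capability bits are set in the hex mask HEX.
# Hypothetical helper for illustration; uses only shell arithmetic.
count_caps() {
  mask=$(( 0x$1 ))
  count=0
  while [ "$mask" -ne 0 ]; do
    count=$(( count + (mask & 1) ))   # add the lowest bit
    mask=$(( mask >> 1 ))             # move to the next bit
  done
  echo "$count"
}

count_caps 00000000a80425fb   # default set: prints 14
count_caps 0000000000000400   # NET_BIND_SERVICE only: prints 1
```

A count well above what the application needs is a hint that the capability set can be tightened.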

Even Defaults Are Too Permissive

The default set includes capabilities like NET_RAW (allows ARP spoofing) and DAC_OVERRIDE (bypasses file permissions). For hardened workloads, you should drop ALL and add back only what's needed.

Viewing Current Capabilities

bash
# Inside a container -- check current process capabilities
grep Cap /proc/1/status

# Decode capability hex values
capsh --decode=00000000a80425fb

# On the host -- list capabilities explicitly added or dropped for a container
# (these fields do not include the runtime's default set)
docker inspect --format '{{.HostConfig.CapAdd}}' <container>
docker inspect --format '{{.HostConfig.CapDrop}}' <container>
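
If capsh is not installed in the image, the mask can be decoded with shell arithmetic alone. The following is a minimal sketch, not a replacement for capsh: decode_caps and CAP_NAMES are hypothetical names, and the bit order is assumed to follow the kernel's include/uapi/linux/capability.h (bits 0 through 40).

```bash
# Capability names in bit order (bit 0 first), per linux/capability.h.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps HEX: print the comma-separated capability names set in HEX.
decode_caps() {
  mask=$(( 0x$1 ))
  bit=0
  out=""
  for name in $CAP_NAMES; do          # unquoted on purpose: split into words
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}cap_$name"    # append with a comma separator
    fi
    bit=$(( bit + 1 ))
  done
  echo "$out"
}

decode_caps 0000000000000400   # prints cap_net_bind_service
```

Bits above 40 are ignored by this sketch, so a newer kernel that defines additional capabilities would need the name list extended.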

Dangerous Capabilities

Some capabilities are especially dangerous and should almost never be granted to containers:

CAP_SYS_ADMIN -- The "New Root"

CAP_SYS_ADMIN

SYS_ADMIN is the most dangerous capability. It grants a vast collection of administrative powers:

  • Mount/unmount filesystems
  • Perform clone() with new namespaces
  • Use setns() to join namespaces
  • Configure kernel parameters via sysctl()
  • Perform operations on extended attributes
  • And many more...

Granting SYS_ADMIN to a container is almost equivalent to running it as privileged. It is one of the most common container escape vectors.

CAP_NET_ADMIN

Allows:
- Modify routing tables
- Configure network interfaces
- Modify firewall rules (iptables)
- Set network QoS parameters
- Configure network bridging

Risk: Network-level attacks, ARP poisoning, traffic interception

CAP_SYS_PTRACE

Allows:
- Trace any process using ptrace()
- Read and modify process memory
- Inject code into running processes
- Bypass seccomp filters on traced processes (kernels before 4.8)

Risk: Container escape by tracing host processes (when the host PID namespace is shared), credential theft

CAP_SYS_MODULE

Allows:
- Load arbitrary kernel modules
- Unload kernel modules

Risk: Full kernel compromise by loading malicious modules
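
To spot the capabilities discussed in this section in a running workload, a capability mask can be tested bit by bit. This is a sketch under stated assumptions: has_cap and dangerous_caps are hypothetical helper names, and the bit numbers (NET_ADMIN=12, SYS_MODULE=16, SYS_PTRACE=19, SYS_ADMIN=21) are taken from the kernel's linux/capability.h.

```bash
# has_cap HEX BIT: succeed if bit number BIT is set in the hex mask HEX.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# dangerous_caps HEX: print each high-risk capability present in HEX, one per line.
dangerous_caps() {
  for entry in 12:NET_ADMIN 16:SYS_MODULE 19:SYS_PTRACE 21:SYS_ADMIN; do
    bit=${entry%%:*}     # text before the colon
    name=${entry#*:}     # text after the colon
    if has_cap "$1" "$bit"; then
      echo "$name"
    fi
  done
}

dangerous_caps 00000000a80425fb   # default set: prints nothing
dangerous_caps 0000000000200000   # prints SYS_ADMIN
```

The mask would typically come from the CapEff line of /proc/1/status inside the container.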

Configuring Capabilities in Kubernetes

Capabilities are managed through the securityContext.capabilities field in a container spec.

Dropping ALL Capabilities

The most secure starting point -- drop everything and add back only what's needed:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: minimal-caps
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE

Dropping Specific Dangerous Capabilities

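If dropping ALL is too restrictive for your application, drop the most dangerous capabilities explicitly. Of the five listed below, only NET_RAW is part of the default set; the other four are not granted by default, so listing them has no effect under the default runtime configuration, but it documents intent: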

yaml
apiVersion: v1
kind: Pod
metadata:
  name: safer-pod
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      capabilities:
        drop:
          - SYS_ADMIN
          - NET_ADMIN
          - SYS_PTRACE
          - NET_RAW
          - SYS_MODULE
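
The effect of a drop list on the default mask can be estimated by clearing the corresponding bits. The sketch below assumes the default mask 00000000a80425fb used elsewhere in this guide and the kernel's bit numbering; mask_after_drop is a hypothetical helper name.

```bash
# mask_after_drop HEX BIT...: clear the given bit numbers from the hex mask HEX
# and print the result as a 16-digit hex value.
mask_after_drop() {
  mask=$(( 0x$1 ))
  shift
  for bit in "$@"; do
    mask=$(( mask & ~(1 << bit) ))   # clear one capability bit
  done
  printf '%016x\n' "$mask"
}

# Bits: SYS_ADMIN=21, NET_ADMIN=12, SYS_PTRACE=19, NET_RAW=13, SYS_MODULE=16
mask_after_drop 00000000a80425fb 21 12 19 13 16   # prints 00000000a80405fb
```

Only one bit changes (NET_RAW, bit 13), because the other four capabilities are not in the default mask to begin with.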

Common Capability Configurations by Workload Type

Web Server (nginx, Apache)

yaml
securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE    # Bind to port 80/443
      - CHOWN               # Set file ownership
      - SETGID              # Switch group
      - SETUID              # Switch user (worker process)

Application Container (Node.js, Python, Java)

yaml
securityContext:
  capabilities:
    drop:
      - ALL
    # No capabilities needed for most apps on non-privileged ports

Network Tool / Debug Container

yaml
securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_RAW            # ping, traceroute
      - NET_ADMIN          # network configuration (use cautiously)

Capability Hierarchy and Inheritance

Complete Examples

Example 1: Hardened Nginx Pod

yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-hardened
spec:
  containers:
  - name: nginx
    image: nginx:1.27
    ports:
    - containerPort: 80
    securityContext:
      runAsNonRoot: false        # nginx master needs root initially
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
        add:
          - NET_BIND_SERVICE     # Port 80
          - CHOWN                # File ownership
          - SETGID               # Worker process GID
          - SETUID               # Worker process UID
          - DAC_OVERRIDE         # Access config files
    volumeMounts:
    - name: cache
      mountPath: /var/cache/nginx
    - name: run
      mountPath: /var/run
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: cache
    emptyDir: {}
  - name: run
    emptyDir: {}
  - name: tmp
    emptyDir: {}

Example 2: Minimal Application Pod

yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-minimal
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    runAsNonRoot: true
  containers:
  - name: app
    image: python:3.12-slim
    command: ["python", "-m", "http.server", "8080"]
    ports:
    - containerPort: 8080
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}

Example 3: Deployment with Capability Restrictions

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-api
  template:
    metadata:
      labels:
        app: secure-api
    spec:
      securityContext:
        runAsUser: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: api
        image: myapp:latest
        ports:
        - containerPort: 8080
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
              - ALL
        resources:
          limits:
            memory: "256Mi"
            cpu: "500m"
          requests:
            memory: "128Mi"
            cpu: "250m"

Verifying Capabilities

bash
# Check capabilities of a running container
kubectl exec <pod> -- cat /proc/1/status | grep Cap

# Example output:
# CapInh: 0000000000000000
# CapPrm: 00000000a80425fb
# CapEff: 00000000a80425fb
# CapBnd: 00000000a80425fb
# CapAmb: 0000000000000000

# Decode the hex value
# On the host:
capsh --decode=00000000a80425fb

# Output:
# 0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,...

# Check if a specific capability is present by inspecting the effective set
kubectl exec <pod> -- grep CapEff /proc/1/status

# For a pod running as root with drop ALL + add NET_BIND_SERVICE:
# CapEff: 0000000000000400
# capsh --decode=0000000000000400
# 0x0000000000000400=cap_net_bind_service
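
To know what mask to expect before running the check, the value can be computed from the add list. This is a minimal sketch: cap_bit and expected_mask are hypothetical helper names, and the mapping covers only the 14 default capabilities, with bit numbers assumed from the kernel's linux/capability.h.

```bash
# cap_bit NAME: print the bit number for a Kubernetes-style capability name.
cap_bit() {
  case "$1" in
    CHOWN) echo 0 ;; DAC_OVERRIDE) echo 1 ;; FOWNER) echo 3 ;; FSETID) echo 4 ;;
    KILL) echo 5 ;; SETGID) echo 6 ;; SETUID) echo 7 ;; SETPCAP) echo 8 ;;
    NET_BIND_SERVICE) echo 10 ;; NET_RAW) echo 13 ;; SYS_CHROOT) echo 18 ;;
    MKNOD) echo 27 ;; AUDIT_WRITE) echo 29 ;; SETFCAP) echo 31 ;;
    *) echo "unknown capability: $1" >&2; return 1 ;;
  esac
}

# expected_mask NAME...: print the 16-digit hex mask for the given capabilities.
expected_mask() {
  mask=0
  for cap in "$@"; do
    bit=$(cap_bit "$cap") || return 1
    mask=$(( mask | (1 << bit) ))    # set the bit for this capability
  done
  printf '%016x\n' "$mask"
}

expected_mask NET_BIND_SERVICE                                   # prints 0000000000000400
expected_mask NET_BIND_SERVICE CHOWN SETGID SETUID DAC_OVERRIDE  # prints 00000000000004c3
```

The second call corresponds to the add list in the hardened nginx example. The result is best compared against CapBnd, since the permitted and effective sets may be empty when the container runs as a non-root user.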

Best Practices for Minimal Capabilities

Capability Hardening Checklist

  1. Always start with drop: ALL -- then add back only what's needed
  2. Never add SYS_ADMIN -- it's nearly equivalent to running privileged
  3. Avoid NET_RAW unless the app genuinely needs raw sockets (ping)
  4. Set allowPrivilegeEscalation: false -- prevents gaining new capabilities via setuid binaries
  5. Run as non-root (runAsNonRoot: true) -- most apps don't need root
  6. Use readOnlyRootFilesystem: true -- prevents filesystem modification
  7. Test incrementally -- add one capability at a time until the app works
  8. Document why each added capability is needed

How to Determine Required Capabilities

If you are unsure which capabilities your application needs:

  1. Start with drop: ALL and no additions
  2. Run the pod and check if it works
  3. If it fails, check the error messages:
    • "Permission denied" on bind() -> needs NET_BIND_SERVICE
    • "Operation not permitted" on chown() -> needs CHOWN
    • "Operation not permitted" on raw socket creation -> needs NET_RAW
  4. Add the minimum capability needed and repeat
  5. Alternatively, use strace or auditd to trace what operations fail
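
The error-to-capability mapping from step 3 can be kept as a small lookup while iterating. The sketch below is a heuristic, not an exhaustive table: suggest_cap is a hypothetical helper that takes a failing syscall name and its errno name (EACCES is "Permission denied", EPERM is "Operation not permitted"), and the same errors can have causes unrelated to capabilities.

```bash
# suggest_cap SYSCALL ERRNO: print the capability most likely missing
# when SYSCALL fails with ERRNO. Returns 1 if there is no suggestion.
suggest_cap() {
  case "$1:$2" in
    bind:EACCES)                                          echo NET_BIND_SERVICE ;;
    chown:EPERM|fchown:EPERM|lchown:EPERM|fchownat:EPERM) echo CHOWN ;;
    socket:EPERM)                                         echo NET_RAW ;;
    setuid:EPERM)                                         echo SETUID ;;
    setgid:EPERM|setgroups:EPERM)                         echo SETGID ;;
    *) echo "no suggestion for $1 failing with $2" >&2; return 1 ;;
  esac
}

suggest_cap bind EACCES      # prints NET_BIND_SERVICE
suggest_cap fchownat EPERM   # prints CHOWN
```

The syscall and errno pair is what strace reports at the end of each failing call, so the two steps combine naturally.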

Quick Reference

Exam Speed Reference

Drop ALL and add specific:

yaml
securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE

Key facts:

  • Capabilities use uppercase names without CAP_ prefix in Kubernetes
  • drop and add are lists of strings
  • drop: ["ALL"] drops all capabilities
  • Capabilities are container-level, not pod-level
  • allowPrivilegeEscalation: false prevents gaining new capabilities
  • The ALL keyword is special -- it means all capabilities
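
Because capsh prints lowercase names with the cap_ prefix while Kubernetes expects uppercase names without it, a small conversion step helps when copying between the two. This is a sketch; to_k8s_caps is a hypothetical helper name, and its output is meant to be pasted under add: or drop: with the indentation adjusted.

```bash
# to_k8s_caps LIST: convert a comma-separated capsh-style list
# (cap_chown,cap_setuid,...) into YAML list items with Kubernetes names.
to_k8s_caps() {
  printf '%s\n' "$1" |
    tr ',' '\n' |                      # one capability per line
    tr '[:lower:]' '[:upper:]' |       # Kubernetes uses uppercase names
    sed -e 's/^CAP_//' -e 's/^/- /'    # drop the prefix, add the list marker
}

to_k8s_caps cap_net_bind_service,cap_chown
# prints:
# - NET_BIND_SERVICE
# - CHOWN
```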

Key Exam Takeaways

  1. Always drop ALL capabilities and add back only what's needed
  2. Capabilities in Kubernetes use uppercase names without the CAP_ prefix
  3. SYS_ADMIN is essentially root -- never grant it
  4. Set allowPrivilegeEscalation: false alongside capability restrictions
  5. Capabilities are set at the container level, not pod level
  6. Containers get ~14 capabilities by default -- far more than most apps need
  7. Combine capability restrictions with runAsNonRoot and readOnlyRootFilesystem

Released under the MIT License.