Seccomp Profiles

What Is Seccomp?

Seccomp (Secure Computing Mode) is a Linux kernel feature that restricts the system calls (syscalls) a process can make. Every interaction between a userspace program and the kernel happens through syscalls -- file operations, network access, process management, and more. Seccomp allows you to define a filter that specifies exactly which syscalls are permitted.

There are over 300 syscalls in the Linux kernel, but most applications need only a small subset. By blocking unused syscalls, seccomp dramatically reduces the kernel attack surface available to a compromised container.
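
As a quick way to see whether a process is actually running under a seccomp filter, the kernel exposes a `Seccomp:` field in `/proc/<pid>/status` (0 = disabled, 1 = strict, 2 = filter). The sketch below reads that field; `seccomp_mode` is a hypothetical helper name chosen for illustration, not a standard tool.

```shell
# seccomp_mode [STATUS_FILE]
# Hypothetical helper: print the seccomp mode recorded in a
# /proc/<pid>/status-style file (defaults to the current process).
seccomp_mode() {
  local status_file="${1:-/proc/self/status}"
  local mode
  # The kernel writes a line such as "Seccomp:<TAB>2".
  # Matching $1 exactly avoids picking up "Seccomp_filters:".
  mode="$(awk '$1 == "Seccomp:" { print $2 }' "$status_file")"
  case "$mode" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}
```

Inside a container, pointing it at `/proc/1/status` shows the mode of the container's main process.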

Seccomp vs AppArmor

  • AppArmor restricts what resources a process can access (files, network, capabilities)
  • Seccomp restricts which kernel interfaces (syscalls) a process can invoke
  • They are complementary -- use both for defense in depth
  • Seccomp operates at a lower level, closer to the kernel

Seccomp Profile Types in Kubernetes

Kubernetes supports three seccomp profile types:

Type | Description | Security Level
RuntimeDefault | Uses the container runtime's built-in default profile | Good baseline security
Localhost | Uses a custom profile stored on the node | Maximum control
Unconfined | No seccomp filtering applied | No protection

Default Behavior

By default (without any seccomp configuration, and unless the kubelet is started with --seccomp-default), Kubernetes runs containers as Unconfined -- no syscall filtering is applied. Always explicitly set a seccomp profile.
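
Because an omitted profile is easy to miss in review, a rough text check over manifests can flag files that never mention `seccompProfile`. This is only a sketch: both function names are hypothetical, and the check is a plain `grep`, not a YAML parser, so it cannot tell which container a profile applies to.

```shell
# has_seccomp_profile FILE
# Hypothetical helper: succeeds if the manifest contains a seccompProfile key.
# This is a crude text match, not a real YAML parse.
has_seccomp_profile() {
  grep -Eq '^[[:space:]]*seccompProfile:' "$1"
}

# report_missing_seccomp FILE...
# Print one line for every manifest that never sets a seccomp profile.
report_missing_seccomp() {
  local f
  for f in "$@"; do
    has_seccomp_profile "$f" || echo "no seccompProfile: $f"
  done
}
```

For example, `report_missing_seccomp manifests/*.yaml` lists the files that need attention (the `manifests/` directory is an assumed layout).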

The Default Docker/containerd Seccomp Profile

Each container runtime (Docker, containerd, CRI-O) ships its own default seccomp profile. Docker's default, for example, blocks approximately 44 syscalls out of 300+. These profiles are a sensible baseline that blocks dangerous syscalls while allowing most applications to function normally.

Syscalls Blocked by the Default Profile

Blocked Syscall | Why It's Dangerous
kexec_load | Load a new kernel -- full system compromise
mount | Mount filesystems -- escape container boundaries
umount2 | Unmount filesystems
ptrace | Trace/debug processes -- read memory, inject code
reboot | Reboot the host system
setns | Join another namespace -- container escape
unshare | Create new namespaces
clone (with CLONE_NEWUSER) | Create user namespaces
keyctl | Kernel keyring manipulation
bpf | Load BPF programs into kernel
add_key | Add keys to kernel keyring
request_key | Request keys from kernel keyring
init_module | Load kernel modules
delete_module | Unload kernel modules
acct | Process accounting -- information disclosure
swapon / swapoff | Modify swap -- DoS potential

Syscalls Allowed by the Default Profile

Common syscalls that applications need and the default profile allows:

Allowed Syscall | Purpose
read / write | File and network I/O
open / openat | Open files
close | Close file descriptors
stat / fstat | File metadata
mmap / mprotect | Memory management
brk | Heap management
socket / connect / bind | Network operations
fork / execve | Process creation
getpid / getuid | Process/user info
ioctl | Device control

Applying Seccomp via SecurityContext

Using RuntimeDefault

The simplest and most common approach -- applies the container runtime's built-in seccomp profile:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-runtime-default
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      seccompProfile:
        type: RuntimeDefault

Pod-Level vs Container-Level

You can set seccomp at the pod level (applies to all containers) or at the container level (applies to specific containers). Container-level settings override pod-level.

yaml
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault    # Pod level -- applies to all containers
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      seccompProfile:
        type: Localhost        # Container level -- overrides pod level
        localhostProfile: profiles/custom.json
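
The override rule above can be written as a small decision function. This is a sketch of the precedence only, under the assumption that the kubelet's seccomp default is not enabled (so "nothing set" means Unconfined); `effective_seccomp_type` is a hypothetical name.

```shell
# effective_seccomp_type POD_TYPE CONTAINER_TYPE
# Hypothetical helper: print the profile type a container ends up with.
# Pass an empty string for a level that does not set seccompProfile.
effective_seccomp_type() {
  local pod_type="$1" container_type="$2"
  if [ -n "$container_type" ]; then
    echo "$container_type"      # container-level setting wins
  elif [ -n "$pod_type" ]; then
    echo "$pod_type"            # otherwise inherit the pod-level setting
  else
    echo "Unconfined"           # assumption: kubelet seccomp default is off
  fi
}
```

For the manifest above, `effective_seccomp_type RuntimeDefault Localhost` prints `Localhost`, while a sidecar without its own setting would get `RuntimeDefault`.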

Using a Custom Localhost Profile

yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-custom
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      seccompProfile:
        type: Localhost
        localhostProfile: profiles/nginx-strict.json

Localhost Profile Path

The localhostProfile path is relative to the kubelet's configured seccomp profile root directory, which defaults to:

/var/lib/kubelet/seccomp/

So profiles/nginx-strict.json resolves to:

/var/lib/kubelet/seccomp/profiles/nginx-strict.json
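
The path resolution can be sketched as a simple join of the profile root and the relative `localhostProfile` value. `resolve_localhost_profile` is a hypothetical helper; the second argument exists only because the root directory is configurable.

```shell
# resolve_localhost_profile RELATIVE_PATH [ROOT]
# Hypothetical helper: print the absolute path the kubelet would look up
# for a given localhostProfile value. ROOT defaults to the path above.
resolve_localhost_profile() {
  local rel="$1" root="${2:-/var/lib/kubelet/seccomp}"
  # Strip one trailing slash from the root so the join never yields "//".
  printf '%s/%s\n' "${root%/}" "$rel"
}
```

Running `sudo test -r "$(resolve_localhost_profile profiles/nginx-strict.json)"` on the node is a quick way to confirm the file is where the pod spec expects it.
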
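
The Unconfined type disables seccomp filtering entirely. It is shown here for completeness and should be avoided outside of debugging:

yaml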
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-unconfined
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      seccompProfile:
        type: Unconfined

Creating Custom Seccomp Profiles

Custom seccomp profiles are JSON files that define exactly which syscalls to allow or deny.

JSON Profile Structure

json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X32"
  ],
  "syscalls": [
    {
      "names": [
        "accept4",
        "bind",
        "close",
        "connect",
        "epoll_create1",
        "epoll_ctl",
        "epoll_wait",
        "execve",
        "exit",
        "exit_group",
        "fstat",
        "futex",
        "getpid",
        "getuid",
        "listen",
        "mmap",
        "mprotect",
        "munmap",
        "openat",
        "read",
        "recvfrom",
        "rt_sigaction",
        "sendto",
        "socket",
        "write"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Profile Actions

Action | Behavior
SCMP_ACT_ALLOW | Allow the syscall
SCMP_ACT_ERRNO | Block the syscall, return error to caller
SCMP_ACT_KILL | Kill the calling thread (same as SCMP_ACT_KILL_THREAD)
SCMP_ACT_KILL_PROCESS | Kill the entire process (not just the calling thread)
SCMP_ACT_TRAP | Send SIGSYS signal to the process
SCMP_ACT_LOG | Allow the syscall but log it (for auditing)

Strategy: Allowlist vs Denylist

Allowlist (Recommended)

Set defaultAction to SCMP_ACT_ERRNO (block everything by default) and explicitly allow needed syscalls. This is the most secure approach.

json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "exit", "..."],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
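
Once the list of needed syscalls is known, the allowlist skeleton above is mechanical to produce. The following sketch prints such a profile from a list of syscall names; `make_allowlist_profile` is a hypothetical helper and performs no validation of the names it is given.

```shell
# make_allowlist_profile SYSCALL...
# Hypothetical helper: print an allowlist profile (default deny) that
# allows exactly the syscalls named on the command line.
make_allowlist_profile() {
  local names="" n
  for n in "$@"; do
    # Build a comma-separated list of quoted names: "read", "write", ...
    names="${names:+$names, }\"$n\""
  done
  cat <<EOF
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": [$names],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
EOF
}
```

For example, `make_allowlist_profile read write exit exit_group > my-profile.json` writes a starting point that can then be extended.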

Denylist (Less Secure)

Set defaultAction to SCMP_ACT_ALLOW (allow everything by default) and explicitly block dangerous syscalls. This is easier to create but less secure: any syscall you forget to list, or that a newer kernel adds, stays allowed.

json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["ptrace", "mount", "kexec_load"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
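
When reviewing an unfamiliar profile, the defaultAction alone tells you which strategy it follows. The sketch below extracts it with `sed` and classifies the profile; both function names are hypothetical, and the text match assumes the usual `"defaultAction": "..."` layout rather than parsing JSON properly.

```shell
# profile_default_action FILE
# Hypothetical helper: print the defaultAction value found in a profile.
profile_default_action() {
  sed -n 's/.*"defaultAction"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' "$1" \
    | head -n 1
}

# profile_strategy FILE
# Classify a profile by what happens to syscalls it does not list.
profile_strategy() {
  case "$(profile_default_action "$1")" in
    SCMP_ACT_ALLOW)                              echo "denylist" ;;
    SCMP_ACT_LOG)                                echo "audit" ;;
    SCMP_ACT_ERRNO|SCMP_ACT_KILL*|SCMP_ACT_TRAP) echo "allowlist" ;;
    *)                                           echo "unknown" ;;
  esac
}
```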

Complete Profile Examples

Profile: Block All Network Syscalls

json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": [
        "socket",
        "connect",
        "bind",
        "listen",
        "accept",
        "accept4",
        "sendto",
        "recvfrom",
        "sendmsg",
        "recvmsg",
        "shutdown",
        "getsockname",
        "getpeername",
        "setsockopt",
        "getsockopt"
      ],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}

Profile: Audit Mode (Log Everything)

This profile logs every syscall that is not explicitly allowed, which is useful for discovering which syscalls your application needs:

json
{
  "defaultAction": "SCMP_ACT_LOG",
  "syscalls": [
    {
      "names": [
        "read",
        "write",
        "exit",
        "exit_group",
        "rt_sigreturn"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Profile: Strict Web Server

json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64"
  ],
  "syscalls": [
    {
      "names": [
        "accept4",
        "access",
        "arch_prctl",
        "bind",
        "brk",
        "clone",
        "close",
        "connect",
        "dup2",
        "epoll_create1",
        "epoll_ctl",
        "epoll_wait",
        "eventfd2",
        "execve",
        "exit",
        "exit_group",
        "fchown",
        "fcntl",
        "fstat",
        "futex",
        "getdents64",
        "getpid",
        "getppid",
        "getuid",
        "ioctl",
        "listen",
        "lseek",
        "madvise",
        "mmap",
        "mprotect",
        "munmap",
        "nanosleep",
        "newfstatat",
        "openat",
        "pipe2",
        "pread64",
        "prlimit64",
        "read",
        "recvfrom",
        "recvmsg",
        "rt_sigaction",
        "rt_sigprocmask",
        "rt_sigreturn",
        "sendmsg",
        "sendto",
        "set_robust_list",
        "set_tid_address",
        "setgid",
        "setgroups",
        "setuid",
        "setsockopt",
        "socket",
        "stat",
        "uname",
        "write",
        "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Deploying Custom Seccomp Profiles

Step 1: Place the Profile on the Node

bash
# SSH to the node
ssh node01

# Create the seccomp profiles directory
sudo mkdir -p /var/lib/kubelet/seccomp/profiles

# Create the profile
sudo tee /var/lib/kubelet/seccomp/profiles/block-network.json << 'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket", "connect", "bind", "listen", "accept", "accept4", "sendto", "recvfrom"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
EOF

Step 2: Create the Pod

yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-network-pod
spec:
  nodeName: node01
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "echo 'Network blocked' && sleep 3600"]
    securityContext:
      seccompProfile:
        type: Localhost
        localhostProfile: profiles/block-network.json

Step 3: Verify

bash
# Apply and check
kubectl apply -f no-network-pod.yaml
kubectl get pod no-network-pod

# Test network -- should fail
kubectl exec no-network-pod -- wget -qO- http://google.com
# Expected: wget fails because socket creation is blocked (exact message varies)

# Test file access -- should work
kubectl exec no-network-pod -- ls /
# Expected: normal output

Auditing Syscalls a Container Makes

To create an effective seccomp profile, you need to know which syscalls your application actually uses.

Method 1: Using strace

bash
# Run the workload under strace in an unconfined container.
# <image> is a placeholder for an image that contains both strace and nginx.
docker run --rm -it --security-opt seccomp=unconfined <image> \
  strace -cf -o /dev/stderr nginx -g 'daemon off;' 2>&1 | head -50

# Or trace a running container from the host
# Get the PID of the container process
docker inspect --format '{{.State.Pid}}' <container-id>
sudo strace -cf -p <pid>

Method 2: Using the Audit/Log Profile

Create a seccomp profile that logs all syscalls:

json
{
  "defaultAction": "SCMP_ACT_LOG"
}

Then check the audit log:

bash
# After running the container with the log profile
sudo grep SECCOMP /var/log/audit/audit.log
# or
sudo dmesg | grep SECCOMP

# Extract unique syscall numbers
sudo grep SECCOMP /var/log/audit/audit.log | \
  grep -oP 'syscall=\K\d+' | sort -u
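
To see which syscalls dominate rather than just which ones occur, the same log lines can be tallied. This sketch reads audit lines on standard input and uses only portable `grep -o`, so it does not depend on `grep -P`; `seccomp_syscall_counts` is a hypothetical name. The numbers are architecture-specific and still need translating to names (the `ausyscall` utility from the audit tools can do this where it is installed).

```shell
# seccomp_syscall_counts < audit.log
# Hypothetical helper: print "SYSCALL_NUMBER COUNT" for every syscall seen
# in SECCOMP audit records, most frequent first.
seccomp_syscall_counts() {
  grep 'type=SECCOMP' \
    | grep -o 'syscall=[0-9][0-9]*' \
    | cut -d= -f2 \
    | sort -n \
    | uniq -c \
    | sort -k1,1nr -k2,2n \
    | awk '{ print $2, $1 }'
}
```

Typical use would be `sudo cat /var/log/audit/audit.log | seccomp_syscall_counts`.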

Method 3: Using Security Profiles Operator

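The Security Profiles Operator (SPO) can automatically generate seccomp profiles by recording syscalls, and it distributes profiles defined as SeccompProfile resources to the nodes. A SeccompProfile resource looks like this:

yaml
apiVersion: security-profiles-operator.x-k8s.io/v1beta1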
kind: SeccompProfile
metadata:
  name: nginx-profile
  namespace: default
spec:
  defaultAction: SCMP_ACT_ERRNO
  architectures:
    - SCMP_ARCH_X86_64
  syscalls:
    - action: SCMP_ACT_ALLOW
      names:
        - accept4
        - bind
        - close
        - epoll_ctl
        - listen
        - openat
        - read
        - socket
        - write

Exam Tip

The Security Profiles Operator is not typically available in the exam environment, but understanding it demonstrates depth of knowledge. For the exam, focus on manually placing JSON profiles on nodes and referencing them via securityContext.

Common Syscalls Reference

This table lists syscalls frequently referenced in CKS exam scenarios:

Syscall | Category | Purpose | Risk if Unrestricted
mount | Filesystem | Mount filesystems | Container escape via host mounts
umount2 | Filesystem | Unmount filesystems | Disrupt host filesystem
ptrace | Process | Trace/debug processes | Read process memory, inject code
kexec_load | System | Load new kernel | Full system takeover
reboot | System | Reboot host | Denial of service
setns | Namespace | Join a namespace | Container escape
unshare | Namespace | Create new namespace | Privilege escalation
bpf | Kernel | Load BPF programs | Kernel-level code execution
init_module | Kernel | Load kernel modules | Arbitrary kernel code
clone | Process | Create child process | Namespace manipulation with flags
socket | Network | Create network socket | Network access
connect | Network | Connect to remote | Outbound network access
bind | Network | Bind to port | Inbound network access
execve | Process | Execute program | Running arbitrary binaries
open / openat | Filesystem | Open files | File access

Quick Reference

Exam Speed Reference

bash
# Create seccomp profile directory on node
sudo mkdir -p /var/lib/kubelet/seccomp/profiles

# Place profile on node
sudo cp my-profile.json /var/lib/kubelet/seccomp/profiles/

# Verify profile is readable
sudo cat /var/lib/kubelet/seccomp/profiles/my-profile.json

Pod spec for RuntimeDefault:

yaml
securityContext:
  seccompProfile:
    type: RuntimeDefault

Pod spec for custom profile:

yaml
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/my-profile.json

Minimal denylist profile (block dangerous syscalls):

json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["ptrace","mount","kexec_load","reboot","setns","unshare","bpf","init_module"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
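
A denylist is only as good as its coverage, so it can help to check a profile against the syscalls you expect it to name. The sketch below lists any syscall from the minimal denylist above that a profile file never mentions. It is a plain text search with hypothetical names throughout: it confirms a name appears in the file, not that the action attached to it actually blocks the call.

```shell
# Syscalls taken from the minimal denylist above; adjust as needed.
DANGEROUS_SYSCALLS="ptrace mount kexec_load reboot setns unshare bpf init_module"

# unlisted_dangerous_syscalls FILE
# Hypothetical helper: print each dangerous syscall whose quoted name
# does not appear anywhere in the profile file.
unlisted_dangerous_syscalls() {
  local profile="$1" name
  for name in $DANGEROUS_SYSCALLS; do
    grep -q "\"$name\"" "$profile" || echo "$name"
  done
}
```

An empty result for `unlisted_dangerous_syscalls my-profile.json` means every name on the list is present in the file.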

Key Exam Takeaways

  1. Seccomp profiles must be placed on the node at /var/lib/kubelet/seccomp/
  2. localhostProfile is relative to that directory
  3. RuntimeDefault is the quick win -- always apply it when no custom profile is needed
  4. Default Kubernetes behavior is Unconfined -- you must explicitly opt in
  5. Use defaultAction: SCMP_ACT_ERRNO with an allowlist for maximum security
  6. Use defaultAction: SCMP_ACT_LOG to audit which syscalls an app needs
  7. Seccomp and AppArmor are complementary -- use both together

Released under the MIT License.