Seccomp Profiles
What Is Seccomp?
Seccomp (Secure Computing Mode) is a Linux kernel feature that restricts the system calls (syscalls) a process can make. Every interaction between a userspace program and the kernel happens through syscalls -- file operations, network access, process management, and more. Seccomp allows you to define a filter that specifies exactly which syscalls are permitted.
There are over 300 syscalls in the Linux kernel, but most applications need only a small subset. By blocking unused syscalls, seccomp dramatically reduces the kernel attack surface available to a compromised container.
Seccomp vs AppArmor
- AppArmor restricts what resources a process can access (files, network, capabilities)
- Seccomp restricts which kernel interfaces (syscalls) a process can invoke
- They are complementary -- use both for defense in depth
- Seccomp operates at a lower level, closer to the kernel
How Seccomp Works
Seccomp Profile Types in Kubernetes
Kubernetes supports three seccomp profile types:
| Type | Description | Security Level |
|---|---|---|
| RuntimeDefault | Uses the container runtime's built-in default profile | Good baseline security |
| Localhost | Uses a custom profile stored on the node | Maximum control |
| Unconfined | No seccomp filtering applied | No protection |
Default Behavior
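By default (without any seccomp configuration), Kubernetes runs containers as Unconfined -- no syscall filtering is applied. The exception is a node whose kubelet is started with the --seccomp-default flag, which makes RuntimeDefault the default for workloads on that node. Do not rely on node configuration: always explicitly set a seccomp profile.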
The Default Docker/containerd Seccomp Profile
The container runtime (Docker, containerd, CRI-O) ships with a default seccomp profile that blocks approximately 44 syscalls out of 300+. This profile is a sensible baseline that blocks dangerous syscalls while allowing most applications to function normally.
Syscalls Blocked by the Default Profile
| Blocked Syscall | Why It's Dangerous |
|---|---|
| kexec_load | Load a new kernel -- full system compromise |
| mount | Mount filesystems -- escape container boundaries |
| umount2 | Unmount filesystems |
| ptrace | Trace/debug processes -- read memory, inject code (Docker's default profile blocks it only on kernels older than 4.8) |
| reboot | Reboot the host system |
| setns | Join another namespace -- container escape |
| unshare | Create new namespaces |
| clone (with CLONE_NEWUSER) | Create user namespaces |
| keyctl | Kernel keyring manipulation |
| bpf | Load BPF programs into kernel |
| add_key | Add keys to kernel keyring |
| request_key | Request keys from kernel keyring |
| init_module | Load kernel modules |
| delete_module | Unload kernel modules |
| acct | Process accounting -- information disclosure |
| swapon / swapoff | Modify swap -- DoS potential |
Syscalls Allowed by the Default Profile
Common syscalls that applications need and the default profile allows:
| Allowed Syscall | Purpose |
|---|---|
| read / write | File and network I/O |
| open / openat | Open files |
| close | Close file descriptors |
| stat / fstat | File metadata |
| mmap / mprotect | Memory management |
| brk | Heap management |
| socket / connect / bind | Network operations |
| fork / execve | Process creation |
| getpid / getuid | Process/user info |
| ioctl | Device control |
Applying Seccomp via SecurityContext
Using RuntimeDefault
The simplest and most common approach -- applies the container runtime's built-in seccomp profile:
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-runtime-default
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      seccompProfile:
        type: RuntimeDefault

Pod-Level vs Container-Level
You can set seccomp at the pod level (applies to all containers) or at the container level (applies to specific containers). Container-level settings override pod-level.
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # Pod level -- applies to all containers
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      seccompProfile:
        type: Localhost  # Container level -- overrides pod level
        localhostProfile: profiles/custom.json

Using a Custom Localhost Profile
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-custom
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      seccompProfile:
        type: Localhost
        localhostProfile: profiles/nginx-strict.json

Localhost Profile Path
The localhostProfile path is relative to the kubelet's configured seccomp profile root directory, which defaults to:
/var/lib/kubelet/seccomp/
So profiles/nginx-strict.json resolves to:
/var/lib/kubelet/seccomp/profiles/nginx-strict.json

Using Unconfined (Not Recommended)
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-unconfined
spec:
  containers:
  - name: app
    image: nginx:1.27
    securityContext:
      seccompProfile:
        type: Unconfined

Creating Custom Seccomp Profiles
Custom seccomp profiles are JSON files that define exactly which syscalls to allow or deny.
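A structural check can catch typos before a profile is copied to a node. The following is a minimal sketch, not the validation the kubelet or the container runtime performs: validate_profile is a hypothetical helper, and it only knows the fields (defaultAction, syscalls, names, action) and the actions used in this document.

```python
import json

# Actions covered in this document; real runtimes accept a few more.
KNOWN_ACTIONS = {
    "SCMP_ACT_ALLOW", "SCMP_ACT_ERRNO", "SCMP_ACT_KILL",
    "SCMP_ACT_KILL_PROCESS", "SCMP_ACT_TRAP", "SCMP_ACT_LOG",
}


def is_known_action(value):
    """True if value is one of the action strings listed above."""
    return isinstance(value, str) and value in KNOWN_ACTIONS


def validate_profile(text):
    """Return a list of problems found in a seccomp profile given as JSON text."""
    try:
        profile = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(profile, dict):
        return ["top level must be a JSON object"]

    problems = []
    if not is_known_action(profile.get("defaultAction")):
        problems.append("defaultAction is missing or not a known action")

    rules = profile.get("syscalls", [])
    if not isinstance(rules, list):
        return problems + ["syscalls must be a list"]

    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"syscalls[{i}]: must be an object")
            continue
        names = rule.get("names")
        if (not isinstance(names, list) or not names
                or not all(isinstance(n, str) for n in names)):
            problems.append(f"syscalls[{i}]: names must be a non-empty list of strings")
        if not is_known_action(rule.get("action")):
            problems.append(f"syscalls[{i}]: action is missing or not a known action")
    return problems
```

An empty list means the profile has the expected shape. It does not mean the listed syscalls are sufficient for the application.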
JSON Profile Structure
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X32"
  ],
  "syscalls": [
    {
      "names": [
        "accept4",
        "bind",
        "close",
        "connect",
        "epoll_create1",
        "epoll_ctl",
        "epoll_wait",
        "execve",
        "exit",
        "exit_group",
        "fstat",
        "futex",
        "getpid",
        "getuid",
        "listen",
        "mmap",
        "mprotect",
        "munmap",
        "openat",
        "read",
        "recvfrom",
        "rt_sigaction",
        "sendto",
        "socket",
        "write"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Profile Actions
| Action | Behavior |
|---|---|
| SCMP_ACT_ALLOW | Allow the syscall |
| SCMP_ACT_ERRNO | Block the syscall, return error to caller |
| SCMP_ACT_KILL | Kill the calling thread (same as SCMP_ACT_KILL_THREAD) |
| SCMP_ACT_KILL_PROCESS | Kill the entire process (not just thread) |
| SCMP_ACT_TRAP | Send SIGSYS signal to the calling thread |
| SCMP_ACT_LOG | Allow the syscall but log it (for auditing) |
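
How defaultAction and the per-rule action combine can be sketched in a few lines. This is a simplified model under stated assumptions, not the kernel's or the runtime's actual matching logic: action_for is a hypothetical helper, the first rule naming a syscall wins, and argument filters are ignored.

```python
def action_for(profile, syscall):
    """Return the action a profile assigns to a syscall name.

    Simplified model: the first rule that names the syscall wins,
    otherwise defaultAction applies. Argument filters are ignored.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


allowlist = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write", "exit"], "action": "SCMP_ACT_ALLOW"}],
}
denylist = {
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [{"names": ["ptrace", "mount", "kexec_load"], "action": "SCMP_ACT_ERRNO"}],
}

# io_uring_setup stands in for a syscall that neither profile mentions.
for name in ("read", "ptrace", "io_uring_setup"):
    print(f"{name:15} {action_for(allowlist, name):16} {action_for(denylist, name)}")
```

The last row is the one to look at: a syscall that no rule mentions falls through to defaultAction, so it is blocked under a default of SCMP_ACT_ERRNO and allowed under a default of SCMP_ACT_ALLOW.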
Strategy: Allowlist vs Denylist
Allowlist (Recommended)
Set defaultAction to SCMP_ACT_ERRNO (block everything by default) and explicitly allow needed syscalls. This is the most secure approach.
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "exit", "..."],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Denylist (Less Secure)
Set defaultAction to SCMP_ACT_ALLOW (allow everything by default) and explicitly block dangerous syscalls. Easier to create but less secure.
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["ptrace", "mount", "kexec_load"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}

Complete Profile Examples
Profile: Block All Network Syscalls
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": [
        "socket",
        "connect",
        "bind",
        "listen",
        "accept",
        "accept4",
        "sendto",
        "recvfrom",
        "sendmsg",
        "recvmsg",
        "shutdown",
        "getsockname",
        "getpeername",
        "setsockopt",
        "getsockopt"
      ],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}

Profile: Audit Mode (Log Everything)
This is useful for discovering which syscalls your application needs:
{
  "defaultAction": "SCMP_ACT_LOG",
  "syscalls": [
    {
      "names": [
        "read",
        "write",
        "exit",
        "exit_group",
        "rt_sigreturn"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Profile: Strict Web Server
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64"
  ],
  "syscalls": [
    {
      "names": [
        "accept4",
        "access",
        "arch_prctl",
        "bind",
        "brk",
        "clone",
        "close",
        "connect",
        "dup2",
        "epoll_create1",
        "epoll_ctl",
        "epoll_wait",
        "eventfd2",
        "execve",
        "exit",
        "exit_group",
        "fchown",
        "fcntl",
        "fstat",
        "futex",
        "getdents64",
        "getpid",
        "getppid",
        "getuid",
        "ioctl",
        "listen",
        "lseek",
        "madvise",
        "mmap",
        "mprotect",
        "munmap",
        "nanosleep",
        "newfstatat",
        "openat",
        "pipe2",
        "pread64",
        "prlimit64",
        "read",
        "recvfrom",
        "recvmsg",
        "rt_sigaction",
        "rt_sigprocmask",
        "rt_sigreturn",
        "sendmsg",
        "sendto",
        "set_robust_list",
        "set_tid_address",
        "setgid",
        "setgroups",
        "setuid",
        "setsockopt",
        "socket",
        "stat",
        "uname",
        "write",
        "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Deploying Custom Seccomp Profiles
Step 1: Place the Profile on the Node
# SSH to the node
ssh node01
# Create the seccomp profiles directory
sudo mkdir -p /var/lib/kubelet/seccomp/profiles
# Create the profile
sudo tee /var/lib/kubelet/seccomp/profiles/block-network.json << 'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket", "connect", "bind", "listen", "accept", "accept4", "sendto", "recvfrom"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
EOF

Step 2: Create the Pod
apiVersion: v1
kind: Pod
metadata:
  name: no-network-pod
spec:
  nodeName: node01
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "echo 'Network blocked' && sleep 3600"]
    securityContext:
      seccompProfile:
        type: Localhost
        localhostProfile: profiles/block-network.json

Step 3: Verify
# Apply and check
kubectl apply -f no-network-pod.yaml
kubectl get pod no-network-pod
# Test network -- should fail
kubectl exec no-network-pod -- wget -qO- http://google.com
# Expected: wget fails with an error (the exact message depends on which blocked syscall it hits first)
# Test file access -- should work
kubectl exec no-network-pod -- ls /
# Expected: normal output

Auditing Syscalls a Container Makes
To create an effective seccomp profile, you need to know which syscalls your application actually uses.
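Once the list of syscalls is known (the methods below show ways to collect it), turning it into an allowlist profile is mechanical. The following is a minimal sketch: build_allowlist_profile is a hypothetical helper, and the default architecture is an assumption that matches the x86_64 examples in this document.

```python
import json


def build_allowlist_profile(syscall_names, architectures=("SCMP_ARCH_X86_64",)):
    """Build an allowlist profile: block by default, allow only the given names."""
    names = sorted(set(syscall_names))  # de-duplicate and keep the output stable
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(architectures),
        "syscalls": [{"names": names, "action": "SCMP_ACT_ALLOW"}],
    }


# Hypothetical observation list; a real application will need many more.
observed = ["read", "write", "openat", "read", "close", "exit_group"]
print(json.dumps(build_allowlist_profile(observed), indent=2))
```

A profile generated this way is only as complete as the observation run, so exercise startup, shutdown, and error paths before relying on it.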
Method 1: Using strace
# Run the container with strace
# Replace <image> with an image that contains both strace and nginx
docker run --rm -it --security-opt seccomp=unconfined \
  <image> \
  strace -cf -o /dev/stderr nginx -g 'daemon off;' 2>&1 | head -50
# Or trace a running container
# Get the PID of the container process
docker inspect --format '{{.State.Pid}}' <container-id>
strace -cf -p <pid>

Method 2: Using the Audit/Log Profile
Create a seccomp profile that logs all syscalls:
{
  "defaultAction": "SCMP_ACT_LOG"
}

Then check the audit log:
# After running the container with the log profile
sudo grep SECCOMP /var/log/audit/audit.log
# or
sudo dmesg | grep SECCOMP
# Extract unique syscall numbers
sudo grep SECCOMP /var/log/audit/audit.log | \
  grep -oP 'syscall=\K\d+' | sort -u

Method 3: Using Security Profiles Operator
The Security Profiles Operator (SPO) manages seccomp profiles as Kubernetes resources and can also generate them by recording the syscalls a workload makes. A SeccompProfile resource looks like this:
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: SeccompProfile
metadata:
  name: nginx-profile
  namespace: default
spec:
  defaultAction: SCMP_ACT_ERRNO
  architectures:
  - SCMP_ARCH_X86_64
  syscalls:
  - action: SCMP_ACT_ALLOW
    names:
    - accept4
    - bind
    - close
    - epoll_ctl
    - listen
    - openat
    - read
    - socket
    - write

Exam Tip
The Security Profiles Operator is not typically available in the exam environment, but understanding it demonstrates depth of knowledge. For the exam, focus on manually placing JSON profiles on nodes and referencing them via securityContext.
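The audit log from Method 2 records syscall numbers, not names, so the numbers still have to be translated before they can go into a profile. The sketch below shows one way to do that under stated assumptions: syscalls_from_audit is a hypothetical helper, the sample records are made up in the general shape of SECCOMP audit lines, and the number table is a small hand-picked x86_64 subset. On a real node the full mapping can come from the ausyscall tool or the kernel headers.

```python
import re

# Partial x86_64 syscall table, for illustration only.
X86_64_SYSCALLS = {
    0: "read", 1: "write", 3: "close", 9: "mmap", 41: "socket",
    42: "connect", 59: "execve", 231: "exit_group", 257: "openat",
}

SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)")


def syscalls_from_audit(lines):
    """Return (sorted names, sorted unknown numbers) seen in SECCOMP records."""
    names, unknown = set(), set()
    for line in lines:
        if "SECCOMP" not in line:
            continue  # skip other audit record types
        match = SYSCALL_FIELD.search(line)
        if match is None:
            continue
        number = int(match.group(1))
        if number in X86_64_SYSCALLS:
            names.add(X86_64_SYSCALLS[number])
        else:
            unknown.add(number)  # needs a lookup in the full table
    return sorted(names), sorted(unknown)


# Made-up sample records; real lines carry more fields.
sample = [
    'type=SECCOMP msg=audit(1700000000.000:101): pid=4242 comm="nginx" arch=c000003e syscall=257 compat=0',
    'type=SECCOMP msg=audit(1700000000.001:102): pid=4242 comm="nginx" arch=c000003e syscall=41 compat=0',
    'type=SYSCALL msg=audit(1700000000.002:103): arch=c000003e syscall=59 success=yes',
    'type=SECCOMP msg=audit(1700000000.003:104): pid=4242 comm="nginx" arch=c000003e syscall=9999 compat=0',
]
names, unknown = syscalls_from_audit(sample)
print("names:", names)
print("unknown numbers:", unknown)
```

Syscall numbers differ between architectures, so a table like this is only valid for the architecture the records came from.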
Seccomp Syscall Filtering Architecture
Common Syscalls Reference
This table lists syscalls frequently referenced in CKS exam scenarios:
| Syscall | Category | Purpose | Risk if Unrestricted |
|---|---|---|---|
| mount | Filesystem | Mount filesystems | Container escape via host mounts |
| umount2 | Filesystem | Unmount filesystems | Disrupt host filesystem |
| ptrace | Process | Trace/debug processes | Read process memory, inject code |
| kexec_load | System | Load new kernel | Full system takeover |
| reboot | System | Reboot host | Denial of service |
| setns | Namespace | Join a namespace | Container escape |
| unshare | Namespace | Create new namespace | Privilege escalation |
| bpf | Kernel | Load BPF programs | Kernel-level code execution |
| init_module | Kernel | Load kernel modules | Arbitrary kernel code |
| clone | Process | Create child process | Namespace manipulation with flags |
| socket | Network | Create network socket | Network access |
| connect | Network | Connect to remote | Outbound network access |
| bind | Network | Bind to port | Inbound network access |
| execve | Process | Execute program | Running arbitrary binaries |
| open / openat | Filesystem | Open files | File access |
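
The table can double as a checklist when reviewing a profile. The sketch below reports which of the table's host-level syscalls a profile leaves callable. It rests on assumptions: permitted_high_risk is a hypothetical helper, the HIGH_RISK selection is a judgment call drawn from the table (the everyday syscalls such as socket or execve are left out), and rule matching is simplified to "first rule naming the syscall wins".

```python
# Selection from the table above; adjust to your own threat model.
HIGH_RISK = [
    "mount", "umount2", "ptrace", "kexec_load", "reboot",
    "setns", "unshare", "bpf", "init_module",
]

# Actions that let the syscall proceed (SCMP_ACT_LOG logs but still allows).
NON_BLOCKING = {"SCMP_ACT_ALLOW", "SCMP_ACT_LOG"}


def permitted_high_risk(profile):
    """Return the HIGH_RISK syscalls that the profile does not block."""
    permitted = []
    for name in HIGH_RISK:
        action = profile["defaultAction"]
        for rule in profile.get("syscalls", []):
            if name in rule.get("names", []):
                action = rule["action"]
                break
        if action in NON_BLOCKING:
            permitted.append(name)
    return permitted


# The minimal denylist profile from the Quick Reference section.
quick_denylist = {
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {
            "names": ["ptrace", "mount", "kexec_load", "reboot",
                      "setns", "unshare", "bpf", "init_module"],
            "action": "SCMP_ACT_ERRNO",
        }
    ],
}
print(permitted_high_risk(quick_denylist))
```

Run against the Quick Reference denylist, the check reports umount2, which that list does not name. Gaps like this are easy to miss in a denylist because anything unlisted is allowed.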
Quick Reference
Exam Speed Reference
# Create seccomp profile directory on node
sudo mkdir -p /var/lib/kubelet/seccomp/profiles
# Place profile on node
sudo cp my-profile.json /var/lib/kubelet/seccomp/profiles/
# Verify profile is readable
sudo cat /var/lib/kubelet/seccomp/profiles/my-profile.json

Pod spec for RuntimeDefault:
securityContext:
  seccompProfile:
    type: RuntimeDefault

Pod spec for custom profile:
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/my-profile.json

Minimal denylist profile (block dangerous syscalls):
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["ptrace","mount","kexec_load","reboot","setns","unshare","bpf","init_module"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}

Key Exam Takeaways
- Seccomp profiles must be placed on the node at /var/lib/kubelet/seccomp/
- localhostProfile is relative to that directory
- RuntimeDefault is the quick win -- always apply it when no custom profile is needed
- Default Kubernetes behavior is Unconfined -- you must explicitly opt in
- Use defaultAction: SCMP_ACT_ERRNO with an allowlist for maximum security
- Use defaultAction: SCMP_ACT_LOG to audit which syscalls an app needs
- Seccomp and AppArmor are complementary -- use both together
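
Several of these rules (container-level overrides pod-level, localhostProfile is relative to the kubelet seccomp directory, nothing set means Unconfined) can be combined into one small check. This is a sketch under stated assumptions, not how the kubelet resolves profiles: effective_seccomp is a hypothetical helper, it takes the spec part of a Pod as a plain dict, it ignores init and ephemeral containers, and it assumes the default behavior described in this document.

```python
import posixpath

SECCOMP_ROOT = "/var/lib/kubelet/seccomp"  # kubelet default named in this document


def effective_seccomp(pod_spec):
    """Map each container name to the seccomp setting that applies to it."""
    pod_profile = (pod_spec.get("securityContext") or {}).get("seccompProfile")
    result = {}
    for container in pod_spec.get("containers", []):
        own = (container.get("securityContext") or {}).get("seccompProfile")
        profile = own or pod_profile  # container level overrides pod level
        if profile is None:
            result[container["name"]] = "Unconfined (nothing set)"
        elif profile["type"] == "Localhost":
            # localhostProfile is relative to the kubelet seccomp directory
            result[container["name"]] = posixpath.join(
                SECCOMP_ROOT, profile["localhostProfile"])
        else:
            result[container["name"]] = profile["type"]
    return result


spec = {
    "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
    "containers": [
        {"name": "app", "securityContext": {"seccompProfile": {
            "type": "Localhost", "localhostProfile": "profiles/custom.json"}}},
        {"name": "sidecar"},  # inherits the pod-level setting
    ],
}
for name, setting in effective_seccomp(spec).items():
    print(name, "->", setting)
```

Any container reported as "Unconfined (nothing set)" is one where the explicit opt-in is still missing.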