Skip to content

Vertical Pod Autoscaler (VPA)

🎯 What is VPA?

Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests/limits of your pods based on actual usage patterns. Unlike HPA (which adds/removes pods), VPA right-sizes individual pods by modifying their resource specifications.

💡 Explanation: Think of your pods wearing clothes. HPA buys more shirts when you have more people (adds pods). VPA measures each person and tailors their shirt to fit perfectly (adjusts CPU/memory per pod). If someone grows or shrinks, VPA remeasures and makes them a new shirt.

Core Analogy: Tailoring Clothes

  • HPA: Buys more shirts when you have more people
  • VPA: Resizes each shirt to perfectly fit each person

📊 VPA vs HPA Comparison

AspectHorizontal Pod Autoscaler (HPA)Vertical Pod Autoscaler (VPA)
ScalesNumber of pod replicasCPU/Memory per pod
DirectionHorizontal (more/fewer pods)Vertical (more/fewer resources)
Best forVariable trafficPredictable resource patterns
DisruptionNone (adds/removes pods)Pod recreation required
Typical UseWeb apps, APIsDatabases, memory-heavy apps

💡 Key Difference: HPA changes quantity (pod count), VPA changes quality (resources per pod). They can work together: VPA makes each pod the right size, HPA decides how many of those right-sized pods you need.

🏗️ VPA Architecture & Components

VPA Component Diagram

┌─────────────────────────────────────────────────────────┐
│                  VPA Controller                          │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │              1. RECOMMENDER                     │   │
│  │  • Analyzes historical usage                    │   │
│  │  • Creates resource profiles                    │   │
│  │  • Suggests optimal requests/limits            │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │              2. UPDATER                         │   │
│  │  • Evicts pods needing updates                 │   │
│  │  • Only in "Auto" mode                         │   │
│  │  • Gracefully terminates pods                  │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │        3. ADMISSION CONTROLLER                  │   │
│  │  • Intercepts pod creation                      │   │
│  │  • Injects recommended resources               │   │
│  │  • Mutating webhook                            │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────┐
│               METRICS HISTORY                           │
│          (Metrics Server / Prometheus)                  │
└─────────────────────────────────────────────────────────┘

💡 How They Work Together:

  1. Recommender = The "brain" that learns what resources pods actually need
  2. Updater = The "action taker" that replaces pods when they need new sizes (only in Auto mode)
  3. Admission Controller = The "gatekeeper" that gives new pods the right resources from the start
  4. Metrics = The "memory" that stores what happened in the past

🚀 VPA Installation

bash
# Clone VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install all components
./hack/vpa-up.sh

# Verify installation
kubectl get pods -n kube-system | grep vpa

💡 What This Installs:

  • vpa-recommender: Learns and suggests resource sizes
  • vpa-updater: Takes action to resize pods (in Auto mode)
  • vpa-admission-controller: Updates new pods as they're created
  • vpa-crd: Custom Resource Definition for VPA objects

Method 2: Helm Installation

bash
# Add Helm repository
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

# Install VPA
helm install vpa fairwinds-stable/vpa \
  --namespace kube-system \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true

💡 Helm Benefits: Easier upgrades, configuration management, and clean uninstalls.

Method 3: Manifest Files

bash
# Apply individual components
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-recommender.yaml
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-updater.yaml
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-admission-controller.yaml

# Verify
kubectl get pods -n kube-system -l app=vpa

⚙️ VPA Components Explained

1. VPA Recommender

  • Purpose: Analyzes historical resource usage
  • Output: Resource recommendations stored in VPA object
  • Data Source: Metrics Server (last 8 days by default)
  • Algorithm: Uses histogram of usage patterns

💡 How It Learns: Imagine you have a pod that uses between 200-400m CPU over time. Recommender watches for 8 days, builds a histogram, and says: "This pod usually needs 300m CPU, but sometimes hits 400m. Let's recommend 350m to be safe."

2. VPA Updater

  • Purpose: Evicts pods that need resource adjustments
  • Action: Deletes pods with incorrect allocations
  • Trigger: When recommendations differ significantly from current
  • Mode: Only active in Auto mode

💡 The "Pod Killer": Updater is like a tailor who says "Your shirt doesn't fit anymore!" and makes you take it off so they can give you a new one. It doesn't resize the shirt while you're wearing it—it makes you change shirts entirely.

3. VPA Admission Controller

  • Purpose: Modifies pod specs during creation
  • How: Mutating admission webhook
  • When: Intercepts all pod creation requests
  • What: Replaces resource requests/limits with recommendations

💡 The "Birth Certificate": When a pod is born, Admission Controller checks the Recommender's notes and says: "This pod should have 350m CPU, not the 100m written in its DNA (YAML file)." It secretly changes the birth certificate before anyone notices.

📝 VPA Configuration Modes

Four Update Modes

1. Initial Mode (Most Common)

yaml
updatePolicy:
  updateMode: "Initial"
  • Behavior: Only sets resources on pod creation
  • Existing pods: Never updated
  • New pods: Get recommended resources
  • Best for: Safe production use, testing

💡 Analogy: Like a hospital that measures every newborn baby and gives them the right size diaper, but doesn't change diapers on babies who've already left the hospital.

2. Auto Mode (Aggressive)

yaml
updatePolicy:
  updateMode: "Auto"
  • Behavior: Updates resources and recreates pods
  • Existing pods: Evicted and recreated with new resources
  • Risk: Pod disruptions, potential downtime
  • Best for: Non-critical workloads, dev environments

💡 Analogy: Like a strict parent who makes you change clothes immediately if your shirt doesn't fit perfectly, even if you're in the middle of dinner.

3. Recreate Mode (Rarely Used)

yaml
updatePolicy:
  updateMode: "Recreate"
  • Behavior: Only recreates pods (no resource updates)
  • Use case: When pod recreation needed without resource changes

💡 When to Use: If you want VPA to restart pods periodically but not change their resources. Rare case.

4. Off Mode (Monitor Only)

yaml
updatePolicy:
  updateMode: "Off"
  • Behavior: Only provides recommendations
  • Action: No automatic changes
  • Best for: Learning patterns, manual optimization

💡 Analogy: Like a nutritionist who tells you "You should eat 2000 calories per day" but doesn't stop you from eating a whole pizza.

🔧 Basic VPA Configuration

Minimal VPA Example

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp-deployment
  
  updatePolicy:
    updateMode: "Initial"  # Safe mode
  
  resourcePolicy:
    containerPolicies:
    - containerName: "*"   # Apply to all containers
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
      controlledResources: ["cpu", "memory"]

💡 Breaking It Down:

  • targetRef: Which deployment/statefulset to watch (like tagging a person for measurement)
  • updateMode: "Initial": Safe mode—only fix new pods
  • containerName: "*": Apply to ALL containers in the pod (asterisk = wildcard)
  • minAllowed/maxAllowed: Safety rails so VPA doesn't recommend crazy sizes
  • controlledResources: Which resources to adjust (CPU, memory, or both)

Complete VPA with All Options

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: complete-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment       # Can be: Deployment, StatefulSet, DaemonSet
    name: my-application
  
  updatePolicy:
    updateMode: "Auto"     # Initial, Auto, Recreate, or Off
    minReplicas: 2         # Optional: Minimum pods to consider
    
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      mode: "Auto"         # Auto, Off, or Initial
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"
      controlledResources: ["cpu", "memory"]
      controlledValues: "RequestsAndLimits"  # or "RequestsOnly"
      
    - containerName: "sidecar"
      mode: "Off"          # Don't autoscale this container

💡 Advanced Options Explained:

  • minReplicas: Only start recommending if you have at least this many pods (more data = better recommendations)
  • container-specific mode: Different rules per container in same pod
  • controlledValues: "RequestsOnly" = only adjust requests, not limits (limits stay as you set them)

📊 Resource Policy Explained

Container Policies

yaml
resourcePolicy:
  containerPolicies:
  - containerName: "webapp"      # Specific container
    minAllowed:                  # Minimum VPA can recommend
      cpu: "100m"
      memory: "256Mi"
    maxAllowed:                  # Maximum VPA can recommend
      cpu: "2"
      memory: "4Gi"
    controlledResources:         # Which resources to adjust
    - "cpu"
    - "memory"
    controlledValues: "RequestsAndLimits"  # Options:
                                           # - RequestsAndLimits (default)
                                           # - RequestsOnly

💡 Why Min/Max Matters: Without these, VPA might recommend 0.001m CPU (too small to run) or 1000 CPU cores (breaks your cluster). These are guardrails.

Wildcard vs Specific Containers

yaml
# Option 1: All containers
- containerName: "*"  # Apply to EVERY container

# Option 2: Specific containers
- containerName: "app-server"   # Only this container
- containerName: "cache"        # Different policy for cache
- containerName: "logger"       # Don't set mode: "Off"

💡 Real Example: Your pod has 3 containers: app (needs lots of CPU), redis (needs lots of memory), logger (tiny, fixed needs). Use specific names to treat each differently.

🛠️ Practical VPA Examples

Example 1: Web Application

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: webapp
  
  updatePolicy:
    updateMode: "Initial"  # Safe for production
  
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "1"
        memory: "1Gi"
    - containerName: "app"
      minAllowed:
        cpu: "200m"
        memory: "256Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"

💡 Two Containers, Different Needs: Nginx (web server) vs App (application logic) have different resource patterns. VPA handles each separately.

Example 2: Database with StatefulSet

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet       # Works with StatefulSets!
    name: postgres
  
  updatePolicy:
    updateMode: "Initial"   # CRITICAL: Databases hate pod restarts!
  
  resourcePolicy:
    containerPolicies:
    - containerName: "postgres"
      minAllowed:
        cpu: "500m"
        memory: "1Gi"
      maxAllowed:
        cpu: "4"
        memory: "16Gi"

💡 StatefulSet Warning: Databases store data on disk. If VPA kills the pod in Auto mode, database might get confused. Initial mode is safer—only fixes new pods when they're created during maintenance.

Example 3: Multi-Container Microservice

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: microservice-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: order-service
  
  updatePolicy:
    updateMode: "Auto"      # Will recreate pods
    
  resourcePolicy:
    containerPolicies:
    - containerName: "order-processor"
      minAllowed:
        cpu: "200m"
        memory: "512Mi"
      maxAllowed:
        cpu: "2"
        memory: "4Gi"
      controlledValues: "RequestsAndLimits"
    
    - containerName: "redis-sidecar"
      mode: "Off"           # Don't autoscale sidecar
      # No min/max needed when mode is Off

💡 Sidecar Pattern: Redis runs alongside your app as a cache. You might not want VPA changing it because Redis has known memory patterns. mode: "Off" tells VPA to leave it alone.

🔍 How VPA Makes Recommendations

Recommendation Algorithm

VPA Recommendation Process:
1. Collect 8 days of usage data (default)
2. Build histogram of CPU/Memory usage
3. Calculate:
   - Target: 90th percentile of usage + safety margin
   - Lower Bound: 50th percentile
   - Upper Bound: 95th percentile + safety margin
4. Store in VPA object status

💡 What This Means: If your pod uses 100m, 200m, 300m, 400m CPU over time:

  • 50th percentile (Lower Bound): 250m (half the time it uses less than this)
  • 90th percentile (Target): 380m (90% of the time it uses less than this)
  • 95th percentile (Upper Bound): 390m (almost never uses more than this)

VPA recommends Target (380m) as the "right size" with safety margins.

Viewing Recommendations

bash
# Check VPA status
kubectl get vpa

# Detailed view with recommendations
kubectl describe vpa myapp-vpa

# Raw YAML with recommendations
kubectl get vpa myapp-vpa -o yaml

Sample VPA Output

bash
$ kubectl describe vpa/webapp-vpa

Status:
  Conditions:
    Status: True
    Type: RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  nginx
      Lower Bound:     # Minimum safe
        Cpu:     100m
        Memory:  128Mi
      Target:          # ⭐ RECOMMENDED VALUE ⭐
        Cpu:     350m
        Memory:  512Mi
      Uncapped Target: # Without min/max constraints
        Cpu:     420m
        Memory:  600Mi
      Upper Bound:     # Maximum safe
        Cpu:     500m
        Memory:  1Gi

💡 Reading This Output:

  • Target (350m CPU): What VPA wants to set your pod to
  • Lower Bound (100m): Below this is dangerously small
  • Upper Bound (500m): Above this is wastefully large
  • Uncapped Target (420m): What VPA would recommend if you had no min/max limits

🧪 Testing VPA Step by Step

Step 1: Create Test Deployment

yaml
# test-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vpa-test
  template:
    metadata:
      labels:
        app: vpa-test
    spec:
      containers:
      - name: test-app
        image: polinux/stress
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "50m"     # Intentionally TOO LOW
            memory: "64Mi" # Intentionally TOO LOW

💡 Setting Up the Test: We create pods with obviously wrong resources (50m CPU) so VPA has something to fix.

Step 2: Create VPA

yaml
# test-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-test-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: vpa-test
  updatePolicy:
    updateMode: "Auto"  # Will show immediate effect
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"

💡 Using Auto Mode for Testing: In real life, start with Initial. For testing, Auto shows VPA in action immediately.

Step 3: Apply and Generate Load

bash
# Apply configurations
kubectl apply -f test-deployment.yaml
kubectl apply -f test-vpa.yaml

# Generate load to help VPA learn
kubectl exec deployment/vpa-test -- stress --cpu 2 --vm 1 --vm-bytes 200M --timeout 300

# Wait for VPA to analyze (5-10 minutes)
sleep 300

# Check recommendations
kubectl describe vpa vpa-test-vpa

# Watch pods get recreated (Auto mode)
kubectl get pods -l app=vpa-test -w

💡 The Learning Process: VPA needs to see actual usage. stress command simulates load so VPA can say "Hey, this pod needs more than 50m CPU!"

🚨 VPA Troubleshooting

Common Issues & Solutions

Issue 1: VPA shows no recommendations

bash
# Check VPA components are running
kubectl get pods -n kube-system | grep vpa

# Check logs
kubectl logs -n kube-system deployment/vpa-recommender

# Verify metrics are available
kubectl top pods

# Check VPA object
kubectl describe vpa <name>

💡 Likely Causes: Metrics server not running, VPA pods crashed, or not enough time has passed (VPA needs hours of data).

Issue 2: Pods not being updated in Auto mode

bash
# Check update mode
kubectl get vpa <name> -o yaml | grep updateMode

# Check if recommendations exist
kubectl describe vpa <name> | grep -A5 "Recommendation"

# Check updater logs
kubectl logs -n kube-system deployment/vpa-updater

# Check events
kubectl get events | grep -i vpa

💡 Common Reason: Recommendations don't differ enough from current resources. VPA won't restart pods for tiny changes.

Issue 3: Admission controller not working

bash
# Check mutating webhook
kubectl get mutatingwebhookconfigurations

# Check webhook logs
kubectl logs -n kube-system deployment/vpa-admission-controller

# Test pod creation
kubectl run test --image=nginx --dry-run=client -o yaml | kubectl apply -f -

💡 Webhook Issues: Admission controller is a webhook that intercepts pod creation. If it's down, new pods won't get VPA recommendations.

Diagnostic Commands

bash
# Get all VPA resources
kubectl get vpa --all-namespaces

# Check VPA system status
kubectl get pods -n kube-system -l app=vpa
kubectl get deployments -n kube-system -l app=vpa
kubectl get services -n kube-system -l app=vpa

# Check events
kubectl get events --field-selector involvedObject.kind=VerticalPodAutoscaler
kubectl get events --sort-by=.metadata.creationTimestamp | tail -20

# Check resource usage history
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq '.items[] | select(.metadata.name | contains("your-pod"))'

VPA Best Practices

1. Start with Initial Mode

yaml
updatePolicy:
  updateMode: "Initial"  # Always start here

💡 Why: Zero risk. Existing pods keep running, only new pods get changes. Like dipping your toe in water before jumping in.

2. Set Conservative Bounds

yaml
minAllowed:
  cpu: "100m"     # Prevent too-small pods
  memory: "128Mi"
maxAllowed:
  cpu: "4"        # Prevent runaway growth
  memory: "8Gi"

💡 Safety Rails: Without bounds, VPA might recommend 0.1m CPU (pod won't start) or 100 CPU cores (bankrupts your cloud bill).

3. Monitor Before Switching to Auto

bash
# Run in Initial/Off mode for 1-2 weeks
# Check recommendations are stable
kubectl describe vpa <name> | grep -A10 "Recommendation"

# Only switch to Auto when:
# 1. Recommendations are stable for 7+ days
# 2. You understand the impact of pod recreation
# 3. Your app can handle pod restarts

💡 The 7-Day Rule: VPA needs to see weekly patterns (weekday vs weekend, business hours vs night). Don't trust day 1 recommendations.

4. Combine with HPA

yaml
# VPA optimizes resource per pod
# HPA adjusts number of pods
# Perfect combination for variable workloads

💡 Dream Team: VPA makes each pod the perfect size. HPA decides how many perfect-sized pods you need based on traffic.

5. Regular Reviews

  • Weekly: Check VPA recommendations
  • Monthly: Adjust min/max bounds if needed
  • Quarterly: Review if VPA still needed

💡 VPA Isn't Fire-and-Forget: Like a garden, it needs occasional checking. Applications change, usage patterns shift.

📈 VPA with HPA: Combined Strategy

Why Combine Both?

  • VPA: Gets resource requests right for each pod
  • HPA: Scales pod count based on those correct resources
  • Result: Optimal scaling in both dimensions

Implementation Example

yaml
# 1. Deployment with placeholder resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: combined-app
spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: app
        image: nginx
        resources:
          requests:
            cpu: "100m"    # Placeholder - VPA will fix
            memory: "256Mi" # Placeholder - VPA will fix

---
# 2. VPA for resource optimization
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: combined-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: combined-app
  updatePolicy:
    updateMode: "Initial"  # Safe mode

---
# 3. HPA for replica scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: combined-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: combined-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale based on VPA-optimized resources

Important Interaction

HPA uses: (actual usage / requested resources) × 100%

Since VPA optimizes requested resources, HPA makes better scaling decisions!

💡 Example: Your pod actually uses 300m CPU.

  • Without VPA: You guessed 500m request → HPA sees 60% usage (300/500) → No scaling
  • With VPA: VPA sets 350m request → HPA sees 85% usage (300/350) → Scales up!

VPA makes HPA's math more accurate.

⚠️ VPA Limitations & Gotchas

1. Pod Recreation Required

  • VPA cannot update running pods' resources
  • Pods must be recreated (causes brief downtime)
  • Not suitable for stateful applications that hate restarts

💡 The Shirt Problem: You can't resize a shirt while someone's wearing it. You must give them a new shirt (recreate pod).

2. Learning Period Required

  • Needs 8+ hours of metrics for good recommendations
  • Longer (24-48 hours) for stable patterns
  • Initial recommendations may be inaccurate

💡 Like a New Doctor: A doctor needs to examine you multiple times before understanding your health patterns. Day 1 diagnosis might be wrong.

3. Container Name Dependency

yaml
# VPA tracks by CONTAINER NAME
containers:
- name: "app"          # Profile tied to THIS name
- name: "app-v2"       # Different name = different profile!

💡 Name Change = New Person: If you rename container from "app" to "app-v2", VPA thinks it's a completely different container and starts learning from scratch.

4. Not for All Workloads

Avoid VPA for:

  • ❌ Short-lived Jobs (< 5 minutes)
  • ❌ StatefulSets with persistent data
  • ❌ Applications with bursty, unpredictable patterns
  • ❌ When pod recreation causes issues

💡 Wrong Tool for the Job: VPA is great for steady workloads. For spiky, unpredictable workloads, HPA is better.

5. Resource Quota Conflicts

  • VPA might recommend resources exceeding namespace quotas
  • Can cause pod creation failures
  • Monitor quotas when using VPA

💡 Quota Jail: If your namespace has 2 CPU quota and VPA wants to give one pod 3 CPU, that pod can't be created.

🔧 Advanced VPA Configuration

Custom Metrics Integration

yaml
# VPA can use Prometheus metrics
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: custom-vpa
  annotations:
    vpa.custom.metrics.prometheus.io/cpu: |
      rate(container_cpu_usage_seconds_total{container="myapp"}[5m])
    vpa.custom.metrics.prometheus.io/memory: |
      container_memory_working_set_bytes{container="myapp"}

💡 Beyond CPU/Memory: In theory, VPA could use any metric (requests per second, queue length), but CPU/memory are the standard ones.

Resource-Specific Controls

yaml
controlledValues: "RequestsOnly"  # Only adjust requests, not limits
# or
controlledValues: "RequestsAndLimits"  # Adjust both (default)

controlledResources: ["cpu"]  # Only adjust CPU, not memory
# or
controlledResources: ["cpu", "memory"]  # Adjust both (default)

💡 Requests vs Limits:

  • Requests: What Kubernetes guarantees you
  • Limits: Maximum you can use
  • Most people let VPA adjust both, but RequestsOnly is safer.

Target CPU/Memory Percentiles

yaml
# Adjust safety margins (advanced)
# These are VPA recommender flags, not in VPA spec
# Set as command-line args to recommender
spec:
  containers:
  - name: recommender
    args:
    - --cpu-histogram-decay-half-life=24h
    - --memory-histogram-decay-half-life=48h
    - --target-cpu-utilization=0.9  # 90th percentile
    - --target-memory-utilization=0.9

💡 Tuning Knobs:

  • decay-half-life: How quickly VPA "forgets" old data (24h = yesterday's data matters half as much as today's)
  • target-utilization: How aggressive to be (0.9 = 90th percentile = 10% safety margin)

📊 Monitoring VPA

Key Metrics to Track

bash
# 1. Resource optimization
kubectl describe vpa <name> | grep -A5 "Recommendation"

# 2. Pod evictions (Auto mode)
kubectl get events | grep "vpa.*evict"

# 3. Cost savings
# Compare before/after resource requests

# 4. Application performance
# Monitor app metrics after VPA changes

Prometheus Metrics (if exposed)

txt
# VPA recommender metrics
vpa_recommendation_cpu_cores
vpa_recommendation_memory_bytes
vpa_checkpoint_created_total

# VPA updater metrics
vpa_updater_evictions_total
vpa_updater_errors_total

🎯 When to Use VPA - Decision Guide

Use VPA When:

  • ✅ Memory-intensive applications (Java, .NET, Node.js)
  • ✅ Applications with growing resource needs
  • ✅ You're unsure of resource requirements
  • ✅ Cost optimization is important
  • ✅ Combined with HPA for complete autoscaling

💡 Perfect Example: A Java microservice that starts with 1GB heap but grows to need 2GB over months. VPA notices and adjusts automatically.

Don't Use VPA When:

  • ❌ Stateful applications (databases with persistent storage)
  • ❌ Short-lived batch jobs
  • ❌ When pod recreation causes business impact
  • ❌ You have precise, known resource requirements
  • ❌ Your cluster has strict resource quotas

💡 Bad Example: PostgreSQL database. If VPA kills the pod to resize it, database recovery might take minutes.

  1. Week 1: Deploy VPA in Off mode, monitor recommendations
  2. Week 2: Switch to Initial mode, verify new pods get correct resources
  3. Week 3+:: If stable, consider Auto mode for continuous optimization
  4. Ongoing: Combine with HPA, monitor monthly

💡 The VPA Journey: Off → Initial → (maybe) Auto. Like learning to drive: Parking lot (Off) → Quiet streets (Initial) → Highway (Auto, if you're brave).

📋 Quick Reference Commands

VPA Management

bash
# Basic commands
kubectl get vpa                          # List VPAs
kubectl describe vpa <name>              # Detailed view
kubectl edit vpa <name>                  # Edit VPA
kubectl delete vpa <name>                # Delete VPA

# Check components
kubectl get pods -n kube-system -l app=vpa
kubectl get deployments -n kube-system -l app=vpa
kubectl get services -n kube-system -l app=vpa

# Debugging
kubectl logs -n kube-system deployment/vpa-recommender
kubectl logs -n kube-system deployment/vpa-updater
kubectl logs -n kube-system deployment/vpa-admission-controller

Testing & Validation

bash
# Check current resource usage
kubectl top pods
kubectl describe pod <pod> | grep -A10 "Resources"

# Generate test load
kubectl exec <pod> -- stress --cpu 2 --vm 1 --vm-bytes 200M --timeout 300

# Monitor VPA actions
watch -n 5 'kubectl get vpa,pods && echo "---" && kubectl describe vpa <name> | grep -A5 "Recommendation"'

💡 Pro Tips

  1. Always set min/max bounds to prevent extreme recommendations
  2. Start with Initial mode in production
  3. Monitor for 1-2 weeks before switching to Auto
  4. Combine with HPA for complete autoscaling
  5. Regularly review recommendations - VPA isn't "set and forget"
  6. Test pod recreation impact before enabling Auto mode
  7. Use with resource quotas to prevent runaway growth

🎓 Summary: VPA in One Page

AspectRecommendationWhy
InstallationUse Helm or official manifestsClean management
Initial ModeupdateMode: "Initial"Safe start
Resource BoundsAlways set min/maxPrevent extremes
Monitoring Period1-2 weeks before trustingLearn patterns
Production UseStart with Initial, move to Auto cautiouslyAvoid surprises
Best ComboVPA + HPAComplete autoscaling
Avoid ForStateful apps, short jobsWrong tool

Remember: VPA is a powerful tool for resource optimization, but requires careful implementation and monitoring. Start small, learn patterns, and expand gradually!

Final Thought: VPA is like having a personal tailor for your pods. A good tailor measures carefully, makes small adjustments, and never ruins your favorite suit. A bad tailor cuts without measuring and leaves you with clothes that don't fit. Be a good VPA tailor.

Released under the MIT License.