Vertical Pod Autoscaler (VPA)

🎯 What is VPA?

Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests/limits of your pods based on actual usage patterns. Unlike HPA (which adds/removes pods), VPA right-sizes individual pods by modifying their resource specifications.

💡 Explanation: Think of your pods wearing clothes. HPA buys more shirts when you have more people (adds pods). VPA measures each person and tailors their shirt to fit perfectly (adjusts CPU/memory per pod). If someone grows or shrinks, VPA remeasures and makes them a new shirt.

Core Analogy: Tailoring Clothes

HPA: Buys more shirts when you have more people
VPA: Resizes each shirt to perfectly fit each person

📊 VPA vs HPA Comparison

Aspect	Horizontal Pod Autoscaler (HPA)	Vertical Pod Autoscaler (VPA)
Scales	Number of pod replicas	CPU/Memory per pod
Direction	Horizontal (more/fewer pods)	Vertical (more/fewer resources)
Best for	Variable traffic	Predictable resource patterns
Disruption	None (adds/removes pods)	Pod recreation required
Typical Use	Web apps, APIs	Databases, memory-heavy apps

💡 Key Difference: HPA changes quantity (pod count), VPA changes quality (resources per pod). They can work together: VPA makes each pod the right size, HPA decides how many of those right-sized pods you need.

🏗️ VPA Architecture & Components

VPA Component Diagram

┌─────────────────────────────────────────────────────────┐
│                  VPA Controller                          │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │              1. RECOMMENDER                     │   │
│  │  • Analyzes historical usage                    │   │
│  │  • Creates resource profiles                    │   │
│  │  • Suggests optimal requests/limits            │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │              2. UPDATER                         │   │
│  │  • Evicts pods needing updates                 │   │
│  │  • Only in "Auto" mode                         │   │
│  │  • Gracefully terminates pods                  │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  ┌─────────────────────────────────────────────────┐   │
│  │        3. ADMISSION CONTROLLER                  │   │
│  │  • Intercepts pod creation                      │   │
│  │  • Injects recommended resources               │   │
│  │  • Mutating webhook                            │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│               METRICS HISTORY                           │
│          (Metrics Server / Prometheus)                  │
└─────────────────────────────────────────────────────────┘

💡 How They Work Together:

Recommender = The "brain" that learns what resources pods actually need
Updater = The "action taker" that replaces pods when they need new sizes (only in Auto mode)
Admission Controller = The "gatekeeper" that gives new pods the right resources from the start
Metrics = The "memory" that stores what happened in the past

🚀 VPA Installation

Method 1: Official Release (Recommended)

bash

# Clone VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install all components
./hack/vpa-up.sh

# Verify installation
kubectl get pods -n kube-system | grep vpa

💡 What This Installs:

vpa-recommender: Learns and suggests resource sizes
vpa-updater: Takes action to resize pods (in Auto mode)
vpa-admission-controller: Updates new pods as they're created
vpa-crd: Custom Resource Definition for VPA objects

Method 2: Helm Installation

bash

# Add Helm repository
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update

# Install VPA
helm install vpa fairwinds-stable/vpa \
  --namespace kube-system \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true

💡 Helm Benefits: Easier upgrades, configuration management, and clean uninstalls.

Method 3: Manifest Files

bash

# Apply individual components
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-recommender.yaml
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-updater.yaml
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-admission-controller.yaml

# Verify
kubectl get pods -n kube-system -l app=vpa

⚙️ VPA Components Explained

1. VPA Recommender

Purpose: Analyzes historical resource usage
Output: Resource recommendations stored in VPA object
Data Source: Metrics Server (last 8 days by default)
Algorithm: Uses histogram of usage patterns

💡 How It Learns: Imagine you have a pod that uses between 200-400m CPU over time. Recommender watches for 8 days, builds a histogram, and says: "This pod usually needs 300m CPU, but sometimes hits 400m. Let's recommend 350m to be safe."

2. VPA Updater

Purpose: Evicts pods that need resource adjustments
Action: Deletes pods with incorrect allocations
Trigger: When recommendations differ significantly from current
Mode: Only active in Auto mode

💡 The "Pod Killer": Updater is like a tailor who says "Your shirt doesn't fit anymore!" and makes you take it off so they can give you a new one. It doesn't resize the shirt while you're wearing it—it makes you change shirts entirely.

3. VPA Admission Controller

Purpose: Modifies pod specs during creation
How: Mutating admission webhook
When: Intercepts all pod creation requests
What: Replaces resource requests/limits with recommendations

💡 The "Birth Certificate": When a pod is born, Admission Controller checks the Recommender's notes and says: "This pod should have 350m CPU, not the 100m written in its DNA (YAML file)." It secretly changes the birth certificate before anyone notices.

📝 VPA Configuration Modes

Four Update Modes

1. `Initial` Mode (Most Common)

yaml

updatePolicy:
  updateMode: "Initial"

Behavior: Only sets resources on pod creation
Existing pods: Never updated
New pods: Get recommended resources
Best for: Safe production use, testing

💡 Analogy: Like a hospital that measures every newborn baby and gives them the right size diaper, but doesn't change diapers on babies who've already left the hospital.

2. `Auto` Mode (Aggressive)

yaml

updatePolicy:
  updateMode: "Auto"

Behavior: Updates resources and recreates pods
Existing pods: Evicted and recreated with new resources
Risk: Pod disruptions, potential downtime
Best for: Non-critical workloads, dev environments

💡 Analogy: Like a strict parent who makes you change clothes immediately if your shirt doesn't fit perfectly, even if you're in the middle of dinner.

3. `Recreate` Mode (Rarely Used)

yaml

updatePolicy:
  updateMode: "Recreate"

Behavior: Only recreates pods (no resource updates)
Use case: When pod recreation needed without resource changes

💡 When to Use: If you want VPA to restart pods periodically but not change their resources. Rare case.

4. `Off` Mode (Monitor Only)

yaml

updatePolicy:
  updateMode: "Off"

Behavior: Only provides recommendations
Action: No automatic changes
Best for: Learning patterns, manual optimization

💡 Analogy: Like a nutritionist who tells you "You should eat 2000 calories per day" but doesn't stop you from eating a whole pizza.

🔧 Basic VPA Configuration

Minimal VPA Example

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp-deployment
  
  updatePolicy:
    updateMode: "Initial"  # Safe mode
  
  resourcePolicy:
    containerPolicies:
    - containerName: "*"   # Apply to all containers
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
      controlledResources: ["cpu", "memory"]

💡 Breaking It Down:

targetRef: Which deployment/statefulset to watch (like tagging a person for measurement)
updateMode: "Initial": Safe mode—only fix new pods
containerName: "*": Apply to ALL containers in the pod (asterisk = wildcard)
minAllowed/maxAllowed: Safety rails so VPA doesn't recommend crazy sizes
controlledResources: Which resources to adjust (CPU, memory, or both)

Complete VPA with All Options

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: complete-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment       # Can be: Deployment, StatefulSet, DaemonSet
    name: my-application
  
  updatePolicy:
    updateMode: "Auto"     # Initial, Auto, Recreate, or Off
    minReplicas: 2         # Optional: Minimum pods to consider
    
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      mode: "Auto"         # Auto, Off, or Initial
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"
      controlledResources: ["cpu", "memory"]
      controlledValues: "RequestsAndLimits"  # or "RequestsOnly"
      
    - containerName: "sidecar"
      mode: "Off"          # Don't autoscale this container

💡 Advanced Options Explained:

minReplicas: Only start recommending if you have at least this many pods (more data = better recommendations)
container-specific mode: Different rules per container in same pod
controlledValues: "RequestsOnly" = only adjust requests, not limits (limits stay as you set them)

📊 Resource Policy Explained

Container Policies

yaml

resourcePolicy:
  containerPolicies:
  - containerName: "webapp"      # Specific container
    minAllowed:                  # Minimum VPA can recommend
      cpu: "100m"
      memory: "256Mi"
    maxAllowed:                  # Maximum VPA can recommend
      cpu: "2"
      memory: "4Gi"
    controlledResources:         # Which resources to adjust
    - "cpu"
    - "memory"
    controlledValues: "RequestsAndLimits"  # Options:
                                           # - RequestsAndLimits (default)
                                           # - RequestsOnly

💡 Why Min/Max Matters: Without these, VPA might recommend 0.001m CPU (too small to run) or 1000 CPU cores (breaks your cluster). These are guardrails.

Wildcard vs Specific Containers

yaml

# Option 1: All containers
- containerName: "*"  # Apply to EVERY container

# Option 2: Specific containers
- containerName: "app-server"   # Only this container
- containerName: "cache"        # Different policy for cache
- containerName: "logger"       # Don't set mode: "Off"

💡 Real Example: Your pod has 3 containers: app (needs lots of CPU), redis (needs lots of memory), logger (tiny, fixed needs). Use specific names to treat each differently.

🛠️ Practical VPA Examples

Example 1: Web Application

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: webapp
  
  updatePolicy:
    updateMode: "Initial"  # Safe for production
  
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "1"
        memory: "1Gi"
    - containerName: "app"
      minAllowed:
        cpu: "200m"
        memory: "256Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"

💡 Two Containers, Different Needs: Nginx (web server) vs App (application logic) have different resource patterns. VPA handles each separately.

Example 2: Database with StatefulSet

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet       # Works with StatefulSets!
    name: postgres
  
  updatePolicy:
    updateMode: "Initial"   # CRITICAL: Databases hate pod restarts!
  
  resourcePolicy:
    containerPolicies:
    - containerName: "postgres"
      minAllowed:
        cpu: "500m"
        memory: "1Gi"
      maxAllowed:
        cpu: "4"
        memory: "16Gi"

💡 StatefulSet Warning: Databases store data on disk. If VPA kills the pod in Auto mode, database might get confused. Initial mode is safer—only fixes new pods when they're created during maintenance.

Example 3: Multi-Container Microservice

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: microservice-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: order-service
  
  updatePolicy:
    updateMode: "Auto"      # Will recreate pods
    
  resourcePolicy:
    containerPolicies:
    - containerName: "order-processor"
      minAllowed:
        cpu: "200m"
        memory: "512Mi"
      maxAllowed:
        cpu: "2"
        memory: "4Gi"
      controlledValues: "RequestsAndLimits"
    
    - containerName: "redis-sidecar"
      mode: "Off"           # Don't autoscale sidecar
      # No min/max needed when mode is Off

💡 Sidecar Pattern: Redis runs alongside your app as a cache. You might not want VPA changing it because Redis has known memory patterns. mode: "Off" tells VPA to leave it alone.

🔍 How VPA Makes Recommendations

Recommendation Algorithm

VPA Recommendation Process:
1. Collect 8 days of usage data (default)
2. Build histogram of CPU/Memory usage
3. Calculate:
   - Target: 90th percentile of usage + safety margin
   - Lower Bound: 50th percentile
   - Upper Bound: 95th percentile + safety margin
4. Store in VPA object status

💡 What This Means: If your pod uses 100m, 200m, 300m, 400m CPU over time:

50th percentile (Lower Bound): 250m (half the time it uses less than this)
90th percentile (Target): 380m (90% of the time it uses less than this)
95th percentile (Upper Bound): 390m (almost never uses more than this)

VPA recommends Target (380m) as the "right size" with safety margins.

Viewing Recommendations

bash

# Check VPA status
kubectl get vpa

# Detailed view with recommendations
kubectl describe vpa myapp-vpa

# Raw YAML with recommendations
kubectl get vpa myapp-vpa -o yaml

Sample VPA Output

bash

$ kubectl describe vpa/webapp-vpa

Status:
  Conditions:
    Status: True
    Type: RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  nginx
      Lower Bound:     # Minimum safe
        Cpu:     100m
        Memory:  128Mi
      Target:          # ⭐ RECOMMENDED VALUE ⭐
        Cpu:     350m
        Memory:  512Mi
      Uncapped Target: # Without min/max constraints
        Cpu:     420m
        Memory:  600Mi
      Upper Bound:     # Maximum safe
        Cpu:     500m
        Memory:  1Gi

💡 Reading This Output:

Target (350m CPU): What VPA wants to set your pod to
Lower Bound (100m): Below this is dangerously small
Upper Bound (500m): Above this is wastefully large
Uncapped Target (420m): What VPA would recommend if you had no min/max limits

🧪 Testing VPA Step by Step

Step 1: Create Test Deployment

yaml

# test-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vpa-test
  template:
    metadata:
      labels:
        app: vpa-test
    spec:
      containers:
      - name: test-app
        image: polinux/stress
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "50m"     # Intentionally TOO LOW
            memory: "64Mi" # Intentionally TOO LOW

💡 Setting Up the Test: We create pods with obviously wrong resources (50m CPU) so VPA has something to fix.

Step 2: Create VPA

yaml

# test-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-test-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: vpa-test
  updatePolicy:
    updateMode: "Auto"  # Will show immediate effect
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"

💡 Using Auto Mode for Testing: In real life, start with Initial. For testing, Auto shows VPA in action immediately.

Step 3: Apply and Generate Load

bash

# Apply configurations
kubectl apply -f test-deployment.yaml
kubectl apply -f test-vpa.yaml

# Generate load to help VPA learn
kubectl exec deployment/vpa-test -- stress --cpu 2 --vm 1 --vm-bytes 200M --timeout 300

# Wait for VPA to analyze (5-10 minutes)
sleep 300

# Check recommendations
kubectl describe vpa vpa-test-vpa

# Watch pods get recreated (Auto mode)
kubectl get pods -l app=vpa-test -w

💡 The Learning Process: VPA needs to see actual usage. stress command simulates load so VPA can say "Hey, this pod needs more than 50m CPU!"

🚨 VPA Troubleshooting

Common Issues & Solutions

Issue 1: VPA shows no recommendations

bash

# Check VPA components are running
kubectl get pods -n kube-system | grep vpa

# Check logs
kubectl logs -n kube-system deployment/vpa-recommender

# Verify metrics are available
kubectl top pods

# Check VPA object
kubectl describe vpa <name>

💡 Likely Causes: Metrics server not running, VPA pods crashed, or not enough time has passed (VPA needs hours of data).

Issue 2: Pods not being updated in Auto mode

bash

# Check update mode
kubectl get vpa <name> -o yaml | grep updateMode

# Check if recommendations exist
kubectl describe vpa <name> | grep -A5 "Recommendation"

# Check updater logs
kubectl logs -n kube-system deployment/vpa-updater

# Check events
kubectl get events | grep -i vpa

💡 Common Reason: Recommendations don't differ enough from current resources. VPA won't restart pods for tiny changes.

Issue 3: Admission controller not working

bash

# Check mutating webhook
kubectl get mutatingwebhookconfigurations

# Check webhook logs
kubectl logs -n kube-system deployment/vpa-admission-controller

# Test pod creation
kubectl run test --image=nginx --dry-run=client -o yaml | kubectl apply -f -

💡 Webhook Issues: Admission controller is a webhook that intercepts pod creation. If it's down, new pods won't get VPA recommendations.

Diagnostic Commands

bash

# Get all VPA resources
kubectl get vpa --all-namespaces

# Check VPA system status
kubectl get pods -n kube-system -l app=vpa
kubectl get deployments -n kube-system -l app=vpa
kubectl get services -n kube-system -l app=vpa

# Check events
kubectl get events --field-selector involvedObject.kind=VerticalPodAutoscaler
kubectl get events --sort-by=.metadata.creationTimestamp | tail -20

# Check resource usage history
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq '.items[] | select(.metadata.name | contains("your-pod"))'

⚡ VPA Best Practices

1. Start with `Initial` Mode

yaml

updatePolicy:
  updateMode: "Initial"  # Always start here

💡 Why: Zero risk. Existing pods keep running, only new pods get changes. Like dipping your toe in water before jumping in.

2. Set Conservative Bounds

yaml

minAllowed:
  cpu: "100m"     # Prevent too-small pods
  memory: "128Mi"
maxAllowed:
  cpu: "4"        # Prevent runaway growth
  memory: "8Gi"

💡 Safety Rails: Without bounds, VPA might recommend 0.1m CPU (pod won't start) or 100 CPU cores (bankrupts your cloud bill).

3. Monitor Before Switching to Auto

bash

# Run in Initial/Off mode for 1-2 weeks
# Check recommendations are stable
kubectl describe vpa <name> | grep -A10 "Recommendation"

# Only switch to Auto when:
# 1. Recommendations are stable for 7+ days
# 2. You understand the impact of pod recreation
# 3. Your app can handle pod restarts

💡 The 7-Day Rule: VPA needs to see weekly patterns (weekday vs weekend, business hours vs night). Don't trust day 1 recommendations.

4. Combine with HPA

yaml

# VPA optimizes resource per pod
# HPA adjusts number of pods
# Perfect combination for variable workloads

💡 Dream Team: VPA makes each pod the perfect size. HPA decides how many perfect-sized pods you need based on traffic.

5. Regular Reviews

Weekly: Check VPA recommendations
Monthly: Adjust min/max bounds if needed
Quarterly: Review if VPA still needed

💡 VPA Isn't Fire-and-Forget: Like a garden, it needs occasional checking. Applications change, usage patterns shift.

📈 VPA with HPA: Combined Strategy

Why Combine Both?

VPA: Gets resource requests right for each pod
HPA: Scales pod count based on those correct resources
Result: Optimal scaling in both dimensions

Implementation Example

yaml

# 1. Deployment with placeholder resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: combined-app
spec:
  replicas: 2
  template:
    spec:
      containers:
      - name: app
        image: nginx
        resources:
          requests:
            cpu: "100m"    # Placeholder - VPA will fix
            memory: "256Mi" # Placeholder - VPA will fix

---
# 2. VPA for resource optimization
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: combined-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: combined-app
  updatePolicy:
    updateMode: "Initial"  # Safe mode

---
# 3. HPA for replica scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: combined-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: combined-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale based on VPA-optimized resources

Important Interaction

HPA uses: (actual usage / requested resources) × 100%

Since VPA optimizes requested resources, HPA makes better scaling decisions!

💡 Example: Your pod actually uses 300m CPU.

Without VPA: You guessed 500m request → HPA sees 60% usage (300/500) → No scaling
With VPA: VPA sets 350m request → HPA sees 85% usage (300/350) → Scales up!

VPA makes HPA's math more accurate.

⚠️ VPA Limitations & Gotchas

1. Pod Recreation Required

VPA cannot update running pods' resources
Pods must be recreated (causes brief downtime)
Not suitable for stateful applications that hate restarts

💡 The Shirt Problem: You can't resize a shirt while someone's wearing it. You must give them a new shirt (recreate pod).

2. Learning Period Required

Needs 8+ hours of metrics for good recommendations
Longer (24-48 hours) for stable patterns
Initial recommendations may be inaccurate

💡 Like a New Doctor: A doctor needs to examine you multiple times before understanding your health patterns. Day 1 diagnosis might be wrong.

3. Container Name Dependency

yaml

# VPA tracks by CONTAINER NAME
containers:
- name: "app"          # Profile tied to THIS name
- name: "app-v2"       # Different name = different profile!

💡 Name Change = New Person: If you rename container from "app" to "app-v2", VPA thinks it's a completely different container and starts learning from scratch.

4. Not for All Workloads

Avoid VPA for:

❌ Short-lived Jobs (< 5 minutes)
❌ StatefulSets with persistent data
❌ Applications with bursty, unpredictable patterns
❌ When pod recreation causes issues

💡 Wrong Tool for the Job: VPA is great for steady workloads. For spiky, unpredictable workloads, HPA is better.

5. Resource Quota Conflicts

VPA might recommend resources exceeding namespace quotas
Can cause pod creation failures
Monitor quotas when using VPA

💡 Quota Jail: If your namespace has 2 CPU quota and VPA wants to give one pod 3 CPU, that pod can't be created.

🔧 Advanced VPA Configuration

Custom Metrics Integration

yaml

# VPA can use Prometheus metrics
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: custom-vpa
  annotations:
    vpa.custom.metrics.prometheus.io/cpu: |
      rate(container_cpu_usage_seconds_total{container="myapp"}[5m])
    vpa.custom.metrics.prometheus.io/memory: |
      container_memory_working_set_bytes{container="myapp"}

💡 Beyond CPU/Memory: In theory, VPA could use any metric (requests per second, queue length), but CPU/memory are the standard ones.

Resource-Specific Controls

yaml

controlledValues: "RequestsOnly"  # Only adjust requests, not limits
# or
controlledValues: "RequestsAndLimits"  # Adjust both (default)

controlledResources: ["cpu"]  # Only adjust CPU, not memory
# or
controlledResources: ["cpu", "memory"]  # Adjust both (default)

💡 Requests vs Limits:

Requests: What Kubernetes guarantees you
Limits: Maximum you can use
Most people let VPA adjust both, but RequestsOnly is safer.

Target CPU/Memory Percentiles

yaml

# Adjust safety margins (advanced)
# These are VPA recommender flags, not in VPA spec
# Set as command-line args to recommender
spec:
  containers:
  - name: recommender
    args:
    - --cpu-histogram-decay-half-life=24h
    - --memory-histogram-decay-half-life=48h
    - --target-cpu-utilization=0.9  # 90th percentile
    - --target-memory-utilization=0.9

💡 Tuning Knobs:

decay-half-life: How quickly VPA "forgets" old data (24h = yesterday's data matters half as much as today's)
target-utilization: How aggressive to be (0.9 = 90th percentile = 10% safety margin)

📊 Monitoring VPA

Key Metrics to Track

bash

# 1. Resource optimization
kubectl describe vpa <name> | grep -A5 "Recommendation"

# 2. Pod evictions (Auto mode)
kubectl get events | grep "vpa.*evict"

# 3. Cost savings
# Compare before/after resource requests

# 4. Application performance
# Monitor app metrics after VPA changes

Prometheus Metrics (if exposed)

txt

# VPA recommender metrics
vpa_recommendation_cpu_cores
vpa_recommendation_memory_bytes
vpa_checkpoint_created_total

# VPA updater metrics
vpa_updater_evictions_total
vpa_updater_errors_total

🎯 When to Use VPA - Decision Guide

Use VPA When:

✅ Memory-intensive applications (Java, .NET, Node.js)
✅ Applications with growing resource needs
✅ You're unsure of resource requirements
✅ Cost optimization is important
✅ Combined with HPA for complete autoscaling

💡 Perfect Example: A Java microservice that starts with 1GB heap but grows to need 2GB over months. VPA notices and adjusts automatically.

Don't Use VPA When:

❌ Stateful applications (databases with persistent storage)
❌ Short-lived batch jobs
❌ When pod recreation causes business impact
❌ You have precise, known resource requirements
❌ Your cluster has strict resource quotas

💡 Bad Example: PostgreSQL database. If VPA kills the pod to resize it, database recovery might take minutes.

Recommended VPA Strategy

Week 1: Deploy VPA in Off mode, monitor recommendations
Week 2: Switch to Initial mode, verify new pods get correct resources
Week 3+:: If stable, consider Auto mode for continuous optimization
Ongoing: Combine with HPA, monitor monthly

💡 The VPA Journey: Off → Initial → (maybe) Auto. Like learning to drive: Parking lot (Off) → Quiet streets (Initial) → Highway (Auto, if you're brave).

📋 Quick Reference Commands

VPA Management

bash

# Basic commands
kubectl get vpa                          # List VPAs
kubectl describe vpa <name>              # Detailed view
kubectl edit vpa <name>                  # Edit VPA
kubectl delete vpa <name>                # Delete VPA

# Check components
kubectl get pods -n kube-system -l app=vpa
kubectl get deployments -n kube-system -l app=vpa
kubectl get services -n kube-system -l app=vpa

# Debugging
kubectl logs -n kube-system deployment/vpa-recommender
kubectl logs -n kube-system deployment/vpa-updater
kubectl logs -n kube-system deployment/vpa-admission-controller

Testing & Validation

bash

# Check current resource usage
kubectl top pods
kubectl describe pod <pod> | grep -A10 "Resources"

# Generate test load
kubectl exec <pod> -- stress --cpu 2 --vm 1 --vm-bytes 200M --timeout 300

# Monitor VPA actions
watch -n 5 'kubectl get vpa,pods && echo "---" && kubectl describe vpa <name> | grep -A5 "Recommendation"'

💡 Pro Tips

Always set min/max bounds to prevent extreme recommendations
Start with Initial mode in production
Monitor for 1-2 weeks before switching to Auto
Combine with HPA for complete autoscaling
Regularly review recommendations - VPA isn't "set and forget"
Test pod recreation impact before enabling Auto mode
Use with resource quotas to prevent runaway growth

🎓 Summary: VPA in One Page

Aspect	Recommendation	Why
Installation	Use Helm or official manifests	Clean management
Initial Mode	`updateMode: "Initial"`	Safe start
Resource Bounds	Always set min/max	Prevent extremes
Monitoring Period	1-2 weeks before trusting	Learn patterns
Production Use	Start with `Initial`, move to `Auto` cautiously	Avoid surprises
Best Combo	VPA + HPA	Complete autoscaling
Avoid For	Stateful apps, short jobs	Wrong tool

Remember: VPA is a powerful tool for resource optimization, but requires careful implementation and monitoring. Start small, learn patterns, and expand gradually!

Final Thought: VPA is like having a personal tailor for your pods. A good tailor measures carefully, makes small adjustments, and never ruins your favorite suit. A bad tailor cuts without measuring and leaves you with clothes that don't fit. Be a good VPA tailor.

Vertical Pod Autoscaler (VPA) ​

🎯 What is VPA? ​

Core Analogy: Tailoring Clothes ​

📊 VPA vs HPA Comparison ​

🏗️ VPA Architecture & Components ​

VPA Component Diagram ​

🚀 VPA Installation ​

Method 1: Official Release (Recommended) ​

Method 2: Helm Installation ​

Method 3: Manifest Files ​

⚙️ VPA Components Explained ​

1. VPA Recommender ​

2. VPA Updater ​

3. VPA Admission Controller ​

📝 VPA Configuration Modes ​

Four Update Modes ​

1. Initial Mode (Most Common) ​

2. Auto Mode (Aggressive) ​

3. Recreate Mode (Rarely Used) ​

4. Off Mode (Monitor Only) ​

🔧 Basic VPA Configuration ​

Minimal VPA Example ​

Complete VPA with All Options ​

📊 Resource Policy Explained ​

Container Policies ​

Wildcard vs Specific Containers ​

🛠️ Practical VPA Examples ​

Example 1: Web Application ​

Example 2: Database with StatefulSet ​

Example 3: Multi-Container Microservice ​

🔍 How VPA Makes Recommendations ​

Recommendation Algorithm ​

Viewing Recommendations ​

Sample VPA Output ​

🧪 Testing VPA Step by Step ​

Step 1: Create Test Deployment ​

Step 2: Create VPA ​

Step 3: Apply and Generate Load ​

🚨 VPA Troubleshooting ​

Common Issues & Solutions ​

Issue 1: VPA shows no recommendations ​

Issue 2: Pods not being updated in Auto mode ​

Issue 3: Admission controller not working ​

Diagnostic Commands ​

⚡ VPA Best Practices ​

1. Start with Initial Mode ​

2. Set Conservative Bounds ​

3. Monitor Before Switching to Auto ​

4. Combine with HPA ​

5. Regular Reviews ​

📈 VPA with HPA: Combined Strategy ​

Why Combine Both? ​

Implementation Example ​

Important Interaction ​

⚠️ VPA Limitations & Gotchas ​

1. Pod Recreation Required ​

2. Learning Period Required ​

3. Container Name Dependency ​

4. Not for All Workloads ​

5. Resource Quota Conflicts ​

🔧 Advanced VPA Configuration ​

Custom Metrics Integration ​

Resource-Specific Controls ​

Target CPU/Memory Percentiles ​

📊 Monitoring VPA ​

Key Metrics to Track ​

Prometheus Metrics (if exposed) ​

🎯 When to Use VPA - Decision Guide ​

Use VPA When: ​

Don't Use VPA When: ​

Recommended VPA Strategy ​

📋 Quick Reference Commands ​

VPA Management ​

Testing & Validation ​

💡 Pro Tips ​

🎓 Summary: VPA in One Page ​

Vertical Pod Autoscaler (VPA)

🎯 What is VPA?

Core Analogy: Tailoring Clothes

📊 VPA vs HPA Comparison

🏗️ VPA Architecture & Components

VPA Component Diagram

🚀 VPA Installation

Method 1: Official Release (Recommended)

Method 2: Helm Installation

Method 3: Manifest Files

⚙️ VPA Components Explained

1. VPA Recommender

2. VPA Updater

3. VPA Admission Controller

📝 VPA Configuration Modes

Four Update Modes

1. `Initial` Mode (Most Common)

2. `Auto` Mode (Aggressive)

3. `Recreate` Mode (Rarely Used)

4. `Off` Mode (Monitor Only)

🔧 Basic VPA Configuration

Minimal VPA Example

Complete VPA with All Options

📊 Resource Policy Explained

Container Policies

Wildcard vs Specific Containers

🛠️ Practical VPA Examples

Example 1: Web Application

Example 2: Database with StatefulSet

Example 3: Multi-Container Microservice

🔍 How VPA Makes Recommendations

Recommendation Algorithm

Viewing Recommendations

Sample VPA Output

🧪 Testing VPA Step by Step

Step 1: Create Test Deployment

Step 2: Create VPA

Step 3: Apply and Generate Load

🚨 VPA Troubleshooting

Common Issues & Solutions

Issue 1: VPA shows no recommendations

Issue 2: Pods not being updated in Auto mode

Issue 3: Admission controller not working

Diagnostic Commands

⚡ VPA Best Practices

1. Start with `Initial` Mode

2. Set Conservative Bounds

3. Monitor Before Switching to Auto

4. Combine with HPA

5. Regular Reviews

📈 VPA with HPA: Combined Strategy

Why Combine Both?

Implementation Example

Important Interaction

⚠️ VPA Limitations & Gotchas

1. Pod Recreation Required

2. Learning Period Required

3. Container Name Dependency

4. Not for All Workloads

5. Resource Quota Conflicts

🔧 Advanced VPA Configuration

Custom Metrics Integration

Resource-Specific Controls

Target CPU/Memory Percentiles

📊 Monitoring VPA

Key Metrics to Track

Prometheus Metrics (if exposed)

🎯 When to Use VPA - Decision Guide

Use VPA When:

Don't Use VPA When:

Recommended VPA Strategy

📋 Quick Reference Commands

VPA Management

Testing & Validation

💡 Pro Tips

🎓 Summary: VPA in One Page