Vertical Pod Autoscaler (VPA)
🎯 What is VPA?
Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests/limits of your pods based on actual usage patterns. Unlike HPA (which adds/removes pods), VPA right-sizes individual pods by modifying their resource specifications.
💡 Explanation: Think of your pods wearing clothes. HPA buys more shirts when you have more people (adds pods). VPA measures each person and tailors their shirt to fit perfectly (adjusts CPU/memory per pod). If someone grows or shrinks, VPA remeasures and makes them a new shirt.
Core Analogy: Tailoring Clothes
- HPA: Buys more shirts when you have more people
- VPA: Resizes each shirt to perfectly fit each person
📊 VPA vs HPA Comparison
| Aspect | Horizontal Pod Autoscaler (HPA) | Vertical Pod Autoscaler (VPA) |
|---|---|---|
| Scales | Number of pod replicas | CPU/Memory per pod |
| Direction | Horizontal (more/fewer pods) | Vertical (more/fewer resources) |
| Best for | Variable traffic | Predictable resource patterns |
| Disruption | Minimal (adds/removes whole pods) | Pod recreation required |
| Typical Use | Web apps, APIs | Databases, memory-heavy apps |
💡 Key Difference: HPA changes quantity (pod count), VPA changes quality (resources per pod). They can work together: VPA makes each pod the right size, HPA decides how many of those right-sized pods you need.
🏗️ VPA Architecture & Components
VPA Component Diagram
┌─────────────────────────────────────────────────────────┐
│ VPA Controller │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ 1. RECOMMENDER │ │
│ │ • Analyzes historical usage │ │
│ │ • Creates resource profiles │ │
│ │ • Suggests optimal requests/limits │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ 2. UPDATER │ │
│ │ • Evicts pods needing updates │ │
│ │ • Only in "Auto" mode │ │
│ │ • Gracefully terminates pods │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ 3. ADMISSION CONTROLLER │ │
│ │ • Intercepts pod creation │ │
│ │ • Injects recommended resources │ │
│ │ • Mutating webhook │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ METRICS HISTORY │
│ (Metrics Server / Prometheus) │
└─────────────────────────────────────────────────────────┘
💡 How They Work Together:
- Recommender = The "brain" that learns what resources pods actually need
- Updater = The "action taker" that replaces pods when they need new sizes (only in Auto mode)
- Admission Controller = The "gatekeeper" that gives new pods the right resources from the start
- Metrics = The "memory" that stores what happened in the past
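Each of these roles runs as its own Deployment once VPA is installed (next section). A quick sketch of how to peek at them, assuming the deployment names used by the official install scripts:
# One Deployment per role in kube-system
kubectl get deploy -n kube-system vpa-recommender vpa-updater vpa-admission-controller
# Follow the "brain" as it computes recommendations
kubectl logs -n kube-system deployment/vpa-recommender --tail=20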
🚀 VPA Installation
Method 1: Official Release (Recommended)
# Clone VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
# Install all components
./hack/vpa-up.sh
# Verify installation
kubectl get pods -n kube-system | grep vpa
💡 What This Installs:
- vpa-recommender: Learns and suggests resource sizes
- vpa-updater: Takes action to resize pods (in Auto mode)
- vpa-admission-controller: Updates new pods as they're created
- VPA CRDs: Custom Resource Definitions for VPA objects
Method 2: Helm Installation
# Add Helm repository
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
# Install VPA
helm install vpa fairwinds-stable/vpa \
--namespace kube-system \
--set recommender.enabled=true \
--set updater.enabled=true \
--set admissionController.enabled=true
💡 Helm Benefits: Easier upgrades, configuration management, and clean uninstalls.
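For instance, upgrades and removal stay one-liners (a sketch using the release name vpa from the install command above):
# Upgrade in place, keeping the values set at install time
helm upgrade vpa fairwinds-stable/vpa --namespace kube-system --reuse-values
# Clean uninstall of all VPA components
helm uninstall vpa --namespace kube-system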
Method 3: Manifest Files
# Apply individual components
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-recommender.yaml
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-updater.yaml
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler-admission-controller.yaml
# Verify
kubectl get pods -n kube-system -l app=vpa
⚙️ VPA Components Explained
1. VPA Recommender
- Purpose: Analyzes historical resource usage
- Output: Resource recommendations stored in VPA object
- Data Source: Metrics Server (the recommender keeps ~8 days of usage history by default)
- Algorithm: Uses histogram of usage patterns
💡 How It Learns: Imagine you have a pod that uses between 200-400m CPU over time. Recommender watches for 8 days, builds a histogram, and says: "This pod usually needs 300m CPU, but sometimes hits 400m. Let's recommend 350m to be safe."
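The learned histograms aren't hidden away: the recommender periodically persists them as VerticalPodAutoscalerCheckpoint objects, which you can inspect. A sketch (the checkpoint name shown is hypothetical; they're typically named after the VPA and container):
# Checkpoints store the decayed usage histograms the recommender has built
kubectl get verticalpodautoscalercheckpoints --all-namespaces
# Inspect one checkpoint's raw histogram data (name is an example)
kubectl get verticalpodautoscalercheckpoint myapp-vpa-app -o yaml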
2. VPA Updater
- Purpose: Evicts pods that need resource adjustments
- Action: Deletes pods with incorrect allocations
- Trigger: When recommendations differ significantly from current
- Mode: Only active in "Auto" mode
💡 The "Pod Killer": Updater is like a tailor who says "Your shirt doesn't fit anymore!" and makes you take it off so they can give you a new one. It doesn't resize the shirt while you're wearing it—it makes you change shirts entirely.
3. VPA Admission Controller
- Purpose: Modifies pod specs during creation
- How: Mutating admission webhook
- When: Intercepts all pod creation requests
- What: Replaces resource requests/limits with recommendations
💡 The "Birth Certificate": When a pod is born, Admission Controller checks the Recommender's notes and says: "This pod should have 350m CPU, not the 100m written in its DNA (YAML file)." It secretly changes the birth certificate before anyone notices.
📝 VPA Configuration Modes
Four Update Modes
1. Initial Mode (Most Common)
updatePolicy:
  updateMode: "Initial"
- Behavior: Only sets resources on pod creation
- Existing pods: Never updated
- New pods: Get recommended resources
- Best for: Safe production use, testing
💡 Analogy: Like a hospital that measures every newborn baby and gives them the right size diaper, but doesn't change diapers on babies who've already left the hospital.
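A practical consequence: in Initial mode you choose when recommendations land, by forcing pod recreation during a maintenance window (deployment name is a placeholder):
# New pods created by the rollout pick up the current recommendation
kubectl rollout restart deployment/myapp-deployment
kubectl rollout status deployment/myapp-deployment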
2. Auto Mode (Aggressive)
updatePolicy:
  updateMode: "Auto"
- Behavior: Updates resources and recreates pods
- Existing pods: Evicted and recreated with new resources
- Risk: Pod disruptions, potential downtime
- Best for: Non-critical workloads, dev environments
💡 Analogy: Like a strict parent who makes you change clothes immediately if your shirt doesn't fit perfectly, even if you're in the middle of dinner.
3. Recreate Mode (Rarely Used)
updatePolicy:
  updateMode: "Recreate"
- Behavior: Assigns resources at pod creation and evicts running pods to apply new recommendations
- Use case: Pinning eviction-based updates explicitly; today Auto behaves the same way, but Auto is reserved to adopt in-place, restart-free updates once Kubernetes supports them
💡 When to Use: Pick Recreate only if you specifically want pods evicted and recreated whenever recommendations change. Rare case.
4. Off Mode (Monitor Only)
updatePolicy:
  updateMode: "Off"
- Behavior: Only provides recommendations
- Action: No automatic changes
- Best for: Learning patterns, manual optimization
💡 Analogy: Like a nutritionist who tells you "You should eat 2000 calories per day" but doesn't stop you from eating a whole pizza.
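A minimal Off-mode VPA for observation only might look like this (names are placeholders):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa-observe
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp-deployment
  updatePolicy:
    updateMode: "Off"    # recommendations only, nothing is changed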
🔧 Basic VPA Configuration
Minimal VPA Example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp-deployment
  updatePolicy:
    updateMode: "Initial"  # Safe mode
  resourcePolicy:
    containerPolicies:
    - containerName: "*"   # Apply to all containers
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
      controlledResources: ["cpu", "memory"]
💡 Breaking It Down:
- targetRef: Which Deployment/StatefulSet to watch (like tagging a person for measurement)
- updateMode: "Initial": Safe mode—only fix new pods
- containerName: "*": Apply to ALL containers in the pod (asterisk = wildcard)
- minAllowed/maxAllowed: Safety rails so VPA doesn't recommend crazy sizes
- controlledResources: Which resources to adjust (CPU, memory, or both)
Complete VPA with All Options
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: complete-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment   # Can be: Deployment, StatefulSet, DaemonSet
    name: my-application
  updatePolicy:
    updateMode: "Auto"  # Initial, Auto, Recreate, or Off
    minReplicas: 2      # Optional: updater evicts only if this many replicas stay alive
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      mode: "Auto"      # Auto or Off
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"
      controlledResources: ["cpu", "memory"]
      controlledValues: "RequestsAndLimits"  # or "RequestsOnly"
    - containerName: "sidecar"
      mode: "Off"       # Don't autoscale this container
💡 Advanced Options Explained:
- minReplicas: An availability guard for the updater—it only evicts a pod when at least this many replicas will remain running
- container-specific mode: Different rules per container in the same pod (Auto or Off)
- controlledValues: "RequestsOnly" = only adjust requests, not limits (limits stay as you set them)
📊 Resource Policy Explained
Container Policies
resourcePolicy:
  containerPolicies:
  - containerName: "webapp"   # Specific container
    minAllowed:               # Minimum VPA can recommend
      cpu: "100m"
      memory: "256Mi"
    maxAllowed:               # Maximum VPA can recommend
      cpu: "2"
      memory: "4Gi"
    controlledResources:      # Which resources to adjust
    - "cpu"
    - "memory"
    controlledValues: "RequestsAndLimits"  # Options:
                                           # - RequestsAndLimits (default)
                                           # - RequestsOnly
💡 Why Min/Max Matters: Without these, VPA might recommend a few millicores of CPU (too small to run) or 1000 CPU cores (breaks your cluster). These are guardrails.
Wildcard vs Specific Containers
# Option 1: All containers
- containerName: "*"           # Apply to EVERY container
# Option 2: Specific containers
- containerName: "app-server"  # Only this container
- containerName: "cache"       # Different policy for cache
- containerName: "logger"
  mode: "Off"                  # Don't autoscale the logger
💡 Real Example: Your pod has 3 containers: app (needs lots of CPU), redis (needs lots of memory), logger (tiny, fixed needs). Use specific names to treat each differently.
🛠️ Practical VPA Examples
Example 1: Web Application
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: webapp
  updatePolicy:
    updateMode: "Initial"  # Safe for production
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "1"
        memory: "1Gi"
    - containerName: "app"
      minAllowed:
        cpu: "200m"
        memory: "256Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
💡 Two Containers, Different Needs: Nginx (web server) and the app (application logic) have different resource patterns. VPA handles each separately.
Example 2: Database with StatefulSet
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet  # Works with StatefulSets!
    name: postgres
  updatePolicy:
    updateMode: "Initial"  # CRITICAL: Databases hate pod restarts!
  resourcePolicy:
    containerPolicies:
    - containerName: "postgres"
      minAllowed:
        cpu: "500m"
        memory: "1Gi"
      maxAllowed:
        cpu: "4"
        memory: "16Gi"
💡 StatefulSet Warning: Databases store data on disk. If VPA kills the pod in Auto mode, the database may need recovery time. Initial mode is safer—it only sizes new pods as they are created, for example during maintenance.
Example 3: Multi-Container Microservice
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: microservice-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: order-service
  updatePolicy:
    updateMode: "Auto"  # Will recreate pods
  resourcePolicy:
    containerPolicies:
    - containerName: "order-processor"
      minAllowed:
        cpu: "200m"
        memory: "512Mi"
      maxAllowed:
        cpu: "2"
        memory: "4Gi"
      controlledValues: "RequestsAndLimits"
    - containerName: "redis-sidecar"
      mode: "Off"  # Don't autoscale this sidecar
      # No min/max needed when mode is Off
💡 Sidecar Pattern: Redis runs alongside your app as a cache. You might not want VPA changing it because Redis has known memory patterns. mode: "Off" tells VPA to leave it alone.
🔍 How VPA Makes Recommendations
Recommendation Algorithm
VPA Recommendation Process:
1. Collect 8 days of usage data (default)
2. Build histogram of CPU/Memory usage
3. Calculate:
- Target: 90th percentile of usage + safety margin
- Lower Bound: 50th percentile
- Upper Bound: 95th percentile + safety margin
4. Store in VPA object status
💡 What This Means: If your pod uses 100m, 200m, 300m, 400m CPU over time:
- 50th percentile (Lower Bound): 250m (half the time it uses less than this)
- 90th percentile (Target): 380m (90% of the time it uses less than this)
- 95th percentile (Upper Bound): 390m (almost never uses more than this)
VPA recommends Target (380m) as the "right size" with safety margins.
Viewing Recommendations
# Check VPA status
kubectl get vpa
# Detailed view with recommendations
kubectl describe vpa myapp-vpa
# Raw YAML with recommendations
kubectl get vpa myapp-vpa -o yaml
Sample VPA Output
$ kubectl describe vpa/webapp-vpa
Status:
  Conditions:
    Status:  True
    Type:    RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  nginx
      Lower Bound:       # Minimum safe
        Cpu:     100m
        Memory:  128Mi
      Target:            # ⭐ RECOMMENDED VALUE ⭐
        Cpu:     350m
        Memory:  512Mi
      Uncapped Target:   # Without min/max constraints
        Cpu:     420m
        Memory:  600Mi
      Upper Bound:       # Maximum safe
        Cpu:     500m
        Memory:  1Gi
💡 Reading This Output:
- Target (350m CPU): What VPA wants to set your pod to
- Lower Bound (100m): Below this is dangerously small
- Upper Bound (500m): Above this is wastefully large
- Uncapped Target (420m): What VPA would recommend if you had no min/max limits
🧪 Testing VPA Step by Step
Step 1: Create Test Deployment
# test-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vpa-test
  template:
    metadata:
      labels:
        app: vpa-test
    spec:
      containers:
      - name: test-app
        image: polinux/stress
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "50m"      # Intentionally TOO LOW
            memory: "64Mi"  # Intentionally TOO LOW
💡 Setting Up the Test: We create pods with obviously wrong resources (50m CPU) so VPA has something to fix.
Step 2: Create VPA
# test-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-test-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: vpa-test
  updatePolicy:
    updateMode: "Auto"  # Will show immediate effect
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"
💡 Using Auto Mode for Testing: In real life, start with Initial. For testing, Auto shows VPA in action immediately.
Step 3: Apply and Generate Load
# Apply configurations
kubectl apply -f test-deployment.yaml
kubectl apply -f test-vpa.yaml
# Generate load to help VPA learn
kubectl exec deployment/vpa-test -- stress --cpu 2 --vm 1 --vm-bytes 200M --timeout 300
# Wait for VPA to analyze (5-10 minutes)
sleep 300
# Check recommendations
kubectl describe vpa vpa-test-vpa
# Watch pods get recreated (Auto mode)
kubectl get pods -l app=vpa-test -w
💡 The Learning Process: VPA needs to see actual usage. The stress command simulates load so VPA can say "Hey, this pod needs more than 50m CPU!"
🚨 VPA Troubleshooting
Common Issues & Solutions
Issue 1: VPA shows no recommendations
# Check VPA components are running
kubectl get pods -n kube-system | grep vpa
# Check logs
kubectl logs -n kube-system deployment/vpa-recommender
# Verify metrics are available
kubectl top pods
# Check VPA object
kubectl describe vpa <name>
💡 Likely Causes: Metrics server not running, VPA pods crashed, or not enough time has passed (VPA needs hours of data).
Issue 2: Pods not being updated in Auto mode
# Check update mode
kubectl get vpa <name> -o yaml | grep updateMode
# Check if recommendations exist
kubectl describe vpa <name> | grep -A5 "Recommendation"
# Check updater logs
kubectl logs -n kube-system deployment/vpa-updater
# Check events
kubectl get events | grep -i vpa
💡 Common Reason: Recommendations don't differ enough from current resources. VPA won't restart pods for tiny changes.
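A quick way to see the gap for yourself, using the test app from this guide:
# Current requests on a running pod
kubectl get pod -l app=vpa-test -o jsonpath='{.items[0].spec.containers[0].resources.requests}'
# What VPA wants instead
kubectl get vpa vpa-test-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'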
Issue 3: Admission controller not working
# Check mutating webhook
kubectl get mutatingwebhookconfigurations
# Check webhook logs
kubectl logs -n kube-system deployment/vpa-admission-controller
# Test pod creation
kubectl run test --image=nginx --dry-run=client -o yaml | kubectl apply -f -
💡 Webhook Issues: The admission controller is a mutating webhook that intercepts pod creation. If it's down, new pods won't get VPA recommendations.
Diagnostic Commands
# Get all VPA resources
kubectl get vpa --all-namespaces
# Check VPA system status
kubectl get pods -n kube-system -l app=vpa
kubectl get deployments -n kube-system -l app=vpa
kubectl get services -n kube-system -l app=vpa
# Check events
kubectl get events --field-selector involvedObject.kind=VerticalPodAutoscaler
kubectl get events --sort-by=.metadata.creationTimestamp | tail -20
# Check resource usage history
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq '.items[] | select(.metadata.name | contains("your-pod"))'
⚡ VPA Best Practices
1. Start with Initial Mode
updatePolicy:
  updateMode: "Initial"  # Always start here
💡 Why: Zero risk. Existing pods keep running, only new pods get changes. Like dipping your toe in water before jumping in.
2. Set Conservative Bounds
minAllowed:
  cpu: "100m"      # Prevent too-small pods
  memory: "128Mi"
maxAllowed:
  cpu: "4"         # Prevent runaway growth
  memory: "8Gi"
💡 Safety Rails: Without bounds, VPA might recommend a few millicores of CPU (the pod can't run) or 100 CPU cores (bankrupts your cloud bill).
3. Monitor Before Switching to Auto
# Run in Initial/Off mode for 1-2 weeks
# Check recommendations are stable
kubectl describe vpa <name> | grep -A10 "Recommendation"
# Only switch to Auto when:
# 1. Recommendations are stable for 7+ days
# 2. You understand the impact of pod recreation
# 3. Your app can handle pod restarts
💡 The 7-Day Rule: VPA needs to see weekly patterns (weekday vs weekend, business hours vs night). Don't trust day 1 recommendations.
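Once recommendations have settled, the mode switch itself is a one-line patch (VPA name is a placeholder):
# Flip from Initial to Auto without re-applying the full manifest
kubectl patch vpa myapp-vpa --type merge -p '{"spec":{"updatePolicy":{"updateMode":"Auto"}}}'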
4. Combine with HPA
# VPA optimizes resource per pod
# HPA adjusts number of pods
# Perfect combination for variable workloads
💡 Dream Team: VPA makes each pod the perfect size. HPA decides how many perfect-sized pods you need based on traffic.
5. Regular Reviews
- Weekly: Check VPA recommendations
- Monthly: Adjust min/max bounds if needed
- Quarterly: Review if VPA still needed
💡 VPA Isn't Fire-and-Forget: Like a garden, it needs occasional checking. Applications change, usage patterns shift.
📈 VPA with HPA: Combined Strategy
Why Combine Both?
- VPA: Gets resource requests right for each pod
- HPA: Scales pod count based on those correct resources
- Result: Optimal scaling in both dimensions
Implementation Example
# 1. Deployment with placeholder resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: combined-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: combined-app
  template:
    metadata:
      labels:
        app: combined-app
    spec:
      containers:
      - name: app
        image: nginx
        resources:
          requests:
            cpu: "100m"      # Placeholder - VPA will fix
            memory: "256Mi"  # Placeholder - VPA will fix
---
# 2. VPA for resource optimization
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: combined-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: combined-app
  updatePolicy:
    updateMode: "Initial"  # Safe mode
---
# 3. HPA for replica scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: combined-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: combined-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale based on VPA-optimized requests
Important Interaction
HPA uses: (actual usage / requested resources) × 100%
Since VPA optimizes requested resources, HPA makes better scaling decisions!
💡 Example: Your pod actually uses 300m CPU.
- Without VPA: You guessed 500m request → HPA sees 60% usage (300/500) → No scaling
- With VPA: VPA sets 350m request → HPA sees ~86% usage (300/350) → Scales up!
VPA makes HPA's math more accurate. One caveat from the upstream VPA docs: don't let HPA and VPA both act on the same resource metric (CPU or memory) in Auto mode, or the two controllers can fight each other. Pairing VPA in Initial mode with a CPU-based HPA (as above), or driving HPA from custom/external metrics, avoids the conflict.
⚠️ VPA Limitations & Gotchas
1. Pod Recreation Required
- VPA cannot update running pods' resources
- Pods must be recreated (causes brief downtime)
- Not suitable for stateful applications that hate restarts
💡 The Shirt Problem: You can't resize a shirt while someone's wearing it. You must give them a new shirt (recreate pod).
2. Learning Period Required
- Needs 8+ hours of metrics for good recommendations
- Longer (24-48 hours) for stable patterns
- Initial recommendations may be inaccurate
💡 Like a New Doctor: A doctor needs to examine you multiple times before understanding your health patterns. Day 1 diagnosis might be wrong.
3. Container Name Dependency
# VPA tracks by CONTAINER NAME
containers:
- name: "app"     # Profile tied to THIS name
- name: "app-v2"  # Different name = different profile!
💡 Name Change = New Person: If you rename a container from "app" to "app-v2", VPA thinks it's a completely different container and starts learning from scratch.
4. Not for All Workloads
Avoid VPA for:
- ❌ Short-lived Jobs (< 5 minutes)
- ❌ StatefulSets with persistent data
- ❌ Applications with bursty, unpredictable patterns
- ❌ When pod recreation causes issues
💡 Wrong Tool for the Job: VPA is great for steady workloads. For spiky, unpredictable workloads, HPA is better.
5. Resource Quota Conflicts
- VPA might recommend resources exceeding namespace quotas
- Can cause pod creation failures
- Monitor quotas when using VPA
💡 Quota Jail: If your namespace has 2 CPU quota and VPA wants to give one pod 3 CPU, that pod can't be created.
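Worth checking before enabling VPA in a quota-constrained namespace (namespace and VPA names are examples):
# Compare namespace quota headroom against VPA's upper-bound recommendations
kubectl describe resourcequota -n my-namespace
kubectl describe vpa myapp-vpa | grep -A5 "Upper Bound"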
🔧 Advanced VPA Configuration
Custom Metrics Integration
# Illustrative only: VPA does not read custom-metric annotations like these.
# What it does support is using Prometheus as the recommender's history
# provider, via recommender flags (e.g. --storage=prometheus).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: custom-vpa
  annotations:
    # Hypothetical annotations, shown for documentation purposes only
    vpa.custom.metrics.prometheus.io/cpu: |
      rate(container_cpu_usage_seconds_total{container="myapp"}[5m])
    vpa.custom.metrics.prometheus.io/memory: |
      container_memory_working_set_bytes{container="myapp"}
💡 Beyond CPU/Memory: In theory, VPA could use any metric (requests per second, queue length), but CPU and memory are the only resources it actually adjusts.
Resource-Specific Controls
controlledValues: "RequestsOnly" # Only adjust requests, not limits
# or
controlledValues: "RequestsAndLimits" # Adjust both (default)
controlledResources: ["cpu"] # Only adjust CPU, not memory
# or
controlledResources: ["cpu", "memory"] # Adjust both (default)💡 Requests vs Limits:
- Requests: What Kubernetes guarantees you
- Limits: Maximum you can use
- Most people let VPA adjust both, but
RequestsOnlyis safer.
Target CPU/Memory Percentiles
# Adjust safety margins (advanced)
# These are VPA recommender flags, not part of the VPA spec -
# set them as command-line args on the recommender Deployment
spec:
  containers:
  - name: recommender
    args:
    - --cpu-histogram-decay-half-life=24h
    - --memory-histogram-decay-half-life=48h
    - --target-cpu-percentile=0.9   # target the 90th percentile of CPU usage
💡 Tuning Knobs:
- decay-half-life: How quickly VPA "forgets" old data (24h = yesterday's data matters half as much as today's)
- target-cpu-percentile: How aggressive to be (0.9 = 90th percentile = 10% headroom); memory is estimated from observed peaks rather than a target percentile
📊 Monitoring VPA
Key Metrics to Track
# 1. Resource optimization
kubectl describe vpa <name> | grep -A5 "Recommendation"
# 2. Pod evictions (Auto mode)
kubectl get events | grep "vpa.*evict"
# 3. Cost savings
# Compare before/after resource requests
# 4. Application performance
# Monitor app metrics after VPA changesPrometheus Metrics (if exposed)
# VPA recommender metrics (illustrative names - verify against your version's /metrics endpoint)
vpa_recommendation_cpu_cores
vpa_recommendation_memory_bytes
vpa_checkpoint_created_total
# VPA updater metrics
vpa_updater_evictions_total
vpa_updater_errors_total
🎯 When to Use VPA - Decision Guide
Use VPA When:
- ✅ Memory-intensive applications (Java, .NET, Node.js)
- ✅ Applications with growing resource needs
- ✅ You're unsure of resource requirements
- ✅ Cost optimization is important
- ✅ Combined with HPA for complete autoscaling
💡 Perfect Example: A Java microservice that starts with 1GB heap but grows to need 2GB over months. VPA notices and adjusts automatically.
Don't Use VPA When:
- ❌ Stateful applications (databases with persistent storage)
- ❌ Short-lived batch jobs
- ❌ When pod recreation causes business impact
- ❌ You have precise, known resource requirements
- ❌ Your cluster has strict resource quotas
💡 Bad Example: PostgreSQL database. If VPA kills the pod to resize it, database recovery might take minutes.
Recommended VPA Strategy
- Week 1: Deploy VPA in "Off" mode, monitor recommendations
- Week 2: Switch to "Initial" mode, verify new pods get correct resources
- Week 3+: If stable, consider "Auto" mode for continuous optimization
- Ongoing: Combine with HPA, monitor monthly
💡 The VPA Journey: Off → Initial → (maybe) Auto. Like learning to drive: Parking lot (Off) → Quiet streets (Initial) → Highway (Auto, if you're brave).
📋 Quick Reference Commands
VPA Management
# Basic commands
kubectl get vpa # List VPAs
kubectl describe vpa <name> # Detailed view
kubectl edit vpa <name> # Edit VPA
kubectl delete vpa <name> # Delete VPA
# Check components
kubectl get pods -n kube-system -l app=vpa
kubectl get deployments -n kube-system -l app=vpa
kubectl get services -n kube-system -l app=vpa
# Debugging
kubectl logs -n kube-system deployment/vpa-recommender
kubectl logs -n kube-system deployment/vpa-updater
kubectl logs -n kube-system deployment/vpa-admission-controller
Testing & Validation
# Check current resource usage
kubectl top pods
kubectl describe pod <pod> | grep -A10 "Resources"
# Generate test load
kubectl exec <pod> -- stress --cpu 2 --vm 1 --vm-bytes 200M --timeout 300
# Monitor VPA actions
watch -n 5 'kubectl get vpa,pods && echo "---" && kubectl describe vpa <name> | grep -A5 "Recommendation"'
💡 Pro Tips
- Always set min/max bounds to prevent extreme recommendations
- Start with "Initial" mode in production
- Monitor for 1-2 weeks before switching to "Auto"
- Combine with HPA for complete autoscaling
- Regularly review recommendations - VPA isn't "set and forget"
- Test pod recreation impact before enabling "Auto" mode
- Use with resource quotas to prevent runaway growth
🎓 Summary: VPA in One Page
| Aspect | Recommendation | Why |
|---|---|---|
| Installation | Use Helm or official manifests | Clean management |
| Initial Mode | updateMode: "Initial" | Safe start |
| Resource Bounds | Always set min/max | Prevent extremes |
| Monitoring Period | 1-2 weeks before trusting | Learn patterns |
| Production Use | Start with Initial, move to Auto cautiously | Avoid surprises |
| Best Combo | VPA + HPA | Complete autoscaling |
| Avoid For | Stateful apps, short jobs | Wrong tool |
Remember: VPA is a powerful tool for resource optimization, but requires careful implementation and monitoring. Start small, learn patterns, and expand gradually!
Final Thought: VPA is like having a personal tailor for your pods. A good tailor measures carefully, makes small adjustments, and never ruins your favorite suit. A bad tailor cuts without measuring and leaves you with clothes that don't fit. Be a good VPA tailor.