How to roll back Kubernetes deployments in Linux
How to Roll Back Kubernetes Deployments in Linux
Rolling back Kubernetes deployments is a critical skill for maintaining application stability and recovering from failed updates. When a deployment introduces bugs, performance issues, or breaks functionality, the ability to quickly revert to a previous working version can save your application from extended downtime. This comprehensive guide will walk you through everything you need to know about rolling back Kubernetes deployments in Linux environments, from basic rollback commands to advanced strategies and best practices.
Table of Contents
1. [Prerequisites and Requirements](#prerequisites-and-requirements)
2. [Understanding Kubernetes Deployment Rollbacks](#understanding-kubernetes-deployment-rollbacks)
3. [Basic Rollback Commands](#basic-rollback-commands)
4. [Step-by-Step Rollback Process](#step-by-step-rollback-process)
5. [Advanced Rollback Scenarios](#advanced-rollback-scenarios)
6. [Monitoring and Validation](#monitoring-and-validation)
7. [Troubleshooting Common Issues](#troubleshooting-common-issues)
8. [Best Practices](#best-practices)
9. [Automation and Scripting](#automation-and-scripting)
10. [Conclusion](#conclusion)
Prerequisites and Requirements
Before diving into Kubernetes deployment rollbacks, ensure you have the following prerequisites in place:
System Requirements
- A Linux system (Ubuntu 18.04+, CentOS 7+, or similar distribution)
- Kubernetes cluster (version 1.16 or later recommended)
- `kubectl` command-line tool installed and configured
- Sufficient permissions to manage deployments in your target namespace
Knowledge Prerequisites
- Basic understanding of Kubernetes concepts (pods, deployments, services)
- Familiarity with Linux command line operations
- Understanding of YAML configuration files
- Basic knowledge of container orchestration principles
Verification Steps
Before proceeding, verify your environment setup:
```bash
Check kubectl version and cluster connection
kubectl version --short
Verify cluster access
kubectl cluster-info
Check available namespaces
kubectl get namespaces
Verify your current context
kubectl config current-context
```
Understanding Kubernetes Deployment Rollbacks
Kubernetes deployments maintain a revision history that enables rollback functionality. Each time you update a deployment, Kubernetes creates a new ReplicaSet while keeping previous versions for potential rollbacks.
How Rollback History Works
When you create or update a deployment, Kubernetes:
1. Creates a new ReplicaSet with the updated configuration
2. Gradually scales up the new ReplicaSet while scaling down the old one
3. Maintains previous ReplicaSets (by default, the last 10 revisions)
4. Stores revision history in the deployment's annotations
Revision History Limits
By default, Kubernetes keeps the last 10 deployment revisions. You can modify this setting using the `revisionHistoryLimit` field in your deployment specification:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-application
spec:
revisionHistoryLimit: 5 # Keep only 5 previous revisions
replicas: 3
selector:
matchLabels:
app: my-application
template:
metadata:
labels:
app: my-application
spec:
containers:
- name: app-container
image: nginx:1.20
```
Basic Rollback Commands
Viewing Deployment History
Before performing a rollback, examine the deployment's revision history:
```bash
View rollout history for a specific deployment
kubectl rollout history deployment/my-application
View detailed information about a specific revision
kubectl rollout history deployment/my-application --revision=2
Check current deployment status
kubectl rollout status deployment/my-application
```
Example output:
```
deployment.apps/my-application
REVISION CHANGE-CAUSE
1 kubectl apply --filename=deployment.yaml --record=true
2 kubectl set image deployment/my-application app-container=nginx:1.21 --record=true
3 kubectl set image deployment/my-application app-container=nginx:1.22 --record=true
```
Performing Basic Rollbacks
Rolling Back to Previous Revision
The simplest rollback command reverts to the immediately previous revision:
```bash
Roll back to the previous revision
kubectl rollout undo deployment/my-application
Verify the rollback
kubectl rollout status deployment/my-application
```
Rolling Back to Specific Revision
To roll back to a specific revision number:
```bash
Roll back to revision 2
kubectl rollout undo deployment/my-application --to-revision=2
Check the rollback progress
kubectl get pods -l app=my-application
```
Step-by-Step Rollback Process
Step 1: Identify the Problem
Before initiating a rollback, confirm that a rollback is necessary:
```bash
Check pod status
kubectl get pods -l app=my-application
Examine pod logs for errors
kubectl logs -l app=my-application --tail=50
Check deployment events
kubectl describe deployment my-application
```
Step 2: Analyze Deployment History
Examine the deployment history to identify the target revision:
```bash
List all revisions with change causes
kubectl rollout history deployment/my-application
Get detailed information about the current revision
kubectl describe deployment my-application | grep -A 10 "Pod Template"
Compare with a previous working revision
kubectl rollout history deployment/my-application --revision=1
```
Step 3: Prepare for Rollback
Before executing the rollback:
```bash
Create a backup of current deployment configuration
kubectl get deployment my-application -o yaml > deployment-backup-$(date +%Y%m%d-%H%M%S).yaml
Note current image versions
kubectl get deployment my-application -o jsonpath='{.spec.template.spec.containers[*].image}'
```
Step 4: Execute the Rollback
Perform the actual rollback operation:
```bash
Roll back to previous revision
kubectl rollout undo deployment/my-application
Or roll back to specific revision
kubectl rollout undo deployment/my-application --to-revision=2
```
Step 5: Monitor the Rollback Process
Watch the rollback progress:
```bash
Monitor rollout status
kubectl rollout status deployment/my-application --watch
Watch pod changes in real-time
kubectl get pods -l app=my-application --watch
Check ReplicaSet status
kubectl get replicasets -l app=my-application
```
Step 6: Validate the Rollback
Verify that the rollback was successful:
```bash
Check deployment status
kubectl get deployment my-application
Verify pod health
kubectl get pods -l app=my-application
Test application functionality
kubectl port-forward deployment/my-application 8080:80
Test via curl or browser at localhost:8080
```
Advanced Rollback Scenarios
Rolling Back Multiple Deployments
When dealing with microservices, you might need to roll back multiple related deployments:
```bash
Create a script for multiple rollbacks
#!/bin/bash
DEPLOYMENTS=("frontend" "backend" "api-gateway")
for deployment in "${DEPLOYMENTS[@]}"; do
echo "Rolling back $deployment..."
kubectl rollout undo deployment/$deployment
kubectl rollout status deployment/$deployment
done
```
Conditional Rollbacks Based on Health Checks
Implement automated rollbacks based on application health:
```bash
#!/bin/bash
DEPLOYMENT_NAME="my-application"
HEALTH_ENDPOINT="http://localhost:8080/health"
Wait for deployment to complete
kubectl rollout status deployment/$DEPLOYMENT_NAME
Port forward for health check
kubectl port-forward deployment/$DEPLOYMENT_NAME 8080:80 &
PF_PID=$!
sleep 10
Check application health
if ! curl -f $HEALTH_ENDPOINT > /dev/null 2>&1; then
echo "Health check failed, rolling back..."
kill $PF_PID
kubectl rollout undo deployment/$DEPLOYMENT_NAME
else
echo "Health check passed, deployment successful"
kill $PF_PID
fi
```
Cross-Namespace Rollbacks
Managing rollbacks across multiple namespaces:
```bash
List deployments across all namespaces
kubectl get deployments --all-namespaces
Roll back deployment in specific namespace
kubectl rollout undo deployment/my-application -n production
Batch rollback across namespaces
NAMESPACES=("staging" "production")
for ns in "${NAMESPACES[@]}"; do
kubectl rollout undo deployment/my-application -n $ns
done
```
Monitoring and Validation
Real-time Monitoring During Rollbacks
Set up comprehensive monitoring during rollback operations:
```bash
Monitor deployment, pods, and events simultaneously
kubectl get deployment,pods,events -l app=my-application --watch
Use kubectl top to monitor resource usage
kubectl top pods -l app=my-application
Monitor service endpoints
kubectl get endpoints my-application-service --watch
```
Validation Scripts
Create validation scripts to ensure rollback success:
```bash
#!/bin/bash
DEPLOYMENT_NAME="my-application"
EXPECTED_IMAGE="nginx:1.20"
Function to validate deployment
validate_deployment() {
local current_image=$(kubectl get deployment $DEPLOYMENT_NAME -o jsonpath='{.spec.template.spec.containers[0].image}')
local ready_replicas=$(kubectl get deployment $DEPLOYMENT_NAME -o jsonpath='{.status.readyReplicas}')
local desired_replicas=$(kubectl get deployment $DEPLOYMENT_NAME -o jsonpath='{.spec.replicas}')
if [[ "$current_image" == "$EXPECTED_IMAGE" ]] && [[ "$ready_replicas" == "$desired_replicas" ]]; then
echo "✅ Rollback validation successful"
return 0
else
echo "❌ Rollback validation failed"
echo "Current image: $current_image (expected: $EXPECTED_IMAGE)"
echo "Ready replicas: $ready_replicas (expected: $desired_replicas)"
return 1
fi
}
Wait for rollback to complete
kubectl rollout status deployment/$DEPLOYMENT_NAME
Validate the rollback
validate_deployment
```
Integration with Monitoring Systems
Integrate rollback operations with monitoring systems:
```bash
Send metrics to monitoring system (example with curl)
send_metric() {
local metric_name=$1
local value=$2
curl -X POST "http://monitoring-system/api/metrics" \
-H "Content-Type: application/json" \
-d "{\"metric\": \"$metric_name\", \"value\": $value, \"timestamp\": $(date +%s)}"
}
Record rollback event
send_metric "kubernetes.deployment.rollback" 1
```
Troubleshooting Common Issues
Issue 1: Rollback Hangs or Fails to Complete
Symptoms:
- Rollback command executes but pods don't update
- Some pods remain in old version while others update
Diagnosis:
```bash
Check rollout status
kubectl rollout status deployment/my-application
Examine deployment events
kubectl describe deployment my-application
Check pod status and events
kubectl describe pods -l app=my-application
```
Solutions:
```bash
Force delete stuck pods
kubectl delete pods -l app=my-application --grace-period=0 --force
Scale deployment to zero and back up
kubectl scale deployment my-application --replicas=0
kubectl scale deployment my-application --replicas=3
Check resource constraints
kubectl describe nodes
kubectl top nodes
```
Issue 2: No Revision History Available
Symptoms:
- `kubectl rollout history` shows no revisions
- Error: "no rollout history found"
Diagnosis:
```bash
Check if deployment has revision history limit set to 0
kubectl get deployment my-application -o yaml | grep revisionHistoryLimit
Check ReplicaSets
kubectl get replicasets -l app=my-application
```
Solutions:
```bash
Update deployment to include revision history
kubectl patch deployment my-application -p '{"spec":{"revisionHistoryLimit":10}}'
Use --record flag for future updates
kubectl set image deployment/my-application app-container=nginx:1.21 --record
```
Issue 3: Rollback to Wrong Version
Symptoms:
- Rollback completes but application still has issues
- Wrong image version after rollback
Diagnosis:
```bash
Verify current image version
kubectl get deployment my-application -o jsonpath='{.spec.template.spec.containers[*].image}'
Check revision history details
kubectl rollout history deployment/my-application --revision=2
```
Solutions:
```bash
Roll back to correct revision
kubectl rollout undo deployment/my-application --to-revision=1
Manually update to known good version
kubectl set image deployment/my-application app-container=nginx:1.20
```
Issue 4: Service Disruption During Rollback
Symptoms:
- Application becomes unavailable during rollback
- Connection errors from clients
Diagnosis:
```bash
Check service endpoints
kubectl get endpoints my-application-service
Monitor pod readiness
kubectl get pods -l app=my-application -o wide
```
Solutions:
```bash
Ensure proper readiness probes are configured
kubectl patch deployment my-application -p '{
"spec": {
"template": {
"spec": {
"containers": [{
"name": "app-container",
"readinessProbe": {
"httpGet": {
"path": "/health",
"port": 80
},
"initialDelaySeconds": 5,
"periodSeconds": 10
}
}]
}
}
}
}'
Adjust rolling update strategy
kubectl patch deployment my-application -p '{
"spec": {
"strategy": {
"rollingUpdate": {
"maxUnavailable": "25%",
"maxSurge": "25%"
}
}
}
}'
```
Best Practices
1. Always Use the --record Flag
Record changes to maintain a clear revision history:
```bash
Good practice: use --record for tracking changes
kubectl set image deployment/my-application app-container=nginx:1.21 --record
Add meaningful annotations
kubectl annotate deployment/my-application deployment.kubernetes.io/change-cause="Update to nginx 1.21 for security patches"
```
2. Implement Proper Health Checks
Configure comprehensive health checks to ensure rollback safety:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-application
spec:
template:
spec:
containers:
- name: app-container
image: nginx:1.20
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 80
initialDelaySeconds: 5
periodSeconds: 5
```
3. Use Staging Environments
Always test rollbacks in staging before production:
```bash
Test rollback in staging
kubectl rollout undo deployment/my-application -n staging
kubectl rollout status deployment/my-application -n staging
Validate staging rollback before production
./validate-deployment.sh staging
Apply to production only after staging validation
kubectl rollout undo deployment/my-application -n production
```
4. Implement Automated Rollback Triggers
Create automated rollback mechanisms based on metrics:
```bash
#!/bin/bash
Automated rollback based on error rate
DEPLOYMENT_NAME="my-application"
ERROR_THRESHOLD=5
Get current error rate (example using monitoring API)
ERROR_RATE=$(curl -s "http://monitoring/api/error-rate/$DEPLOYMENT_NAME" | jq '.rate')
if (( $(echo "$ERROR_RATE > $ERROR_THRESHOLD" | bc -l) )); then
echo "Error rate $ERROR_RATE exceeds threshold $ERROR_THRESHOLD, initiating rollback"
kubectl rollout undo deployment/$DEPLOYMENT_NAME
# Send alert
curl -X POST "http://alerting/api/alert" \
-d "Automated rollback triggered for $DEPLOYMENT_NAME due to high error rate: $ERROR_RATE%"
fi
```
5. Maintain Rollback Documentation
Document your rollback procedures and maintain runbooks:
```markdown
Rollback Runbook for my-application
Pre-rollback Checklist
- [ ] Identify root cause of issue
- [ ] Check revision history: `kubectl rollout history deployment/my-application`
- [ ] Notify stakeholders of impending rollback
- [ ] Backup current configuration
Rollback Steps
1. Execute rollback: `kubectl rollout undo deployment/my-application --to-revision=X`
2. Monitor progress: `kubectl rollout status deployment/my-application`
3. Validate functionality: Run health checks
4. Update monitoring dashboards
5. Document incident and lessons learned
Post-rollback Actions
- [ ] Investigate root cause
- [ ] Plan fix for next deployment
- [ ] Update tests to prevent regression
```
6. Use Blue-Green Deployments for Zero-Downtime Rollbacks
Implement blue-green deployment strategies for critical applications:
```bash
Blue-green rollback script
#!/bin/bash
BLUE_DEPLOYMENT="my-application-blue"
GREEN_DEPLOYMENT="my-application-green"
SERVICE_NAME="my-application-service"
Check current active deployment
CURRENT_SELECTOR=$(kubectl get service $SERVICE_NAME -o jsonpath='{.spec.selector.version}')
if [[ "$CURRENT_SELECTOR" == "blue" ]]; then
INACTIVE_DEPLOYMENT=$GREEN_DEPLOYMENT
NEW_SELECTOR="green"
else
INACTIVE_DEPLOYMENT=$BLUE_DEPLOYMENT
NEW_SELECTOR="blue"
fi
Switch service to inactive deployment (rollback)
kubectl patch service $SERVICE_NAME -p "{\"spec\":{\"selector\":{\"version\":\"$NEW_SELECTOR\"}}}"
echo "Rolled back to $INACTIVE_DEPLOYMENT"
```
Automation and Scripting
Comprehensive Rollback Script
Create a comprehensive rollback script for production use:
```bash
#!/bin/bash
comprehensive-rollback.sh
set -euo pipefail
Configuration
DEPLOYMENT_NAME=""
NAMESPACE="default"
TO_REVISION=""
DRY_RUN=false
SKIP_VALIDATION=false
TIMEOUT=300
Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
Logging function
log() {
echo -e "${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] $1${NC}"
}
warn() {
echo -e "${YELLOW}[$(date +'%Y-%m-%d %H:%M:%S')] WARNING: $1${NC}"
}
error() {
echo -e "${RED}[$(date +'%Y-%m-%d %H:%M:%S')] ERROR: $1${NC}"
exit 1
}
Help function
show_help() {
cat << EOF
Usage: $0 -d DEPLOYMENT_NAME [OPTIONS]
Options:
-d, --deployment Deployment name (required)
-n, --namespace Kubernetes namespace (default: default)
-r, --revision Target revision number (default: previous revision)
--dry-run Show what would be done without executing
--skip-validation Skip post-rollback validation
--timeout Timeout in seconds (default: 300)
-h, --help Show this help message
Examples:
$0 -d my-app # Rollback to previous revision
$0 -d my-app -r 2 # Rollback to revision 2
$0 -d my-app -n production --dry-run # Dry run in production namespace
EOF
}
Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
-d|--deployment)
DEPLOYMENT_NAME="$2"
shift 2
;;
-n|--namespace)
NAMESPACE="$2"
shift 2
;;
-r|--revision)
TO_REVISION="$2"
shift 2
;;
--dry-run)
DRY_RUN=true
shift
;;
--skip-validation)
SKIP_VALIDATION=true
shift
;;
--timeout)
TIMEOUT="$2"
shift 2
;;
-h|--help)
show_help
exit 0
;;
*)
error "Unknown option: $1"
;;
esac
done
Validate required parameters
[[ -z "$DEPLOYMENT_NAME" ]] && error "Deployment name is required. Use -d or --deployment."
Check kubectl connectivity
if ! kubectl cluster-info &>/dev/null; then
error "Cannot connect to Kubernetes cluster"
fi
Check if deployment exists
if ! kubectl get deployment "$DEPLOYMENT_NAME" -n "$NAMESPACE" &>/dev/null; then
error "Deployment '$DEPLOYMENT_NAME' not found in namespace '$NAMESPACE'"
fi
Pre-rollback validation
log "Starting rollback process for deployment '$DEPLOYMENT_NAME' in namespace '$NAMESPACE'"
Show current status
log "Current deployment status:"
kubectl get deployment "$DEPLOYMENT_NAME" -n "$NAMESPACE"
Show revision history
log "Revision history:"
kubectl rollout history deployment/"$DEPLOYMENT_NAME" -n "$NAMESPACE"
Backup current configuration
BACKUP_FILE="deployment-${DEPLOYMENT_NAME}-backup-$(date +%Y%m%d-%H%M%S).yaml"
log "Creating backup: $BACKUP_FILE"
kubectl get deployment "$DEPLOYMENT_NAME" -n "$NAMESPACE" -o yaml > "$BACKUP_FILE"
Prepare rollback command
ROLLBACK_CMD="kubectl rollout undo deployment/$DEPLOYMENT_NAME -n $NAMESPACE"
if [[ -n "$TO_REVISION" ]]; then
ROLLBACK_CMD="$ROLLBACK_CMD --to-revision=$TO_REVISION"
fi
if [[ "$DRY_RUN" == "true" ]]; then
log "DRY RUN: Would execute: $ROLLBACK_CMD"
exit 0
fi
Execute rollback
log "Executing rollback..."
eval "$ROLLBACK_CMD"
Monitor rollback progress
log "Monitoring rollback progress (timeout: ${TIMEOUT}s)..."
if ! kubectl rollout status deployment/"$DEPLOYMENT_NAME" -n "$NAMESPACE" --timeout="${TIMEOUT}s"; then
error "Rollback timed out or failed"
fi
Post-rollback validation
if [[ "$SKIP_VALIDATION" == "false" ]]; then
log "Validating rollback..."
# Check pod status
READY_REPLICAS=$(kubectl get deployment "$DEPLOYMENT_NAME" -n "$NAMESPACE" -o jsonpath='{.status.readyReplicas}')
DESIRED_REPLICAS=$(kubectl get deployment "$DEPLOYMENT_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.replicas}')
if [[ "$READY_REPLICAS" != "$DESIRED_REPLICAS" ]]; then
warn "Not all replicas are ready: $READY_REPLICAS/$DESIRED_REPLICAS"
fi
# Check for failed pods
FAILED_PODS=$(kubectl get pods -l app="$DEPLOYMENT_NAME" -n "$NAMESPACE" --field-selector=status.phase=Failed --no-headers | wc -l)
if [[ "$FAILED_PODS" -gt 0 ]]; then
warn "Found $FAILED_PODS failed pods"
fi
fi
log "Rollback completed successfully!"
log "Current deployment status:"
kubectl get deployment "$DEPLOYMENT_NAME" -n "$NAMESPACE"
```
Integration with CI/CD Pipelines
Integrate rollback capabilities into your CI/CD pipeline:
```yaml
GitLab CI example
rollback_production:
stage: rollback
script:
- ./comprehensive-rollback.sh -d $APPLICATION_NAME -n production -r $TARGET_REVISION
when: manual
only:
- master
environment:
name: production
```
Conclusion
Rolling back Kubernetes deployments is an essential skill for maintaining application reliability and minimizing downtime during incidents. This comprehensive guide has covered everything from basic rollback commands to advanced automation strategies and troubleshooting techniques.
Key Takeaways
1. Always maintain revision history by using the `--record` flag and setting appropriate `revisionHistoryLimit` values
2. Implement comprehensive monitoring and validation to quickly identify when rollbacks are necessary
3. Test rollback procedures regularly in staging environments to ensure they work when needed
4. Automate rollback processes where possible to reduce human error and response time
5. Document your procedures and maintain runbooks for consistent execution across teams
Next Steps
To further enhance your Kubernetes rollback capabilities:
1. Implement advanced deployment strategies like blue-green or canary deployments for zero-downtime rollbacks
2. Integrate with monitoring and alerting systems to enable automated rollback triggers
3. Develop comprehensive testing suites that validate application functionality after rollbacks
4. Create disaster recovery procedures that include rollback scenarios
5. Train your team on rollback procedures and conduct regular drills
Additional Resources
- [Kubernetes Official Documentation on Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)
- [kubectl Cheat Sheet](https://kubernetes.io/docs/reference/kubectl/cheatsheet/)
- [Kubernetes Best Practices](https://kubernetes.io/docs/concepts/configuration/overview/)
By following the practices and procedures outlined in this guide, you'll be well-equipped to handle deployment rollbacks confidently and efficiently, ensuring your applications remain stable and available even when issues arise.
Remember that rollbacks are just one part of a comprehensive deployment strategy. Combine these techniques with proper testing, monitoring, and gradual rollout strategies to minimize the need for rollbacks while being prepared to execute them quickly and safely when necessary.