# How to Upgrade Kubernetes Cluster in Linux
Kubernetes cluster upgrades are essential for maintaining security, accessing new features, and ensuring optimal performance. This comprehensive guide will walk you through the entire process of upgrading your Kubernetes cluster on Linux systems, covering everything from preparation to post-upgrade validation.
## Table of Contents
1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Understanding Kubernetes Upgrade Strategy](#understanding-kubernetes-upgrade-strategy)
4. [Pre-Upgrade Preparation](#pre-upgrade-preparation)
5. [Upgrading Control Plane Nodes](#upgrading-control-plane-nodes)
6. [Upgrading Worker Nodes](#upgrading-worker-nodes)
7. [Post-Upgrade Validation](#post-upgrade-validation)
8. [Troubleshooting Common Issues](#troubleshooting-common-issues)
9. [Best Practices and Tips](#best-practices-and-tips)
10. [Conclusion](#conclusion)
## Introduction
Kubernetes cluster upgrades involve updating the control plane components, worker nodes, and associated tools to newer versions. This process ensures your cluster benefits from the latest security patches, bug fixes, and feature enhancements. However, upgrades must be performed carefully to avoid service disruptions and maintain cluster stability.
This guide focuses on upgrading clusters deployed using kubeadm, which is one of the most common deployment methods for self-managed Kubernetes clusters on Linux systems. The upgrade process involves several critical steps that must be executed in the correct order to ensure a successful transition.
## Prerequisites
Before beginning the upgrade process, ensure you meet the following requirements:
### System Requirements
- Root or sudo access to all cluster nodes
- Active SSH access to all nodes
- Sufficient disk space (at least 2GB free on each node)
- Network connectivity between all nodes
- Backup of etcd data and cluster configuration
### Software Requirements
- Current Kubernetes cluster running version 1.24 or higher
- kubectl client tool installed and configured
- kubeadm tool available on all nodes
- Container runtime (Docker, containerd, or CRI-O) properly configured
### Knowledge Requirements
- Basic understanding of Kubernetes architecture
- Familiarity with Linux command line operations
- Understanding of your cluster's current configuration
- Knowledge of your applications' deployment patterns
### Pre-Upgrade Checklist
```bash
# Check current cluster version
kubectl version --short
# Verify cluster health
kubectl get nodes
kubectl get pods --all-namespaces
# Check cluster component status
kubectl get componentstatuses
# Verify etcd health
kubectl get pods -n kube-system | grep etcd
```
## Understanding Kubernetes Upgrade Strategy
Kubernetes follows a structured upgrade approach that ensures compatibility and stability:
### Version Skew Policy
- Control plane components can be upgraded one minor version at a time
- Worker nodes can run up to two minor versions behind the control plane
- kubectl can be one minor version ahead of or behind the control plane (a quick way to check the versions actually running in your cluster is shown below)
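A quick, read-only way to see where your cluster currently stands, assuming working kubectl access:
```bash
# Control plane and client versions
kubectl version
# kubelet version reported by each node
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
```
Comparing the kubelet column against the control plane version tells you how much skew you are carrying before you start.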
### Upgrade Order
1. Control plane nodes (starting with the primary master)
2. Additional control plane nodes (if running in HA mode)
3. Worker nodes (can be done in rolling fashion)
### Supported Upgrade Paths
```bash
# Example supported upgrade paths:
# 1.25.x → 1.26.x → 1.27.x → 1.28.x
# NOT supported: 1.25.x → 1.27.x (skipping minor versions)
```
## Pre-Upgrade Preparation
### 1. Create Cluster Backup
Before starting any upgrade, create comprehensive backups:
```bash
# Backup etcd data
sudo ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Backup Kubernetes configuration
sudo cp -r /etc/kubernetes /backup/kubernetes-config
# Export current cluster resources
kubectl get all --all-namespaces -o yaml > /backup/cluster-resources.yaml
```
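Before moving on, it is worth confirming that the etcd snapshot is actually readable; a quick check, assuming etcdctl is installed on the node:
```bash
# Print the snapshot hash, revision, total keys, and size
sudo ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db --write-out=table
```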
### 2. Document Current State
```bash
# Record current versions
kubectl version --output=yaml > /backup/versions-before-upgrade.yaml
# List all nodes and their status
kubectl get nodes -o wide > /backup/nodes-before-upgrade.txt
# Document running pods
kubectl get pods --all-namespaces -o wide > /backup/pods-before-upgrade.txt
```
### 3. Check Deprecated APIs
```bash
# Use the kubectl convert plugin to update manifests that use deprecated API versions
kubectl-convert --help
# Or use a tool such as Pluto to scan your manifests for deprecated APIs
pluto detect-files -d /path/to/your/manifests
```
### 4. Update Package Repositories
```bash
# For Ubuntu/Debian systems
sudo apt update
# For RHEL/CentOS systems
sudo yum update -y
# Or, on newer versions:
sudo dnf update -y
```
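If your nodes use the community-hosted pkgs.k8s.io repositories, note that each repository is pinned to a single minor release, so you may also need to point the repo at the target minor version before the new packages become visible. A sketch for Ubuntu/Debian, assuming a v1.28 target and the conventional keyring location (adjust both for your environment):
```bash
# Replace the repository definition with the one for the target minor release
sudo rm -f /etc/apt/keyrings/kubernetes-apt-keyring.gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
```
On RHEL/CentOS the equivalent change is made to the baseurl in your Kubernetes yum repo definition.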
## Upgrading Control Plane Nodes
### Step 1: Determine Target Version
```bash
# Check available kubeadm versions
apt list -a kubeadm | head -20
# Or, for RHEL/CentOS:
yum list --showduplicates kubeadm --disableexcludes=kubernetes
```
### Step 2: Upgrade kubeadm on the First Control Plane Node
```bash
# Drain the node (replace <node-name> with the actual node name)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Upgrade the kubeadm package
# For Ubuntu/Debian:
sudo apt-mark unhold kubeadm
sudo apt update
sudo apt install -y kubeadm=1.28.x-00
sudo apt-mark hold kubeadm
# For RHEL/CentOS:
sudo yum install -y kubeadm-1.28.x-0 --disableexcludes=kubernetes
```
### Step 3: Verify Upgrade Plan
```bash
# Check the upgrade plan
sudo kubeadm upgrade plan
# This command shows:
# - The current cluster version
# - Available upgrade versions
# - Component versions after the upgrade
# - Important notes and warnings
```
### Step 4: Apply the Upgrade
```bash
# Apply the upgrade on the first control plane node
sudo kubeadm upgrade apply v1.28.x
# Follow the prompts and confirm the upgrade.
# This process typically takes 5-10 minutes.
```
### Step 5: Upgrade kubelet and kubectl
```bash
# Upgrade kubelet and kubectl
# For Ubuntu/Debian:
sudo apt-mark unhold kubelet kubectl
sudo apt update
sudo apt install -y kubelet=1.28.x-00 kubectl=1.28.x-00
sudo apt-mark hold kubelet kubectl
# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Verify kubelet is running
sudo systemctl status kubelet
```
### Step 6: Uncordon the Node
```bash
# Make the node schedulable again
kubectl uncordon <node-name>
# Verify node status
kubectl get nodes
```
### Step 7: Upgrade Additional Control Plane Nodes
For each additional control plane node, repeat the process with a slight modification:
```bash
# On additional control plane nodes, use:
sudo kubeadm upgrade node
# instead of 'kubeadm upgrade apply'.
# Then proceed with the kubelet and kubectl upgrade as above.
```
## Upgrading Worker Nodes
Worker nodes can be upgraded using a rolling update approach to minimize service disruption:
### Step 1: Upgrade kubeadm on the Worker Node
```bash
# SSH to the worker node
ssh user@worker-node
# Upgrade the kubeadm package (same commands as on the control plane)
sudo apt-mark unhold kubeadm
sudo apt update
sudo apt install -y kubeadm=1.28.x-00
sudo apt-mark hold kubeadm
```
### Step 2: Drain the Worker Node
```bash
# From a machine with kubectl access to the cluster
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# This command will:
# - Mark the node as unschedulable
# - Evict all pods (except DaemonSets)
# - Wait for pods to terminate gracefully
```
### Step 3: Upgrade the Node
```bash
# On the worker node, run:
sudo kubeadm upgrade node
# This updates the local kubelet configuration
```
### Step 4: Upgrade kubelet and kubectl
```bash
# Upgrade the kubelet and kubectl packages
sudo apt-mark unhold kubelet kubectl
sudo apt update
sudo apt install -y kubelet=1.28.x-00 kubectl=1.28.x-00
sudo apt-mark hold kubelet kubectl
# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```
### Step 5: Uncordon the Node
```bash
# Make the node schedulable again
kubectl uncordon <node-name>
# Verify the node is ready
kubectl get nodes
```
### Step 6: Verify Pod Rescheduling
```bash
# Check that pods are properly distributed
kubectl get pods -o wide --all-namespaces
# Verify critical applications are running
kubectl get pods -n kube-system
```
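If you have more than a handful of workers, the same six steps can be scripted as a rolling loop. The sketch below is illustrative only: the node names, SSH user, and package version string are placeholders, and it assumes Debian/Ubuntu workers reachable over SSH.
```bash
#!/bin/bash
# Rolling worker-node upgrade sketch (placeholders: node names, SSH user, version)
set -euo pipefail
TARGET="1.28.x-00"                      # pin to the exact target package version
for NODE in worker-1 worker-2 worker-3; do
  kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
  ssh user@"$NODE" "sudo apt-mark unhold kubeadm kubelet kubectl && \
    sudo apt-get update && \
    sudo apt-get install -y kubeadm=$TARGET && \
    sudo kubeadm upgrade node && \
    sudo apt-get install -y kubelet=$TARGET kubectl=$TARGET && \
    sudo apt-mark hold kubeadm kubelet kubectl && \
    sudo systemctl daemon-reload && sudo systemctl restart kubelet"
  kubectl uncordon "$NODE"
  kubectl wait --for=condition=Ready "node/$NODE" --timeout=5m
done
```
Upgrading one node at a time keeps enough spare capacity in the cluster for evicted pods to reschedule.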
## Post-Upgrade Validation
### 1. Verify Cluster Status
```bash
# Check that all nodes are Ready
kubectl get nodes
# Verify system pods are running
kubectl get pods -n kube-system
# Check the cluster version (--short was removed in kubectl 1.28; plain 'kubectl version' is enough)
kubectl version
# Verify component status
kubectl get componentstatuses
```
### 2. Test Cluster Functionality
```bash
# Create a test deployment
kubectl create deployment test-upgrade --image=nginx:latest
# Scale the deployment
kubectl scale deployment test-upgrade --replicas=3
# Verify pods are running
kubectl get pods -l app=test-upgrade
# Test service creation
kubectl expose deployment test-upgrade --port=80 --type=ClusterIP
# Clean up test resources
kubectl delete deployment test-upgrade
kubectl delete service test-upgrade
```
### 3. Validate Application Workloads
```bash
# Check all application namespaces
kubectl get pods --all-namespaces | grep -v kube-system
# Verify persistent volumes
kubectl get pv,pvc --all-namespaces
# Check ingress resources
kubectl get ingress --all-namespaces
# Validate services and endpoints
kubectl get svc,endpoints --all-namespaces
```
### 4. Performance Verification
```bash
# Check resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods --all-namespaces
# Verify DNS resolution
kubectl run test-dns --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default
# Test pod-to-service connectivity to the API server (/version is readable without credentials)
kubectl run test-connectivity --image=curlimages/curl --rm -it --restart=Never --command -- curl -k https://kubernetes.default.svc.cluster.local/version
```
## Troubleshooting Common Issues
### Issue 1: Node Fails to Join After Upgrade
Symptoms:
- Node shows "NotReady" status
- kubelet logs show certificate errors
Solution:
```bash
# Check kubelet logs
sudo journalctl -xeu kubelet
# Reset and rejoin the node
sudo kubeadm reset
sudo kubeadm join <control-plane-endpoint>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# Generate a new join command (with a fresh token) if needed
kubeadm token create --print-join-command
```
### Issue 2: Pods Stuck in Pending State
Symptoms:
- Pods remain in "Pending" status after upgrade
- Scheduler errors in logs
Solution:
```bash
# Check node resources
kubectl describe nodes
# Verify the pod's requirements and events
kubectl describe pod <pod-name>
# Check for taints (requires jq)
kubectl get nodes -o json | jq '.items[].spec.taints'
# Remove an unwanted taint (the trailing dash removes it)
kubectl taint nodes <node-name> <taint-key>-
```
### Issue 3: API Server Connectivity Issues
Symptoms:
- kubectl commands timeout
- API server logs show errors
Solution:
```bash
# Check API server status
sudo systemctl status kubelet
sudo crictl ps | grep kube-apiserver   # use 'sudo docker ps' instead if Docker is the runtime
# Verify certificates
sudo kubeadm certs check-expiration
# Restart kubelet if needed
sudo systemctl restart kubelet
```
### Issue 4: etcd Cluster Issues
Symptoms:
- Control plane pods failing
- etcd connectivity errors
Solution:
```bash
# Check etcd health
sudo ETCDCTL_API=3 etcdctl endpoint health \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Restore from backup if necessary
sudo ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db
```
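Keep in mind that `etcdctl snapshot restore` writes the restored data into a new directory rather than overwriting the live one. On a kubeadm cluster a common pattern is to restore into a fresh path and then point the etcd static pod manifest at it; a sketch assuming the default kubeadm layout:
```bash
# Restore into a separate data directory
sudo ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir /var/lib/etcd-restored
# Edit the etcd static pod so its hostPath volume uses /var/lib/etcd-restored
# instead of /var/lib/etcd; kubelet will recreate the etcd pod automatically.
sudo vi /etc/kubernetes/manifests/etcd.yaml
```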
### Issue 5: CNI Plugin Compatibility
Symptoms:
- Pod networking issues
- DNS resolution failures
Solution:
```bash
# Check CNI plugin status
kubectl get pods -n kube-system | grep -E "(calico|flannel|weave|cilium)"
# Update the CNI plugin if needed, using the manifest for your CNI provider
kubectl apply -f <cni-manifest-url-or-file>
# Verify network policies
kubectl get networkpolicies --all-namespaces
```
## Best Practices and Tips
### 1. Upgrade Planning
- Schedule During Maintenance Windows: Plan upgrades during low-traffic periods
- Test in Staging: Always test the upgrade process in a staging environment first
- Read Release Notes: Review Kubernetes release notes for breaking changes
- Version Strategy: Maintain consistent versions across all cluster components
### 2. Backup Strategy
```bash
#!/bin/bash
# Automated backup script example
BACKUP_DIR="/backup/$(date +%Y%m%d-%H%M%S)"
mkdir -p $BACKUP_DIR
# Backup etcd
ETCDCTL_API=3 etcdctl snapshot save $BACKUP_DIR/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
# Backup configurations
cp -r /etc/kubernetes $BACKUP_DIR/
kubectl get all --all-namespaces -o yaml > $BACKUP_DIR/cluster-state.yaml
```
### 3. Monitoring During Upgrade
- Watch System Metrics: Monitor CPU, memory, and disk usage (see the example commands after this list)
- Application Health Checks: Verify application endpoints remain accessible
- Log Monitoring: Keep an eye on system and application logs
- Rollback Plan: Have a tested rollback procedure ready
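For example, a few lightweight commands worth keeping open in separate terminals while the upgrade runs (the `kubectl top` command assumes metrics-server is installed):
```bash
# Node and system-pod status, refreshed every few seconds
watch -n 5 kubectl get nodes
watch -n 5 'kubectl get pods -n kube-system'
# Stream cluster events as they happen
kubectl get events --all-namespaces --watch
# Spot-check resource usage (requires metrics-server)
kubectl top nodes
```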
### 4. Communication Strategy
- Stakeholder Notification: Inform teams about planned maintenance
- Status Updates: Provide regular updates during the upgrade process
- Documentation: Document any issues and resolutions for future reference
### 5. Automation Considerations
```bash
#!/bin/bash
# Example upgrade automation script structure
set -e
# Pre-upgrade checks
check_cluster_health() {
  # Fail if any node reports a status other than Ready (a plain grep for "Ready" would also match "NotReady")
  if kubectl get nodes --no-headers | awk '{print $2}' | grep -vq '^Ready$'; then
    echo "Cluster health check failed: not all nodes are Ready" >&2
    exit 1
  fi
}
# Backup function
create_backup() {
echo "Creating backup..."
# Backup commands here
}
# Upgrade function
upgrade_cluster() {
echo "Starting cluster upgrade..."
# Upgrade commands here
}
# Validation function
validate_upgrade() {
echo "Validating upgrade..."
# Validation commands here
}
# Main execution
main() {
check_cluster_health
create_backup
upgrade_cluster
validate_upgrade
echo "Upgrade completed successfully!"
}
main "$@"
```
### 6. Security Considerations
- Certificate Management: Ensure certificates are valid and up-to-date
- RBAC Updates: Review role-based access control settings after upgrade
- Network Policies: Validate network security policies remain effective
- Secrets Management: Verify secret rotation and encryption settings (a few quick checks are sketched below)
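A few quick, read-only checks that cover the points above; the service account in the `kubectl auth can-i` example is just a placeholder for whichever subjects matter in your cluster:
```bash
# Certificate expiry for kubeadm-managed certificates
sudo kubeadm certs check-expiration
# Spot-check RBAC behaviour for a given subject (placeholder subject shown)
kubectl auth can-i list secrets --as=system:serviceaccount:kube-system:default
# Confirm network policies survived the upgrade
kubectl get networkpolicies --all-namespaces
```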
### 7. Performance Optimization
- Resource Limits: Review and adjust resource requests and limits
- Node Affinity: Optimize pod placement with node affinity rules
- Storage Performance: Monitor persistent volume performance
- Network Optimization: Verify CNI plugin performance settings
## Conclusion
Successfully upgrading a Kubernetes cluster requires careful planning, systematic execution, and thorough validation. This comprehensive guide has covered the essential steps from pre-upgrade preparation through post-upgrade validation, including troubleshooting common issues and implementing best practices.
### Key Takeaways
1. Always backup your cluster before beginning any upgrade process
2. Follow the proper upgrade order: control plane first, then worker nodes
3. Test thoroughly in a staging environment before production upgrades
4. Monitor closely during and after the upgrade process
5. Document everything for future reference and team knowledge sharing
### Next Steps
After completing your cluster upgrade, consider these follow-up actions:
- Update monitoring and alerting configurations for the new version
- Review and update application deployment manifests for new features
- Plan the next upgrade cycle based on the Kubernetes release schedule
- Share lessons learned with your team and update runbooks
- Evaluate new features introduced in the upgraded version
### Additional Resources
- Official Kubernetes Documentation: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
- Kubernetes Release Notes: https://kubernetes.io/releases/
- Community Forums: Kubernetes Slack channels and Stack Overflow
- Professional Support: Consider commercial Kubernetes distributions for enterprise environments
Remember that Kubernetes upgrades are not just technical processes but also organizational ones. Ensure your team is well-trained, your procedures are well-documented, and your rollback plans are tested and ready. With proper preparation and execution, cluster upgrades can be performed safely and efficiently, keeping your Kubernetes infrastructure secure, stable, and feature-rich.
The investment in mastering the upgrade process pays dividends in maintaining a healthy, secure, and up-to-date Kubernetes environment that can support your organization's evolving containerized workloads effectively.