# How to Upgrade a Kubernetes Cluster on Linux

Kubernetes cluster upgrades are essential for maintaining security, accessing new features, and ensuring optimal performance. This comprehensive guide walks you through the entire process of upgrading your Kubernetes cluster on Linux systems, from preparation to post-upgrade validation.

## Table of Contents

1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Understanding Kubernetes Upgrade Strategy](#understanding-kubernetes-upgrade-strategy)
4. [Pre-Upgrade Preparation](#pre-upgrade-preparation)
5. [Upgrading Control Plane Nodes](#upgrading-control-plane-nodes)
6. [Upgrading Worker Nodes](#upgrading-worker-nodes)
7. [Post-Upgrade Validation](#post-upgrade-validation)
8. [Troubleshooting Common Issues](#troubleshooting-common-issues)
9. [Best Practices and Tips](#best-practices-and-tips)
10. [Conclusion](#conclusion)

## Introduction

Kubernetes cluster upgrades involve updating the control plane components, worker nodes, and associated tools to newer versions. This process ensures your cluster benefits from the latest security patches, bug fixes, and feature enhancements. However, upgrades must be performed carefully to avoid service disruptions and maintain cluster stability.

This guide focuses on clusters deployed with kubeadm, one of the most common deployment methods for self-managed Kubernetes clusters on Linux. The upgrade involves several critical steps that must be executed in the correct order to ensure a successful transition.

## Prerequisites

Before beginning the upgrade process, ensure you meet the following requirements.

### System Requirements

- Root or sudo access to all cluster nodes
- Active SSH access to all nodes
- Sufficient disk space (at least 2 GB free on each node)
- Network connectivity between all nodes
- Backup of etcd data and cluster configuration

### Software Requirements

- Current Kubernetes cluster running version 1.24 or higher
- kubectl client tool installed and configured
- kubeadm tool available on all nodes
- Container runtime (containerd, CRI-O, or Docker Engine with cri-dockerd) properly configured

### Knowledge Requirements

- Basic understanding of Kubernetes architecture
- Familiarity with Linux command line operations
- Understanding of your cluster's current configuration
- Knowledge of your applications' deployment patterns

### Pre-Upgrade Checklist

```bash
# Check current client and server versions
kubectl version

# Verify cluster health
kubectl get nodes
kubectl get pods --all-namespaces

# Check cluster component status (deprecated API, but still informative)
kubectl get componentstatuses

# Verify etcd health
kubectl get pods -n kube-system | grep etcd
```

## Understanding Kubernetes Upgrade Strategy

Kubernetes follows a structured upgrade approach that ensures compatibility and stability.

### Version Skew Policy

- Control plane components can be upgraded one minor version at a time
- Worker node kubelets can run up to two minor versions behind the control plane
- kubectl can be one minor version ahead of or behind the control plane

A quick skew-check sketch follows the example upgrade paths below.

### Upgrade Order

1. Control plane nodes (starting with the primary control plane node)
2. Additional control plane nodes (if running in HA mode)
3. Worker nodes (can be done in a rolling fashion)

### Supported Upgrade Paths

```bash
# Example supported upgrade paths:
#   1.25.x → 1.26.x → 1.27.x → 1.28.x
# NOT supported:
#   1.25.x → 1.27.x (skipping minor versions)
```
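To make the skew policy easier to check in practice, the short script below compares the API server's minor version with each node's kubelet version and flags anything more than two minor versions behind. It is only a sketch: it assumes `kubectl` and `jq` are installed on the machine you run it from, and the allowed skew is hard-coded to match the policy described above.

```bash
#!/bin/bash
# Sketch: report the minor-version skew between the API server and each kubelet.
# Assumes kubectl and jq are available; adjust the allowed skew for your target version.
set -euo pipefail

server_minor=$(kubectl version -o json | jq -r '.serverVersion.minor' | tr -d '+')

kubectl get nodes -o json |
  jq -r '.items[] | "\(.metadata.name) \(.status.nodeInfo.kubeletVersion)"' |
while read -r node kubelet_version; do
  node_minor=$(echo "$kubelet_version" | cut -d. -f2)
  skew=$((server_minor - node_minor))
  if [ "$skew" -gt 2 ]; then
    echo "WARNING: $node kubelet ($kubelet_version) is $skew minor versions behind the control plane"
  else
    echo "OK: $node kubelet ($kubelet_version), skew=$skew"
  fi
done
```

Running this from any machine with cluster access before and after each control plane upgrade gives a quick sanity check that no node has drifted outside the supported skew.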
## Pre-Upgrade Preparation

### 1. Create Cluster Backup

Before starting any upgrade, create comprehensive backups:

```bash
# Backup etcd data
sudo ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Backup Kubernetes configuration
sudo cp -r /etc/kubernetes /backup/kubernetes-config

# Export current cluster resources
kubectl get all --all-namespaces -o yaml > /backup/cluster-resources.yaml
```

### 2. Document Current State

```bash
# Record current versions
kubectl version --output=yaml > /backup/versions-before-upgrade.yaml

# List all nodes and their status
kubectl get nodes -o wide > /backup/nodes-before-upgrade.txt

# Document running pods
kubectl get pods --all-namespaces -o wide > /backup/pods-before-upgrade.txt
```

### 3. Check Deprecated APIs

```bash
# Use the kubectl-convert plugin to update manifests that use deprecated APIs
kubectl-convert --help

# Or use a tool such as Pluto to scan for deprecated APIs
pluto detect-files -d /path/to/your/manifests
```

### 4. Update Package Repositories

```bash
# For Ubuntu/Debian systems
sudo apt update

# For RHEL/CentOS systems
sudo yum update -y
# or, on newer versions
sudo dnf update -y
```

## Upgrading Control Plane Nodes

### Step 1: Determine Target Version

```bash
# Check available kubeadm versions
apt list -a kubeadm | head -20

# or, for RHEL/CentOS
yum list --showduplicates kubeadm --disableexcludes=kubernetes
```

### Step 2: Upgrade kubeadm on the First Control Plane Node

The examples below use the `1.28.x-00` version pattern; the exact package suffix depends on the repository you have configured, so substitute the version string that `apt list` or `yum list` reports.

```bash
# Drain the node (replace <node-name> with the actual name)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Upgrade the kubeadm package
# For Ubuntu/Debian:
sudo apt-mark unhold kubeadm
sudo apt update
sudo apt install -y kubeadm=1.28.x-00
sudo apt-mark hold kubeadm

# For RHEL/CentOS:
sudo yum install -y kubeadm-1.28.x-0 --disableexcludes=kubernetes
```

### Step 3: Verify the Upgrade Plan

```bash
# Check the upgrade plan
sudo kubeadm upgrade plan
```

This command shows:

- Current cluster version
- Available upgrade versions
- Component versions after the upgrade
- Important notes and warnings

### Step 4: Apply the Upgrade

```bash
# Apply the upgrade on the first control plane node
sudo kubeadm upgrade apply v1.28.x

# Follow the prompts and confirm the upgrade.
# This process typically takes 5-10 minutes.
```

### Step 5: Upgrade kubelet and kubectl

```bash
# Upgrade kubelet and kubectl
# For Ubuntu/Debian:
sudo apt-mark unhold kubelet kubectl
sudo apt update
sudo apt install -y kubelet=1.28.x-00 kubectl=1.28.x-00
sudo apt-mark hold kubelet kubectl

# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# Verify kubelet is running
sudo systemctl status kubelet
```

### Step 6: Uncordon the Node

```bash
# Make the node schedulable again
kubectl uncordon <node-name>

# Verify node status
kubectl get nodes
```

### Step 7: Upgrade Additional Control Plane Nodes

For each additional control plane node, repeat the process with a slight modification:

```bash
# On additional control plane nodes, use:
sudo kubeadm upgrade node
# instead of 'kubeadm upgrade apply', then proceed with the
# kubelet and kubectl upgrade as above.
```
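Before moving on to the worker nodes, it is worth confirming that every control plane component actually reports the target version. The commands below are a small verification sketch; they assume a standard kubeadm layout in which the static control plane pods carry the `component` label and the control plane nodes carry the `node-role.kubernetes.io/control-plane` label.

```bash
# Confirm the version reported by the control plane nodes
kubectl get nodes -l node-role.kubernetes.io/control-plane -o wide

# Inspect the image tags used by the static control plane pods
kubectl get pods -n kube-system \
  -l 'component in (kube-apiserver,kube-controller-manager,kube-scheduler)' \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'

# All image tags should match the target release (for example v1.28.x)
```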
## Upgrading Worker Nodes

Worker nodes can be upgraded using a rolling update approach to minimize service disruption.

### Step 1: Upgrade kubeadm on the Worker Node

```bash
# SSH to the worker node
ssh user@worker-node

# Upgrade the kubeadm package (same commands as on the control plane)
sudo apt-mark unhold kubeadm
sudo apt update
sudo apt install -y kubeadm=1.28.x-00
sudo apt-mark hold kubeadm
```

### Step 2: Drain the Worker Node

```bash
# From a machine with kubectl access to the cluster
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

This command will:

- Mark the node as unschedulable
- Evict all pods (except DaemonSet pods)
- Wait for pods to terminate gracefully

### Step 3: Upgrade the Node

```bash
# On the worker node, run:
sudo kubeadm upgrade node
# This updates the local kubelet configuration
```

### Step 4: Upgrade kubelet and kubectl

```bash
# Upgrade the kubelet and kubectl packages
sudo apt-mark unhold kubelet kubectl
sudo apt update
sudo apt install -y kubelet=1.28.x-00 kubectl=1.28.x-00
sudo apt-mark hold kubelet kubectl

# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

### Step 5: Uncordon the Node

```bash
# Make the node schedulable again
kubectl uncordon <node-name>

# Verify the node is ready
kubectl get nodes
```

### Step 6: Verify Pod Rescheduling

```bash
# Check that pods are properly distributed
kubectl get pods -o wide --all-namespaces

# Verify critical applications are running
kubectl get pods -n kube-system
```

## Post-Upgrade Validation

### 1. Verify Cluster Status

```bash
# Check that all nodes are ready
kubectl get nodes

# Verify system pods are running
kubectl get pods -n kube-system

# Check cluster version
kubectl version

# Verify component status (deprecated API, but still informative)
kubectl get componentstatuses
```

### 2. Test Cluster Functionality

```bash
# Create a test deployment
kubectl create deployment test-upgrade --image=nginx:latest

# Scale the deployment
kubectl scale deployment test-upgrade --replicas=3

# Verify pods are running
kubectl get pods -l app=test-upgrade

# Test service creation
kubectl expose deployment test-upgrade --port=80 --type=ClusterIP

# Clean up test resources
kubectl delete deployment test-upgrade
kubectl delete service test-upgrade
```

### 3. Validate Application Workloads

```bash
# Check all application namespaces
kubectl get pods --all-namespaces | grep -v kube-system

# Verify persistent volumes and claims
kubectl get pv,pvc --all-namespaces

# Check ingress resources
kubectl get ingress --all-namespaces

# Validate services and endpoints
kubectl get svc,endpoints --all-namespaces
```
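When a cluster runs many namespaces, scanning the output above by eye is error-prone. The snippet below is a quick sketch for surfacing workloads that did not come back cleanly after the upgrade; it assumes nothing beyond `kubectl` access to the cluster.

```bash
# List pods that are not in the Running or Succeeded phase
kubectl get pods --all-namespaces \
  --field-selector=status.phase!=Running,status.phase!=Succeeded \
  -o wide

# Show per-pod container restart counts (high counts may indicate crash loops)
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].restartCount}{"\n"}{end}' |
  sort -t$'\t' -k3 -nr | head -20
```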
### 4. Performance Verification

```bash
# Check resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods --all-namespaces

# Verify DNS resolution
kubectl run test-dns --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default

# Test in-cluster connectivity to the API server
# (uses the curlimages/curl image; /healthz is readable without authentication on default kubeadm clusters)
kubectl run test-connectivity --image=curlimages/curl --rm -it --restart=Never --command -- \
  curl -sk https://kubernetes.default.svc.cluster.local/healthz
```

## Troubleshooting Common Issues

### Issue 1: Node Fails to Join After Upgrade

**Symptoms:**

- Node shows "NotReady" status
- kubelet logs show certificate errors

**Solution:**

```bash
# Check kubelet logs
sudo journalctl -xeu kubelet

# Reset and rejoin the node (destructive: removes the node's cluster state)
sudo kubeadm reset
sudo kubeadm join <control-plane-endpoint> --token <token> --discovery-token-ca-cert-hash sha256:<hash>

# Generate a new token and join command if needed (run on a control plane node)
kubeadm token create --print-join-command
```

### Issue 2: Pods Stuck in Pending State

**Symptoms:**

- Pods remain in "Pending" status after the upgrade
- Scheduler errors in the logs

**Solution:**

```bash
# Check node resources
kubectl describe nodes

# Inspect the pending pod's requirements and events
kubectl describe pod <pod-name>

# Check for taints
kubectl get nodes -o json | jq '.items[].spec.taints'

# Remove an unwanted taint (note the trailing dash)
kubectl taint nodes <node-name> <taint-key>-
```

### Issue 3: API Server Connectivity Issues

**Symptoms:**

- kubectl commands time out
- API server logs show errors

**Solution:**

```bash
# Check kubelet status (kubelet runs the static API server pod)
sudo systemctl status kubelet

# Check the API server container
# (use crictl with containerd/CRI-O; use docker ps if you run Docker Engine)
sudo crictl ps | grep kube-apiserver

# Verify certificates
sudo kubeadm certs check-expiration

# Restart kubelet if needed
sudo systemctl restart kubelet
```

### Issue 4: etcd Cluster Issues

**Symptoms:**

- Control plane pods failing
- etcd connectivity errors

**Solution:**

```bash
# Check etcd health
sudo ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Restore from backup if necessary (restores into a new data directory)
sudo ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db
```

### Issue 5: CNI Plugin Compatibility

**Symptoms:**

- Pod networking issues
- DNS resolution failures

**Solution:**

```bash
# Check CNI plugin status
kubectl get pods -n kube-system | grep -E "(calico|flannel|weave|cilium)"

# Update the CNI plugin if needed (use your CNI provider's manifest)
kubectl apply -f <cni-manifest.yaml>

# Verify network policies
kubectl get networkpolicies --all-namespaces
```

## Best Practices and Tips

### 1. Upgrade Planning

- **Schedule During Maintenance Windows:** Plan upgrades during low-traffic periods
- **Test in Staging:** Always test the upgrade process in a staging environment first
- **Read Release Notes:** Review Kubernetes release notes for breaking changes
- **Version Strategy:** Maintain consistent versions across all cluster components

### 2. Backup Strategy

```bash
#!/bin/bash
# Automated backup script example
BACKUP_DIR="/backup/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Backup etcd
ETCDCTL_API=3 etcdctl snapshot save "$BACKUP_DIR/etcd-snapshot.db" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Backup configurations
cp -r /etc/kubernetes "$BACKUP_DIR/"
kubectl get all --all-namespaces -o yaml > "$BACKUP_DIR/cluster-state.yaml"
```

### 3. Monitoring During Upgrade

- **Watch System Metrics:** Monitor CPU, memory, and disk usage
- **Application Health Checks:** Verify application endpoints remain accessible
- **Log Monitoring:** Keep an eye on system and application logs
- **Rollback Plan:** Have a tested rollback procedure ready

A simple watch-loop sketch for keeping an eye on cluster state during the upgrade follows this list.
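As a companion to the monitoring checklist above, the loop below is a minimal sketch for watching node readiness, failing pods, and warning events while the upgrade is in progress. The 30-second interval and plain-text output are only a starting point; adapt it to, or replace it with, your existing monitoring stack. Stop it with Ctrl-C.

```bash
#!/bin/bash
# Illustrative upgrade-time monitor: node status, unhealthy pods, and recent warning events.
set -u

while true; do
  echo "=== $(date) ==="
  kubectl get nodes

  echo "--- Pods not Running/Succeeded ---"
  kubectl get pods --all-namespaces \
    --field-selector=status.phase!=Running,status.phase!=Succeeded

  echo "--- Recent warning events ---"
  kubectl get events --all-namespaces --field-selector=type=Warning \
    --sort-by=.lastTimestamp | tail -n 10

  sleep 30
done
```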
### 4. Communication Strategy

- **Stakeholder Notification:** Inform teams about planned maintenance
- **Status Updates:** Provide regular updates during the upgrade process
- **Documentation:** Document any issues and resolutions for future reference

### 5. Automation Considerations

```bash
#!/bin/bash
# Example upgrade automation script structure
set -e

# Pre-upgrade checks
check_cluster_health() {
  # Fail if any node reports a status that does not start with Ready
  if kubectl get nodes --no-headers | awk '{print $2}' | grep -qv '^Ready'; then
    echo "One or more nodes are not Ready; aborting." >&2
    exit 1
  fi
}

# Backup function
create_backup() {
  echo "Creating backup..."
  # Backup commands here
}

# Upgrade function
upgrade_cluster() {
  echo "Starting cluster upgrade..."
  # Upgrade commands here
}

# Validation function
validate_upgrade() {
  echo "Validating upgrade..."
  # Validation commands here
}

# Main execution
main() {
  check_cluster_health
  create_backup
  upgrade_cluster
  validate_upgrade
  echo "Upgrade completed successfully!"
}

main "$@"
```

### 6. Security Considerations

- **Certificate Management:** Ensure certificates are valid and up to date
- **RBAC Updates:** Review role-based access control settings after the upgrade
- **Network Policies:** Validate that network security policies remain effective
- **Secrets Management:** Verify secret rotation and encryption settings

### 7. Performance Optimization

- **Resource Limits:** Review and adjust resource requests and limits
- **Node Affinity:** Optimize pod placement with node affinity rules
- **Storage Performance:** Monitor persistent volume performance
- **Network Optimization:** Verify CNI plugin performance settings

## Conclusion

Successfully upgrading a Kubernetes cluster requires careful planning, systematic execution, and thorough validation. This guide has covered the essential steps from pre-upgrade preparation through post-upgrade validation, including troubleshooting common issues and implementing best practices.

### Key Takeaways

1. Always back up your cluster before beginning any upgrade
2. Follow the proper upgrade order: control plane first, then worker nodes
3. Test thoroughly in a staging environment before production upgrades
4. Monitor closely during and after the upgrade process
5. Document everything for future reference and team knowledge sharing

### Next Steps

After completing your cluster upgrade, consider these follow-up actions:

- Update monitoring and alerting configurations for the new version
- Review and update application deployment manifests to use new features
- Plan the next upgrade cycle based on the Kubernetes release schedule
- Share lessons learned with your team and update runbooks
- Evaluate new features introduced in the upgraded version

### Additional Resources

- Official Kubernetes documentation: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
- Kubernetes release notes: https://kubernetes.io/releases/
- Community forums: Kubernetes Slack channels and Stack Overflow
- Professional support: consider commercial Kubernetes distributions for enterprise environments

Kubernetes upgrades are not just technical processes but organizational ones. Ensure your team is well trained, your procedures are well documented, and your rollback plans are tested and ready. With proper preparation and execution, cluster upgrades can be performed safely and efficiently, keeping your Kubernetes infrastructure secure, stable, and feature-rich.

The investment in mastering the upgrade process pays dividends: a healthy, secure, and up-to-date Kubernetes environment that can support your organization's evolving containerized workloads.