How to configure Keepalived in Linux - High Availability & Clustering Guide

# How to Configure Keepalived in Linux

Keepalived is routing software that provides high availability and load balancing for Linux systems. Built around the Virtual Router Redundancy Protocol (VRRP), it enables automatic failover between servers so that a service stays reachable when an individual node fails. This guide walks through installing, configuring, and managing Keepalived in Linux environments.

## Table of Contents

- [Introduction to Keepalived](#introduction-to-keepalived)
- [Prerequisites and Requirements](#prerequisites-and-requirements)
- [Installation Process](#installation-process)
- [Understanding Keepalived Configuration](#understanding-keepalived-configuration)
- [Step-by-Step Configuration](#step-by-step-configuration)
- [Practical Examples and Use Cases](#practical-examples-and-use-cases)
- [Testing and Validation](#testing-and-validation)
- [Troubleshooting Common Issues](#troubleshooting-common-issues)
- [Best Practices and Security](#best-practices-and-security)
- [Advanced Configuration Options](#advanced-configuration-options)
- [Monitoring and Maintenance](#monitoring-and-maintenance)
- [Conclusion](#conclusion)

## Introduction to Keepalived

Keepalived is a framework that provides both high availability and load balancing for Linux-based infrastructures. It implements VRRP, which lets multiple routers or servers coordinate so that one acts as the master and the others as backups.

### Key Features and Benefits

**High Availability:** Keepalived keeps a service reachable by promoting a backup server to master when the current master fails. A backup takes over roughly three advertisement intervals after the master goes silent, so with the common 1-second interval failover completes within a few seconds.
**Load Balancing:** Keepalived configures and manages the Linux Virtual Server (LVS/IPVS) framework in the kernel, enabling load balancing across multiple backend servers.

**Health Checking:** Keepalived continuously monitors the health of services and servers, removing failed components from the active pool and restoring them when they recover.

**VRRP Implementation:** The VRRP implementation manages virtual IP addresses and ensures that only one master holds them at a time, provided the peers can see each other's advertisements.

## Prerequisites and Requirements

Before configuring Keepalived, ensure your environment meets the following requirements.

### System Requirements

- Operating System: Linux distribution (Ubuntu 18.04+, CentOS 7+, RHEL 7+, Debian 9+)
- Kernel Version: Linux kernel 2.6 or higher
- Memory: Minimum 512MB RAM (1GB+ recommended for production)
- Network: Multiple network interfaces or VLAN support
- Root Access: Administrative privileges for installation and configuration

### Network Prerequisites

- IP Address Planning: Dedicated virtual IP addresses for each service
- Network Segmentation: Proper VLAN or subnet configuration
- Firewall Rules: Configured to allow VRRP traffic (IP protocol 112)
- Multicast Support: Network infrastructure supporting multicast communication

### Software Dependencies

```bash
# Essential packages for compilation (if building from source), Debian/Ubuntu
sudo apt install gcc make autoconf libnl-3-dev libnl-genl-3-dev libssl-dev libpopt-dev
# CentOS/RHEL equivalents
sudo yum install gcc make autoconf libnl3-devel openssl-devel popt-devel kernel-headers
```

## Installation Process

### Installing from Package Repositories

Most modern Linux distributions include Keepalived in their official repositories, making installation straightforward.
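As a rough sketch of how the distribution-specific commands relate, the helper below prints the install command matching whichever package manager is found on the `PATH`. The function name `keepalived_install_cmd` is hypothetical and not part of Keepalived; it only echoes a command and does not run it.

```bash
#!/bin/bash
# Hypothetical helper (not part of Keepalived): print the install command
# for the first supported package manager found on this system.
keepalived_install_cmd() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install -y keepalived"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install -y keepalived"
    elif command -v yum >/dev/null 2>&1; then
        echo "sudo yum install -y keepalived"
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# Print the suggested command, or fall back to a hint about building from source.
keepalived_install_cmd || echo "Consider compiling from source instead."
```

Printing the command instead of executing it keeps the sketch safe to run on any host; review the output before running it with root privileges.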
#### Ubuntu/Debian Installation

```bash
# Update package repositories
sudo apt update

# Install Keepalived
sudo apt install keepalived

# Verify installation
keepalived --version
```

#### CentOS/RHEL Installation

```bash
# Install EPEL repository (optional; Keepalived is normally in the base repositories)
sudo yum install epel-release

# Install Keepalived
sudo yum install keepalived

# For CentOS 8/RHEL 8
sudo dnf install keepalived

# Verify installation
keepalived --version
```

### Compiling from Source

For the latest features or custom build options, you can compile Keepalived from source:

```bash
# Download source code
wget https://www.keepalived.org/software/keepalived-2.2.8.tar.gz
tar -xzf keepalived-2.2.8.tar.gz
cd keepalived-2.2.8

# Configure compilation
./configure --prefix=/usr/local/keepalived

# Compile and install
make
sudo make install

# Create systemd service file
sudo cp /usr/local/keepalived/etc/systemd/keepalived.service /etc/systemd/system/
sudo systemctl daemon-reload
```

## Understanding Keepalived Configuration

The Keepalived configuration file (`/etc/keepalived/keepalived.conf`) uses a structured format with three main sections.

### Global Definitions

This section contains global parameters affecting the entire Keepalived instance:

```bash
global_defs {
    # Unique identifier for this Keepalived instance
    router_id LVS_DEVEL

    # Email notifications
    notification_email {
        admin@example.com
        support@example.com
    }
    notification_email_from keepalived@example.com
    smtp_server 192.168.1.1
    smtp_connect_timeout 30

    # Script execution user
    script_user root
    enable_script_security
}
```

### VRRP Instance Configuration

VRRP instances define the high availability behavior:

```bash
vrrp_instance VI_1 {
    state MASTER              # Initial state (MASTER or BACKUP)
    interface eth0            # Network interface
    virtual_router_id 51      # Unique ID (1-255), identical on all peers
    priority 100              # Priority (higher = preferred master)
    advert_int 1              # Advertisement interval (seconds)
    authentication {
        auth_type PASS
        auth_pass mypassword  # Only the first 8 characters are used
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
```

### Virtual Server Configuration
Virtual servers define load balancing behavior:

```bash
virtual_server 192.168.1.100 80 {
    delay_loop 6              # Health check interval (seconds)
    lb_algo rr                # Load balancing algorithm (round robin)
    lb_kind NAT               # Forwarding method
    persistence_timeout 50    # Session persistence (seconds)
    protocol TCP              # Protocol type

    real_server 192.168.1.10 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
```

## Step-by-Step Configuration

### Step 1: Basic Network Setup

Before configuring Keepalived, ensure your network interfaces are properly configured:

```bash
# Check current network configuration
ip addr show

# Configure network interfaces (example for Ubuntu)
sudo nano /etc/netplan/01-network-manager-all.yaml

# Example netplan configuration (file contents, not shell commands)
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: no
      addresses:
        - 192.168.1.10/24
      gateway4: 192.168.1.1
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]

# Apply network configuration
sudo netplan apply
```

### Step 2: Create Master Server Configuration

Create the main configuration file for the master server:

```bash
sudo nano /etc/keepalived/keepalived.conf
```

```bash
# Master server configuration
global_defs {
    router_id MASTER_SERVER
    notification_email {
        admin@company.com
    }
    notification_email_from keepalived@company.com
    smtp_server localhost
    smtp_connect_timeout 30
    script_user root
    enable_script_security
}

# Health check script
vrrp_script chk_nginx {
    script "/usr/local/bin/check_nginx.sh"
    interval 2
    # The penalty must exceed the priority gap between master (110) and
    # backup (100); a weight of -2 would leave the master at 108 and
    # never trigger failover.
    weight -20
    fall 3
    rise 2
}

# VRRP instance for web service
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass SecurePassword123   # Only the first 8 characters are used
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        chk_nginx
    }
    notify_master "/usr/local/bin/master.sh"
    notify_backup "/usr/local/bin/backup.sh"
    notify_fault "/usr/local/bin/fault.sh"
}
```

### Step 3: Create Backup Server Configuration

Configure the backup server with a lower priority:

```bash
# Backup server configuration
global_defs {
    router_id BACKUP_SERVER
    notification_email {
        admin@company.com
    }
    notification_email_from keepalived@company.com
    smtp_server localhost
    smtp_connect_timeout 30
    script_user root
    enable_script_security
}

vrrp_script chk_nginx {
    script "/usr/local/bin/check_nginx.sh"
    interval 2
    weight -20                        # Same penalty as on the master
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100                      # Lower priority than master
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass SecurePassword123   # Must match master
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        chk_nginx
    }
    notify_master "/usr/local/bin/master.sh"
    notify_backup "/usr/local/bin/backup.sh"
    notify_fault "/usr/local/bin/fault.sh"
}
```

### Step 4: Create Health Check Scripts

Develop custom health check scripts to monitor service availability:

```bash
# Create health check script
sudo nano /usr/local/bin/check_nginx.sh
```

```bash
#!/bin/bash
# Nginx health check script

# Check if nginx is running
if pgrep nginx > /dev/null; then
    # Check if nginx responds to HTTP requests
    # (--max-time keeps the check shorter than the 2-second interval)
    if curl -f --max-time 2 http://localhost > /dev/null 2>&1; then
        exit 0  # Service is healthy
    else
        exit 1  # Service is not responding
    fi
else
    exit 1  # Service is not running
fi
```

```bash
# Make script executable
sudo chmod +x /usr/local/bin/check_nginx.sh
```

### Step 5: Create Notification Scripts

Implement notification scripts for state changes:

```bash
# Master notification script
sudo nano /usr/local/bin/master.sh
```

```bash
#!/bin/bash
# Actions to perform when becoming master
echo "$(date): Becoming MASTER" >> /var/log/keepalived-state.log

# Start services that should only run on master
systemctl start nginx
systemctl start mysql

# Update DNS or load balancer configuration
/usr/local/bin/update_dns.sh master

# Send notification
echo "Server $(hostname) is now MASTER" | mail -s "Keepalived State Change" admin@company.com
```

```bash
# Backup notification script
sudo nano /usr/local/bin/backup.sh
```

```bash
#!/bin/bash
# Actions to perform when becoming backup
echo "$(date): Becoming BACKUP" >> /var/log/keepalived-state.log

# Stop services that should only run on master.
# Caution: a service monitored by a track_script (nginx via chk_nginx above)
# must keep running on the backup, otherwise the check fails there and
# lowers the backup's priority as well. Stop only untracked services.
systemctl stop nginx
systemctl stop mysql

# Send notification
echo "Server $(hostname) is now BACKUP" | mail -s "Keepalived State Change" admin@company.com
```

```bash
# Make scripts executable
# (fault.sh is referenced by notify_fault and follows the same pattern)
sudo chmod +x /usr/local/bin/master.sh
sudo chmod +x /usr/local/bin/backup.sh
sudo chmod +x /usr/local/bin/fault.sh
```

## Practical Examples and Use Cases

### Example 1: Web Server High Availability

This example sets up high availability for web servers using Nginx:

```bash
# Complete configuration for web server HA
global_defs {
    router_id WEB_HA_CLUSTER
    notification_email {
        webadmin@company.com
    }
    notification_email_from keepalived@web-cluster.company.com
    smtp_server mail.company.com
    smtp_connect_timeout 30
}

vrrp_script chk_nginx {
    script "/bin/bash -c 'curl -f http://localhost:80/ || exit 1'"
    interval 3
    timeout 3
    weight -2     # Triggers failover only if the backup's priority is within 2 of 110
    fall 2
    rise 1
}

vrrp_instance WEB_SERVERS {
    state MASTER
    interface eth0
    virtual_router_id 100
    priority 110
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass WebCluster2023!
    }
    virtual_ipaddress {
        10.0.1.100/24 dev eth0
        10.0.1.101/24 dev eth0
    }
    track_script {
        chk_nginx
    }
    notify "/usr/local/bin/notify_state_change.sh"
}

# Load balancing configuration
virtual_server 10.0.1.100 80 {
    delay_loop 10
    lb_algo wrr
    lb_kind DR
    persistence_timeout 300
    protocol TCP

    real_server 10.0.1.10 80 {
        weight 3
        HTTP_GET {
            url {
                path /health
                status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
        }
    }

    real_server 10.0.1.11 80 {
        weight 3
        HTTP_GET {
            url {
                path /health
                status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
        }
    }
}
```

### Example 2: Database High Availability

Configuration for database server failover:

```bash
# Database HA configuration
global_defs {
    router_id DB_HA_CLUSTER
    script_user root
    enable_script_security
}

vrrp_script chk_mysql {
    script "/usr/local/bin/check_mysql.sh"
    interval 5
    timeout 3
    weight -10    # Must exceed the priority gap to the peer node
    fall 2
    rise 1
}

vrrp_instance DATABASE {
    # preempt_delay only takes effect when the initial state is BACKUP;
    # the higher priority still makes this node the preferred master.
    state BACKUP
    interface eth1
    virtual_router_id 200
    priority 120
    advert_int 1
    preempt_delay 300
    authentication {
        auth_type PASS
        auth_pass DBCluster2023!
    }
    virtual_ipaddress {
        192.168.10.100/24 dev eth1
    }
    track_script {
        chk_mysql
    }
    notify_master "/usr/local/bin/mysql_master.sh"
    notify_backup "/usr/local/bin/mysql_backup.sh"
    notify_stop "/usr/local/bin/mysql_stop.sh"
}
```

MySQL health check script:

```bash
#!/bin/bash
# MySQL health check script
MYSQL_USER="healthcheck"
MYSQL_PASS="password"
MYSQL_HOST="localhost"
MYSQL_PORT="3306"

# Test MySQL connectivity and basic functionality
mysql -u"${MYSQL_USER}" -p"${MYSQL_PASS}" -h"${MYSQL_HOST}" -P"${MYSQL_PORT}" \
    -e "SELECT 1" > /dev/null 2>&1

if [ $? -eq 0 ]; then
    # Additional checks can be added here
    # Check replication status, disk space, etc.
    exit 0
else
    exit 1
fi
```

## Testing and Validation

### Initial Configuration Testing

Before deploying to production, test your Keepalived configuration thoroughly:

```bash
# Test configuration syntax (the -t option is available in Keepalived 2.x)
sudo keepalived -t -f /etc/keepalived/keepalived.conf

# Run Keepalived in the foreground with detailed logging to the console
# (-n: do not fork, -l: log to console, -D: detailed log messages)
sudo keepalived -n -l -D -f /etc/keepalived/keepalived.conf

# Check VRRP advertisements
sudo tcpdump -i eth0 vrrp

# Monitor system logs (/var/log/messages on CentOS/RHEL)
sudo tail -f /var/log/syslog | grep -i keepalived
```

### Failover Testing

Test failover scenarios systematically:

```bash
# Test 1: Stop Keepalived service on master
sudo systemctl stop keepalived

# Test 2: Simulate network failure
sudo iptables -A INPUT -p vrrp -j DROP
sudo iptables -A OUTPUT -p vrrp -j DROP

# Test 3: Simulate service failure
sudo systemctl stop nginx

# Test 4: Server reboot simulation
sudo reboot

# Restore network rules after testing
sudo iptables -D INPUT -p vrrp -j DROP
sudo iptables -D OUTPUT -p vrrp -j DROP
```

### Monitoring Commands

Essential commands for monitoring Keepalived status:

```bash
# Check virtual IP assignment
ip addr show | grep -A 2 -B 2 "192.168.1.100"

# Monitor VRRP state
sudo journalctl -u keepalived -f

# Check process status
ps aux | grep keepalived

# Network connectivity testing
ping -c 3 192.168.1.100
telnet 192.168.1.100 80
```

## Troubleshooting Common Issues

### Issue 1: Split-Brain Scenarios

**Symptoms:** Multiple masters exist simultaneously, causing IP conflicts.

**Diagnosis:**

```bash
# Check for duplicate virtual IPs (run on both servers)
ip addr show | grep "192.168.1.100"

# Monitor VRRP traffic on both servers
sudo tcpdump -i eth0 -n vrrp
```

**Solutions:**

- Verify network connectivity between VRRP peers
- Check firewall rules allowing VRRP traffic (IP protocol 112)
- Ensure consistent authentication passwords
- Review network switch configuration for multicast support

### Issue 2: Service Not Starting

**Symptoms:** Keepalived fails to start or immediately stops.
**Diagnosis:**

```bash
# Check configuration syntax
sudo keepalived -t

# Review system logs
sudo journalctl -u keepalived -n 50

# Check file permissions
ls -la /etc/keepalived/keepalived.conf
```

**Solutions:**

- Fix configuration syntax errors
- Verify script permissions and paths
- Check SELinux/AppArmor policies
- Ensure required kernel modules are loaded

### Issue 3: Health Check Failures

**Symptoms:** Frequent failovers or services marked as down incorrectly.

**Diagnosis:**

```bash
# Test health check script manually and print its exit code
sudo /usr/local/bin/check_nginx.sh
echo $?

# Review script execution logs
sudo tail -f /var/log/syslog | grep "check_nginx"
```

**Solutions:**

- Adjust health check intervals and thresholds
- Improve script error handling and logging
- Consider network latency in timeout values
- Implement more sophisticated health checks

### Issue 4: Virtual IP Not Accessible

**Symptoms:** Virtual IP assigned but not reachable from the network.

**Diagnosis:**

```bash
# Verify IP assignment
ip addr show eth0

# Check routing table
ip route show

# Test local connectivity
ping -c 1 -I eth0 192.168.1.100

# Check ARP table on other hosts
arp -a | grep 192.168.1.100
```

**Solutions:**

- Verify network interface configuration
- Check VLAN and subnet settings
- Review firewall rules on both servers and network
- Ensure proper gratuitous ARP configuration

## Best Practices and Security

### Security Considerations

**Authentication:** Use a distinct, random password for each VRRP instance. Keepalived does not expand shell syntax inside `keepalived.conf` and uses only the first 8 characters of `auth_pass`, so generate the value separately (for example with `openssl rand -base64 6`, which yields 8 characters) and paste it in. `PASS` authentication is sent in clear text, so it guards against misconfigured peers more than against an attacker on the same segment:

```bash
authentication {
    auth_type PASS
    auth_pass <generated-8-character-password>
}
```

**Script Security:** Implement proper script validation and permissions:

```bash
# In global_defs: use a dedicated user for scripts
script_user keepalived_script
enable_script_security

# On the shell: set restrictive permissions
chmod 750 /usr/local/bin/check_*.sh
chown root:keepalived /usr/local/bin/check_*.sh
```

**Network Security:** Configure firewall rules appropriately:

```bash
# Allow VRRP traffic between cluster members
sudo iptables -A INPUT -s 192.168.1.0/24 -p vrrp -j ACCEPT
sudo iptables -A OUTPUT -d 192.168.1.0/24 -p vrrp -j ACCEPT

# Allow health check traffic
sudo iptables -A INPUT -s 192.168.1.10,192.168.1.11 -p tcp --dport 80 -j ACCEPT
```

### Performance Optimization

**Resource Management:** Configure appropriate resource limits:

```bash
# Systemd service limits (unit file override, not shell commands)
# On cgroup v2 systems, MemoryMax= replaces MemoryLimit=
[Service]
LimitNOFILE=65536
LimitNPROC=32768
MemoryLimit=512M
```

**Network Tuning:** Optimize network parameters:

```bash
# Increase network buffers (tee is used because the redirect needs root)
echo 'net.core.rmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.conf

# Apply changes
sudo sysctl -p
```

### Configuration Management

**Version Control:** Maintain configuration files in version control:

```bash
# Initialize git repository for configurations
cd /etc/keepalived
sudo git init
sudo git add keepalived.conf
sudo git commit -m "Initial Keepalived configuration"
```

**Backup Strategy:** Implement regular configuration backups:

```bash
#!/bin/bash
# Backup script: keep timestamped copies and delete those older than 30 days
DATE=$(date +%Y%m%d_%H%M%S)
cp /etc/keepalived/keepalived.conf "/backup/keepalived_${DATE}.conf"
find /backup -name "keepalived_*.conf" -mtime +30 -delete
```

## Advanced Configuration Options

### Multi-Instance Setup

Configure multiple VRRP instances for different services:

```bash
# Web service instance
vrrp_instance WEB_SERVICE {
    state MASTER
    interface eth0
    virtual_router_id 10
    priority 110
    virtual_ipaddress {
        10.0.1.100/24
    }
}

# Database service instance
vrrp_instance DB_SERVICE {
    state BACKUP
    interface eth1
    virtual_router_id 20
    priority 100
    virtual_ipaddress {
        10.0.2.100/24
    }
}
```

### Advanced Load Balancing

Implement load balancing with persistence:

```bash
virtual_server 10.0.1.100 443 {
    delay_loop 15
    lb_algo sh                              # Source hash for persistence
    lb_kind TUN                             # IP tunneling
    persistence_timeout 3600                # 1-hour session persistence
    persistence_granularity 255.255.255.0
    protocol TCP
    sorry_server 10.0.1.200 443             # Sorry server for maintenance

    real_server 10.0.1.10 443 {
        weight 100
        inhibit_on_failure
        SSL_GET {
            url {
                path /api/health
                status_code 200
            }
            connect_timeout 5
            connect_port 443
        }
    }
}
```

### Integration with Monitoring Systems

Configure integration with external monitoring:

```bash
# Nagios/Icinga integration
vrrp_instance VI_1 {
    # ... standard configuration ...
    notify "/usr/local/bin/notify_monitoring.sh"
}
```

Notification script for monitoring integration:

```bash
#!/bin/bash
# Monitoring integration script
# Keepalived calls a notify script with four arguments:
#   $1 = "GROUP" or "INSTANCE", $2 = group/instance name,
#   $3 = new state (MASTER, BACKUP, FAULT), $4 = priority
TYPE=$1
INSTANCE=$2
STATE=$3
PRIORITY=$4

case $STATE in
    "MASTER")
        # Update monitoring system
        curl -X POST "http://monitoring.company.com/api/update" \
            -d "host=$(hostname)&state=master&service=keepalived"
        ;;
    "BACKUP")
        curl -X POST "http://monitoring.company.com/api/update" \
            -d "host=$(hostname)&state=backup&service=keepalived"
        ;;
    "FAULT")
        curl -X POST "http://monitoring.company.com/api/alert" \
            -d "host=$(hostname)&state=fault&service=keepalived&priority=high"
        ;;
esac
```

## Monitoring and Maintenance

### Log Management

Configure logging for troubleshooting:

```bash
# Rsyslog configuration for Keepalived.
# Keepalived logs to the daemon facility by default; start it with "-S 0"
# (--log-facility 0) so that its messages go to local0.
echo "local0.* /var/log/keepalived.log" | sudo tee -a /etc/rsyslog.d/49-keepalived.conf
sudo systemctl restart rsyslog

# Logrotate configuration
sudo tee /etc/logrotate.d/keepalived > /dev/null << 'EOF'
/var/log/keepalived.log {
    daily
    missingok
    rotate 52
    compress
    delaycompress
    notifempty
    postrotate
        /bin/kill -HUP `cat /var/run/rsyslogd.pid 2> /dev/null` 2> /dev/null || true
    endscript
}
EOF
```

### Performance Monitoring

Implement monitoring for Keepalived performance:

```bash
#!/bin/bash
# Performance monitoring script (run as root)

# Check memory usage (one line per Keepalived process)
MEMORY_USAGE=$(ps -o pid,vsz,rss,comm -C keepalived --no-headers)
echo "Keepalived Memory Usage: $MEMORY_USAGE"

# Check file descriptors of the parent process
# (pgrep -o selects the oldest match; Keepalived also forks child processes)
PID=$(pgrep -o keepalived)
FD_COUNT=$(ls -1 "/proc/$PID/fd" | wc -l)
echo "File Descriptors: $FD_COUNT"

# Check network statistics
# Note: netstat -s has no VRRP-specific counters on most systems,
# so this value is often empty.
VRRP_PACKETS=$(netstat -s | grep -i vrrp)
echo "VRRP Statistics: $VRRP_PACKETS"
```

### Automated Health Monitoring

Create a combined health check:

```bash
#!/bin/bash
# Comprehensive health check script (run as root for tcpdump)
KEEPALIVED_CONFIG="/etc/keepalived/keepalived.conf"

# Collect the addresses listed inside virtual_ipaddress { ... } blocks
mapfile -t VIRTUAL_IPS < <(awk '
    /virtual_ipaddress[[:space:]]*\{/ { in_block = 1; next }
    in_block && /\}/                  { in_block = 0 }
    in_block && $1 ~ /^[0-9]+\./      { sub(/\/.*/, "", $1); print $1 }
' "$KEEPALIVED_CONFIG")

# Check Keepalived process
if ! pgrep keepalived > /dev/null; then
    echo "CRITICAL: Keepalived process not running"
    exit 2
fi

# Check virtual IP assignment
for VIP in "${VIRTUAL_IPS[@]}"; do
    if ! ip -o addr show | grep -qF " $VIP/"; then
        echo "WARNING: Virtual IP $VIP not assigned"
    else
        echo "OK: Virtual IP $VIP assigned"
    fi
done

# Check VRRP advertisements
VRRP_COUNT=$(timeout 10 tcpdump -n -i any -c 5 vrrp 2>/dev/null | wc -l)
if [ "$VRRP_COUNT" -lt 3 ]; then
    echo "WARNING: Low VRRP advertisement count: $VRRP_COUNT"
else
    echo "OK: VRRP advertisements detected: $VRRP_COUNT"
fi

echo "Health check completed"
```

## Conclusion

Keepalived provides a robust way to implement high availability and load balancing in Linux environments. With proper configuration, testing, and monitoring, you can achieve reliable service continuity and automatic failover.

### Key Takeaways

1. Proper Planning: Success with Keepalived begins with careful network planning and IP address allocation
2. Security First: Always implement strong authentication and secure script execution practices
3. Thorough Testing: Comprehensive testing of failover scenarios prevents production issues
4. Continuous Monitoring: Regular monitoring and maintenance ensure optimal performance
5. Documentation: Maintain detailed documentation of configurations and procedures

### Next Steps

After implementing Keepalived, consider these additional enhancements:

- Integration with Configuration Management: Use tools like Ansible, Puppet, or Chef for configuration management
- Advanced Monitoring: Implement comprehensive monitoring with tools like Prometheus, Nagios, or Zabbix
- Disaster Recovery: Develop and test disaster recovery procedures
- Performance Tuning: Continuously optimize performance based on monitoring data
- Security Hardening: Regular security audits and updates

### Additional Resources

- Official Documentation: [keepalived.org](https://keepalived.org)
- VRRP RFC: RFC 3768 (VRRPv2) for detailed protocol specifications
- Linux Virtual Server: [linuxvirtualserver.org](http://www.linuxvirtualserver.org)
- Community Support: Keepalived mailing lists and forums

By following this guide, you should now have a solid foundation for implementing and managing Keepalived in your Linux infrastructure. Remember that high availability is not just about technology; it requires ongoing attention, monitoring, and maintenance to ensure continued reliability and performance.
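The `track_script` examples in this guide rely on priority arithmetic: a failing check with a negative `weight` lowers a node's effective priority, and the backup takes over only if the result drops below its own priority. A minimal sketch of that arithmetic follows. The function names `effective_priority` and `fails_over` are hypothetical helpers, not part of Keepalived, and the sketch assumes a single tracked script with a negative weight and a healthy backup.

```bash
#!/bin/bash
# Hypothetical helpers illustrating VRRP priority arithmetic with one
# tracked script. Not part of Keepalived.

# effective_priority BASE WEIGHT CHECK_FAILED(0|1)
# A negative weight is added to the base priority while the check is failing.
effective_priority() {
    local base=$1 weight=$2 failed=$3
    if [ "$failed" -eq 1 ] && [ "$weight" -lt 0 ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

# fails_over MASTER_PRIORITY BACKUP_PRIORITY WEIGHT
# Succeeds (exit status 0) if a failed check on the master drops its
# effective priority below the backup's priority.
fails_over() {
    local degraded
    degraded=$(effective_priority "$1" "$3" 1)
    [ "$degraded" -lt "$2" ]
}

# Compare two candidate weights for a master at 110 and a backup at 100.
for w in -2 -20; do
    if fails_over 110 100 "$w"; then
        echo "weight $w: backup takes over"
    else
        echo "weight $w: master keeps the VIP"
    fi
done
```

With these numbers, a weight of -2 leaves the master at 108 and the virtual IP stays put, while -20 drops it to 90 and lets the backup at 100 win. Choosing a weight larger in magnitude than the priority gap is therefore what makes a failed health check cause failover.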