# How to Configure a Corosync Cluster in Linux

## Introduction

Corosync is a powerful cluster communication system that provides reliable messaging and membership services for high availability clusters in Linux environments. As the foundation for many clustering solutions, including Pacemaker, Corosync ensures that cluster nodes can communicate effectively and maintain consistent cluster membership information.

This comprehensive guide will walk you through the complete process of configuring a Corosync cluster from scratch. You'll learn how to install, configure, and manage Corosync clusters, understand the key concepts behind cluster communication, and implement best practices for production environments. Whether you're building your first high availability cluster or optimizing an existing setup, this article provides the detailed knowledge you need.

By the end of this guide, you'll have a fully functional Corosync cluster running on multiple Linux nodes, complete with proper authentication, network configuration, and monitoring capabilities.

## Prerequisites and Requirements

Before beginning the Corosync cluster configuration, ensure you meet the following requirements:

### System Requirements

- Operating System: CentOS/RHEL 7+, Ubuntu 18.04+, or SUSE Linux Enterprise Server 12+
- Memory: Minimum 2GB RAM per node (4GB+ recommended for production)
- CPU: Dual-core processor minimum (quad-core recommended)
- Storage: At least 20GB available disk space
- Network: Dedicated network interfaces for cluster communication (recommended)

### Network Prerequisites

- Multiple Network Paths: At least two network interfaces per node for redundancy
- Low Latency: Network latency should be less than 2ms between nodes
- Bandwidth: Minimum 100Mbps network connection
- Firewall Configuration: Proper firewall rules for Corosync communication
- Time Synchronization: NTP configured and synchronized across all nodes (see the check after Step 2)

### Software Dependencies

- Root Access: Administrative privileges on all cluster nodes
- Package Manager: yum, apt, or zypper depending on your distribution
- Text Editor: vi, nano, or your preferred editor for configuration files

### Planning Considerations

- Node Count: Odd number of nodes (3, 5, 7) for proper quorum calculation
- IP Addressing: Dedicated IP ranges for cluster communication
- Naming Convention: Consistent hostname and FQDN configuration
- Security: Authentication keys and encryption requirements

## Step-by-Step Corosync Installation

### Step 1: Prepare the Environment

First, update your system packages and configure the basic environment on all cluster nodes:

```bash
# For CentOS/RHEL systems
sudo yum update -y
sudo yum install -y epel-release

# For Ubuntu systems
sudo apt update && sudo apt upgrade -y

# For SUSE systems
sudo zypper update -y
```

Configure hostnames and ensure proper DNS resolution:

```bash
# Set hostname on each node (replace node1 with the appropriate name)
sudo hostnamectl set-hostname node1.cluster.local

# Update the /etc/hosts file on all nodes
sudo tee -a /etc/hosts > /dev/null << EOF
192.168.100.10 node1.cluster.local node1
192.168.100.11 node2.cluster.local node2
192.168.100.12 node3.cluster.local node3
EOF
```

### Step 2: Install Corosync Packages

Install the necessary Corosync packages on all cluster nodes:

```bash
# For CentOS/RHEL 7/8
sudo yum install -y corosync pacemaker pcs fence-agents-all

# For Ubuntu
sudo apt install -y corosync pacemaker crmsh fence-agents

# For SUSE
sudo zypper install -y corosync pacemaker crmsh fence-agents
```

Verify the installation:

```bash
corosync -v
pacemakerd --version
```
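Time synchronization is listed as a prerequisite above, so it is worth confirming before you build the cluster. The snippet below is a minimal sketch using chrony as the NTP client; adjust the package and service names for your distribution (the service is `chronyd` on CentOS/RHEL and `chrony` on Ubuntu), or use ntpd/systemd-timesyncd if you prefer.

```bash
# Install and enable chrony (CentOS/RHEL shown; use apt and the "chrony" service on Ubuntu)
sudo yum install -y chrony
sudo systemctl enable --now chronyd

# Confirm the clock is synchronized: "Leap status : Normal" and a small
# "System time" offset indicate healthy synchronization
chronyc tracking
```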
### Step 3: Configure Firewall Rules

Configure firewall rules to allow Corosync communication:

```bash
# For firewalld (CentOS/RHEL)
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --permanent --add-port=5404-5405/udp
sudo firewall-cmd --permanent --add-port=2224/tcp
sudo firewall-cmd --reload

# For UFW (Ubuntu)
sudo ufw allow 5404:5405/udp
sudo ufw allow 2224/tcp
sudo ufw allow from 192.168.100.0/24

# For SuSEfirewall2 (SUSE)
sudo SuSEfirewall2 open EXT TCP 2224
sudo SuSEfirewall2 open EXT UDP 5404:5405
```

### Step 4: Generate Authentication Key

Create a shared authentication key for secure cluster communication. Run this command on one node only:

```bash
sudo corosync-keygen
```

On older releases that read from /dev/random, this can take several minutes while entropy is gathered. You can speed it up by generating system activity:

```bash
# In another terminal, generate entropy
find /usr -type f -exec md5sum {} \; > /dev/null 2>&1 &
```

Copy the generated key to all other nodes:

```bash
sudo scp /etc/corosync/authkey root@node2:/etc/corosync/
sudo scp /etc/corosync/authkey root@node3:/etc/corosync/
```

Set proper permissions on all nodes:

```bash
sudo chown root:root /etc/corosync/authkey
sudo chmod 400 /etc/corosync/authkey
```

## Detailed Corosync Configuration

### Step 5: Create Main Configuration File

Create the primary Corosync configuration file `/etc/corosync/corosync.conf` on all nodes:

```bash
sudo tee /etc/corosync/corosync.conf > /dev/null << 'EOF'
totem {
    version: 2
    cluster_name: production-cluster
    transport: knet

    # Crypto configuration
    crypto_cipher: aes256
    crypto_hash: sha256

    # With the knet transport, node addresses come from the nodelist below;
    # the interface sections only carry per-link options
    interface {
        linknumber: 0
        knet_link_priority: 1
    }
    interface {
        linknumber: 1
        knet_link_priority: 2
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 3
    two_node: 0
}

nodelist {
    node {
        ring0_addr: 192.168.100.10
        ring1_addr: 192.168.101.10
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.100.11
        ring1_addr: 192.168.101.11
        name: node2
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.100.12
        ring1_addr: 192.168.101.12
        name: node3
        nodeid: 3
    }
}
EOF
```

### Step 6: Configure Advanced Options

Create additional configuration files for enhanced functionality.

Service configuration (`/etc/corosync/service.d/pcmk`). Note that this plugin-style entry is only needed by legacy Corosync 1.x installations; Corosync 2.x and later ignore it, because Pacemaker runs as its own systemd service:

```bash
sudo mkdir -p /etc/corosync/service.d
sudo tee /etc/corosync/service.d/pcmk > /dev/null << 'EOF'
service {
    name: pacemaker
    ver: 1
}
EOF
```

Corosync service configuration (`/etc/sysconfig/corosync` on RHEL/CentOS):

```bash
sudo tee /etc/sysconfig/corosync > /dev/null << 'EOF'
COROSYNC_INIT_TIMEOUT=60
COROSYNC_OPTIONS=""
EOF
```

### Step 7: Configure Log Rotation

Set up proper log rotation to prevent disk space issues:

```bash
sudo tee /etc/logrotate.d/corosync > /dev/null << 'EOF'
/var/log/corosync/corosync.log {
    daily
    rotate 7
    missingok
    compress
    notifempty
    copytruncate
}
EOF
```
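Because `corosync.conf` and the authentication key must match on every node, it can help to create them once and push copies out rather than re-running the heredoc on each machine. The loop below is a minimal sketch only; it assumes root SSH access and the node names used earlier in this guide.

```bash
# Push the configuration and key to the remaining nodes and fix permissions
for node in node2 node3; do
    scp /etc/corosync/corosync.conf /etc/corosync/authkey root@"$node":/etc/corosync/
    ssh root@"$node" "chown root:root /etc/corosync/authkey && chmod 400 /etc/corosync/authkey"
done
```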
## Starting and Testing the Cluster

### Step 8: Enable and Start Services

Enable and start the Corosync services on all nodes:

```bash
# Enable services to start at boot
sudo systemctl enable corosync
sudo systemctl enable pacemaker

# Create the log directory referenced in corosync.conf, then start Corosync first
sudo mkdir -p /var/log/corosync
sudo systemctl start corosync

# Verify Corosync is running
sudo systemctl status corosync
```

Wait a few seconds, then start Pacemaker:

```bash
sudo systemctl start pacemaker
sudo systemctl status pacemaker
```

### Step 9: Verify Cluster Status

Check cluster membership and communication:

```bash
# Check cluster membership
sudo corosync-cmapctl | grep members

# Check cluster communication
sudo corosync-cfgtool -s

# Check quorum status
sudo corosync-quorumtool -s

# Check Pacemaker cluster status
sudo pcs status
```

Expected `corosync-cfgtool -s` output for a healthy node (the exact format differs between Corosync 2.x, shown here, and the knet-based 3.x releases):

```
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.100.10
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.101.10
        status  = ring 1 active with no faults
```

## Practical Configuration Examples

### Example 1: Two-Node Cluster Configuration

For environments requiring only two nodes, modify the configuration:

```bash
# In corosync.conf, change the quorum section:
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}

# Remove the third node from the nodelist
nodelist {
    node {
        ring0_addr: 192.168.100.10
        ring1_addr: 192.168.101.10
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.100.11
        ring1_addr: 192.168.101.11
        name: node2
        nodeid: 2
    }
}
```

### Example 2: Unicast Configuration

For environments where multicast is not available:

```bash
# Modify the totem section in corosync.conf:
totem {
    version: 2
    cluster_name: unicast-cluster
    # knet (Corosync 3.x) communicates over unicast by default;
    # on Corosync 2.x use "transport: udpu" instead
    transport: knet
}

# The nodelist is mandatory for unicast operation
nodelist {
    node {
        ring0_addr: 192.168.100.10
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.100.11
        name: node2
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.100.12
        name: node3
        nodeid: 3
    }
}
```

### Example 3: Cloud Environment Configuration

For cloud deployments (AWS, Azure, GCP), where multicast is typically unavailable and latency is higher:

```bash
# Cloud-optimized configuration
totem {
    version: 2
    cluster_name: cloud-cluster
    transport: knet

    # Adjust timeouts for cloud latency
    token: 10000
    token_retransmits_before_loss_const: 6
    join: 1000
    consensus: 12000
    max_messages: 20
}

# Relax quorum behaviour for dynamic cloud environments
quorum {
    provider: corosync_votequorum
    expected_votes: 3
    wait_for_all: 0
    last_man_standing: 1
    last_man_standing_window: 10000
}
```

## Advanced Configuration Options

### Performance Tuning

Optimize Corosync for high-performance environments:

```bash
# Add performance tuning to the totem section
totem {
    # ... existing configuration ...

    # Performance optimizations
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 100
    consensus: 3600
    max_messages: 20
    send_join: 0

    # Network performance
    netmtu: 1500

    # Redundant-link failover mode for knet (the Corosync 2.x equivalent is rrp_mode)
    link_mode: passive
}
```

### Security Enhancements

Implement additional security measures:

```bash
# Enhanced crypto configuration
totem {
    # ... existing configuration ...

    # Strong encryption
    crypto_cipher: aes256
    crypto_hash: sha512

    # Explicit key file (rotate the key periodically as an operational task)
    keyfile: /etc/corosync/authkey
}

# IP-based access control: with Corosync 2.x/3.x, membership is restricted to the
# addresses in the nodelist, so keep that list exact. The legacy member {} syntax
# below applies only to Corosync 1.x udpu transports.
totem {
    # ... existing configuration ...
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        member {
            memberaddr: 192.168.100.10
        }
        member {
            memberaddr: 192.168.100.11
        }
        member {
            memberaddr: 192.168.100.12
        }
    }
}
```

## Common Issues and Troubleshooting

### Issue 1: Cluster Nodes Not Joining

Symptoms: Nodes appear offline or don't join the cluster.

Diagnosis:

```bash
# Check ring/link status
sudo corosync-cfgtool -s

# Verify multicast connectivity (for multicast transports)
sudo omping -c 10 -i 0.1 -m 239.255.100.1 192.168.100.10 192.168.100.11

# Check firewall rules
sudo iptables -L | grep 5405
```

Solutions:

1. Verify network configuration and connectivity
2. Check firewall rules on all nodes
3. Ensure authentication keys match across nodes (see the check below)
4. Verify time synchronization with NTP (see the check below)
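For solutions 3 and 4, a quick cross-node comparison can save time. The loop below is a minimal sketch; it assumes root SSH access, the node names used earlier, and chrony as the NTP client.

```bash
# Compare the authkey checksum and clock state on every node;
# the checksums must be identical and the time offsets small
for node in node1 node2 node3; do
    echo "--- $node ---"
    ssh root@"$node" "sha256sum /etc/corosync/authkey; chronyc tracking | grep -E 'Reference ID|System time'"
done
```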
### Issue 2: Split-Brain Scenarios

Symptoms: Multiple cluster partitions running simultaneously, each believing it is the cluster.

Diagnosis:

```bash
# Check quorum status
sudo corosync-quorumtool -s

# Check cluster partition information
sudo crm_mon -1
```

Solutions:

1. Implement proper quorum configuration
2. Use fencing/STONITH devices
3. Configure proper network redundancy
4. Implement quorum devices for even-node clusters

### Issue 3: High CPU Usage

Symptoms: Corosync consuming excessive CPU resources.

Diagnosis:

```bash
# Monitor the Corosync process
top -p `pidof corosync`

# Check ring/link status
sudo corosync-cfgtool -s
```

Solutions:

1. Adjust token timeout values
2. Reduce message frequency
3. Optimize network configuration
4. Disable debug logging, which is CPU-intensive

### Issue 4: Log File Issues

Symptoms: Missing logs or excessive log growth.

Diagnosis:

```bash
# Check the logging configuration
sudo grep -A5 logging /etc/corosync/corosync.conf

# Verify log file permissions
ls -la /var/log/corosync/
```

Solutions:

1. Configure proper log rotation
2. Adjust logging levels
3. Verify directory permissions
4. Monitor disk space usage

## Monitoring and Maintenance

### Cluster Health Monitoring

Implement comprehensive monitoring:

```bash
#!/bin/bash
# Cluster health check script

echo "=== Cluster Status Check ==="
echo "Date: $(date)"
echo

echo "--- Corosync Status ---"
systemctl status corosync --no-pager

echo "--- Cluster Membership ---"
corosync-cmapctl | grep members

echo "--- Ring Status ---"
corosync-cfgtool -s

echo "--- Quorum Status ---"
corosync-quorumtool -s

echo "--- Pacemaker Status ---"
pcs status
```

### Log Analysis

Monitor cluster communication:

```bash
# Real-time log monitoring
sudo tail -f /var/log/corosync/corosync.log

# Search for specific issues
sudo grep -i "error\|warning\|failed" /var/log/corosync/corosync.log

# Analyze cluster membership transitions
sudo grep -i "membership" /var/log/corosync/corosync.log
```

### Performance Monitoring

Track cluster performance metrics:

```bash
# Monitor network interface statistics
sudo netstat -i

# Check memory usage
ps aux | grep corosync

# Check ring/link health
sudo corosync-cfgtool -s
```

## Best Practices and Tips

### Network Configuration Best Practices

1. Use Dedicated Networks: Implement separate networks for cluster communication
2. Multiple Paths: Configure redundant network paths using additional rings/links (ring0 and ring1 in this guide)
3. Low Latency: Ensure network latency remains below 2ms
4. Bandwidth Planning: Allocate sufficient bandwidth for cluster traffic

### Security Best Practices

1. Regular Key Rotation: Update authentication keys periodically
2. Network Segmentation: Isolate cluster traffic from other network traffic
3. Firewall Configuration: Implement restrictive firewall rules
4. Access Control: Limit administrative access to cluster nodes

### Operational Best Practices

1. Documentation: Maintain detailed configuration documentation
2. Change Management: Implement controlled change processes
3. Backup Strategy: Regular backup of cluster configurations (see the example after this section)
4. Testing: Regular disaster recovery testing

### Performance Optimization

1. Timeout Tuning: Adjust timeouts based on network characteristics
2. Message Optimization: Configure appropriate message limits
3. Logging: Keep debug logging disabled in production environments
4. Hardware Selection: Use appropriate hardware for cluster requirements
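To support the backup strategy mentioned above, a simple scheduled job can capture both the Corosync configuration and the Pacemaker CIB. The script below is a minimal sketch: the backup path and 30-day retention are arbitrary examples, and `cibadmin` is the CIB query tool that ships with Pacemaker.

```bash
#!/bin/bash
# Back up /etc/corosync and the current Pacemaker CIB, keeping 30 days of history
BACKUP_DIR=/var/backups/cluster
mkdir -p "$BACKUP_DIR"
tar czf "$BACKUP_DIR/corosync-$(date +%F).tar.gz" /etc/corosync
cibadmin --query > "$BACKUP_DIR/cib-$(date +%F).xml"
find "$BACKUP_DIR" -type f -mtime +30 -delete
```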
## Advanced Topics

### Integration with Storage Clusters

Configure Corosync for storage clustering:

```bash
# Storage-optimized configuration
totem {
    # ... base configuration ...

    # Storage cluster optimizations
    token: 5000
    max_messages: 50

    # Fast failure detection
    fail_recv_const: 2500
    seqno_unchanged_const: 30
}
```

### Container Environment Setup

Deploy Corosync in containerized environments:

```bash
# Docker run options for a Corosync container (replace <corosync-image> with your image)
docker run -d --name corosync \
  --network=host \
  --cap-add=NET_ADMIN --cap-add=SYS_ADMIN \
  -v /etc/corosync:/etc/corosync:ro \
  -v /var/log/corosync:/var/log/corosync \
  -v /dev/shm:/dev/shm \
  <corosync-image>
```

### Automation and Orchestration

Automate cluster deployment:

```bash
#!/bin/bash
# Cluster deployment automation script

NODES=("node1" "node2" "node3")
CLUSTER_NAME="production-cluster"

for node in "${NODES[@]}"; do
    echo "Configuring $node..."
    ssh $node "systemctl enable corosync pacemaker"
    scp /etc/corosync/corosync.conf $node:/etc/corosync/
    scp /etc/corosync/authkey $node:/etc/corosync/
done

echo "Starting cluster services..."
for node in "${NODES[@]}"; do
    ssh $node "systemctl start corosync && sleep 5 && systemctl start pacemaker"
done
```

## Conclusion

Configuring a Corosync cluster requires careful planning, attention to detail, and an understanding of clustering concepts. This comprehensive guide has covered the essential aspects of Corosync cluster configuration, from basic installation to advanced optimization techniques.

Key takeaways from this guide include:

- Proper Planning: Successful cluster deployment starts with thorough planning of network architecture, node configuration, and security requirements
- Network Redundancy: Implementing multiple network paths ensures cluster resilience and helps prevent split-brain scenarios
- Security Configuration: Proper authentication, encryption, and access control are critical for production deployments
- Monitoring and Maintenance: Regular monitoring and proactive maintenance ensure optimal cluster performance
- Documentation: Maintaining detailed documentation facilitates troubleshooting and future modifications

### Next Steps

After successfully configuring your Corosync cluster, consider these next steps:

1. Resource Configuration: Configure cluster resources using Pacemaker
2. Fencing Setup: Implement STONITH devices for split-brain protection
3. Application Integration: Integrate your applications with the cluster
4. Monitoring Setup: Implement comprehensive monitoring solutions
5. Disaster Recovery: Develop and test disaster recovery procedures

### Additional Resources

- Official Corosync documentation and user guides
- Pacemaker configuration and resource management
- High availability clustering best practices
- Community forums and support channels

By following this guide and implementing the recommended best practices, you'll have a robust, secure, and well-configured Corosync cluster ready for production use. Remember to regularly review and update your configuration as your environment evolves and new features become available.