How to Configure a Corosync Cluster in Linux
Introduction
Corosync is a powerful cluster communication system that provides reliable messaging and membership services for high availability clusters in Linux environments. As the foundation for many clustering solutions, including Pacemaker, Corosync ensures that cluster nodes can communicate effectively and maintain consistent cluster membership information.
This comprehensive guide will walk you through the complete process of configuring a Corosync cluster from scratch. You'll learn how to install, configure, and manage Corosync clusters, understand the key concepts behind cluster communication, and implement best practices for production environments. Whether you're building your first high availability cluster or optimizing an existing setup, this article provides the detailed knowledge you need.
By the end of this guide, you'll have a fully functional Corosync cluster running on multiple Linux nodes, complete with proper authentication, network configuration, and monitoring capabilities.
Prerequisites and Requirements
Before beginning the Corosync cluster configuration, ensure you meet the following requirements:
System Requirements
- Operating System: CentOS/RHEL 7+, Ubuntu 18.04+, or SUSE Linux Enterprise Server 12+
- Memory: Minimum 2GB RAM per node (4GB+ recommended for production)
- CPU: Dual-core processor minimum (quad-core recommended)
- Storage: At least 20GB available disk space
- Network: Dedicated network interfaces for cluster communication (recommended)
Network Prerequisites
- Multiple Network Paths: At least two network interfaces per node for redundancy
- Low Latency: Network latency should be less than 2ms between nodes
- Bandwidth: Minimum 100Mbps network connection
- Firewall Configuration: Proper firewall rules for Corosync communication
- Time Synchronization: NTP configured and synchronized across all nodes
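A quick way to verify the latency and time-synchronization requirements above (this assumes chronyd; on systems still running ntpd, use ntpq -p instead):
```bash
# Round-trip latency between cluster nodes should stay well under 2ms
ping -c 10 node2 | tail -1

# Confirm the clock is synchronized
timedatectl | grep -i synchronized
chronyc tracking
```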
Software Dependencies
- Root Access: Administrative privileges on all cluster nodes
- Package Manager: yum, apt, or zypper depending on your distribution
- Text Editor: vi, nano, or your preferred editor for configuration files
Planning Considerations
- Node Count: Odd number of nodes (3, 5, 7) for proper quorum calculation
- IP Addressing: Dedicated IP ranges for cluster communication
- Naming Convention: Consistent hostname and FQDN configuration
- Security: Authentication keys and encryption requirements
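The quorum rule behind the odd-node recommendation is simple: a partition stays quorate only if it holds more than half of the expected votes, i.e. floor(n/2) + 1. For example:
```
3 nodes -> quorum = 2 (tolerates 1 node failure)
4 nodes -> quorum = 3 (still tolerates only 1 failure, so the 4th node adds little)
5 nodes -> quorum = 3 (tolerates 2 node failures)
```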
Step-by-Step Corosync Installation
Step 1: Prepare the Environment
First, update your system packages and configure the basic environment on all cluster nodes:
```bash
# For CentOS/RHEL systems
sudo yum update -y
sudo yum install -y epel-release

# For Ubuntu systems
sudo apt update && sudo apt upgrade -y

# For SUSE systems
sudo zypper update -y
```
Configure hostnames and ensure proper DNS resolution:
```bash
# Set hostname on each node (replace node1 with the appropriate name)
sudo hostnamectl set-hostname node1.cluster.local

# Update /etc/hosts on all nodes (tee is used because a plain redirect
# would run without root privileges)
sudo tee -a /etc/hosts << EOF
192.168.100.10 node1.cluster.local node1
192.168.100.11 node2.cluster.local node2
192.168.100.12 node3.cluster.local node3
EOF
```
Step 2: Install Corosync Packages
Install the necessary Corosync packages on all cluster nodes:
```bash
# For CentOS/RHEL 7/8
sudo yum install -y corosync pacemaker pcs fence-agents-all

# For Ubuntu
sudo apt install -y corosync pacemaker crmsh fence-agents

# For SUSE
sudo zypper install -y corosync pacemaker crmsh fence-agents
```
Verify the installation:
```bash
corosync -v
pacemakerd --version
```
Step 3: Configure Firewall Rules
Configure firewall rules to allow Corosync communication:
```bash
# For firewalld (CentOS/RHEL)
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --permanent --add-port=5404-5405/udp
sudo firewall-cmd --permanent --add-port=2224/tcp
sudo firewall-cmd --reload

# For UFW (Ubuntu)
sudo ufw allow 5404:5405/udp
sudo ufw allow 2224/tcp
sudo ufw allow from 192.168.100.0/24

# For SuSEfirewall2 (SUSE)
sudo SuSEfirewall2 open EXT TCP 2224
sudo SuSEfirewall2 open EXT UDP 5404:5405
```
Step 4: Generate Authentication Key
Create a shared authentication key for secure cluster communication. Run this command on one node only:
```bash
sudo corosync-keygen
```
This can take several minutes on older releases because the key is read from /dev/random and needs entropy; recent versions of corosync-keygen can use /dev/urandom instead (the -l option) and finish almost immediately. You can also speed it up by generating system activity:
```bash
# In another terminal, generate entropy
find /usr -type f -exec md5sum {} \; > /dev/null 2>&1 &
```
Copy the generated key to all other nodes:
```bash
sudo scp /etc/corosync/authkey root@node2:/etc/corosync/
sudo scp /etc/corosync/authkey root@node3:/etc/corosync/
```
Set proper permissions on all nodes:
```bash
sudo chown root:root /etc/corosync/authkey
sudo chmod 400 /etc/corosync/authkey
```
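To confirm that every node ended up with an identical key, compare checksums (the node names follow the convention used earlier in this guide):
```bash
# The checksum must match on all nodes
sudo sha256sum /etc/corosync/authkey
ssh root@node2 "sha256sum /etc/corosync/authkey"
ssh root@node3 "sha256sum /etc/corosync/authkey"
```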
Detailed Corosync Configuration
Step 5: Create Main Configuration File
Create the primary Corosync configuration file `/etc/corosync/corosync.conf` on all nodes:
```bash
sudo tee /etc/corosync/corosync.conf << 'EOF'
totem {
    version: 2
    cluster_name: production-cluster
    transport: knet

    # Crypto configuration
    crypto_cipher: aes256
    crypto_hash: sha256

    # Interface configuration
    # Note: bindnetaddr/mcastaddr/mcastport apply only to the legacy udp
    # multicast transport; with knet the per-link addresses come from the
    # nodelist below and these settings are ignored.
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        mcastaddr: 239.255.100.1
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.101.0
        mcastaddr: 239.255.101.1
        mcastport: 5407
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 3
    two_node: 0
}

nodelist {
    node {
        ring0_addr: 192.168.100.10
        ring1_addr: 192.168.101.10
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.100.11
        ring1_addr: 192.168.101.11
        name: node2
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.100.12
        ring1_addr: 192.168.101.12
        name: node3
        nodeid: 3
    }
}
EOF
```
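The same corosync.conf must be present on every node, so after creating it on the first node, copy it to the others (and verify it the same way as the authkey if you wish):
```bash
sudo scp /etc/corosync/corosync.conf root@node2:/etc/corosync/
sudo scp /etc/corosync/corosync.conf root@node3:/etc/corosync/
```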
Step 6: Configure Advanced Options
Create additional configuration files for enhanced functionality:
Service Configuration (`/etc/corosync/service.d/pcmk`): this plugin stanza is only needed on legacy corosync 1.x setups; with corosync 2.x and later, Pacemaker runs as its own systemd service and the file can be omitted:
```bash
sudo mkdir -p /etc/corosync/service.d
sudo tee /etc/corosync/service.d/pcmk << 'EOF'
service {
    name: pacemaker
    ver: 1
}
EOF
```
Corosync Service Configuration (`/etc/sysconfig/corosync` for RHEL/CentOS):
```bash
sudo tee /etc/sysconfig/corosync << 'EOF'
COROSYNC_INIT_TIMEOUT=60
COROSYNC_OPTIONS=""
EOF
```
Step 7: Configure Log Rotation
Set up proper log rotation to prevent disk space issues:
```bash
sudo tee /etc/logrotate.d/corosync << 'EOF'
/var/log/corosync/corosync.log {
    daily
    rotate 7
    missingok
    compress
    notifempty
    create 0600 root root
    postrotate
        /bin/kill -HUP `pidof corosync` 2>/dev/null || true
    endscript
}
EOF
```
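You can dry-run the new rotation policy before relying on it; -d makes logrotate report what it would do without touching any files:
```bash
sudo logrotate -d /etc/logrotate.d/corosync
```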
Starting and Testing the Cluster
Step 8: Enable and Start Services
Enable and start Corosync services on all nodes:
```bash
# Enable services to start at boot
sudo systemctl enable corosync
sudo systemctl enable pacemaker

# Start Corosync first
sudo systemctl start corosync

# Verify Corosync is running
sudo systemctl status corosync
```
Wait a few seconds, then start Pacemaker:
```bash
sudo systemctl start pacemaker
sudo systemctl status pacemaker
```
Step 9: Verify Cluster Status
Check cluster membership and communication:
```bash
# Check cluster membership
sudo corosync-cmapctl | grep members

# Check ring status and cluster communication
sudo corosync-cfgtool -s

# Check quorum status
sudo corosync-quorumtool -s

# Check Pacemaker cluster status
sudo pcs status
```
Expected `corosync-cfgtool -s` output on a healthy node with both rings active:
```
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.100.10
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.101.10
        status  = ring 1 active with no faults
```
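The quorum view should agree across all nodes. On a quorate three-node cluster, `corosync-quorumtool -s` reports something along these lines (field layout varies slightly between Corosync versions):
```
Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate
```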
Practical Configuration Examples
Example 1: Two-Node Cluster Configuration
For environments requiring only two nodes, modify the configuration:
```bash
# In corosync.conf, change the quorum section
# (two_node: 1 automatically enables wait_for_all, so both nodes must be up
# the first time the cluster starts)
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}

# Remove the third node from the nodelist
nodelist {
    node {
        ring0_addr: 192.168.100.10
        ring1_addr: 192.168.101.10
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.100.11
        ring1_addr: 192.168.101.11
        name: node2
        nodeid: 2
    }
}
```
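As an alternative to two_node mode, a separate quorum device (qdevice/qnetd) can supply the tie-breaking third vote. A minimal sketch using pcs, assuming a third host named qdevice-host that is not part of the cluster and is already authenticated to pcsd:
```bash
# On the quorum-device host
sudo yum install -y pcs corosync-qnetd
sudo pcs qdevice setup model net --enable --start

# On one cluster node (corosync-qdevice must be installed on all cluster nodes)
sudo yum install -y corosync-qdevice
sudo pcs quorum device add model net host=qdevice-host algorithm=ffsplit
```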
Example 2: Unicast Configuration
For environments where multicast is not available:
```bash
# Modify the totem section in corosync.conf.
# With the knet transport (corosync 3.x) all cluster traffic is already
# unicast and node addresses come from the nodelist, so no multicast or
# interface settings are required. On corosync 2.x, use "transport: udpu".
totem {
    version: 2
    cluster_name: unicast-cluster
    transport: knet
}

# The nodelist is mandatory for unicast operation
nodelist {
    node {
        ring0_addr: 192.168.100.10
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.100.11
        name: node2
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.100.12
        name: node3
        nodeid: 3
    }
}
```
Example 3: Cloud Environment Configuration
For cloud deployments (AWS, Azure, GCP):
```bash
# Cloud-optimized configuration: most cloud networks do not allow multicast
# or broadcast, so knet with a nodelist is the appropriate transport.
totem {
    version: 2
    cluster_name: cloud-cluster
    transport: knet

    # Relax timeouts to tolerate higher and more variable cloud latency
    token: 10000
    token_retransmits_before_loss_const: 6
    join: 1000
    consensus: 12000
    max_messages: 20
}

# Relaxed quorum settings for cloud maintenance windows (use with care:
# last_man_standing allows the cluster to keep running below normal quorum)
quorum {
    provider: corosync_votequorum
    expected_votes: 3
    wait_for_all: 0
    last_man_standing: 1
    last_man_standing_window: 10000
}
```
Advanced Configuration Options
Performance Tuning
Optimize Corosync for high-performance environments:
```bash
# Performance tuning additions to the totem section
totem {
    # ... existing configuration ...

    # Timeout and messaging tuning
    token: 3000
    token_retransmits_before_loss_const: 10
    join: 100
    consensus: 3600
    max_messages: 20

    # Network settings
    netmtu: 1500
    threads: 4
    send_join: 0

    # Redundant-ring handling (rrp_mode applies to the legacy udp/udpu
    # transports; with knet use link_mode instead)
    rrp_mode: passive
}
```
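After restarting Corosync with new values, you can confirm what the running cluster is actually using; the runtime copies of these settings are exposed through the CMAP database:
```bash
sudo corosync-cmapctl | grep -E "totem\.(token|consensus|max_messages)"
```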
Security Enhancements
Implement additional security measures:
```bash
# Enhanced crypto configuration
totem {
    # ... existing configuration ...

    # Strong encryption
    crypto_cipher: aes256
    crypto_hash: sha512

    # Shared key location; rotate the key periodically (see the sketch below)
    keyfile: /etc/corosync/authkey
}

# Legacy explicit membership list (udpu transport only): limits ring
# communication to the listed peers; on corosync 2.x and later the nodelist
# serves this purpose
totem {
    # ... existing configuration ...
    clear_node_high_bit: yes
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        member {
            memberaddr: 192.168.100.10
        }
        member {
            memberaddr: 192.168.100.11
        }
        member {
            memberaddr: 192.168.100.12
        }
    }
}
```
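Key rotation requires every node to agree on the key, so it is best done in a maintenance window with cluster services stopped (for example with pcs cluster stop --all). A minimal sketch, assuming the node names used earlier in this guide:
```bash
# With cluster services stopped everywhere, generate a new key on node1
sudo corosync-keygen

# Distribute the new key and restore its permissions
for n in node2 node3; do
    sudo scp /etc/corosync/authkey root@$n:/etc/corosync/
    ssh root@$n "chown root:root /etc/corosync/authkey && chmod 400 /etc/corosync/authkey"
done

# Start the cluster again on all nodes
sudo systemctl start corosync && sudo systemctl start pacemaker
```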
Common Issues and Troubleshooting
Issue 1: Cluster Nodes Not Joining
Symptoms: Nodes appear offline or don't join the cluster
Diagnosis:
```bash
# Check network connectivity and ring status
sudo corosync-cfgtool -s

# Verify multicast connectivity (udp multicast transport only; requires omping)
sudo omping -c 10 -i 0.1 -m 239.255.100.1 192.168.100.10 192.168.100.11

# Check firewall rules
sudo iptables -L -n | grep 5405
```
Solutions:
1. Verify network configuration and connectivity
2. Check firewall rules on all nodes
3. Ensure authentication keys match across nodes
4. Verify time synchronization with NTP
Issue 2: Split-Brain Scenarios
Symptoms: Multiple cluster instances running simultaneously
Diagnosis:
```bash
# Check quorum status
sudo corosync-quorumtool -s

# Check cluster partition information
sudo crm_mon -1
```
Solutions:
1. Implement proper quorum configuration
2. Use fencing/STONITH devices
3. Configure proper network redundancy
4. Implement quorum devices for even-node clusters
Issue 3: High CPU Usage
Symptoms: Corosync consuming excessive CPU resources
Diagnosis:
```bash
# Monitor the Corosync process
top -p $(pidof corosync)

# Check ring status for faults
sudo corosync-cfgtool -s
```
Solutions:
1. Adjust token timeout values
2. Reduce message frequency
3. Optimize network configuration
4. Enable threading in configuration
Issue 4: Log File Issues
Symptoms: Missing logs or excessive log growth
Diagnosis:
```bash
# Check the logging configuration
sudo grep -A5 logging /etc/corosync/corosync.conf

# Verify log file permissions
ls -la /var/log/corosync/
```
Solutions:
1. Configure proper log rotation
2. Adjust logging levels
3. Verify directory permissions
4. Monitor disk space usage
Monitoring and Maintenance
Cluster Health Monitoring
Implement comprehensive monitoring:
```bash
#!/bin/bash
# Cluster health check script
echo "=== Cluster Status Check ==="
echo "Date: $(date)"
echo

echo "--- Corosync Status ---"
systemctl status corosync --no-pager

echo "--- Cluster Membership ---"
corosync-cmapctl | grep members

echo "--- Ring Status ---"
corosync-cfgtool -s

echo "--- Quorum Status ---"
corosync-quorumtool -s

echo "--- Pacemaker Status ---"
pcs status
```
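To run the check automatically, save the script to a path of your choosing (for illustration, /usr/local/bin/cluster-health.sh), make it executable, and schedule it:
```bash
sudo chmod +x /usr/local/bin/cluster-health.sh

# Run every 15 minutes and append the output to a log file
echo "*/15 * * * * root /usr/local/bin/cluster-health.sh >> /var/log/cluster-health.log 2>&1" | sudo tee /etc/cron.d/cluster-health
```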
Log Analysis
Monitor cluster communication:
```bash
# Real-time log monitoring
sudo tail -f /var/log/corosync/corosync.log

# Search for specific issues
sudo grep -i "error\|warning\|failed" /var/log/corosync/corosync.log

# Analyze cluster membership transitions
sudo grep "membership" /var/log/corosync/corosync.log
```
Performance Monitoring
Track cluster performance metrics:
```bash
# Monitor network interface statistics (adjust the interface name as needed)
sudo netstat -i | grep eth

# Check Corosync memory and CPU usage
ps aux | grep corosync

# Inspect runtime statistics exposed through the CMAP database
sudo corosync-cmapctl | grep runtime
```
Best Practices and Tips
Network Configuration Best Practices
1. Use Dedicated Networks: Implement separate networks for cluster communication
2. Multiple Paths: Configure redundant network paths using ring0 and ring1 (additional knet links)
3. Low Latency: Ensure network latency remains below 2ms
4. Bandwidth Planning: Allocate sufficient bandwidth for cluster traffic
Security Best Practices
1. Regular Key Rotation: Update authentication keys periodically
2. Network Segmentation: Isolate cluster traffic from other network traffic
3. Firewall Configuration: Implement restrictive firewall rules
4. Access Control: Limit administrative access to cluster nodes
Operational Best Practices
1. Documentation: Maintain detailed configuration documentation
2. Change Management: Implement controlled change processes
3. Backup Strategy: Regular backup of cluster configurations
4. Testing: Regular disaster recovery testing
Performance Optimization
1. Timeout Tuning: Adjust timeouts based on network characteristics
2. Message Optimization: Configure appropriate message limits
3. Threading: Enable threading for high-load environments
4. Hardware Selection: Use appropriate hardware for cluster requirements
Advanced Topics
Integration with Storage Clusters
Configure Corosync for storage clustering:
```bash
# Storage-optimized additions to the totem section
totem {
    # ... base configuration ...

    # Storage cluster optimizations
    token: 5000
    max_messages: 50

    # Faster failure detection
    fail_recv_const: 2500
    seqno_unchanged_const: 30
}
```
Container Environment Setup
Deploy Corosync in containerized environments:
```bash
# Docker container considerations

# Mount the required directories
-v /etc/corosync:/etc/corosync:ro
-v /var/log/corosync:/var/log/corosync
-v /dev/shm:/dev/shm

# Required capabilities
--cap-add=NET_ADMIN
--cap-add=SYS_ADMIN

# Network configuration (cluster traffic needs the host network stack)
--network=host
```
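Put together, a container launch might look like the following sketch; corosync-image is a placeholder for whatever image you build or pull:
```bash
docker run -d --name corosync \
    --network=host \
    --cap-add=NET_ADMIN --cap-add=SYS_ADMIN \
    -v /etc/corosync:/etc/corosync:ro \
    -v /var/log/corosync:/var/log/corosync \
    -v /dev/shm:/dev/shm \
    corosync-image
```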
Automation and Orchestration
Automate cluster deployment:
```bash
#!/bin/bash
# Cluster deployment automation script
NODES=("node1" "node2" "node3")
CLUSTER_NAME="production-cluster"

for node in "${NODES[@]}"; do
    echo "Configuring $node..."
    ssh $node "systemctl enable corosync pacemaker"
    scp /etc/corosync/corosync.conf $node:/etc/corosync/
    scp /etc/corosync/authkey $node:/etc/corosync/
done

echo "Starting cluster services..."
for node in "${NODES[@]}"; do
    ssh $node "systemctl start corosync && sleep 5 && systemctl start pacemaker"
done
```
Conclusion
Configuring a Corosync cluster requires careful planning, attention to detail, and understanding of clustering concepts. This comprehensive guide has covered the essential aspects of Corosync cluster configuration, from basic installation to advanced optimization techniques.
Key takeaways from this guide include:
- Proper Planning: Successful cluster deployment starts with thorough planning of network architecture, node configuration, and security requirements
- Network Redundancy: Implementing multiple network paths improves cluster resilience and reduces the risk of split-brain scenarios
- Security Configuration: Proper authentication, encryption, and access control are critical for production deployments
- Monitoring and Maintenance: Regular monitoring and proactive maintenance ensure optimal cluster performance
- Documentation: Maintaining detailed documentation facilitates troubleshooting and future modifications
Next Steps
After successfully configuring your Corosync cluster, consider these next steps:
1. Resource Configuration: Configure cluster resources using Pacemaker
2. Fencing Setup: Implement STONITH devices for split-brain protection
3. Application Integration: Integrate your applications with the cluster
4. Monitoring Setup: Implement comprehensive monitoring solutions
5. Disaster Recovery: Develop and test disaster recovery procedures
Additional Resources
- Official Corosync documentation and user guides
- Pacemaker configuration and resource management
- High availability clustering best practices
- Community forums and support channels
By following this guide and implementing the recommended best practices, you'll have a robust, secure, and well-configured Corosync cluster ready for production use. Remember to regularly review and update your configuration as your environment evolves and new features become available.