How to Configure Pacemaker on Linux
Pacemaker is a powerful open-source cluster resource manager that provides high availability (HA) for Linux systems. It ensures that critical services remain operational even when individual nodes fail, making it an essential component for mission-critical environments. This comprehensive guide will walk you through the complete process of configuring Pacemaker on Linux, from initial setup to advanced resource management.
Table of Contents
1. [Introduction to Pacemaker](#introduction-to-pacemaker)
2. [Prerequisites and Requirements](#prerequisites-and-requirements)
3. [Installation Process](#installation-process)
4. [Initial Cluster Configuration](#initial-cluster-configuration)
5. [Resource Configuration](#resource-configuration)
6. [Advanced Configuration Options](#advanced-configuration-options)
7. [Monitoring and Management](#monitoring-and-management)
8. [Troubleshooting Common Issues](#troubleshooting-common-issues)
9. [Best Practices and Tips](#best-practices-and-tips)
10. [Conclusion and Next Steps](#conclusion-and-next-steps)
Introduction to Pacemaker
Pacemaker is the brain of a Linux high availability cluster, working in conjunction with Corosync (the messaging layer) to provide automatic failover capabilities. It manages cluster resources such as IP addresses, file systems, databases, and applications, ensuring they remain available even when hardware or software failures occur.
Key Components
- Pacemaker: The cluster resource manager
- Corosync: The cluster communication layer
- Cluster Resource Agents: Scripts that manage specific services
- Fencing Agents: Tools for isolating failed nodes
Benefits of Using Pacemaker
- High Availability: Automatic failover reduces downtime
- Scalability: Support for multiple nodes and complex configurations
- Flexibility: Extensive resource agent library
- Monitoring: Built-in health checking and alerting
- Standards Compliance: Follows industry best practices
Prerequisites and Requirements
Before configuring Pacemaker, ensure your environment meets the following requirements:
System Requirements
- Operating System: RHEL/CentOS 7+, Ubuntu 18.04+, SLES 12+, or Debian 9+
- Memory: Minimum 2GB RAM per node (4GB+ recommended)
- Storage: At least 20GB available disk space
- Network: Dedicated network interfaces for cluster communication
- Time Synchronization: NTP or Chrony configured on all nodes (a quick check is shown below)
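You can verify time synchronization before continuing. A quick check, assuming chrony is the time service in use (the default on most recent RHEL and Ubuntu releases):
```bash
# Confirm the system clock is synchronized on every node
timedatectl status | grep -i synchronized
# With chrony, inspect the current sources and offset
chronyc tracking
chronyc sources -v
```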
Network Configuration
```bash
# Example network configuration for a two-node cluster
# Node 1: cluster-node1 (192.168.1.10)
# Node 2: cluster-node2 (192.168.1.11)
# Virtual IP: 192.168.1.100
```
User Permissions
- Root access or sudo privileges on all cluster nodes
- SSH key-based authentication between nodes (recommended; see the example below)
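A minimal sketch for setting up key-based SSH between the nodes, assuming you manage them as root and using the example hostnames from above (repeat in both directions):
```bash
# Generate a key pair if one does not already exist
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
# Copy the public key to the other node
ssh-copy-id root@cluster-node2
# Confirm passwordless login works
ssh root@cluster-node2 hostname
```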
Firewall Configuration
Ensure the following ports are open between cluster nodes:
```bash
# Corosync communication
sudo firewall-cmd --permanent --add-port=5404-5406/udp
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --reload
```
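The firewalld commands above apply to RHEL/CentOS. On Ubuntu/Debian hosts that use ufw instead, a rough equivalent is the following sketch; the port list mirrors what the high-availability firewalld service opens, so verify it against your own setup:
```bash
# pcsd web UI and API
sudo ufw allow 2224/tcp
# Corosync cluster communication
sudo ufw allow 5404:5406/udp
# Pacemaker Remote (only needed if you use remote nodes)
sudo ufw allow 3121/tcp
# DLM (only needed for shared-storage setups such as GFS2)
sudo ufw allow 21064/tcp
```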
Installation Process
Installing Pacemaker on RHEL/CentOS
```bash
# Enable the High Availability repository if your distribution requires it
# (on RHEL, enable the HA add-on repo via subscription-manager; CentOS 7
# ships these packages in its base repositories)
# Install Pacemaker and related packages
sudo yum install -y pacemaker corosync pcs fence-agents-all
# Start and enable the pcsd service
sudo systemctl start pcsd
sudo systemctl enable pcsd
# Set a password for the hacluster user
sudo passwd hacluster
```
Installing Pacemaker on Ubuntu/Debian
```bash
# Update the package index
sudo apt update
# Install Pacemaker and Corosync
sudo apt install -y pacemaker corosync crmsh fence-agents
# Start and enable the services
sudo systemctl start corosync
sudo systemctl start pacemaker
sudo systemctl enable corosync
sudo systemctl enable pacemaker
```
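The rest of this guide uses the pcs command-line shell. pcs is packaged for Ubuntu/Debian as well, so if you want to follow the pcs-based examples below rather than crmsh, also install and prepare it (a sketch; package availability varies slightly by release):
```bash
# Install pcs and start its daemon
sudo apt install -y pcs
sudo systemctl enable --now pcsd
# Set a password for the hacluster user, as on RHEL/CentOS
sudo passwd hacluster
```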
Post-Installation Verification
```bash
# Check service status
sudo systemctl status pacemaker
sudo systemctl status corosync
# Verify the installation (crm comes from crmsh; with pcs, run "sudo pcs status" instead)
sudo crm status
```
Initial Cluster Configuration
Setting Up Authentication
First, configure authentication between cluster nodes:
```bash
# Authenticate the cluster nodes with pcs (run from the node you will administer
# the cluster on; older pcs 0.9.x releases use "pcs cluster auth" instead)
sudo pcs host auth cluster-node1 cluster-node2 -u hacluster -p your_password
# Verify authentication (the command should complete without errors)
sudo pcs host auth cluster-node1 cluster-node2
```
Creating the Cluster
```bash
# Create the cluster (run on one node only)
sudo pcs cluster setup mycluster cluster-node1 cluster-node2
# Start cluster services on all nodes
sudo pcs cluster start --all
# Enable cluster services to start on boot
sudo pcs cluster enable --all
# Check cluster status
sudo pcs status
```
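The setup command writes /etc/corosync/corosync.conf on every node. It is worth knowing roughly what it contains; for this two-node example the generated file looks approximately like the commented outline below (exact contents differ between corosync and pcs versions):
```bash
# Inspect the generated Corosync configuration
cat /etc/corosync/corosync.conf
# Expected shape (abridged):
#   totem {
#       version: 2
#       cluster_name: mycluster
#   }
#   nodelist {
#       node {
#           ring0_addr: cluster-node1
#           nodeid: 1
#       }
#       node {
#           ring0_addr: cluster-node2
#           nodeid: 2
#       }
#   }
#   quorum {
#       provider: corosync_votequorum
#       two_node: 1
#   }
```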
Basic Cluster Properties
Configure essential cluster properties:
```bash
# Disable STONITH initially (re-enable it before production use)
sudo pcs property set stonith-enabled=false
# The cluster name was already set by "pcs cluster setup"; set it here only if you need to change it
sudo pcs property set cluster-name=mycluster
# Configure the no-quorum policy for a two-node cluster
sudo pcs property set no-quorum-policy=ignore
# Set a default resource stickiness so resources stay put after a failed node recovers
sudo pcs resource defaults resource-stickiness=100
```
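To confirm the properties took effect, list what is currently configured (only explicitly set properties are shown unless you ask for the defaults as well):
```bash
# Show configured cluster properties
sudo pcs property list
# Include built-in defaults
sudo pcs property list --all
```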
Resource Configuration
Creating a Virtual IP Resource
A virtual IP address is one of the most common cluster resources:
```bash
# Create the virtual IP resource
sudo pcs resource create VirtualIP IPaddr2 ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s
# Check resource status
sudo pcs status resources
```
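A quick sanity check that the address is actually plumbed on the node currently hosting the resource (interface names will differ on your systems):
```bash
# On the node running VirtualIP, the address should appear on an interface
ip -4 addr show | grep 192.168.1.100
# From another machine on the network, the address should answer
ping -c 3 192.168.1.100
```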
Configuring Web Server Resource
Example of creating an Apache web server resource:
```bash
# Install Apache on all nodes
sudo yum install -y httpd # RHEL/CentOS
sudo apt install -y apache2 # Ubuntu/Debian
# Create the Apache resource (use configfile=/etc/apache2/apache2.conf on Ubuntu/Debian)
sudo pcs resource create WebServer apache configfile=/etc/httpd/conf/httpd.conf op monitor interval=1min
# Create a resource group so the virtual IP and web server run together
sudo pcs resource group add WebGroup VirtualIP WebServer
# Verify the configuration
sudo pcs resource show
```
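The apache resource agent monitors the web server through its status URL, so mod_status must answer locally on every node. A minimal sketch for RHEL/CentOS follows; the file name /etc/httpd/conf.d/status.conf is an arbitrary choice, and on Ubuntu/Debian the apache2 package usually ships mod_status enabled already:
```bash
# Allow local access to /server-status so the resource agent can monitor Apache
sudo tee /etc/httpd/conf.d/status.conf > /dev/null <<'EOF'
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
EOF
# Verify the status page answers locally
curl -s http://localhost/server-status | head
```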
Database Resource Configuration
Setting up a MySQL/MariaDB cluster resource:
```bash
# Create the MySQL resource
sudo pcs resource create MySQL mysql binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" datadir="/var/lib/mysql" pid="/var/lib/mysql/mysql.pid" socket="/var/lib/mysql/mysql.sock" op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=30s
# Add it to a resource group (a resource can belong to only one group, so use a
# separate virtual IP here if VirtualIP is already part of WebGroup)
sudo pcs resource group add DBGroup VirtualIP MySQL
```
File System Resources
Configuring shared file system resources:
```bash
# Create the file system resource
sudo pcs resource create SharedFS Filesystem device="/dev/sdb1" directory="/shared" fstype="ext4" op monitor interval=20s
# Set resource constraints so the file system follows the virtual IP
sudo pcs constraint colocation add SharedFS with VirtualIP INFINITY
sudo pcs constraint order VirtualIP then SharedFS
```
Advanced Configuration Options
Resource Constraints
Location Constraints
Control where resources can run:
```bash
# Prefer cluster-node1 for WebServer with a score of 50
sudo pcs constraint location WebServer prefers cluster-node1=50
# Prevent a resource from running on a specific node
sudo pcs constraint location MySQL avoids cluster-node2
```
Colocation Constraints
Ensure resources run together:
```bash
# Keep VirtualIP and WebServer on the same node
sudo pcs constraint colocation add WebServer with VirtualIP INFINITY
```
Order Constraints
Define startup/shutdown order:
```bash
# Start VirtualIP before WebServer
sudo pcs constraint order VirtualIP then WebServer
```
Resource Groups vs. Clones
Resource Groups
```bash
# Create a resource group
sudo pcs resource group add WebCluster VirtualIP WebServer SharedFS
# View the group configuration
sudo pcs resource show WebCluster
```
Clone Resources
For services that can run on multiple nodes:
```bash
# Create a clone resource for the distributed lock manager (DLM)
sudo pcs resource create DLM ocf:pacemaker:controld op monitor interval=30s on-fail=ignore clone interleave=true ordered=true
# Create a master/slave (promotable) DRBD resource
sudo pcs resource create DRBD ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s role=Master op monitor interval=59s role=Slave master master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
```
STONITH Configuration
STONITH (Shoot The Other Node In The Head) is crucial for production clusters:
```bash
# List available fence agents
sudo pcs stonith list
# Configure IPMI fencing (one fence device per node, pointed at that node's BMC)
sudo pcs stonith create fence-node1 fence_ipmilan pcmk_host_list="cluster-node1" ipaddr="192.168.1.101" login="admin" passwd="password" lanplus=true
sudo pcs stonith create fence-node2 fence_ipmilan pcmk_host_list="cluster-node2" ipaddr="192.168.1.102" login="admin" passwd="password" lanplus=true
# Enable STONITH
sudo pcs property set stonith-enabled=true
# Test fencing (warning: this really reboots or powers off the target node)
sudo pcs stonith fence cluster-node2
```
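Before relying on fencing, confirm that the fence agent can actually reach each BMC. Most fence agents can be run directly from the shell; a sketch using the same addresses and credentials as above:
```bash
# Query the power status of node1's BMC directly through the fence agent
fence_ipmilan --ip=192.168.1.101 --username=admin --password=password --lanplus --action=status
# Show the configured fence devices as the cluster sees them
sudo pcs stonith show
```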
Monitoring and Management
Cluster Status Commands
```bash
# Overall cluster status
sudo pcs status
# Detailed status
sudo pcs status --full
# Resource-specific status
sudo pcs resource show WebServer
# Node status
sudo pcs status nodes
# Constraint information
sudo pcs constraint show
```
Log Management
```bash
# View cluster logs
sudo journalctl -u pacemaker
sudo journalctl -u corosync
# Real-time log monitoring (file locations vary by distribution and logging configuration)
sudo tail -f /var/log/cluster/corosync.log
sudo tail -f /var/log/pacemaker/pacemaker.log
```
Performance Monitoring
```bash
# One-shot snapshot of cluster and resource state
sudo crm_mon -1
# Resource utilization attributes
sudo pcs resource utilization
# Corosync quorum statistics
sudo corosync-quorumtool -s
```
Troubleshooting Common Issues
Split-Brain Prevention
Split-brain occurs when cluster nodes can't communicate but continue operating:
```bash
# Show the current quorum configuration
sudo pcs quorum config
# For two-node clusters
sudo pcs property set no-quorum-policy=ignore
# Add a quorum device for better split-brain protection
sudo pcs quorum device add model net host=qnetd-server algorithm=ffsplit
```
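The quorum device requires the corosync-qnetd service on a third host (qnetd-server above is a placeholder name) and the corosync-qdevice package on the cluster nodes. Once it has been added, the quorum state can be checked with:
```bash
# Overall quorum information, including the quorum device vote
sudo pcs quorum status
# State of the quorum device connection itself
sudo pcs quorum device status
```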
Resource Failures
When resources fail to start or stop:
```bash
# Check for resource failures
sudo pcs status
# Clear resource failures
sudo pcs resource cleanup WebServer
# Force a resource to a specific node
sudo pcs resource move WebServer cluster-node1
# Remove the constraint created by the move
sudo pcs resource clear WebServer
```
Network Issues
Diagnosing cluster communication problems:
```bash
# Check Corosync quorum status
sudo corosync-quorumtool
# Check Corosync ring/link status
sudo corosync-cfgtool -s
# Verify cluster membership
sudo pcs status corosync
```
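If quorum looks wrong, the Corosync runtime database can be queried directly to see which members each node believes are present (output keys vary between Corosync versions):
```bash
# List the current members recorded in the Corosync CMAP database
sudo corosync-cmapctl | grep members
```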
Node Issues
Handling problematic cluster nodes:
```bash
# Put a node into standby mode
sudo pcs node standby cluster-node1
# Take the node out of standby
sudo pcs node unstandby cluster-node1
# Remove a failed node from the cluster
sudo pcs cluster node remove cluster-node1
```
Configuration Errors
```bash
# Validate the cluster configuration
sudo crm_verify -L -V
# Review the full configuration
sudo pcs config show
# Back up and restore the configuration (backup writes a tarball)
sudo pcs config backup mycluster-backup
sudo pcs config restore mycluster-backup.tar.bz2
```
Best Practices and Tips
Security Best Practices
1. Enable STONITH: Always configure fencing in production environments
2. Network Isolation: Use dedicated network interfaces for cluster traffic
3. Authentication: Implement strong authentication between nodes
4. Firewall Configuration: Properly configure firewalls to allow cluster traffic
```bash
# Example secure cluster configuration
sudo pcs property set stonith-enabled=true
sudo pcs property set stonith-action=poweroff
sudo pcs property set stonith-timeout=60s
```
Performance Optimization
1. Resource Stickiness: Configure appropriate stickiness values
2. Monitoring Intervals: Balance between responsiveness and system load
3. Timeout Values: Set realistic timeout values for resources
```bash
# Set sensible operation defaults and a higher default stickiness
sudo pcs resource op defaults timeout=60s
sudo pcs resource defaults resource-stickiness=1000
```
Maintenance Procedures
```bash
# Put the cluster in maintenance mode
sudo pcs property set maintenance-mode=true
# Perform maintenance tasks
# ...
# Exit maintenance mode
sudo pcs property set maintenance-mode=false
```
Backup and Recovery
```bash
# Regular configuration backup
sudo pcs config backup /backup/cluster-config-$(date +%Y%m%d)
# Export the resource configuration
sudo pcs resource config > /backup/resources.cfg
# Document the cluster topology
sudo pcs status > /backup/cluster-status-$(date +%Y%m%d).txt
```
Testing Procedures
1. Planned Failover Testing: Regularly test resource migration
2. Unplanned Failure Simulation: Test node failures and recovery
3. STONITH Testing: Verify fencing mechanisms work correctly
```bash
# Test resource migration
sudo pcs resource move WebServer cluster-node2
sudo pcs resource clear WebServer
# Simulate a node failure by putting a node into standby
sudo pcs node standby cluster-node1
# Verify the resources migrated, then bring the node back
sudo pcs node unstandby cluster-node1
```
Conclusion and Next Steps
Configuring Pacemaker on Linux provides a robust foundation for high availability clustering. This guide has covered the essential aspects of Pacemaker configuration, from basic setup to advanced resource management and troubleshooting.
Key Takeaways
- Proper Planning: Successful cluster deployment requires careful planning of network, storage, and application architecture
- Incremental Implementation: Start with basic configurations and gradually add complexity
- Testing is Critical: Regular testing ensures cluster reliability when failures occur
- Documentation: Maintain detailed documentation of cluster configuration and procedures
Next Steps
1. Advanced Features: Explore multi-site clustering and disaster recovery configurations
2. Integration: Integrate with monitoring systems like Nagios or Zabbix
3. Automation: Consider using configuration management tools like Ansible for cluster deployment
4. Training: Invest in team training for ongoing cluster management
Additional Resources
- Official Documentation: Refer to the Pacemaker project documentation for detailed technical information
- Community Support: Engage with the Pacemaker community through mailing lists and forums
- Professional Services: Consider professional support for mission-critical deployments
By following this comprehensive guide, you now have the knowledge and tools necessary to successfully configure and manage Pacemaker clusters on Linux. Remember that high availability is not just about technology—it requires ongoing attention to monitoring, maintenance, and testing to ensure your critical services remain available when your organization needs them most.
The investment in properly configured Pacemaker clustering will pay dividends in reduced downtime, improved service reliability, and enhanced business continuity. As you gain experience with Pacemaker, you'll discover additional features and optimizations that can further improve your cluster's performance and reliability.