# How to Configure Galera Cluster on Linux

## Introduction

Galera Cluster is a powerful synchronous multi-master database clustering solution that provides high availability, data consistency, and automatic failover for MySQL and MariaDB databases. Unlike traditional master-slave replication, Galera enables true multi-master replication: all nodes can accept both read and write operations simultaneously, while data remains consistent across all cluster members.

In this guide, you will learn how to configure a robust Galera Cluster on Linux systems, understand the underlying concepts, implement best practices, and troubleshoot common issues. Whether you're a database administrator seeking high availability or a developer working with distributed applications, this tutorial provides the knowledge needed to deploy and manage Galera Cluster in production environments.

## What You'll Learn

By the end of this article, you will understand:

- Core concepts and architecture of Galera Cluster
- Prerequisites and system requirements
- Step-by-step installation and configuration
- Network and firewall configuration
- Cluster initialization and node-joining procedures
- Monitoring and maintenance techniques
- Troubleshooting common issues and performance optimization

## Prerequisites and Requirements

### System Requirements

Before configuring Galera Cluster, ensure your environment meets the following requirements.

**Hardware Requirements:**

- Minimum 3 nodes (an odd number is recommended for split-brain prevention)
- At least 2 GB RAM per node (4 GB+ recommended for production)
- Sufficient disk space with good I/O performance (SSD recommended)
- Reliable network connectivity between nodes with low latency

**Software Requirements:**

- A Linux distribution (Ubuntu 18.04+, CentOS 7+, RHEL 7+, or Debian 9+)
- MySQL 5.7+ or MariaDB 10.1+ with Galera support
- Root or sudo access on all nodes
- Network Time Protocol (NTP) for time synchronization
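The odd-node recommendation above comes down to majority arithmetic: after a network partition, a group of nodes keeps quorum only while it holds a strict majority of the cluster. A minimal sketch of that arithmetic (the helper name is ours, and this simplified model assumes equal-weight nodes, whereas Galera also supports weighted quorum):

```shell
#!/bin/sh
# Majority size and tolerated node failures for an n-node cluster.
# Simplified model: every node has equal weight.
quorum_info() {
    n=$1
    majority=$(( n / 2 + 1 ))   # strict majority of n nodes
    echo "$n nodes: majority=$majority, tolerated failures=$(( n - majority ))"
}

for n in 2 3 4 5; do
    quorum_info "$n"
done
# 3 nodes: majority=2, tolerated failures=1
# 4 nodes: majority=3, tolerated failures=1
```

Note that a 4-node cluster tolerates no more failures than a 3-node one, which is why odd sizes are the usual choice.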
**Network Requirements:**

- Static IP addresses for all cluster nodes
- Open ports: 3306 (MySQL), 4567 (Galera replication), 4568 (IST), 4444 (SST)
- Low network latency between nodes (preferably < 5 ms)
- Sufficient bandwidth for data synchronization

### Planning Your Cluster

Before implementation, plan your cluster architecture:

```
Node 1 (Primary):   192.168.1.10
Node 2 (Secondary): 192.168.1.11
Node 3 (Secondary): 192.168.1.12
```

Choose descriptive hostnames and ensure DNS resolution or proper `/etc/hosts` entries exist.

## Step-by-Step Installation and Configuration

### Step 1: Prepare the Environment

#### Configure Hostnames and DNS Resolution

On each node, set up proper hostname resolution:

```bash
# Edit /etc/hosts on all nodes
sudo nano /etc/hosts

# Add entries for all cluster nodes
192.168.1.10 galera-node1
192.168.1.11 galera-node2
192.168.1.12 galera-node3
```

#### Synchronize System Time

Install and configure NTP to ensure time synchronization:

```bash
# Ubuntu/Debian
sudo apt update
sudo apt install ntp

# CentOS/RHEL
sudo yum install ntp

# Start and enable the NTP service (the unit is named ntpd on CentOS/RHEL)
sudo systemctl start ntp
sudo systemctl enable ntp

# Verify time synchronization
ntpq -p
```

#### Configure Firewall Rules

Open the necessary ports for Galera communication:

```bash
# Ubuntu/Debian (UFW)
sudo ufw allow 3306
sudo ufw allow 4567
sudo ufw allow 4568
sudo ufw allow 4444

# CentOS/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=3306/tcp
sudo firewall-cmd --permanent --add-port=4567/tcp
sudo firewall-cmd --permanent --add-port=4568/tcp
sudo firewall-cmd --permanent --add-port=4444/tcp
sudo firewall-cmd --reload
```

### Step 2: Install MariaDB with Galera Support

#### Ubuntu/Debian Installation

```bash
# Update the package repository
sudo apt update

# Install MariaDB server and Galera packages
sudo apt install mariadb-server mariadb-client galera-4

# Stop MariaDB (we'll configure it before starting)
sudo systemctl stop mariadb
```

#### CentOS/RHEL Installation

```bash
# Install MariaDB and Galera from the distribution repositories
sudo yum install mariadb-server mariadb galera

# Alternative: add the official MariaDB repository
curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
sudo yum install MariaDB-server MariaDB-client galera-4

# Stop the MariaDB service
sudo systemctl stop mariadb
```

### Step 3: Configure Galera Cluster Settings

#### Create the Galera Configuration File

Create or modify the Galera configuration file on each node (on CentOS/RHEL the equivalent file lives under `/etc/my.cnf.d/`):

```bash
sudo nano /etc/mysql/mariadb.conf.d/60-galera.cnf
```

Configuration for Node 1 (Primary):

```ini
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"
wsrep_node_address="192.168.1.10"
wsrep_node_name="galera-node1"
# Provider path varies by distribution (e.g. /usr/lib64/galera-4/ on CentOS)
wsrep_provider="/usr/lib/galera/libgalera_smm.so"

# Replication settings
wsrep_sst_method=rsync
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2

# Performance optimizations
wsrep_slave_threads=4
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1

# Network settings (optional TLS for replication traffic)
wsrep_provider_options="socket.ssl_key=/etc/mysql/server-key.pem;socket.ssl_cert=/etc/mysql/server-cert.pem;socket.ssl_ca=/etc/mysql/ca-cert.pem"

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

[mysqld]
bind-address=0.0.0.0
```

Configuration for Node 2:

```ini
[galera]
wsrep_on=ON
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"
wsrep_node_address="192.168.1.11"
wsrep_node_name="galera-node2"
wsrep_provider="/usr/lib/galera/libgalera_smm.so"

# Copy the remaining settings from the Node 1 configuration
# (same as Node 1; only node_address and node_name change)
```

Configuration for Node 3:

```ini
[galera]
wsrep_on=ON
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://192.168.1.10,192.168.1.11,192.168.1.12"
wsrep_node_address="192.168.1.12"
wsrep_node_name="galera-node3"
wsrep_provider="/usr/lib/galera/libgalera_smm.so"

# Copy the remaining settings from the Node 1 configuration
# (same as Node 1; only node_address and node_name change)
```

### Step 4: Initialize the Cluster

#### Bootstrap the First Node

Initialize the cluster by starting the first node in bootstrap mode:

```bash
# On Node 1 only
sudo galera_new_cluster

# Verify the service is running
sudo systemctl status mariadb

# Check cluster status
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%';"
```

#### Secure the Installation

Run the security script on the first node:

```bash
sudo mysql_secure_installation
```

Follow the prompts to:

- Set the root password
- Remove anonymous users
- Disable remote root login
- Remove the test database
- Reload privilege tables

#### Create a Cluster User

Create a dedicated user for cluster operations:

```sql
mysql -u root -p

CREATE USER 'cluster_user'@'%' IDENTIFIED BY 'secure_password';
GRANT ALL PRIVILEGES ON *.* TO 'cluster_user'@'%';
FLUSH PRIVILEGES;
EXIT;
```

### Step 5: Join Additional Nodes

#### Start Node 2

```bash
# On Node 2
sudo systemctl start mariadb
sudo systemctl enable mariadb

# Verify cluster membership
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
```

#### Start Node 3

```bash
# On Node 3
sudo systemctl start mariadb
sudo systemctl enable mariadb

# Verify all nodes are connected
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%';" | grep -E "(cluster_size|local_state_comment|ready)"
```

### Step 6: Verify Cluster Operation

#### Test Data Synchronization

Create a test database on Node 1:

```sql
-- On Node 1
mysql -u root -p

CREATE DATABASE test_replication;
USE test_replication;
CREATE TABLE test_table (
    id INT AUTO_INCREMENT PRIMARY KEY,
    data VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO test_table (data) VALUES ('Test from Node 1');
```

Verify replication on the other nodes:

```sql
-- On Node 2 and Node 3
mysql -u root -p

USE test_replication;
SELECT * FROM test_table;
```

#### Monitor Cluster Status

Check the important cluster metrics:

```sql
-- Cluster size (should be 3)
SHOW STATUS LIKE 'wsrep_cluster_size';

-- Local state (should be 'Synced')
SHOW STATUS LIKE 'wsrep_local_state_comment';

-- Cluster status (should be 'Primary')
SHOW STATUS LIKE 'wsrep_cluster_status';

-- Ready status (should be 'ON')
SHOW STATUS LIKE 'wsrep_ready';
```

## Advanced Configuration Options

### SSL/TLS Encryption

Enable encryption for inter-node communication:

```ini
# Add to the galera configuration
wsrep_provider_options="socket.ssl=yes;socket.ssl_cert=/etc/mysql/server-cert.pem;socket.ssl_key=/etc/mysql/server-key.pem;socket.ssl_ca=/etc/mysql/ca-cert.pem"
```

### State Snapshot Transfer (SST) Methods

Configure the SST method based on your requirements:

```ini
# For faster transfers with large datasets
wsrep_sst_method=mariabackup
wsrep_sst_auth="backup_user:backup_password"

# Simple default; note that plain rsync transfers are not encrypted
wsrep_sst_method=rsync
```

For encrypted transfers, use `mariabackup` and configure the encryption options in the `[sst]` section of the server configuration.

### Performance Tuning

Optimize cluster performance:

```ini
# Increase parallel applier threads
wsrep_slave_threads=8

# Adjust flow control and write-set caching. Note: combine all provider
# options into a single wsrep_provider_options line, since a second
# occurrence overrides the first.
wsrep_provider_options="gcs.fc_limit=128;gcs.fc_factor=0.8;gcache.size=1GB"
```

## Monitoring and Maintenance

### Essential Monitoring Queries

Create monitoring queries to track cluster health:

```sql
-- Comprehensive cluster status
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
    'wsrep_cluster_size',
    'wsrep_local_state_comment',
    'wsrep_cluster_status',
    'wsrep_ready',
    'wsrep_connected',
    'wsrep_local_recv_queue',
    'wsrep_local_send_queue'
);
```

### Automated Health Checks

Create a monitoring script:

```bash
#!/bin/bash
# galera_health_check.sh

MYSQL_USER="monitor_user"
MYSQL_PASS="monitor_password"

# Check that MariaDB is running
if ! systemctl is-active --quiet mariadb; then
    echo "ERROR: MariaDB service is not running"
    exit 1
fi

# Check cluster size
CLUSTER_SIZE=$(mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -s -N \
    -e "SHOW STATUS LIKE 'wsrep_cluster_size';" | awk '{print $2}')
if [ "$CLUSTER_SIZE" -lt 3 ]; then
    echo "WARNING: Cluster size is $CLUSTER_SIZE (expected 3)"
fi

# Check node state
NODE_STATE=$(mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -s -N \
    -e "SHOW STATUS LIKE 'wsrep_local_state_comment';" | awk '{print $2}')
if [ "$NODE_STATE" != "Synced" ]; then
    echo "ERROR: Node state is $NODE_STATE (expected Synced)"
    exit 1
fi

echo "Cluster health check passed"
```

## Troubleshooting Common Issues

### Split-Brain Scenarios

**Symptoms:** The cluster splits into multiple components and nodes can't communicate.

**Solution:**

```bash
# Identify the primary component
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status';"

# If no primary component exists, bootstrap from the most up-to-date node
sudo systemctl stop mariadb
sudo galera_new_cluster

# Restart the other nodes normally
sudo systemctl start mariadb
```

### Node Synchronization Issues

**Symptoms:** A node is stuck in the "Joining" or "Donor/Desynced" state.
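While a node is stuck, re-reading the full `SHOW STATUS LIKE 'wsrep%'` output on every poll is tedious; a small filter that condenses it to one line helps when watching a joiner. A sketch, assuming the standard `mysql` client's tab-separated `-N` (skip column names) output is piped in; the function name is ours:

```shell
#!/bin/sh
# Condense wsrep status output ("name<TAB>value" lines) into one summary line.
wsrep_summary() {
    awk -F'\t' '
        $1 == "wsrep_cluster_size"        { size  = $2 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        $1 == "wsrep_ready"               { ready = $2 }
        END { printf "size=%s state=%s ready=%s\n", size, state, ready }
    '
}

# Live use would be:  mysql -N -e "SHOW STATUS LIKE 'wsrep%'" | wsrep_summary
# Canned example:
printf 'wsrep_cluster_size\t3\nwsrep_local_state_comment\tSynced\nwsrep_ready\tON\n' | wsrep_summary
# -> size=3 state=Synced ready=ON
```

Run it in a loop (e.g. under `watch -n 2`) on the joining node; a state other than `Synced` that persists points at the diagnosis steps that follow.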
**Diagnosis:**

```sql
-- Check the local state
SHOW STATUS LIKE 'wsrep_local_state_comment';

-- Check the receive queue
SHOW STATUS LIKE 'wsrep_local_recv_queue';

-- Check for conflicts
SHOW STATUS LIKE 'wsrep_local_cert_failures';
```

**Solutions:**

```bash
# For persistent synchronization issues
sudo systemctl stop mariadb

# Remove grastate.dat to force a full SST
sudo rm /var/lib/mysql/grastate.dat

# Restart the node
sudo systemctl start mariadb
```

### Performance Issues

Common causes and solutions:

1. High queue sizes:

```sql
-- Monitor queue sizes
SHOW STATUS LIKE 'wsrep_local_recv_queue';
SHOW STATUS LIKE 'wsrep_local_send_queue';
```

2. Flow control activation:

```sql
-- Check flow control events
SHOW STATUS LIKE 'wsrep_flow_control_paused';
```

Optimization strategies:

```ini
# Increase applier threads
wsrep_slave_threads=16

# Adjust flow control and gcache in a single provider-options line
wsrep_provider_options="gcs.fc_limit=256;gcs.fc_factor=0.99;gcache.size=2GB"
```

### Network Connectivity Issues

Diagnosis commands:

```bash
# Test network connectivity
telnet 192.168.1.11 4567

# Check Galera communication
ss -tlnp | grep :4567

# Monitor network traffic
tcpdump -i eth0 port 4567
```

### Data Inconsistency

Detection:

```sql
-- Check for certification failures
SHOW STATUS LIKE 'wsrep_local_cert_failures';

-- Monitor conflicts
SHOW STATUS LIKE 'wsrep_local_bf_aborts';
```

Prevention strategies:

1. Use proper transaction isolation levels
2. Implement application-level conflict resolution
3. Design the schema to minimize conflicts
4. Use optimistic locking patterns

## Best Practices and Tips

### Security Best Practices

1. Use SSL/TLS encryption:

```ini
wsrep_provider_options="socket.ssl=yes"
```

2. Create dedicated cluster users:

```sql
CREATE USER 'galera_sst'@'localhost' IDENTIFIED BY 'secure_password';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'galera_sst'@'localhost';
```

3. Restrict network access:

```bash
# Use iptables to limit access to cluster ports
iptables -A INPUT -p tcp --dport 4567 -s 192.168.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 4567 -j DROP
```

### Performance Optimization

1. Configure an appropriate gcache size:

```ini
wsrep_provider_options="gcache.size=2GB"
```

2. Optimize InnoDB settings:

```ini
innodb_buffer_pool_size=4G
innodb_log_file_size=256M
innodb_flush_log_at_trx_commit=2
```

3. Use SSD storage for better I/O performance.

4. Monitor and tune applier threads:

```ini
wsrep_slave_threads=8  # Adjust based on CPU cores
```

### Operational Best Practices

1. Always maintain an odd number of nodes
2. Implement proper backup strategies
3. Use consistent hardware configurations
4. Monitor cluster health continuously
5. Plan maintenance windows for updates
6. Test failover scenarios regularly

### Application Design Considerations

1. Handle deadlock exceptions gracefully
2. Use smaller transactions when possible
3. Implement retry logic for failed transactions
4. Avoid long-running transactions
5. Use appropriate isolation levels

## Backup and Recovery Strategies

### Consistent Backup Methods

Using MariaBackup:

```bash
# Create a consistent cluster backup
mariabackup --backup --target-dir=/backup/full_backup \
    --user=backup_user --password=backup_password \
    --galera-info

# Prepare the backup for restoration
mariabackup --prepare --target-dir=/backup/full_backup
```

Using mysqldump:

```bash
# Dump with a consistent snapshot
mysqldump --single-transaction --routines --triggers \
    --all-databases > cluster_backup.sql
```

### Disaster Recovery Procedures

Complete cluster failure recovery:

1. Identify the most recent node:

```bash
# Check grastate.dat on all nodes
cat /var/lib/mysql/grastate.dat
```

2. Bootstrap from the most recent node:

```bash
sudo galera_new_cluster
```

3. Restore the other nodes from backup if necessary.

## Conclusion

Configuring Galera Cluster on Linux provides a robust foundation for high-availability database deployments. This guide has covered the essential aspects of Galera Cluster setup, from initial installation through advanced configuration, monitoring, and troubleshooting.

Key takeaways from this implementation:

- Proper planning is crucial for a successful Galera Cluster deployment
- Network configuration and connectivity are fundamental requirements
- Monitoring and maintenance ensure long-term cluster stability
- Understanding common issues enables quick problem resolution
- Following best practices prevents many operational challenges

### Next Steps

After successfully implementing your Galera Cluster:

1. Set up comprehensive monitoring with tools like Prometheus, Grafana, or Nagios
2. Implement automated backup solutions for data protection
3. Create runbooks for common operational procedures
4. Test disaster recovery scenarios in a controlled environment
5. Consider load-balancing solutions such as HAProxy or ProxySQL
6. Plan for capacity scaling as your application grows

### Additional Resources

- Official Galera documentation: comprehensive technical reference
- MariaDB Knowledge Base: detailed configuration examples
- Community forums: peer support and troubleshooting assistance
- Professional support: consider commercial support for production environments

By following this guide and implementing the recommended practices, you'll have a reliable, high-performance Galera Cluster that meets your application's database requirements while providing the availability and consistency your users expect. Remember that database clustering is an ongoing process of monitoring, maintenance, and optimization; stay current with Galera releases and security patches to keep your cluster secure and performing well.