# How to Replicate Storage with DRBD in Linux

## Introduction

Distributed Replicated Block Device (DRBD) is a Linux kernel module that builds a distributed storage system by replicating block devices between servers. Often described as "RAID 1 over the network," DRBD synchronizes data in real time across separate machines, making it a core building block for high-availability (HA) clusters and disaster recovery solutions.

In this guide, you'll learn how to implement DRBD storage replication in Linux environments, from installation and configuration through troubleshooting, so you can deploy robust, fault-tolerant storage.

DRBD operates at the block device level, intercepting write operations and transmitting them to remote nodes before confirming completion (when using the synchronous protocol). This approach preserves data consistency and enables seamless failover, which makes DRBD well suited to critical applications that demand minimal data loss and downtime.

## Prerequisites and Requirements

Before implementing DRBD storage replication, ensure your environment meets the following requirements.

### Hardware Requirements

- Two or more Linux servers with identical or similar hardware specifications
- Dedicated storage devices (physical disks, LVM volumes, or partitions) of equal or larger size on each node
- Reliable network connectivity between nodes with sufficient bandwidth for replication
- Minimum 1 GB RAM per node (2 GB or more recommended for production)

### Software Requirements

- A Linux distribution supporting DRBD (RHEL/CentOS 7+, Ubuntu 18.04+, SUSE, Debian)
- Kernel version 2.6.33 or later (DRBD has been in the mainline kernel since 2.6.33)
- Root access or sudo privileges on all participating nodes
- Network Time Protocol (NTP) configured for time synchronization

### Network Configuration

- Dedicated network interface or VLAN for DRBD replication (recommended)
- Static IP addresses configured on all nodes
- Firewall rules allowing DRBD traffic (default port 7788)
- Low-latency connection between nodes (under 100 ms for acceptable performance)

### Storage Considerations

- Identical block device sizes across all nodes
- Unused block devices (no existing filesystems or data)
- SSD storage recommended for metadata and high-performance workloads
- Backup strategy in place before beginning configuration

## Step-by-Step DRBD Installation and Configuration

### Step 1: Install DRBD Packages

Begin by installing DRBD on all participating nodes. The installation method varies by distribution.

For RHEL/CentOS/Fedora, the DRBD 9 packages are shipped by the ELRepo repository rather than the base repositories:

```bash
# Install the ELRepo release package first
# (see https://elrepo.org for the package matching your EL version; EL7 shown)
sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm -y

# Install the DRBD kernel module and utilities
sudo yum install drbd90-utils kmod-drbd90 -y

# For CentOS 8 / RHEL 8, use dnf
sudo dnf install drbd90-utils kmod-drbd90 -y
```

For Ubuntu/Debian:

```bash
# Update the package repository
sudo apt update

# Install the DRBD utilities; drbd-dkms builds the DRBD 9 module from source
# (on stock kernels the in-tree drbd module may suffice, and drbd-dkms may
# need to come from the LINBIT PPA depending on your release)
sudo apt install drbd-utils drbd-dkms -y

# Load the DRBD kernel module
sudo modprobe drbd
```

For SUSE/openSUSE:

```bash
# Install DRBD packages
sudo zypper --non-interactive install drbd drbd-utils drbd-kmp-default
```

### Step 2: Load and Verify the DRBD Kernel Module

After installation, ensure the DRBD kernel module loads correctly:

```bash
# Load the DRBD module
sudo modprobe drbd

# Verify the module is loaded
lsmod | grep drbd

# Check the DRBD userland version
sudo drbdadm --version
```

The output should show DRBD version information and confirm the module is active.
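Note that `modprobe` does not persist across reboots. On systemd-based distributions, a one-line drop-in under `/etc/modules-load.d/` makes the module load automatically at boot; a minimal sketch:

```bash
# Load the drbd module automatically at boot (systemd modules-load.d mechanism)
echo drbd | sudo tee /etc/modules-load.d/drbd.conf

# Confirm after the next reboot
lsmod | grep drbd
```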
### Step 3: Configure Network and Hostnames

Ensure proper hostname resolution between nodes by editing `/etc/hosts`:

```bash
# Edit the hosts file on both nodes
sudo nano /etc/hosts
```

Add entries for both nodes:

```
192.168.1.10    drbd-node1
192.168.1.11    drbd-node2
```

Test connectivity between nodes:

```bash
# From node1 to node2
ping drbd-node2

# From node2 to node1
ping drbd-node1
```

### Step 4: Prepare Storage Devices

Identify and prepare the block devices for DRBD replication. Ensure the devices are unmounted and contain no valuable data:

```bash
# List available block devices
lsblk

# Check whether the device is mounted (should return nothing)
mount | grep /dev/sdb1

# Verify the device is not part of any RAID or LVM setup
cat /proc/mdstat
pvdisplay
```

**Warning:** The following steps will destroy any existing data on the specified devices.

### Step 5: Create the DRBD Configuration

DRBD uses a hierarchical configuration structure: global/common settings plus resource-specific files. Create the global configuration file:

```bash
sudo nano /etc/drbd.d/global_common.conf
```

Add the following global configuration:

```bash
global {
    usage-count no;
}

common {
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    }
    startup {
        degr-wfc-timeout 60;
    }
    options {
        auto-promote yes;
    }
    disk {
        on-io-error detach;
    }
    net {
        protocol C;
        cram-hmac-alg sha1;
        shared-secret "your-shared-secret-key";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
    }
}
```

### Step 6: Create the Resource Configuration

Create a resource-specific configuration file:

```bash
sudo nano /etc/drbd.d/data.res
```

Add the resource configuration:

```bash
resource data {
    on drbd-node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.10:7788;
        meta-disk internal;
    }
    on drbd-node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.11:7788;
        meta-disk internal;
    }
}
```

Configuration parameters explained:

- `device`: virtual DRBD block device path
- `disk`: physical backing device to be replicated
- `address`: IP address and port for DRBD communication
- `meta-disk`: location of DRBD metadata (`internal` stores it on the backing device)

### Step 7: Copy the Configuration to All Nodes

Ensure identical configuration across all nodes:

```bash
# Copy configuration files to the second node
scp /etc/drbd.d/*.conf root@drbd-node2:/etc/drbd.d/
scp /etc/drbd.d/*.res root@drbd-node2:/etc/drbd.d/
```

### Step 8: Initialize DRBD Metadata

Create the DRBD metadata on all nodes:

```bash
# On both nodes, run:
sudo drbdadm create-md data
```

The output should confirm that the metadata was created. If you encounter issues, verify the backing device is not mounted and has no existing filesystem.
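If `create-md` refuses to run because it detects a leftover filesystem signature, you can clear the old signatures with `wipefs` from util-linux. A minimal sketch; this is irreversible, so triple-check the device name first:

```bash
# Show existing signatures without removing anything
sudo wipefs /dev/sdb1

# Remove all filesystem/RAID signatures (DESTRUCTIVE - verify the device!)
sudo wipefs --all /dev/sdb1

# Retry metadata creation
sudo drbdadm create-md data
```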
### Step 9: Start the DRBD Service

Enable and start DRBD on both nodes:

```bash
# Enable the DRBD service
sudo systemctl enable drbd

# Start the DRBD service
sudo systemctl start drbd

# Check the service status
sudo systemctl status drbd
```

Bring up the DRBD resource:

```bash
# On both nodes
sudo drbdadm up data

# Check DRBD status
sudo drbdadm status data
```

### Step 10: Establish Initial Synchronization

Choose one node as the primary and initiate the first synchronization:

```bash
# On the chosen primary node (drbd-node1)
sudo drbdadm primary --force data

# Monitor synchronization progress (on DRBD 9, /proc/drbd no longer shows
# per-resource detail, so use drbdadm status instead)
sudo watch -n 1 drbdadm status data
```

The synchronization may take considerable time depending on device size and network speed.

## Practical Examples and Use Cases

### Example 1: MySQL Database Replication

Configure DRBD for MySQL high availability:

```bash
# After DRBD synchronization completes, create a filesystem on the primary node
sudo mkfs.ext4 /dev/drbd0

# Create a mount point
sudo mkdir /var/lib/mysql-drbd

# Mount the DRBD device
sudo mount /dev/drbd0 /var/lib/mysql-drbd

# Point MySQL at the DRBD-backed storage
sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
```

Add to the MySQL configuration:

```ini
[mysqld]
datadir = /var/lib/mysql-drbd
```

### Example 2: Web Server Document Root

Set up DRBD for web server content synchronization:

```bash
# Create a filesystem (on the primary node only)
sudo mkfs.xfs /dev/drbd0

# Create the web root directory
sudo mkdir /var/www/html-drbd

# Mount the DRBD device
sudo mount /dev/drbd0 /var/www/html-drbd

# Configure an Apache virtual host
sudo nano /etc/apache2/sites-available/drbd-site.conf
```

Apache configuration example:

```apache
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/html-drbd
    <Directory /var/www/html-drbd>
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
```

### Example 3: Multi-Resource Configuration

Configure multiple DRBD resources for different services:

```bash
# Create an additional resource configuration
sudo nano /etc/drbd.d/web.res
```

```bash
resource web {
    on drbd-node1 {
        device    /dev/drbd1;
        disk      /dev/sdc1;
        address   192.168.1.10:7789;
        meta-disk internal;
    }
    on drbd-node2 {
        device    /dev/drbd1;
        disk      /dev/sdc1;
        address   192.168.1.11:7789;
        meta-disk internal;
    }
}
```

Initialize and manage multiple resources:

```bash
# Create metadata for the new resource
sudo drbdadm create-md web

# Bring up all resources
sudo drbdadm up all

# Check the status of all resources
sudo drbdadm status
```
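Once a resource is replicating, it is worth rehearsing a manual failover before you ever need one in anger. A minimal sketch, assuming the `data` resource from Step 6 is mounted at a hypothetical `/mnt/data`:

```bash
# --- On the current primary (drbd-node1) ---
sudo umount /dev/drbd0        # release the device
sudo drbdadm secondary data   # demote this node

# --- On the new primary (drbd-node2) ---
# With auto-promote enabled (as in the global config above), mounting alone
# promotes the node; otherwise promote explicitly first:
sudo drbdadm primary data
sudo mount /dev/drbd0 /mnt/data
```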
## Common Issues and Troubleshooting

### Issue 1: Split-Brain Scenarios

Split-brain occurs when both nodes become primary while disconnected from each other, leading to divergent data.

**Symptoms:**

- DRBD status shows the "StandAlone" state
- Log entries indicating split-brain detection
- Nodes unable to establish a connection

**Resolution:**

```bash
# On the node whose changes will be discarded
sudo drbdadm secondary data
sudo drbdadm connect --discard-my-data data

# On the node whose data survives
sudo drbdadm connect data
```

**Prevention:**

- Use proper fencing mechanisms
- Implement STONITH (Shoot The Other Node In The Head)
- Configure appropriate split-brain handlers in the global configuration

### Issue 2: Slow Synchronization Performance

**Symptoms:**

- Extremely slow initial sync or resync operations
- High network latency during synchronization

**Solutions:**

```bash
# Increase the resync rate (adjust to your network capacity)
sudo drbdadm disk-options --resync-rate=100M data

# Optimize network buffer sizes
echo 'net.core.rmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

Add performance tuning to the resource configuration:

```bash
resource data {
    disk {
        resync-rate   100M;
        c-plan-ahead  20;
        c-fill-target 10M;
    }
    net {
        sndbuf-size 1M;
        rcvbuf-size 1M;
    }
}
```

### Issue 3: Connection Problems

**Symptoms:**

- Nodes cannot establish a connection
- "Connection refused" errors in logs
- Resources stuck in the "WFConnection" state

**Troubleshooting steps:**

```bash
# Check firewall rules
sudo iptables -nL | grep 7788
sudo firewall-cmd --list-ports

# Open the DRBD port if needed
sudo firewall-cmd --permanent --add-port=7788/tcp
sudo firewall-cmd --reload

# Test network connectivity
telnet drbd-node2 7788

# Check the DRBD service status
sudo systemctl status drbd
sudo journalctl -u drbd -f
```

### Issue 4: Metadata Corruption

**Symptoms:**

- DRBD fails to start
- Metadata inconsistency errors
- Unable to create or read metadata

**Resolution:**

```bash
# Back up the existing metadata (if possible; the resource must be down)
sudo drbdadm dump-md data > /tmp/drbd-metadata-backup

# Recreate the metadata
sudo drbdadm create-md data

# If valid data exists on one node, force it primary and resync
sudo drbdadm primary --force data
```

### Issue 5: Kernel Module Loading Issues

**Symptoms:**

- "modprobe: FATAL: Module drbd not found" errors
- DRBD utilities cannot communicate with the kernel

**Solutions:**

```bash
# Check whether the module exists for the running kernel
find /lib/modules/$(uname -r) -name "drbd*"

# Install the appropriate kernel module package
sudo apt install drbd-dkms      # Ubuntu/Debian
sudo yum install kmod-drbd90    # RHEL/CentOS (ELRepo)

# Rebuild DKMS modules if necessary
sudo dkms autoinstall
```

## Best Practices and Professional Tips

### Security Considerations

1. **Use peer authentication and integrity checking.** Note that DRBD itself does not encrypt replication traffic; `cram-hmac-alg` authenticates the peer and `data-integrity-alg` detects corruption in transit. For confidentiality, use a dedicated network or a VPN tunnel.

```bash
net {
    cram-hmac-alg      sha256;
    shared-secret      "strong-random-secret-key";
    data-integrity-alg crc32c;
}
```

2. **Implement network isolation:**
   - Use dedicated VLANs for DRBD traffic
   - Configure firewall rules to restrict access
   - Use VPN tunnels for geographically distributed nodes

3. **Perform regular security audits:**
   - Monitor DRBD logs for suspicious activity
   - Rotate shared secrets periodically
   - Keep DRBD software updated

### Performance Optimization

1. **Storage configuration** (see the persistence note after this section):
   - Use SSDs for DRBD metadata
   - Align partition boundaries properly
   - Configure appropriate I/O schedulers

```bash
# Set the I/O scheduler for the backing device (not persistent across reboots)
echo mq-deadline | sudo tee /sys/block/sdb/queue/scheduler
```

2. **Network tuning:**
   - Use dedicated gigabit or 10GbE connections
   - Optimize TCP buffer sizes
   - Consider SR-IOV for virtualized environments

3. **Resource allocation:**

```bash
resource data {
    disk {
        al-extents    6433;
        c-plan-ahead  20;
        c-fill-target 100M;
        c-max-rate    4G;
    }
}
```
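The scheduler setting above is lost at reboot. One way to make it persistent is a udev rule; a minimal sketch, assuming the backing device is `sdb` (the rule file name is arbitrary):

```bash
# Create /etc/udev/rules.d/60-io-scheduler.rules with a rule that applies
# mq-deadline to sdb whenever the device appears
echo 'ACTION=="add|change", KERNEL=="sdb", ATTR{queue/scheduler}="mq-deadline"' \
    | sudo tee /etc/udev/rules.d/60-io-scheduler.rules

# Reload and apply the rules without rebooting
sudo udevadm control --reload-rules
sudo udevadm trigger
```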
### Monitoring and Maintenance

1. **Implement comprehensive monitoring:**

```bash
#!/bin/bash
# Simple DRBD connection-state check (Nagios-style exit codes)
RESOURCE="data"
STATUS=$(drbdadm cstate $RESOURCE)

if [ "$STATUS" != "Connected" ]; then
    echo "CRITICAL: DRBD resource $RESOURCE is $STATUS"
    exit 2
fi

echo "OK: DRBD resource $RESOURCE is Connected"
exit 0
```

2. **Regular backup procedures:**
   - Test failover scenarios regularly
   - Maintain configuration backups
   - Document recovery procedures

3. **Log management:**

```bash
# Route kernel messages (including DRBD) to a dedicated log via rsyslog
echo 'kern.* /var/log/drbd.log' | sudo tee -a /etc/rsyslog.conf
sudo systemctl restart rsyslog
```

### High Availability Integration

1. **Pacemaker integration:**

```bash
# Install the Pacemaker cluster stack
sudo apt install pacemaker corosync crmsh

# Configure DRBD as a cluster resource (in practice, wrap this primitive
# in a promotable clone so Pacemaker manages the primary/secondary roles)
sudo crm configure primitive drbd_data ocf:linbit:drbd \
    params drbd_resource=data \
    op start interval=0 timeout=240 \
    op stop interval=0 timeout=100
```

2. **Automatic failover configuration:**
   - Configure proper resource constraints
   - Implement health checks and monitoring
   - Test failover scenarios thoroughly

### Capacity Planning

1. **Network bandwidth requirements:**
   - Calculate peak write rates
   - Account for resynchronization traffic
   - Plan for network redundancy

2. **Storage sizing:**
   - Account for DRBD metadata overhead (internal metadata needs roughly 32 KiB per GiB of backing storage, i.e. on the order of 32 MB per TB)
   - Plan for activity log and bitmap storage
   - Consider snapshot and backup space requirements

## Advanced Configuration Options

### Protocol Selection

DRBD supports three replication protocols:

- **Protocol A (asynchronous):** fastest, but a primary crash can lose the most recent writes
- **Protocol B (memory-synchronous):** a balance of performance and safety
- **Protocol C (synchronous):** safest, but every write waits for the peer

```bash
net {
    protocol C;  # Recommended for critical data
}
```

### Quorum Configuration

For setups with three or more nodes, configure quorum to prevent split-brain:

```bash
resource data {
    options {
        quorum majority;
        on-no-quorum suspend-io;
    }
}
```
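Quorum only makes sense with at least three nodes. As a sketch of what a three-node DRBD 9 resource can look like (the third node `drbd-node3` at 192.168.1.12 is hypothetical; check your DRBD 9 documentation for the exact syntax of your version), each host gets a `node-id` and the connections are generated with `connection-mesh`:

```bash
resource data {
    on drbd-node1 {
        node-id   0;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.10:7788;
        meta-disk internal;
    }
    on drbd-node2 {
        node-id   1;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.11:7788;
        meta-disk internal;
    }
    on drbd-node3 {
        node-id   2;
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.12:7788;
        meta-disk internal;
    }
    connection-mesh {
        hosts drbd-node1 drbd-node2 drbd-node3;
    }
}
```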
### Compression for WAN Replication

DRBD itself does not compress its replication stream. For bandwidth-constrained WAN links, compression is typically provided by LINBIT's DRBD Proxy, a separate product that buffers and compresses replication traffic between sites; consult its documentation for the plugin configuration. Alternatively, run the replication link over a compressing VPN tunnel.

## Conclusion

DRBD provides a robust, enterprise-grade solution for storage replication in Linux environments. By following the steps in this guide, you can implement reliable data synchronization as the foundation of a high-availability system.

Key takeaways:

- **Plan properly:** ensure adequate network bandwidth, storage capacity, and hardware resources before deployment
- **Keep configurations consistent:** maintain identical configurations across all nodes to prevent synchronization issues
- **Monitor continuously:** implement comprehensive monitoring to detect and resolve issues quickly
- **Test regularly:** perform routine failover testing to validate system reliability
- **Stay updated:** keep DRBD software and configurations current with security patches and performance improvements

### Next Steps

After successfully implementing DRBD storage replication, consider these advanced topics:

1. **Cluster integration:** integrate DRBD with Pacemaker or another cluster management solution
2. **Backup strategies:** implement comprehensive backup solutions that work with DRBD
3. **Performance tuning:** optimize configurations for your specific workload
4. **Disaster recovery:** develop and test disaster recovery procedures
5. **Scaling:** plan for adding nodes or resources as your infrastructure grows

DRBD's flexibility and reliability make it an excellent choice for organizations that need highly available storage. With proper implementation and maintenance, it can deliver years of dependable service while protecting your critical data. Always test configurations in a non-production environment first, maintain current backups, and document every procedure for your team.