How to Replicate Storage with DRBD in Linux
Introduction
Distributed Replicated Block Device (DRBD) is a powerful Linux kernel module that creates a distributed storage system by replicating block devices between multiple servers. Often referred to as "RAID 1 over the network," DRBD provides real-time data synchronization across geographically distributed locations, making it an essential component for high-availability (HA) clusters and disaster recovery solutions.
In this comprehensive guide, you'll learn how to implement DRBD storage replication in Linux environments. We'll cover everything from basic installation and configuration to advanced troubleshooting techniques, ensuring you have the knowledge to deploy robust, fault-tolerant storage solutions.
DRBD operates at the block device level, intercepting write operations and transmitting them to remote nodes before confirming completion. This approach ensures data consistency and provides seamless failover capabilities, making it ideal for critical applications that require zero data loss and minimal downtime.
Prerequisites and Requirements
Before implementing DRBD storage replication, ensure your environment meets the following requirements:
Hardware Requirements
- Two or more Linux servers with identical or similar hardware specifications
- Dedicated storage devices (physical disks, LVM volumes, or partitions) of equal or larger size on each node
- Reliable network connectivity between nodes with sufficient bandwidth for data replication
- Minimum 1GB RAM per node (2GB or more recommended for production environments)
Software Requirements
- Linux distribution supporting DRBD (RHEL/CentOS 7+, Ubuntu 18.04+, SUSE, Debian)
- Kernel version 2.6.33 or later (most modern distributions include DRBD support)
- Root access or sudo privileges on all participating nodes
- Network Time Protocol (NTP) configured for time synchronization
Network Configuration
- Dedicated network interface or VLAN for DRBD replication (recommended)
- Static IP addresses configured on all nodes
- Firewall rules allowing DRBD traffic (default port 7788)
- Low latency connection between nodes (< 100ms for optimal performance)
Storage Considerations
- Identical block device sizes across all nodes
- Unused block devices (no existing filesystems or data)
- SSD storage recommended for metadata and high-performance requirements
- Backup strategy in place before beginning configuration
Step-by-Step DRBD Installation and Configuration
Step 1: Install DRBD Packages
Begin by installing DRBD on all participating nodes. The installation method varies depending on your Linux distribution.
For RHEL/CentOS/Fedora (DRBD 9 packages are provided by the ELRepo repository, not EPEL):
```bash
# Enable the ELRepo repository (provides the DRBD packages)
sudo yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
# Install the DRBD kernel module and userland utilities
sudo yum install drbd90-utils kmod-drbd90 -y
# For CentOS 8/RHEL 8, use dnf with the matching elrepo-release package
sudo dnf install drbd90-utils kmod-drbd90 -y
```
For Ubuntu/Debian:
```bash
# Update the package index
sudo apt update
# Install the DRBD utilities and DKMS kernel module
sudo apt install drbd-utils drbd-dkms -y
# Load the DRBD kernel module
sudo modprobe drbd
```
For SUSE/openSUSE:
```bash
# Install DRBD packages
sudo zypper install drbd drbd-utils drbd-kmp-default -y
```
Step 2: Load and Verify DRBD Kernel Module
After installation, ensure the DRBD kernel module loads correctly:
```bash
# Load the DRBD module
sudo modprobe drbd
# Verify the module is loaded
lsmod | grep drbd
# Check the DRBD userland version
sudo drbdadm --version
```
Expected output should show DRBD version information and confirm the module is active.
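To have the module load automatically on every boot (on systemd-based distributions), you can register it with the module loader; a minimal sketch:

```bash
# Load the drbd module automatically at boot
echo drbd | sudo tee /etc/modules-load.d/drbd.conf
```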
Step 3: Configure Network and Hostnames
Ensure proper hostname resolution between nodes by editing `/etc/hosts`:
```bash
# Edit the hosts file on both nodes
sudo nano /etc/hosts
```

Add entries for both nodes:

```
192.168.1.10 drbd-node1
192.168.1.11 drbd-node2
```
Test connectivity between nodes:
```bash
# From node1 to node2
ping drbd-node2
# From node2 to node1
ping drbd-node1
```
Step 4: Prepare Storage Devices
Identify and prepare the block devices for DRBD replication. Ensure devices are unmounted and contain no valuable data:
```bash
# List available block devices
lsblk
# Check whether the device is mounted (should return nothing)
mount | grep /dev/sdb1
# Verify the device is not part of any RAID array or LVM volume group
cat /proc/mdstat
sudo pvdisplay
```
Warning: The following steps will destroy any existing data on the specified devices.
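If a device still carries an old filesystem or RAID signature, the metadata creation in Step 8 may refuse to proceed. A cleanup sketch, assuming `/dev/sdb1` is the device you intend to dedicate to DRBD:

```bash
# Irreversibly erase existing filesystem and RAID signatures on /dev/sdb1
sudo wipefs -a /dev/sdb1
```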
Step 5: Create DRBD Configuration
Create the main DRBD configuration file. DRBD uses a hierarchical configuration structure with global settings and resource-specific configurations.
Create the global configuration file:
```bash
sudo nano /etc/drbd.d/global_common.conf
```
Add the following global configuration:
```
global {
    usage-count no;
}

common {
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    }
    startup {
        degr-wfc-timeout 60;
    }
    options {
        auto-promote yes;
    }
    disk {
        on-io-error detach;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "your-shared-secret-key";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
        protocol C;
    }
}
```
Step 6: Create Resource Configuration
Create a resource-specific configuration file:
```bash
sudo nano /etc/drbd.d/data.res
```
Add the resource configuration:
```
resource data {
    on drbd-node1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.1.10:7788;
        meta-disk internal;
    }
    on drbd-node2 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.1.11:7788;
        meta-disk internal;
    }
}
```
Configuration Parameters Explained:
- device: Virtual DRBD block device path
- disk: Physical block device to be replicated
- address: IP address and port for DRBD communication
- meta-disk: Location for DRBD metadata (internal uses same device)
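Before copying the files around, it is worth letting drbdadm validate them; `drbdadm dump` parses the configuration, prints the resolved resource definition, and fails loudly on syntax errors:

```bash
# Parse the configuration and print the resolved "data" resource
sudo drbdadm dump data
```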
Step 7: Copy Configuration to All Nodes
Ensure identical configuration across all nodes:
```bash
# Copy the configuration files to the second node
scp /etc/drbd.d/*.conf root@drbd-node2:/etc/drbd.d/
scp /etc/drbd.d/*.res root@drbd-node2:/etc/drbd.d/
```
Step 8: Initialize DRBD Metadata
Create DRBD metadata on all nodes:
```bash
# On both nodes, run:
sudo drbdadm create-md data
```

The output should confirm that the metadata was created successfully.
If you encounter issues, verify the backing device is not mounted and has no existing filesystem.
Step 9: Start DRBD Service
Enable and start DRBD on both nodes:
```bash
# Enable the DRBD service at boot
sudo systemctl enable drbd
# Start the DRBD service
sudo systemctl start drbd
# Check the service status
sudo systemctl status drbd
```
Bring up the DRBD resource:
```bash
# On both nodes
sudo drbdadm up data
# Check the DRBD status
sudo drbdadm status data
```
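At this point both nodes should be connected but report their data as inconsistent, since no initial synchronization has run yet. The status output should look roughly like the following (exact formatting varies between DRBD versions):

```
data role:Secondary
  disk:Inconsistent
  drbd-node2 role:Secondary
    peer-disk:Inconsistent
```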
Step 10: Establish Initial Synchronization
Choose one node as the primary and initiate the first synchronization:
```bash
# On the chosen primary node (drbd-node1)
sudo drbdadm primary --force data
# Monitor synchronization progress (with DRBD 9, /proc/drbd no longer shows per-resource detail)
sudo watch -n 1 'drbdadm status data'
```
The synchronization process may take considerable time depending on device size and network speed.
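Once synchronization completes, you can fail the resource over manually. A minimal sketch, assuming no cluster manager controls DRBD yet and the device is mounted at a hypothetical `/mnt/data`:

```bash
# On the current primary: release the resource
sudo umount /mnt/data        # only if a filesystem is mounted
sudo drbdadm secondary data
# On the peer: take over the primary role and mount
sudo drbdadm primary data
sudo mount /dev/drbd0 /mnt/data
```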
Practical Examples and Use Cases
Example 1: MySQL Database Replication
Configure DRBD for MySQL high availability:
```bash
# After the initial synchronization completes, create a filesystem on the primary node
sudo mkfs.ext4 /dev/drbd0
# Create a mount point
sudo mkdir /var/lib/mysql-drbd
# Mount the DRBD device
sudo mount /dev/drbd0 /var/lib/mysql-drbd
# Point MySQL at the DRBD-backed storage
sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf
```
Add to MySQL configuration:
```ini
[mysqld]
datadir = /var/lib/mysql-drbd
```
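If MySQL already holds data, it must be moved onto the replicated volume before the new datadir takes effect. A sketch, assuming the default datadir `/var/lib/mysql` and the Debian/Ubuntu service name `mysql`:

```bash
# Stop MySQL, copy the data directory onto the DRBD volume, fix ownership, restart
sudo systemctl stop mysql
sudo rsync -a /var/lib/mysql/ /var/lib/mysql-drbd/
sudo chown -R mysql:mysql /var/lib/mysql-drbd
sudo systemctl start mysql
```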
Example 2: Web Server Document Root
Set up DRBD for web server content synchronization:
```bash
# Create the filesystem (on the primary node only)
sudo mkfs.xfs /dev/drbd0
# Create the web root directory
sudo mkdir /var/www/html-drbd
# Mount the DRBD device
sudo mount /dev/drbd0 /var/www/html-drbd
# Configure the Apache virtual host
sudo nano /etc/apache2/sites-available/drbd-site.conf
```
Apache configuration example:
```apache
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/html-drbd
    <Directory /var/www/html-drbd>
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>
```
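On Debian/Ubuntu-style Apache layouts the new virtual host must also be enabled (the site name matches the file created above):

```bash
# Enable the site and reload Apache
sudo a2ensite drbd-site
sudo systemctl reload apache2
```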
Example 3: Multi-Resource Configuration
Configure multiple DRBD resources for different services:
```bash
# Create an additional resource configuration
sudo nano /etc/drbd.d/web.res
```
```
resource web {
    on drbd-node1 {
        device /dev/drbd1;
        disk /dev/sdc1;
        address 192.168.1.10:7789;
        meta-disk internal;
    }
    on drbd-node2 {
        device /dev/drbd1;
        disk /dev/sdc1;
        address 192.168.1.11:7789;
        meta-disk internal;
    }
}
```
Initialize and manage multiple resources:
```bash
# Create metadata for the new resource
sudo drbdadm create-md web
# Bring up all resources
sudo drbdadm up all
# Check the status of all resources
sudo drbdadm status
```
Common Issues and Troubleshooting
Issue 1: Split-Brain Scenarios
Split-brain occurs when both nodes become primary simultaneously, leading to data divergence.
Symptoms:
- DRBD status shows "StandAlone" state
- Log entries indicating split-brain detection
- Unable to establish connection between nodes
Resolution:
```bash
# On the split-brain victim (its divergent data will be discarded)
sudo drbdadm secondary data
sudo drbdadm connect --discard-my-data data
# On the surviving node
sudo drbdadm connect data
```
Prevention:
- Use proper fencing mechanisms
- Implement STONITH (Shoot The Other Node In The Head)
- Configure appropriate handlers in global configuration
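DRBD can also alert you the moment a split-brain is detected: drbd-utils ships a notification helper that can be wired into the handlers section of the common configuration, as in this sketch:

```
handlers {
    # Email root when DRBD detects a split-brain
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
}
```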
Issue 2: Slow Synchronization Performance
Symptoms:
- Extremely slow initial sync or resync operations
- High network latency during synchronization
Solutions:
```bash
# Increase the resynchronization rate (adjust to your network capacity)
sudo drbdadm disk-options --resync-rate=100M data
# Raise kernel network buffer limits
echo 'net.core.rmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```
Add performance tuning to resource configuration:
```
resource data {
    disk {
        resync-rate 100M;
        c-plan-ahead 20;
        c-fill-target 10M;
    }
    net {
        sndbuf-size 1M;
        rcvbuf-size 1M;
    }
}
```
Issue 3: Connection Problems
Symptoms:
- Nodes cannot establish connection
- "Connection refused" errors in logs
- Resources stuck in "WFConnection" state
Troubleshooting Steps:
```bash
# Check firewall rules for the DRBD port
sudo iptables -L -n | grep 7788
sudo firewall-cmd --list-ports
# Open the DRBD port if needed (firewalld)
sudo firewall-cmd --permanent --add-port=7788/tcp
sudo firewall-cmd --reload
# Test network connectivity to the peer
telnet drbd-node2 7788
# Check the DRBD service status and logs
sudo systemctl status drbd
sudo journalctl -u drbd -f
```
Issue 4: Metadata Corruption
Symptoms:
- DRBD fails to start
- Metadata inconsistency errors
- Unable to create or read metadata
Resolution:
```bash
# Back up the existing metadata (if possible)
sudo drbdadm dump-md data > /tmp/drbd-metadata-backup
# Recreate the metadata
sudo drbdadm create-md data
# If valid data exists on one node, force it primary and let DRBD resync the peer
sudo drbdadm primary --force data
```
Issue 5: Kernel Module Loading Issues
Symptoms:
- "modprobe: FATAL: Module drbd not found" errors
- DRBD utilities cannot communicate with kernel
Solutions:
```bash
# Check whether the module exists for the running kernel
find /lib/modules/$(uname -r) -name "drbd*"
# Install the appropriate kernel module package
sudo apt install drbd-dkms    # Ubuntu/Debian
sudo yum install kmod-drbd90  # RHEL/CentOS
# Rebuild DKMS modules if necessary
sudo dkms autoinstall
```
Best Practices and Professional Tips
Security Considerations
1. Authenticate Peers and Verify Data Integrity:
DRBD authenticates peers with a shared secret and can checksum replicated blocks, but it does not encrypt traffic in transit; for encryption, tunnel the replication link (see network isolation below).
```
net {
    cram-hmac-alg sha256;
    shared-secret "strong-random-secret-key";
    data-integrity-alg crc32c;
}
```
2. Implement Network Isolation:
- Use dedicated VLANs for DRBD traffic
- Configure firewall rules to restrict access
- Use VPN tunnels for geographically distributed nodes
3. Regular Security Audits:
- Monitor DRBD logs for suspicious activity
- Rotate shared secrets periodically
- Keep DRBD software updated
Performance Optimization
1. Storage Configuration:
- Use SSDs for DRBD metadata
- Align partition boundaries properly
- Configure appropriate I/O schedulers
```bash
# Set the I/O scheduler for the backing device (sysfs writes require root)
echo mq-deadline | sudo tee /sys/block/sdb/queue/scheduler
```
2. Network Tuning:
- Use dedicated gigabit or 10GbE connections
- Optimize TCP buffer sizes
- Consider SR-IOV for virtualized environments
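On a dedicated replication link, raising the MTU reduces per-packet overhead. A hedged sketch, assuming the replication interface is `eth1` and every port on the path supports jumbo frames:

```bash
# Enable jumbo frames on the replication interface (must match on both nodes)
sudo ip link set dev eth1 mtu 9000
```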
3. Resource Allocation:
```
resource data {
    disk {
        al-extents 6433;
        c-plan-ahead 20;
        c-fill-target 100M;
        c-max-rate 4G;
    }
}
```
```
Monitoring and Maintenance
1. Implement Comprehensive Monitoring:
```bash
#!/bin/bash
# Minimal DRBD connection-state check (Nagios-style exit codes)
RESOURCE="data"
STATUS=$(drbdadm cstate "$RESOURCE")
if [ "$STATUS" != "Connected" ]; then
    echo "CRITICAL: DRBD resource $RESOURCE is $STATUS"
    exit 2
fi
echo "OK: DRBD resource $RESOURCE is Connected"
exit 0
```
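The script can feed cron, Nagios, or Icinga. A crontab sketch, assuming it is saved at a hypothetical `/usr/local/bin/check_drbd.sh`:

```bash
# Run the check every 5 minutes; log failures to syslog
*/5 * * * * /usr/local/bin/check_drbd.sh || logger -t drbd-check "DRBD resource degraded"
```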
2. Regular Backup Procedures:
- Test failover scenarios regularly
- Maintain configuration backups
- Document recovery procedures
3. Log Management:
```bash
# Send kernel messages that mention drbd to a dedicated log file
echo ':msg, contains, "drbd" /var/log/drbd.log' | sudo tee /etc/rsyslog.d/drbd.conf
sudo systemctl restart rsyslog
```
High Availability Integration
1. Pacemaker Integration:
```bash
# Install the Pacemaker cluster stack
sudo apt install pacemaker corosync crmsh
# Configure DRBD as a cluster resource
sudo crm configure primitive drbd_data ocf:linbit:drbd \
    params drbd_resource=data \
    op start interval=0 timeout=240 \
    op stop interval=0 timeout=100
```
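Pacemaker normally runs DRBD as a promotable (master/slave) clone so the cluster decides which node holds the Primary role; a sketch reusing the primitive defined above (newer Pacemaker versions express this as a promotable clone instead of `ms`):

```bash
# Wrap the primitive in a master/slave clone: one Primary, one Secondary
sudo crm configure ms ms_drbd_data drbd_data \
    meta master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true
```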
2. Automatic Failover Configuration:
- Configure proper resource constraints
- Implement health checks and monitoring
- Test failover scenarios thoroughly
Capacity Planning
1. Network Bandwidth Requirements:
- Calculate peak write rates
- Account for resynchronization traffic
- Plan for network redundancy
2. Storage Sizing:
- Account for DRBD internal metadata overhead (roughly 32 KiB per GiB, i.e. about 32 MiB per TiB of backing storage; a quick estimate follows this list)
- Plan for activity log and bitmap storage
- Consider snapshot and backup space requirements
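As a back-of-the-envelope check on the metadata rule of thumb above, a small shell sketch (the 1 TiB device size is only an example):

```bash
# Internal metadata is roughly 32 KiB per GiB of backing storage
DEVICE_GIB=1024   # hypothetical 1 TiB backing device
echo "~$(( DEVICE_GIB * 32 / 1024 )) MiB of DRBD metadata"   # prints ~32 MiB
```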
Advanced Configuration Options
Protocol Selection
DRBD supports three replication protocols:
- Protocol A (asynchronous): a write completes once it reaches the local disk and the TCP send buffer; fastest, but data in flight can be lost on failover
- Protocol B (semi-synchronous): a write completes once it reaches the peer's memory; a middle ground between speed and safety
- Protocol C (synchronous): a write completes only after it is on both disks; safest, at the cost of write latency
```
net {
    protocol C; # Recommended for critical data
}
```
Quorum Configuration
For clusters of three or more nodes (DRBD 9), configure quorum so that an isolated node stops accepting writes instead of diverging:
```
resource data {
    options {
        quorum majority;
        on-no-quorum suspend-io;
    }
}
```
Compression for WAN Replication
Mainline DRBD does not compress its replication traffic itself. For high-latency or bandwidth-constrained WAN links, LINBIT offers DRBD Proxy, a separate commercial component that buffers and compresses replication traffic; it is configured through `proxy` sections in the resource definition (consult the LINBIT documentation for the exact syntax). Alternatively, a compressing VPN tunnel over the replication link can reduce WAN bandwidth at the cost of CPU.
Conclusion
DRBD provides a robust, enterprise-grade solution for storage replication in Linux environments. By following the comprehensive steps outlined in this guide, you can implement reliable data synchronization that forms the foundation of high-availability systems.
Key takeaways from this implementation guide:
- Proper planning is essential: Ensure adequate network bandwidth, storage capacity, and hardware resources before deployment
- Configuration consistency: Maintain identical configurations across all nodes to prevent synchronization issues
- Monitor continuously: Implement comprehensive monitoring to detect and resolve issues quickly
- Test regularly: Perform routine failover testing to validate system reliability
- Stay updated: Keep DRBD software and configurations current with security patches and performance improvements
Next Steps
After successfully implementing DRBD storage replication, consider these advanced topics:
1. Cluster Integration: Integrate DRBD with Pacemaker or other cluster management solutions
2. Backup Strategies: Implement comprehensive backup solutions that work with DRBD
3. Performance Tuning: Optimize configurations based on your specific workload requirements
4. Disaster Recovery: Develop and test disaster recovery procedures
5. Scaling: Plan for adding additional nodes or resources as your infrastructure grows
DRBD's flexibility and reliability make it an excellent choice for organizations requiring zero-downtime storage solutions. With proper implementation and maintenance, DRBD can provide years of reliable service while protecting your critical data assets.
Remember to always test configurations in non-production environments first, maintain current backups, and document all procedures for your team. The investment in proper DRBD implementation will pay dividends in system reliability and data protection for years to come.