# How to Monitor RAID Arrays with mdadm --monitor --scan --daemonise

## Table of Contents

1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Understanding mdadm Monitor Options](#understanding-mdadm-monitor-options)
4. [Basic Monitoring Setup](#basic-monitoring-setup)
5. [Advanced Configuration Options](#advanced-configuration-options)
6. [Practical Examples and Use Cases](#practical-examples-and-use-cases)
7. [Troubleshooting Common Issues](#troubleshooting-common-issues)
8. [Best Practices and Professional Tips](#best-practices-and-professional-tips)
9. [Integration with System Services](#integration-with-system-services)
10. [Conclusion](#conclusion)

## Introduction

RAID (Redundant Array of Independent Disks) monitoring is a critical aspect of system administration that ensures data integrity and availability. The `mdadm` utility in Linux provides powerful monitoring capabilities through its `--monitor` functionality, which can continuously watch RAID arrays for failures, degradation, and other important events.

The command `mdadm --monitor --scan --daemonise` represents one of the most essential tools for automated RAID monitoring in Linux environments. This comprehensive guide will walk you through setting up, configuring, and maintaining a robust RAID monitoring system that can alert you to potential issues before they become critical failures.

By the end of this article, you'll understand how to implement automated RAID monitoring, configure appropriate alerting mechanisms, troubleshoot common issues, and follow industry best practices for maintaining healthy RAID arrays in production environments.
## Prerequisites

Before implementing mdadm monitoring, ensure the following requirements are met:

### System Requirements

- Linux system with mdadm installed (version 3.0 or higher recommended)
- Root or sudo privileges for system configuration
- Active RAID arrays configured with mdadm
- Basic understanding of Linux command line operations

### Software Dependencies

```bash
# Verify mdadm installation
mdadm --version

# Install mdadm if not present (Ubuntu/Debian)
sudo apt-get update && sudo apt-get install mdadm

# Install mdadm (CentOS/RHEL/Fedora)
sudo yum install mdadm
# or, on newer releases
sudo dnf install mdadm
```

### Configuration Files

Ensure the following are properly set up:

- `/etc/mdadm/mdadm.conf` (Debian/Ubuntu) or `/etc/mdadm.conf` (CentOS/RHEL)
- `/proc/mdstat` accessible for reading array status
- Mail system configured for notifications (optional but recommended)

## Understanding mdadm Monitor Options

### Core Monitoring Parameters

The `mdadm --monitor` command accepts several key parameters that control its behavior:

#### --monitor

The `--monitor` flag puts mdadm into monitoring mode, where it continuously watches the specified RAID arrays for changes in their status. This mode is designed to run as a long-running process that detects and responds to RAID events.

#### --scan

The `--scan` option instructs mdadm to monitor all RAID arrays listed in the configuration file (`/etc/mdadm/mdadm.conf` or `/etc/mdadm.conf`), along with any other md devices that appear in `/proc/mdstat`. This eliminates the need to specify each array device manually. With `--scan`, mdadm also reads the `MAILADDR` and `PROGRAM` settings from that file.

#### --daemonise

The `--daemonise` (or `--daemonize` in American spelling) option causes mdadm to run as a background daemon process, detaching from the terminal and continuing to run even after the user logs out.
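To make the `--scan` behavior more concrete, the sketch below lists the array devices that a configuration file declares, which is roughly the starting set that `mdadm --monitor --scan` watches. It is an illustration under assumptions: `list_configured_arrays` is a hypothetical helper (not an mdadm command), the sample file contents are made up, and the parser only recognizes the uppercase `ARRAY` keyword, whereas mdadm itself applies more rules and also picks up devices from `/proc/mdstat`.

```bash
#!/bin/bash
# Sketch only: list the array devices declared in an mdadm.conf-style file.
# list_configured_arrays is a hypothetical helper, not an mdadm command.
list_configured_arrays() {
    local conf="$1"
    # Print the second field of every line whose first field is ARRAY.
    # Comment lines start with '#', so their first field never matches.
    awk '$1 == "ARRAY" { print $2 }' "$conf"
}

# Demo against a throwaway sample file (the contents are invented).
sample_conf=$(mktemp)
cat > "$sample_conf" << 'EOF'
# sample configuration for illustration
ARRAY /dev/md0 metadata=1.2 name=server:0
ARRAY /dev/md1 metadata=1.2 name=server:1
MAILADDR admin@example.com
EOF

list_configured_arrays "$sample_conf"
rm -f "$sample_conf"
```

Comparing this list with the devices shown in `/proc/mdstat` can be a quick sanity check that the configuration file still matches the arrays on the system.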
### Additional Important Options

```bash
# Common monitoring options
--delay=seconds          # Time between checks (default: 60 seconds)
--mail=email@domain.com  # Email address for notifications
--program=script         # Custom script to run on events
--syslog                 # Send notifications to syslog
--test                   # Send a TestMessage alert for every array found at startup
```

## Basic Monitoring Setup

### Step 1: Configure mdadm.conf

First, ensure your `/etc/mdadm/mdadm.conf` file is properly configured:

```bash
# Generate or update mdadm configuration
# (tee runs under sudo, so the append works without a root shell)
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf

# Example mdadm.conf content
ARRAY /dev/md0 metadata=1.2 name=server:0 UUID=12345678:90abcdef:12345678:90abcdef
ARRAY /dev/md1 metadata=1.2 name=server:1 UUID=87654321:fedcba09:87654321:fedcba09

# Monitoring configuration
MAILADDR admin@example.com
PROGRAM /usr/local/bin/raid-alert.sh
```

### Step 2: Basic Monitor Command

Start monitoring with the basic command:

```bash
# Basic monitoring command
sudo mdadm --monitor --scan --daemonise

# Verify the daemon is running
ps aux | grep '[m]dadm'
pgrep -f "mdadm.*monitor"
```

### Step 3: Verify Monitoring Status

Check that monitoring is active and working:

```bash
# Check RAID array status
cat /proc/mdstat

# View mdadm process details
sudo systemctl status mdmonitor   # On systemd systems
sudo service mdmonitor status     # On SysV init systems

# Check system logs for mdadm messages
sudo journalctl -u mdmonitor -f   # systemd
sudo tail -f /var/log/syslog | grep mdadm
```

## Advanced Configuration Options

### Custom Monitoring Scripts

Create custom notification scripts for specific events. mdadm passes the event name, the md device, and (where relevant) the component device as arguments:

```bash
#!/bin/bash
# /usr/local/bin/raid-alert.sh
# Custom RAID alert script

EVENT="$1"
DEVICE="$2"
COMPONENT="$3"
LOGFILE="/var/log/raid-alerts.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

# Log the event
echo "$TIMESTAMP - Event: $EVENT, Device: $DEVICE, Component: $COMPONENT" >> "$LOGFILE"

case "$EVENT" in
    "Fail")
        # Critical failure - immediate notification
        echo "CRITICAL: RAID device $DEVICE has failed component $COMPONENT" | \
            mail -s "URGENT: RAID Failure on $(hostname)" admin@example.com
        # Send to monitoring system (e.g., Nagios, Zabbix)
        /usr/local/bin/send_alert.sh "CRITICAL" "RAID failure: $DEVICE/$COMPONENT"
        ;;
    "DegradedArray")
        # Array is degraded but functional
        echo "WARNING: RAID array $DEVICE is running in degraded mode" | \
            mail -s "RAID Degraded: $(hostname)" admin@example.com
        ;;
    "SpareActive")
        # Spare drive activated
        echo "INFO: Spare drive activated for $DEVICE" | \
            mail -s "RAID Spare Activated: $(hostname)" admin@example.com
        ;;
esac

# After saving the script, make it executable (run once from a shell):
#   chmod +x /usr/local/bin/raid-alert.sh
```

### Fine-tuning Monitor Intervals

Adjust monitoring frequency based on your requirements. The commands below are alternatives; run only one monitor at a time:

```bash
# High-frequency monitoring (every 30 seconds)
sudo mdadm --monitor --scan --daemonise --delay=30

# Low-frequency monitoring (every 5 minutes)
sudo mdadm --monitor --scan --daemonise --delay=300

# Monitor specific arrays with their own interval
sudo mdadm --monitor /dev/md0 /dev/md1 --delay=60 --daemonise
```

### Email Configuration

Set up email notifications:

```bash
# Configure in mdadm.conf. mdadm honors a single MAILADDR address,
# so point it at a mail alias to reach several recipients.
echo "MAILADDR admin@example.com" | sudo tee -a /etc/mdadm/mdadm.conf

# Test email functionality
echo "Test RAID monitoring email" | mail -s "RAID Monitor Test" admin@example.com

# Configure mail relay if needed
# Edit /etc/postfix/main.cf or equivalent MTA configuration
```

## Practical Examples and Use Cases

### Example 1: Production Server Setup

For a production server with multiple RAID arrays:

```bash
#!/bin/bash
# Production RAID monitoring setup script

# Update mdadm configuration
sudo mdadm --detail --scan | sudo tee /etc/mdadm/mdadm.conf

# Add monitoring configuration (one MAILADDR address only)
cat << EOF | sudo tee -a /etc/mdadm/mdadm.conf
MAILADDR sysadmin@company.com
PROGRAM /opt/monitoring/raid-alert.sh
EOF

# Start monitoring with appropriate settings.
# Skip this step if you rely on the mdmonitor service below;
# running both may produce duplicate alerts.
sudo mdadm --monitor --scan --daemonise --delay=60 --syslog

# Enable automatic startup
sudo systemctl enable mdmonitor
sudo systemctl start mdmonitor

echo "RAID monitoring configured for production environment"
```

### Example 2: Development Environment

For development systems with less critical monitoring needs:

```bash
# Development environment monitoring
# (--test sends a TestMessage alert for each array at startup)
sudo mdadm --monitor --scan --daemonise --delay=300 --test

# Log to file instead of email
sudo mdadm --monitor --scan --daemonise --delay=300 --program=/usr/local/bin/log-only.sh
```

### Example 3: Integration with Monitoring Systems

Integrate with external monitoring systems:

```bash
#!/bin/bash
# /usr/local/bin/monitoring-integration.sh
# Integration script for external monitoring

EVENT="$1"
DEVICE="$2"
COMPONENT="$3"

# Send to Nagios/Icinga
echo "RAID_$EVENT:$DEVICE:$COMPONENT" | /usr/sbin/send_nsca -H monitoring.example.com

# Send to Zabbix
zabbix_sender -z zabbix.example.com -s "$(hostname)" -k "raid.status" -o "$EVENT:$DEVICE"

# Send to Prometheus Pushgateway (the request body must end with a newline)
printf 'raid_event{device="%s",component="%s"} 1\n' "$DEVICE" "$COMPONENT" | \
    curl --data-binary @- "http://pushgateway.example.com:9091/metrics/job/raid_monitor/instance/$(hostname)"

# Log locally
logger -p local0.warning "RAID Event: $EVENT on $DEVICE ($COMPONENT)"
```

## Troubleshooting Common Issues

### Issue 1: Monitoring Daemon Not Starting

Symptoms:

- mdadm monitor process doesn't appear in process list
- No monitoring alerts received
- Service fails to start

Solutions:

```bash
# Check the configuration by sending a test alert for each array
sudo mdadm --monitor --scan --oneshot --test --config=/etc/mdadm/mdadm.conf

# Verify RAID arrays are properly configured
sudo mdadm --detail --scan

# Stop conflicting monitor processes
sudo pkill -f "mdadm.*monitor"
sudo systemctl stop mdmonitor

# Run a single check in the foreground for debugging
sudo mdadm --monitor --scan --verbose --oneshot

# Check system logs
sudo journalctl -u mdmonitor -n 50
sudo tail -f /var/log/messages | grep mdadm
```

### Issue 2: Missing or Delayed Notifications

Symptoms:

- RAID events occur but no notifications received
- Delayed notification delivery
- Notifications sent to wrong addresses

Solutions:

```bash
# Test email configuration
echo "Test message" | mail -s "Test Subject" your-email@example.com

# Check mail queue
mailq
sudo postqueue -f   # Flush mail queue

# Verify mdadm configuration
grep -E "MAILADDR|PROGRAM" /etc/mdadm/mdadm.conf

# Test custom notification scripts
sudo /usr/local/bin/raid-alert.sh "Test" "/dev/md0" "test-component"

# Check script permissions and execution
ls -la /usr/local/bin/raid-alert.sh
# Run with an empty environment, closer to how the root-owned daemon invokes it
sudo env -i /usr/local/bin/raid-alert.sh "Test" "/dev/md0" "test"
```

### Issue 3: High CPU Usage from Monitor Process

Symptoms:

- mdadm monitor process consuming high CPU
- System performance degradation
- Frequent disk I/O from monitoring

Solutions:

```bash
# Increase monitoring delay
sudo pkill -f "mdadm.*monitor"
sudo mdadm --monitor --scan --daemonise --delay=300   # 5 minutes

# Monitor system resources (-d, joins multiple PIDs with commas for top)
top -p "$(pgrep -d, -f 'mdadm.*monitor')"
iostat -x 1 10

# Check for underlying RAID issues
cat /proc/mdstat
sudo mdadm --detail /dev/md0

# Review system logs for errors
sudo dmesg | grep -E "(md|raid)"
sudo journalctl -f | grep mdadm
```

### Issue 4: False Positive Alerts

Symptoms:

- Frequent unnecessary alerts
- Alerts for normal operations
- Monitoring script triggering incorrectly

Solutions:

```bash
#!/bin/bash
# Enhanced raid-alert.sh with filtering
# Add filtering to notification scripts

EVENT="$1"
DEVICE="$2"
COMPONENT="$3"

# Filter out routine events
case "$EVENT" in
    "NewArray"|"DeviceDisappeared")
        # Log but don't alert for routine array changes during maintenance
        logger "RAID Info: $EVENT on $DEVICE"
        exit 0
        ;;
    "TestMessage")
        # Skip test messages
        exit 0
        ;;
esac

# Continue with normal alert processing...
```

## Best Practices and Professional Tips

### Security Considerations

```bash
# Run monitoring with appropriate permissions
# Create dedicated user for mdadm monitoring
sudo useradd -r -s /bin/false -d /var/lib/mdadm mdadm-monitor

# Set proper file permissions
sudo chown root:root /etc/mdadm/mdadm.conf
sudo chmod 644 /etc/mdadm/mdadm.conf

# Secure notification scripts
sudo chown root:root /usr/local/bin/raid-alert.sh
sudo chmod 755 /usr/local/bin/raid-alert.sh
```

### Performance Optimization

```bash
# Optimize monitoring intervals based on environment
# Production: 60-120 seconds
# Development: 300-600 seconds
# Critical systems: 30-60 seconds

# Use appropriate logging levels (--verbose takes no value)
sudo mdadm --monitor --scan --daemonise --delay=60 --syslog --verbose

# Monitor monitoring performance
#!/bin/bash
# Monitor the monitor script
while true; do
    MONITOR_PID=$(pgrep -f "mdadm.*monitor")
    if [ -n "$MONITOR_PID" ]; then
        ps -o pid,pcpu,pmem,time,cmd -p $MONITOR_PID
    else
        echo "$(date): mdadm monitor not running!" >> /var/log/monitor-check.log
    fi
    sleep 300
done
```

### Backup and Recovery

```bash
# Backup mdadm configuration
sudo cp /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf.backup.$(date +%Y%m%d)

# Create configuration recovery script
#!/bin/bash
# /usr/local/bin/recover-mdadm-config.sh
echo "Recovering mdadm configuration..."
sudo mdadm --detail --scan > /tmp/mdadm.conf.recovered
sudo cp /tmp/mdadm.conf.recovered /etc/mdadm/mdadm.conf
sudo systemctl restart mdmonitor
echo "Configuration recovered and monitoring restarted"
```

### Monitoring the Monitor

Implement meta-monitoring to ensure your RAID monitoring is working:

```bash
#!/bin/bash
# /usr/local/bin/monitor-health-check.sh
# Cron job to verify RAID monitoring is functioning

MONITOR_PID=$(pgrep -f "mdadm.*monitor")
LOGFILE="/var/log/monitor-health.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

if [ -z "$MONITOR_PID" ]; then
    echo "$TIMESTAMP: ERROR - mdadm monitor not running" >> "$LOGFILE"
    # Restart monitoring
    sudo systemctl restart mdmonitor
    echo "$TIMESTAMP: Attempted to restart mdmonitor" >> "$LOGFILE"
    # Send alert
    echo "RAID monitoring daemon was down and has been restarted on $(hostname)" | \
        mail -s "RAID Monitor Service Alert" admin@example.com
else
    echo "$TIMESTAMP: OK - mdadm monitor running (PID: $MONITOR_PID)" >> "$LOGFILE"
fi

# Add to crontab (runs every 5 minutes):
# */5 * * * * /usr/local/bin/monitor-health-check.sh
```

## Integration with System Services

### Systemd Integration

Create a custom systemd service for enhanced control:

```bash
# /etc/systemd/system/mdadm-monitor.service
[Unit]
Description=MD Array Monitor
After=multi-user.target

[Service]
Type=forking
ExecStart=/sbin/mdadm --monitor --scan --daemonise --delay=60 --syslog
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target

# Enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable mdadm-monitor.service
sudo systemctl start mdadm-monitor.service
```

### Log Rotation

Configure log rotation for monitoring logs:

```bash
# /etc/logrotate.d/mdadm-monitor
/var/log/raid-alerts.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    create 644 root root
    postrotate
        /usr/bin/systemctl reload mdadm-monitor > /dev/null 2>&1 || true
    endscript
}
```

### Startup Scripts

For systems without systemd:

```bash
#!/bin/bash
# /etc/init.d/mdadm-monitor
# SysV init script for mdadm monitoring

case "$1" in
    start)
        echo "Starting mdadm monitor..."
        /sbin/mdadm --monitor --scan --daemonise --delay=60 --syslog
        ;;
    stop)
        echo "Stopping mdadm monitor..."
        # Match the daemon's command line only; a looser pattern such as
        # "mdadm.*monitor" would also match this script's own name.
        pkill -f "mdadm --monitor"
        ;;
    restart)
        $0 stop
        sleep 2
        $0 start
        ;;
    status)
        if pgrep -f "mdadm --monitor" > /dev/null; then
            echo "mdadm monitor is running"
        else
            echo "mdadm monitor is not running"
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac

# After saving the script, make it executable and add it to system startup:
#   chmod +x /etc/init.d/mdadm-monitor
#   update-rc.d mdadm-monitor defaults
```

## Conclusion

Implementing robust RAID monitoring with `mdadm --monitor --scan --daemonise` is essential for maintaining data integrity and system reliability in Linux environments. This guide has covered basic setup, advanced configuration, troubleshooting, and best practices.

### Key Takeaways

1. Automated Monitoring: The `--scan --daemonise` combination provides hands-off monitoring of all configured RAID arrays
2. Flexible Alerting: Custom scripts and multiple notification methods ensure you're informed of issues promptly
3. Proactive Maintenance: Regular monitoring helps identify potential problems before they cause data loss
4. Integration Capabilities: mdadm monitoring integrates well with existing system monitoring infrastructure

### Next Steps

After implementing mdadm monitoring:

1. Test Your Setup: Regularly test notification systems and recovery procedures
2. Document Procedures: Maintain clear documentation of your monitoring configuration
3. Review and Optimize: Periodically review monitoring logs and adjust settings as needed
4. Plan for Scaling: Consider how monitoring will scale as you add more RAID arrays

### Final Recommendations

- Always test monitoring configurations in non-production environments first
- Implement redundant notification methods (email, SMS, monitoring systems)
- Regularly review and update monitoring scripts and configurations
- Keep mdadm and related tools updated to the latest stable versions
- Maintain comprehensive backup strategies alongside RAID monitoring

By following the practices outlined in this guide, you'll have a robust, reliable RAID monitoring system that helps protect your data and ensures system availability. Remember that monitoring is just one part of a comprehensive data protection strategy: regular backups, proper hardware maintenance, and documented procedures are equally important for maintaining a healthy storage infrastructure.
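One way to act on the advice about testing notification handling, without waiting for a real disk failure, is to dry-run the handler logic against simulated event names. The sketch below is a minimal illustration under assumptions: `classify_event` is a hypothetical helper, and the severity labels are one possible mapping rather than anything mdadm defines. The event names themselves are the ones mdadm's monitor mode reports.

```bash
#!/bin/bash
# Sketch only: map mdadm event names to a severity so a handler can be
# dry-run tested. classify_event and its labels are assumptions.
classify_event() {
    case "$1" in
        Fail|FailSpare|DegradedArray|DeviceDisappeared) echo "critical" ;;
        SparesMissing|RebuildStarted)                   echo "warning" ;;
        SpareActive|RebuildFinished|NewArray|MoveSpare) echo "info" ;;
        Rebuild[0-9][0-9])                              echo "info" ;;
        TestMessage)                                    echo "test" ;;
        *)                                              echo "unknown" ;;
    esac
}

# Dry run: feed simulated events through the classifier; no array is touched.
for event in Fail DegradedArray SpareActive TestMessage Bogus; do
    printf '%s -> %s\n' "$event" "$(classify_event "$event")"
done
```

A real handler could call such a function first and route only the "critical" and "warning" results to email, keeping the rest in the local log.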