How to repair XFS → xfs_repair /dev/..

How to Repair XFS Filesystems Using xfs_repair Command Table of Contents 1. [Introduction](#introduction) 2. [Prerequisites](#prerequisites) 3. [Understanding XFS Filesystem](#understanding-xfs-filesystem) 4. [XFS Repair Command Overview](#xfs-repair-command-overview) 5. [Pre-Repair Assessment](#pre-repair-assessment) 6. [Step-by-Step XFS Repair Process](#step-by-step-xfs-repair-process) 7. [Advanced Repair Options](#advanced-repair-options) 8. [Practical Examples](#practical-examples) 9. [Troubleshooting Common Issues](#troubleshooting-common-issues) 10. [Best Practices](#best-practices) 11. [Prevention Strategies](#prevention-strategies) 12. [Conclusion](#conclusion) Introduction XFS (eXtended File System) is a high-performance journaling filesystem originally developed by Silicon Graphics Inc. (SGI) and widely used in enterprise Linux environments. Despite its robustness and advanced features, XFS filesystems can occasionally become corrupted due to hardware failures, power outages, or system crashes. When corruption occurs, the `xfs_repair` utility becomes an essential tool for recovering and restoring filesystem integrity. This comprehensive guide will walk you through the complete process of diagnosing and repairing XFS filesystems using the `xfs_repair` command. You'll learn how to identify corruption, perform safe repairs, handle complex scenarios, and implement preventive measures to minimize future filesystem issues. Prerequisites Before proceeding with XFS filesystem repairs, ensure you have: System Requirements - Linux system with XFS utilities installed - Root or sudo privileges - Sufficient free disk space for backup operations - Access to the affected storage device Required Packages Install the necessary XFS utilities on your system: For Ubuntu/Debian systems: ```bash sudo apt update sudo apt install xfsprogs ``` For Red Hat/CentOS/Fedora systems: ```bash sudo yum install xfsprogs or for newer versions sudo dnf install xfsprogs ``` Essential Knowledge - Basic Linux command-line operations - Understanding of filesystem concepts - Familiarity with device naming conventions (/dev/sdX, /dev/nvmeX) - Knowledge of backup and recovery procedures Safety Preparations - Critical: Always create a complete backup before attempting repairs - Ensure the filesystem is unmounted - Have a rescue system or live USB available - Document current system configuration Understanding XFS Filesystem XFS Architecture XFS employs several key components that make it robust but also require specific repair approaches: - Superblocks: Contain filesystem metadata and configuration - Allocation Groups (AGs): Divide the filesystem into manageable sections - Journals: Record filesystem changes for crash recovery - B+ Trees: Organize directory and extent information Common Corruption Causes Understanding why XFS corruption occurs helps in both repair and prevention: 1. Hardware Failures: Bad sectors, failing drives, memory errors 2. Power Issues: Sudden shutdowns, power fluctuations 3. System Crashes: Kernel panics, forced reboots 4. Storage Controller Problems: RAID controller failures, cable issues 5. Software Bugs: Kernel bugs, driver issues XFS Repair Command Overview Basic Syntax ```bash xfs_repair [options] device ``` Essential Options | Option | Description | |--------|-------------| | `-n` | No-modify mode (read-only check) | | `-v` | Verbose output | | `-f` | Force repair on dirty filesystem | | `-L` | Force log zeroing | | `-d` | Dangerous mode (use with extreme caution) | | `-m` | Specify maximum memory usage | | `-t` | Set timeout values | Safety Features The `xfs_repair` utility includes several safety mechanisms: - Refuses to run on mounted filesystems - Creates detailed logs of all operations - Provides dry-run capabilities with `-n` option - Validates repairs before committing changes Pre-Repair Assessment Step 1: Identify the Problem Before running repairs, determine the extent of corruption: ```bash Check system logs for filesystem errors sudo dmesg | grep -i xfs sudo journalctl | grep -i xfs Look for specific error patterns sudo grep -i "xfs\|corruption\|metadata" /var/log/messages ``` Step 2: Unmount the Filesystem Ensure the filesystem is completely unmounted: ```bash Check if filesystem is mounted mount | grep /dev/sdX1 Unmount if necessary sudo umount /dev/sdX1 Force unmount if regular unmount fails sudo umount -f /dev/sdX1 Kill processes using the filesystem if needed sudo fuser -km /mount/point sudo umount /dev/sdX1 ``` Step 3: Perform Read-Only Check Always start with a non-destructive assessment: ```bash Perform read-only check sudo xfs_repair -n /dev/sdX1 Save output for analysis sudo xfs_repair -n -v /dev/sdX1 > xfs_check_output.txt 2>&1 ``` Step 4: Analyze Check Results Review the output to understand corruption severity: ```bash Look for key indicators grep -i "error\|corrupt\|bad\|inconsistent" xfs_check_output.txt Count total errors found grep -c "ERROR" xfs_check_output.txt ``` Step-by-Step XFS Repair Process Phase 1: Initial Safety Checks Step 1: Verify Device Accessibility ```bash Confirm device exists and is accessible ls -la /dev/sdX1 Check device information sudo fdisk -l /dev/sdX Verify XFS filesystem sudo file -s /dev/sdX1 ``` Step 2: Create Emergency Backup If possible, create a low-level backup: ```bash Create device image (if space permits) sudo dd if=/dev/sdX1 of=/backup/location/filesystem_backup.img bs=64K Or create compressed backup sudo dd if=/dev/sdX1 bs=64K | gzip > /backup/location/filesystem_backup.img.gz ``` Phase 2: Execute Repair Process Step 1: Standard Repair Attempt Begin with the most conservative repair approach: ```bash Standard repair with verbose output sudo xfs_repair -v /dev/sdX1 ``` Step 2: Monitor Repair Progress The repair process provides detailed progress information: ```bash Typical output phases: Phase 1 - find and verify superblock Phase 2 - using internal log Phase 3 - for each AG Phase 4 - check for duplicate blocks Phase 5 - rebuild AG headers and trees Phase 6 - check inode connectivity Phase 7 - verify and correct link counts ``` Step 3: Handle Log Issues If log corruption is detected: ```bash Zero the log (destroys recent changes) sudo xfs_repair -L /dev/sdX1 This should be followed by standard repair sudo xfs_repair -v /dev/sdX1 ``` Phase 3: Post-Repair Verification Step 1: Verify Repair Success ```bash Perform final read-only check sudo xfs_repair -n /dev/sdX1 Mount filesystem in read-only mode first sudo mount -o ro /dev/sdX1 /mnt/test Check filesystem statistics sudo xfs_info /mnt/test ``` Step 2: Test Filesystem Functionality ```bash Mount read-write sudo mount -o rw /dev/sdX1 /mnt/test Test basic operations ls -la /mnt/test touch /mnt/test/test_file rm /mnt/test/test_file Check filesystem health sudo xfs_db -r -c "check" /dev/sdX1 ``` Advanced Repair Options Handling Severe Corruption Using Dangerous Mode Warning: Only use when standard repair fails and you accept data loss risk: ```bash Enable dangerous mode repairs sudo xfs_repair -d /dev/sdX1 ``` Memory Management For large filesystems or systems with limited RAM: ```bash Limit memory usage (in megabytes) sudo xfs_repair -m 512 /dev/sdX1 For very large filesystems sudo xfs_repair -m 2048 -v /dev/sdX1 ``` Superblock Recovery When primary superblock is corrupted: ```bash Find backup superblocks sudo xfs_db -r -c "sb 1" -c "print" /dev/sdX1 Use specific superblock for repair sudo xfs_repair -s 1 /dev/sdX1 ``` Dealing with Specific Error Types Inode Corruption ```bash Check inode-specific issues sudo xfs_db -r -c "inode " -c "print" /dev/sdX1 Repair with inode rebuilding sudo xfs_repair -i /dev/sdX1 ``` Directory Corruption ```bash Rebuild directory structures sudo xfs_repair -d -v /dev/sdX1 ``` Practical Examples Example 1: Standard Filesystem Corruption Scenario: System crashed during heavy I/O operations, XFS filesystem shows corruption errors. ```bash Step 1: Check system logs sudo dmesg | tail -50 | grep -i xfs Output shows: XFS (sdb1): metadata I/O error Step 2: Unmount filesystem sudo umount /dev/sdb1 Step 3: Assess damage sudo xfs_repair -n /dev/sdb1 Output shows: Phase 1 - find and verify superblock... superblock has bad magic number 0x0 at sector 64 Step 4: Attempt repair sudo xfs_repair -v /dev/sdb1 Output shows: Phase 1 - find and verify superblock... attempting to find secondary superblock... found candidate secondary superblock... Step 5: Verify repair sudo xfs_repair -n /dev/sdb1 Output shows: no inconsistencies found Step 6: Test mount sudo mount /dev/sdb1 /mnt/recovered ls -la /mnt/recovered ``` Example 2: Log Corruption Recovery Scenario: Power failure during filesystem operations resulted in log corruption. ```bash Step 1: Initial check reveals log issues sudo xfs_repair -n /dev/sdc1 Output shows: log head/tail verification failed Step 2: Zero the log sudo xfs_repair -L /dev/sdc1 Output shows: Phase 1 - find and verify superblock... zeroing log... Step 3: Complete repair sudo xfs_repair -v /dev/sdc1 Output shows all phases completing successfully Step 4: Verify integrity sudo mount /dev/sdc1 /mnt/test sudo xfs_info /mnt/test ``` Example 3: Large Filesystem with Memory Constraints Scenario: Repairing a 10TB XFS filesystem on a system with limited RAM. ```bash Step 1: Check available memory free -h Step 2: Repair with memory limits sudo xfs_repair -m 1024 -v /dev/sdd1 Limits repair process to 1GB RAM usage Step 3: Monitor progress Repair will take longer but won't exhaust system memory Step 4: Verify completion sudo xfs_repair -n /dev/sdd1 ``` Troubleshooting Common Issues Issue 1: "Device is Busy" Error Problem: Cannot unmount filesystem for repair. Solution: ```bash Find processes using the filesystem sudo lsof +f -- /mount/point sudo fuser -v /mount/point Kill processes if safe to do so sudo fuser -km /mount/point Force unmount sudo umount -f /dev/sdX1 If still failing, use lazy unmount sudo umount -l /dev/sdX1 ``` Issue 2: "Bad Magic Number" in Superblock Problem: Primary superblock is severely corrupted. Solution: ```bash Find backup superblocks sudo xfs_db -r -c "sb" -c "print" /dev/sdX1 List all superblocks sudo xfs_db -r -c "sb 0" -c "sb 1" -c "sb 2" -c "print" /dev/sdX1 Use backup superblock for repair sudo xfs_repair -s 1 /dev/sdX1 ``` Issue 3: Repair Process Runs Out of Memory Problem: System becomes unresponsive during repair of large filesystem. Solution: ```bash Restart repair with memory limits sudo xfs_repair -m 512 /dev/sdX1 For extremely large filesystems sudo xfs_repair -m 256 /dev/sdX1 Monitor memory usage during repair watch -n 5 'free -h && ps aux | grep xfs_repair' ``` Issue 4: "Filesystem has Valuable Metadata Changes" Problem: Repair refuses to run due to recent changes in log. Solution: ```bash First, try to mount and unmount cleanly sudo mount /dev/sdX1 /mnt/temp sudo umount /mnt/temp If that fails, force log zeroing (data loss possible) sudo xfs_repair -L /dev/sdX1 Then run standard repair sudo xfs_repair -v /dev/sdX1 ``` Issue 5: Repair Completes but Filesystem Still Shows Errors Problem: After repair, mounting still fails or errors persist. Solution: ```bash Run multiple repair passes sudo xfs_repair -v /dev/sdX1 sudo xfs_repair -v /dev/sdX1 # Second pass Check for hardware issues sudo badblocks -v /dev/sdX1 Verify filesystem structure sudo xfs_db -r -c "check" /dev/sdX1 If errors persist, consider data recovery tools sudo xfs_metadump /dev/sdX1 metadata_dump.img ``` Best Practices Pre-Repair Best Practices 1. Always Backup First ```bash # Create complete device backup if possible sudo dd if=/dev/sdX1 of=/backup/device_backup.img bs=64K status=progress ``` 2. Document Everything ```bash # Save all command outputs sudo xfs_repair -n /dev/sdX1 > repair_assessment.log 2>&1 sudo xfs_repair -v /dev/sdX1 > repair_process.log 2>&1 ``` 3. Check Hardware Health ```bash # Test for bad blocks sudo badblocks -v /dev/sdX1 # Check SMART status sudo smartctl -a /dev/sdX ``` During Repair Best Practices 1. Use Verbose Output ```bash # Always use -v flag for detailed information sudo xfs_repair -v /dev/sdX1 ``` 2. Monitor System Resources ```bash # Watch memory and CPU usage watch -n 5 'free -h && iostat 1 1' ``` 3. Be Patient - Large filesystems can take hours to repair - Don't interrupt the process unless absolutely necessary - Monitor progress through verbose output Post-Repair Best Practices 1. Thorough Verification ```bash # Multiple verification steps sudo xfs_repair -n /dev/sdX1 sudo mount -o ro /dev/sdX1 /mnt/test sudo xfs_info /mnt/test sudo umount /mnt/test ``` 2. Gradual Return to Service ```bash # Mount read-only first sudo mount -o ro /dev/sdX1 /mount/point # Test extensively before read-write sudo mount -o remount,rw /mount/point ``` 3. Monitor After Repair ```bash # Watch for recurring issues sudo dmesg -w | grep -i xfs # Regular filesystem checks sudo xfs_repair -n /dev/sdX1 # Weekly or monthly ``` Prevention Strategies Regular Maintenance 1. Scheduled Filesystem Checks ```bash # Add to crontab for monthly checks 0 2 1 /usr/sbin/xfs_repair -n /dev/sdX1 > /var/log/xfs_check.log 2>&1 ``` 2. Monitor Filesystem Health ```bash # Check filesystem statistics sudo xfs_info /mount/point # Monitor space usage df -h /mount/point # Check for fragmentation sudo xfs_db -r -c "frag" /dev/sdX1 ``` Hardware Considerations 1. UPS Systems: Protect against power failures 2. RAID Configuration: Implement appropriate redundancy 3. Regular Hardware Testing: Monitor SMART data, run memory tests 4. Quality Components: Use enterprise-grade storage devices System Configuration 1. Proper Mount Options ```bash # Example /etc/fstab entry with safety options /dev/sdX1 /mount/point xfs defaults,noatime,largeio,inode64 0 2 ``` 2. Kernel Parameters ```bash # Optimize for XFS echo 'vm.dirty_ratio = 5' >> /etc/sysctl.conf echo 'vm.dirty_background_ratio = 2' >> /etc/sysctl.conf ``` 3. Regular Backups ```bash # Automated backup script example #!/bin/bash rsync -avH /mount/point/ /backup/location/ xfsdump -f /backup/xfs_dump.img /mount/point ``` Conclusion Repairing XFS filesystems using the `xfs_repair` command is a critical skill for Linux system administrators and users working with XFS storage systems. This comprehensive guide has covered the complete process from initial assessment through successful recovery, including advanced techniques for handling complex corruption scenarios. Key Takeaways 1. Safety First: Always create backups and perform read-only checks before attempting repairs 2. Systematic Approach: Follow the structured repair process to maximize success chances 3. Understanding is Crucial: Knowing XFS architecture helps in making informed repair decisions 4. Prevention is Better: Implement proper monitoring and maintenance to prevent corruption 5. Documentation Matters: Keep detailed logs of all repair operations for future reference Next Steps After successfully repairing an XFS filesystem: 1. Implement Monitoring: Set up automated checks to detect future issues early 2. Review Backup Strategy: Ensure your backup procedures are adequate and tested 3. Hardware Assessment: Investigate and address any underlying hardware issues 4. Knowledge Building: Continue learning about XFS administration and advanced recovery techniques 5. Documentation Update: Update your disaster recovery procedures based on lessons learned When to Seek Professional Help Consider professional data recovery services when: - Multiple repair attempts fail - Critical data cannot be replaced - Hardware failure is suspected - Corruption affects multiple filesystems - Legal or compliance requirements mandate professional recovery Remember that `xfs_repair` is a powerful tool that can recover most XFS filesystem corruption scenarios when used correctly. However, it's not a substitute for proper backup strategies and preventive maintenance. Regular monitoring, quality hardware, and good system administration practices remain the best defense against filesystem corruption. By following the procedures and best practices outlined in this guide, you'll be well-equipped to handle XFS filesystem repairs confidently and effectively, minimizing downtime and data loss in critical situations.