How to repair XFS → xfs_repair /dev/..
How to Repair XFS Filesystems Using xfs_repair Command
Table of Contents
1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Understanding XFS Filesystem](#understanding-xfs-filesystem)
4. [XFS Repair Command Overview](#xfs-repair-command-overview)
5. [Pre-Repair Assessment](#pre-repair-assessment)
6. [Step-by-Step XFS Repair Process](#step-by-step-xfs-repair-process)
7. [Advanced Repair Options](#advanced-repair-options)
8. [Practical Examples](#practical-examples)
9. [Troubleshooting Common Issues](#troubleshooting-common-issues)
10. [Best Practices](#best-practices)
11. [Prevention Strategies](#prevention-strategies)
12. [Conclusion](#conclusion)
Introduction
XFS (eXtended File System) is a high-performance journaling filesystem originally developed by Silicon Graphics Inc. (SGI) and widely used in enterprise Linux environments. Despite its robustness and advanced features, XFS filesystems can occasionally become corrupted due to hardware failures, power outages, or system crashes. When corruption occurs, the `xfs_repair` utility becomes an essential tool for recovering and restoring filesystem integrity.
This comprehensive guide will walk you through the complete process of diagnosing and repairing XFS filesystems using the `xfs_repair` command. You'll learn how to identify corruption, perform safe repairs, handle complex scenarios, and implement preventive measures to minimize future filesystem issues.
Prerequisites
Before proceeding with XFS filesystem repairs, ensure you have:
System Requirements
- Linux system with XFS utilities installed
- Root or sudo privileges
- Sufficient free disk space for backup operations
- Access to the affected storage device
Required Packages
Install the necessary XFS utilities on your system:
For Ubuntu/Debian systems:
```bash
sudo apt update
sudo apt install xfsprogs
```
For Red Hat/CentOS/Fedora systems:
```bash
sudo yum install xfsprogs
or for newer versions
sudo dnf install xfsprogs
```
Essential Knowledge
- Basic Linux command-line operations
- Understanding of filesystem concepts
- Familiarity with device naming conventions (/dev/sdX, /dev/nvmeX)
- Knowledge of backup and recovery procedures
Safety Preparations
- Critical: Always create a complete backup before attempting repairs
- Ensure the filesystem is unmounted
- Have a rescue system or live USB available
- Document current system configuration
Understanding XFS Filesystem
XFS Architecture
XFS employs several key components that make it robust but also require specific repair approaches:
- Superblocks: Contain filesystem metadata and configuration
- Allocation Groups (AGs): Divide the filesystem into manageable sections
- Journals: Record filesystem changes for crash recovery
- B+ Trees: Organize directory and extent information
Common Corruption Causes
Understanding why XFS corruption occurs helps in both repair and prevention:
1. Hardware Failures: Bad sectors, failing drives, memory errors
2. Power Issues: Sudden shutdowns, power fluctuations
3. System Crashes: Kernel panics, forced reboots
4. Storage Controller Problems: RAID controller failures, cable issues
5. Software Bugs: Kernel bugs, driver issues
XFS Repair Command Overview
Basic Syntax
```bash
xfs_repair [options] device
```
Essential Options
| Option | Description |
|--------|-------------|
| `-n` | No-modify mode (read-only check) |
| `-v` | Verbose output |
| `-f` | Force repair on dirty filesystem |
| `-L` | Force log zeroing |
| `-d` | Dangerous mode (use with extreme caution) |
| `-m` | Specify maximum memory usage |
| `-t` | Set timeout values |
Safety Features
The `xfs_repair` utility includes several safety mechanisms:
- Refuses to run on mounted filesystems
- Creates detailed logs of all operations
- Provides dry-run capabilities with `-n` option
- Validates repairs before committing changes
Pre-Repair Assessment
Step 1: Identify the Problem
Before running repairs, determine the extent of corruption:
```bash
Check system logs for filesystem errors
sudo dmesg | grep -i xfs
sudo journalctl | grep -i xfs
Look for specific error patterns
sudo grep -i "xfs\|corruption\|metadata" /var/log/messages
```
Step 2: Unmount the Filesystem
Ensure the filesystem is completely unmounted:
```bash
Check if filesystem is mounted
mount | grep /dev/sdX1
Unmount if necessary
sudo umount /dev/sdX1
Force unmount if regular unmount fails
sudo umount -f /dev/sdX1
Kill processes using the filesystem if needed
sudo fuser -km /mount/point
sudo umount /dev/sdX1
```
Step 3: Perform Read-Only Check
Always start with a non-destructive assessment:
```bash
Perform read-only check
sudo xfs_repair -n /dev/sdX1
Save output for analysis
sudo xfs_repair -n -v /dev/sdX1 > xfs_check_output.txt 2>&1
```
Step 4: Analyze Check Results
Review the output to understand corruption severity:
```bash
Look for key indicators
grep -i "error\|corrupt\|bad\|inconsistent" xfs_check_output.txt
Count total errors found
grep -c "ERROR" xfs_check_output.txt
```
Step-by-Step XFS Repair Process
Phase 1: Initial Safety Checks
Step 1: Verify Device Accessibility
```bash
Confirm device exists and is accessible
ls -la /dev/sdX1
Check device information
sudo fdisk -l /dev/sdX
Verify XFS filesystem
sudo file -s /dev/sdX1
```
Step 2: Create Emergency Backup
If possible, create a low-level backup:
```bash
Create device image (if space permits)
sudo dd if=/dev/sdX1 of=/backup/location/filesystem_backup.img bs=64K
Or create compressed backup
sudo dd if=/dev/sdX1 bs=64K | gzip > /backup/location/filesystem_backup.img.gz
```
Phase 2: Execute Repair Process
Step 1: Standard Repair Attempt
Begin with the most conservative repair approach:
```bash
Standard repair with verbose output
sudo xfs_repair -v /dev/sdX1
```
Step 2: Monitor Repair Progress
The repair process provides detailed progress information:
```bash
Typical output phases:
Phase 1 - find and verify superblock
Phase 2 - using internal log
Phase 3 - for each AG
Phase 4 - check for duplicate blocks
Phase 5 - rebuild AG headers and trees
Phase 6 - check inode connectivity
Phase 7 - verify and correct link counts
```
Step 3: Handle Log Issues
If log corruption is detected:
```bash
Zero the log (destroys recent changes)
sudo xfs_repair -L /dev/sdX1
This should be followed by standard repair
sudo xfs_repair -v /dev/sdX1
```
Phase 3: Post-Repair Verification
Step 1: Verify Repair Success
```bash
Perform final read-only check
sudo xfs_repair -n /dev/sdX1
Mount filesystem in read-only mode first
sudo mount -o ro /dev/sdX1 /mnt/test
Check filesystem statistics
sudo xfs_info /mnt/test
```
Step 2: Test Filesystem Functionality
```bash
Mount read-write
sudo mount -o rw /dev/sdX1 /mnt/test
Test basic operations
ls -la /mnt/test
touch /mnt/test/test_file
rm /mnt/test/test_file
Check filesystem health
sudo xfs_db -r -c "check" /dev/sdX1
```
Advanced Repair Options
Handling Severe Corruption
Using Dangerous Mode
Warning: Only use when standard repair fails and you accept data loss risk:
```bash
Enable dangerous mode repairs
sudo xfs_repair -d /dev/sdX1
```
Memory Management
For large filesystems or systems with limited RAM:
```bash
Limit memory usage (in megabytes)
sudo xfs_repair -m 512 /dev/sdX1
For very large filesystems
sudo xfs_repair -m 2048 -v /dev/sdX1
```
Superblock Recovery
When primary superblock is corrupted:
```bash
Find backup superblocks
sudo xfs_db -r -c "sb 1" -c "print" /dev/sdX1
Use specific superblock for repair
sudo xfs_repair -s 1 /dev/sdX1
```
Dealing with Specific Error Types
Inode Corruption
```bash
Check inode-specific issues
sudo xfs_db -r -c "inode " -c "print" /dev/sdX1
Repair with inode rebuilding
sudo xfs_repair -i /dev/sdX1
```
Directory Corruption
```bash
Rebuild directory structures
sudo xfs_repair -d -v /dev/sdX1
```
Practical Examples
Example 1: Standard Filesystem Corruption
Scenario: System crashed during heavy I/O operations, XFS filesystem shows corruption errors.
```bash
Step 1: Check system logs
sudo dmesg | tail -50 | grep -i xfs
Output shows: XFS (sdb1): metadata I/O error
Step 2: Unmount filesystem
sudo umount /dev/sdb1
Step 3: Assess damage
sudo xfs_repair -n /dev/sdb1
Output shows: Phase 1 - find and verify superblock...
superblock has bad magic number 0x0 at sector 64
Step 4: Attempt repair
sudo xfs_repair -v /dev/sdb1
Output shows: Phase 1 - find and verify superblock...
attempting to find secondary superblock...
found candidate secondary superblock...
Step 5: Verify repair
sudo xfs_repair -n /dev/sdb1
Output shows: no inconsistencies found
Step 6: Test mount
sudo mount /dev/sdb1 /mnt/recovered
ls -la /mnt/recovered
```
Example 2: Log Corruption Recovery
Scenario: Power failure during filesystem operations resulted in log corruption.
```bash
Step 1: Initial check reveals log issues
sudo xfs_repair -n /dev/sdc1
Output shows: log head/tail verification failed
Step 2: Zero the log
sudo xfs_repair -L /dev/sdc1
Output shows: Phase 1 - find and verify superblock...
zeroing log...
Step 3: Complete repair
sudo xfs_repair -v /dev/sdc1
Output shows all phases completing successfully
Step 4: Verify integrity
sudo mount /dev/sdc1 /mnt/test
sudo xfs_info /mnt/test
```
Example 3: Large Filesystem with Memory Constraints
Scenario: Repairing a 10TB XFS filesystem on a system with limited RAM.
```bash
Step 1: Check available memory
free -h
Step 2: Repair with memory limits
sudo xfs_repair -m 1024 -v /dev/sdd1
Limits repair process to 1GB RAM usage
Step 3: Monitor progress
Repair will take longer but won't exhaust system memory
Step 4: Verify completion
sudo xfs_repair -n /dev/sdd1
```
Troubleshooting Common Issues
Issue 1: "Device is Busy" Error
Problem: Cannot unmount filesystem for repair.
Solution:
```bash
Find processes using the filesystem
sudo lsof +f -- /mount/point
sudo fuser -v /mount/point
Kill processes if safe to do so
sudo fuser -km /mount/point
Force unmount
sudo umount -f /dev/sdX1
If still failing, use lazy unmount
sudo umount -l /dev/sdX1
```
Issue 2: "Bad Magic Number" in Superblock
Problem: Primary superblock is severely corrupted.
Solution:
```bash
Find backup superblocks
sudo xfs_db -r -c "sb" -c "print" /dev/sdX1
List all superblocks
sudo xfs_db -r -c "sb 0" -c "sb 1" -c "sb 2" -c "print" /dev/sdX1
Use backup superblock for repair
sudo xfs_repair -s 1 /dev/sdX1
```
Issue 3: Repair Process Runs Out of Memory
Problem: System becomes unresponsive during repair of large filesystem.
Solution:
```bash
Restart repair with memory limits
sudo xfs_repair -m 512 /dev/sdX1
For extremely large filesystems
sudo xfs_repair -m 256 /dev/sdX1
Monitor memory usage during repair
watch -n 5 'free -h && ps aux | grep xfs_repair'
```
Issue 4: "Filesystem has Valuable Metadata Changes"
Problem: Repair refuses to run due to recent changes in log.
Solution:
```bash
First, try to mount and unmount cleanly
sudo mount /dev/sdX1 /mnt/temp
sudo umount /mnt/temp
If that fails, force log zeroing (data loss possible)
sudo xfs_repair -L /dev/sdX1
Then run standard repair
sudo xfs_repair -v /dev/sdX1
```
Issue 5: Repair Completes but Filesystem Still Shows Errors
Problem: After repair, mounting still fails or errors persist.
Solution:
```bash
Run multiple repair passes
sudo xfs_repair -v /dev/sdX1
sudo xfs_repair -v /dev/sdX1 # Second pass
Check for hardware issues
sudo badblocks -v /dev/sdX1
Verify filesystem structure
sudo xfs_db -r -c "check" /dev/sdX1
If errors persist, consider data recovery tools
sudo xfs_metadump /dev/sdX1 metadata_dump.img
```
Best Practices
Pre-Repair Best Practices
1. Always Backup First
```bash
# Create complete device backup if possible
sudo dd if=/dev/sdX1 of=/backup/device_backup.img bs=64K status=progress
```
2. Document Everything
```bash
# Save all command outputs
sudo xfs_repair -n /dev/sdX1 > repair_assessment.log 2>&1
sudo xfs_repair -v /dev/sdX1 > repair_process.log 2>&1
```
3. Check Hardware Health
```bash
# Test for bad blocks
sudo badblocks -v /dev/sdX1
# Check SMART status
sudo smartctl -a /dev/sdX
```
During Repair Best Practices
1. Use Verbose Output
```bash
# Always use -v flag for detailed information
sudo xfs_repair -v /dev/sdX1
```
2. Monitor System Resources
```bash
# Watch memory and CPU usage
watch -n 5 'free -h && iostat 1 1'
```
3. Be Patient
- Large filesystems can take hours to repair
- Don't interrupt the process unless absolutely necessary
- Monitor progress through verbose output
Post-Repair Best Practices
1. Thorough Verification
```bash
# Multiple verification steps
sudo xfs_repair -n /dev/sdX1
sudo mount -o ro /dev/sdX1 /mnt/test
sudo xfs_info /mnt/test
sudo umount /mnt/test
```
2. Gradual Return to Service
```bash
# Mount read-only first
sudo mount -o ro /dev/sdX1 /mount/point
# Test extensively before read-write
sudo mount -o remount,rw /mount/point
```
3. Monitor After Repair
```bash
# Watch for recurring issues
sudo dmesg -w | grep -i xfs
# Regular filesystem checks
sudo xfs_repair -n /dev/sdX1 # Weekly or monthly
```
Prevention Strategies
Regular Maintenance
1. Scheduled Filesystem Checks
```bash
# Add to crontab for monthly checks
0 2 1 /usr/sbin/xfs_repair -n /dev/sdX1 > /var/log/xfs_check.log 2>&1
```
2. Monitor Filesystem Health
```bash
# Check filesystem statistics
sudo xfs_info /mount/point
# Monitor space usage
df -h /mount/point
# Check for fragmentation
sudo xfs_db -r -c "frag" /dev/sdX1
```
Hardware Considerations
1. UPS Systems: Protect against power failures
2. RAID Configuration: Implement appropriate redundancy
3. Regular Hardware Testing: Monitor SMART data, run memory tests
4. Quality Components: Use enterprise-grade storage devices
System Configuration
1. Proper Mount Options
```bash
# Example /etc/fstab entry with safety options
/dev/sdX1 /mount/point xfs defaults,noatime,largeio,inode64 0 2
```
2. Kernel Parameters
```bash
# Optimize for XFS
echo 'vm.dirty_ratio = 5' >> /etc/sysctl.conf
echo 'vm.dirty_background_ratio = 2' >> /etc/sysctl.conf
```
3. Regular Backups
```bash
# Automated backup script example
#!/bin/bash
rsync -avH /mount/point/ /backup/location/
xfsdump -f /backup/xfs_dump.img /mount/point
```
Conclusion
Repairing XFS filesystems using the `xfs_repair` command is a critical skill for Linux system administrators and users working with XFS storage systems. This comprehensive guide has covered the complete process from initial assessment through successful recovery, including advanced techniques for handling complex corruption scenarios.
Key Takeaways
1. Safety First: Always create backups and perform read-only checks before attempting repairs
2. Systematic Approach: Follow the structured repair process to maximize success chances
3. Understanding is Crucial: Knowing XFS architecture helps in making informed repair decisions
4. Prevention is Better: Implement proper monitoring and maintenance to prevent corruption
5. Documentation Matters: Keep detailed logs of all repair operations for future reference
Next Steps
After successfully repairing an XFS filesystem:
1. Implement Monitoring: Set up automated checks to detect future issues early
2. Review Backup Strategy: Ensure your backup procedures are adequate and tested
3. Hardware Assessment: Investigate and address any underlying hardware issues
4. Knowledge Building: Continue learning about XFS administration and advanced recovery techniques
5. Documentation Update: Update your disaster recovery procedures based on lessons learned
When to Seek Professional Help
Consider professional data recovery services when:
- Multiple repair attempts fail
- Critical data cannot be replaced
- Hardware failure is suspected
- Corruption affects multiple filesystems
- Legal or compliance requirements mandate professional recovery
Remember that `xfs_repair` is a powerful tool that can recover most XFS filesystem corruption scenarios when used correctly. However, it's not a substitute for proper backup strategies and preventive maintenance. Regular monitoring, quality hardware, and good system administration practices remain the best defense against filesystem corruption.
By following the procedures and best practices outlined in this guide, you'll be well-equipped to handle XFS filesystem repairs confidently and effectively, minimizing downtime and data loss in critical situations.