How to Recover Linux from Disaster
Linux systems are robust, but they are not immune to disaster. Hardware failures, corrupted filesystems, accidental deletions, and system crashes all happen, and knowing how to recover from them matters to system administrators and users alike. This guide walks through common disaster recovery scenarios with step-by-step instructions and practical examples for restoring a Linux system to full functionality.
Table of Contents
1. [Understanding Linux Disaster Scenarios](#understanding-linux-disaster-scenarios)
2. [Prerequisites and Preparation](#prerequisites-and-preparation)
3. [Boot Recovery Methods](#boot-recovery-methods)
4. [Filesystem Recovery Techniques](#filesystem-recovery-techniques)
5. [Data Recovery Strategies](#data-recovery-strategies)
6. [System Configuration Recovery](#system-configuration-recovery)
7. [Network and Service Recovery](#network-and-service-recovery)
8. [Advanced Recovery Techniques](#advanced-recovery-techniques)
9. [Troubleshooting Common Issues](#troubleshooting-common-issues)
10. [Best Practices and Prevention](#best-practices-and-prevention)
Understanding Linux Disaster Scenarios
Before diving into recovery procedures, it's essential to understand the types of disasters that can affect Linux systems:
Hardware-Related Disasters
- Hard drive failures: Complete disk failure or bad sectors
- Memory corruption: RAM issues causing system instability
- Power supply problems: Sudden shutdowns leading to filesystem corruption
- Motherboard failures: Complete system hardware breakdown
Software-Related Disasters
- Bootloader corruption: GRUB or other bootloader issues preventing system startup
- Kernel panics: System crashes due to kernel-level problems
- Filesystem corruption: Damaged file structures preventing normal operation
- Configuration file corruption: Critical system files becoming unreadable
Human Error Disasters
- Accidental deletions: Removing critical system files or user data
- Incorrect permissions: Changing file permissions that break system functionality
- Misconfigured services: Breaking essential system services through poor configuration
Security-Related Disasters
- Malware infections: Viruses or rootkits compromising system integrity
- Unauthorized access: System compromise through security breaches
- Data breaches: Sensitive information exposure requiring system recovery
Prerequisites and Preparation
Essential Tools and Resources
Before attempting any recovery procedure, ensure you have access to:
Recovery Media
```bash
# Create a bootable USB drive with a Linux distribution
# WARNING: replace /dev/sdX with your USB device (check with lsblk); dd overwrites the target
sudo dd if=ubuntu-20.04.3-desktop-amd64.iso of=/dev/sdX bs=4M status=progress
sync
```
Backup Verification
```bash
# Verify backup integrity before disaster strikes
tar -tzf backup.tar.gz > /dev/null
echo $? # Should return 0 for successful verification
```
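Listing an archive with `tar -t` only proves that it is still readable. A stored checksum additionally detects silent changes to the file. The following is a minimal sketch of that idea, assuming GNU `sha256sum` is available; the function names `record_checksum` and `verify_backup` are hypothetical and not part of any tool.

```bash
# Write <archive>.sha256 next to the archive (run right after creating the backup).
record_checksum() {
    local archive=$1
    ( cd "$(dirname "$archive")" &&
      sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256" )
}

# Return 0 only if the checksum matches AND tar can list every member.
verify_backup() {
    local archive=$1
    ( cd "$(dirname "$archive")" &&
      sha256sum --check --quiet "$(basename "$archive").sha256" ) || return 1
    tar -tzf "$archive" > /dev/null || return 1
}
```

Run `record_checksum backup.tar.gz` when the backup is taken and `verify_backup backup.tar.gz` on a schedule; a non-zero exit status means the archive should not be trusted for recovery.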
Network Access
Ensure you have alternative internet connectivity for downloading tools and accessing documentation during recovery.
Documentation
Maintain offline copies of:
- System configuration details
- Network settings
- User account information
- Installed package lists
- Custom application configurations
Pre-Disaster Preparation
System Information Gathering
```bash
# Document system information
uname -a > system_info.txt
lscpu >> system_info.txt
lsblk >> system_info.txt
df -h >> system_info.txt
mount >> system_info.txt
```
Package List Backup
```bash
# Debian/Ubuntu systems
dpkg --get-selections > package_list.txt
# Red Hat/CentOS/Fedora systems
rpm -qa > package_list.txt
# Arch Linux systems
pacman -Qqe > package_list.txt
```
Boot Recovery Methods
GRUB Recovery
When your system won't boot due to GRUB issues, follow these steps:
Method 1: GRUB Rescue Mode
```bash
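# If you see the "grub rescue>" prompt, list the available disks and partitions
grub rescue> ls
# Point GRUB at the partition that holds /boot (adjust (hd0,1) to match the ls output)
grub rescue> set root=(hd0,1)
grub rescue> set prefix=(hd0,1)/boot/grub
grub rescue> insmod normal
grub rescue> normal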
```
Method 2: Live CD GRUB Reinstallation
```bash
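# Boot from live CD/USB, then mount the installed root filesystem
sudo mount /dev/sda1 /mnt
# On UEFI systems, also mount the EFI system partition at /mnt/boot/efi
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo chroot /mnt
# Reinstall GRUB (these commands run inside the chroot)
grub-install /dev/sda
update-grub   # Debian/Ubuntu; on other distributions: grub2-mkconfig -o /boot/grub2/grub.cfg
exit
# Unmount in reverse order and reboot
sudo umount /mnt/sys
sudo umount /mnt/proc
sudo umount /mnt/dev
sudo umount /mnt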
```
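Because the mount and unmount order matters, it can help to rehearse the sequence before running it on a damaged system. The sketch below wraps the same steps in a dry-run helper that only prints commands unless told otherwise. `DRY_RUN`, `run`, `prepare_chroot`, and `teardown_chroot` are hypothetical names chosen for this example.

```bash
# Print commands by default; set DRY_RUN=0 (as root) to actually execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount the root device and bind the virtual filesystems the chroot needs.
prepare_chroot() {
    local root_dev=$1 target=$2 fs
    run mount "$root_dev" "$target"
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$target/$fs"
    done
}

# Unmount in reverse order so nested mounts are released first.
teardown_chroot() {
    local target=$1 fs
    for fs in sys proc dev; do
        run umount "$target/$fs"
    done
    run umount "$target"
}
```

Calling `prepare_chroot /dev/sda1 /mnt` with the default `DRY_RUN=1` prints the four mount commands so you can check device names before committing to them.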
Single User Mode Recovery
Access single-user mode for system repairs:
```bash
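# At the GRUB menu, press 'e' and append ONE of these to the line starting with "linux":
#   single
#   init=/bin/bash
# Boot the edited entry with Ctrl+X or F10.
# Once in single-user mode, remount the root filesystem read-write:
mount -o remount,rw /
# Perform necessary repairs
# Reboot when complete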
```
systemd Emergency Mode
For systemd-based systems:
```bash
# Add to the kernel parameters for a minimal emergency shell:
#   systemd.unit=emergency.target
# Or use rescue mode (local filesystems mounted, basic services started):
#   systemd.unit=rescue.target
```
Filesystem Recovery Techniques
Filesystem Check and Repair
ext2/ext3/ext4 Filesystems
```bash
# Unmount the filesystem first
sudo umount /dev/sda1
# Check and repair automatically
sudo fsck -y /dev/sda1
# For more serious corruption:
sudo e2fsck -f -y /dev/sda1
# Force check even if filesystem appears clean:
sudo e2fsck -f /dev/sda1
```
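The exit status of `fsck` is a bitmask, so a single number can report several conditions at once (for example, 5 means errors were corrected and others remain). The helper below is a small sketch that decodes the values documented in the `fsck(8)` manual page; the function name `describe_fsck_status` is hypothetical.

```bash
# Decode an fsck exit status into one message per bit that is set.
describe_fsck_status() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ]   && echo "filesystem errors corrected"
    [ $((status & 2)) -ne 0 ]   && echo "system should be rebooted"
    [ $((status & 4)) -ne 0 ]   && echo "filesystem errors left uncorrected"
    [ $((status & 8)) -ne 0 ]   && echo "operational error"
    [ $((status & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ]  && echo "checking canceled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}
```

A typical use is `sudo fsck -y /dev/sda1; describe_fsck_status $?`, which makes it obvious whether a second pass or a reboot is needed.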
XFS Filesystem Recovery
```bash
# XFS repair (filesystem must be unmounted)
sudo umount /dev/sda1
sudo xfs_repair /dev/sda1
# For more aggressive repair:
sudo xfs_repair -L /dev/sda1 # Zeroes log, use as last resort
```
Btrfs Filesystem Recovery
```bash
# Check Btrfs filesystem (read-only check; the filesystem must be unmounted)
sudo btrfs check /dev/sda1
# Repair Btrfs (dangerous, backup first)
sudo btrfs check --repair /dev/sda1
# Scrub for error detection and repair (runs on a mounted filesystem)
sudo btrfs scrub start /mount/point
sudo btrfs scrub status /mount/point
```
Advanced Filesystem Recovery
Using TestDisk for Partition Recovery
```bash
# Install TestDisk
sudo apt-get install testdisk
# Run TestDisk
sudo testdisk
# Follow interactive prompts to:
#   1. Select disk
#   2. Choose partition table type
#   3. Analyze partition structure
#   4. Search for lost partitions
#   5. Write partition table
```
Recovering Deleted Files with PhotoRec
```bash
# PhotoRec comes with TestDisk
sudo photorec
# Select:
#   1. Physical disk
#   2. Partition type
#   3. File types to recover
#   4. Destination directory (on a different disk than the one being recovered)
#   5. Start recovery process
```
Data Recovery Strategies
Using ddrescue for Disk Imaging
Create a bit-by-bit copy of a failing drive:
```bash
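# Install ddrescue (packaged as gddrescue on Debian/Ubuntu)
sudo apt-get install gddrescue
# Create disk image with error handling; the map file lets an interrupted run resume
sudo ddrescue -d -r3 /dev/sda /path/to/recovery/disk_image.img /path/to/recovery/logfile.log
# Attach the image and scan it for partitions (prints the loop device, e.g. /dev/loop0)
sudo losetup --find --show --partscan /path/to/recovery/disk_image.img
# Mount the first partition of the recovered image read-only
sudo mount -o ro /dev/loop0p1 /mnt/recovered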
```
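An image of a whole disk needs at least as much free space as the disk is large, and running out of room halfway through wastes time on a drive that may not survive many more reads. As a rough sketch, a pre-flight check might look like the following; `has_free_space` is a hypothetical helper, and it relies only on POSIX `df -Pk` output.

```bash
# Return 0 if the filesystem holding <dir> has at least <bytes> available.
has_free_space() {
    local needed_bytes=$1 dir=$2 avail_kib
    # Column 4 of "df -Pk" is the available space in KiB.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge $(( (needed_bytes + 1023) / 1024 )) ]
}
```

For example, `has_free_space "$(sudo blockdev --getsize64 /dev/sda)" /path/to/recovery || echo "Not enough space for the image"` compares the source disk size against the destination before `ddrescue` is started.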
File Recovery with Extundelete
Recover deleted files from ext3/ext4 filesystems:
```bash
# Install extundelete
sudo apt-get install extundelete
# Unmount the filesystem
sudo umount /dev/sda1
# Recover all deleted files (output goes to ./RECOVERED_FILES)
sudo extundelete /dev/sda1 --restore-all
# Recover specific file
sudo extundelete /dev/sda1 --restore-file /path/to/deleted/file.txt
# Recover files deleted after specific date (--after takes seconds since the epoch)
sudo extundelete /dev/sda1 --restore-all --after $(date -d "2023-01-01" +%s)
```
Database Recovery
MySQL/MariaDB Recovery
```bash
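# Start MySQL in recovery mode (no authentication, no network access)
sudo mysqld_safe --skip-grant-tables --skip-networking &
# Connect and reset the root password if needed
mysql -u root
# At the mysql> prompt (PASSWORD() was removed in MySQL 8.0, so use ALTER USER):
#   FLUSH PRIVILEGES;
#   ALTER USER 'root'@'localhost' IDENTIFIED BY 'newpassword';
# Repair tables (MyISAM and similar engines; InnoDB does not support REPAIR)
mysqlcheck --repair --all-databases
# InnoDB recovery
sudo systemctl stop mysql
# Edit /etc/mysql/my.cnf and add under [mysqld]:
#   innodb_force_recovery = 1
sudo systemctl start mysql
# Export data with mysqldump, recreate the database, then remove the setting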
```
PostgreSQL Recovery
```bash
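# Single-user mode recovery (stop the server first; pass the database to open)
sudo -u postgres postgres --single -D /var/lib/postgresql/data your_database
# VACUUM and REINDEX at the backend> prompt:
#   VACUUM FULL;
#   REINDEX DATABASE your_database;
# WAL reset: last resort only, because it discards WAL and can lose recent transactions
sudo -u postgres pg_resetwal /var/lib/postgresql/data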
```
System Configuration Recovery
Restoring from Configuration Backups
Using etckeeper for /etc Recovery
```bash
# If etckeeper was previously set up
cd /etc
sudo git log --oneline # View configuration history
sudo git checkout HEAD~5 # Restore to 5 commits ago
sudo git checkout master # Return to current state
```
Manual Configuration Restoration
```bash
# Restore network configuration
sudo cp /backup/etc/network/interfaces /etc/network/interfaces
sudo systemctl restart networking
# Restore user accounts
sudo cp /backup/etc/passwd /etc/passwd
sudo cp /backup/etc/shadow /etc/shadow
sudo cp /backup/etc/group /etc/group
# Restore SSH configuration
sudo cp /backup/etc/ssh/sshd_config /etc/ssh/sshd_config
sudo systemctl restart sshd
```
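Copying a backup straight over a live file destroys whatever was there, which makes it hard to back out if the backup turns out to be stale. One possible safeguard, sketched here with a hypothetical `restore_file` helper, is to keep a timestamped copy of the current file before overwriting it.

```bash
# Copy <src> over <dst>, first saving the existing <dst> as <dst>.pre-restore.<timestamp>.
restore_file() {
    local src=$1 dst=$2 stamp
    stamp=$(date +%Y%m%d%H%M%S)
    if [ ! -f "$src" ]; then
        echo "missing backup: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        cp -p "$dst" "$dst.pre-restore.$stamp" || return 1
    fi
    # -p preserves mode and timestamps; ownership is preserved when run as root.
    cp -p "$src" "$dst"
}
```

Run as root, `restore_file /backup/etc/ssh/sshd_config /etc/ssh/sshd_config` behaves like the `cp` commands above but leaves the replaced file in place for comparison.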
Service Configuration Recovery
systemd Service Recovery
```bash
# Restore service files
sudo cp /backup/etc/systemd/system/* /etc/systemd/system/
sudo systemctl daemon-reload
# Re-enable services
sudo systemctl enable apache2
sudo systemctl enable mysql
sudo systemctl enable ssh
# Check service status
sudo systemctl status --all
```
Crontab Recovery
```bash
# Restore system crontab
sudo cp /backup/etc/crontab /etc/crontab
# Restore user crontabs
sudo cp /backup/var/spool/cron/crontabs/* /var/spool/cron/crontabs/
sudo systemctl restart cron
```
Network and Service Recovery
Network Configuration Recovery
Static IP Configuration
```bash
# Ubuntu/Debian (ifupdown): add to /etc/network/interfaces
#   auto eth0
#   iface eth0 inet static
#       address 192.168.1.100
#       netmask 255.255.255.0
#       gateway 192.168.1.1
#       dns-nameservers 8.8.8.8 8.8.4.4
# Restart networking
sudo systemctl restart networking
```
NetworkManager Recovery
```bash
# Restart NetworkManager
sudo systemctl restart NetworkManager
# Reset network configuration (deletes every saved connection profile; back them up first)
sudo rm /etc/NetworkManager/system-connections/*
sudo systemctl restart NetworkManager
# Reconfigure network through nmtui
sudo nmtui
```
DNS Resolution Recovery
```bash
# Restore /etc/resolv.conf (on systemd-resolved hosts this file is a managed symlink and may be rewritten)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 8.8.4.4" | sudo tee -a /etc/resolv.conf
# Test DNS resolution
nslookup google.com
dig google.com
```
Advanced Recovery Techniques
LVM Recovery
Recovering LVM Metadata
```bash
# Scan for LVM volumes
sudo pvscan
sudo vgscan
sudo lvscan
# Activate volume groups
sudo vgchange -ay
# Restore LVM metadata from backup (list available backups with: vgcfgrestore --list volume_group_name)
sudo vgcfgrestore volume_group_name
# Mount recovered logical volumes
sudo mount /dev/volume_group/logical_volume /mnt/recovery
```
RAID Recovery
Software RAID Recovery
```bash
# Check RAID status
cat /proc/mdstat
# Stop failed RAID
sudo mdadm --stop /dev/md0
# Reassemble RAID
sudo mdadm --assemble --scan
# Force assembly with missing drives
sudo mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1
# Add replacement drive
sudo mdadm --add /dev/md0 /dev/sdc1
```
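In `/proc/mdstat`, each array's status line ends with a member map such as `[UU]`, where an underscore marks a missing or failed member. The sketch below scans for that marker and prints the names of degraded arrays. `degraded_arrays` is a hypothetical helper, and it assumes the usual two-line layout in which the `mdN :` line precedes the status line.

```bash
# Print the name of every md array whose member map contains "_" (a missing member).
# Reads /proc/mdstat by default; pass another file to test against saved output.
degraded_arrays() {
    local mdstat=${1:-/proc/mdstat}
    awk '
        /^md[^ ]* :/ { name = $1 }
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    ' "$mdstat"
}
```

Running `degraded_arrays` from cron and alerting on non-empty output gives early warning before a second drive fails.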
Encrypted Filesystem Recovery
LUKS Recovery
```bash
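# Check LUKS header
sudo cryptsetup luksDump /dev/sda1
# Back up the header before attempting any repair (placeholder path)
sudo cryptsetup luksHeaderBackup /dev/sda1 --header-backup-file /path/to/luks_header.img
# Repair LUKS header
sudo cryptsetup repair /dev/sda1
# Open encrypted volume
sudo cryptsetup luksOpen /dev/sda1 encrypted_volume
# Mount decrypted filesystem
sudo mount /dev/mapper/encrypted_volume /mnt/encrypted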
```
Troubleshooting Common Issues
Boot Issues
Kernel Panic Resolution
```bash
# Boot with a previous kernel version from the GRUB menu,
# or add these kernel parameters to rule out graphics and ACPI problems:
#   nomodeset acpi=off
# Check system logs after boot
journalctl -b -1 # Previous boot (requires a persistent journal)
dmesg | less # Current boot messages
```
Initramfs Issues
```bash
# Regenerate initramfs (Debian/Ubuntu)
sudo update-initramfs -u -k all
# For specific kernel version
sudo update-initramfs -u -k 5.4.0-74-generic
```
Filesystem Issues
Read-Only Filesystem
```bash
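# Find out why the kernel remounted the filesystem read-only
sudo dmesg | grep -i -E 'remount|I/O error|EXT4-fs error'
# Remount as read-write (temporary; errors usually recur until the filesystem is repaired)
sudo mount -o remount,rw /
# Check for filesystem errors from rescue or live media, with the filesystem unmounted
sudo fsck -f /dev/sda1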
```
Inode Exhaustion
```bash
# Check inode usage
df -i
# Find directories with many files (the path is passed as an argument, so unusual names are safe)
find /path -xdev -type d -exec sh -c 'printf "%s: " "$1"; ls -1A "$1" | wc -l' _ {} \;
# Clean up unnecessary files
sudo find /tmp -type f -atime +7 -delete
sudo find /var/log -name "*.log" -type f -size +100M -delete
```
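On a large tree, spawning a shell for every directory is slow, and the output still has to be sorted by hand. A single pass that counts entries per parent directory is usually faster. The following is a sketch only: `inode_hotspots` is a hypothetical name, and it assumes file names contain no newlines.

```bash
# List the directories under <root> that directly contain the most entries.
# Usage: inode_hotspots <root> [how_many]
inode_hotspots() {
    local root=$1 limit=${2:-10}
    # find prints <root> itself first (dropped by "1d"); stripping the last
    # path component from each remaining line leaves its parent directory.
    find "$root" -xdev | sed -e '1d' -e 's|/[^/]*$||' |
        sort | uniq -c | sort -rn | head -n "$limit"
}
```

`sudo inode_hotspots /var 5` is not possible because functions are not visible to `sudo`; run the function from a root shell instead, for example after `sudo -i`.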
Permission Issues
Fixing Broken Permissions
```bash
# Reset /etc permissions
sudo chmod 755 /etc
sudo chmod 644 /etc/passwd
sudo chmod 600 /etc/shadow
# Reset home directory permissions
sudo chmod 755 /home/username
sudo chmod 700 /home/username/.ssh
sudo chmod 600 /home/username/.ssh/authorized_keys
```
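Before changing anything, it is worth confirming which paths actually deviate from the expected modes, so that the fix touches only what is broken. A minimal sketch, using a hypothetical `check_mode` helper built on `stat -c '%a'`:

```bash
# Compare a path's octal mode with the expected value.
# Returns 0 on match, 1 on mismatch (with a message), 2 if stat fails.
check_mode() {
    local path=$1 expected=$2 actual
    actual=$(stat -c '%a' "$path") || return 2
    if [ "$actual" != "$expected" ]; then
        echo "$path: mode $actual, expected $expected"
        return 1
    fi
}
```

For instance, `check_mode /etc/passwd 644; check_mode /etc/shadow 600` reports only the files that need attention. The expected values are the ones used in the commands above; some distributions ship slightly different defaults (such as 640 for `/etc/shadow`), so adjust them to your system.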
Best Practices and Prevention
Backup Strategies
Automated System Backups
```bash
#!/bin/bash
# Comprehensive backup script
set -euo pipefail   # stop on the first failed step instead of leaving a partial backup
BACKUP_DIR="/backup/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# System configuration
tar -czf "$BACKUP_DIR/etc_backup.tar.gz" /etc
# User data
tar -czf "$BACKUP_DIR/home_backup.tar.gz" /home
# Database backup
mysqldump --all-databases > "$BACKUP_DIR/mysql_backup.sql"
# Package list
dpkg --get-selections > "$BACKUP_DIR/package_list.txt"
# System information
uname -a > "$BACKUP_DIR/system_info.txt"
lsblk >> "$BACKUP_DIR/system_info.txt"
```
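A script that writes a new dated directory every day will eventually fill the backup volume, and a full volume makes the next backup fail. One way to handle retention is sketched below. `prune_old_backups` is a hypothetical helper, and it assumes backup directories are named `YYYYMMDD` as in the script above.

```bash
# Keep the newest <keep> dated backup directories under <root>; remove the rest.
prune_old_backups() {
    local root=$1 keep=$2 day
    # Names are YYYYMMDD, so a reverse lexical sort is newest-first.
    ls -1 "$root" | grep -E '^[0-9]{8}$' | sort -r | tail -n "+$((keep + 1))" |
    while IFS= read -r day; do
        # ${root:?} aborts if root is empty, so this can never expand to "/<day>".
        rm -rf -- "${root:?}/$day"
    done
}
```

Calling `prune_old_backups /backup 14` at the end of the backup script keeps two weeks of history. Entries that do not match the date pattern are left untouched.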
Using rsync for Incremental Backups
```bash
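# Daily incremental backup (only changed files are transferred)
# Note: --delete also removes files from the backup once they are deleted at the source
rsync -avz --delete /home/ backup_server:/backups/home/
rsync -avz --delete /etc/ backup_server:/backups/etc/
rsync -avz --delete /var/www/ backup_server:/backups/www/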
```
Monitoring and Early Detection
System Health Monitoring
```bash
# Install monitoring tools
sudo apt-get install smartmontools lm-sensors
# Check disk health
sudo smartctl -a /dev/sda
# Monitor system temperatures
sensors
# Check system logs regularly
journalctl -p err -b
```
Automated Health Checks
```bash
#!/bin/bash
# Daily health check script
LOG_FILE="/var/log/health_check.log"
echo "$(date): Starting health check" >> "$LOG_FILE"
# Check disk space (skip the header; "+ 0" makes awk compare Use% as a number)
df -hP | awk 'NR > 1 && $5 + 0 > 90 {print "WARNING: " $0}' >> "$LOG_FILE"
# Check memory usage
free -m | awk 'NR==2{printf "Memory Usage: %s/%sMB (%.2f%%)\n", $3,$2,$3*100/$2 }' >> "$LOG_FILE"
# Check load average
uptime >> "$LOG_FILE"
# Check for failed services
systemctl --failed >> "$LOG_FILE"
echo "$(date): Health check completed" >> "$LOG_FILE"
```
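To awk, a value such as `95%` is a string, so a threshold test has to strip the percent sign or force a numeric conversion before comparing; otherwise `100%` sorts below `90`. As an illustration, the check can be isolated in a filter that reads `df -P` output on standard input. `disk_warnings` is a hypothetical name.

```bash
# Read "df -P" output on stdin and warn about filesystems above <threshold> percent.
disk_warnings() {
    local threshold=$1
    awk -v limit="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # "95%" -> "95"
            if (use + 0 > limit + 0)
                print "WARNING: " $6 " is " $5 " full"
        }
    '
}
```

Used as `df -P | disk_warnings 90 >> "$LOG_FILE"`, it logs one line per filesystem over the limit and nothing otherwise, which also makes the check easy to test against saved `df` output.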
Documentation and Change Management
Maintain Recovery Documentation
- Keep updated network diagrams
- Document all system changes
- Maintain contact information for vendors
- Create step-by-step recovery procedures
- Test recovery procedures regularly
Version Control for Configurations
```bash
# Initialize git repository for /etc
cd /etc
sudo git init
sudo git add .
sudo git commit -m "Initial configuration snapshot"
# Create a daily cron job for automatic commits
echo '#!/bin/bash
cd /etc && git add -A && git commit -m "Auto-commit $(date)"' | sudo tee /etc/cron.daily/etc-backup
sudo chmod +x /etc/cron.daily/etc-backup
```
Testing Recovery Procedures
Regular Recovery Drills
- Schedule monthly recovery tests
- Document test results and improvements
- Update procedures based on lessons learned
- Train team members on recovery procedures
Virtualized Testing Environment
```bash
# Create VM snapshots before major changes
virsh snapshot-create-as domain_name snapshot_name --description "Pre-update snapshot"
# Test recovery procedures in an isolated environment by reverting to the snapshot
virsh snapshot-revert domain_name snapshot_name
```
Conclusion
Linux disaster recovery requires preparation, knowledge, and the right tools. By understanding common disaster scenarios, maintaining proper backups, and following systematic recovery procedures, you can minimize downtime and data loss when disasters strike.
Key takeaways for effective Linux disaster recovery:
1. Prevention is Better Than Cure: Implement robust backup strategies and monitoring systems
2. Document Everything: Maintain detailed system documentation and recovery procedures
3. Test Regularly: Regularly test backup integrity and recovery procedures
4. Stay Calm and Systematic: Follow established procedures methodically during actual disasters
5. Learn and Improve: Document lessons learned from each incident to improve future recovery efforts
Disaster recovery is an ongoing process, not a one-time setup. Maintain your backup systems, test your recovery procedures regularly, and keep up with current recovery tools and techniques so that, when a disaster does occur, you can restore your systems to full functionality with minimal disruption.