# How to Check Directory/File Size → du

The `du` (disk usage) command is one of the most essential tools for system administrators, developers, and Linux users who need to monitor and analyze disk space usage. Whether you're troubleshooting storage issues, optimizing system performance, or simply trying to understand where your disk space is going, mastering `du` is crucial for effective system management.

This guide walks through the `du` command from basic syntax to advanced usage scenarios, troubleshooting of common issues, and best practices for disk space monitoring.

## Table of Contents

1. [Introduction to the du Command](#introduction-to-the-du-command)
2. [Prerequisites and Requirements](#prerequisites-and-requirements)
3. [Basic du Command Syntax](#basic-du-command-syntax)
4. [Essential du Command Options](#essential-du-command-options)
5. [Practical Examples and Use Cases](#practical-examples-and-use-cases)
6. [Advanced du Usage Scenarios](#advanced-du-usage-scenarios)
7. [Combining du with Other Commands](#combining-du-with-other-commands)
8. [Common Issues and Troubleshooting](#common-issues-and-troubleshooting)
9. [Best Practices and Professional Tips](#best-practices-and-professional-tips)
10. [Performance Considerations](#performance-considerations)
11. [Alternative Tools and Comparisons](#alternative-tools-and-comparisons)
12. [Conclusion](#conclusion)

## Introduction to the du Command

The `du` command, short for "disk usage," is a standard Unix and Linux utility that reports the amount of disk space used by files and directories. Unlike the `df` command, which reports usage per mounted filesystem, `du` reports usage for individual files and directory trees, making it invaluable for identifying space-consuming files and directories.
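To make the `df`/`du` distinction concrete, the sketch below builds a small throwaway tree and queries it both ways. The `mktemp -d` sandbox, the file names, and the file sizes are illustrative assumptions, not part of any real system layout.

```shell
#!/bin/sh
# Sketch: contrast du (per-directory usage) with df (per-filesystem usage).
# The sandbox directory and file names are hypothetical.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/logs" "$sandbox/data"

# Write 64 KiB and 256 KiB of random data so real blocks get allocated.
dd if=/dev/urandom of="$sandbox/logs/app.log" bs=1024 count=64 2>/dev/null
dd if=/dev/urandom of="$sandbox/data/dump.bin" bs=1024 count=256 2>/dev/null

# du: space occupied by each directory of this tree, in 1 KiB units.
logs_kb=$(du -sk "$sandbox/logs" | cut -f1)
data_kb=$(du -sk "$sandbox/data" | cut -f1)
echo "logs: ${logs_kb} KiB, data: ${data_kb} KiB"

# df: how full the whole filesystem holding the tree is.
df -k "$sandbox"

# Clean up the sandbox.
rm -rf "$sandbox"
```

The `du` figures change with the files in the tree, while the `df` line describes the entire filesystem and barely moves for a test this small.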
The command works by recursively traversing directory structures and adding up the disk space occupied by each file and subdirectory. This makes it particularly useful for:

- Identifying large files and directories consuming excessive disk space
- Monitoring disk usage trends over time
- Cleaning up storage by locating unnecessary files
- Analyzing directory structures for optimization
- Troubleshooting disk space issues
- Generating reports for system auditing

## Prerequisites and Requirements

Before diving into `du` usage, ensure you have the following:

### System Requirements

- Unix-like operating system (Linux, macOS, BSD, etc.)
- Terminal or command-line access
- Basic understanding of file system navigation
- Appropriate permissions to access target directories

### Knowledge Prerequisites

- Familiarity with command-line interface
- Understanding of file system hierarchy
- Basic knowledge of file permissions
- Comfort with terminal navigation commands (`cd`, `ls`, `pwd`)

### Tools and Access

- Terminal emulator or SSH access
- Text editor (for creating scripts)
- Administrative privileges (for system directories)

## Basic du Command Syntax

The fundamental syntax of the `du` command is straightforward:

```bash
du [OPTIONS] [FILE/DIRECTORY]...
```

### Simple Usage Examples

```bash
# Check current directory size
du

# Check specific directory size
du /home/user/documents

# Check multiple directories
du /var/log /tmp /home
```

### Default Behavior

When executed without options, `du`:

- Prints one line per subdirectory (individual files are not listed)
- Reports sizes in 1024-byte blocks on GNU/Linux; macOS and the BSDs default to 512-byte blocks unless `-k` is given
- Recurses through all subdirectories
- Prints the total for the starting directory on the last line

```bash
$ du /home/user/projects
4       /home/user/projects/project1/src
8       /home/user/projects/project1
12      /home/user/projects/project2
20      /home/user/projects
```

## Essential du Command Options

Understanding the key options available with `du` is crucial for effective usage.
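As a quick orientation, the following sketch runs one throwaway tree through several of the flags covered in this section so their effect on the output can be compared side by side. It assumes a `du` that accepts `-s`, `-a`, and `-d` (GNU and BSD versions do); the sandbox layout and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: see how -s, -d and -a change which lines du prints for one tree.
# Directory and file names are hypothetical.
tree=$(mktemp -d)
mkdir -p "$tree/a/deep" "$tree/b"
printf 'hello\n' > "$tree/a/one.txt"
printf 'world\n' > "$tree/a/deep/two.txt"
printf 'again\n' > "$tree/b/three.txt"

# Count output lines for each variant (tr strips padding some wc versions add).
default_lines=$(du "$tree" | wc -l | tr -d ' ')      # one line per directory
summary_lines=$(du -s "$tree" | wc -l | tr -d ' ')   # a single total line
depth1_lines=$(du -d 1 "$tree" | wc -l | tr -d ' ')  # top directory plus its children
all_lines=$(du -a "$tree" | wc -l | tr -d ' ')       # directories and individual files

echo "default: $default_lines lines"
echo "-s:      $summary_lines lines"
echo "-d 1:    $depth1_lines lines"
echo "-a:      $all_lines lines"

# Clean up the sandbox.
rm -rf "$tree"
```

`-h` is left out of the comparison because it only changes how sizes are formatted, not which lines are printed.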
Here are the most important flags and their applications:

### Human-Readable Format (-h)

The `-h` option displays sizes in human-readable format using appropriate units (K, M, G, T):

```bash
# Human-readable output
$ du -h /var/log
156K    /var/log/apache2
2.3M    /var/log/mysql
45M     /var/log/system
47M     /var/log
```

### Summary Mode (-s)

Use `-s` to display only the total size without showing subdirectory details:

```bash
# Show only total size
$ du -sh /home/user
2.5G    /home/user

# Multiple directories summary
$ du -sh /var/log /tmp /home
47M     /var/log
156K    /tmp
15G     /home
```

### Maximum Depth (-d)

Control how many levels of subdirectories are reported with the `-d` option (GNU long form: `--max-depth`). `du` still scans the whole tree; deeper levels are folded into their parents' totals:

```bash
# Show only immediate subdirectories
$ du -h -d 1 /home/user
500M    /home/user/documents
1.2G    /home/user/downloads
800M    /home/user/projects
2.5G    /home/user

# Limit to 2 levels deep
$ du -h -d 2 /var
```

### Show All Files (-a)

Include individual files in the output, not just directories:

```bash
# Show all files and directories
$ du -ah /home/user/documents
4.0K    /home/user/documents/readme.txt
156K    /home/user/documents/report.pdf
2.3M    /home/user/documents/presentation.pptx
2.5M    /home/user/documents
```

### Exclude Patterns (--exclude)

Exclude files or directories matching a shell pattern from the calculation. `--exclude` is a GNU option; BSD and macOS `du` use `-I` instead:

```bash
# Exclude specific patterns
du -h --exclude="*.log" /var/log
du -h --exclude="node_modules" /home/user/projects
du -h --exclude="*.tmp" --exclude="cache" /home/user
```

## Practical Examples and Use Cases

### Finding the Largest Directories

Identify the top space-consuming directories in your system (`sort -h` understands the unit suffixes that `du -h` prints):

```bash
# Find top 10 largest directories in /home
du -h /home | sort -hr | head -10

# Find largest directories in current location
du -h -d 1 . | sort -hr

# More comprehensive analysis
du -h /var | sort -hr | head -20
```

### Monitoring Specific File Types

Analyze disk usage for specific file types. The `-o` alternatives are grouped with `\( ... \)` so that `-type f` applies to all of them, and the null-delimited hand-off (`-print0` with GNU `du --files0-from=-`) copes with unusual file names and yields a single grand total:

```bash
# Size of every image file, plus a grand total
find /home -type f \( -name "*.jpg" -o -name "*.png" -o -name "*.gif" \) -print0 | du -ch --files0-from=-

# Total size of video files
find /home -type f \( -name "*.mp4" -o -name "*.avi" -o -name "*.mkv" \) -print0 | du -ch --files0-from=- | tail -1
```

### System Cleanup Analysis

Identify potential cleanup targets:

```bash
# Check temporary directories
du -sh /tmp /var/tmp ~/.cache

# Analyze log file usage
du -sh /var/log/*

# Check package cache (Ubuntu/Debian)
du -sh /var/cache/apt/archives
```

### Project Directory Analysis

For developers managing multiple projects:

```bash
# Analyze project directories
du -h -d 2 ~/projects | sort -hr

# Exclude common build artifacts
du -h --exclude="node_modules" --exclude="target" --exclude=".git" ~/projects

# Compare project sizes
for dir in ~/projects/*/; do
    echo "$(du -sh --exclude=node_modules "$dir" | cut -f1) - $(basename "$dir")"
done
```

### Database and Application Monitoring

Monitor application-specific disk usage:

```bash
# Check database sizes
du -sh /var/lib/mysql/*

# Web server content analysis
du -h -d 2 /var/www

# Application log monitoring
du -sh /var/log/apache2/* | sort -hr
```

## Advanced du Usage Scenarios

### Time-Based Analysis

Combine `du` with the time filters of `find` for temporal analysis:

```bash
# Total size of files modified in the last 7 days
find /home/user -type f -mtime -7 -print0 | du -ch --files0-from=- | tail -1

# Files older than 30 days, largest first
find /var/log -type f -mtime +30 -print0 | du -h --files0-from=- | sort -hr
```

### Cross-Filesystem Handling

Control how `du` handles different filesystems:

```bash
# Stay within single filesystem
du -x -h /home

# Include all mounted filesystems
du -h /
```

### Scripting and Automation

Create automated disk monitoring scripts:

```bash
#!/bin/bash
# disk_monitor.sh - Monitor directory sizes

THRESHOLD=1000000  # about 1 GB, in 1 KiB blocks (the unit du -k reports)
DIRECTORIES=("/home" "/var/log" "/tmp")

for dir in "${DIRECTORIES[@]}"; do
    # -k forces 1 KiB units regardless of platform defaults
    size=$(du -sk "$dir" 2>/dev/null | cut -f1)
    if [ "${size:-0}" -gt "$THRESHOLD" ]; then
        echo "WARNING: $dir is using $(du -sh "$dir" 2>/dev/null | cut -f1)"
        du -h -d 1 "$dir" 2>/dev/null | sort -hr | head -5
    fi
done
```

### Network and Remote Usage

Use `du` with remote systems:

```bash
# SSH remote disk usage check
ssh user@remote-server "du -sh /var/log"

# Remote directory analysis
ssh user@server "du -h /home | sort -hr | head -10"
```

## Combining du with Other Commands

### Integration with find

Powerful combinations for targeted analysis:

```bash
# Find and size large files
find /home -size +100M -type f -exec du -h {} + | sort -hr

# Files larger than 1GB modified recently
find / -size +1G -mtime -30 -type f -exec du -h {} + 2>/dev/null
```

### Piping to sort and head/tail

Organize output for better analysis:

```bash
# Top 20 largest directories
du -h /var | sort -hr | head -20

# Smallest directories (excluding zero-size)
du -h /etc | sort -h | grep -v "^0" | head -10

# Entries ranked 11 through 20 by size
du -h /usr/share | sort -hr | tail -n +11 | head -10
```

### Using with grep for Filtering

Filter results based on patterns:

```bash
# Only show directories with specific patterns
du -h /var/log | grep -E "(error|access|debug)"

# Exclude certain patterns from output
du -h /home | grep -v -E "(cache|tmp|\.git)"
```

### Integration with awk for Processing

Process and format output. Numeric comparisons need plain numbers, so use `-m` (sizes in MiB) rather than `-h` when filtering by size:

```bash
# Show only entries above 100 MiB
du -m /home | awk '$1 > 100'

# Format output with custom messages (tab-separated so paths with spaces survive)
du -sh /var/log/* | awk -F'\t' '{print "Directory " $2 " uses " $1 " of space"}'
```

## Common Issues and Troubleshooting

### Permission Denied Errors

Without read access, `du` skips what it cannot read, so the totals it reports are lower bounds. When encountering permission issues:

```bash
# Problem: Permission denied errors
# du: cannot read directory '/root': Permission denied

# Solution: Use sudo for system directories
sudo du -sh /root

# Alternative: Redirect errors to suppress them
du -sh /home 2>/dev/null

# Alternative: Hide only the permission errors and keep other diagnostics
du -sh /var 2>&1 | grep -v "Permission denied"
```

### Symbolic Link Handling

Understanding how `du` handles symbolic links:

```bash
# By default, du doesn't follow symbolic links
du -sh /usr/bin    # Won't follow symlinks

# Follow symbolic links with -L
du -shL /usr/bin   # Follows symlinks

# Count each symlink as its own size with -P (default)
du -shP /usr/bin
```

### Large Directory Performance

Optimizing performance for large directories:

```bash
# Problem: du takes too long on large directories

# Limit reported depth (shortens output and sorting; the full tree is still scanned)
du -h -d 2 /usr | sort -hr

# Use parallel processing for multiple directories
printf '%s\n' /var /usr /home | xargs -I {} -P 3 du -sh {}

# Background processing for large scans
nohup du -h /large-directory > du-results.txt 2>&1 &
```

### Disk Space Discrepancies

Resolving differences between `du` and `df`:

```bash
# Check both du and df results
df -h /home
du -sh /home

# Reasons for discrepancies:

# 1. Open deleted files (use lsof to check)
lsof | grep deleted

# 2. Hard links (du counts a multiply-linked file once per run; -l counts every link)
find /home -links +1 -type f

# 3. Sparse files (du shows actual usage)
du --apparent-size -h file.sparse
du -h file.sparse
```

### Memory Usage Issues

Managing memory consumption during large scans:

```bash
# Problem: du uses too much memory
# Solution: Process directories individually

# Instead of:
du -h /

# Use:
for dir in /*; do
    [ -d "$dir" ] && du -sh "$dir"
done
```

## Best Practices and Professional Tips

### Regular Monitoring Strategies

Implement systematic disk monitoring:

```bash
#!/bin/bash
# daily_disk_report.sh - Create daily disk usage reports

DATE=$(date +%Y-%m-%d)
REPORT_DIR="/var/log/disk-reports"

mkdir -p "$REPORT_DIR"

{
    echo "Disk Usage Report - $DATE"
    echo "=========================="
    echo
    echo "Top 20 Largest Directories:"
    du -h /home | sort -hr | head -20
    echo
    echo "System Directory Usage:"
    du -sh /var/log /tmp /var/cache /usr/share
} > "$REPORT_DIR/disk-usage-$DATE.txt"
```

### Efficient Directory Scanning

Optimize scanning strategies:

```bash
# Use appropriate depth limits
du -h -d 3 /usr    # Usually sufficient for analysis

# Exclude unnecessary directories (array keeps each option a separate word)
EXCLUDE_OPTS=(--exclude=.git --exclude=node_modules --exclude=.cache)
du -h "${EXCLUDE_OPTS[@]}" /home/user/projects

# Batch processing for multiple targets
DIRS=("/var/log" "/tmp" "/var/cache")
printf "%s\n" "${DIRS[@]}" | xargs -I {} du -sh {}
```

### Creating Useful Aliases

Set up convenient aliases:

```bash
# Add to ~/.bashrc or ~/.zshrc
alias duh='du -h -d 1 | sort -hr'         # current directory only
alias dus='du -sh'
alias dutop='du -h | sort -hr | head -20' # current directory only
alias dudir='du -h -d 1'

# Function for interactive directory analysis
analyze_dir() {
    local dir=${1:-.}
    echo "Analyzing directory: $dir"
    echo "Total size: $(du -sh "$dir" | cut -f1)"
    echo "Largest subdirectories:"
    du -h -d 1 "$dir" | sort -hr | head -10
}
```

### Documentation and Reporting

Maintain proper documentation:

```bash
# Generate comprehensive reports
generate_disk_report() {
    local output_file
    output_file="disk-report-$(date +%Y%m%d).txt"

    {
        echo "=== DISK USAGE ANALYSIS REPORT ==="
        echo "Generated: $(date)"
        echo "Hostname: $(hostname)"
        echo
        echo "=== FILESYSTEM OVERVIEW ==="
        df -h
        echo
        echo "=== TOP 20 LARGEST DIRECTORIES ==="
        # -x stays on the root filesystem and skips /proc, /sys and other mounts
        du -xh / 2>/dev/null | sort -hr | head -20
        echo
        echo "=== SYSTEM DIRECTORIES ==="
        for dir in /var/log /tmp /var/cache /usr/share; do
            [ -d "$dir" ] && echo "$dir: $(du -sh "$dir" 2>/dev/null | cut -f1)"
        done
    } > "$output_file"

    echo "Report generated: $output_file"
}
```

## Performance Considerations

### Optimizing Large Scans

Strategies for handling large directory structures (`ionice` is Linux-specific, and `parallel` refers to GNU Parallel):

```bash
# Use ionice to reduce I/O impact
ionice -c 3 du -sh /large-directory

# Combine with nice for CPU priority
nice -n 19 ionice -c 3 du -sh /massive-dataset

# Parallel processing for independent directories
parallel "du -sh {} 2>/dev/null" ::: /var/* | sort -hr
```

### Memory-Efficient Approaches

Minimize memory usage during scans:

```bash
# Process directories incrementally (-mindepth 1 skips /large-dir itself)
find /large-dir -mindepth 1 -maxdepth 1 -type d | while IFS= read -r dir; do
    size=$(du -sh "$dir" 2>/dev/null | cut -f1)
    printf '%s\t%s\n' "$size" "$dir"
done | sort -hr

# Use streaming approach for very large datasets
# (GNU find; sums apparent file sizes rather than allocated blocks)
find /huge-directory -type f -printf "%s %p\n" | \
    awk '{size+=$1; files++} END {print "Total:", size/1024/1024, "MB in", files, "files"}'
```

## Alternative Tools and Comparisons

### Comparison with Other Disk Usage Tools

Understanding when to use different tools:

```bash
# du - Detailed directory analysis
du -h -d 2 /home

# df - Filesystem-level usage
df -h

# ncdu - Interactive disk usage analyzer
ncdu /home

# tree - Directory structure with sizes
tree -h -L 2 /home/user

# ls - File listing with sizes
ls -lah /home/user/
```

### Modern Alternatives

Explore enhanced tools:

```bash
# dust - Modern du replacement
dust /home

# duf - Better df alternative
duf

# gdu - Fast disk usage analyzer with TUI
gdu /home

# baobab - Graphical disk usage analyzer (GUI)
baobab
```

## Conclusion

The `du` command is an indispensable tool for effective disk space management in Unix and Linux environments. This guide covered basic syntax, advanced usage scenarios, troubleshooting techniques, and best practices.

### Key Takeaways

1. Master the Essential Options: The `-h`, `-s`, `-d`, and `-a` options cover most common use cases
2. Combine with Other Tools: Integrate `du` with `sort`, `grep`, `find`, and other utilities for powerful analysis
3. Handle Permissions Properly: Use appropriate privileges and error handling for system directories
4. Optimize for Performance: Consider exclusions, I/O priority, and parallel processing for large datasets
5. Automate Monitoring: Create scripts and reports for regular disk usage tracking
6. Understand Limitations: Know when to use alternative tools for specific scenarios

### Next Steps

To further enhance your disk management skills:

1. Practice Regular Monitoring: Implement automated disk usage reporting in your systems
2. Explore Advanced Scripting: Create custom tools combining `du` with other utilities
3. Learn Complementary Tools: Familiarize yourself with `ncdu`, `dust`, and other modern alternatives
4. Develop Cleanup Strategies: Use `du` insights to implement effective disk cleanup procedures
5. Monitor Performance Impact: Understand how disk usage affects system performance

### Final Recommendations

- Always test commands in non-production environments first
- Keep regular backups before performing large cleanup operations
- Document your disk monitoring procedures for team consistency
- Stay updated with new tools and techniques in the ecosystem
- Consider implementing automated alerting for disk usage thresholds

By mastering the `du` command and following these best practices, you'll be well-equipped to handle disk space management challenges effectively, whether you're a system administrator, developer, or power user working with Unix-like systems.