# How to See Slowest Units → systemd-analyze blame

## Table of Contents

1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Understanding systemd-analyze blame](#understanding-systemd-analyze-blame)
4. [Basic Usage and Syntax](#basic-usage-and-syntax)
5. [Interpreting the Output](#interpreting-the-output)
6. [Advanced Usage Examples](#advanced-usage-examples)
7. [Related systemd-analyze Commands](#related-systemd-analyze-commands)
8. [Troubleshooting Common Issues](#troubleshooting-common-issues)
9. [Best Practices and Optimization Tips](#best-practices-and-optimization-tips)
10. [Real-World Use Cases](#real-world-use-cases)
11. [Conclusion](#conclusion)

## Introduction

System administrators and Linux users frequently encounter situations where their systems take longer than expected to boot or start services. The `systemd-analyze blame` command is an invaluable tool for identifying which systemd units are consuming the most time during system initialization. This comprehensive guide will teach you how to effectively use this command to diagnose performance issues, optimize boot times, and maintain efficient system operations.

The `systemd-analyze blame` command provides a detailed breakdown of service startup times, allowing you to pinpoint bottlenecks and make informed decisions about system optimization. Whether you're managing servers in a production environment or optimizing your personal Linux workstation, understanding how to leverage this tool is essential for maintaining peak system performance.
## Prerequisites

Before diving into the specifics of `systemd-analyze blame`, ensure you have the following:

### System Requirements

- A Linux system running systemd (most modern distributions)
- Administrative privileges (root or sudo access) for changing units; reading the timing data with `systemd-analyze blame` normally works as a regular user
- Basic familiarity with the command-line interface
- Understanding of systemd concepts (units, services, targets)

### Knowledge Prerequisites

- Basic Linux command-line skills
- Understanding of the system boot process
- Familiarity with systemd service management
- Basic knowledge of system performance concepts

### Verification Steps

To verify your system supports `systemd-analyze`, run:

```bash
# Check the installed systemd version
systemctl --version

# Verify systemd-analyze is available
command -v systemd-analyze

# Confirm the command runs and list its subcommands
systemd-analyze --help
```

## Understanding systemd-analyze blame

### What is systemd-analyze blame?

The `systemd-analyze blame` command is part of the systemd suite of tools designed to analyze system and service manager performance. It specifically focuses on identifying which units took the longest time to initialize during the last system boot or service startup sequence.

### How It Works

When systemd starts services during boot, it records timestamps for various stages of each unit's lifecycle. The `blame` subcommand processes this timing information and presents it in a human-readable format, sorted by initialization time in descending order. A unit's time can include time spent waiting for another unit to finish initializing, so treat the figures as a starting point for investigation rather than a verdict.
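The "duration, then unit name" layout described above is easy to post-process. The following is a minimal sketch, not part of systemd: `blame_to_ms` is a hypothetical helper name, and it assumes durations are built from `h`, `min`, `s`, and `ms` components as in typical blame output. It converts every line to whole milliseconds so the values can be sorted or compared numerically.

```shell
#!/usr/bin/env bash
# Sketch: normalise blame-style lines ("<duration> <unit>") to milliseconds.
# blame_to_ms is a hypothetical helper for illustration, not a systemd command.
blame_to_ms() {
    awk '{
        ms = 0
        # Every field except the last is a duration component,
        # for example "1min", "2.345s" or "987ms".
        for (i = 1; i < NF; i++) {
            if      ($i ~ /^[0-9.]+ms$/)  ms += substr($i, 1, length($i) - 2)
            else if ($i ~ /^[0-9.]+min$/) ms += substr($i, 1, length($i) - 3) * 60000
            else if ($i ~ /^[0-9.]+h$/)   ms += substr($i, 1, length($i) - 1) * 3600000
            else if ($i ~ /^[0-9.]+s$/)   ms += substr($i, 1, length($i) - 1) * 1000
        }
        # %.0f rounds, which avoids off-by-one results from float truncation.
        printf "%.0f %s\n", ms, $NF
    }'
}
```

On a systemd host, one way to use it is `systemd-analyze blame | blame_to_ms | sort -rn | head`, which reproduces the descending order with purely numeric values.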
### Key Features

- Time-sorted output: Services listed from slowest to fastest
- Precise timing: Millisecond-level resolution
- Unit identification: Clear service and unit names
- Historical data: Based on the most recent boot cycle
- No system impact: Read-only analysis with negligible overhead

## Basic Usage and Syntax

### Command Syntax

The basic syntax for `systemd-analyze blame` is straightforward:

```bash
systemd-analyze blame [OPTIONS]
```

### Simple Usage Example

```bash
# Basic blame command (root is normally not required)
systemd-analyze blame
```

### Sample Output

```
8.123s NetworkManager-wait-online.service
2.945s mysql.service
2.234s apache2.service
1.876s postgresql.service
1.234s docker.service
987ms plymouth-quit-wait.service
654ms systemd-networkd-wait-online.service
432ms accounts-daemon.service
321ms gdm.service
234ms ssh.service
123ms systemd-logind.service
89ms dbus.service
45ms systemd-resolved.service
23ms systemd-timesyncd.service
12ms systemd-tmpfiles-setup.service
```

### Understanding the Output Format

Each line in the output follows this pattern:

```
[TIME] [UNIT_NAME]
```

Where:

- TIME: Duration the unit took to start (seconds, milliseconds)
- UNIT_NAME: The systemd unit identifier

## Interpreting the Output

### Time Units and Formatting

The output displays time in the most appropriate unit:

- Minutes and seconds (for example `1min 2.345s`): For times ≥ 1 minute
- Seconds (s): For times ≥ 1 second
- Milliseconds (ms): For times < 1 second
- Microseconds (μs): For very short times (rare in blame output)

### Identifying Problem Services

Services to investigate typically include:

- Network services: Often slow due to timeout periods
- Database services: May have lengthy initialization procedures
- Web servers: Can be slow if checking configurations
- Custom applications: May have inefficient startup scripts

### Normal vs. Concerning Times

As rough guidance, normal startup times:

- System services: 10-500ms
- Network services: 1-3 seconds
- Database services: 1-5 seconds

Concerning startup times:

- Any service > 10 seconds
- Multiple services > 5 seconds
- Services that previously started faster

## Advanced Usage Examples

### Filtering and Processing Output

#### Show Only Top 10 Slowest Services

```bash
systemd-analyze blame | head -10
```

#### Filter Services Taking More Than 1 Second

```bash
# Matches "8.123s unit" as well as "1min 2.345s unit"; skips ms entries
systemd-analyze blame | grep -E '^[[:space:]]*([0-9]+(h|min) |[0-9.]+s )'
```

#### Search for Specific Service Types

```bash
# Find network-related services
systemd-analyze blame | grep -i network

# Find database services
systemd-analyze blame | grep -E "(mysql|postgres|mongodb|redis)"

# Find web server services
systemd-analyze blame | grep -E "(apache|nginx|httpd)"
```

### Combining with Other Commands

#### Save Output for Analysis

```bash
# Save to file with timestamp
systemd-analyze blame > boot_analysis_$(date +%Y%m%d_%H%M%S).txt

# Compare before and after optimization
systemd-analyze blame > before_optimization.txt
# ... make changes ...
sudo reboot
# After the reboot, in a new session:
systemd-analyze blame > after_optimization.txt
diff before_optimization.txt after_optimization.txt
```

#### Monitor Changes Over Time

```bash
#!/bin/bash
# Create a monitoring script
echo "Boot Analysis - $(date)" >> boot_times.log
systemd-analyze blame | head -5 >> boot_times.log
echo "---" >> boot_times.log
```

## Related systemd-analyze Commands

### systemd-analyze time

Shows overall boot time breakdown:

```bash
systemd-analyze time
```

Output example:

```
Startup finished in 2.345s (kernel) + 1.234s (initrd) + 15.678s (userspace) = 19.257s
graphical.target reached after 15.234s in userspace
```

### systemd-analyze critical-chain

Shows the critical path of service dependencies:

```bash
systemd-analyze critical-chain
```

### systemd-analyze plot

Generates an SVG timeline of the boot process:

```bash
systemd-analyze plot > boot_timeline.svg
```

### systemd-analyze dump

Provides detailed information about all units:

```bash
systemd-analyze dump > system_dump.txt
```

## Troubleshooting Common Issues

### Permission Denied Errors

Problem: Command fails with permission errors

Solution:

```bash
# Use sudo for system-wide analysis
sudo systemd-analyze blame

# Check if systemd is properly running
systemctl is-system-running
```

### No Data Available

Problem: Command returns empty or minimal output

Causes and Solutions:

1. Boot has not finished yet, or the data is stale. While startup jobs are still running, `systemd-analyze` reports that bootup is not yet finished:

   ```bash
   # Check last boot time
   systemd-analyze time

   # See which jobs are still running
   systemctl list-jobs

   # Reboot to generate fresh data
   sudo reboot
   ```

2. Systemd logging disabled. The timings themselves come from the service manager rather than the journal, but without logs you cannot investigate why a unit was slow:

   ```bash
   # Check systemd configuration
   systemctl status systemd-journald

   # Ensure logging is enabled in /etc/systemd/system.conf
   grep -i "LogLevel" /etc/systemd/system.conf
   ```

### Inconsistent Results

Problem: Results vary significantly between runs or between boots

Investigation Steps:

```bash
# Check for services with random timing
systemd-analyze blame | head -10

# Wait and check again
sleep 60
systemd-analyze blame | head -10

# Look for network-dependent services
systemd-analyze critical-chain | grep -i network
```

### Services Not Listed

Problem: Expected services don't appear in output

Possible Causes:

- Service failed to start
- Service is socket-activated
- Service starts very quickly
- Service uses `Type=simple`: systemd considers such services started immediately, so no initialization delay is measured

Investigation:

```bash
# Check service status
systemctl status service_name

# Check all units, including failed ones
systemctl list-units --failed

# Check socket-activated services
systemctl list-sockets
```

## Best Practices and Optimization Tips

### Regular Monitoring

#### Establish Baseline Measurements

```bash
# Create baseline after fresh installation
systemd-analyze blame > baseline_boot_times.txt
systemd-analyze time > baseline_overall.txt

# Document system configuration
uname -a >> baseline_system_info.txt
systemctl list-unit-files --state=enabled >> baseline_enabled_services.txt
```

#### Periodic Analysis

```bash
#!/bin/bash
# Weekly boot time check script
LOGFILE="/var/log/boot-performance.log"
echo "=== Boot Analysis $(date) ===" >> "$LOGFILE"
systemd-analyze time >> "$LOGFILE"
echo "Top 10 slowest services:" >> "$LOGFILE"
systemd-analyze blame | head -10 >> "$LOGFILE"
echo "" >> "$LOGFILE"
```

### Optimization Strategies

#### Disable Unnecessary Services

```bash
# Identify services you don't need
systemctl list-unit-files --state=enabled

# Disable unwanted services
sudo systemctl disable service_name
sudo systemctl mask service_name  # Prevent manual start
```

#### Optimize Network Services

```bash
# Reduce NetworkManager-wait-online timeout
sudo systemctl edit NetworkManager-wait-online.service

# Add these lines in the editor that opens:
[Service]
TimeoutStartSec=30
```

When `TimeoutStartSec` expires, systemd marks the unit as failed, so confirm that nothing critical requires it to succeed.

#### Parallel Service Startup

```bash
# Edit service files to remove unnecessary dependencies.
# Drop-ins can only add dependencies, so removing one means editing the full unit.
sudo systemctl edit --full service_name

# Optimize After/Before directives
# (network.target instead of network-online.target; unit files do not allow trailing comments)
[Unit]
After=network.target
```

### Service-Specific Optimizations

#### Database Services

```bash
# MySQL optimization example
sudo systemctl edit mysql.service

[Service]
# Reduce innodb_buffer_pool_load_at_startup time
# (the script path is site-specific, not shipped with MySQL)
ExecStartPre=/usr/bin/mysql_optimize_startup.sh
```

#### Web Servers

```bash
# Apache optimization
sudo systemctl edit apache2.service

[Service]
# Pre-validate configuration (catches errors early; the test itself adds a little startup time)
ExecStartPre=/usr/sbin/apache2ctl configtest
```

## Real-World Use Cases

### Case Study 1: Server Boot Optimization

Scenario: Production server taking 45 seconds to boot

Analysis:

```bash
systemd-analyze blame | head -5
25.432s NetworkManager-wait-online.service
8.234s mysql.service
4.567s apache2.service
2.345s docker.service
1.234s ssh.service
```

Solution Applied:

```bash
# Reduce network wait timeout
sudo systemctl edit NetworkManager-wait-online.service

[Service]
TimeoutStartSec=15

# Optimize MySQL startup
sudo systemctl edit mysql.service

[Service]
ExecStartPre=/opt/mysql_quick_start.sh
```

Result: Boot time reduced to 18 seconds

### Case Study 2: Desktop Performance

Scenario: Linux desktop slow to reach graphical interface

Analysis Process:

```bash
# Check overall timing
systemd-analyze time
# Result: 23.456s to reach graphical.target

# Identify bottlenecks
systemd-analyze blame | grep -E "(gdm|plymouth|graphics)"
5.678s plymouth-quit-wait.service
3.456s gdm.service
2.345s nvidia-persistenced.service

# Check critical path
systemd-analyze critical-chain graphical.target
```

Optimization Steps:

```bash
# Disable plymouth if not needed
sudo systemctl disable plymouth-quit-wait.service

# Optimize graphics drivers
sudo systemctl edit nvidia-persistenced.service

[Service]
Type=forking
TimeoutStartSec=10
```

### Case Study 3: Container Host Optimization

Scenario: Docker host with slow container startup affecting boot time

Investigation:

```bash
systemd-analyze blame | grep -i docker
12.345s docker.service
3.456s docker-containerd.service

# Check what's causing Docker delays (-b limits output to the current boot)
journalctl -b -u docker.service
```

Resolution:

```bash
# Optimize Docker daemon startup
sudo systemctl edit docker.service

[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --storage-driver=overlay2 --log-driver=journald

# Configure Docker to start after network is ready
[Unit]
After=network-online.target
Wants=network-online.target
```

## Advanced Analysis Techniques

### Correlation Analysis

#### Comparing Multiple Boots

```bash
#!/bin/bash
# Multi-boot analysis script.
# A loop cannot survive a reboot, so run this once after each boot
# (for example from a cron @reboot entry) and compare entries after several boots.
OUT="$HOME/multi_boot_analysis.txt"
sleep 120  # Wait for system to stabilize
echo "=== Boot Analysis $(date) ===" >> "$OUT"
systemd-analyze blame | head -10 >> "$OUT"
echo "" >> "$OUT"
```

#### Service Dependency Analysis

```bash
# Find services that might benefit from parallelization
systemd-analyze critical-chain | grep -A5 -B5 "slow_service.service"

# Check what services are waiting for slow ones
systemctl list-dependencies --reverse slow_service.service
```

### Performance Trending

#### Historical Data Collection

```bash
#!/bin/bash
# Advanced monitoring script
DATE=$(date +%Y%m%d_%H%M%S)
LOGDIR="/var/log/systemd-performance"
mkdir -p "$LOGDIR"

# Collect comprehensive data
systemd-analyze time > "$LOGDIR/time_$DATE.txt"
systemd-analyze blame > "$LOGDIR/blame_$DATE.txt"
systemd-analyze critical-chain > "$LOGDIR/critical_$DATE.txt"

# Generate summary report
echo "Performance Summary - $DATE" > "$LOGDIR/summary_$DATE.txt"
echo "Total boot time: $(systemd-analyze time | grep 'Startup finished')" >> "$LOGDIR/summary_$DATE.txt"
echo "Slowest service: $(systemd-analyze blame | head -1)" >> "$LOGDIR/summary_$DATE.txt"
```

### Integration with Monitoring Systems

#### Automated Alerting

```bash
#!/bin/bash
# Boot time monitoring with alerting
THRESHOLD=30  # seconds

# Take the total after "=", not the first component (kernel time).
# Totals of a minute or more print as "1min 2.345s" and yield no match here.
BOOT_TIME=$(systemd-analyze time | grep -oP 'Startup finished in .*= \K[0-9.]+(?=s)')

if [ -n "$BOOT_TIME" ] && (( $(echo "$BOOT_TIME > $THRESHOLD" | bc -l) )); then
    echo "WARNING: Boot time ${BOOT_TIME}s exceeds threshold ${THRESHOLD}s"
    systemd-analyze blame | head -5 | mail -s "Slow boot detected" admin@company.com
fi
```

#### Integration with Prometheus

```bash
#!/bin/bash
# Export metrics for Prometheus
METRICS_FILE="/var/lib/node_exporter/textfile_collector/boot_time.prom"

# Extract boot time
BOOT_TIME=$(systemd-analyze time | grep -oP 'Startup finished in .*= \K[0-9.]+(?=s)')
echo "system_boot_time_seconds $BOOT_TIME" > "$METRICS_FILE"

# Export top 5 service times, converting ms to seconds
systemd-analyze blame | head -5 | while read -r TIME SERVICE; do
    case "$TIME" in
        *[0-9]ms) SECS=$(awk -v t="${TIME%ms}" 'BEGIN { printf "%.3f", t / 1000 }') ;;
        *[0-9]s)  SECS=${TIME%s} ;;
        *)        continue ;;  # skip other formats such as "1min 2.345s"
    esac
    echo "service_start_time_seconds{service=\"$SERVICE\"} $SECS" >> "$METRICS_FILE"
done
```

## Security Considerations

### Protecting Performance Data

```bash
# Secure log directory
sudo mkdir -p /var/log/systemd-performance
sudo chown root:adm /var/log/systemd-performance
sudo chmod 750 /var/log/systemd-performance

# Rotate logs to prevent disk space issues
sudo tee /etc/logrotate.d/systemd-performance << 'EOF'
/var/log/systemd-performance/*.txt {
    weekly
    rotate 4
    compress
    delaycompress
    missingok
    notifempty
    create 640 root adm
}
EOF
```

### Audit Trail

```bash
# Log all systemd-analyze commands
sudo tee -a /etc/audit/rules.d/systemd-analyze.rules << 'EOF'
-w /usr/bin/systemd-analyze -p x -k systemd_analysis
EOF
sudo service auditd restart
```

## Conclusion

The `systemd-analyze blame` command is an essential tool for Linux system administrators and users who want to optimize their system's boot performance and identify service bottlenecks. This guide has covered basic usage, advanced optimization techniques, and real-world applications.

### Key Takeaways

1. Regular Monitoring: Use `systemd-analyze blame` regularly to establish baselines and detect performance regressions early.
2. Systematic Approach: Always analyze the complete picture using multiple systemd-analyze commands rather than focusing solely on blame output.
3. Targeted Optimization: Focus optimization efforts on services that genuinely impact user experience or system functionality.
4. Documentation: Keep records of your analysis and optimizations to track improvements and avoid repeating unsuccessful changes.
5. Holistic View: Remember that boot time is just one aspect of system performance; balance optimization efforts with system stability and functionality.

### Next Steps

After mastering `systemd-analyze blame`, consider exploring:

- Advanced systemd configuration: Custom service files and targets
- System profiling tools: perf, strace, and other performance analysis utilities
- Automated monitoring: Integration with monitoring systems and alerting
- Container optimization: Applying similar principles to containerized environments
- Network performance: Analyzing and optimizing network-dependent services

### Final Recommendations

1. Start with the services consuming the most time, but always investigate why they're slow before making changes.
2. Test all optimizations in non-production environments first.
3. Document your baseline performance metrics before making any changes.
4. Remember that some services legitimately require time to start safely; not all slow services need optimization.
5. Consider the trade-offs between boot time and system functionality when making optimization decisions.

By following the practices and techniques outlined in this guide, you'll be well-equipped to diagnose, analyze, and optimize your Linux systems for peak performance while maintaining stability and reliability.
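The recommendation to document a baseline before making changes can be turned into a quick regression check. The sketch below is an illustration under stated assumptions, not an official tool: `compare_blame` is a hypothetical function name, both arguments are files previously saved from `systemd-analyze blame`, and only single-component durations (`2.345s`, `987ms`) are compared; lines such as `1min 2.345s` are skipped.

```shell
#!/usr/bin/env bash
# Sketch: list units that start slower in AFTER than in BEFORE.
# compare_blame and its arguments are hypothetical names for illustration.
compare_blame() {
    local before=$1 after=$2
    awk '
        # Convert "2.345s" or "987ms" to milliseconds; -1 for anything else.
        function to_ms(t) {
            if (t ~ /[0-9]ms$/) return substr(t, 1, length(t) - 2) + 0
            if (t ~ /[0-9]s$/)  return substr(t, 1, length(t) - 1) * 1000
            return -1
        }
        NF != 2 { next }                     # skip multi-part durations
        FILENAME == ARGV[1] {                # first file: remember the baseline
            base[$2] = to_ms($1)
            next
        }
        ($2 in base) {                       # second file: report slower units
            now = to_ms($1)
            if (base[$2] >= 0 && now > base[$2])
                printf "%s %.0f -> %.0f ms\n", $2, base[$2], now
        }
    ' "$before" "$after"
}
```

With the file names used earlier in this guide, `compare_blame before_optimization.txt after_optimization.txt` would print one line per unit whose startup time increased, which is easier to scan than a raw `diff`.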