How to Use vmstat to Monitor Linux System Performance

System monitoring is a critical part of Linux system administration: you need to know how a system behaves under different loads to keep it performing well. The `vmstat` command is one of the most versatile tools for the job, reporting virtual memory statistics, process states, and overall system activity. This guide walks through `vmstat` from basic usage to more advanced monitoring techniques, and it is aimed at both beginner system administrators and experienced Linux professionals.

What is vmstat?

The `vmstat` (Virtual Memory Statistics) command is a standard Linux utility that reports on processes, memory usage, paging activity, block I/O, interrupts and context switches, and CPU activity. It is part of the procps-ng package (formerly procps) and comes pre-installed on most Linux distributions. The `sar` and `iostat` tools used later in this guide ship in the separate sysstat package.

Unlike monitoring tools that focus on one aspect of system performance, `vmstat` gives a compact overview of the whole system, which makes it useful for:

- Identifying performance bottlenecks
- Monitoring system resource utilization
- Troubleshooting memory-related issues
- Analyzing I/O performance
- Tracking CPU usage patterns
- Planning system capacity and upgrades

Prerequisites and Requirements

Before diving into `vmstat` usage, ensure you have the following prerequisites:

System Requirements

- Any Linux distribution (Ubuntu, CentOS, RHEL, Debian, etc.)
- Terminal access with basic user privileges
- No special installation required (vmstat is typically pre-installed)

Knowledge Prerequisites

- Basic understanding of the Linux command line
- Familiarity with system administration concepts
- Understanding of memory management basics
- Knowledge of CPU and I/O concepts

Verification of vmstat Installation

To verify that `vmstat` is available on your system, run:

```bash
vmstat --version
```

If `vmstat` is not installed, you can install it using your distribution's package manager:

Ubuntu/Debian:

```bash
sudo apt-get update
sudo apt-get install procps
```

CentOS/RHEL/Fedora:

```bash
sudo yum install procps-ng
# or, on newer versions
sudo dnf install procps-ng
```

Understanding vmstat Output Format

Before exploring the various `vmstat` options, it's crucial to understand the standard output format. The basic `vmstat` command displays information in several columns grouped by category:

```bash
vmstat
```

Sample Output:

```
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1847516  92708 1055648    0    0    45    12  180  350  5  2 92  1  0
```

Column Explanations

Process Information (procs)

- r: Number of runnable processes (running or waiting for run time)
- b: Number of processes in uninterruptible sleep (blocked, usually on I/O)

Memory Information (memory)

- swpd: Amount of swap space in use (KB)
- free: Amount of idle memory (KB)
- buff: Amount of memory used as buffers (KB)
- cache: Amount of memory used as cache (KB)

Swap Information (swap)

- si: Amount of memory swapped in from disk per second (KB/s)
- so: Amount of memory swapped out to disk per second (KB/s)

I/O Information (io)

- bi: Blocks received from a block device per second (blocks/s)
- bo: Blocks sent to a block device per second (blocks/s)

System Information (system)

- in: Number of interrupts per second, including clock interrupts
- cs: Number of context switches per second

CPU Information (cpu)

All CPU columns are percentages of total CPU time.

- us: Time spent running non-kernel code (user time, including nice time)
- sy: Time spent running kernel code (system time)
- id: Time spent idle
- wa: Time spent waiting for I/O
- st: Time stolen from a virtual machine

Basic vmstat Usage Examples

Single Snapshot

The simplest way to use `vmstat` is to run it without any arguments:

```bash
vmstat
```

This prints a single report. Its rate and CPU columns are averages since the last reboot, not a reading of current activity, so use an interval (see below) when you want to know what the system is doing right now.

Continuous Monitoring

To monitor system performance continuously, specify an interval in seconds:

```bash
vmstat 2
```

This command prints a new line every 2 seconds. The first line still shows the averages since boot; each later line covers the preceding interval. To stop the monitoring, press `Ctrl+C`.

Limited Sample Collection

To collect a specific number of samples:

```bash
vmstat 2 5
```

This collects 5 samples with a 2-second interval between each sample.

Monitoring with Timestamps

To include timestamps in your output:

```bash
vmstat -t 2
```

Sample Output:

```
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st                 UTC
 1  0      0 1825432  93156 1058764    0    0    45    12  181  352  5  2 92  1  0 2023-12-07 10:30:15
```

Advanced vmstat Options and Features

Memory Statistics in Different Units

By default, `vmstat` displays memory information in units of 1024 bytes (`K`). You can change the unit with `-S`; this does not affect the swap (`si`/`so`) or block (`bi`/`bo`) columns:

```bash
vmstat -S M   # Display in mebibytes (1048576 bytes)
vmstat -S m   # Display in megabytes (1000000 bytes)
vmstat -S k   # Display in kilobytes (1000 bytes); the default is K (1024 bytes)
```

Active and Inactive Memory

To view active and inactive memory statistics:

```bash
vmstat -a
```

Sample Output:

```
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 1820544 456789 1234567    0    0    45    12  182  354  5  2 92  1  0
```

Disk Statistics

To display disk I/O statistics:

```bash
vmstat -d
```

This shows cumulative read and write counters (totals, merged requests, sectors, and milliseconds spent) for each disk device on your system.
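Because the default report always uses the same column order, its fields can be picked out by position in a script. The following is a minimal sketch, not an official interface: `label_vmstat` is a hypothetical helper name, and it assumes the default 17-column layout shown in the sample output earlier in this section (options such as `-a` or `-t` change or extend that layout). The demo feeds it the sample output from this guide so that it runs even where `vmstat` is not installed.

```bash
#!/bin/bash
# label_vmstat (hypothetical helper): read vmstat output on stdin and print
# selected fields of the LAST data line with explicit labels.
# Assumes the default column order: r b swpd free buff cache si so bi bo in cs us sy id wa st
label_vmstat() {
    awk '
        NR > 2 { last = $0 }          # skip the two header lines, remember the newest row
        END {
            split(last, f, " ")       # split the row on runs of whitespace
            printf "runnable=%s blocked=%s free_kb=%s idle_pct=%s iowait_pct=%s\n",
                   f[1], f[2], f[4], f[15], f[16]
        }'
}

# Demo on the sample output from this guide.
label_vmstat <<'EOF'
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1847516  92708 1055648    0    0    45    12  180  350  5  2 92  1  0
EOF
# prints: runnable=2 blocked=0 free_kb=1847516 idle_pct=92 iowait_pct=1
```

On a live system the same helper could be used as `vmstat 1 2 | label_vmstat`, which keeps the second sample and so skips the since-boot averages in the first line.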
Partition Statistics

To view statistics for a specific partition:

```bash
vmstat -p /dev/sda1
```

Fork Statistics

To display the number of forks since boot:

```bash
vmstat -f
```

Slab Information

To view kernel slab allocator information (this typically requires root privileges):

```bash
sudo vmstat -m
```

Practical Monitoring Scenarios

Scenario 1: Identifying Memory Pressure

When investigating potential memory issues, monitor these key metrics:

```bash
vmstat -t 1 10
```

What to look for:

- High values in the `swpd` column indicate swap usage
- Sustained non-zero `si` and `so` values indicate active swapping
- Low `free` memory combined with high swap activity indicates memory pressure

Example Analysis:

```
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st                 UTC
 3  1  45678  12456   2345  89012  150   75    25    45  890 1245 25 15 45 15  0 2023-12-07 10:35:22
```

In this example, the system shows signs of memory pressure: it is actively swapping (`si=150`, `so=75`) and has very little free memory (about 12 MB).

Scenario 2: CPU Performance Analysis

To analyze CPU performance patterns:

```bash
vmstat 2 30
```

Key metrics to monitor:

- us (user): High values indicate CPU-intensive applications
- sy (system): High values suggest kernel-level processing or heavy system call activity
- wa (wait): High values indicate I/O bottlenecks
- id (idle): Low values indicate high CPU utilization

Performance Interpretation:

- `us + sy > 80%`: High CPU utilization
- `wa > 20%`: Potential I/O bottleneck
- `r > number of CPUs`: CPU queue buildup

Scenario 3: I/O Performance Monitoring

For I/O-intensive applications:

```bash
vmstat -d 5
```

Combined with regular vmstat monitoring:

```bash
vmstat 1 | awk '{print strftime("%Y-%m-%d %H:%M:%S"), $0}'
```

This adds timestamps to help correlate I/O patterns with specific times. Note that `strftime` is a gawk extension and is not available in every awk implementation; `vmstat -t` is an alternative where it is missing.
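The rules of thumb from the scenarios above can be turned into a small filter. The sketch below is illustrative only: `classify_vmstat` is a hypothetical helper name, the thresholds are simply the ones quoted in this guide (not universal limits), and it assumes the default column order with `r` in column 1, `si`/`so` in columns 7 and 8, `us`/`sy` in columns 13 and 14, and `wa` in column 16. The demo uses the data row from the Scenario 1 example.

```bash
#!/bin/bash
# classify_vmstat NCPU (hypothetical helper): read vmstat data lines on stdin
# and print one finding per rule of thumb that the line violates.
classify_vmstat() {
    local ncpu="${1:-1}"              # number of CPUs to compare the run queue against
    awk -v ncpu="$ncpu" '
        {
            r = $1; si = $7; so = $8; us = $13; sy = $14; wa = $16
            if (si + so > 0)  print "swapping: si=" si " so=" so
            if (us + sy > 80) print "high CPU: us+sy=" (us + sy)
            if (wa > 20)      print "I/O wait: wa=" wa
            if (r > ncpu)     print "run queue: r=" r " cpus=" ncpu
        }'
}

# Demo: the data row from the Scenario 1 example, checked against 2 CPUs.
echo "3 1 45678 12456 2345 89012 150 75 25 45 890 1245 25 15 45 15 0" | classify_vmstat 2
# prints:
#   swapping: si=150 so=75
#   run queue: r=3 cpus=2
```

On a live system, something like `vmstat 1 2 | tail -1 | classify_vmstat "$(nproc)"` would check the most recent one-second sample against the machine's CPU count. A single sample is noisy, so treat a finding as a prompt to look closer rather than as a diagnosis.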
Scenario 4: Long-term Performance Trending

For long-term monitoring and logging:

```bash
vmstat 60 > /var/log/vmstat_$(date +%Y%m%d).log &
```

This runs vmstat in the background, collecting a sample every minute and logging to a dated file. The date in the file name is evaluated once, when the command starts, and writing under `/var/log` normally requires root privileges.

Interpreting vmstat Results for Performance Tuning

Memory Performance Indicators

Healthy Memory System:

- `free` memory > 10% of total RAM (a rough rule of thumb)
- Minimal or zero swap usage (`swpd`, `si`, `so`)
- `buff` and `cache` showing reasonable values

Memory Issues:

- Consistently high swap usage
- Frequent swap in/out activity
- Very low `free` memory together with a shrinking `cache` (low free memory on its own is normal on Linux, because otherwise idle RAM is used as cache)

CPU Performance Indicators

Balanced CPU Usage:

- `us` + `sy` < 80% consistently
- `wa` < 10% most of the time
- `r` (run queue) < number of CPU cores

CPU Bottlenecks:

- Sustained high `us` or `sy` values
- High `wa` values indicating I/O wait
- Run queue (`r`) consistently higher than the CPU count

I/O Performance Indicators

Good I/O Performance:

- Reasonable `bi` and `bo` values for the workload
- Low `wa` (I/O wait) percentage
- Minimal process blocking (`b` column)

I/O Bottlenecks:

- High `wa` values (>20%)
- Many blocked processes (`b`)
- `bi` or `bo` values out of proportion to the expected workload

Combining vmstat with Other Tools

vmstat with sar

Combine vmstat with `sar` for comprehensive monitoring:

```bash
# Terminal 1
vmstat 5
# Terminal 2
sar -u 5
```

vmstat with iostat

For detailed I/O analysis:

```bash
# Terminal 1
vmstat 2
# Terminal 2
iostat -x 2
```

Creating Comprehensive Monitoring Scripts

Here's a practical monitoring script that combines vmstat with other utilities:

```bash
#!/bin/bash
# system_monitor.sh

LOGFILE="/var/log/system_monitor_$(date +%Y%m%d_%H%M%S).log"
DURATION=${1:-300}   # Default 5 minutes
INTERVAL=${2:-5}     # Default 5 seconds

echo "Starting system monitoring for $DURATION seconds..." | tee -a "$LOGFILE"
echo "Interval: $INTERVAL seconds" | tee -a "$LOGFILE"
echo "Log file: $LOGFILE"
echo "========================================" >> "$LOGFILE"

# Function to log system info
log_system_info() {
    echo "=== $(date) ===" >> "$LOGFILE"
    echo "--- vmstat ---" >> "$LOGFILE"
    # Take two samples and keep the second; the first is the since-boot average
    vmstat 1 2 | tail -1 >> "$LOGFILE"
    echo "--- Load Average ---" >> "$LOGFILE"
    uptime >> "$LOGFILE"
    echo "--- Memory Usage ---" >> "$LOGFILE"
    free -h >> "$LOGFILE"
    echo "" >> "$LOGFILE"
}

# Monitor for specified duration
END_TIME=$(( $(date +%s) + DURATION ))
while [ "$(date +%s)" -lt "$END_TIME" ]; do
    log_system_info
    sleep "$INTERVAL"
done

echo "Monitoring completed. Check $LOGFILE for results."
```

Troubleshooting Common Issues

Issue 1: vmstat Command Not Found

Problem: `bash: vmstat: command not found`

Solution:

```bash
# Ubuntu/Debian
sudo apt-get install procps
# CentOS/RHEL
sudo yum install procps-ng
```

Issue 2: Permission Denied Errors

Problem: Cannot access certain system statistics

Solution:

- Run with appropriate privileges
- Check if the `/proc` filesystem is mounted
- Verify user permissions for system monitoring

Issue 3: Inconsistent or Unusual Values

Problem: vmstat showing unexpected values

Troubleshooting Steps:

1. Verify system time and date
2. Check for system load anomalies
3. Compare with other monitoring tools
4. Review system logs for errors

```bash
# Comprehensive system check
date
uptime
free -h
df -h
dmesg | tail -20
```

Issue 4: High I/O Wait Times

Problem: Consistently high `wa` values in the CPU section

Investigation Steps:

```bash
# Check disk usage
df -h
# Check for disk errors
dmesg | grep -i error
# Monitor disk I/O
iostat -x 2 5
# Check for high I/O processes
iotop -o
```

Best Practices for vmstat Usage

1. Establish Baseline Measurements

Before troubleshooting performance issues, establish baseline measurements during normal operations:

```bash
# Collect baseline data during different periods
vmstat 300 > baseline_normal_hours.log &
vmstat 60 > baseline_peak_hours.log &
```

2. Use Appropriate Monitoring Intervals

Choose monitoring intervals based on your needs:

- Real-time troubleshooting: 1-2 seconds
- General monitoring: 5-10 seconds
- Long-term trending: 60-300 seconds

3. Combine Multiple Monitoring Approaches

Don't rely solely on vmstat; combine it with other tools:

```bash
# Comprehensive monitoring command
{
    echo "=== System Overview ==="
    uptime
    echo "=== Memory Usage ==="
    free -h
    echo "=== vmstat Sample ==="
    vmstat 1 3
    echo "=== Disk Usage ==="
    df -h
} > system_health_$(date +%Y%m%d_%H%M%S).txt
```

4. Automate Regular Health Checks

Create automated health check scripts:

```bash
#!/bin/bash
# daily_health_check.sh

ALERT_THRESHOLD_CPU=80
ALERT_THRESHOLD_MEM=90
LOGFILE="/var/log/daily_health_$(date +%Y%m%d).log"

# Get current stats (column 15 of the second vmstat sample is the idle percentage)
CPU_USAGE=$(vmstat 1 2 | tail -1 | awk '{print 100-$15}')
MEM_USAGE=$(free | grep Mem | awk '{printf "%.0f", $3/$2 * 100}')

echo "$(date): CPU Usage: ${CPU_USAGE}%, Memory Usage: ${MEM_USAGE}%" >> "$LOGFILE"

# Alert if thresholds exceeded
if (( $(echo "$CPU_USAGE > $ALERT_THRESHOLD_CPU" | bc -l) )); then
    echo "ALERT: High CPU usage detected: ${CPU_USAGE}%" | mail -s "CPU Alert" admin@example.com
fi

if [ "$MEM_USAGE" -gt "$ALERT_THRESHOLD_MEM" ]; then
    echo "ALERT: High memory usage detected: ${MEM_USAGE}%" | mail -s "Memory Alert" admin@example.com
fi
```

5. Document and Share Findings

Maintain documentation of your monitoring findings:

```bash
# Create monitoring reports
vmstat_report() {
    echo "System Performance Report - $(date)"
    echo "=================================="
    echo "System Uptime:"
    uptime
    echo ""
    echo "Memory Summary:"
    free -h
    echo ""
    echo "Current vmstat snapshot:"
    vmstat
    echo ""
    echo "5-minute performance sample:"
    vmstat 10 30
}
```

Advanced Use Cases and Automation

Performance Alerting System

Create an advanced alerting system using vmstat:

```bash
#!/bin/bash
# performance_monitor.sh

CONFIG_FILE="/etc/performance_monitor.conf"
LOG_FILE="/var/log/performance_monitor.log"

# Default thresholds
CPU_THRESHOLD=80
MEMORY_THRESHOLD=85
LOAD_THRESHOLD=5.0     # Defined but not checked below; the load average is only logged
IO_WAIT_THRESHOLD=20

# Load configuration if exists
[ -f "$CONFIG_FILE" ] && source "$CONFIG_FILE"

monitor_performance() {
    while true; do
        # Get current metrics (second sample; the first is the since-boot average)
        VMSTAT_OUTPUT=$(vmstat 1 2 | tail -1)
        CPU_IDLE=$(echo "$VMSTAT_OUTPUT" | awk '{print $15}')
        CPU_USAGE=$((100 - CPU_IDLE))
        IO_WAIT=$(echo "$VMSTAT_OUTPUT" | awk '{print $16}')
        LOAD_AVG=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')

        # Memory usage
        MEM_USAGE=$(free | grep Mem | awk '{printf "%.1f", $3/$2 * 100}')

        # Check thresholds and alert
        check_and_alert "CPU" "$CPU_USAGE" "$CPU_THRESHOLD"
        check_and_alert "Memory" "$MEM_USAGE" "$MEMORY_THRESHOLD"
        check_and_alert "I/O Wait" "$IO_WAIT" "$IO_WAIT_THRESHOLD"

        # Log current status
        echo "$(date): CPU:${CPU_USAGE}% MEM:${MEM_USAGE}% IO_WAIT:${IO_WAIT}% LOAD:${LOAD_AVG}" >> "$LOG_FILE"

        sleep 60
    done
}

check_and_alert() {
    local metric="$1"
    local current="$2"
    local threshold="$3"

    if (( $(echo "$current > $threshold" | bc -l) )); then
        alert_message="ALERT: $metric usage is ${current}% (threshold: ${threshold}%)"
        echo "$alert_message" | logger -t performance_monitor
        # Add email notification here if needed
    fi
}

# Start monitoring
monitor_performance
```

Integration with System Monitoring Tools

Integrate vmstat with popular monitoring solutions:
Nagios Plugin Example:

```bash
#!/bin/bash
# check_vmstat.sh - Nagios plugin for vmstat monitoring

WARNING_CPU=70
CRITICAL_CPU=90
WARNING_MEM=80
CRITICAL_MEM=95

# Get vmstat data (second sample; column 15 is the idle percentage)
VMSTAT_DATA=$(vmstat 1 2 | tail -1)
CPU_USAGE=$(echo "$VMSTAT_DATA" | awk '{print 100-$15}')
MEM_TOTAL=$(free | grep Mem | awk '{print $2}')
MEM_USED=$(free | grep Mem | awk '{print $3}')
MEM_USAGE=$(echo "scale=0; $MEM_USED * 100 / $MEM_TOTAL" | bc)

# Check CPU
if (( $(echo "$CPU_USAGE > $CRITICAL_CPU" | bc -l) )); then
    echo "CRITICAL - CPU usage is ${CPU_USAGE}%"
    exit 2
elif (( $(echo "$CPU_USAGE > $WARNING_CPU" | bc -l) )); then
    echo "WARNING - CPU usage is ${CPU_USAGE}%"
    exit 1
fi

# Check Memory
if [ "$MEM_USAGE" -gt "$CRITICAL_MEM" ]; then
    echo "CRITICAL - Memory usage is ${MEM_USAGE}%"
    exit 2
elif [ "$MEM_USAGE" -gt "$WARNING_MEM" ]; then
    echo "WARNING - Memory usage is ${MEM_USAGE}%"
    exit 1
fi

echo "OK - CPU: ${CPU_USAGE}%, Memory: ${MEM_USAGE}%"
exit 0
```

Conclusion and Next Steps

The `vmstat` command is an indispensable tool for Linux system administrators and performance analysts. Its compact report covers enough of the system to help you understand its behavior, identify performance bottlenecks, and keep performance where it should be.

Key Takeaways

1. Comprehensive Monitoring: vmstat provides a holistic view of system performance including CPU, memory, I/O, and process information
2. Flexible Usage: From quick snapshots to long-term monitoring, vmstat adapts to various monitoring needs
3. Integration Capabilities: Works well with other monitoring tools and can be integrated into automated monitoring solutions
4. Performance Insights: Proper interpretation of vmstat output enables proactive performance management

Recommended Next Steps

1. Practice Regular Monitoring: Implement regular vmstat monitoring in your daily system administration routine
2. Develop Custom Scripts: Create monitoring scripts tailored to your specific environment and requirements
3. Learn Complementary Tools: Expand your monitoring toolkit with tools like `iostat`, `sar`, `top`, and `htop`
4. Establish Baselines: Document normal performance patterns for your systems to better identify anomalies
5. Automate Alerting: Implement automated alerting systems based on vmstat metrics

Additional Resources

To further enhance your system monitoring skills, consider exploring:

- Advanced shell scripting for automation
- System performance tuning techniques
- Integration with monitoring platforms like Nagios, Zabbix, or Prometheus
- Log analysis and correlation techniques
- Capacity planning methodologies

By mastering `vmstat` and implementing the practices outlined in this guide, you'll be well-equipped to maintain high-performing Linux systems and to identify and resolve performance issues quickly as they arise. Effective system monitoring is an ongoing process: it requires consistent attention and continued learning as system requirements and technologies evolve.