# How to Analyze Performance with perf in Linux

Performance analysis is a critical skill for system administrators, developers, and DevOps engineers working with Linux systems. The `perf` tool, part of the Linux kernel's performance analysis toolkit, provides powerful capabilities for profiling applications, identifying bottlenecks, and optimizing system performance. This guide walks through using perf effectively, from basic profiling to advanced analysis techniques.

## Table of Contents

- [Introduction to perf](#introduction-to-perf)
- [Prerequisites and Installation](#prerequisites-and-installation)
- [Understanding perf Fundamentals](#understanding-perf-fundamentals)
- [Basic perf Commands](#basic-perf-commands)
- [CPU Performance Analysis](#cpu-performance-analysis)
- [Memory Performance Analysis](#memory-performance-analysis)
- [Advanced Profiling Techniques](#advanced-profiling-techniques)
- [Interpreting perf Output](#interpreting-perf-output)
- [Real-World Examples](#real-world-examples)
- [Troubleshooting Common Issues](#troubleshooting-common-issues)
- [Best Practices and Tips](#best-practices-and-tips)
- [Conclusion](#conclusion)

## Introduction to perf

The `perf` tool is a Linux profiling toolkit that uses hardware performance counters and kernel tracepoints to provide detailed insight into system and application performance. It is developed as part of the Linux kernel source tree (under `tools/perf`) and offers a suite of commands for collecting, analyzing, and visualizing performance data.

Unlike instrumentation- or emulation-based profilers, which can slow a program down considerably, perf relies on hardware counters and statistical sampling, so its overhead stays low at moderate sampling rates while still producing accurate measurements. This makes it well suited to analyzing production systems and monitoring performance in real time.
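Because hardware events depend on the CPU's performance monitoring unit, it can be worth confirming that those counters are actually available before relying on them: some virtual machines and containers do not expose them, and perf then reports events such as `cycles` as `<not supported>`. The following is a minimal sketch, not part of perf itself. The helper name `counter_status` is hypothetical, and it assumes the CSV layout that `perf stat -x,` prints on recent versions (count in the first field, event name in the third), which is worth checking against your perf version.

```bash
#!/usr/bin/env bash
# counter_status (hypothetical helper): read `perf stat -x,` CSV output on
# stdin and print one line per event saying whether a count was returned.
# Assumes field 1 holds the count and field 3 the event name.
counter_status() {
  awk -F, '
    /^#/ || NF < 3 { next }    # skip comment lines and blank lines
    {
      # A numeric first field means the event was counted; otherwise perf
      # puts a marker such as "<not supported>" or "<not counted>" there.
      status = ($1 ~ /^[0-9.]+$/) ? "ok" : $1
      print $3 ": " status
    }'
}

# Usage (requires perf): perf stat writes its CSV to stderr, so send stderr
# into the pipe and discard the workload's stdout.
#   perf stat -x, -e cycles,instructions,context-switches -- true 2>&1 >/dev/null | counter_status
```

On a machine where hardware events come back as anything other than `ok`, the hardware-counter examples in this guide will not produce data, and software events such as `cpu-clock` are the usual fallback.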
### Key Features of perf

- Hardware Performance Counters: Access to CPU performance monitoring units (PMUs)
- Software Events: Kernel and application-level event tracking
- Statistical Sampling: Low-overhead profiling through sampling
- Call Graph Analysis: Visualization of function call relationships
- Multi-core Support: System-wide and per-CPU analysis
- Flexible Output Formats: Various reporting and visualization options

## Prerequisites and Installation

Before diving into perf analysis, make sure your system meets the necessary requirements and has the appropriate tools installed.

### System Requirements

- Linux kernel version 2.6.31 or later (recommended: 3.0+)
- Root or sudo privileges for system-wide profiling
- Debug symbols for detailed analysis (optional but recommended)
- Sufficient disk space for data collection

### Installing perf

The installation process varies depending on your Linux distribution.

#### Ubuntu/Debian

```bash
# Install perf tools (Ubuntu; on Debian the package is linux-perf)
sudo apt update
sudo apt install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

# Install debug symbols (optional)
sudo apt install libc6-dbg
```

#### CentOS/RHEL/Fedora

```bash
# Install perf tools
sudo yum install perf

# Or, on newer releases that use dnf
sudo dnf install perf

# Install debug info (optional; requires the debuginfo repositories)
sudo yum install glibc-debuginfo
```

#### Arch Linux

```bash
# Install perf tools
sudo pacman -S perf

# Install debug packages from the AUR if needed
yay -S glibc-debug
```

### Verifying Installation

Confirm that perf is properly installed and functional:

```bash
# Check the perf version
perf --version

# List available events
perf list

# Test basic functionality
perf stat ls
```

## Understanding perf Fundamentals

Before using perf effectively, it helps to understand its core concepts and terminology.

### Performance Events

perf monitors several types of events:

1. Hardware Events: CPU cycles, instructions, cache misses, branch mispredictions
2. Software Events: Context switches, page faults, CPU migrations
3. Tracepoint Events: Static instrumentation points built into the kernel, such as scheduler and system call events
4. Dynamic Events: User-defined probe points created with `perf probe`

### Sampling vs. Counting

perf operates in two primary modes:

- Counting Mode (`perf stat`): Accumulates event counts over the run and reports the totals
- Sampling Mode (`perf record`, `perf top`): Periodically records an event occurrence together with context such as the instruction pointer and, optionally, the call stack

### Key perf Subcommands

- `perf stat`: Statistical counting of events
- `perf record`: Record performance data to a file (`perf.data` by default)
- `perf report`: Analyze recorded data
- `perf top`: Real-time performance monitoring
- `perf trace`: System call tracing
- `perf annotate`: Per-instruction analysis, interleaved with source code when debug information is available

## Basic perf Commands

Let's explore the fundamental perf commands that form the foundation of performance analysis.

### perf stat - Statistical Analysis

The `perf stat` command counts events for a command (or the whole system) and prints high-level statistics:

```bash
# Basic statistics for a command
perf stat ./my_application

# System-wide statistics for 10 seconds
perf stat -a sleep 10

# Monitor specific events
perf stat -e cycles,instructions,cache-misses ./my_program

# Per-CPU statistics (-A reports each CPU separately instead of aggregating)
perf stat -a -A sleep 5
```

Example output (illustrative; the exact layout varies between perf versions and CPUs):

```
 Performance counter stats for './my_application':

     1,234,567,890      cycles                    #    2.500 GHz
       987,654,321      instructions              #    0.80  insn per cycle
        12,345,678      cache-references          #   25.000 M/sec
         1,234,567      cache-misses              #   10.00% of all cache refs
            493.83 msec task-clock                #    0.998 CPUs utilized

       0.494817 seconds time elapsed
```

### perf top - Real-time Monitoring

Monitor system performance in real time:

```bash
# System-wide real-time profiling
perf top

# Focus on a specific process (replace <PID> with the process ID)
perf top -p <PID>

# Monitor specific events
perf top -e cache-misses

# Show call graphs
perf top -g
```

### perf record and perf report

Record performance data for detailed analysis:

```bash
# Record performance data
perf record ./my_application

# Record with call graphs
perf record -g ./my_application

# System-wide recording for 30 seconds
perf record -a sleep 30

# Record specific events
perf record -e cpu-cycles,cache-misses ./my_program

# Analyze recorded data
perf report

# Interactive report with call graphs
perf report -g
```

## CPU Performance Analysis

CPU performance analysis is one of the most common uses of perf. The following techniques help identify CPU bottlenecks and optimization opportunities.

### Identifying CPU Hotspots

Use perf to identify the functions consuming the most CPU time:

```bash
# Record CPU cycles with call graphs
perf record -g -e cpu-cycles ./cpu_intensive_app

# Generate a detailed text report
perf report -g --stdio

# Restrict the report to a specific symbol
perf report --symbols=my_function
```

### Cache Performance Analysis

Analyze cache behavior to identify problematic memory access patterns:

```bash
# Monitor cache events
perf stat -e cache-references,cache-misses,L1-dcache-loads,L1-dcache-load-misses ./my_app

# Record cache-miss samples
perf record -e cache-misses -g ./my_application

# Analyze cache-miss patterns
perf report -g --stdio
```

Example cache analysis:

```bash
# Comprehensive cache analysis (some of these events may be <not supported> on a given CPU)
perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-icache-loads,L1-icache-load-misses,LLC-loads,LLC-load-misses ./my_program
```

### Branch Prediction Analysis

Examine branch prediction performance:

```bash
# Monitor branch events
perf stat -e branches,branch-misses ./my_application

# Print only the derived metrics, such as the branch-miss rate
perf stat -e branches,branch-misses --metric-only ./my_program
```

### CPU Utilization Patterns

Analyze CPU utilization across cores:

```bash
# Per-CPU cycle counts
perf stat -a -A -e cpu-cycles sleep 10

# Monitor CPU migrations
perf stat -e cpu-migrations ./my_threaded_app

# Track context switches
perf stat -e context-switches ./my_application
```

## Memory Performance Analysis

Memory performance significantly affects overall system performance, and perf provides several tools for memory analysis.
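A common first indicator of memory locality is the cache-miss rate, cache-misses divided by cache-references; IPC (instructions divided by cycles) is computed the same way. When these figures are needed in scripts rather than read from perf's own annotations, a small helper can compute them from `perf stat -x,` output. This is a minimal sketch: the function name `perf_ratio` is hypothetical, it assumes the CSV layout of recent perf versions (count in field 1, event name in field 3), and on hybrid CPUs that report events as `cpu_core/cycles/` the event names passed to it would need adjusting.

```bash
#!/usr/bin/env bash
# perf_ratio NUM_EVENT DEN_EVENT (hypothetical helper)
# Reads `perf stat -x,` CSV on stdin and prints count(NUM) / count(DEN).
# Assumes field 1 holds the count and field 3 the event name.
perf_ratio() {
  awk -F, -v num="$1" -v den="$2" '
    /^#/ || NF < 3 { next }              # skip comment and blank lines
    {
      ev = $3
      sub(/:[ukhpGH]+$/, "", ev)         # strip modifiers such as ":u" or ":pp"
      if ($1 ~ /^[0-9.]+$/) count[ev] = $1   # ignore "<not supported>" etc.
    }
    END {
      if (!(num in count) || !(den in count) || count[den] == 0) {
        print "n/a"
        exit 1
      }
      printf "%.4f\n", count[num] / count[den]
    }'
}

# Usage (requires perf): cache-miss rate of ./my_app, with the CSV written to a file
#   perf stat -x, -o stats.csv -e cache-references,cache-misses -- ./my_app
#   perf_ratio cache-misses cache-references < stats.csv
```

The division is done in awk because bash arithmetic is integer-only, and the helper exits non-zero with `n/a` when either event is missing, so a calling script can detect unsupported counters.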
### Page Fault Analysis

Monitor page fault behavior:

```bash
# Count page faults (minor faults are resolved in memory; major faults need disk I/O)
perf stat -e page-faults,minor-faults,major-faults ./my_application

# Record page fault events with call graphs
perf record -e page-faults -g ./memory_intensive_app

# System-wide page fault monitoring for 60 seconds
perf stat -a -e page-faults sleep 60
```

### Memory Bandwidth Analysis

Analyze memory bandwidth utilization:

```bash
# Monitor memory controller (uncore) events, if available; names vary by CPU,
# and uncore events are counted system-wide, hence -a
perf stat -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ ./my_app

# General memory access patterns
perf stat -e cache-references,cache-misses,LLC-loads,LLC-stores ./my_program
```

### NUMA Analysis

On NUMA systems, analyze memory locality:

```bash
# Monitor NUMA events
perf stat -e node-loads,node-load-misses,node-stores,node-store-misses ./numa_app

# Per-node statistics
perf stat -a --per-node sleep 10
```

## Advanced Profiling Techniques

### Call Graph Analysis

Generate detailed call graphs to analyze function relationships:

```bash
# Record call graphs using DWARF unwinding (works without frame pointers; larger data files)
perf record -g --call-graph=dwarf ./my_application

# Record call graphs using frame pointers (code should be built with -fno-omit-frame-pointer)
perf record -g --call-graph=fp ./my_application

# Analyze call graphs with different views
perf report -g 'graph,0.5,caller'
perf report -g 'fractal,0.5,callee'
```

### Flame Graphs

Create flame graphs for visual performance analysis:

```bash
# Get the FlameGraph scripts
git clone https://github.com/brendangregg/FlameGraph

# Record performance data with call graphs
perf record -g ./my_application

# Fold the stacks and render the SVG
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flamegraph.svg
```

### Custom Event Analysis

Define and monitor custom events:

```bash
# List available tracepoints
perf list tracepoint

# Record system call entries (quote the wildcard so the shell does not expand it)
perf record -e 'syscalls:sys_enter_*' ./my_application

# Record scheduler events system-wide for 10 seconds
perf record -e sched:sched_switch -a sleep 10

# Add a user-space probe on a function in the binary;
# perf probe prints the exact name of the event it creates
sudo perf probe -x ./my_application --add my_function
sudo perf record -e probe_my_application:my_function ./my_application
sudo perf probe --del probe_my_application:my_function
```

### Multi-threaded Application Analysis

Analyze threaded applications effectively:

```bash
# Record all threads (new threads are followed by default), then break the report down per thread
perf record -g ./multithreaded_app
perf report --stdio --sort=pid,symbol

# Per-thread counts for a running process over 10 seconds
perf stat --per-thread -p "$(pgrep -n multithreaded_app)" -- sleep 10

# Monitor thread synchronization (futex system calls)
perf record -e syscalls:sys_enter_futex -g ./threaded_app
```

## Interpreting perf Output

Understanding perf output is crucial for effective performance analysis.

### Statistical Output Interpretation

```bash
# Collect the counters needed for IPC and the cache-miss rate
perf stat -e cycles,instructions,cache-references,cache-misses ./my_app
```

Key metrics to analyze:

- IPC (Instructions Per Cycle): Instructions divided by cycles; higher values mean more work completed per cycle, while low values often point to stalls such as waiting on memory
- Cache Miss Rate: Cache misses divided by cache references; lower percentages generally indicate better memory locality
- CPU Utilization: The "CPUs utilized" figure (task-clock divided by elapsed time) shows how many CPUs the workload kept busy on average

### Report Output Analysis

```bash
# Generate a detailed text report sorted by command, shared object, and symbol
perf report --stdio --sort=comm,dso,symbol
```

Report columns:

- Overhead: Percentage of total samples attributed to the entry
- Command: Process name
- Shared Object: Library or executable
- Symbol: Function name

### Call Graph Interpretation

```bash
# Detailed call graph analysis, showing self overhead only (children's cost excluded)
perf report -g --stdio --no-children
```

Call graph symbols:

- `+`: Expandable node (interactive browser)
- `-`: Expanded node (interactive browser)
- `|`: Continuation line
- Numbers: Sample percentages

## Real-World Examples

### Example 1: Web Server Performance Analysis

Analyze a web server's performance under load:

```bash
# Start the web server in the background and remember its PID
./my_web_server &
SERVER_PID=$!

# Generate load with ApacheBench
ab -n 10000 -c 100 http://localhost:8080/ &

# Record the server process for 30 seconds
perf record -g -p "$SERVER_PID" sleep 30

# Analyze the results
perf report -g --stdio

# Focus on network-related functions (partial match on symbol names)
for s in socket recv send; do
  perf report --stdio --symbol-filter="$s"
done
```

### Example 2: Database Query Optimization

Profile database query performance:

```bash
# Record the MySQL server process in the background
perf record -g -p "$(pgrep -x mysqld)" &
PERF_PID=$!

# Execute the problematic query (placeholder table and condition)
mysql -e "SELECT * FROM large_table WHERE complex_condition;"

# Stop recording; SIGINT lets perf finish writing perf.data
kill -INT "$PERF_PID"
wait "$PERF_PID"

# Look for query hotspots ("query_execution" is a placeholder pattern)
perf report -g --stdio | grep -A 20 "query_execution"
```

### Example 3: Compiler Performance Analysis

Analyze compiler performance during large builds:

```bash
# Record the whole build (child processes are followed by default)
perf record -g -e cpu-cycles,cache-misses make -j8

# Identify compilation bottlenecks by shared object and symbol
perf report -g --stdio --sort=dso,symbol

# Focus on specific compilation phases (illustrative patterns; real symbol names depend on the compiler)
for phase in parse optimize codegen; do
  perf report --stdio --symbol-filter="$phase"
done
```

## Troubleshooting Common Issues

### Permission Issues

```bash
# Check the current level (upstream default is 2; some distributions use a
# higher value). Lower values are less restrictive.
cat /proc/sys/kernel/perf_event_paranoid

# Least restrictive: allow (almost) all events for all users.
# Settings changed this way last until the next reboot.
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

# Less permissive alternative for non-root users: allows CPU-wide and kernel
# profiling but still blocks raw tracepoint access
echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid
```

### Missing Symbols

```bash
# Install debug symbols (Debian package names; on Ubuntu the kernel package is
# linux-image-$(uname -r)-dbgsym from the ddebs repository)
sudo apt install libc6-dbg linux-image-$(uname -r)-dbg

# Build with debug information
gcc -g -O2 -o my_program my_program.c
```

### Kernel Symbol Resolution

```bash
# Expose kernel symbol addresses to perf (until the next reboot)
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict

# Install kernel debug symbols (Ubuntu; requires the ddebs repository)
sudo apt install linux-image-$(uname -r)-dbgsym
```

### High Overhead Issues

```bash
# Reduce the sampling frequency to 99 Hz
perf record -F 99 ./my_application

# Sample one explicitly chosen event
perf record -e cpu-cycles ./my_application

# Limit the recording duration; SIGINT lets perf finish writing perf.data
timeout -s INT 30 perf record ./my_application
```

### Storage Space Issues

```bash
# Compress perf data (requires perf built with zstd support)
perf record -z ./my_application

# Limit the data file size
perf record --max-size=100M ./my_application

# Clean up data files older than 7 days
find . -name "perf.data*" -mtime +7 -delete
```

## Best Practices and Tips

### Performance Analysis Workflow

1. Start with High-Level Analysis: Use `perf stat` for an overview
2. Identify Hotspots: Use `perf top` for real-time monitoring
3. Detailed Investigation: Use `perf record` and `perf report`
4. Validate Changes: Compare before and after measurements

### Optimization Guidelines

```bash
# Baseline measurement over 5 runs; perf stat prints to stderr,
# so write the summary with -o instead of redirecting stdout
perf stat -r 5 -o baseline.txt ./my_application

# After optimization
perf stat -r 5 -o optimized.txt ./my_optimized_application

# Compare results
diff baseline.txt optimized.txt
```

### Production Environment Considerations

- Minimize Overhead: Use appropriate sampling rates
- Limit Scope: Focus on specific processes or time windows
- Monitor Impact: Check system load during profiling
- Automate Collection: Script routine performance monitoring

### Advanced Tips

```bash
# Sample an event group: the leader (cpu-cycles) is sampled and the other
# members' counts are read at each sample (:S modifier)
perf record -e '{cpu-cycles,instructions,cache-misses}:S' ./my_app

# Use a hardware breakpoint on a specific (example) address for read/write access
perf record -e mem:0x12345678:rw ./my_application

# Profile only user space (:u modifier)
perf record -e cycles:u ./my_application

# Profile only kernel space (:k modifier)
perf record -e cycles:k ./my_application
```

### Scripting and Automation

```bash
#!/bin/bash
# Performance monitoring script

APP_NAME="./my_application"   # path to the program to profile
DURATION=60                   # seconds
OUTPUT_DIR="/var/log/perf"

# Compute the timestamp once so the data and report file names match
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
DATA_FILE="$OUTPUT_DIR/perf-$TIMESTAMP.data"

# Create output directory
mkdir -p "$OUTPUT_DIR"

# Record performance data; timeout stops the application after $DURATION seconds
perf record -g -o "$DATA_FILE" \
  -e cpu-cycles,cache-misses,page-faults \
  timeout "$DURATION" "$APP_NAME"

# Generate a report from this run's data file only
perf report -i "$DATA_FILE" --stdio > "$OUTPUT_DIR/report-$TIMESTAMP.txt"
```

## Conclusion

The `perf` tool is an indispensable resource for Linux performance analysis, offering comprehensive capabilities for identifying bottlenecks, optimizing applications, and understanding system behavior.
From basic statistical analysis to advanced profiling techniques, perf provides the insights needed to make informed optimization decisions.

Key takeaways from this guide:

- Start Simple: Begin with `perf stat` and `perf top` for initial analysis
- Understand Context: Always consider the broader system context when interpreting results
- Iterate and Validate: Use perf throughout the optimization process to validate improvements
- Combine Techniques: Use multiple perf commands and events for comprehensive analysis
- Document Findings: Keep records of performance analysis for future reference

### Next Steps

To further enhance your performance analysis skills:

1. Explore Advanced Features: Investigate perf scripting and custom event creation
2. Learn Complementary Tools: Combine perf with other profiling tools like valgrind, gprof, and ftrace
3. Study System Architecture: Deepen your understanding of CPU architecture and memory hierarchies
4. Practice Regularly: Apply perf analysis to various types of applications and workloads
5. Stay Updated: Follow perf development and new features in recent kernel versions

Performance optimization is an ongoing process, and perf provides the foundation for data-driven optimization decisions. By mastering these techniques, you'll be well equipped to tackle performance challenges in any Linux environment, from development systems to large-scale production deployments.

Remember that effective performance analysis requires not just technical knowledge of tools like perf, but also an understanding of system architecture, application design, and optimization principles. Continue building this knowledge to become a more effective performance engineer and system optimizer.