How to Analyze Performance with perf in Linux
Performance analysis is a critical skill for system administrators, developers, and DevOps engineers working with Linux systems. The `perf` tool, which is developed in the Linux kernel source tree, profiles applications, helps identify bottlenecks, and measures system-wide behavior. This guide covers perf from basic profiling to advanced analysis techniques.
Table of Contents
- [Introduction to perf](#introduction-to-perf)
- [Prerequisites and Installation](#prerequisites-and-installation)
- [Understanding perf Fundamentals](#understanding-perf-fundamentals)
- [Basic perf Commands](#basic-perf-commands)
- [CPU Performance Analysis](#cpu-performance-analysis)
- [Memory Performance Analysis](#memory-performance-analysis)
- [Advanced Profiling Techniques](#advanced-profiling-techniques)
- [Interpreting perf Output](#interpreting-perf-output)
- [Real-World Examples](#real-world-examples)
- [Troubleshooting Common Issues](#troubleshooting-common-issues)
- [Best Practices and Tips](#best-practices-and-tips)
- [Conclusion](#conclusion)
Introduction to perf
The `perf` tool is a Linux profiling toolkit that leverages hardware performance counters and kernel tracepoints to provide detailed insights into system and application performance. Originally developed as part of the Linux kernel source tree, perf offers a comprehensive suite of commands for collecting, analyzing, and visualizing performance data.
Unlike instrumentation-based profilers, which can add significant overhead, perf relies on hardware counters and kernel facilities, so its impact on the measured workload is usually small. This makes it suitable for production analysis and live monitoring, provided sampling rates and call-graph options are chosen with care.
Key Features of perf
- Hardware Performance Counters: Access to CPU performance monitoring units (PMUs)
- Software Events: Kernel and application-level event tracking
- Statistical Sampling: Low-overhead profiling through sampling
- Call Graph Analysis: Function call relationship visualization
- Multi-core Support: System-wide and per-CPU analysis
- Flexible Output Formats: Various reporting and visualization options
Prerequisites and Installation
Before diving into perf analysis, ensure your system meets the necessary requirements and has the appropriate tools installed.
System Requirements
- Linux kernel version 2.6.31 or later (recommended: 3.0+)
- Root or sudo privileges for system-wide profiling
- Debug symbols for detailed analysis (optional but recommended)
- Sufficient disk space for data collection
Installing perf
The installation process varies depending on your Linux distribution:
Ubuntu/Debian
```bash
# Install perf tools
sudo apt update
sudo apt install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

# Install debug symbols (optional)
sudo apt install libc6-dbg
```
CentOS/RHEL/Fedora
```bash
# Install perf tools (older CentOS/RHEL releases)
sudo yum install perf

# On Fedora and newer CentOS/RHEL releases
sudo dnf install perf

# Install debug info (optional; the debuginfo repository must be enabled)
sudo yum install glibc-debuginfo
```
Arch Linux
```bash
# Install perf tools
sudo pacman -S perf

# Install debug packages from AUR if needed
yay -S glibc-debug
```
Verifying Installation
Confirm perf is properly installed and functional:
```bash
# Check perf version
perf --version

# List available events
perf list

# Test basic functionality
perf stat ls
```
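Before a longer session it can help to confirm that the tools used in this guide are on `PATH`. The following is a minimal sketch; `check_tool` is a hypothetical helper written for this example, not something perf provides.

```bash
# check_tool NAME: report whether a command is available on PATH.
# (Hypothetical helper for illustration.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# perf is required; ab and git are only needed for some later examples.
for tool in perf ab git; do
    check_tool "$tool"
done
```

A "missing" line for `perf` usually means the package for the running kernel is not installed.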
Understanding perf Fundamentals
Before using perf effectively, it's essential to understand its core concepts and terminology.
Performance Events
Perf monitors various types of events:
1. Hardware Events: CPU cycles, instructions, cache misses, branch mispredictions
2. Software Events: Context switches, page faults, CPU migrations
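3. Tracepoint Events: Static instrumentation points built into the kernel, such as scheduler switches, system call entry/exit, and block I/O
4. Dynamic Events: User-defined probe points placed at runtime on kernel functions (kprobes) or user-space functions (uprobes)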
Sampling vs. Counting
Perf operates in two primary modes:
- Counting Mode: Accumulates event counts over time
- Sampling Mode: Periodically records event occurrences with context
Key perf Subcommands
- `perf stat`: Statistical counting of events
- `perf record`: Record performance data to a file
- `perf report`: Analyze recorded data
- `perf top`: Real-time performance monitoring
- `perf trace`: System call tracing
- `perf annotate`: Instruction-level (and, with debug info, source-level) view of where samples landed
Basic perf Commands
Let's explore the fundamental perf commands that form the foundation of performance analysis.
perf stat - Statistical Analysis
The `perf stat` command provides high-level performance statistics:
```bash
# Basic statistics for a command
perf stat ./my_application

# System-wide statistics for 10 seconds
perf stat -a sleep 10

# Specific events monitoring
perf stat -e cycles,instructions,cache-misses ./my_program

# Per-CPU statistics
perf stat -a -A sleep 5
```
Example output:
```
 Performance counter stats for './my_application':

     1,234,567,890      cycles                    #    2.500 GHz
       987,654,321      instructions              #    0.80  insn per cycle
        12,345,678      cache-references          #   25.000 M/sec
         1,234,567      cache-misses              #   10.00% of all cache refs
            493.83 msec task-clock                #    0.998 CPUs utilized

       0.494820114 seconds time elapsed
```
perf top - Real-time Monitoring
Monitor system performance in real-time:
```bash
# System-wide real-time profiling
perf top

# Focus on a specific process (replace <PID> with the process ID)
perf top -p <PID>

# Monitor specific events
perf top -e cache-misses

# Show call graphs
perf top -g
```
perf record and perf report
Record performance data for detailed analysis:
```bash
# Record performance data (written to ./perf.data by default)
perf record ./my_application

# Record with call graphs
perf record -g ./my_application

# System-wide recording for 30 seconds
perf record -a sleep 30

# Record specific events
perf record -e cpu-cycles,cache-misses ./my_program

# Analyze recorded data
perf report

# Interactive report with call graphs
perf report -g
```
CPU Performance Analysis
CPU performance analysis is one of the most common use cases for perf. Let's explore various techniques for identifying CPU bottlenecks and optimization opportunities.
Identifying CPU Hotspots
Use perf to identify functions consuming the most CPU time:
```bash
# Record CPU cycles with call graphs
perf record -g -e cpu-cycles ./cpu_intensive_app

# Generate detailed report
perf report -g --stdio

# Focus on a specific symbol
perf report --symbols=my_function
```
Cache Performance Analysis
Analyze cache behavior to identify memory access patterns:
```bash
# Monitor cache events
perf stat -e cache-references,cache-misses,L1-dcache-loads,L1-dcache-load-misses ./my_app

# Record cache miss samples
perf record -e cache-misses -g ./my_application

# Analyze cache miss patterns
perf report -g --stdio
```
Example cache analysis:
```bash
# Comprehensive cache analysis
perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-icache-loads,L1-icache-load-misses,LLC-loads,LLC-load-misses ./my_program
```
Branch Prediction Analysis
Examine branch prediction performance:
```bash
# Monitor branch events
perf stat -e branches,branch-misses ./my_application

# Print only the derived metrics, including the branch miss rate
perf stat -e branches,branch-misses --metric-only ./my_program
```
CPU Utilization Patterns
Analyze CPU utilization across cores:
```bash
# Per-CPU statistics
perf stat -a -A -e cpu-cycles sleep 10

# Monitor CPU migrations
perf stat -e cpu-migrations ./my_threaded_app

# Track context switches
perf stat -e context-switches ./my_application
```
Memory Performance Analysis
Memory performance significantly impacts overall system performance. Perf provides several tools for memory analysis.
Page Fault Analysis
Monitor page fault behavior:
```bash
# Monitor page faults
perf stat -e page-faults,minor-faults,major-faults ./my_application

# Record page fault events
perf record -e page-faults -g ./memory_intensive_app

# System-wide page fault monitoring
perf stat -a -e page-faults sleep 60
```
Memory Bandwidth Analysis
Analyze memory bandwidth utilization:
```bash
# Monitor memory controller events (if available on this CPU; check `perf list`)
perf stat -e uncore_imc/data_reads/,uncore_imc/data_writes/ ./my_app

# General memory access patterns
perf stat -e cache-references,cache-misses,LLC-loads,LLC-stores ./my_program
```
NUMA Analysis
For NUMA systems, analyze memory locality:
```bash
# Monitor NUMA events
perf stat -e node-loads,node-load-misses,node-stores,node-store-misses ./numa_app

# Per-node statistics
perf stat -a --per-node sleep 10
```
Advanced Profiling Techniques
Call Graph Analysis
Generate detailed call graphs for function relationship analysis:
```bash
# Record with detailed call graphs (DWARF unwinding; needs debug info)
perf record -g --call-graph=dwarf ./my_application

# Alternative call graph method (frame pointers)
perf record -g --call-graph=fp ./my_application

# Analyze call graphs with different views
perf report -g 'graph,0.5,caller'
perf report -g 'fractal,0.5,callee'
```
Flame Graphs
Create flame graphs for visual performance analysis:
```bash
# Install FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph

# Record performance data
perf record -g ./my_application

# Generate flame graph
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg
```
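The intermediate output of `stackcollapse-perf.pl` is plain text: one line per unique stack, with frames separated by semicolons and a trailing sample count. That makes it easy to inspect without rendering an SVG. The sketch below sums samples by leaf frame (the function that was actually running); `top_leaf_frames` and the sample stacks are made up for illustration.

```bash
# top_leaf_frames: read folded stacks on stdin and print
# "samples function" for each leaf frame, largest first.
# (Hypothetical helper; the input format is the stackcollapse-perf.pl output.)
top_leaf_frames() {
    awk '{
        count = $NF                      # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # drop the trailing count
        n = split(stack, frames, ";")    # frames[n] is the leaf
        leaf[frames[n]] += count
    }
    END { for (f in leaf) print leaf[f], f }' | sort -rn
}

# Illustrative folded stacks, not real measurements.
top_leaf_frames <<'EOF'
my_application;main;handle_request;parse_header 30
my_application;main;handle_request;write_response 10
my_application;main;load_config;parse_header 5
EOF
```

With real data the same function can be placed in the pipeline after `./stackcollapse-perf.pl` in place of `./flamegraph.pl`.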
Custom Event Analysis
Define and monitor custom events:
```bash
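# List available tracepoints
perf list tracepoint

# Monitor system calls (quote the glob so the shell does not expand it)
perf record -e 'syscalls:sys_enter_*' ./my_application

# Monitor scheduler events
perf record -e sched:sched_switch -a sleep 10

# Custom probe points: -x places a uprobe on a function in the given binary.
# perf probe prints the resulting event name; probe_my_application is assumed here.
sudo perf probe -x ./my_application --add my_function
sudo perf record -e probe_my_application:my_function ./my_application
sudo perf probe --del probe_my_application:my_function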
```
Multi-threaded Application Analysis
Analyze threaded applications effectively:
```bash
# Per-thread analysis
perf record -g --per-thread ./multithreaded_app

# Thread-specific statistics
perf stat --per-thread ./multithreaded_app

# Monitor thread synchronization
perf record -e syscalls:sys_enter_futex -g ./threaded_app
```
Interpreting perf Output
Understanding perf output is crucial for effective performance analysis.
Statistical Output Interpretation
```bash
# Example perf stat output analysis
perf stat -e cycles,instructions,cache-references,cache-misses ./my_app
```
Key metrics to analyze:
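- IPC (Instructions Per Cycle): instructions divided by cycles. Higher values mean the CPU retires more work per cycle; low values often point to stalls on memory or mispredicted branches
- Cache Miss Rate: cache-misses divided by cache-references. Lower percentages indicate better memory locality
- CPU Utilization: task-clock divided by elapsed time, reported as "CPUs utilized". A value near 1.0 means one CPU was busy for the whole run; lower values mean the program spent time waiting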
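As a worked example, the two ratios can be computed from raw counts with a few lines of shell. This is a minimal sketch; `ratio` is a hypothetical helper, and the counts are the illustrative values from the example `perf stat` output earlier in this guide.

```bash
# ratio NUM DEN [SCALE]: print NUM / DEN * SCALE with two decimals.
# SCALE defaults to 1; pass 100 for a percentage.
ratio() {
    awk -v n="$1" -v d="$2" -v s="${3:-1}" 'BEGIN {
        if (d == 0) { print "n/a"; exit }
        printf "%.2f\n", n / d * s
    }'
}

# Illustrative counts, as reported by perf stat.
cycles=1234567890
instructions=987654321
cache_refs=12345678
cache_misses=1234567

echo "IPC:             $(ratio "$instructions" "$cycles")"          # 0.80
echo "Cache miss rate: $(ratio "$cache_misses" "$cache_refs" 100)%" # 10.00%
```

`perf stat` prints the same derived values in its right-hand column; computing them by hand is mainly useful when post-processing saved counts.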
Report Output Analysis
```bash
# Generate detailed report
perf report --stdio --sort=comm,dso,symbol
```
Report columns explanation:
- Overhead: Percentage of total samples
- Command: Process name
- Shared Object: Library or executable
- Symbol: Function name
Call Graph Interpretation
```bash
# Detailed call graph analysis
perf report -g --stdio --no-children
```
Call graph symbols:
- `+`: Expandable node
- `-`: Expanded node
- `|`: Continuation line
- Numbers: Sample percentages
Real-World Examples
Example 1: Web Server Performance Analysis
Analyze a web server's performance under load:
```bash
# Start the web server in the background
./my_web_server &
SERVER_PID=$!

# Generate load
ab -n 10000 -c 100 http://localhost:8080/ &

# Record the server process for 30 seconds
perf record -g -p "$SERVER_PID" -- sleep 30

# Analyze results
perf report -g --stdio

# Focus on network-related functions (names must match symbols in the report)
perf report --symbols=socket,recv,send
```
Example 2: Database Query Optimization
Profile database query performance:
```bash
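# Record the database server in the background
# (assumes a single server process named mysqld)
perf record -g -p "$(pgrep -o mysqld)" &
PERF_PID=$!

# Execute problematic query
mysql -e "SELECT * FROM large_table WHERE complex_condition;"

# Stop recording; SIGINT lets perf finish writing perf.data
kill -INT "$PERF_PID"
wait "$PERF_PID"

# Analyze query hotspots
perf report -g --stdio | grep -A 20 "query_execution"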
```
Example 3: Compiler Performance Analysis
Analyze compiler performance during large builds:
```bash
# Record compilation process
perf record -g -e cpu-cycles,cache-misses make -j8

# Identify compilation bottlenecks
perf report -g --stdio --sort=dso,symbol

# Focus on specific compilation phases (replace with symbol names from the report)
perf report --symbols=parse,optimize,codegen
```
Troubleshooting Common Issues
Permission Issues
```bash
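# Permission errors usually come from a restrictive perf_event_paranoid setting.
# Least restrictive: allow unprivileged users access to almost all events
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

# More conservative temporary setting for non-root users (resets on reboot)
echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid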
```
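The value written to `perf_event_paranoid` selects how much unprivileged users may observe. The sketch below summarizes the levels described in the kernel's sysctl documentation; `describe_paranoid` is a hypothetical helper, and values above 2 are treated as distribution-specific because some distributions patch in stricter levels.

```bash
# describe_paranoid LEVEL: summarize what unprivileged users may do.
# (Hypothetical helper; wording paraphrases the kernel sysctl documentation.)
describe_paranoid() {
    lvl=$1
    if [ "$lvl" -le -1 ]; then
        echo "no restrictions for unprivileged users"
    elif [ "$lvl" -eq 0 ]; then
        echo "raw tracepoint access restricted"
    elif [ "$lvl" -eq 1 ]; then
        echo "CPU-wide event access restricted"
    elif [ "$lvl" -eq 2 ]; then
        echo "kernel profiling restricted; user-space profiling only"
    else
        echo "distribution-specific level; unprivileged use may be blocked"
    fi
}

# Report the current setting if the file is readable on this system.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    current=$(cat "$paranoid_file")
    echo "perf_event_paranoid=$current: $(describe_paranoid "$current")"
fi
```

Each level also includes the restrictions of the levels below it, so a value of 2 implies the limits listed for 0 and 1.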
Missing Symbols
```bash
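# Install debug symbols (Debian package names; Ubuntu uses -dbgsym for the kernel)
sudo apt install libc6-dbg linux-image-$(uname -r)-dbg

# Build with debug information; keeping frame pointers helps perf unwind call stacks
gcc -g -O2 -fno-omit-frame-pointer -o my_program my_program.c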
```
Kernel Symbol Resolution
```bash
# Enable kernel symbol resolution
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict

# Install kernel debug symbols (Ubuntu; requires the debug symbol repository)
sudo apt install linux-image-$(uname -r)-dbgsym
```
High Overhead Issues
```bash
# Reduce sampling frequency to 99 Hz
perf record -F 99 ./my_application

# Sample a single event rather than several at once
perf record -e cpu-cycles ./my_application

# Limit recording duration; SIGINT lets perf write out perf.data cleanly
timeout -s INT 30 perf record ./my_application
```
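Sample volume, and with it both overhead and `perf.data` size, grows linearly with the sampling frequency, the recording time, and the number of busy CPUs. A rough upper bound can be worked out before recording; `estimate_samples` is a hypothetical helper and the figures are illustrative.

```bash
# estimate_samples FREQ_HZ SECONDS CPUS
# Upper bound on the number of samples, assuming every CPU is busy throughout.
estimate_samples() {
    echo $(( $1 * $2 * $3 ))
}

estimate_samples 99 30 8     # prints 23760
estimate_samples 999 30 8    # prints 239760
```

Idle CPUs generate far fewer cycle samples, so real recordings are usually well under this bound.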
Storage Space Issues
```bash
# Compress perf data
perf record -z ./my_application

# Limit file size
perf record --max-size=100M ./my_application

# Clean up data files older than 7 days
find . -name "perf.data*" -mtime +7 -delete
```
Best Practices and Tips
Performance Analysis Workflow
1. Start with High-Level Analysis: Use `perf stat` for overview
2. Identify Hotspots: Use `perf top` for real-time monitoring
3. Detailed Investigation: Use `perf record` and `perf report`
4. Validate Changes: Compare before and after measurements
Optimization Guidelines
```bash
# Baseline measurement (perf stat prints to stderr, so use -o to write a file)
perf stat -r 5 -o baseline.txt ./my_application

# After optimization
perf stat -r 5 -o optimized.txt ./my_optimized_application

# Compare results
diff baseline.txt optimized.txt
```
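A plain `diff` shows that numbers changed but not by how much. With `perf stat -x,` the counters are written as CSV, where the first field is the value and the third is the event name, which makes a percentage comparison straightforward. The sketch below assumes that layout; `event_count`, `pct_change`, and the sample rows are made up for illustration.

```bash
# event_count FILE EVENT: print the counter value for EVENT from perf stat CSV output.
event_count() {
    awk -F, -v ev="$2" '$3 == ev { print $1 }' "$1"
}

# pct_change OLD NEW: signed percentage change from OLD to NEW.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (b - a) / a * 100 }'
}

# Illustrative stand-ins for files that would be produced by:
#   perf stat -x, -o baseline.csv  -e cycles,instructions ./my_application
#   perf stat -x, -o optimized.csv -e cycles,instructions ./my_optimized_application
tmp=$(mktemp -d)
printf '%s\n' '1000000,,cycles,250000,100.00' \
              '800000,,instructions,250000,100.00' > "$tmp/baseline.csv"
printf '%s\n' '900000,,cycles,225000,100.00' \
              '800000,,instructions,225000,100.00' > "$tmp/optimized.csv"

for ev in cycles instructions; do
    old=$(event_count "$tmp/baseline.csv" "$ev")
    new=$(event_count "$tmp/optimized.csv" "$ev")
    echo "$ev: $old -> $new ($(pct_change "$old" "$new"))"
done
rm -rf "$tmp"
```

The event name must match the third field exactly; perf may print a modified name such as `cycles:u` when it can only count user-space events.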
Production Environment Considerations
- Minimize Overhead: Use appropriate sampling rates
- Limit Scope: Focus on specific processes or time windows
- Monitor Impact: Check system load during profiling
- Automate Collection: Script routine performance monitoring
Advanced Tips
```bash
# Sample a group of events together (the leader samples; the others are read at each sample)
perf record -e '{cpu-cycles,instructions,cache-misses}:S' ./my_app

# Use hardware breakpoints for specific addresses
perf record -e mem:0x12345678:rw ./my_application

# Collect only user-space call chains
perf record -g --user-callchains ./my_application

# Collect only kernel-space call chains
perf record -g --kernel-callchains ./my_application
```
Scripting and Automation
```bash
#!/bin/bash
# Performance monitoring script
APP_NAME="my_application"
DURATION=60
OUTPUT_DIR="/var/log/perf"
STAMP=$(date +%Y%m%d-%H%M%S)
DATA_FILE="$OUTPUT_DIR/perf-$STAMP.data"

# Create output directory
mkdir -p "$OUTPUT_DIR"

# Record performance data
perf record -g -o "$DATA_FILE" \
    -e cpu-cycles,cache-misses,page-faults \
    timeout "$DURATION" "$APP_NAME"

# Generate a report for this run only
perf report -i "$DATA_FILE" --stdio > "$OUTPUT_DIR/report-$STAMP.txt"
```
Conclusion
The `perf` tool is an indispensable resource for Linux performance analysis, offering comprehensive capabilities for identifying bottlenecks, optimizing applications, and understanding system behavior. From basic statistical analysis to advanced profiling techniques, perf provides the insights needed to make informed optimization decisions.
Key takeaways from this guide:
- Start Simple: Begin with `perf stat` and `perf top` for initial analysis
- Understand Context: Always consider the broader system context when interpreting results
- Iterate and Validate: Use perf throughout the optimization process to validate improvements
- Combine Techniques: Use multiple perf commands and events for comprehensive analysis
- Document Findings: Keep records of performance analysis for future reference
Next Steps
To further enhance your performance analysis skills:
1. Explore Advanced Features: Investigate perf scripting and custom event creation
2. Learn Complementary Tools: Combine perf with other profiling tools like valgrind, gprof, and ftrace
3. Study System Architecture: Deepen understanding of CPU architecture and memory hierarchies
4. Practice Regularly: Apply perf analysis to various types of applications and workloads
5. Stay Updated: Follow perf development and new features in recent kernel versions
Performance optimization is an ongoing process, and perf provides the foundation for data-driven optimization decisions. By mastering these techniques, you'll be well-equipped to tackle performance challenges in any Linux environment, from development systems to large-scale production deployments.
Remember that effective performance analysis requires not just technical knowledge of tools like perf, but also understanding of system architecture, application design, and optimization principles. Continue building this knowledge to become a more effective performance engineer and system optimizer.