
# How to Profile Performance Counters with Linux Perf: Complete Guide to perf stat, perf record, and perf report

## Table of Contents

1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Understanding Performance Counters](#understanding-performance-counters)
4. [Getting Started with perf stat](#getting-started-with-perf-stat)
5. [Advanced Performance Recording with perf record](#advanced-performance-recording-with-perf-record)
6. [Analyzing Results with perf report](#analyzing-results-with-perf-report)
7. [Practical Examples and Use Cases](#practical-examples-and-use-cases)
8. [Advanced Profiling Techniques](#advanced-profiling-techniques)
9. [Troubleshooting Common Issues](#troubleshooting-common-issues)
10. [Best Practices and Professional Tips](#best-practices-and-professional-tips)
11. [Conclusion](#conclusion)

## Introduction

Performance profiling is a critical skill for developers, system administrators, and performance engineers who need to optimize applications and understand system behavior. The Linux `perf` tool provides a powerful interface to hardware and software performance counters, enabling detailed analysis of program execution characteristics.

This comprehensive guide will teach you how to effectively use the three core perf commands: `perf stat` for collecting performance statistics, `perf record` for capturing detailed profiling data, and `perf report` for analyzing the collected information. By the end of this article, you'll have the knowledge to identify performance bottlenecks, understand CPU behavior, and optimize your applications using real-world profiling techniques.

Whether you're debugging a slow application, optimizing critical code paths, or conducting performance research, mastering these perf tools will significantly enhance your ability to understand and improve system performance.
## Prerequisites

Before diving into performance profiling with perf, make sure the following requirements are met.

### System Requirements

- Linux operating system: perf is available on most modern Linux distributions
- Kernel version: Linux kernel 2.6.31 or later (recommended: 4.0+)
- Hardware: x86_64, ARM, or another supported architecture with a performance monitoring unit (PMU)

### Software Installation

Install the perf tools on your system:

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r)

# CentOS/RHEL/Fedora
sudo yum install perf
# or, on newer versions
sudo dnf install perf

# Arch Linux
sudo pacman -S perf
```

### Permissions and Security

Configure the permissions needed for profiling:

```bash
# Check the current perf_event_paranoid setting
cat /proc/sys/kernel/perf_event_paranoid

# Temporarily allow profiling (requires root)
sudo sysctl kernel.perf_event_paranoid=1

# For a persistent configuration, add the setting to /etc/sysctl.conf
echo 'kernel.perf_event_paranoid = 1' | sudo tee -a /etc/sysctl.conf
```

Lower values grant unprivileged users more access. At `1`, users can profile their own processes, including the kernel code those processes run, but cannot use CPU-wide (system-wide) events. At `2`, kernel profiling is also disallowed. Some distributions add stricter, non-upstream levels.

### Debug Symbols

Install debug symbols for better profiling results:

```bash
# Ubuntu/Debian
sudo apt-get install libc6-dbg

# Enable the Ubuntu debug-symbol (ddebs) repository if needed.
# Double quotes are required so that $(lsb_release -cs) expands to the release codename.
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" | \
  sudo tee -a /etc/apt/sources.list.d/ddebs.list
# Then install the archive signing key and refresh the package lists
sudo apt-get install ubuntu-dbgsym-keyring
sudo apt-get update
```

## Understanding Performance Counters

Performance counters are special hardware and software registers that track various system events during program execution. Understanding these counters is essential for effective profiling.
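On Linux, each source of counters is registered with the kernel as a PMU. The registered PMUs appear as directories under `/sys/bus/event_source/devices`. On x86 you will usually see `cpu` for the core PMU, alongside entries such as `software` and `tracepoint`. The sketch below lists them together with the numeric `type` value that identifies each PMU to the perf_event interface. `list_pmus` is a hypothetical helper written for illustration and is not part of perf. The optional directory argument exists only so the function can be pointed at a fake tree for testing.

```bash
# list_pmus: hypothetical helper (not part of perf) that prints each
# registered PMU name and its numeric perf_event "type" value.
# DIR defaults to the real sysfs location; pass another path for testing.
list_pmus() {
    local dir=${1:-/sys/bus/event_source/devices}
    local pmu
    for pmu in "$dir"/*; do
        # Skip entries without a readable "type" file
        [ -r "$pmu/type" ] || continue
        printf '%s\t%s\n' "$(basename "$pmu")" "$(cat "$pmu/type")"
    done
}
```

Running `list_pmus` with no argument on a real system shows the PMU names that can appear in `pmu/.../` event specifications such as `cpu/event=0x3c,umask=0x00/`.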
### Hardware Performance Counters

Modern CPUs include Performance Monitoring Units (PMUs) that track events such as:

- CPU cycles: total processor cycles consumed
- Instructions: number of instructions executed
- Cache events: L1, L2, and L3 cache hits and misses
- Branch events: branch predictions and mispredictions
- Memory events: memory loads, stores, and stalls

### Software Performance Counters

The Linux kernel also provides software counters for:

- Context switches: task-switching frequency
- Page faults: virtual memory page faults
- System calls: kernel system call invocations (strictly speaking, these are exposed as `syscalls:*` tracepoints rather than software counters)
- CPU migrations: process movement between CPU cores

### Counter Categories

Perf organizes counters into several categories:

```bash
# List available events
perf list

# Show hardware events
perf list hardware

# Show software events
perf list software

# Show cache events
perf list cache

# Show tracepoint events
perf list tracepoint
```

## Getting Started with perf stat

The `perf stat` command provides a simple way to collect basic performance statistics for any command or running process. It is the natural starting point for performance analysis.
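The ratios in the comment column of `perf stat` output, such as "insn per cycle" or "% of all branches", are derived from the raw counts, so you can recompute them yourself from the CSV output produced by `-x,`. The sketch below assumes the CSV layout of recent perf versions, in which field 1 is the counter value and field 3 is the event name. `perf_stat_ratios` is a hypothetical helper, not a perf command, and it does not handle hybrid-CPU event names such as `cpu_core/cycles/`.

```bash
# perf_stat_ratios: hypothetical helper (not part of perf) that derives common
# ratios from `perf stat -x,` CSV output read on stdin.
# Assumes field 1 = counter value and field 3 = event name.
perf_stat_ratios() {
    awk -F, '
        # Skip comments, blank lines, and "<not counted>"/"<not supported>" rows
        /^#/ || NF < 3 || $1 ~ /^</ { next }
        {
            name = $3
            sub(/:.*/, "", name)    # strip modifiers such as ":u"
            count[name] = $1
        }
        END {
            if (count["cycles"] > 0 && ("instructions" in count))
                printf "IPC: %.2f\n", count["instructions"] / count["cycles"]
            if (count["branches"] > 0 && ("branch-misses" in count))
                printf "branch-miss rate: %.2f%%\n", 100 * count["branch-misses"] / count["branches"]
            if (count["cache-references"] > 0 && ("cache-misses" in count))
                printf "cache-miss rate: %.2f%%\n", 100 * count["cache-misses"] / count["cache-references"]
        }'
}
```

Example usage: `perf stat -x, -o stats.csv -e cycles,instructions,branches,branch-misses ./myprogram && perf_stat_ratios < stats.csv`. Applied to the counts in the sample `ls -la` output under Simple Performance Measurement, the helper reports an IPC of 0.68 and a branch-miss rate of 2.27%, which agrees with perf's own comment column.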
### Basic Usage

The fundamental syntax for perf stat is:

```bash
perf stat [options] <command> [arguments]
```

### Simple Performance Measurement

Start with a basic performance measurement of a command:

```bash
# Profile a simple command
perf stat ls -la

# Profile a compilation process
perf stat gcc -O2 myprogram.c -o myprogram

# Profile a custom application
perf stat ./myprogram
```

Example output:

```
 Performance counter stats for 'ls -la':

              1.23 msec task-clock          #    0.891 CPUs utilized
                 0      context-switches    #    0.000 K/sec
                 0      cpu-migrations      #    0.000 K/sec
               156      page-faults         #    0.127 M/sec
         4,234,567      cycles              #    3.441 GHz
         2,876,543      instructions        #    0.68  insn per cycle
           543,210      branches            #  441.634 M/sec
            12,345      branch-misses       #    2.27% of all branches

       0.001381 seconds time elapsed

       0.001234 seconds user
       0.000147 seconds sys
```

### Customizing Measured Events

Specify custom events to measure:

```bash
# Measure specific hardware events
perf stat -e cycles,instructions,cache-references,cache-misses ./myprogram

# Measure cache performance
perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores ./myprogram

# Measure branch prediction
perf stat -e branches,branch-misses,branch-loads,branch-load-misses ./myprogram
```

### Profiling Running Processes

Profile already-running processes by process ID (replace `<PID>` with a real process ID):

```bash
# Profile a running process for 10 seconds
perf stat -p <PID> sleep 10

# Profile multiple processes
perf stat -p <PID1>,<PID2>,<PID3> sleep 5

# Profile all processes of a specific command
# (pgrep -d, prints the matching PIDs as a comma-separated list)
perf stat -p "$(pgrep -d, myprogram)" sleep 10
```

### Statistical Analysis Options

Improve statistical reliability with multiple runs:

```bash
# Run the command 10 times and report averaged counts with their variation
perf stat -r 10 ./myprogram

# Show detailed statistics
perf stat -d ./myprogram

# Show very detailed statistics (more counters)
perf stat -dd ./myprogram

# Show extremely detailed statistics
perf stat -ddd ./myprogram
```

### Output Formatting

Control the output format for further analysis:

```bash
# CSV format for scripting
perf stat -x, ./myprogram

# JSON format (newer perf versions)
perf stat -j ./myprogram

# Save output to file
perf stat -o stats.txt ./myprogram

# Append to an existing file
perf stat --append -o stats.txt ./myprogram
```

## Advanced Performance Recording with perf record

While `perf stat` provides summary statistics, `perf record` captures detailed profiling data, including call stacks, instruction pointers, and timing information.

### Basic Recording

Start with basic profiling data collection:

```bash
# Basic profiling with default settings (writes ./perf.data)
perf record ./myprogram

# Sample at about 1000 Hz (the default frequency is usually 4000 Hz)
perf record -F 1000 ./myprogram

# Profile a specific event
perf record -e cycles ./myprogram
```

### Sampling Configuration

Configure sampling behavior for different analysis needs:

```bash
# Sample once every 1000 occurrences of the event (cycles by default);
# periods this small produce a very large number of samples
perf record -c 1000 ./myprogram

# Sample at 500 Hz
perf record -F 500 ./myprogram

# Record kernel-space samples only, system-wide, while the command runs
perf record -a --all-kernel ./myprogram
```

### Call Stack Recording

Capture call stack information for detailed analysis:

```bash
# Record call graphs using frame pointers
perf record -g ./myprogram

# Record call graphs using DWARF debug info
perf record --call-graph dwarf ./myprogram

# Record call graphs using Last Branch Record (recent Intel CPUs)
perf record --call-graph lbr ./myprogram

# Limit the call stack depth when reporting
perf report -g --max-stack 10
```

### Multi-Event Recording

Record multiple events simultaneously:

```bash
# Record multiple hardware events
perf record -e cycles,instructions,cache-misses ./myprogram

# Record with a different sampling period per event
perf record -e cycles/period=1000/,instructions/period=10000/ ./myprogram

# Record hardware and software events together
perf record -e cycles,context-switches,page-faults ./myprogram
```

### Process and Thread Targeting

Target specific processes or threads (replace `<PID>` and `<TID>` with real IDs):

```bash
# Record a specific running process
perf record -p <PID>

# Record a specific thread
perf record -t <TID>

# Record all processes system-wide
perf record -a

# Child processes and threads are followed by default; --no-inherit disables this
perf record --no-inherit ./myprogram
```

When perf is attached with `-p`, `-t`, or `-a` and no command is given, it records until you press Ctrl+C.

### Advanced Recording Options

Use advanced features for
specialized analysis:

```bash
# Record sample timestamps
perf record -T ./myprogram

# Collect samples only while running on CPUs 0-3
perf record -C 0-3 ./myprogram

# Kernel and user space are both recorded by default;
# restrict with event modifiers (:u = user only, :k = kernel only)
perf record -e cycles:u ./myprogram

# Record with branch stack sampling
perf record -b ./myprogram

# Record with data address sampling
perf record -d ./myprogram
```

### Output File Management

Manage profiling data files effectively:

```bash
# Specify the output file
perf record -o profile.data ./myprogram

# Compress the output with Zstandard (requires perf built with zstd support)
perf record -z ./myprogram

# Stop recording once the output reaches about 100 MB
perf record --max-size=100M ./myprogram

# Run perf's collection with SCHED_FIFO real-time priority 1
perf record --realtime=1 ./myprogram
```

## Analyzing Results with perf report

The `perf report` command analyzes data collected by `perf record`, providing detailed insights into program performance characteristics.

### Basic Report Generation

Generate basic performance reports:

```bash
# Generate a report from the default perf.data file
perf report

# Generate a report from a specific file
perf report -i profile.data

# Show the report with call graphs
perf report -g

# Sort by different criteria
perf report --sort=comm,dso,symbol
```

### Interactive Report Navigation

Navigate reports interactively:

```bash
# Start the interactive text UI
perf report --tui

# Use plain stdio output for scripting
perf report --stdio

# Use the GTK graphical UI (only if perf was built with GTK support)
perf report --gtk
```

### Call Graph Analysis

Analyze call stack information:

```bash
# Show call graphs as a tree, hiding entries below 0.5%, in caller order
perf report -g graph,0.5,caller

# Show a flat call graph
perf report -g flat

# Show a fractal call graph (percentages relative to each parent)
perf report -g fractal

# Limit call graph depth
perf report -g --max-stack 5
```

### Filtering and Focusing

Filter reports for specific analysis:

```bash
# Filter by command
perf report --comms=myprogram

# Filter by shared library
perf report --dsos=libc.so.6

# Filter by symbol
perf report --symbols=main,compute_function

# Filter by CPU
perf report --cpu=0,1

# Show only kernel symbols
perf report --dsos='[kernel.kallsyms]'

# For user-space-only results, record with the :u modifier instead
perf record -e cycles:u ./myprogram && perf report
```

### Statistical Analysis

Generate statistical summaries:
```bash
# Show header information (system and recording details)
perf report --header

# Show sample statistics
perf report --stats

# Sort by sample period
perf report --sort=period

# Hide entries below a percentage threshold
perf report --percent-limit=1.0
```

### Output Formatting

Customize report output:

```bash
# Generate field-separated output
perf report --field-separator=,

# Show full paths
perf report --full-paths

# Show source code annotations
perf report --source

# Show raw instruction encodings in assembly annotations
perf report --asm-raw

# Choose which columns to show
perf report --fields=overhead,comm,dso,sym
```

## Practical Examples and Use Cases

This section demonstrates real-world applications of perf profiling across different scenarios and performance optimization challenges.

### Example 1: CPU-Intensive Application Analysis

Analyze a compute-heavy application:

```bash
# Create a CPU-intensive test program
cat > cpu_intensive.c << 'EOF'
#include <stdio.h>
#include <math.h>

double compute_intensive(int iterations) {
    double result = 0.0;
    for (int i = 0; i < iterations; i++) {
        result += sin(i) * cos(i) * sqrt(i + 1);
    }
    return result;
}

int main(void) {
    printf("Result: %f\n", compute_intensive(10000000));
    return 0;
}
EOF
gcc -O2 -g cpu_intensive.c -o cpu_intensive -lm

# Profile with basic statistics
perf stat -d ./cpu_intensive

# Record detailed profiling data
perf record -g -F 1000 ./cpu_intensive

# Analyze the results
perf report -g graph,0.5,caller --stdio
```

### Example 2: Memory Access Pattern Analysis

Examine memory usage patterns:

```bash
# Create a memory-intensive test program
cat > memory_test.c << 'EOF'
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_SIZE 10000000

int main(void) {
    int *array = malloc(ARRAY_SIZE * sizeof(int));
    if (array == NULL) {
        perror("malloc");
        return 1;
    }

    // Sequential access
    for (int i = 0; i < ARRAY_SIZE; i++) {
        array[i] = i;
    }

    // Random access
    for (int i = 0; i < ARRAY_SIZE; i++) {
        int idx = rand() % ARRAY_SIZE;
        array[idx] += 1;
    }

    // Use the data so the compiler cannot discard the loops
    printf("array[0] = %d\n", array[0]);
    free(array);
    return 0;
}
EOF
gcc -O2 -g memory_test.c -o memory_test

# Profile cache behavior
perf stat -e cache-references,cache-misses,L1-dcache-loads,L1-dcache-load-misses ./memory_test

# Record memory-access samples (data addresses plus memory level);
# this needs hardware support such as Intel PEBS or AMD IBS
perf mem record ./memory_test

# Analyze memory access patterns by memory level and symbol
perf report --mem-mode --sort=mem,symbol
```

### Example 3: Multi-threaded Application Profiling

Profile parallel applications:

```bash
# Create a multi-threaded test program
cat > parallel_test.c << 'EOF'
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *worker_thread(void *arg) {
    int thread_id = *(int *)arg;
    for (int i = 0; i < 1000000; i++) {
        // Simulate work
        volatile int x = i * thread_id;
        (void)x;
    }
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int thread_ids[4];

    for (int i = 0; i < 4; i++) {
        thread_ids[i] = i;
        pthread_create(&threads[i], NULL, worker_thread, &thread_ids[i]);
    }
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}
EOF
gcc -O2 -g -pthread parallel_test.c -o parallel_test

# Profile all CPUs
perf record -a -g ./parallel_test

# Analyze per-CPU performance
perf report --sort=cpu,comm,symbol
```

### Example 4: System Call Analysis

Examine system call overhead (`./file_operations` stands for any I/O-heavy program of your own):

```bash
# Record system call entry tracepoints (quote the glob so the shell does not expand it)
perf record -e 'syscalls:sys_enter_*' ./file_operations

# Summarize system call counts and latencies
perf trace -s ./file_operations

# Record context switches together with all syscall tracepoints
perf record -e 'context-switches,syscalls:*' ./file_operations
```

### Example 5: Web Server Performance Analysis

Profile a running web server:

```bash
# Find the web server's process IDs
pgrep nginx

# Start generating load in the background so the profile captures busy workers
ab -n 10000 -c 100 http://localhost/ &

# Profile all nginx processes for 30 seconds (pgrep -d, joins the PIDs with commas)
perf record -g -p "$(pgrep -d, nginx)" sleep 30

# Analyze web server performance
perf report -g --sort=overhead,comm,symbol
```

## Advanced Profiling Techniques

Explore sophisticated profiling methods for complex performance analysis scenarios.
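The simplest comparison between two builds does not need sampled profiles at all. Comparing `perf stat` totals is often enough to spot a regression. Unlike `perf diff`, which only prints differences, a small script can turn that comparison into a pass/fail exit status. The sketch below makes three assumptions: both files were written with `perf stat -x, -o FILE`, they use the CSV layout of recent perf versions (count in field 1, event name in field 3), and the `check_regression` name, arguments, and exit codes are hypothetical choices made for illustration.

```bash
# check_regression: hypothetical helper (not part of perf).
# Usage: check_regression BASELINE.csv CURRENT.csv EVENT MAX_PERCENT
# Prints the relative change of EVENT; returns 1 if it grew by more than
# MAX_PERCENT, 2 if the event is missing or zero, and 0 otherwise.
check_regression() {
    awk -F, -v ev="$3" -v max="$4" '
        # Skip comments, blank lines, and "<not counted>" rows
        /^#/ || NF < 3 || $1 ~ /^</ { next }
        {
            name = $3
            sub(/:.*/, "", name)            # strip modifiers such as ":u"
            if (name != ev) next
            if (FILENAME == ARGV[1]) base = $1
            else cur = $1
        }
        END {
            if (base == "" || cur == "" || base + 0 <= 0) {
                print "missing or zero count for " ev
                exit 2
            }
            pct = 100 * (cur - base) / base
            printf "%s: %.1f%%\n", ev, pct
            exit (pct > max ? 1 : 0)
        }' "$1" "$2"
}
```

In a CI job this might look like `perf stat -x, -o ci.csv -e cycles,instructions ./test_suite` followed by `check_regression baseline.csv ci.csv instructions 5 || exit 1`. For a fixed binary and input, instruction counts tend to be more stable than cycle counts, which vary with clock frequency and cache state.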
### Differential Profiling

Compare performance between different versions or configurations:

```bash
# Profile the baseline version
perf record -o baseline.data ./myprogram_v1

# Profile the optimized version
perf record -o optimized.data ./myprogram_v2

# Compare profiles
perf diff baseline.data optimized.data

# Show absolute differences
perf diff -c delta-abs baseline.data optimized.data
```

### Event Correlation Analysis

Analyze relationships between different performance events:

```bash
# Record multiple correlated events
perf record -e cycles,instructions,cache-misses,branch-misses -g ./myprogram

# Show overhead split into kernel (sys) and user (us) parts
perf report --sort=overhead_sys,overhead_us

# Feed raw samples to your own analysis script (custom_correlation_script.py is a placeholder)
perf script | ./custom_correlation_script.py
```

### Hardware-Specific Profiling

Use processor-specific performance monitoring features:

```bash
# Intel Processor Trace (Intel CPUs with PT support)
perf record -e intel_pt// ./myprogram

# AMD Instruction-Based Sampling (AMD CPUs with IBS), system-wide while the command runs
perf record -a -e ibs_op// ./myprogram

# ARM PMU names vary by SoC (for example armv8_pmuv3_0); list the available ones
ls /sys/bus/event_source/devices/
```

### Kernel Profiling

Profile kernel-level performance:

```bash
# Profile kernel code system-wide for 10 seconds
sudo perf record -a -g --all-kernel -- sleep 10

# Profile a specific kernel subsystem (memory allocator tracepoints)
sudo perf record -e 'kmem:*' -a -- sleep 10

# Analyze kernel call stacks
sudo perf report -g --dsos='[kernel.kallsyms]' --sort=overhead,symbol
```

### Custom Event Definition

Define and use raw hardware events. The codes below are Intel-specific: `0xc0` is instructions retired and `0x3c` is unhalted core cycles.

```bash
# Create a custom event configuration, one event per line
cat > custom_events.txt << 'EOF'
cpu/event=0x3c,umask=0x00,name=cpu_clk_unhalted/
cpu/event=0xc0,umask=0x00,name=inst_retired/
EOF

# Join the lines with commas (no trailing comma) and use the events
perf record -e "$(paste -sd, custom_events.txt)" ./myprogram
```

## Troubleshooting Common Issues

Address frequent problems encountered during performance profiling with practical solutions.

### Permission and Security Issues

Problem: Permission denied errors when running perf commands.
Solution:

```bash
# Check the current paranoid level
cat /proc/sys/kernel/perf_event_paranoid

# Temporarily reduce the paranoid level (as root)
sudo sysctl kernel.perf_event_paranoid=1

# Persist the setting in /etc/sysctl.conf
echo 'kernel.perf_event_paranoid = 1' | sudo tee -a /etc/sysctl.conf

# Add your user to a perf_users group, if your system has been set up with one
sudo usermod -a -G perf_users $USER
```

### Missing Debug Symbols

Problem: Reports show only addresses instead of function names.

Solution:

```bash
# Install debug packages
sudo apt-get install libc6-dbg

# For your own programs, compile with debug info
gcc -g -O2 myprogram.c -o myprogram

# Check whether symbols are available
objdump -t myprogram | head -20

# Resolve an address to a function name manually (0x401234 is an example address)
addr2line -e myprogram -f -C 0x401234
```

### High Overhead and Performance Impact

Problem: Profiling significantly affects program performance.

Solution:

```bash
# Reduce the sampling frequency (the default is usually 4000 Hz)
perf record -F 100 ./myprogram

# Use period-based sampling: one sample every 10000 events
perf record -c 10000 ./myprogram

# Profile for shorter durations
timeout 10s perf record -g ./long_running_program

# Prefer frame-pointer call graphs: --call-graph dwarf copies stack memory on every sample
perf record -g ./myprogram
```

### Large Profile Data Files

Problem: perf.data files become too large to handle.

Solution:

```bash
# Compress profile data with Zstandard (requires perf built with zstd support)
perf record -z ./myprogram

# Stop recording once the output reaches about 100 MB
perf record --max-size=100M ./myprogram

# Filter tracepoint events in the kernel during recording (only reads larger than 4 KiB)
perf record -e syscalls:sys_enter_read --filter 'count > 4096' ./myprogram

# Split long recordings into multiple files of about 100 MB each
perf record --switch-output=100M ./myprogram
```

### Incomplete Call Stacks

Problem: Call graphs show incomplete or missing stack frames.
Solution:

```bash
# Use DWARF unwinding for more complete stacks
perf record --call-graph dwarf ./myprogram

# Increase the stack dump size copied per sample (in bytes)
perf record --call-graph dwarf,16384 ./myprogram

# Compile with frame pointers so the default -g unwinder works
gcc -fno-omit-frame-pointer -g -O2 myprogram.c -o myprogram

# Use Last Branch Record (recent Intel CPUs)
perf record --call-graph lbr ./myprogram
```

### Event Not Supported Errors

Problem: Specific performance events are not available.

Solution:

```bash
# List available events for your system
perf list | grep -i cache

# Check which PMUs the kernel has registered
ls /sys/bus/event_source/devices/

# On x86, check that the CPU advertises architectural performance monitoring
grep -o -m1 arch_perfmon /proc/cpuinfo

# Use generic events instead of model-specific ones (for example, instead of cache events)
perf record -e cycles,instructions ./myprogram

# Check the kernel configuration (/proc/config.gz requires CONFIG_IKCONFIG_PROC)
zcat /proc/config.gz | grep PERF
# ...or use the config file shipped with your distribution kernel
grep PERF /boot/config-$(uname -r)
```

### Profiling Container Applications

Problem: Difficulty profiling applications running in containers.

Solution:

```bash
# Profile system-wide from the host while the container runs
sudo perf record -a -g docker run mycontainer

# Profile a specific process inside the container
# (perf must be installed in the image, and the container's seccomp profile must allow perf_event_open)
docker exec -it mycontainer bash
perf record -g ./myprogram

# Use privileged containers for full profiling
docker run --privileged --pid=host mycontainer

# Share a host directory so perf data written in the container is reachable
docker run -v /tmp:/tmp --pid=host mycontainer
```

## Best Practices and Professional Tips

Implement professional-grade performance profiling with these expert recommendations and industry best practices.
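Several settings that this section recommends controlling are easy to forget between runs: the CPU frequency governor, ASLR, and the perf_event_paranoid level. The sketch below prints those settings as a quick pre-run check, so the output can be saved alongside a profile. `profiling_env_report` is a hypothetical helper. It reads the standard procfs/sysfs locations, and the optional root argument exists only for testing. The cpufreq governor files are often absent on virtual machines, in which case no governor line is printed.

```bash
# profiling_env_report: hypothetical helper that prints settings which commonly
# skew perf measurements. ROOT defaults to "" (the real filesystem).
profiling_env_report() {
    local root=${1:-}
    local f

    # ASLR: 0 = disabled, 2 = full randomization (the usual default)
    f="$root/proc/sys/kernel/randomize_va_space"
    [ -r "$f" ] && echo "randomize_va_space=$(cat "$f")"

    # Lower perf_event_paranoid values grant unprivileged users more access
    f="$root/proc/sys/kernel/perf_event_paranoid"
    [ -r "$f" ] && echo "perf_event_paranoid=$(cat "$f")"

    # One line per distinct cpufreq governor, with the number of CPUs using it
    for f in "$root"/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] && cat "$f"
    done | sort | uniq -c | while read -r n gov; do
        echo "governor=$gov cpus=$n"
    done
}
```

A typical use is `profiling_env_report > env.txt` right before `perf record`, which keeps the environment description next to the data it describes.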
### Profiling Methodology

Establish Baseline Measurements:

```bash
# Always create a baseline profile before optimizing
perf record -o baseline.data -g ./myprogram

# Document system state during profiling
uname -a > system_info.txt
cat /proc/cpuinfo >> system_info.txt
cat /proc/meminfo >> system_info.txt
```

Use a Consistent Environment:

- Disable CPU frequency scaling during profiling
- Set the CPU governor to performance mode
- Minimize background processes
- Use dedicated profiling systems when possible

```bash
# Set the performance governor
sudo cpupower frequency-set -g performance

# Disable address space layout randomization (the usual default value is 2)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

# Pin to CPU 0 for consistent results
taskset -c 0 perf record -g ./myprogram
```

### Optimization Workflow

Follow a Systematic Approach:

1. Profile first: always profile before optimizing
2. Focus on hotspots: optimize the most time-consuming functions first
3. Measure impact: verify optimization effectiveness with profiling
4. Iterate: repeat the profile-optimize-measure cycle

```bash
# Workflow example
perf record -g -o baseline.data ./myprogram      # Initial profile
perf report -i baseline.data --stdio | head -20  # Identify hotspots
# ... make optimizations ...
perf record -g -o optimized.data ./myprogram     # Profile the optimized version
perf diff baseline.data optimized.data           # Compare results
```

### Advanced Configuration

Kernel Configuration for Optimal Profiling:

```bash
# Check relevant kernel options
zcat /proc/config.gz | grep -E "(PERF|PMU|TRACING)"

# Required:
#   CONFIG_PERF_EVENTS=y
# Selected automatically on architectures that support perf events:
#   CONFIG_HAVE_PERF_EVENTS=y
# (CONFIG_PERF_USE_VMALLOC is an internal, architecture-selected option, not a requirement)
```

System Tuning for Profiling:

```bash
# Cap the maximum sampling rate at 1024 Hz (the default is 100000, and the kernel may lower it automatically)
echo 1024 | sudo tee /proc/sys/kernel/perf_event_max_sample_rate

# Raise the per-user limit (in KiB) on locked perf ring-buffer memory
echo 32768 | sudo tee /proc/sys/kernel/perf_event_mlock_kb

# Configure core dump handling
echo core | sudo tee /proc/sys/kernel/core_pattern
```

### Automated Profiling Scripts

Create reusable profiling scripts:

```bash
#!/bin/bash
# profile_application.sh - Automated profiling script
# Usage: ./profile_application.sh <command> [args...]

if [ $# -lt 1 ]; then
    echo "usage: $0 <command> [args...]" >&2
    exit 1
fi

OUTPUT_DIR="profile_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTPUT_DIR"
echo "Starting comprehensive profiling of $*"

# System information
uname -a > "$OUTPUT_DIR/system_info.txt"
cat /proc/cpuinfo > "$OUTPUT_DIR/cpu_info.txt"
cat /proc/meminfo > "$OUTPUT_DIR/memory_info.txt"

# Basic statistics
echo "Collecting basic statistics..."
perf stat -d -o "$OUTPUT_DIR/basic_stats.txt" "$@"

# Detailed profiling
echo "Recording detailed profile..."
perf record -g -F 1000 -o "$OUTPUT_DIR/detailed.data" "$@"

# Cache analysis
echo "Analyzing cache performance..."
perf record -e cache-references,cache-misses,L1-dcache-loads,L1-dcache-load-misses \
    -o "$OUTPUT_DIR/cache.data" "$@"

# Generate reports
echo "Generating reports..."
perf report -i "$OUTPUT_DIR/detailed.data" --stdio > "$OUTPUT_DIR/detailed_report.txt"
perf report -i "$OUTPUT_DIR/cache.data" --stdio > "$OUTPUT_DIR/cache_report.txt"

echo "Profiling complete. Results in $OUTPUT_DIR/"
```

### Performance Analysis Guidelines

Interpreting Results:

- CPU utilization: look for low instructions-per-cycle (IPC) ratios
- Cache performance: high cache miss rates indicate memory bottlenecks
- Branch prediction: high branch miss rates suggest unpredictable code paths
- Context switches: frequent switches may indicate synchronization issues

Common Performance Patterns:

```bash
# Identify CPU-bound vs memory-bound workloads
perf stat -e cycles,instructions,cache-references,cache-misses ./myprogram

# Key metrics (rules of thumb; typical values vary by CPU and workload):
#   IPC = instructions / cycles                            (higher is better, often 0.5-2.0)
#   Cache miss rate = cache-misses / cache-references      (lower is better; on Intel CPUs these
#                                                           generic events count last-level-cache accesses)
#   Branch prediction rate = 1 - branch-misses / branches  (higher is better, typically >95%)
```

### Integration with Development Workflow

Continuous Performance Monitoring:

```bash
# Add performance regression checks to CI/CD
perf record -o ci_profile.data ./test_suite
# perf diff only reports differences; it does not fail the build by itself
perf diff baseline_profile.data ci_profile.data
```

Code Review Integration:

- Include performance profiles in code reviews
- Set performance budgets for critical functions
- Automate performance regression detection

### Security Considerations

Protect Sensitive Information:

```bash
# Avoid profiling sensitive applications in production.
# Low-frequency sampling collects fewer samples and so exposes less data
perf record -F 10 ./sensitive_app

# Sanitize profile data before sharing
perf script | sed 's/sensitive_function/REDACTED/g' > sanitized_profile.txt
```

Access Control:

- Limit perf access to authorized personnel
- Use dedicated profiling environments
- Implement audit logging for profiling activities

## Conclusion

Performance profiling with Linux perf tools provides invaluable insights into application behavior and system performance characteristics.
Through this comprehensive guide, you've learned to effectively use `perf stat` for basic performance measurement, `perf record` for detailed data collection, and `perf report` for thorough analysis.

### Key Takeaways

Essential Skills Acquired:

- Understanding performance counters and their significance
- Configuring and executing performance measurements with perf stat
- Capturing detailed profiling data using perf record
- Analyzing and interpreting results with perf report
- Troubleshooting common profiling challenges
- Implementing professional profiling practices

Performance Optimization Process:

1. Measure: use perf stat to identify performance characteristics
2. Profile: apply perf record to capture detailed execution data
3. Analyze: employ perf report to understand performance bottlenecks
4. Optimize: make targeted improvements based on profiling insights
5. Validate: verify optimization effectiveness through comparative profiling

### Next Steps

Advanced Topics to Explore:

- Custom performance event development
- Hardware-specific profiling features
- Integration with other profiling tools (Valgrind, Intel VTune, etc.)
- Performance monitoring in production environments
- Automated performance regression detection

Recommended Practice:

- Start with simple applications to build profiling skills
- Create a library of profiling scripts for common scenarios
- Establish performance baselines for critical applications
- Integrate profiling into your regular development workflow

Resources for Continued Learning:

- Linux kernel perf documentation
- Processor vendor optimization manuals
- Performance engineering communities and conferences
- Open-source performance analysis tools and frameworks

By mastering these perf profiling techniques, you've gained powerful capabilities for understanding, analyzing, and optimizing software performance.
These skills will prove invaluable whether you're developing high-performance applications, troubleshooting system issues, or conducting performance research. Remember that effective profiling is both an art and a science – combine these technical tools with systematic methodology and domain expertise to achieve optimal results. The journey to performance optimization excellence continues beyond this guide. Apply these techniques consistently, stay curious about performance characteristics, and always validate your optimizations with thorough measurement. Your applications and users will benefit significantly from the performance insights and improvements you can now confidently deliver.