Linux Performance Profiling with perf and eBPF
Linux systems often run a variety of applications simultaneously, making performance optimization an important responsibility for system administrators and developers. When a system becomes slow or consumes excessive CPU, memory, or disk resources, identifying the root cause can be challenging. Linux provides advanced performance profiling tools such as perf and eBPF (Extended Berkeley Packet Filter) that help analyze system behavior in real time. These tools allow administrators to observe how the kernel and applications interact, making it easier to identify bottlenecks and optimize system performance.
Unlike basic monitoring tools that only display current resource usage, performance profiling tools collect detailed information about function execution, kernel events, system calls, hardware performance counters, and application behavior. Sampling and in-kernel aggregation keep the overhead low enough for deep analysis on live systems, although the cost grows with the sampling rate and the number of traced events.
Understanding Linux Performance Profiling
Performance profiling is the process of collecting detailed runtime information about a system or application to determine where processing time and system resources are being used. Instead of guessing why a program is slow, profiling provides measurable data that helps locate inefficient code, excessive system calls, scheduling delays, or hardware limitations.
Profiling helps answer questions such as:
- Which function consumes the most CPU time?
- Why is an application responding slowly?
- Which processes generate excessive disk activity?
- What system calls are executed most frequently?
- Are kernel operations causing delays?
- Is network traffic affecting application performance?
Linux performance profiling provides answers by observing the operating system while it is actively running.
Introduction to perf
The perf utility is the performance analysis tool developed in the Linux kernel source tree (under tools/perf) and usually shipped by distributions as a separate package. It uses hardware performance counters and kernel tracing mechanisms to collect detailed performance statistics.
Perf combines hardware counters on modern processors with software events and tracepoints in the kernel to measure events such as:
- CPU cycles
- Instructions executed
- Cache hits and cache misses
- Branch prediction accuracy
- Context switches
- System calls
- Page faults
- Memory accesses
- Scheduler activity
Perf can analyze both user-space applications and kernel-space operations.
Features of perf
Some important capabilities of perf include:
CPU Performance Analysis
Perf measures how efficiently the processor executes instructions.
Example:
perf stat ls
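By default, this command reports statistics including:
- Task clock (CPU time used)
- Context switches and page faults
- CPU cycles
- Number of executed instructions
- Branch instructions and branch misses
Cache references and cache misses are not part of the default set. Request them explicitly, for example with perf stat -e cache-references,cache-misses ls.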
These values help determine processor efficiency.
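As a rough illustration of how these counters translate into efficiency figures, the sketch below derives instructions per cycle (IPC) and miss rates. The counter values are invented for the example and are not taken from a real run.

```python
# Hypothetical counter values, as might be read off a `perf stat` summary.
counters = {
    "cycles": 2_000_000,
    "instructions": 3_000_000,
    "cache-references": 50_000,
    "cache-misses": 5_000,
    "branches": 600_000,
    "branch-misses": 12_000,
}


def ratio(numerator: str, denominator: str) -> float:
    """Return counters[numerator] / counters[denominator], or 0.0 if the denominator is zero."""
    denom = counters[denominator]
    return counters[numerator] / denom if denom else 0.0


ipc = ratio("instructions", "cycles")                         # higher is usually better
cache_miss_rate = ratio("cache-misses", "cache-references")   # lower is better
branch_miss_rate = ratio("branch-misses", "branches")         # lower is better

print(f"IPC: {ipc:.2f}")
print(f"cache miss rate: {cache_miss_rate:.1%}")
print(f"branch miss rate: {branch_miss_rate:.1%}")
```

A low IPC together with a high cache miss rate usually suggests the workload is waiting on memory rather than computing, though what counts as "low" depends on the processor.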
Function-Level Profiling
Perf identifies which functions consume the most execution time.
Example:
perf record ./application
After execution, perf record leaves its samples in a file named perf.data in the current directory. Read them with:
perf report
The report lists functions ordered by CPU usage.
Example:
40% process_data()
25% sort_records()
15% write_output()
Developers can immediately identify which functions require optimization.
Sampling-Based Profiling
Perf periodically samples running processes rather than recording every instruction.
Advantages include:
- Low overhead
- Minimal performance impact
- Suitable for production environments
- Long-duration monitoring
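Sampling provides a statistical picture of application behavior: functions that run longer collect proportionally more samples, so the estimate improves with run length and sampling rate while the system is slowed only slightly.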
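The idea behind sampling can be sketched in a few lines. This is a toy model, not how perf is implemented: the timeline, the function names (borrowed from the report example above, plus a made-up "other" entry), and the 5 ms period are all assumptions.

```python
from collections import Counter

# Hypothetical execution timeline: (function name, time spent in ms), in order.
timeline = [
    ("process_data", 40),
    ("sort_records", 25),
    ("write_output", 15),
    ("other", 20),
]


def function_at(t_ms: float) -> str:
    """Return the function that is running at time t_ms."""
    elapsed = 0
    for name, duration in timeline:
        elapsed += duration
        if t_ms < elapsed:
            return name
    raise ValueError("time is past the end of the timeline")


def sample(period_ms: float) -> Counter:
    """Take one sample per period, at the midpoint of each period."""
    total = sum(duration for _, duration in timeline)
    hits = Counter()
    t = period_ms / 2
    while t < total:
        hits[function_at(t)] += 1
        t += period_ms
    return hits


hits = sample(period_ms=5)
n = sum(hits.values())
for name, count in hits.most_common():
    print(f"{count / n:6.1%}  {name}")
```

Only 20 samples are taken over 100 ms, yet the shares match the time each function actually ran. Real workloads are less regular, so short runs or low sampling rates give noisier estimates.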
Hardware Performance Counters
Modern processors include hardware counters that measure low-level processor activity.
Examples include:
- CPU cycles
- Executed instructions
- Cache misses
- Cache references
- Branch mispredictions
- Bus cycles
Perf reads these counters through the kernel's perf_events subsystem rather than touching the hardware registers itself.
Example:
perf stat ./program
The summary reports counters such as:
- CPU cycles
- Instructions
- Cache misses (when requested with -e cache-misses)
- Context switches
- Page faults
These metrics help identify hardware-related performance issues.
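For scripted analysis, perf stat can emit comma-separated output with the -x option (for example, perf stat -x, ./program). The sketch below parses text in that shape. The sample text is made up, and the assumed field order (value, unit, event name, run time, percentage running, optional metric fields) follows the perf-stat manual page, so verify it against the perf version in use.

```python
import csv
import io

# Made-up text in the shape `perf stat -x,` emits. Assumed field order:
# value, unit, event name, counter run time, percent of time running, [metric, metric unit].
SAMPLE = """\
1200000,,cycles,1000000,100.00
1800000,,instructions,1000000,100.00,1.50,insn per cycle
4000,,cache-misses,1000000,100.00
<not supported>,,bus-cycles,0,100.00
"""


def parse_perf_stat_csv(text: str) -> dict:
    """Return {event name: count}, skipping events that were not counted."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or row[0].startswith("#"):
            continue                      # blank line, comment, or unexpected shape
        value, event = row[0], row[2]
        if value.isdigit():               # "<not supported>" and similar are skipped
            counts[event] = int(value)
    return counts


counts = parse_perf_stat_csv(SAMPLE)
for event, value in counts.items():
    print(f"{event:15s} {value:>12,}")
```

Note that perf stat writes its summary to standard error by default, so a script that captures the output needs to read that stream or pass -o to write to a file.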
Kernel Profiling
Linux kernel operations influence application performance.
Perf profiles:
- Scheduler activity
- Interrupt handling
- Device drivers
- Memory management
- File systems
- Network stack
Administrators can determine whether performance issues originate in user applications or within the kernel itself.
Call Graph Analysis
A call graph shows how functions invoke one another during execution.
Example:
main()
|
|-- read_file()
|
|-- process_data()
|
|-- calculate()
|
|-- compress()
Perf generates call graphs showing where execution time is spent throughout the application's function hierarchy.
Command:
perf record -g ./application
Then:
perf report
The report displays nested function calls with CPU usage percentages.
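With call graphs recorded, perf report can show two percentages per function: the share spent in the function plus everything it calls (Children), and the share spent in the function's own code (Self). The sketch below computes both from a handful of stack samples. The stacks and counts are hypothetical, and the nesting of the functions is assumed for illustration.

```python
from collections import Counter

# Hypothetical sampled call stacks, outermost frame first, with sample counts.
samples = {
    ("main", "read_file"): 10,
    ("main", "process_data"): 20,
    ("main", "process_data", "calculate"): 50,
    ("main", "process_data", "compress"): 20,
}

self_cost = Counter()   # samples where the function was the innermost frame
inclusive = Counter()   # samples where the function appeared anywhere in the stack

for stack, count in samples.items():
    self_cost[stack[-1]] += count
    for func in set(stack):            # set() so recursive frames are counted once
        inclusive[func] += count

total = sum(samples.values())
for func, count in inclusive.most_common():
    print(f"{func:14s} children {count / total:6.1%}   self {self_cost[func] / total:6.1%}")
```

Here main has a high inclusive share but no self cost, which marks it as a caller of expensive code rather than a hotspot itself. The function with the largest self share is the one whose own instructions are worth optimizing.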
Flame Graphs
Flame graphs provide a visual representation of CPU usage across function call stacks. They make it easy to identify performance bottlenecks by displaying frequently executed functions as wider blocks.
Characteristics of flame graphs include:
- Wider blocks represent functions consuming more CPU time.
- Stacked blocks show the sequence of function calls.
- Developers can quickly locate hotspots in complex applications.
Flame graphs are commonly generated using perf data and external visualization tools.
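Flame graph generators such as Brendan Gregg's FlameGraph scripts typically consume a "folded" text format: one line per unique stack, frames joined by semicolons, followed by a sample count. The sketch below shows that folding step on hypothetical samples. It does not read perf.data itself.

```python
from collections import Counter

# Hypothetical raw samples: one call stack per sample, outermost frame first.
raw_samples = (
    [["main", "read_file"]] * 2
    + [["main", "process_data", "calculate"]] * 5
    + [["main", "process_data", "compress"]] * 3
)


def fold(stacks):
    """Merge identical stacks into 'frame;frame;frame count' lines."""
    counts = Counter(";".join(stack) for stack in stacks)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


folded = fold(raw_samples)
print("\n".join(folded))
```

Each count becomes the width of the topmost block for that stack, and frames shared by several stacks (main and process_data here) merge into one wider block beneath them.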
Introduction to eBPF
Extended Berkeley Packet Filter (eBPF) is a modern Linux kernel technology that allows safe execution of custom programs inside the kernel without modifying kernel source code.
Originally designed for network packet filtering, eBPF has evolved into a powerful framework for:
- Performance analysis
- System tracing
- Networking
- Security monitoring
- Observability
eBPF enables administrators to collect detailed runtime information with minimal overhead.
How eBPF Works
The process involves several steps:
- A user creates an eBPF program.
- The program is verified by the kernel to ensure safety.
- The verified program is loaded into the kernel.
- It attaches to specific events, such as system calls, network activity, or function execution.
- Data is collected whenever those events occur.
- Results are sent back to user-space applications for analysis.
This architecture allows powerful monitoring while maintaining system stability.
Safety Features of eBPF
Since eBPF programs run within the kernel, Linux enforces strict safety measures:
- Code verification before execution
- Rejection of unbounded loops (bounded loops are accepted on kernel 5.3 and later)
- Memory safety checks
- Restricted kernel access
- Controlled execution time
These safeguards are designed to keep eBPF programs from crashing or compromising the kernel.
Common Uses of eBPF
System Call Tracing
Every application interacts with the kernel using system calls.
Examples include:
- open()
- read()
- write()
- close()
- fork()
- execve()
eBPF can monitor how often these calls occur and how long they take.
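eBPF tracing tools commonly measure system call latency by storing a timestamp at syscall entry, keyed by thread ID, and subtracting it at exit. The sketch below mirrors that bookkeeping in plain Python over hypothetical trace records. In a real tool the same logic would run inside the kernel with an eBPF map in place of the dictionary.

```python
from collections import defaultdict

# Hypothetical trace records: (timestamp in ns, thread id, "enter" or "exit", syscall name).
events = [
    (1_000, 7, "enter", "read"),
    (1_200, 8, "enter", "write"),
    (1_500, 7, "exit", "read"),
    (1_900, 8, "exit", "write"),
    (2_000, 7, "enter", "read"),
    (2_300, 7, "exit", "read"),
]

start = {}                      # (thread id, syscall) -> entry timestamp
count = defaultdict(int)        # syscall -> number of completed calls
total_ns = defaultdict(int)     # syscall -> summed latency in ns

for ts, tid, kind, name in events:
    key = (tid, name)
    if kind == "enter":
        start[key] = ts
    elif key in start:          # ignore exits whose entry was not seen
        count[name] += 1
        total_ns[name] += ts - start.pop(key)

for name in sorted(count):
    print(f"{name}: {count[name]} calls, avg {total_ns[name] / count[name]:.0f} ns")
```

Keying by thread ID matters because calls from different threads interleave, as the read and write calls do in this sample.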
CPU Profiling
eBPF identifies which functions consume processor time, helping developers locate inefficient code.
Memory Analysis
eBPF tracks:
- Memory allocations
- Memory leaks
- Page faults
- Buffer usage
- Heap activity
This information is valuable for optimizing memory-intensive applications.
Network Monitoring
eBPF provides detailed insights into:
- Packet flow
- Connection latency
- TCP retransmissions
- Bandwidth usage
- Packet drops
- Firewall behavior
Network administrators use eBPF for troubleshooting and performance tuning.
Disk I/O Monitoring
eBPF measures:
- Read latency
- Write latency
- Queue depth
- Block device utilization
- File system delays
These metrics help diagnose storage bottlenecks.
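Latency figures like these are often summarized as power-of-two histograms, the form produced by bpftrace's hist() function, because an average hides slow outliers. The sketch below builds such a histogram in Python from hypothetical latencies. It illustrates the bucketing only and does not collect data from the kernel.

```python
from collections import Counter

# Hypothetical block I/O latencies in microseconds.
latencies_us = [3, 5, 6, 12, 15, 40, 70, 90, 100, 900]


def log2_bucket(value: int) -> int:
    """Return the lower bound of the power-of-two bucket that contains value."""
    return 1 << (value.bit_length() - 1) if value > 0 else 0


hist = Counter(log2_bucket(v) for v in latencies_us)

for low in sorted(hist):
    high = low * 2 if low else 1
    print(f"[{low:>4}, {high:>4}) {'#' * hist[low]}")
```

The single 900 microsecond request lands in its own bucket far from the rest, which is the kind of outlier a storage investigation usually starts from.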
Popular eBPF Tools
Several tools built on eBPF simplify performance analysis:
bpftrace
A high-level tracing language for creating performance analysis scripts.
Example:
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("File opened\n"); }'
BCC (BPF Compiler Collection)
BCC provides a collection of ready-to-use tracing tools and Python libraries for building custom eBPF applications.
Common BCC tools include:
- execsnoop
- opensnoop
- biosnoop
- fileslower
- tcptop
- runqlat
bpftool
An official Linux utility used to inspect and manage eBPF programs, maps, and related objects.
Example:
bpftool prog show
This command lists all loaded eBPF programs.
Comparing perf and eBPF
| Feature | perf | eBPF |
|---|---|---|
| CPU profiling | Yes | Yes |
| Kernel tracing | Yes | Yes |
| System call monitoring | Limited | Extensive |
| Network tracing | Basic | Advanced |
| Custom event monitoring | Limited | Extensive |
| Runtime flexibility | Moderate | High |
| Performance overhead | Low (sampling) | Low (depends on event rate) |
| Custom programmability | Limited | Excellent |
Benefits of Using perf and eBPF
Using perf and eBPF offers several advantages:
- Identifies CPU bottlenecks accurately.
- Detects inefficient algorithms and functions.
- Monitors applications with minimal overhead.
- Provides deep visibility into kernel operations.
- Supports real-time performance analysis.
- Helps optimize memory and disk usage.
- Improves network performance troubleshooting.
- Reduces application response times.
- Enables proactive detection of performance issues.
- Assists in capacity planning and system tuning.
Best Practices
To achieve effective performance profiling:
- Begin with basic monitoring tools to identify general resource usage before using advanced profilers.
- Use perf stat for overall performance metrics and perf record with perf report for detailed CPU analysis.
- Employ eBPF tools for real-time tracing of system calls, networking, and kernel events.
- Profile systems under realistic workloads to capture representative data.
- Compare profiling results before and after optimizations to verify improvements.
- Avoid collecting unnecessary trace data, as it can make analysis more complex.
- Keep the Linux kernel and profiling tools updated to benefit from the latest performance features and enhancements.
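When comparing results before and after an optimization, a single run on each side can mislead because timings vary from run to run. The sketch below compares repeated measurements and applies a deliberately crude noise check. The timings are hypothetical and the two-standard-deviation threshold is an assumption, not a rigorous statistical test.

```python
from statistics import mean, stdev

# Hypothetical wall-clock times (seconds) from repeated runs of the same workload.
before = [2.10, 2.05, 2.12, 2.08, 2.11]
after = [1.80, 1.83, 1.79, 1.82, 1.81]

b_mean, b_sd = mean(before), stdev(before)
a_mean, a_sd = mean(after), stdev(after)
change = (a_mean - b_mean) / b_mean

print(f"before: {b_mean:.3f} s (stdev {b_sd:.3f})")
print(f"after:  {a_mean:.3f} s (stdev {a_sd:.3f})")
print(f"change: {change:+.1%}")

# Crude check: trust the change only if it clearly exceeds the run-to-run spread.
significant = abs(a_mean - b_mean) > 2 * max(b_sd, a_sd)
print("likely a real difference" if significant else "within measurement noise")
```

perf stat can gather such repeated measurements itself: its -r option reruns the command a given number of times and reports the variation alongside each counter.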
Conclusion
Linux Performance Profiling with perf and eBPF provides a comprehensive approach to understanding how applications and the operating system use system resources. While perf excels at measuring CPU performance, function execution, and hardware events, eBPF extends observability by enabling safe, programmable tracing of kernel and application behavior in real time. Together, these tools allow administrators and developers to diagnose performance bottlenecks, optimize applications, improve system reliability, and maintain high-performance Linux environments with minimal overhead.