Linux Performance Profiling with perf and eBPF

Linux systems often run a variety of applications simultaneously, making performance optimization an important responsibility for system administrators and developers. When a system becomes slow or consumes excessive CPU, memory, or disk resources, identifying the root cause can be challenging. Linux provides advanced performance profiling tools such as perf and eBPF (Extended Berkeley Packet Filter) that help analyze system behavior in real time. These tools allow administrators to observe how the kernel and applications interact, making it easier to identify bottlenecks and optimize system performance.

Unlike basic monitoring tools that only display current resource usage, performance profiling tools collect detailed information about function execution, kernel events, system calls, hardware performance counters, and application behavior. Because they rely on sampling and in-kernel counting, they can usually do so without significantly affecting system performance, although overhead grows with the sampling frequency and the number of events traced.

Understanding Linux Performance Profiling

Performance profiling is the process of collecting detailed runtime information about a system or application to determine where processing time and system resources are being used. Instead of guessing why a program is slow, profiling provides measurable data that helps locate inefficient code, excessive system calls, scheduling delays, or hardware limitations.

Profiling helps answer questions such as:

Which function consumes the most CPU time?
Why is an application responding slowly?
Which processes generate excessive disk activity?
What system calls are executed most frequently?
Are kernel operations causing delays?
Is network traffic affecting application performance?

Linux performance profiling provides answers by observing the operating system while it is actively running.

Introduction to perf

The perf utility is a performance analysis tool developed in the Linux kernel source tree and packaged by most distributions. It uses the kernel's perf_events subsystem, which exposes hardware performance counters and kernel tracing mechanisms, to collect detailed performance statistics.

Perf counts hardware events reported by the processor and software events reported by the kernel, such as:

CPU cycles
Instructions executed
Cache references and cache misses
Branches and branch mispredictions
Context switches
System calls
Page faults
Memory accesses
Scheduler activity

Perf can analyze both user-space applications and kernel-space operations.

Features of perf

Some important capabilities of perf include:

CPU Performance Analysis

Perf measures how efficiently the processor executes instructions.

Example:

perf stat ls

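By default, the summary typically includes:

Task clock (CPU time used)
Context switches
CPU migrations
Page faults
CPU cycles
Instructions, with instructions per cycle
Branches and branch misses

Cache references and cache misses are not part of the default set on most systems; request them explicitly with -e:

perf stat -e cache-references,cache-misses ls

These values help determine processor efficiency. For example, a low instructions-per-cycle figure suggests that the processor spends much of its time stalled, often waiting on memory.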
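As a rough sketch of how raw counts are turned into efficiency ratios, the Python snippet below derives instructions per cycle and miss ratios. The function names and the counts are hypothetical and were not measured on a real machine; only the event names follow perf's usual naming.

```python
def ratio(counts, numerator, denominator):
    """Return numerator/denominator, or None when either count is missing or zero."""
    den = counts.get(denominator, 0)
    if den == 0 or numerator not in counts:
        return None
    return counts[numerator] / den


def derived_metrics(counts):
    """Derive common efficiency ratios from raw event counts."""
    candidates = {
        "ipc": ratio(counts, "instructions", "cycles"),
        "cache_miss_ratio": ratio(counts, "cache-misses", "cache-references"),
        "branch_miss_ratio": ratio(counts, "branch-misses", "branches"),
    }
    # Keep only the ratios that could actually be computed.
    return {name: value for name, value in candidates.items() if value is not None}


# Hypothetical counts for illustration only.
sample_counts = {
    "cycles": 2_000_000,
    "instructions": 3_000_000,
    "cache-references": 50_000,
    "cache-misses": 5_000,
    "branches": 600_000,
    "branch-misses": 12_000,
}

for name, value in derived_metrics(sample_counts).items():
    print(f"{name}: {value:.3f}")
```

Comparing these ratios before and after a change is usually more informative than comparing the raw counts, because the ratios are less sensitive to how long the program ran.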

Function-Level Profiling

Perf identifies which functions consume the most execution time.

Example:

perf record ./application

After execution:

perf report

The report lists functions ordered by their share of the collected samples, which approximates their share of CPU time.

Example (simplified):

40% process_data()
25% sort_records()
15% write_output()

The functions at the top of the list are the first candidates for optimization.

Sampling-Based Profiling

Perf periodically samples running processes rather than recording every instruction.

Advantages include:

Low overhead
Minimal performance impact
Suitable for production environments
Long-duration monitoring

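Sampling gives a statistical estimate of application behavior without slowing the system significantly. The estimate becomes more reliable as the number of samples grows, so short runs or low sampling frequencies can miss brief functions.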
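The effect of the sampling interval can be illustrated with a small Python model. It is only a sketch: the timeline, the function names, and the intervals are made up, and a real profiler samples on timer or counter interrupts rather than walking a list.

```python
from collections import Counter


def sample_timeline(timeline, interval_ms):
    """Sample a list of (function, duration_ms) segments at a fixed interval.

    Returns how many samples landed in each function.
    """
    samples = Counter()
    total_ms = sum(duration for _, duration in timeline)
    t = 0
    while t < total_ms:
        elapsed = 0
        for name, duration in timeline:
            elapsed += duration
            if t < elapsed:  # the sample instant falls inside this segment
                samples[name] += 1
                break
        t += interval_ms
    return samples


def shares(samples):
    """Convert sample counts into percentages of all samples."""
    total = sum(samples.values())
    return {name: 100 * count / total for name, count in samples.items()}


# Hypothetical one-second run: 10% reading, 60% processing, 30% writing.
timeline = [("read_file", 100), ("process_data", 600), ("write_output", 300)]

print(shares(sample_timeline(timeline, 10)))   # 100 samples
print(shares(sample_timeline(timeline, 250)))  # only 4 samples
```

With 100 samples the estimated shares match the true 10/60/30 split, while with 4 samples they come out as 25/50/25, which shows why too few samples give a distorted profile.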

Hardware Performance Counters

Modern processors include hardware counters that measure low-level processor activity.

Examples include:

CPU cycles
Executed instructions
Cache misses
Cache references
Branch mispredictions
Bus cycles

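Perf reads these counters through the kernel's perf_events interface.

Example:

perf stat -e cycles,instructions,cache-misses,context-switches,page-faults ./program

The output reports a count for each requested event:

CPU cycles
Instructions
Cache misses
Context switches
Page faults

Context switches and page faults are software events counted by the kernel rather than by processor hardware, but perf reports them alongside the hardware counters.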

These metrics help identify hardware-related performance issues.
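When the counts need to be processed by a script, perf stat can emit comma-separated output with the -x option. The sketch below parses such output in Python. It assumes the first three fields are the value, the unit, and the event name, which is how the perf-stat manual page describes the format; the sample text is illustrative, so check the layout produced by your perf version before relying on it.

```python
import csv
import io


def parse_perf_stat_csv(text):
    """Map event name to its count from `perf stat -x,` style output.

    Events that perf could not count (for example "<not supported>")
    are stored as None.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # skip blank lines and comments
        value, _unit, event = row[0], row[1], row[2]
        try:
            counts[event] = float(value)
        except ValueError:
            counts[event] = None
    return counts


# Illustrative text, not captured from a real run.
sample_output = """\
0.52,msec,task-clock,520000,100.00,0.001,CPUs utilized
2000000,,cycles,520000,100.00,,
3000000,,instructions,520000,100.00,1.50,insn per cycle
<not supported>,,cache-misses,0,100.00,,
"""

for event, count in parse_perf_stat_csv(sample_output).items():
    print(event, count)
```

Note that perf stat writes its summary to standard error unless an output file is given, so a wrapper script has to capture that stream.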

Kernel Profiling

Linux kernel operations influence application performance.

Perf profiles:

Scheduler activity
Interrupt handling
Device drivers
Memory management
File systems
Network stack

Administrators can determine whether performance issues originate in user applications or within the kernel itself.
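In its text report, perf marks each symbol with [k] for kernel code and [.] for user-space code. As a hedged sketch, the Python snippet below sums the overhead for each side from report-style lines. The sample report is hypothetical, and the regular expression assumes the default column layout (overhead, command, shared object, symbol), which can change with the sort options used.

```python
import re

# Matches lines such as "  40.00%  app  app  [.] process_data".
LINE = re.compile(r"\s*(\d+(?:\.\d+)?)%\s.*?\s\[(.)\]\s+(.+)$")


def split_by_mode(report_text):
    """Sum overhead percentages for user-space and kernel symbols."""
    totals = {"user": 0.0, "kernel": 0.0, "other": 0.0}
    for line in report_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # header, comment, or blank line
        percent, marker, _symbol = match.groups()
        mode = {".": "user", "k": "kernel"}.get(marker, "other")
        totals[mode] += float(percent)
    return totals


# Hypothetical report text for illustration.
sample_report = """\
# Overhead  Command  Shared Object      Symbol
    40.00%  app      app                [.] process_data
    25.00%  app      [kernel.kallsyms]  [k] vfs_read
    20.00%  app      app                [.] sort_records
    10.00%  app      [kernel.kallsyms]  [k] handle_mm_fault
"""

print(split_by_mode(sample_report))
```

A large kernel share points toward system calls, page faults, or I/O paths, while a large user share points toward the application's own code.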

Call Graph Analysis

A call graph shows how functions invoke one another during execution.

Example:

main()
 |
 |-- read_file()
 |
 |-- process_data()
       |
       |-- calculate()
       |
       |-- compress()

Perf generates call graphs showing where execution time is spent throughout the application's function hierarchy.

Command:

perf record -g ./application

Then:

perf report

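The report displays nested function calls with the percentage of samples attributed to each function on its own (Self) and to the function plus everything it calls (Children). If the binary was built without frame pointers, the recorded stacks may be incomplete; perf record --call-graph dwarf is an alternative that unwinds using debug information.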
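The difference between a function's own cost and its inclusive cost can be sketched in Python. The stacks below are hypothetical samples that reuse the function names from the call graph above, and the helper name is an assumption made for illustration.

```python
from collections import Counter


def self_and_inclusive(stacks):
    """Count samples per function from root-to-leaf stack tuples.

    self counts: samples where the function was executing (the leaf).
    inclusive counts: samples where the function was anywhere on the stack.
    """
    self_counts = Counter()
    inclusive_counts = Counter()
    for stack in stacks:
        self_counts[stack[-1]] += 1
        for function in set(stack):  # set() counts recursion once per sample
            inclusive_counts[function] += 1
    return self_counts, inclusive_counts


# Hypothetical samples: 10 stacks in total.
stacks = (
    [("main", "read_file")] * 2
    + [("main", "process_data")] * 1
    + [("main", "process_data", "calculate")] * 4
    + [("main", "process_data", "compress")] * 3
)

self_counts, inclusive_counts = self_and_inclusive(stacks)
for function in ("main", "read_file", "process_data", "calculate", "compress"):
    print(function, self_counts[function], inclusive_counts[function])
```

Here process_data() is on the stack in 8 of 10 samples but is the executing function in only 1, so the time is really spent in calculate() and compress().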

Flame Graphs

Flame graphs provide a visual representation of CPU usage across function call stacks. They make it easy to identify performance bottlenecks by drawing each function as a block whose width is proportional to the number of samples in which it appears.

Characteristics of flame graphs include:

Wider blocks represent functions consuming more CPU time.
Stacked blocks show the call hierarchy, with each function drawn above its caller.
The horizontal order of blocks does not represent the passage of time.
Developers can quickly locate hotspots in complex applications.

Flame graphs are commonly generated from perf data with external visualization tools, such as the FlameGraph scripts.
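Flame graph tools generally start from "folded" stacks: one line per unique call path, with frames joined by semicolons and followed by a sample count. This is the input format read by the flamegraph.pl script from the FlameGraph project. The Python sketch below produces that format from hypothetical stack samples; the helper name and the data are assumptions for illustration.

```python
from collections import Counter


def fold_stacks(stacks):
    """Collapse root-to-leaf stack tuples into sorted 'a;b;c count' lines."""
    folded = Counter(";".join(stack) for stack in stacks)
    return [f"{path} {count}" for path, count in sorted(folded.items())]


# Hypothetical samples for illustration.
samples = (
    [("main", "read_file")] * 2
    + [("main", "process_data")] * 1
    + [("main", "process_data", "calculate")] * 4
    + [("main", "process_data", "compress")] * 3
)

for line in fold_stacks(samples):
    print(line)
```

Each count becomes the width of the topmost block on that path, and blocks below it are as wide as the sum of the paths passing through them.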

Introduction to eBPF

Extended Berkeley Packet Filter (eBPF) is a modern Linux kernel technology that allows safe execution of custom programs inside the kernel without modifying kernel source code.

Originally designed for network packet filtering, eBPF has evolved into a powerful framework for:

Performance analysis
System tracing
Networking
Security monitoring
Observability

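eBPF enables administrators to collect detailed runtime information at low overhead, largely because programs can filter and aggregate data inside the kernel and pass only summaries to user space.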

How eBPF Works

The process involves several steps:

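A user writes an eBPF program, typically in a restricted form of C or in a higher-level language such as bpftrace, and compiles it to eBPF bytecode.
The program is submitted to the kernel, where the verifier checks it for safety.
Once verified, the program is loaded into the kernel and, where supported, compiled to native code by the JIT compiler.
It attaches to specific events, such as system calls, network activity, or function execution.
Whenever those events occur, the program runs and records data, usually in eBPF maps.
User-space applications read the results from the maps or from a ring buffer for analysis.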

This architecture allows powerful monitoring while maintaining system stability.
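The attach, collect, and read-back flow can be modeled in plain Python. This is only a conceptual sketch: nothing is loaded into the kernel, the verification step is left out, and the class, function, and event names are all hypothetical stand-ins.

```python
from collections import defaultdict


class ToyKernel:
    """Stand-in for the kernel side: attached handlers plus a shared map."""

    def __init__(self):
        self.handlers = defaultdict(list)  # event name -> attached programs
        self.counts = defaultdict(int)     # plays the role of an eBPF map

    def attach(self, event, program):
        """Attach a program so that it runs whenever the event fires."""
        self.handlers[event].append(program)

    def fire(self, event, **context):
        """Simulate the event occurring and run every attached program."""
        for program in self.handlers[event]:
            program(self.counts, context)


def count_by_process(counts, context):
    """The 'program': aggregate in the map instead of emitting every event."""
    counts[context["comm"]] += 1


kernel = ToyKernel()
kernel.attach("sys_enter_openat", count_by_process)

# Simulated events; a real system would generate these itself.
for comm in ["nginx", "nginx", "sshd", "nginx"]:
    kernel.fire("sys_enter_openat", comm=comm)
kernel.fire("sys_enter_read", comm="sshd")  # no program attached, so ignored

# The user-space side reads the aggregated map.
print(dict(kernel.counts))
```

The point of the model is the division of labor: the handler does a small amount of work per event and updates a shared map, and the user-space side reads a compact summary rather than a stream of raw events.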

Safety Features of eBPF

Since eBPF programs run within the kernel, Linux enforces strict safety measures:

Code verification before execution
Rejection of loops that cannot be proven to terminate (newer kernels accept bounded loops)
Memory safety checks
Kernel access restricted to approved helper functions
Limits on program size and complexity

These safeguards are designed to prevent eBPF programs from crashing or compromising the kernel.

Common Uses of eBPF

System Call Tracing

Every application interacts with the kernel using system calls.

Examples include:

open()
read()
write()
close()
fork()
execve()

eBPF can monitor how often these calls occur and how long they take.
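Latency tracing tools typically store a timestamp when a thread enters a system call and subtract it from the timestamp at exit. The Python sketch below models that pairing logic on a list of hypothetical events; the tuple layout and function name are assumptions, and a real tool would run the equivalent logic in an eBPF program with a map keyed by thread ID.

```python
from collections import defaultdict


def syscall_latency(events):
    """Pair enter/exit events and return count and total latency per syscall.

    events: iterable of (timestamp_ns, tid, phase, syscall),
    where phase is "enter" or "exit".
    """
    start = {}  # (tid, syscall) -> entry timestamp
    stats = defaultdict(lambda: {"count": 0, "total_ns": 0})
    for timestamp, tid, phase, name in events:
        key = (tid, name)
        if phase == "enter":
            start[key] = timestamp
        elif key in start:  # ignore exits whose entry was not observed
            stats[name]["count"] += 1
            stats[name]["total_ns"] += timestamp - start.pop(key)
    return dict(stats)


# Hypothetical event stream for illustration.
events = [
    (1000, 1, "enter", "read"),
    (1500, 2, "enter", "write"),
    (4000, 1, "exit", "read"),
    (4500, 2, "exit", "write"),
    (5000, 1, "enter", "read"),
    (6000, 1, "exit", "read"),
    (7000, 3, "exit", "close"),  # entry happened before tracing started
]

for name, s in syscall_latency(events).items():
    print(name, s["count"], s["total_ns"] / s["count"])
```

Exits without a matching entry are skipped on purpose, since tracing usually starts while some calls are already in flight.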

CPU Profiling

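eBPF programs attached to timer-based sampling events can record stack traces and count them in the kernel, which identifies the functions that consume processor time and helps developers locate inefficient code.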

Memory Analysis

eBPF-based tools can track:

Memory allocations
Allocations that are never freed (possible memory leaks)
Page faults
Buffer usage
Heap activity

This information is valuable for optimizing memory-intensive applications.

Network Monitoring

eBPF provides detailed insights into:

Packet flow
Connection latency
TCP retransmissions
Bandwidth usage
Packet drops
Firewall behavior

Network administrators use eBPF for troubleshooting and performance tuning.

Disk I/O Monitoring

eBPF measures:

Read latency
Write latency
Queue depth
Block device utilization
File system delays

These metrics help diagnose storage bottlenecks.
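Latency tools often summarize measurements as a power-of-two histogram, because averages hide slow outliers. The Python sketch below builds such a histogram from hypothetical I/O latencies in microseconds. The bucket boundaries and rendering are this example's own choices and are not the exact output format of any particular tool.

```python
from collections import Counter


def log2_bucket(value):
    """Return the inclusive (low, high) power-of-two bucket for an integer."""
    if value <= 0:
        return (0, 0)
    low = 1 << (value.bit_length() - 1)
    return (low, 2 * low - 1)


def latency_histogram(latencies_us):
    """Count how many latencies fall into each power-of-two bucket."""
    return Counter(log2_bucket(v) for v in latencies_us)


def render(histogram):
    """Format the histogram as text, one bucket per line."""
    lines = []
    for (low, high), count in sorted(histogram.items()):
        lines.append(f"[{low:>5}, {high:>5}] {count:>3} {'#' * count}")
    return "\n".join(lines)


# Hypothetical latencies in microseconds.
latencies_us = [90, 120, 130, 250, 260, 300, 1100, 5000]

print(render(latency_histogram(latencies_us)))
```

In this made-up data most requests finish in under 512 microseconds, but the two slow ones stand out in their own buckets instead of being averaged away.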

Popular eBPF Tools

Several tools built on eBPF simplify performance analysis:

bpftrace

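A high-level tracing language for writing one-liners and short scripts that are compiled into eBPF programs.

Example (prints a line each time a process enters the openat system call; typically requires root privileges):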

bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("File opened\n"); }'

BCC (BPF Compiler Collection)

BCC provides a collection of ready-to-use tracing tools and Python libraries for building custom eBPF applications.

Common BCC tools include:

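execsnoop: traces new process execution
opensnoop: traces file opens
biosnoop: traces block device I/O with per-request latency
fileslower: traces slow file reads and writes
tcptop: summarizes TCP throughput by host and process
runqlat: summarizes scheduler run queue latency as a histogram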

bpftool

A utility maintained in the Linux kernel source tree and used to inspect and manage eBPF programs, maps, and related objects.

Example:

bpftool prog show

This command lists all loaded eBPF programs.
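bpftool can also print its listing as JSON (the --json or -j option), which is easier to process in scripts. The Python sketch below counts programs by type from such output. The sample JSON is hypothetical, and the field names "id", "type", and "name" are an assumption about the output layout that should be checked against your bpftool version.

```python
import json
from collections import Counter


def programs_by_type(bpftool_json):
    """Count loaded programs per program type from a JSON array of objects."""
    programs = json.loads(bpftool_json)
    return Counter(program.get("type", "unknown") for program in programs)


# Hypothetical output for illustration; real output has more fields.
sample_json = """
[
  {"id": 12, "type": "kprobe", "name": "trace_open"},
  {"id": 13, "type": "tracepoint", "name": "count_exec"},
  {"id": 14, "type": "kprobe", "name": "trace_close"}
]
"""

for program_type, count in programs_by_type(sample_json).most_common():
    print(program_type, count)
```

A summary like this is a quick way to see whether tracing programs left behind by earlier sessions are still loaded.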

Comparing perf and eBPF

Feature	perf	eBPF
CPU profiling	Yes	Yes
Kernel tracing	Yes	Yes
System call monitoring	Yes (perf trace)	Yes (custom programs)
Network tracing	Basic	Advanced
Custom event monitoring	Limited	Extensive
Runtime flexibility	Moderate	High
Performance overhead	Low when sampling	Low when aggregating in the kernel
Custom programmability	Limited	Excellent

Benefits of Using perf and eBPF

Using perf and eBPF offers several advantages:

Identifies CPU bottlenecks accurately.
Detects inefficient algorithms and functions.
Monitors applications with minimal overhead.
Provides deep visibility into kernel operations.
Supports real-time performance analysis.
Helps optimize memory and disk usage.
Improves network performance troubleshooting.
Helps find the causes of slow application response times.
Enables proactive detection of performance issues.
Assists in capacity planning and system tuning.

Best Practices

To achieve effective performance profiling:

Begin with basic monitoring tools to identify general resource usage before using advanced profilers.
Use perf stat for overall performance metrics and perf record with perf report for detailed CPU analysis.
Employ eBPF tools for real-time tracing of system calls, networking, and kernel events.
Profile systems under realistic workloads to capture representative data.
Compare profiling results before and after optimizations to verify improvements.
Avoid collecting unnecessary trace data, as it increases overhead and storage use and makes analysis more complex.
Keep the Linux kernel and profiling tools updated to benefit from the latest performance features and enhancements.
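Comparing results before and after an optimization is more trustworthy when each side is measured several times. The Python sketch below compares the medians of repeated runs. The timings are hypothetical, and using the median is this example's choice for damping outliers, not a requirement of either tool.

```python
from statistics import median


def percent_change(before_runs, after_runs):
    """Return the percentage change between the medians of two sets of runs.

    Negative values mean the 'after' runs were faster or smaller.
    """
    before = median(before_runs)
    after = median(after_runs)
    return 100.0 * (after - before) / before


# Hypothetical elapsed times in seconds from repeated runs.
before_runs = [2.10, 2.00, 2.05]
after_runs = [1.60, 1.64, 1.70]

print(f"{percent_change(before_runs, after_runs):+.1f}%")
```

If the change is smaller than the spread between runs on the same side, the measurement is probably too noisy to support a conclusion.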

Conclusion

Profiling with perf and eBPF provides a comprehensive approach to understanding how applications and the operating system use system resources. While perf excels at measuring CPU performance, function execution, and hardware events, eBPF extends observability by enabling safe, programmable tracing of kernel and application behavior in real time. Together, these tools allow administrators and developers to diagnose performance bottlenecks, optimize applications, and improve system reliability while keeping measurement overhead low.