
Introduction to Open Source Performance Tools
-- The Linux Tool perf
Yiqi Ju (Fred)
Sep. 13, 2012
Task
• 07/09 to 09/14
• Verizon Box
• Embedded System
• Software Environment
• Open Source Performance Tools
• Kernel Profiling

Kernel Profiling?
• Collect and analyze system-wide resource statistics in kernel space
• HW trend: core counts keep increasing
• SW performance: find the bottleneck
• Solution: make full use of the available tools

Available Tools
• top (on board) / htop
  -- real-time monitoring
• Sysstat utilities
  -- sar, iostat (on board), vmstat…
• ss: socket statistics
• LTTng: kernel tracing
• perf: counting and sampling
…

Perf Tool
• Built on the perf_event kernel interface
• A Linux kernel subsystem, merged in v2.6.31

Perf_event Kernel Interface
• Performance counters: hardware counters, read without touching the registers directly; the hardware block is often called the PMU (Performance Monitoring Unit)
• Event-oriented API: callers do not program HW registers, but the API relies on PMU-ready CPUs
• Supports event grouping, so the events in a group are measured simultaneously
Source: Perf File Format, Urs Fassler. CERN openlab
Sampling
• perf record initializes sampling through the perf_event interface
• perf maps blank mmap pages that are shared with kernel space
• The kernel writes records into those pages and hands them back to perf; perf record writes them to a *.data file saved in the current directory
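The hand-off described above can be pictured as a ring buffer: the kernel advances a write position and perf advances a read position. The sketch below is a toy model in Python, not the real interface. The class name `RingBuffer` and its methods are invented for illustration; the actual perf_event mmap area tracks these positions in the `data_head` and `data_tail` fields of its header page.

```python
class RingBuffer:
    """Toy model of a shared sample buffer (hypothetical, for illustration only)."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.size = size
        self.head = 0  # total bytes written by the producer (the "kernel")
        self.tail = 0  # total bytes consumed by the reader (the "perf" tool)

    def write(self, record: bytes) -> bool:
        """Producer side: append a record, or drop it if there is no room."""
        free = self.size - (self.head - self.tail)
        if len(record) > free:
            return False
        for b in record:
            self.buf[self.head % self.size] = b  # wrap around the buffer
            self.head += 1
        return True

    def drain(self) -> bytes:
        """Reader side: consume everything written since the last drain."""
        out = bytearray()
        while self.tail < self.head:
            out.append(self.buf[self.tail % self.size])
            self.tail += 1
        return bytes(out)


rb = RingBuffer(8)
rb.write(b"abcd")
rb.write(b"efgh")
dropped = not rb.write(b"x")   # buffer is full until the reader drains it
first = rb.drain()
rb.write(b"ijk")               # wraps around to the start of the buffer
second = rb.drain()
```

In this model a slow reader causes dropped records, which is one reason the size of the mapped area matters when sampling at a high rate.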

Sampling cont.
[Figure: blank mmap pages generated through perf_events; written mmap page]
Source: Perf File Format, Urs Fassler. CERN openlab
Advantage
• Low overhead compared to instrumentation-based profiling
• Fast: counting adds so little to the running load that the delay is hardly noticeable
• Many usage modes, providing a lot of information

Perf usage
metro-root-perf_record> perf
usage: perf [--version] [--help] COMMAND [ARGS]
The most commonly used perf commands are:
  annotate    Read perf.data (created by perf record) and display annotated code
  diff        Read two perf.data files and display the differential profile
  list        List all symbolic event types
  lock        Analyze lock events
  probe       Define new dynamic tracepoints
  record      Run a command and record its profile into perf.data
  report      Read perf.data (created by perf record) and display the profile
  sched       Tool to trace/measure scheduler properties (latencies)
  stat        Run a command and gather performance counter statistics
  timechart   Tool to visualize total system behavior during a workload
  top         System profiling tool.
  trace       Read perf.data (created by perf record) and display trace output
  …
List of Events
List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles               [Hardware event]
  instructions                       [Hardware event]
  cache-references                   [Hardware event]
  cache-misses                       [Hardware event]
  branch-instructions OR branches    [Hardware event]
  branch-misses                      [Hardware event]
  bus-cycles                         [Hardware event]

  cpu-clock                          [Software event]
  task-clock                         [Software event]
  page-faults OR faults              [Software event]
  minor-faults                       [Software event]
  major-faults                       [Software event]
  context-switches OR cs             [Software event]
  cpu-migrations OR migrations       [Software event]
  alignment-faults                   [Software event]
  emulation-faults                   [Software event]

  L1-dcache-loads                    [Hardware cache event]
  L1-dcache-load-misses              [Hardware cache event]
  L1-dcache-stores                   [Hardware cache event]
  L1-dcache-store-misses             [Hardware cache event]
  L1-dcache-prefetches               [Hardware cache event]
  L1-dcache-prefetch-misses          [Hardware cache event]
  L1-icache-loads                    [Hardware cache event]
  L1-icache-load-misses              [Hardware cache event]
  L1-icache-prefetches               [Hardware cache event]
  L1-icache-prefetch-misses          [Hardware cache event]
  LLC-loads                          [Hardware cache event]
  LLC-load-misses                    [Hardware cache event]
  LLC-stores                         [Hardware cache event]
  LLC-store-misses                   [Hardware cache event]
  LLC-prefetches                     [Hardware cache event]
  LLC-prefetch-misses                [Hardware cache event]
  dTLB-loads                         [Hardware cache event]
  dTLB-load-misses                   [Hardware cache event]
  dTLB-stores                        [Hardware cache event]
  dTLB-store-misses                  [Hardware cache event]
  dTLB-prefetches                    [Hardware cache event]
  dTLB-prefetch-misses               [Hardware cache event]
  iTLB-loads                         [Hardware cache event]
  iTLB-load-misses                   [Hardware cache event]
  branch-loads                       [Hardware cache event]
  branch-load-misses                 [Hardware cache event]
  …
Perf stat
metro-root-perf_record> perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e dTLB-loads -e dTLB-load-misses -e L1-icache-loads -e L1-icache-misses start_appli
Start_appli…
Performance counter stats for 'start_appli':
      354543239  L1-dcache-loads          (scaled from 80.54%)
  <not counted>  L1-dcache-load-misses
      507073444  dTLB-loads               (scaled from 83.87%)
         305313  dTLB-load-misses         (scaled from 83.89%)
     2303127335  L1-icache-loads          (scaled from 83.80%)
        7994049  L1-icache-load-misses    (scaled from 84.33%)
74.850334944 seconds time elapsed
----(Data from mt2179, P1.0 board, 12:25AM, 9/12/2012)
dTLB missrate: 0.0602%
L1-icache missrate: 0.347%
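The "(scaled from …%)" notes appear because more events were requested than the hardware could count at once, so perf time-shares the counters and extrapolates each count. The sketch below shows that extrapolation in Python under stated assumptions: the function name `scale_count` and the sample numbers are hypothetical, and the formula (raw count times enabled time over running time) is the usual description of this scaling rather than a copy of perf's source.

```python
from typing import Optional


def scale_count(raw: int, time_enabled: int, time_running: int) -> Optional[int]:
    """Extrapolate a multiplexed counter to the full measurement window.

    raw          -- events counted while the counter was actually on the PMU
    time_enabled -- time the event was enabled (hypothetical units)
    time_running -- time the event was actually counting
    """
    if time_running == 0:
        # The event never got a hardware counter; perf stat shows this
        # case as "<not counted>" instead of a number.
        return None
    return round(raw * time_enabled / time_running)


# Illustrative values only: a counter that ran 80% of the time.
estimate = scale_count(raw=800, time_enabled=100, time_running=80)
never_ran = scale_count(raw=0, time_enabled=100, time_running=0)
print(estimate, never_ran)
```

Scaled counts are estimates, so requesting fewer events per run reduces the extrapolation error.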
Perf stat cont.
metro-root-perf_record> perf stat -e dTLB-loads -e dTLB-load-misses -e L1-icache-loads -e L1-icache-misses start_appli
…
Performance counter stats for 'start_appli':
   534611783  dTLB-loads
      308219  dTLB-load-misses         missrate: 0.0577%
  2375996954  L1-icache-loads
     7810360  L1-icache-load-misses    missrate: 0.329%
55.029461151 seconds time elapsed
----(Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)
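The missrate figures quoted with the perf stat runs are the miss count divided by the load count, expressed as a percentage. A minimal Python check using the counter values printed above (the helper name `miss_rate` is introduced here for illustration):

```python
def miss_rate(misses: int, loads: int) -> float:
    """Return the miss rate as a percentage of loads."""
    return 100.0 * misses / loads


# Second run (four events)
dtlb_run2 = miss_rate(308219, 534611783)
icache_run2 = miss_rate(7810360, 2375996954)

# First run (six events, scaled counts)
dtlb_run1 = miss_rate(305313, 507073444)
icache_run1 = miss_rate(7994049, 2303127335)

print(f"run 2: dTLB {dtlb_run2:.4f}%  L1-icache {icache_run2:.3f}%")
print(f"run 1: dTLB {dtlb_run1:.4f}%  L1-icache {icache_run1:.3f}%")
```

Both runs give nearly the same rates, which suggests the scaling in the first run did not distort these ratios much.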
Perf record/report
metro-root-perf_record> perf record -F 3000 -o startapp.data start_appli
…
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.560 MB startapp.data (~24470 samples) ]
…
metro-root-perf_record> perf report -i startapp.data > startapp.txt
(Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)
Perf diff
metro-root-perf_record> perf diff lsactive.data lslactive.data
(Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)
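A differential profile compares how much of the total each symbol accounts for in two recordings. The Python sketch below illustrates the idea only: `diff_profiles` is a hypothetical helper, the symbol names and percentages are made-up data, and real perf diff reads both perf.data files itself.

```python
from typing import Dict, List, Tuple


def diff_profiles(baseline: Dict[str, float],
                  new: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (symbol, delta in percentage points), largest change first."""
    symbols = set(baseline) | set(new)
    # A symbol missing from one profile counts as 0% there.
    deltas = {s: new.get(s, 0.0) - baseline.get(s, 0.0) for s in symbols}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up overhead percentages per symbol for two recordings.
before = {"memcpy": 40.0, "parse": 10.0}
after = {"memcpy": 25.0, "parse": 10.0, "hash": 12.0}

result = diff_profiles(before, after)
for symbol, delta in result:
    print(f"{delta:+6.1f}  {symbol}")
```

Sorting by absolute change puts both regressions and improvements at the top of the list.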
More on future
• perf timechart: visualize total system behavior over time
• perf trace: enables script-driven tracing; Perl support since 2.6.33-rc, Python support patches available
• perf annotate: source code annotation
• perf event converter and web-based GUI, to enable remote profiling

Source: Scripting support for perf. Jake Edge, Feb 10, 2010
References
• Perf_event project
  http://web.eecs.utk.edu/~vweaver1/projects/perfevents/index.html
• Perf File Format, Urs Fassler, CERN openlab
  http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/technical_documents/Urs_Fassler_report.pdf
• Perf wiki
  https://perf.wiki.kernel.org/index.php
• perf_events status update, Stephane Eranian, Google, Inc.
• Kernel mailing list
  http://lwn.net/Articles/373842/