Introduction to Open Source Performance Tool -- Linux Tool Perf
Yiqi Ju (Fred)
Sep. 13, 2012

Task 07/09~09/14
  Verizon Box Embedded System Software Environment
  Open Source Performance Tools
  Kernel Profiling

Kernel Profiling?
  Collect and analyze system-wide resource statistics in kernel space
  HW trend: increasing core counts
  SW performance: find the bottleneck
  Solution: make full use of the available tools

Available Tools
  top (on board) / htop -- real-time monitoring
  sysstat utilities -- sar, iostat (on board), vmstat…
  ss -- socket statistics
  LTTng -- kernel tracing
  perf -- counting and sampling
  …

Perf Tool
  perf_event kernel interface
  Linux kernel subsystem, merged in v2.6.31 and present in later kernels

Perf_event Kernel Interface
  Performance counters: hardware counters, read without programming registers by hand; the hardware unit is often called the PMU (Performance Monitoring Unit)
  Event-oriented API: the user does not touch hardware registers directly, but the API relies on PMU-ready CPUs
  Supports event grouping, so that the events in a group are measured simultaneously
  Source: Perf File Format, Urs Fassler. CERN openlab

Sampling
  perf record initializes sampling through the perf_event interface
  Blank mmap pages are set up and shared with kernel space
  The kernel writes records into the pages and hands them back to perf; perf record saves them to a *.data file in the current directory

Sampling cont.
  Blank mmap pages generated through perf_events
  Written mmap page
  Source: Perf File Format, Urs Fassler.
CERN openlab

Advantage
  Low overhead compared with instrumentation-based profiling
  Fast: counting happens off the workload's own execution path, so no delay is noticeable
  Many subcommands, providing a wide range of information

Perf usage
metro-root-perf_record> perf
 usage: perf [--version] [--help] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate    Read perf.data (created by perf record) and display annotated code
   diff        Read two perf.data files and display the differential profile
   list        List all symbolic event types
   lock        Analyze lock events
   probe       Define new dynamic tracepoints
   record      Run a command and record its profile into perf.data
   report      Read perf.data (created by perf record) and display the profile
   sched       Tool to trace/measure scheduler properties (latencies)
   stat        Run a command and gather performance counter statistics
   timechart   Tool to visualize total system behavior during a workload
   top         System profiling tool.
   trace       Read perf.data (created by perf record) and display trace output
   …

List of Events
List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles               [Hardware event]
  instructions                       [Hardware event]
  cache-references                   [Hardware event]
  cache-misses                       [Hardware event]
  branch-instructions OR branches    [Hardware event]
  branch-misses                      [Hardware event]
  bus-cycles                         [Hardware event]

  cpu-clock                          [Software event]
  task-clock                         [Software event]
  page-faults OR faults              [Software event]
  minor-faults                       [Software event]
  major-faults                       [Software event]
  context-switches OR cs             [Software event]
  cpu-migrations OR migrations       [Software event]
  alignment-faults                   [Software event]
  emulation-faults                   [Software event]

  L1-dcache-loads                    [Hardware cache event]
  L1-dcache-load-misses              [Hardware cache event]
  L1-dcache-stores                   [Hardware cache event]
  L1-dcache-store-misses             [Hardware cache event]
  L1-dcache-prefetches               [Hardware cache event]
  L1-dcache-prefetch-misses          [Hardware cache event]
  L1-icache-loads                    [Hardware cache event]
  L1-icache-load-misses              [Hardware cache event]
  L1-icache-prefetches               [Hardware cache event]
  L1-icache-prefetch-misses          [Hardware cache event]
  LLC-loads                          [Hardware cache event]
  LLC-load-misses                    [Hardware cache event]
  LLC-stores                         [Hardware cache event]
  LLC-store-misses                   [Hardware cache event]
  LLC-prefetches                     [Hardware cache event]
  LLC-prefetch-misses                [Hardware cache event]
  dTLB-loads                         [Hardware cache event]
  dTLB-load-misses                   [Hardware cache event]
  dTLB-stores                        [Hardware cache event]
  dTLB-store-misses                  [Hardware cache event]
  dTLB-prefetches                    [Hardware cache event]
  dTLB-prefetch-misses               [Hardware cache event]
  iTLB-loads                         [Hardware cache event]
  iTLB-load-misses                   [Hardware cache event]
  branch-loads                       [Hardware cache event]
  branch-load-misses                 [Hardware cache event]
  …

Perf stat
metro-root-perf_record> perf stat -e L1-dcache-loads -e L1-dcache-load-misses -e dTLB-loads -e dTLB-load-misses -e L1-icache-loads -e L1-icache-misses start_appli
Start_appli…

 Performance counter stats for 'start_appli':

      354543239  L1-dcache-loads          (scaled from 80.54%)
  <not counted>  L1-dcache-load-misses
      507073444  dTLB-loads               (scaled from 83.87%)
         305313  dTLB-load-misses         (scaled from 83.89%)   missrate: 0.0602%
     2303127335  L1-icache-loads          (scaled from 83.80%)
        7994049  L1-icache-load-misses    (scaled from 84.33%)   missrate: 0.347%

   74.850334944 seconds time elapsed

----(Data from mt2179, P1.0 board, 12:25AM, 9/12/2012)

Perf stat cont.
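The miss rates quoted next to the perf stat counters are simply misses divided by loads. The sketch below reproduces the two figures from the first run. The function names are illustrative and not part of perf. The scaling helper rests on the assumption that "(scaled from N%)" reports the share of time a multiplexed counter was actually running, and that a counter which never ran is the case shown as "<not counted>".

```python
def scale_count(raw_count, time_enabled, time_running):
    """Estimate a full-run count for a counter that ran only part of the time.

    Assumption: the estimate is raw_count * time_enabled / time_running.
    Returns None when the counter never ran (no estimate is possible).
    """
    if time_running == 0:
        return None
    return raw_count * time_enabled / time_running


def miss_rate_percent(misses, loads):
    """Miss rate as a percentage of loads."""
    return 100.0 * misses / loads


# Counter values taken from the first perf stat run on start_appli.
dtlb = miss_rate_percent(305313, 507073444)
icache = miss_rate_percent(7994049, 2303127335)

print(f"dTLB miss rate:      {dtlb:.4f}%")    # 0.0602%
print(f"L1-icache miss rate: {icache:.3f}%")  # 0.347%
```

Because the counts are scaled estimates rather than exact totals, small differences between runs should be read with some caution.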
metro-root-perf_record> perf stat -e dTLB-loads -e dTLB-load-misses -e L1-icache-loads -e L1-icache-misses start_appli
…

 Performance counter stats for 'start_appli':

      534611783  dTLB-loads
         308219  dTLB-load-misses         missrate: 0.0577%
     2375996954  L1-icache-loads
        7810360  L1-icache-load-misses    missrate: 0.329%

   55.029461151 seconds time elapsed

----(Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)

Perf record/report
metro-root-perf_record> perf record -F 3000 -o startapp.data start_appli
…
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.560 MB startapp.data (~24470 samples) ]
…
metro-root-perf_record> perf report -i startapp.data > startapp.txt

(Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)

Perf diff
metro-root-perf_record> perf diff lsactive.data lslactive.data

(Data collected from mt2179, P1.0 board, 12:35PM, 9/12/2012)

More on future
  perf timechart: visualize total system behavior as a time sequence
  perf trace: enables script-based tracing; Perl support from 2.6.33-rc, Python support patches available
  perf annotate: source code annotation
  perf event converter and web-based GUI to enable remote profiling
  Source: Scripting support for perf. Jake Edge, Feb 10, 2010

References
  Perf_event project
    http://web.eecs.utk.edu/~vweaver1/projects/perfevents/index.html
  Perf File Format, by CERN openlab
    http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/technical_documents/Urs_Fassler_report.pdf
  Perf wiki
    https://perf.wiki.kernel.org/index.php
  perf_events status update, by Stephane Eranian, Google, Inc. Kernel mailing list
    http://lwn.net/Articles/373842/