Solaris/Linux Performance Measurement and Tuning
Adrian Cockcroft, acockcroft@netflix.com
2008
Abstract
• This course focuses on the measurement sources and tuning parameters
available in Unix and Linux, including TCP/IP measurement and tuning and
complex storage subsystems, with a deep dive on advanced Solaris metrics
such as microstates and extended system accounting.
• The meaning and behavior of metrics is covered in detail.
Common fallacies, misleading indicators, sources of
measurement error and other traps for the unwary will be
exposed.
• Free tools for Capacity Planning are covered in detail by
this presenter in a separate Usenix Workshop.
Sources
• Adrian Cockcroft
– Sun Microsystems 1988-2004, Distinguished Engineer
– eBay Research Labs 2004-2007, Distinguished Engineer
– Netflix 2007, Director - Web Engineering
– Note: I am a Netflix employee, but this material does not refer to and is not endorsed by Netflix. It is based on the author's work over the last 20 years.
• CMG Papers and Sunday Workshops by the author - see www.cmg.org
– Unix CPU Time Measurement Errors - (Best paper 1998)
– TCP/IP Tutorial - Sunday Workshop
– Capacity Planning - Sunday Workshop
– Grid Tutorial - Sunday Workshop
– Capacity Planning with Free Tools - Sunday Workshop
• Books by the author
– Sun Performance and Tuning, Prentice Hall, 1994, 1998 (2nd Ed)
– Resource Management, Prentice Hall, 2000
– Capacity Planning for Internet Services, Prentice Hall, 2001
Contents
• Capacity Planning Definitions
• Metric collection interfaces
• Process - microstate and extended accounting
• CPU - measurement issues
• Network - Internet Servers and TCP/IP
• Disks - iostat, simple disks and RAID
• Memory
• Quick tips and Recipes
• References
Definitions
Capacity Planning Definitions
• Capacity
– Resource utilization and headroom
• Planning
– Predicting future needs by analyzing historical data and modeling future scenarios
• Performance Monitoring
– Collecting and reporting on performance data
• Unix/Linux (apologies to users of OSX, HP-UX, AIX etc.)
– Emphasis on Solaris since it is a comprehensively instrumented and full featured Unix
– Linux is mostly a subset
Measurement Terms and Definitions
• Bandwidth - gross work per unit time [unattainable]
• Throughput - net work per unit time
• Peak throughput - at maximum acceptable response time
• Response time - time to complete a unit of work including waiting
• Service time - time to process a unit of work after waiting
• Queue length - number of requests waiting
• Utilization - busy time relative to elapsed time [can be misleading]
• Rule of thumb: Estimate 95th percentile response time as three
times mean response time
Capacity Planning Requirements
• We care about CPU, Memory, Network and Disk resources, and
Application response times
• We need to know how much of each resource we are using
now, and will use in the future
• We need to know how much headroom we have to handle
higher loads
• We want to understand how headroom varies, and how it relates
to application response times and throughput
• We want to be able to find the bottleneck in an under-performing
system
Metrics
Measurement Data Interfaces
• Several generic raw access methods
– Read the kernel directly
– Structured system data
– Process data
– Network data
– Accounting data
– Application data
• Command based data interfaces
– Scrape data from vmstat, iostat, netstat, sar, ps
– Higher overhead, lower resolution, missing metrics
• Data available is platform and release specific either way
Reading kernel memory - kvm
• The only way to get data in very old Unix variants
• Use kernel namelist symbol table and open /dev/kmem
• Solaris wraps up interface in kvm library
• Advantages
– Still the only way to get at some kinds of data
– Low overhead, fast bulk data capture
• Disadvantages
– Too much intimate implementation detail exposed
– No locking protection to ensure consistent data
– Highly non-portable, unstable over releases and patches
– Tools break when kernel moves between 32 and 64bit address support
Structured Kernel Statistics - kstat
• Solaris 2 introduced kstat and extended usage in each release
• Used by Solaris 2 vmstat, iostat, sar, network interface stats, etc.
• Advantages
– The recommended and supported Solaris metric access API
– Does not require setuid root commands to access for reads
– Individual named metrics stable over releases
– Consistent data using locking, but low overhead
– Unchanged when kernel moves to 64bit address support
– Extensible to add metrics without breaking existing code
• Disadvantages
– Somewhat complex hierarchical kstat_chain structure
– State changes (device online/offline) cause kstat_chain rebuild
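Solaris 8 and later also bundle a kstat(1M) command, so named metrics can be read from the shell without writing C against libkstat. A minimal sketch (the modules, statistic names and output values shown are illustrative):

% kstat -p unix:0:system_misc:avenrun_1min
unix:0:system_misc:avenrun_1min 25
% kstat -p -m sd -i 0 -s reads
sd:0:sd0:reads  123456

(load averages in kstat are fixed point, scaled by 256, so 25 is a load of about 0.1)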
Kernel Trace - TNF, Dtrace, ktrace
• Solaris, Linux, Windows and other Unixes have similar features
– Solaris has TNF probes and prex command to control them
– User level probe library for hires tracepoints allows instrumentation of multithreaded applications
– Kernel level probes allow disk I/O and scheduler tracing
• Advantages
– Low overhead, microsecond resolution
– I/O trace capability is extremely useful
• Disadvantages
– Too much data to process with simple tracing capabilities
– Trace buffer can overflow or cause locking issues
• Solaris 10 Dtrace is a quite different beast! Much more flexible
Dtrace – Dynamic Tracing
• One of the most exciting new features in Solaris 10, rave reviews
• Book: "Solaris Performance and Tools" by Richard McDougall and Brendan Gregg
• Advantages
– No overhead when it is not in use
– Low overhead probes can be put anywhere/everywhere
– Trace data is correlated and filtered at source, get exactly the data you want, very sophisticated data providers included
– Bundled, supported, designed to be safe for production systems
• Disadvantages
– Solaris specific, but being ported to BSD/Linux
– No high level tools support yet
– Yet another scripting language to learn – somewhat similar to “awk”
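As a taste of the D language, the classic one-liner from the DTrace documentation counts system calls by process name until interrupted (probe counts and output values will vary by system):

# dtrace -n 'syscall:::entry { @num[execname] = count(); }'
dtrace: description 'syscall:::entry ' matched 226 probes
^C
  nscd          5
  sendmail     11
  sshd        148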
Hardware counters
• Solaris cpustat for X86 and UltraSPARC pipeline and cache counters
• Solaris busstat for server backplanes and I/O buses, corestat for multi-core systems
• Intel Trace Collector, Vampir for Linux
• Most modern CPUs and systems have counters
• Advantages
– See what is really happening, more accurate than kernel stats
– Cache usage useful for tuning code algorithms
– Pipeline usage useful for HPC tuning for megaflops
– Backplane and memory bank usage useful for database servers
• Disadvantages
– Raw data is confusing, lots of architectural background info needed
– Most tools focus on developer code tuning
Configuration information
• Configuration data comes from too many sources!
– Solaris device tree displayed by prtconf and prtdiag
– Solaris 8 adds dynamic configuration notification device picld
– SunVTS component test system has vtsprobe to get config
– SCSI device info using iostat -E in Solaris
– Logical volume info from product specific vxprint and metastat
– Hardware RAID info from product specific tools
– Critical storage config info must be accessed over ethernet…
• It is very hard to combine all this data!
• DMTF CIM objects try to address this, but no-one seems to use them…
• Free tool - Config Engine: http://www.cfengine.org
Application instrumentation Examples
• Oracle V$ Tables – detailed metrics used by many tools
• ARM standard instrumentation
• Custom do-it-yourself and log file scraping
• Advantages
– Focussed application specific information
– Business metrics needed to do real capacity planning
• Disadvantages
– No common access methods
– ARM is a collection interface only, vendor specific tools, data
– Very few applications are instrumented, even fewer have support from performance tools vendors
Kernel values, tunables and defaults
• There is often far too much emphasis on kernel tweaks
– There really are few “magic bullet” tunables
– It rarely makes a significant difference
• Fix the system configuration or tune the application instead!
• Very few adjustable components
– “No user serviceable parts inside”
– But Unix has so much history people think it is like a 70’s car
– Solaris really is dynamic, adaptive and self-tuning
– Most other “traditional Unix” tunables are just advisory limits
– Tweaks may be workarounds for bugs/problems
– Patch or OS release removes the problem - remove the tweak
• Solaris Tunable Parameters Reference Manual (if you must…)
– http://docs.sun.com/app/docs/doc/817-0404
Processes
Process based data - /proc
• Used by ps, proctool and debuggers, pea.se, proc(1) tools on Solaris
• Solaris and Linux both have /proc/pid/metric hierarchy
• Linux also includes system information in /proc rather than kstat
• Advantages
– The recommended and supported process access API
– Metric data structures reasonably stable over releases
– Consistent data using locking
– Solaris microstate data provides accurate process state timers
• Disadvantages
– High overhead for open/read/close for every process
– Linux reports data as ascii text, Solaris as binary structures
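A sketch of the ascii vs. binary difference. On Linux the data is text, so shell tools work on it directly; per proc(5), utime and stime are fields 14 and 15 of /proc/pid/stat, in clock ticks:

% awk '{ print "utime", $14, "stime", $15, "ticks" }' /proc/self/stat
utime 0 stime 0 ticks

On Solaris the /proc files are binary structs, so use the bundled proc(1) tools instead, for example:

% ptime sleep 1
real        1.007
user        0.002
sys         0.005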
Tracing and profiling
• Tracing Tools
– truss - shows system calls made by a process
– sotruss / apitrace - shows shared library calls
– prex - controls TNF tracing for user and kernel code
• Profiling Tools
– Compiler profile feedback using -xprofile=collect and use
– Sampled profile relink using -p and prof/gprof
– Function call tree profile recompile using -pg and gprof
– Shared library call profiling setenv LD_PROFILE and gprof
• Accurate CPU timing for process using /usr/proc/bin/ptime
• Microstate process information using pea.se and pw.se
10:40:16         name lwmx   pid  ppid  uid  usr% sys% wait% chld%   size    rss  pf
         nis_cachemgr    5   176     1    0  1.40 0.19  0.00  0.00  16320  11584 0.0
                  jre    1 17255  3184 5743 11.80 0.19  0.00  0.00 178112 110336 0.0
             sendmail    1 16751     1    0  1.01 0.43  0.00  0.43  18624  16384 0.0
         se.sparc.5.6    1 16741  1186 9506  5.90 0.47  0.00  0.00  16320  14976 0.0
                imapd    1 16366   198 5710  6.88 1.09  1.02  0.00  34048  29888 0.1
               dtmail   10 16364  9070 5710  0.75 1.12  0.00  0.00 102144  94400 0.0
Accounting Records
• Standard Unix System V Accounting - acct
– Tiny, incomplete (no process id!) low resolution, no overhead!
• Solaris Extended System and Network Accounting - exacct
– Flexible, Overly complex, Detailed data
– Interval support for recording long running processes
– No overhead! 100% capture ratio for infrequent samples!
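Extended accounting is switched on with acctadm(1M); a minimal sketch that enables extended process accounting to a file and checks the state (the file path is illustrative, and the status output format varies by release):

# acctadm -e extended -f /var/adm/exacct/proc process
# acctadm process
            Process accounting: active
       Process accounting file: /var/adm/exacct/proc
     Tracked process resources: extended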
Extracct for Solaris
• extracct tool to get extended acct data out in a useful form
• See http://perfcap.blogspot.com for description and get code from
http://www.orcaware.com/orca/pub/extracct
• Pre-compiled code for Solaris SPARC and x86. Solaris 8 to 10.
– Useful data is logged in regular columns for easy import
– Includes low overhead network accounting config file for TCP flows
– Interval accounting option to force all processes to cut records
– Automatic log filename generation and clean switching
– Designed to run directly as a cron job, useful today
• More work needed to interface output to SE toolkit and Orca
Example Extracct Output
# ./extracct
Usage: extracct [-vwr] [ file | -a dir ]
-v: verbose
-w: wracct all processes first
-r: rotate logs
-a dir: use acctadm.conf to get input logs, and write output files to dir
The usual way to run the command will be from cron as shown
0 * * * * /opt/exdump/extracct -war /var/tmp/exacct > /dev/null 2>&1
2 * * * * /bin/find /var/adm/exacct -ctime +7 -exec rm {} \;
This also shows how to clean up old log files; only the binary files are deleted in this example, and /var/tmp/exacct was created to hold the text files. The process data in the text file looks like this:
timestamp  locltime duration procid  ppid uid   usr   sys majf     rwKB  vcxK icxK sigK  sycK arMB mrMB command
1114734370 17:26:10   0.0027  16527 16526   0 0.000 0.002    0     0.53  0.00 0.00 0.00   0.1  0.7 28.9 acctadm
1114734370 17:26:10   0.0045  16526 16525   0 0.000 0.001    0     0.00  0.00 0.00 0.00   0.1  1.1 28.9 sh
1114734370 17:26:10   0.0114  16525  8020   0 0.001 0.005    0     1.71  0.00 0.00 0.00   0.3  1.0 28.9 exdump
1109786959 10:09:19  -1.0000      1     0   0 4.311 3.066   96 47504.69 49.85 0.18 0.34 456.2  0.9  1.0 init
1109786959 10:09:19  -1.0000      2     0   0 0.000 0.000    0     0.00  0.00 0.00 0.00   0.0  0.0  0.0 pageout
What would you say if you were asked:
How busy is that system?
A: I have no idea…
A: 10%
A: Why do you want to know?
A: I’m sorry, you don’t understand your question….
Headroom Estimation
• CPU Capacity
– Relatively easy to figure out
• Network Usage
– Use bytes not packets/s
• Memory Capacity
– Tricky - easier in Solaris 8
• Disk Capacity
– Can be very complex
Headroom
• Headroom is available usable resources
– Total Capacity minus Peak Utilization and Margin
– Applies to CPU, Net, Disk, RAM and OS
[Figure: "usr+sys CPU for Peak Period" - CPU % plotted over time, with the utilization curve at the bottom, headroom above it, and the margin reserved at the top]
Utilization
• Utilization is the proportion of busy time
• Always defined over a time interval
[Figures: "usr+sys CPU for Peak Period" - CPU % utilization over time; and "OnCPU Scheduling for Each CPU" - microsecond-level OnCPU activity per CPU, which averages out to the mean CPU utilization]
Response Time
• Response Time = Queue time + Service time
• The Usual Assumptions…
– Steady state averages
– Random arrivals
– Constant service time
– M servers processing the same queue
• Approximations
– Queue length = Throughput x Response Time (Little's Law)
– Response Time = Service Time / (1 - Utilization^M)
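A quick worked example of the approximation, as an awk sketch with illustrative values (S = 10ms, U = 80%, M = 4 CPUs):

% awk 'BEGIN { S=10; U=0.8; M=4; printf "R = %.1f ms\n", S/(1-U^M) }'
R = 16.9 ms

Little's Law then gives the queue length: at 100 requests/s, Q = 100 x 0.0169 = about 1.7 requests in the system.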
Response Time Curves
The traditional view of Utilization as a proxy for response time
Systems with many CPUs can run at higher utilization levels, but degrade more
rapidly when they run out of capacity
Headroom margin should be set according to a response time target.
[Figure: "Response Time Curves" - R = S / (1 - (U%)^M) plotted as response time increase factor (0 to 10x) against total system utilization % (0 to 100), with curves for one, two, four, eight, 16, 32 and 64 CPUs; the headroom margin is set where the curves begin to climb steeply]
So what's the problem with Utilization?
• Unsafe assumptions! Complex adaptive systems are not simple!
• Random arrivals?
– Bursty traffic with long tail arrival rate distribution
• Constant service time?
– Variable clock rate CPUs, inverse load dependent service time
– Complex transactions, request and response dependent
• M servers processing the same queue?
– Virtual servers with varying non-integral concurrency
– Non-identical servers or CPUs, Hyperthreading, Multicore, NUMA
• Measurement Errors?
– Mechanisms with built in bias, e.g. sampling from the scheduler clock
– Platform and release specific systemic changes in accounting of interrupt time
Threaded CPU Pipelines
• CPU microarchitecture optimizations
– Extra register sets working with one execution pipeline
– When the CPU stalls on a memory read, it switches registers/threads
– Operating system sees multiple schedulable entities (CPUs)
• Intel Hyperthreading
– Each CPU core has an extra thread to use spare cycles
– Typical benefit is 20%, so total capacity is 1.2 CPUs
– I.e. Second thread much slower when first thread is busy
– Hyperthreading aware optimizations in recent operating systems
• Sun “CoolThreads”
– "Niagara" SPARC CPU has eight cores, one shared floating point unit
– Each CPU core has four threads, but each core is a very simple design
– Behaves like 32 slow CPUs for integer, snail like uniprocessor for FP
– Overall throughput is very high, performance per watt is exceptional
– New Niagara 2 has dedicated FPU and 8 threads per core (total 64 threads)
Variable Clock Rate CPUs
• Laptop and other low power devices do this all the time
– Watch CPU usage of a video application and toggle mains/battery power….
• Server CPU Power Optimization - AMD PowerNow!™
– AMD Opteron server CPU detects overall utilization and reduces clock rate
– Actual speeds vary, but for example could reduce from 2.6GHz to 1.2GHz
– Changes are not understood or reported by operating system metrics
– Speed changes can occur every few milliseconds (thermal shock issues)
– Dual core speed varies per socket, Quad core varies per core
– Quad core can dynamically stop entire cores to save power
• Possible scenario:
– You estimate 20% utilization at 2.6GHz
– You see 45% reported in practice (at 1.2GHz)
– Load doubles, reported utilization drops to 40% (at 2.6GHz)
– Actual mapping of utilization to clock rate is unknown at this point
• Note: Older and "low power" Opterons used in blades fix clock rate
Virtual Machine Monitors
• VMware, Xen, IBM LPARs etc.
– Non-integral and non-constant fractions of a machine
– Naive operating systems and applications that don't expect this behavior
– However, lots of recent tools development from vendors
• Average CPU count must be reported for each measurement interval
• VMM overhead varies, application scaling characteristics may be affected
Measurement Errors
• Mechanisms with built in bias
– e.g. sampling from the scheduler clock underestimates CPU usage
– Solaris 9 and before, Linux, AIX, HP-UX “sampled CPU time”
– Solaris 10 and HP-UX “measured CPU time” far more accurate
– Solaris microstate process accounting always accurate, but in Solaris 10 microstates are also used to generate system-wide CPU metrics
• Accounting of interrupt time
– Platform and release specific systemic changes
– Solaris 8 - sampled interrupt time spread over usr/sys/idle
– Solaris 9 - sampled interrupt time accumulated into sys only
– Solaris 10 - accurate interrupt time spread over usr/sys/idle
– Solaris 10 Update 1 - accurate interrupt time in sys only
Storage Utilization
• Storage virtualization broke utilization metrics a long time ago
• Host server measures busy time on a "disk"
– Simple disk, "single server" response time gets high near 100%
utilization
– Cached RAID LUN, one I/O stream can report 100% utilization, but
full capacity supports many threads of I/O since there are many
disks and RAM buffering
• New metric - "Capability Utilization"
– Adjusted to report proportion of actual capacity for current workload
mix
– Measured by tools such as Ortera Atlas (http://www.ortera.com)
How to plot Headroom
• Measure and report absolute CPU power if you can get it…
• Plot shows headroom in blue, margin in red, total power tracking
day/night workload variation, plotted as mean + two standard deviations.
“Cockcroft Headroom Plot”
• Scatter plot of response time (ms) vs. Throughput (KB) from iostat metrics
• Histograms on axes
• Throughput time series plot
• Shows distributions and shape of response time
• Fits throughput weighted inverse gaussian curve
• Coded using "R" statistics package
• Blogged development at http://perfcap.blogspot.com/search?q=chp
Response Time vs. Throughput
• A different problem…
• Thread-limited appserver
• CPU utilization is low
• Measurements are of a single SOA
service pool
• Response is in milliseconds
• Throughput is executions/s
      Exec               Resp
 Min.   :    1.00   Min.   :    0.0
 1st Qu.:    2.00   1st Qu.:  150.0
 Median :    8.00   Median :  361.0
 Mean   :   64.68   Mean   :  533.5
 3rd Qu.:   45.00   3rd Qu.:  771.9
 Max.   :10795.00   Max.   :19205.0
How busy is that system again?
• Check your assumptions…
• Record and plot absolute capacity for each measurement interval
• Plot response time as a function of throughput, not just utilization
• SOA response characteristics are complicated…
• More detailed discussion in CMG06 Paper and blog entries
– “Utilization is Virtually Useless as a Metric” - Adrian Cockcroft - CMG06
http://perfcap.blogspot.com/search?q=utilization
http://perfcap.blogspot.com/search?q=chp
CPU
CPU Capacity Measurements
• CPU Capacity is defined by CPU type and clock rate, or a
benchmark rating like SPECrateInt2000
• CPU throughput - CPU scheduler transaction rate
– measured as the number of voluntary context switches
• CPU Queue length
– CPU load average gives an approximation via a time
decayed average of number of jobs running and ready to run
• CPU response time
– Solaris microstate accounting measures scheduling delay
• CPU utilization
– Defined as busy time divided by elapsed time for each CPU
– Badly distorted and undermined by virtualization……
CPU time measurements
• Biased sample CPU measurements
– See 1998 Paper "Unix CPU Time Measurement Errors"
– Microstate measurements are accurate, but are platform and tool specific. Sampled metrics are more inaccurate at low utilization
• CPU time is sampled by the 100Hz clock interrupt
– sampling theory says this is accurate for an unbiased sample
– the sample is very biased, as the clock also schedules the CPU
– daemons that wakeup on the clock timer can hide in the gaps
– problem gets worse as the CPU gets faster
• Increase clock interrupt rate? (Solaris)
– set hires_tick=1 sets rate to 1000Hz, good for realtime wakeups
– harder to hide CPU usage, but slightly higher overhead
• Use measured CPU time at per-process level
– microstate accounting takes timestamp on each state change
– very accurate and also provides extra information
– still doesn’t allow for interrupt overhead
– prstat -m and the pea.se command use this accurate measurement
More CPU Measurement Issues
• Platform and release specific details
• Are interrupts included in system time? It depends…
• Is vmstat CPU sampled (Linux) or measured (Solaris 10)?
• Load average includes CPU queue (Solaris) or CPU+Disk
(Linux)
• Wait for I/O is a misleading subset of idle time, metric removed
in Solaris 10, ignore it in all other Unix/Linux releases
Controlling CPUs in Solaris
• psrinfo - show CPU status and clock rate
• corestat - show internal behavior of multi-core CPUs
• psradm - enable/disable CPUs
• pbind - bind a process to a CPU
• psrset - create sets of CPUs to partition a system
– At least one CPU must remain in the default set, to run kernel services
like NFS threads
– All CPUs still take interrupts from their assigned sources
– Processes can be bound to sets
• mpstat shows per-CPU counters (per set in Solaris 9)
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0   45   1    0  232    0  780  234  106  201   0   950  72  28  0   0
  1   29   1    0  243    0  810  243  115  186   0  1045  69  31  0   0
  2   27   1    0  235    0  827  243  110  199   0  1000  75  25  0   0
  3   26   0    0  217    0  794  227  120  189   0   925  70  30  0   0
  4    9   0    0  234   92  403   94   84 1157   0   625  66  34  0   0
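The control commands above might be combined like this (CPU ids, set ids and pids are hypothetical):

# psrset -c 1 2 3       create a processor set from CPUs 1-3, prints the new set id
# psrset -b 1 12345     bind pid 12345 to processor set 1
# pbind -b 0 12346      bind pid 12346 to CPU 0
# psradm -f 4           take CPU 4 offline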
Monitoring CPU mutex lock statistics
• To fix mutex contention change the application workload or upgrade to a newer
OS release
• Locking strategies are too complex to be patched
• Lockstat Command
– very powerful and easy to use
– Solaris 8 extends lockstat to include kernel CPU time profiling
– dynamically changes all locks to be instrumented
– displays lots of useful data about which locks are contending

# lockstat sleep 5
Adaptive mutex spin: 3318 events
Count indv cuml rcnt     spin Lock                Caller
-------------------------------------------------------------------------
  601  18%  18% 1.00        1 flock_lock          cleanlocks+0x10
  302   9%  27% 1.00        7 0xf597aab0          dev_get_dev_info+0x4c
  251   8%  35% 1.00        1 0xf597aab0          mod_rele_dev_by_major+0x2c
  245   7%  42% 1.00        3 0xf597aab0          cdev_size+0x74
  160   5%  47% 1.00        7 0xf5b3c738          ddi_prop_search_common+0x50
Network
Network protocol data
• Based on a streams module interface in Solaris
• Solaris 2 ndd interface used to configure protocols and interfaces
• Solaris 2 mib interface used by netstat -s and snmpd to get TCP stats etc.
• Advantages
– Individual named metrics reasonably stable over releases
– Consistent data using locking
– Extensible to add metrics without breaking existing code
– Solaris ndd can retune TCP online without reboot
– System data is often also made available via the SNMP protocol
• Disadvantages
– Underlying API is not supported, SNMP access is preferred
Network interface and NFS metrics
• Network interface throughput counters from kstat (example below)
– rbytes, obytes — read and output byte counts
– multircv, multixmt — multicast byte counts
– brdcstrcv, brdcstxmt — broadcast byte counts
– norcvbuf, noxmtbuf — buffer allocation failure counts
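For example, the byte counters can be read with kstat(1M); the driver module and instance names vary by interface type (hme, ce, bge, e1000g, etc.) and the values shown are illustrative:

% kstat -p hme:0:hme0:rbytes hme:0:hme0:obytes
hme:0:hme0:rbytes       987654321
hme:0:hme0:obytes       123456789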
• NFS Client Statistics Shown in iostat on Solaris
crun% iostat -xnP
                           extended device statistics
 r/s  w/s  kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
 0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 crun:vold(pid363)
 0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 servdist:/usr/dist
 0.0  0.5   0.0   7.9  0.0  0.0    0.0   20.7   0   1 servhome:/export/home/adrianc
 0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 servhome:/var/mail
 0.0  1.3   0.0  10.4  0.0  0.2    0.0  128.0   0   2 c0t2d0s0
 0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0 c0t2d0s2
How NFS Works
• Showing the many layers of caching involved
[Diagram: the many layers of caching between an application and disk - stdio 1KB buffers, the NFS client rnode information cache, DNLC name cache and in-memory page cache (64KB chunks), CacheFS storage, the NFS server's page cache, UFS inode and metadata buffer caches, and the disk array write cache/Prestoserve, connected by read/write, pagein/pageout and bread/bwrite paths to disk storage]
Network Capacity Measurements
• Network Interface Throughput
– Byte and packet rates input and output
• TCP Protocol Specific Throughput
– TCP connection count and connection rates
– TCP byte rates input and output
• NFS/SMB Protocol Specific Throughput
– Byte rates read and write
– NFS/SMB service response times
• HTTP Protocol Specific Throughput
– HTTP operation rates
– Get and post payload byte rates and size distribution
TCP - A Simple Approach
• Capacity and Throughput Metrics to Watch
• Connections
– Current number of established connections
– New outgoing connection rate (active opens)
– Outgoing connection attempt failure rate
– New incoming connection rate (passive opens)
– Incoming connection attempt failure rate (resets)
• Throughput
– Input and output byte rates
– Input and output segment rates
– Output byte retransmit percentage
Obtaining Measurements
• Get the TCP MIB via SNMP or netstat -s
• Standard TCP metric names:
– tcpCurrEstab: current number of established connections
– tcpActiveOpens: number of outgoing connections since boot
– tcpAttemptFails: number of outgoing failures since boot
– tcpPassiveOpens: number of incoming connections since boot
– tcpOutRsts: number of resets sent to reject connection
– tcpEstabResets: resets sent to terminate established connections
– (tcpOutRsts - tcpEstabResets): incoming connection failures
– tcpOutDataSegs, tcpInDataSegs: data transfer in segments
– tcpRetransSegs: retransmitted segments
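On Solaris the whole TCP MIB can be dumped with netstat -s -P tcp; a sketch of pulling the connection counters (the exact layout varies by release, and these are totals since boot, so sample twice and difference the values to get rates):

% netstat -s -P tcp | egrep 'Opens|AttemptFails|CurrEstab'
        tcpActiveOpens      = 37470     tcpPassiveOpens     = 13729
        tcpAttemptFails     =   108     tcpCurrEstab        =    49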
Internet Server Issues
• TCP Connections are expensive
– TCP is optimized for reliable data on long lived connections
– Making a connection uses a lot more CPU than moving data
– Connection setup handshake involves several round trip
delays
– Each open connection consumes about 1 KB plus data buffers
• Pending connections cause “listen queue” issues
• Each new connection goes through a “slow start” ramp up
• Other TCP Issues
– TCP windows can limit high latency high speed links
– Lost or delayed data causes time-outs and retransmissions
TCP Sequence Diagram for HTTP Get
Stalled HTTP Get and Persistent HTTP
Memory
Memory Capacity Measurements
• Physical Memory Capacity Utilization and Limits
– Kernel memory, Shared Memory segment
– Executable code, stack and heap
– File system cache usage, Unused free memory
• Virtual Memory Capacity - Paging/Swap Space
– When there is no more available swap, Unix stops working
• Memory Throughput
– Hardware counter metrics can track CPU to Memory traffic
– Page in and page out rates
• Memory Response Time
– Platform specific hardware memory latency makes a difference, but
hard to measure
– Time spent waiting for page-in is part of Solaris microstate
accounting
Page Size Optimization
• Systems may support large pages for reduced overhead
– Solaris support is more dynamic/flexible than Linux at present
• Intimate Shared Memory locks large pages in RAM
– No swap space reservation
– Used for large database server Shared Global Area
• No good metrics to track usage and fragmentation issues
• Solaris ppgsz command can set heap and stack pagesize
• SPARC Architecture
– Base page size is 8KB, Large pages are 4MB
• Intel/AMD x86 Architectures
– Base page size is 4KB, Large pages are 2MB
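A sketch of checking and requesting large pages on Solaris 9 and later (the application path is illustrative):

% pagesize -a           (list supported page sizes - UltraSPARC III example)
8192
65536
524288
4194304
# ppgsz -o heap=4M,stack=4M /opt/app/bin/server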
Cache principles
• Temporal locality - “close in time”
– If you need something frequently, keep it near you
– If you don’t use it for a while, put it back
– If you change it, save the change by putting it back
• Spatial locality - “close in space - nearby”
– If you go to get one thing, get other stuff that is nearby
– You may save a trip by prefetching things
– You can waste bandwidth if you fetch too much you don’t use
• Caches work well with randomness
– Randomness prevents worst case behaviour
– Deterministic patterns often cause cache busting accesses
• Very careful cache friendly tuning can give great speedups
The memory go round - Unix/Linux
• Memory usage flows between subsystems
[Diagram: memory flows between kernel memory buffers, System V shared memory, process stack and heap, and the filesystem cache via the free RAM list - allocations (kernel alloc, shmget, brk/pagein, read/write/mmap) take pages from the head of the free list, while kernel free, shm_unlink, exit, file delete, reclaims and the pageout scanner return pages to the tail]
The memory go round - Solaris 8 and Later
• Memory usage flows between subsystems
[Diagram: the same flows as the previous slide, but in Solaris 8 and later the filesystem cache sits inside the free RAM list, so cached file pages are directly reclaimable and counted as free]
Swap space
• Swap is very confusing and badly instrumented!
# se swap.se
ani_max 54814 ani_resv 19429 ani_free 37981 availrmem 13859 swapfs_minfree
1972 ramres 11887 swap_resv 19429 swap_alloc 16833 swap_avail 47272
swap_free 49868
Misleading data printed by swap -s
134664 K allocated + 20768 K reserved = 155432 K used, 378176 K available
Corrected labels:
134664 K allocated + 20768 K unallocated = 155432 K reserved, 378176 K available
Mislabelled sar -r 1
freeswap (really swap available) 756352 blocks
Useful swap data:
Total swap 520 M available 369 M
reserved 151 M
Total disk 428 M
Total RAM 92 M
# swap -s
total: 134056k bytes allocated + 20800k reserved = 154856k used, 378752k available
# sar -r 1
18:40:51 freemem freeswap
18:40:52    4152   756912
Disk
Disk Capacity Measurements
• Detailed metrics vary by platform
• Easy for the simple disk cases
• Hard for cached RAID subsystems
• Almost Impossible for shared disk subsystems and SANs
– Another system or volume can be sharing a backend
spindle, when it gets busy your own volume can saturate,
even though you did not change your own workload!
Solaris Filesystem issues
ufs - standard, reliable, good for lots of small files
ufs with transaction log - faster writes and recovery
tmpfs - fastest if you have enough RAM, volatile
NFS
NFS2 - safe and common, 8KB blocks, slow writes
NFS3 - more readahead and writebehind, faster
default 32KB block size - fast sequential, may be slow random
default TCP instead of UDP, more robust over WAN
NFS4 - adds stateful behavior
cachefs - good for read-mostly NFS speedup
Veritas VxFS - useful on old Solaris releases
Solaris 8 UFS Upgrade
ufs was extended to be more competitive with VxFS
transaction log, unbuffered direct access option and snapshot backup capability
now available “for free” with Solaris 8
Solaris 10 ZFS - What it doesn't have....
• Nice features
– No extra cost - it's bundled in a free OS
– No volume manager - it's built in
– No space management - file systems use a common pool
– No long wait for newfs to finish - create a 3TB file system in a second
– No fsck - its transactional commit means it's consistent on disk
– No slow writes - disk write caches are enabled and flushed reliably
– No random or small writes - all writes are large batched sequential
– No rsync - snapshots can be differenced and replicated remotely
– No silent data corruption - all data is checksummed as it is read
– No bad archives - all the data in the file system is scrubbed regularly
– No penalty for software RAID - RAID-Z has a clever optimization
– No downtime - mirroring, RAID-Z and hot spares
– No immediate maintenance - double parity disks if you need them
• Wish-list
– No way to know how much performance headroom you have!
– No clustering support
Linux Filesystems
• There are a large number of options!
– http://en.wikipedia.org/wiki/Comparison_of_file_systems
• EXT3
– Common default for many Linux distributions
– Efficient for CPU and space, small block size
– Relatively simple for reliability and recovery
– Journalling support options can improve performance
– EXT4 is in development
• XFS
– Based on Silicon Graphics XFS, mature and reliable
– Better for large files and streaming throughput
– High Performance Computing heritage
Disk Configurations
• Sequential access is ~10 times faster than random
– Sequential rates are now about 50-100 MB/s per disk
– Random rates are 166 operations/sec (250/sec at 15000rpm)
– The size of each random read should be as big as possible
• Reads should be cached in main memory
– “The only good fast read is the one you didn’t have to do”
– Database shared memory or filesystem cache is microseconds
– Disk subsystem cache is milliseconds, plus extra CPU load
– Underlying disk is ~6ms, as it's unlikely that data is in cache
• Writes should be cached in nonvolatile storage
– Allows write cancellation and coalescing optimizations
– NVRAM inside the system - Direct access to Flash storage
– Solid State Disks based on Flash are the "Next Big Thing"
Slow idle disks explained
                      extended disk statistics
disk  r/s  w/s   Kr/s  Kw/s wait actv  svc_t  %w  %b
sd2   1.3  0.3   11.7   3.3  0.1  0.1  146.6   0   3
sd3   0.0  0.1    0.1   0.7  0.0  0.0  131.0   0   0
Why do these disks have high svc_t when they are idle?
Use prex to turn on kernel TNF probes for disk I/O
sdstrategy is called when an I/O is started
biodone is called when it completes
match the pairs of TNF records to see the time sequences
We find a burst of writes from pid 3 every 30s
fsflush is updating inodes scattered all over the filesystem
all writes are issued back to back without waiting to complete
a long queue forms, each write taking on average ~10ms to service, but
response (svc_t) includes a long queue time
Typically 20 or so writes each 30s is 0% busy, 100-200ms svc_t
Disk Throughput
Max and Avg Disk Utilization (Same data)
Data from iostat
• What can we see here?
                       extended disk statistics
disk    r/s   w/s    Kr/s   Kw/s wait actv svc_t  %w  %b
sd7     0.1   1.7     0.1   13.3  0.0  0.2 109.8   0   1   <- root ufs
sd15  534.2  17.5  1320.4   35.0  0.0  0.3   0.6   0  26   <- solid state disks
sd45  291.9  23.0   603.2   49.8  0.0  0.2   0.6   0  15   <- solid state disks
sd60    3.1   0.0    25.3    0.0  0.0  0.0   7.8   0   2   <- stripe 8K RR
sd61    3.3   0.0    26.4    0.0  0.0  0.0   7.6   0   2
sd62    3.2   0.0    26.1    0.0  0.0  0.0   8.1   0   3
sd63    3.8   0.0    30.1    0.0  0.0  0.0   7.2   0   3
sd64    3.6   0.0    28.8    0.0  0.0  0.0   7.4   0   3
sd65    3.8   0.0    31.2    0.0  0.0  0.0   7.3   0   3
sd67    9.7   1.5    77.8    4.3  0.0  0.1   9.0   0   8   <- stripe
sd68   10.7   1.4    85.3    4.2  0.0  0.1   9.0   0  10
sd69   10.0   1.5    79.9    4.2  0.0  0.1   9.0   0   9
sd70   10.4   1.0    83.1    3.2  0.0  0.1   9.1   0   9
sd71    9.9   1.4    78.8    4.6  0.0  0.1   8.7   0   9
sd72   10.0   1.1    79.9    3.7  0.0  0.1   8.5   0   8
sd75    0.0  27.6     0.0  297.3  0.0  0.0   1.1   0   2   <- cached write log
sd210  12.1   0.3   108.9    0.6  0.0  0.1   9.8   0  10   <- stripe
sd211  12.9   0.4   114.8    0.7  0.0  0.1  10.6   0  11
sd212  12.0   0.6   107.1    1.3  0.0  0.1  11.1   0  10
sd213  13.8   0.3   122.2    0.9  0.0  0.2  11.1   0  11
sd214  12.5   0.5   112.1    1.0  0.0  0.1  10.3   0  10
sd215  12.1   0.3   109.5    0.8  0.0  0.1  10.5   0  10
Simple Disks
• Utilization shows capacity usage
– Measured using iostat %b
• Response time is svc_t
– svc_t increases due to waiting in the queues caused by bursty loads
• Service time per I/O is Util/IOPS (see the awk sketch below)
– Calculate as (%b/100)/(r/s+w/s)
– Decreases due to optimization of queued requests as load increases
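A sketch of that calculation applied to iostat -x output (the Solaris column layout is assumed: device, r/s, w/s, ..., %b as the tenth field; positions vary with options and release):

% iostat -x 30 2 | awk '$1 ~ /^(sd|c[0-9])/ && $2+$3 > 0 { printf "%-8s S = %.1f ms\n", $1, 10*$10/($2+$3) }'
sd9      S = 6.5 ms

(10*%b/(r/s+w/s) is just (%b/100)/(r/s+w/s) expressed in milliseconds)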
Single Disk Parameters
• e.g. Seagate 18GB ST318203FC
– Obtain specs from www.seagate.com
– RPM = 10000, so one rotation = 6.0ms = 166/s
– Avg read seek = 5.2ms
– Avg write seek = 6.0ms
– Avg transfer rate = 24.5 MB/s
– Random IOPS
• Approx 166/s for small requests
• Approx 24.5/size for large requests
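A rough awk sketch of that model, taking the smaller of the seek/rotation limit and the transfer limit for a given request size (sz in MB, using the figures above):

% awk -v sz=0.5 'BEGIN { seek=166; xfer=24.5/sz; print (xfer<seek ? xfer : seek), "IOPS" }'
49 IOPS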
Mirrored Disks
• All writes go to both disks
• Read policy alternatives
– All reads from one side
– Alternate from side to side
– Split by block number to reduce seek
– Read both and use first to respond
• Simple Capacity Assumption
– Assume duplicated interconnects
– Same capacity as unmirrored
Concatenated and Fat Stripe Disks
• Request size less than interlace
• Requests go to one disk
• Single threaded requests
– Same capacity as single disk
• Multithreaded requests
– Same service time as one disk
– Throughput of N disks if more than N threads are evenly distributed
Striped Disks
• Request size more than interlace
• Requests split over N disks
– Single and multithreaded requests
– N = request size / interlace
– Throughput of N disks
• Service Time Reduction
– Reduced size of request reduces service time for large
transfers
– Need to wait for all disks to complete - slowest dominates
RAID5 for Small Requests
• Writes must calculate parity
– Read parity and old data blocks
– Calculate new parity
– Write log and data and parity
– Triple service time
– One third throughput of one disk
• Read performs like stripe
– Throughput of N-1, service of one
– Degraded mode throughput about one
RAID5 for Large Requests
• Write full stripe and parity log
• Capacity similar to stripe
– Similar read and write performance
– Throughput of N-1 disks
– Service time for size reduced by N-1
– Less interconnect load than mirror
• Degraded Mode
– Throughput halved and service similar
– Extra CPU used to regenerate data
Cached RAID5
• Nonvolatile cache
– No need for recovery log disk
• Fast service time for writes
– Interconnect transfer time only
• Cache optimizes RAID5
– Makes all backend writes full stripe
Cached Stripe
• Write caching for stripes
– Greatly reduced service time
– Very worthwhile for small transfers
– Large transfers should not be cached
– In many cases, 128KB is crossover point from small to large
• Optimizations
– Rewriting same block cancels in cache
– Small sequential writes coalesce
Capacity Model Measurements
• Derived from iostat outputs
                     extended disk statistics
disk  r/s  w/s   Kr/s  Kw/s wait actv svc_t  %w  %b
sd9  33.1  8.7  271.4  71.3  0.0  2.3  15.8   0  27

• Utilization U = %b / 100 = 0.27
• Throughput X = r/s + w/s = 41.8
• Size K = (Kr/s + Kw/s) / X = 8.2K
• Concurrency N = actv = 2.3
• Service time S = U / X = 6.5ms
• Response time R = svc_t = 15.8ms
Cache Throughput
• Hard to model clustering and write cancellation
improvements
• Make pessimistic assumption that throughput is unchanged
• Primary benefit of cache is fast response time
• Writes can flood cache and saturate back-end disks
– Service times suddenly go from 3ms to 300ms
– Very hard to figure out when this will happen
– Paranoia is a good policy….
Concluding Summary
Walk out of here with the most useful content fresh in your mind!
Quick Tips #1 - Disk
• The system will usually have a disk bottleneck
• Track how busy is the busiest disk of all
• Look for unbalanced, busy or slow disks with iostat
• Options: timestamp, look for busy controllers, ignore idle disks:
% iostat -xnzCM -T d 30
Tue Jan 21 09:19:21 2003
                    extended device statistics
   r/s  w/s  Mr/s  Mw/s wait actv wsvc_t asvc_t  %w  %b device
 141.0  8.6   0.6   0.0  0.0  1.5    0.0   10.0   0  25 c0
   3.3  0.0   0.0   0.0  0.0  0.0    0.0    6.5   0   2 c0t0d0
 137.7  8.6   0.6   0.0  0.0  1.5    0.0   10.1   0  74 c0t1d0

• Watch out for sd_max_throttle limiting throughput when set too low
• Watch out for RAID cache being flooded on writes, causes sudden very large increase in write service time
Quick Tips #2 - Network
• If you ever see a slow machine that also appears to be idle, you should
suspect a network lookup problem. i.e. the system is waiting for some
other system to respond.
• Poor Network Filesystem response times may be hard to see
– Use iostat -xn 30 on a Solaris client
– wsvc_t is the time spent in the client waiting to send a request
– asvc_t is the time spent in the server responding
– %b will show 100% whenever any requests are being processed, it does NOT
mean that the network server is maxed out, as an NFS server is a complex
system that can serve many requests at once.
• Name server delays are also hard to detect
– Overloaded LDAP or NIS servers can cause problems
– DNS configuration errors or server problems often cause 30s delays as the
request times out
Quick Tips #3 - Memory
• Avoid the common vmstat misconceptions
– The first line is average since boot, so ignore it
• Linux, Other Unix and earlier Solaris Releases
– Ignore “free” memory
– Use high page scanner “sr” activity as your RAM shortage indicator
• Solaris 8 and Later Releases
– Use “free” memory to see how much is left for code to use
– Use non-zero page scanner “sr” activity as your RAM shortage indicator
• Don’t panic when you see page-ins and page-outs in vmstat
• Normal filesystem activity uses paging
solaris9% vmstat 30
 kthr      memory            page              disk        faults      cpu
 r b w   swap    free  re  mf pi po fr de sr f0 s0 s1 s6  in  sy  cs us sy id
 0 0 0 2367832 91768   3  31  2  1  1  0  0  0  0  0  0  511 404 350  0  0 99
 0 0 0 2332728 75704   3  29  0  0  0  0  0  0  0  0  0  508 537 410  0  0 99
Quick Tips #4 - CPU
• Look for a long run queue (vmstat procs r) - and add CPUs
– To speedup with a zero run queue you need faster CPUs, not more of them
• Check for CPU system time dominating user time
– Most systems should have lots more Usr than Sys, as they are running
application code
– But... dedicated NFS servers should be 100% Sys
– And... dedicated web servers have high Sys as well
– So... assume that lots of network service drives Sys time
• Watch out for processes that hog the CPU
– Big problem on user desktop systems - look for looping web browsers
– Web search engines may get queries that loop
– Use resource management or limit cputime (ulimit -t) in startup scripts to
terminate web queries
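A sketch of the last tip as a wrapper script for a web query process (the path and the 300 second limit are illustrative; ulimit -t is the ksh/bash spelling):

#!/bin/ksh
# cap each query at 300 CPU seconds so runaway loops get killed
ulimit -t 300
exec /opt/search/bin/query "$@"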
Quick Tips #5 - I/O Wait
• Look for processes blocked waiting for disk I/O (vmstat procs b)
– This is what causes CPU time to be counted as wait not idle
– Nothing else ever causes CPU wait time!
• CPU wait time is a subset of idle time, consumes no resources
– CPU wait time is not calculated properly on multiprocessor machines
on older Solaris releases, it is greatly inflated!
– CPU wait time is no longer calculated, zero in Solaris 10
– Bottom line - don’t worry about CPU wait time, it’s a broken metric
• Look at individual process wait time using microstates
– prstat -m or SE toolkit process monitoring
• Look at I/O wait time using iostat asvc_t
Quick Tips #6 - iostat
• For Solaris remember “expenses” iostat -xPncez 30
• Add -M for Megabytes, and -T d for timestamped logging
• Use 30 second interval to avoid spikes in load. Watch
asvc_t which is the response time for Solaris
• Look for regular disks over 5% busy that have response
times of more than 10ms as a problem.
• If you have cached hardware RAID, look for response
times of more than 5ms as a problem.
• Ignore large response times on idle disks that have filesystems - it's not a problem and the cause is the fsflush process
Recipe to fix a slow system
• Essential Background Information
– What is the business function of the system?
– Who and where are the users?
– Who says there is a problem, and what is slow?
– What changed recently and what is on the way?
• What is the system configuration?
– CPU/RAM/Disk/Net/OS/Patches, what application software is in use?
• What are the busy processes on the system doing?
– use top, prstat, pea.se or /usr/ucb/ps uax | head
• Report CPU and disk utilization levels, iostat -xPncezM -T d 30
– What is making the disks busy?
• What is the network name service configuration?
– How much network activity is there? Use netstat -i 30 or nx.se 30
• Is there enough memory?
– Check free memory and the scan rate with vmstat 30
Further Reading - Books
General Solaris/Unix/Linux Performance Tuning
– System Performance Tuning (2nd Edition) by Gian-Paolo D. Musumeci and Mike
Loukides; O'Reilly & Associates
Solaris Performance Tuning Books
– Solaris Performance and Tools, Richard McDougall, Jim Mauro, Brendan Gregg; Prentice
Hall
– Configuring and Tuning Databases on the Solaris Platform, Allan Packer; Prentice Hall
– Sun Performance and Tuning, by Adrian Cockcroft and Rich Pettit; Prentice Hall
Sun BluePrints™
– Capacity Planning for Internet Services, Adrian Cockcroft and Bill Walker; Prentice Hall
– Resource Management, Richard McDougall, Adrian Cockcroft et al. Prentice Hall
Linux
– Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D.
Sherer
– Google has a Linux specific search mode http://www.google.com/linux
Questions?
(The End)