Daniel Pearson
• Started working with Windows NT 3.51
• Three years at Digital Equipment Corporation
• Supporting Intel and Alpha systems running Windows NT
• Seven years at Microsoft
• Senior Escalation Lead in Windows base team
• Worked in the Mobile Internet sustained engineering team
• Instructor for David Solomon, co-author of the Windows Internals book series
Agenda
• Components of performance analysis
• Understanding the tools for troubleshooting and analyzing performance issues
• Troubleshooting CPU and memory issues using various Windows tools
* Portions of this session are based on material developed by Mark Russinovich and David Solomon
Components of Performance Analysis
• Event Tracing for Windows
• Core component of the operating system
• Kernel mode data structures
• Used to store information about the system and system objects that can be read by various tools
• e.g. dt nt!_KTHREAD KernelTime
• CPU performance monitoring events
• Refer to the Intel 64 and IA-32 Architectures Software Developer’s Manual
• http://developer.intel.com/products/processor/manuals
Event Tracing for Windows
• Built in to the system
• High performance, low overhead and scalable
• 2.5% CPU usage for a sustained rate of 10,000 events/sec on a 2 GHz CPU [1]
• Operations throughout the system that are of interest to performance are fully instrumented, e.g. process and thread activity, registry I/O, disk I/O
[1] Milirud, Michael. 2008. Windows Performance Analysis: Using Windows Performance Tools. Presented at Microsoft's WinHEC conference, November 5-7, Los Angeles, CA.
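The overhead figure quoted above can be turned into a rough per-event cost. This is only a back-of-the-envelope sketch: it assumes the 2.5% figure scales linearly and is entirely attributable to event logging, and the function name is made up for illustration.

```python
from fractions import Fraction

def etw_cost_per_event(cpu_fraction, events_per_sec, clock_hz):
    """Return (cycles per event, microseconds per event) for a logging overhead figure."""
    cycles_per_sec_spent = cpu_fraction * clock_hz        # cycles consumed by tracing each second
    cycles_per_event = cycles_per_sec_spent / events_per_sec
    microseconds_per_event = cycles_per_event / clock_hz * 1_000_000
    return cycles_per_event, microseconds_per_event

# Figures quoted on the slide: 2.5% CPU, 10,000 events/sec, 2 GHz CPU.
# Fraction keeps the arithmetic exact instead of relying on floating point.
cycles, micros = etw_cost_per_event(Fraction(25, 1000), 10_000, 2_000_000_000)
print(f"{float(cycles):.0f} cycles per event, {float(micros):.2f} microseconds per event")
```

Under those assumptions each logged event costs on the order of a few thousand cycles, which is why the mechanism can stay enabled on production systems.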
Event Tracing for Windows
• Uses a buffering and logging mechanism implemented in the kernel
• Per-processor buffers that are written to disk by an asynchronous writer thread
• Ability to enable and disable tracing dynamically
• Supports a managed code provider
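The buffering scheme described above can be pictured with a toy model. The sketch below is not how the kernel implements it; it is a simplified user-mode illustration in which the class name, the buffer size, and the in-memory list standing in for the trace file are all assumptions.

```python
import queue
import threading

BUFFER_CAPACITY = 4  # events per buffer; an arbitrary size chosen for this sketch

class PerCpuLogger:
    """Toy model: one in-memory buffer per processor, drained by a single writer thread."""

    def __init__(self, cpu_count):
        self.buffers = [[] for _ in range(cpu_count)]  # one active buffer per processor
        self.full_buffers = queue.Queue()              # full buffers awaiting the writer
        self.log = []                                  # stands in for the on-disk trace file
        self.writer = threading.Thread(target=self._write_loop)
        self.writer.start()

    def log_event(self, cpu, event):
        # Each processor appends only to its own buffer.
        buf = self.buffers[cpu]
        buf.append((cpu, event))
        if len(buf) >= BUFFER_CAPACITY:
            self.full_buffers.put(buf)   # hand the full buffer to the writer
            self.buffers[cpu] = []       # continue logging into a fresh buffer

    def _write_loop(self):
        # The writer runs asynchronously from the code that logs events.
        while True:
            buf = self.full_buffers.get()
            if buf is None:              # sentinel: stop was requested
                break
            self.log.extend(buf)

    def stop(self):
        # Flush partially filled buffers, then stop the writer.
        for buf in self.buffers:
            if buf:
                self.full_buffers.put(buf)
        self.full_buffers.put(None)
        self.writer.join()

logger = PerCpuLogger(cpu_count=2)
for i in range(10):
    logger.log_event(cpu=i % 2, event=f"event-{i}")
logger.stop()
print(len(logger.log), "events written")
```

The point of the design is that the code logging an event only touches a local buffer; the slower write to disk happens later on a different thread.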
Sysinternals Utilities
• Process Explorer
• Useful for displaying which files, registry keys and other objects processes have open and which DLLs they have loaded
• Process Monitor
• Useful for showing real-time file system, registry and process & thread activity
• Available for download from the TechNet site
• http://technet.microsoft.com/sysinternals
Resource Monitor
• Included with Windows Vista and greatly enhanced in Windows 7 and Windows Server 2008 R2
• Allows the viewing of CPU, memory, disk and network resources as well as handles and modules in real time
• Ability to end, suspend and resume processes as well as to start, stop and restart Windows services
• Useful for identifying the highest resource consumers by individual resource type, e.g. CPU
• Able to list the wait chain tree of a process to determine if a process is waiting on another
Performance Monitor
• Queries performance counters that measure system state or activity
• Current values are read at specific intervals
• Performance counters are included in the operating system and can be included as part of applications
• Able to collect event trace data from trace providers that report actions or events
• Can combine multiple trace providers into a single session
• Configuration information can be collected from registry keys at a specific time or interval
Windows Performance Analyzer
• Part of the Windows Performance Toolkit
• Support for x86, x64, and IA64 architectures
• Consists of three primary programs
• xperf.exe
• Used for controlling tracing and processing trace data
• xbootmgr.exe
• Automates on and off state transitions and captures traces during those transitions
• xperfview.exe
• A graphical trace visualization tool to represent data in the form of interactive graphs and summary tables
Windows Performance Analyzer
• Primarily uses the Event Tracing for Windows infrastructure built in to the system
• Can be enabled or disabled at any time without requiring a system or process restart
• Supports symbol decoding, sample profiling, and recording of call stacks on kernel events
• Designed to be used during automation
• All the functions of the tools are available via the command line tool xperf.exe
Support for Earlier Systems
• The Windows Performance Toolkit cannot be installed on Windows XP or Windows Server 2003, although data collection on those systems is supported
• Copy xperf.exe and perfctrl.dll to the system being traced
• Trace analysis is only supported on Windows Vista and later systems
Capturing a Performance Trace
• Kernel options divided into two parts
• Kernel Flags
• Identified by the use of uppercase characters
• e.g. PROC_THREAD, LOADER, PROFILE
• Kernel Groups
• Identified by the use of title case characters
• e.g. Base, Diag, Latency, FileIO
• Kernel Groups are made up of a collection of Kernel Flags
• e.g. SysProf = PROC_THREAD+LOADER+PROFILE
• Flags and groups are separated by the ‘+’ token
• e.g. xperf.exe -on FileIO+DISK_IO_INIT
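The naming convention and the '+' separator described above can be sketched in a few lines. The helper names are hypothetical, and the group table contains only the single expansion given in the text (SysProf); the contents of other groups are deliberately left out.

```python
# The only group expansion given in the text; other groups are not listed here.
KNOWN_GROUPS = {"SysProf": ["PROC_THREAD", "LOADER", "PROFILE"]}

def classify(token):
    """Classify a kernel option by the naming convention: uppercase = flag, otherwise = group."""
    return "flag" if token.isupper() else "group"

def expand(token):
    """Expand a group into its flags when the expansion is known; otherwise return it unchanged."""
    return KNOWN_GROUPS.get(token, [token])

def kernel_option_string(tokens):
    """Join flags and groups with the '+' separator."""
    return "+".join(tokens)

print(classify("PROC_THREAD"), classify("FileIO"))
print(kernel_option_string(expand("SysProf")))
print(["xperf.exe", "-on", kernel_option_string(["FileIO", "DISK_IO_INIT"])])
```

Building the argument as a list rather than a single string keeps it safe to pass to a process launcher without shell quoting.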
Merging of Performance Trace Data
• Traces can be copied to another system for analysis
• The trace file should be “merged” on the collection system before analysis to include additional system information
• xperf -d trace.etl
[Figure: XPerf combines the kernel trace with system and symbol information to produce the merged trace]
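A capture followed by a merge can be scripted around the two xperf invocations shown in this session. The wrapper below is a sketch: the function name and its `workload` and `run` parameters are assumptions, and by default it only builds the commands without running them, since actually running them requires xperf.exe and typically an elevated prompt.

```python
import subprocess

def capture_and_merge(kernel_options, merged_file, workload=None, run=False):
    """Build (and optionally run) the start and stop-and-merge xperf commands."""
    start = ["xperf.exe", "-on", kernel_options]  # start kernel tracing
    stop = ["xperf.exe", "-d", merged_file]       # stop tracing and write the merged trace
    if run:
        subprocess.run(start, check=True)
        try:
            if workload is not None:
                workload()                        # the activity to be traced
        finally:
            subprocess.run(stop, check=True)      # always stop, even if the workload fails
    return [start, stop]

# Dry run: print the commands that would be executed on the collection system.
for command in capture_and_merge("FileIO+DISK_IO_INIT", "trace.etl"):
    print(" ".join(command))
```

Stopping in a `finally` block avoids leaving a kernel trace session running if the workload raises an exception.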
Understanding CPU Activity
• Windows uses 32 priority levels
• The system implements a preemptive, priority driven scheduler
• Priority adjustments can be applied to threads in the “dynamic” range
• At least one runnable thread with the highest priority will be running
[Figure: priority levels. Real time: 16 to 31. Dynamic: 0 to 15.]
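The priority-driven selection rule above can be illustrated with a small model. This is a sketch of the rule only, not of the Windows dispatcher; the thread names and priorities are invented, and the range boundaries follow the figure.

```python
def priority_range(priority):
    """Name the range a priority level falls into (boundaries as drawn in the figure)."""
    if not 0 <= priority <= 31:
        raise ValueError("Windows uses priority levels 0 through 31")
    return "real time" if priority >= 16 else "dynamic"

def pick_next(runnable):
    """Return the runnable thread with the highest priority.

    runnable maps a thread name to its current priority. Ties go to the
    first thread listed, which is a simplification made for this sketch.
    """
    return max(runnable, key=runnable.get)

# Invented threads for illustration.
runnable = {"worker": 8, "audio": 24, "ui": 10}
chosen = pick_next(runnable)
print(chosen, "runs next; its priority is in the", priority_range(runnable[chosen]), "range")
```

In this model a thread at priority 24 always wins over threads at 8 and 10, which is the preemptive behavior the slide describes.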
Context Switching
• A switch from one thread to another is known as a context switch
• Switching involves saving the hardware state of a thread and restoring the state of another
• When a thread is scheduled, that thread’s context switch count is also incremented
• The context switch count represents how often a thread begins running, not how long it ran
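The difference between how often a thread runs and how long it runs is easy to see with a made-up timeline. The trace below is invented for illustration; it is not output from any tool.

```python
from collections import Counter

# (thread, milliseconds it ran before being switched out); an invented trace.
timeline = [("A", 1), ("B", 20), ("A", 1), ("B", 20), ("A", 1)]

switch_count = Counter()  # how many times each thread was scheduled
run_time_ms = Counter()   # how long each thread actually ran

for thread, ran_ms in timeline:
    switch_count[thread] += 1
    run_time_ms[thread] += ran_ms

for thread in sorted(switch_count):
    print(f"{thread}: {switch_count[thread]} context switches, {run_time_ms[thread]} ms of CPU")
```

Thread A has the higher context switch count here even though thread B consumed far more CPU time, so the two measures should be read together.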
Time Accounting Quirks
• Looking at total CPU time for each process may not reveal where the system has spent its time
• CPU time accounting is driven by an interrupt timer which is set by the Hardware Abstraction Layer
• Usually at either 10 or 15 msec intervals
• Thread execution and context switches that happen between clock intervals are not accounted for, e.g. a thread runs and enters a wait before the clock fires
• Thus threads may run but never get charged
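A small simulation shows how a thread can run and never be charged. It is a simplified model that assumes the thread running at the instant the clock fires is charged for the whole interval; the timeline and thread names are invented.

```python
TICK_MS = 15  # one of the clock intervals mentioned above

# (thread, start_ms, end_ms); "short" always runs and waits between clock ticks.
segments = [
    ("short", 1, 5), ("long", 5, 16),
    ("short", 16, 20), ("long", 20, 31),
    ("short", 31, 35), ("long", 35, 46),
]

def actual_time(segments):
    """Sum the time each thread really spent running."""
    totals = {}
    for thread, start, end in segments:
        totals[thread] = totals.get(thread, 0) + (end - start)
    return totals

def tick_charged_time(segments, tick_ms, end_ms):
    """Charge a full interval to whichever thread is running when the clock fires."""
    charged = {thread: 0 for thread, _, _ in segments}
    for tick in range(tick_ms, end_ms + 1, tick_ms):
        for thread, start, end in segments:
            if start <= tick < end:
                charged[thread] += tick_ms
    return charged

print("actual :", actual_time(segments))
print("charged:", tick_charged_time(segments, TICK_MS, end_ms=46))
```

In this timeline the "short" thread does real work in every interval but is never running when the clock fires, so all of the time is charged to "long".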
Time Accounting Prior to Windows Vista
• Windows accounted for CPU time based on the interval clock timer
• Thread quantum expiration was not always fair
• A thread might get almost no turn
• Threads were also charged for interrupts that occurred while they were running
[Figure: timeline of clock intervals charged as Idle, Idle, T1, T2, T2]
Time Accounting Since Windows Vista
• Windows Vista and later reads the Time Stamp Counter during every context switch
• The actual CPU cycles consumed are charged to a thread
• Any interrupt time is not charged to the interrupted thread
• Allows for more accurate quantum accounting
• A thread gets at least one turn and at most will be given one turn plus an additional tick
[Figure: timeline of clock intervals charged as Idle, Idle, T1, T1, T2]
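Cycle-based accounting can be modeled as reading a counter at every context switch and charging the difference to the outgoing thread, minus any interrupt time. The class below is an illustrative sketch, not the kernel's implementation; its name, its methods, and the counter values are assumptions.

```python
class CycleAccounting:
    """Toy model of charging CPU cycles to threads at each context switch."""

    def __init__(self, first_thread, tsc):
        self.current = first_thread
        self.last_tsc = tsc          # counter value at the previous context switch
        self.interrupt_cycles = 0    # cycles spent in interrupts since that switch
        self.charged = {}            # thread name -> cycles charged

    def interrupt(self, cycles):
        # Interrupt time is tracked separately so it is not charged to the thread.
        self.interrupt_cycles += cycles

    def context_switch(self, next_thread, tsc):
        # Charge the outgoing thread for the cycles it actually consumed.
        elapsed = tsc - self.last_tsc
        consumed = elapsed - self.interrupt_cycles
        self.charged[self.current] = self.charged.get(self.current, 0) + consumed
        self.current = next_thread
        self.last_tsc = tsc
        self.interrupt_cycles = 0

acct = CycleAccounting("T1", tsc=1000)
acct.interrupt(200)               # an interrupt fires while T1 is running
acct.context_switch("T2", 5000)   # T1 ran for 4000 cycles, 200 of them in the interrupt
acct.context_switch("T1", 6500)   # T2 ran for 1500 cycles
print(acct.charged)
```

Because the charge is computed at the switch itself, a thread that runs briefly between clock ticks is still charged for exactly what it used.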
Understanding Memory Management
• Windows provides two system memory pools
• Nonpaged Pool and Paged Pool
• Used for system wide persistent data
• Prior to Windows Vista, pool sizes were a function of memory size and whether or not the system was configured as a server or a workstation
• Windows Vista introduced the concept of a dynamic system address space
Dynamic System Address Space
• In 32-bit Windows Vista and later, virtual memory is assigned as needed
• Permits larger paged, nonpaged, and session pools
• Components still cannot exceed 2 GB on 32-bit systems
• On 64-bit systems, address space regions are configured to their current maximum limits for all memory sizes
Additional Information
• Windows Internals, 5th edition
• Windows Performance Analysis Developer Center
• http://msdn.microsoft.com/performance
• Windows Server Performance Team Blog
• http://blogs.technet.com/winserverperformance
• Ask the Performance Team Blog
• http://blogs.technet.com/askperf
Additional Information
• David Solomon Expert Seminars offers training on Windows Internals both as public and private workshops and public webinars via the Internet
• Currently scheduled upcoming classes
• Public workshop in London, April 12th – April 16th
• Public webinar, April 26th & April 28th
• Public workshop in New York, May 3rd – May 7th
• Public workshop in San Francisco, November 8th – November 12th
• Visit http://www.solsem.com for further course descriptions and up-to-date information