aka.ms/downloadWPT

Get started with WPA

Introduction to App Analysis Capabilities

Big Picture Tab

Frame Analysis Tab

CPU Usage (Attributed)
Description: This summary table breaks down the CPU costs into several categories that are important for analysis.
Key Info: The Thread Activity Tag column aggregates costs into several defined categories. There are several preset views available that allow you to filter the data to specific types of threads. We will typically use the UI Thread preset views for our analysis.
Default Location: Frame Analysis tab

CPU Usage (Precise)
Description: This summary table shows overall CPU usage, based on context switch events.
Key Info: You can use this summary table to understand, at a high level, what processes are running on your system. You can also expand each Process to see its component threads.
Default Location: Big Picture tab

CPU Usage (Sampled)
Description: This graph provides a third view of CPU usage in your system, based on CPU samples.
Key Info: This summary table is especially useful for understanding what code is running at any given time. To do this, first load symbols (Trace -> Load Symbols). Then zoom into a region that you are interested in (CPU samples are typically collected every millisecond, so zoom in close) and expand the Stack column for your app.
Default Location: Big Picture tab

Disk Usage
Description: The Disk Usage graph shows disk activity on your system.
Key Info: For our analysis, we will typically use this in its graph view, to correlate disk activity with delays in our app.
There are numerous preset views that help you dig in to the different types of disk I/O and utilization in your system.
Default Location: Big Picture tab

DWM Frame Details
Description: This summary table presents information about the DWM (Desktop Window Manager).
Key Info: There are several default graph views that show different information related to the DWM frame rate (one way to measure smoothness):
DWM Frame E2E: plots a timeline of time spent per frame (from the start of the associated CPU work until the frame is flipped to the screen).
DWM Frame GPU: plots a timeline of GPU time spent per frame.
DWM Frame Rate: plots the actual frame rate, to easily see when an app glitches and drops from the ideal 60 FPS.
Default Location: Frame Analysis tab

Video - Frame Analysis Tab (part I)

File I/O
Description: The File I/O graph shows disk activity on your system with a per-file granularity.
Key Info: To populate this graph with data, you must enable the “File I/O activity” profile when you collect your trace.
Note: Only enable this profile if you require File I/O data. This is a very verbose provider and can affect the performance of your app while tracing.
This graph allows you to see exactly which files were requested by your app (see the File Name column) and what type of file operation it used.
Default Location: Graph Explorer -> Storage

Generic Events
Description: This summary table presents all events that were collected in your trace.
Key Info: We have provided several filters to make these events more useful and easy to understand:
Touch Events: shows marks for each type of touch event and highlights when they were generated with red lines.
msWriteProfilerMark: shows markers for each of the events that msWriteProfilerMark logs.
VSync-DWMFrame: marks the VSync events during screen updates with red lines.
Default Location: Trace Markers tab

HTML/XAML Frame Details
Description: This summary table presents information similar to the DWM Frame Details summary table, with a focus on the HTML/XAML platform.
Key Info: There are several default graph views that plot different aspects of the data:
HTML/XAML Frame Visuals Composition: plots a timeline of the work the DWM performed on this frame (from when the app handed it off until it was flipped to the screen).
HTML/XAML Frame E2E: plots a timeline of the total time spent per frame.
HTML/XAML Frame GPU: plots a timeline of GPU time spent per frame.
HTML/XAML Frame Work Breakdown: this summary table shows what UI elements contributed to work each frame, to help pinpoint optimization opportunities.
Default Location: Frame Analysis tab

Video - Frame Analysis Tab (part II)

Window in Focus
Description: The Window in Focus graph shows which process has the current UI focus or is in the foreground of your system.
Key Info: This graph breaks down focus by process and thread. Use it to locate your launch span, based on which thread of explorer.exe is in focus (explorer.exe owns the system splash screen, which is the start of your launch scenario).
Default Location: Big Picture tab

WinINet
Description: The WinINet summary table and graph show what network requests were made on your system.
Key Info: This graph helps to identify what network requests your app makes or is blocked by. If your app’s UI thread CPU usage dips, only to return after a network request completes, it was probably blocked by the network request.
Default Location: Graph Explorer -> Other

Verifying a Good Trace
Eliminate interference and other factors that affect the repeatability of your trace analysis.

Bad Trace #1: CPU Interference (Other Processes)
Problem: If other things were running on the system when you took your trace, they may be affecting your app’s performance. You may want to re-collect your trace if this happened.
Identify the Problem: Check the Big Picture tab - CPU Usage (Precise) summary table to see what processes are using CPU time. If other processes are taking CPU during your scenario, you should try to recapture the trace (there is little you can do from within your app).
Example: “Your app” is shown in red. Internet Explorer (orange) and a second app (green) are causing interference.
Solution: Capture a trace on a clean, quiet system, to reduce interference.
Note: explorer.exe, dwm.exe, RuntimeBroker.exe, and System may show up throughout your trace. These are typically OK.

Bad Trace #2: Disk I/O
Problem: Disk response times can vary greatly and can affect the repeatability of your trace capture.
Identify the Problem: To find time spent waiting for disk I/O, copy the Frame Analysis - CPU Usage (Attributed) table to the Big Picture tab and correlate it with the Big Picture - Disk Usage summary table. Find time when your app is not using 100% CPU of its UI thread. If these dips in CPU utilization occur during disk activity, you are probably waiting on disk I/O.
Solution: “Warm up” your app scenarios before capturing traces for analysis. We recommend launching your app and exercising the scenario to be analyzed at least once before capturing a trace.

Bad Trace #2: Disk I/O - Further Details
While the previous slide focused on reducing variability in your trace for the purposes of this workshop, disk I/O can also be indicative of performance issues.
Problem: The “real world” performance of your app can heavily depend on disk I/O, since your app will often be “cold”. Also, even after being “warmed,” your scenario may still be disk-bound if you require lots of resources stored on disk.
Solution: Reduce the amount of data you must read from disk to reach your responsive UI. The File I/O section has more details about using the File I/O graph to see which resources you use.

Identify Your Scenario in Your Trace

Launch Analysis
This section will cover:
1. How to find your app’s launch within the trace
2. How to identify if common problems in app launch are impacting your app
3. Best practices for resolving these common problems
All the issues that can affect page navigation can affect your launch. Go to the Page Navigation section for more analysis tips.
This section will not cover:
1. Analyzing animation frame rate problems (see the Animations section)
2. Analyzing panning frame rate and item realization problems (see the Panning section)

Locate your Launch Span in the Big Picture tab
Your splash screen is shown when the Window in Focus graph shows explorer.exe switching threads to the system splash screen (First Box).
You can consider launch complete when the DWM Frame Rate graph reaches a relatively steady idle state. This means you aren’t drawing new content to the screen anymore.
If you do not use an extended splash screen, this should be immediately after your system splash screen (the green line) ends.
If you do use an extended splash screen, the DWM should deliver a high frame rate while the extended splash screen is shown. When it is torn down, the frame rate will drop to idle (Second Box).

Investigation #1: CPU Interference (Background Threads)
Problem: While we recommend offloading work from your UI thread to background threads, this alone may not improve your app’s performance. Your UI thread may lose CPU time while waiting for background work to complete.
Identify the Problem: Check the Big Picture tab - CPU Usage (Precise) summary table to see what threads are using CPU time. Identify your UI thread using the CPU Usage (Attributed) summary table (see the Blue Boxes). If other threads of your app are taking significant CPU, you should investigate what work is running on background threads and whether it can be deferred.
Solution: Defer or de-prioritize work on background threads.

I-2: Network I/O
Problem: Network connections can vary in signal strength and speed. Your app’s launch to a responsive UI should not be blocked by network I/O.
Identify the Problem: To find time spent waiting for network I/O, correlate the Frame Analysis - Activity CPU summary table with the WinINet Details summary table. Find time when your app is not using 100% CPU of its UI thread. If these dips in CPU utilization end when a download ends, you are probably waiting for network I/O.
Solution: Design your app so that you can reach a responsive UI without network I/O.

I-5: Too Many Resources
Problem: There is a per-file loading cost for each resource file you load (CSS and JS files for HTML apps, XAML code for XAML apps).
Identify the Problem: Use the Frame Analysis - CPU Usage (Attributed) table to identify what your UI thread is working on. Time spent fetching code files appears in the summary table under:
[Root]/Trident/Parsing/<Pre or Post> (HTML apps)
[Root]/XamlUI/Parse (XAML apps)
If you enabled File I/O activity in your trace, you can use the File I/O summary table to find which files are fetched.
Note: .js files will not be called out in the File I/O table.
HTML apps: Most often these files are linked in your start page <head> tag.
Solution:
HTML apps: When packaging the app, consolidate your JS and CSS into as few files as possible.
XAML apps: Reduce the number of templates you use.
Though fewer files are better, be careful not to over-consolidate. For more details, see this MSDN page.
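
To illustrate the I-2 advice above (reach a responsive UI without network I/O), here is a minimal sketch of a "render from cache first, refresh later" launch pattern. The function name launchWithCachedContent and its three callback parameters are hypothetical placeholders that an app would supply; they are not part of any platform API.

```javascript
// Hypothetical helper (not a platform API): show cached content at launch,
// then refresh from the network without blocking the first paint.
// readCache, fetchLatest, and render are placeholders supplied by the app.
function launchWithCachedContent(readCache, fetchLatest, render) {
    // Step 1: render immediately from local data so launch never waits on the network.
    render(readCache(), "cache");

    // Step 2: start the network request and update the UI when it completes.
    return fetchLatest().then(
        function (fresh) {
            render(fresh, "network");
            return fresh;
        },
        function () {
            // The request failed: the cached UI simply stays on screen.
            return null;
        }
    );
}
```

With this shape, the UI thread does its launch work up front, and a slow or failed download only delays the refresh rather than the first responsive frame.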
I-6: Too Much Code
Problem: It is easy to include all of your app code upfront at launch, but this will increase the time needed to launch your app.
Identify the Problem: This time will appear in the Frame Analysis - CPU Usage (Attributed) summary table, in the following categories:
HTML code:
CSS: [Root]/Trident/Parsing/CSS
JavaScript: [Root]/JScript/
XAML code:
[Root]/XamlUI/Parse
If there is a large amount of time in these categories, you should consider reducing the amount of code you include at launch.
Solution: Only include the code that is needed for the launch scenario in your start page. Defer loading everything else. For more details, see this MSDN page.

I-7: Non-Packaged JavaScript
Problem: If your app heavily uses web content on responsiveness-critical paths, a large portion of time could be going towards script parsing and bytecode generation.
Identify the Problem: This time will appear in the Frame Analysis - CPU Usage (Attributed) summary table, in the following categories:
[Root]/Jscript/ParseSource
[Root]/Jscript/ByteCodeGen
If you encounter large amounts of CPU in these two categories, examine whether you are leveraging bytecode caching and strongly consider design changes.
Solution: Redesign or refactor your app so that a majority of your script can be in-package. Follow the best practices at http://msdn.microsoft.com/en-us/library/windows/apps/hh849088.aspx

I-13: Animations for Hidden UI
Problem: Animating items that are offscreen or covered up by a splash screen can waste valuable CPU time.
Identify the Problem: Use the Generic Events summary table to determine if you have any animations. Pivot by the “Provider Name” and then “Task Name” columns.
For HTML apps, expand the “Microsoft-IE” provider and look for “Mshtml_Animations_Animating” and “Mshtml_Animations_Transitioning” tasks.
For XAML apps, expand the “Microsoft-XAML” provider and look for “Animation” and “[Begin/End/Stop]Storyboard” tasks.
If you have animations, there will be events under these tasks.
Solution: Turn off animations while under a splash screen and for all hidden UI elements.

Page Navigation Analysis
The app launch scenario contains a page navigation. Many of the investigations for page navigation can also help your launch.

Locate your Page Navigation Span in the Frame Analysis tab
Add the Trace Markers - Touch Events summary table to your Frame Analysis tab.
In general, your page navigation will start with the user’s touch event (First Box).
The page navigation is probably done (in a responsive state) when your DWM Frame Rate reaches a steady state (Second Box).

I-8: Expensive Layout Work
Problem: If you have a complex UI, the process of laying out all of its elements can take a long time.
Identify the Problem: Layouts can be identified in the Frame Analysis - CPU Usage (Attributed) table by these thread activity tags:
[Root]/Trident/Layout (HTML app)
[Root]/XamlUI/Frame/Layout (XAML app)
[Root]/XamlUI/Frame/Arrange (XAML app)
Expensive layouts can result from a large UI (too many elements) or a complex one (expensive types of elements).
Solution: Check your UI. Try to reduce the number of elements you use and avoid expensive elements and patterns (such as nesting Flexboxes inside each other).

I-9: Expensive Format Work
Problem: Like your layout work, formatting work is a direct result of the styles and formatting applied to your DOM elements.
Identify the Problem: Formats can be identified in the Frame Analysis - CPU Usage (Attributed) table by the thread activity tag:
[Root]/Trident/Format
Expensive formats can result from a large set of styles (too many CSS rules) or other bad patterns (expensive types of rules and selectors).
Solution: Reduce the number of your rules and avoid bad patterns, such as using “*” in your CSS selectors.
You can use the HTML Frames summary table to examine which DOM elements are being formatted (check the DispNodeDesc column to see HTML tags, classes, and IDs).

I-10: Unnecessary Code Execution
Problem: Your app may be executing more code than is necessary to reach a responsive state.
Identify the Problem: This time will appear in the Frame Analysis - CPU Usage (Attributed) summary table, in the following thread activity tags:
[Root]/JScript/OM (HTML app)
XAML UI (XAML app)
If you are spending a long time executing code, you should examine what work is on this critical path.
Solution: Defer work that is not necessary to reach a responsive state, or schedule it at low priority. You can see what code is executing by digging in to the stacks in the CPU Usage (Sampled) summary table.

I-11: Inline Format/Layout
Problem: Your script can force the app platform to format and lay out your DOM if you call certain functions.
Identify the Problem: Inline layouts can be identified in the Frame Analysis - CPU Usage (Attributed) table by looking for patterns in these categories:
[Root]/Trident/Formatting
[Root]/Trident/Layout
[Root]/JScript/OM
An inline layout will start in JScript/OM, call in to Trident/Format and/or Layout, then return to JScript/OM.
Solution: Avoid querying DOM layout properties from your script, such as document.getElementsByTagName("div")[0].offsetHeight. For more details, see this MSDN page.

I-12: Excessive WinRT Calls
Problem: Inefficient use of, or repeated calls to, expensive WinRT APIs can negatively impact performance.
Identify the Problem: For WinRT costs, copy the Frame Analysis - CPU Usage (Attributed) summary table to the Big Picture tab. Compare the two CPU Usage graphs:
Look for time when your app is not using 100% CPU of its UI thread in Attributed.
If RuntimeBroker.exe is using the CPU during these dips, you are probably making WinRT calls (look in Precise).
Solution: Examine your code and try to improve the way you make WinRT calls. For example, if your app is making repeated calls to the same function, consider calling it once and caching the result.

Panning Analysis
Touch manipulation covers two aspects: being Fast (quickly respond to input and render content) and Fluid (smooth animations).
This section will cover:
1. How to find your panning scenario within the trace
2. How to identify if common problems in app panning are impacting your app
3. Best practices for resolving these common problems
All the issues that can affect animation smoothness can affect your panning smoothness. Go to the Smooth Animations & Glitch-Free Panning section for more analysis tips.
This section will not cover analysis of Launch or Page Navigation scenarios (see the previous sections).

Locate your Panning Span in the Trace Markers tab
Your panning span will start with the user’s touch event, such as a flick, to pan through a list (First Box).
The scenario is complete when your DWM Frame Rate reaches a steady, idle state (Second Box).

I-15.1: Always Display List Items
Problem: While panning through lists of content, it is important that a user always know where they are in the list.
Identify the Problem: If you notice blank spots as you pan through your list, this means you are not rendering your items quickly enough.
The Frame Analysis - XAML/HTML Frame Details summary tables can show what visuals contributed to the cost of each frame.
You can be CPU or GPU bound. Use the CPU Usage (Attributed) summary table to see what type of work is taking the most time.
Remember: if you aren’t keeping up, your DWM frame rate will be high (because it doesn’t have to do work on blank items!)
Solution: Check if your scenario is CPU or GPU bound. If so, reduce the complexity of your list items. If the CPU breakdown shows high cost in script or app code, you should simplify your item template.

I-15.2: Always Display List Items
Identify Problem UI Elements: The Frame Analysis - HTML Frame Details summary table can also help identify which DOM elements caused format and layout costs.
If your Frame Analysis - CPU Usage (Attributed) summary table shows significant Formatting or Layout costs, open the HTML Frame Work Breakdown view of the HTML Frame Details summary table.
Dig in to your expensive frames to understand what elements contributed to that frame’s work.
The Work Stack column breaks down the work per frame into Layout and Format costs and shows information such as HTML tag and CSS class names.
Solution: Identify the expensive pieces of your UI (those with the highest TaskExclusive cost) and try to simplify those parts of your UI.

I-16: Panning through Complex Items
HTML apps can use an item renderer function to specify different stages of content that will help to quickly fill in a list of data during panning:
Stage 1: Placeholder. This stage presents an empty placeholder and should only be used if you do not have any data for the item.
Stage 2: Placeholder with Data. This stage fills in basic, meaningful data for the item, so that a user knows their location in the list.
Stage 3: Full Content. This stage is run after all Stage 2’s are complete and will fill in all remaining (expensive) data, such as images.
XAML apps do not have a multi-stage rendering function.
Solution: Apply these principles if you see lots of blank items while panning:
Design your item renderer to have a very fast Stage 2 placeholder step.
If you cannot render items quickly enough, consider reducing the complexity of your item template.
Panning through Complex Items (Sample)
Use this sample code to easily set up your multi-stage item renderer.

    function itemRenderer(itemPromise, recycled) {
        // STAGE 1
        // When this is called, we can immediately prepare a generic placeholder
        var div = document.createElement("div");
        div.innerText = "loading...";
        // END STAGE 1

        return {
            element: div,
            renderComplete: itemPromise.then(function (item) {
                // STAGE 2
                // Note: Stage 2 may run inline immediately following stage 1 if the data is already available.
                // Lightweight placeholder with basic information
                div.innerText = item.data.title;
                // END STAGE 2

                // Waiting for item.ready to do heavy work
                return item.ready.then(function () {
                    // STAGE 3
                    // More expensive work should be done in stage 3 such as loading images.
                    var img = document.createElement('img');
                    img.src = item.data.imgurl;
                    div.appendChild(img);
                    // END STAGE 3
                });
            })
        };
    }

Smooth Animations & Glitch-Free Panning

I-4.1: Smoothness & Glitches (Identify)
Problem: It is important for both animations and touch manipulation (such as panning) to be smooth and free of glitches or jerkiness. A smooth experience is measured as delivering a consistent 60 frames per second (FPS) during animations or touch manipulation.
Identify the Problem: The Frame Analysis - DWM Frame Details summary table shows your FPS. If it is not consistently 60 FPS, identify why:
Duplicate the DWM Frame Details table and switch to the DWM Frame E2E table view.
To achieve 60 FPS, all frames (rows) should have a SinceLastFlip time of about 16 ms. The frames that don’t probably glitched.
You can expand each row to see what visuals/layers contributed to its cost.
Causes and solutions are explained in I-4.2.

I-4.2: Smoothness & Glitches (Fix)
Glitches can be caused by a lack of two system resources: CPU and/or GPU.
The Frame Analysis - DWM Frame Details summary table (DWM Frame E2E view) helps identify which situation your app falls in to:
High CpuEndDelta time = CPU bound
High GpuDuration time = GPU bound
You can also see what visuals (or layers) make up your UI by expanding a given row.
Solution:
If your scenario is CPU bound, reduce the complexity of your scene and the number of UI elements you use. Too many layers will result in high CPU cost.
If your scenario is GPU bound, reduce the amount of overdraw (overlapping elements). Lots of overlap between layers will result in high GPU cost.
For more info, see Cenk Ergan’s Performance Centric Framework Overview (Main slide deck, slide #19).

I-4.3: Smoothness & Glitches (HTML apps)
HTML apps can make use of special instrumentation to get more context into their DWM frames.
The Frame Analysis - HTML Frame Details summary table shows information similar to the DWM Frame Details summary table, but adds the tag, class, and ID names of the DOM elements in each visual/layer.
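
The frame-budget reasoning in I-4.1 and I-4.2 can be sketched in a few lines of script. At 60 FPS each frame has 1000 / 60 ≈ 16.7 ms, which is where the "about 16 ms" SinceLastFlip target comes from. The sketch below assumes you have copied SinceLastFlip, CpuEndDelta, and GpuDuration values out of the DWM Frame E2E view as milliseconds. The field names, the 1.5x glitch tolerance, and the choice of "longer than one frame budget" as the meaning of "high" are illustrative assumptions, not rules defined by WPA.

```javascript
// Frame budget at 60 FPS: 1000 ms / 60 is roughly 16.67 ms per frame.
var TARGET_FPS = 60;
var FRAME_BUDGET_MS = 1000 / TARGET_FPS;

// Assumption: treat a frame as glitched when it took noticeably longer than
// one budget. The 1.5x tolerance is an illustrative choice, not a WPA rule.
var GLITCH_TOLERANCE = 1.5;

// frame is a hypothetical object holding values (in ms) copied from the
// SinceLastFlip, CpuEndDelta, and GpuDuration columns.
function classifyFrame(frame) {
    if (frame.sinceLastFlipMs <= FRAME_BUDGET_MS * GLITCH_TOLERANCE) {
        return "ok";
    }
    // Assumption: "high" means longer than a single frame budget.
    var cpuHigh = frame.cpuEndDeltaMs > FRAME_BUDGET_MS;
    var gpuHigh = frame.gpuDurationMs > FRAME_BUDGET_MS;
    if (cpuHigh && gpuHigh) { return "glitch: CPU and GPU bound"; }
    if (cpuHigh) { return "glitch: CPU bound"; }
    if (gpuHigh) { return "glitch: GPU bound"; }
    return "glitch: cause unclear";
}
```

A frame classified as CPU bound points toward reducing scene complexity and element count, while a GPU-bound frame points toward reducing overdraw, as described in I-4.2.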