Virtual Machine Manager (VMM) for System Center 2012
General Troubleshooting Guide

Updated: September 8, 2011
Released: 1

This document is provided "as-is". Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. Some examples depicted herein are provided for illustration only and are fictitious. No real association or connection is intended or should be inferred. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes. You may modify this document for your internal, reference purposes.

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Active Directory, Bing, Excel, Hyper-V, Internet Explorer, Silverlight, SQL Server, Windows, Windows Intune, Windows PowerShell, Windows Server, and Windows Vista are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Contents

General VMM Troubleshooting
Description
How to properly scope the issue
Scoping Questions
Features Involved
Underlying Technologies
VMM Architecture Overview
Introduction
General Troubleshooting Strategy
VM Manager Log
Windows Event Logs
VMM Installation Troubleshooting
Installation Logs
VMM Server Setup Logging
VMM Agent Installation Logging
Windows Installer (MSI) Logging
Introduction
Viewing VMM Installation Logs
Troubleshooting Tools
Introduction
Data Collection Tools
Obtaining the legacy MPS Reporting utilities
Trace Collection
DebugView for Windows
Introduction
Obtaining DebugView for Windows
Before Collecting a DebugView Trace
Collecting a DebugView Trace
Trace32.exe
Introduction
Obtaining the Trace32.exe
TextAnalysisTool.Net
Introduction
SCVMM2012_MPSReports
Introduction
Installation
VMM Trace Collector – Managing and configuring tracing
Add your machines to the Trace collector tool
Start and stop traces
Save your configuration for later
VMM Trace Collector – Collecting traces
VMM Trace Collector – Collecting MPS reports data
Diagnostic Data location
Filtering Trace Files
Viewing Trace Files
SQL Server Management Studio Express
Introduction
Obtaining SQL Server Management Studio Express
Advanced Troubleshooting
Interpreting Job Failures
Introduction
Where to begin
Tools to Use
Interpreting Job Failures
Analyzing the Trace
Introduction
WinRM Troubleshooting
Test local WinRM functionality
Test remote WinRM functionality
WMI Troubleshooting
WMI Troubleshooting Tools
Windows Management Instrumentation Tester (wbemtest.exe)
WMI Service Control Utility (winmgmt.msc)
WMIC command line
WMI Error Constants
BITS Troubleshooting
Introduction
Verify BITS outside of VMM
BITS Compact Server
BITS Traces
Useful KB Articles and Blogs

General VMM Troubleshooting

Description

In a high-level overview, troubleshooting Virtual Machine Manager (VMM) for System Center 2012 can be divided into the following categories:

1. How to properly scope the issue.
2. What are the VMM features involved?
3. What are the underlying technologies that VMM relies upon?

How to properly scope the issue

Before work can begin on troubleshooting any VMM issue, you must properly identify or scope the issue. This involves asking the right questions, listening and sometimes interpreting the answer, as well as collecting some basic information from the customer.

Scoping Questions

Scoping questions are questions designed to help determine the actual issue.
Note: This list is targeted at common questions that should be asked in all VMM cases. Additional scoping is required once a specific problem area has been identified.

- What is the installed version of VMM, including build?
- Is this a clean install of VMM or an update from a previous version or beta?
- What is the exact text of the error message?
- When does the error occur? During what VMM operation?
- What is the installed version of SQL?
- Is SQL local or remote?

Scope the issue precisely and thoroughly

Before deciding on what will be addressed as the issue, make sure you have asked yourself questions such as these:

- Is the issue reproducible?
- When exactly did it begin, and what was changed prior to that time?
- What exactly does not work? This means there is an expected outcome and something else is instead occurring (or not).
- What constitutes a working scenario? Performing as it once did? Performing as expected for the first time?
- Which features are affected? Which features are not affected?
- Of the items that are not working as expected, what other technologies are dependent upon or related? This is your starting point.

Develop a plan

Test, using various methods, the items that have been identified as not working as expected. Until a conclusion or consensus is reached regarding a feature, do not move on to other seemingly related issues. Stay on task. Once a technology has been confirmed as working correctly and ruled out as a cause, move forward, not back. If it is decided that a feature is working, shift focus to the next relevant technology.

Research relevant errors

Begin with general errors and carefully review all messages reported by the product. Error messages have improved significantly and often simply state the cause of the problem and how to fix it. If there are very specific error codes (0x8XXXXXXX), search for these.
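
When a log or job report is long, it can help to pull out every 0x8XXXXXXX-style code before searching for it. The following is a minimal sketch in Python, not part of the VMM tooling; the function name and the sample log lines are made up for illustration, and it assumes the codes appear in the text in plain `0x` notation.

```python
import re
from collections import Counter

# Matches eight-hex-digit codes whose first digit is 8, for example 0x80070005.
ERROR_CODE_PATTERN = re.compile(r"\b0x8[0-9A-Fa-f]{7}\b")


def find_error_codes(text: str) -> Counter:
    """Count each 0x8XXXXXXX-style code found in the text (case-insensitive)."""
    return Counter(code.lower() for code in ERROR_CODE_PATTERN.findall(text))


# Hypothetical log excerpt, invented for this example only.
sample = (
    "Job failed: Error (2912) An internal error has occurred (0x80072EE2)\n"
    "Retry failed with 0x80072ee2; access check returned 0x80070005\n"
)

for code, count in find_error_codes(sample).most_common():
    print(code, count)
```

Codes that recur most often in the excerpt are listed first, which gives a reasonable order in which to search for them.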
General four-digit errors are useful and can be referenced for general information in the error codes appendix, or online at TechNet, but eight-digit errors are specific to the problem at hand. Use these at every opportunity.

Resource List

- Knowledge Base (KB): http://support.microsoft.com/kb/242450
- System Center Content Search Gadget: http://gallery.live.com/LiveItemDetail.aspx?li=49e26ad0-113d-4f3d-a711-57f6530c75d9
- TechNet - System Center Virtual Machine Manager 2008:
  http://technet.microsoft.com/en-us/library/cc917964.aspx (Direct)
  http://technet.com/scvmm (Select ‘Library’ tab at top)
- Official Microsoft Blogs: http://blogs.technet.com/<Name>, where <Name> is one of scvmm, m2, hectorl, virtualization, jonjor, mbriggs

Features Involved

The following features are supported in VMM for System Center 2012.

Underlying Technologies

The following list contains core underlying technologies that VMM depends upon in order to work correctly.

- WS Management (WinRM)
- WMI
- BITS
- DCOM
- WCF

VMM Architecture Overview

Introduction

VMM is modular in design with the following three layers, joined by a common, well-documented interface. Before troubleshooting VMM, it is helpful to understand the underlying architecture to determine where to begin. Each of the core underlying operating system technologies will be discussed in more detail later in this document.

General Troubleshooting Strategy

At this point, the issue has been properly scoped and a general understanding of the architecture has revealed underlying core technologies that might not be working correctly. The next steps are to use the available tools outlined below to analyze all involved computers, and to collect data for analysis.

VM Manager Log

One of the first places to look for errors or warnings is the VM Manager log. The VM Manager log can be found on the VMM Management Server in the Diagnostics node in Server Manager.

1. Open Server Manager and expand the Diagnostics node.
2. Select Event Viewer, and then expand Applications and Services Logs.
3. Find the VM Manager log underneath.

Scan this log for any Error messages, and correlate them with the date and time at which the problem occurred.

Windows Event Logs

It is also a good idea to examine the Windows event logs for corresponding errors or failures. As is often the case, an underlying problem with the operating system or one of its core services will surface as a problem in VMM.

For example: The WMI service has failed to start properly or was set to disabled. WMI is one of the core services that VMM relies upon for communication with the vmmagent. If the WMI service is not working properly, many operations in VMM will fail as a direct result. This can easily be determined by looking in the System or Application log for corresponding failures or errors that indicate that the service is not running.

VMM Installation Troubleshooting

This section discusses VMM feature installation-related troubleshooting. The following topics are discussed in this section:

- Installation Logs
  - VMM Server Setup Logging
  - VMM Agent Installation Logging
- Windows Installer (MSI) Logging
- Viewing VMM Installation Logs
- Known Issues – Setup and Upgrade

Installation Logs

VMM logs installation information for the main VMM Server as well as agent installation.

VMM Server Setup Logging

Separate reports exist for the installation of each of the primary features of VMM.
Installation logs are written, by default, to the following hidden folder on Windows Server 2008:

    C:\ProgramData\VMMLogs

VMM Agent Installation Logging

Local agent installation information is logged in the following hidden folders:

Windows Server 2003: C:\Documents and Settings\All Users\Application Data\VMMLogs

- AgentSetup.log
- vmmAgent.msi_<m-d-yyy_hh-mm-dss>.log
- vmmmsxml6setup_<m-d-yyy_hh-mm-dss>.log
- vmmvssetup_<m-d-yyy_hh-mm-dss>.log

Windows Server 2008: C:\ProgramData\VMMLogs

- vmmAgent.msi_<m-d-yyy_hh-mm-dss>.log

Important: If installation logging does not provide enough information to determine the cause of a failure, tracing may be enabled prior to starting the installation, using the VMM MPS Reports tool.

Windows Installer (MSI) Logging

Introduction

VMM features that are provided as Windows Installer packages (.msi files), including agent installation packages, are installed using the Windows Installer service (MSI). When installing an MSI package, such as installing the VMM agent manually, you can enable logging using the following command:

    msiexec /I <MSIPackageName.msi> /L*V <path\logfilename>.log

Host Agent install example:

    msiexec /I "C:\Program Files\Microsoft System Center 2012\Virtual Machine Manager\agents\amd64\2.0.5007.0\vmmAgent.msi" /L*V c:\temp\vmmagent.log

Viewing VMM Installation Logs

When viewing the installation logs for the main VMM features, begin by viewing the log from the bottom, which lists the most recent activity.

More Information: When viewing logs, it is common to see errors reported as "Carmine" errors. "Carmine" was the code name of VMM during its development process.

In most cases, the primary error text will be at the bottom of the log and will be indented from the rest of the lines as shown in the following figure.
Figure 1: Sample vmmServer.log file

As with any log-based troubleshooting, having a good understanding of what a successful installation log contains is valuable when trying to identify an issue versus expected exceptions. Fully understanding the problem scenario, then successfully reproducing the same steps to obtain a known-good log is often needed so a valid working versus non-working log analysis may be performed.

Troubleshooting Tools

Introduction

Many tools can be used to troubleshoot problems that occur in VMM. Some can be classified as data collection tools, some help with capturing traces, some convert trace files from ETL format to LOG. In the sections that follow, you'll learn how to collect and analyze data surrounding a problem with VMM.

Data Collection Tools

The following tools are used to collect data surrounding a VMM problem.

Obtaining the legacy MPS Reporting utilities

The legacy Microsoft Product Support's Reporting Tools are available at the following location:

http://www.microsoft.com/downloads/details.aspx?FamilyID=cebf3c7c-7ca5-408f-88b7-f9c79b7306c0&DisplayLang=en

Trace Collection

DebugView for Windows

Introduction

DebugView for Windows is an application that lets you monitor debug output on your local system, or any computer on the network that you can reach via TCP/IP. It is capable of displaying both kernel-mode and Win32 debug output, so you don't need a debugger to catch the debug output your applications or device drivers generate, nor do you need to modify your applications or drivers to use non-standard debug output APIs.

KB 970066: How to collect traces in System Center Virtual Machine Manager
http://support.microsoft.com/default.aspx?scid=kb;EN-US;970066

Important: DebugView is generally only required when it is not possible to obtain an ETL trace through the MPSReport tool. DebugView does, however, provide information about technologies that are not directly related to VMM but that may have an effect on VMM behavior.
Note: ‘SCTrace.cmd’ is a script that automates the trace collection process by performing the steps below. This utility may save time and make the trace capturing process more user friendly for customers. The utility can be found at the following location:

http://blogs.technet.com/jonjor/archive/2009/04/07/scvmm-tracing-made-easy.aspx

Obtaining DebugView for Windows

DebugView for Windows can be downloaded from the Windows Sysinternals site at the following link:

http://technet.microsoft.com/en-us/sysinternals/bb896647.aspx

Before Collecting a DebugView Trace

1. Install DebugView on the VMM server, the host in question, and/or the Web server (for troubleshooting self-service portal issues).

2. Save the following code into a text file and name it "odsflags.cmd":

    @echo off
    echo ODS control flags - only trace with set flags will go to ODS
    rem Show usage when no argument or a help switch is given.
    if (%1)==() goto :HELP
    if (%1)==(-?) goto :HELP
    if (%1)==(/?) goto :HELP
    echo Setting flag to %1...
    rem Write the flag value to the ODSFLAGS registry value.
    reg ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine" /v ODSFLAGS /t REG_DWORD /d %1 /f
    echo Done.
    goto :EXIT
    :HELP
    echo Usage: odsflags [flag], where flag is
    echo TRACE_ERROR = 0x2,
    echo TRACE_DBG_NORMAL = 0x4,
    echo TRACE_DBG_VERBOSE = 0x8,
    echo TRACE_PERF = 0x10,
    echo TRACE_TEST_INFO = 0x20,
    echo TRACE_TEST_WARNING = 0x40,
    echo TRACE_TEST_ERROR = 0x80,
    :EXIT

3. Save the following code into a text file and name it "odson.reg":

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine]
    "ODS"=dword:00000001

4. Save the following code into a text file and name it "odsoff.reg":

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Tracing\Microsoft\Carmine]
    "ODS"=dword:00000000

5. Copy the three files created above, odsflags.cmd, odson.reg, and odsoff.reg, to the machines that DebugView will be used on.

6. Run the following commands at an elevated Command Prompt:

    odson.reg
    odsflags.cmd 255

   (If you need to collect traces for both VMM Server and the host or the Web server, make sure to run these commands on all of these computers.)

Collecting a DebugView Trace

1. Open DebugView as an administrator and ensure that both Capture Win32 and Capture Global Win32 are selected on the Capture menu item. You should be able to see tracing from the VMM features showing up in DebugView. (If you need to collect traces for both VMM management server and the host, make sure to do these steps on all of these computers.)

   Figure 2: DebugView.exe Capture menu

2. Restart the Virtual Machine Manager Service on the VMM management server by running the following commands at an elevated command prompt:

    net stop vmmservice
    net start vmmservice

3. Restart the Virtual Disk Service by running the following commands at an elevated command prompt:

    net stop vds
    net start vds

4. Restart the Virtual Machine Manager Agent service on the host by running the following commands at an elevated command prompt:

    net stop vmmagent
    net start vmmagent

5. Restart the IIS service on the Web server by running the following command at an elevated command prompt:

    iisreset

6. Reproduce the issue.

7. After reproducing the issue, save the output from DebugView to a text file for analysis.

Important: Turn off tracing after collecting the data by running the following command at a Command Prompt:

    odsoff.reg

The format of a DebugView trace differs from that of an ETL trace since DebugView captures all Win32 application activity and, if selected, Kernel mode activity. If saved as a text file, it may be opened in other tools such as TextAnalysisTool.net or Trace32.

Trace32.exe

Introduction

Trace32.exe, an executable found in System Center Configuration Manager 2007, can quickly open very large trace files and will automatically highlight lines with apparent errors.
This lets you locate errors visually. It also has built-in tools for filtering based on various parameters, as indicated in the figure below.

Obtaining the Trace32.exe

Trace32.exe can be found at the location below, or with Configuration Manager 2007 media:

http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=5a47b972-95d2-46b1-ab14-5d0cbce54eb8

Figure 3: Trace32 with error lines highlighted

Figure 4: Filter settings in Trace32

TextAnalysisTool.Net

Introduction

TextAnalysisTool.NET was previously a Microsoft internal tool, but the author has also made it available on his blog:

http://blogs.msdn.com/delay/archive/2007/06/21/powerful-log-file-analysis-for-everyone-releasing-textanalysistool-net.aspx

One benefit of this utility is the speed with which it can open very large files, similar to Trace32.exe. What makes this tool of greater value are the filters, markers, and quick search it provides. The author's description of the tool's features, and a screenshot, follow.

Filters: Before displaying the lines of a file, TextAnalysisTool.NET passes the lines of that file through a set of user-defined filters, dimming or hiding all lines that do not satisfy any of the filters. Filters can select only the lines that contain a sub-string, those that have been marked with a particular marker type, or those that match a regular expression. A color can be associated with each filter so lines matching a particular filter stand out and so lines matching different filters can be easily distinguished. In addition to the normal "including" filters that isolate lines of text you DO want to see, there are also "excluding" filters that can be used to suppress lines you do NOT want to see. Excluding filters are configured just like including filters but are processed afterward and remove all matching lines from the set. Excluding filters allow you to easily refine your search even further.
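
The including/excluding behavior described above can be approximated in a few lines of Python. This is only a rough sketch of the idea under stated assumptions, not how TextAnalysisTool.NET is implemented: the function name `apply_filters` and the sample trace lines are hypothetical, filters are plain regular expressions, and non-matching lines are hidden rather than dimmed.

```python
import re
from typing import Iterable, List, Pattern


def apply_filters(
    lines: Iterable[str],
    including: List[Pattern[str]],
    excluding: List[Pattern[str]],
) -> List[str]:
    """Keep lines that satisfy any including filter, then drop excluded ones."""
    kept = []
    for line in lines:
        # With no including filters defined, every line is a candidate.
        if including and not any(p.search(line) for p in including):
            continue
        # Excluding filters are processed afterward and remove matching lines.
        if any(p.search(line) for p in excluding):
            continue
        kept.append(line)
    return kept


# Hypothetical trace lines, invented for this example only.
trace_lines = [
    "10:00 Error: WinRM timeout",
    "10:01 Info: heartbeat",
    "10:02 Error: heartbeat missed",
    "10:03 Warning: retry",
]

visible = apply_filters(
    trace_lines,
    including=[re.compile("Error"), re.compile("Warning")],
    excluding=[re.compile("heartbeat")],
)
for line in visible:
    print(line)
```

Processing the excluding filters after the including ones mirrors the order given in the description, so an excluded line is removed even when it also matches an including filter.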
Markers: Markers are another way that TextAnalysisTool.NET makes it easy to navigate a file; you can mark any line with one or more of eight different marker types. Once lines have been marked, you can quickly navigate between similarly marked lines - or add a "marked by" filter to view only those lines.

Find: TextAnalysisTool.NET also provides a flexible "find" function that allows you to search for text anywhere within a file. This text can be a literal string or a regular expression, so it's easy to find a specific line. If you decide to turn a find string into a filter, the history feature of both dialogs makes it easy.

Figure 5: TextAnalysisTool.NET

SCVMM2012_MPSReports

Introduction

The VMM tracing tools were developed to provide a utility to manage, collect, and view various traces and diagnostic information in a VMM environment. They are made up of the VMM Trace Collector utility (mpsrpt_setup.exe) and the VMM Trace Viewer utility (TraceViewer.exe). The Trace Collector utility is used for managing and collecting traces, while the Trace Viewer utility is used for viewing VMM traces.

You can download the tool from here:

https://connect.microsoft.com/site799/Downloads

You must register with the VMM for System Center 2012 Release Candidate program on Microsoft Connect to download the tool.

Installation

You can install the VMM tracing tools by running the MPSRpt_Setup.exe file, which is an out-of-band (OOB) release. The installation will prompt for a destination path to save the files.

Tip: If you have a problem installing the application directly, try installing it from an elevated command prompt, using the following command, and check the log file for errors:

    MPSRpt_Setup.EXE /Q /T:c:\mpsrpt /C

Note: In the above example, "c:\mpsrpt" is an example location. The location field is user defined and may be any location on the VMM Management server.
SCVMM2012_MPSrpt has a dependency on the Sysinternals tool PsExec, which is a lightweight telnet replacement that lets you execute processes on other systems, complete with full interactivity for console applications, without having to manually install client software.

Important: PsExec must be located in the same folder as SCVMM2012_MPSReports (c:\mpsrpt in the above example).

PsExec, which is part of PsTools, may be downloaded at the following link:

http://technet.microsoft.com/en-us/sysinternals/bb897553.aspx

VMM Trace Collector – Managing and configuring tracing

You can manage your traces through the VMM Trace Collector tool, mpsrpt_setup.exe. To manage your traces, perform the following steps.

Add your machines to the Trace collector tool

Add your VMM server, hosts, library servers, clients, and P2V machines to the tool using the “Add Machine” button and typing the machine name into the new node. Manage and organize your machines into groups using the “Add Group” button, and typing the name of your new group into the new folder. Add machines and subgroups to your groups by selecting the group in the tree, then clicking the “Add Machine” or “Add Group” button.

Start and stop traces

Once you have your machines added and organized into groups, you can start and stop tracing on those machines by right-clicking either the groups or individual machines and selecting an option for which traces to enable. You can disable traces in the same way.

Save your configuration for later

You can save your configuration and load a saved configuration using the load and save buttons on the main toolbar. When the tool loads, it loads the DefaultMachineConfig.mconfig configuration file. Replace this file if you would like a new default.

VMM Trace Collector – Collecting traces

To collect traces on one or multiple machines, select the machine or group in the tree, right-click, and select an option under the Collect Trace context menu.
Select the “Advanced…” option to specify the directory, session name, date ranges, and other options.

VMM Trace Collector – Collecting MPS reports data

To collect additional diagnostic information that was originally in the MPSRPT tool, select the machine or group and click the “Generate Reports” button on the main toolbar. This brings up the Reports UI, where you can specify the diagnostic information you wish to collect. Select the diagnostic information you want to collect and click the generate button. The report collector uses the MPSRPT engine to collect these reports, so the same permissions must be applied to these machines for this functionality to work properly.

Important: The time required to run the report and collect data will vary depending on the reports selected and the number of machines involved.

Diagnostic Data location

By default, diagnostic data is collected to C:\MpsRpt\CollectedTraces\<trace session name>, but this can be configured in the advanced options dialog.

Filtering Trace Files

The trace tool has a new feature to summarize (filter) large log files based on messages, dates, and feature.

Viewing Trace Files

The trace (.ETL) logs created by the SCVMM2012_MPSReports tool are not human-readable and must be loaded into an application that is capable of interpreting them. The preferred tool for interpreting the trace logs is TraceViewer. TraceViewer (traceviewer.exe) has the following two primary functions:

• Converts the default binary .ETL trace logs into .CAR files that can be viewed in both TraceViewer and other trace parsing tools such as TextAnalysisTool.NET.
• Provides basic trace parsing.

To convert the ETL file into CAR format for analysis, launch TraceViewer and drag the trace file into the open pane. This results in a prompt asking where to save the CAR file. The default location is the same folder as the ETL or trace file.
Note: Once the file has been saved, TraceViewer will open the converted ETL file for analysis.

SQL Server Management Studio Express

Introduction

Microsoft SQL Server Management Studio Express is a freely available, easy-to-use graphical management tool for managing instances of a SQL Server Database Engine created by any edition of SQL Server 2005. It may be used to view and edit the SCVMM-specific SQL database tables.

Obtaining SQL Server Management Studio Express

SQL Server Management Studio Express is available externally at the following location: http://www.microsoft.com/downloads/details.aspx?FamilyId=C243A5AE-4BD1-4E3D-94B8-5A0F62BF7796&displaylang=en

Installation instructions are provided on the download page.

Figure 6: Viewing TR_TaskTrail database

Advanced Troubleshooting

Interpreting Job Failures

Introduction

Reviewing VMM traces has become so commonplace that we may tend to forget other information related to the failure. More important than the trace itself is the context of the failure. What was being performed at the time, and what error message was recorded for the task? These other sources of information are your starting point. They are what make the trace valuable.

Reading a VMM trace is an art and a science. There are basic methods to follow and items to locate in the trace, but the only way to become proficient is practice, practice, practice. Perform actions on your own machines, gathering a trace at the same time, and then review the trace to learn how the actions you initiated are recorded. What follows are general best practices for resolving customer issues by using all data available.

Where to begin

To understand a failure you need information from various sources. You cannot begin blindly reading a trace, for example, and expect to get far. Prepare yourself with these items:

What job was being run at the time of the trace? A P2V, adding a host?

Which trace do you have?
Depending on the job, you will need traces from more than one system. A trace from the VMM management server will always be needed. If performing a P2V, a trace will be required from the source machine and the destination host as well. You must also know which trace is from which machine.

Did the job fail or was it cancelled? The better answer is ‘failed’. A failed job will produce an error in the Admin Console along with a hex error code. This error code is the first item you will search for in the trace, starting from the bottom, so it is important to have.

Verify that the trace being reviewed was captured during the time that the error occurred.

Tools to Use

Essentially, Notepad.exe is all that is required to view a trace. Unfortunately, Notepad takes a long time to open the large traces created by VMM. In the examples that follow, TextAnalysisTool.NET is used instead. A copy of all error codes returned by VMM should also be kept on hand for reference. A list of VMM 2008 R2 error codes is available here: http://social.technet.microsoft.com/wiki/contents/articles/virtual-machine-manager-vmm-2008-r2-error-codes.aspx

Interpreting Job Failures

When a job fails, select the ‘Jobs’ tab in the Admin Console and review the error recorded. There will be an error code, usually a 3 to 5 digit number. Below this will be more specific information that explains what went wrong in plain English (or localized language). There will also be a return code in hexadecimal format and a recommended action if available. The return code begins with ‘0x’ followed by eight digits. Often this return code is related to WinRM, and if its meaning is not already provided, it can easily be determined. Take the example below:

Figure 7: Error from VMM Administrator Console

‘Error (2915)’, though specific, is really not specific enough. Perform a search of the VKB or the Internet and you will find numerous reasons for this error. To use this error effectively let’s dig deeper.
First, read the error as it is presented: ‘The WS-Management service cannot process the request. Object not found on the <servername> server.’ This error also provides a Recommended Action. All of this is useful information:

• WS-Management is a reference to WinRM. So the issue is likely caused by a condition that led to a general communication failure, which bubbles up in VMM as a WinRM error. Checking the WinRM service on the indicated server would be a good starting place.
• Object not found on <servername>. Whatever the name of the server provided, this is where you should focus your efforts.
• The recommended action suggests that the agent be checked, and recommends rebooting <servername>. The only agents used in VMM are the VMM Agent and the P2V Agent. A better action than rebooting the indicated server would be to check and restart the VMM Agent service.

From this simple error we have two action plans:

1. Verify the WinRM service on the remote server.
2. Verify the VMM Agent on the remote server.

Further, if all servers are reporting this error, it seems likely that the issue in fact involves the VMM management server itself. Check the services mentioned on the VMM management server, and verify communication with the simple WinRM tests explained later in this document under ‘WinRM Troubleshooting’.

Now let’s imagine two things being different regarding this error. First, let’s imagine that there was no explanation for the return code (not too difficult, as it did report ‘Unknown error (0x80338000)’), and second, that there was no recommended action. What now? First, look up the original 3 to 5 digit error code, in this case ‘2915’. Searching ‘Error Codes_VMM R2’ we find the following:

    Code: 2915
    Message: The WS-Management service cannot process the request. Object not found on the %ServerName; server.
    Recommended action: Ensure that the agent is installed and running. If the error persists, reboot %ServerName; and then try the operation again.

This represents the code, the message, and the recommended action.
So, in this case the recommended action was already provided in the Admin Console message, but this is not always the case. Let’s move on.

Resolve the return code. As these are usually WinRM related, start there:

    winrm helpmsg 0x80338000

This returns the following:

    The WS-Management service cannot process the request. The service cannot find the resource identified by the resource URI and selectors.

OK, maybe this is not the most useful error message, but it differs a bit from the one provided in the Admin Console and may give you a few key terms that return a better result on the Internet. Also, notice that if you do not precede the eight-digit code with ‘0x’, nothing is returned. This is a ‘feature’ of winrm helpmsg. Keep in mind that if winrm returns nothing for the error message, the error probably is not WinRM related.

There is one additional trick that can narrow down an error even more. Take the last four digits of an eight-digit hex code, convert them from hexadecimal to decimal (net helpmsg expects a decimal error number), and run the result through ‘net helpmsg’. This is worth testing, but the results are not predictably useful.

A final comment on return codes: you may have noticed that most of the return codes you see begin with ‘0x8’. If the code begins with 8004, 8007, or 80005 (three zeros), you have a WMI-related error. Do not mistake this to mean that the error is due to WMI, just that its origin can be determined. Using the table below we see that errors beginning with 80041xxx or 800440xx did in fact originate in WMI, and so WMI should be investigated. Errors beginning with 8007xxxx, 80040xxx, and 80005xxx originated elsewhere, although they were reported through WMI. With these, in most cases, skip WMI and look further for the source of the error. This table is included in the WinRM and WMI appendix module of this training.

Occasionally there will be an error code beginning with ‘-2’ that is ten digits long.
These can be converted into regular hexadecimal numbers by entering the number, minus sign included, into calc.exe while in decimal mode, and then changing the format to hex.

Tip: -2147024809, for example, becomes FFFFFFFF80070057. Just remove the first eight ‘F’s to get 0x80070057.

Table 1: Common ranges of WMI errors

Term: 0x800410xx, 0x800440xx
Description: Errors that originate in WMI itself. A specific WMI operation failed because of:
• An error in the request, for example, a WQL query fails or the account does not have the correct permissions.
• A WMI infrastructure problem, such as incorrect CIM or DCOM registration.

Term: 0x8007xxxx
Description: Errors originating in the core operating system. WMI may return this type of error because of an external failure, for example, DCOM security failure.

Term: 0x80040xxx
Description: Errors originating in DCOM. For example, the DCOM configuration for operations to a remote computer may be incorrect.

Term: 0x80005xxx
Description: Errors originating from ADSI (Active Directory Service Interfaces) or LDAP (Lightweight Directory Access Protocol), for example, an Active Directory access failure when using the WMI Active Directory providers.

Analyzing the Trace

Introduction

If there is a trace available, the first thing to do is identify the task failure in the ‘Job’ view of the Admin Console. This represents the actual job failure and is your starting point. Begin at the bottom of the trace and search up for the hex error code (0x80041005, for example). It will be there if you are in fact looking at the right trace.

Although both keywords below can be found in a trace, here is a tip on determining what is being called:

• If the keyword ‘ServerConnection’ is found, this is a Host making reference to an attempt to contact the VMM Server.
• If the keyword ‘ClientConnection’ is found, this is a Server making reference to an attempt to contact a VMM Host.

Once you have found the hex return code in the trace, you need to understand the structure of the trace. First, keep in mind that traces are asynchronous.
This simply means that many jobs are running at once, all being recorded by the trace, and that the line right above the one you find of interest may be from an entirely unrelated task. Pay attention to the PID (Process ID) and TID (Thread ID) of the line you are on, and if there is a TaskID, write it down, as this represents the job itself.

As you move up through the trace beginning with the hex code, you will likely run into an exception. An exception represents a job failure and probably has the answer to the issue at hand. Exceptions are also easy to identify, as they are indented and many lines begin with ‘at Microsoft.’. In the example below, work your way up searching for ‘0x80338029’. The ‘TaskID’ is also visible. Finally, notice that the bottom three lines are not related to the exception, as their PID and TID do not match those of the exception lines.

Figure 8: Trace example

Tip: Even though the word ‘exception’ will allow you to locate failures in a trace easily, not all exceptions are related to real issues. Exceptions involving ‘NPIV’, for example, are numerous and can usually be ignored.

Walking up through the trace, it is possible to isolate the exact function or operation being performed at the time the exception occurred. Often there is a corresponding remote WMI call being made that fails. These remote WMI calls are delivered as the payload of a WS-Man request and then run on the remote machine. Take note of these operations and attempt to reproduce them outside of VMM. More information on this is below in the WinRM and WMI sections.

Isolating specific tasks in a trace

Each VMM task and subtask is identified by a task ID. The task ID is a GUID assigned to a task when the task is built. If any subtasks are required to complete a primary task, then a separate subtask ID is assigned to each subtask. Background tasks, such as refresh operations and capacity management, also have specific task IDs assigned to them.
When troubleshooting problems that are not related to a specific user-initiated task, it is important to determine the task ID for the background task.

Every task performed by VMM is tracked and stored via the following three key tables in the VMM database:

• Audit Task Trail database
• Task trail database
• Subtask trail database

There are separate databases for storing individual task types, such as refresh operations, but these three databases can typically be used to identify all task operations. These tables are part of the Task Repository functionality of the VMM server engine. It is possible to view an organized report of each task, including its task ID, in the Microsoft SQL Server Management Studio Express application.

To view the Task Trail database, perform the following steps:

1. Open Microsoft SQL Server Management Studio Express.
2. In the left pane, go to: VirtualMachineManagerDB\Tables\dbo.tbl.TR_TaskTrail.
3. Right-click the dbo.tbl.TR_TaskTrail database and select Open Table.

The TaskTrail database records both user-initiated tasks and background scheduled tasks such as refresh operations. The entries in this table are tombstoned at 90-day intervals by default.

Note: It is possible to modify the tombstoning frequency by changing the TaskGC value, defined in days, in the following registry key:

    HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings\Sql

This information is also contained in a KB article: The Virtual Machine Manager service may consume high memory or CPU utilization, http://support.microsoft.com/kb/2009348

Some of the key entries in the Task Trail database include:

• Task ID: Guid
• Task State: Success/Failed
• (Task) Description
• Any error codes encountered
• Start and End date time
• PowerShell Commandlet name
• Owner: User account which initiated the task
• Was the user notified of the task success or failure via a message or error?
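Once a Task ID has been read from the Task Trail table, the lines that mention it can be pulled out of a converted trace. The Python sketch below is one possible way to do this and is not part of the VMM tooling: it performs a case-insensitive substring match and keeps 1-based line numbers. It assumes only that the converted trace is plain text. The function name `lines_for_task`, the sample lines, and the GUID are hypothetical and chosen purely for illustration.

```python
from typing import Iterable, List, Tuple


def lines_for_task(lines: Iterable[str], task_id: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines that mention task_id.

    Matching is a case-insensitive substring test; line numbers start at 1.
    Surrounding braces on the task ID are ignored.
    """
    needle = task_id.strip("{}").lower()
    matches = []
    for number, text in enumerate(lines, start=1):
        if needle in text.lower():
            matches.append((number, text.rstrip("\r\n")))
    return matches


# Hypothetical sample lines; real VMM trace lines have a different layout.
sample = [
    "task 11111111-2222-3333-4444-555555555555 started\n",
    "unrelated line from another job\n",
    "exception in TASK 11111111-2222-3333-4444-555555555555\n",
]

hits = lines_for_task(sample, "{11111111-2222-3333-4444-555555555555}")
for number, text in hits:
    print(f"[{number}] {text}")
```

Because the function accepts any iterable of strings, an open text file can be passed in directly. The encoding of the converted file is not stated in this guide, so it should be checked before opening the file.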
After locating the Task ID of the task which failed, there are a number of methods for isolating that specific task within a trace.

Method 1: .CAR file and FIND

After converting the ETL trace to a CAR file, run the following command at a Command Prompt to write all of the lines in the trace relating to the specific Task ID (obtained from the Task Trail database) to the taskid.txt text file:

    find /i /n "(Task ID)" path_to_car_file.car > taskid.txt

The same method is also very useful with the PID and TID of a task.

Method 2: TextAnalysisTool.NET

After converting the ETL trace to a CAR file, open the file in TextAnalysisTool.NET.

1. Click Filter > Add filter.
2. Enter the Task ID obtained from the Task Trail database and click OK.
3. Click View and select Show Only Filtered Lines.

WinRM Troubleshooting

Windows Remote Management (WinRM) provides the communication services between the VMM management server and the VMM Host agents. This includes inter-agent communications to the following hosts:

• Hyper-V Virtual Host Servers
• Virtual Server Host Servers
• Remote Library Servers

The SCVMM 2008 R2 VMMServer service utilizes the WinRM scripting API (versus COM) for communication with the Windows Remote Management service on the managed hosts.

Tip: WinRM error messages usually provide a hex error code that is useful in understanding the problem. To resolve the hex code to an error message, type:

    winrm helpmsg 0xXXXXXXXX

If the hex code is not preceded by ‘0x’, this command will not work.

Test local WinRM functionality

When troubleshooting WinRM, treat the process as if performing a simple network ping test.
Type the command below at an elevated command prompt:

    winrm id

This should produce output similar to the following:

    IdentifyResponse
        ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
        ProductVendor = Microsoft Corporation
        ProductVersion = OS: 6.1.7201 SP: 0.0 Stack: 2.0

If an error is generated instead of the output above, perform a Quick Configuration of WinRM:

    winrm qc

You may be prompted with something similar to the following. Answer ‘Yes’ to any requests.

    WinRM already is set up to receive requests on this machine.
    WinRM is not set up to allow remote access to this machine for management.
    The following changes must be made:

    Enable the WinRM firewall exception.

    Make these changes [y/n]?

    WinRM has been updated for remote management.

    WinRM firewall exception enabled.

WinRM can now be tested again by typing ‘winrm id’ as before.

Test remote WinRM functionality

The second half of a WinRM test establishes that the remote server has WinRM configured correctly. At an elevated command prompt, type:

    winrm id -r:<remoteserver>

This should produce output similar to the following:

    winrm id -r:vmmr2lab-cl20
    IdentifyResponse
        ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
        ProductVendor = Microsoft Corporation
        ProductVersion = OS: 6.1.7201 SP: 0.0 Stack: 2.0

If instead an error such as the one below appears, either WinRM is not set up correctly on the remote machine, or something is preventing communication between the two systems over the WinRM port (port 80 in the configuration described here; WinRM 2.0 listens on port 5985 by default). This could be a firewall or an antivirus/antimalware program.

    WSManFault
        Message = The WinRM client cannot complete the operation within the time specified. Check if the machine name is valid and is reachable over the network and firewall exception for Windows Remote Management service is enabled.

    Error number: -2144108250 0x80338126
    The WinRM client cannot complete the operation within the time specified. Check if the machine name is valid and is reachable over the network and firewall exception for Windows Remote Management service is enabled.

In this event, test local WinRM functionality on the remote system. If WinRM is configured correctly on the remote system as well, the cause is most likely network communication between the two machines. Troubleshoot this as you would any network issue.

WMI Troubleshooting

When troubleshooting VMM issues that appear to be related to WMI, it is important to note that WMI errors may simply indicate the inability to communicate with the WMI service on a remote machine, not necessarily a problem with the service. This could be due to permissions or network-related issues. It is best to test and troubleshoot suspected WMI issues outside of VMM, using the WMI troubleshooting tools that are available. WMI troubleshooting tools and related information are discussed in this section.

Note: It is not within the scope of this document to discuss WMI troubleshooting in detail. The information provided in this section is meant to provide the reader with basic knowledge to assist in determining if additional WMI troubleshooting is required.

WMI Troubleshooting Tools

The following tools may be used to test and troubleshoot the majority of WMI issues:

• Windows Management Instrumentation Tester (wbemtest.exe)
• WMI Service Control Utility (wmimgmt.msc)
• WMIC command line

More Information: More information about WMI troubleshooting is available at the following location: http://msdn.microsoft.com/en-us/library/aa394603(VS.85).aspx

Windows Management Instrumentation Tester (wbemtest.exe)

The Windows Management Instrumentation Tester (wbemtest.exe) provides the ability to connect to and query WMI namespaces, on local or remote machines. It may be used to verify that an object exists in a particular namespace.
More Information: More detailed WBEMTest usage information is available at the following location: http://technet.microsoft.com/en-us/library/cc785775.aspx

Wbemtest.exe is well suited to quickly test connectivity to local and remote WMI namespaces. Connecting to a namespace on a local machine verifies that the namespace is properly registered and accessible via the WMI service. Connecting to a namespace remotely additionally verifies that WMI connectivity between the two machines is working.

Verify WMI Namespaces

Each namespace below represents a point of failure that can be tested. To open this tool, type ‘wbemtest.exe’ at the Run prompt. The following screenshot shows the method used to connect to the remote server VMM2008R2-03C. The ‘Connect…’ button has been clicked on the ‘Windows Management Instrumentation Tester’ window, and the name of the remote server has been prepended to the ‘root\cimv2’ namespace. Below are three namespaces to verify:

• Verify basic WMI service availability by connecting to ‘root\cimv2’.
• Verify that the virtualization namespace (Hyper-V) is available by connecting to ‘root\virtualization’.
• Verify that the VMM Agent namespace is available by connecting to ‘root\scvmm’.

Note: The SCVMM namespace will not be available unless a Host or P2V Agent is installed and running on the target system.

Figure 9: wbemtest.exe used to connect to remote namespace

WMI Service Control Utility (wmimgmt.msc)

The WMI Service Control Utility configures and controls the WMI service. This tool allows permissions to namespaces to be viewed and modified. To open this tool, type ‘wmimgmt.msc’ at the Run prompt. In the following screenshot, ‘WMI Control (Local)’ has been right-clicked and ‘Properties’ selected. ‘Root’ was then selected on the ‘Security’ tab and the ‘Security’ button clicked in the lower right of the window. At this point permissions can be viewed and modified.
Figure 10: wmimgmt.msc Security settings

WMIC command line

The WMIC command line tool is useful for quickly pulling information about local or remote systems. The same namespaces as described for the ‘wbemtest’ tool can be verified as follows.

Local

Test the local CIMV2 namespace:

    wmic /namespace:root\cimv2 context

Output such as the following verifies connectivity:

    NAMESPACE    : root\cimv2\root\cimv2
    ROLE         : root\cli
    NODE(S)      : VSTATION
    IMPLEVEL     : IMPERSONATE
    [AUTHORITY   : N/A]
    AUTHLEVEL    : PKTPRIVACY
    LOCALE       : ms_409
    PRIVILEGES   : ENABLE
    TRACE        : OFF
    RECORD       : N/A
    INTERACTIVE  : OFF
    FAILFAST     : OFF
    OUTPUT       : STDOUT
    APPEND       : STDOUT
    USER         : N/A
    AGGREGATE    : ON

Other namespaces to check:

    wmic /namespace:root\virtualization context
    wmic /namespace:root\scvmm context

Remote

Test remote namespaces:

    wmic /node:"vmm2008r2-03c" /namespace:root\cimv2 context
    wmic /node:"vmm2008r2-03c" /namespace:root\virtualization context
    wmic /node:"vmm2008r2-03c" /namespace:root\scvmm context

WMI Error Constants

The table below provides information regarding ranges of WMI return codes.

Table 2: Common ranges of WMI errors

Term: 0x800410xx, 0x800440xx
Description: Errors that originate in WMI itself. A specific WMI operation failed because of:
• An error in the request, for example, a WQL query fails or the account does not have the correct permissions.
• A WMI infrastructure problem, such as incorrect CIM or DCOM registration.

Term: 0x8007xxxx
Description: Errors originating in the core operating system. WMI may return this type of error because of an external failure, for example, DCOM security failure.

Term: 0x80040xxx
Description: Errors originating in DCOM. For example, the DCOM configuration for operations to a remote computer may be incorrect.

Term: 0x80005xxx
Description: Errors originating from ADSI (Active Directory Service Interfaces) or LDAP (Lightweight Directory Access Protocol), for example, an Active Directory access failure when using the WMI Active Directory providers.
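The range lookup in Table 2, together with the conversion of ten-digit negative codes described earlier in this guide, can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not a VMM or WMI API. The names `to_hresult`, `classify`, and `ORIGINS` are hypothetical. The prefixes are copied from the table, with 0x8007xxxx read as an eight-digit code. A code that matches none of the ranges is simply reported as not being in the table.

```python
def to_hresult(code) -> int:
    """Normalize a return code to an unsigned 32-bit integer.

    Accepts an int, a signed decimal string such as "-2147024809",
    or a hex string such as "0x80070057".
    """
    if isinstance(code, str):
        text = code.strip()
        value = int(text, 16) if text.lower().startswith("0x") else int(text, 10)
    else:
        value = int(code)
    # Masking to 32 bits does the same job as dropping the leading F's in calc.exe.
    return value & 0xFFFFFFFF


# Prefixes of the eight-digit hex code, taken from the table above.
ORIGINS = [
    ("800410", "WMI"),
    ("800440", "WMI"),
    ("8007", "core operating system"),
    ("80040", "DCOM"),
    ("80005", "ADSI or LDAP"),
]


def classify(code) -> str:
    """Return the origin named in the table for a code, if any."""
    digits = f"{to_hresult(code):08X}"
    for prefix, origin in ORIGINS:
        if digits.startswith(prefix):
            return origin
    return "not in the table"


print(f"0x{to_hresult('-2147024809'):08X}")  # prints 0x80070057
print(classify("-2147024809"))               # prints core operating system
print(classify("0x80041005"))                # prints WMI
print(classify("0x80338126"))                # prints not in the table
```

A code outside these ranges, such as the WinRM timeout 0x80338126 shown earlier, is better resolved with winrm helpmsg.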
A complete list of WMI errors and their meanings may be found at the following location: http://msdn.microsoft.com/en-us/library/aa394559(VS.85).aspx

BITS Troubleshooting

Introduction

Background Intelligent Transfer Service (BITS) transfers files (downloads or uploads) between a client and server and provides progress information related to the transfers. You can also download files from a peer. BITS can transfer files asynchronously between a client and a server. VMM uses BITS to transfer payload between managed computers. These data transfers are encrypted by using a self-signed certificate generated at the time a host machine is added to VMM.

Verify BITS outside of VMM

The following steps can be used to verify that BITS is working properly outside of VMM.

BITSadmin can be downloaded here: http://msdn.microsoft.com/en-us/library/aa362813(VS.85).aspx

BITSadmin examples are here: http://msdn.microsoft.com/en-us/library/aa362812(VS.85).aspx

1. Download BITSadmin.exe using the above link. It is also in the Windows Server 2003 Resource Kit. It does not appear in the tools directory, but is accessible from the command line.
2. Click Start > Programs > Windows Resource Kit Tools > Command Shell.
3. This should open an elevated command prompt.
4. Create a C:\Temp directory if one does not already exist.
5. Type the following command:

    bitsadmin /transfer myDownloadJob /download /priority normal http://msdl.microsoft.com/download/symbols/debuggers/dbg_amd64_6.11.1.404.msi c:\temp\dbg_amd64_6.11.1.404.msi

6. This command uses BITSadmin.exe to test BITS functionality by downloading a file from our website to the local temp directory.
7. This should take 10-20 minutes to complete.
8. This test should be run on the VMM management server, the Host computer, and the Source machine.

BITS Compact Server

This method will allow you to test BITS in the same manner in which VMM uses BITS. On the destination host, try the following steps:

1. Disable BITS Compact Server.
This feature is found in the Add/Remove Feature wizard.
2. Restart the computer.
3. Re-enable BITS Compact Server.

If you tried these steps and still have problems, run the following command on the host machine:

    winrm invoke CreateJob wmi/root/microsoft/bits/BitsClientJob @{Displayname="Test of fake job";RemoteUrl="http://download.microsoft.com/download/D/0/E/D0E6D2C1-25934017-B26D7375BC9263D5/PowerShell_Setup_amd64.msi";LocalFile="PATH_TO_LOCAL_DESTINATION";Type="0";ServiceAccount="0";Suspend="true";Description="Description for fake job"}

Replace PATH_TO_LOCAL_DESTINATION with a path to an existing directory, but to a non-existing file (for example, c:\httpRec\test.txt, where the directory c:\httpRec exists, but test.txt in that directory does not). Do not provide a path to an existing file, as the file might be overwritten.

This will attempt to create a BITS client job to download a file from a fake URL. Take note of the output. If the job succeeds, record the JobId. Example:

    CreateJob_OUTPUT
        JobId = {8DC2BE2F-0D2A-41B8-AEAD-F6DBED586E98}
        ReturnValue = 0

If the above command succeeds, clean up the job by using the following command, replacing the job ID with the job ID for your job:

    winrm invoke SetJobState wmi/root/microsoft/Bits/BitsClientJob?JobID={8DC2BE2F-0D2A-41B8-AEAD-F6DBED586E98} @{JobState="0"}

If the JobId was not returned with the CreateJob command above, you can find the created dummy job by using the following command:

    bitsadmin /list /allusers

Look for the job with the above name/description. Also look for any Suspended jobs and delete them by using the following command:

    bitsadmin /cancel {Job_GUID}

BITS Traces

Occasionally it will be necessary to obtain BITS traces while reproducing the error.
1. Create bitslog.cmd using the batch file below.
2. Open an elevated PowerShell prompt and navigate to the "C:\bits" folder.
3. Type the following commands (choose an appropriate directory name in place of c:\bits):

    Bitslog /enable
    Bitslog /collect c:\bits

4. Perform operations and reproduce the failure.
5. Type the following command from PowerShell:

    Bitslog /disable

6. Collect the bits.log file for analysis.

Save the following batch file as bitslog.cmd:

    @echo off
    REM Script for enabling bits logging/collection to be used
    REM while reporting BITS issues
    setlocal ENABLEDELAYEDEXPANSION
    set DEFAULT_LOG_SIZE=20
    set DEFAULT_LOG_DIR=%TEMP%\bits-logs
    set BitsKey=HKLM\Software\Microsoft\Windows\CurrentVersion\BITS

    if {%1} == {} goto Usage
    if {%1} == {/enable} goto :EnableLog
    if {%1} == {/e} goto :EnableLog
    if {%1} == {/disable} goto :DisableLog
    if {%1} == {/d} goto :DisableLog
    if {%1} == {/collect} goto :CollectLog
    if {%1} == {/c} goto :CollectLog
    goto :Usage

    :EnableLog
    if {%2} == {} (
        set log_size=%DEFAULT_LOG_SIZE%
    ) else (
        set log_size=%2
    )
    echo Enabling logging for BITS with log file size as %log_size%
    reg add %BitsKey% /v LogFileFlags /t REG_DWORD /d 0xfbcf /f > NUL
    reg add %BitsKey% /v LogFileSize /t REG_DWORD /d %log_size% /f > NUL
    echo Restarting BITS for registry values to take effect
    net stop bits
    net start bits
    goto :eof

    :DisableLog
    echo Disabling logging for BITS
    REM Remove both logging values that were written by /enable
    reg delete %BitsKey% /v LogFileFlags /f > NUL
    reg delete %BitsKey% /v LogFileSize /f > NUL
    echo Restarting BITS for registry values to take effect
    net stop bits
    net start bits
    goto :eof

    :CollectLog
    if {%2} == {} (
        set log_dir=%DEFAULT_LOG_DIR%
    ) else (
        set log_dir=%2
    )
    if NOT EXIST %log_dir% (
        md %log_dir% > NUL
        if ERRORLEVEL 1 (
            echo Failed to create the log dir %log_dir%. Not saving logs
            goto :eof
        )
    )
    echo Copying the BITS logs to %log_dir% directory
    REM Flush the current log for BITS
    logman update bits -ets -fd > NUL
    copy %windir%\system32\bits.log %log_dir% /y
    copy %windir%\system32\bits.bak %log_dir% /y
    REM copy the build info
    reg query "HKLM\Software\Microsoft\Windows NT\CurrentVersion" /v BuildLab > %log_dir%\bld_info
    REM get the output of bitsadmin
    bitsadmin /list /allusers /verbose > %log_dir%\AllJobs.txt
    goto :eof

    :Usage
    echo BITSLOG usage
    echo bitslog /enable [^<no^>] - To enable BITS logging and specify the log size as ^<no^> MB
    echo BITS service will be restarted
    echo If ^<no^> is not specified, default is 20 MB
    echo bitslog /disable - To disable BITS logging
    echo BITS service will be restarted
    echo bitslog /collect [^<dir^>] - To collect BITS logs in the specified dir
    echo If ^<dir^> is not specified, default is %%TEMP%%\bits-logs
    goto :eof

Useful KB Articles and Blogs

How to Troubleshoot Slow BITS Performance, Hosts 'Not Responding' and 'Needs Attention' Communication Issues
http://blogs.technet.com/b/jonjor/archive/2008/12/29/how-to-troubleshoot-slow-bits-performance-hosts-not-responding-and-needs-attention-communication-issues.aspx

Transfers between the System Center Virtual Machine Manager server, the Library Server and the Virtualization Hosts may fail with Error 12700 or 2912
http://support.microsoft.com/default.aspx?scid=kb;EN-US;2405062

P2V fails with Error 2912 0x80072F0C with System Center Virtual Machine Manager 2008 or System Center Virtual Machine Manager 2008 R2
http://support.microsoft.com/kb/2385280