SIM343

Werner Heisenberg, 1927

Operations
• Unreliable incident detection
• Limited communication of knowledge between Operations and Development
• Incorrect problem attribution
• Time-consuming problem resolution

Development
"It is your code!"
"Why? It works fine in staging."
"So how is it not failing on my server?"
"But I only know it's unavailable when my customers call."
"How do I monitor .NET application availability in production?"
"How should I know? I didn't write the code."
"!@$#% $^%& *^!!!"

"The management of existing business-facing applications to optimize value delivered for acceptable cost and risk." (Gartner, 2010)
Or, in simpler terms: "Application Performance Management"

• End User Experience Monitoring
• Application Real-Time Architecture Discovery and Modeling
• User-Defined Transaction Profiling
• Application Component Deep Dive Monitoring
• Application Performance Analytics
Goals: Optimize Resources; Improve Business Processes

[Figure: Monitoring architecture. Components: Web Browser, CSM Collector, SCOM 2007 Server, SQL Event Storage, SE-Viewer Server, Operations PC, Development PC, Portable Device. Structured events are sent through WMI to SCOM. Event information is displayed, or the URL of the event is sent via e-mail notification.]

[Figure: Telemetry layers (End User, Requests, Service, Platform) feeding the management platform.]
Management platform: SCOM + .NET MP
• NOC Operator: volume-based alerts for unknown problems; reliability and performance monitoring; security/connectivity problem remediation; DB performance troubleshooting; KPI baselining/monitoring
• Support/Dev (SE-Viewer): problem management; troubleshooting/debugging
• Advisor, Biz/App Owner: QoS analysis; before/after comparison; quick wins; application scoring; SLA management

Distributed application: telemetry layers End User, Requests, Service, Application Component, Platform
• Request dependency discovery across ASP.NET, WCF, COM+, NT Service, and WinForms

QoS metrics:
• Availability = available time / total time
• Reliability = successful requests / total requests
• Performance = satisfied count / total requests

• Performance SLA
• Critical/non-critical internal failures
• Run-time root cause information
• Distributed chaining
• Request metrics: calls/sec, avg. time, % failures, % performance degradation
• Resource calls (SQL, WCF, .NET Remoting)
• Connectivity and security failures
• Platform: worker process load (CPU, memory, I/O)

[Chart: number of problems, known vs. unknown, over time for agile releases (every 1-4 weeks) and traditional releases (every 12-24 months).]
Changes in the environment (elasticity, third-party dependencies) and release frequency invalidate existing knowledge about application behavior, which increases the need for highly adaptable LOB application monitoring.

[Figure: .NET Transaction Monitoring across the application lifecycle: Architecture, Web 2.0, Development, QA/Test, Production, Support.]
• Executive: Improving Business Performance
• Management: Optimizing Resources and Processes

• Lower costs and simplify management of datacenter applications
• Optimize availability and performance of critical LOB applications
• Build unified management of applications from the datacenter to the cloud
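The three QoS ratios defined in the deck (availability, reliability, performance) reduce to simple arithmetic over counts collected in a measurement window. The sketch below assumes a hypothetical `QoSWindow` record holding those counts; how a real monitoring product gathers them, and what exactly counts as "satisfied", is not specified here, so treat "satisfied" as "met the performance SLA" by assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QoSWindow:
    """Hypothetical raw counts for one measurement window."""
    available_seconds: float   # time the application was reachable
    total_seconds: float       # length of the window
    successful_requests: int   # requests that completed without failure
    satisfied_requests: int    # requests assumed to meet the performance SLA
    total_requests: int        # all requests seen in the window


def qos_metrics(w: QoSWindow) -> dict[str, float]:
    """Compute the three QoS ratios; an empty denominator yields 0.0."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    return {
        # Availability = available time / total time
        "availability": ratio(w.available_seconds, w.total_seconds),
        # Reliability = successful requests / total requests
        "reliability": ratio(w.successful_requests, w.total_requests),
        # Performance = satisfied count / total requests
        "performance": ratio(w.satisfied_requests, w.total_requests),
    }
```

For a one-hour window with 3564 seconds of availability and 1000 requests (990 successful, 950 satisfied), this gives availability 0.99, reliability 0.99, and performance 0.95. Returning 0.0 for an empty window is a design choice; a real dashboard might prefer to report "no data" instead.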
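The request metrics listed for .NET transaction monitoring (calls/sec, avg. time, % failures, % performance degradation) can be derived from a batch of request records. This is a minimal sketch under two assumptions: the `Request` record is hypothetical, and "performance degradation" is taken to mean requests slower than a fixed threshold, which may differ from how any particular product defines it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record for one observed request."""
    duration_ms: float
    failed: bool


def request_metrics(requests: list[Request], window_seconds: float,
                    threshold_ms: float) -> dict[str, float]:
    """Aggregate per-window request metrics.

    A request counts as 'degraded' when its duration exceeds threshold_ms
    (an assumed definition).
    """
    n = len(requests)
    if n == 0 or window_seconds <= 0:
        return {"calls_per_sec": 0.0, "avg_time_ms": 0.0,
                "pct_failures": 0.0, "pct_perf_degradation": 0.0}
    failures = sum(1 for r in requests if r.failed)
    slow = sum(1 for r in requests if r.duration_ms > threshold_ms)
    return {
        "calls_per_sec": n / window_seconds,
        "avg_time_ms": sum(r.duration_ms for r in requests) / n,
        "pct_failures": 100.0 * failures / n,
        "pct_perf_degradation": 100.0 * slow / n,
    }
```

For example, four requests of 100, 200, 300, and 1400 ms over a 2-second window, one of them failed, with a 1000 ms threshold, yield 2.0 calls/sec, 500.0 ms average time, 25% failures, and 25% performance degradation.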
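"Volume-based alerts for unknown problems" and "KPI baselining" both rest on the idea of comparing current behavior with a learned baseline. The sketch below swaps in a simple mean-plus-k-standard-deviations rule over a history of per-interval counts (for example, exceptions per minute); the function name, the choice of statistic, and the default `k = 3.0` are all illustrative assumptions, not the algorithm used by SCOM or the .NET management pack.

```python
import statistics
from typing import Sequence


def is_volume_anomaly(history: Sequence[float], current: float,
                      k: float = 3.0) -> bool:
    """Hypothetical baseline check: flag `current` when it exceeds the
    historical mean by more than k sample standard deviations."""
    if len(history) < 2:
        return False  # too little data to form a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    return current > mean + k * stdev
```

With a history of `[10, 12, 11, 9, 13, 10, 11, 12]` the baseline is about 11 with a standard deviation of about 1.3, so a current value of 30 is flagged while 12 is not. Because the baseline is recomputed from recent history, it adapts as the application's normal behavior shifts between releases, which is the property the deck argues frequent releases demand.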