ENTERPRISE SYSTEM HEALTH CHECK
PRESENTED BY ERIC GERRITY
Eric Gerrity, Sr. Technical Consultant
Technical Services Group
eric.gerrity@thomsonreuters.com

What to look for
• DB Server:
  – SQL Server logs: a wealth of information. They show backups, object creation and use, job information, and even compilation errors.
  – Windows logs: just like driving, be careful with yellow & red!

What to look for (cont’d)
• Application Server:
  – Windows logs: just like driving, be careful with yellow & red!
  – Elite logs (residing in /elite/work/logs): each Elite-related service has its own log file for each db instance; all errors for any of the instances are written to ‘errlog’
  – Scheduled Tasks: log file within Control Panel -> Scheduled Tasks (not in date order)

What to look for (cont’d)
• WebView Server:
  – Windows logs: just like driving, be careful with yellow & red!
  – Almost all WebView informational and error messages are written to the Windows Application log; document these so that you can add them to your Support case.

Learn and Practice…
– To maintain system AVAILABILITY
– To maintain system PERFORMANCE
Plan, Build, Maintain

Requirements Gap-Analysis
• Maintain compliance with requirements
  – Elite’s Product System Requirements (PSR) document
    • Hardware
    • OS Edition
    • SQL Server Edition
  – Your potential requirements
    • Fault-tolerance
    • Disaster-recovery

Configuration Gap-Analysis
• Maintain compliance with Elite’s installation model
  – Summarized in Elite’s Enterprise Administrator’s Guide
Also see Elite’s
  – Product System Requirements (PSR) guide
  – Windows Server 2003/2008 Installation guide
  – SQL Server 2005/2008R2 Components Installation guide
  – *** DO NOT apply SP1 for Windows 2008R2 without opening a case with Support first!
There is a patch for application servers if SP1 is desired. ***

SQL Server Memory
– Minimum of 2 GB
– Prefer the lower of 10% of database size or 8 GB
– Set the “Lock pages in memory” local security policy
– Fix min. & max. SQL Server memory

SQL Server Parallelism
– Max degree of parallelism = half the number of physical cores
– Disable Hyper-Threading
– Explore optimizations with your networking vendor

NetCPS
Server 1: netcps -s
Server 2: netcps <IP of server 1>

  NetCPS 1.0 - Entering client mode. Press ^C to quit
  Connecting to 191.161.1.112 port 4455... Connected!
  ---> CPS 965428.00 KPS: 942.80 MPS: 0.92
  Avrg CPS 493292.00 KPS: 481.73 MPS: 0.47
  Peek CPS 1555187.38 KPS: 1518.74 MPS: 1.48
  Done. 104857600 Kb transferred in 212.57 seconds.

SQL Server Agent Jobs
Job                        Purpose                                 Schedule
Database Integrity Check   Runs DBCC integrity checks              Weekly
Deep Update Statistics     Updates stats used by query optimizer   Weekly
Re-index Database          Rebuilds table indexes                  Monthly
Clean Winout               Purges temp parameters                  Weekly
Clean Winoutstat           Purges Report Manager reports           Weekly

Windows Scheduled Tasks
Task           Purpose                                          Schedule
cleantmp.ksh   Purges temp files older than 2 days              Daily
inq_snap.ksh   Rebuilds Inquiry summary tables                  Daily
valindex.ksh   Re-indexes SearchServer database for Conflicts   Weekly
internet.ksh   Rebuilds WebView summary tables                  Weekly

Log File Monitoring
• Check log files daily
  – Windows
  – SQL Server
  – Application
Automate as much as possible!
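
One way to automate the daily log check is a small script that flags suspicious lines so nobody has to read every file by hand. The following is a minimal Python sketch, not an Elite-supplied tool: the keyword list and the function names `find_suspect_lines` and `scan_log_directory` are assumptions made for illustration, and only the /elite/work/logs path comes from the slides.

```python
from pathlib import Path

# Hypothetical keywords; tune these to the messages your own logs produce.
KEYWORDS = ("error", "fail", "fatal")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword."""
    hits = []
    for number, text in enumerate(lines, start=1):
        lowered = text.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, text.rstrip("\n")))
    return hits


def scan_log_directory(directory, pattern="*", keywords=KEYWORDS):
    """Scan each file matching pattern in directory; map file name -> hits."""
    base = Path(directory)
    if not base.is_dir():
        return {}
    report = {}
    for path in sorted(base.glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with path.open("r", errors="replace") as handle:
            hits = find_suspect_lines(handle, keywords)
        if hits:
            report[path.name] = hits
    return report


if __name__ == "__main__":
    # /elite/work/logs is the Elite log location named in the slides.
    for name, hits in scan_log_directory("/elite/work/logs").items():
        for number, text in hits:
            print(f"{name}:{number}: {text}")
```

Run from a daily scheduled task, a script like this could feed its output into whatever alerting you already use. Windows and SQL Server logs would need their own readers, which this sketch does not attempt.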
Database Mail
• In SQL2005: enable via the Surface Area Configuration tool, OR run:
    exec sp_configure 'show advanced options', 1
    reconfigure
    exec sp_configure 'Database Mail XPs', 1
    reconfigure
• SQL2008R2:
  – Available natively
  – Define operator(s) & conditions

Database Mail (cont’d)
• sendmail.bat
    set mailbody=%1
    set mailbody=%mailbody:~1,-1%
    sqlcmd -S.\sql2008 -E -Q "declare @subject sysname; set @subject = 'Performance alert on ' + @@SERVERNAME; EXEC msdb.dbo.sp_send_dbmail @recipients='<your_email_address>@company.com', @subject = @subject, @body = '%mailbody%', @body_format = 'TEXT';"
• Then configure Perfmon to run sendmail.bat for threshold alerts

Performance Monitoring
• Establish a baseline against which to periodically compare
  – User experience
  – Performance metrics
    • Database server
    • Application & WebView servers
    • Citrix servers
    • Other

Database Server Disk Subsystem
• Test integrity & performance
  – Microsoft SQLIOSim (replaces SQLIOStress)
    • Tests I/O path for problems that may corrupt data
    • Microsoft KB article 231619
  – Microsoft SQLIO
    • Tests I/O capacity
    • Search Microsoft for documentation

SQLIOSim Run
C:\SQLIOSim>sqliosim.com -dir c:\SQLIOSimTEST
ID     User                    Information                   Complete
----------------------
1288   Main User               Refreshed 366 times
3408   Display Monitor         9:18:12
280    Overall Test Progress   Full Test Run #1              25%
1772   Checkpoint              Sleeping
4460   LazyWriter              Sleeping, 1141 modified
4620   LogWriter               Sleeping, 4915 processed
5016   Random Access           0:460847, Reading page(s)     94%
4592   Random Access           0:123619, Reading page(s)     94%
4928   Bulk Update             0:392084, Reading page(s)     95%
3764   Bulk Update             0:349364, Reading page(s)     95%
3144   Page Audit              0:115776                      95%
Errors (0), warnings (13) reported to log file

SQLIOSim Test Results
– Consult log file & Windows Event log for details –
Consult hardware manufacturer if errors
– Capture Win32 API calls

SQLIO Run
– Edit set_proc_sock.txt to set number of physical cores and path of test file
– Edit & run sqlio_1v1f(1x8).cmd once for each path to test
– Creates time-stamped log files at the location of the .cmd file

SQLIO Test Results
– Compare results to Elite PSR document
    RANDOM WRITE 64k TEST (1 volume, 1 file)
    ===================================
    ...
    CUMULATIVE DATA:
    throughput metrics:
    IOs/sec: 104.69
    MBs/sec: 6.54
    latency metrics:
    Min_Latency(ms): 0
    Avg_Latency(ms): 75
    Max_Latency(ms): 639

SQLIO Test Results vs. PSR I/O Guidelines – 15K RPM Drives
15,000 RPM SQLIO Guidelines (IO/sec)
ACU         Random Writes   Random Reads
51 – 65     720             1680
66 – 80     900             2040
81 – 100    1080            2400
101 – 160   1260            2640
...         ...             ...

Perfmon
– Set up counters to log (Processor queue length; RAM; Logical disk counters; flavor to taste)
– Use a workstation to collect & analyze data
– Sample no more often than every 15 seconds
– Analyze data

Log Analysis
Microsoft Performance Analysis of Logs (PAL)
– Reads Performance Monitor logs
– Analysis using role-specific thresholds
– HTML-based reports
– Requires:
  • Microsoft Log Parser
  • .NET Framework 2.0
  • OWC (Office Web Components)

SQL Profiler
– Simultaneously collect Perfmon & SQL Profiler logs
– Allows pinpointing of queries that correspond to exceeded thresholds

Tips to Protect Performance
– BIOS & driver updates
– Windows & SQL Server updates
– New software
– Malware
– Disk defragmentation

Change Management
– Authorizes changes
– Prioritizes changes
– Tests changes
– Promotes changes

Capacity Planning
– Performance history
– Number of concurrent users
– Storage capacity and rate of consumption
– New hardware requirements

Backups and
Recovery
– Business owners define data retention and recovery requirements
– IT selects tools and processes to meet those requirements
– Elite defines what data to back up but not how or how often
– IT should periodically test recovery

Thanks for Attending!!!!
• DON’T FORGET!!! The new method to open cases, check status, etc. can be found here:
  http://customerportal.elite.com/
• Please feel free to contact me via e-mail (eric.gerrity@thomsonreuters.com) or phone (913-422-4228)
• Please, Please, PLEASE ask questions!!!
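
Follow-up sketch for the SQLIO slides: comparing a measured SQLIO result with the PSR guideline table can be scripted rather than done by eye. This minimal Python sketch makes several assumptions. The four guideline rows are copied from the 15K RPM table above and the elided rows are left out. The helper names `guideline_for` and `meets_guideline` are hypothetical. It also assumes that a result "passes" when measured IOs/sec is at least the guideline figure. Check the PSR document itself for the authoritative values and for which SQLIO test each column corresponds to.

```python
# Guideline rows copied from the 15K RPM table in the slides:
# (ACU low, ACU high, random writes IO/sec, random reads IO/sec).
# The slides elide the remaining rows, so only these four are listed.
GUIDELINES_15K = [
    (51, 65, 720, 1680),
    (66, 80, 900, 2040),
    (81, 100, 1080, 2400),
    (101, 160, 1260, 2640),
]


def guideline_for(acu):
    """Return (random_writes, random_reads) IO/sec for an ACU value, or None."""
    for low, high, writes, reads in GUIDELINES_15K:
        if low <= acu <= high:
            return writes, reads
    return None


def meets_guideline(acu, test, measured_ios_per_sec):
    """Assumed pass rule: measured IOs/sec is at least the guideline figure.

    test is 'random_write' or 'random_read' (labels made up for this sketch).
    """
    if test not in ("random_write", "random_read"):
        raise ValueError(f"unknown test type: {test}")
    row = guideline_for(acu)
    if row is None:
        raise ValueError(f"no guideline row listed for ACU {acu}")
    target = row[0] if test == "random_write" else row[1]
    return measured_ios_per_sec >= target


if __name__ == "__main__":
    # 104.69 IOs/sec is the random-write sample from the slides;
    # the ACU value of 60 is a hypothetical input chosen for illustration.
    print(meets_guideline(60, "random_write", 104.69))
```

With those inputs the sample result falls well short of the 720 IO/sec row, the kind of gap worth raising with your storage vendor.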