WW TSS-10 Application Server 2012: Advanced Troubleshooting
Presenters: David Ujifusa, Rich Liddell

© 2012 Invensys. All Rights Reserved. The names, logos, and taglines identifying the products and services of Invensys are proprietary marks of Invensys or its subsidiaries. All third party trademarks and service marks are the proprietary marks of their respective owners.

Overview
• Wonderware Tech Support Info Tool
• Performance Monitor
• Mind Map App Server Tech Notes
• New Ideas for Troubleshooting
• Common Tools and Troubleshooting Demo
• Redundancy and Engine communications
• Deployment Demo
• Common Scripting Issues
• Galaxy Repository Queries and SQL Troubleshooting

Slide 4 How to attack install or version mismatch issues

Slide 5 Issue 1: Hotfix not installed correctly
• Symptom: Error message “From ZipFileInterop. Zip file interop CreateFileW failed” when importing a View App into the Galaxy
• SR 15316418
• Tool Used: Wonderware Tech Support Info Tool
• Found that Hotfix 2177 was not installed properly

Slide 6 Issue 1: WW Tech Support Info Tool
The customer presses Retrieve Wonderware Information, saves the result to an XML file, and sends the XML file to Tech Support.

Slide 7 Issue 1: WW Tech Support Info Tool
Incorrect WWGalaxyRepositoryController.dll version: 3132.16.103.7 should be 3132.16.110.1

Slide 8 Issue 1: WW Tech Support Info Tool
Demo WW Tech Support Info Tool
• What the customer does
• How Tech Support analyzes

Slide 9 Issue 1: WW Tech Support Info Tool

Slide 10 Issue 2: NmxSvc Memory Leak
• Symptom: NmxSvc leaks 300 MB of memory per day and climbs to 1.9 GB
• SR 15315695
• Tool Used: Performance Monitor

Slide 11 Issue 2: NmxSvc Memory Leak

Slide 12 Issue 2: Performance Monitor
Demo Performance Monitor Log File
• Configuration
• Analyzing the Log File

Slide 13 Issue 2: Performance Monitor Configuration
Performance Monitor Configuration instructions
Set up Performance Monitor to log all Process and Processor counters.
General information on configuring a Performance Monitor log file is in the Performance Monitor Help. Look for the article “creating counter logs” under “Performance Logs and Alerts.”
• When creating the new Performance Log File, press the Add Objects button.
• Select Process in the list of Performance objects and press the Add button. You will get a counter that looks something like \\davidu10\Process(*)\*
• Select Processor in the list of Performance objects and press the Add button. You will get a counter that looks something like \\davidu10\Processor(*)\*
• You can change the interval to something like every minute rather than every 15 seconds.
• Check that the Performance Log File size is growing to ensure that it is logging.
• When the test is finished, send the .blg file to Wonderware Tech Support for analysis.

Slide 14 Issue 2: Performance Monitor Log File Analysis
What Instances to look at?
• aaEngine*
• aaGR
• aaLogger
• sqlservr
• svchost*
• view*

Slide 15 Issue 2: Performance Monitor Log File Analysis
Look at the Minimum and Maximum for these counters:
• % Processor Time
• Handle Count
• Thread Count
• Private Bytes
• Virtual Bytes

Slide 16 Issue 2: Performance Monitor Log File Analysis
A Process may have a resource issue if one of these counters keeps increasing substantially, or if its Maximum exceeds the following:
• % Processor Time > 50
• Handle Count > 1000
• Private Bytes > 500 MB
• Thread Count > 100
• Virtual Bytes > 500 MB

Slide 17 AppServer Troubleshooting Tech Notes Mind Map

Slide 18 AppServer Troubleshooting Tech Notes Mind Map

Slide 19 New Idea - AppServer Troubleshooting Symptom Mind Map or Word doc?
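The Slide 16 limits lend themselves to an automated first pass over the logged counters. The following is a minimal Python sketch, not a Wonderware tool. It assumes the counter samples have already been extracted from the .blg file into plain lists (that step is not shown) and that "500 MB" means 500 × 1024 × 1024 bytes. The names `THRESHOLDS` and `flag_counters` and all sample values are hypothetical.

```python
# Maximum-value limits taken from the Slide 16 guidance.
# Assumption: "500 MB" is interpreted as 500 * 1024 * 1024 bytes.
MB = 1024 * 1024
THRESHOLDS = {
    "% Processor Time": 50,
    "Handle Count": 1000,
    "Private Bytes": 500 * MB,
    "Thread Count": 100,
    "Virtual Bytes": 500 * MB,
}


def flag_counters(samples):
    """Return {counter: (minimum, maximum)} for counters whose maximum exceeds its limit.

    `samples` maps a counter name to the list of values logged for one process
    instance (hypothetical input shape; extraction from the .blg is not shown).
    """
    flagged = {}
    for counter, values in samples.items():
        limit = THRESHOLDS.get(counter)
        if limit is None or not values:
            continue  # counter not covered by the guidance, or no data logged
        if max(values) > limit:
            flagged[counter] = (min(values), max(values))
    return flagged


# Made-up samples for one process instance, for illustration only.
example = {
    "Handle Count": [850, 900, 1200],
    "Thread Count": [40, 42, 41],
    "Private Bytes": [300 * MB, 450 * MB, 600 * MB],
}
result = flag_counters(example)
print(result)
```

Reporting the minimum alongside the maximum follows the Slide 15 advice to look at both: a large gap between them suggests growth over the test period, which is worth checking even when the maximum is still under the limit.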
Slide 20 New Idea – Utility to check install files and DLL Registration

Slide 21 Troubleshooting - WTH

Slide 22 Where to look fast to find a problem

Slide 23 Common tools
• SMC Logger
• Log Monitor (gift from ActiveFactory)
• Platform Manager
• Object Viewer
• SQL Profiler and SQL commands

Slide 24 Let’s dive into a case

Slide 25 Engines are losing objects over time
Notes about the case: AppEngine_002 is having the problem right now.
1) The engine deployed fine and all objects/attributes seemed fine
2) After a while, View no longer displays some values; they are stuck initializing
3) Object Viewer no longer shows the object under the engine
4) Redeploy states the object was never on this engine

Slide 26 Look at the SMC
1) Export the SMC log so you are only looking at a few days and not weeks
2) Filter out messages that are blocking the view and do not matter
3) If you know the engine name, filter for it to see what the engine has done lately
4) Search for words like "terminate" or "aabootstrap starting" to see if the computer was rebooted or if the engine was terminated
5) Filter for only errors and warnings to narrow the search further

Slide 27 Open the exported Log file

Slide 28 Messages look like we are onto the problem
• ObjectSyncMgr AppEngine_002 MergeDeltaCheckpointsReq. *** Failed to update object 28 primitive 100 in object cache. Removing object from the cache...
• AppEngine_002: Deploy changes - [1 of 1] Failed to find object "NH3_DK_AD04" on the engine.
Found MergeDeltaCheckpointsReq messages for AppEngine_001 too, but not for Engine_3 and Engine_4.

Slide 29 Engine configurations: are they different?
1) All Engines have Redundancy enabled
2) Engines 1 and 2 match but are different from 3 and 4
3) Engines 3 and 4 match, and have not seen the problem
4) The difference is in the Redundancy settings:
  • Max. checkpoint deltas buffered = 10
  • Max alarm statechanges buffered = 10

Slide 30 What are the recommended settings?
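The filtering steps from the "Look at the SMC" slide (Slide 26) can also be applied to an exported log with a short script. The sketch below is Python and makes assumptions throughout: the `LogRecord` shape, the level names, and the sample records are hypothetical and do not reflect the actual SMC export format. Only the engine name and the search words come from the slides.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical record shape; the real SMC export format is not assumed here.
    level: str      # e.g. "Info", "Warning", "Error"
    component: str  # engine or service that wrote the message
    message: str


# Search words suggested on Slide 26 for spotting reboots and terminations.
RESTART_HINTS = ("terminate", "aabootstrap starting")


def by_engine(records, engine_name):
    """Step 3: keep only messages written by, or mentioning, the given engine."""
    return [r for r in records
            if engine_name in r.component or engine_name in r.message]


def restart_events(records):
    """Step 4: messages hinting at a reboot or an engine termination."""
    return [r for r in records
            if any(hint in r.message.lower() for hint in RESTART_HINTS)]


def errors_and_warnings(records):
    """Step 5: narrow the search to errors and warnings."""
    return [r for r in records if r.level in ("Error", "Warning")]


# Made-up records for illustration only.
log = [
    LogRecord("Info", "aaBootstrap", "aaBootstrap starting"),
    LogRecord("Warning", "AppEngine_002", "Failed to update object in object cache"),
    LogRecord("Info", "AppEngine_002", "Scan completed"),
    LogRecord("Error", "AppEngine_001", "Engine terminated"),
]

engine_problems = errors_and_warnings(by_engine(log, "AppEngine_002"))
restarts = restart_events(log)
print(len(engine_problems), len(restarts))
```

Keeping each step as its own small function lets the filters be combined in whatever order the case calls for, for example checking for restarts across the whole log before narrowing to a single engine.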
Slide 31 Quote from Customer
“I never knew how much I needed redundancy until I enabled redundancy”
Not following the recommended settings for redundancy can make your system run unreliably.

Slide 32 Restarting Redundant engines

Slide 33 Expected Behavior for Redundant Engines
1) Only the Active engine is displayed in Platform Manager
2) Task Manager will show both engines running
3) If an engine is stopped with Platform Manager, the other engine will start off scan, and the stopped engine will remain at shutdown and will not be standby ready.
  A) Set the Active engine to Run on Scan
  B) Start the engine you stopped and it will clear from the display

Slide 34 Creating Watch Windows

Slide 35 IDE has locked up and stopped responding.

Slide 36 What next? Restarting the IDE did not help.
• The aaGR process may need to be restarted on the GR node
• Restarting this process will affect all other open IDEs, so have all users close the IDE
• Once this process is terminated, the IDE will lose its connection and will need to be restarted
• We have a PowerShell utility called GetIDEUsers.ps1 that locates the user and IP for all nodes connected to the IDE

Slide 37 Deployment

Slide 38 Deployment: Mechanics
• The IDE user deploys an object to a Platform
• Package Manager on the IDE node forwards this deployment action to the Package Server on the GR
• Package Server tells the Platform Installation Manager (PIM) to deploy the object’s runtime code modules and configuration files to the target Platform
• The PIM on the GR node transfers the files to the remote PIM via the DCOMTransport

Slide 39 Quickest Way To Deploy
• Take advantage of multithreading during deploy
• Deploy all Platforms first, at the same time
• Deploy all Engines at the same time
• Select the Galaxy to deploy and set Currently deployed objects to “Skip”

Slide 40 Deployment issues: What can go wrong?
• DCOM
• NMX Local mode
• Version
• Binding order
• aaBootstrap not responding
• aaLogger hanging
• NIC setup (Auto detect)
• Platforms still deployed but removed from GR

Slide 41 Tricks for keeping deployment moving
• Check the binding order (the #1 problem)
• OSConfiguration utility: 3.1 SP3 P1 on 2008 sometimes requires the version from 2012
• If DCOM has problems, rerun Change Network Account
• If the IDE thinks the Platform is already deployed, the node might need to be removed from the registry on the remote node and the GR

Slide 42 Deployment issues: Undeployment Fails
• The GR cannot take the objects off scan, so it fails
  • NMX service is running in Local mode
  • Engines are not responding
  • Platform remover was used
  • Cannot communicate with aaBootstrap

Slide 43 Deployment issues: InTouch ‘Orange Icon’
What can make things go wrong?
• Maximum TCP connections reached on non-server machines
• Use a Server OS on every node
• Use workarounds from Microsoft (search for ‘Remove Half-Open TCP Connection Limit’ in Google)

Slide 44 Deployment issues: Undeployment Fails
What can I do?
• Use On Failure mark as Undeployed and try one of the following before undeploying:
  • Unplug the GR node from the network
  • Disable the NIC on the GR node or the Platform you are undeploying from. Only if RDP is not being used!
  • In the registry on the GR [HKLM\Software\ArchestrA\Framework\Platform\PlatformNodes], give the platform you are trying to undeploy a bad IP address

Slide 45 Analyzing Deployment Issues
• What occurs during deployment?
  • GR
  • Remote
  • aaBootstrap
• How do I know anything is deploying?
  • Task Manager
  • Logflags

Slide 46 Demo Deployment

Slide 47 Engine communication issues
I have done everything mentioned above. Why am I encountering an Engine communications failure?

Slide 48 Engine communication issues
What can make things go wrong?
• External processes that consume high CPU
  • This can be caused, for example, by an unconfigured ArchestrA Logger with a huge log file
• Synchronous scripts for time-consuming operations
  • This can be caused by a SQL connect in a non-asynchronous script

Slide 49 Engine communication issues
What can make things go wrong?
• External .NET modules that perform time-consuming operations
  • This can be caused by an external .NET DLL that performs a time-consuming calculation in the main process without using threads and semaphores

Slide 50 Scripting Considerations
• Using the right script
• Debugging
• LogMessage()
• What is Async for
• Script Timeout/Error

Slide 51 Let the Engine / Object Relax While First Loading
Use a While True script instead of an On True script for large tasks (such as setting I/O references).
Delay with: If Script.ExecutionCnt == 2

Slide 52 Use LogMessage()
Why send needless log messages to the Logger unless they are required? Always wrap them in an IF statement:
If me.Debug then LogMessage(me.msg); Endif;

Slide 53 Async Scripts
• A must for SQL scripts
• Engine.AsynScriptMaxThread default size is 5
• Engine.AsyncScriptsWaitingCnt
  • Use this for sizing AsynScriptMaxThread

Slide 54 Galaxy Repository Dive
• Useful Tables
  • Gobject
  • Gobject_Change_Log
  • Lookup_operation
• View public_gobject_definition
• GR
• Platform
• Redundancy

Slide 55 How can I tell if someone deployed something?
• Gobject_Change_Log
  • Objects affected
  • Operation performed
  • User Comment
  • User Logged on

Slide 56 Return all operations for past 24 hours

Slide 57 Check to see if any engines have been deployed

Slide 58 Find checked-out objects

Slide 59 Useful SQL
• Find checked-out objects
• Check in objects
• Undeploy Galaxy
• Undeploy Platform
• Query Engines, Areas

Slide 60 How to write to Event Viewer from SQL
Debugging a stored procedure:
    DECLARE @Message varchar(255)
    SELECT @Message = 'EventLogKey = ' + CAST(@EventLogKey AS varchar(32))
        + ', EventTime = ' + CAST(@EventTime AS varchar(32))
        + ', EventTagName = ' + @EventTagName
    EXEC xp_logevent 51000, @Message, informational

Slide 61

Slide 62 Wrap up!
• Check out the Tech Notes and tools we suggested, and practice troubleshooting
• Finding out what is different or what has changed often leads to the problem
• Understanding deployment means fewer colorful metaphors while troubleshooting
• Work on SQL skills and learn how to navigate the Galaxy

Slide 63 Any Questions?