WW TSS-10
Application Server 2012:
Advanced Troubleshooting
Presenters:
David Ujifusa
Rich Liddell
© 2012 Invensys. All Rights Reserved. The names, logos, and taglines identifying the products and services of Invensys are proprietary marks of
Invensys or its subsidiaries. All third party trademarks and service marks are the proprietary marks of their respective owners.
Overview
Wonderware Tech Support Info Tool
Performance Monitor
Mind Map App Server Tech Notes
New Ideas for Troubleshooting
Common Tools and Troubleshooting Demo
Redundancy and Engine communications
Deployment Demo
Common Scripting Issues
Galaxy Repository Queries and SQL Troubleshooting
Slide 4
How to attack install or version
mismatch issues
Slide 5
Issue 1: Hotfix not installed correctly
Symptom: Importing a View App into the Galaxy fails with the error message “From ZipFileInterop. Zip file interop CreateFileW failed”
SR 15316418
Tool Used: Wonderware Tech Support Info Tool
Found that Hotfix 2177 was not installed properly
Slide 6
Issue 1: WW Tech Support Info Tool
Customer presses Retrieve Wonderware
Information, saves the result to an XML file and
sends the XML file to Tech Support
Slide 7
Issue 1: WW Tech Support Info Tool
Incorrect WWGalaxyRepositoryController.dll version: found 3132.16.103.7,
should be 3132.16.110.1
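The version check on this slide can be sketched in a few lines. This is only an illustration, assuming the file versions have already been read out of the XML file by hand; the function names are hypothetical and are not part of the Tech Support Info Tool. Each dotted field is compared as an integer, because a plain string comparison would rank "16.9" above "16.10".

```python
def parse_version(text):
    """Split a dotted file version such as '3132.16.110.1' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def check_file_version(name, found, expected):
    """Return a one-line verdict comparing the installed and expected versions."""
    found_v, expected_v = parse_version(found), parse_version(expected)
    if found_v == expected_v:
        return f"{name}: OK ({found})"
    relation = "older" if found_v < expected_v else "newer"
    return f"{name}: MISMATCH, found {found} ({relation} than expected {expected})"


# Version numbers taken from the slide above.
verdict = check_file_version(
    "WWGalaxyRepositoryController.dll", "3132.16.103.7", "3132.16.110.1"
)
print(verdict)
```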
Slide 8
Issue 1: WW Tech Support Info Tool
Demo WW Tech Support Info Tool
• What customer does
• How Tech Support analyzes
Slide 9
Issue 1: WW Tech Support Info Tool
Slide 10
Issue 2: NmxSvc Memory Leak
• Symptom: NmxSvc memory grows by 300 MB per day and climbs to
1.9 GB
• SR 15315695
• Tool Used: Performance Monitor
Slide 11
Issue 2: NmxSvc Memory Leak
Slide 12
Issue 2: Performance Monitor
Demo Performance Monitor Log File
• Configuration
• Analyzing the Log File
Slide 13
Issue 2: Performance Monitor
Configuration
Performance Monitor Configuration instructions
Set up Performance Monitor to log all Processes and Processor. The general info for configuring a Performance
Monitor Log File is found in the Help of Performance Monitor. Look for the article “creating counter logs” under
“Performance Logs and Alerts.”
When creating the new Performance Log File press the Add Objects button.
Select Process in the list of Performance objects and press the Add button. You will get a counter that looks
something like \\davidu10\Process(*)\*
Select Processor in the list of Performance objects and press the Add button. You will get a counter that looks
something like \\davidu10\Processor(*)\*
You can change the interval to something like every minute rather than every 15 seconds.
Check that the Performance Log File size is growing to ensure that it is logging.
When the test is finished send the .blg file to Wonderware Tech Support for analysis.
Slide 14
Issue 2: Performance Monitor Log File
Analysis
What Instances to look at?
aaEngine*
aaGR
aaLogger
sqlservr
svchost*
view*
Slide 15
Issue 2: Performance Monitor Log File
Analysis
Look at the Minimum and Maximum for these
counters
% Processor Time
Handle Count
Thread Count
Private Bytes
Virtual Bytes
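As a rough sketch of what looking at the Minimum and Maximum means in practice, the snippet below summarizes one counter for one process and estimates its growth per day. It assumes the samples have already been exported from the log file as (hours since start, value) pairs; the sample values are made up, and `summarize` is a hypothetical helper, not a Performance Monitor feature.

```python
MB = 1024 * 1024


def summarize(samples):
    """samples: list of (hours_since_start, value) pairs for one counter."""
    values = [value for _, value in samples]
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    # Average growth between the first and last sample, scaled to 24 hours.
    per_day = (v1 - v0) / (t1 - t0) * 24 if t1 > t0 else 0.0
    return {"min": min(values), "max": max(values), "growth_per_day": per_day}


# Made-up Private Bytes samples for one process, one every 12 hours.
private_bytes = [
    (0, 400 * MB),
    (12, 550 * MB),
    (24, 700 * MB),
    (36, 850 * MB),
    (48, 1000 * MB),
]
summary = summarize(private_bytes)
print(f"min {summary['min'] / MB:.0f} MB, max {summary['max'] / MB:.0f} MB, "
      f"growth {summary['growth_per_day'] / MB:.0f} MB/day")
```

A steady positive growth figure over several days is the pattern to look for; a counter that rises and falls around a stable level is usually less interesting.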
Slide 16
Issue 2: Performance Monitor Log File
Analysis
Possible resource issue if one of these counters
for a Process is increasing a lot or if the
Maximum is high as follows:
% Processor Time > 50
Handle Count > 1000
Private Bytes > 500 MB
Thread Count > 100
Virtual Bytes > 500 MB
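The limits on this slide can be applied mechanically once the maximum of each counter is known. The following is a minimal sketch, assuming the maximums for one process instance have been collected into a dictionary; the example values are made up and `flag_counters` is a hypothetical name.

```python
MB = 1024 * 1024

# Maximum-value limits from the slide above.
LIMITS = {
    "% Processor Time": 50,
    "Handle Count": 1000,
    "Private Bytes": 500 * MB,
    "Thread Count": 100,
    "Virtual Bytes": 500 * MB,
}


def flag_counters(maximums):
    """Return the names of counters whose maximum exceeds the slide's limit."""
    return sorted(
        name
        for name, value in maximums.items()
        if name in LIMITS and value > LIMITS[name]
    )


# Made-up maximums for a single process instance.
example = {
    "% Processor Time": 12.5,
    "Handle Count": 1450,
    "Private Bytes": 1900 * MB,
    "Thread Count": 64,
    "Virtual Bytes": 480 * MB,
}
flagged = flag_counters(example)
print(flagged)
```

A flagged counter is only a hint of a possible resource issue, as the slide says; it still has to be read together with how fast the counter is increasing.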
Slide 17
AppServer Troubleshooting Tech Notes
Mindmap
Slide 18
AppServer Troubleshooting Tech Notes
Mind Map
Slide 19
New Idea - AppServer Troubleshooting
Symptom Mindmap or Word doc?
Slide 20
New Idea – Utility to check install files
and DLL Registration
Slide 21
Troubleshooting - WTH
Slide 22
Where to look fast to find a problem
Slide 23
Common tools
SMC Logger
Log Monitor – Gift from ActiveFactory
Platform Manager
Object Viewer
SQL Profiler and SQL commands
Slide 24
Let’s dive into a case
Slide 25
Engines are losing objects over time
Notes about the case:
AppEngine_002 is having the problem right now
1) Engine deployed fine and all objects/attributes seemed fine
2) After a while, View no longer displays some values; they are stuck
at Initializing
3) Object Viewer no longer shows the object under the engine
4) Redeploy states the object was never on this engine
Slide 26
Look at the SMC
1) Export the SMC log so you are only looking at a few days and not weeks
2) Filter out messages that are blocking the view and do not matter
3) If you know the engine name, filter for it to see what the engine has
done lately
4) Search for words like terminate or aabootstrap starting, to see if
the computer was rebooted or if the engine was terminated.
5) Filter only errors and warnings to narrow the search more
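The same filtering steps can be sketched outside the SMC for a log that has been exported to text. This is an illustration only: the tab-separated layout (level, component, message) and the sample lines are made up, since the real export format is not shown here, and `filter_log` is a hypothetical helper.

```python
KEYWORDS = ("terminate", "aabootstrap starting")


def filter_log(lines, engine=None, keywords=(), levels=()):
    """Keep lines that pass every filter that was supplied (case-insensitive)."""
    wanted_levels = {level.lower() for level in levels}
    kept = []
    for line in lines:
        lower = line.lower()
        # Assumption: the first tab-separated field holds the message level.
        level = lower.split("\t", 1)[0]
        if engine and engine.lower() not in lower:
            continue
        if keywords and not any(word in lower for word in keywords):
            continue
        if wanted_levels and level not in wanted_levels:
            continue
        kept.append(line)
    return kept


# Made-up lines in the hypothetical export layout.
sample = [
    "Info\taaBootstrap\taaBootstrap starting",
    "Warning\tObjectSyncMgr\tAppEngine_002 MergeDeltaCheckpointsReq failed",
    "Info\tScriptRuntime\tAppEngine_001 heartbeat",
    "Error\taaEngine\tAppEngine_002 terminate requested",
]
engine_lines = filter_log(sample, engine="AppEngine_002")      # step 3
restarts = filter_log(sample, keywords=KEYWORDS)               # step 4
problems = filter_log(sample, levels=("Error", "Warning"))     # step 5
print(len(engine_lines), len(restarts), len(problems))
```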
Slide 27
Open the exported Log file
Slide 28
Messages look like we are onto the
problem
ObjectSyncMgr
AppEngine_002 MergeDeltaCheckpointsReq. ***
Failed to update object 28 primitive 100 in object cache.
Removing object from the cache...
AppEngine_002: Deploy changes - [1 of 1] Failed to find object
“NH3_DK_AD04” on the engine.
Found MergeDeltaCheckpointsReq messages for AppEngine_001
too, but not for Engine_3 and Engine_4.
Slide 29
Engine Configurations
are they different?
1) All Engines have Redundancy enabled
2) Engines 1 and 2 match but are different from 3 and 4.
3) Engines 3 and 4 match, and have not seen the problem
4) Difference is in the Redundancy settings
Max. checkpoint deltas buffered = 10
Max. alarm state changes buffered = 10
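Comparing engine configurations by eye gets tedious with more than a few settings. A minimal sketch of the idea, assuming each engine's settings have been copied into a dictionary by hand: the setting names follow the slide, but both sets of values are made up and do not say which engines on this site had which settings.

```python
def diff_settings(a, b):
    """Return {setting: (value_in_a, value_in_b)} for settings that differ."""
    keys = sorted(set(a) | set(b))
    return {key: (a.get(key), b.get(key)) for key in keys if a.get(key) != b.get(key)}


# Hypothetical values for two engines; not taken from the case.
engine_a = {
    "Redundancy enabled": True,
    "Max. checkpoint deltas buffered": 10,
    "Max. alarm state changes buffered": 10,
}
engine_b = {
    "Redundancy enabled": True,
    "Max. checkpoint deltas buffered": 25,
    "Max. alarm state changes buffered": 10,
}
differences = diff_settings(engine_a, engine_b)
print(differences)
```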
Slide 30
What are the recommended settings?
Slide 31
Quote from Customer
“I never knew how much I needed redundancy
until I enabled redundancy”
Not following the recommended settings
for redundancy can make your system
run unreliably.
Slide 32
Restarting Redundant engines
Slide 33
Expected Behavior for Redundant
Engines
1) Only the active engine is displayed in Platform Manager
2) Task Manager will show both engines running
3) If an engine is stopped with Platform Manager, the other engine will
start off scan, and the stopped engine will remain at shutdown and
will not be standby ready.
A) Set the active engine to Run on Scan
B) Start the engine you stopped and it will clear from the display
Slide 34
Creating Watch Windows
Slide 35
IDE has locked up and stopped
responding.
Slide 36
What next? Restarting the IDE did not
help.
• The aaGR process may need to be restarted on the GR node
• This will affect all other IDEs that are open, so have all users
close the IDE first.
• Once this process is terminated, the IDE will lose connection and will
need to be restarted.
• We have a PowerShell utility, GetIDEUsers.ps1, that locates the user
and IP for all nodes connected to the IDE
Slide 37
Deployment
Slide 38
Deployment: Mechanics
• The IDE user deploys an object to a Platform
• Package Manager on the IDE node forwards this
deployment action to the Package Server on the
GR
• Package Server tells the PIM (Platform
Installation Manager) to deploy the object’s
runtime code modules and configuration files to
the target Platform
• PIM on the GR node transfers the files to the
remote PIM via the DCOMTransport
Slide 39
Quickest Way To Deploy
• Take advantage of multi threading during deploy
• Deploy all Platforms First at the same time
• Deploy All Engines at the same time
• Select Galaxy to deploy and set Currently deployed objects to “Skip”
Slide 40
Deployment issues: What can go wrong?
•DCOM
•NMX Local mode
•Version
•Binding order
•aaBootstrap not responding
•aaLogger hanging
•NIC setup – Auto detect
•Platforms still deployed but removed from GR
Slide 41
Tricks for keeping
deployment moving
• Check the binding order: it is the #1 problem
• OSConfiguration utility – 3.1sp3p1 on 2008
sometimes requires the version from 2012
• If DCOM has problems, rerun Change Network Account
• If the IDE thinks the Platform is already deployed, the
node might need to be removed from the registry on
the remote node and the GR.
Slide 42
Deployment issues: Undeployment Fails
• GR cannot take the objects
off scan, so it fails
•NMX service is running in
Local mode
•Engines are not responding
•Platform remover was used
•Cannot communicate with aaBootstrap
Slide 43
Deployment issues: InTouch ‘Orange Icon’
What can make things go wrong?
• Maximum TCP connections reached on non-server
machines
• Use a Server OS on every node
• Use workarounds from Microsoft
(search Google for ‘Remove Half-Open TCP Connection Limit’)
Slide 44
Deployment issues: Undeployment Fails
•What can I do?
• Use “On Failure mark as Undeployed” and try one of the
following before undeploying
• Unplug the GR node from the network
• Disable the NIC on the GR node or on the Platform you are
undeploying from. Only if RDP is not being used!
• In the registry on the GR, under
[Hklm\Software\ArchestrA\Framework\Platform\PlatformNodes],
give the platform you are trying to undeploy a bad IP
address.
Slide 45
Analyzing Deployment Issues
• What occurs during
deployment?
•GR
•Remote
•aaBootstrap
• How do I know anything is
deploying?
•Task Manager
•Logflags
Slide 46
Demo Deployment
Slide 47
Engine communication issues
I have done everything mentioned above. Why am I
encountering an Engine communications failure?
Slide 48
Engine communication issues
What can make things go wrong?
• External processes that consume high CPU
• This can be caused, for example, by an unconfigured ArchestrA
logger with a huge log file.
• Synchronous scripts for time-consuming operations
• This can be caused by a SQL connect in a non-asynchronous script
Slide 49
Engine communication issues
What can make things go wrong?
• External .NET modules that perform time-consuming
operations
• This can be caused by an external .NET DLL that
performs a time-consuming calculation in the main
process without using threads and semaphores.
Slide 50
Scripting Considerations
• Using the right script
• Debugging
• Logmessage()
• What is Async for
• Script Timeout/Error
Slide 51
Let the Engine / Object Relax While
First Loading
Use a While True script instead of an
On True script for large tasks (such as
setting IO references).
Delay with
If Script.ExecutionCnt == 2
Slide 52
Use LogMessage()
Why send needless log messages to the logger
unless required? Always wrap them in an IF
statement:
If me.Debug then
Logmessage(me.msg);
Endif;
Slide 53
Async Scripts
• Async is a must for SQL scripts
• Engine.AsynScriptMaxThread default size is 5
• Engine.AsyncScriptsWaitingCnt
• Use this for sizing AsynScriptMaxThread
Slide 54
Galaxy Repository Dive
• Useful Tables
• Gobject
• Gobject_Change_Log
• Lookup_operation
• View public_gobject_definition
GR
• Platform
• Redundancy
Slide 55
How can I tell if someone deployed
something?
•Gobject_Change_Log
•Objects affected
•Operation performed
•User Comment
•User Logged on
Slide 56
Return all operations for past 24 hours
Slide 57
Check to see if any engines have been
deployed
Slide 58
Find checked-out objects
Slide 59
Useful SQL
• Find Checked out objects
• Check in objects
• Undeploy Galaxy
• Undeploy Platform
• Query Engines, Areas
Slide 60
How to write to Event Viewer from SQL
Debugging a Stored procedure
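-- Runs inside the stored procedure being debugged; @EventLogKey, @EventTime
-- and @EventTagName are that procedure's own parameters or variables.
DECLARE @Message varchar(255)
SELECT @Message = 'EventLogKey = ' + CAST(@EventLogKey AS varchar(32))
    + ', EventTime = ' + CAST(@EventTime AS varchar(32))
    + ', EventTagName = ' + @EventTagName
-- The error number must be greater than 50000 for user-defined messages.
EXEC master..xp_logevent 51000, @Message, informational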
Slide 61
Slide 62
Wrap up!
• Check out the Tech Notes and tools we
suggested and practice troubleshooting
• Finding out what is different or changed often leads to the problem
• Understanding Deployment requires less colorful metaphors while
troubleshooting
• Work on SQL skills and learn how to navigate the Galaxy
Slide 63
Any Questions?