Wintel Level 1 Support Training: Troubleshooting Guide

MRO Training for Wintel Level 1 support

This document gives Wintel Level 1 support the basic troubleshooting steps for certain recurrent incidents and issues.

REMEMBER!!!
Every incident you manage needs a ticket. The ticket must be filled in completely and correctly, and updated with every step taken during troubleshooting.
Always use your CORPAA ID to log on to the servers.

1) HPOM agents issues

Alerts of the types "Event Flow Broken" and "HPOM Healthcheck required" are triggered when the server agent cannot communicate with the OMW or OML manager, or when a module of the HPOM agents is not working properly.

Check the status of the agents with the following command:

    opcagt -status

A healthy status shows all of the agents as Running and no messages buffering. If a module has a problem, the output reports that messages are buffering, or the agents show any other issue, the HPOM agents must be recycled. Example of a buffering message:

    Message Agent buffering for the following servers :
    --------------------------------------------------
    ustlcomaa100.tul.aa.com

Recycle the agents with the following commands, in this order:

    opcagt -stop
    opcagt -cleanstart

If the agents keep failing after this step, or you get an error message, escalate to Level 2 support.

2) System reboot suspected alerts

A "system reboot suspected" alert requires checking the following:

a) Is the reboot alert real?
b) When was the server rebooted?
c) Why was it rebooted?
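The three questions above can be answered from the System log Event IDs that this section lists (6008, 6009, 6006 and 1074). The following is only a minimal sketch of that reasoning, not a supported tool: the `Event` record, the `summarize_reboot` function, the `since` parameter and the 5-minute `tolerance` are all hypothetical names and values chosen for illustration, and the events are assumed to have been exported from the Event Viewer by some other means.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Event IDs and their meanings as listed in this guide.
UNEXPECTED_REBOOT = 6008
SYSTEM_START = 6009
EVENTLOG_STOPPED = 6006
RESTART_REQUESTED = 1074


@dataclass
class Event:  # hypothetical record: one exported System log entry
    event_id: int
    time: datetime


@dataclass
class RebootSummary:
    real: bool                 # a) is the reboot alert real?
    when: Optional[datetime]   # b) when was the server rebooted?
    why: str                   # c) why was it rebooted?


def summarize_reboot(events: List[Event], since: datetime,
                     tolerance: timedelta = timedelta(minutes=5)) -> RebootSummary:
    """Answer a), b) and c), looking for a system start at or after `since`."""
    starts = sorted(e.time for e in events if e.event_id == SYSTEM_START)
    recent = [t for t in starts if t >= since]
    if not recent:
        return RebootSummary(False, None, "no system start (6009) found")
    boot = recent[-1]
    earlier = [t for t in starts if t < boot]
    prev_boot = earlier[-1] if earlier else None

    # IDs logged between the previous start (if any) and this start.
    ids_before = {
        e.event_id for e in events
        if e.time < boot and (prev_boot is None or e.time > prev_boot)
    }
    # Assumption: 6008 is written around the time the system comes back up.
    unexpected = any(
        e.event_id == UNEXPECTED_REBOOT and abs(e.time - boot) <= tolerance
        for e in events
    )
    if unexpected:
        why = "unexpected reboot (6008)"
    elif RESTART_REQUESTED in ids_before:
        why = "restart triggered by an application or a user (1074)"
    elif EVENTLOG_STOPPED in ids_before:
        why = "clean shutdown (6006), initiator not recorded"
    else:
        why = "unknown, review the System log manually"
    return RebootSummary(True, boot, why)
```

The result only summarizes what the log already says; the Problem ticket and the root cause investigation are still required.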
The HPSA script WINDOWS-OS-Health Check-915.vbs shows the latest error events in the System log of a server.

The HPSA script "Verify_VM_status" runs a complete health check of the server, including:

- Uptime
- Network status
- HPOM agents status

This script can be run against several servers at once (example: ESX host outages).

For more detail, check the System events:

Open the Event Viewer from All Programs > Administrative Tools > Event Viewer.
Open the System event log and search for the following Event IDs:

6008: Unexpected reboot
6009: System start time
6006: Event log service stopped, the last event before a reboot
1074: System restart triggered by an application or a user

A reboot suspected alert requires opening a Problem ticket to investigate the root cause of the unexpected reboot.

3) Recycling Windows services

Open the Services console from All Programs > Administrative Tools > Services.
Find the service that needs the restart.
Select it, right click on it and select Stop.
Wait for the service to stop, refreshing its status in the Services console.
Then select it again, right click and select Start.
If the status remains Stopping or Starting during any of these steps, escalate to Level 2 support.

NOTE: A restart of a service is an outage with impact.

4) Recycling IIS service (Internet Information Services)

On Windows 2003 and above, open the Internet Information Services Manager from All Programs > Administrative Tools > Internet Information Services (IIS) Manager.
Select the server; in the AA environment only the local server is listed.
Right click on the server name and select All Tasks > Restart IIS.
In the window displayed, select Stop Internet Services on SERVERNAME.
Once it finishes, repeat the process and select Start Internet Services on SERVERNAME.

IIS can also be recycled from the Services console by restarting the World Wide Web Publishing Service.
However, this method is not recommended as best practice. If the recycle does not fix the issue, escalate to Level 2 support.

5) Terminate a process via Task Manager

Open Task Manager by running the command taskmgr, or right click on the taskbar and click Task Manager.
Select the Processes tab. If necessary, check "Show processes from all users".
Find the exact process name or the exact PID (process identifier) number.
Select it, right click on it, select End Process, and then confirm the warning message.
If the process reappears or does not end, escalate to Level 2 support.

6) Check CPU and Memory usage

CPU and memory usage can be checked through Task Manager: open Task Manager and switch to the Performance tab. CPU and memory usage vary all the time, so watch them continuously for 1 or 2 minutes to get a sense of the average usage.

Note that SQL servers normally use a very high amount of CPU and memory, so on those servers what needs to be checked is the performance of the server itself. The time it takes to launch Task Manager, or even the time it takes to complete the logon process, is a good indicator of server performance.

7) Checking MS Cluster status

Open the cluster manager GUI from Start > All Programs > Administrative Tools > Failover Cluster Management (Windows 2008). On Windows 2003 it is called Cluster Administrator.
On Windows 2003, once it is open you need to connect to the cluster. Because you are logged on to a cluster node, you can enter a . (period) as the cluster name.
Once connected, the cluster name appears as the header of the tree view (left panel).
Here you can check the status of the cluster groups and the node on which each group is active.
Each group can have several resources; select a group to check the status of its resources.
A group or resource that is failing over passes through the following statuses: "Offline pending", "Offline", "Online pending", "Online".
If a group or resource shows a "Faulted" status, escalate to Level 2 support.
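The status rules above can be written down as a small decision helper. This is only an illustrative sketch: the `triage` function and its return strings are hypothetical, the statuses are assumed to be typed in by hand from what the cluster GUI shows, and nothing here queries the cluster itself.

```python
from typing import List

# Statuses a group or resource passes through during a failover, per this guide.
FAILOVER_SEQUENCE = ("offline pending", "offline", "online pending", "online")


def triage(observed: List[str]) -> str:
    """Classify the statuses seen for one group or resource, oldest first."""
    statuses = [s.strip().lower() for s in observed]
    if "faulted" in statuses:
        return "escalate to Level 2"
    if not statuses or any(s not in FAILOVER_SEQUENCE for s in statuses):
        return "unknown status, check the cluster GUI"
    if statuses[-1] == "online":
        return "ok"
    return "failover in progress, keep watching"


# Example: a completed failover, then a group that faulted.
print(triage(["Offline pending", "Offline", "Online pending", "Online"]))
print(triage(["Online pending", "Faulted"]))
```

A "Faulted" status anywhere in the observed list wins over everything else, which matches the escalation rule in this section.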
To move or fail over a group from one node to another, right click on the group, select Move Group, and then accept the warning message.

NOTE: A failover is an outage and there will be impact during the failover.

On Windows 2008 the GUI is slightly different, but the process is the same.

8) Checking Veritas Cluster

Veritas clusters are very similar to MS clusters, but the GUI is different.

Go to All Programs > Symantec > Veritas Cluster Server > Veritas Cluster Manager – Java Console.
Connect to the server in question, either by clicking on the server if it is already listed or by selecting File > New Cluster.
Type the name of the primary server of the cluster that you want to check, and use the default port listed (14141).
Once you log on, the Cluster Administrator window is shown. Here you can see the status of the groups, nodes, resources and services.

Example: the STL group is in faulted status on both nodes (APAPPP12 and APAPPP14). To bring the STL group up again, do the following:

First, clear the fault: right click on the STL cluster and select Clear Fault > Auto.
Then, to bring Staff Manager back up, right click on the cluster again and select Online > Any System.

Here you can also select a specific node on which to bring the group online. Review the ESL and MRC Sites notes to check whether there is any special requirement for the cluster or group.

To restart a service of a given group, follow this process:

Click on the plus (+) sign next to the name of the group that owns the service to be restarted.
Click on the plus (+) sign next to "Generic Service" to expand the service details.
Right click on the service that you want to restart and select Offline, then click on the node where you want the service to stop.
Wait a minute, or the time required for the service to go offline.
You can see the current status of the service on the nodes.
Once the service is offline, repeat the process, click on Online, and then select the node where you want the service to go online.

Note that as best practice, and on our account especially, all of the services and resources of a group run on the same node.

9) Logoff users from Citrix

The fastest and easiest way to log off users from Citrix is through the Delivery Service Console.

Log on to the AA Citrix portal through me.citrix.aa.com (Sabre VPN) or prod.citrix.aa.com (HP VPN) using your CORPAA ID/password.
Here you will see the published apps and folders that you have access to.
Go to Administration > Citrix Consoles.
Launch XA 5 – Prod Delivery Service Console and XA 6 – Prod Delivery Service Console.
Once the console opens, if no window is opened, select Action > Configure and run discovery.
Remove any servers that are listed, and then click on Add Local Computer.
Click Next until the discovery process ends.
Expand XenApp and you will be able to see all of the published apps and Citrix servers for the XA 6 farm. On the XA5 farm you have to click on AA_Citrix.
On the right panel, click the down arrow to switch the view to users.
Search for the user, right click on the user ID and click on Logoff user.

10) Cargonet Procedure

CARGONET Procedures.docx

11) AA Staffing Automation procedures

AA STAFF PROCEDURES.docx
