Uploaded by cuentasgeorge

L1 Training

MRO Training for Wintel Level 1 support
This document gives Wintel Level 1 support the basic troubleshooting steps for
certain recurring incidents and issues.
REMEMBER!!!
Every incident you handle needs a ticket.
The ticket must be filled in completely and correctly, and updated with every step taken during
troubleshooting.
Always use your CORPAA ID to log on to the servers.
1) HPOM agents issues
Alerts of the type
Event Flow Broken and HPOM Healthcheck required
are triggered because the server agent cannot communicate with the OMW or OML manager, or because
a module of the HPOM agents is not working properly.
Check the status of the agents with the following command:
opcagt -status
A healthy status shows all of the agents as Running and no messages buffering.
If a module has a problem, messages are buffering, or the agents show any other issue, the HPOM
agents must be recycled. A buffering condition is reported like this:
Message Agent buffering for the following servers :
--------------------------------------------------
ustlcomaa100.tul.aa.com
Recycle the agents with the following commands, in this order:
opcagt -stop
opcagt -cleanstart
If the agents keep failing after this step, or you get an error message,
escalate to Level 2 support.
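The check-then-recycle decision above can be sketched as a small script. This is a minimal sketch and not an HP-provided tool: the function names are hypothetical, and the parsing assumes only what this document shows, namely that healthy agents are reported as Running and that a buffering condition prints a line starting with "Message Agent buffering for". The other state words are assumptions, and the exact layout of the status output may differ between agent versions.

```python
import subprocess

# State words assumed to end a module line in `opcagt -status` output.
# Only "Running" (healthy) comes from this document; these two are assumptions.
BAD_STATES = ("Aborted", "Stopped")
# Start of the buffering message quoted in this document.
BUFFERING_MARKER = "Message Agent buffering for"


def needs_recycle(status_text):
    """Return True if the status text shows a failed module or buffering."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(BUFFERING_MARKER):
            return True
        if stripped.endswith(BAD_STATES):
            return True
    return False


def recycle_commands():
    """Commands from the procedure above, in the order they must run."""
    return [["opcagt", "-stop"], ["opcagt", "-cleanstart"]]


def check_and_recycle():
    """Check the agents, recycle them only when needed, then re-check."""
    status = subprocess.run(["opcagt", "-status"], capture_output=True, text=True)
    if not needs_recycle(status.stdout):
        return "healthy"
    for command in recycle_commands():
        subprocess.run(command, check=True)
    after = subprocess.run(["opcagt", "-status"], capture_output=True, text=True)
    return "escalate to Level 2" if needs_recycle(after.stdout) else "recycled"
```

Every step the script takes still has to be recorded in the ticket.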
2) System reboot suspected alerts
A system reboot suspected alert requires checking the following:
a) Is the reboot alert real?
b) When was the server rebooted?
c) Why was it rebooted?
The following HPSA script shows the latest error events in the System log of a server:
WINDOWS-OS-Health Check-915.vbs
The HPSA script “Verify_VM_status” performs a complete health check of the server, including:
Uptime
Network status
HPOM agents status
This script can be run against several servers at once (example: ESX host outages).
For more detail, check the System events.
Open the Event Viewer from
All Programs > Administrative Tools > Event Viewer
Open the System event log and search for the following Event IDs:
6008: Unexpected reboot
6009: System start time
6006: Event log service stopped, the last event before a reboot
1074: System restart triggered by an application or a user
A reboot suspected alert requires a Problem ticket to be opened to investigate the root cause of the
unexpected reboot.
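The Event ID lookup above can be sketched as a small classifier. This is an illustrative sketch only: the function name and the order in which the IDs are evaluated are assumptions, while the IDs and their meanings are taken from the list above.

```python
# Event IDs and meanings exactly as listed in the procedure above.
REBOOT_EVENTS = {
    6008: "Unexpected reboot",
    6009: "System start time",
    6006: "Event log service stopped, the last event before a reboot",
    1074: "System restart triggered by an application or a user",
}


def classify_reboot(event_ids):
    """Classify a reboot from the System-log Event IDs found around it."""
    found = set(event_ids) & set(REBOOT_EVENTS)
    if not found:
        return "no reboot evidence: the alert may be false"
    if 6008 in found:
        # An unexpected reboot takes priority over any other event.
        return "unexpected reboot: open a Problem ticket"
    if 1074 in found:
        return "restart requested by an application or a user"
    return "reboot occurred: cause not shown by these events"
```

For example, classify_reboot([6006, 1074, 6009]) reports a restart requested by an application or a user, which answers question c) for that server.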
3) Recycling Windows services
Open the Services console from
All Programs > Administrative Tools > Services
Find the service that needs the restart.
Select it, right-click it and select Stop.
Wait for the service to stop, refreshing the service status in the Services console.
Then select it again, right-click it and select Start.
If the status stays at Stopping or Starting during any of these steps, escalate to Level 2 support.
NOTE: A restart of a service is an outage with impact.
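The stop, wait, start sequence above can be sketched with the Windows sc command. This is a sketch under stated assumptions: the function names, the 60-second timeout and the 2-second polling interval are hypothetical, and the parsing assumes the English STATE line printed by sc query.

```python
import subprocess
import time


def query_state(service, run=subprocess.run):
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query`."""
    result = run(["sc", "query", service], capture_output=True, text=True)
    for line in result.stdout.splitlines():
        if line.strip().startswith("STATE"):
            return line.split()[-1]
    return "UNKNOWN"


def wait_for(service, wanted, get_state, timeout, interval=2, sleep=time.sleep):
    """Poll the service state until it matches `wanted` or the timeout passes."""
    waited = 0
    while waited <= timeout:
        if get_state(service) == wanted:
            return True
        sleep(interval)
        waited += interval
    return False


def restart_service(service, get_state=query_state, run=subprocess.run,
                    sleep=time.sleep, timeout=60):
    """Stop, wait, start; report an escalation if the service gets stuck."""
    run(["sc", "stop", service], capture_output=True, text=True)
    if not wait_for(service, "STOPPED", get_state, timeout, sleep=sleep):
        return "escalate: stuck while stopping"
    run(["sc", "start", service], capture_output=True, text=True)
    if not wait_for(service, "RUNNING", get_state, timeout, sleep=sleep):
        return "escalate: stuck while starting"
    return "restarted"
```

The state lookup, command runner and sleep function are passed in as parameters so the logic can be tried without touching a real service.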
4) Recycling the IIS service (Internet Information Services)
On Windows 2003 and above, open the Internet Information Services Manager from
All Programs > Administrative Tools > Internet Information Services (IIS) Manager
Select the server; in the AA environment you will see only the local server listed.
Right-click the server name and select All Tasks, then Restart IIS.
In the window displayed, select Stop Internet Services on SERVERNAME.
Once it finishes, repeat the process and select Start Internet Services on SERVERNAME.
The IIS service can also be recycled via the Services console by restarting the World Wide Web
Publishing Service. However, this is not recommended as best practice.
If the recycle does not fix the issue, escalate to Level 2 support.
5) Terminate a process via Task Manager
Open Task Manager by running the command taskmgr, or right-click the taskbar and click Task
Manager, then select the Processes tab.
If necessary, tick “Show processes from all users”.
Look for the exact process name or exact PID (process identifier) number.
Select it, right-click it, select End Process, and then confirm the warning message.
If the process reappears or does not end, escalate to Level 2 support.
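The same termination can be sketched from the command line with the Windows taskkill command. This is a minimal sketch, not the documented procedure: the function names are hypothetical, and using /F (forced termination) is an assumption about what comes closest to End Process in Task Manager.

```python
import subprocess


def taskkill_command(pid=None, image_name=None):
    """Build a taskkill command for exactly one target: a PID or an image name."""
    if (pid is None) == (image_name is None):
        raise ValueError("give either pid or image_name, not both")
    if pid is not None:
        return ["taskkill", "/PID", str(int(pid)), "/F"]
    return ["taskkill", "/IM", image_name, "/F"]


def end_process(pid=None, image_name=None, run=subprocess.run):
    """Run taskkill and report whether the case needs escalation."""
    result = run(taskkill_command(pid, image_name), capture_output=True, text=True)
    return "ended" if result.returncode == 0 else "escalate to Level 2"
```

Targeting the PID is the safer choice, because an image name can match several processes at once.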
6) Check CPU and Memory usage
CPU and memory usage can be checked through Task Manager.
Open Task Manager and switch to the Performance tab.
CPU and memory usage vary all the time, so watch them continuously for 1 or 2 minutes
to get a sense of the average usage.
Note that SQL servers normally use a very high amount of CPU and memory, so what needs to be
checked is the performance of the server itself.
The time it takes to launch Task Manager, or even to complete the logon process, is a
good indicator of server performance.
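Watching usage for 1 or 2 minutes amounts to averaging a series of samples, which can be sketched as follows. The function name, the 5-second interval and the read_percent callable are all hypothetical; read_percent stands for whatever source of CPU or memory percentages is available on the server.

```python
import time


def average_usage(read_percent, duration_s=120, interval_s=5, sleep=time.sleep):
    """Average a usage percentage sampled every interval_s for duration_s seconds."""
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration_s and interval_s must be positive")
    samples = []
    elapsed = 0
    while elapsed < duration_s:
        samples.append(float(read_percent()))  # one reading per interval
        sleep(interval_s)
        elapsed += interval_s
    return sum(samples) / len(samples)
```

A single high reading says little; the average over the whole window is what shows sustained load.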
7) Checking MS Cluster status
Open the Cluster Manager GUI from
Start > All Programs > Administrative Tools > Failover Cluster Management (Windows 2008)
On W2K3 it is called Cluster Administrator.
On Windows 2003, once it is open you need to connect to the cluster; since you are logged on to a
cluster node, you can enter a . (period) as the cluster name.
Once connected, you will see the cluster name as the header of the tree view (left panel).
Here you can check the status of the cluster groups and which node each group is active on.
Each group can have several resources; select a group to check the status of its resources.
A group or resource that is in the process of failing over will show one of the following statuses:
“Offline pending”, “Offline”, “Online pending”, “Online”
If a group or resource shows a “Faulted” status, escalate to Level 2 support.
To move or fail over a group from one node to another, right-click the group and select
Move Group
and then accept the warning message.
NOTE: A failover is an outage and there will be impact during the failover.
On Windows 2008 the GUI is slightly different, but the process is the same.
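The status check above can be sketched as a simple triage function. The function name and the suggested actions are hypothetical; the status names are the ones quoted in this section. Treating a plain “Offline” as part of a failover in progress is an assumption, since a group can also be offline on purpose.

```python
def triage(status):
    """Map a group or resource status to the Level 1 action for it."""
    if status == "Faulted":
        return "escalate to Level 2"
    if status in ("Offline pending", "Online pending", "Offline"):
        # Intermediate states seen while a group moves between nodes.
        return "failover in progress: refresh and check again"
    if status == "Online":
        return "ok"
    return "unknown status: check the console"
```

Applied to every group in the tree view, this separates the groups that only need a refresh from the ones that need escalation.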
8) Checking Veritas Cluster
Veritas clusters are very similar to MS clusters; however, the GUI is different.
Go to All Programs > Symantec > Veritas Cluster Server > Veritas Cluster Manager – Java Console
Connect to the server in question, either by clicking on the server if it is already listed or by
selecting File > New Cluster.
Type in the name of the primary server of the cluster you want to check, and use the default port
listed (14141).
Once you log on, the Cluster Administrator window is shown.
Here you can see the status of the groups, nodes, resources and services.
In this example, the STL group is in faulted status on both nodes (APAPPP12 and APAPPP14).
To bring the STL group up again, follow these steps:
First, clear the fault: right-click the STL cluster > Clear Fault > Auto
Then, to bring Staff Manager back up, right-click the cluster again > Online > Any System
Here you can also select a specific node on which to bring the group online. Review the ESL and
MRC Sites notes to check whether the cluster or group has any special requirement.
To restart a service in a particular group, follow this process:
Click the plus (+) sign next to the name of the group that owns the service to be restarted.
Click the plus (+) sign next to “Generic Service” to expand the service details.
Right-click the service you want to restart and select Offline, then click the node where you
want the service to stop.
Wait a minute, or as long as the service needs to go offline. You can see the current status of the
service on the nodes.
Once the service is offline, repeat the process, click Online, and then select the node where you
want the service to go online.
Note that as a best practice, and in our account especially, all of the services and resources of a
group run on the same node.
9) Log off users from Citrix
The fastest and easiest way to log off users from Citrix is through the Delivery Service Console.
Log on to the AA Citrix portal at me.citrix.aa.com (Sabre VPN) or prod.citrix.aa.com (HP VPN) using
your CORPAA ID and password.
Here you will see the published apps and folders that you have access to.
Go to Administration > Citrix Consoles
Launch XA 5 – Prod Delivery Service Console and XA 6 – Prod Delivery Service Console
Once the console opens, if no window is shown, select Action > Configure and run discovery.
Then remove any servers that are listed and click Add Local Computer.
Then click Next until the discovery process ends.
Then expand XenApp and you will be able to see all of the published apps and Citrix servers for the
XA 6 farm.
On the XA5 farm you have to click AA_Citrix.
On the right panel, click the down arrow to switch the view to users.
Search for the user, right-click the user ID and click logoff user.
10) Cargonet Procedure
CARGONET Procedures.docx
11) AA Staffing Automation procedures
AA STAFF PROCEDURES.docx