M821 - Operations Production Support Checklist.

Information and Technology Services
Project Management Methodology
TIO’s Production Support Checklist – DRAFT
Comments or questions?
This document is intended for use as an information guideline to TIO Project Leads.
TIO Project Management – Production Support Checklist
1.0 Pre-project Items

1.1 Production Support Item: Is there any preparation needed for new hardware?
Description: If the new system requires new hardware, Operations will need to plan for space, power, and cooling requirements. This should be built into the early stages of the project plan.
N/A:
Comments/Notes:

1.2 Production Support Item: What type of authentication is used for the system?
Description: Most systems require users to prove who they are (authenticate). You should determine what type of authentication will be used with your system, and then add steps to your plan to ensure this work occurs appropriately.
N/A:
Comments/Notes:
2.0 Information Sharing

2.1 Production Support Item: Is any documentation needed for Operations?
Description: Operators often need to perform basic troubleshooting or maintenance for systems. For example, they need to be able to check the health of a system component and restart it if necessary. For them to do this effectively and consistently, documentation is needed that provides a step-by-step approach for Operators to follow. Documentation should also include a complete listing of error messages and the response that operators should take in each situation. Documentation is especially helpful during roll-out phases, or when new staff join the team. Documentation is written by EAS technical writers (James Malayang is the main contact), with TIO technical staff providing content and review.
N/A:
Comments/Notes:

2.2 Production Support Item: Is any training needed for Operations?
Description: When new systems are introduced into the Data Center, it often makes sense to provide organized training for Operators. This is particularly true if the system is significantly different from our other systems, or if we expect Operators to respond to phone calls from various technical or business staff. Sometimes hands-on training is necessary to show Operators how to perform their duties, and sometimes higher-level training is appropriate just to give them an understanding of the new system.
N/A:
Comments/Notes:

2.3 Production Support Item: Is any documentation needed for technical staff?
Description: Technical staff members on your team (or on other teams) often need documentation if they have OCCB or other support responsibilities for a new system or component. Documentation may help them know where to find things, provide them with tips on troubleshooting, or give them step-by-step directions on maintenance activities. Documentation is especially helpful during roll-out phases, or when new staff join the team. Documentation is written by EAS technical writers (James Malayang is the main contact), with TIO technical staff providing content and review.
N/A:
Comments/Notes:

2.4 Production Support Item: Is any training needed for technical staff?
Description: Sometimes it is useful to provide an organized training session to technical staff members on your team (or on other teams). This training can show them how to perform any maintenance or OCCB tasks for which they may be responsible.
N/A:
Comments/Notes:

Last Updated: 2/8/2016
3.0 Scripts

3.1 Production Support Item: Are scripts needed for system monitoring?
Description: When an infrastructure component fails, operations staff should be notified of the problem. This usually involves writing one or more scripts that send a message to the Unicenter Console. (Note: this usually also involves documentation for operators on how to respond to each error message.)
N/A:
Comments/Notes:

3.2 Production Support Item: Are scripts needed for capacity planning?
Description: For most systems TIO needs to forecast growth and usage. Usually we do this by tracking data in the CPC database and regularly reviewing and analyzing the data. For data to be captured in the CPC database, scripts need to be written and scheduled.
N/A:
Comments/Notes:

3.3 Production Support Item: Are scripts needed for purging or archiving of any data or files?
Description: Most programs create log files, report files, or other types of output. Before a system goes live, TIO should make sure that a regular, systematic approach exists for controlling the size and/or number of these files (or of database tables). For example, AIG runs a script daily to purge SQR report files. The rules for the purging are documented and shared with others.
N/A:
Comments/Notes:

3.4 Production Support Item: Are scripts needed to automate start-up, shutdown, clean-up or other activities?
Description: If you expect operations staff (or other staff) to regularly perform an operation on the system, then it may make sense to write scripts to simplify the execution of these activities. For example, the Oracle DBAs have scripts to start up and shut down a database.
N/A:
Comments/Notes:
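A monitoring script of the kind item 3.1 describes might be structured as follows. This is a minimal sketch in Python, not TIO's actual monitoring code. The component tuples, the TCP port probe, and the `notify` callback are all assumptions made for illustration. The step that forwards a message to the Unicenter Console is deliberately left as a pluggable function, because that interface is not described in this checklist.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Hypothetical component description: (name, host, port).
Component = Tuple[str, str, int]


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_components(
    components: Iterable[Component],
    notify: Callable[[str], None],
    probe: Callable[[str, int], bool] = is_listening,
) -> List[str]:
    """Probe each component and pass one message per failure to notify().

    notify is an assumed hook: in production it would forward the text
    to whatever console or paging system Operations watches.
    """
    failed = []
    for name, host, port in components:
        if not probe(host, port):
            failed.append(name)
            notify(f"ALERT {name}: no response on {host}:{port}")
    return failed
```

Keeping the probe and the notifier as parameters means the same script can be exercised in a test environment with stand-in functions before it is pointed at real systems. Each distinct alert text should also appear in the operator documentation with the expected response.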
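For item 3.2, a scheduled capture script could look roughly like this. It is a sketch under stated assumptions: the schema of the CPC database is not given in this checklist, so the `capacity_sample` table and its columns are hypothetical, and SQLite stands in for the real database so that the example is self-contained.

```python
import shutil
import sqlite3
from datetime import datetime, timezone


def record_disk_usage(conn: sqlite3.Connection, system: str, path: str = "/") -> None:
    """Insert one timestamped disk-usage sample for the given path.

    The table name and columns are illustrative, not the CPC schema.
    """
    usage = shutil.disk_usage(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS capacity_sample ("
        "sampled_at TEXT, system TEXT, path TEXT, "
        "total_bytes INTEGER, used_bytes INTEGER)"
    )
    conn.execute(
        "INSERT INTO capacity_sample VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), system, path, usage.total, usage.used),
    )
    conn.commit()
```

Run on a fixed schedule, a script like this produces the regular samples that growth forecasts are built from. Disk usage is only one example metric.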
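The purge rule in item 3.3 can be expressed as a short script. The following is an illustrative sketch only, not the AIG purge script mentioned above. The directory, file pattern, and retention period are assumed parameters, and the dry-run default is a design choice of this example.

```python
import time
from pathlib import Path
from typing import List


def purge_old_files(
    directory: str, pattern: str, max_age_days: float, dry_run: bool = True
) -> List[Path]:
    """Delete files matching pattern that were last modified more than
    max_age_days ago.

    Returns the files that were deleted or, in dry-run mode, would be.
    """
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    purged = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            purged.append(path)
    return purged
```

Defaulting to a dry run lets the purge rules be reviewed and documented against real directories before anything is deleted.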
4.0 Availability and Support

4.1 Production Support Item: What is the class of the system (A or B)?
Description: TIO classifies systems as “Class A” or “Class B.” Class A systems receive support around the clock as soon as it is needed. Class B systems may have some work done around the clock, but most work can be deferred until the next business day (at 7:30 AM). Usually production systems are Class A and non-production systems are Class B, but this varies from system to system. The appropriate support level for any new system needs to be defined, and the Operations “Class A” list needs to be updated accordingly.
N/A:
Comments/Notes:

4.2 Production Support Item: What are the hours of availability for the new service?
Description: The exact hours of availability need to be determined. This includes identifying maintenance windows and backup windows. Once determined, hours of availability should be shared with Rob Schweitzer so that the relevant reports can be calculated and updated. Hours of availability also need to be published to Operations and other TIO staff.
N/A:
Comments/Notes:

4.3 Production Support Item: What are the backup/recovery requirements for the new service?
Description: Backup/recovery planning is an important part of the roll-out of any new system. Backups should be discussed and planned with business stakeholders. Expectations and requirements for data loss and recovery should be understood. Exact backup schedules should be agreed upon. Database transaction logging should be discussed, and application requirements for point-in-time recovery should be understood.
N/A:
Comments/Notes:

4.4 Production Support Item: Do you need to test the complete recovery of the system?
Description: Before a system goes live, it may be appropriate to simulate a major hardware and/or software problem and restore the system from tape. This hasn’t always been done in the past, but it is a good practice. Not only will it validate that you have documented the recovery procedures clearly, it will also allow you to measure how long recovery may take, and may give you ideas on how to streamline the process before a real production problem occurs.
N/A:
Comments/Notes:
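Once the hours of availability and maintenance windows from item 4.2 are fixed, an availability figure can be computed against them. The formula below is an assumption for illustration, since this checklist does not say how TIO's availability reports are calculated. It treats planned maintenance as outside the scheduled hours and counts only unplanned outages against the system.

```python
def availability_percent(
    period_hours: float, maintenance_hours: float, outage_hours: float
) -> float:
    """Percentage of scheduled hours during which the service was up.

    scheduled = period_hours - maintenance_hours (planned windows excluded)
    availability = (scheduled - outage_hours) / scheduled * 100
    """
    scheduled = period_hours - maintenance_hours
    if scheduled <= 0:
        raise ValueError("maintenance must be shorter than the reporting period")
    if not 0 <= outage_hours <= scheduled:
        raise ValueError("outage hours must fall within the scheduled hours")
    return 100.0 * (scheduled - outage_hours) / scheduled
```

For a 30-day month (720 hours) with 8 hours of planned maintenance and 3.56 hours of unplanned outage, the scheduled time is 712 hours and the result is about 99.5 percent. Whether backup windows count as scheduled time is a policy decision that should be settled when the hours are published.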
4.5 Production Support Item: What are the Disaster Recovery requirements for this new service?
Description: If a system must be recovered in a 2-5 day timeframe to ensure business operations, then recovery plans must be made for our Sungard hot site. If the system can be recovered in a much longer timeframe (2-4 weeks), then no hot site requirements exist. No matter what recovery strategy is used, specific recovery plans should exist and should be stored as part of your team's disaster recovery plan.
N/A:
Comments/Notes:
5.0 Security

5.1 Production Support Item: Do you need to work with Access Services on management of any special administrator accounts?
Description: It is common for tools to have one or more special accounts for administration of the system. For example, PeopleSoft has PSoft, SysAdm, VP1, and PS. Other programs have special batch accounts or other accounts for processing. These accounts may need to be administered by Access Services, which includes changing passwords on a regular schedule and notifying appropriate personnel.
N/A:
Comments/Notes:

5.2 Production Support Item: Are any interfaces needed, or protocols used, which may require firewall or other security infrastructure changes?
Description: New systems often introduce new protocols or interfaces which require data transfer between this system and other systems. Adjustments to firewall and other security policies may be necessary to ensure your system operates correctly.
N/A:
Comments/Notes:

5.3 Production Support Item: Is transaction auditing appropriate for the system?
Description: If the system is used for database transactions, it may be appropriate to capture details of the transactions for security analysis. ITS is moving toward a standard of auditing transactions in enough detail to identify the username and IP address of the person who performed a transaction. Often this can only be accomplished if the appropriate transaction logs are created and synchronized.
N/A:
Comments/Notes:

5.4 Production Support Item: Does the system introduce any new security risks or changes which must be minimized?
Description: Each system has its own unique characteristics. You should assess factors such as who will use the system, what data resides in the system, and which infrastructure components the system uses, to determine whether any unique or new security risks exist. If they do, then you should plan to work with Security and Network Services to help mitigate or eliminate the risk.
N/A:
Comments/Notes:
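The auditing standard described in item 5.3 implies that every audit entry carries at least a timestamp, a username, and an IP address. The sketch below shows one possible record layout in Python. The field names, the JSON format, and the `audit` logger name are assumptions for illustration, not an ITS standard.

```python
import ipaddress
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("audit")  # hypothetical logger name


def audit_record(username: str, ip_address: str, action: str) -> str:
    """Build one JSON audit line with a UTC timestamp, username and IP.

    Raises ValueError if ip_address is not a valid IPv4 or IPv6 address.
    """
    ip = ipaddress.ip_address(ip_address)  # validates and normalizes
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "username": username,
            "ip_address": str(ip),
            "action": action,
        },
        sort_keys=True,
    )


def log_transaction(username: str, ip_address: str, action: str) -> str:
    """Write the audit line to the audit logger and return it."""
    line = audit_record(username, ip_address, action)
    audit_log.info(line)
    return line
```

Recording timestamps in UTC is one way to make logs from different components easier to line up, which supports the synchronization requirement mentioned in the description.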
6.0 Data Capture and Reporting

6.1 Production Support Item: Are there any logging requirements for this system?
Description: Most systems provide some type of logging capability. It is important to anticipate the types of questions ITS will need to ask about the system, and to capture the appropriate logs. Logs should be able to support basic troubleshooting. In the past, logs have been used to determine characteristics of our user community (e.g. what browsers they use), or details of a particular problem (e.g. the exact time of a transaction). A schedule for purging and archiving of logs should also be determined.
N/A:
Comments/Notes:

6.2 Production Support Item: Are there any usage statistics or other data that you should be including in the TIO Monthly Report?
Description: The organization commonly needs to answer questions about utilization and user behavior. For example, how are users using the system? When are they using the system? How many concurrent users do we usually have, and what is the maximum number of users we see? Capturing this data and reporting it in the TIO Monthly Report can be helpful and can save you and the organization time and money later.
N/A:
Comments/Notes:

6.3 Production Support Item: Is a Service Level Agreement needed for the system?
Description: ITS usually develops SLAs for “fee for service” systems (e.g. ITCom’s Pinnacle System). The SLA details all aspects of support and also covers what fees, if any, the customer must pay. The SLA should also specify what types of service reporting are necessary.
N/A:
Comments/Notes:
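The concurrent-user question in item 6.2 can be answered from session logs if login and logout times are captured. The following is a minimal sketch, assuming each session is available as a (login, logout) pair of numeric timestamps. That input format is an assumption of this example, not something the checklist defines.

```python
from typing import Iterable, List, Tuple


def peak_concurrent_users(sessions: Iterable[Tuple[float, float]]) -> int:
    """Return the maximum number of sessions open at the same time.

    Each session is (login_time, logout_time). A logout at time t is
    processed before a login at the same t, so back-to-back sessions
    do not count as overlapping.
    """
    events: List[Tuple[float, int]] = []
    for login, logout in sessions:
        if logout < login:
            raise ValueError("logout precedes login")
        events.append((login, 1))
        events.append((logout, -1))
    events.sort()  # at equal times, -1 (logout) sorts before +1 (login)

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

Running this per day or per hour over the captured sessions gives the usual and maximum concurrency figures that the monthly report asks about.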