Information and Technology Services Project Management Methodology
TIO’s Production Support Checklist – DRAFT
TIO Project Management – Production Support Checklist
Last Updated: 2/8/2016

Comments or questions?
This document is intended for use as an information guideline to TIO Project Leads.


1.0 Pre-project Items

1.1 Production Support Item: Is there any preparation needed for new hardware?
N/A:
Description: If the new system will require new hardware, then Operations will need to plan for space, power and cooling requirements. This should be built into early stages of the project plan.
Comments/Notes:

1.2 Production Support Item: What type of authentication is used for the system?
N/A:
Description: Most systems require users to prove who they are (authenticate). You should determine what type of authentication will be used with your system, and then add steps to your plan to assure this work occurs appropriately.
Comments/Notes:


2.0 Information Sharing

2.1 Production Support Item: Is any documentation needed for Operations?
N/A:
Description: Operators often need to be able to perform basic troubleshooting or maintenance for systems. For example, they need to be able to check the health of a system component, and restart it if necessary. In order for them to do this effectively and consistently, documentation is needed which provides a step-by-step approach for Operators to follow. Documentation should also include a complete listing of error messages, and the response that operators should take in each situation. Documentation is especially helpful during roll-out phases, or when new staff join the team. Documentation is written by EAS technical writers (James Malayang is the main contact), with TIO technical staff providing content and review.
Comments/Notes:

2.2 Production Support Item: Is any training needed for Operations?
N/A:
Description: When new systems are introduced into the Data Center, it often makes sense to provide organized training for Operators. This is particularly true if the system is significantly different from our other systems, or if we expect Operators to respond to phone calls from various technical or business staff. Sometimes hands-on training is necessary to show Operators how to perform their duties, and sometimes higher-level training is appropriate to just give them an understanding of the new system.
Comments/Notes:

2.3 Production Support Item: Is any documentation needed for technical staff?
N/A:
Description: Technical staff members on your team (or on other teams) often need documentation if they have OCCB or other support responsibilities for a new system or component. Documentation may help them know where to find things, provide them tips on troubleshooting, or give them step-by-step directions on maintenance activities. Documentation is especially helpful during roll-out phases, or when new staff join the team. Documentation is written by EAS technical writers (James Malayang is the main contact), with TIO technical staff providing content and review.
Comments/Notes:

2.4 Production Support Item: Is any training needed for technical staff?
N/A:
Description: Sometimes it is useful to provide an organized training session to technical staff members on your team (or on other teams). This training can show them how to perform any maintenance or OCCB tasks for which they may be responsible.
Comments/Notes:


3.0 Scripts

3.1 Production Support Item: Are scripts needed for system monitoring?
N/A:
Description: When an infrastructure component fails, operations staff should be notified of the problem. This usually involves writing one or more scripts which send a message to the Unicenter Console. (NOTE - This also usually involves documentation for operators on how to respond to each error message.)
Comments/Notes:

3.2 Production Support Item: Are scripts needed for capacity planning?
N/A:
Description: For most systems TIO needs to forecast growth and usage. Usually we do this by tracking data in the CPC database and regularly reviewing and analyzing the data. In order for data to be captured in the CPC database, scripts need to be written and scheduled.
Comments/Notes:

3.3 Production Support Item: Are scripts needed for purging or archiving of any data or files?
N/A:
Description: Most programs create log files, report files, or other types of output. Before a system goes live, TIO should make sure that a regular, systematic approach exists for controlling the size and/or number of these files (or of database tables). For example, AIG runs a script daily to purge SQR report files. The rules for the purging are documented and shared with others.
Comments/Notes:

3.4 Production Support Item: Are scripts needed to automate start-up, shutdown, clean-up or other activities?
N/A:
Description: If you expect operations staff (or other staff) to regularly perform an operation on the system, then it may make sense to write scripts to simplify the execution of these activities. For example, the Oracle DBAs have scripts to start up and shut down a database.
Comments/Notes:


4.0 Availability and Support

4.1 Production Support Item: What is the class of the system (A or B)?
N/A:
Description: TIO classifies systems as “Class A” or “Class B.” Class A systems receive support around the clock as soon as it is needed. Class B systems may have some work done around the clock, but most work can be deferred until the next business day (at 7:30AM). Usually production systems are Class A, and non-production systems are Class B, but this varies from system to system. The appropriate support level for any new system needs to be defined, and the Operations “Class A” list needs to be updated appropriately.
Comments/Notes:

4.2 Production Support Item: What are the hours of availability for the new service?
N/A:
Description: The exact hours of availability need to be determined. This will include identifying maintenance windows and back-up windows. Once determined, hours of availability should be shared with Rob Schweitzer so that appropriate reports can be calculated and updated. Hours of availability also need to be published to Operations and other TIO staff.
Comments/Notes:

4.3 Production Support Item: What are the backup/recovery requirements for the new service?
N/A:
Description: Backup/recovery planning is an important part of the roll-out of any new system. Backups should be discussed and planned with business stakeholders. Expectations/requirements for data loss and recovery should be understood. Exact backup schedules should be agreed upon. Database transaction logging should be discussed, and application requirements for point-in-time recovery understood.
Comments/Notes:

4.4 Production Support Item: Do you need to test the complete recovery of the system?
N/A:
Description: Before a system goes live, it may be appropriate to simulate a major hardware and/or software problem and restore the system from tape. This hasn’t always been done in the past, but it is a good practice. Not only will it validate that you have documented the recovery procedures clearly, it will also allow you to measure how long recovery may take, and may give you ideas on how to streamline the process before a real production problem occurs.
Comments/Notes:

4.5 Production Support Item: What are the Disaster Recovery requirements for this new service?
N/A:
Description: If a system must be recovered in a 2-5 day timeframe to assure business operations, then recovery plans must be made for our Sungard hot site. If the system can be recovered in a much longer time frame (2-4 weeks), then no hot site requirements exist. No matter what recovery strategy is used, specific recovery plans should exist, and should be stored as part of your team’s disaster recovery plan.
Comments/Notes:


5.0 Security

5.1 Production Support Item: Do you need to work with Access Services on management of any special administrator accounts?
N/A:
Description: It is common for tools to have one or more special accounts for administration of the system. For example, PeopleSoft has PSoft, SysAdm, VP1, and PS. Other programs have special batch accounts or other accounts for processing. These accounts may need to be administered by Access Services, which includes changing passwords on a regular schedule and notifying appropriate personnel.
Comments/Notes:

5.2 Production Support Item: Are any interfaces needed, or protocols used, which may require firewall or other security infrastructure changes?
N/A:
Description: New systems often introduce new protocols or interfaces which require data transfer between this system and other systems. Adjustments to firewall and other security policies may be necessary to assure your system operates correctly.
Comments/Notes:

5.3 Production Support Item: Is transaction auditing appropriate for the system?
N/A:
Description: If the system is used for database transactions, it may be appropriate to capture details of the transactions for security analysis. ITS is moving toward a standard of auditing transactions in enough detail to be able to identify the username and IP address of the person who performed a transaction. Often this can only be accomplished if the appropriate transaction logs are created and synchronized.
Comments/Notes:

5.4 Production Support Item: Does the system introduce any new security risks or changes which must be minimized?
N/A:
Description: Each system has its own unique characteristics. You should assess factors such as who will use the system, what data resides in the system, and which infrastructure components are used by the system, to determine if any unique or new security risks exist. If they do, then you should plan to work with Security and Network Services to help mitigate or eliminate the risk.
Comments/Notes:


6.0 Data Capture and Reporting

6.1 Production Support Item: Are there any logging requirements for this system?
N/A:
Description: Most systems provide some type of logging capabilities. It is important to anticipate the types of questions ITS will need to ask about the system, and capture the appropriate logs. Logs should be able to support basic troubleshooting. In the past, logs have been used to determine characteristics about our user community (e.g. what browsers they use), or about a particular problem (e.g. what was the exact time of a transaction). A schedule for purging and archiving of logs should also be determined.
Comments/Notes:

6.2 Production Support Item: Are there any usage statistics or other data that you should be including in the TIO Monthly Report?
N/A:
Description: It is common that the organization needs to answer questions about utilization and user behavior. For example, how are users using the system? When are they using the system? How many concurrent users do we usually have, or what is the maximum number of users we see? Capturing this data and reporting it in the TIO Monthly Report can be helpful and save you and the organization time and money later.
Comments/Notes:

6.3 Production Support Item: Is a Service Level Agreement needed for the system?
N/A:
Description: ITS usually develops SLAs for “fee for service” systems (e.g. ITCom’s Pinnacle System). The SLA details all aspects of support, and also covers what fees, if any, the customer must pay. The SLA should also include details of what types of service reporting are necessary.
Comments/Notes:
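

Example: purge script sketch (relates to items 3.3 and 6.1)

Items 3.3 and 6.1 ask for a regular, documented approach to purging files and logs. The following is only a minimal sketch of what such a script might look like in Python; it is not taken from any existing TIO or AIG script. The function name purge_old_files, its parameters, and the age-based rule (delete files whose last-modified time is older than a retention period) are all assumptions made for illustration. A dry-run option is included so the purge rules can be reviewed before anything is deleted.

```python
import time
from pathlib import Path


def purge_old_files(directory, pattern, max_age_days, now=None, dry_run=False):
    """Remove files in `directory` matching `pattern` older than `max_age_days`.

    All names here are hypothetical. Returns the sorted list of paths that
    were removed (or, when dry_run is True, that would have been removed).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # retention period in seconds
    purged = []
    for path in sorted(Path(directory).glob(pattern)):
        # Only regular files are candidates; subdirectories are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            purged.append(path)
    return purged
```

A script like this would typically be scheduled to run daily, first with dry_run=True to confirm that the documented purge rules select the expected files. The directory, file pattern, and retention period would come from the purge rules agreed for the system.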