UC Davis Central Kuali Rice Service
Service Level Agreement
By: IET
Last Revised: 7-22-2010

Purpose
This document serves as an agreement between IET and any application sponsor integrating with the UC Davis Central Kuali Rice Service (hereafter referred to as the "Rice service"). The objective of this Service Level Agreement (SLA) is to present a clear, concise and measurable description of the Rice service.

Effective Date: (Date SLA is put into effect)

Version History
  Version   Date      Revision Description   Author
  1.0       5/14/10   First Draft            Hampton Sublett
  1.1       6/23/10                          Hampton Sublett
  1.2       7/15/10                          Hampton Sublett
  1.3       7/16/10                          Hampton Sublett
  1.4       7/22/10                          Hampton Sublett

Approval History
  Approver   Title   Approval Date

Agreement Termination (if necessary)
  Approver   Title   Approval Date

Table of Contents
1. General Overview
2. Service Description
   Service Scope
   Assumptions
3. Roles and Responsibilities
   Stakeholder Information
   IET Responsibilities
   Integrating Application Team Responsibilities
4. Service Enhancements and Minor Bug Requests
5. Service Availability, Response Times and Escalation Procedures
   Hours of Coverage, Response Times and Escalation Procedures
   Rice Team Coverage Periods
   "Blocker" Issue Type Notification Process
   "Non-Blocker" Issue/Enhancement Type Notification Process
   Response Times
   Prioritization
   Escalation of Issues
   Communication
   Planned Maintenance and Service Changes
   Release Schedule
   Advanced Notification/Negotiation
   Service Exceptions to Coverage
   Uptime Commitment
6. Reporting, Reviewing and Auditing
7. Risks
8. Signatures

1. General Overview
This is an SLA between IET and any application sponsor integrating with the Rice service. It is intended to document:
- The Rice service that IET provides to integrating applications
- The general levels of availability, response and maintenance associated with the service
- The responsibilities of IET and the integrating application team
- The process for requesting services or receiving support
This SLA shall remain valid until revised or terminated.

2. Service Description

Service Scope
- Rice Environments
  o Development: Each anchor application has a central Rice development environment, which includes a private application server and a private database schema.
  o Test-Integration: All anchor applications share a test-integration environment where integration issues can be tested in a shared environment.
  o QA-Integration: A shared QA environment used for integration and load testing.
  o Stage: Used for integrated deployment practice.
  o Production: The version that all production anchor applications use. While the other environments may have lower uptime levels, the Production system is expected to be available 24/7/365, with the exception of minimal planned and unplanned outages.
- Rice Support
  o Production: Production services managed by IET use Remedy to track all production-related issues.
  o Development: Development systems for services managed by IET use Jira to track all development-related bugs and enhancements.
- Rice Community
  o As members of the Rice community, integrating teams receive access to, and are expected to contribute back to, the following:
    - Developer Documentation
    - Service FAQs
    - Rice Mailing Lists

Assumptions
- Changes to the Rice service will be documented and communicated to all integrating application stakeholders as defined in the Roles and Responsibilities section of this document.
- The Rice service will be provided in adherence to any related policies, processes and procedures.

3. Roles and Responsibilities

Stakeholder Information

IET
- Rice Service Owner – Responsible for the overall Rice service
- Rice Service Manager – Responsible for day-to-day Rice service operations and enhancements
- Rice Architect – Responsible for ensuring that all technical changes are consistent with best practices and for ensuring technical compatibility between Rice and integrating systems

Contact Info
  Role                 Name               Desk Phone     Mobile Phone   Email
  Rice Service Owner   Deborah Lauriano   530-754-5990   530-219-4660   dalauriano@ucdavis.edu
  Rice Service Mgr     Hampton Sublett    530-754-6193   530-574-7758   hsublett@ucdavis.edu
  Rice Architect       Curtis Bray        530-754-6199   530-574-7794   clbray@ucdavis.edu

KFS
- KFS Operations Manager –
- KFS Project Manager –

Contact Info
  Role                 Name             Desk Phone     Mobile Phone   Email
  KFS Operations Mgr   Radhika Prabhu   530-754-6805                  Rprabhu@ucdavis.edu
  KFS Project Mgr

IET Responsibilities
- Meet the response times associated with the priority assigned to incidents and service requests
- Meet the system uptime target
- Generate quarterly reports on service level performance
- Provide appropriate notification to customers of all scheduled maintenance via the shared Integrated Calendar
- Provide an auditable migration path of Rice changes for all projects
- Adhere to campus policies (e.g. PPM 310-22, UC Davis Cyber-Safety Program; see Appendix A)

Integrating Application Team Responsibilities
- Meet agreed-upon Anchor Application or Rice release dates, or communicate schedule changes 90 days in advance
- Provide Anchor Application representative(s) to assist with resolution of service-related incidents
- Participate in the documented change control process
- Document specific service availability requirements on Confluence and work with the IET Rice Service Manager on additions or changes to established service levels (https://confluence.ucdavis.edu/confluence/x/VobTAQ)
- Adhere to campus policies (e.g. PPM 310-22, UC Davis Cyber-Safety Program; see Appendix A)

4. Service Enhancements and Minor Bug Requests
To request a non-Blocker modification to the Rice service, Rice customers should enter a ticket in Middleware's Jira instance. All requests will be planned and scheduled among Anchor Application Project Managers into future releases.

5. Service Availability, Response Times and Escalation Procedures
The Rice service has been architected to support multiple mission-critical campus systems. Because of these systems, the Rice service must have the highest availability possible so as not to impact the users of dependent systems. Service includes:
- 24/7/365 High Availability (HA) of the virtualization and SAN infrastructure, with monitoring
- Automatic failover in the event of hardware failure or resource contention
- Nightly whole-server guest host backups for Disaster Recovery (DR), retained for three days. These backups do not allow file-level recovery; systems are backed up with all files open and will recover in a "crashed" state.
- Network load balancing
- 24x7 support managed by the Campus Data Center operations team
- Security monitoring of all systems within the Campus Data Center to guard against intrusion and other service attacks
- A web site monitoring page, updated every five minutes, indicating the status of every system and service

Hours of Coverage, Response Times and Escalation Procedures

Rice Team Coverage Periods
(Turnaround times during Non-Business Hours are likely to be slower than during Business Hours.)
- Business Hours: 8AM-5PM, Mon-Fri
- Non-Business Hours: 5PM-8AM Mon-Fri, Saturdays, Sundays, holidays and campus closure days

"Blocker" Issue Type Notification Process
A Blocker issue, or "Major Incident," is an unplanned outage, service degradation, or emergency maintenance outside of planned maintenance that affects or could affect multiple users and/or a mission-critical service. Any downtime during the hours that a service is expected to be available is a major incident. A development system issue can be a "Blocker" if it is impacting a critical-path task on a project, although by default all production "Blocker" tickets will be addressed first.

Communication/Escalation Process
Regardless of the time of day, whoever identifies the issue should:
- Call IET Operations (530-752-1566)
  i. Operations will notify the appropriate support staff member and create a Remedy ticket with the highest priority (Urgent)

"Non-Blocker" Issue/Enhancement Type Notification Process
- Create a Jira ticket
  i. Assign it to the Rice Lead Developer (unless the correct resource is known)
  ii. Assign a priority
  iii. Assign a due date

Response Times
(Response means the Rice team has acknowledged ownership of the issue and has begun work to resolve it.)
- "Blocker" Issue Type: The Rice Production system is auto-monitored and will alarm when the system is unavailable. The SysAdmin will be paged and will begin triaging the issue within 15 minutes. For non-Production Blocker issues, once Operations is contacted, a SysAdmin will take ownership of the issue within 15 minutes. The issue will then be seen through to resolution regardless of the time of day.
- "Non-Blocker" Issue Types:
  o "Critical" Issue Type: The Rice team will assume ownership of the issue within 4 business hours. Work to resolve the issue will be the top priority until it is fixed, unless the due date allows the work to be scheduled (work occurs only during business hours).
  o "Major" Issue Type: The Rice team will assume ownership of the issue within 2 business days. Work will be planned and scheduled within a release (and updated within Jira).
  o "Minor" and "Trivial" Issue Types: No predefined response time. Items are unscheduled, reside in a "wish list" repository, and are discussed during specific release planning meetings.

Prioritization
- "Blocker" and "Critical" Issue Types: Use the stated priorities. If more than one ticket in either of these categories relies on the same resource, the Rice Service Manager will work with Anchor Application PMs to negotiate the priority order.
- All other issue types will be planned and scheduled among Anchor Application Project Managers into releases.

Escalation of Issues
Contact the Rice Service Manager, Rice Architect or Rice Service Owner.

Communication
- "Blocker" Issue Types
  o Rice Service Manager: Communicate multiple times a day with affected Anchor Application Project Managers (or other stakeholders as necessary, including the Tech Lead) via email, or by phone if necessary.
  o Anchor Application PM or Tech Lead: Confirm the fix ASAP and inform the Rice team.
- "Critical" Issue Types
  o Rice Service Manager: Send a summary email at the end of the business day describing the state of any open "Critical" issues.
  o Anchor Application PM or Tech Lead: Confirm the fix ASAP and inform the Rice team.
- "Non-Blocker" Issue Types
  o Communication within the "comments" field of Jira tickets is sufficient.

Planned Maintenance and Service Changes

Release Schedule (Post Version Compatibility)
The Rice service will follow the Rice Foundation Roadmap but will intentionally stay one release behind it (or two months after the release, whichever comes first).

Advanced Notification/Negotiation
The Rice Service Manager will notify Anchor Application PMs and Sponsors of releases 90 days in advance via email and by updating the "Rice and Anchor Applications Calendar of Events" schedule. Planned outages of other systems upon which Rice is dependent will need to be communicated and negotiated.

Service Exceptions to Coverage
Exceptions to the Rice schedule/processes are possible, but must be negotiated at least 30 days in advance (for example, Fiscal Year End close, when KFS needs 100% uptime).

Uptime Commitment
The target is 99.9% uptime. (The sketch below shows the downtime budget this target implies.)
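For reference, the short sketch below works out the downtime budget implied by a 99.9% uptime target. The figures are illustrative arithmetic only, not an additional commitment beyond the target stated above.

```python
# Downtime budget implied by a 99.9% uptime target (illustrative arithmetic only).
target = 0.999

hours_per_year = 24 * 365         # 8,760 hours in a non-leap year
minutes_per_month = 60 * 24 * 30  # 43,200 minutes in a 30-day month

annual_downtime_hours = (1 - target) * hours_per_year        # ~8.76 hours per year
monthly_downtime_minutes = (1 - target) * minutes_per_month  # ~43.2 minutes per month

print(f"Allowed downtime per year:  {annual_downtime_hours:.2f} hours")
print(f"Allowed downtime per month: {monthly_downtime_minutes:.1f} minutes")
```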
6. Reporting, Reviewing and Auditing
- Reporting: Status reports on system availability, including response times for High Priority tickets (Urgent/Blocker and High/Critical), as described in this document, will be produced on a monthly basis. These reports will be assembled by the Rice Service Manager.
- Reviewing: This document will be reviewed on an annual basis, and modified accordingly, by the UC Davis Kuali Rice Technical Workgroup and approved by the Rice Oversight Committee.
- Auditing: The processes employed for system availability will be audited internally on an annual basis by the UC Davis Kuali Rice Technical Workgroup.

7. Risks
Rice system dependencies are documented in the network diagram posted on Confluence: https://confluence.ucdavis.edu/confluence/x/qY-TAQ

8. Signatures
Rice Service Owner: _________________________________________ Date: ____________________
Anchor Application Sponsor: _________________________________ Date: ____________________

Appendix A: DCCS Security Best Practices Checklist

The UC Davis Cyber-safety Policy, UC Davis Security Standards Policy (PPM Section 310-22), has been officially adopted by the campus to define both responsibilities and key practices for assuring the integrity, availability and confidentiality of UC Davis computing systems and electronic data. The following best practices list includes the key practices from the UC Davis Security Standards Policy and the Cyber-safety Program, along with additional practices that are necessary for servers residing within DCCS. For more information on the policy, see http://manuals.ucdavis.edu/ppm/310/310-22a.htm. For more information on the Cyber-safety Program, see http://security.ucdavis.edu/cybersafety.cfm. This document will be updated as necessary to incorporate changes to security practices due to changes in security risks, technology, and campus policies.

DCCS Security Standards in Support of the Campus Cyber-safety Standards
Data Center and Client Services Security Best Practices 2009 Checklist

The following requirements for network devices on the DCCS network are based on policies that are audited by the Cyber-Safety Program and/or required by DCCS policy. Requirements are grouped below by Cyber-Safety 2009 policy reference category.

1. Software Vulnerabilities (Patching)
- All critical OS and application software patch updates must be applied within 7 days of release.
- Managers shall make the final decision about patching outside of the posted maintenance schedule.
2. Virus Infections and Anti-Spyware
- Anti-virus/anti-spyware software must be installed on Windows and Mac clients. Updates must be applied within 24 hours of release.
- Anti-virus/anti-spyware software must be configured to check for and apply updates daily.

3. Weak Authentication
- All default passwords must be modified on initial use.
- All user accounts must be password-protected or use SSH keys or a token.
- Standard accounts, such as the UCD LoginID used by system administrators, should not have privileged access.
- Group accounts must not be used when logging in to a host.
- Configure hosts to exclusively use Unix or Windows bastion host services for SSH or remote desktop access to hosts.
- If a staff member is no longer affiliated with the Data Center, all privileged-access passwords should be changed.
- Publicly accessible online applications must guard against brute-force attacks.
- All system and application passwords must be stored in the safe in Operations.
- All devices must be configured to "lock" and require the user to reauthenticate if left unattended for more than 20 minutes.
- Pass phrase strength shall provide a minimum of 6 x 10^19 combinations, or 65.9 bits of entropy (randomness), using the formula entropy = log2(b) * l, where b is the size of the character set and l is the length of the password. (A worked example of this calculation follows this category.)
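As a worked illustration of the entropy formula above, the sketch below computes the entropy of a 12-character password drawn from a 45-character set, which lands at roughly the 65.9-bit threshold. The character-set size and length are example values only, not DCCS-mandated figures.

```python
import math

# Pass phrase entropy per the checklist formula: entropy = log2(b) * l,
# where b is the character-set size and l is the password length.
def passphrase_entropy_bits(charset_size: int, length: int) -> float:
    return math.log2(charset_size) * length

# Example values only: a 12-character password drawn from a 45-character set.
bits = passphrase_entropy_bits(charset_size=45, length=12)
combinations = 45 ** 12

print(f"Entropy: {bits:.1f} bits")          # ~65.9 bits
print(f"Combinations: {combinations:.2e}")  # ~6.9e19, above the 6 x 10^19 minimum
```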
4. Insecure Personal Information
- Restricted data (e.g. SSNs, credit card numbers) must be removed from all computers where it is not required.
- The client must inform DCCS when a system houses or has access to restricted data.
- Computers storing restricted data must run Tripwire, must have a host-based firewall, and must log security events to the DCCS syslog server.

5. Firewall Services
- All firewall services are restrictively configured to deny all incoming traffic unless expressly permitted.
- Systems must run a host-based firewall. In addition, systems must be housed behind the DCCS firewall or a project-based firewall.

6. Unnecessary Computer Programs/Services
- Telnet and FTP should be disabled. SSH should be allowed only through the bastion hosts, except for the ISUN service.
- Insecure network services/processes must be disabled. If an insecure network service is required, use additional security measures to secure the service.

7. Backup, Recovery, and Disaster Planning
- All critical and sensitive university electronic communication records must be backed up on a regular and frequent basis to separate backup media.
- Backup media must be protected from unauthorized access and stored in a separate location from the source.
- Backup media must be tested on a regular basis to ensure recoverability.

8. Physical Security
- Physical security measures must be implemented to protect critical or sensitive university electronic communication records from theft.
- Portable storage devices must not be left unattended if publicly accessible.

9. Open Relay Email Proxies
- Hosts connected to the network must not permit open e-mail relaying.

10. Unrestricted Proxy Servers
- If a proxy server exists, users must authenticate to the server and meet the campus criteria to access campus-licensed intellectual property (e.g. online journals).

11. Audit Logs
- System logs must be configured to track system access.
- System logs must be sent to the Data Center syslog server.

Security Review (DCCS)
- The DCCS security team shall run yearly audits using Nessus. Results identified as critical by Nessus shall be resolved within 7 days.
- Password strength checking software, such as Crack, must be run yearly on all Data Center password files to check for the use of weak passwords.

12. Security Training
- A technical training program is documented for systems staff responsible for security administration.
- Campus staff handling critical or sensitive university electronic communication records will receive annual information security awareness training regarding policy and proper handling and controls: http://security.ucdavis.edu/presentations/ITSecurityTutorial.ppt

14. Release of Electronic Storage
- All data is removed from electronic storage prior to release or transfer of equipment.
- Data removal must be consistent with physical destruction of the electronic storage device, degaussing, or overwriting of the data at least 3 times.

16. Web Application Security Vulnerabilities
- Web applications developed or acquired by campus units must support secure coding practices.
- Web applications must mitigate the vulnerabilities described within the OWASP Top Ten Critical Web Application Security Vulnerabilities. (A brief illustration of one such mitigation appears after this checklist.)
- Use Watchfire AppScan Enterprise to analyze the security of your web sites. Sites with high visibility, sites that access restricted data, or sites that house publicly writable pages should be analyzed first.

Any system requiring an exception to any of the above must be documented with an explanation of why the exception is necessary.
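As a brief illustration of one OWASP Top Ten mitigation referenced in the checklist above, the sketch below contrasts string-built SQL with a parameterized query. It uses Python's standard sqlite3 module purely for illustration and makes no assumption about how any campus application is actually implemented.

```python
import sqlite3

# Illustrative in-memory database; table and data are example values only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.edu')")

def find_user(name: str):
    # Unsafe pattern (SQL injection risk): user input concatenated into the statement, e.g.
    #   conn.execute("SELECT email FROM users WHERE name = '" + name + "'")
    # Safe pattern: parameterized query; the driver treats the value as data, not SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()

print(find_user("alice"))         # [('alice@example.edu',)]
print(find_user("x' OR '1'='1"))  # [] -- the injection attempt matches nothing
```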
Procedures to follow in the event of a security incident:

REPORTING A SECURITY INCIDENT
(https://confluence.ucdavis.edu/confluence/display/IETSP/Security+Incident+Response+Plan+At-A-Glance)

IET staff should not attempt to repair problems with their own or their co-workers' systems. In the event of a problem, immediately contact your designated desktop support person. If you are not sure who your desktop support person is, ask your supervisor.

When reporting a security incident, provide as much information as possible, including:
- The date and time you discovered the incident
- A general description of the incident
- The system or data at risk, including the known or suspected presence of personal identifying information on the affected computer
- Any actions you have taken since you discovered the incident
- Your contact information