April 8, 2015

XSEDE Operations

Patricia Kovatch, Victor Hazlewood, Justin Whitt, Randy Butler, Chris Jordan, Stephen McNally, Steve Quinn, Troy Baer, Linda Winkler


XSEDE Operations

• Improve user productivity through enhanced
  – Ease of use
  – Reliability
  – Quality assurance
• Track metrics to gauge our success and continually improve


Operations (1.5 FTE)
  – Patricia Kovatch (.5), Victor Hazlewood (.5), Justin Whitt (.5), NICS

Software Support – 3.25 FTEs
  – Troy Baer, NICS (.5)
  – Stuart Martin, GRAM, UChicago (.25)
  – Raj Kettimuthu, GridFTP, UChicago (.25)
  – Tom Howe, Registry, UChicago (.5)
  – PSC (1.25)
  – TACC (.5)

Networking – 3.25 FTEs
  – Linda Winkler, UChicago (.25)
  – Paul Wefel, NCSA (.25)
  – Matt Ezell, NICS (1)
  – Kathy Benninger, PSC (.5)
  – Chris Rapier (.25)
  – Joe Lappa, PSC (.5)
  – William Jones, TACC (.5)

Accounting and Account Management – 1.5 FTEs
  – Steve Quinn, NCSA (.5)
  – Ester Soriano, NCSA (.75)
  – Ed Hanna, PSC (.25)

Systems Operational Support – 12 FTEs
  – Stephen McNally, NICS (.5)
  – Mike Lowe, IU (1)
  – Justin Miller, IU (1)
  – Nada Cagle, NCSA (1)
  – Mark Fredericksen, NCSA (1)
  – Mike Pingleton, NCSA (1)
  – Frank Wells, NCSA (1)
  – Rolf Wilson, NCSA (1)
  – Tom Johnson, IU (.5)
  – Dave Lifka, Cornell (.25)
  – Tim Bouvet, NCSA (.25)
  – Wayne Louis Hoyenga, NCSA (.25)
  – Rick Mohr, NICS (.5)
  – Dave Carver, TACC (.75)
  – Leo Carson, SDSC (.5)
  – Shava Smallen, SDSC (.5)
  – Tom Howe, IaaS/SaaS, UChicago (.5)
  – Byron Gill, PSC (.1)
  – Anjana Kar, PSC (.2)
  – Kevin Sullivan, PSC (.1)
  – Jared Yanovich, PSC (.1)

Security – 4.25 FTEs
  – Randy Butler, NCSA (.25)
  – Jim Marsteller, PSC (.5)
  – Adam Fest, PSC (.5)
  – Nathaniel Mendoza, TACC (.75)
  – Victor Hazlewood, NICS (.5)
  – Ryan Braby, NICS (.5)
  – James Barlow, NCSA (1)
  – Jim Basney, NCSA (.25)

Data Services – 2.25 FTEs
  – Chris Jordan, TACC (.25)
  – Jack Kordas, UChicago (.5)
  – Chad Kerner, NCSA (.25)
  – Rick Mohr, NICS (.5)
  – Josephine Palencia, PSC (.5)
  – Tomislav Urban, TACC (.25)


Deliverables and Goals

1. Security
   Deploy XSEDE Certificate Authority, deploy two factor authentication service, federate two factor authentication with BW, perform campus bridging with InCommon, provide security auditing services for XSEDE connected hosts, coordinate resource intrusion events
2. Data Services
   Deploy XSEDE-wide parallel file system, coordinate data movement and management services, and develop a framework for distributed archival replication
3. Networking
   Facilitate end-to-end performance for users, transition to XSEDEnet, peer with R&E network
4. Software Support
   Deploy and perform acceptance testing of new capabilities and services into the production XSEDE environment, provide feedback to developers
5. Accounting and Account Management
   Maintain current TG automatic distributed accounting and account management service, streamline account creation process, improve user access to stats
6. Systems Operational Support
   Provide frontline user support, systems administration for all centralized XSEDE services and monitoring through the 24x7 XSEDE Operations Center


XSEDE Services

Service | Primary Location | Replication Location
Account and allocation management, usage reporting database servers, and the XD Central Database (XDCDB) | SDSC | PSC
User allocation online request web and database servers (POPS) | NCSA | PSC
XD user portal, collaboration and social networking servers | TACC | NICS
User ticket system database and servers | TACC | NICS
24x7 computing and networking operations servers and displays for monitoring | NCSA | IU
Website, documentation and document repository servers | TACC | NICS
User news mailing list and email servers | TACC | NICS
Online tutorials with CI-tutor and Virtual Workshop servers | NCSA | PSC
xsede.org DNS | NCSA | TACC
Grid identity management including Certificate Authority, Public Key Infrastructure, MyProxy servers | NCSA | PSC
Two factor authentication servers | NICS | PSC
Two factor authentication token | NICS | PSC
Inter-SP area parallel file system servers and disk | NICS | Each SP as appropriate
Initial archive replication service | NICS | TACC
XD cross site security logging aggregation service | PSC | NICS
Grid Interface Unit and Resource Namespace Servers (RNS) | Every SP | N/A
Grid services monitoring servers | PSC | TACC
Knowledge Base | IU | N/A
VM hosting | IU | N/A


Operational Metrics

• Cybersecurity
  – Security events, logins and login types, security items deployed, security awareness training events
• Data management and coordination
  – Wide area parallel file system usage and uptime
• Networking
  – Network uptime and usage
• Software maintenance and coordination
  – Software deployment issues and resolution
• Accounting and account management
  – Account creation time for PI and non-PI (Goal: decrease account creation time to within five business days)
• System operational support
  – Deliver 95% uptime on critical centralized services
  – Respond meaningfully to all tickets within 24 hours
  – Close 80% of all tickets within two business days


Review of activities to July 1

1.1.3.1 Deploy grid middleware infrastructure
1.1.3.3 Deploy account management software
1.1.3.4 Deploy information services infrastructure
1.1.3.5 Deploy common user environment
1.1.3.6 Deploy system of systems test environment
1.1.4.2 Deploy XSEDE website servers
1.2.1.1 Coordinate XSEDE security incident response
1.2.4.1 Test XSEDE software
1.2.6.1 Set up XSEDE Operations Center
1.2.3.1 Transition to XSEDEnet
1.3.2.1 Set up and populate XSEDE.ORG DNS
1.2.6.5 Migrate AMIE to standalone server off of XDCDB at both primary and secondary
1.2.6.5 Upgrade XDCDB hardware at SDSC
1.3.2.1 Deploy XSEDE User Portal (XUP) servers


Preview of year 1 activities

1.1.3.2 Deploy data management software
1.2.1.1 Deploy XSEDE Certificate Authority (CA)
1.2.1.2 Develop security awareness program
1.2.1.3 Deploy security authentication program
1.2.1.4 Deploy security tools
1.2.1.5 Deploy security infrastructure
1.2.1.6 Deploy InCommon authentication service
1.2.2.1 Deploy global parallel file system
1.2.2.2 Design archival replication framework

Ongoing

1.2.3.1 Maintain and monitor XSEDEnet
1.2.3.2 Tune end-to-end performance
1.2.4.1/2 Test and deploy XSEDE software
1.2.5.1 Maintain accounting and account management databases
1.2.5.2 Provide usage reports
1.2.6.1 Provide frontline user support through the 24x7 XSEDE Operations Center (XOC)
1.2.6.2 Deploy and support XSEDE system infrastructure
1.2.6.3 Support deploy security tools/infrastructure
1.2.6.4 Report operational metrics (yearly)


DNS transition plan

• Ops Networking leading the DNS transition
• xsede.org primary service moving to NCSA, backup at TACC
• Delegation of {site}.xsede.org to sites
• XSEDE staff should review DNS needs
  – Determine teragrid.org entries to duplicate
  – Determine new xsede.org entries
  – Review and coordinate with XSEDE L3 manager
• XSEDE L3 Manager or delegate submits DNS requests in TG help ticket
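
The DNS review step above (deciding which teragrid.org entries to duplicate and which names fall under a delegated {site}.xsede.org zone) could be sketched roughly as follows. This is only an illustration, not the Ops Networking procedure: the site labels in SITES, the example hostnames, and the helper names plan_entry and build_plan are all hypothetical.

```python
# Hypothetical site labels for delegated {site}.xsede.org zones.
# The slides name the sites, but not the DNS labels they would use.
SITES = {"ncsa", "tacc", "psc"}


def plan_entry(old_name, old_zone="teragrid.org", new_zone="xsede.org"):
    """Map one old-zone hostname to a proposed new-zone name and its owner."""
    old_name = old_name.rstrip(".").lower()
    suffix = "." + old_zone
    if not old_name.endswith(suffix):
        raise ValueError(f"{old_name} is not in {old_zone}")
    label = old_name[: -len(suffix)]
    new_name = f"{label}.{new_zone}"
    # The label directly under the zone decides whether a site owns the name.
    top = label.split(".")[-1]
    owner = f"site:{top}" if top in SITES else "central"
    return new_name, owner


def build_plan(old_names):
    """Return {old_name: (new_name, owner)} for a list of old-zone hostnames."""
    return {name: plan_entry(name) for name in old_names}


if __name__ == "__main__":
    # Made-up hostnames, used only to exercise the sketch.
    for old, (new, owner) in build_plan(
        ["portal.teragrid.org", "login.ncsa.teragrid.org"]
    ).items():
        print(f"{old} -> {new} ({owner})")
```

Entries marked "central" would go to the L3 manager for the xsede.org zone; entries marked with a site would be handled by that site once its subdomain is delegated.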