Emerging HIPAA and Protected Data Requirements for Research Computing at SDSC
Ron Hawkins, Director of Industry Relations / TSCC Program Manager
April 23, 2014
SAN DIEGO SUPERCOMPUTER CENTER

Objectives (for participation)
• Understand requirements for protected data processing on HPC systems
• Develop a roadmap for implementation at UCSD
• Focus on "services," not "projects"
• Understand how technology can be used to implement protected data environments
• Contribute to understanding, solutions, and best practices across the community

What we are being asked for…
• dbGaP
  – Database of Genotypes and Phenotypes: human genomic study data administered by NIH
  – Users must apply for access and must comply with the dbGaP Code of Conduct and "Security Best Practices" document
  – Bottom line: don't put the data on the Internet
• HIPAA
  – If you have to ask…

SDSC Roles/Functions/Services
• Operate national HPC systems under the XSEDE program (Trestles, Gordon, Comet)
• Operate a hybrid "hotel/condo" computing cluster (TSCC) for UC researchers
• Operate a co-location facility for UC campuses
• Operate several storage chargeback facilities ("Project", "Cloud", Commvault)
• Conduct sponsored research and operate various individual projects
• Work with the biotech industry and external research institutes

Campus Overview
[Network diagram: SDSC connects to the 10GbE campus network, with 40GbE to CENIC (100GbE late 2014), linking the Scripps Translational Science Institute, Salk Institute, Moores Cancer Center, UCSD School of Medicine, and J. Craig Venter Institute.]

SDSC Data Center
[Floor-plan diagram: 5,000 sq. ft. and 12,000 sq. ft. machine rooms.]
[Floor-plan labels: caged co-lo space, storage (multiple systems), 10GbE network fabric, co-located Comet (late 2014), Trestles, Gordon, TSCC, and the CGHub cage (MSKCC, ANNAI).]

TSCC & Project Storage Use Case
[Diagram: a campus lab's server and users NFS-mount "Project" storage in the SDSC data center; the same storage is NFS-mounted by TSCC and exported to other shares on and off campus.]

HIPAA PROJECTS/EXPERIENCE

Medicaid Integrity Group Data Engine
• The Center for Program Integrity's Medicaid CI Platform
• FISMA-certified, HIPAA-compliant CMS System of Record
• Built in 2008/2009; operations and maintenance 2009–2016
• 10+ years of Medicaid claims and reference data (~100 TB)
• 26 families of security controls; over 200 controls and sub-controls
• Implements NIST SP 800-53 and CMS ARS requirements
• Data warehouse, analysis, BI, and case management tools
• 350+ users (CPI, CMS contractors, CMCS, OIG, DOJ, and others)
• 100+ concurrent users, 500+ algorithms, 4,000+ daily queries
• Connections to CMS networks and data transfer capabilities

Sherlock Cloud
• Infrastructure as a Service (IaaS), including compliance of the entire software architecture and management processes
• Meets federal "Cloud First" requirements and flexibility goals
• Maintains the security and oversight of a traditional managed services model
• Common standards, reliability, and compliance methods provide economies of scale and a shared management knowledge base
• FISMA-certified, HIPAA-compliant, and more open (agile) environments separate projects and enforce appropriate compliance
• Undertaking FedRAMP Cloud Service Provider (CSP) certification, which is becoming a requirement in many government contracts and grants

Sherlock Cloud (continued)
• Suite of component cloud services:
  – Storage: file, block, database
  – Compute: full virtualization; support for Windows, Linux, and AIX
  – Shared services: backups, authentication, configuration management, ticketing, logging, high-speed file transfer, remote access, DNS, etc.
  – Security: project-customized firewalling, IDS, and monitoring
  – Networking: non-blocking 10Gb networking end to end
  – Disaster recovery: multi-site backup and failover capabilities
• Used by CMS, NIH, Calit2, UCSF, UCOP, and UCSD
• We evaluate potential clients and accept only partners committed to operating their environments securely

Protected Data on HPC
• Researchers value the HPC and storage services provided by SDSC
• The startup costs of dbGaP- or HIPAA-compliant "silos" are too high for most projects
• Some workarounds exist, but each has limits:
  – De-identified data
  – Obtaining consent and IRB approval for research use of human-subject data (but not PII)
• "Projects" lack economies of scale, on-demand service, and elasticity

What we are doing at present…
• Continuing to work with researchers on a project basis
• Continuing to evaluate and understand use cases
• Examining the feasibility of one or more pilot projects in FY 2014 (7/1/14–6/30/15), under the auspices of UCSD's "Research Cyberinfrastructure" program

How do we…
• Understand requirements and best practices for protected data processing in HPC?
• Develop a roadmap for implementation on our campus?
• Develop "services," not "projects"?
• Deploy technology to implement protected data environments on shared infrastructure?
• Contribute to understanding, solutions, and best practices across the community?

THANK YOU!
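Backup note: the "de-identified data" workaround mentioned under Protected Data on HPC is the one purely technical option above. A minimal sketch of the idea follows; the field names and the salted-hash linkage scheme are illustrative assumptions, not part of any SDSC service or a complete HIPAA Safe Harbor implementation (which covers 18 identifier categories, including dates and geographic detail).

```python
# Illustrative de-identification sketch (hypothetical field names).
# Direct identifiers are dropped; a salted hash replaces the medical
# record number so rows can still be linked within one project.
import hashlib

DIRECT_IDENTIFIERS = {"name", "address", "phone", "ssn", "mrn", "email"}

def deidentify(record, salt):
    """Return a copy of `record` with direct identifiers removed and a
    salted, truncated SHA-256 token standing in for the record key."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    token = hashlib.sha256((salt + record["mrn"]).encode()).hexdigest()[:16]
    clean["subject_id"] = token
    return clean

record = {"mrn": "12345", "name": "Jane Doe", "age": 54, "dx_code": "C50.9"}
out = deidentify(record, salt="project-secret")
print(sorted(out))  # ['age', 'dx_code', 'subject_id']
```

Keeping the salt secret to the project is what prevents re-identification by anyone who can enumerate record numbers; real deployments would also need to handle quasi-identifiers (age, dates, ZIP codes) per the Safe Harbor or Expert Determination rules.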