Research Data Storage Resources at IU

Anurag Shankar
University Information Technology Services
Indiana University
March 2, 2012

University Information Technology Services July 17, 2016

Outline

• Data Storage Use Cases - Types of research data and the storage they require
• Data Storage Services - Where/how to store your data
• Storage of HIPAA Regulated Data - Storing sensitive data
• Real World Examples - How people are using the storage services

Data Types and Desired Storage Characteristics

| Type of Data | Volume | Throughput | Access Speed | Criticality |
|---|---|---|---|---|
| Data being acquired | MB - TB | MB/s | Fast | High (not easy to reproduce) |
| Data in analysis | MB - TB | MB-GB/s | Very fast | Low - High (reproducible) |
| Data being published/shared | MB - GB | KB-MB/s | Moderate | Low (reproducible) |
| Archival data | MB - PB | MB-GB/s | Slow | High if not also stored elsewhere |

Research Data Storage Services

• Data Capacitor
• Research File System (RFS)
• Scholarly Data Archive (SDA)
• Research Database Complex (RDC)
• Alfresco Share
• REDCap
• Slashtmp

How IU’s Research Data Storage Services Fit Data Types

| Type of Data | Resource/Service | Space Available | Eligibility | Duration |
|---|---|---|---|---|
| Data being acquired | RFS, Data Capacitor | GB - 100s of TB | IU | Days - Months |
| Data in analysis | RFS, Data Capacitor on Big Red/Quarry | MB - TB | IU | Days - Months |
| Data being published/shared | Server disk, Alfresco Share, REDCap, Slashtmp | MB - GB | IU, PU, ND, outside users | Months - Years |
| Archival data | SDA | GB - PB | IU | Years |

Data Storage Services by Use

| Use | Service | Access | Backed Up? |
|---|---|---|---|
| High Performance Storage | Data Capacitor | File System on Big Red/Quarry | No |
| Storage for In-Work Data | RFS | Mapped Drive, Web, SFTP, OpenAFS client | Yes |
| Structured Data Storage | RDC (Oracle, MySQL) | Applications | Yes |
| Shared Document Storage | Alfresco Share | Web, WebDAV | Yes |
| Shared Storage for HIPAA data | REDCap | Web | Yes |
| Archival Storage | SDA | Web, Mapped Drive, SFTP, Parallel FTP | No |

| Service | Targeted For | Not Good For |
|---|---|---|
| RFS | Storing relatively small files that are updated and/or accessed frequently, need group access | Storing database files, backups |
| SDA | Storing large files or small files aggregated (zipped) into large files, long-term storage | Storing small files, files requiring frequent/quick access, in-work data |
| RDC | Relational databases | Storing unstructured data |
| Alfresco Share | Sharing Word, Excel, PDF, text files | Storing data |
| REDCap | Storing & sharing HIPAA data | General storage |
| Data Capacitor | Temporary data being read or written on Big Red/Quarry requiring the fastest speeds available | General storage |
| Slashtmp | Temporary space to exchange files too large to send as email attachments | General storage |

Storage Resource/Service Details

| Service | Technology | Capacity |
|---|---|---|
| RFS | OpenAFS | 60 TB |
| SDA | High Performance Storage System (HPSS) | 15 PB tape, 150 TB disk |
| RDC | Oracle, MySQL | 200 TB |
| Alfresco Share | Alfresco Share | 1 TB |
| REDCap | REDCap | 1 TB |
| Data Capacitor | Lustre | 360 TB |

Storage Resource/Service Details

| Service | Default Quota | Account Request |
|---|---|---|
| RFS | 100 GB | http://itaccounts.iu.edu |
| SDA | None | http://itaccounts.iu.edu |
| RDC | 10 GB | http://itaccounts.iu.edu |
| Alfresco Share | None | http://www.indianactsi.org/alfrescorequest |
| REDCap | None | http://www.indianactsi.org/redcapacr |
| Data Capacitor | None | Big Red/Quarry account |
| Slashtmp | 4 GB | No Slashtmp account needed, only IU login to use |

Storage Resource/Service Details

| Service | Web Access URL | More Help at |
|---|---|---|
| RFS | http://rfsweb.iu.edu | http://kb.iu.edu/aroz.html |
| SDA | http://www.sdarchive.iu.edu | http://kb.iu.edu/aiyi.html |
| RDC | Application specific | http://kb.iu.edu/awmv.html |
| Alfresco Share | http://alfresco.uits.iu.edu | http://www.indianactsi.org/kb/alfresco |
| REDCap | http://redcap.uits.iu.edu | http://www.indianactsi.org/kb/redcap |
| Data Capacitor | N/A (accessed from the Unix command line) | http://kb.iu.edu/data/avvh.html |
| Slashtmp | http://slashtmp.iu.edu | http://kb.iu.edu/data/angt.html |

Storage of HIPAA Regulated Data

• The HIPAA (Health Insurance Portability and Accountability Act) Security Rule regulates electronic protected health information (ePHI), i.e. identifiable patient information
• It mandates physical, administrative, and technical controls for storing ePHI

HIPAA Data …

• To support IU School of Medicine (IUSM) researchers, RT initiated a project in 2008 to align its systems and services with HIPAA
• The project was overseen by a committee consisting of IU’s compliance office, IT security and policy offices, the IUSM CIO, faculty, and IT staff
• Alignment included gap and risk analyses by an outside expert, filling gaps, and the creation of an ongoing risk management plan

HIPAA Data …

• In 2009, the compliance office approved RT as capable of handling ePHI
• As of Dec. 31, 2011, this has resulted in the following (starting from zero):
    • Number of biomedical user accounts on RT systems: 2800
    • Volume of biomedical data stored on RT systems: 500 TB
    • Use of computing cycles on RT supercomputers: 1 million SUs
    • Number of biomedical databases: 450
    • Number of new RT services developed specifically for biomedical researchers: 10
    • Number of major NIH grants we are written into: 5
    • Number of FTEs these grants have funded: 6

Real World Examples

• A research group in the IUSM Dept. of Radiology was running out of space in the department to archive digital X-ray images (100-200 MB/image). They were able to use the SDA to store tens of thousands of these images and now rely solely on the SDA as their image archive. They currently have over 10 TB of data stored.

Real World Examples …

• A research group needed several members to view the same collection of data in an application at the same time. They stored the data in a group area in RFS, mapped it to drive R: on their individual Windows desktops, and accessed it simultaneously from various campus locations as well as from home and while traveling (using VPN).

Real World Examples …

• The state of Indiana collected geospatial data during aerial survey flights over the state in 2005. Because of the size of the data set, no one in the state had the capacity to make these data available to the public. The SDA was used to store all 20 TB of orthoquads and currently serves them over the web (see http://gis.iu.edu).

Real World Examples …

• A researcher in the School of Library and Information Science at IUB wanted to explore relationships between fields of science in scientific journal publications. She was able to use the RDC to host nearly a TB of data in Oracle and do her relational work.
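The SDA guidance earlier in this deck (store large files, or small files aggregated into large files) can be sketched in code. The following is a minimal illustration, not an IU-provided tool: the function name `bundle_directory` and the example paths are hypothetical, and the sketch only builds the archive locally. Transferring it to the SDA would still use one of the access methods listed above (Web, Mapped Drive, SFTP, Parallel FTP).

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack every regular file under src_dir into one uncompressed tar file.

    Hypothetical helper for illustration. Returns the number of files added.
    """
    src = Path(src_dir)
    archive = Path(archive_path).resolve()
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(src.rglob("*")):
            # Skip directories, and skip the archive itself in case it was
            # created inside the directory being bundled.
            if not path.is_file() or path.resolve() == archive:
                continue
            # Store paths relative to src_dir so the archive unpacks cleanly.
            tar.add(path, arcname=path.relative_to(src).as_posix())
            count += 1
    return count


# Example call (paths are placeholders, not real IU locations):
# bundle_directory("xray_images/2011", "xray_images_2011.tar")
```

One large archive per project or per time period keeps the number of objects in the archive small, which matches the "Targeted For" column for the SDA; the trade-off is that retrieving a single small file means fetching the whole bundle.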
Real World Examples …

• An IU researcher had an urgent need for a shared space to store work documents for collaborators within and outside IU. He could not wait for affiliate accounts to be created for external users. He was able to use Alfresco Share to set up multiple collaboration “sites” for the project within the hour and invite collaborators to these sites. Each site provides not only a shared document library, but also a shared wiki, blogs, etc.

Real World Examples …

• The Division of Biostatistics at IUPUI wanted to help a clinical researcher at IUSM migrate data in spreadsheets to a central, web-accessible database, and allow her to share the database with collaborators at the University of Maryland who needed to add patient data (ePHI) they were acquiring. They used REDCap to accomplish all this AND export the data in a format ready for the SAS statistics package to analyze.

Contact

Your single point of contact for all things RT:
Anurag Shankar
ashankar@iu.edu
812-325-8629

Local Contact:
Carol Wood
cfwood@iun.edu
219-980-7758
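As a quick reference, the service tables in this deck can be condensed into a small lookup. This is only a sketch: the dictionary keys (`"acquired"`, `"analysis"`, `"published"`, `"archival"`) and the function `suggest_services` are made-up names for illustration, and the values are copied from the "How IU’s Research Data Storage Services Fit Data Types" and "Data Storage Services by Use" slides. Services whose backup status the deck does not state (server disk, Slashtmp) are treated as unknown and excluded when a backup is required.

```python
# Data type -> services, from the "Fit Data Types" slide.
SERVICES_BY_DATA_TYPE = {
    "acquired": ["RFS", "Data Capacitor"],
    "analysis": ["RFS", "Data Capacitor"],
    "published": ["Server disk", "Alfresco Share", "REDCap", "Slashtmp"],
    "archival": ["SDA"],
}

# Backup status, from the "Data Storage Services by Use" slide.
# Services missing from this dict have no backup status stated in the deck.
BACKED_UP = {
    "Data Capacitor": False,
    "RFS": True,
    "RDC": True,
    "Alfresco Share": True,
    "REDCap": True,
    "SDA": False,
}


def suggest_services(data_type: str, require_backup: bool = False) -> list:
    """Return the services the deck lists for a data type.

    With require_backup=True, keep only services the deck marks as backed up.
    """
    services = SERVICES_BY_DATA_TYPE[data_type]
    if not require_backup:
        return list(services)
    return [name for name in services if BACKED_UP.get(name) is True]


# Example: in-acquisition data that must be backed up
# suggest_services("acquired", require_backup=True)
```

Note that the SDA is listed as not backed up, which is consistent with the earlier table rating archival data as highly critical "if not also stored elsewhere".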