Byte Me: Megabytes, Gigabytes, and Terabytes at the University of Missouri
Jeannette E. Pierce
Associate Director for Research & Information Services
MU Libraries
April 18, 2014

The University of Missouri
• Association of American Universities (AAU)
• Research University/Very High (Carnegie)
• Innovation and Economic Prosperity University

Research by the Numbers – 2012
• Research Expenditures – $239,000,000
• Research Proposals – 1,474
• Active Projects – 1,952
• 11 Research Centers
• 9 Core Facilities

Research Data Dilemmas
• Storage – secure
• Storage – cost effective
• Storage – ourselves or with others?
• Storage – who pays?
• High performance computing capacity
• Every lab “an island” vs. “centralized support”

Research Data Dilemmas
• To keep or not to keep
• How long to keep
• Public vs. private (open vs. dark)
• Discoverability
• Access
• Curation

Did I mention storage?
http://www.adaptivecomputing.com/wp-content/uploads/2013/11/43088_original1.jpg

Research Data Stakeholders
• Researchers – priorities are to identify or create data and to analyze it
  • Support the Mission
  • Create knowledge/contribute to discipline
  • Use resources efficiently, but also need an efficient research process
  • Successful grants
  • Meet compliance obligations
  • Collaborate with colleagues (locally & beyond)
  • Citations (author metrics)

Research Data Stakeholders
• University Administration/Office of Research – priorities are to know what is produced and meet standards/requirements
  • Support Mission
  • Enhance institutional identity
  • Successful grants
  • Ensure compliance
  • Identify entrepreneurial opportunities
  • Create educational opportunities

Research Data Stakeholders
• Department of Information Technology (DoIT) – priorities are to ensure information security and performance capacity
  • Support Mission
  • Support Researchers & Administration
  • Efficient use of resources
• MU Libraries – priorities are data discovery, access, & preservation
  • Support Mission
  • Support Researchers &
    Administration
  • Efficient use of resources

Research Data Stakeholders
• Worldwide Research Community

Research Data Environment
• Strong network capacity, but concerns about storage capacity
• Highly decentralized data storage & management, leading to some concerns about data security
• Individual PIs responsible for compliance
• Few standards, policies, or guidelines relating to research data management, though we do have policies related to data security, especially as it relates to human subject data

Research Data Environment
• Data storage capacity is driving conversation about data management on campus
• The need to create strong data management plans is just beginning to lead to more conversation about data management and data accessibility
• Requirements for open data recognized, but not yet a driving force for researchers
• Researchers need training and available expertise to consult on research data management

Cyberinfrastructure Council
• Group of research data stakeholders brought together by CIO, Gary Allen, in January 2013
  • Understand cyberinfrastructure necessary to support research & discovery
  • Identify strategic investments & services to be provided by IT
• Accomplishments
  • April/May 2013 – Conducted campus survey
  • August/Sept 2013 – Follow-up survey focused on High Performance Computing
  • Fall 2013 – Completed CI Plan
  • Fall 2013 – Sponsored inaugural Cyberinfrastructure Day
Detailed info at: http://doit.missouri.edu/ci/

Cyberinfrastructure Council
“Cyberinfrastructure is defined to mean the research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization, and other computer and information processing services.
It is meant to include not only the technology but also the human resources necessary to make it useful and effective.”
Detailed info at: http://doit.missouri.edu/ci/

MU Campus Cyberinfrastructure (CI) Plan
Vision
“MU is committed to providing and supporting the cyberinfrastructure necessary to excel in the discovery, dissemination, and application of knowledge in an environment of rapidly changing technologies, so we may optimally fulfill our research, education, outreach, and economic development missions.”
http://doit.missouri.edu/pdf-files/ci-plan.pdf

MU Campus Cyberinfrastructure (CI) Plan
“Research data is an important asset of the University and should be protected and preserved accordingly. The University should provide appropriate data dissemination and data security, preservation and curation services, and researchers need to take advantage of these services.”
http://doit.missouri.edu/pdf-files/ci-plan.pdf

MU Campus Cyberinfrastructure (CI) Plan
Summary of 8 recommended actions:
• Effective partnerships with investigators
• Participation in regional/national collaborations
• Increased availability of hardware & services
• Create workforce learning and development opportunities
• Raise awareness
• Continue to assess priority needs
• Maintain & expand network possibilities
• Enhance pursuit of partnerships with business & industry
http://doit.missouri.edu/pdf-files/ci-plan.pdf

MU Campus Cyberinfrastructure (CI) Plan
Five working groups:
• Data Management, Storage & Curation
• High Performance Computing
• Networking
• Security
• Data Analytics & Visualization
http://doit.missouri.edu/pdf-files/ci-plan.pdf

CI Plan Details: Data Management, Storage, and Curation
• Existing Resources
  • MOspace – institutional repository
  • Data Storage
    • Basic File Services
    • Bengal
    • UMBC Data Storage
    • Kaltura
  • Data Transfer
    • DropOff
    • Secure TransmIT

CI Plan Details: Data Management, Storage, and Curation
• Recommendations
  • Develop a plan to communicate with campus
    researchers about available storage options and define guidelines for data management and curation in support of federal mandates and preservation of University intellectual assets.
  • Hire three to five data curation specialists to work with researchers for consultation on implementation of guidelines and compliance with federal requirements. (Libraries & DoIT)
    • DoIT recently hired for a new position titled “Director, Research Computing Support Service.”
  • Increase awareness of and compliance with evolving data management and curation recommendations/guidelines.
  • Invest in additional campus and cooperative research-data repository resources to meet growing storage, preservation, and curation needs.

CI Plan Details: Information Security
• Existing Resources
  • Policies on data classification and security standards
  • DoIT security experts
  • Authenticated identity management
• Gaps
  • Distributed/de-centralized IT support and systems
  • Varying levels of training
  • Varying awareness of data classification and security standards

CI Plan Details: Information Security
• Recommendations
  • Provide a data security awareness education course for use by faculty and staff
  • Provide training on best practices for securing, protecting, and curating research data
  • Provide additional means to secure and disseminate data
    • DoIT recently implemented Box
  • Consolidate IT infrastructure to ensure compliance
  • Consider pursuit of Silver certification through InCommon

MU Libraries & Research Data
• Provide data archive services
  • ICPSR
  • Government and other publicly available statistics
• Support access to scholarly publication
  • Journals
  • Books
  • Databases
• Educational outreach & training
  • Promote awareness of resources
  • Database training
  • Citation management
  • Citation metrics
• Research consultation services

MU Libraries & Research Data
• New Digital Services Department
  • MOspace – a permanent digital storehouse focusing on works created by those connected to the University
  • Expanding capacity for digital asset management
• Campus Collaborations
  • Cyberinfrastructure Council
  • Digital Humanities Commons
  • Digital Curator position in partnership with Reynolds Journalism Institute
• E-Science Institute
  • Strategic directions (in process)

MU Libraries & Research Data
• Collaboration with regional/national partnerships
  • D4 (Data Federation of University Research) – GWLA
    • Complete an environmental scan of data management initiatives & needs
    • Host a two-day workshop on managing, sharing, and preserving research data
    • Create a plan for a multi-institutional approach to research data management
  • SHared Access Research Ecosystem (SHARE) – ARL/AAU/APLU
    • A distributed content and registry layer
    • A discovery layer across repositories
    • A content-aggregation layer
  • National Digital Stewardship Alliance (NDSA)
  • ORCID

MU Libraries & Research Data
• Individual librarians taking initiative to
  • Consult on data management plans
  • Share information about author rights
  • Assist with identifying relevant data repositories
  • Promote MOspace & identify content to include in MOspace
  • Promote open access opportunities
  • Assist with compiling author metrics
  • Support Digital Humanities initiatives

More on MOspace
• Strengths
  • Persistent URL
  • Open Access
  • Optimized for search engines
  • Institutional identity
  • Curated environment
  • Bit level preservation (file fixity)
• Ongoing questions related to data inclusion
  • File size
  • File type(s)
  • Need for richer metadata
  • Automated submission process
  • Versioning
  • Rights

MU Libraries – Possibilities
• Expand capacity to support data management across the disciplines
  • Hire additional expert staff in areas of data curation, programming, and digital preservation
  • Explore value of new data consortium memberships
• Define specific education/outreach role as it relates to data management across the research lifecycle
  • Examples from profession include training related to file naming, metadata, data
    documentation, data management plan options, author rights, researcher identity, etc.

MU Libraries – Possibilities
• Expand capacity to support Open Access (OA) initiatives
  • Provide education about benefits of OA
  • Identify OA opportunities & funding support
  • Library as publisher?
• Continue to strengthen relationship with all campus stakeholders
  • Researchers
  • DoIT
  • Office of Research
  • SISLT (School of Information Science & Learning Technologies)
• Join Digital Preservation Network (DPN)

MU Libraries – Ongoing Questions
• If we build it, will they come? Should individual libraries provide data repository services?
• How do we create awareness and build trust?
• Should we only support data that is openly accessible and intended to be preserved? Do we collaborate with IT & researchers to support “dark” archives of data?
• How do we acquire new competencies and expertise necessary to support data management?
• How do we fund new positions?

Questions
Contact Jeannette Pierce, piercejea@Missouri.edu