Pascal Calarco


Ontario Library Research Cloud: Building a Province-Wide Research Cloud for Ontario’s Academic Libraries

Pascal V. Calarco, University of Waterloo

IGeLU 2015, September 3, 2015

Agenda

• OCUL Overview

• Problem we’re trying to solve

• Funding and project plan

• Technology overview

• Some likely use cases

• Next steps

• Q&A

Ontario Council of University Libraries

• 21 member libraries

• 420,000 students

• Collaboration in:

– Shared electronic collections

– Planning & assessment

– Digital library services & infrastructure

Libraries’ Growing Storage Needs

• Digitized physical materials: books, journals, film, audio

– Reformatting to conserve originals, e.g., acidic paper such as newspapers

– Reformatting to increase access, e.g., rare materials

– Format migration to preserve content, e.g., 16mm film

Libraries’ Growing Storage Needs

• Born-digital scholarly content for long-term stewardship:

– E-Theses and supplemental material

– Scholarship: working papers, pre-prints, Open Access

– Research data: numeric, geospatial, image, audio

– Websites and digital ephemera of academic interest

– Donated electronic materials for Special Collections

• John English’s hard drives of personal email correspondence, drafts and other materials

OCUL Storage Survey (2013)

• 10 of 21 institutions responded: six larger than 10k FTE, four smaller than 10k

• Preservation & Access Needs:

– 80%: digitized print content

– 80%: faculty publications

– 60%: donated digital content

– 50%: research data

– 50%: GIS data

– 40%: purchased digital resources

– 20%: corporate records

– 20%: E-Theses

OCUL Survey: Storage Needs

• Current storage requirements: 100GB to 30TB; total across respondents: 58.5TB

• Expected storage needs, next 2-3 years:

– 20% 100TB+

– 40% 10TB-100TB

– 20% <10TB

– 250TB total for all 10 institutions

OCUL Survey: Storage Provisioning

• 80% partner with campus IT often/mostly

• 60% provision in-house often/mostly

• 40% provision with other partner libraries often/mostly

• 30% provision with commercial services often/mostly

OCUL Storage Survey: Top Features (2013)

• Large storage on demand

• Low cost

• Canadian-based hosting

• Transparent pricing

• Archival quality storage

Storage Architectures and Cost Tiers

Cloud storage options

• Amazon S3/Glacier: $500k/year for the current 250TB of ScholarsPortal (SP) content

– $2000/TB per year, recurring

• DuraCloud: Amazon reseller, adding preservation and management tools

– $1000-$1500/TB per year, recurring

• Private Cloud: OpenStack

– $280-$350/TB per year, amortized over three years
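The per-terabyte rates above can be turned into annual totals for any collection size. The following is a minimal sketch that uses only the figures quoted on this slide; the function names are illustrative, and the currency is not stated in the slides.

```python
# Annual per-TB rates as quoted on the slide, as (low, high) ranges.
# The currency is not stated in the source.
RATES_PER_TB_YEAR = {
    "Amazon S3/Glacier": (2000, 2000),
    "DuraCloud": (1000, 1500),
    "Private cloud (OpenStack)": (280, 350),
}


def annual_cost_range(terabytes, low_rate, high_rate):
    """Return the (low, high) annual cost for a collection of this size."""
    return terabytes * low_rate, terabytes * high_rate


def compare(terabytes):
    """Map each storage option to its (low, high) annual cost."""
    return {
        name: annual_cost_range(terabytes, low, high)
        for name, (low, high) in RATES_PER_TB_YEAR.items()
    }


if __name__ == "__main__":
    # 250TB is the ScholarsPortal content size quoted on the slide.
    for name, (low, high) in compare(250).items():
        print(f"{name}: ${low:,} to ${high:,} per year")
```

At 250TB, the Amazon rate reproduces the $500k/year figure on the slide, while the private cloud range works out to $70,000 to $87,500 per year.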

MTCU Proposal and PIF funding

• 2013/2014: OCUL was awarded $1.2 million in Productivity and Innovation Fund (PIF) funding for OLRC startup

• 50TB per founding partner institution

• Triple-copy preservation: copies of content at three different co-located nodes for redundancy and error correction

• Text mining portal for stored ScholarsPortal content
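Keeping three copies allows a damaged copy to be detected and repaired from the other two. The sketch below illustrates that idea with a checksum majority vote; it is an assumption-laden illustration, not a description of how the OLRC or OpenStack Swift performs its audits, and the node names are hypothetical.

```python
import hashlib
from collections import Counter


def sha256_hex(data):
    """Return the SHA-256 checksum of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas):
    """Compare copies of one object held on several nodes.

    `replicas` maps a (hypothetical) node name to the bytes stored there.
    Returns (agreed_digest, damaged_nodes). If no checksum is held by a
    strict majority of nodes, agreed_digest is None and every node is
    reported as suspect.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    digests = {node: sha256_hex(data) for node, data in replicas.items()}
    digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        return None, sorted(digests)
    damaged = sorted(node for node, d in digests.items() if d != digest)
    return digest, damaged


if __name__ == "__main__":
    copies = {"node-a": b"thesis", "node-b": b"thesis", "node-c": b"thesiz"}
    agreed, damaged = audit_replicas(copies)
    print("agreed checksum:", agreed)
    print("copies to repair:", damaged)
```

With three copies, a single corrupted copy is always outvoted by the two intact ones, which is the practical benefit of storing three rather than two.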

Hardware configuration

• Dell selected as hardware vendor.

• Head units: Dell PowerEdge R720xd servers, each populated with two 2.8GHz Xeon processors, 256GB of RAM, and two 200GB SSDs that run the operating system and the OpenStack software. Each head unit also contains twelve 4TB SAS drives, for an internal storage capacity of 48TB.

• Storage shelves: Dell PowerVault MD1200 storage shelves, directly attached to the server; each shelf contains twelve 4TB SAS drives, for a total capacity of 48TB per shelf.

• Total initial capacity 3.6 PB raw, triple-redundant, 1.2 PB net
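The capacity figures above follow from simple arithmetic, sketched below. The slide does not say how many head units and shelves were purchased; the count of twelve-drive enclosures computed here is derived from the stated totals and assumes decimal units (1 PB = 1000 TB).

```python
import math

DRIVES_PER_ENCLOSURE = 12   # head unit or storage shelf, per the slide
DRIVE_TB = 4                # 4TB SAS drives
REPLICAS = 3                # triple-redundant storage
RAW_TOTAL_TB = 3600         # 3.6 PB raw, assuming 1 PB = 1000 TB


def enclosure_capacity_tb():
    """Raw capacity of one twelve-drive head unit or shelf."""
    return DRIVES_PER_ENCLOSURE * DRIVE_TB


def net_capacity_tb(raw_tb, replicas=REPLICAS):
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


def enclosures_needed(raw_tb):
    """Number of twelve-drive enclosures needed to reach raw_tb."""
    return math.ceil(raw_tb / enclosure_capacity_tb())


if __name__ == "__main__":
    print("per enclosure:", enclosure_capacity_tb(), "TB")
    print("net capacity:", net_capacity_tb(RAW_TOTAL_TB), "TB")
    print("enclosures:", enclosures_needed(RAW_TOTAL_TB))
```

Dividing 3.6 PB raw by three replicas gives the 1.2 PB net figure on the slide.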

OpenStack

• An open source cloud computing platform, primarily deployed as an Infrastructure-as-a-Service (IaaS) platform

• Swift – OpenStack object store, store and retrieve data via API

• Integrate OpenStack/Swift with digital repository architectures

• Develop Dropbox-like cloud storage web interface
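Swift exposes objects over HTTP at paths of the form /v1/{account}/{container}/{object}, authenticated with an X-Auth-Token header. The sketch below builds (but does not send) upload and download requests using only the Python standard library. The endpoint, account, container, and token values are placeholders, not OLRC details.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(endpoint, account, container, obj):
    """Build the Swift URL for one object.

    Slashes in the object name are kept so that names such as
    "2015/thesis.pdf" behave like pseudo-folders.
    """
    prefix = "/".join(quote(part, safe="") for part in (account, container))
    return f"{endpoint.rstrip('/')}/v1/{prefix}/{quote(obj)}"


def build_upload(endpoint, account, container, obj, data, token):
    """Prepare a PUT request that stores `data` as an object."""
    return Request(
        object_url(endpoint, account, container, obj),
        data=data,
        method="PUT",
        headers={
            "X-Auth-Token": token,
            "Content-Type": "application/octet-stream",
        },
    )


def build_download(endpoint, account, container, obj, token):
    """Prepare a GET request that retrieves an object."""
    return Request(
        object_url(endpoint, account, container, obj),
        method="GET",
        headers={"X-Auth-Token": token},
    )


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    req = build_upload(
        "https://swift.example.org", "AUTH_demo", "theses",
        "2015/my thesis.pdf", b"%PDF-1.4 ...", "placeholder-token",
    )
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it; a repository integration or a Dropbox-like web interface would wrap calls of this kind.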

Use Cases

• Digital Preservation

• Institutional and Personal Storage

• Repositories

• Research Data Management

• Text mining large volumes of digital textual content for research purposes

Digital Curation

Fedora Commons

• Open source digital object repository that is the underlying architecture behind Islandora, Hydra, and other digital asset management systems.

DSpace

• Open source turnkey institutional repository software for building open access repositories for scholarly and published digital content.

Archivematica & ICA-AtoM

• An open source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.

Dataverse

• An open source web application for publishing, citing, analyzing and preserving research data.

• Research data management focus

• Access not preservation

Text Mining

• Potential uses by researchers in Digital Humanities:

– Entity recognition

– Part-of-speech analysis

– Topic modeling

– Network analysis

– Visualization
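As a small illustration of one item in the list above, network analysis, the sketch below counts how often chosen terms appear together in the same document, producing weighted edges that a graph or visualization tool could consume. The sample documents and vocabulary are invented for the example and do not come from the ScholarsPortal corpus.

```python
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercase a text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_edges(documents, vocabulary):
    """Count, for each pair of vocabulary terms, the documents containing both.

    Returns a Counter keyed by alphabetically ordered (term, term) pairs.
    """
    edges = Counter()
    for text in documents:
        present = sorted(set(tokens(text)) & set(vocabulary))
        edges.update(combinations(present, 2))
    return edges


if __name__ == "__main__":
    # Invented sample documents, for illustration only.
    docs = [
        "Waterloo and Guelph share a node",
        "Guelph and Laurier",
        "Waterloo, Guelph, Laurier",
    ]
    vocab = {"waterloo", "guelph", "laurier"}
    for (a, b), weight in sorted(cooccurrence_edges(docs, vocab).items()):
        print(a, b, weight)
```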

Canadian Text Archive Centre

• Phase 2 development

– Leverage OCUL ScholarsPortal text corpus of books and journals for academic research

– CTAC Advisory Committee being formed

– Tools and service development for students and researchers to create worksets of documents from content in the OLRC

– Bring “analysis to the data”

– June 2015 – May 2016

Current Status & Milestones

• October 2014: integration with Archivematica

• December 2014: integration with Dataverse

• Q1 2015: Storage Nodes finalized; installation of Waterloo/Guelph/Laurier node

• March 2015: integration with Fedora Commons

• May 2015: Third Hackfest, Text Mining Portal

• June 2015: integration with DSpace

• Fall 2015: Canadian Text Archive Centre Advisory Committee

Thanks! Questions?

• Pascal Calarco, University of Waterloo Library, pvcalarco@uwaterloo.ca
