Co-funded by the European Union under FP7-ICT-2009-6

Storage Solutions
The use case at the National Library of the Netherlands (KB)
Jeffrey van der Hoeven
APARSEN webinar, April 14th, 2014
Co-ordinated by aparsen.eu #APARSEN

Outline of talk
• About the National Library of the Netherlands (KB)
• Storage challenges: creating digital collections
• Storage solution
• Cost
• Future perspective
• Cloud storage: hot or not…

• Since 1798 / 248 FTE / 53M euro budget
• We preserve & give access to everything published in and about the Netherlands
• Central role in Dutch information infrastructure
• Kept safe: 6M physical publications / 18M digital publications
• Goal: everything digital in 2035

What we do
We give open access to:
• 8 million newspaper pages online
• 4.6 million online visits
• 2.1 million parliamentary pages online

Storage challenges: Creating digital collections

[Figure: Storage share of digital collections (in GB)]

Storage prospect at KB
[Figure: storage growth at KB for 2010, 2011, 2012 and 2018, compared with the heights of landmarks: Eiffel Tower (324 m), Empire State Building (443 m), Burj Khalifa, Dubai (828 m), and 1800 m. Labels: 0.5 PB & 500M files; 1 PB & 1000M files; 1.5 million CD-ROMs]

Challenges in (long-term) storage
• Volume (size and number of files)
• Type of data (structured / unstructured)
• Growth rate
• Availability vs preservation
• Cost per TB

Storage solution

IT & Storage at KB
Two locations:
• In-house = data centre for primary storage and computing
• Off-site = for data back-up & archiving
• Hosting 230 servers (80 physical / 150 virtual)
• Managing 550 TB of data
• Managing +/- 500 million files:
  – PDF, TIFF, JPEG2000, JPEG, XML

Storage Management
Storage tiers
• Gold: very fast, very expensive
  – Used for: indexing, databases
  – HW: SAN with HiPerf SAS disks, near-future: SSD
• Silver: fast, expensive
  – Used for: web hosting, processing
  – HW: SAN with HiCap SAS disks
• Steel: slow (45 sec), sustainable
  – Used for: long-term archiving
  – HW: Disk-based NAS with WORM
• Bronze: very slow (> 45 sec)
  – Used for: back-up & restore, archiving
  – HW: LTO4/5 tape

Storage process & strategy
[Diagram labels: Selection, Digital processing (Stage 1 to Stage 5), Access; Shared file system(s) / API; DB; File system; Storage management; Storage on-site: Bronze, Steel, Silver, Gold, Platinum; Off-site: Bronze; Back-up]

Storage cost
Source: http://www.brightsideofnews.com/2011/12/07/your-storage-blog-make-storage-cheaper-and-more-energy-efficient/

TCO storage
• Cost per Terabyte (TB) per year per storage tier
• TCO composed of several cost components, based on the whitepaper Four Principles for Reducing Total Cost of Ownership (Hitachi, 2011)
• In total 14 cost components included
• In 2014 the model was approved by the PWC accounting office
Referenced article: http://www.hds.com/assets/pdf/four-principles-for-reducing-totalcost-of-ownership.pdf

• Hardware & software
• Maintenance
• Support
• Power & cooling
• Floor space
• Monitoring
• Off-site locations
• Network
• Waste & duplication

KB TCO storage 2014 per TB per year
[Chart: TCO per TB per year by tier (Steel, Silver, Bronze, Gold); values shown: € 4,858, € 1,036, € 1,046, € 387]

[Figure: KB TCO storage cost over years]

[Figure: KB vs storage providers (cloud)]

Can we afford it in the future?
• Recent developments *:
  – Disk storage is becoming more popular in archiving.
  – The physical limits of the hard disk drive seem to have been reached.
  – Kryder's law seems to fail, as disk storage density no longer keeps pace with a yearly increase of 30-40%.
  – The market dominance of hard disk producers Seagate and Western Digital is risky, as prices might go up, especially in case of shortage.
Risk: storage costs can become a bottleneck for long-term preservation.
* David Rosenthal blog post, available at: http://blog.dshr.org/2012/12/talk-at-fall-2012cni.html

Cloud storage: hot… or not?
Storage in the cloud

Benefits of cloud storage
• Scalable
• Availability
• Pay per TB per month
• No need for own ICT infrastructure
• Less maintenance

However… in preservation terms:
• Is it sustainable?
• Who is responsible for the data?
• Which jurisdiction applies?
• What if I want to migrate to another cloud?
• Continuity: no money? No data!
• Advice: be cautious about using the cloud for long-term storage.
Read on: http://www.ncdd.nl/blog/?p=2347

Thank you! Questions?
Jeffrey DOT vanderhoeven AT kb DOT nl
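
Worked example (illustrative only): the "cost per TB per year per storage tier" model from the TCO slides can be turned into a yearly budget estimate by multiplying the volume held on each tier by that tier's rate. The sketch below is not KB's model. The four euro values come from the "KB TCO storage 2014" slide, but which value belongs to which tier is an assumption here (Gold taken as the most expensive and Bronze as the cheapest, in line with the tier descriptions; the Silver/Steel assignment is a guess). The split of 550 TB across tiers and the function name `yearly_cost` are hypothetical.

```python
# Cost per TB per year in euros, values from the "KB TCO storage 2014" slide.
# ASSUMPTION: the value-to-tier mapping is inferred, not stated in the slide text.
TCO_EUR_PER_TB_YEAR = {
    "gold": 4858,
    "silver": 1046,
    "steel": 1036,
    "bronze": 387,
}


def yearly_cost(tb_per_tier, rates=TCO_EUR_PER_TB_YEAR):
    """Return (total, per-tier breakdown) of the yearly cost in euros."""
    unknown = set(tb_per_tier) - set(rates)
    if unknown:
        raise KeyError(f"unknown storage tier(s): {sorted(unknown)}")
    # Cost of a tier = volume on that tier (TB) * that tier's rate (EUR/TB/year).
    breakdown = {tier: tb * rates[tier] for tier, tb in tb_per_tier.items()}
    return sum(breakdown.values()), breakdown


# HYPOTHETICAL split of the 550 TB mentioned in the slides across the tiers.
example_allocation = {"gold": 10, "silver": 40, "steel": 300, "bronze": 200}

total, breakdown = yearly_cost(example_allocation)
for tier, cost in breakdown.items():
    print(f"{tier:>6}: EUR {cost:,} per year")
print(f" total: EUR {total:,} per year")
```

Because the rates live in a plain dictionary, the same function can be reused with a provider's monthly per-TB price multiplied by 12 to compare in-house tiers against a cloud offer on the same footing.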