APARSENwebinar_201404_02_storagesolutions_Salza

Co-funded by the European Union under FP7-ICT-2009-6
Survey on Italian Preservation Repositories
Silvio Salza, salza@dis.uniroma1.it
CINI-Università di Roma “La Sapienza”
Storage Solution Webinar, April 14th 2014
aparsen.eu
#APARSEN
The APARSEN WP23 questionnaire
• A questionnaire on storage solutions and scalability was prepared and distributed as part of APARSEN WP23
• The questionnaire focused on:
- Profile of the repositories (mission, volumes, type of objects)
- Storage management policy
- Organization of the storage system
- Cost (TCO) and quality assessment
• CINI designed the questionnaire and was in charge of the Italian survey
• 8 large repositories in different areas were surveyed
Survey on Italian Preservation Repositories
Silvio Salza, CINI-Università di Roma “La Sapienza”
Webinar on Storage Systems, April 14th 2014
Italian regulations on digital preservation
• The organization of Italian repositories is mostly driven by national regulations
• The regulations were first issued in a 2001 bill and later updated in 2010-14
• They have been mandatory for Public Administrations since 2001
• Private companies must comply as well for some types of records: health-care records, fiscal records, e-invoices, etc.
Quite often the focus is just on complying with the regulations: the design of the repository and the quality of the preservation process are not given sufficient attention
Profile of the surveyed repositories
[Charts: Mission (Cultural Heritage, e-Gov, Other); Number of Digital Objects; Yearly increase (< 10%, 20% - 100%, > 100%)]
• Most repositories have been active for less than 5 years
• High yearly growth rate (100% on average)
• Generally, a single type of digital object is preserved
• Access is granted to registered users only
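A 100% average yearly growth rate has direct consequences for capacity planning, since the stored volume doubles every year. The minimal sketch below compounds a constant growth rate; the starting volume of 10 TB, the five-year horizon and the function name are hypothetical, and real growth is rarely constant.

```python
def project_volume(initial_tb: float, yearly_growth: float, years: int) -> list[float]:
    """Project repository volume assuming a constant compound yearly growth rate.

    yearly_growth is a fraction: 1.0 means +100% per year (the volume doubles).
    Returns the volume at year 0, 1, ..., years.
    """
    volumes = [initial_tb]
    for _ in range(years):
        volumes.append(volumes[-1] * (1.0 + yearly_growth))
    return volumes


if __name__ == "__main__":
    # Hypothetical 10 TB repository growing 100% per year over 5 years
    for year, tb in enumerate(project_volume(10.0, 1.0, 5)):
        print(f"year {year}: {tb:.0f} TB")
```

At this rate a hypothetical 10 TB repository reaches 320 TB after five years, so storage sized only for the current volume is outgrown within a year.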
Storage management policy
• Only 50% of the repositories have a formally declared storage management policy (the ones in the e-Gov area)
• None provided a link to a public policy document
• Crucial issues:
- Regular integrity checks (always specified)
- Backup interval (always specified)
- Data recovery workflow (specified in one case only)
The storage management policy should always be formally declared and, where possible, made public
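Regular integrity checks, which every declared policy specified, are commonly implemented as fixity checks: a cryptographic digest of each stored object is recomputed and compared with the value recorded at ingest. The sketch below assumes a simple manifest mapping relative paths to SHA-256 hex digests; the manifest format and the function names are hypothetical and not taken from any surveyed repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare files under root against a (hypothetical) manifest of
    relative path -> expected SHA-256 hex digest.

    Returns a dict mapping each problematic path to "missing" or "corrupted";
    an empty dict means every object passed the check.
    """
    problems: dict[str, str] = {}
    for rel_path, expected in manifest.items():
        file_path = root / rel_path
        if not file_path.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(file_path) != expected:
            problems[rel_path] = "corrupted"
    return problems
```

In practice the check would be scheduled at the interval stated in the policy, and any non-empty result would trigger the data recovery workflow, the item the survey found specified in only one case.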
Three-level storage organization
[Diagram: Access level (mirrors the core level and protects it from external access); Preservation level; Backup level (periodic dumps of the core level)]
Most repositories (but not all) declared a three-level storage
organization:
- Preservation: core level devoted to preservation
- Access: front-end level to support external access
- Backup: back-end level for periodic dumps
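The division of roles among the three levels can be illustrated with a toy in-memory model. This is a hypothetical sketch, not the design of any surveyed repository: plain dictionaries stand in for the storage devices, and the class and method names are invented. It shows that external reads are served only by the access mirror, that backups are snapshots of the core level, and that a damaged access level can be rebuilt from the core.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeLevelStore:
    """Toy model of a three-level storage organization.

    - preservation: core level, written only at ingest (write-once is an assumption here)
    - access: mirror of the core level; the only level exposed to external reads
    - backups: list of periodic full dumps of the core level
    """
    preservation: dict[str, bytes] = field(default_factory=dict)
    access: dict[str, bytes] = field(default_factory=dict)
    backups: list[dict[str, bytes]] = field(default_factory=list)

    def ingest(self, object_id: str, data: bytes) -> None:
        if object_id in self.preservation:
            raise ValueError(f"{object_id} already preserved; core level is write-once")
        self.preservation[object_id] = data
        self.access[object_id] = data  # keep the access mirror in sync

    def read(self, object_id: str) -> bytes:
        # External users never touch the core level directly
        return self.access[object_id]

    def dump_backup(self) -> None:
        # Periodic dump: a snapshot copy of the core level at this moment
        self.backups.append(dict(self.preservation))

    def rebuild_access(self) -> None:
        # If the access level is damaged, re-mirror it from the core level
        self.access = dict(self.preservation)
```

In a real deployment each dictionary would be a separate device or system; keeping them distinct is what gives the access level its protective role.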
Storage implementation
[Charts: storage technologies used at the Access, Preservation and Backup levels; categories shown include RAID5, RAID1, HD, WORM, Tape-DVD and None]
• In 3 cases there was no separate access level
• One repository considered it unnecessary since it was using a WORM device (EMC2 Centera) for the core level
• Two others claimed that RAID at the core level provided enough redundancy and that the file system provided write protection
• Backups were typically made on a weekly basis
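The redundancy that RAID provides at the core level, which two respondents relied on, can be quantified with the standard capacity and fault-tolerance rules for the two RAID levels that appear in the survey. The function below is a simplified, hypothetical sketch: it ignores hot spares, rebuild time and read errors during a rebuild.

```python
def raid_profile(level: str, disks: int, disk_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, number of disk failures tolerated).

    Only RAID1 (n-way mirror) and RAID5 (single distributed parity) are modelled.
    """
    if level == "RAID1":
        if disks < 2:
            raise ValueError("RAID1 needs at least 2 disks")
        # Every disk holds a full copy: capacity of one disk,
        # and the array survives as long as one disk is still working
        return disk_tb, disks - 1
    if level == "RAID5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        # One disk's worth of space holds parity; any single failure is survivable
        return (disks - 1) * disk_tb, 1
    raise ValueError(f"unsupported RAID level: {level}")
```

For example, five hypothetical 4 TB disks in RAID5 give 16 TB usable and tolerate one failure. In both cases all disks sit in the same device, so this redundancy offers no protection if the device itself, or the room it is in, is lost.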
About storage media and systems
• Tape cartridges are OK for backup
• DVD and other consumer-level optical media should be avoided as too risky, but are still used in small repositories
• RAID replication at the core level is not equivalent to having a separate level for access (which would require a separate device)
• Using a single level of WORM devices, despite their quality, has some serious drawbacks:
- These devices typically rely on proprietary firmware
- Data can’t be read without the intermediation of the firmware
- Replication is still limited to a single device
Local versus geographical replication
• Replication is the key element to achieve reliability
• Different levels of replication:
- Device: within a given device (e.g. RAID5)
- Local: locally, but involving different devices
- Geographical: replicated data kept in different locations
• Local (and device) replication is vulnerable to catastrophic (but not unlikely) events: flood, fire, earthquake
• Reliability of RAID systems assumes that faults of different devices are statistically independent (a tricky assumption!)
• If the room housing the devices is flooded, all of them will fail
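The effect of the independence assumption can be made concrete with a small probability model. In this hypothetical sketch (made-up parameters, not data from the survey), each copy fails independently with probability p_device over some period, while a site disaster such as a flood destroys every copy at that site with probability p_site; disasters at different sites are assumed independent of each other.

```python
from math import prod


def loss_probability(copies_per_site: list[int], p_device: float, p_site: float) -> float:
    """Probability of losing every copy of an object within one period.

    copies_per_site: number of copies held at each site, e.g. [2] for two
    copies in one room, [1, 1] for one copy at each of two distant sites.
    """
    # A site loses all its copies if a disaster strikes it, or if no disaster
    # occurs but every one of its copies fails independently.
    per_site = [p_site + (1 - p_site) * p_device ** c for c in copies_per_site]
    # Data is lost only if every site has lost all of its copies.
    return prod(per_site)
```

With p_device = 0.01 and p_site = 0 (the RAID-style independence assumption), two copies in one room and one copy at each of two sites both give a loss probability of 1e-4. With p_site = 0.001, the local configuration rises to about 1.1e-3, dominated by the disaster term, while the geographical one stays at about 1.2e-4.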
The night of the earthquake
• The University of L’Aquila in Central Italy maintained a daily-updated backup copy of its records at a computing center in Bologna (some 300 km away)
• On April 6th 2009 an earthquake destroyed most of the city
• Thanks to the geographical replication, not a single record was lost
[Map: Bologna and L’Aquila]
How reliable is my repository?
• Interviewed repository managers were asked to give some figures to assess the quality of the preservation service:
- Reliability: probability of not losing any data over a given time interval
- Availability: percentage of time the system can be accessed
- Cost: TCO (Total Cost of Ownership) per TB per year
• Only a few provided answers to these questions
• Very few answers were credible: one guy claimed his repository had achieved 100% reliability!! Can he fly too?
Inability to provide these figures is a clear indicator of poor design
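The three figures the managers were asked for can be computed from fairly simple inputs. The sketch below is illustrative only: the function names are hypothetical, and the reliability function assumes data-loss events occur at a constant yearly rate (an exponential model), which is one common simplification rather than a method prescribed by the survey.

```python
import math


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Percentage of time the system could be accessed over the observed period."""
    return 100.0 * uptime_hours / (uptime_hours + downtime_hours)


def reliability(loss_rate_per_year: float, years: float) -> float:
    """Probability of losing no data over `years`, assuming data-loss events
    follow a Poisson process with a constant yearly rate (exponential model)."""
    return math.exp(-loss_rate_per_year * years)


def tco_per_tb_year(total_cost: float, stored_tb: float, years: float) -> float:
    """Total Cost of Ownership normalized per TB stored per year."""
    return total_cost / (stored_tb * years)
```

Under this model, any non-zero loss rate gives a reliability strictly below 1 for any positive time span, which is why a claim of 100% reliability cannot be taken at face value.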
What about outsourcing?
• All storage levels in the surveyed repositories were in-house
• But part of the questionnaire dealt with outsourcing options
• Cloud storage was proposed as a main option to provide geographical replication, at least for some storage levels
• The result was discouraging: no answer, even though we insisted
• The attitude was like that of children saying: “I won’t eat it, and I won’t even taste it!”
One guy finally claimed that cloud was too expensive and too unreliable. But the same guy was unable to provide any figures for his own reliability and TCO
Conclusion 1: Improve the design process
• A good design should evaluate different alternatives
• Quantitative elements should be used to compare them:
- TCO
- Reliability
- Availability
- Level and type of replication
- Lifespan
• These elements also form the basis for assessing the quality of the preservation service
• The storage management policy should be clearly stated
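One way to make the comparison of alternatives explicit is to record each option's quantitative elements and select the cheapest one that satisfies every requirement. This is a minimal sketch of such a selection rule, not a design methodology from the survey; the StorageOption fields mirror the elements listed above (with replication simplified to a single geographical yes/no flag), while the thresholds, the cost-minimizing criterion and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageOption:
    name: str
    tco_per_tb_year: float  # cost per TB per year
    reliability: float      # probability of no data loss over the planning horizon
    availability: float     # percentage of time the system is accessible
    geo_replicated: bool    # simplification: is at least one copy kept at a remote site?
    lifespan_years: float   # expected service life before migration


def choose(options: list[StorageOption], min_reliability: float,
           min_availability: float, min_lifespan: float,
           require_geo: bool = True) -> Optional[StorageOption]:
    """Return the cheapest option meeting every quantitative requirement, or None."""
    feasible = [
        o for o in options
        if o.reliability >= min_reliability
        and o.availability >= min_availability
        and o.lifespan_years >= min_lifespan
        and (o.geo_replicated or not require_geo)
    ]
    return min(feasible, key=lambda o: o.tco_per_tb_year, default=None)
```

Filtering on hard requirements first and only then minimizing cost keeps the trade-offs visible: an option that is cheap but lacks geographical replication is rejected explicitly rather than silently preferred.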
Conclusion 2: Exploit new opportunities
• Improve reliability by exploiting redundancy
• Geographical redundancy is a key element
• Move at least one storage level to a remote site: don’t put all your eggs in one basket
• Overcome prejudices about outsourcing:
- Why should in-house systems be better?
- Reasonable control over outsourced resources is achievable
- Special conditions can be negotiated
• Cloud storage is a great opportunity: it should be carefully
considered before being dismissed
APARSEN Network of Excellence