
Digital Records Infrastructure
David Thomas
7 March 2012
What is changing
• Nature of records
• Our understanding of the risk
• Threat profile
• Expectations
Nature of records
Digitised records will replace paper originals
This means that we require:
Higher standard of integrity
Higher standard of preservation
Understanding of the risk
Currently few risks with formats – interglacial period?
Risks before records are transferred (Digital Continuity)
Poor capture
Sensitivity and closure issues
Mixed media collections
Stephen J Gould’s papers at Stanford - 850 boxes of textual material, approximately 450 audiovisual items, and 1,180 computer media files
Understanding of the risk
For good reasons it is not possible to predict the rate of data loss from a storage system
This doesn’t stop manufacturers from making claims:
LTO 1: 17 year life
LTO 2: 21 year life
LTO 3: 30 year life
LTO 4: 17 year life
But how many manufacturers give guarantees?
Expectations
Fast access and quick response to FoI enquiries
The greatest threat is volume, volume, volume
• The volumes are now so huge that only powerful automated
systems can cope – the days of human intervention are over
Data volumes arriving at TNA 2012 - 2014
Born-Digital
• 2012 Olympics records – estimated at 30 TB
• The results of the 20-Year Rule change: 5 TB
• The Government Web Archive: 80 TB
• Total: 125 TB
• Other material may be coming: 2 – 3 terabytes from Hillsborough
Digitised
• Home Guard records: 113 TB
• 1939 National Health Register: 93 TB
• Digitisation of military service records: 181 TB
• Total: 387 TB
Data storage 2011 - 2020
[Chart: storage in terabytes (axis 0 to 1,400) by year, Jan-11 to Jan-20]
How do we compare
3 petabytes
3 petabytes
NARA 184 terabytes
190 terabytes
150 billion web pages
1.1 billion web pages
What we need
• Ability to process and store very large volumes of data
• Ability to identify formats of files so appropriate preservation processes can be implemented in the future
• Defence against malware which might damage the system or attack users
• Ability to ensure the integrity of records by using an appropriate cryptographic hash function (MD5 or SHA-2)
• Ability to conduct regular hash checks to determine whether any bits have been lost
• Ideally use two different software systems in case of a catastrophic failure of one
• Ability to handle closed or sensitive records
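
The integrity and regular hash check requirements above can be illustrated with a minimal sketch. It assumes SHA-256 (a member of the SHA-2 family) and a simple in-memory manifest of digests; the function names `sha256_of`, `build_manifest` and `check_fixity` are hypothetical and are not taken from any system described in these slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest has changed."""
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

In this sketch the manifest would be built once when records are ingested and stored separately from the records themselves; running `check_fixity` on a schedule then reports any file whose bits have changed or gone missing since ingest.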