position paper Workshop database preservation – Edinburgh, Scotland 23 March 2007

Rolf Lang
Landesarchiv Baden-Württemberg
Germany
rolf.lang@la-bw.de
overview
• takeover
• data preparation / storage
• use
takeover
• when
– regularly during lifetime
• how
– see follow up charts
• what
– typically a reduced subset of data
• readability
– data as CSV and XML field description
• limits
– export time, complexity, size
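The "data as CSV and XML field description" idea can be sketched as follows. This is a minimal illustration, not the procedure used at the archive: it assumes an SQLite source so that it runs with the Python standard library alone, and the table `record`, the function `export_table` and the XML element and attribute names are hypothetical.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn, table):
    """Return (csv_text, xml_text) for one table of an SQLite database."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    names = [col[1] for col in columns]

    # Data part: plain CSV with a header row (NULL is written as an empty field).
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(names)
    column_list = ", ".join(f'"{name}"' for name in names)
    writer.writerows(conn.execute(f'SELECT {column_list} FROM "{table}"'))

    # Description part: one <field> element per column (hypothetical layout).
    root = ET.Element("table", name=table)
    for _cid, name, sql_type, notnull, _default, pk in columns:
        ET.SubElement(root, "field", name=name, type=sql_type,
                      nullable=str(not notnull).lower(),
                      primaryKey=str(bool(pk)).lower())
    return buffer.getvalue(), ET.tostring(root, encoding="unicode")


# Hypothetical sample data standing in for a real source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO record VALUES (1, 'Land register 1901')")
csv_text, xml_text = export_table(conn, "record")
print(csv_text)
print(xml_text)
```

Keeping the description in a separate XML file leaves the CSV readable by any tool, while types and constraints remain documented next to it.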
takeover
MySQL Migration Toolkit
• Migration results on 2 databases
– first DB works, but
• generated export SQL contains invalid syntax for MySQL ("init default user")
• requires manual corrections in many tables
– second DB fails
• mapping foreign keys took more than 4 hours; aborted
takeover
siard
• works, but requires time
• ran overnight (>12 h) and produced 2.5 GB of output, but still did not complete
• complains about a wrong data model
• leaves the choice of either correcting the source (?) or ignoring columns
takeover
Oracle SQL Developer
• Java based connection
• seems to work, but
– time-consuming process
– requires additional processing to convert to a neutral CSV format
– may produce large output (>4 GB per table)
takeover
remaining basic problems
• user level
• application level: logic
• storage: table level
takeover
entity-relationship
data preparation
• understand data model
• reduce complexity
• denormalize
• tool support ?
• verify data integrity & completeness ?
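One possible answer to the integrity and completeness question is to compare row counts and a content digest between the source table and the exported file. The sketch below is only an illustration under assumptions: an SQLite source, a hypothetical table `record`, and a hypothetical helper `digest_rows`. It compares values as text, so NULL and the empty string are treated as equal, and it does not cover binary or floating-point columns.

```python
import csv
import hashlib
import io
import sqlite3


def digest_rows(rows):
    """Return (row_count, sha256_hex) over rows rendered as text."""
    sha = hashlib.sha256()
    count = 0
    for row in rows:
        # Render every value as text so database rows and CSV rows compare equal;
        # NULL becomes the empty string, matching what csv.writer emits.
        rendered = ["" if value is None else str(value) for value in row]
        sha.update("\x1f".join(rendered).encode("utf-8"))  # field separator
        sha.update(b"\x1e")                                 # row separator
        count += 1
    return count, sha.hexdigest()


# Hypothetical sample data standing in for a real source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER, title TEXT)")
conn.executemany("INSERT INTO record VALUES (?, ?)",
                 [(1, "Land register"), (2, None)])

# Export side: write the table as CSV without a header row.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(
    conn.execute("SELECT id, title FROM record ORDER BY id"))

# Verification side: digest the source rows and the exported rows.
source = digest_rows(conn.execute("SELECT id, title FROM record ORDER BY id"))
exported = digest_rows(csv.reader(io.StringIO(buffer.getvalue())))
print(source == exported)
```

A fixed sort order on both sides is what makes the two digests comparable; without it, equal content could still produce different hashes.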
data preparation
reduce complexity
• empty / unused
– tables
– columns
• administrative fields
– ignore fields designed for access control
– insert, update, delete user/date stamps (?)
• presentation layer
– shortcuts
– registers
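Finding empty or unused tables and columns can be partly automated. The following is a minimal sketch, assuming an SQLite database and treating a column as unused when every value is NULL. The function `find_unused` and the sample tables are hypothetical, and a real source system would need its own catalog queries.

```python
import sqlite3


def find_unused(conn):
    """Return (empty_tables, empty_columns) for an SQLite database."""
    empty_tables, empty_columns = [], []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        if total == 0:
            empty_tables.append(table)
            continue
        for col in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            name = col[1]
            # COUNT(column) counts only non-NULL values.
            filled = conn.execute(
                f'SELECT COUNT("{name}") FROM "{table}"').fetchone()[0]
            if filled == 0:
                empty_columns.append((table, name))
    return empty_tables, empty_columns


# Hypothetical sample data: one empty table and one all-NULL column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (id INTEGER, title TEXT, legacy_code TEXT);
    CREATE TABLE shortcut (id INTEGER, label TEXT);
    INSERT INTO record VALUES (1, 'Land register', NULL);
    INSERT INTO record VALUES (2, 'Court files', NULL);
""")
print(find_unused(conn))
```

The result is only a list of candidates; whether an empty column can really be dropped remains an appraisal decision.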
data preparation
remaining basic problems
know-how                design   data
Vendor/supplier         ++       -
Public administration   -        ++
Archive                 +        +
data preparation
remaining basic problems
• understand the meaning of the data
– map database table/field names to human-readable names
– constraints
– versioning technique
• describe the data
– technical
– quirks / inconsistencies
data preparation
storage
• as database
– basic tables and a standardized query
• as XML file
– expands data size
• as CSV and XML
– CSV for data and
– XML for field description
usage
• offer search query options
– requires import to a simplified database
engine
• just data
– so processing needs to be done by the
customer
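Offering search options by importing into a simplified database engine might look like the sketch below. It assumes SQLite as the simplified engine and loads every column as text. The function `load_csv`, the table `record` and the sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical archived CSV content; in practice this would be read from a file.
CSV_DATA = "id,title\n1,Land register 1901\n2,Court files 1923\n"


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header and load all rows as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # The reader now yields only the data rows after the header.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
load_csv(conn, "record", CSV_DATA)

# A simple search query offered to the user.
hits = conn.execute(
    "SELECT id, title FROM record WHERE title LIKE ?", ("%register%",)).fetchall()
print(hits)
```

Column types could be taken from the XML field description instead of defaulting to text; that step is left out here to keep the sketch short.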