Position paper
Workshop: database preservation, 23 March 2007 – Edinburgh, Scotland
Rolf Lang
Landesarchiv Baden-Württemberg, Germany
rolf.lang@la-bw.de

overview
• takeover
• data preparation / storage
• use

takeover
• when – regularly during the database's lifetime
• how – see the following slides
• what – typically a reduced subset of the data
• readability – data as CSV plus an XML field description
• limits – export time, complexity, size

takeover – MySQL Migration Toolkit
• migration tried on 2 databases
  – first DB works, but
    • the generated export SQL contains invalid MySQL syntax ("init default user")
    • this requires manual corrections for many tables
  – second DB fails
    • mapping the foreign keys took > 4 hours; aborted!

takeover – SIARD
• works, but is slow
• ran overnight (> 12 h) and produced 2.5 GByte of output, but still did not complete!
• complains about an incorrect data model
• the only options are to correct the source (?) or to ignore columns

takeover – Oracle SQL Developer
• Java-based connection
• seems to work, but
  – time-consuming process
  – requires additional processing to move to a neutral CSV format
  – may produce large output (> 4 GByte per table)

takeover – remaining basic problems
• user level
• application level – logic
• storage – table level

takeover – entity-relationship

data preparation
• understand the data model
• reduce complexity
• denormalize
• tool support?
• verify data integrity & completeness?

data preparation – reduce complexity
• empty / unused
  – tables
  – columns
• administrative fields
  – ignore fields designed for access control
  – ins/upd/del user and date (?)
• presentation layer
  – shortcuts
  – registers

data preparation – remaining basic problems
know-how                 design   data
Vendor/supplier          ++       -
Public administration    -        ++
Archive                  +        +

data preparation – remaining basic problems
• understand the meaning of the data
  – map DB table/field names to human-readable names
  – constraints
  – versioning technique
• describe the data
  – technical description
  – quirks / inconsistencies

data preparation – storage
• as a database
  – basic tables and a standardized query
• as an XML file
  – inflates the data size
• as CSV and XML
  – CSV for the data
  – XML for the field description

usage
• offer search/query options
  – requires import into a simplified database engine
• just the data
  – processing must then be done by the customer
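The "reduce complexity" step (empty / unused tables and columns, administrative fields) lends itself to automated detection of candidates. A minimal sketch in Python, assuming the data has already been loaded into SQLite; the audit-column pattern `(ins|upd|del)_(user|date)` and all helper names are hypothetical, and the output is only a list of candidates for an archivist to review, not a decision to drop anything.

```python
import re
import sqlite3

# Hypothetical naming convention for administrative (audit) columns:
# ins_user, ins_date, upd_user, upd_date, del_user, del_date.
# This pattern is an assumption; adapt it to the real schema.
ADMIN_COLUMN = re.compile(r"^(ins|upd|del)_(user|date)$", re.IGNORECASE)


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """User tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def empty_tables(conn: sqlite3.Connection) -> list[str]:
    """Candidate tables to drop: tables without a single row."""
    result = []
    for table in list_tables(conn):
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {quote_ident(table)}").fetchone()
        if count == 0:
            result.append(table)
    return result


def droppable_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Candidate columns to drop: never filled (only NULL or '') or administrative."""
    t = quote_ident(table)
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({t})")]
    result = []
    for col in columns:
        c = quote_ident(col)
        (filled,) = conn.execute(
            f"SELECT COUNT(*) FROM {t} WHERE {c} IS NOT NULL AND {c} <> ''"
        ).fetchone()
        if filled == 0 or ADMIN_COLUMN.match(col):
            result.append(col)
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE person (id INTEGER, name TEXT, fax TEXT, ins_user TEXT);"
        "INSERT INTO person VALUES (1, 'Meier', NULL, 'admin');"
        "INSERT INTO person VALUES (2, 'Huber', '', 'admin');"
        "CREATE TABLE unused_log (id INTEGER);"
    )
    print(empty_tables(conn))                 # ['unused_log']
    print(droppable_columns(conn, "person"))  # ['fax', 'ins_user']
```

Counting filled values per column keeps the check independent of the declared column type, which matters when the source schema is not trustworthy.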
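The "as CSV and XML" storage option (CSV for the data, XML for the field description), combined with a simple completeness check, might look like the following. This is a sketch assuming an SQLite source and Python's standard library only; the XML element and attribute names (`table`, `field`, `description`, `primaryKey`, ...) are a hypothetical layout, not a standard schema such as SIARD's, and the row-count comparison checks completeness only, not integrity.

```python
import csv
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def export_table(conn, table, out_dir, descriptions=None):
    """Write <table>.csv (data) and <table>.xml (field description).

    `descriptions` maps column names to human-readable explanations.
    Returns the number of data rows written to the CSV file.
    """
    descriptions = descriptions or {}
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    t = quote_ident(table)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({t})").fetchall()

    # Data: one header record, then one CSV record per table row.
    # NULL is written as an empty field, so NULL and '' become indistinguishable.
    rows = 0
    with open(out / f"{table}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([col[1] for col in info])
        for record in conn.execute(f"SELECT * FROM {t}"):
            writer.writerow(["" if value is None else value for value in record])
            rows += 1

    # Field description (hypothetical layout). "nullable" reflects only an
    # explicitly declared NOT NULL constraint.
    root = ET.Element("table", name=table, rows=str(rows))
    for cid, name, decl_type, notnull, _default, pk in info:
        field = ET.SubElement(
            root, "field",
            position=str(cid + 1), name=name, type=decl_type or "",
            nullable="false" if notnull else "true",
        )
        if pk:
            field.set("primaryKey", "true")
        ET.SubElement(field, "description").text = descriptions.get(name, "")
    ET.ElementTree(root).write(str(out / f"{table}.xml"),
                               encoding="utf-8", xml_declaration=True)
    return rows


def verify_export(conn, table, out_dir):
    """Completeness check: CSV data records must equal the table's row count."""
    with open(Path(out_dir) / f"{table}.csv", newline="", encoding="utf-8") as f:
        csv_rows = sum(1 for _ in csv.reader(f)) - 1  # minus the header record
    (db_rows,) = conn.execute(f"SELECT COUNT(*) FROM {quote_ident(table)}").fetchone()
    return csv_rows == db_rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE record_file (id INTEGER PRIMARY KEY, title TEXT NOT NULL, note TEXT);"
        "INSERT INTO record_file VALUES (1, 'Building permit', NULL);"
        "INSERT INTO record_file VALUES (2, 'Objection', 'two\nlines');"
    )
    target = tempfile.mkdtemp()
    n = export_table(conn, "record_file", target, {"title": "Title of the file"})
    print(n, verify_export(conn, "record_file", target))  # 2 True
```

Counting records with `csv.reader` rather than counting lines keeps the check correct when a field contains an embedded line break. The NULL-versus-empty ambiguity noted in the comments is exactly the kind of quirk that belongs in the field description.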