PresDb’07
23 March 2007

Peter Buneman, Univ. of Edinburgh
Vassilis Christophides, Univ. of Crete and FORTH-ICS

PresDB 2007
- Informal venue aiming to bring together researchers and practitioners addressing archival issues associated with databases
- 8 invited talks, 13 short presentations
- Thanks to speakers
- Thanks to executive committee
  - Peter Buneman, Bertram Ludaescher, Chris Rusbridge, WangChiew Tan, Ken Thibodeau
- Thanks to organizing committee
  - Joy Davidson, Yrsa Roca Fannberg, Florance Kennedy, Heiko Mueller

The Importance of Scientific Data
- Much of the data is impossible to reproduce
  - e.g. climate and demographic data
- Much of the data can only be recovered at enormous cost
  - e.g. data from high energy physics experiments or space flight missions
- Nearly every reference manual, dictionary and gazetteer benefits from some form of database management support
  - There has been an explosion in the number of curated databases (e.g., biology)
- These databases represent a huge investment of human effort!
- The need for preservation is self-evident

Preserving Digital Objects
- Digital Preservation aims to maintain and add value to a trusted body of digital objects for current and future use
  - Digital objects are maintained in an archive without being damaged, lost or maliciously altered (integrity, authenticity)
  - Digital objects can be found, extracted and served to a user (accessibility, retrievability)
  - Digital objects can be interpreted and understood by the user (readability, interpretability)
- Preservation life cycle: appraisal, extraction, ingestion, description, maintenance, access, and dissemination

Why are Databases Different?
How is a DB different from a “fixed” digital object?
- It has internal structure – understandable by both people and programs
- It changes over time
- It has internal consistency
- We consider everything from relational and object-oriented databases to data held in XML, scientific data formats and ontologies to be “databases”

Preserving Databases
- Database Preservation poses new technical, economic and legal challenges
- Databases are structured: what do we really need to preserve?
  - Queries (which?), DBMS environment, application semantics
- Databases evolve
  - We need both old versions of the database and time-based queries about the change (e.g. how has the number of smokers in Greece changed in the past 20 years?)
- Databases are centrally managed: data survival depends on the continued existence (funding?) of the host organization
  - Can we move to a distributed, redundant model of database preservation?

Program
8:45 - 9:00 Registration
9:00 - 9:15 Opening
9:15 - 11:15 A Computer Scientist’s Perspective on Database Preservation
11:30 - 13:00 Brainstorming Session
13:00 - 14:30 Lunch
14:30 - 16:30 An Archivist’s Perspective on Database Preservation
16:30 - 16:45 Break
16:45 - 18:30 Brainstorming Session
18:30 - 19:00 Closing Remarks

Logistics
- Don’t forget your registration!!!
- Full Talks: 30min
  - 20min presentation + 5min discussion
- Short Talks: 15min
  - 10min presentation + 5min discussion
- Reminder: load presentations during the breaks!

Acknowledgements

Program
8:45 - 9:00 Registration
9:00 - 9:15 Opening
    Vassilis Christophides
9:15 - 11:15 A Computer Scientist’s Perspective on Database Preservation
9:15 - 9:45 Giorgos Flouris and Carlo Meghini
    Steps Towards a Theory of Information Preservation
9:45 - 10:15 Mema Roussopoulos
    A Fresh Look at the Reliability of Long-term Digital Storage
10:15 - 10:45 David Rosenthal
    Engineering Issues in the Preservation of Databases
10:45 - 11:15 David Gross-Amblard
    Database Watermarking: Protection by Alteration
11:15 - 11:30 Break

Program
11:30 - 13:00 Brainstorming Session
Questions to be addressed
- How do we keep archived databases readable and usable in the long term (at acceptable cost)?
- How do we separate the data from a specific database management environment?
- How can we preserve the original data semantics and structure?
- How can we preserve authenticity and provenance of databases?
- How can we preserve data while it continues to evolve?
- How can we have efficient preservation frameworks, while retaining the ability to query different database versions?
- How can multi-user online access be provided to hundreds of archived databases containing terabytes of data?
- Can we move from a centralized model to a distributed, redundant model of database preservation?

Program
11:30 - 11:45 Peter Buneman
    Why current database technology does not support preservation
11:45 - 12:00 Panos Vassiliadis, George Papastefanatos, and Timos Sellis
    Management of the Evolution of Database-Centric Information Systems
12:00 - 12:15 Stefan Brandl and Peter Keller-Marxer
    Long-term Archiving of Relational Databases with Chronos
12:15 - 12:30 Gabriel David
    Data Warehouses in the Path from Databases to Archives
12:30 - 12:45 Norman Swindells
    Sustainable Data - Data representation by standardised information models
12:45 - 13:00 Ulf Andersson
    Information and Operational applications and LTPA (long term preservation application)
13:00 - 14:30 Lunch

Program
14:30 - 16:30 An Archivist’s Perspective on Database Preservation
14:30 - 15:00 John A. Kunze
    Practical Citation in a World of Evolving Data
15:00 - 15:30 Kevin Ashley
    Preserving the Imperfect
15:30 - 16:00 Bill Roberts
    A success story and an unsolved problem
16:00 - 16:30 Michael Lesk
    Data Preservation: It's a People Problem
16:30 - 16:45 Break

Program
16:45 - 18:30 Brainstorming Session
Questions to be addressed
- What are the salient features of a database that should be preserved?
- What are the different stages in the database preservation life cycle?
- What documentation is preserved together with a database, and in what format?
- What are the legal encumbrances on database preservation?
- What can be learned from traditional archival appraisal for the selection of databases for preservation?
- To what extent can the preservation strategies and procedural policies developed by archivists be adapted for databases?

Program
16:45 - 17:00 W. Christopher Lenhardt
    Promoting Trusted Digital Repositories to Support Database Preservation
17:00 - 17:15 Katerina Tsakona
    Legal Awareness on Database Preservation
17:15 - 17:30 Luís Faria and Rui Castro
    RODA - Repository of Authentic Digital Objects
17:30 - 17:45 Dirk Roorda
    MIXED: Migration to Intermediate XML for Electronic Data
17:45 - 18:00 Seamus Ross and Sarah Jones
    Performing Arts Databases: Use Cases
18:00 - 18:15 Jonathan Bard
    $2B of irradiation data from the '50s, archived in an incoherent DBMS
18:15 - 18:30 Rolf Lang
    Position Paper
18:30 - 19:00 Closing Remarks