Data Management for XML: Research Directions
By: Jennifer Widom, Stanford University
Reviewer: Kristin Streilein

Paper Objectives:
  Whitepaper: “thoughts on the research opportunities XML brings to the general area of data management”
  Not a survey
  Offers “personal opinions and thoughts on Data Management for XML”
  “written from a true research standpoint”

Important Commercial Perspectives not covered
  How will XML be used?
    Data exchange format?
    Data storage format?
    With or without DTDs?
  Application interoperation and data integration will still cause problems
  Proposed mechanisms for inter-document references and proposed extensions or alternatives to DTDs for richer schema definitions not covered

Current State of Query Processing of Web Information
  HTML Pages
    Need to be preprocessed for meaningful queries
    Simple keyword-based searches
  Traditional DBMS
    Simple & rigid forms-based interfaces

Sample XML Research Topics:
  Ability to map XML-encoded info into a true data model
  Resolve conflicts from mixing concepts of documents and databases
  Designing XML databases
    Theoretical results
    Practical techniques
  Relationship between XML DTDs and traditional database schemas

Sample XML Research Topics:
  Query language(s)
  Database updates in an XML setting
  Efficient physical layout and indexing mechanisms
  Query Processing
  View mechanisms
  How to scale everything to web proportions

Lore Project at Stanford: Personal Research Agenda
  Storage and Indexing
    Clustering schemes
    New index types
    Compression
  DataGuides and DTDs
    Build validation into the XML database system
    Encode subelement ordering
    Performance and functionality tradeoffs (DataGuides & DTDs)
    Combine DataGuides & DTDs
    Browse database structure
    Allow updates to propagate through the database

Lore Project at Stanford: Personal Research Agenda
  Databases and Information Retrieval
    Keyword search
    Proximity search
    Similarity search
  Other Database Features
    Views
    Constraints
    Triggers
    Change Management

Lore Project at Stanford: Personal Research Agenda
  Mixing Semistructured and Structured Data
    Finding the structure
    Exploiting the structure
  XML in/on a Traditional DBMS
  Performance Evaluation
    Appropriate benchmark for what XML data should look like
    Type of queries & mix of queries and updates
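
Illustration (not from the paper): the "DataGuides and DTDs" and "Browse database structure" items in the Lore agenda above rest on summarizing the structure actually present in the data, rather than the structure a DTD declares. The Python sketch below collects every distinct root-to-element label path in a small XML document. It is a simplified path summary in the spirit of a DataGuide, not Lore's implementation; the function name label_paths and the sample document are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text):
    """Count how often each distinct root-to-element label path occurs.

    Hypothetical helper for illustration only: a real DataGuide is built
    over a graph-structured database and maintained under updates.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    # Each stack entry pairs an element with the tuple of tags leading to it.
    stack = [(root, (root.tag,))]
    while stack:
        elem, path = stack.pop()
        counts[path] += 1
        for child in elem:
            stack.append((child, path + (child.tag,)))
    return counts


# Made-up sample data: two entries with different substructure,
# which a single rigid schema would not describe well.
SAMPLE = (
    "<guide>"
    "<restaurant><name>A</name><phone>555-0100</phone></restaurant>"
    "<restaurant><name>B</name><address><city>C</city></address></restaurant>"
    "</guide>"
)

# Print the summary, one label path per line with its occurrence count.
for path, count in sorted(label_paths(SAMPLE).items()):
    print("/".join(path), count)
```

A summary like this can be browsed to see which paths exist before writing a query, and comparing it against a DTD shows where the data is more irregular (or more regular) than the declared schema.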