Topics in Database Systems
History, Present and Future of the Database Field
Panos Vassiliadis
pvassil@cs.uoi.gr
September 2003
www.cs.uoi.gr/~pvassil/courses/readings/

Topics
- Yesterday
- Today
- Tomorrow
Part of these slides come from Prof. Timos Sellis’ course; many thanks!

Yesterday

History of the field of databases

Late 60's: network (CODASYL) and hierarchical (IMS) DBMSs
- Low-level “record-at-a-time” DML, i.e. physical data structures are reflected in the DML (no data independence)

1970: Codd's paper on the relational model
- The most influential paper in DB research
- Set-at-a-time DML
- Data independence: the schema and the physical storage structures can change under the covers
- Truly important theory; led to a “paradigm shift” in thinking and in practice
- Papadimitriou: “as clear a paradigm shift as we can hope to find in computer science”
- Codd later received the Turing Award

Early-to-mid 70's: raging debate between the two camps; “great debate” in 1975

Mid 70's: 2 full-function (sort of) prototypes, Ingres and System R
- Ancestors of essentially all of today's commercial systems

Ingres: UCB, 1974-77
- A “pickup team”, including Stonebraker & Wong
- Early and pioneering
- Led to Ingres Corp (CA), Sybase, MS SQL Server, Britton-Lee, Wang's PACE

System R: IBM San Jose (now Almaden)
- 15 PhDs
- Led to IBM's SQL/DS & DB2, Oracle, HP's Allbase, Tandem's Non-Stop SQL
- System R arguably got more stuff “right”

Both were viable starting points and proved the practicality of the relational approach. Beautiful example of theory -> practice!!

Early 80's: commercialization of relational systems

Mid 80's: SQL becomes the “intergalactic standard”. DB2 becomes IBM's flagship product. IMS is “sunsetted”.

90’s: the age of maturity
- network & hierarchical essentially dead (though commonly in use!)
- relational becomes mainstream
- improvements in terms of transactional facilities, performance and stability
- Scale, scale, scale…

Scale, scale, scale…
- EOSDIS (NASA’s Earth Observing System Data and Information System): 1 TB/day, keep it all for 15 years (they need tertiary storage for that)
- WalMart: 365-node system, 6 TB online, 4-billion-row table, 200 million updates daily, 4000 queries/day, 1500 users/week, 4 min DS response time w/ avg. 60000 rows

Databases make the world go round, mainly due to their ability to handle HUGE amounts of data, RELIABLY!!! Large scale is our business…

Late 90’s: object-relational & the web
- SQL-1999 & early implementations
- support for ADTs
- RDBMSs as back-ends for internet front-ends
- Application servers and middleware

Today

VLDB 2003
The International Conference on Very Large Data Bases (VLDB) is the top database conference. The 29th VLDB conference was held in Berlin, Germany in Sept. 2003. To accommodate the wide spectrum of papers, VLDB 2003 was organized into three tracks:
- Core Database System Technology
- Infrastructure for Information Systems
- Industrial Applications & Experience
http://www.vldb.informatik.hu-berlin.de/

VLDB 2003: from the CfP
“The Core Database Technology PC will evaluate papers that report on technology that is meant to be incorporated in the database system itself. This includes database engine functions, such as query languages, data models, query processing, views, integrity constraints, triggers, access methods, and transactions in centralized, distributed, replicated, parallel, mobile, and wireless environments. It also includes extended data types, such as multimedia, spatial and temporal data, and system engineering issues, such as performance, high availability, security, manageability, and ease-of-use.
Papers on all aspects of active and object databases, storage technology, and data management system architecture should be submitted to the Core Database Technology PC.”

“The PC covering Infrastructure for Information Systems will evaluate papers that report on methods, issues, and problems faced during the design, development and deployment of innovative solutions for information management. Examples include workflows, advanced transaction processing features, application servers, object monitors, services in support of E-commerce, mediators and other web-oriented data facilities, metadata repositories, data and process modeling, web services, user interfaces and data visualization, data translation and migration, data cleaning, multi-agent systems, and system management.”

“The PC on Industrial Applications & Experience solicits submissions covering innovative commercial database implementations, novel applications of database technology, and experience in applying recent research advances to practical situations. The track is VLDB's way to foster the exchange of ideas and solutions between research and industry. Application areas include those of Bioinformatics/Life Science, Engineering, Mobile Systems, Enterprise Resource Planning (ERP), and other areas all of which pose technical challenges to the field of data management.”

VLDB 2003: submissions by track
- Core: 249
- Infrastructure: 162
- Industrial: 46
- Grand total: 457
Accepted: 84 (70 research, 1:6)
The field is flourishing … getting your paper accepted is hard (nice excuse)!!
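The ratios on the submissions slide can be checked with a few lines of Python. This is only a sanity-check sketch: the counts are the ones quoted above, but the reading that the 70 research papers were drawn from the Core and Infrastructure tracks is an assumption, not something the slide states.

```python
# Submission and acceptance counts as quoted on the slide.
submissions = {"Core": 249, "Infrastructure": 162, "Industrial": 46}
accepted_total = 84
accepted_research = 70

total = sum(submissions.values())

# Assumption (not stated on the slide): "research" = Core + Infrastructure.
research_submissions = submissions["Core"] + submissions["Infrastructure"]

overall_rate = accepted_total / total
research_rate = accepted_research / research_submissions

print(f"total submissions: {total}")
print(f"overall acceptance: {overall_rate:.1%}")
print(f"research acceptance: {research_rate:.1%} "
      f"(about 1 in {1 / research_rate:.1f})")
```

Under that assumption the research ratio comes out close to the 1:6 quoted on the slide, and the track counts do add up to the stated grand total.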
VLDB 2003: topics
(98) Optimization and Performance
(84) Advanced Search, Query, and Approximation
(70) Semi-structured Data, XML
(64) Internet and WWW Databases / Query Systems
(63) Access Methods
(44) Data Mining and Knowledge Discovery
(32) Infrastructure Challenges and Opportunities
(30) Databases and database services: Internet and the WWW
(30) Novel / Advanced Database Applications
(29) Data Integration / Federation / Mediation
(29) Information Retrieval with Database Systems
(29) Middleware Data Architectures
(29) Special Purpose DB Techn.: Multidimensional Databases
… miscellaneous other topics …

Tomorrow

The Lowell report, 2003
Senior database researchers gather every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus. The previous meetings were held in Laguna Beach, Ca. in 1989, in Palo Alto, Ca. (Lagunitas) in 1990, in Palo Alto, Ca. (Lagunitas II) in 1995, and at Asilomar, Ca. in 1998. The sixth ad-hoc meeting was held May 4-6, 2003 in Lowell, Mass., USA.
http://research.microsoft.com/~Gray/Lowell/

Issues for future research
- (data)Bases for everything
- Information Fusion
- Multimedia Querying
- Uncertain data & Personalization
- Data Mining
- Privacy & Trustworthy Systems
- New User Interfaces
- 100 year storage

… no more data bases …
…, it is time to stop grafting new constructs onto the traditional architecture of the past. Instead, we should rethink basic DBMS architecture with an eye toward supporting:
- Structured data
- Text, space, time, image, and multimedia data
- Procedural data, that is data types and the methods that encapsulate them
- Triggers
- Data streams and queues
as co-equal first-class components within the DBMS architecture, in both its interface and its implementation, rather than as afterthoughts grafted on a relational core.
The participants were adamant that one should start with a clean sheet of paper.
Issues for future research

Information Fusion: … one must perform information integration on-the-fly over perhaps millions of information sources. … the thorny problem of semantic heterogeneity remains …

Multimedia Querying: … to create easy ways to analyze, summarize, search, and view the “electronic shoebox” of a person’s multimedia information.

Uncertain data: … query processing must move from a deterministic model, where there is an exact answer for every query, to a stochastic one, where the query processor performs evidence accumulation to get a better and better answer to a user query.

Data mining: users … wish for tools that generate some “pearls of wisdom”. A challenge for data mining research is to develop algorithms and structures for sifting through the databases looking for such pearls, while running in background and consuming excess system resources. Another important challenge is to integrate data mining with database querying, optimization, and other facilities such as triggers.

Privacy: our community can work on security systems that include a component dealing with the prospective use to which the data will be put. Access decisions should be based not only on who is requesting the data but also on what use it will be put to.

New User Interfaces: There is a crying need for better ideas in this area. PV: Major Issue!!!

100 year storage: even archived information is disappearing, because it was captured on a medium that is deteriorating (e.g. photographic film or magnetic tape), or because it was captured on a medium that requires obsolete devices (e.g. special storage drives), or because the application that is needed to interpret the information no longer works (e.g. troff).
[We need] mechanisms for migration, to copy information from deteriorating or obsolete media, and for emulation, to capture methods that can interpret information that is stored for long periods (e.g. a troff renderer).
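As a rough illustration of the migration half of that sentence, the sketch below copies a file to a new location and verifies the copy by checksum before trusting it. Everything in it is an assumption made for illustration: the helper names `sha256_of` and `migrate_file`, the choice of SHA-256, and the file-level granularity are not taken from the Lowell report, which does not prescribe a mechanism.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, target: Path) -> str:
    """Copy source to target and verify the copy (hypothetical helper).

    Returns the checksum so it can be recorded and re-checked at the
    next migration. Raises IOError if the copy does not match.
    """
    before = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copies contents and file metadata
    after = sha256_of(target)
    if before != after:
        raise IOError(f"checksum mismatch migrating {source} -> {target}")
    return after
```

Emulation, the other half, is not covered by this sketch: a bit-exact copy of a troff file is still unreadable without something that can interpret it.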