Database Theory: Back to the Future
Victor Vianu
UC San Diego / INRIA

Predicting the future: a daunting task
• The Oracle at Delphi
• Arthur C. Clarke, 1964

Prediction in CS: an inglorious history
• T.J. Watson, 1943: "I think there is a world market for maybe five computers."
• Ken Olsen, 1977: "There is no reason anyone would want a computer in their home."
• Herbert A. Simon, 1965: "By 1985 machines will be capable of doing any work Man can do."

History of Database Futurology
Laguna, Lagunita I and II, Asilomar, NSF Workshops, etc.
• Prediction: what will happen
• Prescription: what researchers should work on

Laguna report (1988)
Prediction:
• Top 3 future applications: CASE, CAD, image and spatial data
• Main issue of debate: object-oriented databases
Completely missed the Web revolution around the corner!
Prescription: should not work on dependency theory, recursive queries, and new data models
"The overwhelming sentiment of the majority of participants is that they did not want to see any more papers on recursive queries. An analogy was drawn to dependency theory which was explored at length a few years ago (…) There was no support for any more data models. The problem of data translation in a heterogeneous computing environment was raised by one participant. This area was discussed and most people believed it to be a solved research problem."

Niels Bohr: "Prediction is very difficult, especially about the future."

Why is prediction so hard?
• Difficult to predict new applications
• Even harder to predict what technical issues they will raise
Think of XML, data integration, data exchange…

DB Theory Futurology
Christos Papadimitriou, PODS 1995: "Database Metatheory: Asking the Big Queries"
• retrospective of db theory
• reflections on theory, dynamics of the field:
    good vs. successful
    positive vs. negative results
    relationship to practice
    paradigm shifts and scientific revolutions in CS

Future = evolution (extrapolated) + revolution (???)
Database research topics: 1981-1995
[Figure: chart from Papadimitriou 1995]

Database research topics: 1996-2011
[Figure: chart by year, 1997 to 2011 (vertical axis marked 5, 10, 15). Series: Optimization, indexing; Relational theory; Semistructured, XML; Constraint, spatial db; Probabilistic db]

Database research topics: 1996-2011
[Figure: chart by year, 1997 to 2011 (vertical axis marked 5, 10, 15). Series: Data integration/exchange; Streams; Data mining; Security, privacy]

Others (1996-2011)
• Recursive queries: 13
• Workflows and web services: 12
• Search and ranking: 11
• Transactions: 8
• Provenance: 4

Can this be extrapolated into the future? No!
Clustering by "surface" topic: primary motivation, closely tied to timely applications
Alternative: look for "persistent" topics: recurring as fundamental conceptual and technical tools under the wraps

Example: dependency theory
• started out as a surface topic: relational dbs need dependencies!
• but became persistent: a crucial tool in data exchange, data integration, query optimization, data cleaning… with proof techniques used in all areas

Surface topic: Data integration/exchange
Persistent topics: Incomplete information, Datalog, Views, Dependency theory

Conjecture: past persistency is a predictor of future persistency!
Some persistent themes
• query languages
• updates
• dependency theory
• recursive queries
• views and incomplete information
• connections to broader theory: logic, complexity, automata theory

Query languages
For any data model:
• query language design, semantics
• expressiveness and complexity
• static analysis and optimization
Open question: language for PTIME

Updates
For any data model:
• semantics of updates
• update languages
• incremental computation: view maintenance, constraint checking, etc.

Views and incomplete information
Central as surface topics, inseparable tandem
Everywhere under the wraps:
• data integration: LAV, GAV, certain answers
• data exchange
• query optimization
• data cleaning and repair
• uncertain data
• privacy

Future = evolution (extrapolated persistent themes) + revolution (???)

The revolutionary side: wide open, pregnant with opportunity!
"the future is data + communication"
"computer users will spend their time extracting information from multiple data sources"
"we need to understand the new kinds of data arising from the web"
"we need to study data streams, large data collections"
J. Hopcroft, "Computer science theory to support research in the information age"

Exciting times for PODS!
Computer science is becoming increasingly centered around data, information, knowledge

Challenge: Data has many suitors!
• Mainstream databases
• Knowledge discovery / data mining
• Information retrieval
• Semantic web, searching, ranking
• High-dimensional data analysis
• Networks, distributed computing
• Cloud computing
• Scientific computing, data visualization
Increasingly pervasive: probability and statistics
PODS will have to reinvent its identity!

Reliable prediction: PODS will remain a good place to be!
• exciting crossroads between hot applications and beautiful theory
• researchers inspired by applications while maintaining a long-term perspective
Will continue to have fun!

Best reason for optimism: the human factor
• The field started by attracting first-rate theoreticians from other areas in search of trailblazing excitement
• It continues to attract incredibly talented young researchers