Lecture01-past_now_future

Topics in Database Systems
History, Present and Future of the Field of
Databases
Panos Vassiliadis
pvassil@cs.uoi.gr
September 2003
www.cs.uoi.gr/~pvassil/courses/readings/
Topics
Yesterday
Today
Tomorrow
Some of these slides come from Prof. Timos Sellis’ course;
many thanks!
Topics
Yesterday
Today
Tomorrow
History of the field of databases
Late 60's: network (CODASYL) & hierarchical (IMS)
DBMS.
Low-level “record-at-a-time” DML, i.e. physical data structures
reflected in DML (no data independence)
1970: Codd's paper -- the relational model. The most
influential paper in DB research.
Set-at-a-time DML. Data independence. Allows for schema and
physical storage structures to change under the covers. Truly
important theory, led to "paradigm shift" in thinking and in
practice.
Papadimitriou: "as clear a paradigm shift as we can hope to find in
computer science".
Codd received the Turing Award (1981) for this work.
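To make the contrast between record-at-a-time and set-at-a-time DML concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names and data are hypothetical and chosen only for illustration; the loop stands in for navigational access and is not actual CODASYL or IMS code.

```python
import sqlite3

# Hypothetical toy data: (name, dept, salary).
EMPLOYEES = [
    ("Ann", "R&D", 120),
    ("Bob", "Sales", 90),
    ("Cleo", "R&D", 100),
]


def total_record_at_a_time(records, dept):
    """Walk the stored records one by one.

    The code depends on the physical layout: tuple positions and
    storage order are baked into the program.
    """
    total = 0
    for record in records:
        if record[1] == dept:
            total += record[2]
    return total


def make_db(rows):
    """Load the same data into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return conn


def total_set_at_a_time(conn, dept):
    """State what is wanted; the DBMS decides how to fetch it."""
    row = conn.execute(
        "SELECT COALESCE(SUM(salary), 0) FROM employee WHERE dept = ?",
        (dept,),
    ).fetchone()
    return row[0]


conn = make_db(EMPLOYEES)
before = total_set_at_a_time(conn, "R&D")
# Changing the physical design (adding an index) requires no change
# to the query text: this is data independence in miniature.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after = total_set_at_a_time(conn, "R&D")
```

The query returns the same answer before and after the index is created, whereas the loop would have to be rewritten if the record layout changed.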
History of the field of databases
early-to-mid-70's
raging debate between the two camps.
"great debate" in 1975
mid 70's: 2 full-function (sort of) prototypes
Ingres
System R
Ancestors of essentially all today's commercial systems
History of the field of databases
Ingres: UCB 1974-77
a “pickup team”, including Stonebraker & Wong. Early and
pioneering. Led to Ingres Corp (CA), Sybase, MS SQL Server,
Britton-Lee, Wang's PACE.
System R: IBM San Jose (now Almaden)
15 PhDs. Led to IBM's SQL/DS & DB2, Oracle, HP's Allbase,
Tandem's Non-Stop SQL. System R arguably got more stuff
“right”.
Both were viable starting points, proved practicality of
relational approach. Beautiful example of theory ->
practice!!
History of the field of databases
early 80's
commercialization of relational systems
mid 80's
SQL becomes “intergalactic standard”.
DB2 becomes IBM's flagship product.
IMS “sunsetted”
History of the field of databases
90’s: the age of maturity
network & hierarchical essentially dead (though
commonly in use!)
relational becomes mainstream
improvements in terms of transactional facilities,
performance and stability
Scale, scale, scale…
Scale, scale, scale…
EOSDIS*: 1 Tb/day, keep it all for 15 years (they need
tertiary storage for that)
*NASA’s Earth Observing System Data and Information System
WalMart: 365-node system, 6 Tb online, 4-billion-row table,
200 million updates daily, 4000 queries/day, 1500
users/week, 4 min DS response time w/ avg. 60000 rows
Databases make the world go round, mainly due to their
ability to handle HUGE amounts of data, RELIABLY!!!
Large scale is our business…
History of the field of databases
Late 90’s: object relational & the web
SQL-1999 & early implementations
support for ADT’s
RDBMS’s as back-end for internet front-ends
Application Servers and middleware
Topics
Yesterday
Today
Tomorrow
VLDB 2003
The International Conference on Very Large DataBases
(VLDB) is the top database conference. The 29th VLDB
conference was held in Berlin, Germany in Sept. 2003.
To accommodate the wide spectrum of papers, VLDB
2003 was organized into three tracks:
Core Database System Technology
Infrastructure for Information Systems
Industrial Applications & Experience
http://www.vldb.informatik.hu-berlin.de/
VLDB 2003 – from the CfP
“The Core Database Technology PC will evaluate papers
that report on technology that is meant to be incorporated
in the database system itself. This includes database
engine functions, such as query languages, data models,
query processing, views, integrity constraints, triggers,
access methods, and transactions in centralized,
distributed, replicated, parallel, mobile, and wireless
environments.
It also includes extended data types, such as multimedia,
spatial and temporal data, and system engineering issues,
such as performance, high availability, security,
manageability, and ease-of-use. Papers on all aspects of
active and object databases, storage technology, and data
management system architecture should be submitted to
the Core Database Technology PC.”
VLDB 2003 – from the CfP
“The PC covering Infrastructure for Information Systems
will evaluate papers that report on methods, issues, and
problems faced during the design, development and
deployment of innovative solutions for information
management.
Examples include workflows, advanced transaction
processing features, application servers, object monitors,
services in support of E-commerce, mediators and other
web-oriented data facilities, metadata repositories, data and
process modeling, web services, user interfaces and data
visualization, data translation and migration, data cleaning,
multi-agent systems, and system management.”
VLDB 2003 – from the CfP
“The PC on Industrial Applications & Experience solicits
submissions covering innovative commercial database
implementations, novel applications of database
technology, and experience in applying recent research
advances to practical situations. The track is VLDB's way
to foster the exchange of ideas and solutions between
research and industry. Application areas include those of
Bioinformatics/Life Science, Engineering, Mobile
Systems, Enterprise Resource Planning (ERP), and other
areas all of which pose technical challenges to the field of
data management.”
VLDB 2003
Submissions by track:
Core            249
Infrastructure  162
Industrial       46
Grand total     457
Accepted: 84 (70 research, 1:6)
The field is flourishing …
getting your paper accepted is hard (nice excuse)!!
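The "1:6" figure can be checked with a few lines of arithmetic. The sketch below assumes that "research" papers are those submitted to the Core and Infrastructure tracks; the slide does not state this, so treat that grouping as a guess.

```python
# Figures taken from the slide above.
submissions = {"Core": 249, "Infrastructure": 162, "Industrial": 46}
accepted_total = 84
accepted_research = 70

total = sum(submissions.values())

# Assumption (not stated on the slide): research = Core + Infrastructure.
research_submissions = submissions["Core"] + submissions["Infrastructure"]

overall_rate = accepted_total / total
research_rate = accepted_research / research_submissions
submissions_per_accepted_research = research_submissions / accepted_research

print(f"total submissions: {total}")
print(f"overall acceptance rate: {overall_rate:.1%}")
print(f"research acceptance rate: {research_rate:.1%}")
print(f"research submissions per accepted paper: "
      f"{submissions_per_accepted_research:.1f}")
```

Under that assumption, roughly one in six research submissions was accepted, which matches the ratio quoted on the slide.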
VLDB 2003
(98) Optimization and Performance
(84) Advanced Search, Query, and Approximation
(70) Semi-structured Data, XML
(64) Internet and WWW Databases / Query Systems
(63) Access Methods
(44) Data Mining and Knowledge Discovery
(32) Infrastructure Challenges and Opportunities
(30) Databases and database services: Internet and the WWW
(30) Novel / Advanced Database Applications
(29) Data Integration / Federation / Mediation
(29) Information Retrieval with Database Systems
(29) Middleware Data Architectures
(29) Special Purpose DB Techn.: Multidimensional Databases
… miscellaneous other topics …
Topics
Yesterday
Today
Tomorrow
The Lowell report -- 2003
Senior database researchers gather every few years to
assess the state of database research and to recommend
problems and problem areas that deserve additional focus.
The previous meetings were held in Laguna Beach, Ca. in
1989, in Palo Alto, Ca. (Lagunitas) in 1990, in Palo Alto,
Ca. (Lagunitas II) in 1995, and at Asilomar, Ca. in 1998.
The sixth ad-hoc meeting was held May 4-6, 2003 in
Lowell, Mass., USA.
http://research.microsoft.com/~Gray/Lowell/
Issues for future research
(data)Bases for everything
Information Fusion
Multimedia Querying
Uncertain data & Personalization
Data Mining
Privacy & Trustworthy Systems
New User Interfaces
100 year storage
… no more data bases …
…, it is time to stop grafting new constructs onto the
traditional architecture of the past. Instead, we should
rethink basic DBMS architecture with an eye toward
supporting:
Structured data
Text, space, time, image, and multimedia data
Procedural data, that is data types and the methods that encapsulate
them
Triggers
Data Streams and queues
as co-equal first class components within the DBMS
architecture (both its interface and its implementation)
rather than as afterthoughts grafted on a relational core.
The participants were adamant that one should start with a
clean sheet of paper.
Issues for future research
Information Fusion: Therefore, one must perform
information integration on-the-fly over perhaps millions of
information sources. … the thorny problem of semantic
heterogeneity remains …
Multimedia Querying: … to create easy ways to analyze,
summarize, search, and view the “electronic shoebox” of a
person’s multimedia information.
Uncertain data: …query processing must move from a
deterministic model, where there is an exact answer for
every query, to a stochastic one, where the query processor
performs evidence accumulation to get a better and better
answer to a user query.
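One possible reading of "evidence accumulation" is an estimate that is refined as more rows are processed, in the spirit of online aggregation. The sketch below is an illustration under that assumption, not the method the report proposes: it scans the data in random order and reports a running mean with an approximate 95% half-width (normal approximation, no finite-population correction). The function name and parameters are hypothetical.

```python
import math
import random


def progressive_mean(values, batch_size, seed=0):
    """Yield (rows_seen, estimate, half_width) after each batch.

    Rows are visited in random order so that every prefix is a random
    sample; the mean and variance are updated incrementally
    (Welford's method).
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)

    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for n, x in enumerate(order, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch_size == 0 or n == len(order):
            variance = m2 / (n - 1) if n > 1 else 0.0
            half_width = 1.96 * math.sqrt(variance / n)
            yield n, mean, half_width


# Hypothetical data: the integers 1..1000, whose true mean is 500.5.
steps = list(progressive_mean(range(1, 1001), batch_size=100))
for rows_seen, estimate, half_width in steps:
    print(f"{rows_seen:5d} rows: {estimate:8.2f} +/- {half_width:.2f}")
```

A user could stop the query as soon as the interval is narrow enough, instead of waiting for the exact answer.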
Issues for future research
Data mining: users … wish for tools that generate some
“pearls of wisdom”.
A challenge for data mining research is to develop
algorithms and structures for sifting through the databases
looking for such pearls, while running in background and
consuming excess system resources.
Another important challenge is to integrate data mining
with database querying, optimization, and other facilities
such as triggers.
Issues for future research
Privacy: our community can work on security systems that
include a component dealing with the prospective use to
which the data will be put. Access decisions should be
based not only on who is requesting the data but also on
what use it will be put to.
New User Interfaces: There is a crying need for better
ideas in this area.
PV: Major Issue!!!
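Returning to the privacy point above: an access check that considers the intended use as well as the requester might look like the following sketch. The roles, purposes, column names and policy table are all hypothetical; the report does not prescribe any particular mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_role: str  # who is asking
    purpose: str    # what the data will be used for
    column: str     # what is being requested


# Hypothetical policy: (role, column) -> purposes that are permitted.
POLICY = {
    ("physician", "diagnosis"): {"treatment"},
    ("analyst", "diagnosis"): {"aggregate_research"},
    ("analyst", "zip_code"): {"aggregate_research", "marketing"},
}


def is_allowed(request, policy=POLICY):
    """Grant access only if the role may read the column for this purpose."""
    allowed_purposes = policy.get((request.user_role, request.column), set())
    return request.purpose in allowed_purposes


# Same user, same column, different purpose: different decision.
research = Request("analyst", "aggregate_research", "diagnosis")
marketing = Request("analyst", "marketing", "diagnosis")
print(is_allowed(research), is_allowed(marketing))
```

A purely identity-based check would treat the two requests identically; here the declared purpose changes the outcome.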
Issues for future research
100 year storage: even archived information is
disappearing, because it was captured on a medium that is
deteriorating (e.g. photographic film or magnetic tape) or
because it was captured on a medium that requires obsolete
devices (e.g. special storage drives), or because the
application that is needed to interpret the information no
longer works (e.g. troff).
[we need] mechanisms for migration, to copy information
from deteriorating or obsolete media, and for emulation, to
capture methods that can interpret information that is
stored for long periods (e.g. troff renderer)