Lecture 1: Introduction to databases

Dr. M. A. Rouf
Professor
Dept. of Computer Science and Engineering
Dhaka University of Engineering and Technology (DUET), Gazipur-1700
Bangladesh
Email: marouf.cse@duet.ac.bd, rouf7606@gmail.com
Cell phone: 01711-780541

Dr. M. A. Rouf, Dept. of CSE, DUET

Database Prehistory
• Data entry
• Query processing
• Storage and retrieval
• Sorting

Our Hero --- E. F. Codd
Edgar F. "Ted" Codd (August 23, 1923 - April 18, 2003) was a British computer scientist who invented relational databases while working for IBM. He was born in Portland, Dorset, and studied maths and chemistry at Oxford. He was a pilot in the Royal Air Force during WWII. In 1948 he joined IBM in New York as a mathematical programmer. He fled the USA to Canada during the McCarthy period. Later, he returned to the USA to earn a doctorate in CS from the University of Michigan in Ann Arbor. He then joined IBM Research in San Jose. His 1970 paper “A Relational Model of Data for Large Shared Data Banks” changed everything. In the early 1990s he coined the term OLAP.

Database Management Systems (DBMSs)
[Figure: three layers, top to bottom: Your Applications Go Here; DBMS; Raw Resources (bare metal)]
Database abstractions allow the interface between applications and the DBMS to be cleanly defined, so applications and data management systems can be implemented separately.

Today, Database Systems are Ubiquitous
[Figure: database system design from the European Bioinformatics Institute (Hinxton, UK). Labels: Service tools, Database design, Submission tools, Submitters, Development DB, Production DB, Service DB, End users, Other archives, Q/C etc., Add value (computation), Add value (review etc.), Releases & updates]

What is a database system?
• A database is a large, integrated collection of data
• A database contains a model of something!
• A database management system (DBMS) is a software system designed to store, manage and facilitate access to the database

What does a database system do?
• Manages Very Large Amounts of Data
• Supports efficient access to Very Large Amounts of Data
• Supports concurrent access to Very Large Amounts of Data
• Supports secure, atomic access to Very Large Amounts of Data

File System Vs DBMS
• A company has 500 GB of data
  – Employee info
  – Departments
  – Sales
  – Products
  – Raw materials
  – Shipment
  – Accounts
• A 32-bit machine can address up to 4 GB of main memory
  – How can we query this 500 GB of data?
  – We must protect data from inconsistent updates.
  – We must ensure that data is restored to a consistent state if the system crashes.
  – We must protect data from unauthorized viewing and updating.

Databases are a Rich Area for Computer Science
• Programming languages and software engineering (obviously)
• Data structures and algorithms (obviously)
• Logic, discrete maths, computation theory
  – Some of today’s most beautiful theoretical results are in “finite model theory” --- an area derived directly from database theory
• Systems problems: concurrency, operating systems, file organisation, networks, distributed systems…
Many of the concepts covered in this course are “classical” --- they form the heart of the subject. But the field of databases is still evolving and producing new and interesting research (hinted at in lectures 11 & 12).

What this course is about
• According to Ullman, there are three aspects to studying databases:
  1. Modelling and design of databases
  2. Programming
  3. DBMS implementation
• This course addresses 1 and 2

Course Outline
Lecture  Title                                      Taken By
1        Introduction to database
2        Entity-relationship model
3        The relational model
4        Relational algebra
5        Relational calculus
6        Schema refinement: functional dependencies
7        Schema refinement: normalization
8        Online analytical processing
9        Basic SQL and integrity constraint
10       Transactions, recovery, concurrency
11       Database storage, indexes, query execution

Recommended Reading
• Raghu Ramakrishnan, Johannes Gehrke, “Database Management Systems”
• Elmasri & Navathe, “Fundamentals of database systems”, 4th ed.
• Silberschatz, Korth & Sudarshan, “Database system concepts”, 4th ed. (Text Book)
• Ullman & Widom, “A first course in database systems”.
• Date, “An introduction to database systems”, 8th ed.
• OLAP
  – DB2/400: Mastering Data Warehousing Functions. (IBM Redbook) Chapters 1 & 2 only. http://www.redbooks.ibm.com/abstracts/sg245184.html
  – Data Warehousing and OLAP, Hector Garcia-Molina (Stanford University) http://www.cs.uh.edu/~ceick/6340/dw-olap.ppt
  – Data Warehousing and OLAP Technology for Data Mining, Department of Computing, London Metropolitan University http://learning.unl.ac.uk/csp002n/CSP002N_wk2.ppt

Some systems to play with
1. mysql:
  • www.mysql.org
  • Open source, quite powerful
2. PostgreSQL:
  • www.postgresql.org
  • Open source, powerful
3. Microsoft Access:
  • Simple system, lots of nice GUI wrappers
4. Commercial systems:
  • Oracle 10g (www.oracle.com)
  • SQL Server 2000 (www.microsoft.com/sql)
  • DB2 (www.ibm.com/db2)

Database system architecture
• It is common to describe databases in two ways
  – The logical level:
    • What users see: the program or query language interface, which describes the stored data in terms of the company’s data model.
  – The physical level:
    • How files are organised and what indexing mechanisms are used.
• It is traditional to split the logical level into two: the overall database design (conceptual) and the views that various users get to see
• A schema is a description of a database

Three-level architecture
[Figure: External level: External Schema 1, External Schema 2, …, External Schema n; Conceptual level: Conceptual Schema; Physical level: Internal Schema]
• Physical level: describes the physical storage structure.
• Conceptual level: describes the structure of the database for the company’s users.
• External level: describes the views that individual users or user groups see.

Logical and physical data independence
• Data independence is the ability to change the schema at one level of the database system without changing the schema at the next higher level
• Logical data independence is the capacity to change the conceptual schema without changing the user views
• Physical data independence is the capacity to change the internal schema without having to change the conceptual schema or user views

Database design process
• Requirements analysis
  – User needs; what must the database do?
• Conceptual design (next lecture)
  – High-level description; often using the E/R model
• Logical design
  – Translate the E/R model into a (typically) relational schema
• Schema refinement
  – Check the schema for redundancies and anomalies
• Physical design/tuning
  – Consider typical workloads, and further optimise

The Fundamental Tradeoff of Database Performance Tuning
• De-normalized data can often result in faster query response
• Normalized data leads to better transaction throughput, and avoids “update anomalies” (corruption of data integrity)
Yes, indexing data can speed up some transactions, but this just proves the point --- an index IS redundant data. General rule of thumb: indexing will slow down transactions, because every update must also maintain the index!
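
The point that an index is redundant data can be illustrated with a minimal sketch. It assumes SQLite through Python's standard sqlite3 module, which the lecture does not prescribe; the employee table, its columns, and the index name idx_employee_dept are hypothetical and chosen only for illustration. The query returns the same rows with or without the index, so the index holds no new information, only a second copy of the dept values arranged for fast lookup.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [(f"emp{i}", f"dept{i % 10}") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's query plan as one string (column 3 is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)

query = "SELECT name FROM employee WHERE dept = 'dept3'"

# Without an index the DBMS has to scan the whole table.
plan_before = plan(query)
rows_before = sorted(conn.execute(query).fetchall())

# The index is a separate structure that duplicates the dept values.
# Every later INSERT, UPDATE or DELETE on employee must maintain it too.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_after = plan(query)
rows_after = sorted(conn.execute(query).fetchall())

# The index appears in the catalog as its own object, alongside the table.
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]

print(plan_before)
print(plan_after)
print(rows_before == rows_after, index_names)
```

Comparing the two printed plans shows whether the query switched from scanning the table to searching the index; the answer itself does not change, which is the sense in which the index is redundant.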
What is more important in your database --- query response or transaction throughput? The answer will vary. What do the extreme ends of the spectrum look like?

A Theme of this Course: OLTP vs. OLAP
• OLTP = Online Transaction Processing
  – Need to support many concurrent transactions (updates and queries)
  – Normally associated with the “operational database” that supports day-to-day activities of an organization.
• OLAP = Online Analytical Processing
  – Often based on data extracted from the operational database, as well as other sources
  – Used in long-term analysis, business trends.
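
The contrast between the two workloads can be sketched in a few lines. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module, and the product and sale tables, the record_sale helper, and the sample data are all hypothetical. A real OLAP system would normally run on data extracted into a separate warehouse rather than on the operational tables used here.

```python
import sqlite3

# Hypothetical operational database with two small tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, product_id INTEGER,
                       qty INTEGER, month TEXT);
    INSERT INTO product VALUES (1, 'widget', 100), (2, 'gadget', 50);
""")

def record_sale(product_id, qty, month):
    """OLTP-style work: one short transaction that touches a few rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE product SET stock = stock - ? WHERE id = ?",
                     (qty, product_id))
        conn.execute("INSERT INTO sale (product_id, qty, month) VALUES (?, ?, ?)",
                     (product_id, qty, month))

record_sale(1, 3, "2024-01")
record_sale(2, 5, "2024-01")
record_sale(1, 2, "2024-02")

# OLAP-style work: one read-only query that aggregates over all sales.
totals = conn.execute("""
    SELECT p.name, s.month, SUM(s.qty)
    FROM sale AS s JOIN product AS p ON p.id = s.product_id
    GROUP BY p.name, s.month
    ORDER BY p.name, s.month
""").fetchall()

widget_stock = conn.execute(
    "SELECT stock FROM product WHERE id = 1").fetchone()[0]
print(totals, widget_stock)
```

Each record_sale call is small and must be atomic, so many of them can run concurrently; the aggregate query reads every sale row and is the kind of long-running analysis that OLAP systems are designed for.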