Introduction to Database Design Donghui Zhang CCIS, Northeastern University Outline Database and DBMS Architecture of Database Applications Database Design Database Application Programming Database, DBMS A Database is a very large, integrated collection of data. A Database Management System (DBMS) is a software designed to store and manage databases. A Database Application is a software which enables the users to access the database. Why DBMS? We currently live in a world experiencing information explosion. To manage the huge amount of data: DBMS the total RDBMS market in 2003 was $7 billion in license revenues. Much more money was spent to develop Database applications. 3000 2500 2000 1500 1000 500 0 Ot he rs Mi cr os of t NC R Te ra da ta Or ac le 2002 2003 IB M #million dollars RDBMS New Liscence Revenue Total revenue: 7.1 billion in 2003. The worldwide database management software market saw double-digit growth in 2004. The five-year forecast calls for a compound annual growth rate of nearly 6 percent, bringing the market to $12.7 billion in new license revenue by 2009. Title: Forecast: Database Management Systems Software, Worldwide, 2003-2009 Author: Colleen Graham, Gartner Time: April 21, 2005 DBMS can Provide … Data independence and efficient access. Reduced application development time. Data integrity and security. Uniform data administration. Concurrent access, recovery from crashes. DBMS Historic Points First DBMS developed by Turing Award winner Charles Bachman in the early 1960s. in 1970, Turing Award winner Edgar Codd proposed the relational data model. in the late 1980s, IBM proposed SQL. Outline Database and DBMS Architecture of Database Applications Database Design Database Application Programming Components of Data-Intensive Systems Three separate types of functionality: Data management Application logic Presentation Example: Course Enrollment -- Build a system using which students can enroll in courses: Data Management • Student info, course info, instructor info, course availability, pre-requisites, etc. Application Logic • Logic to add a course, drop a course, create a new course, etc. Presentation • Log in different users (students, staff, faculty), display forms and human-readable output The Three-Tier Architecture Presentation tier Middle tier Data management tier Client Program (Web Browser) Application Server Database System E.g. What we use Presentation tier Middle tier Data management tier Client Program (Web Browser) Application Server Apache JSP Database System MySQL HTML: An Example <HTML> <HEAD></HEAD> <BODY> <h1>Barns and Nobble Internet Bookstore</h1> Our inventory: <h3>Science</h3> <b>The Character of Physical Law</b> <UL> <LI>Author: Richard Feynman</LI> <LI>Published 1980</LI> <LI>Hardcover</LI> </UL> <h3>Fiction</h3> <b>Waiting for the Mahatma</b> <UL> <LI>Author: R.K. Narayan</LI> <LI>Published 1981</LI> </UL> <b>The English Teacher</b> <UL> <LI>Author: R.K. Narayan</LI> <LI>Published 1980</LI> <LI>Paperback</LI> </UL> </BODY> </HTML> HTML: static vs dynamic Static: you create an HTML file which is sent to the client’s web browser upon request. E.g.: • your CCIS login is ‘donghui’, • your HTML file is /home/donghui/.www/index.html • The URL is http://www.ccs.neu.edu/home/donghui Dynamic: the HTML file is generated dynamically via your ASP.NET code. Another View Client Machines Machine 1 Your database MySQL Machine 2 Your JSP Code Apache Client browser 1 Client browser 2 Client browser 3 Client-Server Architecture Server Client Data Management: DBMS @ Server. Presentation: Client program. Application Logic: can go either way. • If combined with server: thin-client architecture • If combined with client: thick-client architecture Thin-Client Architecture Client Server Client Client • Database server and web server too closely coupled, • E.g. Does not allow the application logic to access multiple databases on different servers. Thick-Client Architecture Client Server Client Client • No central place to update the business logic • Security issues: Server needs to trust clients • Does not scale to more than several 100s of clients Advantages of the Three-Tier Architecture Heterogeneous systems • Tiers can be independently maintained, modified, and replaced Thin clients • Only presentation layer at clients (web browsers) Integrated data access • Several database systems can be handled transparently at the middle tier • Central management of connections Scalability • Replication at middle tier permits scalability of business logic Software development • Code for business logic is centralized • Interaction between tiers through well-defined APIs: Can reuse standard components at each tier Outline Database and DBMS Architecture of Database Applications Database Design Database Application Programming ER-Model Entity: Real-world object distinguishable from other objects. E.g. Students, Courses. An entity has multiple attributes. E.g. Students have ssn, name, phone. Entities have relationships with each other. E.g. Students enroll Courses. Example of ER Diagram time name ssn title phone Students unit cid Enroll Courses To implement the above design, store three tables in the database. Students ssn name Enroll phone 1111 John 617-373-5120 2222 Alice 781-322-6084 3333 Victor 617-442-7798 ssn Courses cid title unit CSU430 Database Design 4 CSG131 Transaction Processing 4 CSG339 Data Mining 4 cid time 1111 CSU430 Fall’03 1111 CSG339 Spring’04 2222 CSG131 Winter’03 2222 CSG339 Spring’04 3333 CSU430 Winter’01 Key Constraint in ER Diagram name ssn dname phone Students did BelongsTo address Departments Many-to-one relationship: no need to be implemented as a table! Students ssn name phone did 1111 John 617-373-5120 1 2222 Alice 781-322-6084 1 3333 Victor 617-442-7798 3 Departments did dname address 1 Computer Science #161 Cullinane 2 Electrical Engineering #300 Egan 3 Physics #112 Richard Some Other Design Concepts Primary key Participation constraint Normal forms (BCNF, 3-NF, etc.) IS-A hierarchy Ternary relationships Outline Database and DBMS Architecture of Database Applications Database Design Database Application Programming SQL Query Find the students in Computer Science Department . • if we know the did is 1: SELECT S.name FROM Students S WHERE S.did=1 • otherwise: SELECT S.name FROM Students S, Departments D WHERE D.did=S.did AND D.dname=`Computer Science’ SQL in Application Code SQL commands can be called from within a host language (e.g., C++, Java) program. Two main integration approaches: • Embed SQL in the host language (Embedded SQL, SQLJ) • Create special API to call SQL commands (JDBC) Implementation of Database System Introduction Donghui Zhang Partially using Prof. Hector Garcia-Molina’s slides (Notes01) http://www-db.stanford.edu/~ullman/dscb.html 32 Isn’t Implementing a Database System Simple? Relations Statements Results 33 Introducing the Database Management System • The latest from Megatron Labs • Incorporates latest relational technolo • UNIX compatible 34 Megatron 3000 Implementation Details Relations stored in files (ASCII) e.g., relation R is in /usr/db/R Smith # 123 # CS Jones # 522 # EE . . . 35 Megatron 3000 Implementation Details Directory file (ASCII) in /usr/db/directory R1 # A # INT # B # STR … R2 # C # STR # A # INT … . . . 36 Megatron 3000 Sample Sessions % MEGATRON3000 Welcome to MEGATRON 3000! & . . . & quit % 37 Megatron 3000 Sample Sessions & select * from R # Relation R A B SMITH 123 C CS & 38 Megatron 3000 Sample Sessions & select A,B from R,S where R.A = S.A and S.C > 100 # A 123 522 B CAR CAT & 39 Megatron 3000 To execute “select * from R where condition”: (1) Read directory file to get R attributes (2) Read R file, for each line: (a) Check condition (b) If OK, display 40 Megatron 3000 To execute “select A,B from R,S where condition”: (1) Read dictionary to get R,S attributes (2) Read R file, for each line: (a) Read S file, for each line: (i) Create join tuple (ii) Check condition (iii) Display if OK 41 What’s wrong with the Megatron 3000 DBMS? Expensive update and search e.g., - To locate an employee with a given SSN, file scan. - To change “Cat” to “Cats”, complete file write. • Solution: Indexing! 42 What’s wrong with the Megatron 3000 DBMS? Brute force query processing e.g., select * from R,S where R.A = S.A and S.B > 1000 - Do select first? - More efficient join? • Solution: Query optimization! 43 What’s wrong with the Megatron 3000 DBMS? No concurrency control or reliability e.g., - if two client programs read your bank balance ($5000) and add $1000 to it… - Crash. • Solution: Transaction management! 44