Akram Jalal Ahlia University MIS College of Business and Finance 1 COURSE IDENTIFICATION LECTURES: M 16:30 – 18:10 - Class 6. Lab: W 16:30 – 18:10 - Lab 1. INSTRUCTOR: Dr. Akram Jalal Office: Email: akarim@ahlia.edu.bh Phone: Office Hours: Tuesdays, 10am to 11:30am Walk-ins are welcome any time based on availability Dr. Akram Jalal 2 ASSIGNMENTS: Submissions will be in class. Late submissions will be penalized by at the rate of 10% per day. Assignments later than three days will not be accepted. Assignment should be written using computer tool. Hand writing home works are not accepted. All students should achieve their own work according to the Ahlia University rules, which means you must write up your favourite solution independently. Exams: The midterm exam will be given in-class on ----. This exam will covers material and chapters (1, 2 and 5). The date of final exam is based on the final exam timetable. 3 GRADING: Attendance: 10% Assignment: 20%, Midterm: 30%, Final: 40%. Consistent class attendance is required absences will seriously affect your total grade. 4 by Raghu Ramakrishnan & Johannes Gehrke ◦ 3rd edition Available: Next week in the library The readings of this book are required, even if material is not covered in the lectures. All other readings are optional. 5 Course Structure (Outline) Week Hours 1 Lecture 3 Lab --- 2 3 --- ILOs Topics Teaching Method Assessment Method A1 Course Introduction & Familiarisation LecturesClass discussions Oral Participation A1, A2, B1, B2, B3, C3, D2, D4 A1, B2, B3, C1, C3 Introductions to DBMS LecturesClass discussions Oral Participation /Group discussions LecturesClass discussionsLab Exercises LecturesClass discussionsLab Exercises LecturesClass discussionsLab Exercises Oral Participation /Group discussions Oral Participation /Group discussions Oral Participation /Group Assignment All topics weeks 2-5. -- Midterm Exam Structured Query Language (SQL) 1. Basic SQL query LecturesClass discussionsLab Exercises LecturesClass discussionsLab Exercises LecturesClass discussionsLab Exercises LecturesClass discussionsLab Exercises LecturesClass discussions Oral Participation /Group Assignment Oral Participation /Group Assignment Oral Participation /Group Assignment Oral Participation /Group Assignment Oral Participation /Group Discussions Oral Participation /Group discussions Oral Participation /Group discussions 3 1,1/2 1,1/2 4 1,1/2 1,1/2 A1, B2, B3, C1, C3 Database design 5 1,1/2 - A1, A3, B3, D3 Structured Query Language (SQL) 1. Basic SQL query 5 1,1/2 - 6 1,1/2 1,1/2 B1, C1, C3, D4, D3 A1, A3, B3, D3 7 1,1/2 1,1/2 A1, A3, B3, D3 8 1,1/2 1,1/2 Database design 1. Structured Query Language (SQL) Union Intersection and Exception 1. Structured Query Language (SQL) Union Intersection and Exception A1, A3, B3, D3 9 1,1/2 1,1/2 A1, A3, B3, D3 Structured Query Language (SQL) 1. Aggregate operators 10 3 - A1, A3, B2, B3, C1, C3 Tree Structured Indexing 11 3 - A1, A3, B2, B3, C1, C3 Tree Structured Indexing 12 3 - A2, A3, B1, B3, D2, D4 Submitting assignment 1 Transaction Management LecturesClass discussions LecturesClass discussions A large, integrated collection of data, typically describing the activities of one or more organizations. Models a real-world enterprise, e.g. a university database might contain information about: Entities such as: ◦ ◦ students, courses, instructors, Relationships One to One One to Many Many to Many A Database Management System (DBMS) is a software designed to assist in maintaining and utilizing large collection of data. DBMS also needed to store the data in files and write application-specific code to manage it. 7 Data independence. DBMS provides an abstract view of the data that hides data representation and storage. When the DBMS hides certain details of how data is stored and maintained, it provides what is called as the abstract view of data. Complexity (of data and data structure) is hidden from users through several levels of abstraction. Efficient data access. DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. Data integrity and security. If data is always accessed through DBMS, the DBMS can enforce integrity constrains. E.g.1. Before inserting salary, DBMS should check that department budget is not exceeded. E.g.2 access control. Uniform data administration. When several users share the data, centralizing the administration of data can offer the significant improvement. Concurrent access, recovery from crashes. DBMS schedules concurrent access to the data in such a manner that users can think of the data as being accessed by only one user at a time. Reduced application development time: DBMS supports important functions that are common to many applications accessing data in DBMS 1. Large data sets (100s of GBs, or TBs, …) No indices ◦ Finding “George” in huge flat file is expensive Modifications intractable without better data structures ◦ “George” “Georgie” is very expensive ◦ Deletions are very expensive 9 4. Security? ◦ ◦ File system may lack security features File system security may be rough 5. Application programming interface (API)? Interfaces, interoperability 6. How to query the data? 10 A data model is a collection of high level data description constructs. Most of DBMS today are based on relational data model. Semantic data model (such as E-R diagram) is more abstract, high level data model and served as starting point and subsequently translated into a database design. E-R diagram allows us to periodically denote entities and the relationships among them. A schema is a description of a particular collection of data, using the a given data model. e.g. Student (Sid: string, Name: String, age: Integer) The relational data model is the most widely used model today (which is used in numerous systems such as Oracle, DB2, MS Access) is based on relation (which described as a set of records) , basically a table with rows and columns. ◦ ◦ Every relation has a schema, which describes the fields (columns, or attributes). Other data models are: Hierarchical model, the network model, objectoriented model and object-relational model. Each external schema consists of one or more relations and view External schema – View1 Data in DBMS is described at three levels of abstractions. Many External echemas, single conceptual (logical) Conceptual Schema schema and physical schema. ◦ Views describe how users see the Physical Schema data. User 1 External schema – view 2 User 2 User 3 External schema view3 User 4 (one per each DB) (one per each DB) ◦ Conceptual schema defines logical structure. In RDBMS CS describes all relations that are stored in the database. Student(sid: string, Name: String, age: Integer) Faculty ( ….. ◦ Physical schema describes the files and indexes used. It summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes. External schema allow data access to be customized (and authorized). DDL (data definition Language) used to define external and conceptual schemas). Concerned with the way individual users see the data. Information about those schemas are stored in system catalogs. Start with SQL DDL to create tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ); Continue with SQL to populate tables: INSERT INTO Students VALUES('Hillary', '123456789', 'undergraduate'); 13 Takes: Students: SSN 123-45-6789 234-56-7890 Courses: CID C20.0046 C20.0056 Name Hillary Barak … Category undergrad grad … SSN 123-45-6789 CID C20.0046 123-45-6789 C20.0056 234-56-7890 C20.0046 … CName Databases Advanced Software semester Spring, 2004 Spring, 2004 Fall, 2003 Ultimately files, but complex 14 Find all the students who have taken C20.0046: SELECT SSN FROM Takes WHERE CID='C20.0046'; Find all the students who have taken C20.0046 previously: SELECT SSN FROM Takes WHERE CID='C20.0046' AND Semester='Fall, 2005'; Find the students’ names: SELECT Name FROM Students, Takes WHERE Students.SSN=Takes.SSN AND CID='C20.0046' AND Semester='Fall, 2005'; 15 Conceptual Schema: ◦ Students(ssn: string, name: string, login: string, age: int, gpa: real) ◦ Courses(cid: string, cname: string, credits: int) ◦ Enrolled(sid:string, cid:string, grade: string) Physical schema: ◦ Relations stored as unordered text files. ◦ Indices on first column of each relation Views: ◦ My_courses(cname: string, grade: string, credits: int) ◦ Course_info(ssn: string, name: string, status: string) 16 failover/rovery concurrent use deal with large datasets? security interop? querying in what? DBMS as application Q: How does a DBMS solve these problems? A: See third part of course, but for now… 17 Consider a database that holds information about airline reservations. At any time, it is possible that several travel agents are looking up information about available seats on various flights and making new seat reservations. When several users access a database concurrently, the DBMS must order their requests to avoid conflicts. ACID test properties ◦ Atomicity all or nothing ◦ Consistency constraints on relationships ◦ Isolation concurrency control simulated solipsism ◦ Durability Crash recovery 18 Key concept is transaction, which is an atomic sequence of database actions (reads/writes). Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins. Avoiding inconsistent state A DBMS prevents this outcome ◦ Xacts are all or nothing One simple idea: log progress of and plans for each xact ◦ Durability: changes stay made (with log…) ◦ Atomicity: entire xact is committed at once 20 Many users concurrent execution ◦ Disk access is slow (compared to CPU) ◦ don’t waste CPU – keep running Interweaving actions of different user programs but can lead to inconsistency: ◦ e.g., two programs simultaneously withdraw from the same account For each user, should look like a single-user system 21 Each xact (on a consistent DB) must leave it in a consistent state 22 Concurrent execution of user programs is essential for good DBMS performance. ◦ Because disk accesses are frequent, and relatively slow, it is important to keep the cpu humming by working on several user programs concurrently. Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed. DBMS ensures such problems don’t arise: users can pretend they are using a single-user system. Database design: ◦ Entity/Relationship models ◦ Modeling constraints The relational model: ◦ Relational algebra ◦ Transforming E/R models to relational schemas SQL ◦ DDL & query language 24 Why would you choose a database system instead of simply storing data in operating system files? When would it make sense not to use a database system? Explain the difference between external, internal, and conceptual schemas. How are these different schema layers related to the concepts of logical and physical data independence? Which of the following plays an important role in representing information about the real world in a database? Explain each of briefly. 1. The data definition language (DDL). 2. The data manipulation language (DML). 3. The buffer manager. 4. The data model (DM). 25