Course Introduction

Introduction

• Relational Database Systems:
    • Creating a relational database (DDL)
    • Formulating SQL queries (DML)
    • Embedded SQL
    • Relational algebra
    • Relational model, relational integrity, etc.
• Data Modeling:
    • Entity-relationship model
    • Converting an E-R schema into a relational schema
• Web-Based Database Applications:
    • PHP and MySQL
    • Drupal Content Management System (CMS)

Overview of data management

After this lecture, you should be able to:
• Understand the differences between Information and Data.
• Understand the differences between Database and Database Management System (DBMS). Why do we need them?
• Understand Data Modeling in general, Relational Model in particular.
• Complete Assignment 1.

Data vs. Information

From data.gov:
“Data are values or sets of values representing a specific concept or concepts. Data become "information" when analyzed and possibly combined with other data in order to extract meaning, and to provide context. The meaning of data can vary according to its context” (Source: Federal Enterprise Architecture Data Reference Model).

Information Management

• Modeling an enterprise, which is an application-world with
    • Entities (e.g., students, courses)
    • Relationships (e.g., Garfield is taking CS275)
• Creating a database with a database management system (DBMS).
• Manipulating and maintaining a database.

Data Encoding

• Information
• Data Modeling: Entities, Relationships, etc.
• Numbers, Strings, Records, Pointers, etc.
• Bits, Bytes, Pages, etc.
• Electronic Charges, Magnetic polarization, etc.

Database vs. DBMS

• A database is an integrated collection of data.
• A database management system (DBMS) is a software package designed to create and manage a database.
• The data stored in a database are organized according to the data model supported by the DBMS.
• A relational database, for example, stores the data in a collection of tables.
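To make "a collection of tables" concrete, here is a minimal sketch using Python's standard `sqlite3` module. It stores the entities from the Information Management slide (students, courses) and the relationship "Garfield is taking CS275" as three tables. The table names, the column names, the student id `s1`, and the course title `Databases` are illustrative assumptions, not part of the lecture.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two entity tables and one relationship table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Entities: each kind of entity gets its own table (names are hypothetical).
        CREATE TABLE students (sid TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE courses  (cid TEXT PRIMARY KEY, cname TEXT);

        -- Relationship: "student takes course" is also stored as a table.
        CREATE TABLE takes (
            sid TEXT REFERENCES students(sid),
            cid TEXT REFERENCES courses(cid),
            PRIMARY KEY (sid, cid)
        );

        -- Sample rows; 's1' and 'Databases' are made up for illustration.
        INSERT INTO students VALUES ('s1', 'Garfield');
        INSERT INTO courses  VALUES ('CS275', 'Databases');
        INSERT INTO takes    VALUES ('s1', 'CS275');
    """)
    return conn


def courses_taken_by(conn, student_name):
    """Return the ids of all courses taken by the named student, sorted."""
    rows = conn.execute(
        """
        SELECT c.cid
        FROM students s
        JOIN takes t   ON t.sid = s.sid
        JOIN courses c ON c.cid = t.cid
        WHERE s.name = ?
        ORDER BY c.cid
        """,
        (student_name,),
    ).fetchall()
    return [cid for (cid,) in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(courses_taken_by(conn, "Garfield"))
```

The join across the three tables is one way the stored data (rows) turn into information (which courses a given student is taking).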

DBMS Functions

• Data definition
• Data manipulation
• Security and data integrity
• Recovery and concurrency control
• Data dictionary
• Performance tuning

Advantages of Integrated Data Management

• Data sharing
• No (logical) redundancy of stored data
• Simple and efficient data access
• Reduced application development time
• Data integrity and security
• Concurrent access, recovery from crashes
• Uniform data administration
• Economy

Problems with Storing Data in Files

• Data stored in different files cannot be easily related.
• Accessing desired records may not be easy.
• Efficient protection against inconsistency caused by multiple concurrent users is not easy to implement.
• Effective crash recovery is not supported.
• Security and access control are insufficient.

Why Study Databases?

• Shift from computation to information
    • scramble to webspace
    • scientific applications
• Datasets increasing in diversity and volume.
    • Digital libraries
    • interactive video
    • Environmental protection
• DBMS encompasses most of CS
    • OS
    • Theory
    • Data Structures, Algorithms, Languages
    • Multimedia

Data Model

• A data model is a collection of concepts for describing data.
• The relational model of data is the most widely used data model today.
    • Main concept: relation, basically a table with rows and columns.
    • Relations can represent entities with attributes and associations among entities.
• A schema is a description of a particular collection of data, using a given data model.

Example: Student Relation

Student (sid: string, name: string, login: string, age: integer, gpa: real)

Levels of Abstraction

• Views (external schema) describe how users see the data.
• The conceptual schema defines the logical structure of the data.
• The physical schema describes how the data are stored on physical devices (files and indexes used).

[Figure: View 1, View 2, View 3 → Conceptual Schema → Physical Schema]

* External and Conceptual schemas are defined using DDL (Data Definition Language);
* Data is modified/queried using DML (Data Manipulation Language).

Example: University Database

• Conceptual schema:
    • Students(sid: string, name: string, login: string, age: integer, gpa: real)
    • Courses(cid: string, cname: string, credits: integer)
    • Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
    • Relations stored as unordered files.
    • Index on first column of Students.
• External Schema (View):
    • Course_info(cid: string, no_of_enrollment: integer)
    • Students_gpa_greater_than_3(sid: string, name: string, gpa: real)

Data Independence

• Applications insulated from how data is structured and stored.
• Logical data independence
    • Protection from changes in logical structure of data.
    • Old applications should work.
• Physical data independence
    • Protection from changes in physical structure of data.
    • Data should be accessible even when storage media and/or formats change.

Data Independence is one of the most important benefits of using a DBMS!
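The University Database example and the idea of physical data independence can be sketched together with Python's standard `sqlite3` module. The tables and views follow the schema on the slide. The sample rows, the index name `idx_students_sid`, the helper function names, and the exact view definitions are assumptions made for illustration. SQLite's storage layout also differs from "unordered files", so this is only an approximation of the physical schema described above.

```python
import sqlite3


def build_university_db():
    """Create the conceptual schema, two views, and a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conceptual schema (defined with DDL).
        CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
        CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
        CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

        -- External schema: views derived from the conceptual schema.
        CREATE VIEW Course_info AS
            SELECT c.cid AS cid, COUNT(e.sid) AS no_of_enrollment
            FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
            GROUP BY c.cid;
        CREATE VIEW Students_gpa_greater_than_3 AS
            SELECT sid, name, gpa FROM Students WHERE gpa > 3;

        -- Hypothetical sample data (inserted with DML).
        INSERT INTO Students VALUES ('s1', 'Ann', 'ann', 19, 3.6);
        INSERT INTO Students VALUES ('s2', 'Bob', 'bob', 20, 2.9);
        INSERT INTO Courses  VALUES ('c1', 'Databases', 4);
        INSERT INTO Courses  VALUES ('c2', 'Algorithms', 4);
        INSERT INTO Enrolled VALUES ('s1', 'c1', 'A');
        INSERT INTO Enrolled VALUES ('s2', 'c1', 'B');
    """)
    return conn


def course_info(conn):
    """Query the Course_info view, as an application would."""
    return conn.execute(
        "SELECT cid, no_of_enrollment FROM Course_info ORDER BY cid"
    ).fetchall()


def high_gpa_students(conn):
    """Query the Students_gpa_greater_than_3 view."""
    return conn.execute(
        "SELECT sid, name, gpa FROM Students_gpa_greater_than_3 ORDER BY sid"
    ).fetchall()


def add_sid_index(conn):
    """Change the physical schema only: index the first column of Students."""
    conn.execute("CREATE INDEX idx_students_sid ON Students(sid)")


if __name__ == "__main__":
    conn = build_university_db()
    before = (course_info(conn), high_gpa_students(conn))
    add_sid_index(conn)  # physical change; no query text is modified
    after = (course_info(conn), high_gpa_students(conn))
    print(before == after)
```

The application-side queries are identical before and after the index is created, which is the sense in which applications are insulated from changes in physical structure.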

Layered Architecture of a DBMS

Layers, top to bottom:
• Query Optimization and Execution
• Relational Operators
• Files and Access Methods
• Buffer Management
• Disk Space Management
• DB

Three Tier Software System Architecture

[Figure: three tiers (Presentation Tier, Logic Tier, Data Tier) with the components End users, Clients, Applications, DBMS Server, and Database]

Languages

• Host Languages
    • C, C++
    • Java, C#
    • PHP, Python
• Data Sublanguages (DSLs): DSL = DDL + DML
    • SQL
        • Data Definition Language (DDL): used to define the structure of a database
        • Data Manipulation Language (DML): used to access and manipulate data
    • CODASYL DBTG Language

DBA (Database Administrator)

• Defines the conceptual schema
• Defines the internal schema
• Talks to the users
• Defines backup and recovery procedures
• Conducts performance tuning
• Conducts security management

Summary

• A database is an integrated collection of data shared by possibly multiple applications.
• A DBMS is a general software package for creating and managing a database.
• A DBMS supports query languages, recovery from system crashes, concurrent access, quick application development, data integrity, and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs and are paid well.
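As a small illustration of the summary's points about data integrity, the sketch below uses Python's standard `sqlite3` module to declare integrity constraints and to run a batch of inserts as one all-or-nothing transaction. It shows rollback after a constraint violation, which is related to but much simpler than recovery from a system crash. The `Students` columns, the assumed GPA range of 0 to 4, the sample rows, and the helper names are hypothetical.

```python
import sqlite3


def open_db():
    """Create a Students table with integrity constraints (GPA range is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE Students (
            sid  TEXT PRIMARY KEY,                  -- no duplicate student ids
            name TEXT NOT NULL,                     -- every student has a name
            gpa  REAL CHECK (gpa BETWEEN 0 AND 4)   -- assumed 0-4 scale
        )
        """
    )
    conn.commit()
    return conn


def add_students(conn, rows):
    """Insert all rows or none of them. Returns True on success, False otherwise."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises an exception.
        with conn:
            conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


def count_students(conn):
    """Return the number of rows currently stored in Students."""
    return conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]


if __name__ == "__main__":
    conn = open_db()
    print(add_students(conn, [("s1", "Ann", 3.6)]))
    # The second row violates the CHECK constraint, so neither row is kept.
    print(add_students(conn, [("s2", "Bob", 2.9), ("s3", "Cy", 4.7)]))
    print(count_students(conn))
```

The constraints live in the schema rather than in each application, so every program that writes to the table is held to the same rules.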