WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam WFM 5201: Data Management and Statistical Analysis Lecture-07: Database Management System Akm Saiful Islam Institute of Water and Flood Management (IWFM) Bangladesh University of Engineering and Technology (BUET) June, 2008 Slide 1 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Outline Database Management System Introduction to Databases File System Vs. Databases Advantages of using databases Data Models – Hierarchical, network, relational, object oriented Overview of Relational Database Slide 2 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Introduction to Databases Information Systems process and manage data. Data Management involves “Capturing”, “Retrieval,” and “Storage” of data. Database Management Systems (DBMSs) are Computer systems that manage data in databases. Today’s DBMSs are based on sophisticated software and powerful computer hardware. Well known DBMS software includes ORACLE, Microsoft SQL Server, Sybase and MySQL(free download) among others. Slide 3 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam File Organisation Sequential Files records are stored in a fixed sequence records can only be read in that sequence, starting from the first record records can only be added at the end of the file (append) sequential files are not efficient Indexed Files Use an index to access records in a random fashion. Records can be sorted according to an attribute or preference. (e.g Alphabetically, Ascending, Descending, etc.) Indexed files are efficient, and faster to access. Slide 4 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam The File Systems Approach General Ledger Personnel File File Production Planning Payroll File File Invoicing Inventory File File Despatch Order Entry File File Redundant Data Storage. One file is used in each application. No data sharing. Cross-application transfers are difficult to manage and achieve. File Systems are rarely used for data processing anymore. Slide 5 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam The Database Approach General Ledger Production Planning Personnel Payroll Invoicing Despatch Inventory Order Entry Compactness. Data is stored in a single logical “place.” Data can be shared and related between applications Data transfer between applications is easier Used for a wide range of applications. Slide 6 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Database Characteristics • Amount Database size depends on the number of records or files it contains. Complexity Database complexity depends on the number of relations between the files. Volatility A measure of the changes typically required in a given period of time. Immediacy A measure of how rapidly changes must be made to data. Slide 7 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Advantages of using a Database Approach Flexible Data Access. DBMSs have various tools to manipulate, query, or report data, such as Structured Query Language (SQL), and Report Generators. Hence: Selected data is easily retrieved A DBMS can accommodate different data views for different users Improved Data Integrity. Modern DBMSs consist of various tools and methods to: ensure that data is correct, consistent, and current verify data input and check whether data is ‘reasonable’. Slide 8 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Advantages of DBs (continued) Improved Data Security. Tools such as password access, and encryption, ensure that data is not: deliberately or accidentally damaged or changed accessed without proper authorisation Data Independence. Problems arising from the interdependence of data and programs are kept to a minimum. Reduced Data Redundancy. Single version of the truth. Efficient data storage. Efficient time management of Hardware (CPU), programmer(s), analyst(s) and user(s). Relational DBs use Normalisation to reduce data redundancy. Slide 9 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Advantages of DBs (continued) Ability to Share and Relate Data. Different user groups can use the same data. Data in different (physical or logical) parts of the system can be related for a certain application. Standardisation of Data. In general data items have common names and storage format. Increased Productivity. The various tools reduce the complexity that is otherwise associated with DB maintenance when changes are required to the system. For example Law changes, Economy Changes, User Changes. Slide 10 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Costs of Database Approach The implementation and use of DBMSs is normally associated with various costs. Such as: Initial expenses involve planning costs, and consultancy fees. Computer hardware costs. Software costs. Database Administrator costs, and staff training costs. Conversion costs of an existing system. Various operational costs. Slide 11 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Data Models 1. Hierarchical 2. Network 3. Relational 4. Object Slide 12 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam 1. Hierarchical Model Stores data as hierarchically related to each other. Record shape are tree structure. BUET Faculty of Civil Engineering CE WRE Faculty of Architectural URP Archit. Slide 13 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Hierarchical Database Model Slide 14 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Hierarchical Database Model Logically represented by an upside down tree Each parent can have many children Each child has only one parent Slide 15 WFM 6202: Remote and GIS Water Management © Dr. Akm Saiful Islam WFM 5201: Data Sensing Management andin Statistical Analysis © Dr. Akm Saiful Islam Hierarchical Model Several records or files are hierarchically related with each other. For example, an organization has several departments, each of which has attributes such as name of director, number of staffs, annual products etc. Each department has several divisions with attributes of name of manager, number of staffs, annual products etc. Then each division has several sections with attributes such as name of head, number of staff, number of PCs etc. Slide 16 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Advantage and Disadvantages of Hierarchical Model Advantages High speed access to large databases Easy to update- (to add or delete new nodes) Disadvantages Links are only possible in Vertical Direction (from top to bottom) but not for horizontal or diagonal unless they have same parents. For example, it is hard to find what is the relation between URP and DCE from this data model. Slide 17 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam 2. Network Database Model Doesn’t force data into hierarchical levels Owner/Member relationships: Owner record type Member record type Each owner may have one or more member types Each member type and corresponding owner record type form set, which represents relationship Slide 18 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Network Database Model Slide 19 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Network Database Model Each record can have multiple parents Composed of sets - relationships Each set has owner record and member record Member may have several owners A set represents a 1:M relationship between the owner and the member Figure 1.10 Slide 20 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam 3. Relational Model Student Table Based on two important concepts: Key of relation - one to one, one to many, many to many Primary attribute – which can’t be duplicate Student Table * * Course Table Many to many relationship Student ID Name CourseID 1 Mr. X 001 2 Mr. X 002 3 Mr. Y 003 Course table Cour seID Title Cre dit 001 RS & GIS in WM 3 002 Watershed Hydrology 3 003 Risk Management 3 Slide 21 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Relational Database Relational database is the most popular model for GIS. For example, the following relational database softwares are widely used. - INFO in ARC/INFO - DBASE III for several PC-based GIS - ORACLE for several GIS uses In a relational model, the following two important concepts should be defined. Key of relation ; a subset of attributes Unique identification ; e.g. the key attributes is a phone directory in a set of last name, first name and address. non redundancy ; any key attribute selected and tabulated should keep the key's uniqueness. e.g. address can not be dropped from telephone address, because there may be many with the same names. Prime attribute : an attribute listed in at least one key. The most important point of the relational database design is to build a set of key attributes with a prime attribute, so as to allow dependence between attributes as well as to avoid loss of general information when records are inserted or deleted. Slide 22 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Relational Database Model Slide 23 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Relational Database Model Figure 1.11 Slide 24 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam SQL What is it? Structured Query Language Used in ORACLE and other DB systems Non-procedural - i.e. Specify what you want not how to get it SQL - (also pronounced SEQUEL) directly related to the development of the RELATIONAL MODEL by E.F.Codd. Slide 25 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam SQL SQL is used to perform query in relations databases. For example, find the name of the student who took more than or equal to 6 credit hour in this term SELECT Student.Name, Course.Credit FROM Student, Course WHERE Student.CourseID = Course.CourseID AND Credit >= 6 The answer is : Mr. X 6 Slide 26 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Find the relationship between this two tables in the BUET Library Book Table ISBN Title Author 050 Applied David Hydrology Maidmen 060 Irrigation Cheng Borrow Table ID Name ISBN 1 2 3 Mr. P Mr. Q Mr. R 050 060 070 One to one Many to Many One to Many ? Slide 27 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Normalization of an Un-normalized Table to relational database Slide 28 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Advantage of Relational Database Advantages there is no redundancy. type of building of an owner can be changed without destroying the relation between type and rate. a new type of building for example "Clay" can be inserted. (row insert is easy). Disadvantages Require a number of tables and relationship Its difficult to add a new column in the table. Slide 29 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam 4. Object Databases Current generation systems have a need to handle complex data for complex applications such as computer aided design computer aided software engineering geographic information systems interactive web sites Relational systems are inadequate for these systems Why do you think this is? Slide 30 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Object Database Types Object-oriented extend a programming language such as Java with persistency and a query language Object-relational extend a current RDBMS (e.g. Oracle) with object-oriented extensions Slide 31 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Object Oriented Model BUET Part of Part of Departments Is a Is a CE Institutes Is a URP DCE IWFM AIT WRE Is a = Inheritance Part of = association Attributes: Faculty, Staff, Students Slide 32 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Object Oriented Database An Object Oriented model uses functions to model spatial and non-spatial relationships of geographic objects and the attributes. An object is an encapsulated unit which is characterized by attributes, a set of orientations and rules. An object oriented model has the following characteristics. generic properties : there should be an inheritance relationship. abstraction : objects, classes and super classes are to be generated by classification, generalization, association and aggregation. adhoc queries : users can order spatial operations to obtain spatial relationships of geographic objects using a special language. Slide 33 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Example of Object Oriented Model Slide 34 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam 5. Object-Relational Database Model Object-relational database management systems (ORDBMS): Combine: Ability of object technology to handle advanced relationship types Data integrity, reliability, and recovery features of relational models Most popular and powerful of modern database system applications Oracle, Microsoft SQL Server Slide 35 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Object-Relational Database Table Slide 36 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Overview of Relational Database Slide 37 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam What is a Relational Database? A database is more than just a collection of information - such as student and course information, faculty and grades. A database is a representation of the people and things your business needs to operate, and the way those people and things relate to each other. A database system supports the business rules defined by the customer. Slide 38 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Logical to Physical Database Design The Entities in the Logical Data Model are translated into Tables in the physical database design The entity attributes become columns of each table in the database Data type (numeric, character, date) Business rules for the legal values for the column (the domain of the column) Slide 39 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Data Models A data model is a collection of concepts for describing data. A schema is a description of a particular collection of data, using the given data model. The relational model of data is the most widely used model today. Main concept: relation, basically a table with rows and columns. Every relation has a schema, which describes Slide 40 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Example: University Database Conceptual schema: Students(sid: string, name: string, login: string, age: integer, gpa:real) Courses(cid: string, cname:string, credits:integer) Enrolled(sid:string, cid:string, grade:string) Physical schema: Relations stored as unordered files. Index on first column of Students. External Schema (View): Course_info(cid:string,enrollment:integer) Slide 41 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Instance of Students Relation Students( sid: string, name: string, login: string, age: integer, gpa: real ) sid 53666 53688 53650 name Jones Smith Smith login age jones@cs 18 smith@ee 18 smith@math19 gpa 3.4 3.2 3.8 Slide 42 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Levels of Abstraction Many external schemata, single conceptual(logical) schema and physical schema. External schemata describe how users see the data. Conceptual schema defines logical structure Physical schema describes the files and indexes used. External Schema 1 External Schema 2 External Schema 3 Conceptual Schema Physical Schema * Schemas are defined using DDL; data is modified/queried using DML. Slide 43 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Database Terminology Tables within a relational database hold sets of data using rows and columns Rows (records) appear horizontally in a report, and contain one or more columns Columns (fields) are named data elements and appear vertically in a report Primary Keys identify uniqueness in a row Indexes are created for faster access to the data in the database Slide 44 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Basic Database Concepts Table A set of related records u Record – A collection of data about an individual item u Name: Barry Harris College: Medicine Tel: 392-5555 Name: Barry Harris College: Medicine Tel: 392-5555 Field – A single item of data common to all records Name: Barry Harris Slide 45 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam An Example of a Table Fields Records Name GatorLink Phone College Graff rgraff 392-3900 Pharmacy Harris bharris 392-5555 Medicine Ipswich zipswich 846-5656 PHHP Slide 46 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Different parts of a database Fields – different types of data (number or text) Records Queries Reports Slide 47 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Concepts pf Relational Database Based on two important concepts: Key of relation - one to one, one to many, many to many. attribute – which can’t be duplicate Primary Slide 48 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Primary Key The column or set of columns that provide the uniqueness for the row. A table can have only one primary key. Existing values in primary key columns may not be modified (insert new value and then delete old value) The table of a relationship containing the primary key is called the Parent Table. Slide 49 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Foreign Keys A primary key referenced from another table is called a foreign key For each foreign key value, there must be a row in a table whose primary key has the same value. The foreign key can be made up of one or more columns of a table but must match the primary key it is referencing A table can have any number of foreign keys. Slide 50 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Primary Keys & Foreign Keys Name User Phone College Graff rgraff 392-3900 Pharmacy Harris bharris 392-5555 Medicine Ipswich zipswich 846-5656 PHHP To ensure that each record is unique in each table, we can set one field to be a Primary Key field. A Primary Key is a field that that will contain no duplicates and no blank values. Foreign Keys link to data in other tables Slide 51 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Relationship Types One-to-One : relationship is single valued in both directions A manager manages one department; a department has only one manager. One-to-Many : relationship is multi-valued in one direction - one row in the parent table is associated with many rows in the dependent table. One department has many employees. Many-to-Many : relationships are multi-valued in both directions. This type of relationship can be expressed in a table with a column for each entity. (crosswalk table) An employee can work on more than one project, and a project can have more than one employee assigned. Employee, Project, and Employee/Project tables. Slide 52 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Data Integrity For a table to have Domain Integrity the value of each column of data is meaningful and acceptable in the business environment, and passes all the edits we impose on it. For a table to have Association Integrity the relationship between two or more columns in that table satisfies a pre-defined business association. For a table to have Referential Integrity referential constraints between tables must be enforced at all times by the Relational Database Management System Slide 53 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Relational Database Referential Integrity Student *Student_ID Student_Course *Student_ID (FK) *Course_Number *Course_Ind For Referential Integrity - The foreign key must match a value in the primary key of the parent table, at all times. In this example, the Student table has a *Primary Key - Student_ID. The Student_Course table has a 3 column *Primary Key, and also has a Foreign Key (FK) of Student_ID that references the Student table. There must never be a Student_ID in the Student_Course table that does not exist in the Student table first. Slide 54 WFM 5201: Data Management and Statistical Analysis © Dr. Akm Saiful Islam Database Options Consumer Flat Files Microsoft Excel - Limit of 65,536 Rows Microsoft Access FileMaker Pro MySQL (Open Source) Postgres (Open Source) Enterprise RDMS Oracle IBM/DB2 MS SQL-server Sybase Informix Lotus Notes MySQL (Open Source) Postgres (Open Source) Slide 55