I. Introduction

Why database?
- Data is abundant, pervasive, and persistent.

Data
- Representation of facts or concepts. Raw facts.

Information
- Knowledge derived from processing the data.

Management
- Refers to the organized and strategic handling of information and data within an organization.
- Discipline that focuses on the proper generation, storage, and retrieval of data.

Information management
- Process by which relevant information is provided to decision-makers in a timely manner (Davis 1997).
- Aims to provide the right information to the right person at the right time.

Database
- Shared, integrated computer structure that stores a collection of the following:
  Metadata – data about data.
  End-user data – raw facts of interest to the end user.

File system
- Composed of a collection of file folders with proper tags.
- Useful for data management, but has become obsolete.

Drawbacks of file system
1. Data redundancy and inconsistency.
2. Difficulty accessing data.
3. Data isolation.
4. Integrity problems.
5. Atomicity problems.
6. Concurrent-access anomalies.
7. Security problems.

Database system
- Collection of interrelated data and a set of programs that allow users to access and modify these data.

Components of database system
1. Data
2. Hardware
3. Software
   a. Operating system
   b. Application programs and utility software
   c. DBMS
4. Procedures
5. People
   a. Administrator
   b. Designer
   c. End user (casual, naive, sophisticated, and standalone users)

Database management system (DBMS)
- Collection of programs that manages the database structure and controls access to the data stored in the database.

DBMS functions
- Data dictionary management
- Data storage management
- Data transformation and presentation
- Security management
- Multi-user access control
- Backup and recovery management
- Data integrity management
- Database access languages and application programming interfaces
- Database communication interfaces

Advantages
1. Improved data sharing.
2. Improved data security.
3. Better data integration.
4. Minimized data inconsistency.
5. Improved data access.
6. Improved decision-making.
7. Increased end-user productivity.

Disadvantages
1. Complexity
2. Skilled resources
3. Performance tuning
4. Database failure
5. Cost
6. Additional hardware cost
7. Frequent upgrades

II. Introduction Part 2

Data model
- Collection of concepts that can be used to describe the structure of a database.

Categories of data model
High-level or conceptual data models
- Close to the way many users perceive data.
- Use concepts such as entities, attributes, and relationships.
Low-level or physical data models
- Describe the details of how data is stored on the computer.

Schema and instances
Schema
- Organization of data as a blueprint of how the database is constructed.
- A schema shown graphically is called a schema diagram.
Instances / Database state / Current set of occurrences
- Data in the database at a particular moment in time.

Levels of schema
- External, conceptual, and internal.

Data independence
1. Logical data independence
   Capacity to change the conceptual schema without having to change the external schemas or application programs.
2. Physical data independence
   Capacity to change the internal schema without having to change the conceptual schema.

Due to physical independence, none of the following changes affects the conceptual layer:
- Using a new storage device
- Modifying the file organization technique
- Switching to different data structures
- Changing the access method
- Modifying indexes
- Changing compression techniques
- Changing the location of the database

Due to logical independence, none of the following changes affects the external layer:
- Adding, modifying, or deleting an attribute, entity, or relationship, without rewriting existing application programs
- Merging two records into one
- Breaking an existing record into two or more records

DBMS architecture
Tier-I (single-tier architecture)
- The client, server, and database all reside on the same machine.
Tier-II (two-tier architecture)
- The client application communicates with the DBMS on the server through an application interface such as ODBC (Open Database Connectivity), an API that allows client-side programs to call the DBMS.
Tier-III (three-tier architecture)
- An extension of the 2-tier architecture with 3 layers: presentation layer, application layer, and database server.

DBMS languages
- Data Definition Language (DDL) for specifying the database schema.
  CREATE – create the database and its objects.
  ALTER – alter the structure of the database.
  DROP – drop the database and its objects.
  TRUNCATE – remove all rows from a table while keeping the table itself.
  RENAME – rename database objects.
  COMMENT – add comments to the data dictionary.
- Data Manipulation Language (DML) for accessing and manipulating data in a database.
  SELECT – read records from tables.
  INSERT – insert records into tables.
  UPDATE – update existing records.
  DELETE – delete records.
- Data Control Language (DCL) for granting and revoking access.
  GRANT – grant access to a user.
  REVOKE – revoke access from a user.
- Transaction Control Language (TCL) for committing or rolling back changes.
  COMMIT – persist the changes made by DML commands.
  ROLLBACK – undo the changes made since the last commit.

III. Data Manipulation

Data modeling
- First step in designing a database.
- Creating a specific data model for a determined problem domain.

Data model
- Representation, usually graphical, of more complex real-world data structures.
- Represents data structures and their constructs with the purpose of supporting a specific problem domain.

Types of data models
1. Flat file model
   Consists of a single, two-dimensional array of like elements.
2. Hierarchical model
   Data is organized into a tree-like structure in which each record has one parent record and can have many children.
3. Network model
   Expands upon the hierarchical structure by allowing a record to have multiple parents, which makes many-to-many relationships possible.
4. Object-oriented database models
   Aim to avoid the object-relational impedance mismatch.
5. Entity-relationship model
   Describes the structure of a database with the help of a diagram, the Entity Relationship Diagram.
Entity Relationship Diagram
- Shows the complete logical structure of a database.
- Best used for the conceptual design of a database.
- Based on entities, their attributes, and the relationships among entities.

Entity
- Real-world object having properties called attributes.
- Every attribute is defined by its set of allowed values, called its domain.

Relationships
- Logical association among entities.
- Mapped with entities in various ways.

Mapping cardinalities
- Define the number of associations between two entities.
- One to one
- One to many
- Many to one
- Many to many

6. Relational model
- The ordering of columns in a table is immaterial, a table cannot contain duplicate tuples (rows), and each tuple contains a single value for each of its attributes.
- Contains multiple tables, each like the one in the "flat" database model.

Degree of abstraction (Data hiding)
- The DBMS tries to hide the details of how the data is stored and maintained, the implementation details of the database, and its complexity.

Degrees of abstraction
External model/schema
- End users' view of the data environment.
- Its ER representation is called an external schema.
Conceptual model/schema
- Represents the global view of the database as seen by the organization.
- Basis for the identification and high-level description of the main data objects.
Internal model/schema
- Representation of the database as "seen" by the DBMS.
- An internal schema depicts a specific representation of an internal model, using the database constructs of the chosen DBMS.
Physical model/schema
- Lowest level of abstraction, describing the way data is saved on storage.

IV. Data Model XML

Types of data structure
Extensible Markup Language (XML)
- Way to structure and store data in a format readable by both humans and machines.
- Allows you to structure and organize data in a hierarchical manner.

Elements
- Fundamental building blocks of an XML document.
- Enclosed in angle brackets (<>).

Attributes
- Provide additional information about an element.
- Specified like: <book title="ABC" author="None"/>

Text
- Provides the actual data, for example: <name>john doe</name>

What is a document schema in XML?
- A document schema is like a blueprint or set of rules that defines the structure, elements, and data types of an XML document.
- Acts as a guide to ensure XML documents conform to a specific format or structure.

Key components of a document schema
1. Elements – building blocks of an XML document. Represent different pieces of data.
2. Attributes – provide additional information about elements. Properties or characteristics of an element.
3. Data types – the schema can specify the data type that an element or attribute can contain, such as text, numbers, or dates.
4. Hierarchical structure – defines how elements can be nested within each other, creating a tree-like structure. Determines the order of, and the relationships between, elements.

Document schema
<complexType>
- Element that defines a complex type.
- A complex type is an XML element that contains other elements and/or attributes.
Sequence
- Specifies that the child elements must appear in a sequence.
- Each child element can occur from 0 to any number of times.

V. Relational database model

Logical view
Relational model
- Represents the database as a collection of relations.
- A relation is nothing but a table of values.

What are DBMS keys?
- A key is an attribute or set of attributes that helps you uniquely identify a record or row of data in a relation (table).

Super key
- A single attribute or group of attributes that uniquely identifies rows in a table.
- May contain extra attributes that are not needed for identification.

Candidate key
- Set of attributes that uniquely identifies tuples in a table.
- A super key with no redundant attributes (a minimal super key).

Primary key
- Column or group of columns in a table that uniquely identifies every row in that table.
- Two rows can't have the same primary key value; the value cannot be null and should not be modified once assigned.

Foreign key
- Column that creates a relationship between two tables.
- The purpose is to maintain data integrity and allow navigation between two different instances of an entity.

Composite key
- Combination of two or more columns that uniquely identifies a record.

Integrity rules
- Concern the overall completeness, accuracy, and consistency of data.

Integrity constraints
Entity integrity
- A primary key value cannot be null.
- The primary key is used to identify individual rows in a relation.
Domain integrity
- Definition of a valid set of values for an attribute.
- The value of the attribute must be available in the corresponding domain.
Referential integrity
- Specified between two tables.
- Ensures that a value that appears for a set of attributes (the foreign key) in one relation also appears for the referenced attributes in the other relation.

Relational set operators
- Data in relational tables are of limited value unless the data can be manipulated to generate useful information.
- Closure property – the use of relational algebra operators on existing relations (tables) produces new relations.
1. Select
   Also known as RESTRICT. Yields values for all rows found in a table that satisfy a given condition (a horizontal subset of a table).
2. Project
   Yields all values for selected attributes (a vertical subset of a table).
3. Union
   Combines all rows from two tables, excluding duplicate rows. The columns and domains of the two tables must be compatible.
4. Intersect
   Yields only the rows that appear in both tables.
5. Difference
   Yields all the rows in one table that are not found in the other table; it subtracts one table from the other.
6. Product
   Yields all possible pairs of rows from two tables. Also known as the Cartesian product.
7. Join
   Allows information to be combined from two or more tables.
   a. Inner join
      Includes only those tuples with matching attributes; the rest are discarded from the resulting relation.
   b. Outer join
      Includes tuples from the participating relations even when they have no match. For relations R (left) and S (right):
      i. Left outer join
         All tuples from the left relation R are included in the resulting relation. For tuples in R without a matching tuple in S, the S-attributes of the result are made NULL.
      ii. Right outer join
         All tuples from the right relation S are included in the resulting relation. For tuples in S without a matching tuple in R, the R-attributes of the result are made NULL.
      iii. Full outer join
         All tuples from both R and S are included in the resulting relation. Wherever a tuple has no match in the other relation, the unmatched attributes are made NULL.

Data dictionary
- Provides a detailed description of all tables found within the user/designer-created database.
- Contains metadata – data about data.

Relationships within relational database
- One-to-one (1:1)
- One-to-many (1:M)
- Many-to-many (M:M)

VI. Functional dependency
- Relationship that exists between two attributes. Typically exists between the primary key and a non-key attribute within a table.

Terms:
- Decomposition – rule that suggests that if a table appears to contain two entities that are determined by the same PK, consider breaking it up into two tables.
- Dependent – right side of the functional dependency diagram.
- Determinant – left side of the functional dependency diagram.
- Functional dependency – relationship between two attributes, typically between the PK and other non-key attributes.
- Non-normalized table – a table that has data redundancy in it.
- Union – rule that suggests that if two tables are separate and the PK is the same, consider putting them together.

Rules of functional dependencies
Multivalued dependency
- Occurs when there are multiple independent multivalued attributes in a single table.
- A complete constraint between two sets of attributes in a relation.
Trivial functional dependency
- A -> B is trivial if the attributes in B are already included in A.
- That is, A -> B where B is a subset of A.
Non-trivial functional dependency
- Occurs when A -> B holds true where B is not a subset of A.
Transitive dependency
- A type of functional dependency that is indirectly formed by two functional dependencies: if A -> B and B -> C hold, then A -> C is a transitive dependency.
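The dependency types above can be checked against sample rows. The following is a minimal Python sketch, not part of the original notes: the helper `holds` and the sample columns (`student_id`, `name`, `dept_id`, `dept_name`) are hypothetical names chosen for illustration. A sample of rows can only show that a dependency is violated; whether it holds in general is a design decision.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, invented for this example.
rows = [
    {"student_id": 1, "name": "Ana", "dept_id": 10, "dept_name": "CS"},
    {"student_id": 2, "name": "Ben", "dept_id": 10, "dept_name": "CS"},
    {"student_id": 3, "name": "Ana", "dept_id": 20, "dept_name": "Math"},
]

# PK -> non-key attribute: each student_id appears with exactly one name.
pk_determines_name = holds(rows, ["student_id"], ["name"])

# Violated: "Ana" appears with dept_id 10 and with dept_id 20.
name_determines_dept = holds(rows, ["name"], ["dept_id"])

# Transitive: student_id -> dept_id and dept_id -> dept_name both hold.
transitive = (holds(rows, ["student_id"], ["dept_id"])
              and holds(rows, ["dept_id"], ["dept_name"]))

# Trivial: the dependent is a subset of the determinant, so it always holds.
trivial = holds(rows, ["student_id", "name"], ["name"])

print(pk_determines_name, name_determines_dept, transitive, trivial)
```

The determinant is the list passed as the left side and the dependent is the list passed as the right side, matching the terms defined above.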
Advantages of functional dependency
- Avoids data redundancy.
- Helps maintain the quality of data.
- Helps define meanings and constraints.
- Helps identify bad designs.
- Helps in finding facts regarding the database design.

VII. Normalization
- Process for evaluating and correcting table structures to minimize data redundancies, thereby reducing data anomalies.

Anomalies in DBMS
1. Insertion anomalies
   Inserting a new row forces data that is already stored to be repeated.
2. Deletion anomalies
   Deleting a row also removes other needed data from the table.
3. Update anomalies
   The same data is stored in several rows, so the data becomes inconsistent if an update misses any one of them.

First normal form (1NF)
- An attribute (column) of a table cannot hold multiple values.
Rules:
1. Each column should contain atomic values.
2. A column should contain values that are of the same type.
3. Each column should have a unique name.
4. The order in which data is saved doesn't matter.

Second normal form (2NF)
Two rules for 2NF
1. The table must be in 1NF.
2. The table must not have a partial dependency.
Partial dependency
- Occurs when a non-prime attribute is functionally dependent on only part of a composite key.
Foreign key
- Ensures that rows in one table correspond to rows in another.

Third normal form (3NF)
Two rules for 3NF
1. The table must be in 2NF.
2. The table must not have a transitive dependency.
Transitive dependency
- A non-key attribute is dependent on another attribute that is not part of the primary key.
Transitive functional dependency
- Changing a non-key column might cause another non-key column to change.
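The update anomaly and the 3NF rule above can be seen in a small experiment. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` module rather than MySQL so that it runs without a server, and the tables `student_flat`, `student`, and `department` are hypothetical examples, not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical non-normalized table: dept_name depends on dept_id,
# a non-key column, so the table has a transitive dependency.
cur.execute("""
    CREATE TABLE student_flat (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        dept_id    INTEGER,
        dept_name  TEXT
    )
""")
cur.executemany(
    "INSERT INTO student_flat VALUES (?, ?, ?, ?)",
    [(1, "Ana", 10, "CS"), (2, "Ben", 10, "CS"), (3, "Cara", 20, "Math")],
)

# Update anomaly: the department is renamed in only one of its two rows.
cur.execute(
    "UPDATE student_flat SET dept_name = 'Computer Science' WHERE student_id = 1"
)
flat_names = cur.execute(
    "SELECT COUNT(DISTINCT dept_name) FROM student_flat WHERE dept_id = 10"
).fetchone()[0]

# 3NF decomposition: the department name is stored once, in its own table.
cur.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)"
)
cur.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER REFERENCES department(dept_id)
    )
""")
cur.executemany("INSERT INTO department VALUES (?, ?)", [(10, "CS"), (20, "Math")])
cur.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ana", 10), (2, "Ben", 10), (3, "Cara", 20)],
)

# The same rename now touches exactly one row and cannot become inconsistent.
cur.execute("UPDATE department SET dept_name = 'Computer Science' WHERE dept_id = 10")
normalized_names = cur.execute("""
    SELECT COUNT(DISTINCT d.dept_name)
    FROM student s JOIN department d ON d.dept_id = s.dept_id
    WHERE s.dept_id = 10
""").fetchone()[0]

print(flat_names, normalized_names)  # prints "2 1"
```

In the flat table, department 10 ends up with two different names; after the decomposition, every student in department 10 sees the single updated name.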
CODE CHEAT SHEET

CREATE DB
    create database databaseName;

DROP DB
    drop database databaseName;

USE DB
    use databaseName;

DROP tables inside DB
    drop table tableName;

CREATE table
    create table tableName(
        id int unsigned not null auto_increment,
        first_col varchar(255) not null,
        second_col varchar(255) not null,
        third_col varchar(255) not null,
        primary key (id));

CREATE table with foreign key
    create table tableName(
        id int(11) primary key,
        foreignId int(11),
        first_col varchar(255),
        foreign key (foreignId) references tableWhereForeign(foreignId));

SHOW tables
    show tables;

INSPECT the table schema
    desc tableName;

ALTER a column in the table
    alter table tableName modify first_col varchar(255) not null;

RENAME table name
    rename table tableName to newTableName;

SHOW table contents
    select * from tableName;

ADD column in table (a data type is required, varchar(255) is only an example; the bracketed part is optional)
    alter table tableName add column new_col varchar(255) not null [first | after col_name];

DROP column in table
    alter table tableName drop column col_name;

RENAME column in table (the data type must be restated; the bracketed part is optional)
    alter table tableName change column old_name new_name varchar(255) not null [first | after col_name];

DELETE ROW
    delete from tableName where col_name = 'value';

CREATE a primary key using alter table
    alter table tableName add constraint tableName_pk primary key (id);

DROP primary key
    alter table tableName drop primary key;

ADD foreign key
    alter table tableName add constraint fk_foreign_id
        foreign key (foreign_id_on_this_table) references tableWhereForeign(foreign_id);

CASCADING
    STEP 1: show create table tableName;  -- before
    STEP 2: alter table tableName add constraint fk_foreign_id
                foreign key (foreign_id_on_this_table) references tableWhereForeign(foreign_id)
                on delete cascade on update restrict;
    STEP 3: show create table tableName;  -- after

BACKUP
    STEP 1: exit  (leave the mysql client)
    STEP 2: run from the operating system shell:
        mysqldump -u root -p databaseName > E:\folderDestination\databaseBackup.sql

USE BACKUP (restore, run from the operating system shell)
    mysql -u root -p databaseName < E:\folderDestination\databaseBackup.sql
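The effect of `on delete cascade on update restrict` from the CASCADING steps can be tried without a MySQL server. The sketch below is an illustration under assumptions: it uses Python's built-in `sqlite3` module, whose SQL dialect differs from MySQL in places (for example, SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`), and the `author` and `book` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign keys are enforced only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent and child tables, invented for this example.
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER,
        CONSTRAINT fk_author_id FOREIGN KEY (author_id)
            REFERENCES author(id) ON DELETE CASCADE ON UPDATE RESTRICT
    )
""")
conn.execute("INSERT INTO author VALUES (1, 'Ana'), (2, 'Ben')")
conn.execute(
    "INSERT INTO book VALUES (1, 'First', 1), (2, 'Second', 1), (3, 'Third', 2)"
)

# ON DELETE CASCADE: deleting a parent row also deletes its child rows.
conn.execute("DELETE FROM author WHERE id = 1")
remaining = [row[0] for row in conn.execute("SELECT title FROM book ORDER BY id")]

# ON UPDATE RESTRICT: changing a parent key that is still referenced is rejected.
try:
    conn.execute("UPDATE author SET id = 99 WHERE id = 2")
    update_blocked = False
except sqlite3.IntegrityError:
    update_blocked = True

print(remaining, update_blocked)  # prints "['Third'] True"
```

Deleting author 1 removes both of that author's books, while the attempt to change the key of author 2 fails because a book still references it.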