DATABASE ESSENTIALS I: RDBMS
BCIS 4660
Obi Ogbanufe, Ph.D.

"If we have data, let's look at data. If all we have is opinions, let's go with mine."
Jim Barksdale, former Netscape CEO

OBJECTIVES
▪ Understand databases
▪ Understand relational database management systems (RDBMS)
▪ Understand database objects (tables, views, indexes, etc.)
▪ Understand database guidelines and standards

COURSE OVERVIEW
Figure: Data Warehouse Overview (Operational Data, Data Warehouse, Data Extraction and Integration, Business Intelligence Applications)

COURSE PLAN
Topics: Intro to Data Warehousing; DW and BI; Database Fundamentals; DW Design; ETL; Introduction to DW; Database Essentials; Dimensional Modeling; Design ETL; Business Intelligence; T-SQL; DW Design Process; Manage ETL; Business Intelligence w/DW; BI Application; Advanced DW; Cloud DW; ER Modeling

WHAT IS A DATABASE?
▪ A database is an electronic collection of data that is organized or structured in a specific format

DATABASE MANAGEMENT SYSTEM
A database system is an information or computer system that manages the collection, storage, and retrieval of data
▪ Organizations' data must be stored and managed
▪ The data should be available to the users who need it
▪ The management of the data needs to be automated

CLASS DISCUSSIONS AND ACTIVITIES
Can you think of other situations in which a business needs to collect, manage, and automate its data?
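The following is a minimal sketch of how a database system handles the storage and retrieval of data. It uses Python's built-in sqlite3 module with an in-memory SQLite database. SQLite is an assumption here, standing in for a multi-user RDBMS such as SQL Server. The function name store_and_retrieve is hypothetical, and the two rows reuse sample student data that appears later in this deck.

```python
import sqlite3


def store_and_retrieve() -> list:
    """Store two rows through the DBMS, then ask it to retrieve them.

    Assumption: an in-memory SQLite database stands in for a full RDBMS.
    """
    conn = sqlite3.connect(":memory:")  # the DBMS manages this database
    try:
        # Storage: define a table, then store rows in it
        conn.execute(
            "CREATE TABLE Student ("
            " StudentID INTEGER PRIMARY KEY,"
            " FirstName TEXT,"
            " LastName  TEXT)"
        )
        conn.executemany(
            "INSERT INTO Student (StudentID, FirstName, LastName) VALUES (?, ?, ?)",
            [(889900, "LaTonya", "Baker"), (997766, "Michael", "Caine")],
        )
        conn.commit()  # make the stored rows permanent

        # Retrieval: describe WHAT data is wanted; the DBMS works out HOW
        return conn.execute(
            "SELECT StudentID, FirstName, LastName FROM Student ORDER BY StudentID"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in store_and_retrieve():
        print(row)
```

Running it prints the two stored rows in StudentID order. The program never reads a file directly; the DBMS owns the storage and answers the query.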
DATABASE MANAGEMENT SYSTEM
▪ Simply put, a database management system (DBMS) is a software system that manages data
▪ It is a program that manages the storage, update, and retrieval of data
▪ It manages how users interact with the data to add, update, and delete it
▪ It provides and manages the interface between the stored data and its users
▪ It ensures that the data is consistent, available, and accessible to users or other programs

RELATIONAL DATABASE MANAGEMENT SYSTEM
▪ A relational database is a database structure that allows database objects to have relationships with other objects in the database
▪ A DBMS that manages relational objects is called a relational database management system (RDBMS)
▪ Data in a relational database is stored in tables
▪ Data manipulation in an RDBMS uses Structured Query Language (SQL)
Figure: Students and Courses tables connected by a relationship

RELATIONAL DATABASE: TABLE ELEMENTS
▪ A table is a two-dimensional structure of rows and columns; each data value sits where a row and a column intersect
   o Row: record, tuple
   o Column: attribute, field, variable
▪ Each row in a table should have the same number of columns
▪ A relationship is made between two tables (entities) when both tables contain a common column (attribute)
Figure: table, row, column, and relationship

A RELATIONAL DATABASE
Students table (columns are fields, rows are records; StudentID is the primary key):
StudentID (PK) | FirstName | LastName | DOB
889900 | LaTonya | Baker | 10/12/2000
997766 | Michael | Caine | 06/06/2001
334455 | Quyhn | Tran | 05/20/1999
772255 | Terry | Ostermeier | 02/08/1998
009277 | Chike | Ogumike | 01/29/2004
115566 | August | Rush | 12/25/2002

Courses table (StudentID links each row to the Students table, forming the relationship):
CourseID | CourseName | StudentID
00012345 | BCIS4600 | 889900
00012345 | BCIS4600 | 997766
00012345 | BCIS4600 | 334455
Note: CourseID repeats across rows, so on its own it cannot be this table's primary key. A composite key of CourseID and StudentID identifies each row uniquely.

RDBMS & TRANSACTIONS
▪ An RDBMS must handle transactions in the database
▪ An RDBMS in an organization must ensure that multiple users can work concurrently without overwriting each other's work or corrupting the data (a multi-user database)
▪ MS SQL Server is a multi-user RDBMS
▪ MS Access is a desktop database; it is not designed to be a multi-user (client/server) RDBMS

RDBMS & TRANSACTIONS
▪ A
transaction is an atomic unit of work that contains one or more SQL statements
▪ An atomic unit of work must either be completed successfully (committed) as a unit or be undone (rolled back) as a unit
▪ A $100 withdrawal from an ATM could involve a transaction with 3 operations:
   o Decrease the savings balance by $100
   o Provide (dispense) the funds
   o Record the transaction in the database
▪ An RDBMS ensures that all three operations complete successfully; otherwise, all three operations are rolled back
▪ An RDBMS transaction should be "all or nothing": either every operation succeeds or every operation fails. This guarantee is described by the ACID properties (Atomicity, Consistency, Isolation, Durability)

WHAT IS ACID?
An RDBMS processes transactions according to the ACID properties, which ensure the integrity of transactions:
▪ Atomicity: All operations in a transaction are performed, or none are. There is no partial transaction
▪ Consistency: A transaction always leaves the database in a consistent (valid) state
▪ Isolation: The effects of a transaction should not be visible to other transactions until the transaction is complete and committed
▪ Durability: Changes made by committed transactions are permanent, even if the system fails afterward

RELATIONAL DATABASE MANAGEMENT SYSTEM
A relational database consists of the following:
▪ Structure: Defined database objects used for storing and accessing the data
▪ Operations: Defined actions that allow users to manipulate the data and the data structures
▪ Rules: Rules that govern the operations performed on the data and data structures

DATABASE OBJECTS: TABLES, VIEWS, INDEXES

DATABASE STRUCTURES (OBJECTS)
▪ Database objects are "objects" in the database that are used to store, view, and retrieve data
▪ There are many database objects.
The most frequently used are tables, views, indexes, and stored procedures

DATABASE OBJECTS (TABLES)
▪ Tables are the most important objects in an RDBMS
▪ Tables store database data (in rows and columns)
▪ Tables are also called entities, because each table represents an entity
▪ An entity can be a person, place, object, or event
   o Each entity (e.g., students, grades, courses) requires data related to that entity to be stored and managed
   o Tables: Student, Grades, Courses

DATABASE OBJECTS (VIEWS)
▪ Views are virtual tables (a.k.a. stored queries)
▪ A standard view does not store data; its rows are produced from the underlying tables each time the view is queried
▪ Views create a layer of abstraction between the tables and the user
▪ They allow users to access the data without working directly against (and risking changes to) the underlying tables
▪ Views can be used as a security measure: users can access the data in tables through views without being granted permissions on the tables themselves

DATABASE OBJECTS (INDEXES)
▪ An index is a database structure that improves performance and speed during data retrieval
▪ An index lets the database engine locate and retrieve rows quickly without reading every row in the table (think of the index at the back of a book)
▪ Indexes are typically added to columns that are frequently used in WHERE and ORDER BY clauses

INDEXES
▪ Poorly designed indexes, a lack of indexes, or both can cause database performance problems
▪ An index is stored on disk or in memory and is associated with a table to speed up the retrieval of rows from that table
▪ The design of indexes depends on the database type: OLTP (write-heavy) or OLAP (read-heavy)
▪ There are 3 main types of indexes: clustered, nonclustered, and unique indexes

COMMON INDEX TYPES
▪ A clustered index sorts and stores the data rows of the table in order, based on the clustered index key. A table can have only one clustered index. Uniqueness is an optional property of clustered indexes (a clustered index can be unique or non-unique)
▪ A nonclustered index is stored separately from the data rows, and the data rows are not ordered by the nonclustered index key.
Uniqueness is also an optional property of nonclustered indexes (a nonclustered index can be unique or non-unique)

DATABASE OBJECTS (INDEXES)
Think about an ordered table of BusinessEntityID values and row positions. If the objective is to quickly retrieve a number of rows, an index can help minimize the number of rows that the database has to examine to find the specified rows.

DATABASE OPERATIONS: DDL, DML, AND DCL

DATABASE OPERATIONS
▪ Almost all operations performed on an RDBMS are done using SQL statements
▪ SQL stands for Structured Query Language
▪ A SQL (pronounced "sequel") statement is a program instruction that allows users and programs to access data in the database. SQL statements consist of identifiers, parameters, variables, names, data types, etc.
▪ There are three main types of SQL statements:
   o Data Definition Language (DDL): commands that define a database, including creating, altering, and dropping tables and establishing constraints
   o Data Manipulation Language (DML): commands that maintain and query a database
   o Data Control Language (DCL): commands that control a database, including administering privileges and committing data

DATA DEFINITION LANGUAGE (DDL)
▪ DDL statements allow users to create, alter, and drop objects and other database structures, including the database itself
▪ Most DDL statements start with the keywords CREATE, ALTER, or DROP
   o CREATE TABLE: creates a new table structure/definition
   o DROP TABLE: drops the table and deletes all of its data
   o ALTER TABLE: edits the structure/definition of a table
   o Other DDL statements include CREATE INDEX, DROP INDEX, CREATE VIEW, DROP VIEW, and CREATE SCHEMA

DATA DEFINITION LANGUAGE (DDL): CREATE TABLE

    -- DROP TABLE IF EXISTS requires SQL Server 2016 or later
    DROP TABLE IF EXISTS Employee;

    CREATE TABLE Employee (
        EmployeeID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- auto-numbered key
        FirstName    varchar(30),
        LastName     varchar(30),
        EmailAddress varchar(50),
        JobID        int,
        HireDate     date
    );

    -- EmployeeID is generated by IDENTITY, so it is left out of the column list;
    -- ISO dates (YYYY-MM-DD) avoid month/day ambiguity
    INSERT INTO Employee (FirstName, LastName, EmailAddress, JobID, HireDate)
    VALUES ('Ben', 'Aller', 'Ben.Aller@nocompany.com', 1115, '2020-09-02'),
           ('Kenneth', 'Onye', 'Ken.Onye@nocompany.com', 1123, '2020-07-12');

DATA MANIPULATION LANGUAGE (DML)
▪ DML statements query or
manipulate data (content) in existing database objects
▪ Most DML statements start with the keywords SELECT, INSERT, UPDATE, or DELETE
▪ DML statements are the most commonly used SQL statements:
   o Retrieve (SELECT) data from tables or views
   o Add (INSERT) and remove (DELETE) rows of data in tables or views
   o Change (UPDATE) column values in existing records in tables or views

DATA MANIPULATION LANGUAGE (DML): EXAMPLES

    -- Retrieve every row and column from the Employee table
    SELECT * FROM Employee;

    -- Add a row (EmployeeID is generated by IDENTITY)
    INSERT INTO Employee (LastName, FirstName, EmailAddress, JobID, HireDate)
    VALUES ('Shreya', 'Mackenzie', 'Mackenzie.Shreya@bcis.edu', 1234, '2008-02-14');

    -- Change a column value in the matching row(s)
    UPDATE Employee
    SET FirstName = 'Millie'
    WHERE JobID = 1234;

    -- Remove the matching row(s)
    DELETE FROM Employee
    WHERE JobID = 1234;

DATA CONTROL LANGUAGE (DCL)
▪ DCL statements allow a user or program to control the database system by granting and revoking permissions and administering privileges
▪ DCL is sometimes used interchangeably with Transaction Control Language (TCL). Strictly speaking, GRANT and REVOKE are DCL, while COMMIT and ROLLBACK are TCL
▪ TCL statements manage the changes made by DML statements
▪ TCL statements group DML statements together into a single unit of work (a transaction)

DATA CONTROL LANGUAGE (DCL)
▪ COMMIT: Makes a transaction's changes permanent
▪ ROLLBACK: Undoes a transaction's changes
▪ GRANT: Grants permissions on a table, view, stored procedure, etc.
▪ REVOKE: Removes a previously granted or denied permission

RULES: CONSTRAINTS

CONSTRAINTS
A constraint is a rule placed on a table or column that restricts the operations and data values allowed. Constraints:
▪ Enforce rules at the table level
▪ Enforce data integrity
▪ Prevent deletion of tables if dependencies exist (e.g., a table that is referenced by a foreign key)
▪ Can be defined during or after table creation

TABLE CONSTRAINTS
There are 5 major constraints that can be created:
1. Not Null (NN)
2. Unique (U)
3. Primary Key (PK)
4. Foreign Key (FK)
5.
Check (CHK)

CONSTRAINTS (COMMONLY USED)
▪ Primary Key (PK): Designates the column as the table's primary key; the column cannot contain duplicate values (or NULLs)
▪ Foreign Key (FK): Defines a column as a foreign key (reference) to the primary key column of another table, which ensures referential integrity
   o Prevents orphaned records. An orphaned record is a record whose foreign key points to (references) a nonexistent primary key value
▪ Unique (U): Enforces uniqueness on a column, meaning the column cannot contain duplicate values
▪ Not Null (NN): Prevents a column from holding NULL, i.e., an unknown value. Note that NULL is not the same as blank, zero, or empty

Difference between a primary key and a unique key:
Primary Key | Unique Key
Does not allow NULL | Can allow a NULL
A table has only 1 PK | A table can include multiple unique keys

REFERENTIAL INTEGRITY CONSTRAINT
▪ Referential integrity constraint: regulates the relationship between a table that has a foreign key and the table whose primary key it references
▪ Ensures that each value of the foreign key matches one of the values in the primary key column of the other table

Employee table:
EmployeeID | LastName | DepartmentID (FK)
1110 | Ken | 101
1128 | Lanre | 101
1139 | Abigail | 214

Department table:
DepartmentID (PK) | DepartmentName
101 | Technology
214 | Accounting

DATABASE GUIDELINES AND STANDARDS: NAMING CONVENTIONS

DATABASE GUIDELINES AND STANDARDS
Naming conventions: use consistent naming conventions and abbreviations for database objects. Consistent naming:
▪ Allows users to easily identify objects
▪ Makes database administration easier
▪ Database naming standards are often developed together with all users (DB admins, DB engineers, data analysts, business analysts, etc.)
▪ Database naming standards apply to all users of the database (database administrators, database developers, etc.)
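This sketch returns to the constraint and referential integrity slides above and shows those rules rejecting bad data. It rebuilds the Employee and Department example with Python's built-in sqlite3 module. SQLite is an assumption standing in for SQL Server, and its behavior differs in places. It enforces foreign keys only after PRAGMA foreign_keys = ON. Its UNIQUE constraints also accept more than one NULL, whereas SQL Server allows one. The helper names (try_sql, build_demo, run_checks) and the extra rows (1150, 1151, 1152, -1, and department 300) are hypothetical.

```python
import sqlite3


def try_sql(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Run one statement and report 'ok' or the constraint that rejected it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


def build_demo() -> sqlite3.Connection:
    """Create the Department/Employee example tables with their constraints."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces FOREIGN KEY constraints only when this pragma is on
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """CREATE TABLE Department (
               DepartmentID   INTEGER PRIMARY KEY,   -- PK
               DepartmentName TEXT NOT NULL UNIQUE   -- NN and U
           )"""
    )
    conn.execute(
        """CREATE TABLE Employee (
               EmployeeID   INTEGER PRIMARY KEY,     -- PK
               LastName     TEXT NOT NULL,           -- NN
               DepartmentID INTEGER NOT NULL
                   REFERENCES Department (DepartmentID),  -- FK
               CHECK (EmployeeID > 0)                -- CHK
           )"""
    )
    conn.executemany(
        "INSERT INTO Department VALUES (?, ?)",
        [(101, "Technology"), (214, "Accounting")],
    )
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?, ?)",
        [(1110, "Ken", 101), (1128, "Lanre", 101), (1139, "Abigail", 214)],
    )
    return conn


def run_checks(conn: sqlite3.Connection) -> dict:
    """Attempt one valid change and several that each break one constraint."""
    emp = "INSERT INTO Employee VALUES (?, ?, ?)"
    return {
        "valid row": try_sql(conn, emp, (1150, "Newhire", 214)),
        "orphaned FK": try_sql(conn, emp, (1151, "Orphan", 999)),
        "duplicate PK": try_sql(conn, emp, (1110, "Duplicate", 101)),
        "NULL in NOT NULL": try_sql(conn, emp, (1152, None, 101)),
        "CHECK violated": try_sql(conn, emp, (-1, "Negative", 101)),
        "duplicate UNIQUE": try_sql(
            conn, "INSERT INTO Department VALUES (?, ?)", (300, "Technology")
        ),
        "delete referenced parent": try_sql(
            conn, "DELETE FROM Department WHERE DepartmentID = ?", (214,)
        ),
    }


if __name__ == "__main__":
    for check, outcome in run_checks(build_demo()).items():
        print(f"{check}: {outcome}")
```

Each attempted change prints either ok or the constraint error that SQLite reports, and only the first insert succeeds. Deleting department 214 fails because employees still reference it. This is the orphaned-record protection described under Foreign Key.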
DATABASE GUIDELINES AND STANDARDS: EXAMPLES
▪ No spaces in table or field names
▪ The primary key name should match the table name (e.g., EmployeeID as the primary key of the Employee table)
▪ Use camelCase or PascalCase for table names and field names
   o camelCase: the first word starts with a lowercase letter, each following word starts with an uppercase letter, and there are no spaces
   o PascalCase: each word starts with an uppercase letter, with no spaces

    CREATE DATABASE camelOrigin;   -- camelCase
    CREATE DATABASE PascalOrigin;  -- PascalCase

GENERAL RULES FOR IDENTIFIERS (DATABASE OBJECTS)
The name of each database object (e.g., database, table, view) is referred to as its identifier.
Rules for regular (undelimited) identifiers in an RDBMS (SQL Server):
▪ The first character must be one of these: a letter, an underscore (_), the at sign (@), or the number sign (#)
▪ Subsequent characters can be letters, decimal numbers, the at sign (@), the dollar sign ($), the number sign (#), or underscores
▪ The identifier must not be a T-SQL reserved word (in either uppercase or lowercase)
▪ Embedded spaces or special characters are not allowed

SUMMARY
▪ Learned about databases
▪ Learned about relational database management systems (RDBMS)
▪ Learned about database objects (tables, views, indexes, etc.)
▪ Learned some database guidelines, standards, and naming conventions