Slides - The University of Tulsa

Chapter 3 and Module C
DATABASES AND DATA WAREHOUSES
Supporting the Analytics-Driven Organization
Opening Case:
The Digitization of Content
In 2010, more than half of all music was in digital form; physical music
will likely never again be the majority. What else can be digitized?
Pictures, movies, books.
What about education? Can a course be digitized and canned?
INTRODUCTION

Business intelligence (BI)
◦ Knowledge about your customers, competitors,
business partners, environment, and internal
operations to make effective, important, and
strategic business decisions

Analytics
◦ Fact-based decision-making
◦ Integrated use of IT and statistical techniques to
create BI. For example, if I run a coffee shop and most of
my customers are male and between 18 and 30,
what can I do with this information?
THE RELATIONAL DATABASE MODEL
• There are many types of databases.
• The relational database model is the most
popular. In it, a relation is a table.


Relational database
Database Characteristics
1. Collections of information
2. Created with logical structures
3. Include logical ties within the information
4. Include built-in integrity constraints
2. Database – Logical Structure
Logical structure, from smallest to largest unit: Character, Field, Record, File (Table), Database, Data Warehouse

Advisor
Advisor ID | ALastName | AFirstName
101 | Leonard | Lori
102 | Aurigemma | Sal
103 | Bajaj | Akhilesh
104 | Platner | Steve
105 | McCrary | Mike

Class
Class Synonym | Class Prefix | Class No | Class Section
10342 | MIS | 3003 | 3
10344 | MIS | 1123 | 2
10359 | MIS | 4133 | 2
10450 | MIS | 1123 | 1
10578 | MIS | 2013 | 3
10643 | MIS | 4053 | 1

Student
Student ID | SLastName | SFirstName | Advisor ID
1011 | Berry | Jeff | 101
1012 | Smith | Tom | 103
1013 | Sanders | Tally | 101
1014 | Anderson | Cindy | 103
1015 | Whitman | Amy | 102
1016 | Jones | Kelsi | 105
1017 | Phillips | Susan | 104

Student-Class
Student ID | Class Synonym
1011 | 10342
1011 | 10643
1013 | 10578
1014 | 10342
1014 | 10359
1014 | 10450
1015 | 10578
1016 | 10342
1017 | 10344
1017 | 10450
Logical Structure: Character, Field, Record, File, Database
The same four tables illustrate each level of the hierarchy:
• Character – a single letter or digit, such as the "1" in Advisor ID 101
• Field – a single column, such as Advisor ID or SLastName
• Record – a single row, such as 101, Leonard, Lori in the Advisor table
• File (Table) – a complete table, such as Advisor or Student
• Database – the collection of related tables: Advisor, Class, Student, and Student-Class
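The four tables above can be tried out directly. The sketch below assumes Python's built-in sqlite3 module as the DBMS (our choice; the slides do not name one for this example), and the table name StudentClass and the snake_case column names are our own, since SQL names cannot easily contain hyphens or spaces. It loads the slide data and follows the logical ties from a student to the advisor and to the classes.

```python
import sqlite3


def build_school_db() -> sqlite3.Connection:
    """Create an in-memory copy of the Advisor, Class, Student, and Student-Class tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Advisor (advisor_id INTEGER PRIMARY KEY, a_last TEXT, a_first TEXT);
        CREATE TABLE Class (class_synonym INTEGER PRIMARY KEY, prefix TEXT,
                            class_no INTEGER, section INTEGER);
        CREATE TABLE Student (student_id INTEGER PRIMARY KEY, s_last TEXT, s_first TEXT,
                              advisor_id INTEGER REFERENCES Advisor(advisor_id));
        -- Linking table: one row per (student, class) pair.
        CREATE TABLE StudentClass (
            student_id    INTEGER REFERENCES Student(student_id),
            class_synonym INTEGER REFERENCES Class(class_synonym),
            PRIMARY KEY (student_id, class_synonym));
    """)
    conn.executemany("INSERT INTO Advisor VALUES (?, ?, ?)", [
        (101, "Leonard", "Lori"), (102, "Aurigemma", "Sal"), (103, "Bajaj", "Akhilesh"),
        (104, "Platner", "Steve"), (105, "McCrary", "Mike")])
    conn.executemany("INSERT INTO Class VALUES (?, ?, ?, ?)", [
        (10342, "MIS", 3003, 3), (10344, "MIS", 1123, 2), (10359, "MIS", 4133, 2),
        (10450, "MIS", 1123, 1), (10578, "MIS", 2013, 3), (10643, "MIS", 4053, 1)])
    conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
        (1011, "Berry", "Jeff", 101), (1012, "Smith", "Tom", 103),
        (1013, "Sanders", "Tally", 101), (1014, "Anderson", "Cindy", 103),
        (1015, "Whitman", "Amy", 102), (1016, "Jones", "Kelsi", 105),
        (1017, "Phillips", "Susan", 104)])
    conn.executemany("INSERT INTO StudentClass VALUES (?, ?)", [
        (1011, 10342), (1011, 10643), (1013, 10578), (1014, 10342), (1014, 10359),
        (1014, 10450), (1015, 10578), (1016, 10342), (1017, 10344), (1017, 10450)])
    return conn


def schedule(conn: sqlite3.Connection, student_id: int):
    """Follow the logical ties: Student -> Advisor and Student -> StudentClass -> Class."""
    return conn.execute("""
        SELECT s.s_first, a.a_last, c.prefix || c.class_no || '-' || c.section
        FROM Student s
        JOIN Advisor a       ON a.advisor_id = s.advisor_id
        JOIN StudentClass sc ON sc.student_id = s.student_id
        JOIN Class c         ON c.class_synonym = sc.class_synonym
        WHERE s.student_id = ?
        ORDER BY c.class_synonym
    """, (student_id,)).fetchall()


if __name__ == "__main__":
    db = build_school_db()
    for row in schedule(db, 1014):
        print(row)
```

For student 1014 (Cindy Anderson), the join returns her advisor Bajaj and the classes MIS3003-3, MIS4133-2, and MIS1123-1. The Student-Class table is what lets one student take many classes and one class hold many students.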
Database – Physical Structure
Database tables are stored by the operating system
as files, but we don’t need to worry about those files:
when we open the database, we see tables.
Providing this table-centric view is the job of
the DBMS (database management system).
• Common examples of DBMSs are:
MS Access, LibreOffice Base, MySQL, MariaDB,
Oracle, and MS SQL Server.

Databases – Logical Structures
• Databases have many tables.
• In databases, the row number is irrelevant; this is not
true in spreadsheet software.
• In databases, column names are very important.
Column names are created in the data dictionary.

Database – Logical Structures

Data dictionary – contains the logical structure
for the information in a database.
Before you can enter information into a database,
you must define the data dictionary for all the
tables and their fields. For example, when you
create the Truck table, you must specify that it
will have three pieces of information and that
Date of Purchase is a field in Date format.
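As a rough illustration, the sketch below defines a Truck table in SQLite (via Python's sqlite3 module, our choice of DBMS) and then reads its data dictionary back with SQLite's PRAGMA table_info. Only Date of Purchase comes from the slide; truck_id and truck_type are hypothetical stand-ins for the other two pieces of information. SQLite has no separate date storage class, so the declared DATE type is recorded in the dictionary but values are not strictly checked as dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Define the structure (the data dictionary entry) before any data is entered.
# Only date_of_purchase comes from the slide; truck_id and truck_type are hypothetical.
conn.execute("""
    CREATE TABLE Truck (
        truck_id         INTEGER PRIMARY KEY,
        truck_type       TEXT NOT NULL,
        date_of_purchase DATE NOT NULL
    )
""")


def data_dictionary(conn: sqlite3.Connection, table: str):
    """Return (column name, declared type, is primary key) for each column of a trusted table name."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    return [(name, col_type, bool(pk))
            for _cid, name, col_type, _notnull, _default, pk
            in conn.execute(f"PRAGMA table_info({table})")]


for entry in data_dictionary(conn, "Truck"):
    print(entry)
```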
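3. Databases – Logical Ties Within the Information
• Logical ties must exist between the tables or
files in a database.
• Logical ties are created with primary and
foreign keys.
• Primary key (PK) – a field (or group of fields) that
uniquely identifies each record in a table
• Foreign key (FK) – a primary key of one table that
appears as a field in another table, creating a logical
tie between the two tables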
Database – Logical Ties Within the Information
Customer Number is the primary key of the
Customer table, and it also appears in the Order
table as a foreign key. Because it is a foreign key,
a Customer Number value MUST exist in the
Customer table before it can appear in the
Order table.
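The sketch below shows the DBMS enforcing that rule, using SQLite through Python's sqlite3 module (an assumption; any relational DBMS behaves similarly). The table is named Orders because ORDER is a reserved word in SQL, and the customer "Acme Corp" is made up. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off unless enabled per connection
conn.executescript("""
    CREATE TABLE Customer (
        customer_number INTEGER PRIMARY KEY,
        name            TEXT NOT NULL
    );
    CREATE TABLE Orders (
        order_number    INTEGER PRIMARY KEY,
        customer_number INTEGER NOT NULL REFERENCES Customer(customer_number)
    );
""")


def place_order(conn: sqlite3.Connection, order_number: int, customer_number: int) -> bool:
    """Insert an order; return True if accepted, False if the foreign key check rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Orders VALUES (?, ?)", (order_number, customer_number))
        return True
    except sqlite3.IntegrityError:
        return False


with conn:
    conn.execute("INSERT INTO Customer VALUES (1, 'Acme Corp')")  # hypothetical customer
print(place_order(conn, 500, 1))   # customer 1 exists, so the order is accepted
print(place_order(conn, 501, 99))  # customer 99 does not exist, so the order is rejected
```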
Separate Tables that Link Tables
Example: If an order can have many customers,
and a customer can be linked to many orders,
how do we record which customers are on an order?
• Can we add multiple columns to the Orders
table, one for each customer ID? Or can we add
multiple columns to the Customers table, one
for each order?
• Or: create a new table with customer ID and
order ID, called CustomerOrders? What is the
primary key for this table?
• What about order date? Customer feedback for
that order?
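One possible answer to the last option, sketched in SQLite via Python's sqlite3 module (the customers, orders, and dates are made up): CustomerOrders holds one row per customer-order pair, and the pair (customer_id, order_id) is its primary key. Under this design, order date depends only on the order, so it stays in Orders, while feedback depends on both the customer and the order, so it goes in CustomerOrders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
    -- One row per (customer, order) pair; the pair itself is the primary key.
    CREATE TABLE CustomerOrders (
        customer_id INTEGER REFERENCES Customers(customer_id),
        order_id    INTEGER REFERENCES Orders(order_id),
        feedback    TEXT,
        PRIMARY KEY (customer_id, order_id)
    );
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, "2024-03-01"), (11, "2024-03-02")])
conn.executemany("INSERT INTO CustomerOrders VALUES (?, ?, ?)", [
    (1, 10, "fast delivery"),
    (2, 10, None),   # order 10 has two customers
    (1, 11, "ok"),   # customer 1 is on two orders
])


def customers_on_order(conn: sqlite3.Connection, order_id: int) -> list:
    """List the names of all customers linked to one order, alphabetically."""
    return [name for (name,) in conn.execute("""
        SELECT c.name FROM CustomerOrders co
        JOIN Customers c ON c.customer_id = co.customer_id
        WHERE co.order_id = ?
        ORDER BY c.name""", (order_id,))]


print(customers_on_order(conn, 10))
```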

4. Databases – Built-In Integrity Constraints

Integrity constraints – rules that help ensure
the quality of the information
◦ Primary key: value must be unique in its table
◦ Foreign key: value must already exist in the table it refers to
◦ Column constraints: e.g., sales price cannot be negative;
phone number must have an area code
◦ PK and FK constraints, and many column constraints, can
be defined once, when the tables are built; after that, all
data that is entered is first checked for violations.
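A minimal sketch of these constraints in SQLite (via Python's sqlite3 module), with a hypothetical Product table. The phone rule assumes 10-digit North American numbers (3-digit area code plus 7 digits), which is only one way to express "must have an area code". The constraints are declared once in CREATE TABLE, and every later insert is checked against them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        product_id     INTEGER PRIMARY KEY,                 -- PK: must be unique
        sales_price    REAL NOT NULL CHECK (sales_price >= 0),
        -- Assumes 10-digit numbers: 3-digit area code + 7 digits, digits only
        supplier_phone TEXT CHECK (length(supplier_phone) = 10
                                   AND supplier_phone NOT GLOB '*[^0-9]*')
    )
""")


def try_insert(conn: sqlite3.Connection, row: tuple) -> str:
    """Return 'ok', or the DBMS's constraint error message for one Product row."""
    try:
        with conn:
            conn.execute("INSERT INTO Product VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as err:
        return str(err)


print(try_insert(conn, (1, 19.99, "9185550100")))  # accepted
print(try_insert(conn, (2, -5.00, "9185550100")))  # negative price: rejected
print(try_insert(conn, (3, 5.00, "5550100")))      # no area code: rejected
print(try_insert(conn, (1, 7.50, "9185550100")))   # duplicate primary key: rejected
```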
Steps in Developing a Database
• Step 1: Decide on the tables, columns, column types,
primary keys, and foreign keys.
• Step 2: Use a data definition language (DDL) to create your
database.

Before developing the database, we can plan the
design using an entity-relationship (ER) diagram.
We look at the business requirements and list the
entities (objects or events) and the links between them
(relationships).
• Then we create tables, with primary and foreign
keys, for each entity and relationship. Some
relationships, however, do not get their own table:
for example, if each order has exactly one customer,
we simply add the customer ID to the Orders table.

Fun In-Class Project

University Database

Objects: courses, course sections, professors, students (graduate and undergraduate), classrooms, buildings
Examples:
• Course numbers: MIS3053, MIS4233, MIS3023
• Course section identifiers: MIS3053Fall2008A, MIS4233Fall2008A
• StudentID: 0918512
• FacultyID: 0918452
• BuildingID: HELM, OLIP
• Classroom ID: HELM316
Events: a student takes a course section and gets a grade; a professor teaches a
course section and gets a rating for that course section; a graduate student may TA a
course section and also get a rating for it.
• Example grade: ‘A’
• Example professor rating: ‘Excellent’
• Example GA rating: ‘Excellent’
Design tables so that information is not duplicated and is properly linked.
Show primary and foreign keys.
Do this in two stages:
1. Table names, columns, PKs, and links
2. Table names, columns, PKs, and FKs
DATABASE MANAGEMENT SYSTEM TOOLS
5 Components of a DBMS
1. DBMS engine: software that talks to the operating system and
presents the tables interface to us
2. Data definition subsystem: allows us to create the tables and
define PK, FK, and other constraints
3. Data manipulation subsystem: lets us ask questions of existing
information, using
◦ Views
◦ Report generators
◦ QBE tools
◦ SQL
4. Application generation subsystem: creates screens that link to
tables, so users can input and see information
5. Data administration subsystem: creates users and grants and
revokes privileges
View

View – allows you to see the contents of a database
file, make changes, and query it to find information
Report Generator

Report generator – helps
you quickly define formats
of reports and what
information you want to
see in a report
Query-by-Example Tool

QBE tool – helps you graphically design the
answer to a question
Structured Query Language
• SQL – standardized fourth-generation query
language found in most DBMSs
• Sentence-structure equivalent to QBE
• Mostly used by IT professionals
• Non-procedural language: you state what information
you want, not how to retrieve it, which makes it
different from most other programming languages
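To make the non-procedural point concrete, the sketch below (Python with its built-in sqlite3 module, our choice) computes the number of students per advisor from the Student table twice: once with an explicit loop that spells out how to tally, and once with a SQL query that only states what result is wanted.

```python
import sqlite3

students = [  # (student_id, advisor_id) pairs from the Student table
    (1011, 101), (1012, 103), (1013, 101), (1014, 103),
    (1015, 102), (1016, 105), (1017, 104),
]


def advisees_procedural(rows):
    """Procedural: spell out HOW to walk the rows and tally them."""
    counts = {}
    for _student_id, advisor_id in rows:
        counts[advisor_id] = counts.get(advisor_id, 0) + 1
    return sorted(counts.items())


def advisees_sql(rows):
    """Non-procedural: state WHAT result is wanted; the DBMS decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (student_id INTEGER PRIMARY KEY, advisor_id INTEGER)")
    conn.executemany("INSERT INTO Student VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT advisor_id, COUNT(*) FROM Student
        GROUP BY advisor_id ORDER BY advisor_id""").fetchall()
    conn.close()
    return result


print(advisees_procedural(students))
print(advisees_sql(students))
```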
OLTP, OLAP, and Business Intelligence
Data Processing

Online transaction processing (OLTP)
◦ The gathering and processing of transaction information, and the
updating of existing information to reflect the transaction
• Databases support OLTP.
• Operational database – a database that supports OLTP and some limited
OLAP
• Day-to-day transactions are recorded, each individual transaction separately.
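As an illustration of OLTP, the sketch below records one sale and updates stock as a single transaction in SQLite (via Python's sqlite3 module). The Inventory and Sale tables and the quantities are hypothetical. Using the connection as a context manager commits both statements together or rolls both back, so a sale that would push stock below zero leaves no trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory (product_id INTEGER PRIMARY KEY,
                            on_hand    INTEGER NOT NULL CHECK (on_hand >= 0));
    CREATE TABLE Sale (sale_id    INTEGER PRIMARY KEY AUTOINCREMENT,
                       product_id INTEGER NOT NULL,
                       qty        INTEGER NOT NULL);
    INSERT INTO Inventory VALUES (1, 5);
""")


def record_sale(conn: sqlite3.Connection, product_id: int, qty: int) -> bool:
    """Record one sale and update stock as a single all-or-nothing transaction."""
    try:
        with conn:  # commit if both statements succeed, roll back both otherwise
            conn.execute("INSERT INTO Sale (product_id, qty) VALUES (?, ?)", (product_id, qty))
            conn.execute("UPDATE Inventory SET on_hand = on_hand - ? WHERE product_id = ?",
                         (qty, product_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(record_sale(conn, 1, 3))  # True: stock goes from 5 to 2
print(record_sale(conn, 1, 4))  # False: stock would go negative, so nothing is saved
print(conn.execute("SELECT COUNT(*) FROM Sale").fetchone()[0])
print(conn.execute("SELECT on_hand FROM Inventory").fetchone()[0])
```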

Online analytical processing (OLAP)
◦ The manipulation of information to support decision making
• Databases can support some OLAP.
• Data warehouses support only OLAP, not OLTP.
• Data warehouses are special forms of databases that support decision
making and help build BI. They hold summarized information, such as total
sales for each product line in each store on each day.
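The kind of summary a data warehouse holds can be sketched with a GROUP BY over individual sales rows, here in SQLite via Python's sqlite3 module. The stores, product lines, dates, and amounts below are invented, and the sketch only shows the shape of the summary, not how a real warehouse is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical OLTP-style rows: one per individual sale.
conn.execute("CREATE TABLE Sale (sale_day TEXT, store TEXT, product_line TEXT, amount REAL)")
conn.executemany("INSERT INTO Sale VALUES (?, ?, ?, ?)", [
    ("2024-05-01", "Tulsa", "Cameras", 300.0),
    ("2024-05-01", "Tulsa", "Cameras", 150.0),
    ("2024-05-01", "Tulsa", "Radios", 40.0),
    ("2024-05-01", "Dallas", "Cameras", 200.0),
    ("2024-05-02", "Tulsa", "Cameras", 100.0),
])
# Warehouse-style summary: one row per product line, store, and day.
conn.execute("""
    CREATE TABLE DailySalesSummary AS
    SELECT product_line, store, sale_day,
           SUM(amount) AS total_sales, COUNT(*) AS num_sales
    FROM Sale
    GROUP BY product_line, store, sale_day
""")
for row in conn.execute("SELECT * FROM DailySalesSummary ORDER BY product_line, store, sale_day"):
    print(row)
```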
DATA WAREHOUSES AND DATA MINING
• Data warehouses support OLAP and decision
making.
• Data warehouses do not support OLTP.

• Data warehouse
• Data mart
• Data mining

Data Warehouse Example
Among customers who are female and aged
30 to 45, what percentage of camera sales
occurred after radio advertising
in the North Territory?
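The question above can be expressed as a single query against a warehouse table. The sketch uses SQLite via Python's sqlite3 module and assumes a hypothetical, already-joined SalesFact table with one row per sale and an after_radio_ad flag; all rows are invented. It measures the percentage by sales amount; a count-based version would count qualifying rows instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: one row per sale, customer attributes already joined in.
conn.execute("""
    CREATE TABLE SalesFact (
        gender TEXT, age INTEGER, territory TEXT, product TEXT,
        after_radio_ad INTEGER,  -- 1 if the sale followed radio advertising, else 0
        amount REAL
    )
""")
conn.executemany("INSERT INTO SalesFact VALUES (?, ?, ?, ?, ?, ?)", [
    ("F", 34, "North", "Camera", 1, 200.0),
    ("F", 41, "North", "Camera", 0, 100.0),
    ("F", 38, "North", "Camera", 1, 100.0),
    ("F", 52, "North", "Camera", 1, 500.0),  # outside the age range
    ("M", 35, "North", "Camera", 1, 300.0),  # excluded by gender
    ("F", 33, "South", "Camera", 0, 250.0),  # excluded by territory
])

# Share of qualifying camera sales (by amount) that followed radio advertising.
PCT_AFTER_RADIO = """
    SELECT 100.0 * SUM(CASE WHEN after_radio_ad = 1 THEN amount ELSE 0 END) / SUM(amount)
    FROM SalesFact
    WHERE gender = 'F' AND age BETWEEN 30 AND 45
      AND territory = 'North' AND product = 'Camera'
"""
print(conn.execute(PCT_AFTER_RADIO).fetchone()[0])
```

With these invented rows, three sales qualify (200, 100, and 100), and 300 of the 400 followed radio advertising, giving 75.0.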
Data Mart Example
Data-Mining Tools
Data Mining:
https://www.youtube.com/watch?v=f2Kji24833Y
Digital Dashboard:
https://www.youtube.com/watch?v=h9BUlaTlHCE
Data Warehouse Considerations
• Do you really need one, or can your database
environment support all your functions?
• Do all employees need a big data warehouse,
or would a smaller data mart do?
• How up-to-date must the information be?
• What data-mining tools do you need?
INFORMATION OWNERSHIP
• Information is a resource you must manage and
organize to help the organization meet its goals
and objectives.
• You need to consider:
◦ Strategic management support
◦ Sharing information with responsibility
◦ Information cleanliness
Strategic Management Support
• Data administration – the function that plans
for, oversees the development of, and
monitors the actual data/information. It sets
policy.
• Database administration – the function
responsible for the more technical and
operational aspects of managing the DBMS
platform and the database application. It
executes policy.
Sharing Information
• Everyone can share information without
consuming it.
• But someone must “own” it by accepting
responsibility for its quality and accuracy.
Information Cleanliness
• Related to ownership and responsibility for
quality and accuracy
• No duplicate information
• No redundant records with slightly different
data, such as different spellings of a customer’s name
• GIGO (garbage in, garbage out) – if you put garbage
information in, you get garbage information out for
decision making
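Spotting redundant records with slightly different data can be partly automated. The sketch below uses Python's standard difflib.SequenceMatcher to flag customer names whose normalized spellings are very similar. The 0.85 threshold and the sample names are assumptions, and flagged pairs still need a person to decide whether they really are the same customer.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case and drop punctuation and extra spaces so trivial variants compare equal."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose normalized spellings have a similarity ratio >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = normalize(names[i]), normalize(names[j])
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((names[i], names[j]))
    return pairs


customers = ["Jon Smith", "John Smith", "Cindy Anderson", "cindy anderson.", "Amy Whitman"]
print(likely_duplicates(customers))
```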