The Tablet Technology: Practical & Theoretical Applications

advertisement
Data, Information, and
Databases
BDIS 6.1
BSAD 141
Dave Novak
Topics Covered
Information types: transactional –vsanalytical
 Five characteristics of information quality
 Database versus a DBMS
 RDBMS: advantages and terminology
 Multi-user issues

The Need for High-Quality
Information

Data are everywhere

Which data are important?

Which data should the organization store?

Which data need to be further
manipulated?

Which data are required to make different
types of decisions?

How does the organization convert various
data into information that is needed?
The Need for High-Quality
Information

Recall difference between data and
information from Lecture #1
The Need for High-Quality
Information

The need to obtain and analyze the many
different levels, formats, and granularities
of organizational information to make
decisions
The Need for High-Quality
Information

CRITICAL TO REMEMBER!

Decisions are only as good as the quality
of the data and information that are used
to make the decisions…

Crap in  Crap out
• Using technology to help you make a
decision using poor quality data doesn’t help
Data Quality Problems
Example of Low Quality Data
Characteristics of High
Quality Information
1) Accurate
 2) Complete
 3) Consistent
 4) Unique
 5) Timely

1) Accurate
Are the data (is the information) correct,
precise, and exact?
 For example:

Are the data factual?
 Are values error-free?
 Have data been verified?

• Correct spelling
• Precise numbers
2) Complete
Are the data whole (complete) and do
they have all the necessary parts?
 For example


Are there missing values or pieces of
data?
•
•
•
•
Full street address
Area code along with phone number
Empty fields
Full Names
3) Consistent
Are the data are in agreement with
themselves and with known facts?
 For example


Does summary information agree with
detailed information? Can you reconcile
the data?
• Do mathematical manipulations yield correct
results?
• Are data manipulations performed
consistently for the entire data set?
4) Unique

Are the data unique (one of a kind) or are
there redundant, repetitious or
unnecessary data stored in the same
database?

For example:
• Are there duplicate records for the same
“event”?
• Are there different versions of “the same” file
or event (which is the latest or most
accurate?)
5) Timely
Are the data current with respect to
decision-making needs?
 Timeliness depends on the situation

Real-time information – Immediate,
up-to-date information
 Real-time system – Provides realtime information in response to
requests
 “Real-time” is a relative description
that depends on the use or need

How can data be of “poor”
quality?
Customers intentionally enter
inaccurate information to protect their
privacy or because they are irritated
 Different data entry standards and
formats
 Operators enter abbreviated or
erroneous information by accident or
to save time
 Third party and external information
contains inconsistencies, inaccuracies,
and errors

What is a Database?

Database – a collection of information
organized in a way that provides
efficient retrieval

There are electronic and physical
databases (paper/print)

A database can be a very simple
collection of data such as
alphabetically arranging names in an
address book
What is a Database
Management System (DBMS)?

Database management systems (DBMS) –
A set of computer programs / software that
allow users to store, modify, query, and
retrieve data in a systematic and controlled
manner
Database Management
System (DBMS)

A database (the physical collection of data) is
typically not portable across different DBMS


Like application software, different DBMS are
generally designed to work with specific
system software and specific database
schema
A database is typically something inside the
DBMS, although in the case of a MS Excel
workbook the database is a standalone object
Database Management
System (DBMS)
A very popular and common DBMS is the
relational DBMS (RDBMS)
 A standard program and user interface is the
Structure Query Language (SQL)

A programming language used to create,
modify, and retrieve information from a
database
 Different databases use different (proprietary)
variations to standard SQL

Database Management
System (DBMS)

According to the following source (which I did
not verify with the Gartner report) the top five
commercial RDBMS vendors in 2011 were:

http://itknowledgeexchange.techtarget.com/eye-on-oracle/oracle-the-clear-leader-in-24-billionrdbms-market/
Oracle (≈ 50% market share)
 IBM (≈ 20% market share)
 Microsoft (≈ 17% market share)
 SAP
 Teradata

Database Management
System (DBMS)
Oracle: Oracle Database and MySQL
 IBM: DB2 and Informix
 Microsoft: SQL Server
 SAP: Sybase Enterprise and Sybase IQ
 Teradata

Single File Data Management

MS Excel is a database, but it is not a
DBMS!

Each worksheet is a single large twodimensional matrix
A database is simply an organized
collection of data that can be accessed
 A DBMS is software that is used to
manage the database and provides a
set of tools used to manipulate and
query data

Relational Database
Management System (RDBMS)

Data are organized as a set of formal tables

Data can be accessed and combined in
different ways without reorganizing the data
within the tables
•

Data can be manipulated in different ways and
combined with data in other tables without
altering the original data in the table
RDBMS can be easily extended / scaled –
new data and new categories of data can be
added without changing existing data
RDBMS Terminology

Data model – A picture of logical data
structures that detail the relationships
among data elements

Metadata – Formal description of data
structures (like tables and fields) and
any constraints of the table or values
within the table

Data about the containers of data
RDBMS Terminology

Data dictionary – Compiles all of the
metadata about the elements in the
data model
Entity Sets (Tables)
Relational table or entity set –
Each table consists of columns
(fields/attributes) and rows
(records/entities)
 The table has a name that
describes the group of related
entities within the table


For example, a table labeled
“Student” would contain a group
of student entities
Entity / Record / Row

A person, place, thing,
transaction, or event about
which data are being collected
and stored
The individual rows in a table
contain entities
 Each row is also referred to as
a record


Example?
Attributes / Field / Column

The data elements that describe the
characteristics of a specific entity


The columns in each table contain the
attributes
Example?
What is a Relationship?

1) When designing a relational DB,
data need to be separated into tables
that contain related data elements

For example we would not store data
related to customer (name, address,
phone, etc.) and data related to the
customer’s particular order (orderID,
date, shipping method, etc.) in the
same table
What is a Relationship?
All information specific to a customer
would go into a “Customer” table
 All information specific to the specific
orders would go into an “Order” table


We would then create a relationship
between the tables to match a
particular customer with a particular
order
What is a Relationship?

A relationship in an RDB is an
association between the entities within
the different tables

There are THREE (3) types of
relationships:
• One-to-One (1:1)
• One-to-Many (1:M)
• Many-to-Many (M:M)
Creating Relationships
Through Keys

KEYS are used to create
relationships between the entities in
different tables in the RDB

Primary key

Foreign key
Creating Relationships
Through Keys

For our purposes:

Every table in a RDB MUST have a
primary key

The foreign key is not required in
every table and will only appear on
the “many” side of the relationship
Advantages of RDBMs

RDBMS advantages from a
business perspective include
1) Flexibility
 2) Scalability and performance
 3) Improved information integrity
(quality)

• Reduced information redundancy

4) Information security
1) Flexibility
Handle changes quickly and easily
 Provide users with different views of
the data

Arranging data items in different
ways depending on the user need
 Showing a particular user only some
of the available fields while not
showing them other fields

1) Flexibility: Schema
Different database schema can be “owned”
by or associated with different users
The schema is a user personalized set of
tables, views, and indexes
2) Scalability and
Performance

A DBMS must expand to meet
increased demand, while
maintaining acceptable
performance levels
Scalability – Refers to how well a
system can adapt to increased
demands
 Performance – Measures how
quickly a system performs a
certain process or transaction

3) Information Integrity

Information integrity – a measure of
information quality


Know that data have not been entered
incorrectly or altered in an unauthorized
manner
Integrity constraint – rules that help
ensure the quality of information

We will discuss entity integrity and
referential integrity (there is also domain
integrity)
3) Information Integrity:
Controlling Redundancy

Redundant data are ok if they serve a specific
purpose such as being used as backup directly
linked to the source


Backup systems promote fault tolerance,
Unintentional redundancy is not good
Wasted storage
 Difficult to modify
 Possible inconsistencies

4) Information Security
Information is an organizational asset
and must be protected
 RDBMS offer several security features


Access level – Determines the level of
access each individual user has
• Who can access the DBMS

Access control – Determines the
types of things each group can do
• Types of access, such as power to create,
modify, delete, and/or read
• Which types of SQL statements can be
executed
Multiuser Issues
DBMS serve many different users with
different needs
 Many users may require concurrent
access to the same data


Must preserve integrity of data and the
performance of the system
Multiuser Issues

Problem: if multiple users (say tens or
even hundreds of users) access the same
data concurrently, how does the DBMS
allow one user to change data without
being overwritten by another user? This is
typically referred to as the Lost-update
problem
Multiuser Issues

Concurrent transactions are addressed
through the use of transactions and locks
Transactions – single indivisible action
that affects some data
 Once a transaction is committed, it is
permanent and changes are visible to all
users
 If transaction is not committed, changes
are “rolled back” or reversed

Multiuser Issues

Locks – literally “locks” the data so that
changes cannot be made on the data
while another transaction is in process
Enterprise DBMS
Learning Outcomes

Five characteristics of quality information

Define database, DBMS, RDBMS, and
supporting components and terminology

Advantages of RDBMS

What is SQL?

Describe the lost-update problem and how it
is addressed
Download