Organizing Data and Information

Chapter 3
Fundamentals of Information
Systems, Second Edition
Learning Objectives
– Define general data management concepts
and terms, highlighting the advantages and
disadvantages of the database approach to
data management.
– Name three database models and outline
their basic features, advantages, and
disadvantages.
Learning Objectives
– Identify the common functions performed by
all database management systems and
identify three popular end-user database
management systems.
– Identify and briefly discuss recent database
applications.
The Hierarchy of Data
Data entities, attributes, and keys
– Entity: Generalized class of people,
places, or systems for which data is
collected (e.g., employees, customers)
– Attribute: Characteristic of an entity (e.g.,
first name, last name)
– Key: A set of fields used to identify an
entity
– Primary Key: A key that uniquely
identifies the entity
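The terms above can be made concrete with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical employee table; the table name, column names, and sample rows are illustrative and do not come from the slides. The entity is the employee, the columns are its attributes, and employee_id serves as the primary key.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Entity: employee. Attributes: first_name, last_name.
# employee_id is the primary key (hypothetical column names).
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO employee VALUES (1, 'Ada', 'Lovelace')")


def insert_duplicate_key(connection):
    """Try to reuse primary key 1; return True if the DBMS rejects it."""
    try:
        connection.execute("INSERT INTO employee VALUES (1, 'Alan', 'Turing')")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = insert_duplicate_key(conn)
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, row_count)
```

Because the primary key must identify exactly one row, the second insert is refused and the table still holds a single employee.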
Keys and Attributes
The Traditional Approach To Data
Management
– Create new
files for each
application
– Data
redundancy
– Data integrity
The Database Approach to Data
Management
Advantages of the
Database Approach (1)
• Improved strategic use of corporate data
– Accurate information always available
• Reduced data redundancy
– Data is stored in one place
• Improved data integrity
– Changes are reflected throughout
• Easier modification and update
– No need to know where the data is
Advantages of the
Database Approach (2)
• Data and program independence
– Programs are insulated from changes in how the data is stored
• Better access to data and information
– Simple instructions to access data
• Standardization of data access
– Each DBMS uses the same set of instructions
• Standardization for programmers
– Programmers only need to know how to access the DBMS
Advantages of the
Database Approach (3)
• Better protection of data
– Access to the data requires authorization
• Shared data resources
– Set up the database once
– Several applications can use it
Disadvantages of the
Database Approach
• Costly
– Specialized DBMS software
– Specialized DBMS administrators and
operators
• Increased vulnerability
– Single point of failure
– Targets for attacks
Data Modeling
• Planned data redundancy
– To have it available in more than one place
– To improve system performance
• Data model
– A diagram of entities and their relationships
• Enterprise data modeling
– Done at the level of the entire enterprise
• Entity-relationship diagrams
– Use graphs to show how data is organized and how it
is related
Entity-Relationship Diagram for a
Customer Ordering Database
Entity
Relationship
(one-to-many)
Relationship
(many-to-one)
Relationship
(one-to-one)
Database Models
• Hierarchical (tree)
– Data is organized top-down
• Network
– Owner-membership relationship
– A member can have many owners
• Relational
– Uses tabular format with 2-dimensional tables
(relations)
– Relations resemble files
Hierarchical Database Model
Network Database Model
Relational Database Model
Relational Models
Describe data using a standard tabular format with all
data elements placed in two-dimensional tables, called
relations, that are the logical equivalent of files.
– Rows represent data entities
– Columns represent attributes
Relational Models
– Domain: Set of values an attribute can have
• Age: between 0 and 100
• Gender: Male or female
– Selecting
• Pick rows based on certain criteria
• Select those whose gender is female
– Projecting
• Create a new table with a subset of attributes
– Joining
• Combine two or more tables
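The domain idea and the three operations above can be sketched with Python's built-in sqlite3 module. The employee and department tables, their columns, and the sample rows are assumptions made for illustration only. CHECK constraints stand in for domains, a WHERE clause performs selecting, a column list performs projecting, and JOIN performs joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        age         INTEGER CHECK (age BETWEEN 0 AND 100),      -- domain
        gender      TEXT CHECK (gender IN ('male', 'female')),  -- domain
        dept_id     INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO employee VALUES
        (1, 'Ana', 34, 'female', 10),
        (2, 'Ben', 41, 'male', 20),
        (3, 'Cho', 29, 'female', 20);
    """
)

# Selecting: keep only the rows that meet a criterion.
selected = conn.execute(
    "SELECT * FROM employee WHERE gender = 'female' ORDER BY employee_id"
).fetchall()

# Projecting: keep only a subset of the attributes (columns).
projected = conn.execute(
    "SELECT name, age FROM employee ORDER BY employee_id"
).fetchall()

# Joining: combine two tables through a shared attribute.
joined = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.employee_id
    """
).fetchall()

# A value outside the age domain is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Dee', 150, 'female', 10)")
    domain_enforced = False
except sqlite3.IntegrityError:
    domain_enforced = True

print(selected, projected, joined, domain_enforced)
```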
Linking Database Tables
to Answer an Inquiry
Building and Modifying a Relational
Database
Database Management
Systems
Providing a User View
• Schema - a description of the entire database
– First create a schema, then create the tables
• Subschema - a file that contains a description
of a subset of the database and identifies which
users can modify the data items in that subset
– A sales representative has to see the data for
her office, not the company stock data
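One way to approximate a subschema is a view, sketched here with Python's built-in sqlite3 module. The sale and company_stock tables, the boston_sales view, and all sample values are hypothetical. SQLite has no user accounts or GRANT statement, so this sketch shows only the "subset of the database" half of the definition, not the per-user modification rights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Schema: the full set of tables in the database.
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        office  TEXT NOT NULL,
        amount  REAL NOT NULL
    );
    CREATE TABLE company_stock (
        record_date TEXT PRIMARY KEY,
        share_price REAL NOT NULL
    );
    INSERT INTO sale VALUES
        (1, 'Boston', 1200.0),
        (2, 'Denver', 800.0),
        (3, 'Boston', 450.0);
    INSERT INTO company_stock VALUES ('2001-01-02', 31.5);

    -- Subschema stand-in: the sales representative's view exposes only
    -- her own office's sales and none of the stock data.
    CREATE VIEW boston_sales AS
        SELECT sale_id, amount FROM sale WHERE office = 'Boston';
    """
)

rep_view = conn.execute(
    "SELECT sale_id, amount FROM boston_sales ORDER BY sale_id"
).fetchall()
print(rep_view)
```

A server DBMS would typically pair such a view with access privileges so that the representative can query the view but not the underlying tables.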
The Use of Schemas and Subschemas
Creating and Modifying the Database
• Data definition language (DDL) - a collection of
instructions and commands used to define and
describe data and data relationships in a specific
database
• Used to define the schemas
• Data dictionary – detailed description of data in
a database
• Create a data dictionary when defining the
schemas
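As a rough illustration, the sketch below issues one DDL statement and then reads back the DBMS's own description of the table, which plays the role of a very small data dictionary. It assumes Python's built-in sqlite3 module; the project table and its columns are hypothetical, and PRAGMA table_info is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: CREATE TABLE defines the data and its structure.
conn.execute(
    """
    CREATE TABLE project (
        project_number TEXT PRIMARY KEY,
        description    TEXT NOT NULL,
        budget         REAL
    )
    """
)

# SQLite reports one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(project)").fetchall()

# Reduce that to (name, type, required?, part of primary key?).
dictionary = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in columns
]
for entry in dictionary:
    print(entry)
```

A production data dictionary would normally also record descriptions, owners, and permitted values for each data element.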
Typical Uses of a Data Dictionary
• Provide a standard definition of terms and data elements
• Assist programmers in designing and writing programs
• Simplify database modification
• Reduce data redundancy
• Increase data reliability
• Speed program development
• Ease modification of data and information
Storing and Retrieving Data
Data Access
• Concurrency control: Lock the record so that
only one application can access it at a time
• Data manipulation language (DML)
• Structured Query Language (SQL)
• SELECT * FROM Project
WHERE Project_number = '155'
• UPDATE Project
SET Project_number = '156'
WHERE Project_number = '155'
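The two statements above can be tried end to end with Python's built-in sqlite3 module. This is a minimal sketch: the Project table definition, the Description column, and the sample row are assumptions, since the slides show only the Project_number column. The `?` placeholders pass the values as parameters instead of pasting them into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table; only Project_number appears in the slides.
conn.execute(
    "CREATE TABLE Project (Project_number TEXT PRIMARY KEY, Description TEXT)"
)
conn.execute("INSERT INTO Project VALUES ('155', 'Payroll')")

# DML query: retrieve the row for project 155.
before = conn.execute(
    "SELECT * FROM Project WHERE Project_number = ?", ("155",)
).fetchall()

# DML update: renumber project 155 to 156.
cursor = conn.execute(
    "UPDATE Project SET Project_number = ? WHERE Project_number = ?",
    ("156", "155"),
)
updated_rows = cursor.rowcount
conn.commit()

after = conn.execute("SELECT * FROM Project ORDER BY Project_number").fetchall()
print(before, updated_rows, after)
```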
Structured Query Language
Database Output
Popular Database Management Systems
• Oracle
• MySQL
• Paradox database
• FileMaker Pro
• Microsoft Access
• Lotus 1-2-3 Spreadsheet
Worldwide Database Market Share
(2001)
Selecting a Database Management
System (1)
• Database size: Number of records in the
database
• Number of concurrent users: People or
applications that will access it at the same time
• Performance: How fast can the DBMS access or
update records?
Selecting a Database Management
System (2)
• Integration: Which operating system can it run
under?
• Features: Which security procedures or privacy
policies are in place?
• Vendor: Size and reputation of the vendor
• Cost: Initial cost, maintenance costs, hardware
costs, personnel costs
Database Applications
Data Warehouses, Data Marts,
and Data Mining
• Data Warehouse - a database that collects business
information from many sources in the enterprise,
covering all aspects of the company’s processes,
products, and customers.
• Data Mart – a subset of a data warehouse.
– Suited to small and medium-sized businesses
– Used mostly for decision support systems
• Data Mining - an information analysis tool that involves
the automated discovery of patterns and relationships in
a data warehouse.
Elements of a Data Warehouse
Common Data Mining Applications
Common Data Mining Applications (1)
• Branding and positioning of products
• Customer churn
– Which customers are likely to switch to competitors?
• Direct marketing
– Who would respond to telemarketing?
• Fraud detection
– Predict transactions which are likely to be illegal
Common Data Mining Applications (2)
• Market-basket analysis
– Which products are bought at the same time (e.g., diapers,
beer, chips)
• Market segmentation
– Group users based on similarity of products that they
buy
• Trend analysis
– Analyze how variables change over time (e.g., sales)
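The question of which products are bought at the same time (often called market basket analysis) can be sketched as simple pair counting. The transactions below are made-up sample data and the 50% support threshold is an arbitrary assumption; real data mining tools use more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of products per transaction.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"beer", "chips"},
    {"diapers", "milk"},
]

# Count how many baskets contain each unordered pair of products.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of all baskets that contain the pair.
min_support = 0.5  # assumed threshold
frequent = sorted(
    pair
    for pair, count in pair_counts.items()
    if count / len(transactions) >= min_support
)
print(frequent)
```

With this sample data, the pairs that reach the threshold are beer with chips and beer with diapers, each appearing in two of the four baskets.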
Business Intelligence
Gathering enough of the right information in a
timely manner and usable form.
– Competitive intelligence
• What others are doing
– Counterintelligence
• Define trade secret information
– Knowledge management
• Capture company’s collective expertise wherever it
resides
• Record knowledge and share it
Others
– Distributed databases
• Data is spread across several databases
– On-line analytical processing (OLAP)
• Programs used to store and deliver data
• Used to analyze millions of customer records
– Open database connectivity (ODBC)
standards
Comparison of OLAP and Data Mining
Advantages of ODBC
Object-Relational Database
Management System
• Stores the following types of data as objects:
– audio
– images
– unstructured text
– spatial data
Spatial Technology
Summary
• Data - one of the most valuable resources a firm
possesses.
• Entity - a generalized class of objects for which data is
collected, stored, and maintained.
• Attribute - a characteristic of an entity.
• DBMS - a group of programs used as an interface
between a database and application programs.
• Data mining - the automated discovery of patterns and
relationships in a data warehouse.