Chapter 6: Databases and Information Management
6.1

Organizing Data in a Traditional File Environment
An effective information system provides users with accurate, timely and relevant information
File Organization Terms and Concepts

A computer system organizes data in a hierarchy that starts with bits and bytes and progresses to
fields, records, files, and databases
o
Bit: smallest unit of data a computer can handle
o
Byte: group of bits that represent a single character, e.g. letter, number, symbol
o
Field: grouping of characters into a word, a group of words, or a complete number
o
Record: group of related fields such as a student’s name, the course taken, date, grade,
etc.
o
File: group of records of the same type
o
Database: group of related files, e.g. student database
o
Entity: a person, place, thing or event about which we store and maintain information.
o
Attribute: each characteristic or quality describing a particular entity, e.g. Student_ID
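The hierarchy above can be sketched in Python. This is only an illustration: the class name, the field names (student_id, name, course, grade) and the sample values are assumptions, not taken from the notes.

```python
from dataclasses import dataclass, fields

# A record: a group of related fields describing one entity.
# Field names and values are hypothetical, chosen only for illustration.
@dataclass
class CourseRecord:
    student_id: str   # an attribute of the entity
    name: str
    course: str
    grade: str

# A file: a group of records of the same type.
course_file = [
    CourseRecord("S001", "Ada", "IS101", "A"),
    CourseRecord("S002", "Ben", "IS101", "B"),
]

# A database: a group of related files.
student_database = {"course_file": course_file}

# A field is made of characters; in ASCII each character is one byte of 8 bits.
name_bytes = course_file[0].name.encode("ascii")
bits = [format(b, "08b") for b in name_bytes]

field_names = [f.name for f in fields(CourseRecord)]
```

Here `bits` holds one 8-bit string per character of the name field, which makes the bit -> byte -> field -> record -> file -> database chain concrete.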
Problems with the Traditional File Environment

Each functional area (e.g. accounting, finance, HR) developed its own systems and data files

Problems:
o
Data redundancy and inconsistency
o
Program-data dependence
o
Inflexibility
o
Poor data security
o
Inability to share data among applications
Data Redundancy and Inconsistency

Data redundancy: presence of duplicate data in multiple data files so that the same data are
stored in more than one place or location.

Data inconsistency: same attribute may have different values
o
E.g. the attribute ‘Date’ is not updated for the entity ‘COURSE’ across the different
databases/systems

Additional confusion may result from using different coding systems to represent values for an
attribute, so data from different sources cannot be integrated
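A minimal sketch of how redundancy turns into inconsistency, assuming two departments each keep their own copy of the same COURSE data. The file names, course code and dates are hypothetical.

```python
# The same attribute stored in two places (data redundancy).
registrar_file = {"IS101": {"Date": "2024-09-03"}}
faculty_file = {"IS101": {"Date": "2024-09-10"}}   # updated in this file only

def find_inconsistencies(file_a, file_b):
    """Return (key, attribute, value_a, value_b) wherever the two copies disagree."""
    conflicts = []
    for key in file_a.keys() & file_b.keys():
        for attr in file_a[key].keys() & file_b[key].keys():
            if file_a[key][attr] != file_b[key][attr]:
                conflicts.append((key, attr, file_a[key][attr], file_b[key][attr]))
    return sorted(conflicts)

conflicts = find_inconsistencies(registrar_file, faculty_file)
```

With a single shared copy of the attribute there would be nothing to compare, which is the point of controlling redundancy.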
Program-Data Dependence

Program-data dependence: the coupling of data stored in files and the specific programs
required to update and maintain those files so that changes in programs require changes to the
data

A change in the software program could change the data, which may then not be compatible with
other programs
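One way to picture program-data dependence is a program that hard-codes the layout of its data file. The fixed-width layout below is an assumption made up for this sketch.

```python
# Hypothetical layout hard-coded into the program:
# characters 0-3 student ID, 4-13 name, 14-19 course.
def parse_record(line):
    return {
        "student_id": line[0:4].strip(),
        "name": line[4:14].strip(),
        "course": line[14:20].strip(),
    }

old_line = "S001" + "Ada".ljust(10) + "IS101".ljust(6)
# Another program widens the ID field to 6 characters, so the file layout changes.
new_line = "S00001" + "Ada".ljust(10) + "IS101".ljust(6)

good = parse_record(old_line)
broken = parse_record(new_line)   # same program, new layout: every field is misread
```

Nothing in `parse_record` changed, yet its output is wrong after the file format moved, because the description of the data lives inside the program.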
Lack of Flexibility

Traditional file systems can deliver scheduled reports but cannot respond to unanticipated
information requests in a timely fashion

Answering such requests may be possible, but is expensive
Poor Security

Little control or management of data leads to little control over access and dissemination of data
Lack of Data Sharing and Availability

It is virtually impossible to access data in a timely manner because pieces of information in
different files and different parts of the organization cannot be related to one another

o
Different systems may hold different values of the same piece of information
6.2
The Database Approach to Data Management
Another definition of database is a collection of data organized to serve many applications
efficiently by centralizing the data and controlling redundant data

Data appears to be stored in one location
Database Management Systems

Database management system (DBMS): software that permits an organization to centralize
data, manage them efficiently, and provide access to the stored data by application
programs.

DBMS retrieves information from the database and presents it to an application program
o
With traditional data files it is the other way around: the application program must itself
specify how and where the data are stored

DBMS separates the logical and physical views
o
Logical view – data as they would be perceived by end users
o
Physical view – how data are actually organized and structured on physical storage media
DBMS makes the physical database available for different logical views as required by users
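A rough sketch of one physical table serving two logical views, using SQLite through Python's sqlite3 module as a stand-in DBMS. The employee table, its columns and the view names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Ada', 'HR', 50000), (2, 'Ben', 'IT', 60000);
    -- Two logical views over the same stored table
    CREATE VIEW directory_view AS SELECT name, dept FROM employee;
    CREATE VIEW payroll_view AS SELECT id, salary FROM employee;
""")

directory = conn.execute("SELECT * FROM directory_view ORDER BY name").fetchall()
payroll = conn.execute("SELECT * FROM payroll_view ORDER BY id").fetchall()
```

Neither group of users needs to know how or where the employee rows are stored; each sees only the columns its view exposes.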
How a DBMS Solves the Problems of the Traditional File Environment

Reduces data redundancy and inconsistency by minimizing isolated files in which the same data
are repeated

Uncouples programs and data, enabling ad hoc data queries

Enables the organization to centrally manage data, their use and their security
Relational DBMSs

Relational DBMS: a type of logical database model that treats data as if they were stored in
two-dimensional tables. It can relate data stored in one table to data in another as long as the two
tables share a common data element.

A row of the table = record = tuple

Key field: field in a record that uniquely identifies instances of that record so that it can be
retrieved, updated or sorted.

Primary key: unique identifier for all the information in any row of the table

Foreign key: field in a database table that enables users to find related information in another
database table.
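A small sketch of primary and foreign keys, again using SQLite via sqlite3 as an assumed stand-in for a relational DBMS. The supplier and part tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,   -- primary key: unique for every row
        supplier_name TEXT
    );
    CREATE TABLE part (
        part_id INTEGER PRIMARY KEY,
        part_name TEXT,
        supplier_id INTEGER REFERENCES supplier(supplier_id)   -- foreign key
    );
    INSERT INTO supplier VALUES (10, 'Acme');
    INSERT INTO part VALUES (1, 'Door latch', 10);
""")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO supplier VALUES (10, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key stored in part is used to find the related supplier row.
(fk,) = conn.execute("SELECT supplier_id FROM part WHERE part_id = 1").fetchone()
(supplier_name,) = conn.execute(
    "SELECT supplier_name FROM supplier WHERE supplier_id = ?", (fk,)
).fetchone()
```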
Operations of a Relational DBMS

Relational database tables can be combined easily to deliver data required by users, provided
that any two tables share a common data element

3 basic operations:
o
Select

Creates a subset consisting of all records in the file that meet the stated criteria
o
Join

Combines relational tables to provide the user with more information than is
available in individual tables
o
Project

Creates a subset consisting of columns in a table, permitting the user to create
new tables that contain only the information required
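The three operations can be sketched as SQL queries run through Python's sqlite3 module. The supplier and part tables and their rows are assumptions made for this example; supplier_id is the common data element.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
    CREATE TABLE part (part_id INTEGER PRIMARY KEY, part_name TEXT, supplier_id INTEGER);
    INSERT INTO supplier VALUES (10, 'Acme'), (20, 'Bolt Co');
    INSERT INTO part VALUES (1, 'Door latch', 10), (2, 'Hinge', 20), (3, 'Door seal', 10);
""")

# Select: the subset of rows that meet a stated criterion.
selected = conn.execute(
    "SELECT * FROM part WHERE supplier_id = 10 ORDER BY part_id"
).fetchall()

# Join: combine the two tables through their common data element.
joined = conn.execute("""
    SELECT part.part_id, part.part_name, supplier.supplier_id, supplier.supplier_name
    FROM part JOIN supplier ON part.supplier_id = supplier.supplier_id
    ORDER BY part.part_id
""").fetchall()

# Project: keep only the columns required from the joined result.
projected = conn.execute("""
    SELECT part.part_name, supplier.supplier_name
    FROM part JOIN supplier ON part.supplier_id = supplier.supplier_id
    ORDER BY part.part_id
""").fetchall()
```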
Object-Oriented DBMSs

Many newer applications require databases that can store and retrieve not only structured
numbers and characters but also drawings, images, photographs, audio and full-motion video
o
DBMSs that organize data into rows and columns are not well suited for this purpose

Object-oriented DBMS: stores the data and procedures that act on those data as objects that
can be automatically retrieved and shared
o
Becoming popular because they can be used to manage multimedia components or Java applets in
Web applications
o
Slow in processing a large number of transactions

Object-relational DBMS: DBMS with capabilities of both object-oriented and relational DBMSs
Capabilities of Database Management Systems
1.
Data definition: capability to specify the structure of the content of the database
o
Used to create database tables and to define the characteristics of the fields in each table
2.
Data dictionary: automated or manual file that stores definitions of data elements and their
characteristics
3.
Data manipulation language
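A sketch of the first two capabilities, assuming SQLite as the DBMS: a CREATE TABLE statement defines the structure, and SQLite's `PRAGMA table_info` is read back as a simple automated data dictionary. The student table and the dictionary keys are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: specify the table and the characteristics of its fields.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
dictionary = [
    {"field": name, "type": col_type, "required": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(student)")
]
```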
Querying and Reporting

Data manipulation language: DBMS’s specialized language that is used to add, change, delete
and retrieve the data in the database
o
Contains commands that permit end users and programming specialists to extract data
from the database to satisfy information requests and develop applications

Structured Query Language (SQL): standard data manipulation language for relational database
management systems
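The four data manipulation actions (add, change, delete, retrieve) map onto SQL's INSERT, UPDATE, DELETE and SELECT. A minimal sketch with sqlite3, using a made-up student table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

conn.execute("INSERT INTO student VALUES (?, ?, ?)", (1, "Ada", "B"))        # add
conn.execute("INSERT INTO student VALUES (?, ?, ?)", (2, "Ben", "C"))        # add
conn.execute("UPDATE student SET grade = ? WHERE student_id = ?", ("A", 1))  # change
conn.execute("DELETE FROM student WHERE student_id = ?", (2,))               # delete
rows = conn.execute("SELECT student_id, name, grade FROM student").fetchall()  # retrieve
```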

Microsoft Access and other DBMSs include capabilities for report generation so that the data of
interest can be displayed in a more structured and polished format than would be possible by just
querying

Crystal Reports (popular report generator)
Designing Databases
Normalization and Entity-Relationship Diagrams

Conceptual database design describes how data elements in the database are to be grouped

Design process identifies
o
Relationships among data elements
o
Most efficient way of grouping data elements together to meet business information
requirements

o
Redundant data elements
o
Groupings of data elements for specific application programs
Normalization: the process of creating small, stable yet flexible and adaptive data structures
from complex groups of data

Repeating data groups: unnormalized data wherein there can be multiple records associated
with multiple records from another table
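A rough illustration of removing a repeating group, using plain Python structures. The order data and the names orders and line_items are assumptions for this sketch.

```python
# Unnormalized: every order carries a repeating group of (part, quantity) values.
unnormalized_orders = [
    {"order_id": 1, "order_date": "2024-01-05", "items": [("P1", 2), ("P2", 1)]},
    {"order_id": 2, "order_date": "2024-01-06", "items": [("P1", 5)]},
]

# Normalized: two small structures, linked by order_id.
orders = [
    {"order_id": o["order_id"], "order_date": o["order_date"]}
    for o in unnormalized_orders
]
line_items = [
    {"order_id": o["order_id"], "part_id": part_id, "quantity": quantity}
    for o in unnormalized_orders
    for part_id, quantity in o["items"]
]
```

Each fact is now stored once: order dates live only in `orders`, and every ordered part is one row in `line_items`.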

Referential integrity: rules to ensure that relationships between linked database tables remain
consistent
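A sketch of a DBMS enforcing referential integrity, assuming SQLite (which checks foreign keys only after `PRAGMA foreign_keys = ON`). The tables and the helper is_rejected are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
    CREATE TABLE part (
        part_id INTEGER PRIMARY KEY,
        part_name TEXT,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
    );
    INSERT INTO supplier VALUES (10, 'Acme');
    INSERT INTO part VALUES (1, 'Door latch', 10);
""")

def is_rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A part may not point at a supplier that does not exist.
orphan_rejected = is_rejected("INSERT INTO part VALUES (2, 'Hinge', 99)")
# A supplier may not be deleted while a part still refers to it.
delete_rejected = is_rejected("DELETE FROM supplier WHERE supplier_id = 10")
```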

Entity-relationship diagram: a methodology for documenting databases illustrating the
relationship between various entities in the database.

If the business does not get its data model right, the system will not be able to serve the business
well: it will end up working with data that are inaccurate, incomplete or difficult to retrieve
Distributing Databases

Distributed database: database stored in more than one location
o
Partitioned database

Parts of the database are stored and maintained physically in one location and
other parts are stored and maintained in other locations so that each remote
processor has the necessary data to serve its local area
o
Duplicate database


Duplicate the central database at all remote locations
Advantages:
o
Reduce the vulnerability of a single, massive central site
o
Increase service and responsiveness to local users and often can run on smaller, less
expensive computers

Disadvantages:
o
Depart from central data standards and definitions
o
Pose security problems by widely distributing access to sensitive data
6.3
Using Databases to Improve Business Performance and Decision Making
Data Warehouses

Concise, reliable information about current operations, trends and changes across the entire
company can be hard to obtain if data are held in separate systems in different parts of the
organization
What is a Data Warehouse?

Data warehouse: database that stores current and historical data of potential interest to decision
makers throughout the company
o
Consolidates and standardizes information from different operational databases so that
the information can be used for management analysis and decision making.
o
Concept:

Take data from internal & external data sources

Extract and transform

Data warehouse that serves as an information directory and allows data access &
analyses
Data Marts

Data mart: subset of a data warehouse in which a summarized or highly focused portion of the
organization’s data is placed in a separate database for a specific population of users
o
Smaller, decentralized data warehouse
o
E.g. sales & marketing data marts to deal with customer information
Business Intelligence, Multidimensional Data Analysis, and Data Mining

Business intelligence (BI): applications and technologies to help users make better business
decisions
o
Keep track of transactions
o
Develop knowledge about customers, competitors and internal operations by finding
patterns and insights
o
Change decision-making behaviour to achieve higher profitability
o
Database -> Data warehouse -> BI
Online Analytical Processing (OLAP)

OLAP: capability for manipulating and analyzing large volumes of data from multiple perspectives
o
E.g. product, pricing, cost, region or time period

Enables users to obtain online answers to ad hoc questions fairly rapidly, even
when the data are stored in very large databases, such as sales figures for multiple years

Cube analogy
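The cube idea can be sketched as the same facts totalled along different dimensions. The sales figures, the dimension names and the helper roll_up are assumptions for illustration, not a real OLAP engine.

```python
from collections import defaultdict

# Each fact has three dimensions (product, region, month) and one measure (units sold).
sales = [
    ("nuts", "East", "Jan", 10),
    ("nuts", "West", "Jan", 20),
    ("bolts", "East", "Jan", 5),
    ("nuts", "East", "Feb", 15),
]
DIMENSIONS = ("product", "region", "month")

def roll_up(facts, *dims):
    """Total the measure for every combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[i] for i in positions)] += fact[-1]
    return dict(totals)

by_product = roll_up(sales, "product")                 # one face of the cube
by_region_month = roll_up(sales, "region", "month")    # another perspective
```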
Data Mining

Data mining: analysis of large pools of data to find patterns and rules that can be used to guide
decision making and predict future behaviour
o
Associations

Occurrences linked to a single event
o
Sequences

Events linked over time
o
Classification

Patterns that describe the group to which an item belongs by examining existing
items that have been classified and by inferring a set of rules
o
Clustering

Similar to classification

Find different groupings within data
o
Forecasting

Uses a series of existing values/patterns to forecast what other values will be
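As one example of these pattern types, associations can be sketched by counting which items occur together in a single purchase. The baskets are made-up data, and confidence here is a simple illustrative measure rather than a full data-mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Each basket is one event (a single purchase); contents are hypothetical.
baskets = [
    {"chips", "cola"},
    {"chips", "cola", "salsa"},
    {"chips", "salsa"},
    {"cola"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

chips_then_cola = confidence("chips", "cola")
salsa_then_chips = confidence("salsa", "chips")
```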

Perform high-level analyses of patterns or trends, but also provide more detail when needed

Predictive analysis: uses data-mining techniques, historical data and assumptions about future
conditions to predict outcomes of events
o
E.g. the probability a customer will respond to an offer or purchase a specific product
Text Mining and Web Mining

Unstructured data, mostly in the form of text files, is believed to account for more than 80% of an
organization’s useful information (e.g. e-mails, transcripts, memos)

Text mining: discovery of patterns and relationships from large sets of unstructured data
o
E.g. businesses might turn to text mining to analyze transcripts of calls to customer
service centres to identify major service and repair issues

Web mining: discovery and analysis of useful patterns and information from the World Wide
Web
o
E.g. understand customer behaviour, evaluate a website’s effectiveness, quantify the
success of marketing campaigns
o
E.g. Google Trends, Google Insights

Web mining = searching for data patterns through content, structure & usage mining
o
Web content mining

Process of extracting knowledge from Web content
o
Web structure mining

Examines data related to the structure of a particular website
o
Web usage mining

Examines user interaction/behaviour data recorded by a Web server whenever
requests for a website’s resources are received
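Web usage mining can be sketched as counting requests in a Web server log. The log format below ("client_ip method path status") is a simplified assumption, not a real server's format.

```python
from collections import Counter

# Simplified, hypothetical log lines: "client_ip method path status".
log_lines = [
    "10.0.0.1 GET /home 200",
    "10.0.0.1 GET /products 200",
    "10.0.0.2 GET /home 200",
    "10.0.0.2 GET /missing 404",
]

page_hits = Counter()
visitors = set()
for line in log_lines:
    ip, _method, path, status = line.split()
    visitors.add(ip)
    if status == "200":          # count only successful requests
        page_hits[path] += 1

most_requested = page_hits.most_common(1)[0]
```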
Databases and the Web

Many companies now use the Web to make some of the information in their internal databases
available to customers and business partners

E.g. shopping online: after the user goes to the website, the Web browser software requests
data from the organization’s database, using HTML commands to communicate with the Web server

Database server: a computer in a client/server environment that is responsible for running a
DBMS to process SQL statements and perform data management tasks
Figure: Web browser -> Internet -> Web server -> Application server -> Database server -> Database
o
Many back-end databases cannot interpret commands written in HTML
o
Application server is the middleware working between the Web server & database server,
acting as a “translator” of HTML to SQL

Handles all application operations, including transaction processing and data
access, between browser-based computers and a company’s back-end business
applications or databases

Takes requests from the Web server, runs the business logic to process
transactions based on those requests, and provides connectivity to the
organization’s back-end systems or databases

Software for handling these operations could be a Common Gateway Interface
(CGI) script
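A very small sketch of the "translator" role: a function that takes a Web request, turns it into a parameterized SQL query, and returns HTML. The URL, the product table and the function handle_request are all hypothetical, and SQLite stands in for the back-end database.

```python
import html
import sqlite3
from urllib.parse import parse_qs, urlparse

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (name TEXT, category TEXT, price REAL);
    INSERT INTO product VALUES ('Hammer', 'tools', 12.5), ('Saw', 'tools', 20.0),
                               ('Tulip', 'garden', 3.0);
""")

def handle_request(url):
    """Read the 'category' query parameter, query the database, return an HTML list."""
    params = parse_qs(urlparse(url).query)
    category = params.get("category", [""])[0]
    rows = conn.execute(
        "SELECT name, price FROM product WHERE category = ? ORDER BY name",
        (category,),   # parameterized, so user input is never pasted into the SQL text
    ).fetchall()
    items = "".join(f"<li>{html.escape(name)}: {price:.2f}</li>" for name, price in rows)
    return f"<ul>{items}</ul>"

page = handle_request("http://example.com/products?category=tools")
```

The browser only ever sees HTML and the database only ever sees SQL; the middle layer does the conversion in both directions.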

Advantages:
o
Web browser software is much easier to use than proprietary query tools
o
Few or no changes to the internal database
o
Costs less to add a Web interface in front of a legacy system than to redesign and rebuild
the system to improve user access

MySpace is a massive database of users’ personal information -> entirely new business
6.4
Managing Data Resources
Establishing an Information Policy

Information policy: the organization’s rules for sharing, disseminating, acquiring, standardizing,
classifying and inventorying information.
o

Specific procedures and accountabilities that identify:

Which users and organizational units can SHARE information

Where information can be DISTRIBUTED

Who is responsible for UPDATING & MAINTAINING the information
Data administration: specific policies and procedures through which data can be managed as an
organizational resource
o
Developing information policy

o
Planning for data
o
Overseeing logical database design
o
Data dictionary development
o
Monitoring how information systems specialists and end-user groups use data
Data governance: the policies and processes for managing the availability, usability, integrity and
security of the data employed in an enterprise, with special emphasis on promoting privacy,
security, data quality and compliance with government regulations.

Database administration: the more technical and operational aspects of managing data, including
physical database design and maintenance, access rules and security procedures.
Ensuring Data Quality

Inaccurate, untimely or inconsistent data lead to incorrect decisions, product recalls and financial
losses

If a database is properly designed and enterprise-wide data standards established, duplicate or
inconsistent data elements should be minimal

Many errors result from data input, e.g. misspellings or incorrect codes

Data quality audit: structured survey of the accuracy and level of completeness of the data in an
information system
o
Can be performed by surveying entire data files, samples from data files, or end users for
their perceptions of data quality
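One part of a data quality audit, the level of completeness, can be sketched as the share of records with a value in each field. The customer records and the function completeness are assumptions for illustration.

```python
# Hypothetical customer records; empty strings and None count as missing.
records = [
    {"customer_id": 1, "email": "a@example.com", "postcode": "A1A 1A1"},
    {"customer_id": 2, "email": "", "postcode": "B2B 2B2"},
    {"customer_id": 3, "email": "c@example.com", "postcode": None},
    {"customer_id": 4, "email": "d@example.com", "postcode": "D4D 4D4"},
]

def completeness(rows, field):
    """Fraction of rows in which `field` holds a non-empty value."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)

report = {field: completeness(records, field) for field in ("customer_id", "email", "postcode")}
```

The same function could be run on a sample of a large file instead of the whole file.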

Data cleansing: also known as data scrubbing, activities for detecting and correcting data in a
database that are incorrect, incomplete, improperly formatted or redundant.
o
Enforces consistency among different sets of data that originated in separate information
systems
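A minimal sketch of data cleansing: fix formatting, map different coding systems onto one code, and drop the duplicates that remain. The records, the COUNTRY_CODES mapping and the function clean are hypothetical.

```python
# Records that originated in separate systems (made-up data).
raw = [
    {"name": "  ada LOVELACE ", "country": "UK"},
    {"name": "Ada Lovelace", "country": "United Kingdom"},
    {"name": "ben smith", "country": "CA"},
]

# Assumed mapping from each system's coding to one standard code.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB", "ca": "CA", "canada": "CA"}

def clean(row):
    """Normalize spacing and capitalization, and standardize the country code."""
    name = " ".join(row["name"].split()).title()
    country = row["country"].strip()
    return {"name": name, "country": COUNTRY_CODES.get(country.lower(), country)}

cleaned = []
seen = set()
for row in map(clean, raw):
    key = (row["name"], row["country"])
    if key not in seen:          # remove records that are now redundant
        seen.add(key)
        cleaned.append(row)
```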