Chapter 12 Lecture

advertisement

Chapter 12

File Processing and Data

Management Concepts

Presentation Outline

I.

Terminology

II.

Database Technology

III. The Architecture of a Database

Management System (DBMS)

IV. The Database Administrator

I. Terminology

A. Field

B. Data Occurrences

C. Fixed vs. Variable Length Records

D. Record Key

E. Sort Keys

A. Field

A field is the smallest block of data that will be stored and retrieved in the information system.

Other names for field include data item, attribute, or element.

Field 1 Field 2

B. Data Occurrences

A specific set of data values for a record in a file.

3

4

1

2

5

The above table contains 5 occurrences of account records for the general ledger account file.

C. Fixed vs. Variable Length Records

Fixed Length Records

Both the number of fields and the length of each field are fixed.

Strength: Easier to manipulate records.

Weakness: Must accommodate maximum sizes.

Variable Length Records

Both the number of fields and the length of each field are variable. (See Fig. 15-1 on p. 603)

Strength: Less waste of memory when maximum sizes do not have to be accomodated.

Weakness: Record manipulation is more difficult.

D. Record Key

1110

1500

2105

2110

A record key is a field or combination of fields that uniquely identifies a particular record in a file.

E. Sort Keys

Primary sort key – The first field used to sort the data occurrences in a record set.

Secondary sort key – A field used to determine relative position among a set of data occurrences in a record set.

Tertiary sort key – Additional fields beyond primary and secondary sort keys that are required to uniquely identify data occurrences in a record set.

Last Name First Name

Adams Tom

Jones

Jones

Jones

Alisa

Julie

Julie

Young Sam

Age

25

36

19

21

22

II. Database Technology

A. The Problem of Redundancy

B. The Components of a Database

A. The Problem of Redundancy

That is not what we show per our records.

Redundancy occurs when different areas of an organization use the information system to store the same information in more than one place.

Results in update anomaly.

B. The Components of a Database

Management System

1. Data Description Language (DDL)

2. Data Manipulation Language (DML)

3. Data Query Language (DQL)

1. Data Description Language (DDL)

Defines the logical structure of the database (known as the schema). Defines the following:

Name of data fields.

Type of data (numeric, alphabetic, etc.)

Number of positions

(length of field).

May also define subschema (i.e., individual user views)

2. Data Manipulation Language (DML)

Pull a trial balance.

The DML consists of the commands for updating, editing, manipulating, and extracting data.

Structured query language

(SQL) is a common DML in relational settings.

Structure Query Language (SQL)

3. Data Query Language

Query by

Example (QBE)

A data query language is a user friendly language or interface that allows the user to request information by simply filling in blanks. Represents a special type of

DML.

III. The Architecture of a Database

Management System (DBMS)

A. The Database Architecture

B. The Conceptual Architecture and Entity-

Relationship (ER) Diagrams

C. Logical Data Structures

D. The Physical Structure

A. The Database Architecture

Conceptual

Level

Logical Level

Physical Level

Database contents

Uses of database

Desired reports

Information to be viewed

Logical data structures:

Tree

Network

Relational

Access Methods:

Sequential Access

Indexed Files

B. The Conceptual Architecture and Entity-

Relationship (ER) Diagrams

PART_NO NAME

PART

STORED AT

COST

Square boxes are used for entities

(separate tables).

Ellipses are used for attributes (table columns).

Diamond shaped boxes depict relationships.

LOCATION

WHSE ADDRESS

C. Logical Data Structures

1. Tree or Hierarchical Structure

2. Network Structures

3. Relational Structure a. Selection b. Projection c. Join

1. Tree or Hierarchical Structure

A parent record can have many children. However a child record can have only one parent.

Can only model 1:1 (one-to-one) and 1:* (one-to-many) relationships.

Commonly used with accounting data. Can only access data by going from a parent to child.

Assets

Current Assets

Long-term Assets

Balance Sheet

Liabilities

Current Liabilities

Long-term Liabilities

Equity

Revenues

Expenses

2. Network Structure

Eliminates the distinction of parent and child records. A parent can have many children and a child can have many parents.

Can model 1:1 (one-to-one), 1:* (one-to-many), and *:* (many-to-many) relationships.

Must know the physical structure of the data in order to access it.

3. Relational Structure

Relational databases organize and store data in two dimensional tables consisting of rows and columns.

Relationships among tables are represented by common data values in different tables.

Straight forward in terms of organizing and searching the data.

Possesses ad hoc search capabilities.

3a. Selection

Produces a horizontal subset (includes entire row) of a relation which satisfies a boolean predicate.

Savings Table

Name

John

Bill

Mary

Joe

Acct #

123

205

707

127

Balance

35.75

3.95

7.95

4.05

(Savings)

Balance < 5.00

Bill

Joe

Name Acct #

205

127

Balance

3.95

4.05

3b. Projection

Constructs a vertical subset of a relation. The subset is obtained by selecting specified attributes and removing others.

Savings Table

Name

John

Bill

Mary

Joe

Acct #

123

205

707

127

Balance

35.75

3.95

7.95

4.05

(Savings)

Name Balance < 5.00

Name

Bill

Joe

3c. Join

A join is used to combine 2 tables. The attribute used to join must be in both tables.

Table R

A B C

C

Table S

D E a1 b1 c1 c2 d1 e1 a2 b2 c2 c3 d2 e3 a2 b2 c3 c2 d1 e2 a4 b2 c2

R |X| S

A a2 a2 a2 a4 a4

B b2 b2 b2 b2 b2

C c2 c2 c3 c2 c2

D d1 d1 d2 d1 d1

E e1 e2 e3 e1 e2

D. The Physical Structure

1. Sequential Access

2. Indexed Files

1. Sequential Access

 Records can only be accessed in a predefined sequence. For example, if there are 100 records in a file, one must access the first 99 records before accessing the last record.

Generally useful for batch processing when nearly all records must be accessed.

2. Indexed Files

Any attribute can be extracted from the records in a primary file and used to build a new file whose purpose is to provide an index to the original file.

First, the index is searched to find a specified value of an attribute such as an customer account number.

Second, the disk addresses are used to directly retrieve the desired records

See Fig. 12-13 on p. 427.

IV. The Database Administrator

This is not quite what we need.

The database administrator is a person who coordinates data management activities such as approving the physical contents and user views of the database.

Summary

Fields and keys

Three Components of a DBMS

Three Types of Database Architecture

The Database Administrator

Download