Introduction to Databases

advertisement
Introduction to
Databases
“Simplicity is the ultimate sophistication.”
Leonardo da Vinci
Class Outline
 What is data and why is it important?
 What is a database and database schema?
 What is a database management system?
 What is a database application?
 What are the levels of database representation?
 What were the limitations of the systems that led to the
development of the current relational database systems?
 What are various types of database systems?
 What do databases have to do with
HEALTH SCIENCE?
Databases versus “The Rest”
Word
processing
Spreadsheet
Database
Text handling
excellent
fair
poor
Mathematical
functions
poor
excellent
very good
excellent
good
fair
Training Cost
low
moderate
high
Software Cost
low
moderate
high
Volume of data
low
moderate
very high
Multiuser Access
low
moderate
very high
Ease of Use
Principles of Information Resource Management






Organizational resources flow into and out of the organization
Two types of major organizational resources: Physical resources,
Conceptual resources (data & information)
As scale of organization grows, it becomes increasingly difficult
to manage by observation (i.e., reliance on conceptual resources)
Conceptual resources can be managed just like physical
resources or assets (e.g., employees, $$, equipment, etc.)
Management of data & information means getting it before it’s
needed, protecting it, assuring quality, and getting rid of it when
no longer required
Management of data & information can be achieved only through
organizational commitment
Adapted from McFadden, F.R. & Hoffer, J.A. (1994). Modern Database
Management. Redwood City, CA:Benjamin/Cummings Publishing (p. 6)
processing
Information is a major organizational resource
Action
Knowledge
Information
Expand experimental funding
for rare mutation of CFTS
Rare mutation has significantly
increased over the last year
(2003) – 0.5% with rare mutation
(2004) – 5% with rare mutation
(organized data)
Data
(isolated facts)
Dr. John conducted 100 CFTS (2003)
experiments of 100 patients each of
which 50 had a rare mutation
Dr. Jill conducted 100 CFTS (2004)
experiments of 100 patients of which
500 had a rare mutation
What is a Database?
Organized collection of related information or data
stored on a computer disk for easy, efficient use
OrderNum
201
209
214
221
235
239
InvoiceAmt
854.00
1,106.00
1,070.50
1,607.00
1,004.50
1,426.50
CustomerName
Cottage Grill
Cleo's Dow ntow n Restaurant
Jean's Country Restaurant
Maxw ell's Restaurant
Embers Restaurant
The Empire
Ow nerName
Ms. Doris Reaume
Ms. Joan Hoffman
Ms. Jean Brooks
Ms. Barbara Feldon
Mr. Clifford Merritt
Ms. Curtis Haiar
Phone
(616) 643-8821
(616) 888-2046
(517) 620-4431
(219) 333-0000
(219) 816-2456
(616) 762-9144
Outstanding Invoice Amounts By Order
201
data
209
214
221
235
239
information
In the beginning…(in the 1950s)
…There were no databases. Just file (or data processing) systems.

Name:
Address:
City:
Phone:
Date:
Time:
Patient:
OHIP:
Jane Doe
123 Easy St.
London
455-0897
Sept 14, 1955
2:00 p.m.
Jane Doe, 455-0897
123456789
File systems were typically
organized by function (use)
 The first data management
systems performed clerical
tasks (transactional processing)
such as order entry processing,
payroll, work scheduling.
 e.g., files for patients (file
folder analogy); each record for
a single patient; another file for
appointment/ billing
information
Limitations of Data File Systems
Customer
processing
Application
Customer
file
Order
processing
Application
Order
file
 Worked adequately if data collection needs were
relatively small.
 Problems arose as data files, information needs, and
reporting requirements grow in complexity due to:

Extensive programming - use of third-generation languages
(e.g., COBOL, FORTRAN) in which the programmer must
specify what is be done as well as how it is to be done
Limitations of Data File Systems

Poor mechanisms for sharing data across organization files are often incompatible with one another (separate,
isolated data)

Data redundancy - duplicate information in two or more
files

Program/ data dependence - if the file structure changed,
ALL programs using the file had to be modified - timeconsuming

Lack of flexibility - could not do ad hoc queries or reports;
required separate programs for every report or query

Poor security - difficult to program, therefore, often omitted

Difficulty of representing data in the users’ perspective
Historical Roots of Database Systems
Customer
processing
Application
Order
processing
Application
DBMS
Database
Employee
processing
Application



Developed to overcome limitations of file systems, developed initially on
mainframe computers in late 60s and early 70s - a typical early DBMS
cost $100,000 (many are still in use)
First general databases were created for General Electric Company
(GEC) - Integrated Data Store (IDS), designed to run on GEC machines;
B.F. Goodrich ported IDS to IBM 360 - became dominant until 1980s
As PCs gained popularity (1980s), single-user, personal databases
developed; at present, most database technology is used in workgroups
What is a Database Management
System (DBMS)?
“A set of programs used to define,administer, and process
the database and its applications conveniently and
efficiently”
Program (or collection of programs) that enables users to create the
database. The DBMS manages the storage and retrieval of data, and
provides the user with certain functionalities to guarantee that the
data will be logically organized and consistently applied.
Database
DBMS
(e.g., Oracle, dBase,
Access, Paradox)
Database
Application
User(s)
Features of a DBMS
DBMS
Database
• user data
• metadata
• indexes
• application
metadata
Design Tools Subsystem
D • Table Creation Tool
B • Form Creation Tool
M • Query Creation Tool
S • Report Creation Tool
• Procedural Language
Compiler
E
n
g
i
n
e
Run Time Subsystem
• Form Processor
• Query Processor
• Report Writer
• Procedural Language
RunTime
developer
Application
program
users
Application
program
What is a Database Application?
Database
DBMS
Database application
• A computer program that
performs a specific task of
practical value in a business
situation
• An interface that allows the
user to enter and manipulate
data; User can request abstract
views of data
• Created by database designers
and developers using a DBMS
program or a programming
language
Major Components of a Database Application
1. Form- data entry
2. Report- summarizes & prints
3. Query- asks questions of data
4. Menu - organizes components
5. Program
- used to automate a database
Evolution of Database Models
Hierarchical
Network
Relational
still in use in many older (1970s) legacy
systems; very few new databases;
referred to “navigational systems”
the vast majority currently use this,
therefore, our course’s focus is here
Semantic
ObjectRelational
ObjectOriented
Very few new databases are
being created using ObjectOriented Programming (not
many ODBMS for businesses to
implement this model)
The Relational Database Model
Genetics Lab
Gel Images
Results
Experiments
Patients



Gene
represented by tables (like spreadsheets)
tables are NOT linked with physical pointers
unlike earlier systems, all three types of relationships can be
represented
 accommodates the design of larger databases that involve
complex relationships and intricate manipulations
Evaluation of the Relational database model
Advantages







But #1 problem still is
mechanisms for minimizing data redundancy and inconsistency
logical database design is separated from physical aspects
relatively program-data independent
management of data for access, manipulation, and security
flexible mechanisms for generating reports and queries
program development and maintenance costs are reduced
data can be accessed in a multiplicity of ways within and amongst
organizations
Disadvantages

ease of use - many untrained people create and use databases
without considering its design - usually incorporate many errors
Comparison of Database models
File Systems
• data dependence
• structural dependence
• demands upon programmer
Hierarchical, Network DBMS
• data independence
• structural dependence
• demands upon programmer
Relational DBMS
• data independence
• structural independence
• demands upon computer
Types of Database Systems
Centralized (single site)
Distributed
• microcomputer (desktop)
• legacy mainframe/ mini computer
(1 CPU)
• client/server architecture (>1 CPU)
Type
Example


>1 site, requires network
not widely adapted yet due
to many problems
# of
concurrent
users
Typical size of
database
1
< 10 Megabytes
< 25
< 100 Megabytes
Personal
Joe's House
Painting Service
Workgroup
Video rental store
Organizational
(enterprise)
Larger
Corporations or
Government
hundreds
> 1 Trillion bytes
Multimedia
(Internet
technology)
Holiday resort
bookings (with
photos)
possibly
hundreds
Any
our focus;
centralized,
microcomputer
database
Three levels of Database Representation
data elements
& their
relationships
physical
implementation
- access
methods, index
construction,
data structure;
database exists
in reality only
here
Conceptual level
Internal level
database
design,
logical,
abstract
description of
each user
group will
have its own
view of the
database;
database is
accessed from
here
External level
Primary focus of the lectures of this course is the conceptual level because
the creation of a database begins with its design; the focus of the laboratories
is the internal and external level.
The Database System Environment
 Hardware - physical devices
 computer, peripherals, network devices
 Software
 DBMS (manages the database)
 operating systems software (manages hardware & software)
 application programs (user access and manipulate database)
 People
 system administrators (manage general operations)
 database designers (architects of database structure)
 database administrators (ensure the database is functioning)
 systems analysts & programmers (design & implement database)
 end users (use application programs)
 Procedures - rules of the company governing use of data
 Data
Focus of this course
Lectures
 Conceptual design of
databases: determining
their purpose, developing
a model, identifying the
tables that are required,
designing normalized
tables and identifying
their relationship to one
another.
Laboratories
 Implement a database at
the internal and external
level: create databases
(tables) and database
applications (queries,
forms, reports,
programs) using a
typical microcomputer
relational database
management system,
MS Access 2003.
Better Definition of a Database

A collection of users’ data, organized logically and managed
by a unifying set of principles, procedures, and functionalities,
which help guarantee the consistent application and
interpretation of that data
(a) organized collection of related information or data
stored on a computer disk for easy, efficient use; represented in
tabular format
OrderNum
201
209
214
221
235
239
InvoiceAmt
854.00
1,106.00
1,070.50
1,607.00
1,004.50
1,426.50
CustomerName
Cottage Grill
Cleo's Dow ntow n Restaurant
Jean's Country Restaurant
Maxw ell's Restaurant
Embers Restaurant
The Empire
Ow nerName
Ms. Doris Reaume
Ms. Joan Hoffman
Ms. Jean Brooks
Ms. Barbara Feldon
Mr. Clifford Merritt
Ms. Curtis Haiar
Phone
(616) 643-8821
(616) 888-2046
(517) 620-4431
(219) 333-0000
(219) 816-2456
(616) 762-9144
Better Definition of a Database (cont'd)
(b) A database is
self-describing
(metadata or system
catalogues or data
dictionary)
 A database contains
a description of its
own structure (e.g.,
the names of all the
tables, the names
and types of data in
each column in all
the tables)
Kroenke, D.M., Database Processing: Fundamentals, Design & Implementation, Prentice Hall, 1998
Better Definition of a Database (cont'd)
(c) Indexes are stored with the database

Data accessed from a source table for sorting and searching is
time-consuming without a “pointer” system, which improves
performance and accessibility of the database
 The “overhead cost” of indexing is that each time data is updated,
all indexes must also be updated, therefore, reserve index for
cases in which they are needed
Salesperson
Employee ID
Name
Office
27
Rodney Jones Toronto
44
Goro Azuma Tokyo
35
Francine Moire Brussels
37
Anne Abel
Tokyo
Office Index
Office
Toronto
Tokyo
Brussels
Employee ID
27
44, 37
35
(d) Application Metadata - stores structure and format of
application components; not all DBMS support this feature
Table
Users view their data in two-dimensional tables.
table =
file
=
relation
Field
The fields within records contain data.
Data within a field must be of the same data type. Each field within
a table must have a unique name. Order of fields is unimportant.
column
=
field
=
attribute
Record
 A record is a group of related fields of information about
a single instance of one object or event in a database.
 Tables consist of zero, one, or more records.
 Order of rows is unimportant.
row
=
record
=
tuple
Database Schema
Database schema defines database’s structure,
tables, relationships, domains, and constraint rules
 Tables



EXPERIMENT (ExprId, GeneID, ExprName, DoctorID)
GENE (GeneID, GeneName, GeneDescription)
DOCTOR (DoctorID, DocFName, DocLName)
 Relationships


Each experiment involves one and only one gene
Each doctor is in charge of one or more experiments
 Domains (set of values in a column)

Physical description (e.g., set of integers 0 < x < 99999)
 Constraints (business rules)

Gene Name cannot be left blank
Download