Lecture 1

DBMS & More
Atif Farid Mohammad
Adjunct Professor
UNCC
Introductions
How many of you know?
PHP or ColdFusion or JavaScript
How many of you are?
Grads or Undergrads
What are your majors?
CS or SIS or Something Else
DBMS & More…
Textbook:
• Fundamentals of Database Systems, Elmasri/Navathe,
Benjamin, 2011 (Sixth Edition)
Workload:
• (30%) 3 in-class quizzes
• (30%) Midterm exam
• (40%) One project, parts 1 & 2
Any Questions ?
Objectives
• Introduction
• Example of database systems
• Database characteristics
• People associated with database
• Advantage of Using a DBMS
• Database implications
• When Not to use a DBMS
Introduction
• Databases:
– Used to maintain information and to present
data to users
• Examples of Databases include:
– Reservations systems (Hotel, Car, Airline)
– Transactions processing systems (Online
Banking systems, local library)
– Investigations systems(Scientific Database
systems)
– Multimedia database systems
– Geographic information systems (GIS)
Core Database Terminologies: 1
• Data?
– Any information (most likely in electronic form)
worth preserving
• E.g., names, addresses, grades, etc.
• Database?
– A collection of related data describing the
activities of one or more organizations
◦ Organized (or structured) for access and modification
◦ Preserved over a long period
– E.g. University Database
Core Terminologies: 3
• Query?
– An operation that extracts specified data from the
database
– E.g., get the list of all courses and grades taken by “Smith”
• Relation?
– An organization of data into a two-dimensional table
◦ Rows (tuples) represent basic entities or facts of some sort
◦ Columns (attributes or fields) represent properties of those entities
• Schema?
– A description of the structure of the data in a database
– Also known as metadata
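The three terms above (query, relation, schema) can be seen side by side in a minimal sketch using Python's built-in sqlite3 module. The table name enrollment, its columns, and the sample rows are assumptions made up for illustration; only the "Smith" question comes from the slide.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Schema: the description of the relation's structure (metadata).
conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT, grade TEXT)")

# Relation: each row (tuple) is one fact; each column is an attribute.
rows = [
    ("Smith", "CS101", "A"),
    ("Smith", "MATH201", "B"),
    ("Brown", "CS101", "C"),
]
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", rows)

# Query: extract the courses and grades taken by "Smith".
smith = conn.execute(
    "SELECT course, grade FROM enrollment WHERE student = ? ORDER BY course",
    ("Smith",),
).fetchall()
print(smith)
```

The query states which rows are wanted, not how to find them; locating the rows is left to the database engine.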
Database Def-1
A database is a shared collection of
logically related data that is stored to
meet the requirements of different
users of an organization
Database Def-2
A database is a self-describing collection of
integrated records
Database Def-3
A database models a particular real
world system in the computer in the form
of data
The concept of a shared organizational database
[Figure: a central Corporate Database shared across the organization. Labels: Management, Planning, Marketing, Control, Sales, Product Development, Accounting, Accounts Receivable, Accounts Payable, Manufacturing, Scheduling, Production.]
A bit of History
• Computers were initially used for computational/
engineering purposes
• Commercial applications introduced the File Processing
System
File Processing System
A collection of programs that perform services
for the end-users such as production of reports
File Processing Systems
[Figure: three separate applications (Examination, Registration, Library), each with its own data files (Examination data files, Registration data files, Library data files).]
Program and Data Interdependence
File Processing Systems
Library: Reg_Number, Name, Father Name, Books Issued, Fine
Exam: Reg_Number, Name, Address, Class, Semester, Grade
Registration: Reg_Number, Name, Father Name, Phone, Address, Class
• Duplication of Data
• Incompatible Formats
• Vulnerable to Inconsistency
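The inconsistency risk listed above can be illustrated with a small sketch. The two dictionaries stand in for the separate Exam and Registration data files; the student record and both addresses are hypothetical values chosen for the example.

```python
# Two applications each keep their own copy of the same student's address.
exam_file = {"s001": {"name": "Amir", "address": "12 Oak St"}}
registration_file = {"s001": {"name": "Amir", "address": "12 Oak St"}}

# The student moves, but only the registration application is updated.
registration_file["s001"]["address"] = "99 Elm Ave"

# Collect the distinct addresses now on record for student s001.
addresses = sorted(
    {exam_file["s001"]["address"], registration_file["s001"]["address"]}
)
inconsistent = len(addresses) > 1
print(addresses, inconsistent)
```

Nothing in either file signals which address is the current one, which is the problem that a single shared copy of the data avoids.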
Database Approach
Advantages of Database Approach
[Figure: the Library, Examination, and Registration applications all access a single University Students Database through the Database Management System.]
- Data Sharing
- Controlled Redundancy
- Data Independence
- Better Data Integrity
Data & Information
Company: Super Soft
Dept: Sales
Emp Name         Age   Salary
Matt Damon       23    55
Bruce Willis     24    55
Katie Holmes     20    40
Robert Langdon   19    20
• Schema
Database Applications
Database Management System
(DBMS)
Other Advantages
Data consistency
Better data security
Faster development of new
applications
They also provide
Economy of scale
Better concurrency control
Better backup and recovery procedures
Disadvantages
Higher costs
Conversion cost
More difficult recovery
Typical Components
[Figure: components of a database environment.
Software: Application Programs, DBMS
Data: Database
Users: End users (interact), Application Programmers (develop), Database Administrators (maintain), Database Designers (design)
Other labels: “What” to get, “How” to get]
Levels of Data
• Real-world data: entity, attribute
– E.g. a student, a class name
• Metadata: record type, data item type
– E.g. Student record type
• Data occurrences: student record, data item occurrence
– E.g. ‘s001’, ‘Amir’, ‘CS101’
Levels of Data (example)
• Real-world data: Employee (name, age, qual, sal)
• Metadata: Emp (Name text, Age number, Sal number)
• Data occurrence:
Name          Age   Sal
John Durso    23    55
Lisa Smith    24    55
Cindy Bates   20    40
Braden Sams   19    20
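The split between metadata and data occurrences can be inspected directly in a running database. The following is a sketch under assumptions: it uses Python's sqlite3 module, a table named emp modeled on the example above, and SQLite type names (TEXT, INTEGER) in place of the slide's "text" and "number".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, age INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("John Durso", 23, 55), ("Lisa Smith", 24, 55)],
)

# Metadata: the database describes its own structure.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
metadata = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(emp)")]

# Data occurrences: the actual stored rows.
occurrences = conn.execute("SELECT name, age, sal FROM emp ORDER BY age").fetchall()

print(metadata)
print(occurrences)
```

The metadata stays the same no matter how many rows are stored; only the data occurrences change as employees are added or removed.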
Database Users
Application Programmers
End Users
– Naïve
– Sophisticated
Roles in the Database
Environment
• Data Administrator (DA)
• Database Administrator (DBA)
• Database Designers (Logical and
Physical)
• Application Programmers
• End Users (naive and sophisticated)
Functions of DBA
Schema definition
Granting data access
Routine Maintenance
– Backups
– Monitoring disk space
– Monitoring jobs running
Database Properties
• A database must
– Represent some aspect of the real world
– Be a logically coherent and meaningful
collection of data
– Be designed, built, and populated with
data for a specific purpose
Database Construction
• A database can be constructed
– Manually
• E.g., a library card catalog
– As a computerized system
• A specific set of applications
• Database management systems
Database Mgt system (DBMS)
• Database management system (DBMS)?
– A collection of programs that enables users
• To define (specifying data types, etc.)
• To construct (storing)
• To manipulate (reading, writing)
• To share (simultaneous access)
• To protect (security & privacy protection)
• To modify (changing requirements)
• Database system = DBMS + Database
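Four of the functions listed above (define, construct, modify, manipulate) can be walked through in a short sketch with Python's sqlite3 module. The student table and its values are hypothetical. Sharing and protection are not shown, since SQLite is an embedded engine without user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define: declare the structure and data types.
conn.execute("CREATE TABLE student (reg_number TEXT PRIMARY KEY, name TEXT)")

# Construct: store the initial data.
conn.execute("INSERT INTO student VALUES ('s001', 'Amir')")

# Modify: the requirements change, so a column is added to the definition.
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")

# Manipulate: write and then read the data.
conn.execute("UPDATE student SET phone = '555-0100' WHERE reg_number = 's001'")
record = conn.execute("SELECT reg_number, name, phone FROM student").fetchone()
print(record)
```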
Example: University Database
System
• A typical university database system
maintaining information regarding
– Students
– Courses
– Grades
Example: A University Database
Database Engineering
• Design of any DB application proceeds through
– Requirements definition and analysis
– Conceptual design
• Performed using an Entity-Relationship diagram
– Logical design
• Performed using Oracle, MySQL, etc.
– Physical design (storing/accessing/indexing)
Main Characteristics of DB
• Self-describing nature of a database system
(catalog = data + metadata)
• Insulation between programs and data
(program-data independence)
• Support of multiple views of the data
(virtual data)
• Sharing of data and multi-user transaction
processing (concurrency control)
– Transaction and atomicity
• ACID
People involved in Database
• Two types of people are involved
– People on the scene
– People behind the scene
People on the Scene
– Database administrators (DBA)
• Authorization, coordination, supervision of DB
– Database designers
• Defining, building, maintaining, etc
– End Users
• Casual, naive, sophisticated
– Software engineers
• System analysts and application programmers
People behind the Scene
• DBMS system designers and implementers
• Tool developers
• Operators and maintenance personnel
File System vs. Database Mgt
Systems
• Suppose an organization needs to manage a
large collection of data, say, 500 GB (i.e.,
500 × 1024 MB)
• Data must be accessed
concurrently by employees
• Changes made to the data must be applied
consistently
• Access to the data must be restricted
File Systems vs. a DBMS (cont'd)
• Using file systems, the data is stored in OS files
– Not enough main memory (MM) to hold all the data
– Difficult to access data directly (with 32-bit
addressing, only 4 GB can be addressed)
– Need to write a special program to answer each
question
– Duplicated effort
– Inconsistency
– Concurrent accesses
– Security policies
– …
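The "special program for each question" point can be made concrete with a small contrast. This is a sketch under assumptions: the CSV text stands in for a plain OS file, and the names courses_of, count_in_course, and grades are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical grade data as it might sit in a plain OS file.
raw = "student,course,grade\nSmith,CS101,A\nBrown,CS101,C\nSmith,MATH201,B\n"

# File-system approach: a hand-written loop for this one question.
def courses_of(student_name):
    reader = csv.DictReader(io.StringIO(raw))
    return [row["course"] for row in reader if row["student"] == student_name]

# A different question needs a different hand-written loop.
def count_in_course(course):
    reader = csv.DictReader(io.StringIO(raw))
    return sum(1 for row in reader if row["course"] == course)

# DBMS approach: load once, then each new question is just another query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student TEXT, course TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO grades VALUES (:student, :course, :grade)",
    csv.DictReader(io.StringIO(raw)),
)
sql_count = conn.execute(
    "SELECT COUNT(*) FROM grades WHERE course = ?", ("CS101",)
).fetchone()[0]

print(courses_of("Smith"), count_in_course("CS101"), sql_count)
```

Both approaches give the same answers here. The difference is that every new question on the file side means new parsing code, while on the database side it means a new query string.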
Primary Advantages of Using a
DBMS: 1
• Controlling Redundancy
– Duplicated space and effort
– Inconsistency
• Restricting Unauthorized Access
– Security and authorization subsystem
• Providing persistent storage for program objects and data
structures
– Impedance mismatch problem (incompatibility between
the programming language and the DBMS)
• Deriving new information from existing data (views)
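Deriving new information through a view can be sketched as follows, using Python's sqlite3 module. The emp table reuses the sample employee rows from earlier in the lecture; the view name sal_summary and its columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, age INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("John Durso", 23, 55), ("Lisa Smith", 24, 55), ("Cindy Bates", 20, 40)],
)

# The view stores no data of its own; it is recomputed from emp on each use.
conn.execute(
    "CREATE VIEW sal_summary AS "
    "SELECT COUNT(*) AS headcount, AVG(sal) AS avg_sal FROM emp"
)
headcount, avg_sal = conn.execute(
    "SELECT headcount, avg_sal FROM sal_summary"
).fetchone()
print(headcount, avg_sal)
```

Because the view is derived rather than stored, it cannot drift out of step with the base table, which ties back to the redundancy point at the top of the list.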
Primary Advantages of Using a
DBMS: 2
• Providing Multiple User Interfaces
• Representing Complex Relationships among Data
• Enforcing Integrity Constraints
• Providing Backup and Crash Recovery
• Scheduling Concurrent Accesses to the Data
• Reducing Application Development Time
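Of the advantages above, integrity constraint enforcement is easy to demonstrate. The sketch below uses Python's sqlite3 module; the student and grade tables, the 0 to 100 mark range, and the sample rows are all assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE student (reg_number TEXT PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE grade ("
    " reg_number TEXT NOT NULL REFERENCES student(reg_number),"
    " course TEXT NOT NULL,"
    " mark INTEGER CHECK (mark BETWEEN 0 AND 100))"
)
conn.execute("INSERT INTO student VALUES ('s001', 'Amir')")
conn.execute("INSERT INTO grade VALUES ('s001', 'CS101', 85)")  # valid row

rejected = []
bad_rows = [
    ("s001", "CS101", 140),  # violates the CHECK constraint on mark
    ("s999", "CS101", 70),   # refers to a student that does not exist
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO grade VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

stored = conn.execute("SELECT COUNT(*) FROM grade").fetchone()[0]
print(len(rejected), stored)
```

The rules are declared once in the table definition, so every application that writes to the table is held to them without repeating the checks in its own code.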
Implication of the Database
Approach
• Standards can be enforced (data and display
formats)
• Application development time can be
reduced
• Flexibility and maintainability
• Availability of up-to-date information
• Economies of scale (i.e., reducing overall
costs of operations, resources, and management)
When Not to Use A DBMS
• When it would incur unnecessary overhead
resulting from:
– High initial up-front cost
– Generality that a DBMS provides for defining
and processing data
– Overhead for providing
• Security
• Concurrency
• Recovery
• Integrity
When to Use File System
• A file system is desirable under these
conditions:
– The database and application are simple, well-defined, and not expected to change
– Only single-user access to the data is required
– Performance requirements cannot tolerate DBMS overhead
So, what is a Database?
• Collection of data central to some enterprise
• Essential to operation of enterprise
– Contains the only record of enterprise activity
• An asset in its own right
– Historical data can guide enterprise strategy
– Of interest to other enterprises
• State of database mirrors state of enterprise
– Database is persistent
What is a Database Management
System?
• A Database Management System (DBMS)
is a program that manages a database:
– Supports a high-level access language (e.g.
SQL).
– Application describes database accesses using
that language.
– DBMS interprets statements of language to
perform requested database access.
What is a Transaction?
• When an event in the real world changes the
state of the enterprise, a transaction is
executed to cause the corresponding change
in the database state
– With an on-line database, the event causes the
transaction to be executed in real time
• A transaction is an application program
with special properties - discussed later - to
guarantee it maintains database correctness
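One of the special properties, atomicity, can be sketched ahead of the later discussion. The example assumes a hypothetical account table and a transfer helper, and uses Python's sqlite3 module; a funds transfer stands in for the real-world event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("A", "B", 30)       # succeeds
failed = transfer("A", "B", 500)  # would overdraw A, so both updates are undone
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to B is applied first and then rolled back when the debit from A is rejected, so the database never shows money that appeared from nowhere.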
What is a Transaction Processing
System?
• Transaction execution is controlled by a TP
monitor
– Creates the abstraction of a transaction,
analogous to the way an operating system
creates the abstraction of a process
– TP monitor and DBMS together guarantee the
special properties of transactions
• A Transaction Processing System consists
of TP monitor, databases, and transactions
Transaction Processing System
[Figure: transactions are submitted to a Transaction Processing Monitor, which works with one or more DBMSs, each managing its own database.]
System Requirements
• High Availability: on-line => must be
operational while enterprise is functioning
• High Reliability: correctly tracks state,
does not lose data, controlled concurrency
• High Throughput: many users => many
transactions/sec
• Low Response Time: on-line => users are
waiting
System Requirements (cont'd)
• Long Lifetime: complex systems are not
easily replaced
– Must be designed so they can be easily
extended as the needs of the enterprise change
• Security: sensitive information must be
carefully protected since system is
accessible to many users
– Authentication, authorization, encryption
Roles in Design, Implementation,
and Maintenance of a TPS
• System Analyst - specifies system using input from
customer; provides complete description of
functionality from customer’s and user’s point of
view
• Data Scientist - uses all available data sources
(internal and external) to analyze data and gain insights that
help decision makers.
• Database Designer - specifies structure of data that
will be stored in database.
Roles in Design, Implementation,
and Maintenance of a TPS (cont'd)
• Application Programmer - implements
application programs (transactions) that access data
and support enterprise rules
• Database Administrator - maintains database
once system is operational: space allocation,
performance optimization, database security
• System Administrator - maintains transaction
processing system: monitors interconnection of
HW and SW modules, deals with failures and
congestion
OLTP vs. OLAP
• On-line Transaction Processing (OLTP)
– Day-to-day handling of transactions that result
from enterprise operation
– Maintains correspondence between database
state and enterprise state
• On-line Analytic Processing (OLAP)
– Analysis of information in a database for the
purpose of making management decisions
OLAP
• Analyzes historical data (terabytes) using
complex queries
• Due to volume of data and complexity of
queries, OLAP often uses a data warehouse
• Data Warehouse - (offline) repository of
historical data generated from OLTP or
other sources
• Data Mining - use of warehouse data to
discover relationships that might influence
enterprise strategy
Examples - Supermarket
• OLTP
– Event is 3 cans of soup and 1 box of crackers
bought; update database to reflect that event
• OLAP
– Last winter in all stores in northeast, how many
customers bought soup and crackers together?
• Data Mining
– Are there any interesting combinations of foods
that customers frequently bought together?
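The three supermarket workloads can be put next to each other in a rough sketch. The purchase table, customer ids, and extra history rows are hypothetical, and the season and region filters from the OLAP question are left out to keep the example short.

```python
import sqlite3
from collections import Counter
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (customer TEXT, item TEXT, qty INTEGER)")

# OLTP: record one real-world event as it happens.
with conn:
    conn.executemany(
        "INSERT INTO purchase VALUES (?, ?, ?)",
        [("c1", "soup", 3), ("c1", "crackers", 1)],
    )

# More hypothetical history so the analytical steps have something to scan.
with conn:
    conn.executemany(
        "INSERT INTO purchase VALUES (?, ?, ?)",
        [("c2", "bread", 1), ("c2", "milk", 1),
         ("c3", "soup", 2), ("c3", "crackers", 2), ("c3", "bread", 1)],
    )

# OLAP: a specific question over the accumulated data.
both = conn.execute(
    "SELECT COUNT(*) FROM ("
    " SELECT customer FROM purchase WHERE item IN ('soup', 'crackers')"
    " GROUP BY customer HAVING COUNT(DISTINCT item) = 2)"
).fetchone()[0]

# Data mining: no specific question; count every pair of items bought together.
baskets = {}
for customer, item in conn.execute("SELECT customer, item FROM purchase"):
    baskets.setdefault(customer, set()).add(item)
pair_counts = Counter(
    pair for items in baskets.values() for pair in combinations(sorted(items), 2)
)
top_pair = pair_counts.most_common(1)[0][0]
print(both, top_pair)
```

The OLTP step touches a couple of rows, the OLAP step scans the history to answer one stated question, and the mining step explores all pairs without being told which combination to look for.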
Big Data Era
• Every 2 days we create as much information as
we did from the dawn of civilization up to 2003.
(Eric Schmidt, Google, 2010)
• Science: Astronomy, Physics, Bioinformatics,
Neuroinformatics, Earth Science, etc.
• Business: Automobile, Healthcare, Financial,
Infotainment
• The automotive industry is projected to be the
2nd largest generator of data by 2015
Big Data Opportunities to Auto
• Google self-driving car (generates 1GB/s).
• Recommendation/alert to customers.
• Manufacturers can better understand
customers and market trends.
• Better driving behaviors, better cars, fewer
accidents, and the bottom line is happier
customers.
Why is Database Important in the Future?
“In the future, … any given discipline
advances is likely to depend on…
database, workflow management,
visualization, and cloud computing
technologies.”
G. Bell, T. Hey, A. Szalay, “Beyond the
Data Deluge,” Science, Vol. 323, no.
5919, pp. 1297-1298, 2009.
Turing Awardees in DB
Charles Bachman
(1973)
Edgar F. Codd
(1981)
Jim Gray
(1998)
Charles Bachman
Developer of IDS: the first database system
Edgar F. Codd
Inventor of the Relational Model
Jim Gray
Founder of Transaction Processing
Questions ?