CS3431 – Database Systems I
Introduction
Instructor: Mohamed Eltabakh
meltabakh@cs.wpi.edu

Today’s Lecture

Overview of Database Management Systems

Course Logistics

What is a Database System?

Software platform for managing large amounts of data
Managing means:
  Storing, querying, indexing, and structuring the data
Different names refer to the same thing:
  Database systems
  Database management systems
  DBMS

What is a Database System? (Cont’d)

What’s inside a DBMS
  Collection of interrelated data (e.g., for a given application)
  Set of programs to secure and access the data
  An environment that is both convenient and efficient to use
Usually data is too large to fit in computer memory at once
  Data stored on disk
Usually many users want to access this data and do so fast

Databases touch all aspects of our lives. We use them without even knowing it!

Database Applications
Have you ever used a database application?
E-commerce: books, equipment, etc. at Amazon
Banks – your valuable $$ and ATM transactions
Airlines – manage flights to get you places
Universities – manage student enrollment
GIS (Maps) – find restaurants closest to WPI
Bio-informatics (genome data)

Data is everywhere. To manage it efficiently, we need a DBMS.

Why use DBMS, and not files?
Several drawbacks of using file systems

Data redundancy and inconsistency
  Multiple file formats, duplication of information in different files
  Multiple record formats within the same file
  No order enforced between fields
Difficulty in accessing data
  Need to write a new program to carry out each new task
Integrity problems
  Account balance >= 0
  Student cannot take same course twice
  ...

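The two integrity rules on this slide can be stated once, inside the database, instead of in every program that touches the files. The sketch below uses Python's built-in sqlite3 module (not the Oracle system used in the course). The table and column names (Account, Enrollment, accountNo, studentID, courseID) are assumptions made for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema carries the two rules."""
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        -- Rule 1: account balance >= 0
        CREATE TABLE Account (
            accountNo INTEGER PRIMARY KEY,
            balance   INTEGER NOT NULL CHECK (balance >= 0)
        );
        -- Rule 2: a student cannot take the same course twice
        CREATE TABLE Enrollment (
            studentID INTEGER NOT NULL,
            courseID  TEXT    NOT NULL,
            PRIMARY KEY (studentID, courseID)
        );
        INSERT INTO Account VALUES (1, 100);
        INSERT INTO Enrollment VALUES (42, 'CS3431');
    ''')
    return conn


def try_statement(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo_db()
    # Overdrawing the account violates the CHECK constraint.
    print(try_statement(db, "UPDATE Account SET balance = balance - 150 WHERE accountNo = 1"))
    # Enrolling the same student in the same course again violates the key.
    print(try_statement(db, "INSERT INTO Enrollment VALUES (42, 'CS3431')"))
```

With plain files, each of these checks would have to be re-implemented in every program that writes the data.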
Why use DBMS, and not files?
(Cont’d)

Concurrent access by multiple users
  Many users need to access/update the data at the same time (concurrent access)
Security problems
  Hard to provide user access to some, but not all, data
Recovery from crashes
  While updating the data the system crashes
Maintenance problems
  Hard to search for or update a field
  Hard to add new fields

DBMS Provides Solutions

Data consistency even with multiple users

Efficient access to the data

Data integrity embedded in the DBMS

Recovery from crashes, security
Basic Terminology

Data Model
  Tools used for describing the data
Data Schema
  Describes structures for a particular application, using the given model
Database
  Collection of actual data that conforms to a given schema
Database Management System (DBMS)
  Software platform that allows us to create, store, use, and maintain a database
SQL & Data Manipulation Language (DML)
  Language to manipulate, e.g., update or query, the data

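A small sketch may help keep these terms apart. It uses Python's built-in sqlite3 module as a stand-in DBMS. The Student table and its columns come from the query shown later in the lecture. The rows themselves are made up for illustration.

```python
import sqlite3


def terminology_demo():
    # DBMS: the software platform (here SQLite, reached through sqlite3).
    conn = sqlite3.connect(":memory:")

    # Data schema: the structure for this application, written in the
    # relational data model.
    conn.execute("CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT, address TEXT)")

    # Database: the actual data that conforms to the schema (hypothetical rows).
    conn.executemany(
        "INSERT INTO Student VALUES (?, ?, ?)",
        [(1, "Mike", "320FL"), (2, "Mary", "115AK")],
    )

    # DML: statements that update or query the data.
    conn.execute("UPDATE Student SET address = '320FL' WHERE ID = 2")
    return conn.execute(
        "SELECT ID, Name FROM Student WHERE address = '320FL' ORDER BY ID"
    ).fetchall()


if __name__ == "__main__":
    print(terminology_demo())
```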
Data Model

A collection of tools for describing
  Data objects
  Data relationships
  Data semantics
  Data constraints
Several data models:
  Relational model (we will learn this model)
  Entity-Relationship (ER) data model (we will learn this model)
  Object-based data models (Object-oriented)
  Semi-structured data model (XML)
Other older models:
  Network model
  Hierarchical model

Example: ER Model

Graphical model for describing
entities, attributes, and relationships
Data Schema


Captures the relationships between objects (“entities”) in an application
Schemas can be represented graphically or textually

Query Language (SQL)

Language for accessing and manipulating the
data organized by the appropriate data model

SQL: Structured Query Language
SELECT ID, Name
FROM Student
WHERE address = '320FL';

Query Language

Two classes of languages

  Procedural – user specifies what data is required and how to get those data
  Declarative (non-procedural) – user specifies what data is required without specifying how to get those data

DBMSs use SQL, a declarative language:
SELECT ID, Name
FROM Student
WHERE address = '320FL';

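To make the two classes concrete, the sketch below answers the same question both ways. The procedural version spells out how to find the rows (loop, test, collect). The declarative version only states what is wanted and leaves the "how" to the DBMS. Python's sqlite3 module stands in for the DBMS, and the sample rows are made up.

```python
import sqlite3

# Hypothetical sample data: (ID, Name, address)
STUDENTS = [(1, "Mike", "320FL"), (2, "Mary", "115AK"), (3, "Omar", "320FL")]


def procedural(students):
    """Procedural: we say what we want AND how to get it."""
    result = []
    for student_id, name, address in students:  # scan every record
        if address == "320FL":                  # test each one ourselves
            result.append((student_id, name))
    return result


def declarative(students):
    """Declarative: we only say what we want; the DBMS decides how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Student (ID INTEGER, Name TEXT, address TEXT)")
    conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", students)
    return conn.execute(
        "SELECT ID, Name FROM Student WHERE address = '320FL' ORDER BY ID"
    ).fetchall()


if __name__ == "__main__":
    print(procedural(STUDENTS))
    print(declarative(STUDENTS))
```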
A Big Picture of What You Will Learn

Data Model
  Relational Model
  Entity-Relationship (ER) Model
Data Schema
  How to put pieces together to build a schema describing the application
Database
  Build an actual database and manipulate data
Database Management System (DBMS)
  We will use Oracle
Query Language
  SQL Language

Relational Data Model: Overview

The most widely used model today

It is a tabular representation of the data

Main concepts:


  Relations (tables): a relation is basically a table with rows and columns.
  Every relation has a schema, which describes the columns (also called fields or attributes).

Example Database: Relational
Tabular View of Data in Airline System

Flight
flightNo  start  destination  miles
101       BOS    LAX          3000
102       PVD    LAX          2900

Passenger
pName  freqFlyerID  DoB   milesEarned
Mike   3433         1980  12000
Mary   5872         1981  11000

Travel
flightNo  freqFlyerID  date
101       3433         Jan 4
102       5872         Jan 5

Tabular view of data is called “Relational Model”

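As a rough sketch, the three airline tables can be created and queried with Python's built-in sqlite3 module. The table names, column names, and rows are the ones on the slide. The column types and the choice of keys are assumptions, since the slide does not state them.

```python
import sqlite3


def build_airline_db() -> sqlite3.Connection:
    """Create the Flight, Passenger, and Travel relations from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE Flight (
            flightNo INTEGER PRIMARY KEY, "start" TEXT, destination TEXT, miles INTEGER
        );
        CREATE TABLE Passenger (
            pName TEXT, freqFlyerID INTEGER PRIMARY KEY, DoB INTEGER, milesEarned INTEGER
        );
        CREATE TABLE Travel (
            flightNo    INTEGER REFERENCES Flight (flightNo),
            freqFlyerID INTEGER REFERENCES Passenger (freqFlyerID),
            "date"      TEXT
        );
        INSERT INTO Flight VALUES (101, 'BOS', 'LAX', 3000);
        INSERT INTO Flight VALUES (102, 'PVD', 'LAX', 2900);
        INSERT INTO Passenger VALUES ('Mike', 3433, 1980, 12000);
        INSERT INTO Passenger VALUES ('Mary', 5872, 1981, 11000);
        INSERT INTO Travel VALUES (101, 3433, 'Jan 4');
        INSERT INTO Travel VALUES (102, 5872, 'Jan 5');
    ''')
    return conn


def who_flies_where(conn: sqlite3.Connection):
    """Combine the three tables: which passenger flies to which destination, and when."""
    return conn.execute('''
        SELECT p.pName, f.destination, t."date"
        FROM Travel t
        JOIN Passenger p ON p.freqFlyerID = t.freqFlyerID
        JOIN Flight f    ON f.flightNo = t.flightNo
        ORDER BY t.flightNo
    ''').fetchall()


if __name__ == "__main__":
    print(who_flies_where(build_airline_db()))
```

The Travel table holds only identifiers. The join recovers the passenger name and destination from the other two relations.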
Entity-Relationship Model:
Overview

Models the application as a collection of entities and
relationships

Represented using an Entity-Relationship Diagram (ERD)

SQL: Overview


SQL: Non-procedural language to access the data inside a database
External programs, e.g., in C or Java, typically access the database using:
  Language extensions to allow embedded SQL
  ODBC: Open Database Connectivity
  JDBC: Java Database Connectivity

Logical vs. Physical
How is this information stored?

Levels of Abstraction: View of Data
An architecture for a database system
• View Level -- describes how users see the data
• Logical Level -- describes the logical structures used
  • Relational Model
  • ERD model
• Physical Level -- describes files and indexes (usually hidden from users)
Source: Database System Concepts - 5th Edition, May 23, 2005, ©Silberschatz, Korth and Su

Levels of Abstraction:
Airline Application Example

Logical (Conceptual) Level
  Flight, Passenger, Travel tables
Physical Level
  Flight table stored as a sorted file on the flight number
  Index on flightNo attribute for Flight relation
View Level (External Schema)
  NoOfPassengers (flightNo, date, numPassengers)
  Hide employees’ salaries

These levels of abstraction lead to “Data Independence”

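The three levels of the airline example can be sketched with Python's built-in sqlite3 module. The tables are the logical level, an index is one physical-level choice, and NoOfPassengers is a view over the Travel table. The column types, the index name idx_flight_no, and the view definition (a count per flight and date) are assumptions, since the slide gives only the view's column list.

```python
import sqlite3


def build_levels_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        -- Logical level: the tables
        CREATE TABLE Flight (flightNo INTEGER, "start" TEXT, destination TEXT, miles INTEGER);
        CREATE TABLE Travel (flightNo INTEGER, freqFlyerID INTEGER, "date" TEXT);
        INSERT INTO Flight VALUES (101, 'BOS', 'LAX', 3000);
        INSERT INTO Flight VALUES (102, 'PVD', 'LAX', 2900);
        INSERT INTO Travel VALUES (101, 3433, 'Jan 4');
        INSERT INTO Travel VALUES (102, 5872, 'Jan 5');

        -- Physical level: an index on flightNo for the Flight relation
        CREATE INDEX idx_flight_no ON Flight (flightNo);

        -- View level: what one group of users sees
        CREATE VIEW NoOfPassengers AS
            SELECT flightNo, "date", COUNT(*) AS numPassengers
            FROM Travel
            GROUP BY flightNo, "date";
    ''')
    return conn


def passengers_per_flight(conn: sqlite3.Connection):
    """Users of the view never see freqFlyerID or how Travel is stored."""
    return conn.execute(
        'SELECT flightNo, "date", numPassengers FROM NoOfPassengers ORDER BY flightNo'
    ).fetchall()


if __name__ == "__main__":
    print(passengers_per_flight(build_levels_demo()))
```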
Data Independence

A DBMS has three levels of abstraction

Data independence: ability to modify one level without affecting the other levels

Physical data independence:
  Physical schema, such as indexes, can change, but logical schema need not change
  Protection from changes in physical structure of data
Logical data independence:
  Logical schema can change, but views need not change
  Protection from changes in logical structure of data

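Physical data independence can be observed directly: the same query, unchanged, returns the same answer before and after an index is added. The sketch below shows only the physical case, using Python's built-in sqlite3 module. The index name idx_flight_miles and the query are made up for illustration.

```python
import sqlite3

# The application's query. It never mentions indexes or file layout.
QUERY = "SELECT flightNo, destination FROM Flight WHERE miles > 2950 ORDER BY flightNo"


def run_before_and_after_index():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Flight (flightNo INTEGER, "start" TEXT, destination TEXT, miles INTEGER)'
    )
    conn.executemany(
        "INSERT INTO Flight VALUES (?, ?, ?, ?)",
        [(101, "BOS", "LAX", 3000), (102, "PVD", "LAX", 2900)],
    )

    before = conn.execute(QUERY).fetchall()

    # Change the physical schema only: add an index on miles.
    conn.execute("CREATE INDEX idx_flight_miles ON Flight (miles)")

    after = conn.execute(QUERY).fetchall()
    return before, after


if __name__ == "__main__":
    before, after = run_before_and_after_index()
    print(before == after)
```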
Other Advanced Topics

Efficient access

Query optimization

Concurrency control

Recovery control

Big Data Analytics
>> We will not have time to study these subjects during the course
>> It is important to know that they exist and what is meant by each component

Efficient Access

Indexing

  Indexes give direct access to the “necessary” portion of data, as opposed to sequential access in files
  Directly find this customer without scanning all customers

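The difference between scanning and indexed access can be sketched in plain Python. This is a toy model and not how a DBMS stores its indexes: the "file" is a list of customer records, and the "index" is a sorted list of (key, position) pairs searched by binary search. All names and data are made up for illustration.

```python
from bisect import bisect_left


def sequential_lookup(customers, customer_id):
    """Scan records one by one; return (record, number of records examined)."""
    examined = 0
    for record in customers:
        examined += 1
        if record[0] == customer_id:
            return record, examined
    return None, examined


def build_index(customers):
    """Sorted list of (key, position in file) pairs."""
    return sorted((record[0], pos) for pos, record in enumerate(customers))


def indexed_lookup(customers, index, customer_id):
    """Binary-search the index, then jump straight to the record."""
    i = bisect_left(index, (customer_id, -1))
    if i < len(index) and index[i][0] == customer_id:
        return customers[index[i][1]]
    return None


if __name__ == "__main__":
    # 1000 hypothetical customers stored in descending ID order.
    customers = [(cid, f"customer-{cid}") for cid in range(1000, 0, -1)]
    index = build_index(customers)
    print(sequential_lookup(customers, 1))   # examines every record
    print(indexed_lookup(customers, index, 1))
```

The scan touches all 1000 records to reach the last one, while the binary search over the index needs about ten comparisons.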
Query Optimization

Costing:
  Estimate expected execution times
Query optimization:
  Generates many alternatives to answer a query
  Estimates the cost of each alternative
  Automatically determines and prepares optimal (or near optimal) access plans for getting the data

Example query:
SELECT ID, Name
FROM Student
WHERE address = '320FL';

Optimizer = “The Bread and Butter of a DBMS!”

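The three steps (generate alternatives, estimate costs, pick the cheapest) can be sketched as a toy in Python. The two plans and their cost formulas are simplifications made up for illustration. Real optimizers consider far more plans and use much more detailed cost models.

```python
import math


def candidate_plans(num_rows: int, matching_rows: int, has_index: bool) -> dict:
    """Generate alternative plans with an estimated cost (rows touched)."""
    # Assumed cost: a full scan reads every row once.
    plans = {"full table scan": float(num_rows)}
    if has_index:
        # Assumed cost: descend the index (about log2 of the table size),
        # then fetch each matching row.
        plans["index lookup"] = math.log2(max(num_rows, 2)) + matching_rows
    return plans


def choose_plan(num_rows: int, matching_rows: int, has_index: bool) -> str:
    """Pick the alternative with the lowest estimated cost."""
    plans = candidate_plans(num_rows, matching_rows, has_index)
    return min(plans, key=plans.get)


if __name__ == "__main__":
    # Few matching rows in a big table: the index wins.
    print(choose_plan(1_000_000, 10, has_index=True))
    # Every row matches: scanning is cheaper than going through the index.
    print(choose_plan(100, 100, has_index=True))
```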
Concurrency Control

DBMS ensures data is consistent under concurrent access
  E.g.: multiple airline staff trying to reserve a seat for different customers
Concepts:
  Transactions – grouping multiple instructions (reads/writes) into one atomic unit
  Locks – locking of resources (tables)

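The seat-reservation example can be sketched with Python's built-in sqlite3 module. The Seat table, its columns, and the seat number are assumptions made for illustration. The check ("is the seat free?") and the write ("assign it") are done in a single statement inside one transaction, so two staff members cannot both get the seat. The sketch runs the two attempts one after the other and does not exercise real parallel access.

```python
import sqlite3


def make_seat_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE Seat (
            flightNo    INTEGER,
            seatNo      TEXT,
            freqFlyerID INTEGER,          -- NULL means the seat is free
            PRIMARY KEY (flightNo, seatNo)
        );
        INSERT INTO Seat VALUES (101, '12A', NULL);
    ''')
    return conn


def reserve(conn: sqlite3.Connection, flight_no: int, seat_no: str, flyer_id: int) -> bool:
    """Reserve the seat only if it is still free; return True on success."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE Seat SET freqFlyerID = ? "
            "WHERE flightNo = ? AND seatNo = ? AND freqFlyerID IS NULL",
            (flyer_id, flight_no, seat_no),
        )
        return cur.rowcount == 1


if __name__ == "__main__":
    db = make_seat_db()
    print(reserve(db, 101, "12A", 3433))  # first staff member
    print(reserve(db, 101, "12A", 5872))  # second staff member, same seat
```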
Recovery Control

If the system crashes in the middle of a transaction, recovery must be provided:
  Cannot afford to lose data or leave it inconsistent
Concepts:
  Logging of transactions’ actions
  Ability to redo or undo transactions

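Logging, redo, and undo can be illustrated with a toy in plain Python. It is heavily simplified and assumes that each log entry records the old and new value of a write, and that no two active transactions write the same item. The log format and the function name are made up for illustration and do not describe a particular DBMS.

```python
def recover(disk: dict, log: list) -> dict:
    """Rebuild a consistent state after a crash from the on-disk data and the log.

    Log entries are either ("write", txn, key, old_value, new_value)
    or ("commit", txn).
    """
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    state = dict(disk)

    # Redo: replay writes of committed transactions in log order,
    # in case their changes never reached the disk.
    for entry in log:
        if entry[0] == "write" and entry[1] in committed:
            _, _txn, key, _old, new = entry
            state[key] = new

    # Undo: roll back writes of uncommitted transactions, newest first,
    # in case their changes did reach the disk.
    for entry in reversed(log):
        if entry[0] == "write" and entry[1] not in committed:
            _, _txn, key, old, _new = entry
            state[key] = old

    return state


if __name__ == "__main__":
    # T1 moves 50 from A to B and commits; T2 starts emptying A, then the system crashes.
    log = [
        ("write", "T1", "A", 100, 50),
        ("write", "T1", "B", 0, 50),
        ("commit", "T1"),
        ("write", "T2", "A", 50, 0),
    ]
    # At the crash, T2's write had reached disk but T1's write to B had not.
    disk_at_crash = {"A": 0, "B": 0}
    print(recover(disk_at_crash, log))
```

After recovery, the committed transfer by T1 is fully present and the unfinished work of T2 is gone.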
Big Data Analytics
Large-Scale Data Management
Big Data Analytics
Data Science and Analytics
• How to manage very large amounts of data and extract value and knowledge from them

Data Explosion
2 Billion Internet users by 2011
1.3 Billion RFID tags in 2005
30 Billion RFID tags by 2010
4.6 Billion Mobile Phones World Wide
Capital market data volumes grew 1,750%, 2003-06
World Data Centre for Climate
§ 220 Terabytes of Web data
§ 9 Petabytes of additional data
Twitter processes 7 terabytes of data every day
Facebook processes 10 terabytes of data every day

Who uses databases?

End users
DB application programmers
Database Administrators
  Database design
  Security, Authorization
  Data availability, crash recovery
  Database tuning (for performance)

Summary : Why study DBMS?

Need to process large amounts of data efficiently
  Video, WWW, computer games, geographic information systems (GIS), genome data, digital libraries, etc.
Make use of all functionalities provided by DBMSs
DB administrators and programmers hold rewarding jobs
DB research is one of the most exciting areas in Computer Science!