Relational Databases and SQL

advertisement
Relational Databases
COMP 416
Fall 2010
Lecture 21
What’s a database?
• A collection of data
• Examples of collections of data?
–
–
–
–
Library
Web
Stacks of papers on your desk
Set of baseball cards.
• Are all of these things databases?
Web vs. Library
• What’s the big difference?
– Organization
– In what ways is the library organized?
• Databases have organization.
Library vs Baseball Cards
• What’s the primary object in these two
collections?
– Libraries: books
– Baseball card collections: baseball cards
• How alike are books?
– Somewhat, but large variatations
• How alike are baseball cards?
– Very alike.
• Things in databases are highly structured.
One library vs another
• What’s the difference between the Brauer
Library and the House Library?
• Different purposes leads to different
priorities (in organization and content).
• Databases are built for a purpose.
– The more specific the purpose, the more
specific its structure and organization.
So, what’s a database
• A database is an collection of structured
information organized for a specific
purpose.
Relational Databases
• Relational databases are the most prevalent
type of database used.
• Information is organized into related tables.
• Each table captures information about a
different entity.
– Columns are different fields of information
(attributes of the entity).
– Each row represents one instance (a specific
example of the entity).
Design Goals
• What kinds of information do we want to
keep track of?
• What do we want to do with that information?
Entities
• First step in database design is to identify
entities.
– Think of entities as “things” that you want to
know information about.
– What do we care about for our bookstore?
• Books (duh?)
Attributes
• Next step is to identify attributes of those
entities.
• An attribute is labeled piece of information
(i.e., a name/value pair)
• In general, we expect every instance of a
particular entity to have specific values for a
set of common attributes.
Book Entity
Book
Author(s)
Title
Publisher
Genre
Price
Normalization
• Not all database designs are equal.
• Experience and research has shown that
certain structures and relationships are
easier to maintain and process than others.
• Normalization: a process through which a
database design is “cleaned up”
• Well-defined set of “normal forms” which
are the incremental result of this process.
1NF
• First Normal Form
– All attributes are single-valued.
– All instances have a unique identifier.
Book Entity Revisited
• Is our book entity in 1NF?
Book
Author(s)
Title
Publisher
Genre
Price
Bookstore Entities (1NF)
• Multi-valued attributes generally indicate
the need for a new entity.
Book
Author
Title
First
Publisher
Last
Genre
Birthday
Price
Unique Identifiers
• What in our book and author entities can act
as a unique identifier?
• Often (almost always) the best way to create
a unique identifier is to create an artificial
one.
– Book ID, Author ID.
– Assigned by the database itself.
– No inherent semantics.
Book Entities (1NF) v2
Book
Author
ID
Title
ID
Publisher
Last
Genre
Birthday
Price
First
Modeling Relationships
• Two relationship types.
– One-to-Many
– Many-to-Many
• For now, we’ll just model this pictorially
like this:
Book
Author
ID
Title
ID
Publisher
Last
Genre
Birthday
Price
First
2NF
• Second Normal Form
– Already in 1NF
– Non-identifying attributes are dependent on the
entity’s unique identifier.
• Rule of thumb: if the same value appears multiple
times for a particular attribute, think hard if what
you really need is another entity.
Bookstore Entities
• What might we pull out into an entity?
Book
Author
ID
Title
ID
Publisher
Last
Genre
Birthday
Price
First
Bookstore Entities (2NF)
Publisher
ID
Name
Address
Book
ID
Title
Price
State
Author
ID
First
Last
Birthday
State Abbrev.
Genre
ID
Genre Name
3NF
• Third Normal Form
– In 2NF
– No attributes dependent on each other.
• What part of our data model violates this?
• To fix, generally want to pull the dependent
attributes out into their own entity.
Bookstore Entities (3NF)
Publisher
ID
Name
Address
Book
ID
Title
Price
Author
ID
First
Last
Birthday
State
ID
Long Name
Abbrev.
Genre
ID
Genre Name
Logical vs Physical Design
• Result so far is “logical” database design.
• Still need to implement this design as a specific
database.
• Relational databases:
–
–
–
–
–
Each entity associated with a table.
Attributes are columns of the table.
Each attribute is given a data type.
Unique identifiers are “primary keys”
Relationships are embodied as “foreign keys”
• An attribute whose value is the unique identifier in another
table.
Implementing 1-to-many
• To implement a 1-to-many relationship, add
an attribute on the “many” side which is the
unique identifier of the “one” side.
Implementing 1-to-many
Publisher
ID
Name
Address
StateID
State
ID
Long Name
Abbrev.
Book
ID
Title
Price
PubID
GenreID
Genre
ID
Genre Name
Author
ID
First
Last
Birthday
Resolving M-to-M
• Many-to-many relationships are hard to
implement in a database.
• Why is this?
– Foreign key attribute which is supposed to
implement the relationship requires multiple
values.
– This breaks 1NF structure.
• How might we fix it?
Junction Entities
• A junction entity is an abstract entity
provides a level of indirection for a manyto-many relationship.
Adding BookAuthor Junction
Publisher
ID
Name
Address
StateID
State
ID
Long Name
Abbrev.
Book
ID
Title
Price
PubID
GenreID
Genre
BookAuthor
ID
BookID
AuthorID
Author
ID
ID
Genre Name
First
Last
Birthday
SQL
• Structured Query Language (SQL)
– The language in which we express actions to be
performed on a relational database.
– Standardized to allow portability across
different products.
• SQL92 (aka SQL2) is the latest standard.
– Product specific differences and extensions still
exist, but much better than before.
MySQL
• MySQL
–
–
–
–
Open-source
Great for small to mid-sized organizations.
Fast, efficient, cheap
Doesn’t support full SQL but a good portion of it.
Web App Model
Browser
HTTP Requests
DB
On Disk
Web Server
HTTP Responses
JavaScript
Programmable,
dynamic
interface to the
document
PHP
Programmable,
dynamic,
document
construction
SQL
Database
Structured,
table-based,
information
storage
Download