
Managing external data

Part 1

Design of Databases

Gitte Christensen

Dyalog Ltd

Purpose

• To give you a crash course in data analysis and databases

• After part 1 Design of Databases you will be able to analyse and organise data based on a requirement spec or use case.

• After part 2 Database programming you will be able to use relational data in your APL applications

• After part 3 Database Implementation you will be able to choose between different storage methods based on structure and use of data and performance considerations

Agenda

• The Relational Model

– Entity/Relation model

– Convert E/R to table structure

– Relational Algebra

• Semistructured data

• Multidimensional data

Data Models

• A Database models some portion of the real world.

• A data model is the link between the user’s view of the world and the bits stored in the computer.

• We will concentrate on the Relational Model

Data Models

• A data model is a collection of concepts for describing data.

• A database schema is a description of a particular collection of data, using a given data model.

• The relational model of data is the most widely used model today.

– Main concept: relation, basically a table with rows and columns.

– Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction

• Views describe how users see the data.

• Conceptual schema defines logical structure.

• Physical schema describes the files and indexes used.

• (sometimes called the ANSI/SPARC model)

[Figure: Users see View 1, View 2, and View 3; the views sit on the Conceptual Schema, which sits on the Physical Schema, which sits on the DB.]

Data Independence

• A Simple Idea:

Applications should be insulated from how data is structured and stored.

• Logical data independence:

Protection from changes in logical structure of data.

• Physical data independence:

Protection from changes in physical structure of data.

[Figure: View 1, View 2, and View 3 on top of the Conceptual Schema, the Physical Schema, and the DB.]

Entity-Relationship Model


Purpose of E/R Model

• The E/R model allows us to sketch database designs.

– Kinds of data and how they connect.

– Not how data changes.

• Designs are pictures called entity-relationship diagrams.

• Later: convert E/R designs to relational DB designs.

Entity Sets

• Entity = “thing” or object.

• Entity set = collection of similar entities.

– Similar to a class in object-oriented languages.

• Attribute = property of (the entities of) an entity set.

– Attributes are simple values, e.g. integers or character strings.


E/R Diagrams

• In an entity-relationship diagram:

– Entity set = rectangle.

– Attribute = oval, with a line to the rectangle representing its entity set.


Example

[Figure: entity set Beers with attributes name and manf.]

• Entity set Beers has two attributes, name and manf (manufacturer).

• Each Beers entity has values for these two attributes, e.g. (Bud, Anheuser-Busch)


Relationships

• A relationship connects two or more entity sets.

• It is represented by a diamond, with lines to each of the entity sets involved.


Example

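[Figure: E/R diagram. Entity sets Bars (name, addr, license), Beers (name, manf), and Drinkers (name, addr); relationships Sells (between Bars and Beers), Likes (between Drinkers and Beers), and Frequents (between Drinkers and Bars).]

Note: license = beer, full, none

• Bars sell some beers.

• Drinkers like some beers.

• Drinkers frequent some bars.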

Relationship Set

• The current “value” of an entity set is the set of entities that belong to it.

– Example: the set of all bars in our database.

• The “value” of a relationship is a set of lists of currently related entities, one from each of the related entity sets.


Example

• For the relationship Sells, we might have a relationship set like:

Bar        Beer
Joe’s Bar  Bud
Joe’s Bar  Miller
Sue’s Bar  Bud
Sue’s Bar  Pete’s Ale
Sue’s Bar  Bud Lite

Case Movie Database

• We want to create a movie database which will allow our users to find information about movies

• Each movie has a title, a production year, a length in minutes, whether it is color or b/w, and an owner, a studio

• We have addresses for the studios and the actors

Draw a model of the Movies database using these symbols: a rectangle labeled EntityName for each entity set and a diamond labeled Relationship for each relationship

Multiway Relationships

• Sometimes, we need a relationship that connects more than two entity sets.

• Suppose that drinkers will only drink certain beers at certain bars.

– Our three binary relationships Likes, Sells, and Frequents do not allow us to make this distinction.

– But a 3-way relationship would.

Example

[Figure: E/R diagram. The 3-way relationship Preferences connects the entity sets Bars (name, addr, license), Drinkers (name, addr), and Beers (name, manf).]

A Typical Relationship Set

Bar        Drinker  Beer
Joe’s Bar  Ann      Miller
Sue’s Bar  Ann      Bud
Sue’s Bar  Ann      Pete’s Ale
Joe’s Bar  Bob      Bud
Joe’s Bar  Bob      Miller
Joe’s Bar  Cal      Miller
Sue’s Bar  Cal      Bud Lite
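To see why the three binary relationships cannot replace the 3-way one, here is a minimal Python sketch. It stores the relationship set above as a set of triples, derives Frequents, Likes, and Sells from it, and then recombines them. The variable names and the representation as Python sets are assumptions made for illustration only.

```python
# The Preferences relationship set from the table above,
# stored as a set of (bar, drinker, beer) triples.
preferences = {
    ("Joe's Bar", "Ann", "Miller"),
    ("Sue's Bar", "Ann", "Bud"),
    ("Sue's Bar", "Ann", "Pete's Ale"),
    ("Joe's Bar", "Bob", "Bud"),
    ("Joe's Bar", "Bob", "Miller"),
    ("Joe's Bar", "Cal", "Miller"),
    ("Sue's Bar", "Cal", "Bud Lite"),
}

# Derive the three binary relationship sets from the triples.
frequents = {(drinker, bar) for bar, drinker, beer in preferences}
likes = {(drinker, beer) for bar, drinker, beer in preferences}
sells = {(bar, beer) for bar, drinker, beer in preferences}

# Recombine: every (bar, drinker, beer) that is consistent with
# Frequents, Likes, and Sells when each is taken on its own.
recombined = {
    (bar, drinker, beer)
    for drinker, bar in frequents
    for other_drinker, beer in likes
    if other_drinker == drinker and (bar, beer) in sells
}

# Triples the binary relationships allow but the 3-way relationship does not.
spurious = recombined - preferences
print(spurious)
```

The recombined set contains a triple that is not in Preferences (Ann drinking Bud at Joe’s Bar), which is exactly the distinction the binary relationships lose.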


Case Movie Database

• In each movie there are actors who are contracted by the studios

• Add this relationship to your model

Many-Many Relationships

• Focus: binary relationships, such as Sells between Bars and Beers.

• In a many-many relationship, an entity of either set can be connected to many entities of the other set.

– E.g., a bar sells many beers; a beer is sold by many bars.

In Pictures:

many-many


Many-One Relationships

• Some binary relationships are many-one from one entity set to another.

• Each entity of the first set is connected to at most one entity of the second set.

• But an entity of the second set can be connected to zero, one, or many entities of the first set.


In Pictures:

many-one


Example

• Favorite, from Drinkers to Beers, is many-one.

• A drinker has at most one favorite beer.

• But a beer can be the favorite of any number of drinkers, including zero.
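The many-one condition can be checked mechanically on a relationship set. The sketch below assumes a relationship set is given as a set of (first, second) pairs; the function name `is_many_one` and the sample pairs are hypothetical, chosen only to illustrate Favorite and Likes.

```python
from collections import defaultdict

def is_many_one(pairs):
    """Return True if each entity of the first set is connected
    to at most one entity of the second set."""
    targets = defaultdict(set)
    for first, second in pairs:
        targets[first].add(second)
    return all(len(seconds) <= 1 for seconds in targets.values())

# Hypothetical relationship sets from Drinkers to Beers.
favorite = {("Ann", "Bud"), ("Bob", "Bud"), ("Cal", "Miller")}
likes = {("Ann", "Bud"), ("Ann", "Miller"), ("Bob", "Bud")}

print(is_many_one(favorite))  # each drinker has at most one favorite
print(is_many_one(likes))     # Ann likes two beers
```

Note that Bud being the favorite of two drinkers does not violate the condition; only the "many" side is restricted.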


One-One Relationships

• In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.

• Example: Relationship Best-seller between entity sets Manfs (manufacturer) and Beers.

– A beer cannot be made by more than one manufacturer, and no manufacturer can have more than one best-seller (assume no ties).

In Pictures:

one-one


Representing “Multiplicity”

• Show a many-one relationship by an arrow entering the “one” side.

• Show a one-one relationship by arrows entering both entity sets.

• Rounded arrow = “exactly one,” i.e., each entity of the first set is related to exactly one entity of the target set.


Example

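[Figure: Drinkers and Beers connected by two relationships: Likes (many-many) and Favorite (many-one, with an arrow entering Beers).]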

Example

• Consider Best-seller between Manfs and Beers.

• Some beers are not the best-seller of any manufacturer, so a rounded arrow to Manfs would be inappropriate.

• But a beer manufacturer has to have a best-seller.

In the E/R Diagram

[Figure: Manfs and Beers connected by Best-seller, with a plain arrow entering Manfs and a rounded arrow entering Beers.]

Case Movie Database

• Add arrows to your diagram so it reflects the kind of relations between the entities

Attributes on Relationships

• Sometimes it is useful to attach an attribute to a relationship.

• Think of this attribute as a property of tuples in the relationship set.


Example

[Figure: Bars and Beers connected by Sells; the attribute price is attached to Sells.]

Price is a function of both the bar and the beer, not of one alone.

Equivalent Diagrams Without Attributes on Relationships

• Create an entity set representing values of the attribute.

• Make that entity set participate in the relationship.


Example

[Figure: Sells connects Bars, Beers, and Prices; Prices has the attribute price; an arrow from Sells enters Prices.]

Note convention: an arrow from a multiway relationship = “all other entity sets together determine a unique one of these.”

Roles

• Sometimes an entity set appears more than once in a relationship.

• Label the edges between the relationship and the entity set with names called roles.

Example

[Figure: the relationship Married connects to the entity set Drinkers by two edges, labeled with the roles husband and wife.]

Relationship Set

Husband  Wife
Bob      Ann
Joe      Sue

Example

[Figure: the relationship Buddies connects to the entity set Drinkers by two edges, labeled with the roles 1 and 2.]

Relationship Set

Buddy1  Buddy2
Bob     Ann
Joe     Sue
Ann     Bob
Joe     Moe

Case Movie Database

• The actors can be contracted either by the studio producing the movie or by another studio that rents the actor out to the producing studio

• We would like to record what the actor is paid for appearing in a movie

• Update your model to reflect the new facts

Subclasses

• Subclass = special case = fewer entities = more properties.

• Example: Ales are a kind of beer.

– Not every beer is an ale, but some are.

– Let us suppose that in addition to all the properties (attributes and relationships) of beers, ales also have the attribute color.

Subclasses in E/R Diagrams

• Assume subclasses form a tree.

– I.e., no multiple inheritance.

• Isa triangles indicate the subclass relationship.

– Point to the superclass.


Example

[Figure: Beers (attributes name and manf) with an isa triangle to the subclass Ales (attribute color).]

Case Movie Database

• For some movies like cartoons we have a different kind of actor, voices.

• Design a subclass to reflect this fact, using an ISA triangle

E/R Vs. Object-Oriented Subclasses

• In OO, objects are in one class only.

– Subclasses inherit from superclasses.

• In contrast, E/R entities have representatives in all subclasses to which they belong.

– Rule: if entity e is represented in a subclass, then e is represented in the superclass.

Example

[Figure: the same Beers/Ales isa hierarchy; the entity Pete’s Ale is represented in both Beers and Ales.]

Keys

• A key is a set of attributes for one entity set such that no two entities in this set agree on all the attributes of the key.

– It is allowed for two entities to agree on some, but not all, of the key attributes.

• We must designate a key for every entity set.
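The key definition translates directly into a check over a set of entities. The sketch below assumes entities are Python dictionaries; `is_key` and the course data are hypothetical names and values used only for illustration.

```python
def is_key(entities, attributes):
    """Return True if no two entities agree on all of the given attributes."""
    seen = set()
    for entity in entities:
        value = tuple(entity[name] for name in attributes)
        if value in seen:
            return False  # two entities agree on every key attribute
        seen.add(value)
    return True

# Hypothetical Courses entities.
courses = [
    {"dept": "CS", "number": 101, "hours": "Mon 9", "room": "A1"},
    {"dept": "CS", "number": 102, "hours": "Mon 9", "room": "B2"},
    {"dept": "EE", "number": 101, "hours": "Tue 9", "room": "A1"},
]

print(is_key(courses, ["dept", "number"]))  # multi-attribute key
print(is_key(courses, ["dept"]))            # two CS courses agree on dept
print(is_key(courses, ["hours", "room"]))   # an alternative key
```

Such a check can only refute a key on the current data; whether a set of attributes is a key in general is a design decision about all possible entities.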


Keys in E/R Diagrams

• Underline the key attribute(s).

• In an Isa hierarchy, only the root entity set has a key, and it must serve as the key for all entities in the hierarchy.


Example: name is Key for Beers

[Figure: Beers (attributes name, underlined, and manf) with an isa triangle to the subclass Ales (attribute color).]

Example: a Multi-attribute Key

[Figure: entity set Courses with attributes dept, number, hours, and room; dept and number are underlined as the key.]

• Note that hours and room could also serve as a key, but we must select only one key.

Case Movie Database

• Add keys to your diagram

Weak Entity Sets

• Occasionally, entities of an entity set need “help” to identify them uniquely.

• Entity set E is said to be weak if in order to identify entities of E uniquely, we need to follow one or more many-one relationships from E and include the key of the related entities from the connected entity sets.


Example

• name is almost a key for football players, but there might be two with the same name.

• number is certainly not a key, since players on two teams could have the same number.

• But number, together with the team name related to the player by Plays-on, should be unique.

In E/R Diagrams

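[Figure: weak entity set Players (attributes name and number) connected by the supporting relationship Plays-on to the entity set Teams (attribute name).]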

• Double diamond for supporting many-one relationship.

• Double rectangle for the weak entity set.


Weak Entity-Set Rules

• A weak entity set has one or more many-one relationships to other (supporting) entity sets.

– Not every many-one relationship from a weak entity set need be supporting.

• The key for a weak entity set is its own underlined attributes and the keys for the supporting entity sets.

– E.g., (player) number and (team) name is a key for Players in the previous example.

Case Movie Database

• We would like to record which camera crews shot a particular movie

• Camera crews are numbered within each studio

• Add these facts to your diagram

Design Techniques

1. Avoid redundancy.

2. Limit the use of weak entity sets.

3. Don’t use an entity set when an attribute will do.

Avoiding Redundancy

• Redundancy occurs when we say the same thing in two or more different ways.

• Redundancy wastes space and (more importantly) encourages inconsistency.

– The two instances of the same fact may become inconsistent if we change one and forget to change the other.


Example: Good

[Figure: Beers (name) connected by ManfBy to Manfs (name, addr).]

This design gives the address of each manufacturer exactly once.

Example: Bad

[Figure: Beers (name, manf) connected by ManfBy to Manfs (name, addr).]

This design states the manufacturer of a beer twice: as an attribute and as a related entity.

Example: Bad

[Figure: Beers with attributes name, manf, and manfAddr.]

This design repeats the manufacturer’s address once for each beer and loses the address if there are temporarily no beers for a manufacturer.

Entity Sets Versus Attributes

• An entity set should satisfy at least one of the following conditions:

– It is more than the name of something; it has at least one nonkey attribute.

or

– It is the “many” in a many-one or many-many relationship.

Example: Good

[Figure: Beers (name) connected by ManfBy to Manfs (name, addr).]

• Manfs deserves to be an entity set because of the nonkey attribute addr.

• Beers deserves to be an entity set because it is the “many” of the many-one relationship ManfBy.

Example: Good

[Figure: Beers with attributes name and manf.]

There is no need to make the manufacturer an entity set, because we record nothing about manufacturers besides their name.

Example: Bad

[Figure: Beers (name) connected by ManfBy to Manfs (name).]

Since the manufacturer is nothing but a name, and is not at the “many” end of any relationship, it should not be an entity set.

Don’t Overuse Weak Entity Sets

• Beginning database designers often doubt that anything could be a key by itself.

– They make all entity sets weak, supported by all other entity sets to which they are linked.

• In reality, we usually create unique IDs for entity sets.

– Examples include social-security numbers, automobile VINs, etc.

When Do We Need Weak Entity Sets?

• The usual reason is that there is no global authority capable of creating unique IDs.

• Example: it is unlikely that there could be an agreement to assign unique player numbers across all football teams in the world.

Break

How to translate ER Model to Relational Model

Concepts

Relational Model is made up of tables

• A row of table = a relational instance/tuple

• A column of table = an attribute

• A table = a schema/relation

• Cardinality = number of rows

• Degree = number of columns

Example

SID   Name  Major  GPA
1234  John  CS     2.8
5678  Mary  EE     3.6

• Each column (e.g. SID) is an attribute.

• Each row is a tuple/relational instance.

• The whole table is a schema/relation.

• Degree = 4; cardinality = 2.

From ER Model to Relational Model

So… how do we convert an ER diagram into a table?? Simple!!

Basic Ideas:

• Build a table for each entity set

• Build a table for each relationship set if necessary (more on this later)

• Make a column in the table for each attribute in the entity set

• Indivisibility Rule and Ordering Rule

• Primary Key

Example – Strong Entity Set

[Figure: entity sets Student (SID, Name, Major, GPA) and Professor (SSN, Name, Dept), connected by the relationship Advisor.]

Student

SID   Name  Major  GPA
1234  John  CS     2.8
5678  Mary  EE     3.6

Professor

SSN   Name   Dept
9999  Smith  Math
8888  Lee    CS
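The two tables above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not the only possible mapping: the column types and the use of an in-memory database are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# One table per strong entity set, one column per attribute,
# and the key of the entity set as the primary key.
conn.executescript("""
    CREATE TABLE Student (
        SID   INTEGER PRIMARY KEY,
        Name  TEXT,
        Major TEXT,
        GPA   REAL
    );
    CREATE TABLE Professor (
        SSN  INTEGER PRIMARY KEY,
        Name TEXT,
        Dept TEXT
    );
""")

conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)",
                 [(1234, "John", "CS", 2.8), (5678, "Mary", "EE", 3.6)])
conn.executemany("INSERT INTO Professor VALUES (?, ?, ?)",
                 [(9999, "Smith", "Math"), (8888, "Lee", "CS")])

students = conn.execute(
    "SELECT SID, Name FROM Student ORDER BY SID").fetchall()
professors = conn.execute(
    "SELECT SSN, Name FROM Professor ORDER BY SSN").fetchall()
print(students)
print(professors)
```

The Advisor relationship is deliberately left out here; how it is represented depends on its cardinality, as discussed in the following slides.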

Representation of Weak Entity Set

• A weak entity set cannot exist alone

• To build a table/schema for a weak entity set

– Construct a table with one column for each attribute in the weak entity set

– Remember to include the discriminator

– Augment one extra column on the right side of the table, and put in it the primary key of the strong entity set (the entity set that the weak entity set depends on)

– Primary key of the weak entity set = discriminator + foreign key

Example – Weak Entity Set

[Figure: entity set Student (SID, Name, Major, GPA) connected by the relationship owns to the weak entity set Children (Name, Age).]

Children

Age  Name  Parent_SID
10   Bart  1234
8    Lisa  5678

Primary key of Children is Parent_SID + Name
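A sketch of the Children table using Python's sqlite3 module shows what the composite primary key and the foreign key each enforce. The Student names, the column types, and the helper `try_insert` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Student (SID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Children (
        Name       TEXT,                              -- discriminator
        Age        INTEGER,
        Parent_SID INTEGER REFERENCES Student(SID),   -- key of the strong entity set
        PRIMARY KEY (Parent_SID, Name)
    );
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1234, "John"), (5678, "Mary")])
conn.executemany("INSERT INTO Children VALUES (?, ?, ?)",
                 [("Bart", 10, 1234), ("Lisa", 8, 5678)])

def try_insert(name, age, parent_sid):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Children VALUES (?, ?, ?)",
                     (name, age, parent_sid))
        return True
    except sqlite3.IntegrityError:
        return False

same_name_other_parent = try_insert("Bart", 3, 5678)   # allowed: different parent
same_name_same_parent = try_insert("Bart", 12, 1234)   # rejected: duplicate key
unknown_parent = try_insert("Maggie", 1, 4321)         # rejected: no such student
print(same_name_other_parent, same_name_same_parent, unknown_parent)
```

The discriminator alone (Name) is not unique, but together with Parent_SID it is, which is the weak entity set rule in table form.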

Representation of Relationship Set

--This is a little more complicated--

• Unary/Binary Relationship set

– Depends on the cardinality and participation of the relationship

– Two possible approaches

• N-ary (multiple) Relationship set

– Primary Key Issue

• Identifying Relationship

– No relational model representation necessary

Representing Relationship Set

Unary/Binary Relationship

• For one-to-one relationship w/out total participation

– Build a table with two columns, one column for each participating entity set’s primary key. Add successive columns, one for each descriptive attribute of the relationship set (if any).

• For one-to-one relationship with one entity set having total participation

– Augment one extra column on the right side of the table of the entity set with total participation, and put in it the primary key of the entity set without total participation, as per the relationship.

Example – One-to-One Relationship Set

[Figure: entity sets Student (SID, Name, Major, GPA) and Major (ID Code), connected by the relationship study, which has the attribute Degree.]

SID   Maj_ID_Co  S_Degree
9999  07         1234
8888  05         5678

Primary key can be either SID or Maj_ID_Co

Example – One-to-One Relationship Set

[Figure: entity sets Student (SID, Name, Major, GPA) and Laptop (S/N #, Brand), connected by the 1:1 relationship Have, which has the attribute Condition.]

SID   Name  Major    GPA   LP_S/N   Hav_Cond
9999  Bart  Economy  -4.0  123-456  Own
8888  Lisa  Physics  4.0   567-890  Loan

* Primary key can be either SID or LP_S/N

Representing Relationship Set

Unary/Binary Relationship

• For one-to-many relationship w/out total participation

– Same thing as one-to-one

• For one-to-many/many-to-one relationship with one entity set having total participation on “many” side

– Augment one extra column on the right side of the table of the entity set on the “many” side, and put in it the primary key of the entity set on the “one” side, as per the relationship.

Example – Many-to-One Relationship Set

[Figure: entity sets Student (SID, Name, Major, GPA) and Professor (SSN, Name, Dept), connected by the N:1 relationship Advisor, which has the attribute Semester.]

SID   Name  Major    GPA   Pro_SSN  Ad_Sem
9999  Bart  Economy  -4.0  123-456  Fall 2006
8888  Lisa  Physics  4.0   567-890  Fall 2005

* Primary key of this table is SID

Representing Relationship Set

Unary/Binary Relationship

• For many-to-many relationship

– Same thing as one-to-one relationship without total participation.

– Primary key of this new schema is the union of the foreign keys of both entity sets.

– No augmentation approach possible…
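As a sketch of the many-to-many case, the Sells relationship between Bars and Beers from the E/R part can be stored in its own table whose primary key is the union of the two foreign keys. The example uses Python's sqlite3 module; the prices, the column types, and the reduction of Bars and Beers to their key columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bars  (name TEXT PRIMARY KEY);
    CREATE TABLE Beers (name TEXT PRIMARY KEY);
    -- The relationship gets its own table: one column per participating
    -- key, plus the descriptive attribute price.
    CREATE TABLE Sells (
        bar   TEXT REFERENCES Bars(name),
        beer  TEXT REFERENCES Beers(name),
        price REAL,
        PRIMARY KEY (bar, beer)
    );
""")
conn.executemany("INSERT INTO Bars VALUES (?)",
                 [("Joe's Bar",), ("Sue's Bar",)])
conn.executemany("INSERT INTO Beers VALUES (?)",
                 [("Bud",), ("Miller",)])
# Prices are made-up values.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)",
                 [("Joe's Bar", "Bud", 2.50),
                  ("Joe's Bar", "Miller", 2.75),
                  ("Sue's Bar", "Bud", 3.00)])

beers_at_joes = conn.execute(
    "SELECT beer FROM Sells WHERE bar = ? ORDER BY beer",
    ("Joe's Bar",)).fetchall()
bars_selling_bud = conn.execute(
    "SELECT bar FROM Sells WHERE beer = ? ORDER BY bar",
    ("Bud",)).fetchall()
print(beers_at_joes)
print(bars_selling_bud)
```

Neither Bars nor Beers could absorb the relationship as an extra column, because a bar sells many beers and a beer is sold by many bars.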

Representing Relationship Set

N-ary Relationship

• Intuitively Simple

– Build a new table with as many columns as there are attributes for the union of the primary keys of all participating entity sets.

– Augment additional columns for descriptive attributes of the relationship set (if necessary)

– The primary key of this table is the union of all primary keys of entity sets that are on “many” side

– That is it, we are done.

Example – N-ary Relationship Set

[Figure: the relationship “A relationship”, with the descriptive attribute D-Attribute, connects E-Set 1 (P-Key1), E-Set 2 (P-Key2), E-Set 3 (P-Key3), and Another Set (A-Key).]

P-Key1  P-Key2  P-Key3  A-Key  D-Attribute
9999    8888    7777    6666   Yes
1234    5678    9012    3456   No

* Primary key of this table is P-Key1 + P-Key2 + P-Key3

Representing Relationship Set

Identifying Relationship

• This is what you have to know

– You DON’T have to build a table/schema for the identifying relationship set once you have built a table/schema for the corresponding weak entity set

– Reason:

• A special case of one-to-many with total participation

• Reduce Redundancy

Representing Composite Attribute

• Relational Model Indivisibility Rule Applies

• One column for each component attribute

• NO column for the composite attribute itself

[Figure: entity set Professor with attributes SSN, Name, and the composite attribute Address (components Street and City).]

SSN   Name       Street      City
9999  Dr. Smith  50 1st St.  Fake City
8888  Dr. Lee    1 B St.     San Jose

Representing Multivalue Attribute

• For each multivalue attribute in an entity set/relationship set

– Build a new relation schema with two columns

– One column for the primary keys of the entity set/relationship set that has the multivalue attribute

– Another column for the multivalue attribute. Each cell of this column holds only one value, so each value is represented as a unique tuple

– Primary key for this schema is the union of all attributes

Example – Multivalue attribute

[Figure: entity set Student with attributes SID, Name, Major, GPA, and the multivalue attribute Children.]

SID   Name   Major  GPA
1234  John   CS     2.8
5678  Homer  EE     3.6

Stud_SID  Children
1234      Johnson
1234      Mary
5678      Bart
5678      Lisa
5678      Maggie

The primary key for this table is Stud_SID + Children, the union of all attributes
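The step from a multivalue attribute to a two-column table can be sketched in a few lines of Python. The nested dictionary used as the starting point is an assumed in-memory form, not something prescribed by the relational model.

```python
# Assumed starting point: each student carries a list of children.
students = {
    1234: {"Name": "John", "Children": ["Johnson", "Mary"]},
    5678: {"Name": "Homer", "Children": ["Bart", "Lisa", "Maggie"]},
}

# One (Stud_SID, Children) row per value of the multivalue attribute,
# so that every cell holds exactly one value.
children_rows = sorted(
    (sid, child)
    for sid, student in students.items()
    for child in student["Children"]
)

for row in children_rows:
    print(row)
```

Because each row is the full (Stud_SID, Children) pair, the rows are unique as a whole, which is why the primary key is the union of all attributes.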

Representing Class Hierarchy

• Two general approaches depending on disjointness and completeness

– For non-disjoint and/or non-complete class hierarchy:

• Create a table for each super class entity set according to the normal entity set translation method.

• Create a table for each subclass entity set with a column for each of the attributes of that entity set plus one for each attribute of the primary key of the super class entity set

• This primary key from super class entity set is also used as the primary key for this new table

Example

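[Figure: superclass entity set Person (SSN, Name, Gender) and subclass entity set Student (SID, Status, Major, GPA), connected by an ISA triangle.]

Person

SSN   Name   Gender
1234  Homer  Male
5678  Marge  Female

Student

SSN   SID   Status  Major  GPA
1234  9999  Full    CS     2.8
5678  8888  Part    EE     3.6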

Case Movie Database

• Convert your E/R diagram to relational tables

Relational Algebra


Relational Algebra is:

• the formal description of how a relational database operates

• the mathematics which underpins SQL operations.

Operators in relational algebra are not necessarily the same as SQL operators, even if they have the same name.

Terminology

• Relation - a set of tuples.

• Tuple - a collection of attributes which describe some real world entity.

• Attribute - a real world role played by a named domain.

• Domain - a set of atomic values.

• Set - a mathematical definition for a collection of objects which contains no duplicates.

Operators - Write

• INSERT - provides a list of attribute values for a new tuple in a relation. This operator is the same as SQL.

• DELETE - provides a condition on the attributes of a relation to determine which tuple(s) to remove from the relation. This operator is the same as SQL.

• MODIFY - changes the values of one or more attributes in one or more tuples of a relation, as identified by a condition operating on the attributes of the relation. This is equivalent to SQL UPDATE.

Operators - Retrieval

There are two groups of operations:

• Operations based on mathematical set theory: UNION, INTERSECTION, DIFFERENCE, and CARTESIAN PRODUCT.

• Special database operations: SELECT (not the same as SQL SELECT), PROJECT, and JOIN.

Relational SELECT

SELECT is used to obtain a subset of the tuples of a relation that satisfy a select condition.

For example, find all employees born after 1st Jan 1950:

SELECT dob > ’01/JAN/1950’ (employee)

Relational PROJECT

The PROJECT operation is used to select a subset of the attributes of a relation by specifying the names of the required attributes.

For example, to get a list of all employees’ surnames and employee numbers:

PROJECT surname,empno (employee)

SELECT and PROJECT

SELECT and PROJECT can be combined together.

For example, to get a list of employee numbers for employees in department number 1:

PROJECT empno (SELECT depno = 1 (employee))

Mapping this back to SQL gives:

SELECT empno

FROM employee

WHERE depno = 1;
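The same combination can be sketched in Python, treating a relation as a list of dictionaries. The functions `select` and `project` and the sample employee tuples are hypothetical illustrations of the two operators, not part of any library.

```python
def select(relation, predicate):
    """Relational SELECT: keep the tuples that satisfy the condition."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Relational PROJECT: keep only the named attributes.
    Duplicates are dropped, because a relation is a set of tuples."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

# Made-up employee relation.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
    {"empno": 3, "surname": "Brown", "depno": 1},
]

# PROJECT empno (SELECT depno = 1 (employee))
empnos = project(select(employee, lambda row: row["depno"] == 1), ["empno"])
depnos = project(employee, ["depno"])
print(empnos)
print(depnos)
```

Projecting onto depno alone returns two tuples rather than three, which is one place where relational PROJECT differs from a plain SQL SELECT without DISTINCT.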

Set Operations - semantics

Consider two relations R and S.

• UNION of R and S: the union of two relations is a relation that includes all the tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.

• INTERSECTION of R and S: the intersection of R and S is a relation that includes all tuples that are both in R and S.

• DIFFERENCE of R and S: the difference of R and S is the relation that contains all the tuples that are in R but that are not in S.

SET Operations - requirements

For set operations to function correctly the relations R and S must be union compatible.

Two relations are union compatible if

– they have the same number of attributes

– the domain of each attribute in column order is the same in both R and S.

UNION Example

INTERSECTION Example

DIFFERENCE Example
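As a small worked illustration of the three set operations, the sketch below represents each relation as a Python set of tuples and each schema as a list of (attribute name, domain) pairs. That representation, the function `union_compatible`, and the sample data are assumptions made for the example.

```python
def union_compatible(r_schema, s_schema):
    """Same number of attributes, and the same domain in each column position."""
    return len(r_schema) == len(s_schema) and all(
        r_domain == s_domain
        for (_, r_domain), (_, s_domain) in zip(r_schema, s_schema)
    )

# Attribute names may differ; only the count and the domains must match.
r_schema = [("surname", str), ("empno", int)]
s_schema = [("name", str), ("id", int)]
t_schema = [("empno", int), ("surname", str)]  # same domains, wrong column order

R = {("Smith", 1), ("Jones", 2)}
S = {("Jones", 2), ("Brown", 3)}

compatible = union_compatible(r_schema, s_schema)
incompatible = union_compatible(r_schema, t_schema)

union = R | S          # tuples in R, in S, or in both; duplicates eliminated
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S
print(compatible, incompatible)
print(union, intersection, difference)
```

The tuple ("Jones", 2) appears in both relations but only once in the union, since the result is again a set.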

CARTESIAN PRODUCT

The Cartesian Product is also an operator which works on two sets. It is sometimes called the CROSS PRODUCT or CROSS JOIN.

It combines the tuples of one relation with all the tuples of the other relation.

CARTESIAN PRODUCT Example
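A minimal Python sketch of the Cartesian product, using `itertools.product` from the standard library. The two relations are made-up sample data, stored as lists of tuples.

```python
from itertools import product

# Made-up relations: (surname, depno) and (depno, dname).
R = [("Smith", 1), ("Jones", 2)]
S = [(1, "Sales"), (2, "Research"), (3, "Admin")]

# Every tuple of R is combined with every tuple of S.
cross = [r + s for r, s in product(R, S)]

for row in cross:
    print(row)
```

The result has 2 x 3 = 6 tuples, each with all attributes of both relations, and most of them pair an employee with a department that is not theirs. This is why the product is usually followed by a condition, as in the JOIN operator.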

JOIN Operator

JOIN is used to combine related tuples from two relations:

• In its simplest form the JOIN operator is just the cross product of the two relations.

• As the join becomes more complex, tuples are removed within the cross product to make the result of the join more meaningful.

• JOIN allows you to evaluate a join condition between the attributes of the relations on which the join is undertaken.

The notation used is

R JOIN join condition S

(the join condition is written as a subscript of JOIN)

JOIN Example
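The description of JOIN as a cross product from which tuples are removed can be sketched directly in Python. The function `join` and the sample relations are hypothetical; relations are lists of dictionaries, and the attribute names of the two relations are assumed not to clash.

```python
def join(r, s, condition):
    """R JOIN S: cross product of r and s, keeping only the
    combined tuples for which the join condition holds."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]

# Made-up sample relations.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
    {"empno": 3, "surname": "Brown", "depno": 3},
]
department = [
    {"dno": 1, "dname": "Sales"},
    {"dno": 2, "dname": "Research"},
]

# Simplest form: a condition that is always true gives the cross product.
cross = join(employee, department, lambda e, d: True)

# A more meaningful join: match each employee with their own department.
joined = join(employee, department, lambda e, d: e["depno"] == d["dno"])
print(len(cross), len(joined))
```

Brown's department 3 has no matching tuple in department, so Brown does not appear in the joined result.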

Natural Join

Invariably the JOIN involves an equality test, and thus is often described as an equi-join. Such joins result in two attributes in the resulting relation having exactly the same value. A ‘natural join’ will remove the duplicate attribute(s).

– In most systems a natural join will require that the attributes have the same name to identify the attribute(s) to be used in the join. This may require a renaming mechanism.

– If you do use natural joins make sure that the relations do not have two attributes with the same name by accident.
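A natural join can be sketched in Python by joining on whichever attribute names the two relations share. The function `natural_join` and the sample data are assumptions made for illustration; relations are lists of dictionaries.

```python
def natural_join(r, s):
    """Equi-join on all attributes with the same name in both relations.
    Each shared attribute appears only once in the result."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[name] == b[name] for name in common):
                result.append({**a, **b})
    return result

# Made-up sample relations; depno has the same name in both.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
    {"empno": 3, "surname": "Brown", "depno": 3},
]
department = [
    {"depno": 1, "dname": "Sales"},
    {"depno": 2, "dname": "Research"},
]

joined = natural_join(employee, department)
for row in joined:
    print(row)
```

If department also had an attribute called surname by accident, it would silently become part of the join condition, which is the pitfall the last bullet warns about.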

OUTER JOINs

Notice that much of the data is lost when applying a join to two relations. In some cases this lost data might hold useful information. An outer join retains the information that would have been lost from the tables, replacing missing data with nulls.

There are three forms of the outer join, depending on which data is to be kept.

• LEFT OUTER JOIN - keep data from the left-hand table

• RIGHT OUTER JOIN - keep data from the right-hand table

• FULL OUTER JOIN - keep data from both tables

OUTER JOIN Example 1

OUTER JOIN Example 2
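A left outer join can be sketched in Python as a join that pads unmatched left-hand tuples with nulls (here `None`). The function `left_outer_join`, its `s_attributes` parameter, and the sample data are hypothetical.

```python
def left_outer_join(r, s, condition, s_attributes):
    """Keep every tuple of r; tuples without a match in s get None
    for each attribute of s."""
    result = []
    for a in r:
        matches = [b for b in s if condition(a, b)]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **{name: None for name in s_attributes}})
    return result

# Made-up sample relations.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
    {"empno": 3, "surname": "Brown", "depno": 3},
]
department = [
    {"dno": 1, "dname": "Sales"},
    {"dno": 2, "dname": "Research"},
]

rows = left_outer_join(
    employee, department,
    lambda e, d: e["depno"] == d["dno"],
    ["dno", "dname"],
)
for row in rows:
    print(row)
```

A right outer join is the same operation with the two relations swapped, and a full outer join additionally keeps the right-hand tuples that found no match.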

Semistructured data

[Figure: semistructured data graph with the labels Root, cf, mh, sw, starIn, and starOf.]

Multidimensional data

End of Part 1
