CPSC 534A – Background
Rachel Pottinger
January 13 and 18, 2005
Administrative notes
Please note you’re supposed to sign up
for one paper presentation and one
discussion… for different papers
Please sign up for the mailing list
WebCT has been populated – make sure
you can access it
HW 1 is on the web, due beginning of
class a week from today
Overview of the next two classes
Relational databases
Entity Relationship (ER) diagrams
Object Oriented Databases (OODBs)
XML
Other data types
Database internals (Briefly)
An extremely brief introduction to category
theory
Metadata management examples are
interspersed
Relational Database Basics
What’s in a relational database?
Relational Algebra
SQL
Datalog
Relational Data Representation
Relation or table: Product
Attribute names or columns: PName, Price, Category, Manufacturer
Tuples or rows:
PName        Price    Category     Manufacturer
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $149.99  photography  Canon
MultiTouch   $203.99  household    Hitachi
Relational schema representation
Every attribute has an atomic type (e.g., Char, integer)
Relation schema (the column headings):
relation name + attribute names + attribute types
Product(VarChar PName, real Price, VarChar Category,
VarChar Manufacturer)
Types are often left off:
Product(PName, Price, Category, Manufacturer)
Relation instance: The values in a table.
Database Schema: a set of relation schemas in the
database.
Database instance: a relation instance for every relation in
the schema.
Querying – Relational Algebra
Select (σ) - choose tuples from a relation
Project (π) - choose attributes from a relation
Join (⋈) - combine two relations
Set-difference (−) - tuples in relation 1
but not in relation 2
Union (∪) - tuples in relation 1 or in relation 2
Cartesian product (×) - each tuple of R1 paired
with each tuple of R2
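The operators above can be sketched in a few lines of Python. This is only an illustration, not how a DBMS implements them: relations are modelled as lists of dicts, and the helper names (select, project, cartesian, join) are made up for this sketch. The data is the Product and Company example used in these slides.

```python
# Relations as lists of dicts; every operator returns a new relation.
Product = [
    {"PName": "Gizmo", "Price": 19.99, "Category": "Gadgets", "Manufacturer": "GizmoWorks"},
    {"PName": "Powergizmo", "Price": 29.99, "Category": "Gadgets", "Manufacturer": "GizmoWorks"},
    {"PName": "SingleTouch", "Price": 149.99, "Category": "Photography", "Manufacturer": "Canon"},
    {"PName": "MultiTouch", "Price": 203.99, "Category": "Household", "Manufacturer": "Hitachi"},
]
Company = [
    {"Cname": "GizmoWorks", "StockPrice": 25, "Country": "USA"},
    {"Cname": "Canon", "StockPrice": 65, "Country": "Japan"},
    {"Cname": "Hitachi", "StockPrice": 15, "Country": "Japan"},
]

def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: keep only attrs, dropping duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def cartesian(r1, r2):
    """Cartesian product: every tuple of r1 paired with every tuple of r2."""
    return [{**a, **b} for a in r1 for b in r2]

def join(r1, r2, pred):
    """Join: a Cartesian product followed by a selection."""
    return select(cartesian(r1, r2), pred)

expensive = project(select(Product, lambda t: t["Price"] > 100),
                    ["PName", "Price", "Manufacturer"])
japanese_cheap = project(
    join(select(Product, lambda t: t["Price"] < 200),
         select(Company, lambda t: t["Country"] == "Japan"),
         lambda t: t["Manufacturer"] == t["Cname"]),
    ["PName", "Price"])
```

Writing join as a selection over a Cartesian product mirrors the algebraic definition; real systems use far more efficient join algorithms.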
Find products where the manufacturer
is GizmoWorks

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Selection:
σ Manufacturer = GizmoWorks (Product)

Result
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
Find the Name, Price, and Manufacturers of products
whose price is greater than 100

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Selection + Projection:
π PName, Price, Manufacturer (σ Price > 100 (Product))

Result
PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi
Find the product names and price of products that cost less than
$200 and have manufacturers where there is a Company that has
a CName that matches the manufacturer, and its country is Japan

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company
Cname       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

π PName, Price ((σ Price < 200 (Product)) ⋈ Manufacturer = Cname (σ Country = ‘Japan’ (Company)))

Result
PName        Price
SingleTouch  $149.99
When are two relations related?
You guess they are
I tell you so
Constraints say so
A key is a set of attributes whose values are unique;
we underline a key
Product(PName, Price, Category, Manufacturer)
Foreign keys are a method for schema designers to
tell you so
A foreign key states that an attribute is a reference to the key
of another relation
e.g., Product.Manufacturer is a foreign key referencing Company
Gives information and enforces constraint
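A small sketch of how a foreign key both documents and enforces the link between Product and Company, using Python's built-in sqlite3 module. The choice of PName and Cname as primary keys is an assumption made for this example; the PRAGMA line is specific to SQLite, which only enforces foreign keys when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.execute("CREATE TABLE Company (Cname TEXT PRIMARY KEY, StockPrice REAL, Country TEXT)")
conn.execute("""
    CREATE TABLE Product (
        PName TEXT PRIMARY KEY,          -- assumed key, for illustration
        Price REAL,
        Category TEXT,
        Manufacturer TEXT REFERENCES Company(Cname)
    )""")
conn.execute("INSERT INTO Company VALUES ('Canon', 65, 'Japan')")
conn.execute("INSERT INTO Product VALUES ('SingleTouch', 149.99, 'Photography', 'Canon')")

# 'Bar' is not a known company, so the constraint should reject this row.
try:
    conn.execute("INSERT INTO Product VALUES ('Foo', 1.99, 'Gadgets', 'Bar')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

product_count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
```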
SQL
Data Manipulation Language (DML)
Query one or more tables
Insert/delete/modify tuples in tables
Data Definition Language (DDL)
Create/alter/delete tables and their attributes
Transact-SQL
Idea: package a sequence of SQL statements → server
Querying – SQL
Standard language for querying and manipulating data
Structured Query Language
Many standards out there:
• ANSI SQL
• SQL92 (a.k.a. SQL2)
• SQL99 (a.k.a. SQL3)
• Vendors support various subsets of these
• What we discuss is common to all of them
SQL basics
Basic form: (many many more bells and
whistles in addition)
Select attributes
From relations (possibly multiple, joined)
Where conditions (selections)
SQL – Selections
SELECT *
FROM Company
WHERE country = 'Canada' AND stockPrice > 50
Some things allowed in the WHERE clause:
attribute names of the relation(s) used in the FROM.
comparison operators: =, <>, <, >, <=, >=
apply arithmetic operations: stockPrice*2
operations on strings (e.g., “||” for concatenation).
Lexicographic order on strings.
Pattern matching: s LIKE p
Special stuff for comparing dates and times.
SQL – Projections
Select only a subset of the attributes
SELECT name, stockPrice
FROM Company
WHERE country = 'Canada' AND stockPrice > 50
Rename the attributes in the resulting table
SELECT name AS company, stockPrice AS price
FROM Company
WHERE country = 'Canada' AND stockPrice > 50
SQL – Joins
Product (name, price, category, maker)
Purchase (buyer, seller, store, product)
Company (name, stockPrice, country)
Person (name, phone number, city)
SELECT name, store
FROM Person, Purchase
WHERE name = buyer AND city = 'Vancouver'
AND product = 'gizmo'
What’s the SQL?
Selection:
σ Manufacturer = GizmoWorks (Product)

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Result
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
What’s the SQL?
Selection + Projection:
π PName, Price, Manufacturer (σ Price > 100 (Product))

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Result
PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi
What’s the SQL?
π PName, Price ((σ Price <= 200 (Product)) ⋈ Manufacturer = Cname (σ Country = ‘Japan’ (Company)))

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company
Cname       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

Result
PName        Price
SingleTouch  $149.99
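One possible set of answers to the three "What's the SQL?" exercises, run against an in-memory SQLite database through Python's sqlite3 module. The column types and the ORDER BY clauses are choices made for this sketch (the ordering only makes the output deterministic); other formulations are equally valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT);
CREATE TABLE Company (Cname TEXT, StockPrice REAL, Country TEXT);
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi');
INSERT INTO Company VALUES
  ('GizmoWorks', 25, 'USA'), ('Canon', 65, 'Japan'), ('Hitachi', 15, 'Japan');
""")

# Selection: all attributes of GizmoWorks products.
selection = conn.execute(
    "SELECT * FROM Product WHERE Manufacturer = 'GizmoWorks' ORDER BY Price"
).fetchall()

# Selection + projection: three attributes of products costing more than 100.
sel_proj = conn.execute(
    "SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100 ORDER BY Price"
).fetchall()

# Join: products costing at most 200 whose manufacturer is a Japanese company.
joined = conn.execute("""
    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = Cname AND Price <= 200 AND Country = 'Japan'
""").fetchall()
```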
More SQL – Outer Joins
What happens if there’s no value available?

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi
Foo          $1.99    Gadgets      Bar

Company
Cname       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

Ordinary join (Foo is dropped because no Company matches Bar):
Select pname, Country
From Product, Company
Where Manufacturer = Cname

Outer join (Foo is kept, padded with NULL):
Select pname, Country
From Product left outer join Company
on Manufacturer = Cname

Result of the outer join
PName        Country
Gizmo        USA
Powergizmo   USA
SingleTouch  Japan
MultiTouch   Japan
Foo          NULL
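The difference between the two queries can be checked with a short sqlite3 sketch. It assumes the outer join on the slide is a left outer join, which matches the result table (every Product row is kept); NULL comes back as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT);
CREATE TABLE Company (Cname TEXT, StockPrice REAL, Country TEXT);
INSERT INTO Product VALUES
  ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks'),
  ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks'),
  ('SingleTouch', 149.99, 'Photography', 'Canon'),
  ('MultiTouch', 203.99, 'Household', 'Hitachi'),
  ('Foo', 1.99, 'Gadgets', 'Bar');
INSERT INTO Company VALUES
  ('GizmoWorks', 25, 'USA'), ('Canon', 65, 'Japan'), ('Hitachi', 15, 'Japan');
""")

# Inner join: 'Foo' disappears because no Company row has Cname = 'Bar'.
inner = conn.execute(
    "SELECT PName, Country FROM Product, Company WHERE Manufacturer = Cname"
).fetchall()

# Left outer join: every Product row survives; missing Company values become NULL.
outer = conn.execute(
    "SELECT PName, Country FROM Product LEFT OUTER JOIN Company ON Manufacturer = Cname"
).fetchall()
```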
Querying – Datalog
Enables expressing recursive queries
More convenient for analysis
Some people find it easier to understand
Without recursion but with negation it is
equivalent in power to relational algebra
and SQL
Limited version of Prolog (no functions)
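To make the point about recursion concrete, here is a sketch of naive bottom-up evaluation of a recursive Datalog program in Python. The Edge and Reach predicates and the sample facts are hypothetical, chosen only for illustration:

    Reach(X,Y) :- Edge(X,Y)
    Reach(X,Y) :- Reach(X,Z) & Edge(Z,Y)

```python
def transitive_closure(edges):
    """Naive fixpoint: apply the recursive rule until no new facts appear."""
    reach = set(edges)                      # Reach(X,Y) :- Edge(X,Y)
    while True:
        # Reach(X,Y) :- Reach(X,Z) & Edge(Z,Y)
        derived = {(x, y) for (x, z) in reach for (z2, y) in edges if z == z2}
        if derived <= reach:                # nothing new: fixpoint reached
            return reach
        reach |= derived

edges = {("a", "b"), ("b", "c"), ("c", "d")}   # hypothetical Edge facts
reach = transitive_closure(edges)
```

A query like this, with a path of unbounded length, cannot be written as a single select-project-join expression, which is why recursion adds expressive power.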
Datalog Rules and Queries
A datalog rule has the following form:
head :- atom1, atom2, …, atom,…
You can read this as
then :- if ...
ExpensiveProduct(N) :- Product(N,M,P) & P > $100
CanadianProduct(N) :- Product(N,M,P) & Company(M, “Canada”, SP)
IntlProd(N) :- Product(N,M,P) & NOT Company(M, “Canada”, SP)
Terminology:
Head or IDB: e.g., ExpensiveProduct(N)
Subgoal or EDB: e.g., Product(N,M,P)
Distinguished variable: N (appears in the head)
Existential variables: M, P (appear only in the body)
Arithmetic comparison or interpreted predicate: P > $100
Constant: “Canada”
Negated subgoal - also denoted by ¬: NOT Company(M, “Canada”, SP)
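The three rules can be mimicked with Python set comprehensions, which gives a feel for how a rule body is evaluated. This is only a sketch: the facts reuse the example tables from earlier slides, rearranged to the argument order the rules use (Product(name, manufacturer, price), Company(name, country, stock price)).

```python
# EDB facts, in the argument order used by the rules.
Product = {
    ("Gizmo", "GizmoWorks", 19.99),
    ("Powergizmo", "GizmoWorks", 29.99),
    ("SingleTouch", "Canon", 149.99),
    ("MultiTouch", "Hitachi", 203.99),
}
Company = {
    ("GizmoWorks", "USA", 25),
    ("Canon", "Japan", 65),
    ("Hitachi", "Japan", 15),
}

# ExpensiveProduct(N) :- Product(N,M,P) & P > $100
expensive_product = {n for (n, m, p) in Product if p > 100}

# CanadianProduct(N) :- Product(N,M,P) & Company(M, "Canada", SP)
canadian_product = {
    n for (n, m, p) in Product
    for (c, country, sp) in Company
    if c == m and country == "Canada"
}

# IntlProd(N) :- Product(N,M,P) & NOT Company(M, "Canada", SP)
# The negated subgoal holds when no matching Company fact exists for any SP.
intl_prod = {
    n for (n, m, p) in Product
    if not any(c == m and country == "Canada" for (c, country, sp) in Company)
}
```

With these facts there is no Canadian company, so CanadianProduct is empty and every product is in IntlProd.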
Conjunctive Queries
A subset of Datalog
Only relations appear in the right hand
side of rules
No negation
Functionally equivalent to Select, Project,
Join queries
Very popular in modeling relationships
between databases
What’s the Datalog?
Selection:
σ Manufacturer = GizmoWorks (Product)

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Result
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
What’s the Datalog?
Selection + Projection:
π PName, Price, Manufacturer (σ Price > 100 (Product))

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Result
PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi
What’s the Datalog?
π PName, Price ((σ Price <= 200 (Product)) ⋈ Manufacturer = Cname (σ Country = ‘Japan’ (Company)))

Product
PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company
Cname       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

Result
PName        Price
SingleTouch  $149.99
Bonus Relational Goodness: Views
Views are relations, except that they are not physically
stored. (Materialized views are stored)
They are used mostly in order to simplify complex queries
and to define conceptually different views of the
database to different classes of users.
Used also to model relationships between databases
View: purchases of telephony products:
CREATE VIEW telephony_purchases AS
SELECT product, buyer, seller, store
FROM Purchase, Product
WHERE Purchase.product = Product.name
AND Product.category = 'telephony'
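A sketch showing that a (non-materialized) view is recomputed from its base tables each time it is queried. It uses Python's sqlite3 module; the view is named telephony_purchases because a hyphen is not valid in an unquoted SQL identifier, and the sample rows (Ann, Bob, and so on) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (name TEXT, price REAL, category TEXT, maker TEXT);
CREATE TABLE Purchase (buyer TEXT, seller TEXT, store TEXT, product TEXT);
INSERT INTO Product VALUES ('phone', 49.99, 'telephony', 'Hitachi'),
                           ('gizmo', 19.99, 'gadgets', 'GizmoWorks');
INSERT INTO Purchase VALUES ('Ann', 'Bob', 'Store1', 'phone'),
                            ('Cat', 'Bob', 'Store1', 'gizmo');
CREATE VIEW telephony_purchases AS
  SELECT product, buyer, seller, store
  FROM Purchase, Product
  WHERE Purchase.product = Product.name
    AND Product.category = 'telephony';
""")

before = conn.execute("SELECT buyer FROM telephony_purchases").fetchall()

# The view stores no rows of its own: a new base-table row shows up immediately.
conn.execute("INSERT INTO Purchase VALUES ('Dan', 'Bob', 'Store2', 'phone')")
after = conn.execute("SELECT buyer FROM telephony_purchases ORDER BY buyer").fetchall()
```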
Summarizing Relational DBs
Relational perspective: Data is stored in relations.
Relations have attributes. Data instances are
tuples.
SQL perspective: Data is stored in tables. Tables
have columns. Data instances are rows.
Query languages
Relational algebra – mathematical base for
understanding query languages
SQL – very widely used
Datalog – based on Prolog, very popular with
theoreticians
Views allow complex queries to be written simply
Relational Metadata problems
Data Integration:
Planning a Beach Vacation
[Figure: planning a beach vacation means combining several sources: beach information (Fodors, AAA), good weather (weather.com, wunderground), and cheap flights (Expedia, Orbitz).]
Data Integration System Architecture
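[Figure: a user query is posed against a mediated schema (a virtual database, e.g. with an “Airport” relation). The mediated schema is mapped to Local Schema 1 … Local Schema N, which describe Local Database 1 … Local Database N (e.g. Orbitz, Expedia).]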
Data Translation
Data exists in two different schemas. You
have data in one, and you want to put data
into the other
How are the schemas related to one another?
How do you change the data from one to
another?
Data Warehousing
Data Warehouses store vast quantities of
data for fast query processing, but only
batch updating.
Import schemas of data sources
Identify overlapping attributes, etc.
Build data cleaning scripts
Build data transformation scripts
Enable data lineage tracing
Schema Evolution and Data Migration
Schemas change over time; data must change
with them.
How do we deal with schema changes?
How can we make it easy for the data to
migrate?
How do we handle applications built on the
old schema that store in the new database?
Outline
Relational databases
Entity Relationship (ER) diagrams
Object Oriented Databases (OODBs)
XML
Other data types
Database internals (Briefly)
An extremely brief introduction to category
theory
Entity / Relationship Diagrams
Entities (e.g., Product)
Attributes (e.g., address)
Relationships between entities (e.g., buys)
Keys in E/R Diagrams
Every entity set must have a key
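[Figure: entity set Product with attributes name, price, category.]
[Figure: E/R diagram with entity sets Product (name, category, price), Company (name, stockprice), and Person (name, address, sin), and relationships makes (Company, Product), buys (Person, Product), and employs (Company, Person).]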
Multiplicity of E/R Relations
one-one
many-one
many-many
[Figure: for each multiplicity, arrows drawn between the sets {1, 2, 3} and {a, b, c, d}.]
[Figure: the E/R diagram with Product (name, category, price), Company (name, stockprice), and Person (name, address, sin), connected by makes, buys, and employs.]
What does this say?
Roles in Relationships
What if we need an entity set twice in one relationship?
[Figure: relationship Purchase connecting Product, Store, and Person; Person participates twice, in the roles buyer and salesperson.]
Attributes on Relationships
[Figure: relationship Purchase connecting Product, Person, and Store, with the attribute date attached to the relationship itself.]
Subclasses in E/R Diagrams
[Figure: Product (name, category, price) with two isa subclasses: Software Product (platforms) and Educational Product (Age Group).]
Keys in E/R Diagrams
Underline the key attributes
No formal way to specify multiple keys in E/R diagrams
[Figure: Product (name, category, price) and Person (address, name, SIN), with key attributes underlined.]
From E/R Diagrams
to Relational Schema
Entity set → relation
Relationship → relation
Entity Set to Relation
[Figure: entity set Product with attributes name, category, price.]
Product(name, category, price)
name   category  price
gizmo  gadgets   $19.99
Relationships to Relations
[Figure: Company (name, Stock price) makes Product (name, category, price); the relationship makes has the attribute Start Year.]
Makes(product-name, product-category, company-name, year)
(watch out for attribute name conflicts)
Product-name  Product-Category  Company-name  Starting-year
gizmo         gadgets           gizmoWorks    1963
Relationships to Relations
[Figure: Company (name, Stock price) makes Product (name, category, price); the relationship makes has the attribute Start Year.]
No need for Makes. Modify Product:
name   category  price  StartYear  companyName
gizmo  gadgets   19.99  1963       gizmoWorks
Multi-way Relationships to
Relations
[Figure: relationship Purchase connecting Product (name, price), Store (name, address), and Person (sin, name).]
Purchase( , , )
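The entity-set and relationship translations above can be written down as table definitions. The sketch below uses sqlite3 from Python and follows the Makes(product-name, product-category, company-name, year) slide; treating (name, category) as the key of Product is an assumption inferred from that Makes schema, the column names are adapted to valid SQL identifiers, and the stock price is left NULL because the slide gives no value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity sets become relations.
CREATE TABLE Product (name TEXT, category TEXT, price REAL,
                      PRIMARY KEY (name, category));   -- assumed key
CREATE TABLE Company (name TEXT PRIMARY KEY, stockPrice REAL);
-- The relationship becomes a relation holding the keys of both sides
-- plus its own attribute; columns are renamed to avoid name conflicts.
CREATE TABLE Makes (productName TEXT, productCategory TEXT,
                    companyName TEXT, startYear INTEGER);
INSERT INTO Product VALUES ('gizmo', 'gadgets', 19.99);
INSERT INTO Company VALUES ('gizmoWorks', NULL);
INSERT INTO Makes VALUES ('gizmo', 'gadgets', 'gizmoWorks', 1963);
""")

makers = conn.execute("""
    SELECT p.name, m.companyName, m.startYear
    FROM Product p, Makes m
    WHERE p.name = m.productName AND p.category = m.productCategory
""").fetchall()
```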
Summarizing ER diagrams
Entities, relationships, and attributes
Also has inheritance
Used to design schemas; the relational schema is
then derived from the E/R design
Metadata problems:
Mapping ER to Relational
Database design
Map ER model to SQL schema
Reverse engineer SQL schema to ER model
Metadata problems:
Round Trip Engineering
Design in ER
Implement in relational
Modify the relational schema
How do we change the ER diagram?
View integration
Define use-case scenario
Identify views for each use-case
Integrate views into a conceptual schema
CPSC 534a
Background: Part 2
Rachel Pottinger
January 18, 2005
Administrative notes
Please sign up for papers if you haven’t already (if
there’s enough time, we’ll do this at the end of class)
Remember that the first reading responses are due 9pm
Wednesday
Mail me if you can’t access WebCT
Remember the 1st homework is due beginning of class
Thursday
General theory – trying to make sure you understand basics and
have thought about it – not looking for one, true, answer.
State any assumptions you make
If you can’t figure out a detail on how to transform ER to
relational based on class discussion, write an explanation as to
what you did and why.
Any other questions?
Office hours?
Outline
Relational databases
Entity Relationship (ER) diagrams
Object Oriented Databases (OODBs)
XML
Other data types
Database internals (Briefly)
An extremely brief introduction to category
theory
Object-Oriented DBMS’s
Started late 80’s
Main idea:
Toss the relational model !
Use the OO model – e.g. C++ classes
Standards group: ODMG = Object Data
Management Group.
OQL = Object Query Language, tries to
imitate SQL in an OO framework.
The OO Plan
ODMG imagines OO-DBMS vendors
implementing an OO language like C++
with extensions (OQL) that allow the
programmer to transfer data between the
database and “host language” seamlessly.
A brief diversion: the impedance mismatch
OO Implementation Options
Build a new database from scratch (O2)
Elegant extension of SQL
Later adopted by ODMG in the OQL language
Used to help build XML query languages
Make a programming language persistent
(ObjectStore)
No query language
Niche market
ObjectStore is still around, renamed to eXcelon, and stores
XML objects now
ODL
ODL is used to define persistent classes,
those whose objects may be stored
permanently in the database.
ODL classes look like Entity sets with binary
relationships, plus methods.
ODL class definitions are part of the
extended, OO host language.
ODL – remind you of anything?
interface Person
  (extent People key sin)
{ attribute string sin;
  attribute string dept;
  attribute string name; }
interface Course
  (extent Crs key cid)
{ attribute string cid;
  attribute string cname;
  relationship Person instructor;
  relationship Set<Student> stds inverse takes; }
interface Student extends Person
  (extent Students)
{ attribute string major;
  relationship Set<Course> takes inverse stds; }
Why did OO Fail?
Why are relational databases so popular?
Very simple abstraction; don’t have to think
about programming when storing data.
Very well optimized
Relational db are very well entrenched –
not enough advantages, and no good exit
strategy…
Metadata failure
Merging Relational and OODBs
Object-oriented models support
interesting data types – not just flat files.
Maps, multimedia, etc.
The relational model supports very-high-level queries.
Object-relational databases are an
attempt to get the best of both.
All major commercial DBs today have OR
versions – full spec in SQL99, but your
mileage may vary.
Outline
Relational databases
Entity Relationship (ER) diagrams
Object Oriented Databases (OODBs)
XML
Other data types
Database internals (Briefly)
An extremely brief introduction to category
theory
XML
eXtensible Markup Language
XML 1.0 – a recommendation from W3C,
1998
Roots: SGML (from the document community; works great for them, but from a db
perspective, very nasty).
After the roots: a format for sharing data
Why XML is of Interest to Us
XML is just syntax for data
Note: we have no syntax for relational data
But XML is not relational: semistructured
This is exciting because:
Can translate any data to XML
Can ship XML over the Web (HTTP)
Can input XML into any application
Thus: data sharing and exchange on the Web
XML Data Sharing and Exchange
[Figure: applications exchange XML data over the Web (HTTP). The XML data is integrated, transformed, and warehoused, and is drawn from object-relational data, relational data, and legacy data.]
Think of all the metadata problems!
From HTML to XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML describes the content
XML Document
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
    <married/>
  </person>
</data>
Callouts: id="o555" is an attribute; each <person> … </person> is a person element; each <name> … </name> is a name element.
XML Terminology
Elements
enclosed within tags:
<person> … </person>
nested within other elements:
<person> <address> … </address> </person>
can be empty
<married></married> abbreviated as <married/>
can have Attributes
<person id="0005"> … </person>
An XML document has a single ROOT element
Buzzwords
What is XML?
W3C data exchange format
Hierarchical data model
Self-describing
Semi-structured
XML as a Tree !!
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>
The same document as a tree (element nodes, attribute nodes, and text nodes):
data
  person
    id (attribute node): o555
    name: Mary
    address
      street: Maple
      no: 345
      city: Seattle
  person
    name: John
    address: Thai
    phone: 23456
Minor Detail: Order matters !!!
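The tree view can be reproduced with Python's standard xml.etree.ElementTree parser. The outline helper below is a name made up for this sketch; it walks the tree in document order and records each element's depth, tag, attributes, and text.

```python
import xml.etree.ElementTree as ET

doc = """
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>
""".strip()

def outline(node, depth=0):
    """Return (depth, tag, attributes, text) for node and all descendants."""
    text = (node.text or "").strip()
    rows = [(depth, node.tag, dict(node.attrib), text)]
    for child in node:              # children come back in document order
        rows.extend(outline(child, depth + 1))
    return rows

rows = outline(ET.fromstring(doc))
for depth, tag, attrs, text in rows:
    print("  " * depth + tag, attrs or "", text)
```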
XML is self-describing
Schema elements become part of the data
In XML <persons>, <name>, <phone> are
part of the data, and are repeated many times
Relational schema: persons(name,phone) is
defined separately from the data and is fixed
Consequence: XML is much more flexible
Relational Data as XML
persons
name  phone
John  3634
Sue   6343
Dick  6363
XML:
[Figure: tree with root persons and three person children, each with a name and a phone: (“John”, 3634), (“Sue”, 6343), (“Dick”, 6363).]
<persons>
  <person> <name>John</name>
           <phone> 3634</phone>
  </person>
  <person> <name>Sue</name>
           <phone> 6343</phone>
  </person>
  <person> <name>Dick</name>
           <phone> 6363</phone>
  </person>
</persons>
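A minimal sketch of the same translation in Python, using the standard xml.etree.ElementTree module: each row of persons(name, phone) becomes one person element with a child element per attribute.

```python
import xml.etree.ElementTree as ET

# The persons(name, phone) relation from the slide.
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

root = ET.Element("persons")
for name, phone in rows:
    person = ET.SubElement(root, "person")     # one element per tuple
    ET.SubElement(person, "name").text = name  # one child per attribute
    ET.SubElement(person, "phone").text = phone

xml_text = ET.tostring(root, encoding="unicode")
```

Note how the attribute names, which the relational schema states once, are repeated for every tuple in the XML.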
XML is semi-structured
Missing elements:
<person> <name> John</name>
         <phone>1234</phone>
</person>
<person> <name>Joe</name>
</person>                        ← no phone !
Could represent in a table with nulls
name  phone
John  1234
Joe   -
XML is semi-structured
Repeated elements
<person> <name> Mary</name>
         <phone>2345</phone>
         <phone>3456</phone>
</person>                        ← two phones !
Impossible in tables:
name  phone
Mary  2345   3456   ???
XML is semi-structured
Elements with different types in different objects
<person> <name> <first> John </first>
                <last> Smith </last>
         </name>                  ← structured name !
         <phone>1234</phone>
</person>
Heterogeneous collections:
<persons> can contain both <person>s and
<customer>s
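One way to see what "semi-structured" costs a relational consumer is to flatten such data into (name, phone) rows. The sketch below, using Python's xml.etree.ElementTree, is one possible flattening (an assumption of this example, not the only choice): a missing phone becomes None, and a repeated phone produces one row per value.

```python
import xml.etree.ElementTree as ET

doc = """
<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</persons>
""".strip()

def person_rows(root):
    """Flatten person elements into (name, phone) pairs."""
    rows = []
    for person in root.findall("person"):
        name = person.findtext("name")
        phones = [p.text for p in person.findall("phone")]
        if not phones:
            rows.append((name, None))           # missing element -> NULL
        else:
            rows.extend((name, ph) for ph in phones)  # repeated element -> several rows
    return rows

rows = person_rows(ET.fromstring(doc))
```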
Summarizing XML
XML has first class elements and second
class attributes
XML is semi-structured
XML is nested
XML is a tree
XML is a huge buzzword
Will XML replace relational databases?
Outline
Relational databases
Entity Relationship (ER) diagrams
Object Oriented Databases (OODBs)
XML
Other data types
Database internals (Briefly)
An extremely brief introduction to category
theory
Other data formats
Makefiles
Forms
Application code
Other Metadata Applications
Message Mapping
Map messages from one format to another
Scientific data management
Merge schemas from related experiments
Manage transformations of experimental data
Track evolution of schemas and transformations
DB Application development
Map SQL schema to default form
Map business rule to SQL constraints and
form validation code
Manage dependencies between code and schemas
and forms
Outline
Relational databases
Entity Relationship (ER) diagrams
Object Oriented Databases (OODBs)
XML
Other data types
Database internals (Briefly)
An extremely brief introduction to category
theory
How SQL Gets Executed:
Query Execution Plans
Select Pname, Price
From Product, Company
Where Manufacturer = Cname
AND Price <= 200
AND Country = 'Japan'
Plan (a tree, read from the bottom up):
π Pname, Price
σ Price <= 200 ^ Country = ‘Japan’
⋈ Manufacturer = Cname
Product    Company
Query optimization also specifies the algorithms for each
operator; then queries can be executed
Overview of Query Optimization
Plan: Tree of ordered Relational Algebra operators
and choice of algorithm for each operator
Two main issues:
For a given query, what plans are considered?
Algorithm to search plan space for cheapest (estimated) plan.
How is the cost of a plan estimated?
Ideally: Want to find best plan.
Practically: Avoid worst plans.
Some tactics
Do selections early
Use materialized views
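The "do selections early" tactic can be illustrated by counting how many tuple pairs a naive nested-loop join examines under two plans for the query above. This is a toy cost model invented for the sketch, not how a real optimizer estimates cost; tuples are plain Python tuples in the column order of the example tables.

```python
# (PName, Price, Category, Manufacturer) and (Cname, StockPrice, Country)
Product = [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
]
Company = [("GizmoWorks", 25, "USA"), ("Canon", 65, "Japan"), ("Hitachi", 15, "Japan")]

def nested_loop_join(left, right, pred):
    """Join by comparing every pair; also report how many pairs were examined."""
    out, pairs = [], 0
    for l in left:
        for r in right:
            pairs += 1
            if pred(l, r):
                out.append(l + r)
    return out, pairs

def match(p, c):
    return p[3] == c[0]          # Manufacturer = Cname

# Plan 1: join everything, then select (Country is index 6 of the joined tuple).
joined, cost_late = nested_loop_join(Product, Company, match)
late = [(t[0], t[1]) for t in joined if t[1] <= 200 and t[6] == "Japan"]

# Plan 2: select on each input first, then join the smaller inputs.
cheap = [p for p in Product if p[1] <= 200]
japanese = [c for c in Company if c[2] == "Japan"]
joined_early, cost_early = nested_loop_join(cheap, japanese, match)
early = [(t[0], t[1]) for t in joined_early]
```

Both plans return the same answer, but the second examines half as many pairs even on this tiny input; the gap grows with table size.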
Query Execution
Now that we have the plan, what do we do
with it?
How do we deal with paging in data, etc.?
New research covers new paradigms
where execution is interleaved with optimization
Transactions
Address two issues:
Access by multiple users
Remember the “client-server” architecture:
one server with many clients
Protection against crashes
Transactions
Transaction = group of statements that
must be executed atomically
Transaction properties: ACID
ATOMICITY = all or nothing
CONSISTENCY = leave database in
consistent state
ISOLATION = as if it were the only
transaction in the system
DURABILITY = store on disk !
Transactions in SQL
In “ad-hoc” SQL:
Default: each statement = one transaction
In “embedded” SQL:
BEGIN TRANSACTION
[SQL statements]
COMMIT or ROLLBACK (=ABORT)
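A sketch of atomicity with Python's sqlite3 module. The Account table and the transfer function are hypothetical, made up for this example. The `with conn:` block commits if its body succeeds and rolls back if an exception escapes, so a transfer that violates the CHECK constraint leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (owner TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.execute("UPDATE Account SET balance = balance + ? WHERE owner = ?",
                         (amount, dst))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE owner = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM Account"))

too_much = transfer(conn, "alice", "bob", 150)   # debit would go negative
after_failed = balances(conn)                    # bob's credit was rolled back
ok = transfer(conn, "alice", "bob", 40)
after_ok = balances(conn)
```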
Transactions: Serializability
Serializability = the technical term for
isolation
An execution is serial if each transaction runs completely
before or completely after any other
transaction’s execution
An execution is serializable if it is equivalent
to one that is serial
DBMS can offer serializability guarantees
Serializability
Enforced with locks, like in Operating Systems !
But this is not enough:
time   User 1          User 2
 |     LOCK A
 |     [write A=1]
 |     UNLOCK A
 |     ...             LOCK A
 |     ...             [write A=3]
 |     ...             UNLOCK A
 |     ...             LOCK B
 |                     [write B=4]
 |                     UNLOCK B
 |     LOCK B
 |     [write B=2]
 v     UNLOCK B
What is wrong ?
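The problem with this schedule can be checked mechanically. The sketch below replays the writes in the interleaved order from the slide and compares the final database state with both serial orders; the tuple encoding of writes is invented for this illustration, and locking itself is not modelled because every write in the schedule was properly locked.

```python
def run(schedule):
    """Apply writes in order and return the final database state."""
    db = {}
    for user, item, value in schedule:
        db[item] = value
    return db

T1 = [("User 1", "A", 1), ("User 1", "B", 2)]
T2 = [("User 2", "A", 3), ("User 2", "B", 4)]

# Order from the slide: User 1 writes A, User 2 writes A and B, User 1 writes B.
interleaved = [T1[0], T2[0], T2[1], T1[1]]

result = run(interleaved)
serial_outcomes = [run(T1 + T2), run(T2 + T1)]
is_serializable = result in serial_outcomes
```

The interleaving ends with A written by User 2 and B written by User 1, a state that neither serial order produces: each individual write was protected by a lock, but the locks were released too early to isolate the transactions as a whole.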
Outline
Relational databases
Entity Relationship (ER) diagrams
Object Oriented Databases (OODBs)
XML
Other data types
Database internals (Briefly)
An extremely brief introduction to category
theory
Category Theory
“Category theory is a mathematical theory
that deals in an abstract way with
mathematical structures and relationships
between them. It is half-jokingly known as
‘generalized abstract nonsense’.”
[wikipedia]
There is a lot of scary category theory out
there. You only need to know a few terms.
Category Theory
Started in 1945
General mathematical theory of structures and
systems of structures.
Reveals how structures of different kinds are
related to one another, as well as the universal
components of a family of structures of a given
kind.
It is considered by many as being an alternative
to set theory as a foundation for mathematics.
It is very, very, very abstract
Category Theory Definitions
C is a graph.
There are two classes: the objects or nodes obj(C)
and the morphisms or edges or arrows mor(C).
Any morphism f ∈ mor(C) has a source and target
object, f : a → b.
For any composable pair of morphisms f : a → b and
g : b → c, there is a composition morphism (g • f) : a → c.
[Figure: triangle with f : a → b, g : b → c, and g • f : a → c.]
A functor translates objects and morphisms from one
category to another
Diagrams commute if one can follow any path through
the diagram and obtain the same result by composition
[Figure: square diagram on the objects a, b, c, d.]
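For intuition only, composition and a commuting square can be tried out in Python, taking objects to be sets of integers and morphisms to be ordinary functions. The particular functions f, g, h, k are arbitrary choices for this sketch, and checking on sample inputs illustrates commutativity rather than proving it.

```python
def compose(g, f):
    """Return the composition g • f, i.e. apply f first, then g."""
    return lambda x: g(f(x))

# A square a -> b -> d and a -> c -> d (all objects are integers here).
f = lambda n: n + 1      # f : a -> b
g = lambda n: n * 2      # g : b -> d
h = lambda n: n * 2      # h : a -> c
k = lambda n: n + 2      # k : c -> d

top_path = compose(g, f)      # 2 * (n + 1)
bottom_path = compose(k, h)   # (2 * n) + 2

# The diagram commutes if both paths from a to d give the same result.
commutes = all(top_path(n) == bottom_path(n) for n in range(-10, 11))
```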
Why you need a bit of category theory
Lots of people like to use the term
“morphism”
It’s the motivation behind a number of views –
understanding this can make reading
papers easier
If you’re theoretically minded, it can give
you a good way to think about the problem
Overall background recap
There are many different data models. We
covered:
Relational Databases
Entity-Relationship Diagrams
Object Oriented Databases
XML
Changing around schemas within data models
creates metadata problems. So does changing
schemas between data models
Databases have some (largely hidden) internal
processes; some of these will be referred to in
the papers we read
Theory can be handy to ground your reading.
Now what?
Time to read papers
Prepare paper responses – it’ll help you
focus on the paper, and allow for the
discussion leader to prepare better
discussion
You all have different backgrounds,
interests, and insights. Bring them into
class!