Databases from a category-theoretic perspective David Spivak Presented on 2008/02/07

Motivation and Basic Theory
Tables and Records
The category of tables
Queries
Concluding
Databases from a category-theoretic perspective
David Spivak
dspivak@uoregon.edu
Mathematics Department
University of Oregon
Presented on 2008/02/07
Computer and Information Science
University of Oregon
Motivation and Basic Theory
Why use category theory?
How is category theory useful?
What is a category?
How should one think of a category?
The category Sets
Why use category theory?
• Mathematics is the study of that which can be fully
understood from the outside.
• Category theory is math for mathematicians.
• Category theory is a universal language.
How is category theory useful?
• Category theory is a tool – new technology in math.
• Category theory organizes thought.
• Category theory “makes suggestions.”
• Category theory is about perspectives.
What is a category?
• A category C consists of a set of objects, Ob(C), and between
every two objects a set of arrows: if X, Y ∈ Ob(C), then
Arr(X, Y) is called the set of arrows from X to Y. An
element f ∈ Arr(X, Y) is written
f : X → Y.
• Rule 1: given f : X → Y and g : Y → Z, we have a new
arrow g ◦ f : X → Z, called the composition of f and g.
• Rule 2: every object X ∈ Ob(C) has an identity arrow
(id_X : X → X) ∈ Arr(X, X),
such that f ◦ id_X = f and id_Y ◦ f = f for all f : X → Y.
• Rule 3: associativity: h ◦ (g ◦ f) = (h ◦ g) ◦ f.
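The three rules can be checked mechanically on a small example. What follows is a minimal sketch, assuming arrows between finite sets are modeled as Python dicts; the helper names `compose` and `identity` are illustrative and do not come from the talk.

```python
def compose(g, f):
    """Return g ◦ f: apply f first, then g (arrows are dicts)."""
    return {x: g[f[x]] for x in f}

def identity(xs):
    """Identity arrow on the finite set xs."""
    return {x: x for x in xs}

# Three objects X, Y, Z and arrows f : X -> Y, g : Y -> Z, h : Z -> X.
X, Y, Z = {1, 2}, {"a", "b"}, {True, False}
f = {1: "a", 2: "b"}
g = {"a": True, "b": True}
h = {True: 1, False: 2}

# Rule 1: g ◦ f is again an arrow, here from X to Z.
gf = compose(g, f)

# Rule 2: identity arrows are neutral on both sides.
left_unit = compose(f, identity(X)) == f
right_unit = compose(identity(Y), f) == f

# Rule 3: composition is associative.
assoc = compose(h, compose(g, f)) == compose(compose(h, g), f)
```

Checking the laws on one example does not prove them in general, but it shows what each rule asserts.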
How should one think of a category?
• Most categories should be thought of as “a type of structure.”
• The objects are the instances of that type of structure,
• The arrows are the structure-preserving maps.
• The objects are viewed from the outside; we can’t “see their
internal structure.” However, the arrows must preserve the
internal structure. The arrows really define the category.
• Example: partially ordered sets.
The category Sets
• The objects of Sets are sets; the arrows in Sets are functions
between sets.
• We will denote Arr by Fun when in the category of sets.
• If X ∈ Sets is an object, how do we “view” its “elements”?
Answer: Fun({∗}, X ).
• Example: The set Strings of strings and the set N of natural
numbers are objects in Sets.
• Name an element of Fun(Strings, N).
• Name an element of Fun(N, N).
• Name an element of Fun({n1 , n2 , n3 }, Strings).
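Each exercise has many answers. Here is one possible set, sketched in Python under the assumption that `str` stands in for Strings and non-negative `int` for N; the names `successor`, `names`, and `element` are made up for illustration.

```python
# One possible answer to each exercise; many others exist.

# An element of Fun(Strings, N): the length of a string.
length = len

# An element of Fun(N, N): the successor function.
def successor(n: int) -> int:
    return n + 1

# An element of Fun({n1, n2, n3}, Strings): a finite function, written as a dict.
names = {"n1": "David", "n2": "Paea", "n3": "Zena"}

# An element x of a set X, viewed as a function {*} -> X.
def element(x):
    return lambda star: x

pick_seven = element(7)  # the arrow {*} -> N that selects 7
```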
The Cartesian isomorphism – “Currying.”
• If A, B, and C are sets, then there is a natural bijection
Fun(A × B, C) ≅ Fun(A, Fun(B, C))
• This is an example of a “change in perspective.” Being able to
easily change perspectives like this is one of the major goals of
category theory.
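The bijection can be written down directly. This is a small sketch with Python functions standing in for elements of Fun; `curry`, `uncurry`, and `repeat` are illustrative names, not notation from the talk.

```python
def curry(f):
    """Fun(A × B, C) -> Fun(A, Fun(B, C))."""
    return lambda a: lambda b: f(a, b)

def uncurry(g):
    """Fun(A, Fun(B, C)) -> Fun(A × B, C)."""
    return lambda a, b: g(a)(b)

# An example element of Fun(Strings × N, Strings).
def repeat(s: str, n: int) -> str:
    return s * n

curried = curry(repeat)         # an element of Fun(Strings, Fun(N, Strings))
round_trip = uncurry(curried)   # back in Fun(Strings × N, Strings)
```

Going there and back returns a function with the same values as the original, which is what makes the two perspectives interchangeable.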
Tables and Records
Tables
Records
Relations
From tables to relational tables
Advantages of function-based tables
Tables
• We use a function-based approach.
• In this talk we only use simple databases: only one data type,
namely Strings.
• Let R and C be sets. The set of possible tables on rows R
and columns C is defined as
Fun(R × C , Strings).
Example
• If R = {1, 2, 3}, C = {FN, LN}, then R × C is a set of six
elements.
• We can organize them into a rectangle
(1,FN)  (1,LN)
(2,FN)  (2,LN)
(3,FN)  (3,LN)
• A function from the set of these six elements to the set
Strings is a way of putting strings in the table
     FN   LN
1
2
3
Example continued...
• A table on R = {1, 2, 3} and C = {FN, LN} is a function
τ : R × C → Strings.
• We express such a function in the form of a table as follows:





τ    FN     LN
1    David  Spivak
2    Paea   LePendu
3    Zena   Ariola
τ : R × C → Strings
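The same table can be stored literally as a function on pairs. This is a minimal sketch, assuming a Python dict keyed by (row, column) plays the role of τ; the helper `render` is a hypothetical name used only to lay the values out row by row.

```python
R = [1, 2, 3]
C = ["FN", "LN"]

# The table τ as a function on pairs (row, column), stored as a dict.
tau = {
    (1, "FN"): "David", (1, "LN"): "Spivak",
    (2, "FN"): "Paea",  (2, "LN"): "LePendu",
    (3, "FN"): "Zena",  (3, "LN"): "Ariola",
}

def render(tau, rows, cols):
    """Lay the values of tau out as one list of strings per row key."""
    return [[tau[(r, c)] for c in cols] for r in rows]

grid = render(tau, R, C)
```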

Records
• Now let’s give an example of an element in Fun(C , Strings),
where again C = {FN, LN}.
• We could express our function as follows:
FN ↦ Paea; LN ↦ LePendu
• or we could just write a line in a table:
FN    LN
Paea  LePendu
• Given a set C , a function C → Strings will be called a record
(of type C ).
• The set of all records of type C will be denoted
Γ(C ) := Fun(C , Strings).
Shifting perspective on Tables
• Recall that a table on sets R and C is a function
τ : R × C → Strings,
i.e. an element τ ∈ Fun(R × C, Strings).
• Currying:
Fun(R × C, Strings) ≅ Fun(R, Fun(C, Strings))
= Fun(R, Γ(C)).
• New perspective on our example:
R = {1, 2, 3}, C = {FN, LN}.
τ    FN     LN
1    David  Spivak
2    Paea   LePendu
3    Zena   Ariola
τ : R → Γ(C).
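The change of perspective can be carried out on the example data. This is a sketch, assuming records are Python dicts from column names to strings; `curry_table` is an illustrative helper, not something defined in the talk.

```python
C = ["FN", "LN"]

# τ as an element of Fun(R × C, Strings).
tau_flat = {
    (1, "FN"): "David", (1, "LN"): "Spivak",
    (2, "FN"): "Paea",  (2, "LN"): "LePendu",
    (3, "FN"): "Zena",  (3, "LN"): "Ariola",
}

def curry_table(tau_flat, cols):
    """Turn a function on (row, column) pairs into a function row -> record."""
    rows = {r for (r, _) in tau_flat}
    return {r: {c: tau_flat[(r, c)] for c in cols} for r in rows}

# τ as an element of Fun(R, Γ(C)): each row key is sent to a whole record.
tau = curry_table(tau_flat, C)
```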

Relations
• Let C be a finite set. A relation of type C is a subset of Γ(C ).
• For example, if C = {Title, LN}, then
{(Dr., Spivak), (Dr., Ariola)}
is a subset of Γ(C).
• This can be organized in tabular form:
Title  LN
Dr.    Spivak
Dr.    Ariola
• There can be no repeated lines.
From tables to relational tables
• A subset S ⊂ Γ(C ) can be thought of as an injective function
S → Γ(C ).
• In our theory, we allow non-injective functions also.
• As a function τ : R → Γ(C ), the table
τ    Title  LN
1    Dr.    Spivak
2    Dr.    Spivak
3    Dr.    Ariola
is not injective: τ(1) = τ(2).
• So τ does not represent a relation in Γ(C ) because two tuples
are the same.
Advantages of function-based tables
• Given a table in our sense, namely a function τ : R → Γ(C ),
one can always recover the corresponding relation by taking
the image of τ .
• One cannot go back: once you identify the two instances of
“Dr. Spivak”, they cannot be separated.
• The function-based approach has many advantages.
• We keep a set of external keys, which helps with database
management.
• Functions are easier to think about and deal with than
relations are.
• The functional approach leads to a good theory: Queries of
databases return databases – no messy “multisets.”
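Passing from a table to its relation is a one-way step, as a short computation shows. This is a sketch, assuming records are dicts and a relation is a Python set of frozen records; `image` and `is_injective` are illustrative names.

```python
def image(tau):
    """The relation underlying a table: the set of records that tau hits."""
    return {frozenset(record.items()) for record in tau.values()}

def is_injective(tau):
    """A table is injective when distinct keys carry distinct records."""
    return len(image(tau)) == len(tau)

tau = {
    1: {"Title": "Dr.", "LN": "Spivak"},
    2: {"Title": "Dr.", "LN": "Spivak"},
    3: {"Title": "Dr.", "LN": "Ariola"},
}

# Keys 1 and 2 collapse to a single record in the image.
relation = image(tau)
```

The relation has two records while the table has three keys, so the keys cannot be reconstructed from the relation alone.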
The category of tables
The category of Tables
Functions g : Γ(C1 ) → Γ(C2 )
Arrows in Tables
Example of an arrow in Tables
The category of Tables
• We construct the category called Tables.
• Think of a table as an instance of a certain type of structure.
• An object in Tables is a table as above. Namely, it is a triple
(R, C , τ ), where R and C are finite sets and
τ : R → Γ(C )
is a function.
• What should the arrows in Tables be?
• Somehow, they should be data-preserving maps.
• This is a case where category theory makes a suggestion.
• We know that an object of Tables is an arrow of sets.
• What’s an arrow between two “arrows of sets”?
Arrows in the category Tables
• Given two tables (R1 , C1 , τ1 ) and (R2 , C2 , τ2 ), an arrow from
the first to the second consists of
• a function f : R1 → R2 and
• a function g : Γ(C1 ) → Γ(C2 ),
• such that the square
R1 ---τ1---> Γ(C1)
|              |
f              g
↓              ↓
R2 ---τ2---> Γ(C2)
commutes. See the “arrow between arrows”?
• So... what is all this abstract nonsense?
• We will see that it’s a way of tracking the instances (people?)
in your table. “Translation” of tables.
Functions g : Γ(C1 ) → Γ(C2 )
• Given a record of type C1 , the function g will produce a
record of type C2 .
• Let’s call these “record transformations.”
• What kinds of record transformations can we think of?
• Truncation of strings,
• Formulas,
• Projections.
• Let’s look more closely at projections.
• We’ll need to remember the definition:
Γ(C ) = Fun(C , Strings).
Projecting records
• Suppose that C1 = {Title, FN, LN} and C2 = {First, Last}.
• A map G : C2 → C1 induces a map πG : Γ(C1 ) → Γ(C2 ):
• {First, Last} --G--> {Title, FN, LN} --> Strings,
with First ↦ FN and Last ↦ LN under G, and
Title ↦ Dr., FN ↦ David, LN ↦ Spivak.
• Here, the record
Title  FN     LN
Dr.    David  Spivak
: C1 → Strings
is sent under πG to the record
First  Last
David  Spivak
: C2 → Strings
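Since a record is a function C1 → Strings, the projection πG is just precomposition with G. This is a minimal sketch, assuming records and G are both Python dicts; `project` is an illustrative name.

```python
def project(G):
    """Given G : C2 -> C1 (a dict), return πG : Γ(C1) -> Γ(C2), record ↦ record ◦ G."""
    return lambda record: {c2: record[c1] for c2, c1 in G.items()}

# G sends each new column name to the old column it reads from.
G = {"First": "FN", "Last": "LN"}
pi_G = project(G)

record = {"Title": "Dr.", "FN": "David", "LN": "Spivak"}
projected = pi_G(record)
```

Note the direction: G goes from the new columns to the old ones, while πG goes from old records to new records.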
Arrows in Tables
• Again, an arrow in Tables is a commutative square
R1 ---τ1---> Γ(C1)
|              |
f              g
↓              ↓
R2 ---τ2---> Γ(C2)
• We just examined some possibilities for g .
• The map f sends keys for the first table to keys for the
second. (Foreign keys.)
• The fact that g ◦ τ1 = τ2 ◦ f maintains the integrity of the
data.
Example of an arrow in Tables
• Here are two tables (objects in Tables):

τ1   FN     LN
1    David  Spivak
2    Paea   LePendu
: {1, 2} → Γ({FN, LN})

τ2   Last
Sp1  Spivak
Le1  LePendu
Ar2  Ariola
: {Sp1, Le1, Ar2} → Γ({Last})

{1, 2} ------------τ1-----------> Γ({FN, LN})
  |                                   |
1 ↦ Sp1; 2 ↦ Le1                 π(Last ↦ LN)
  ↓                                   ↓
{Sp1, Le1, Ar2} ----τ2----------> Γ({Last}).
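Commutativity of this square can be verified key by key. This is a sketch, assuming tables are dicts from keys to records; `is_arrow` is a hypothetical helper that tests g ◦ τ1 = τ2 ◦ f.

```python
def is_arrow(tau1, tau2, f, g):
    """Check g ◦ τ1 = τ2 ◦ f on every key of the first table."""
    return all(g(tau1[r]) == tau2[f[r]] for r in tau1)

tau1 = {1: {"FN": "David", "LN": "Spivak"},
        2: {"FN": "Paea", "LN": "LePendu"}}
tau2 = {"Sp1": {"Last": "Spivak"},
        "Le1": {"Last": "LePendu"},
        "Ar2": {"Last": "Ariola"}}

# f maps keys of the first table to keys of the second.
f = {1: "Sp1", 2: "Le1"}

# g is the projection induced by Last ↦ LN.
def g(record):
    return {"Last": record["LN"]}

commutes = is_arrow(tau1, tau2, f, g)
```

The key Ar2 is not hit by f, which is allowed: the condition only constrains keys of the first table.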
Example of a non-arrow in Tables
• Consider the tables
τ1   FN     LN
1    David  Spivak
2    Paea   LePendu
: {1, 2} → Γ({FN, LN})

τ2   Last
Sp1  Spivak
Le1  LePanda
Ar2  Ariola
: {Sp1, Le1, Ar2} → Γ({Last})
• There’s a problem with the data, so the square
{1, 2} ------------τ1-----------> Γ({FN, LN})
  |                                   |
1 ↦ Sp1; 2 ↦ Le1                 π(Last ↦ LN)
  ↓                                   ↓
{Sp1, Le1, Ar2} ----τ2----------> Γ({Last}).
does not commute!
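A key-by-key comparison pinpoints where the square fails. This is a sketch under the same modeling assumptions (tables as dicts of records); `failures` is an illustrative helper name.

```python
def failures(tau1, tau2, f, g):
    """Keys r of the first table where g(τ1(r)) differs from τ2(f(r))."""
    return [r for r in tau1 if g(tau1[r]) != tau2[f[r]]]

tau1 = {1: {"FN": "David", "LN": "Spivak"},
        2: {"FN": "Paea", "LN": "LePendu"}}
tau2 = {"Sp1": {"Last": "Spivak"},
        "Le1": {"Last": "LePanda"},   # the inconsistent entry
        "Ar2": {"Last": "Ariola"}}
f = {1: "Sp1", 2: "Le1"}

def g(record):
    return {"Last": record["LN"]}

# Going right then down from key 2 gives LePendu; down then right gives LePanda.
bad_keys = failures(tau1, tau2, f, g)
```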
Queries
Projecting tables
Join
Example
Other queries
Projecting tables
• We’ve seen that if D is a subset of C , the map G : D → C
induces a record transformation πG : Γ(C ) → Γ(D).
• Given a table τ : R → Γ(C ), we can compose with πG to get
the bottom map
R ----τ-----> Γ(C)
|               |
id_R            πG
↓               ↓
R --πG ◦ τ--> Γ(D)
• Given G : {LN} → {FN, LN} we would send
τ    FN     LN                 πG ◦ τ   LN
1    David  Spivak      ↦      1        Spivak
2    Paea   LePendu            2        LePendu
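Projecting a whole table amounts to applying πG to every record while leaving the row keys alone. This is a sketch, assuming tables are dicts of records and G is a dict from the columns of D to the columns of C; `project_table` is an illustrative name.

```python
def project_table(tau, G):
    """Compose τ : R -> Γ(C) with πG for G : D -> C; row keys are unchanged."""
    return {r: {d: record[c] for d, c in G.items()} for r, record in tau.items()}

tau = {1: {"FN": "David", "LN": "Spivak"},
       2: {"FN": "Paea", "LN": "LePendu"}}

# The inclusion {LN} -> {FN, LN}.
G = {"LN": "LN"}

projected = project_table(tau, G)
```

Because the keys survive, two rows whose projected records coincide would still remain distinct rows.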
Join
• We can join tables. (Θ-join)
• Fancy looking diagram:
[Diagram: a commutative cube relating four tables. The join
R1 ×_R R2 → Γ(C1) ×_Γ(C) Γ(C2) sits over the tables
τ1 : R1 → Γ(C1), τ2 : R2 → Γ(C2), and τ : R → Γ(C),
with π1 : Γ(C1) → Γ(C) and π2 : Γ(C2) → Γ(C).]
• The top front table is the “fiber product” of the other three
tables.
Example
• Consider the two arrows in Tables:
τ1   Title  LN               τ    LN               τ2    FN      LN
1    Dr.    Spivak   --f-->  A    Spivak   <--g--  Sp1   David   Spivak
2    Ms.    Spivak                                 Sp2   Sharon  Spivak
• The fiber product of this diagram, τ1 ×_τ τ2, is a new object in
Tables. We define the join to be this fiber product.
• The SQL notation for this join is:
SELECT *
FROM τ1, τ2
WHERE τ1.LN = τ2.LN
• What do you get?
Answer:
• The join is
τ1 ×_τ τ2   Title  FN      LN
(1,Sp1)     Dr.    David   Spivak
(1,Sp2)     Dr.    Sharon  Spivak
(2,Sp1)     Ms.    David   Spivak
(2,Sp2)     Ms.    Sharon  Spivak
• This could be an annoying outcome.
• However, because of our function-based approach, there is a
way we could be more careful.
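The four-row answer can be reproduced by computing the fiber product on keys. This is a sketch, assuming tables are dicts of records and that the record side of the fiber product can be modeled by merging dicts, a simplification that relies on the maps into Γ(C) being projections onto shared columns; `fiber_product` is an illustrative name.

```python
def fiber_product(tau1, f, tau2, g):
    """Pair up keys with f(r1) == g(r2) and merge the two records.

    Merging is safe when f and g are arrows of tables, because the two
    records then agree on the shared columns.
    """
    return {
        (r1, r2): {**tau1[r1], **tau2[r2]}
        for r1 in tau1 for r2 in tau2
        if f[r1] == g[r2]
    }

tau1 = {1: {"Title": "Dr.", "LN": "Spivak"},
        2: {"Title": "Ms.", "LN": "Spivak"}}
tau2 = {"Sp1": {"FN": "David", "LN": "Spivak"},
        "Sp2": {"FN": "Sharon", "LN": "Spivak"}}

# Every key is sent to the single key A of the middle table.
f = {1: "A", 2: "A"}
g = {"Sp1": "A", "Sp2": "A"}

join = fiber_product(tau1, f, tau2, g)
```

With only one key in the middle table, every pair of keys matches, which is why all four combinations appear.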
Lossless join
• If we join instead as follows:
τ1   Title  LN               τ′   LN               τ2    FN      LN
1    Dr.    Spivak   ----->  x    Spivak   <-----  Sp1   David   Spivak
2    Ms.    Spivak           y    Spivak           Sp2   Sharon  Spivak
by sending {1, 2} → {x, y} and {Sp1, Sp2} → {x, y} in an
appropriate way, the fiber product is much more reasonable.
• Namely, we obtain the table:
τ1 ×_τ′ τ2  Title  FN      LN
(1,Sp1)     Dr.    David   Spivak
(2,Sp2)     Ms.    Sharon  Spivak
• In other words, if we are careful with our foreign keys, we can
join tables more coherently.
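The effect of the key maps can be seen by computing the fiber product over a two-key middle table. This is a sketch: the slide does not say which keys go to x and which to y, so the assignment below (1 and Sp1 to x, 2 and Sp2 to y) is an assumption chosen to match the resulting table; `fiber_product` is an illustrative helper that merges records of matching keys.

```python
def fiber_product(tau1, f, tau2, g):
    """Pair up keys with f(r1) == g(r2) and merge the two records."""
    return {
        (r1, r2): {**tau1[r1], **tau2[r2]}
        for r1 in tau1 for r2 in tau2
        if f[r1] == g[r2]
    }

tau1 = {1: {"Title": "Dr.", "LN": "Spivak"},
        2: {"Title": "Ms.", "LN": "Spivak"}}
tau2 = {"Sp1": {"FN": "David", "LN": "Spivak"},
        "Sp2": {"FN": "Sharon", "LN": "Spivak"}}

# Assumed key maps into {x, y}: they record which rows describe the same person.
f = {1: "x", 2: "y"}
g = {"Sp1": "x", "Sp2": "y"}

join = fiber_product(tau1, f, tau2, g)
```

Both middle records read Spivak, so the data alone cannot separate the two people; the key maps carry that information.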
Other queries
• Categorically, the SELECT query is just another type of join.
• UNION and INSERT are also fairly simple:
• Given τ1 : R1 → Γ(C ) and τ2 : R2 → Γ(C ),
• there is a natural way to add them together
τ1 ∪ τ2 : R1 ∪ R2 → Γ(C ).
• Importantly, we never leave the category of tables:
“Function-based tables are closed under querying.”
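A union of two tables on the same columns is again a function from keys to records. This is a sketch, assuming tables are dicts of records; tagging each key with the table it came from is a choice made here so that equal keys from the two tables do not collide, whereas the slide simply writes R1 ∪ R2. The name `union` is illustrative.

```python
def union(tau1, tau2):
    """Combine two tables on the same columns, tagging keys by their source table."""
    combined = {(1, r): record for r, record in tau1.items()}
    combined.update({(2, r): record for r, record in tau2.items()})
    return combined

tau1 = {1: {"LN": "Spivak"}, 2: {"LN": "LePendu"}}
tau2 = {1: {"LN": "Ariola"}}

# Key 1 occurs in both tables; the tags keep the two rows apart.
both = union(tau1, tau2)
```

The result is still a table in the function-based sense, so further queries can be applied to it.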
Concluding
Wrapping Up
Crash course in Categories
Wrapping Up
• The beauty and value of abstraction.
• Ease of communication.
• We were meant to be together!
Crash course in Categories
• I will be offering a crash course in category theory in the CIS
department. This may lead to a full-blown class, depending on
interest. But I will at least be offering three lectures:
1. February 14 (Thursday): 12 - 1:30 pm.
2. February 21 (Thursday): 12 - 1:30 pm.
3. February 28 (Thursday): 12 - 1:30 pm.
• I will start from scratch and discuss the logic category, the
category of types, the category of categories, etc.
• I will take all questions; the class will be very flexible and
geared towards computer scientists.