Databases from a category-theoretic perspective

David Spivak
dspivak@uoregon.edu
Mathematics Department, University of Oregon

Presented on 2008/02/07
Computer and Information Science, University of Oregon

Outline: Motivation and Basic Theory; Tables and Records; The category of tables; Queries; Concluding

Why use category theory?
• Mathematics is the study of that which can be fully understood from the outside.
• Category theory is math for mathematicians.
• Category theory is a universal language.

How is category theory useful?
• Category theory is a tool – new technology in math.
• Category theory organizes thought.
• Category theory “makes suggestions.”
• Category theory is about perspectives.

What is a category?
• A category C consists of a set of objects, Ob(C), and between every two objects a set of arrows: if X, Y ∈ Ob(C), then Arr(X, Y) is called the set of arrows from X to Y. An element f ∈ Arr(X, Y) is written f : X → Y.
• Rule 1: given f : X → Y and g : Y → Z, we have a new arrow g ◦ f : X → Z, called the composition of f and g.
• Rule 2: every object X ∈ Ob(C) has an identity arrow (id_X : X → X) ∈ Arr(X, X), such that f ◦ id_X = f and id_Y ◦ f = f for all f : X → Y.
• Rule 3: associativity: h ◦ (g ◦ f) = (h ◦ g) ◦ f.

How should one think of a category?
• Most categories should be thought of as “a type of structure.”
• The objects are the instances of that type of structure.
• The arrows are the structure-preserving maps.
• The objects are viewed from the outside; we can’t “see their internal structure.” However, the arrows must preserve the internal structure. The arrows really define the category.
• Example: partially ordered sets.

The category Sets
• The objects of Sets are sets; the arrows in Sets are functions between sets.
• We will denote Arr by Fun when in the category of sets.
• If X ∈ Sets is an object, how do we “view” its “elements”? Answer: Fun({∗}, X).
• Example: The set Strings of strings and the set N of natural numbers are objects in Sets.
• Name an element of Fun(Strings, N).
• Name an element of Fun(N, N).
• Name an element of Fun({n1, n2, n3}, Strings).
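The three exercises above, and the rules for composition and identity, can be tried out in a few lines of Python. This is only a sketch: Python functions stand in for arrows in Sets, and every name here (compose, identity, successor, names, pick_paea) is made up for illustration rather than taken from the talk.

```python
# Illustrative only: Python functions stand in for arrows in Sets.

def compose(g, f):
    """Return the composite g ∘ f: apply f first, then g."""
    return lambda x: g(f(x))

def identity(x):
    """The identity arrow (the same code serves for every set)."""
    return x

# An element of Fun(Strings, N): the length of a string.
length = len

# An element of Fun(N, N): the successor function.
def successor(n):
    return n + 1

# An element of Fun({n1, n2, n3}, Strings), written as a dict on a finite set.
names = {"n1": "David", "n2": "Paea", "n3": "Zena"}

# An "element" of Strings viewed as a function {*} -> Strings.
def pick_paea(_star):
    return "Paea"

# Composition gives a new arrow Strings -> N.
length_plus_one = compose(successor, length)
```

Because composition is just "do one function after the other," the identity and associativity rules hold automatically here; the point of the sketch is only to make the notation concrete.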
The Cartesian isomorphism – “Currying.”
• If A, B, and C are sets, then there is a natural bijection
    Fun(A × B, C) ≅ Fun(A, Fun(B, C))
• This is an example of a “change in perspective.” Being able to easily change perspectives like this is one of the major goals of category theory.

Tables
• We use a function-based approach.
• In this talk we only use simple databases: only one data type, namely Strings.
• Let R and C be sets. The set of possible tables on rows R and columns C is defined as Fun(R × C, Strings).

Example
• If R = {1, 2, 3}, C = {FN, LN}, then R × C is a set of six elements.
• We can organize them into a rectangle
    (1,FN)  (1,LN)
    (2,FN)  (2,LN)
    (3,FN)  (3,LN)
• A function from the set of these six elements to the set Strings is a way of putting strings in the table
         FN    LN
    1
    2
    3

Example continued...
• A table on R = {1, 2, 3} and C = {FN, LN} is a function τ : R × C → Strings.
• We express such a function in the form of a table as follows:
    τ    FN     LN
    1    David  Spivak
    2    Paea   LePendu
    3    Zena   Ariola
  : R × C → Strings

Records
• Now let’s give an example of an element in Fun(C, Strings), where again C = {FN, LN}.
• We could express our function as follows: FN ↦ Paea; LN ↦ LePendu
• or we could just write a line in a table:
    FN    LN
    Paea  LePendu
• Given a set C, a function C → Strings will be called a record (of type C).
• The set of all records of type C will be denoted Γ(C) := Fun(C, Strings).

Shifting perspective on Tables
• Recall that a table on sets R and C is a function τ : R × C → Strings, i.e. an element τ ∈ Fun(R × C, Strings).
• Currying: Fun(R × C, Strings) ≅ Fun(R, Fun(C, Strings)) = Fun(R, Γ(C)).
• New perspective on our example: R = {1, 2, 3}, C = {FN, LN}.
    τ    FN     LN
    1    David  Spivak
    2    Paea   LePendu
    3    Zena   Ariola
  : R → Γ(C).

Relations
• Let C be a finite set. A relation of type C is a subset of Γ(C).
• For example, if C = {Title, LN}, then {(Dr., Spivak), (Dr., Ariola)} is a subset of Γ(C).
• This can be organized in tabular form:
    Title  LN
    Dr.    Spivak
    Dr.    Ariola
• There can be no repeated lines.
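The change of perspective from Fun(R × C, Strings) to Fun(R, Γ(C)) can be sketched in Python, using dicts as stand-ins for functions on finite sets. The helper names (curry, uncurry, records_as_relation) are hypothetical and chosen only for this illustration; the data is the example table from the slides.

```python
# Hypothetical names throughout; dicts stand in for functions on finite sets.

R = [1, 2, 3]
C = ["FN", "LN"]

# A table as an element of Fun(R × C, Strings): keys are (row, column) pairs.
tau_flat = {
    (1, "FN"): "David", (1, "LN"): "Spivak",
    (2, "FN"): "Paea",  (2, "LN"): "LePendu",
    (3, "FN"): "Zena",  (3, "LN"): "Ariola",
}

def curry(table, rows, cols):
    """Fun(R × C, Strings) -> Fun(R, Γ(C)): one record (a dict on C) per row."""
    return {r: {c: table[(r, c)] for c in cols} for r in rows}

def uncurry(table):
    """Fun(R, Γ(C)) -> Fun(R × C, Strings): the inverse direction."""
    return {(r, c): s for r, record in table.items() for c, s in record.items()}

def records_as_relation(table, cols):
    """Collect the records of a table into a set of tuples (no repeated lines)."""
    return {tuple(record[c] for c in cols) for record in table.values()}

tau = curry(tau_flat, R, C)   # tau[2] is the record FN ↦ Paea; LN ↦ LePendu
```

Applying uncurry to tau gives back tau_flat, which is the bijection on this one example. A Python set drops duplicates, so records_as_relation behaves like a relation in the sense above: repeated lines collapse to one.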
From tables to relational tables
• A subset S ⊂ Γ(C) can be thought of as an injective function S → Γ(C).
• In our theory, we allow non-injective functions also.
• As a function τ : R → Γ(C), the table
    τ    Title  LN
    1    Dr.    Spivak
    2    Dr.    Spivak
    3    Dr.    Ariola
  is not injective: τ(1) = τ(2).
• So τ does not represent a relation in Γ(C) because two tuples are the same.

Advantages of function-based tables
• Given a table in our sense, namely a function τ : R → Γ(C), one can always recover the corresponding relation by taking the image of τ.
• One cannot go back: once you identify the two instances of “Dr. Spivak”, they cannot be separated.
• The function-based approach has many advantages.
• We keep a set of external keys, which helps with database management.
• Functions are easier to think about and deal with than relations are.
• The functional approach leads to a good theory: Queries of databases return databases – no messy “multisets.”

The category of Tables
• We construct the category called Tables.
• Think of a table as an instance of a certain type of structure.
• An object in Tables is a table as above. Namely, it is a triple (R, C, τ), where R and C are finite sets and τ : R → Γ(C) is a function.
• What should the arrows in Tables be?
• Somehow, they should be data-preserving maps.
• This is a case where category theory makes a suggestion.
• We know that an object of Tables is an arrow of sets.
• What’s an arrow between two “arrows of sets”?

Arrows in the category Tables
• Given two tables (R1, C1, τ1) and (R2, C2, τ2), an arrow from the first to the second consists of
  • a function f : R1 → R2 and
  • a function g : Γ(C1) → Γ(C2),
  • such that the square
            τ1
      R1 ------→ Γ(C1)
      |            |
    f |            | g
      ↓            ↓
      R2 ------→ Γ(C2)
            τ2
    commutes. See the “arrow between arrows”?
• So... what is all this abstract nonsense?
• We will see that it’s a way of tracking the instances (people?) in your table. “Translation” of tables.

Functions g : Γ(C1) → Γ(C2)
• Given a record of type C1, the function g will produce a record of type C2.
• Let’s call these “record transformations.”
• What kinds of record transformations can we think of?
  • Truncation of strings,
  • Formulas,
  • Projections.
• Let’s look more closely at projections.
• We’ll need to remember the definition: Γ(C) = Fun(C, Strings).

Projecting records
• Suppose that C1 = {Title, FN, LN} and C2 = {First, Last}.
• A map G : C2 → C1 induces a map πG : Γ(C1) → Γ(C2):
    G : {First, Last} → {Title, FN, LN},  First ↦ FN, Last ↦ LN
• Here, the record
    Title  FN     LN
    Dr.    David  Spivak
  : C1 → Strings is sent under πG to the record
    First  Last
    David  Spivak
  : C2 → Strings

Arrows in Tables
• Again, an arrow in Tables is a commutative square
            τ1
      R1 ------→ Γ(C1)
      |            |
    f |            | g
      ↓            ↓
      R2 ------→ Γ(C2)
            τ2
• We just examined some possibilities for g.
• The map f sends keys for the first table to keys for the second. (Foreign keys.)
• The fact that g ◦ τ1 = τ2 ◦ f maintains the integrity of the data.

Example of an arrow in Tables
• Here are two tables (objects in Tables):
    τ1   FN     LN
    1    David  Spivak
    2    Paea   LePendu
  : {1, 2} → Γ({FN, LN})

    τ2   Last
    Sp1  Spivak
    Le1  LePendu
    Ar2  Ariola
  : {Sp1, Le1, Ar2} → Γ({Last})
• The arrow between them is the square
                          τ1
      {1, 2} -----------------→ Γ({FN, LN})
        |                           |
      f |                           | π(Last↦LN)
        ↓                           ↓
      {Sp1, Le1, Ar2} --------→ Γ({Last})
                          τ2
  where f is 1 ↦ Sp1; 2 ↦ Le1.

Example of a non-arrow in Tables
• Consider the tables
    τ1   FN     LN
    1    David  Spivak
    2    Paea   LePendu
  : {1, 2} → Γ({FN, LN})

    τ2   Last
    Sp1  Spivak
    Le1  LePanda
    Ar2  Ariola
  : {Sp1, Le1, Ar2} → Γ({Last})
• There’s a problem with the data, so the square
                          τ1
      {1, 2} -----------------→ Γ({FN, LN})
        |                           |
      f |                           | π(Last↦LN)
        ↓                           ↓
      {Sp1, Le1, Ar2} --------→ Γ({Last})
                          τ2
  with f given by 1 ↦ Sp1; 2 ↦ Le1, does not commute!
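The commutativity condition g ◦ τ1 = τ2 ◦ f can be checked mechanically. The following Python sketch does so for the arrow and the non-arrow above; tables are dicts from keys to records, and the helper names (projection, is_arrow, tau2_bad) are assumptions made for this illustration, not notation from the talk.

```python
def projection(G):
    """Given G : C2 -> C1 (as a dict), return π_G : Γ(C1) -> Γ(C2), record ↦ record ∘ G."""
    def pi_G(record):
        return {c2: record[c1] for c2, c1 in G.items()}
    return pi_G

def is_arrow(tau1, tau2, f, g):
    """Check that g ∘ τ1 = τ2 ∘ f on every key of the first table."""
    return all(g(tau1[r]) == tau2[f[r]] for r in tau1)

tau1 = {1: {"FN": "David", "LN": "Spivak"},
        2: {"FN": "Paea", "LN": "LePendu"}}
tau2 = {"Sp1": {"Last": "Spivak"},
        "Le1": {"Last": "LePendu"},
        "Ar2": {"Last": "Ariola"}}
# Same table with the bad entry from the non-arrow example.
tau2_bad = {**tau2, "Le1": {"Last": "LePanda"}}

f = {1: "Sp1", 2: "Le1"}          # the key map: 1 ↦ Sp1; 2 ↦ Le1
g = projection({"Last": "LN"})    # the record map π(Last↦LN)
```

With these definitions, is_arrow(tau1, tau2, f, g) holds, while is_arrow(tau1, tau2_bad, f, g) fails at key 2, where the projected record says LePendu but the second table says LePanda.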
Projecting tables
• We’ve seen that if D is a subset of C, the inclusion map G : D → C induces a record transformation πG : Γ(C) → Γ(D).
• Given a table τ : R → Γ(C), we can compose with πG to get the bottom map
              τ
      R ----------→ Γ(C)
      |               |
  idR |               | πG
      ↓               ↓
      R ----------→ Γ(D)
            πG◦τ
• Given G : {LN} → {FN, LN} we would send
    τ    FN     LN                 πG◦τ  LN
    1    David  Spivak      ↦      1     Spivak
    2    Paea   LePendu            2     LePendu

Join
• We can join tables. (Θ-join)
• Fancy looking diagram:
  [Diagram: a commutative cube built from the tables τ1 : R1 → Γ(C1), τ2 : R2 → Γ(C2), and τ : R → Γ(C), the record maps π1 : Γ(C1) → Γ(C) and π2 : Γ(C2) → Γ(C), and the table R1 ×_R R2 → Γ(C1) ×_Γ(C) Γ(C2).]
• The top front table, R1 ×_R R2 → Γ(C1) ×_Γ(C) Γ(C2), is the “fiber product” of the other three tables.

Example
• Consider the two arrows in Tables:
    τ1   Title  LN              τ    LN              τ2   FN      LN
    1    Dr.    Spivak   --f→   A    Spivak   ←g--   Sp1  David   Spivak
    2    Ms.    Spivak                               Sp2  Sharon  Spivak
• The fiber product of this diagram, τ1 ×_τ τ2, is a new object in Tables. We define the join to be this fiber product.
• The SQL notation for this join is: SELECT * FROM τ1, τ2 WHERE τ1.LN = τ2.LN
• What do you get?

Answer:
• The join is
    τ1 ×_τ τ2  Title  FN      LN
    (1,Sp1)    Dr.    David   Spivak
    (1,Sp2)    Dr.    Sharon  Spivak
    (2,Sp1)    Ms.    David   Spivak
    (2,Sp2)    Ms.    Sharon  Spivak
• This could be an annoying outcome.
• However, because of our function-based approach, there is a way we could be more careful.

Lossless join
• If we join instead as follows:
    τ1   Title  LN              τ′   LN              τ2   FN      LN
    1    Dr.    Spivak   ---→   x    Spivak   ←---   Sp1  David   Spivak
    2    Ms.    Spivak          y    Spivak          Sp2  Sharon  Spivak
  by sending {1, 2} → {x, y} and {Sp1, Sp2} → {x, y} in an appropriate way, the fiber product is much more reasonable.
• Namely, we obtain the table:
    τ1 ×_τ′ τ2  Title  FN      LN
    (1,Sp1)     Dr.    David   Spivak
    (2,Sp2)     Ms.    Sharon  Spivak
• In other words, if we are careful with our foreign keys, we can join tables more coherently.

Other queries
• Categorically, the SELECT query is just another type of join.
• UNION and INSERT are also fairly simple:
  • Given τ1 : R1 → Γ(C) and τ2 : R2 → Γ(C),
  • there is a natural way to add them together τ1 ∪ τ2 : R1 ∪ R2 → Γ(C).
• Importantly, we never leave the category of tables: “Function-based tables are closed under querying.”

Wrapping Up
• The beauty and value of abstraction.
• Ease of communication.
• We were meant to be together!

Crash course in Categories
• I will be offering a crash course in category theory in the CIS department. This may lead to a full-blown class, depending on interest.
  But I will at least be offering three lectures:
    1. February 14 (Thursday): 12 - 1:30 pm.
    2. February 21 (Thursday): 12 - 1:30 pm.
    3. February 28 (Thursday): 12 - 1:30 pm.
• I will start from scratch and discuss the logic category, the category of types, the category of categories, etc.
• I will take all questions; the class will be very flexible and geared towards computer scientists.
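Returning to the join slides: the difference between the four-row join and the two-row "lossless" join comes down to the key maps into the middle table, and a short Python sketch makes that visible. The function fiber_join is a hypothetical helper written for this illustration. It assumes the two joined records agree on any shared column (as they do when both squares commute) and merges them into one record, which is a simplification of the fiber product Γ(C1) ×_Γ(C) Γ(C2). The key maps 1 ↦ x, 2 ↦ y, Sp1 ↦ x, Sp2 ↦ y are one "appropriate" choice, picked here so that the result matches the table on the slide.

```python
def fiber_join(tau1, tau2, f, g):
    """Join τ1 and τ2 over a shared table: keep key pairs (r1, r2) with f[r1] == g[r2].

    Assumes the two records agree on any column they share, so merging the
    dicts loses nothing.
    """
    return {
        (r1, r2): {**rec1, **rec2}
        for r1, rec1 in tau1.items()
        for r2, rec2 in tau2.items()
        if f[r1] == g[r2]
    }

tau1 = {1: {"Title": "Dr.", "LN": "Spivak"},
        2: {"Title": "Ms.", "LN": "Spivak"}}
tau2 = {"Sp1": {"FN": "David", "LN": "Spivak"},
        "Sp2": {"FN": "Sharon", "LN": "Spivak"}}

# Every key maps to the single key A of τ: all four pairs survive.
coarse = fiber_join(tau1, tau2, {1: "A", 2: "A"}, {"Sp1": "A", "Sp2": "A"})

# Keys x and y of τ′ keep the two people apart: only two pairs survive.
careful = fiber_join(tau1, tau2, {1: "x", 2: "y"}, {"Sp1": "x", "Sp2": "y"})
```

Here coarse has the keys (1,Sp1), (1,Sp2), (2,Sp1), (2,Sp2), while careful has only (1,Sp1) and (2,Sp2). The records are the same in both cases; only the bookkeeping of keys changed.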