Chapter 2: Relational Algebra -- Introduction to database principles MaoyingWu (bi203.sjtu@gmail.com) March 6, 2013 Relation terminology Jargon (专业术语) Vulgo (俗称) Relation name (关系名) Table name (表名) Schema (关系模式) Table header (表头/表描述) Relation (关系) 2-d table (二维表) Tuple (元组) Row (行) Attribute (属性) Column (列) Attribute name (属性名) Column name (列名) Attribute value (属性值) Column value (列值) Domain (域) Column type (列类型) Learning strategies Know how => know why 适当的不求甚解 After-class assignment 1 Install MySQL on your own computer Linux: from source package or LAMP Windows: from rebuilt package or XAMP Learn some basic knowledge on HTML/PHP and B/S development Review of C programming skills DML (数据操作语言) Data Manipulation Language Language for accessing and manipulating the data organized by appropriate data model aka, query language (查询语言) Two classes of languages procedural (过程性): specifies what data is required and how to get these data declarative (声明性): specifies what data is required without knowing how to get these data SQL: most widely used query language DDL (数据定义语言) Data-defintion Language Specified notation for defining the database schema Example: create table account( account_no char(10), balance integer) DDL compiler generates a set of tables stored in a data dictionary (数据字典) Data dictionary contains metadata (元数据) Database schema Data storage and definition language specifies the storage structure and access methods (存储结构和访问方法) Integrity constraints domain constraints (类型约束) referential integrity (参考完整性) assertions (声明约束) Authorization (权限) SQL (结构化查询语言) Structural Query Language Principal language to describe and manipulate relational databases SQL-99 standard Two aspects: Data-Definition language (DDL) for declaring database schemas Data-Manipulation language (DML) for querying and modifying SQL: 3 kinds of relations tables (表):relations stored in the database allowing modification as well queries views (视图): relations defined by a computation, constructed when needed temporary tables (临时表): constructed by SQL processor when performing queries. Relational Algebra -- Operations on relations, construct new relations from old ones Four classes of operations set operations: union, intersection, and difference removing: “selection” eliminates some rows (tuples), and “projection” eliminates some columns relation combination: Cartesian product, join renaming: changing the relation schema, i.e., the names of the attributes and/or the name of relation itself. Outlines Procedural language (过程性语言) Six basic operators (基本操作) Select: σ (选择) Project: Π (投影) Union: ᴜ (并) Set difference: - (差) Cartesian product: × (笛卡尔积) rename: ρ (重命名) The operators take one or two relations as inputs and produce a new relation as a result Select Operator (选择) Relation r σ (A=B)Ʌ(D>5) (r) select operator Notation: σ p (r) p is called the selection predicate (选择谓词) Defined as p (r ) t | t r and p( t ) p is a formula in propositional calculus (命题演算) consisting of terms connected by: ˄(and), ˅(or), ¬(not) Each term can be one of <attribute> op (<attribute> or <constant>) where op can be: =, ≠, ≤, ≥, >, < Example of selection: σ name=“Charlie” (account) Project operator: example Relation r: ΠA,C(r): Project operator (投影) Notation: ΠA1,A2,…, Ak(r) A1, …, Ak are the attribute names and r is a relation name The result is defined as the relation of k columns obtained by erasing the columns that are not selected Duplicate rows are removed from result, since relation are sets Example: To eliminate the sname attribute of relation exam: Πsid, score(exam) union operator: example relation r, s: r ᴜ s: union operator Notation: r ᴜ s Defined as: r s {t | t r or t s} For union operation to be valid, r, s must have the same arity (same number of attributes) the attribute domains must be compatible (each column of the two operands should have the same data type) Example: find all students with A or Not pass set difference operator Relation r, s r–s set difference operator Notation: r – s Defined as: r s {t | t r and t s} Set differences must be taken between compatible relations r and s must have the same arity attribute domains of r and s must be compatible Cartesian product - example Relation r, s: r × s: Cartesian product Notation: r ×s Defined as: r s {t q | t r and q s} Assume that attributes of r(R) and s(S) are disjoint. That is, R ɅS=ø If attributes of r(R) and s(S) are not disjoint, then rename must be used. combination of operations a single expression can contain multiple operations Example: r×s σA=C(r × s) Renaming operator Renaming operator allows us to name, and therefore to refer to the results of relational-algebra expression (关系代数表达式) Allows us to refer to a relation by more than one name. Example: ρX(E) returns the expression E under the name X If a relational algebra expression E has arity n, then ρX(A1,A2,…, An)(E) returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …, An Banking example branch(branch_name, branch_city, assets) customer(customer_name, customer_street, customer_city) account(account_number, branch_name, balance) loan(loan_number, branch_name, amount) depositor(customer_name, account_number) borrower(customer_name, loan_number) Examples Find loans over $1200: σamount>1200(loan) Find the loan numbers for loans with amount over 1200: o Πloan_number(σamount>1200(loan)) Find the customer names for all who have a loan, an account, or both, from the bank: Πcustomer_name(borrower) ᴜ Πcustomer_name(depositor) Example queries Find the names of all customers who have a loan at the Shanghai branch: Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_numbe r=loan.loan_number(borrower×loan))) Find the names of all customers who have a loan at the Shanghai branch but do not have an account at any branch of the bank: Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_number=loan.loan_number(borr ower×loan))) - Πcustomer_name(depositor) Two different strategies find the names of all customers who have a loan at the Shanghai branch (1) Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_number=l oan.loan_number(borrower×loan))) (2) Πcustomer_name(σborrower.loan_number=loan.loan_number(σbranch_na me=“Shanghai”(loan))×borrower)) Example queries find the largest account balance Strategy: (1) Find those balances that are not the largest (a) rename account as d so that we can compare each account balance with all others (2) use set difference to find those account balances that were not found in the previous step Πbalance(account) – Πaccount.balance(σaccount.balance < d.balance(account ×ρd(account))) Formal definition A basic expression in the relational algebra consists of either one of the following: A relation in the database A constant relation Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions: E1∪E2 (并) E1 – E2 (差) E1 x E2 (笛卡尔积) σp(E1), p is a predicate on attributes in E1 (选择) Πs(E1), s is a list of some of the attributes in E1 (投影) ρx(E1), X is the new name for the result of E1 (更名) Additional operations Set intersection (交) Natural join (自然连接) Division (除) Assignment (赋值) We must keep in mind that additional operations do NOT add anything to relational algebra, but that simplify some common queries set intersection - example Relation r, s r∩s: Set intersection Notation: r∩s Defined as: r s {t | t r and r s} Assume: r, s have the same arity attributes of r and s are compatible Note: r∩s = r – (r-s) Natural join Notation: r s Let r and s be relations on schemas R and S, respectively. Then obtained s is a relation on schema Rr ∪S as follows: Consider each pair of tuples tr from r and ts from s. If tr and ts have the same value on each of the attributes in R∩S, add a tuple t to the result, where t has the same value as tr on r and t has the same value as ts on s Example: R= (A, B, C, D) S= (E, B, D) Result schema = (A, B, C, D, E) Natural join - example Relation r, s: r :s Division operator (除法) Notation: r ÷s Suitable for queries that include the phrase “for all” Let r and s be relations on schemas R and S, where R = (A1, …, Am, B1, …, Bn) S = (B1, …, Bn) The result of r ÷s is a relation on schema R – S = (A1, A2, …, Am) r s {t | t R S (r ) u s(tu r)} where tu means the concatenation of tuples t and u to produce a single tuple Division example 1 r ÷s: Division example 2 r ÷s: Assignment operator The assignment operator (←) provides a convenient way to expression complex queries write query as a sequential program consisting of a series of assignments followed by an expression whose value is displayed as a result of the query Assignment must always be made to a temporary relation variable Example: write r÷s as temp1 ← ΠR-S(r) temp2 ← ΠR-S((temp1×s) - ΠR-S,S(r) result = temp1 – temp2