Introduction to Database Principles http://cbb.sjtu.edu.cn/course

advertisement
Chapter 2: Relational Algebra
-- Introduction to database principles
MaoyingWu (bi203.sjtu@gmail.com)
March 6, 2013
Relation terminology
Jargon (专业术语)
Vulgo (俗称)
Relation name (关系名)
Table name (表名)
Schema (关系模式)
Table header (表头/表描述)
Relation (关系)
2-d table (二维表)
Tuple (元组)
Row (行)
Attribute (属性)
Column (列)
Attribute name (属性名)
Column name (列名)
Attribute value (属性值)
Column value (列值)
Domain (域)
Column type (列类型)
Learning strategies
 Know how => know why
 适当的不求甚解
After-class assignment 1
 Install MySQL on your own computer
 Linux: from source package or LAMP
 Windows: from rebuilt package or XAMP
 Learn some basic knowledge on HTML/PHP and B/S
development
 Review of C programming skills
DML (数据操作语言)
Data Manipulation Language
 Language for accessing and manipulating the data organized
by appropriate data model
 aka, query language (查询语言)
 Two classes of languages
 procedural (过程性): specifies what data is required and how to get
these data
 declarative (声明性): specifies what data is required without
knowing how to get these data
 SQL: most widely used query language
DDL (数据定义语言)
Data-defintion Language
 Specified notation for defining the database schema
 Example:
create table account(
account_no
char(10),
balance integer)
 DDL compiler generates a set of tables stored in a data dictionary (数据字典)
 Data dictionary contains metadata (元数据)
 Database schema
 Data storage and definition language
 specifies the storage structure and access methods (存储结构和访问方法)
 Integrity constraints
 domain constraints (类型约束)
 referential integrity (参考完整性)
 assertions (声明约束)
 Authorization (权限)
SQL (结构化查询语言)
Structural Query Language
 Principal language to describe and manipulate relational
databases
 SQL-99 standard
 Two aspects:
 Data-Definition language (DDL) for declaring database schemas
 Data-Manipulation language (DML) for querying and modifying
SQL: 3 kinds of relations
 tables (表):relations stored in the database allowing
modification as well queries
 views (视图): relations defined by a computation,
constructed when needed
 temporary tables (临时表): constructed by SQL
processor when performing queries.
Relational Algebra
-- Operations on relations, construct new relations from old
ones
Four classes of operations
 set operations: union, intersection, and difference
 removing: “selection” eliminates some rows (tuples), and
“projection” eliminates some columns
 relation combination: Cartesian product, join
 renaming: changing the relation schema, i.e., the names of
the attributes and/or the name of relation itself.
Outlines
 Procedural language (过程性语言)
 Six basic operators (基本操作)
 Select: σ (选择)
 Project: Π (投影)
 Union: ᴜ (并)
 Set difference: - (差)
 Cartesian product: × (笛卡尔积)
 rename: ρ (重命名)
 The operators take one or two relations as inputs and
produce a new relation as a result
Select Operator (选择)
 Relation r
 σ
(A=B)Ʌ(D>5)
(r)
select operator
 Notation: σ p (r)
 p is called the selection predicate (选择谓词)
 Defined as
 p (r )  t | t  r and p( t )
 p is a formula in propositional calculus (命题演算) consisting of
terms connected by: ˄(and), ˅(or), ¬(not)
 Each term can be one of
 <attribute> op (<attribute> or <constant>)
 where op can be: =, ≠, ≤, ≥, >, <
 Example of selection:
 σ
name=“Charlie” (account)
Project operator: example
 Relation r:
 ΠA,C(r):
Project operator (投影)
 Notation: ΠA1,A2,…, Ak(r)
 A1, …, Ak are the attribute names and r is a relation
name
 The result is defined as the relation of k columns
obtained by erasing the columns that are not selected
 Duplicate rows are removed from result, since relation
are sets
 Example: To eliminate the sname attribute of relation
exam:
 Πsid, score(exam)
union operator: example
 relation r, s:
 r ᴜ s:
union operator
 Notation: r ᴜ s
 Defined as:
r  s  {t | t  r or t  s}
 For union operation to be valid,
 r, s must have the same arity (same number of attributes)
 the attribute domains must be compatible (each column of
the two operands should have the same data type)
 Example: find all students with A or Not pass
set difference operator
 Relation r, s
 r–s
set difference operator
 Notation: r – s
 Defined as:
r  s  {t | t  r and t  s}
 Set differences must be taken between compatible relations
 r and s must have the same arity
 attribute domains of r and s must be compatible
Cartesian product - example
 Relation r, s:
 r × s:
Cartesian product
 Notation: r ×s
 Defined as:
r  s  {t q | t  r and q  s}
 Assume that attributes of r(R) and s(S) are disjoint. That is, R
ɅS=ø
 If attributes of r(R) and s(S) are not disjoint, then
rename must be used.
combination of operations
 a single expression can contain multiple operations
 Example:
r×s
 σA=C(r × s)
Renaming operator
 Renaming operator allows us to name, and therefore to refer to
the results of relational-algebra expression (关系代数表达式)
 Allows us to refer to a relation by more than one name.
 Example:
 ρX(E) returns the expression E under the name X
 If a relational algebra expression E has arity n, then
ρX(A1,A2,…, An)(E)
returns the result of expression E under the name X, and with the
attributes renamed to A1, A2, …, An
Banking example
 branch(branch_name, branch_city, assets)
 customer(customer_name, customer_street, customer_city)
 account(account_number, branch_name, balance)
 loan(loan_number, branch_name, amount)
 depositor(customer_name, account_number)
 borrower(customer_name, loan_number)
Examples
 Find loans over $1200:
 σamount>1200(loan)
 Find the loan numbers for loans with amount over 1200:
o Πloan_number(σamount>1200(loan))
 Find the customer names for all who have a loan, an account,
or both, from the bank:
 Πcustomer_name(borrower) ᴜ Πcustomer_name(depositor)
Example queries
 Find the names of all customers who have a loan at the
Shanghai branch:
Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_numbe
r=loan.loan_number(borrower×loan)))
 Find the names of all customers who have a loan at the
Shanghai branch but do not have an account at any branch of
the bank:
Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_number=loan.loan_number(borr
ower×loan))) - Πcustomer_name(depositor)
Two different strategies
 find the names of all customers who have a loan at the
Shanghai branch
(1)
Πcustomer_name(σbranch_name=“Shanghai”(σborrower.loan_number=l
oan.loan_number(borrower×loan)))
(2)
Πcustomer_name(σborrower.loan_number=loan.loan_number(σbranch_na
me=“Shanghai”(loan))×borrower))
Example queries
 find the largest account balance
 Strategy:
(1) Find those balances that are not the largest
(a) rename account as d so that we can compare each account
balance with all others
(2) use set difference to find those account balances that were
not found in the previous step
Πbalance(account) – Πaccount.balance(σaccount.balance <
d.balance(account ×ρd(account)))
Formal definition
 A basic expression in the relational algebra consists of either one of the
following:
 A relation in the database
 A constant relation
 Let E1 and E2 be relational-algebra expressions; the following are all






relational-algebra expressions:
E1∪E2 (并)
E1 – E2 (差)
E1 x E2 (笛卡尔积)
σp(E1), p is a predicate on attributes in E1 (选择)
Πs(E1), s is a list of some of the attributes in E1 (投影)
ρx(E1), X is the new name for the result of E1 (更名)
Additional operations
 Set intersection (交)
 Natural join (自然连接)
 Division (除)
 Assignment (赋值)
 We must keep in mind that additional operations do NOT
add anything to relational algebra, but that simplify some
common queries
set intersection - example
 Relation r, s
 r∩s:
Set intersection
 Notation: r∩s
 Defined as:
r  s  {t | t  r and r  s}
 Assume:
 r, s have the same arity
 attributes of r and s are compatible
 Note: r∩s = r – (r-s)
Natural join
 Notation:
r  s
 Let r and s be relations on schemas R and S, respectively. Then
obtained
s
is a relation on schema Rr ∪S
as follows:
 Consider each pair of tuples tr from r and ts from s.
 If tr and ts have the same value on each of the attributes in R∩S, add a
tuple t to the result, where t has the same value as tr on r and t has
the same value as ts on s
 Example:
 R= (A, B, C, D)
 S= (E, B, D)
 Result schema = (A, B, C, D, E)
Natural join - example
 Relation r, s:

r  :s
Division operator (除法)
 Notation: r ÷s
 Suitable for queries that include the phrase “for all”
 Let r and s be relations on schemas R and S,
where
 R = (A1, …, Am, B1, …, Bn)
 S = (B1, …, Bn)
 The result of r ÷s is a relation on schema R – S =
(A1, A2, …, Am)
r  s  {t | t   R S (r )  u  s(tu  r)}
 where tu means the concatenation of tuples t and u to
produce a single tuple
Division example 1
 r ÷s:
Division example 2
 r ÷s:
Assignment operator
 The assignment operator (←) provides a convenient way to
expression complex queries
 write query as a sequential program consisting of a series of
assignments
 followed by an expression whose value is displayed as a result of
the query
 Assignment must always be made to a temporary relation
variable
 Example: write r÷s as
 temp1 ← ΠR-S(r)
 temp2 ← ΠR-S((temp1×s) - ΠR-S,S(r)
 result = temp1 – temp2
Download