Lecture 22
Databases
Numeric & Symbolic Computing
(S&G, §§11.3–11.4)
4/1/04    CS 100 - Lecture 22

Read S&G ch. 12 (Computer Networks) for next week

Data Organization
• A database is a collection of related files
  – analogy: all the file cabinets in a business
• A file is a collection of related records
  – analogy: all the folders in one drawer (holding, say, the personnel records)
• A record is composed of fields
  – analogy: the folder for a particular employee, containing, for example, their name, employment history, pay rate, insurance information, and evaluations

Example File “Employee”
ID   Name               Age  PayRate  Hours  Pay
86   Janet Kay          51   16.50    94     1560.40
123  Francine Perreira  18   8.50     185    1572.50
149  Fred Takasano      43   12.35    250    3087.50
71   John Kay           53   17.80    245    4361.00
165  Butch Honou        17   6.70     53     355.10
• Field: a single column entry (e.g., Age)
• Record: one row (all the fields for one employee)

How is this different from a spreadsheet?
• Databases are typically oriented toward very large amounts of data
  – think of IRS databases, Wal-Mart employee & inventory databases
• Therefore efficiency is critical:
  – efficiency of data storage
  – efficiency of retrieval
• The data in a database is usually static
  – updated manually, not automatically

Relational Database Model
• A file is viewed as a table
• Each table contains information about a number of instances of some entity
  – an entity is a fundamental distinguishable object, such as “employee”
• Each instance of the entity is represented by a tuple
  – e.g., the data for a particular employee
• Each tuple has a number of attributes
  – which characterize the instance (e.g., a particular employee’s attributes)
• Primary key: attribute(s) that uniquely identify a tuple

A Table for the “Employee” Entity
ID   Name               Age  PayRate  Hours  Pay
86   Janet Kay          51   16.50    94     1560.40
123  Francine Perreira  18   8.50     185    1572.50
149  Fred Takasano      43   12.35    250    3087.50
71   John Kay           53   17.80    245    4361.00
165  Butch Honou        17   6.70     53     355.10
• Primary Key: the ID column
• Tuple: one row
• Attribute: one column

Query Languages
• A query language allows users to:
  – retrieve information from a database
  – relate information in different files in a database
  – update information in a database
  – perform statistical and other data processing operations on selected information
• SQL (Structured Query Language)
  – a standard query language
  – a textual language
  – sometimes used behind a graphical “front end”

Example Query
>SELECT ID, NAME, AGE, PAYRATE, HOURS, PAY
>FROM EMPLOYEE
>WHERE ID = 123;
123  Francine Perreira  18  $8.50  185  $1572.50
>

Example Query (2)
>SELECT ID, NAME, AGE, PAYRATE, HOURS, PAY
>FROM EMPLOYEE
>WHERE NAME = 'John Kay';
71  John Kay  53  $17.80  245  $4361.00
>
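The first two example queries can be tried out with a short script. This is only a sketch under assumptions the slides do not make: it uses SQLite through Python's standard `sqlite3` module (the slides name no particular database system), the column types are guesses, and `make_db` is a hypothetical helper name. The rows are copied from the “Employee” table.

```python
import sqlite3

# Rows copied from the "Employee" table in the slides.
EMPLOYEES = [
    (86, "Janet Kay", 51, 16.50, 94, 1560.40),
    (123, "Francine Perreira", 18, 8.50, 185, 1572.50),
    (149, "Fred Takasano", 43, 12.35, 250, 3087.50),
    (71, "John Kay", 53, 17.80, 245, 4361.00),
    (165, "Butch Honou", 17, 6.70, 53, 355.10),
]


def make_db():
    """Build an in-memory database holding the EMPLOYEE table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the slides only give names and values.
    conn.execute(
        "CREATE TABLE EMPLOYEE ("
        "ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, "
        "PAYRATE REAL, HOURS INTEGER, PAY REAL)"
    )
    conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?)", EMPLOYEES)
    return conn


conn = make_db()

# Lookup by primary key: at most one tuple can match.
by_id = conn.execute(
    "SELECT ID, NAME, AGE, PAYRATE, HOURS, PAY FROM EMPLOYEE WHERE ID = ?",
    (123,),
).fetchall()

# Lookup by a non-key attribute: in general several tuples could match.
by_name = conn.execute(
    "SELECT ID, NAME, AGE, PAYRATE, HOURS, PAY FROM EMPLOYEE WHERE NAME = ?",
    ("John Kay",),
).fetchall()

print(by_id)
print(by_name)
```

Each result is a list of tuples, one per matching row, which mirrors the relational model's view of a table as a set of tuples.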
Example Query (3)
>SELECT NAME, PAY
>FROM EMPLOYEE
>WHERE NAME = 'John Kay';
John Kay  $4361.00
>

Example Query (4)
>SELECT *
>FROM EMPLOYEE
>ORDER BY PAYRATE;
ID   Name               Age  PayRate  Hours  Pay
165  Butch Honou        17   $6.70    53     $355.10
123  Francine Perreira  18   $8.50    185    $1572.50
149  Fred Takasano      43   $12.35   250    $3087.50
86   Janet Kay          51   $16.50   94     $1560.40
71   John Kay           53   $17.80   245    $4361.00

Example Query (5)
>SELECT *
>FROM EMPLOYEE
>WHERE AGE > 21;
ID   Name           Age  PayRate  Hours  Pay
86   Janet Kay      51   $16.50   94     $1560.40
149  Fred Takasano  43   $12.35   250    $3087.50
71   John Kay       53   $17.80   245    $4361.00

Modifying Databases
• DELETE FROM EMPLOYEE WHERE AGE < 21;
• UPDATE EMPLOYEE SET PAYRATE = 8.75 WHERE ID = 123;
• INSERT INTO EMPLOYEE (ID, NAME, PAYRATE, HOURS, PAY) VALUES (456, 'Sandy Beech', 13.25, 0, 0);
  – the columns are named because only five values are given for the six-column table (no Age is supplied)

Another Table
InsuredID  PlanType  DateIssued
86         A4        02/23/78
123        B2        12/03/91
149        A1        06/11/85
71         A4        10/01/72
149        B2        04/23/90
• Primary key: InsuredID alone is not unique (149 appears twice), so the key must be composite, e.g., InsuredID + PlanType

Foreign Key
• The “InsuredID” attribute is a foreign key because it refers to the primary key of a different table (EMPLOYEE)
• Foreign keys establish relationships between tables
• E.g., between the employee (with all his/her attributes) and the insurance plan (with all its attributes)

Example Query of Joined Tables
>SELECT EMPLOYEE.NAME, INSURANCE.PLANTYPE
>FROM EMPLOYEE, INSURANCE
>WHERE EMPLOYEE.NAME = 'Fred Takasano'
>  AND EMPLOYEE.ID = INSURANCE.INSUREDID;
NAME           PLANTYPE
Fred Takasano  A1
Fred Takasano  B2
>

Computer Science Issues
• SQL is a very high-level language
  – nonprocedural
  – problem-specific
• Performance is a major issue
• Consistency issues with simultaneous updates
• Distributed databases (files stored in many locations)
  – access time & consistency problems
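The join and the modification statements above can be sketched the same way. Again this is a sketch resting on assumptions: SQLite via Python's `sqlite3` module, guessed column types, a composite primary key on (INSUREDID, PLANTYPE), and a made-up insurance row for a nonexistent employee 999 that exists only to show a foreign key being enforced. The INSERT names its columns explicitly because the slide's statement supplies five values for a six-column table, so AGE is left empty (NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to (an SQLite-specific detail).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER,
                       PAYRATE REAL, HOURS INTEGER, PAY REAL);
-- Assumed schema: INSUREDID is a foreign key into EMPLOYEE, and the
-- primary key is the pair (INSUREDID, PLANTYPE).
CREATE TABLE INSURANCE (INSUREDID INTEGER REFERENCES EMPLOYEE(ID),
                        PLANTYPE TEXT, DATEISSUED TEXT,
                        PRIMARY KEY (INSUREDID, PLANTYPE));
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?, ?)", [
    (86, "Janet Kay", 51, 16.50, 94, 1560.40),
    (123, "Francine Perreira", 18, 8.50, 185, 1572.50),
    (149, "Fred Takasano", 43, 12.35, 250, 3087.50),
    (71, "John Kay", 53, 17.80, 245, 4361.00),
    (165, "Butch Honou", 17, 6.70, 53, 355.10),
])
conn.executemany("INSERT INTO INSURANCE VALUES (?, ?, ?)", [
    (86, "A4", "02/23/78"), (123, "B2", "12/03/91"), (149, "A1", "06/11/85"),
    (71, "A4", "10/01/72"), (149, "B2", "04/23/90"),
])

# Join: match each employee tuple with the insurance tuples whose
# foreign key equals the employee's primary key.
plans = conn.execute(
    "SELECT EMPLOYEE.NAME, INSURANCE.PLANTYPE "
    "FROM EMPLOYEE, INSURANCE "
    "WHERE EMPLOYEE.NAME = ? AND EMPLOYEE.ID = INSURANCE.INSUREDID "
    "ORDER BY INSURANCE.PLANTYPE",
    ("Fred Takasano",),
).fetchall()

# The foreign key rejects a plan for an employee who does not exist
# (ID 999 and the date are hypothetical values for this demonstration).
try:
    conn.execute("INSERT INTO INSURANCE VALUES (999, 'A1', '01/01/04')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# UPDATE and INSERT as on the "Modifying Databases" slide.
conn.execute("UPDATE EMPLOYEE SET PAYRATE = 8.75 WHERE ID = 123")
new_rate = conn.execute("SELECT PAYRATE FROM EMPLOYEE WHERE ID = 123").fetchone()[0]
conn.execute(
    "INSERT INTO EMPLOYEE (ID, NAME, PAYRATE, HOURS, PAY) "
    "VALUES (456, 'Sandy Beech', 13.25, 0, 0)"
)
sandy_age = conn.execute("SELECT AGE FROM EMPLOYEE WHERE ID = 456").fetchone()[0]

print(plans, rejected, new_rate, sandy_age)
```

The query says only which rows are wanted, not how to find them; choosing an access strategy is left to the database system, which is what “nonprocedural” means here.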
Numeric and Symbolic Computing

Numeric Computation
• Applications that make heavy use of real arithmetic
• Especially used in science, engineering, economics, statistics, animation
• The motivation for the first computers
• Still drives the development of supercomputers and parallel computers
  – a teraflop machine performs at least 10^12 (a trillion) floating-point operations per second
  – 36 Tflops already achieved (Japan’s Earth Simulator, which cost $350–500M)

Computer Science Issues
• Performance:
  – better algorithms
  – accessing of data in memory hierarchies
  – parallel computation
  – data communication in networks
• Mathematical software libraries
• Accuracy and stability of numerical approximations

Symbolic Computing
• Manipulate mathematical formulas, equations, etc. much the way a mathematician would
  – automate processes that are mechanical, tedious, and error-prone
• Examples: Macsyma, Mathematica, Maple, MATLAB

Example: Simplification
$(x-1)^2 + (x+2) + (2x-3)^2 + x$
• Simplify[(x-1)^2 + (x+2) + (2x-3)^2 + x]
• 12 - 12x + 5x^2

Example: Expansion
$(1 + x + 3y)^4$
• Expand[(1 + x + 3y)^4]
• 1 + 4x + 6x^2 + 4x^3 + x^4 + 12y + 36xy + 36x^2y + 12x^3y + 54y^2 + 108xy^2 + 54x^2y^2 + 108y^3 + 108xy^3 + 81y^4

Example: Solving Equations
$2x + y = 11$
$6x - 2y = 8$
• Solve[{2x + y == 11, 6x - 2y == 8}, {x, y}]
• {{x -> 3, y -> 5}}

Typical Expansion Rules
Expand[X × (Y + Z)] ⇒ X × Y + X × Z
Expand[(X + Y) × Z] ⇒ X × Z + Y × Z
Expand[X^2] ⇒ X × X
Hence,
Expand[(n + 1)^2]
⇒ Expand[(n + 1)(n + 1)]
⇒ Expand[(n + 1)n + (n + 1)1]
⇒ Expand[n×n + 1×n + n×1 + 1×1]
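The three expansion rules above can be mimicked in a few lines of code. This is a toy sketch, not how Mathematica or any other real system is implemented: formulas are represented as nested tuples such as `("+", "n", 1)`, and the names `expand` and `show` are made up for this illustration. Each rule looks only at the shape of the expression, never at its numeric value.

```python
def expand(e):
    """Apply the three expansion rules from the slide to a nested-tuple formula."""
    if not isinstance(e, tuple):
        return e  # an atom: a variable name or a number
    op, a, b = e
    # Rule: Expand[X^2] => X * X
    if op == "^" and b == 2:
        return expand(("*", a, a))
    a, b = expand(a), expand(b)
    if op == "*":
        # Rule: Expand[X * (Y + Z)] => X*Y + X*Z
        if isinstance(b, tuple) and b[0] == "+":
            return ("+", expand(("*", a, b[1])), expand(("*", a, b[2])))
        # Rule: Expand[(X + Y) * Z] => X*Z + Y*Z
        if isinstance(a, tuple) and a[0] == "+":
            return ("+", expand(("*", a[1], b)), expand(("*", a[2], b)))
    return (op, a, b)


def show(e, inside_product=False):
    """Render a nested-tuple formula as text, parenthesizing sums inside products."""
    if not isinstance(e, tuple):
        return str(e)
    op, a, b = e
    if op == "+":
        text = f"{show(a)} + {show(b)}"
        return f"({text})" if inside_product else text
    if op == "*":
        return f"{show(a, True)}*{show(b, True)}"
    return f"{show(a, True)}^{show(b)}"


# (n + 1)^2, the example worked on the slide
result = expand(("^", ("+", "n", 1), 2))
print(show(result))  # n*n + 1*n + n*1 + 1*1
```

The sketch stops where the slide does: it does not simplify 1*n to n or collect like terms, which would take further rules of the same pattern-matching kind.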
Digression
• Recall our discussion of formalized mathematics, and the idea of reducing mathematics to the mechanical application of formal rules
• Formal rules: depend on the form of expressions, not their meaning
• Symbolic computation is an application of the idea of a calculus

Computer Science Issues
• Symbolic computation systems are:
  – very high-level languages
  – problem-specific
  – nonprocedural
• Depend on many algorithms, e.g.:
  – pattern matching
  – efficient management of complex data structures representing formulas
• Results should be presented in a form familiar and useful to the mathematically literate