CSC 321, Zhen Jiang

Multiple Tables and Aggregate Statistics

Consider the following database with two tables: Emp (table of employees) and Dept (table of departments). Assume that there is a one-to-many relationship between Dept and Emp (each department has 0 or more employees, and each employee works in one department).

Emp

Enum  LName     age    salary      Dnum  wYears
E101  Jones     45.00  $56,000.00  D25   12
E202  Anders    66.00  $46,000.00  D22   25
E303  Smith     34.00  $25,000.00  D22   9
E404  Rivera    22.00  $30,000.00  D25   1
E505  Brown     45.00  $80,000.00  D25   17
E606  Caldwell  52.00  $70,000.00  D28   20
E707  Stiles    44.00  $65,000.00  D28   11
E808  Walker    48.00  $90,000.00  D22   21
E909  Hartman   20.00  $25,000.00  D28   1
E222  Carter    29.00  $35,000.00  D25   3

Dept

Dnum  Dname      City
D22   Web        WC
D25   Databases  NY
D28   Software   LA

Suppose we need to get information from both tables. Notice that combining them can create a lot of redundant information: each department's name and city is repeated for every employee in that department. We create one table from the data in both tables by matching rows with the same value in the common column. Note that each employee is lined up with the department that they work in (matching values of the Dnum column).

Select Enum, LName, age, salary, Emp.Dnum, Dname, City from Emp, Dept where Emp.Dnum = Dept.Dnum

Enum  LName     age    salary      Dnum  Dname      City
E202  Anders    66.00  $46,000.00  D22   Web        WC
E303  Smith     34.00  $25,000.00  D22   Web        WC
E808  Walker    48.00  $90,000.00  D22   Web        WC
E101  Jones     45.00  $56,000.00  D25   Databases  NY
E404  Rivera    22.00  $30,000.00  D25   Databases  NY
E505  Brown     45.00  $80,000.00  D25   Databases  NY
E222  Carter    29.00  $35,000.00  D25   Databases  NY
E606  Caldwell  52.00  $70,000.00  D28   Software   LA
E707  Stiles    44.00  $65,000.00  D28   Software   LA
E909  Hartman   20.00  $25,000.00  D28   Software   LA

Then we select from this combined result in the same way as for a single table.

Reasons not to use the join operator:

1. Overhead cost: http://stackoverflow.com/questions/173726/when-and-why-are-database-joinsexpensive
2. Feasibility: http://dba.stackexchange.com/questions/4602/why-cant-we-perform-joins-in-adistributed-database-like-bigtable
3. Scalability: http://stackoverflow.com/questions/2623852/why-are-joins-bad-when-consideringscalability

Aggregations (Statistics with Grouping)

For these exercises you can download the database company.accdb from the class website: http://www.cs.wcupa.edu/~zjiang/company.zip.

Phase 1: Sometimes we need to compute an aggregation, which combines rows together and performs some operation on their combined values. Common aggregations are count, sum and avg. For example:

   Find the average salary for all employees
   Find the maximum salary for all employees in D22
   Find the average salary and a count of employees under 40
   Find the number of employees under 30 making more than $40000

1) Show the average age (label as “average age”) and average salary (label as “average salary”) for all employees.

Phase 2 (the use of filter): Instead of finding statistics over every row in the entire table, we use the “where” clause to restrict which rows are used before the aggregation.

2) For all employees under 30, show a count of these employees and their average salary.

3) Show the max, min and average age for all employees in Dept “Software”.

4) For all employees in department “D22” or “D28”, show a count of these employees, their average age, and maximum salary.

Phase 3 (the use of group division): The “group by” clause is used in conjunction with the aggregate functions to group the result set by one or more columns.

5) For each department, list the avg, max and min age, along with a count of employees in each department.

6) For each department, find a count and average salary for all employees younger than 40.

Phase 4 (restricting the groups chosen): Sometimes we want to find only the groups that meet certain conditions (a check on the group's identity or on its aggregated values). For example: Show all departments with more than 3 employees, and the average salaries and number of employees in these departments. This query is shown below. The >3 refers to the calculated count.
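The Phase 4 example (departments with more than 3 employees) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module in place of Access, it does not read company.accdb but rebuilds Emp from the rows listed in this handout (only the columns the query needs), and the labels avg_salary and emp_count are made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for company.accdb.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (Enum TEXT, salary REAL, Dnum TEXT)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [
        ("E101", 56000, "D25"), ("E202", 46000, "D22"),
        ("E303", 25000, "D22"), ("E404", 30000, "D25"),
        ("E505", 80000, "D25"), ("E606", 70000, "D28"),
        ("E707", 65000, "D28"), ("E808", 90000, "D22"),
        ("E909", 25000, "D28"), ("E222", 35000, "D25"),
    ],
)

# GROUP BY forms one group per department; HAVING then keeps only the
# groups whose calculated count is greater than 3.
query = """
    SELECT Dnum, AVG(salary) AS avg_salary, COUNT(*) AS emp_count
    FROM Emp
    GROUP BY Dnum
    HAVING COUNT(*) > 3
    ORDER BY Dnum
"""
big_depts = conn.execute(query).fetchall()
print(big_depts)  # [('D25', 50250.0, 4)]
```

With the handout's data, only D25 has more than 3 employees, so it is the only group that survives the condition.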
This condition is applied to each group, NOT to each row.

7) For each department with an average salary of >52,000, list the Dnum and the employee count for these departments.

8) For each department with fewer than 4 employees, show the average age and number of employees.

In Access, you can enter the group selection condition in the Criteria row directly under the group-by column. In SQL, this is implemented by the “having” clause!

In summary:

Try this.

9) For each department with more than 2 people over 40, list the department number (Dnum) and a count of these people.

Congratulations on making the query successfully!
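As a recap, the table matching from the first part of this handout and the Phase 1 to Phase 3 example queries can be sketched in one place. The same assumptions apply as for any illustration here: Python's built-in sqlite3 module stands in for Access, the tables are rebuilt from the rows printed in this handout rather than read from company.accdb, and the Python variable names are made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for company.accdb.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (Dnum TEXT PRIMARY KEY, Dname TEXT, City TEXT);
    CREATE TABLE Emp (Enum TEXT PRIMARY KEY, LName TEXT, age REAL,
                      salary REAL, Dnum TEXT, wYears INTEGER);
""")
conn.executemany("INSERT INTO Dept VALUES (?, ?, ?)", [
    ("D22", "Web", "WC"), ("D25", "Databases", "NY"), ("D28", "Software", "LA"),
])
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?, ?, ?)", [
    ("E101", "Jones", 45, 56000, "D25", 12),
    ("E202", "Anders", 66, 46000, "D22", 25),
    ("E303", "Smith", 34, 25000, "D22", 9),
    ("E404", "Rivera", 22, 30000, "D25", 1),
    ("E505", "Brown", 45, 80000, "D25", 17),
    ("E606", "Caldwell", 52, 70000, "D28", 20),
    ("E707", "Stiles", 44, 65000, "D28", 11),
    ("E808", "Walker", 48, 90000, "D22", 21),
    ("E909", "Hartman", 20, 25000, "D28", 1),
    ("E222", "Carter", 29, 35000, "D25", 3),
])

# Multiple tables: match rows on the common Dnum column with a where clause.
matched = conn.execute("""
    SELECT Enum, LName, Emp.Dnum, Dname, City
    FROM Emp, Dept
    WHERE Emp.Dnum = Dept.Dnum
    ORDER BY Emp.Dnum, Enum
""").fetchall()

# Phase 1: one statistic over every row in the table.
avg_salary = conn.execute("SELECT AVG(salary) FROM Emp").fetchone()[0]

# Phase 2: where restricts the rows before the aggregation runs.
max_d22 = conn.execute(
    "SELECT MAX(salary) FROM Emp WHERE Dnum = 'D22'"
).fetchone()[0]

# Phase 3: group by produces one result row per department.
per_dept = conn.execute("""
    SELECT Dname, AVG(salary)
    FROM Emp, Dept
    WHERE Emp.Dnum = Dept.Dnum
    GROUP BY Dname
    ORDER BY Dname
""").fetchall()

print(len(matched))  # 10 (each employee matched to exactly one department)
print(avg_salary)    # 52200.0
print(max_d22)       # 90000.0
print(per_dept)
```

The queries use the same "from Emp, Dept where ..." matching style as the handout instead of an explicit join operator.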