3 - WCU Computer Science

CSC 321, Zhen Jiang
Multiple Tables and Aggregate Statistics
Consider the following database with two tables: Emp (table of employees) and Dept
(table of departments). Assume that there is a one-to-many relationship between Dept and
Emp (each department has zero or more employees, and each employee works in exactly one
department).
Emp
Enum  LName     age    salary      Dnum  wYears
E101  Jones     45.00  $56,000.00  D25   12
E202  Anders    66.00  $46,000.00  D22   25
E303  Smith     34.00  $25,000.00  D22   9
E404  Rivera    22.00  $30,000.00  D25   1
E505  Brown     45.00  $80,000.00  D25   17
E606  Caldwell  52.00  $70,000.00  D28   20
E707  Stiles    44.00  $65,000.00  D28   11
E808  Walker    48.00  $90,000.00  D22   21
E909  Hartman   20.00  $25,000.00  D28   1
E222  Carter    29.00  $35,000.00  D25   3

Dept
Dnum  Dname      City
D22   Web        WC
D25   Databases  NY
D28   Software   LA
Suppose we need to get information from both tables. We create one table from the data in
both tables by matching rows with the same value in the common column, Dnum, so that each
employee is lined up with the department that they work in. Notice that this can create a
lot of redundant information: the department name and city are repeated for every
employee in that department.
SELECT Enum, LName, age, salary, Emp.Dnum, Dname, City FROM Emp, Dept WHERE Emp.Dnum = Dept.Dnum
Enum  LName     age    salary      Dnum  Dname      City
E202  Anders    66.00  $46,000.00  D22   Web        WC
E303  Smith     34.00  $25,000.00  D22   Web        WC
E808  Walker    48.00  $90,000.00  D22   Web        WC
E101  Jones     45.00  $56,000.00  D25   Databases  NY
E404  Rivera    22.00  $30,000.00  D25   Databases  NY
E505  Brown     45.00  $80,000.00  D25   Databases  NY
E222  Carter    29.00  $35,000.00  D25   Databases  NY
E606  Caldwell  52.00  $70,000.00  D28   Software   LA
E707  Stiles    44.00  $65,000.00  D28   Software   LA
E909  Hartman   20.00  $25,000.00  D28   Software   LA
We can then query this combined result in the same way as a single table.
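The matching step above can be tried outside Access. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for company.accdb; the tables are rebuilt from the sample rows in this handout, and the variable names (conn, joined) are illustrative only.

```python
import sqlite3

# In-memory database holding the two sample tables from this handout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (Enum TEXT, LName TEXT, age REAL, salary REAL, "
    "Dnum TEXT, wYears INTEGER)"
)
conn.execute("CREATE TABLE Dept (Dnum TEXT, Dname TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E101", "Jones", 45, 56000, "D25", 12),
        ("E202", "Anders", 66, 46000, "D22", 25),
        ("E303", "Smith", 34, 25000, "D22", 9),
        ("E404", "Rivera", 22, 30000, "D25", 1),
        ("E505", "Brown", 45, 80000, "D25", 17),
        ("E606", "Caldwell", 52, 70000, "D28", 20),
        ("E707", "Stiles", 44, 65000, "D28", 11),
        ("E808", "Walker", 48, 90000, "D22", 21),
        ("E909", "Hartman", 20, 25000, "D28", 1),
        ("E222", "Carter", 29, 35000, "D25", 3),
    ],
)
conn.executemany(
    "INSERT INTO Dept VALUES (?, ?, ?)",
    [("D22", "Web", "WC"), ("D25", "Databases", "NY"), ("D28", "Software", "LA")],
)

# Line each employee up with their department by matching Dnum values.
joined = conn.execute(
    """
    SELECT Enum, LName, age, salary, Emp.Dnum, Dname, City
    FROM Emp, Dept
    WHERE Emp.Dnum = Dept.Dnum
    ORDER BY Emp.Dnum, Enum
    """
).fetchall()

for row in joined:
    print(row)
```

Each of the ten employees appears once, and the department name and city repeat for every employee in the same department.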
Reasons not to use the join operator:
1. Overhead cost: http://stackoverflow.com/questions/173726/when-and-why-are-database-joins-expensive
2. Feasibility: http://dba.stackexchange.com/questions/4602/why-cant-we-perform-joins-in-a-distributed-database-like-bigtable
3. Scalability: http://stackoverflow.com/questions/2623852/why-are-joins-bad-when-considering-scalability
Aggregations (Statistics with Grouping)
For these exercises you can download the database company.accdb from the
class website: http://www.cs.wcupa.edu/~zjiang/company.zip.
Phase 1:
Sometimes we need an aggregation, which combines rows together and performs
some operation on their combined values. Common aggregations are count, sum and avg.
For example:
Find the average salary for all employees
Find the maximum salary for all employees in D22
Find the average salary and a count of employees under 40
Find the number of employees under 30 making more than $40000
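To make the three common aggregations concrete, here is a minimal sketch that computes count, sum and avg over the whole Emp table. It assumes Python's built-in sqlite3 module in place of Access, with the table rebuilt from the sample rows above; the variable names are illustrative.

```python
import sqlite3

# Rebuild the sample Emp table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (Enum TEXT, LName TEXT, age REAL, salary REAL, "
    "Dnum TEXT, wYears INTEGER)"
)
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E101", "Jones", 45, 56000, "D25", 12),
        ("E202", "Anders", 66, 46000, "D22", 25),
        ("E303", "Smith", 34, 25000, "D22", 9),
        ("E404", "Rivera", 22, 30000, "D25", 1),
        ("E505", "Brown", 45, 80000, "D25", 17),
        ("E606", "Caldwell", 52, 70000, "D28", 20),
        ("E707", "Stiles", 44, 65000, "D28", 11),
        ("E808", "Walker", 48, 90000, "D22", 21),
        ("E909", "Hartman", 20, 25000, "D28", 1),
        ("E222", "Carter", 29, 35000, "D25", 3),
    ],
)

# With no grouping, the aggregates collapse all rows into a single result row.
emp_count, total_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), SUM(salary), AVG(salary) FROM Emp"
).fetchone()

print("employees:", emp_count)
print("total salary:", total_salary)
print("average salary:", avg_salary)
```

With the sample data this reports 10 employees, a total salary of 522,000 and an average salary of 52,200.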
1) Show the average age (label as “average age”) and average salary (label as “average
salary”) for all employees
Phase 2 (the use of filter):
Instead of finding statistics for every row in the entire table, we will use the “where”
clause to restrict which rows are used before the aggregation.
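The filtered examples listed under Phase 1 show this idea: the “where” clause picks the rows first, and the aggregate then sees only those rows. The sketch below assumes Python's built-in sqlite3 module in place of Access, with Emp rebuilt from the sample rows; the variable names are illustrative.

```python
import sqlite3

# Rebuild the sample Emp table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (Enum TEXT, LName TEXT, age REAL, salary REAL, "
    "Dnum TEXT, wYears INTEGER)"
)
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E101", "Jones", 45, 56000, "D25", 12),
        ("E202", "Anders", 66, 46000, "D22", 25),
        ("E303", "Smith", 34, 25000, "D22", 9),
        ("E404", "Rivera", 22, 30000, "D25", 1),
        ("E505", "Brown", 45, 80000, "D25", 17),
        ("E606", "Caldwell", 52, 70000, "D28", 20),
        ("E707", "Stiles", 44, 65000, "D28", 11),
        ("E808", "Walker", 48, 90000, "D22", 21),
        ("E909", "Hartman", 20, 25000, "D28", 1),
        ("E222", "Carter", 29, 35000, "D25", 3),
    ],
)

# Maximum salary, restricted to employees in D22.
max_d22 = conn.execute(
    "SELECT MAX(salary) FROM Emp WHERE Dnum = 'D22'"
).fetchone()[0]

# Average salary and count, restricted to employees under 40.
avg_u40, count_u40 = conn.execute(
    "SELECT AVG(salary), COUNT(*) FROM Emp WHERE age < 40"
).fetchone()

# Count of employees under 30 making more than $40000.
count_young_high = conn.execute(
    "SELECT COUNT(*) FROM Emp WHERE age < 30 AND salary > 40000"
).fetchone()[0]

print(max_d22, avg_u40, count_u40, count_young_high)
```

Note that the last count is 0 for the sample data: an aggregate over an empty selection still returns one row.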
2) For all employees under 30, show a count of these employees and their average salary.
3) Show the max, min and average age for all employees in Dept “Software”.
4) For all employees in department “D22” or “D28”, show a count of these employees,
their average age, and maximum salary.
Phase 3 (the use of group division):
The “group by” clause is used in conjunction with the aggregate functions to group the
result set by one or more columns, so that each aggregate is computed once per group.
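As an illustration of grouping, the sketch below counts the employees and averages the salary once per department. It assumes Python's built-in sqlite3 module in place of Access, with Emp rebuilt from the sample rows; the variable names are illustrative.

```python
import sqlite3

# Rebuild the sample Emp table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (Enum TEXT, LName TEXT, age REAL, salary REAL, "
    "Dnum TEXT, wYears INTEGER)"
)
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E101", "Jones", 45, 56000, "D25", 12),
        ("E202", "Anders", 66, 46000, "D22", 25),
        ("E303", "Smith", 34, 25000, "D22", 9),
        ("E404", "Rivera", 22, 30000, "D25", 1),
        ("E505", "Brown", 45, 80000, "D25", 17),
        ("E606", "Caldwell", 52, 70000, "D28", 20),
        ("E707", "Stiles", 44, 65000, "D28", 11),
        ("E808", "Walker", 48, 90000, "D22", 21),
        ("E909", "Hartman", 20, 25000, "D28", 1),
        ("E222", "Carter", 29, 35000, "D25", 3),
    ],
)

# One output row per distinct Dnum; the aggregates run inside each group.
per_dept = conn.execute(
    """
    SELECT Dnum, COUNT(*), ROUND(AVG(salary), 2)
    FROM Emp
    GROUP BY Dnum
    ORDER BY Dnum
    """
).fetchall()

for dnum, emp_count, avg_salary in per_dept:
    print(dnum, emp_count, avg_salary)
```

The result has three rows, one for each of D22, D25 and D28, instead of the single row produced without “group by”.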
5) For each department list avg, max and min age, along with a count of employees in
each department.
6) For each department find a count and average salary for all employees younger than
40.
Phase 4 (restricting the groups chosen):
Sometimes, we want to find only the groups that meet certain conditions (a check on the
group's identity or on its aggregate values).
For example: Show all departments with more than 3 employees and the average salaries
and number of employees in these departments. This query is shown below.
The >3 refers to the calculated count. We apply it to each group, NOT to each row.
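One possible SQL form of the example query (departments with more than 3 employees) is sketched here. It assumes Python's built-in sqlite3 module in place of Access, with Emp rebuilt from the sample rows, and it is not necessarily the query the original handout displayed; the variable names are illustrative.

```python
import sqlite3

# Rebuild the sample Emp table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (Enum TEXT, LName TEXT, age REAL, salary REAL, "
    "Dnum TEXT, wYears INTEGER)"
)
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E101", "Jones", 45, 56000, "D25", 12),
        ("E202", "Anders", 66, 46000, "D22", 25),
        ("E303", "Smith", 34, 25000, "D22", 9),
        ("E404", "Rivera", 22, 30000, "D25", 1),
        ("E505", "Brown", 45, 80000, "D25", 17),
        ("E606", "Caldwell", 52, 70000, "D28", 20),
        ("E707", "Stiles", 44, 65000, "D28", 11),
        ("E808", "Walker", 48, 90000, "D22", 21),
        ("E909", "Hartman", 20, 25000, "D28", 1),
        ("E222", "Carter", 29, 35000, "D25", 3),
    ],
)

# The condition on COUNT(*) is checked once per group, after grouping,
# so it keeps or drops whole departments rather than individual rows.
big_depts = conn.execute(
    """
    SELECT Dnum, AVG(salary), COUNT(*)
    FROM Emp
    GROUP BY Dnum
    HAVING COUNT(*) > 3
    """
).fetchall()

print(big_depts)
```

With the sample data only D25 has more than 3 employees, so it is the only group that survives the condition.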
7) For each department with an average salary greater than $52,000, list the Dnum and the
employee count for these departments.
8) For each department with fewer than 4 employees, show the average age and number of
employees.
In Access, you can specify the group selection in the Criteria row directly under the
group-by column. In SQL, this is implemented by the “having” clause.
In summary: Try this.
9) For each department with more than 2 people over 40, list the Dnum and a count of
these people.
Congratulations on completing the queries successfully!