Group functions using SQL

Additional information in speaker
notes!
Group functions
SQL> update first_pay
  2  set bonus = null
  3  where name = 'Donald Brown';

SQL> SELECT * FROM first_pay;

PAY_ NAME                 JO STARTDATE    SALARY     BONUS
---- -------------------- -- --------- --------- ---------
1111 Linda Costa          CI 15-JAN-97     45000      1000
2222 John Davidson        IN 25-SEP-92     40000      1500
3333 Susan Ash            AP 05-FEB-00     25000       500
4444 Stephen York         CM 03-JUL-97     42000      2000
5555 Richard Jones        CI 30-OCT-92     50000      2000
6666 Joanne Brown         IN 18-AUG-94     48000      2000
7777 Donald Brown         CI 05-NOV-99     45000
8888 Paula Adams          IN 12-DEC-98     45000      2000

For these exercises, I wanted a null value in one of
the fields. The update above put a null value in the
bonus field for Donald Brown.
Group functions
SQL> SELECT COUNT(*)
  2  FROM first_pay;

 COUNT(*)
---------
        8

COUNT(*) counts every row, whether or not any of its columns
contain null values. I think of it as a count of rows/records.

SQL> SELECT COUNT(name)
  2  FROM first_pay;

COUNT(NAME)
-----------
          8

COUNT(name) counts the names. In this case, each row/record has a
name so the result is 8.

SQL> SELECT COUNT(bonus)
  2  FROM first_pay;

COUNT(BONUS)
------------
           7

As shown on the previous slide, bonus now has one record where the
bonus is null. COUNT(bonus) counts only the non-null values and
therefore returns 7.
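
The three counts above can be reproduced outside Oracle. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the Oracle session; the table holds only the name and bonus columns, copied from the listing, and the in-memory database is purely illustrative.

```python
import sqlite3

# (name, bonus) pairs from the first_pay listing; None is stored as SQL NULL.
rows = [("Linda Costa", 1000), ("John Davidson", 1500), ("Susan Ash", 500),
        ("Stephen York", 2000), ("Richard Jones", 2000), ("Joanne Brown", 2000),
        ("Donald Brown", None), ("Paula Adams", 2000)]

# Assumption: an in-memory SQLite table stands in for the Oracle table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (name TEXT, bonus INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?, ?)", rows)

# COUNT(*) counts rows; COUNT(column) counts only non-null values in that column.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(name), COUNT(bonus) FROM first_pay"
).fetchone()
print(counts)  # (8, 8, 7)
```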
Group functions
SQL> SELECT COUNT(NVL(bonus,0))
  2  FROM first_pay;

COUNT(NVL(BONUS,0))
-------------------
                  8

SQL> SELECT COUNT(NVL(bonus,1000))
  2  FROM first_pay;

COUNT(NVL(BONUS,1000))
----------------------
                     8

In these examples, I am replacing null values in bonus with a
value. In the first example, I replaced it with 0 and in the second
example I replaced it with 1000. It doesn't matter what the
replacement is; what matters is that it is no longer a null value
and therefore it shows in the count.
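
As a rough check that the replacement value itself does not matter for a count, here is a small sketch using Python's sqlite3 module. Two assumptions: SQLite stands in for Oracle, and COALESCE is used in place of NVL because SQLite has no NVL function. The helper name count_with_default is made up for this illustration.

```python
import sqlite3

# Bonus values from the first_pay listing; None is stored as SQL NULL.
bonuses = [1000, 1500, 500, 2000, 2000, 2000, None, 2000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (bonus INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?)", [(b,) for b in bonuses])

def count_with_default(default):
    # COALESCE(bonus, ?) yields the default wherever bonus is NULL,
    # so every row contributes a non-null value to COUNT.
    sql = "SELECT COUNT(COALESCE(bonus, ?)) FROM first_pay"
    return conn.execute(sql, (default,)).fetchone()[0]

results = [count_with_default(d) for d in (0, 1000)]
print(results)  # [8, 8]
```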
Group functions
SQL> SELECT SUM(bonus)
  2  FROM first_pay;

SUM(BONUS)
----------
     11000

In this SUM, the record with a null value is ignored. The total is
11000.

SQL> SELECT SUM(NVL(bonus,0))
  2  FROM first_pay;

SUM(NVL(BONUS,0))
-----------------
            11000

In this SUM, the record with the null value is set to 0. It is now
included in the sum but has no impact because it is 0.

SQL> SELECT SUM(NVL(bonus, 1000))
  2  FROM first_pay;

SUM(NVL(BONUS,1000))
--------------------
               12000

In this SUM, the record with the null value is set to 1000. It is
included in the sum and clearly impacts the total which is now 1000
bigger.
Group functions
SQL> SELECT SUM(bonus), AVG(bonus)
  2  FROM first_pay;

SUM(BONUS) AVG(BONUS)
---------- ----------
     11000  1571.4286

In this example, the sum is taken of the 7 rows that do not contain
null values and the sum is divided by the count of the 7 rows that
do not contain null values to yield the average.

SQL> SELECT SUM(NVL(bonus,0)), AVG(NVL(bonus,0))
  2  FROM first_pay;

SUM(NVL(BONUS,0)) AVG(NVL(BONUS,0))
----------------- -----------------
            11000              1375

In this example, the average is taken using all 8 rows because the
NVL put a 0 in the row that contained a null. The sum divided by 8
is shown as the average.

SQL> SELECT SUM(NVL(bonus,1000)), AVG(NVL(bonus,1000))
  2  FROM first_pay;

SUM(NVL(BONUS,1000)) AVG(NVL(BONUS,1000))
-------------------- --------------------
               12000                 1500

This time I am including the 1000 in the average so the sum for the
average is 1000 higher and the division is still by 8, giving me
the answer of 1500.
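
The arithmetic behind these averages can be checked by hand. The sketch below assumes Python's sqlite3 module as a stand-in for Oracle and COALESCE in place of NVL; it compares a manual sum-divided-by-count with what AVG returns.

```python
import sqlite3

# Bonus values from the first_pay listing; None is stored as SQL NULL.
bonuses = [1000, 1500, 500, 2000, 2000, 2000, None, 2000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (bonus INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?)", [(b,) for b in bonuses])

# Manual version of AVG(bonus): nulls drop out of both the sum and the count.
non_null = [b for b in bonuses if b is not None]
manual_avg = sum(non_null) / len(non_null)          # 11000 / 7

sql_sum, sql_avg = conn.execute(
    "SELECT SUM(bonus), AVG(bonus) FROM first_pay"
).fetchone()

# Replacing the null first means the division is by all 8 rows.
avg_zero, avg_thousand = conn.execute(
    "SELECT AVG(COALESCE(bonus, 0)), AVG(COALESCE(bonus, 1000)) FROM first_pay"
).fetchone()

print(sql_sum, round(sql_avg, 4))   # 11000 1571.4286
print(avg_zero, avg_thousand)       # 1375.0 1500.0
```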
Group functions
SQL> SELECT MIN(salary), MAX(salary)
  2  FROM first_pay;

MIN(SALARY) MAX(SALARY)
----------- -----------
      25000       50000

This statement extracts the minimum salary and the maximum salary
from the first_pay table.

SQL> SELECT MIN(bonus), MAX(bonus)
  2  FROM first_pay;

MIN(BONUS) MAX(BONUS)
---------- ----------
       500       2000

This extracts the minimum bonus and the maximum bonus from the
first_pay table. Note that there is a null value in this column;
MIN and MAX simply ignore it.

SQL> SELECT MIN(NVL(bonus,0)), MAX(NVL(bonus,0))
  2  FROM first_pay;

MIN(NVL(BONUS,0)) MAX(NVL(BONUS,0))
----------------- -----------------
                0              2000

In this example, the null value is replaced by 0 in both the MIN
and MAX function. This means that MIN now sees the row with 0 as
the minimum.
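
A short sketch of the same comparison, assuming Python's sqlite3 module as a stand-in for Oracle and COALESCE in place of NVL. It runs the plain and the null-replaced versions side by side.

```python
import sqlite3

# Bonus values from the first_pay listing; None is stored as SQL NULL.
bonuses = [1000, 1500, 500, 2000, 2000, 2000, None, 2000]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (bonus INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?)", [(b,) for b in bonuses])

# MIN/MAX skip the null; once it is replaced by 0, that 0 becomes the minimum.
extremes = conn.execute(
    "SELECT MIN(bonus), MAX(bonus), "
    "MIN(COALESCE(bonus, 0)), MAX(COALESCE(bonus, 0)) FROM first_pay"
).fetchone()
print(extremes)  # (500, 2000, 0, 2000)
```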
Group functions
SQL> SELECT jobcode, count(name)
  2  FROM first_pay
  3  GROUP BY jobcode;

JO COUNT(NAME)
-- -----------
AP           1
CI           3
CM           1
IN           3

In this example, I want to get a count of how many people there are
with each jobcode. This means I need to GROUP BY jobcode.

Because I am grouping on jobcode and therefore looking for a total
by jobcode, I am allowed to SELECT the jobcode field.

Since I want a count of the number of people with a specific
jobcode I need to do a count. I put name in count because I was
thinking of counting the people. Note that I could have used
COUNT(*) as shown below.

SQL> SELECT jobcode, count(*)
  2  FROM first_pay
  3  GROUP BY jobcode;

JO  COUNT(*)
-- ---------
AP         1
CI         3
CM         1
IN         3
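
To see both counts per group in one pass, here is a minimal sketch assuming Python's sqlite3 module as a stand-in for Oracle. The ORDER BY is an addition for this sketch so that the row order is deterministic.

```python
import sqlite3

# (name, jobcode) pairs copied from the first_pay listing.
rows = [("Linda Costa", "CI"), ("John Davidson", "IN"), ("Susan Ash", "AP"),
        ("Stephen York", "CM"), ("Richard Jones", "CI"), ("Joanne Brown", "IN"),
        ("Donald Brown", "CI"), ("Paula Adams", "IN")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (name TEXT, jobcode TEXT)")
conn.executemany("INSERT INTO first_pay VALUES (?, ?)", rows)

# One output row per jobcode; COUNT(name) and COUNT(*) agree here
# because no name is null.
per_job = conn.execute(
    "SELECT jobcode, COUNT(name), COUNT(*) FROM first_pay "
    "GROUP BY jobcode ORDER BY jobcode"
).fetchall()
print(per_job)  # [('AP', 1, 1), ('CI', 3, 3), ('CM', 1, 1), ('IN', 3, 3)]
```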
Group functions
SQL> SELECT jobcode, COUNT(name)
  2  FROM first_pay
  3  WHERE salary <= 45000
  4  GROUP BY jobcode;

JO COUNT(NAME)
-- -----------
AP           1
CI           2
CM           1
IN           2

In this example, I want to only
include people in the groups when
their salary is <= 45000. As you
can see this excludes one record
from the CI group and one record
from the IN group.
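
The effect of the WHERE clause on the groups can be sketched as follows, assuming Python's sqlite3 module as a stand-in for Oracle. The helper name counts is hypothetical, and COUNT(*) is used instead of COUNT(name) because only jobcode and salary are loaded.

```python
import sqlite3

# (jobcode, salary) pairs copied from the first_pay listing.
rows = [("CI", 45000), ("IN", 40000), ("AP", 25000), ("CM", 42000),
        ("CI", 50000), ("IN", 48000), ("CI", 45000), ("IN", 45000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (jobcode TEXT, salary INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?, ?)", rows)

def counts(where=""):
    # WHERE filters individual rows before GROUP BY forms the groups.
    sql = f"SELECT jobcode, COUNT(*) FROM first_pay {where} GROUP BY jobcode"
    return dict(conn.execute(sql).fetchall())

all_rows = counts()
filtered = counts("WHERE salary <= 45000")
print(all_rows)   # CI and IN have 3 rows each
print(filtered)   # CI and IN drop to 2 rows each
```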
Group functions
SQL> SELECT jobcode, COUNT(name)
  2  FROM first_pay
  3  WHERE salary <= 45000
  4  GROUP BY jobcode
  5  ORDER BY jobcode desc;

JO COUNT(NAME)
-- -----------
IN           2
CM           1
CI           2
AP           1

In this example I want the output to be ordered by jobcode in
descending order. The ORDER BY clause can be used to achieve this
goal.

Note that on the previous slide the results came back in ascending
order by the GROUP BY column/field. That order is a side effect of
how the grouping was done and is not guaranteed; use ORDER BY
whenever the order matters.

SQL> SELECT jobcode, COUNT(name)
  2  FROM first_pay
  3  WHERE salary <= 45000
  4  GROUP BY jobcode
  5  ORDER BY COUNT(name);

JO COUNT(NAME)
-- -----------
AP           1
CM           1
CI           2
IN           2

In this example, I want to order by the count instead of by the
group by field/column. Again, the ORDER BY clause can be used to
achieve this goal. Because I did not specify ascending or
descending, the default of ascending is used.
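
Both orderings can be tried in a small sketch, assuming Python's sqlite3 module as a stand-in for Oracle. One deliberate difference from the slide: jobcode is added as a second sort key when ordering by the count, so that groups with equal counts come back in a predictable order.

```python
import sqlite3

# (jobcode, salary) pairs copied from the first_pay listing.
rows = [("CI", 45000), ("IN", 40000), ("AP", 25000), ("CM", 42000),
        ("CI", 50000), ("IN", 48000), ("CI", 45000), ("IN", 45000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (jobcode TEXT, salary INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?, ?)", rows)

base = ("SELECT jobcode, COUNT(*) FROM first_pay "
        "WHERE salary <= 45000 GROUP BY jobcode ")

# ORDER BY runs after grouping, so it may sort on the group column
# or on an aggregate computed for each group.
by_code_desc = conn.execute(base + "ORDER BY jobcode DESC").fetchall()
by_count = conn.execute(base + "ORDER BY COUNT(*), jobcode").fetchall()

print(by_code_desc)  # [('IN', 2), ('CM', 1), ('CI', 2), ('AP', 1)]
print(by_count)      # [('AP', 1), ('CM', 1), ('CI', 2), ('IN', 2)]
```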
Group functions
SQL> SELECT jobcode, COUNT(name)
  2  FROM first_pay
  3  WHERE jobcode != 'IN'
  4  GROUP BY jobcode;

JO COUNT(NAME)
-- -----------
AP           1
CI           3
CM           1

In this example I want to group by jobcode except that in doing the
grouping, I want to exclude all records where the jobcode = 'IN'.
As you can see the results are correct.
Group functions
SQL> SELECT jobcode, bonus, SUM(salary)
  2  FROM first_pay
  3  GROUP BY jobcode, bonus;

JO     BONUS SUM(SALARY)
-- --------- -----------
AP       500       25000
CI      1000       45000
CI      2000       50000
CI                 45000
CM      2000       42000
IN      1500       40000
IN      2000       93000

This example groups by jobcode and then bonus within jobcode. In
fact there are only two records with the same jobcode and the same
bonus, record 6666 and record 8888. They are shown at the bottom.
For all of the other groupings there happens to be only one record.
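
The two-column grouping can be sketched as below, assuming Python's sqlite3 module as a stand-in for Oracle. The results are collected into a dictionary keyed by (jobcode, bonus) because the sketch makes no assumption about row order. Note that the row with the null bonus forms a group of its own.

```python
import sqlite3

# (jobcode, salary, bonus) copied from the first_pay listing; None is SQL NULL.
rows = [("CI", 45000, 1000), ("IN", 40000, 1500), ("AP", 25000, 500),
        ("CM", 42000, 2000), ("CI", 50000, 2000), ("IN", 48000, 2000),
        ("CI", 45000, None), ("IN", 45000, 2000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (jobcode TEXT, salary INTEGER, bonus INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?, ?, ?)", rows)

# Each distinct (jobcode, bonus) combination becomes one group.
totals = {
    (jobcode, bonus): total
    for jobcode, bonus, total in conn.execute(
        "SELECT jobcode, bonus, SUM(salary) FROM first_pay GROUP BY jobcode, bonus"
    )
}

print(len(totals))            # 7 groups from 8 rows
print(totals[("IN", 2000)])   # 93000 (48000 + 45000)
print(totals[("CI", None)])   # 45000 (the null-bonus row)
```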
Group functions
SQL> SELECT * FROM donor;

IDNO  NAME            STADR         CITY       ST ZIP   DATEFST      YRGOAL CONTACT
----- --------------- ------------- ---------- -- ----- --------- --------- ------------
11111 Stephen Daniels 123 Elm St    Seekonk    MA 02345 03-JUL-98       500 John Smith
12121 Jennifer Ames   24 Benefit St Providence RI 02045 24-MAY-97       400 Susan Jones
22222 Carl Hersey     24 Benefit St Providence RI 02045 03-JAN-98           Susan Jones
23456 Susan Ash       21 Main St    Fall River MA 02720 04-MAR-92       100 Amy Costa
33333 Nancy Taylor    26 Oak St     Fall River MA 02720 04-MAR-92        50 John Adams
34567 Robert Brooks   36 Pine St    Fall River MA 02720 04-APR-98        50 Amy Costa

6 rows selected.

SQL> SELECT state, contact, SUM(yrgoal)
  2  FROM donor
  3  GROUP BY state, contact;

ST CONTACT      SUM(YRGOAL)
-- ------------ -----------
MA Amy Costa            150
MA John Adams            50
MA John Smith           500
RI Susan Jones          400

This shows grouping by state
and then contact within state.
Two records go into MA Amy
Costa and two records go into
RI Susan Jones. The other two
totals are made up from one
record each.
Group functions
SQL> SELECT jobcode, MIN(salary), MAX(salary)
  2  FROM first_pay
  3  GROUP BY jobcode;

JO MIN(SALARY) MAX(SALARY)
-- ----------- -----------
AP       25000       25000
CI       45000       50000
CM       42000       42000
IN       40000       48000

This shows the minimum and maximum salary for each jobcode.

SQL> SELECT jobcode, AVG(salary)
  2  FROM first_pay
  3  GROUP BY jobcode;

JO AVG(SALARY)
-- -----------
AP       25000
CI   46666.667
CM       42000
IN   44333.333

Shows the average salary of each jobcode group.
Group functions
SQL> SELECT jobcode, AVG(salary)
  2  FROM first_pay
  3  GROUP BY jobcode;

JO AVG(SALARY)
-- -----------
AP       25000
CI   46666.667
CM       42000
IN   44333.333

From previous slide.

SQL> SELECT jobcode, MIN(AVG(salary)), MAX(AVG(salary))
  2  FROM first_pay
  3  GROUP BY jobcode;
SELECT jobcode, MIN(AVG(salary)), MAX(AVG(salary))
       *
ERROR at line 1:
ORA-00937: not a single-group group function

Jobcode cannot be used in this context: the nested functions
collapse all of the groups into a single row, so there is no one
jobcode to show.

SQL> SELECT MIN(AVG(salary)), MAX(AVG(salary))
  2  FROM first_pay
  3  GROUP BY jobcode;

MIN(AVG(SALARY)) MAX(AVG(SALARY))
---------------- ----------------
           25000        46666.667

This returns the minimum group average and the maximum group
average.
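
Nesting one group function inside another is not available everywhere; SQLite, for instance, does not accept it. The sketch below therefore swaps in a different technique that produces the same numbers: an inner query computes the average per jobcode and an outer query takes the MIN and MAX of those averages. It assumes Python's sqlite3 module as a stand-in for Oracle, and the column alias avg_salary is made up for the illustration.

```python
import sqlite3

# (jobcode, salary) pairs copied from the first_pay listing.
rows = [("CI", 45000), ("IN", 40000), ("AP", 25000), ("CM", 42000),
        ("CI", 50000), ("IN", 48000), ("CI", 45000), ("IN", 45000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (jobcode TEXT, salary INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?, ?)", rows)

# Inner query: one average per jobcode.
# Outer query: smallest and largest of those group averages.
sql = """
SELECT MIN(avg_salary), MAX(avg_salary)
FROM (SELECT AVG(salary) AS avg_salary
      FROM first_pay
      GROUP BY jobcode)
"""
lowest, highest = conn.execute(sql).fetchone()
print(lowest, round(highest, 3))  # 25000.0 46666.667
```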
Group functions
SQL> SELECT jobcode, SUM(salary), SUM(bonus)
  2  FROM first_pay
  3  GROUP BY jobcode;

JO SUM(SALARY) SUM(BONUS)
-- ----------- ----------
AP       25000        500
CI      140000       3000
CM       42000       2000
IN      133000       5500

This example shows the sum of salary and sum of bonus for all
jobcodes.

Now I decided I only wanted to see those groups where either the
sum of the salary was greater than 75000 or the sum of the bonus
was greater than 3000. This excludes AP because it meets neither
criterion and it excludes CM because it also meets neither
criterion.

SQL> SELECT jobcode, SUM(salary), SUM(bonus)
  2  FROM first_pay
  3  GROUP BY jobcode
  4  HAVING SUM(salary) > 75000 OR SUM(bonus) > 3000;

JO SUM(SALARY) SUM(BONUS)
-- ----------- ----------
CI      140000       3000
IN      133000       5500

Because I am testing the groups after they have been formed, I have
to use the HAVING clause.
Group functions
SQL> SELECT jobcode, SUM(salary), SUM(bonus)
  2  FROM first_pay
  3  WHERE SUM(salary) > 75000 OR SUM(bonus) > 3000
  4  GROUP BY jobcode;
WHERE SUM(salary) > 75000 OR SUM(bonus) > 3000
      *
ERROR at line 3:
ORA-00934: group function is not allowed here

This is the error that results from using the WHERE clause
inappropriately. The HAVING clause should have been used here as
shown on the previous slide.

SQL> SELECT jobcode, SUM(salary), SUM(bonus)
  2  FROM first_pay
  3  GROUP BY jobcode
  4  HAVING SUM(salary) > 75000 OR SUM(bonus) > 3000;

JO SUM(SALARY) SUM(BONUS)
-- ----------- ----------
CI      140000       3000
IN      133000       5500

Correct code using the HAVING clause (copied from previous slide).
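
The contrast between HAVING and WHERE can be tried out in a small sketch, assuming Python's sqlite3 module as a stand-in for Oracle. SQLite also rejects a group function inside WHERE, although its error message differs from the ORA-00934 text shown above, so the sketch only records that an error occurred. The ORDER BY is an addition to make the row order predictable.

```python
import sqlite3

# (jobcode, salary, bonus) copied from the first_pay listing; None is SQL NULL.
rows = [("CI", 45000, 1000), ("IN", 40000, 1500), ("AP", 25000, 500),
        ("CM", 42000, 2000), ("CI", 50000, 2000), ("IN", 48000, 2000),
        ("CI", 45000, None), ("IN", 45000, 2000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_pay (jobcode TEXT, salary INTEGER, bonus INTEGER)")
conn.executemany("INSERT INTO first_pay VALUES (?, ?, ?)", rows)

select = "SELECT jobcode, SUM(salary), SUM(bonus) FROM first_pay "
condition = "SUM(salary) > 75000 OR SUM(bonus) > 3000"

# HAVING tests each group after the groups have been formed.
kept = conn.execute(
    select + "GROUP BY jobcode HAVING " + condition + " ORDER BY jobcode"
).fetchall()
print(kept)  # [('CI', 140000, 3000), ('IN', 133000, 5500)]

# WHERE tests single rows before grouping, so a group function is refused there.
try:
    conn.execute(select + "WHERE " + condition + " GROUP BY jobcode")
    where_rejected = False
except sqlite3.Error as exc:
    where_rejected = True
    print("rejected:", exc)
```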