PROC SQL

advertisement
Displaying Query Results
•
•
Group 8- Last but not the Least
Arturo Cantillep/Greg Lynch/Nora Troy-Shaw/Yanxin Zhao
Displaying Query Results
.1 Introduction
Presenting Data/What the Boss Wants
.2 Its Power
Ordering Data/Producing an Ordered Report
Enhancing Query Output
Usage (Business Scenario)
.3 Summarizing Data/Summary Functions
.4 Count Functions
.5 Data Grouping
Groups and Subgroups
.6 using the where clause
.7 using the having clause
.8 Using the find function
.9 using boolean expression
.10 Summary
.1 Introduction
• SQL – a one stop shop alternative for
the following:
• Data Step – it can create new table,new variables
within a table , add insert data
• Proc Report – do summary, count,mean,
in a nice format too…
• Proc Print – present the result to the user
.1 Presenting Data
Objectives
•
•
Display a query’s result in a specified order
Use SAS formats, labels, and titles to enhance the appearance and
useability of a query’s output
Business Scenario
( a very realistic one)
• You had been instructed by by your boss to cut company costs by
producing a list of old, highly paid sales staff and who had served
the longest in the company
.1 What the Boss Wants
•
•
•
Data on the service years of the employees ranked (descending)
Data on the Salary (descending) of each employee
Data on the age of each employee
(descending)
Sample Report
1. Mr. X 25 yrs.  service yrs $50,000  salary
40yrs.old  age of employee….
.2 Ordering Data
Use the ORDER BY clause to sort query results in a specific order
Descending order (by following the column name with the DESC keyword)
Order the query results by specifying the following:
Any column name from any table in the from clause, even if the column is not on the select list
A column name or number representing the position of an item in the select list
an expression
a combination of any of the above individual items separated by commas
.2 Ordering Data
(Our Boss wish list…)
From the Sales staff database, list the employee ID, employee name,their
years of tenure with the company, their salary and their age
Example1: Arrange it by descending years of tenure (Chopping block 1)
PROC SQL;
select employee_id, employee_name,
int(yrdif(emp_Hire_date,today(),"ACTUAL")) as tenure,
Salary,int(yrdif(Birth_date,today(),'ACTUAL')) as Age
from arturo.salesstaff order by tenure desc;
quit;
•
.2 Ordering Data
PARTIAL PROC SQL OUTPUT ….
The SAS System
•
•
•
•
•
•
•
•
•
•
•
•
17:38 Wednesday, May 26, 2010 48
Employee ID Employee_Name
tenure
Salary
Age
--------------------------------------------------------------------------------------------_______
120172 Comber, Edwin
36
$28,345
66
120174 Simms, Doungkamol
36
$26,850
66
121086 Plybon, John-Michael
36
$26,820
65
120151 Phaiyakounh, Julianna
36
$26,520
65
121035 Blackley, James
36
$26,460
66
120166 Nowd, Fadi
36
$30,660
65
121060 Spofford, Elizabeth
36
$28,800
65
121073 Court, Donald
36
$27,100
61
121075 Sugg, Kasha
36
$28,395
65
120167 Tilley, Kimiko
36
$25,185
56
121138 Tolley, Hershell
36
$27,265
61
120154 Hayawardhana, Caterina
36
$30,490
65
** - Tenure is in descending order
.2 Producing an Ordered Report
• Remember to sort the output in descending order of
tenure and then by Age
Proc SQL;
select employee_id, employee_name,
int(yrdif(emp_Hire_date,today(),"ACTUAL")) as tenure,
Salary, int(yrdif(Birth_date,today(),'ACTUAL')) as Age
from arturo.salesstaff where salary gt 30000
order by tenure desc, salary desc;
Producing an Ordered Report
•
The Output Sample
The SAS System
17:38 Wednesday, May 26, 2010
53
Employee
Annual
Employee ID Employee_Name
tenure
Salary
Age
--------------------------------------------------------------------------------------120166 Nowd, Fadi
36
$30,660
65
120154 Hayawardhana, Caterina
36
$30,490
65
121081 Knudson, Susie
34
$30,235
61
120125 Hofmeister, Fong
31
$32,040
55
120129 Roebuck, Alvin
24
$30,070
45
120159 Phoumirath, Lynelle
23
$30,765
46
120158 Pilgrim, Daniel
22
$36,605
45
121080 Chinnis, Kumar
22
$32,235
51
121021 Farren, Priscilla
16
$32,985
35
.2a Enhancing Query Output
You can use SAS formats and tables to customize PROC SQL Output.
In the SELECT list, after the column name, but before the commas that separate
the columns, you can include the following:
•
Text in quotation marks (ANSI) or the label = Column modifier (SAS enhancement) to alter the
column heading ie use labels instead of variable names
•
The FORMAT = column modifier to alter the appearance of the values in that column
ie. Formatting cash amounts with dollar sign and commas
PROC SQL:
proc sql;
select employee_id label= "Employee Identifier" ,
employee_name label= "Employee Name" ,
int(yrdif(emp_Hire_date,today(),"ACTUAL")) as tenure label="Years of Service",
Salary label="Income" format = dollar12.,
int(yrdif(Birth_date,today(),'ACTUAL')) as Age label="Employee Age"
from learn.salesstaff where salary gt 30000
order by tenure desc, salary desc;
quit;
Enhancing Query Output
PARTIAL PROC SQL OUTPUT ….
The SAS System
13:22 Thursday, May 27, 2010
Employee
Identifier Employee Name
2
Years of
Service
Employee
Income
Age
-----------------------------------------------------------------------------120166 Nowd, Fadi
120154 Hayawardhana, Caterina
121081 Knudson, Susie
120125 Hofmeister, Fong
120129 Roebuck, Alvin
** - Tenure is in descending order
36
36
34
31
24
$30,660
$30,490
$30,235
$32,040
$30,070
65
65
61
55
45
.2b Business Scenario
Produce a report of salary listing of active employees who has wages above
$30000 + their 7% bonus. The requestor provided this sketch of the desired
report.
Proposed Annual Savings
Employee Number
9999
Salary + bonus: $32012.32
Additional Techniques to use:
•
define a new column containing the same constant character value for every row
•
Using SAS titles and footnotes
•
Use a combination of these techniques to produce the proposed Annual Savings Report
.2 Enhancing Query Output
The code:
PROC SQL;
title 'Annual Savings Plan 2011';
proc sql;
select employee_id label="Employee ID",
Salary*1.07 as NewSalary label="Salary 2011" format=dollar12.2
from arturo.salesstaff where Salary > 30000 and emp_term_date < 1 order by NewSalary desc;
quit;
•
•
•
•
•
•
TITLE and FOOTNOTE statements must precede the SELECT statement.
PROC SQL has an option, DQUOTE=, which specifies whether PROC SQL treats values within double quotation
marks (" ") as variables or strings.
With the default, DQUOTE=SAS, values within double quotation marks are treated as text strings.
With DQUOTE=ANSI, PROC SQL treats a quoted value as a variable. This feature enables you to use reserved
words such as AS, JOIN, GROUP, or DBMS names and other names that are not normally permissible in SAS,
such as table names, column names, or aliases. The quoted value can contain any character.
Values in single quotation marks are always treated as text strings.
Enhancing Query Output
PARTIAL PROC SQL OUTPUT ….
Annual Savings Plan 2011
14:51 Thursday, May 27, 2010 13
Employee ID Salary 2011
-----------------------------------120158 $39,167.35
121063 $38,509.30
121021 $35,293.95
121099 $35,015.75
120135 $34,764.30
121085 $34,491.45
121080 $34,491.45
121022 $34,464.70
120166 $34,446.51
.3 Summarizing Data
Objectives


Use functions to create summary queries.
Group data and produce summary statistics
for each group.
24
Summary Functions
How a summary function works in SQL depends on the
number of columns specified.
 If the summary function specifies only one column, the
statistic is calculated for the column (using values from
one or more rows).
 If the summary function specifies more than one
column, the statistic is calculated for the row (using
values from the listed columns).
25
The SUM Function (Review)
The SUM function returns the sum of the non-missing
arguments.
General form of the SUM function:
SUM(argument1<,argument2, ...>)
argument includes numeric constants, expressions, or
variable names. Only when all arguments are
missing will the SUM function return a missing
value.
We
need to find the total salary for all employees in the company.
/*Find total salary for all active employees*/
proc sql;
Title 'Total Salary For Active Employees';
select "TOTAL:" , sum(Salary) format =comma12.2
from greg.salesstaff where Emp_Term_Date is missing;
quit;
26
Total Salary For Active Employees
23
08:24 Saturday,
May 29, 2010
------------------------------TOTAL: 3,512,777.25
.4 The COUNT Function
The COUNT function returns the number of rows returned
by a query.
General form of the COUNT function:
COUNT(*|argument)
argument can be the following:
 * (asterisk), which counts all rows
 a column name, which counts the number
of non-missing values in that column
33
Summary Functions
Example: Determine the total number of current
employees.
proc sql;
select count(*) as Count
from learn.salestaff
where emp_term_date is missing
;
quit;
PROC SQL Output
Count
ƒƒƒƒƒƒƒƒ
308
s103d07
34
Summary Functions
A few commonly used summary functions are listed. Both
ANSI SQL and SAS functions can be used in PROC SQL.
SQL
SAS
AVG
MEAN
COUNT
FREQ, N
MAX
MAX
returns the largest value.
MIN
MIN
returns the smallest non-missing value.
SUM
SUM
returns the sum of non-missing values.
NMISS
35
Description
returns the mean (average) value.
returns the number of non-missing values.
counts the number of missing values.
STD
returns the standard deviation.
VAR
returns the variance.
Proc SQL: Data Grouping
.5 Grouping Data
We can produce output calculated by group using SQL.
Here, we calculate the average salary by gender:
ods rtf file="grouping1.rtf";
proc sql;
title "Grouping by Gender";
select Gender, avg(Salary) as
Average_Salary
format=dollar8. from
learn.salesstaff
group by gender;
ods rtf close;
run;
Gender
F
M
Average_Salary
$27,924
$27,955
Groups and Subgroups
We can produce output calculated by group
and subgroup. Here, we calculate counts
for each gender by job title:
ods rtf
file="grouping2.rtf";
proc sql;
title "Counts by Job
Title";
select Job_Title as Title,
Gender, count(*) as Counts
from staff group by
Job_Title, gender;
ods rtf close;
run;
Job_Title
Sales Rep. I
Gender
F
Counts
21
Sales Rep. I
M
42
Sales Rep. II
F
25
Sales Rep. II
M
25
Sales Rep. III
F
15
Sales Rep. III
M
19
Sales Rep. IV
F
7
Sales Rep. IV
M
9
.6 Using the WHERE Clause
We can create output specified within particular
restraints by using the WHERE clause. Here,
we have the average salaries for women
employees by job title:
ods rtf
file="grouping3.rtf";
proc sql;
title "Average Salary of
Women Employees";
select job_title as Title,
avg(salary) format=dollar8.
as Average from staff
where gender="F"
group by job_title;
ods rtf close;
run;
Job_Title
Sales Rep. I
Average
$26,339
Sales Rep. II
$27,368
Sales Rep. III
$29,436
Sales Rep. IV
$31,420
.7 Using the HAVING Clause
Alternatively, we can use the HAVING clause to
specify which data we want to display. Here,
we count the number of women employees by
job title.
ods rtf
file="grouping4.rtf";
proc sql;
title "Count by Title of
Women Employees";
select job_title as Title,
count(*) as Women from staff
group by Job_title, gender
having gender eq "F"
order by Count desc;
ods rtf close;
run;
Job_Title
Sales Rep. II
Women
25
Sales Rep. I
21
Sales Rep. III
15
Sales Rep. IV
7
.8 Using the FIND Function
The FIND function is a useful tool for locating data
and counting data. Here, we use the FIND function
to count the number of women employees and
display salaries by job title.
ods rtf
file="grouping5.rtf";
proc sql;
title "Summary Information
of Women Employees";
select Job_title,
count(find(gender, "F",
"I")>0) as Women,
avg(salary) format=dollar8.
as Average from staff
group by Job_title, gender
having gender="F";
ods rtf close;
run;
Job_Title
Sales Rep. I
Women Average
21 $26,339
Sales Rep. II
25 $27,368
Sales Rep.
III
Sales Rep.
IV
15 $29,436
7 $31,420
The find function
•
The FIND function returns the starting
position of a substring within a string.
NOTICE: the string must be character value.
•
The general form of FIND function is:
FIND(string,substring<,modifier(s)><,startpos>)
STRING --- constant, variable, or expression to be searched.
SUBSTRING --- constant, variable, or expression sought
within the string.
MODIFIER(S) --- i=ignore case, t=trim trailing blanks.
STARTPOS --- an integer specifying the start position and
direction of the search.
The find function
• EXP: find the starting position of the substring F in
the character variable Gender.
proc sql;
select gender,
find(Gender,"F","t")
"female_employee"
from learn.Salesstaff
;
quit;
Because “F” is in the first position of the first substring, so the the
value returned by FIND function is “1”; and “F” is not is the
second substring , so the returned value is “0”
.9 Using Boolean Expressions
• Boolean expressions evaluate to TRUE(1) or
FALSE(0).
• They are used in this SELECT list to distinguish
rows that have “F” in the Gender column.
proc sql;
select Job_Title,Gender,
(find(Gender,"F","i")>0)
"female_employee"
from learn.Salesstaff
;
quit;
The boolean expression will produce the value 1 when
Gender contains “F” and 0 when is does not.
Using Boolean Expressions
Female_Employee to Male_Employee Partial output
Employee female_
Employee Job Title Gender
employee
==========================================
Sales Rep. I
F
1
Sales Rep. I
F
1
Sales Rep. II
M
0
Sales Rep. III
F
1
Sales Rep. I
F
1
Sales Rep. II
F
1
Sales Rep. II
M
0
Sales Rep. II
M
0
Sales Rep. IV
M
0
Sales Rep. I
M
0
Sales Rep. II
M
0
Sales Rep. II
M
0
Sales Rep. II
F
1
Using Boolean Expressions
Use the counted value to calculate the percentage.
proc sql;
title "Female_Employee to Male_Employee Ratios";
select job_title,
sum((find(Gender,"F","i")>0))
as Female_employee,
sum((find(Gender,"F","i")=0))
as Male_employee,
calculated Female_employee/calculated Male_employee
"F/M Ratio" format=percent8.1
from learn.salesstaff
group by job_title
;
quit;
Using Boolean Expressions
Female_Employee to Male_Employee Ratios
Female_
Male_
F/M
Employee Job Title
employee
employee
Ratio
=================================================================
Sales Rep. I
21
42
50.0%
Sales Rep. II
25
25
100.0%
Sales Rep. III
15
19
78.9%
Sales Rep. IV
7
9
77.8%
Summary
• In summary, the SQL procedure provides the following capabilities:
–
–
–
–
–
–
ordering of report
enhanced report production through labels and formats
summarization of data
use of counts and data grouping
use of selection criteria like where and having
use of find function and boolean expressions
These characteristics makes the analysis of data a breeze….
thanks to PROC SQL.
Download