Displaying Query Results • • Group 8- Last but not the Least Arturo Cantillep/Greg Lynch/Nora Troy-Shaw/Yanxin Zhao Displaying Query Results .1 Introduction Presenting Data/What the Boss Wants .2 Its Power Ordering Data/Producing an Ordered Report Enhancing Query Output Usage (Business Scenario) .3 Summarizing Data/Summary Functions .4 Count Functions .5 Data Grouping Groups and Subgroups .6 using the where clause .7 using the having clause .8 Using the find function .9 using boolean expression .10 Summary .1 Introduction • SQL – a one stop shop alternative for the following: • Data Step – it can create new table,new variables within a table , add insert data • Proc Report – do summary, count,mean, in a nice format too… • Proc Print – present the result to the user .1 Presenting Data Objectives • • Display a query’s result in a specified order Use SAS formats, labels, and titles to enhance the appearance and useability of a query’s output Business Scenario ( a very realistic one) • You had been instructed by by your boss to cut company costs by producing a list of old, highly paid sales staff and who had served the longest in the company .1 What the Boss Wants • • • Data on the service years of the employees ranked (descending) Data on the Salary (descending) of each employee Data on the age of each employee (descending) Sample Report 1. Mr. X 25 yrs. service yrs $50,000 salary 40yrs.old age of employee…. .2 Ordering Data Use the ORDER BY clause to sort query results in a specific order Descending order (by following the column name with the DESC keyword) Order the query results by specifying the following: Any column name from any table in the from clause, even if the column is not on the select list A column name or number representing the position of an item in the select list an expression a combination of any of the above individual items separated by commas .2 Ordering Data (Our Boss wish list…) From the Sales staff database, list the employee ID, employee name,their years of tenure with the company, their salary and their age Example1: Arrange it by descending years of tenure (Chopping block 1) PROC SQL; select employee_id, employee_name, int(yrdif(emp_Hire_date,today(),"ACTUAL")) as tenure, Salary,int(yrdif(Birth_date,today(),'ACTUAL')) as Age from arturo.salesstaff order by tenure desc; quit; • .2 Ordering Data PARTIAL PROC SQL OUTPUT …. The SAS System • • • • • • • • • • • • 17:38 Wednesday, May 26, 2010 48 Employee ID Employee_Name tenure Salary Age --------------------------------------------------------------------------------------------_______ 120172 Comber, Edwin 36 $28,345 66 120174 Simms, Doungkamol 36 $26,850 66 121086 Plybon, John-Michael 36 $26,820 65 120151 Phaiyakounh, Julianna 36 $26,520 65 121035 Blackley, James 36 $26,460 66 120166 Nowd, Fadi 36 $30,660 65 121060 Spofford, Elizabeth 36 $28,800 65 121073 Court, Donald 36 $27,100 61 121075 Sugg, Kasha 36 $28,395 65 120167 Tilley, Kimiko 36 $25,185 56 121138 Tolley, Hershell 36 $27,265 61 120154 Hayawardhana, Caterina 36 $30,490 65 ** - Tenure is in descending order .2 Producing an Ordered Report • Remember to sort the output in descending order of tenure and then by Age Proc SQL; select employee_id, employee_name, int(yrdif(emp_Hire_date,today(),"ACTUAL")) as tenure, Salary, int(yrdif(Birth_date,today(),'ACTUAL')) as Age from arturo.salesstaff where salary gt 30000 order by tenure desc, salary desc; Producing an Ordered Report • The Output Sample The SAS System 17:38 Wednesday, May 26, 2010 53 Employee Annual Employee ID Employee_Name tenure Salary Age --------------------------------------------------------------------------------------120166 Nowd, Fadi 36 $30,660 65 120154 Hayawardhana, Caterina 36 $30,490 65 121081 Knudson, Susie 34 $30,235 61 120125 Hofmeister, Fong 31 $32,040 55 120129 Roebuck, Alvin 24 $30,070 45 120159 Phoumirath, Lynelle 23 $30,765 46 120158 Pilgrim, Daniel 22 $36,605 45 121080 Chinnis, Kumar 22 $32,235 51 121021 Farren, Priscilla 16 $32,985 35 .2a Enhancing Query Output You can use SAS formats and tables to customize PROC SQL Output. In the SELECT list, after the column name, but before the commas that separate the columns, you can include the following: • Text in quotation marks (ANSI) or the label = Column modifier (SAS enhancement) to alter the column heading ie use labels instead of variable names • The FORMAT = column modifier to alter the appearance of the values in that column ie. Formatting cash amounts with dollar sign and commas PROC SQL: proc sql; select employee_id label= "Employee Identifier" , employee_name label= "Employee Name" , int(yrdif(emp_Hire_date,today(),"ACTUAL")) as tenure label="Years of Service", Salary label="Income" format = dollar12., int(yrdif(Birth_date,today(),'ACTUAL')) as Age label="Employee Age" from learn.salesstaff where salary gt 30000 order by tenure desc, salary desc; quit; Enhancing Query Output PARTIAL PROC SQL OUTPUT …. The SAS System 13:22 Thursday, May 27, 2010 Employee Identifier Employee Name 2 Years of Service Employee Income Age -----------------------------------------------------------------------------120166 Nowd, Fadi 120154 Hayawardhana, Caterina 121081 Knudson, Susie 120125 Hofmeister, Fong 120129 Roebuck, Alvin ** - Tenure is in descending order 36 36 34 31 24 $30,660 $30,490 $30,235 $32,040 $30,070 65 65 61 55 45 .2b Business Scenario Produce a report of salary listing of active employees who has wages above $30000 + their 7% bonus. The requestor provided this sketch of the desired report. Proposed Annual Savings Employee Number 9999 Salary + bonus: $32012.32 Additional Techniques to use: • define a new column containing the same constant character value for every row • Using SAS titles and footnotes • Use a combination of these techniques to produce the proposed Annual Savings Report .2 Enhancing Query Output The code: PROC SQL; title 'Annual Savings Plan 2011'; proc sql; select employee_id label="Employee ID", Salary*1.07 as NewSalary label="Salary 2011" format=dollar12.2 from arturo.salesstaff where Salary > 30000 and emp_term_date < 1 order by NewSalary desc; quit; • • • • • • TITLE and FOOTNOTE statements must precede the SELECT statement. PROC SQL has an option, DQUOTE=, which specifies whether PROC SQL treats values within double quotation marks (" ") as variables or strings. With the default, DQUOTE=SAS, values within double quotation marks are treated as text strings. With DQUOTE=ANSI, PROC SQL treats a quoted value as a variable. This feature enables you to use reserved words such as AS, JOIN, GROUP, or DBMS names and other names that are not normally permissible in SAS, such as table names, column names, or aliases. The quoted value can contain any character. Values in single quotation marks are always treated as text strings. Enhancing Query Output PARTIAL PROC SQL OUTPUT …. Annual Savings Plan 2011 14:51 Thursday, May 27, 2010 13 Employee ID Salary 2011 -----------------------------------120158 $39,167.35 121063 $38,509.30 121021 $35,293.95 121099 $35,015.75 120135 $34,764.30 121085 $34,491.45 121080 $34,491.45 121022 $34,464.70 120166 $34,446.51 .3 Summarizing Data Objectives Use functions to create summary queries. Group data and produce summary statistics for each group. 24 Summary Functions How a summary function works in SQL depends on the number of columns specified. If the summary function specifies only one column, the statistic is calculated for the column (using values from one or more rows). If the summary function specifies more than one column, the statistic is calculated for the row (using values from the listed columns). 25 The SUM Function (Review) The SUM function returns the sum of the non-missing arguments. General form of the SUM function: SUM(argument1<,argument2, ...>) argument includes numeric constants, expressions, or variable names. Only when all arguments are missing will the SUM function return a missing value. We need to find the total salary for all employees in the company. /*Find total salary for all active employees*/ proc sql; Title 'Total Salary For Active Employees'; select "TOTAL:" , sum(Salary) format =comma12.2 from greg.salesstaff where Emp_Term_Date is missing; quit; 26 Total Salary For Active Employees 23 08:24 Saturday, May 29, 2010 ------------------------------TOTAL: 3,512,777.25 .4 The COUNT Function The COUNT function returns the number of rows returned by a query. General form of the COUNT function: COUNT(*|argument) argument can be the following: * (asterisk), which counts all rows a column name, which counts the number of non-missing values in that column 33 Summary Functions Example: Determine the total number of current employees. proc sql; select count(*) as Count from learn.salestaff where emp_term_date is missing ; quit; PROC SQL Output Count ƒƒƒƒƒƒƒƒ 308 s103d07 34 Summary Functions A few commonly used summary functions are listed. Both ANSI SQL and SAS functions can be used in PROC SQL. SQL SAS AVG MEAN COUNT FREQ, N MAX MAX returns the largest value. MIN MIN returns the smallest non-missing value. SUM SUM returns the sum of non-missing values. NMISS 35 Description returns the mean (average) value. returns the number of non-missing values. counts the number of missing values. STD returns the standard deviation. VAR returns the variance. Proc SQL: Data Grouping .5 Grouping Data We can produce output calculated by group using SQL. Here, we calculate the average salary by gender: ods rtf file="grouping1.rtf"; proc sql; title "Grouping by Gender"; select Gender, avg(Salary) as Average_Salary format=dollar8. from learn.salesstaff group by gender; ods rtf close; run; Gender F M Average_Salary $27,924 $27,955 Groups and Subgroups We can produce output calculated by group and subgroup. Here, we calculate counts for each gender by job title: ods rtf file="grouping2.rtf"; proc sql; title "Counts by Job Title"; select Job_Title as Title, Gender, count(*) as Counts from staff group by Job_Title, gender; ods rtf close; run; Job_Title Sales Rep. I Gender F Counts 21 Sales Rep. I M 42 Sales Rep. II F 25 Sales Rep. II M 25 Sales Rep. III F 15 Sales Rep. III M 19 Sales Rep. IV F 7 Sales Rep. IV M 9 .6 Using the WHERE Clause We can create output specified within particular restraints by using the WHERE clause. Here, we have the average salaries for women employees by job title: ods rtf file="grouping3.rtf"; proc sql; title "Average Salary of Women Employees"; select job_title as Title, avg(salary) format=dollar8. as Average from staff where gender="F" group by job_title; ods rtf close; run; Job_Title Sales Rep. I Average $26,339 Sales Rep. II $27,368 Sales Rep. III $29,436 Sales Rep. IV $31,420 .7 Using the HAVING Clause Alternatively, we can use the HAVING clause to specify which data we want to display. Here, we count the number of women employees by job title. ods rtf file="grouping4.rtf"; proc sql; title "Count by Title of Women Employees"; select job_title as Title, count(*) as Women from staff group by Job_title, gender having gender eq "F" order by Count desc; ods rtf close; run; Job_Title Sales Rep. II Women 25 Sales Rep. I 21 Sales Rep. III 15 Sales Rep. IV 7 .8 Using the FIND Function The FIND function is a useful tool for locating data and counting data. Here, we use the FIND function to count the number of women employees and display salaries by job title. ods rtf file="grouping5.rtf"; proc sql; title "Summary Information of Women Employees"; select Job_title, count(find(gender, "F", "I")>0) as Women, avg(salary) format=dollar8. as Average from staff group by Job_title, gender having gender="F"; ods rtf close; run; Job_Title Sales Rep. I Women Average 21 $26,339 Sales Rep. II 25 $27,368 Sales Rep. III Sales Rep. IV 15 $29,436 7 $31,420 The find function • The FIND function returns the starting position of a substring within a string. NOTICE: the string must be character value. • The general form of FIND function is: FIND(string,substring<,modifier(s)><,startpos>) STRING --- constant, variable, or expression to be searched. SUBSTRING --- constant, variable, or expression sought within the string. MODIFIER(S) --- i=ignore case, t=trim trailing blanks. STARTPOS --- an integer specifying the start position and direction of the search. The find function • EXP: find the starting position of the substring F in the character variable Gender. proc sql; select gender, find(Gender,"F","t") "female_employee" from learn.Salesstaff ; quit; Because “F” is in the first position of the first substring, so the the value returned by FIND function is “1”; and “F” is not is the second substring , so the returned value is “0” .9 Using Boolean Expressions • Boolean expressions evaluate to TRUE(1) or FALSE(0). • They are used in this SELECT list to distinguish rows that have “F” in the Gender column. proc sql; select Job_Title,Gender, (find(Gender,"F","i")>0) "female_employee" from learn.Salesstaff ; quit; The boolean expression will produce the value 1 when Gender contains “F” and 0 when is does not. Using Boolean Expressions Female_Employee to Male_Employee Partial output Employee female_ Employee Job Title Gender employee ========================================== Sales Rep. I F 1 Sales Rep. I F 1 Sales Rep. II M 0 Sales Rep. III F 1 Sales Rep. I F 1 Sales Rep. II F 1 Sales Rep. II M 0 Sales Rep. II M 0 Sales Rep. IV M 0 Sales Rep. I M 0 Sales Rep. II M 0 Sales Rep. II M 0 Sales Rep. II F 1 Using Boolean Expressions Use the counted value to calculate the percentage. proc sql; title "Female_Employee to Male_Employee Ratios"; select job_title, sum((find(Gender,"F","i")>0)) as Female_employee, sum((find(Gender,"F","i")=0)) as Male_employee, calculated Female_employee/calculated Male_employee "F/M Ratio" format=percent8.1 from learn.salesstaff group by job_title ; quit; Using Boolean Expressions Female_Employee to Male_Employee Ratios Female_ Male_ F/M Employee Job Title employee employee Ratio ================================================================= Sales Rep. I 21 42 50.0% Sales Rep. II 25 25 100.0% Sales Rep. III 15 19 78.9% Sales Rep. IV 7 9 77.8% Summary • In summary, the SQL procedure provides the following capabilities: – – – – – – ordering of report enhanced report production through labels and formats summarization of data use of counts and data grouping use of selection criteria like where and having use of find function and boolean expressions These characteristics makes the analysis of data a breeze…. thanks to PROC SQL.