Professional Seminar
Northwestern Polytechnic University
By Dr. Michael M Cheng

Quiz
Choose the correct answer. What is SAS?
a. SAS is a highly contagious disease found in the wintertime in Asia.
b. SAS is sardines and salmon.
c. SAS is software that computes statistics only.
d. SAS is a fourth-generation computer language capable of full-featured computer programming.
e. None of the above.

SAS (SAS System)
A computer software system that consists of several products that provide data retrieval, management, and analysis capabilities in addition to programming (SAS Institute, Inc.).
SAS is a problem-solving tool.

Heuristic Problem Solving
[Figure: Image Mode 1, Linguistic Mode 1, Image Mode 2, Linguistic Mode 2]
The interaction between the image mode and the linguistic mode is called heuristic problem solving.

Psychology of Communication, by George Miller
- Coding
- Decoding
- Channel capacity
- Magic number: 7 plus or minus 2

For example: 2121568931
Now recall it: ??????????
Chunked: 212-156-8931

Ten ungrouped digits exceed a span of about 7 plus or minus 2 items; grouped as 212-156-8931, the same digits become three chunks, which are much easier to remember.

A SAS program is composed of SAS statements: some are used in DATA steps, some in PROC steps, and some in either step.
Most SAS statements begin with an identifying keyword, and every SAS statement ends with a semicolon (;).
SAS statements are free-format: a statement can begin in any column and can span more than one line.

A SAS data set is a collection of data values arranged in a rectangular table. The columns in the table are called variables, and the rows are called observations (or records). There are two kinds of variables: character variables and numeric variables.

                VARIABLES
                NAME     SEX  AGE  HEIGHT  WEIGHT
                ---------------------------------
observation 1   JOHN     M    12   59.0     99.5
observation 2   JAMES    M    12   57.3     83.0
observation 3   ALFRED   M    14   69.0    112.5
   . . .
observation 19  ALICE    F    13   56.5     84.0

DATA CLASS;
  INPUT NAME $1-8 SEX $11 AGE 13-14 HEIGHT 16-19 WEIGHT 21-25;
  CARDS;
data lines
;
PROC PRINT DATA=CLASS;
RUN;
PROC MEANS DATA=CLASS;
  VAR HEIGHT WEIGHT;
RUN;

Creating SAS data sets
[Figure: raw data -> DATA CLASS; INPUT NAME $1-8 SEX $11 AGE 13-14 HEIGHT 16-19 WEIGHT 21-25; CARDS; -> SAS data set CLASS]

A listing of the raw data, as entered after the CARDS statement (NAME in columns 1-8, SEX in column 11, AGE in columns 13-14, HEIGHT in columns 16-19, WEIGHT in columns 21-25):

CARDS;
JOHN      M 12 59.0  99.5
JAMES     M 12 57.3  83.0
ALFRED    M 14 69.0 112.5
WILLIAM   M 15 66.5 112.0
JEFFREY   M 13 62.5  84.0
RONALD    M 15 67.0 133.0
THOMAS    M 11 57.5  85.0
PHILIP    M 16 72.0 150.0
ROBERT    M 12 64.8 128.0
HENRY     M 14 63.5 102.5
JANET     F 15 62.5 112.5
JOYCE     F 11 51.3  50.5
JUDY      F 14 64.3  90.0
CAROL     F 14 62.8 102.5
JANE      F 12 59.8  84.5
LOUISE    F 12 56.3  77.0
BARBARA   F 13 65.3  98.0
MARY      F 15 66.5 112.0
ALICE     F 13 56.5  84.0
;

PROC PRINT DATA=CLASS; output:

OBS  NAME     SEX  AGE  HEIGHT  WEIGHT
  1  JOHN      M    12    59.0    99.5
  2  JAMES     M    12    57.3    83.0
  3  ALFRED    M    14    69.0   112.5
  4  WILLIAM   M    15    66.5   112.0
  5  JEFFREY   M    13    62.5    84.0
  6  RONALD    M    15    67.0   133.0
  7  THOMAS    M    11    57.5    85.0
  8  PHILIP    M    16    72.0   150.0
  9  ROBERT    M    12    64.8   128.0
 10  HENRY     M    14    63.5   102.5
 11  JANET     F    15    62.5   112.5
 12  JOYCE     F    11    51.3    50.5
 13  JUDY      F    14    64.3    90.0
 14  CAROL     F    14    62.8   102.5
 15  JANE      F    12    59.8    84.5
 16  LOUISE    F    12    56.3    77.0
 17  BARBARA   F    13    65.3    98.0
 18  MARY      F    15    66.5   112.0
 19  ALICE     F    13    56.5    84.0

PROC MEANS DATA=CLASS; VAR HEIGHT WEIGHT; output:

VARIABLE   N        MEAN   STANDARD DEVIATION   MINIMUM VALUE   MAXIMUM VALUE   STD ERROR OF MEAN
HEIGHT    19   62.336842            5.1270752      51.3000000       72.000000          1.17623173
WEIGHT    19  100.026316           22.7739335      50.5000000      150.000000          5.22469867

THE PROC STEP
The PROC (or PROCEDURE) statement is used to call a SAS procedure. SAS procedures are computer programs that:
- read SAS data sets,
- compute statistics,
- print results, and
- create SAS data sets.

For example:
PROC MEANS SUM MAXDEC=2 DATA=CLASS;   /* print only the sum of each numeric variable, with 2 decimals */
PROC CONTENTS DATA=CLASS;             /* describe the data set and its variables */
PROC SORT DATA=CLASS;                 /* sort by SEX; within SEX, from heaviest to lightest */
  BY SEX DESCENDING WEIGHT;

Data Transformations

Assignment statement
Assignment statements are used to create new variables and to modify the values of existing variables. SAS evaluates an expression and assigns the result to a variable:
variable = expression;
For example: X = 1 + 2;

Example:
1. Read three variables (YEAR, REVENUE, and EXPENSE) into a SAS data set.
2. Add a variable named INCOME, which is the difference between REVENUE and EXPENSE.
3. Change the values of YEAR from 2 digits to 4 digits.

DATA PROFITS;
  INPUT YEAR REVENUE EXPENSE;
  INCOME = REVENUE - EXPENSE;
  YEAR = YEAR + 2000;
  CARDS;
00 5650 1050
01 6280 1140
;
PROC PRINT;
RUN;

PROC PRINT output:

OBS  YEAR  REVENUE  EXPENSE  INCOME
  1  2000     5650     1050    4600
  2  2001     6280     1140    5140

SAS functions
Selected functions that compute simple statistics:
SUM    sum
MEAN   arithmetic mean
VAR    variance
MIN    minimum value
MAX    maximum value
STD    standard deviation

Example:
Given: Temperature data at a specific location are recorded every hour on the hour for several days. Each record in a file represents one day and contains the date and the 24 recorded temperatures for that date.
Objective: Create a SAS data set that contains the date, the 24 hourly temperatures, and the average, minimum, and maximum temperature for each day.

DATA TEMP;
  /* @11 moves the input pointer to column 11; (T1-T24) (2.) reads 24 values, 2 columns each */
  INPUT DATE $1-7 @11 (T1-T24) (2.);
  AVGTEMP = MEAN(OF T1-T24);
  MINTEMP = MIN(OF T1-T24);
  MAXTEMP = MAX(OF T1-T24);
  CARDS;
data lines
;

Program data vector:
DATE  T1  . . .  AVGTEMP  MINTEMP  MAXTEMP

The RETAIN statement
SAS normally resets variables created by INPUT and assignment statements to missing in the program data vector before each execution of the DATA step.
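The reset rule, and the two ways around it described next (the RETAIN statement and the SUM statement), can be simulated outside SAS. The sketch below is Python, not SAS: it assumes one loop iteration per input record, models the SAS missing value (.) as None, and uses the scores 10 5 3 7 . 6 4 from the ADD examples. The function names are hypothetical.

```python
from typing import List, Optional

Value = Optional[float]  # None plays the role of the SAS missing value "."


def sas_add(x: Value, y: Value) -> Value:
    """SAS '+' operator: the result is missing if either operand is missing."""
    if x is None or y is None:
        return None
    return x + y


def total_without_retain(scores: List[Value]) -> List[Value]:
    """TOTAL = TOTAL + SCORE; with no RETAIN: TOTAL starts every pass missing."""
    out = []
    for score in scores:
        total: Value = None            # reset to missing before each iteration
        total = sas_add(total, score)
        out.append(total)
    return out


def total_with_retain(scores: List[Value]) -> List[Value]:
    """RETAIN TOTAL 0; TOTAL = TOTAL + SCORE; the value carries across passes."""
    out = []
    total: Value = 0                   # initial value from RETAIN
    for score in scores:
        total = sas_add(total, score)  # one missing SCORE makes TOTAL missing for good
        out.append(total)
    return out


def total_with_sum_statement(scores: List[Value]) -> List[Value]:
    """TOTAL + SCORE; retained, starts at 0, and a missing SCORE counts as 0."""
    out = []
    total: float = 0
    for score in scores:
        total += 0 if score is None else score
        out.append(total)
    return out


scores = [10, 5, 3, 7, None, 6, 4]
print(total_without_retain(scores))      # [None, None, None, None, None, None, None]
print(total_with_retain(scores))         # [10, 15, 18, 25, None, None, None]
print(total_with_sum_statement(scores))  # [10, 15, 18, 25, 25, 31, 35]
```

The three outputs show why both mechanisms exist: without RETAIN nothing accumulates, RETAIN alone lets one missing value poison every later total, and the SUM statement both retains and skips missing values.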
A RETAIN statement can be used to:
- retain variable values from the previous execution of the DATA step, and
- give initial values to variables.

Example: Accumulate totals and count observations.

DATA ADD;
  RETAIN COUNT 0 TOTAL 0;
  INPUT SCORE;
  COUNT = COUNT + 1;
  TOTAL = TOTAL + SCORE;
  CARDS;
10
5
3
7
.
6
4
;
PROC PRINT;
RUN;

Program data vector: COUNT  TOTAL  SCORE

Because the fifth score is missing (.), TOTAL + SCORE is missing for that observation, and since TOTAL is retained it stays missing for the rest of the data. Writing TOTAL = SUM(TOTAL, SCORE); instead skips the missing value.

The SUM statement
The SUM statement is a special assignment statement of the form variable + expression; that accumulates values from one observation to the next. It automatically retains the variable, gives it an initial value of 0, and treats a missing value of the expression as zero.

Example: Accumulate totals and count observations.

DATA ADD;
  INPUT SCORE;
  COUNT + 1;
  TOTAL + SCORE;
  CARDS;
10
5
3
7
.
6
4
;
PROC PRINT;
RUN;

CONDITIONAL EXECUTION OF SAS STATEMENTS

IF-THEN/ELSE Statements
Use the IF-THEN statement when you want a SAS statement to execute only when some expression is true.

Numeric comparison:
IF CODE=1 THEN RESPONSE='GOOD';
IF CODE=2 THEN RESPONSE='FAIR';
IF CODE=3 THEN RESPONSE='POOR';

For efficiency, use ELSE statements: once a condition is true, SAS skips the remaining ELSE IF tests.
IF CODE=1 THEN RESPONSE='GOOD';
ELSE IF CODE=2 THEN RESPONSE='FAIR';
ELSE IF CODE=3 THEN RESPONSE='POOR';

Character comparison:
DATA CLASS;
  INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
  IF SEX='M' THEN SEX='MALE';
  ELSE SEX='FEMALE';
  CARDS;
data lines
;

Comparison operators
LT   <    less than
GT   >    greater than
EQ   =    equal to
LE   <=   less than or equal to
GE   >=   greater than or equal to
NE   ^=   not equal to
NL        not less than
NG        not greater than

Logical operators
OR   |    or, either
AND  &    and
NOT  ^    not, negation

DO and END statements
A DO statement begins a group of statements that ends with a matching END statement. When the DO statement is executed, all statements between the DO and its matching END are executed.
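In Python, an indented block plays the role of a DO ... END group: the IF decides whether the whole group runs. The sketch below is a Python analogue of the EMPLOY example that follows, not SAS code; the records are typed in by hand from its data lines, None stands for the blank (missing) COM field, and the names Record and classify are hypothetical.

```python
from typing import List, Optional, Tuple

# (NAME, DEPTNO, COM, SALARY); None marks the blank COM columns (missing in SAS)
Record = Tuple[str, int, Optional[int], int]

EMPLOY: List[Record] = [
    ("JOHNSON", 201, 1500, 18000),
    ("MOSSER", 101, None, 21000),
    ("LARKIN", 101, None, 24000),
    ("GARRETT", 201, 4800, 18000),
]


def classify(record: Record) -> Tuple[str, Optional[int]]:
    """Return (DEPT, GROSSPAY) for one record, mirroring the two DO groups."""
    _name, deptno, com, salary = record
    if deptno == 201:  # IF DEPTNO=201 THEN DO; ... END;
        dept = "SALES"
        # In SAS, COM + SALARY would be missing if COM were missing
        grosspay = None if com is None else com + salary
    else:  # ELSE DO; ... END;
        dept = "ADMIN"
        grosspay = salary
    return dept, grosspay


for rec in EMPLOY:
    dept, grosspay = classify(rec)
    print(rec[0], dept, grosspay)
```

Run as written, it prints JOHNSON SALES 19500, MOSSER ADMIN 21000, LARKIN ADMIN 24000, and GARRETT SALES 22800, one per line, which matches the DEPT and GROSSPAY columns of the PROC PRINT output.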
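For example:

DATA EMPLOY;
  INPUT NAME $1-8 DEPTNO 10-12 COM 14-17 SALARY 19-23;
  IF DEPTNO=201 THEN DO;
    DEPT='SALES';
    GROSSPAY = COM + SALARY;
  END;
  ELSE DO;
    DEPT='ADMIN';
    GROSSPAY = SALARY;
  END;
  CARDS;
JOHNSON  201 1500 18000
MOSSER   101      21000
LARKIN   101      24000
GARRETT  201 4800 18000
;
PROC PRINT;
RUN;

PROC PRINT output:

OBS  NAME     DEPTNO   COM  SALARY  DEPT   GROSSPAY
  1  JOHNSON     201  1500   18000  SALES     19500
  2  MOSSER      101     .   21000  ADMIN     21000
  3  LARKIN      101     .   24000  ADMIN     24000
  4  GARRETT     201  4800   18000  SALES     22800

Example: match-merging. Each rate table is sorted by ZIP and merged with the control table CTL_TBL. The IN= data set option creates a temporary variable that is 1 when the current observation includes data from that data set (A for the rate table, B for CTL_TBL), so IF A & B; keeps only the ZIP codes found in both. Every data set in a BY merge must be sorted by the BY variable.

PROC SORT DATA=RATE_A; BY ZIP;
PROC SORT DATA=RATE_B; BY ZIP;
PROC SORT DATA=RATE_C; BY ZIP;
PROC SORT DATA=CTL_TBL; BY ZIP;   /* needed unless CTL_TBL is already in ZIP order */

DATA TMTL;
  MERGE RATE_A(IN=A) CTL_TBL(IN=B);
  BY ZIP;
  IF A & B;

DATA TMMR;
  MERGE RATE_B(IN=A) CTL_TBL(IN=B);
  BY ZIP;
  IF A & B;

DATA TMCR;
  MERGE RATE_C(IN=A) CTL_TBL(IN=B);
  BY ZIP;
  IF A & B;
RUN;

Conclusion
1. SAS is a fourth-generation computer language.
2. SAS is a problem-solving tool.
3. It makes your life easier (less stressful).

THE END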