Lesson 4 Overview

• Descriptive Procedures
• Procedures FREQ, CORR, REG, SGPLOT
• Comment and Option Statements
• Program 4 in course notes
• LSB: See syllabus
• LSB: Chapter 11 – Debugging Programs

Program 4

DATA weight;
  INFILE 'C:\SAS_Files\tomhs.dat';
  INPUT @1   ptid   $10.
        @12  clinic  $1.
        @27  age      2.
        @30  sex      1.
        @58  height   4.
        @85  weight   5.
        @140 cholbl   3. ;
  bmi = (weight*703.0768)/(height*height);
RUN;

PROC FREQ DATA=weight;
  TABLES clinic sex;
  TITLE 'Frequency Distribution of Clinical Center and Gender';
RUN;

Frequency Distribution of Clinical Center and Gender

The FREQ Procedure

                                    Cumulative    Cumulative
clinic    Frequency     Percent      Frequency       Percent
-------------------------------------------------------------
A               18       18.00             18         18.00
B               29       29.00             47         47.00
C               36       36.00             83         83.00
D               17       17.00            100        100.00

                                    Cumulative    Cumulative
sex       Frequency     Percent      Frequency       Percent
-------------------------------------------------------------
1               73       73.00             73         73.00
2               27       27.00            100        100.00

PROC FREQ DATA=weight;
  TABLES clinic / NOCUM;
  TITLE 'Frequency Distribution of Clinical Center';
  TITLE2 '(No Cumulative Percentages)';
RUN;

Frequency Distribution of Clinical Center
(No Cumulative Percentages)

The FREQ Procedure

clinic    Frequency     Percent
-------------------------------
A               18       18.00
B               29       29.00
C               36       36.00
D               17       17.00

* 2-Way Frequency Tables;
PROC FREQ DATA=weight;
  TABLES sex*clinic;
  TITLE 'Cross Tabulation of Clinical Center and Sex';
RUN;

* Adding a two-way plot;
PROC FREQ DATA=weight;
  TABLES sex*clinic / PLOTS=FREQPLOT(TWOWAY=GROUPHORIZONTAL);
RUN;

Cross Tabulation of Clinical Center and Sex

The FREQ Procedure

Table of sex by clinic

sex       clinic

Frequency|
Percent  |
Row Pct  |
Col Pct  |A       |B       |C       |D       |  Total
---------+--------+--------+--------+--------+
1        |     12 |     20 |     30 |     11 |     73
         |  12.00 |  20.00 |  30.00 |  11.00 |  73.00
         |  16.44 |  27.40 |  41.10 |  15.07 |
         |  66.67 |  68.97 |  83.33 |  64.71 |
---------+--------+--------+--------+--------+
2        |      6 |      9 |      6 |      6 |     27
         |   6.00 |   9.00 |   6.00 |   6.00 |  27.00
         |  22.22 |  33.33 |  22.22 |  22.22 |
         |  33.33 |  31.03 |  16.67 |  35.29 |
---------+--------+--------+--------+--------+
Total          18       29       36       17      100
            18.00    29.00    36.00    17.00   100.00

Note: the Col Pct value 66.67 in the sex = 1, clinic A cell is the percent men in clinic A (12 of 18).

* Getting only the counts;
PROC FREQ DATA=weight;
  TABLES sex*clinic / nopercent norow nocol;
RUN;

sex       clinic

Frequency|A       |B       |C       |D       |  Total
---------+--------+--------+--------+--------+
1        |     12 |     20 |     30 |     11 |     73
---------+--------+--------+--------+--------+
2        |      6 |      9 |      6 |      6 |     27
---------+--------+--------+--------+--------+
Total          18       29       36       17      100

OTHER USEFUL TABLE OPTIONS
• CHISQ – performs chi-square analyses for 2-way tables
• MISSING – includes missing data as a separate category
• LIST – makes a condensed table (useful when looking at 3-way or higher tables)

* Using PROC SGPLOT for bar charts;
ODS GRAPHICS / WIDTH=300px;
PROC SGPLOT DATA=weight;
  VBAR clinic;
  TITLE "Vertical Bar Chart of Clinical Center";
  LABEL clinic = "Clinical Center";
RUN;

The plot can be embedded in an HTML document or kept as a separate file, which can then be inserted into Office documents.
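To try the CHISQ, MISSING, and LIST options from the table-options list, here is a minimal sketch. It does not read tomhs.dat; instead it rebuilds the sex-by-clinic cell counts from the cross tabulation above and uses a WEIGHT statement so that each row stands for that many participants. The data set names sexclinic and chisq_results and the variable count are hypothetical, chosen only for illustration. LIST goes in its own TABLES statement because PROC FREQ does not compute statistics such as CHISQ for a table displayed in list format.

```sas
/* Hypothetical data set: cell counts copied from the sex by clinic table */
DATA sexclinic;
  INPUT sex clinic $ count;
  DATALINES;
1 A 12
1 B 20
1 C 30
1 D 11
2 A 6
2 B 9
2 C 6
2 D 6
;
RUN;

PROC FREQ DATA=sexclinic;
  WEIGHT count;                               /* each row counts as COUNT observations  */
  TABLES sex*clinic / CHISQ NOROW NOPERCENT;  /* chi-square test of sex vs. clinic      */
  TABLES sex*clinic / LIST MISSING;           /* one line per sex-clinic combination    */
  ODS OUTPUT ChiSq=chisq_results;             /* save the chi-square statistics         */
  TITLE 'Chi-Square Test and List Format for Sex by Clinical Center';
RUN;
```

Nothing is missing in these counts, so MISSING changes nothing here; with the real data it would add a category for anyone whose sex or clinic was not recorded. Two of the eight cells have expected counts below 5 (for example 27 × 17 / 100 = 4.59 for women in clinic D), so PROC FREQ will likely print a warning that the chi-square test may not be valid.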
* Same plot displayed horizontally;
PROC SGPLOT DATA=weight;
  HBAR clinic;
  TITLE "Horizontal Bar Chart of Clinical Center";
  LABEL clinic = "Clinical Center";
RUN;

* DATALABEL puts values on top of bars;
PROC SGPLOT DATA=weight;
  YAXIS LABEL = "Mean Cholesterol" VALUES = (0 to 300 by 50);
  VBAR clinic / RESPONSE=cholbl STAT=MEAN DATALABEL;
  TITLE 'Mean Cholesterol by Clinical Center';
  LABEL clinic = "Clinical Center";
RUN;

* Using SGPLOT to make regression plot;
PROC SGPLOT DATA=weight;
  YAXIS LABEL = "Body Mass Index (BMI)";
  XAXIS LABEL = "Age (y)";
  REG X=age Y=bmi / CLM;
  WHERE sex = 2;
  TITLE 'Plot of BMI and Age for Women';
RUN;

PROC CORR DATA=weight;
  VAR bmi age;
  WHERE sex = 2;
  TITLE 'Correlation of BMI and Age for Women';
RUN;

Pearson Correlation Coefficients, N = 27
Prob > |r| under H0: Rho=0

               bmi          age
bmi        1.00000     -0.44397
                         0.0203
age       -0.44397      1.00000
            0.0203

The off-diagonal value -0.44397 is the correlation coefficient; 0.0203 below it is the p-value testing whether the correlation is significantly different from zero.

ODS GRAPHICS ON;
PROC REG DATA=weight;
  MODEL bmi = age;
  WHERE sex = 2;
  TITLE 'Simple Linear Regression';
RUN;

Partial Output

Parameter Estimates

                     Parameter     Standard
Variable      DF      Estimate        Error    t Value    Pr > |t|
Intercept      1      43.61312      6.40001       6.81      <.0001
age            1      -0.28964      0.11710      -2.47      0.0205

Regression equation: bmi = 43.61 - 0.29*age

*Note: PROC REG has many plotting options. With ODS GRAPHICS ON it produces many plots by default.

Fit plot from PROC REG

Using Comments in a Program

Two purposes:
1. Documenting your program
2. Temporarily removing part of a program
See page 3 of LSB.

Examples of Comment Code

* Run proc univariate for variable BMI;

*---------------------------------------------------------------------*
  High resolution graphs can also be produced. The following makes a
  plot of a histogram with the best fit normal curve and summary
  statistics.
*---------------------------------------------------------------------*;
PROC UNIVARIATE DATA = weight PLOT;
  * ID ptid;
  VAR bmi;

PROC UNIVARIATE DATA = weight /*PLOT*/;
  VAR bmi;

Temporarily Removing Code: we do not want to produce the histogram now, but may want to run it at another time.

PROC UNIVARIATE DATA = weight;
  VAR bmi;
  /*
  HISTOGRAM bmi / NORMAL MIDPOINTS=20 to 40 by 2;
  INSET N    = 'N'    (5.0)
        MEAN = 'Mean' (5.1)
        STD  = 'Sdev' (5.1)
        MIN  = 'Min'  (5.1)
        MAX  = 'Max'  (5.1) / POS=lm HEADER='Summary Statistics';
  */
  LABEL bmi = 'Body Mass Index (kg/m2)';
  TITLE 'Histogram of BMI';
RUN;

What is wrong with this program?

* This is my first SAS program
DATA bp;
INFILE ... (more lines)

Option Statement

OPTION NOCENTER LINESIZE = 78;
OPTION NODATE NONUMBER;

• Many, many options (run PROC OPTIONS to list them)
• Usually put at top of program
• Can put in autoexec.sas so they will always be in effect
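As a minimal sketch of the last two points, the lines below could sit at the top of a program or in an autoexec.sas file. They are written with the documented keyword OPTIONS and use the same options as the slide. PROC OPTIONS with OPTION= writes the current setting of a single option to the log, and the GETOPTION function returns that setting as a character string; the variable name ls is hypothetical.

```sas
/* Options from the slide; they stay in effect for the rest of the session */
OPTIONS NOCENTER LINESIZE=78 NODATE NONUMBER;

/* Write the current LINESIZE setting to the log */
PROC OPTIONS OPTION=LINESIZE;
RUN;

/* GETOPTION returns an option's current value as text */
DATA _NULL_;
  ls = GETOPTION('LINESIZE');
  PUT 'Current LINESIZE: ' ls;
RUN;
```

When the OPTIONS statement lives in autoexec.sas, it runs before any program does, so every session starts with the same output layout.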