Lesson 4 Overview
• Descriptive Procedures
• Procedures FREQ, CORR, REG, SGPLOT
• Comment and Option Statements
• Program 4 in course notes
• LSB: See syllabus
• LSB: Chapter 11 – Debugging Programs
Program 4
DATA weight;
INFILE 'C:\SAS_Files\tomhs.dat';
INPUT @1 ptid $10.
@12 clinic $1.
@27 age 2.
@30 sex 1.
@58 height 4.
@85 weight 5.
@140 cholbl 3. ;
bmi = (weight*703.0768)/(height*height);
RUN;
PROC FREQ DATA=weight;
TABLES clinic sex ;
TITLE 'Frequency Distribution of Clinical Center and Gender';
RUN;
Frequency Distribution of Clinical Center and Gender

The FREQ Procedure

                                     Cumulative    Cumulative
clinic    Frequency     Percent       Frequency       Percent
-------------------------------------------------------------
A                18       18.00              18         18.00
B                29       29.00              47         47.00
C                36       36.00              83         83.00
D                17       17.00             100        100.00

                                     Cumulative    Cumulative
sex       Frequency     Percent       Frequency       Percent
-------------------------------------------------------------
1                73       73.00              73         73.00
2                27       27.00             100        100.00
PROC FREQ DATA=weight;
TABLES clinic / NOCUM ;
TITLE 'Frequency Distribution of Clinical Center';
TITLE2 '(No Cumulative Percentages)';
RUN;
Frequency Distribution of Clinical Center
(No Cumulative Percentages)

The FREQ Procedure

clinic    Frequency     Percent
-------------------------------
A                18       18.00
B                29       29.00
C                36       36.00
D                17       17.00
* 2-Way Frequency Tables ;
PROC FREQ DATA=weight;
TABLES sex*clinic ;
TITLE 'Cross Tabulation of Clinical Center and Sex';
RUN;
*Adding a two-way plot ;
PROC FREQ DATA=weight;
TABLES sex*clinic/
PLOTS=FREQPLOT(TWOWAY=GROUPHORIZONTAL);
RUN;
Cross Tabulation of Clinical Center and Sex

The FREQ Procedure

Table of sex by clinic

sex       clinic

Frequency|
Percent  |
Row Pct  |
Col Pct  |A       |B       |C       |D       |  Total
---------+--------+--------+--------+--------+
1        |     12 |     20 |     30 |     11 |     73
         |  12.00 |  20.00 |  30.00 |  11.00 |  73.00
         |  16.44 |  27.40 |  41.10 |  15.07 |
         |  66.67 |  68.97 |  83.33 |  64.71 |
---------+--------+--------+--------+--------+
2        |      6 |      9 |      6 |      6 |     27
         |   6.00 |   9.00 |   6.00 |   6.00 |  27.00
         |  22.22 |  33.33 |  22.22 |  22.22 |
         |  33.33 |  31.03 |  16.67 |  35.29 |
---------+--------+--------+--------+--------+
Total          18       29       36       17      100
            18.00    29.00    36.00    17.00   100.00

Percent men in clinic A: the Col Pct entry for sex 1, clinic A (12 of 18 = 66.67)
*Getting only the counts ;
PROC FREQ DATA=weight;
TABLES sex*clinic /
nopercent norow nocol;
RUN;
sex       clinic

Frequency|A       |B       |C       |D       |  Total
---------+--------+--------+--------+--------+
1        |     12 |     20 |     30 |     11 |     73
---------+--------+--------+--------+--------+
2        |      6 |      9 |      6 |      6 |     27
---------+--------+--------+--------+--------+
Total          18       29       36       17      100
OTHER USEFUL TABLE OPTIONS
• CHISQ – performs chi-square analyses for 2-way tables
• MISSING – includes missing data as a separate category
• LIST – makes condensed table (useful when looking at 3-way or higher tables)
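The options listed above go after the slash in the TABLES statement. The following is a minimal sketch using the weight dataset from Program 4; the title text is made up for illustration, and if the data contain no missing values for sex or clinic, MISSING will not change the output.

```sas
* Sketch: other TABLES options (title text is illustrative only);
PROC FREQ DATA=weight;
* Chi-square test of association between sex and clinic;
TABLES sex*clinic / CHISQ;
* Same table in condensed list form, with missing values as a category;
TABLES sex*clinic / MISSING LIST;
TITLE 'Sex by Clinic with CHISQ, MISSING, and LIST';
RUN;
```

A single PROC FREQ step can hold several TABLES statements, each with its own options.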
* Using PROC SGPLOT for bar charts;
ODS GRAPHICS / WIDTH=300px ;
PROC SGPLOT DATA=weight;
VBAR clinic;
TITLE "Vertical Bar Chart of Clinical Center";
LABEL clinic = "Clinical Center";
RUN;
The plot can be embedded in an HTML document or kept as a separate file. The file can be inserted into Office documents.
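One possible way to do both, sketched under assumptions: the folder C:\SAS_Files is reused from the INFILE statement in Program 4 and may not be where you want output, and the file names clinic_chart.html and clinic_bar are hypothetical.

```sas
* Sketch: write the chart to an HTML page and a separate PNG file;
* Folder and file names are assumptions, not from the course notes;
ODS HTML PATH="C:\SAS_Files" GPATH="C:\SAS_Files" FILE="clinic_chart.html";
ODS GRAPHICS / RESET IMAGENAME="clinic_bar" IMAGEFMT=PNG WIDTH=300px;
PROC SGPLOT DATA=weight;
VBAR clinic;
TITLE "Vertical Bar Chart of Clinical Center";
LABEL clinic = "Clinical Center";
RUN;
ODS HTML CLOSE;
```

The HTML page references the image, and the PNG written to the GPATH folder can be inserted into an Office document on its own.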
* Same plot displayed horizontally;
PROC SGPLOT DATA=weight;
HBAR clinic;
TITLE "Horizontal Bar Chart of Clinical Center";
LABEL clinic = "Clinical Center";
RUN;
* DATALABEL puts values on top of bar;
PROC SGPLOT;
YAXIS LABEL = "Mean Cholesterol"
VALUES = (0 to 300 by 50);
VBAR clinic/RESPONSE=cholbl STAT=MEAN DATALABEL ;
TITLE 'Mean Cholesterol by Clinical Center';
LABEL clinic = "Clinical Center";
RUN;
* Using SGPLOT to make regression plot;
PROC SGPLOT DATA=weight;
YAXIS LABEL = "Body Mass Index (BMI)" ;
XAXIS LABEL = "Age (y)" ;
REG X=age Y=bmi/CLM;
WHERE sex = 2;
TITLE 'Plot of BMI and Age for Women';
RUN;
PROC CORR DATA=weight;
VAR bmi age;
WHERE sex = 2;
TITLE 'Correlation of BMI and Age for Women';
RUN;
Pearson Correlation Coefficients, N = 27
Prob > |r| under H0: Rho=0

               bmi           age
bmi        1.00000      -0.44397
                          0.0203
age       -0.44397       1.00000
            0.0203

Top number in each cell: correlation coefficient
Bottom number: p-value testing whether the correlation is significantly different from zero
ODS GRAPHICS ;
PROC REG DATA=weight ;
MODEL bmi=age;
WHERE sex = 2;
TITLE 'Simple Linear Regression';
RUN;
Partial Output

Parameter Estimates

                     Parameter     Standard
Variable      DF      Estimate        Error    t Value    Pr > |t|
Intercept      1      43.61312      6.40001       6.81      <.0001
age            1      -0.28964      0.11710      -2.47      0.0205

Regression equation: bmi = 43.61 - 0.29*age
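As a quick illustration of how the fitted equation is used, the sketch below plugs one age into the unrounded estimates from the Parameter Estimates table. The age of 50 and the variable name bmi_hat are arbitrary choices for illustration and do not come from the course notes.

```sas
* Sketch: predicted BMI for a woman aged 50 (illustrative value);
DATA _NULL_;
age = 50;
* Intercept and slope copied from the Parameter Estimates table;
bmi_hat = 43.61312 - 0.28964*age;
* Write the result to the log with two decimals;
PUT bmi_hat= 6.2;
RUN;
```

The log shows bmi_hat=29.13, that is, 43.61312 - 14.482 rounded to two decimals.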
Note: PROC REG has many plotting options. With ODS GRAPHICS ON, it produces several plots by default.
Fit plot from PROC REG
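If only the fit plot is wanted, the default set of plots can be narrowed with the PLOTS= option on the PROC REG statement. This is a minimal sketch against the weight dataset from Program 4; the title text is made up for illustration.

```sas
* Sketch: request only the fit plot from PROC REG;
ODS GRAPHICS ON;
PROC REG DATA=weight PLOTS(ONLY)=FITPLOT;
MODEL bmi=age;
WHERE sex = 2;
TITLE 'Fit Plot Only';
RUN;
QUIT;
```

PROC REG is an interactive procedure, so the QUIT statement ends it explicitly.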
Using Comments in Program
Two Purposes
1. Documenting your program
2. Temporarily removing part of a program
See page 3 LSB
Examples of Comment Code
* Run proc univariate for variable BMI;
*---------------------------------------------------------------------*
High resolution graphs can also be produced. The following makes a
plot of a histogram with the best fit normal curve and summary
statistics.
*---------------------------------------------------------------------*;
PROC UNIVARIATE DATA = weight PLOT;
* ID ptid ;
VAR bmi;
PROC UNIVARIATE DATA = weight /*PLOT*/;
VAR bmi;
Temporarily Removing Code: we do not want to produce the histogram now but may want to run it another time
PROC UNIVARIATE DATA = weight;
VAR bmi;
/*
HISTOGRAM bmi / NORMAL MIDPOINTS=20 to 40 by 2;
INSET N    = 'N'    (5.0)
      MEAN = 'Mean' (5.1)
      STD  = 'Sdev' (5.1)
      MIN  = 'Min'  (5.1)
      MAX  = 'Max'  (5.1) / POS=lm HEADER='Summary Statistics';
*/
LABEL bmi = 'Body Mass Index (kg/m2)';
TITLE 'Histogram of BMI';
RUN;
What is wrong with this program?
* This is my first SAS program
DATA bp;
INFILE ...
(more lines)
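A comment that starts with an asterisk runs until the first semicolon. In the program above the comment line has no semicolon, so SAS treats DATA bp as part of the comment and the INFILE statement is left outside any DATA step. The sketch below shows a corrected opening; because the original INFILE line is elided, it reads a few made-up values with DATALINES instead, and the variable names id, sbp, and dbp are hypothetical.

```sas
* This is my first SAS program;
DATA bp;
* Hypothetical variables and values, for illustration only;
INPUT id sbp dbp;
DATALINES;
1 120 80
2 135 88
;
RUN;
```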
Option Statement
OPTIONS NOCENTER LINESIZE = 78;
OPTIONS NODATE NONUMBER;
Many, many options (run PROC OPTIONS to list them)
Usually placed at the top of the program
Can be put in autoexec.sas so they are always in effect.
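To check what a particular option is currently set to, PROC OPTIONS can be limited to one option with OPTION=. A minimal sketch, using the options set above as examples:

```sas
* Sketch: set some options, then display their current values in the log;
OPTIONS NOCENTER LINESIZE = 78;
PROC OPTIONS OPTION=LINESIZE;
RUN;
PROC OPTIONS OPTION=CENTER;
RUN;
```

Running PROC OPTIONS with no OPTION= argument lists every option, which is a long list.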