Summary Statistics/Simple Graphs in SAS/EXCEL/JMP

Setting up Program in SAS
• In a CIRCA Lab (or maybe in your Department):
– START -> Programs -> SAS -> (Whatever Version)
– Three primary Windows will appear (ignore Explorer):
• Editor - This is a standard editor where you will type in or download
the commands that constitute your program
• Log - This is where the information regarding the running of a program
appears. It also will give error messages
• Output - Gives results of SAS program (hopefully)
– Enter a program in the Editor, then save it (filename.sas),
then run it by clicking on Submit (person running image)
– Text results will appear in the Output window and can be saved to a text
(List) file, which should be named filename.lst. Be sure to
fully specify the .lst extension, or the save can overwrite the program file.
– Graphics output can be Copied and Pasted into word
processors
Writing a SAS Program
• Statements end with semi-colons and can be more than
one line long.
• OPTIONS Statements can be used for many things,
most importantly to frame the page dimensions.
• DATA steps are where data are entered (or external
datasets are read in), variable names are assigned,
and any new variables are created.
– Internal datasets are included in program
– External datasets are accessed using an INFILE statement
• PROC Steps are procedures that act on variables
defined or created in DATA steps.
Basic Form of a SAS Program
options ps=54 ls=80; /* frames page to approx. 8.5x11 */
data one; /* This dataset will be called "one" */
input y group; /* Each line contains two variables measured on one unit */
datalines; /* The data begin on the next line */
52 1
73 1
46 2
28 2
;
/* End of data on previous line */
run;
proc print data=one; run; /* Prints dataset */
proc univariate data=one; var y; run; /* Full-blown summary of variable y */
proc means data=one; class group; var y; run; /* N, mean, SD, min, max of y, for each group */
proc gplot data=one; plot y*group; run; /* Scatterplot of y versus group */
proc boxplot data=one; plot y*group; run; /* Side-by-side boxplots of y by group (data must be sorted by group) */
quit; /* Ends any procedure still running (e.g., GPLOT) */
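As a quick cross-check on the PROC MEANS output, the per-group statistics for the four data lines above can be computed by hand. The following is a minimal Python sketch, not part of the SAS program; the names `records`, `by_group`, and `summary` are illustrative only.

```python
from collections import defaultdict
from statistics import mean, stdev

# The four (y, group) records from the datalines block above.
records = [(52, 1), (73, 1), (46, 2), (28, 2)]

# Collect the y values for each group, mirroring "class group; var y;".
by_group = defaultdict(list)
for y, group in records:
    by_group[group].append(y)

# N, mean, sample standard deviation, minimum, and maximum for each group.
summary = {
    group: {
        "n": len(ys),
        "mean": mean(ys),
        "sd": stdev(ys),
        "min": min(ys),
        "max": max(ys),
    }
    for group, ys in sorted(by_group.items())
}

for group, stats in summary.items():
    print(group, stats)
```

With only two observations per group the standard deviations are not very informative, but they should agree with the SAS output to rounding.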
Using EXCEL for Summary Statistics
• The Analysis ToolPak (Data Analysis add-in) has a Descriptive
Statistics option which will compute many summary statistics
• Many statistical functions are also available. See Some
Useful EXCEL Functions on class website
• When obtaining summaries for multiple groups, it’s
helpful to create a separate column for each group and
copy summary commands across columns
Drawing Boxplots in EXCEL (I)
• Step 1: Place data for the various groups in different columns
(say A,B,C if there are 3 groups)
• Step 2: Obtain the five number summary for each column.
– q1: =percentile(range,0.25)
– min: =min(range)
– median: =percentile(range,0.5)
– max: =max(range)
– q3: =percentile(range,0.75)
• Step 3: Create a table containing these results (using numbers!)
Statistic   Group A     Group B     Group C
q1          q1(A)       q1(B)       q1(C)
min         min(A)      min(B)      min(C)
median      median(A)   median(B)   median(C)
max         max(A)      max(B)      max(C)
q3          q3(A)       q3(B)       q3(C)
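To see what the formulas in Step 2 compute, here is a minimal Python sketch of the interpolation rule used by Excel's PERCENTILE function (linear interpolation at rank k*(n-1) of the sorted data). The function names and the data in `group_a` are hypothetical, chosen only for illustration.

```python
import math

def excel_percentile(values, k):
    """k-th percentile (0 <= k <= 1) by linear interpolation at rank k*(n-1)."""
    data = sorted(values)
    position = k * (len(data) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return data[lower] + fraction * (data[upper] - data[lower])

def five_number_rows(values):
    """Rows in the order used by the Step 3 table: q1, min, median, max, q3."""
    return [
        ("q1", excel_percentile(values, 0.25)),
        ("min", min(values)),
        ("median", excel_percentile(values, 0.5)),
        ("max", max(values)),
        ("q3", excel_percentile(values, 0.75)),
    ]

# Hypothetical data for one group (not from the slides).
group_a = [3, 7, 8, 5, 12, 14, 21, 13, 18]
for name, value in five_number_rows(group_a):
    print(name, value)
```

Other packages may use different quartile rules, so their q1 and q3 can differ slightly from Excel's for small samples.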
Drawing Boxplots in EXCEL (II)
• In Excel 97/2000/2003:
– Highlight whole table, including numbers and labels,
select Chart Wizard
– Choose Line Chart
– At step 2, choose Plot by Rows (Columns is default)
– On each data series, right-click, and use Format Data
Series and remove connecting Lines by selecting None
– Right-click on any data series, use Format Data Series,
then Options tab and click on switches for High-Low
lines and Up-Down Bars
– There will not be a line at the median, but there will be a point.
Experiment with colors
Example: Impulse Rates of 5 Mollusc Species
Statistic   Group A   Group B   Group C   Group D   Group E
q1          29.4      90.65     71.05     192       339.2
min         20        48.6      61.6      158       312.6
median      40.265    120       78.95     201       396.35
max         94.64     222       93.75     230       800
q3          55.225    149.3     84.9      227       492.85
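The row order q1, min, median, max, q3 matters for the chart. The sketch below assumes that Excel draws Up-Down Bars between the first and last data series and High-Low lines between the lowest and highest values in each category; under that assumption the bars become the box (q1 to q3) and the lines become the whiskers (min to max). It is a minimal Python illustration using the table above, and the names `table`, `groups`, and `shapes` are illustrative only.

```python
groups = ["Group A", "Group B", "Group C", "Group D", "Group E"]

# Series in the same order as the rows of the table above.
table = {
    "q1": [29.4, 90.65, 71.05, 192, 339.2],
    "min": [20, 48.6, 61.6, 158, 312.6],
    "median": [40.265, 120, 78.95, 201, 396.35],
    "max": [94.64, 222, 93.75, 230, 800],
    "q3": [55.225, 149.3, 84.9, 227, 492.85],
}
series_order = list(table)  # q1, min, median, max, q3

shapes = {}
for i, group in enumerate(groups):
    column = [table[name][i] for name in series_order]
    shapes[group] = {
        # Assumed Up-Down Bar: spans the first and last series (q1 and q3).
        "box": (column[0], column[-1]),
        # Assumed High-Low line: spans the smallest and largest values.
        "whiskers": (min(column), max(column)),
        # The median stays as a plotted point.
        "median": table["median"][i],
    }

for group, shape in shapes.items():
    print(group, shape)
```

If the rows were entered as min, q1, median, q3, max instead, the first and last series would be min and max, and the bars would no longer mark the quartiles.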
[Figure: two line charts of the table above, each with Groups A through E on the horizontal axis, a vertical axis from 0 to 900, and a legend listing q1, min, median, max, q3. Left panel: Original Plot. Right panel: Plot After Removing Lines.]
Example: Impulse Rates of 5 Mollusc Species
[Figure: chart of the five series (q1, min, median, max, q3) for Groups A through E; vertical axis from 0 to 900.]
Obtaining Plots in EXCEL
• Enter data representing the variable on the horizontal
axis in the left-most column of the range to be used for
the plot.
• Enter data representing the variable(s) on the vertical
axis in columns directly to the right-hand side of the
column containing the variable to be plotted on the
horizontal axis
• Click on Chart Wizard, then XY(Scatter), then
choose the desired style (points, smoothed lines,
jagged lines, etc). Follow steps on dialog box. You can
change scales on final graph by right-clicking on the
X- and Y-axes and selecting scale. Many other options
exist to improve plot quality
Example - Tombstone Weathering Scatterplot
• X=100-Year Mean SO2 Concentration of City (ug/m3)
• Y=Mean Tombstone Surface Recession Rate (mm/100yr)
City                      SO2 (ug/m3)   Recession rate (mm/100yr)
Washington,DC (Rural)     12            0.27
Cincinnati,OH (Rural)     20            0.14
Philadelphia,PA (Rura     20            0.33
Richmond,VA               46            0.81
Fall River,MA             48            0.84
Hartford,CT               92            1.08
Evanston,IL               91            1.78
Albany,NY                 94            1.21
Washington,DC             102           1.09
Louisville,KY             117           1.72
Providence,RI             122           1.18
Cambridge,MA              142           1.01
Baltimore,MD              142           1.9
Newark,NJ                 178           1.98
Boston,MA                 180           1.53
Pittsburgh,PA             197           2.71
Cincinnati,OH             224           2.41
Brooklyn,NY               234           1.61
Philadelphia,PA           239           2.51
Indianapolis,IN           244           2.15
Chicago,IL                323           3.16
[Figure: scatterplot of Y versus X; horizontal axis from 0 to 350, vertical axis from 0 to 3.5.]
Example - Interaction Plot of Means
• Response: Seed Weight (Means of samples of size 6)
• Factor A: # of fruits on truss (1,2,…,11)
• Factor B: Position of Truss on plant (Low/High)
• Goal: Plot Mean versus Factor A w/ separate lines for levels of B
#Fruit   Low    High
1        3.54   3.59
2        3.45   3.53
3        3.28   3.45
4        3.50   3.42
5        3.09   2.84
6        3.31   3.32
7        3.06   3.32
8        3.04   3.30
9        3.43   2.88
10       3.28   3.31
11       2.93   2.70
[Figure: Mean Seeds vs # Fruit on Truss by Truss Position on Plant. Horizontal axis: # Fruit on Truss (0 to 12). Vertical axis: Mean Seed Weight (2.00 to 3.80). Separate lines for Low and High.]
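The worksheet layout described under Obtaining Plots in EXCEL (horizontal-axis variable in the left-most column, one column per line to its right) can be sketched as a small CSV file ready to open in Excel. This is a minimal Python illustration; it assumes the first eleven means listed in this example belong to the Low position and the next eleven to the High position, and the names `fruit`, `low`, `high`, and `csv_text` are illustrative only.

```python
import csv
import io

# Horizontal-axis variable: number of fruits on the truss.
fruit = list(range(1, 12))

# Mean seed weights by truss position (assumed assignment, see lead-in).
low = [3.54, 3.45, 3.28, 3.50, 3.09, 3.31, 3.06, 3.04, 3.43, 3.28, 2.93]
high = [3.59, 3.53, 3.45, 3.42, 2.84, 3.32, 3.32, 3.30, 2.88, 3.31, 2.70]

# X goes in the left-most column; each Y series gets its own column.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["# Fruit on Truss", "Low", "High"])
for row in zip(fruit, low, high):
    writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

Selecting all three columns before starting the XY (Scatter) chart gives one line per position, which is the interaction plot.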
Importing Text Data into JMP
• Open JMP
• Select File -> Open -> Files of Type: Text Import Preview
– Select Fixed Width
– Click off Table Contains Column Headers
– Assign Names to Variables
– Click Specify Fields
– Highlight the full field for each variable and click Set
Field for each variable (every "column" should be in exactly
one field). Click OK when done. (Alternatively, you can
directly specify the column numbers based on the data
description file.)
– Click Apply Settings, then OK
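The fixed-width idea behind these steps (each variable occupies a set range of character columns, and every column belongs to exactly one field) can be sketched outside JMP. The following minimal Python example uses a hypothetical layout in `FIELDS`; the two records reuse values from the tombstone example purely for illustration, and none of the names come from JMP.

```python
# Hypothetical layout: (variable name, first column, last column), 1-based, inclusive.
FIELDS = [("city", 1, 12), ("so2", 13, 16), ("rate", 17, 21)]
LINE_WIDTH = 21

def check_layout(fields, width):
    """Return True if every column 1..width falls in exactly one field."""
    covered = [0] * width
    for _, start, end in fields:
        for col in range(start - 1, end):
            covered[col] += 1
    return all(count == 1 for count in covered)

def parse_fixed_width(line, fields):
    """Slice one line into named fields and trim the padding."""
    return {name: line[start - 1:end].strip() for name, start, end in fields}

# Build two sample lines in that layout (left-justified text, right-justified numbers).
samples = [("Richmond,VA", 46, 0.81), ("Albany,NY", 94, 1.21)]
lines = [f"{city:<12}{so2:>4}{rate:>5}" for city, so2, rate in samples]

assert check_layout(FIELDS, LINE_WIDTH)
records = [parse_fixed_width(line, FIELDS) for line in lines]
for record in records:
    print(record)
```

Overlapping or missing column ranges make `check_layout` return False, which is the situation the Specify Fields step is meant to avoid.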
Summarizing a Single Variable in JMP
• Enter or Import the data in JMP
• Select Analyze -> Distribution
• Select variable(s) to be summarized and click on
Y,Columns
• If you want these separate for different levels of grouping
variable(s), select the variable(s) and click on By
• Click OK
• Summary Stats, Outlier boxplot, and horizontal barchart
are printed. Click on red arrows for more options
• Copy and Paste can put output in word processor
Side-by-Side Boxplots in JMP
• Enter or Import Data into JMP
• Make any factor variables nominal by clicking on box
next to variable names in Columns box of data editor
window
• Select Analyze -> Fit Y by X
• Click on Response Variable(s), then Y,Response
• Click on (nominal or ordinal) Factor variable(s), then
X,Factor
• Click OK (This gives a scatterplot)
• Click on Red Arrow in Oneway Analysis box, and
select Quantiles (This gives side-by-side boxplots)
• Copy and Paste into word processor