Using Excel When Doing Descriptive Statistics

17:610:511 Research Methods
EXCEL
Dan O'Connor & Soyeon Park
2003
Excel can be useful when doing
statistical analyses and graphs. Excel is
widely available whereas SPSS is usually
found in university settings. Thus, this
handout recognizes the importance of
having you try to use Excel for course
511 whenever that is possible.
It is assumed that you know how
to input data into a spreadsheet. It can
be efficient to put the name of each
variable in the first row of its column,
above that variable’s scores. Save
your data set using three different
names, just in case a spreadsheet gets
messy and you want to start over.
Descriptive Statistics
Use the Tools bar and press Data
Analysis… [If Data Analysis is not on the
Tools bar, then you will need to unzip it
and load it into Excel. You do this on the
Tools bar by pressing Add-Ins and then
punching the first item: Analysis
ToolPak.]
For simple descriptive statistics,
pull down the Tools bar, click Data
Analysis, and then click Descriptive
Statistics and press OK. This brings up a
dialog box with two sections: Input and
Output Options.
Input: Fill in the location of the cells of
the Input Range (e.g., a2:a21). Or, you
can use the mouse to go to the
spreadsheet and highlight cells A2 to A21
and your data’s cell locations will be
automatically entered into the Input
Range box. Your data will stay
highlighted and flashing if you use this
latter procedure. [Note: Excel will restate
what you type in the Input Range box by
adding dollar signs; for example,
a typical entry might be restated as:
$A$2:$A$21.]
Next, indicate that your data is: Grouped
By:  Columns. If appropriate, punch
the box indicating:  Labels In First Row.
Output options: Excel wants to know
where you want the descriptive statistical
results to be: on the same page as your
data OR on a different page (i.e., a New
Worksheet Ply, which is the default
option). You might want to list the
variable’s mean, median, etc., on the
same sheet as your data. If so, punch: 
Output Range which then lights up a
location box. You MUST click on that box
to get the cursor there. In that box, give
an arbitrary location for your descriptive
statistics; for example, type in location
f1:g15. [Note: Excel will restate this as:
$F$1:$G$15. Optionally, you can also
highlight the output location using the
mouse.]
You MUST punch:  Summary
statistics. This will get you the mean,
median, etc. If you forget this step, Excel
will remind you.
After you click OK, Excel will print
your statistics where you indicated. The
output will be labeled Column1 and you
will need to widen each of its two
columns. Do this by moving the cursor to
the top of the column, onto the border
between two column letters; the cursor
will change to a double arrow, which you
can drag to widen the column.
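To check what the Summary statistics table
contains, the same figures can be
recomputed outside Excel. The sketch below
is only an illustration: the scores are
made-up values standing in for cells
A2:A21, the function name describe is
hypothetical, and it assumes Excel reports
the sample (n - 1) standard deviation and
variance. Mode, kurtosis, and skewness are
left out.

```python
import math
import statistics


def describe(scores):
    """Recompute the main rows of a Summary statistics table."""
    n = len(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1)
    return {
        "Mean": statistics.mean(scores),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(scores),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(scores),
        "Range": max(scores) - min(scores),
        "Minimum": min(scores),
        "Maximum": max(scores),
        "Sum": sum(scores),
        "Count": n,
    }


# Hypothetical scores standing in for cells A2:A21.
scores = [3, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 14, 14]
summary = describe(scores)
for label, value in summary.items():
    print(f"{label:20s}{value:10.3f}")
```

Comparing a few of these rows with the
Excel output is a quick way to confirm
that the Input Range covered the cells
you intended.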
Graphic Displays: Histogram
Pull down Tools and click on Data
Analysis again. Click on Histogram in
the dialog box. Fill in the Input Range
(e.g. a2:a21) and the Output Range (e.g.
a30:d60). Leave Bin Range alone for
now. Be sure to punch  Chart Output.
Then click OK.
Excel will return to your
worksheet; move down to the location
area you specified and your Bin and
Frequency will be listed to the left of your
histogram. The histogram will be
compressed and appear as a bar chart.
Note the number of Bin categories. You
may want to change this by typing
different scale values to break your
frequency distribution into six to eight
groupings of data. Go back to your
original sheet and type these new bin
numbers into blank cells (e.g., h1:h8).
Here is the reason for doing this:
histograms should have between six and
eight categories for small samples. If
your number range goes from
1 through 14 then type in 2 4 6 8 10 12
14 in separate cells. (Note that this puts
the histogram into seven categories.) Go
back to the histogram dialog box and
insert the cell locations for this new bin.
Change the Output Range (e.g. e30:g60).
To eliminate the word “More” (an
option): After your histogram is
produced, highlight the two columns with
the new bin numbers and frequency
counts and then double click the
histogram or do a right click. In the
dialog box, select Source Data and
change the value to eliminate the location
of the cell using the word “More.”
If your range had been from 10 to
40 (a span of 30), divide this by 7
to get about 4 numbers per bin. In this
case, you might specify the following as
bins: 10, 14, 18, 22, 26, 30, 34, 38. You may
need to play around with the number of
bins you will use. Try to get six to eight
groupings of the data. Your histogram
will need at least two fixes: expand its
size, and pull its bars together to show it
reflects continuous data (and is not a bar
chart).
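The bin counting can be sketched in a few
lines to show how the bin numbers you type
shape the frequency table. This is an
illustration, not Excel's own code: it
assumes each bin number is an inclusive
upper limit and that any score above the
last bin is counted under "More". The
function name and the scores are made up.

```python
from bisect import bisect_left


def excel_style_histogram(values, bins):
    """Count values per bin, treating each bin as an inclusive upper limit."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # the final slot is "More"
    for value in values:
        # Index of the first bin whose limit is >= value.
        counts[bisect_left(bins, value)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))


# Hypothetical scores ranging from 1 through 14, with bins 2, 4, ..., 14.
scores = [1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9, 10, 11, 12, 13, 14]
table = excel_style_histogram(scores, [2, 4, 6, 8, 10, 12, 14])

# Range of 10 to 40: a span of 30 divided by 7 is about 4 per bin.
width = round((40 - 10) / 7)
wide_bins = list(range(10, 40, width))  # 10, 14, ..., 38
wide_table = excel_style_histogram([10, 25, 38, 39, 40], wide_bins)

for label, count in table:
    print(label, count)
```

Under these assumptions, scores of 39 and
40 fall above the last bin of 38 and land
in "More", which is worth knowing before
you remove that row from the chart.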
Expand graph’s size: If you click
your histogram you will be able to drag a
corner using a diagonal arrow (e.g., in the
lower right corner, use the mouse to get
a diagonal double arrow). You can then drag
this arrow down and to the right to get a
larger graph.
Now, double click on one of the
histogram’s bars. This should pull up a
dialog box, Format Data Series, with six
folders. [If you pull up Format Axis by
mistake, just close it.] Go to the Options
folder and change the Gap width from
150 to 0. You will see the bars come
together. Close this box and return to
the histogram.
You can change the titles and
axes labels directly by retyping them. If
you should pull up a three folder box
(usually with a right mouse click but
sometimes with a double click), Format
Chart Title or Format Axis Title, simply
close them. When you click on text, your
mouse will eventually go to a straight line
where you can change text. You may
have to double click a vertical axis label
to get it horizontal, change the label, and
then double click it back to its vertical
position. Do not spend too much time on
the histogram. You can delete an extra
histogram by clicking on the entire graph
(it gets surrounded by lines with solid
boxes in each corner and at each midpoint) and then pressing Delete.
Bar charts:
Two types of bar charts can be
drawn appropriate to our purposes: one
of the number of cases for each category
in a categorical variable; the other of a
numerical variable’s sub-group means.
Below is a brief explanation of how to do
each type.
Simple bar graph using one categorical
variable:
Assume you have a gender
variable showing that the sample has 12
men and 8 women in it. The simplest
and fastest way to produce a bar graph
with these data is to enter the numbers
12 and 8 into two adjacent cells and then
go directly to the Chart Wizard (the symbol
for the Chart Wizard is a bar chart next to
the globe on the standard tool bar).
Open up the Chart Wizard and
specify a Column Chart type (it should
already be highlighted as the first on the
chart list). For Chart sub-type, specify
the first chart on that list (also, already
highlighted). Go to Next step. Step 2
should already show your bar chart with
12 and 8. (If not, indicate the location of
the summary data, and specify column
data). Step 3, Chart Options, allows you
to type in the Chart title, X and Y axis
titles, and specify gridlines, legend, data
labels, and data table. You can ignore
most of this or experiment with it since
you can see directly what it does as you
select various options. Then, go to Step 4
OR go to Finish. At Finish, your chart
will appear on your spreadsheet.
Using COUNTIF to count cells:
Let us assume you had a very
large sample and did not have a quick
count of the number of men and women.
You can instruct Excel to count these for
you by using the COUNTIF function.
First, click on a blank cell where Excel
will put the results. Then click the Paste
Function button, fx (on the main toolbar line,
between the Σ and AZ sort buttons). In the
All or Statistical categories double click
the COUNTIF function. Fill in the dialog
boxes. For the Criteria dialog box simply
fill in any cell with the value you want to
use for counting. Then, do this same
procedure filling in a different Criteria for
another sub-group.
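The counting that COUNTIF does can be
sketched as follows. This is a simplified
stand-in, not Excel's function: the helper
name countif is made up, it only handles
an exact match, and unlike Excel it treats
upper and lower case as different. The
gender codes are hypothetical data matching
the 12 men and 8 women used above.

```python
from collections import Counter


def countif(cells, criterion):
    """Count the cells that exactly equal the criterion."""
    return sum(1 for cell in cells if cell == criterion)


# Hypothetical gender column: 12 men and 8 women.
gender = ["M"] * 12 + ["F"] * 8

men = countif(gender, "M")
women = countif(gender, "F")
all_counts = Counter(gender)  # every category counted in one pass

print(men, women)
print(dict(all_counts))
```

The two counts are exactly the numbers you
would type into adjacent cells to feed the
Chart Wizard.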
Bar Chart using numerical (continuous)
variable broken into subgroups by
categorical variable:
Since Excel is not set up the way
SPSS is, you will have to produce a
separate table SORTED by your
categorical variable. That way you can
easily fill in the sequence of cells to
produce descriptive statistics,
histograms, and bar charts. SORT is a
simple procedure. (Again, make a copy of
your data; also, remember that the Edit
menu has an Undo Sort option after sorting
to undo anything you did not want
permanent.)
To SORT: highlight a single cell
in the data matrix (e.g. M representing
one score indicating a Man). Then, click
the ascending sort key on the main tool
bar (AZ with arrow pointing down). All
contiguous columns will be sorted
keeping the integrity of each row intact.
Note that an adjacent column may also be
sorted along with your data; always try
to keep a blank column between your data
matrix and output results. Now you can
treat each
subgroup as you did the overall data—
simply by identifying the location of each
subgroup (e.g. for Input range, specify
a2:a11 to analyze Men; you might specify
a12:a21 for Women).
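The effect of sorting, and the cell ranges
it leaves for each subgroup, can be
sketched like this. Everything here is
illustrative: the function name, the codes
M and W, and the scores are made up, and
the scores are assumed to sit in column A
starting at row 2 (row 1 holding the
variable name).

```python
from itertools import groupby


def subgroup_ranges(rows, first_row=2, column="A"):
    """Sort (category, score) rows by category and locate each subgroup."""
    ordered = sorted(rows, key=lambda row: row[0])  # whole rows stay intact
    ranges = {}
    row_number = first_row
    for category, members in groupby(ordered, key=lambda row: row[0]):
        size = len(list(members))
        ranges[category] = f"{column}{row_number}:{column}{row_number + size - 1}"
        row_number += size
    return ordered, ranges


# Hypothetical data: 10 men (M) and 10 women (W), entered in mixed order.
rows = [("W" if i % 2 else "M", 50 + i) for i in range(20)]
ordered, ranges = subgroup_ranges(rows)
print(ranges)
```

Each range in the result is what you would
type into the Input Range box to analyze
that subgroup alone.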
You can then use the results of
the descriptive statistical analyses of each
subgroup to create bar charts. Simply
give the location of the means as the two
input cells needed to produce a column
bar chart.
There are several other ways to
accomplish this same thing. A second
method to create a bar chart using sub-group
averages is to press the Paste Function
button and select AVERAGE. Identify the
range of your subgroup’s continuous
variable and report the average in a blank
cell. Do this for the other sub-group
average. Then use these two averages as
input for a column bar chart.
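The sub-group averages themselves are
simple to reproduce, which can help when
checking the cells you feed to the chart.
The sketch below assumes made-up
(category, score) pairs and a hypothetical
function name; it does not require the
data to be sorted first.

```python
from collections import defaultdict
from statistics import mean


def subgroup_means(rows):
    """Average the scores separately for each category."""
    groups = defaultdict(list)
    for category, score in rows:
        groups[category].append(score)
    return {category: mean(scores) for category, scores in groups.items()}


# Hypothetical (gender, score) pairs.
rows = [("M", 10), ("F", 14), ("M", 12), ("F", 16), ("M", 14), ("F", 18)]
means = subgroup_means(rows)
print(means)
```

The two means it returns play the same
role as the two AVERAGE cells used as
input for the column chart.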
Thirdly, you can create cells with
formulas in them to compute counts or
averages or sums by pressing the equal
sign next to the input window (below the
tool bars but above the spreadsheet data).
Then hit the left down arrow for functions
available. This method takes some
practice and you might want to
experiment with it. Usually, the Paste
Function button can do this and keep your
output results clear. More experienced
Excel users will create shortcuts using
formulas embedded in cells.
Lastly, you can use the SORT
button for small data sets but you should
know that recent versions of Excel have a
clever device for larger data sets: the Pivot
Table Wizard. Click on the Data menu
and then click Pivot Table Report. Excel
will then walk you through this procedure
using three or four steps (similar to the
Chart Wizard). You can experiment with
pivot tables on your own.
Scattercharts:
Excel can easily create
scattercharts in several different ways.
For now, you can go to the Chart Wizard
and select XY Scatter. Select the first
option among the charts (i.e., not the
ones with lines). In Step 2, fill in the
range of TWO numeric variables (e.g.,
a2:b31). Indicate that this Series is in:
 Columns. In Step 3, you can modify
the chart as you see fit. Then, go to
Finish OR go to Step 4 and go to Finish.
Later, we will create scattercharts as a
byproduct of simple linear regression.
Inferential Statistics
t-tests
Computing a t-test in Excel is
straightforward AFTER you have SORTed
your data matrix. SORT is described
above but its essentials will be repeated
here. Go to your raw score data matrix in
the spreadsheet. Hopefully, you will have
a blank column to the right of the matrix
and a blank row below it (or else SORT
will have an effect on that adjacent data).
To do the sort, simply highlight one cell
in your categorical variable and then
press the AZ key on the toolbar and,
instantly, the entire matrix will be sorted
by your categorical variable.
To do the t-test, go to Tools, then
Data Analysis, then t-Test: Two-Sample
Assuming Equal Variances. This brings
up a dialog box requesting Variable 1
Range. Here, put in the cells for category A
by entering the Dependent, continuous
variable’s cell locations (e.g.,
b12:b21). Then, enter Variable 2 Range
(put in the cells for category B by
entering the Dependent, continuous
variable’s cell locations, e.g., b22:b31).
You do not need to fill much else
into this dialog box unless you want to
experiment with it. Press OK and the
results will display the mean, variance,
etc. statistics. You will need to expand
the column headers to read the results.
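The main figures in that results table can
be recomputed by hand as a check. The
sketch below is an illustration under
stated assumptions: it uses the standard
pooled-variance formula for a two-sample
t-test with equal variances, it takes the
difference as the Variable 1 mean minus the
Variable 2 mean, and the two groups of
scores are made up. It does not compute
the probability or the critical value.

```python
import math
from statistics import mean, variance


def pooled_t_test(group1, group2):
    """Two-sample t statistic assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    t_stat = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return {
        "Mean 1": m1,
        "Mean 2": m2,
        "Pooled Variance": pooled,
        "df": df,
        "t Stat": t_stat,
    }


# Hypothetical scores for category A and category B.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
result = pooled_t_test(group_a, group_b)
for label, value in result.items():
    print(f"{label:18s}{value:8.3f}")
```

Swapping the two groups flips the sign of
the t statistic but not its size, which is
why it pays to look at the two means
before reading the sign.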
The t-statistic will be reported (as
t Stat). It will be a positive OR negative
number (check your means to determine
how Excel computed it). You can
interpret this t Stat by comparing it to the
last line in the results table: t Critical
two-tail. This gives you the same t crit
you can get from the table in the Williams
text. Above this number in your results
table in Excel is the probability of
achieving this finding by chance alone.
You might see: P(T<=t) two-tail
1.43669E-05. You can interpret this as:
the probability of achieving this t-actual
value by chance is less than or equal to
.000014367, or less than one chance in
50,000. (The 50,000 was arrived at by
taking the reciprocal of .00002; thus,
1/.00002 = 50,000.)
Correlation & Linear Regression
For correlations, go to Tools, Data
Analysis, and then Correlation. In the
Input Range box type the cells of your
two continuous variables for which you
want a correlation coefficient (e.g.,
b12:c31). Confirm that the data is
Grouped By:  Columns. Punch OK and
you will get a simple table of results
which show the correlation coefficient
between your two variables; e.g., Excel
might report that between Column 1 and
Column 2, there is a correlation of
0.791715. You can interpret this r value
using the table I gave you. Interpretation
of a correlation needs to be done in
concert with the scatterplot you
constructed using these same data.
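The correlation coefficient in that table
can be recomputed from its definition. The
sketch below uses the usual Pearson
formula; the function name and the two
short columns of scores are made up for
illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for two continuous variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 6))
```

A variable correlated with itself gives
exactly 1, which explains the diagonal of
1s in the table Excel returns.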
You can also get the correlation by
doing a simple linear regression on the
two variables. Go to Tools, Data
Analysis, and then Regression. Enter the
locations of each variable (e.g., at Input Y
Range, enter b12:b31; at Input X Range,
enter c12:c31). After OK, considerable
output appears and we will discuss what
this means in class.
One nice feature of Excel is that
the scatterplot can add the
regression line, regression formula
and R² value to the chart (although
you may have to reduce the type
size of the formula and delete a box
with Series… in it).
To request a correlation, go to Tools—
Data Analysis—Correlation and you can
identify as many variables as you want
for the input box. Excel then reports a
correlation matrix but does not indicate
significance levels.
When constructing the scatterplot
using Excel, have the Dependent Variable
to the right of the Independent Variable.
That way, the DV will plot on the proper Y
axis and the IV will plot on the X axis.
To add a trend line to a scatter
plot, click on the total scatterplot [so all
four corners are highlighted] and then go
to the upper tool bar and click on
Chart. The pull down menu has Add
Trendline and you will select that. Then
go to the Options Dialog box and check
two boxes: Display Equation on Chart
and Display R-squared Value on Chart.
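The line and the R-squared value shown on
the chart can be recomputed as a check.
The sketch below assumes a linear
trendline fitted by ordinary least
squares; the function name and the data
are made up, with x as the Independent
Variable and y as the Dependent Variable.

```python
def fit_line(x, y):
    """Least-squares slope, intercept, and R-squared for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    return slope, intercept, r_squared


# Hypothetical scores: x is the IV, y is the DV.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_squared = fit_line(x, y)
print(f"y = {slope:.2f}x + {intercept:.2f}, R-squared = {r_squared:.2f}")
```

With one predictor, R-squared is the
square of the correlation coefficient, so
the chart and the Correlation output
should agree.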
An alternative is to request a
regression from Tools—Data Analysis and
my example in the separate handout does
that.