Lecture 5: Tables and Figures (5tables)

Michael Hallstone, Ph.D.
hallston@hawaii.edu
Introduction
We use figures and tables because we want a way to visually represent our data -- “a quick
summary.”
Two ways to do it: pictures and tables
Data can be presented in either tables or figures (pictures). We use tables and figures to present a lot
of data in a quick, easy to understand manner. Most of the time we use a table or a figure to
describe a single variable in our study. In this lecture we will cover just a few of the ways to present
data or describe variables using tables or figures.
Tables
A table is something with numbers that are in rows and columns.
Obviously, when you learn how to run statistical tests on a computer, the software will present the
results in a table. But in this lecture we will concentrate on how to describe a single variable in your
study using a table. It is a way to get a “quick” look at your variable.
The most common table for describing a variable is called a Frequency Distribution.
Example of a frequency distribution table
Here are some data for a variable “age.” There were 8 people in the sample: 3, 3, 4, 6, 6, 7, 7, 8
(n=8)
Picture of SPSS data set of Age
So count the number of 3’s. There are two of them. Count the number of 4’s. There is one. Count
the number of 5’s. There are none! Count the number of 6’s. There are two. Count the number of
7’s. There are two. Count the number of 8’s. There is one. That translates to the following table.
SPSS version of a frequency distribution table of age
How did SPSS do that?
First column represents each category in the variable
First it listed each individual age that occurred in the sample in the far left hand column: we had ages
3, 4, 6, 7, and 8. We had no one in the study who was 1 or 2 or 5 years old. They are missing. So
first you create the first column.
value (x)
3
4
6
7
8
total
Next it created the second column of the “frequency counts”
In the frequency column, it counted the number of people in each age category. So, again, count the
number of 3’s. There are two of them. Count the number of 4’s. There is one. Count the number of
5’s. There are none! Count the number of 6’s. There are two. Count the number of 7’s. There are 2.
Count the number of 8’s. There is one. There were 8 total people in the sample. That’s why there
are 8 total.
value (x)    frequency (f)
3            2
4            1
6            2
7            2
8            1
total        n=8
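The counting step above can also be sketched in a few lines of Python. This is only an illustration, not how SPSS does it internally; the variable names `ages`, `counts`, and `frequency_table` are made up for this example.

```python
from collections import Counter

# The sample from the lecture (n=8).
ages = [3, 3, 4, 6, 6, 7, 7, 8]

# Counter tallies how many times each age occurs in the sample.
counts = Counter(ages)

# Sort by age so the rows appear in the same order as the table.
frequency_table = sorted(counts.items())

print("value (x)    frequency (f)")
for value, frequency in frequency_table:
    print(f"{value:<12} {frequency}")
print(f"{'total':<12} n={sum(counts.values())}")
```

Notice that age 5 never shows up in `frequency_table`, because only ages that actually occur in the sample get counted.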
Compute the percentages of each value
For the percentages, divide the frequency of each category by the total (8). So, for example, there
are 2 people who were age 3: 2/8=0.25 or 25% (0.25 x 100=25). There was one person who was
age 4: 1/8=0.125 or 12.5% (0.125 x 100=12.5). And so on and so on.
value (x)    frequency (f)    fraction or %
3            2                2/8 or 25%
4            1                1/8 or 12.5%
6            2                2/8 or 25%
7            2                2/8 or 25%
8            1                1/8 or 12.5%
total        n=8              100%
Or
value (x)    frequency (f)    %
3            2                25%
4            1                12.5%
6            2                25%
7            2                25%
8            1                12.5%
total        n=8              100%
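The percentage arithmetic (frequency divided by n, times 100) can be checked with a short Python sketch. The names `ages`, `counts`, and `percents` are just illustrative choices for this example.

```python
from collections import Counter

# The sample from the lecture (n=8).
ages = [3, 3, 4, 6, 6, 7, 7, 8]
n = len(ages)
counts = Counter(ages)

# percent = (frequency / n) x 100, computed for each age that occurs.
percents = {value: counts[value] / n * 100 for value in sorted(counts)}

for value, percent in percents.items():
    print(f"age {value}: {counts[value]}/{n} = {percent}%")

# The percentages should add up to 100.
print("total:", sum(percents.values()))
```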
Computing cumulative percent
To compute the cumulative percent, add the cumulative percent from the row above to the percent in
the current row. There is nothing “above” the first row, so its cumulative percent is always exactly
equal to its percent. So for the first row of people who were aged 3: 0+25%=25%. For
the people who were aged 4: 25%+12.5%=37.5%. For the people who were aged 6:
37.5%+25%=62.5%. For those aged 7: 62.5%+25%=87.5%. For those aged 8:
87.5%+12.5%=100%.
value    frequency (f)    percent (%)    cumulative %
3        2                25             25
4        1                12.5           37.5
6        2                25             62.5
7        2                25             87.5
8        1                12.5           100
total    8                100
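The running total behind the cumulative percent column can be sketched in Python with `itertools.accumulate`, which adds each new percent to the sum of everything before it. The lists `values` and `percents` below are simply typed in from the table above; the names are illustrative.

```python
from itertools import accumulate

# Values and percents copied from the frequency distribution table.
values = [3, 4, 6, 7, 8]
percents = [25.0, 12.5, 25.0, 25.0, 12.5]

# Each cumulative percent = previous cumulative percent + current percent.
cumulative = list(accumulate(percents))

for value, percent, running_total in zip(values, percents, cumulative):
    print(f"age {value}: percent={percent}  cumulative={running_total}")
```

The last cumulative percent should always come out to 100, which is a handy check on your arithmetic.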
You could also create a frequency distribution table that includes the “missing values.”
Recall that we had no people who were ages 1, 2, or 5. Those values are missing! The table
below includes the missing values and also the percent (%) and cumulative % for each value.
Table 1: Frequency Distribution Table of Variable “Age”
value (x)    frequency (f)    percent (%)    cumulative %
1            0                0.0%           0.0%
2            0                0.0%           0.0%
3            2                25.0%          25.0%
4            1                12.5%          37.5%
5            0                0.0%           37.5%
6            2                25.0%          62.5%
7            2                25.0%          87.5%
8            1                12.5%          100.0%
Total        n=8              100.0%
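A table that keeps the missing values can be sketched in Python by looping over every age from 1 up to the largest age in the sample, instead of only the ages that occur. Starting the range at 1 is an assumption taken from the table above, and the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# The sample from the lecture (n=8).
ages = [3, 3, 4, 6, 6, 7, 7, 8]
n = len(ages)
counts = Counter(ages)

# Every whole-number age from 1 to the oldest age, present or not (assumed range).
all_values = list(range(1, max(ages) + 1))

# A Counter returns 0 for a value it never saw, so missing ages get frequency 0.
frequencies = [counts[value] for value in all_values]
percents = [frequency / n * 100 for frequency in frequencies]
cumulative = list(accumulate(percents))

rows = list(zip(all_values, frequencies, percents, cumulative))
for value, frequency, percent, running_total in rows:
    print(f"{value:<6}{frequency:<6}{percent:<8.1f}{running_total:.1f}")
```

A missing value adds nothing to the running total, which is why age 5 repeats the cumulative percent of age 4.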
Figures (Pictures)
In statistics, research methods, and the social sciences (of which public administration and justice
administration are part), the technical name for a picture is a “figure.” I wish I could tell you why
that is, but I can’t! It’s just the way it is.
There are many different types of figures. In this lecture we will cover two different kinds
of figures that can be used to express a “Frequency Distribution.” That means you can “convert” or
“express” the frequency distribution table(s) above using a figure. The most common figures used for
this are bar charts and histograms. (Although there are many other types of figures, such as pie
charts, I’m only going to show you how to do two.)
Different pictures (or figures) can be used for different types of data
You may or may not choose to follow these “suggestions” for the type of figure to use with continuous
or discrete data. They are not real rules, although some folks feel they make sense: there is
“space” between the values of a discrete variable, while a continuous variable has no “gaps” or “space.”
Discrete data tend to use bar charts or something that suggests the “gap” between the values.
Continuous data tend to do the opposite and suggest the “no gap” situation: a histogram (it looks like
a bar chart but has no spaces between the bars). Some say to only use a line graph when you have two
variables, one on the x axis and one on the y axis. Others say that you can sometimes use a line graph
to represent a continuous variable. Here I will not use a line graph to represent a continuous variable;
I’ll use a histogram.
Bar Charts and Histograms
I personally like bar charts and histograms best to represent frequency distributions. I would use the
bar chart when the variable is discrete and the histogram when the variable is continuous. In a bar
chart the bars do not touch and in a histogram the bars touch. See below.
Age is a tough one! It is continuous theoretically, but most times it is collected in whole years
[making it discrete]
If you ever present a figure of a frequency distribution of age you can use either a bar graph or
histogram, depending upon how you think the variable was measured – discrete or continuous.
Bar Chart
The bar chart is created from this frequency table
Figure 1: Bar Chart of Frequency Distribution of Age (n=8)
Notice that in the bar chart the bars do not touch, representing the discrete nature of the variable.
Histogram
The histogram is also created from the exact same frequency distribution table.
Figure 2: Histogram of Frequency Distribution of Age (n=8)
Notice in the histogram the bars “touch” each other, representing the continuous nature of the
variable. Also notice here that when we assume age is continuous, SPSS shows how the
“age five” category is empty.
How did SPSS do that?
Basically all SPSS did was turn a frequency distribution table into a picture or figure. The only
difference (the only difference!) between the bar chart and the histogram is whether the bars are
separated or touching. Again, in the bar chart the bars do not touch, to represent a discrete variable.
In a histogram the bars touch to represent the continuous nature of the variable.
Here is the raw frequency distribution table from SPSS
Frequency of each category on the y axis
The frequency goes on the vertical (or y) axis. The line that goes up and down on the far left,
labeled “frequency,” is called the “y axis.” Start at zero and go up to your highest frequency. Here the
highest frequency we had was two. That’s why each figure had 2 at the top of the y axis.
Each category of the variable goes on the x axis
Here you need to decide whether the variable is continuous or discrete in order to decide whether
to put the missing values on the x (horizontal) axis. Let’s assume it’s discrete, so we do not have
to put in the missing values. Then we make a bar chart, and you’ll notice how each age present in our
sample is represented by a bar.
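The two axis decisions can be sketched in Python as a rough illustration. The function `axis_layout` is a hypothetical helper written for this example (it is not an SPSS feature), and the “continuous” option assumes the x axis runs over every whole year from the youngest to the oldest age in the sample.

```python
from collections import Counter

def axis_layout(data, continuous=False):
    """Return the x-axis categories, bar heights, and y-axis tick marks."""
    counts = Counter(data)
    if continuous:
        # Continuous: show every whole value in the range, even empty ones.
        x_values = list(range(min(data), max(data) + 1))
    else:
        # Discrete: show only the values that occur in the sample.
        x_values = sorted(counts)
    heights = [counts[x] for x in x_values]
    # The y axis starts at zero and goes up to the highest frequency.
    y_ticks = list(range(0, max(heights) + 1))
    return x_values, heights, y_ticks

# The sample from the lecture (n=8).
ages = [3, 3, 4, 6, 6, 7, 7, 8]

bar_x, bar_heights, y_ticks = axis_layout(ages)
hist_x, hist_heights, _ = axis_layout(ages, continuous=True)

# A rough text picture of the histogram layout: one '#' per person.
for x, height in zip(hist_x, hist_heights):
    print(f"{x} | {'#' * height}")
```

With the discrete layout, age 5 is left off the x axis entirely. With the continuous layout, it appears as an empty category.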