Statistics
5tables.pdf
Michael Hallstone, Ph.D.
hallston@hawaii.edu

Lecture 5: Tables and Figures

Introduction

We use figures and tables because we want a way to visually represent our data: “a quick summary.”

Two ways to do it: pictures and tables

Data can be presented in either tables or figures (pictures). We use tables and figures to present a lot of data in a quick, easy to understand manner. Most of the time we use a table or a figure to describe a single variable in our study. In this lecture we will cover just a few of the ways to present data or describe variables using tables or figures.

Tables

A table is something with numbers that are in rows and columns. Obviously, when you learn how to do statistical tests on a computer it will present the results in a table. But in this lecture we will concentrate on how to describe a single variable in your study using a table. It is a way to get a “quick” look at your variable. The most common table for describing a variable is called a Frequency Distribution.

Example of a frequency distribution table

Here are some data for a variable “age.” There were 8 people in the sample: 3, 3, 4, 6, 6, 7, 7, 8 (n=8)

[Picture of SPSS data set of Age]

So count the number of 3’s. There are two of them. Count the number of 4’s. There is one. Count the number of 5’s. There are none! Count the number of 6’s. There are two. Count the number of 7’s. There are 2. Count the number of 8’s. There is one. That translates to the following table.

[SPSS version of a frequency distribution table of age]

How did SPSS do that?

First column represents each category in the variable

First it listed each individual age that occurred in the sample in the far left hand column: we had ages 3, 4, 6, 7, and 8. We had no one in the study who was 1 or 2 or 5 years old. They are missing. So first you create the first column.

value (x)
3
4
6
7
8
total

Next it created the second column of the “frequency counts”

In the frequency column, it counted the number of people in each age category. So, again, count the number of 3’s. There are two of them. Count the number of 4’s. There is one. Count the number of 5’s. There are none! Count the number of 6’s. There are two. Count the number of 7’s. There are 2. Count the number of 8’s. There is one. There were 8 total people in the sample. That’s why there are 8 total.

value (x)    frequency (f)
3            2
4            1
6            2
7            2
8            1
total        n=8

Compute the percentages of each value

For the percentages, divide the frequency of each category by the total (8). So for example there are 2 people who were age 3: 2/8=0.25 or 25% (0.25 x 100=25). There was one person who was age 4: 1/8=0.125 or 12.5% (0.125 x 100=12.5). And so on and so on.

value (x)    frequency (f)    fraction or %
3            2                2/8 or 25%
4            1                1/8 or 12.5%
6            2                2/8 or 25%
7            2                2/8 or 25%
8            1                1/8 or 12.5%
total        n=8              100%

Or

value (x)    frequency (f)    %
3            2                25%
4            1                12.5%
6            2                25%
7            2                25%
8            1                12.5%
total        n=8              100%

Computing cumulative percent

To compute the cumulative percent for a row, you add the cumulative percent from the row above to the percent in the current row. There is nothing “above” the first row, so its cumulative percent is always exactly equal to its percent. So for the first row of people who were aged 3: 25%+0=25%. For the people who were aged 4: 25%+12.5%=37.5%. For the people who were aged 6: 37.5%+25%=62.5%. For those aged 7: 62.5%+25%=87.5%. For those aged 8: 87.5%+12.5%=100%.

value    frequency (f)    percent (%)    cumulative %
3        2                25             25
4        1                12.5           37.5
6        2                25             62.5
7        2                25             87.5
8        1                12.5           100
total    8                100

You could also create a frequency distribution table that includes the “missing values.” Recall that we had no people who were ages 1, 2, and 5. Those values are missing!!! The table below has the missing values and also the percent (%) and cumulative % for each value.

Table 1: Frequency Distribution Table of Variable “Age”

value (x)    frequency (f)    percent (%)    cumulative %
1            0                0.0%           0.0%
2            0                0.0%           0.0%
3            2                25.0%          25.0%
4            1                12.5%          37.5%
5            0                0.0%           37.5%
6            2                25.0%          62.5%
7            2                25.0%          87.5%
8            1                12.5%          100.0%
Total        n=8

Figures (Pictures)

In statistics or methods or in social sciences (of which public administration, or justice administration, is one) the technical name for a picture is a “figure.” I wish I could tell you why that is, but I can’t! It’s just the way it is.

There are many different types of figures. What we will cover in this lecture are two different kinds of figures that can be used to express a “Frequency Distribution.” So that means you can “convert” or “express” the frequency distribution table(s) above using a figure. The most common figures used for this are bar charts and histograms. (Although there are many other types of figures, such as pie charts, I’m only going to show you how to do two.)

Different pictures (or figures) can be used for different types of data

You may choose to use these “suggestions” for the type of figure to represent continuous/discrete data or not. They are not real rules or anything, although some folks feel they make sense, as there is “space” between the values of a discrete variable, while a continuous variable has no “gaps” or “space.”

Discrete data tend to use bar charts or something that suggests the “gap” between the values.

Continuous data tend to do the opposite and suggest the “no gap” situation: a histogram (looks like a bar chart but with no spaces between the bars).

Some say to only use a line graph when you have two variables, one on the x axis and one on the y axis. Others say that you can sometimes use a line graph to represent a continuous variable. Here I will not use a line graph to represent a continuous variable. I’ll use a histogram.

Bar Charts and Histograms

I personally like bar charts and histograms best to represent frequency distributions.
I would use the bar chart when the variable is discrete and the histogram when the variable is continuous. In a bar chart the bars do not touch and in a histogram the bars touch. See below.

Age is a tough one! It is continuous theoretically, but most times it is collected in whole years (making it discrete). If you ever present a figure of a frequency distribution of age you can use either a bar chart or a histogram, depending upon how you think the variable was measured: discrete or continuous.

Bar Chart

The bar chart is created from the frequency distribution table of age.

[Figure 1: Bar Chart of Frequency Distribution of Age (n=8)]

Notice in the bar chart, the bars do not touch, representing the discrete nature of the variable.

Histogram

The histogram is also created from the exact same frequency distribution table.

[Figure 2: Histogram of Frequency Distribution of Age (n=8)]

Notice in the histogram the bars “touch” each other, representing the continuous nature of the variable. Also notice here that when we assume age is continuous, SPSS shows how the “age five” category is empty.

How did SPSS do that?

Basically all SPSS did is turn a frequency distribution table into a picture or figure. The only difference, the only difference, between the bar chart and histogram is the separation or touching of the bars. Again, in the bar chart the bars do not touch, to represent a discrete variable. In a histogram the bars touch, to represent the continuous nature of the variable.

[Raw frequency distribution table from SPSS]

Frequency of each category on the y axis

The frequency goes on the vertical (or y) axis. The line that goes up and down on the far left and is labeled “frequency” is called the “y axis.” Start at zero and go up to your highest frequency. Here the highest frequency we had was two. That’s why each figure had 2 at the top of the y axis.
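
The step from the raw ages to the frequency table, and from the table to the top of the y axis, can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this lecture, not a description of what SPSS does internally. The names frequency_table and y_axis_top are made up for this example.

```python
from collections import Counter

# Raw ages from the lecture example (n = 8).
ages = [3, 3, 4, 6, 6, 7, 7, 8]


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent).

    Hypothetical helper: only values that occur in the sample get a row.
    """
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = counts[value] / n * 100
        # Cumulative percent = cumulative percent of the row above
        # plus the percent of the current row.
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows


table = frequency_table(ages)
for value, f, pct, cum in table:
    print(f"{value:>5} {f:>9} {pct:>7.1f} {cum:>12.1f}")

# The y axis starts at zero and goes up to the highest frequency.
y_axis_top = max(f for _, f, _, _ in table)
print("top of y axis:", y_axis_top)
```

For the age data the rows match the frequency distribution table (percents of 25, 12.5, 25, 25, and 12.5), and the top of the y axis comes out as 2.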

Each category of the variable goes on the x axis

So here you need to decide whether the variable is continuous or discrete, because that determines whether you put the missing values on the x (horizontal) axis. Let’s assume it’s discrete so we do not have to put in the missing values. Then we make a bar chart and you’ll notice how each age present in our sample is represented by a bar.
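
The choice between the two x axes can be sketched in Python as a rough text picture. This is a sketch under a few assumptions: the "continuous" view is assumed to fill in every whole year between the lowest and highest age observed (which is why age 5 shows up as an empty category), and the bars are drawn sideways as rows of # marks rather than as a real chart. The function names x_axis_categories and text_chart are hypothetical.

```python
from collections import Counter

# Raw ages from the lecture example (n = 8).
ages = [3, 3, 4, 6, 6, 7, 7, 8]


def x_axis_categories(values, include_missing):
    """Pick the categories that go on the x axis."""
    if include_missing:
        # Assumed "continuous" view: every whole value from the lowest
        # to the highest observed, including ones nobody had.
        return list(range(min(values), max(values) + 1))
    # Discrete view: only the values that occur in the sample.
    return sorted(set(values))


def text_chart(values, include_missing=False):
    """Return one text line per category; bar length = frequency."""
    counts = Counter(values)  # a Counter gives 0 for values never seen
    lines = []
    for category in x_axis_categories(values, include_missing):
        lines.append(f"{category:>2} | " + "#" * counts[category])
    return lines


print("Discrete (bar chart style):")
for line in text_chart(ages):
    print(line)

print("Continuous (histogram style):")
for line in text_chart(ages, include_missing=True):
    print(line)
```

In the discrete version only ages 3, 4, 6, 7, and 8 get a bar. In the continuous version age 5 also gets a row, with no marks next to it.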