Graphical Representations of Data

Chapter 2 – Organizing and Summarizing Data
Definition: When data are in their original form, as collected, they are called raw data.
We want to be able to visualize the characteristics of a data set; hence we construct graphical
representations of the data. In order to do so, we must look at the frequency of occurrence of data
values.
Definition: A categorical frequency distribution, used for categorical (qualitative) data, is a table
listing the categories, together with the frequency of occurrence of each category in the observed data.
Definition: The frequency for a category is the number of data values falling in that category. The
relative frequency for a category is the fraction, proportion, or percentage of the data values that fall
within that category.
Example: The following table shows data on class rank of students receiving financial aid at a small
4-year college.
College Class Rank    Frequency    Relative Frequency
Fr                    18           18/40 = 0.45 = 45%
So                    12           12/40 = 0.30 = 30%
Jr                     6            6/40 = 0.15 = 15%
Sr                     4            4/40 = 0.10 = 10%
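The relative frequencies in the class-rank table can be reproduced with a few lines of Python. This is only a minimal sketch; the names frequencies and relative_frequencies are chosen here for illustration and do not come from the notes.

```python
# Frequencies from the class-rank table above.
frequencies = {"Fr": 18, "So": 12, "Jr": 6, "Sr": 4}


def relative_frequencies(freqs):
    """Divide each category's frequency by the total number of data values."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}


rel = relative_frequencies(frequencies)
total = sum(frequencies.values())
for category, proportion in rel.items():
    # Show the fraction, the proportion, and the percentage for each category.
    print(f"{category}: {frequencies[category]}/{total} = {proportion:.2f} = {proportion:.0%}")
```

Since every data value falls in exactly one category, the relative frequencies add up to 1 (100%).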
Often, when the data are numeric, there are too many different data values for a listing of the raw data
to be of use in seeing the characteristics of the data. It is common to divide the interval of values of
the data into a relatively small number of subintervals, called classes, and to tabulate the data using the
frequencies. Each frequency is the number of occurrences of data values in one of the classes.
Definition: A grouped frequency distribution is the organizing of raw data in table form, using classes
and frequencies.
Definition: The largest data value that can be included in a class is the upper class limit for that class;
the smallest data value that can be included is the lower class limit.
Definition: The class width is the difference between the upper class limit of one class and the upper
class limit of the next-higher class.
Definition: The cumulative frequency for a class is the count of all observed data values in that class
or in lower classes.
Rules for constructing a frequency distribution:
1) The number of classes should be between 5 and 20; 5 for small data sets, 20 for large data sets.
“Small” means roughly 25 to 30 observations; “large” means around 1000 or more observations.
2) An observed data value must be in one, and only one, class. This means that the classes must be
non-overlapping, or mutually exclusive.
3) The classes must be continuous; even if there are no observed data values in a given class, that
class must be included, with a frequency value of 0.
4) The classes must be exhaustive; i.e., together they must include all of the data.
5) The classes must be equal in width.
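As a rough illustration of these rules, the sketch below checks a proposed set of classes. It assumes that the classes are described by a sorted list of boundaries, with each class running from one boundary up to (but not including) the next; under that assumption rules 2 and 3 hold automatically, so only rules 1, 4, and 5 need checking. The function name check_classes and the sample values are hypothetical.

```python
def check_classes(boundaries, data):
    """Check a proposed set of classes against rules 1, 4, and 5.

    Assumption of this sketch: class i is the half-open interval
    [boundaries[i], boundaries[i + 1]), so consecutive classes share a
    boundary. That makes them non-overlapping and continuous by construction.
    """
    widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]
    return {
        "5 to 20 classes": 5 <= len(widths) <= 20,
        "exhaustive": all(boundaries[0] <= x < boundaries[-1] for x in data),
        "equal width": all(abs(w - widths[0]) < 1e-9 for w in widths),
    }


# Hypothetical illustration values, not taken from the notes.
example_data = [3, 7, 12, 18, 21, 24]
example_boundaries = [0, 5, 10, 15, 20, 25]
print(check_classes(example_boundaries, example_data))
```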
Procedure for constructing a grouped frequency distribution:
1) Find the range by subtracting the lowest value of the data from the highest.
2) Select the number of classes desired (between 5 and 20).
3) Find the class width by dividing the range by the number of classes; round the result up to get the
class width.
4) Go to the TI-83 calculator and construct a graph called a histogram, using the procedure listed
below.
5) Use the information read off the calculator screen to construct the grouped frequency table.
Example: We have 25 scores on a final exam, as follows:
86, 83, 56, 98, 82, 52, 71, 88, 75, 91, 69, 88, 64, 78, 81, 74, 77, 83, 90, 85, 64, 79, 71, 83, 64
We want a frequency distribution. Since the data set is small, we choose 5 as the number of classes.
The range of the data is R = Largest value – Smallest value = 98 – 52 = 46. To get the class width,
we divide the range by 5, obtaining 9.2. We round this number up to obtain the class width, 9.25. We
then go to the TI-83 to construct the histogram. We will talk about constructing histograms first, then
get back to constructing the grouped frequency table.
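The range and class width for the exam scores can be checked with a short Python sketch. The notes round 9.2 up to 9.25 without stating a rounding rule; rounding up to the next multiple of 0.25 is one rule that gives the same result, and it is an assumption of this sketch, as is the helper name round_up_to.

```python
import math

scores = [86, 83, 56, 98, 82, 52, 71, 88, 75, 91, 69, 88, 64,
          78, 81, 74, 77, 83, 90, 85, 64, 79, 71, 83, 64]
num_classes = 5

# Step 1: range = largest value - smallest value.
data_range = max(scores) - min(scores)

# Step 3: divide the range by the number of classes.
raw_width = data_range / num_classes


def round_up_to(value, step):
    """Round value up to the next multiple of step (assumed rounding rule)."""
    return math.ceil(value / step) * step


class_width = round_up_to(raw_width, 0.25)
print(data_range, raw_width, class_width)
```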
Graphical Representations of Data
We will do several types of graphs that display numeric data. One of the most common ways to graph
numeric data is through use of a histogram.
Definition: A histogram is a graph that displays the data by using vertical bars of various heights to
represent the frequencies.
Characteristics of a histogram:
1) The classes are listed in order along the horizontal axis of the chart.
2) The vertical axis provides a scale for the frequencies.
3) A rectangle, or bar, is constructed for each class so that
a) the height of the bar is the frequency of the class
b) the bar for the class extends from the lower boundary of
the class to the upper boundary
4) Each axis of the histogram has a label, and the histogram has a title.
Example: Now let us create a histogram for a data set, and in so doing, generate a grouped frequency
distribution.
Entering a data set into the TI – 83 graphing calculator, using the statistics exam data.
The stat list editor is a table where you can store, edit, and view up to 20 lists that are in memory.
Also, you can create list names from the stat list editor.
1) To display the stat list editor, press STAT, and then select 1:Edit from the STAT EDIT menu.
2) Use the up arrow key to move the cursor to the top row of the table. Press 2ND, and then INS. You
will see the Name= prompt at the bottom of the screen. Type the name of your variable using the
alphabetic keys (green symbols on your calculator).
3) Use the down arrow to move to the list. Type in the first data value and press ENTER. The cursor
will automatically move down to the next space for the next entry. If you make a mistake, use the
arrow keys to return to the location of the mistake and make a correction.
4) If you want to erase a list, move the cursor to the list name, and press DEL.
Steps in constructing a histogram using the TI – 83 graphing calculator:
First, you need to clear previous graphs.
1) Press Y=. You will see a list of functions. If any of them have already been defined, use the arrow
keys and the CLEAR key to erase them.
2) Next press 2ND, and STAT PLOT. You will see a list of plots. All of them should be off. If any
are not, go down to 4:PlotsOff and press ENTER.
3) Clear all drawn figures. Press 2ND and DRAW. Choose 1:ClrDraw, and press ENTER.
4) Set the size of your graph window. Press WINDOW. The Xmin value should be equal to your
smallest data value; in this case, we choose Xmin = 52. The Xmax value should be equal to or slightly
larger than your largest data value; in this example, we choose Xmax = 102. The Xscl value is your
class width. For this example, we choose 5 classes, and so Xscl = 9.25. The Ymin value should be 0;
the Ymax value should be somewhat larger than your expected largest class frequency. Since there are
25 items of data, we choose Ymax = 12.
5) Press 2ND, STATPLOT, 1:Plot1, and ENTER. Turn Plot 1 On.
6) Choose the histogram symbol (the third symbol on the third line of the screen).
7) Go down to Xlist: and enter the name of your variable.
8) Press the GRAPH key. You will see the histogram displayed.
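For readers without the calculator at hand, the following Python sketch mimics what the histogram shows for the exam scores, using Xmin = 52 and Xscl = 9.25 with five classes. It assumes each class includes its lower boundary but not its upper one, and it prints horizontal text bars rather than the vertical bars drawn by the calculator. The variable names are illustrative only.

```python
scores = [86, 83, 56, 98, 82, 52, 71, 88, 75, 91, 69, 88, 64,
          78, 81, 74, 77, 83, 90, 85, 64, 79, 71, 83, 64]
xmin, xscl, num_classes = 52, 9.25, 5  # window settings from the steps above

# Tally each score into the class [xmin + k*xscl, xmin + (k+1)*xscl).
frequencies = [0] * num_classes
for score in scores:
    k = int((score - xmin) // xscl)
    frequencies[k] += 1

# One row per class: lower boundary, upper boundary, a bar, and the frequency.
for k, count in enumerate(frequencies):
    lower = xmin + k * xscl
    upper = lower + xscl
    print(f"{lower:6.2f} to {upper:6.2f} | {'#' * count} {count}")
```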
To generate the frequency distribution from the histogram:
1) Press the TRACE key.
2) Use the right arrow key to move from one bar of the histogram to the next, reading the class
boundaries and the frequencies from the calculator screen. The result for this example is given below.
Class Limits     Frequency    Cumulative Frequency    Relative Frequency
52.00 – 61.24    2             2                      0.08 = 8%
61.25 – 70.49    4             6                      0.16 = 16%
70.50 – 79.74    7            13                      0.28 = 28%
79.75 – 88.99    9            22                      0.36 = 36%
89.00 – 98.25    3            25                      0.12 = 12%
Note also that the table includes a column for the relative frequencies, which are the proportions of the
data set falling into each class.
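Once the class frequencies have been read off the histogram, the remaining columns of the table follow by simple arithmetic. A minimal Python sketch, with variable names chosen here for illustration:

```python
from itertools import accumulate

frequencies = [2, 4, 7, 9, 3]  # class frequencies from the table above
n = sum(frequencies)           # size of the data set

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Relative frequency: class frequency divided by the size of the data set.
relative = [f / n for f in frequencies]

for f, c, r in zip(frequencies, cumulative, relative):
    print(f"{f:>9} {c:>10} {r:>8.2f} = {r:.0%}")
```

The last cumulative frequency always equals the size of the data set, which is a quick check on the tally.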
Defn: The relative frequency associated with a class is the proportion of the data set falling into that
class. It is found by dividing the class frequency by the size of the data set.
Defn: The cumulative relative frequency associated with a class is the proportion of the data set
falling into that class or lower classes. It is found by dividing the cumulative frequency for a class by
the size of the data set.
Interpretation of Relative Frequency and Cumulative Relative Frequency: If we randomly select an
observation from the data set, the relative frequency for a class is the probability that our selected
observation will be found in that class. The cumulative relative frequency for a class is the probability
that the observation will be found either in that class or in a lower class.
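The sketch below computes the cumulative relative frequencies for the exam scores and illustrates the probability interpretation for the third class, whose upper boundary is 52 + 3(9.25) = 79.75. It is an illustration only; the variable names are not from the notes.

```python
from itertools import accumulate

scores = [86, 83, 56, 98, 82, 52, 71, 88, 75, 91, 69, 88, 64,
          78, 81, 74, 77, 83, 90, 85, 64, 79, 71, 83, 64]
frequencies = [2, 4, 7, 9, 3]  # class frequencies for the exam scores
n = len(scores)

# Cumulative relative frequency: cumulative frequency divided by n.
cumulative_relative = [c / n for c in accumulate(frequencies)]
print([round(p, 2) for p in cumulative_relative])

# Probability interpretation: the chance that a randomly selected score
# lies in the third class or a lower class is the share of scores below
# that class's upper boundary, 79.75.
share_below = sum(1 for s in scores if s < 79.75) / n
print(share_below)
```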
Distribution Shapes: (See p. 88)
1) In a uniform distribution, the frequencies are equal for all classes; the relative frequencies are also
equal for all classes.
2) In a bell-shaped distribution, the greatest frequency (or relative frequency) occurs in the middle
class, with decreasing frequencies away from the center in either direction.
Uniform and bell-shaped distributions are examples of symmetric distributions.
3) In a distribution that is positively skewed, or right-skewed, the majority of the data values fall to the left
of the center and cluster at the lower end of the distribution; the tail of the distribution is to the right.
4) In a distribution that is negatively skewed, or left-skewed, the majority of the data values fall to the
right of the center and cluster at the upper end of the distribution; the tail of the distribution is to the
left.
Other Types of Graphs
Defn: A bar graph is used to represent the frequency distribution for a categorical variable, and the
frequencies are displayed by the heights of the vertical bars.
Defn: A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative
frequency.
Example: pp. 63-66
Note: Since we are dealing with non-numeric data, the TI-83 calculator will not do this type of graph.
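Although the calculator cannot draw it, the ordering that defines a Pareto chart is easy to sketch in Python. The example reuses the class-rank frequencies from earlier in these notes, deliberately listed out of order, and prints text bars in place of a drawn chart. The variable names are illustrative.

```python
# Class-rank frequencies from the financial aid example, listed out of order.
frequencies = {"Sr": 4, "Fr": 18, "Jr": 6, "So": 12}

# A Pareto chart draws the bars in decreasing order of frequency.
pareto_order = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

for category, count in pareto_order:
    print(f"{category} | {'#' * count} {count}")
```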
Another type of graph used with categorical data is the pie graph.
Defn: A pie graph is a circle that is divided into sections or wedges according to the proportion of the
data set in each category.
Note: The TI-83 will not do this type of graph. It must be done by hand. Example 6, p. 68.
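When drawing a pie graph by hand, the central angle of each wedge is the category's share of the data set times 360 degrees. The following Python sketch works this out for the class-rank data from earlier in these notes; the name wedge_angles is chosen here for illustration.

```python
# Class-rank frequencies from the financial aid example.
frequencies = {"Fr": 18, "So": 12, "Jr": 6, "Sr": 4}
total = sum(frequencies.values())

# Each wedge's central angle is its proportion of the data set times 360 degrees.
wedge_angles = {category: count * 360 / total
                for category, count in frequencies.items()}

for category, angle in wedge_angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles should add up to 360 degrees, which is a useful check before drawing.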
Note: In any situation in which data are represented using graphical techniques, it is easy to construct
the graph in such a way as to mislead the viewer. It is necessary to carefully examine the graph in
order to interpret it properly. On pages 100 - 109 of the textbook, there are examples of graphs
constructed to be misleading.
Time Series Plots
If the values of a variable are measured at regular intervals over a period of time, the data are referred
to as time series data. Unlike previous data sets, the items in a time series data set may be related to
each other. To represent the data graphically, we use a time series plot.
Defn: A time series plot is obtained by plotting the time at which a variable is measured along the
horizontal axis and the measured value of the variable along the vertical axis. Lines are then drawn
connecting the points.
Example: p. 98, Ex. 55
To do this type of plot using the TI-83/84, we need to enter two lists of numbers. The first list is the
sequence of time points; the second list is the sequence of data values. The type of graph we are
doing is the second of the six types available with the Stat Plot function of the calculator.
Steps in constructing a time series graph using the TI – 83/84 graphing calculator:
1) You will need to enter two columns of data. The first column is the set of time values. To avoid
making the graph look cluttered, I would enter the time values as 1, 2, 3, … up to the number of time
points. The second column is the list of measured values of the variable. These two columns should
have the same number of data items.
2) Set the size of your graph window. Press WINDOW. The Xmin value should be 0. The Xmax
value should be slightly larger than your largest time value; in this example, there were 12 time points,
so I would set Xmax = 13. For a time series graph, the Xscl value is 1. The Ymin value should be 0;
the Ymax value should be somewhat larger than your largest data value; in this example, the largest
price is 34.77, so I would set Ymax = 36.
3) Press 2ND, STATPLOT, 1:Plot1, and ENTER. Turn Plot 1 On.
4) Choose the line graph symbol (the second symbol on the first row of Type).
5) Go down to Xlist: and enter the name of your time list.
6) Go down to Ylist and enter the name of the variable.
7) Press the GRAPH key. You will see the time series graph displayed.
8) If you hit the TRACE key, you can read off the coordinates of each point on the graph.
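The window settings in step 2 can be summarized in a small Python sketch. The rule used here for Ymax (round the largest value up to a whole number and add 1) is an assumption that happens to reproduce Ymax = 36 for a largest price of 34.77; the notes only say "somewhat larger". The function name time_series_window is hypothetical.

```python
import math


def time_series_window(num_time_points, largest_value):
    """Suggest window settings for a time series plot, following step 2."""
    return {
        "Xmin": 0,
        "Xmax": num_time_points + 1,           # slightly larger than the last time value
        "Xscl": 1,
        "Ymin": 0,
        "Ymax": math.ceil(largest_value) + 1,  # assumed rule for "somewhat larger"
    }


# Time values entered as 1, 2, 3, ... up to the number of time points (step 1).
time_list = list(range(1, 12 + 1))
window = time_series_window(len(time_list), 34.77)
print(window)
```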
For time series data, we are looking for trends. In this example, we see that there is a slightly
decreasing trend in Closing Price for the 12-month period, and some cyclical fluctuations. The
decreasing trend corresponds to the period of the onset of the recession. A stock analyst would also
want to find explanations for the cyclical pattern seen in the data.
Graphical Misrepresentations of Data
Data are sometimes graphed in ways that mislead the reader, whether intentionally or not.
Example: pp. 106-109: 3, 7, 8, 11