Displaying Data

Displaying Data & Central Tendency
Frequency Distributions
• After collecting data, the first task for a researcher is to
organize and summarize the data so that it is possible to get a
general overview of the results.
– Remember, this is the goal of descriptive statistical techniques.
• One method for simplifying and organizing data is to construct
a frequency distribution.
– Frequency – describes the number of times or how often a category,
score, or range of scores occurs
– Frequency distribution – a summary display for a distribution of data
01:830:200:10-13 Spring 2013
Frequency Distribution Tables
• A simple frequency distribution table consists of two columns: one listing the categories on the scale of measurement (x) and
another listing the frequency (f) of each category.
– In the x column, values are listed in order from lowest to highest (or
from highest to lowest)
– For the frequency column, tallies are determined for each value (how
often each x value occurs in the data set). These tallies are the
frequencies for each x value.
– The sum of the frequencies should equal N.
• Frequency distributions can be computed for grouped or
ungrouped data
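The tallying procedure described above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical (not from the lecture) and `frequency_table` is an assumed helper name.

```python
from collections import Counter

def frequency_table(scores):
    """Return (x, f) rows, with x values listed from lowest to highest."""
    counts = Counter(scores)        # tally how often each x value occurs
    return sorted(counts.items())   # order the rows by x

# Hypothetical data set, used only for illustration
scores = [2, 0, 1, 2, 3, 2, 1, 0, 2, 4]
table = frequency_table(scores)

print("x  f")
for x, f in table:
    print(x, "", f)

# The frequencies should sum to N, the number of scores
print("N =", sum(f for _, f in table))
```

The final line is the same check the slide recommends: the f column must add up to N.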
Regular (ungrouped) Frequency Distribution
• When a frequency distribution table lists all of the individual
categories (x values) it is called a regular frequency
distribution.
Example: x = number of naps toddlers take per day
x    f
0    8
1    8
2    15
3    8
4    1
N = 40
Grouped Frequency Distribution
• Sometimes, especially when dealing with continuous
variables, a set of scores covers a wide range of values
– In these situations, a list of all the x values would be too long to allow a
simple presentation of the data.
• In such cases, a grouped frequency distribution table is
used.
– In a grouped table, the x column lists groups of scores, called class
intervals, rather than individual values.
Example: x = college course enrollment
34 16 14 17 56  7 83 16 16 15
12 31 17  6  9 10 77 18 30 10
33 17 18 17 10 67  5 28 70 13
11 72 13 24 18 17  9 35 18 12
Sorted values:
 5  6  7  9  9 10 10 10 11 12
12 13 13 14 15 16 16 16 17 17
17 17 17 18 18 18 18 24 28 30
31 33 34 35 56 67 70 72 77 83
Grouped frequency distribution
x        f
0-10     5
10-20    22
20-30    2
30-40    5
40-50    0
50-60    1
60-70    1
70-80    3
80-90    1
N = 40
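Note: I prefer to use real limits when specifying
intervals. Your book uses apparent limits. You
can use either. In this table, a score that falls exactly on a
boundary is counted in the higher interval (e.g., 10 is counted in 10-20).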
Grouped Frequency Distributions: Guidelines
• Sort your data first; it makes building the frequency
distribution easier
• Decide on interval width and number of intervals
– You should have about 5-20 intervals
– All intervals should have the same width
– Your interval width should be a relatively simple number
• Examples: 10, 5, 2, 1, 0.5
– Your set of intervals should cover all observed values and should not
overlap
• I.e., no individual score should fall in more than one interval
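As a rough sketch of these guidelines, the course enrollment scores can be grouped into nine equal-width intervals in Python. The helper name `grouped_frequencies` is hypothetical, and the code assumes each interval includes its lower limit and excludes its upper limit, which is the convention that matches the counts in the grouped table.

```python
# Course enrollment scores from the example
enrollment = [34, 16, 14, 17, 56, 7, 83, 16, 16, 15,
              12, 31, 17, 6, 9, 10, 77, 18, 30, 10,
              33, 17, 18, 17, 10, 67, 5, 28, 70, 13,
              11, 72, 13, 24, 18, 17, 9, 35, 18, 12]

def grouped_frequencies(scores, width, start, n_intervals):
    """Count scores in equal-width intervals [lower, upper)."""
    counts = [0] * n_intervals
    for x in scores:
        index = int((x - start) // width)   # which interval x falls in
        if not 0 <= index < n_intervals:
            # The set of intervals must cover every observed value
            raise ValueError(f"score {x} is outside the intervals")
        counts[index] += 1
    return counts

freqs = grouped_frequencies(enrollment, width=10, start=0, n_intervals=9)
for i, f in enumerate(freqs):
    print(f"{10 * i}-{10 * (i + 1)}: {f}")
```

Because each score maps to exactly one index, no score can land in more than one interval, which is the non-overlap requirement stated above.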
Relative Frequencies & Percentages
• Often, researchers are more interested in the relative
frequency (or proportion) of individuals in each category than
in the total number.
– Remember from the last lecture that we usually measure statistics on
samples to infer parameters of populations
– The relative frequency of a sample approximates the relative frequency
of the population, whereas the raw frequency of a sample does not.
• The relative frequency distribution table lists the proportion
(p) for each category: p = f/N. The sum of the p column
should equal 1.00.
– Alternatively, the table could list the percentage of the distribution
corresponding to each X value. The percentage is found by multiplying
p by 100. The sum of the percentage column should equal 100%.
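The conversion from frequencies to proportions and percentages is a one-line calculation each. A minimal sketch, using the frequencies from the grouped course enrollment table (the variable names are illustrative):

```python
# Frequencies for the intervals 0-10, 10-20, ..., 80-90
freqs = [5, 22, 2, 5, 0, 1, 1, 3, 1]
N = sum(freqs)

proportions = [f / N for f in freqs]           # p = f / N
percentages = [100 * p for p in proportions]   # percentage = p * 100

for f, p, pct in zip(freqs, proportions, percentages):
    print(f"{f:>2}  {p:.3f}  {pct:.1f}")

# The p column should sum to 1.00 and the % column to 100
print(round(sum(proportions), 2), round(sum(percentages), 1))
```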
Relative Frequencies & Percentages
x        f     p (or f/N)   %
0-10     5     0.125        12.5
10-20    22    0.550        55.0
20-30    2     0.050        5.0
30-40    5     0.125        12.5
40-50    0     0.000        0.0
50-60    1     0.025        2.5
60-70    1     0.025        2.5
70-80    3     0.075        7.5
80-90    1     0.025        2.5
Total    40    1.000        100.0
Cumulative Frequencies, Proportions, &
Percentages
• Cumulative frequencies, proportions, or percentages
describe the sum of frequencies, proportions, or percentages
across a series of intervals
• Usually refers to the bottom-up (running) sum of frequencies
– E.g., the number of college courses with fewer than k students
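A running sum like this can be sketched with `itertools.accumulate` from the Python standard library. The frequencies are those of the grouped course enrollment table; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies for the intervals 0-10, 10-20, ..., 80-90
freqs = [5, 22, 2, 5, 0, 1, 1, 3, 1]
N = sum(freqs)

cum_freqs = list(accumulate(freqs))                # bottom-up running sum
cum_percents = [100 * c / N for c in cum_freqs]    # cumulative percentage

for f, c, cp in zip(freqs, cum_freqs, cum_percents):
    print(f"{f:>2}  {c:>2}  {cp:.1f}%")
```

The last cumulative frequency always equals N, and the last cumulative percentage is always 100%.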
Cumulative Frequencies & Percentages
x        f     Cumulative Freq.   %        Cumulative %
0-10     5     5                  12.5%    12.5%
10-20    22    27                 55.0%    67.5%
20-30    2     29                 5.0%     72.5%
30-40    5     34                 12.5%    85.0%
40-50    0     34                 0.0%     85.0%
50-60    1     35                 2.5%     87.5%
60-70    1     36                 2.5%     90.0%
70-80    3     39                 7.5%     97.5%
80-90    1     40                 2.5%     100.0%
Frequency Distribution Graphs
• In a frequency distribution graph, the score categories (X
values) are listed on the X axis and the frequencies are listed
on the Y axis.
• When the score categories consist of numerical scores from
an interval or ratio scale, the graph should be either a
histogram or a polygon.
Bar Plots & Histograms
• Bar plots are plots showing the relationship between two
variables. Usually, the height of a bar represents the value of
a dependent variable when the independent variable consists
of nominal or ordinal category labels.
• Histograms are bar plots in which the rectangles are
centered above each score (or class interval) and the heights
of the bars correspond to the frequencies (or relative
frequencies) of the scores.
– The widths of bars should extend to the real limits of the class
intervals, so that adjacent bars touch.
Note: Proper histograms actually represent frequencies in terms of the area rather than
the height of bars, but we won’t worry about that distinction in this course
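To get a quick feel for what a histogram shows, the grouped course enrollment frequencies can be rendered as rough text bars. This is only a sketch: the bars are drawn horizontally rather than vertically, bar length stands in for frequency, and `text_histogram` is a hypothetical helper name.

```python
labels = ["0-10", "10-20", "20-30", "30-40", "40-50",
          "50-60", "60-70", "70-80", "80-90"]
freqs = [5, 22, 2, 5, 0, 1, 1, 3, 1]

def text_histogram(labels, freqs, symbol="#"):
    """Return one text line per class interval; bar length equals f."""
    return [f"{label:>6} | {symbol * f}" for label, f in zip(labels, freqs)]

lines = text_histogram(labels, freqs)
for line in lines:
    print(line)
```

The intervals appear on consecutive lines with no gaps, mirroring the way adjacent histogram bars touch.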
Bar Plot Example: M&Ms Colors
x         f
brown     14
red       14
blue      10
orange    7
green     6
yellow    5
n = 56
Histogram Example: Course Enrollment
x        f
0-10     5
10-20    22
20-30    2
30-40    5
40-50    0
50-60    1
60-70    1
70-80    3
80-90    1
N = 40
Line Plots & Frequency Polygons
• Line plots are plots in which each pair of scores is drawn as a
dot (rather than a rectangle): the dot is centered above the first
score on the X axis, its height is determined by the second score,
and lines are drawn to connect the dots. These are generally used to
show the relationship between two quantitative
measurements.
• A frequency polygon is a type of line plot analogous to a
histogram, where the heights of the dots correspond to
frequencies or relative frequencies of scores or intervals.
Frequency Polygons: Example
Scatter Plots
A scatter plot (or scattergram) displays discrete data points (x, y)
to summarize the relationship between two variables.
Height   Weight
70       150
67       140
72       180
75       190
68       145
69       150
71.5     164
71       140
72       142
69       136
67       123
68       155
66       140
72       145
73.5     160
73       190
69       155
73       165
72       150
74       190
Theoretical Distributions, Probability Densities
& Smooth Curves
• If the scores in the population are continuous variables, then
the theoretical distributions describing them will often be
depicted as smooth curves
– Examples of this include the normal distribution (i.e., “the bell curve”) as
well as most of the test statistic distributions that we will deal with in this
course (e.g., the t distribution, the F distribution, the chi-square
distribution)
• The smooth curves represent the expectation that in a large
population, relative frequencies should change smoothly as a
function of a continuous variable.
– These smooth curves actually represent probability densities, which are
related to relative frequencies
Frequency & Probability Distribution Graphs
• Frequency & probability distribution graphs are useful
because they show the entire set of scores.
• At a glance, you can determine the highest score, the lowest
score, and where the scores are centered.
• The graph also shows whether the scores are clustered
together or scattered over a wide range.
Distribution Shape
• A graph shows the shape of the distribution.
• A distribution is symmetrical if the left side of the graph is
(roughly) a mirror image of the right side.
• One example of a symmetrical distribution is the bell-shaped
normal distribution.
• On the other hand, distributions are skewed when scores pile
up on one side of the distribution, leaving a "tail" of a few
extreme values on the other side.
• In a positively skewed distribution, the scores tend to pile up
on the left side of the distribution with the tail tapering off to
the right.
• In a negatively skewed distribution, the scores tend to pile
up on the right side and the tail points to the left.
• A unimodal distribution has one peak
• A bimodal (multimodal) distribution has two (multiple) peaks
Central Tendency
• In general terms, central tendency is a statistical measure
that determines a single value that accurately describes the
center of the distribution and represents the entire distribution
of scores.
• The goal of central tendency is to identify the single value that
is the best representative for the entire set of data.
• By identifying the "average score," central tendency allows
researchers to summarize or condense a large set of data into
a single value.
• Thus, central tendency serves as a descriptive statistic
because it allows researchers to describe or present a set of
data in a very simplified, concise form.
• In addition, it is possible to compare two (or more) sets of
data by simply comparing the average score (central
tendency) for one set versus the average score for another
set.
The Mean, the Median, and the Mode
• No single procedure always produces a good, representative
value. Therefore, researchers have developed three
commonly used techniques for measuring central tendency:
the mean, the median, and the mode.
The Mean
• The mean is the most commonly used measure of central
tendency.
– The population mean is denoted by: μ
– The sample mean is denoted by: M or X̄
• Computation of the mean requires scores that are numerical
values measured on an interval or ratio scale.
• The mean is obtained by computing the sum, or total, for the
entire set of scores, then dividing this sum by the number of
scores.

$$\mu = \frac{1}{N} \sum x$$
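The computation just described (sum the scores, then divide by the number of scores) can be sketched directly in Python. The function name `mean` is illustrative, and the scores are Sample 1 from the weighted-mean example later in the lecture.

```python
def mean(scores):
    """Sum the scores and divide by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty set of scores is undefined")
    return sum(scores) / len(scores)

sample = [6, 2, 6, 8, 3]   # sum = 25, N = 5
print(mean(sample))        # 25 / 5
```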
Changing the Mean
• Because the calculation of the mean involves every score in
the distribution, changing the value of any score will change
the value of the mean.
• Modifying a distribution by discarding scores or by adding new
scores will usually change the value of the mean.
• To determine how the mean will be affected for any specific
situation you must consider: 1) how the number of scores is
affected, and 2) how the sum of the scores is affected.
• If a constant value is added to every score in a distribution,
then the same constant value is added to the mean. Also, if
every score is multiplied by a constant value, then the mean is
also multiplied by the same constant value.
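Both properties are easy to check numerically. A minimal sketch follows; the constants 3 and 2 are arbitrary choices for illustration, and the scores are Sample 1 from the weighted-mean example.

```python
def mean(scores):
    return sum(scores) / len(scores)

scores = [6, 2, 6, 8, 3]             # mean = 5.0

shifted = [x + 3 for x in scores]    # add a constant to every score
scaled = [x * 2 for x in scores]     # multiply every score by a constant

print(mean(scores), mean(shifted), mean(scaled))
```

The shifted mean is the original mean plus 3, and the scaled mean is the original mean times 2.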
The Weighted Mean
• When combining data from samples with different sizes, you
can compute the combined mean from the sample means
using the following formula:
$$M_W = \frac{1}{N} \sum nM, \quad \text{where } N = \sum n$$
• For example, consider the following samples:
– Sample 1: x = {6,2,6,8,3}; M = 5.0; n = 5
– Sample 2: x = {3,6,13,4}; M = 6.5; n = 4
– Sample 3: x = {3,4,2}; M = 3.0; n = 3
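Working the three samples through the formula gives a combined mean of (5 × 5.0 + 4 × 6.5 + 3 × 3.0) / 12 = 60 / 12 = 5.0. The sketch below checks that result against the mean of the pooled raw scores; the variable names are illustrative.

```python
# (n, M) for each sample, taken from the example
samples = [(5, 5.0), (4, 6.5), (3, 3.0)]

N = sum(n for n, _ in samples)                     # N = sum of n
weighted_mean = sum(n * M for n, M in samples) / N

# Cross-check: pool the raw scores and compute the ordinary mean
pooled = [6, 2, 6, 8, 3] + [3, 6, 13, 4] + [3, 4, 2]
pooled_mean = sum(pooled) / len(pooled)

print(N, weighted_mean, pooled_mean)
```

Note that simply averaging the three sample means (5.0, 6.5, 3.0) would give about 4.83, which is not the mean of the combined data because the samples differ in size.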
When the Mean Won’t Work
• Although the mean is the most commonly used measure of central
tendency, there are situations where the mean does not provide a
good, representative value, or where you cannot compute a mean at
all.
• When a distribution contains a few extreme scores (or is very skewed), the
mean will be pulled toward the extremes. In these cases, the mean will not
provide a "central" value.
• With data from a nominal scale it is impossible to compute a mean, and
when data are measured on an ordinal scale (ranks), it is usually
inappropriate to compute a mean.
• Thus, the mean does not always work as a measure of central
tendency and it is necessary to have alternative procedures
available.
The Median
• If the scores in a distribution are listed in order from smallest
to largest, the median is defined as the midpoint of the list.
• This means that computation of the median requires scores that can be
placed in rank order (i.e., ordinal, interval, or ratio)
• The median divides the scores so that 50% of the scores in the
distribution have values that are equal to or less than the median.
• Usually, the median can be found by a simple counting
procedure:
1. With an odd number of scores, list the values in order, and the median
is the middle score in the list.
2. With an even number of scores, list the values in order, and the
median is half-way between the middle two scores.
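The counting procedure can be sketched as a short Python function. The name `median` is illustrative, and the two test sets are Samples 3 and 2 from the weighted-mean example.

```python
def median(scores):
    """Median by counting: middle score, or half-way between the middle two."""
    ordered = sorted(scores)           # list the values in order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set of scores is undefined")
    mid = n // 2
    if n % 2 == 1:                     # odd number of scores
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of scores

print(median([3, 4, 2]))       # odd N: ordered list is 2, 3, 4
print(median([3, 6, 13, 4]))   # even N: ordered list is 3, 4, 6, 13
```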
• One advantage of the median is that it is relatively unaffected
by extreme scores.
• Thus, the median tends to stay in the "center" of the
distribution even when there are a few extreme scores or
when the distribution is very skewed. In these situations, the
median serves as a good alternative to the mean.
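A small numerical illustration, using Python's standard `statistics` module. The scores are hypothetical and chosen only to show the effect of a single extreme value.

```python
import statistics

scores = [2, 3, 4, 5, 6]
with_extreme = [2, 3, 4, 5, 60]   # same scores, but the largest is now extreme

print(statistics.mean(scores), statistics.median(scores))
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```

The mean jumps from 4 to 14.8, while the median stays at 4.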
The Mode
• The mode is defined as the most frequently occurring
category or score in the distribution.
• In a frequency distribution graph, the mode is the category or
score corresponding to the peak or high point of the
distribution.
• The mode can be determined for data measured on any scale
of measurement: nominal, ordinal, interval, or ratio.
– The mode is the only measure of central tendency that can be used for
data measured on a nominal scale.
Bimodal Distributions
• It is possible for a distribution to have more than one mode.
A distribution with two modes is called bimodal, and one with more
than two is called multimodal. (Note that a distribution
can have only one mean and only one median.)
• In addition, the term "mode" is often used to describe a peak
in a distribution that is not really the highest point. Thus, a
distribution may have a major mode at the highest peak and a
minor mode at a secondary peak in a different location.
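The M&Ms color data from the bar plot example happens to illustrate both points: the scale is nominal, and two categories tie for the highest frequency. A minimal sketch follows, with `modes` as a hypothetical helper that returns every category tied for the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return every category or score that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, f in counts.items() if f == highest]

# Frequencies from the M&Ms colors table
color_freqs = {"brown": 14, "red": 14, "blue": 10,
               "orange": 7, "green": 6, "yellow": 5}

# Expand the table back into individual observations (n = 56)
candies = [color for color, f in color_freqs.items() for _ in range(f)]

print(modes(candies))
```

No mean or median could be computed for these color labels, but the mode is still well defined.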
Central Tendency and the Shape of the Distribution
• Because the mean, the median, and the mode are all
measuring central tendency, the three measures are often
systematically related to each other.
• In a symmetrical distribution, for example, the mean and
median will always be equal.
• If a symmetrical distribution has only one mode, the mode,
mean, and median will all have the same value.
• In a skewed distribution, the mode will be located at the peak
on one side and the mean usually will be displaced toward the
tail on the other side.
• The median is usually located between the mean and the
mode.
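The course enrollment data, which pile up at low values with a long tail to the right, give a concrete case. The sketch below uses Python's standard `statistics` module. In this particular data set the median coincides with the mode rather than falling strictly between the mode and the mean, which is one reason the statements above are hedged with "usually".

```python
import statistics

# Course enrollment scores from the grouped frequency example
enrollment = [34, 16, 14, 17, 56, 7, 83, 16, 16, 15,
              12, 31, 17, 6, 9, 10, 77, 18, 30, 10,
              33, 17, 18, 17, 10, 67, 5, 28, 70, 13,
              11, 72, 13, 24, 18, 17, 9, 35, 18, 12]

m = statistics.mean(enrollment)       # pulled toward the right-hand tail
md = statistics.median(enrollment)    # middle of the ordered scores
mo = statistics.mode(enrollment)      # most frequent score

print(m, md, mo)
```

The mean (about 25) sits well above the median and mode (both 17), displaced toward the tail of large classes.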