Chapter 2 Review

Chapter 2
Frequency Distributions,
Stem-and-leaf displays, and
Histograms
Where have we been?
To calculate SS, the variance, and the
standard deviation: find the deviations from
μ, square and sum them (SS), divide by N (σ²),
and take a square root (σ).
Example: Scores on a Psychology quiz
Student     X     X - μ     (X - μ)²
John        7     +1.00     1.00
Jennifer    8     +2.00     4.00
Arthur      3     -3.00     9.00
Patrick     5     -1.00     1.00
Marie       7     +1.00     1.00

ΣX = 30
N = 5
μ = 6.00
Σ(X - μ) = 0.00
Σ(X - μ)² = SS = 16.00
σ² = SS/N = 3.20
σ = √3.20 = 1.79
Ways of showing how
scores are distributed
around the mean
Frequency Distributions,
Stem-and-leaf displays, and
Histograms
Some definitions
Frequency Distribution - a tabular display of the way scores are distributed across all the possible values of a variable.
Absolute Frequency Distribution - displays the count of each score.
Cumulative Frequency Distribution - displays the total number of scores at and below each score.
Relative Frequency Distribution - displays the proportion of each score.
Relative Cumulative Frequency Distribution - displays the proportion of scores at and below each score.
Example Data
Traffic accidents by bus drivers
•Studied 708 bus drivers.
•Recorded all accidents for a period of 4 years.
•Data looks like:
3, 0, 6, 0, 0, 2, 1, 4, 1, … 6, 0, 2
Frequency Distributions
# of accidents    Absolute Freq.    Relative Frequency
0                 117               .165
1                 157               .222
2                 158               .223
3                 115               .162
4                 78                .110
5                 44                .062
6                 21                .030
7                 7                 .010
8                 6                 .008
9                 1                 .001
10                3                 .004
11                1                 .001
Total             708               .998

Calculate relative frequency:
divide each absolute frequency by N.
For example,
117/708 = .165
Notice the rounding error: the rounded relative frequencies sum to .998, not 1.000.
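A short Python sketch of the same calculation, assuming the absolute frequencies are held in a dictionary keyed by number of accidents (the names abs_freq and rel_freq are illustrative only):

```python
# Absolute frequencies from the bus driver table: accidents -> number of drivers.
abs_freq = {0: 117, 1: 157, 2: 158, 3: 115, 4: 78, 5: 44,
            6: 21, 7: 7, 8: 6, 9: 1, 10: 3, 11: 1}

n = sum(abs_freq.values())      # N, the total number of drivers

# Relative frequency = absolute frequency / N, rounded to three places as on the slide.
rel_freq = {k: round(f / n, 3) for k, f in abs_freq.items()}

for k, p in rel_freq.items():
    print(f"{k:>2}  {abs_freq[k]:>3}  {p:.3f}")

# The rounded proportions need not sum to exactly 1.
print("sum of rounded relative frequencies =", round(sum(rel_freq.values()), 3))
```

Rounding each proportion separately is what produces the small shortfall in the column total.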
What can you answer?
# of accidents    Relative Freq.
0                 .165
1                 .222
2                 .223
3                 .162
4                 .110
5                 .062
6                 .030
7                 .010
8                 .008
9                 .001
10                .004
11                .001
Total             .998
Proportion with at most 1 accident?
= .165 + .222 = .387
.387 * 100 = 38.7%
Proportion with 8 or more accidents?
= .008 + .001 +.004 + .001 = .014 = 1.4%
Proportion with between 4 and 7 accidents?
= .110 + .062 +.030 + .010 = .212 = 21.2%
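These three answers are sums of relative frequencies over a range of scores. The sketch below reproduces them from the rounded table values; the helper name proportion is an assumption made for this example. Because the inputs are already rounded, the sums carry that rounding forward: working from the raw counts, 8 or more accidents is 11/708, which rounds to .016 rather than .014.

```python
# Rounded relative frequencies copied from the table: accidents -> proportion.
rel_freq = {0: .165, 1: .222, 2: .223, 3: .162, 4: .110, 5: .062,
            6: .030, 7: .010, 8: .008, 9: .001, 10: .004, 11: .001}

def proportion(lo, hi):
    """Proportion of drivers with between lo and hi accidents, inclusive."""
    return round(sum(p for k, p in rel_freq.items() if lo <= k <= hi), 3)

at_most_1 = proportion(0, 1)        # 0 or 1 accident
eight_or_more = proportion(8, 11)   # 8, 9, 10 or 11 accidents
four_to_seven = proportion(4, 7)    # 4 through 7 accidents

# Same question answered from the raw counts instead of the rounded proportions.
exact_eight_or_more = round((6 + 1 + 3 + 1) / 708, 3)

print(at_most_1, eight_or_more, four_to_seven, exact_eight_or_more)
```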
Cumulative Frequencies
# of accidents    Absolute Frequency    Cumulative Frequency    Cumulative Relative Frequency
0                 117                   117                     .165
1                 157                   274                     .387
2                 158                   432                     .610
3                 115                   547                     .773
4                 78                    625                     .883
5                 44                    669                     .945
6                 21                    690                     .975
7                 7                     697                     .984
8                 6                     703                     .993
9                 1                     704                     .994
10                3                     707                     .999
11                1                     708                     1.000
Total             708

Cumulative frequencies show the number of scores at or below each point.
Calculate by adding the absolute frequencies at and below each point.
Cumulative relative frequencies show the proportion of scores at or below each point.
Calculate by dividing the cumulative frequency by N at each point.
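Both cumulative columns are running totals, so they can be sketched with itertools.accumulate from the Python standard library. The list and variable names below are illustrative assumptions.

```python
from itertools import accumulate

# Absolute frequencies for 0, 1, 2, ..., 11 accidents.
abs_freq = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(abs_freq)                               # N = total number of drivers

# Cumulative frequency: running total of the absolute frequencies.
cum_freq = list(accumulate(abs_freq))

# Cumulative relative frequency: cumulative frequency divided by N.
cum_rel_freq = [round(c / n, 3) for c in cum_freq]

for accidents, (c, p) in enumerate(zip(cum_freq, cum_rel_freq)):
    print(f"{accidents:>2}  {c:>3}  {p:.3f}")
```

Dividing the cumulative counts by N, rather than adding up already rounded relative frequencies, keeps rounding error from accumulating down the column.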
Grouped Frequency
Example
100 High school students’ average time in seconds to read
ambiguous sentences.
Values range between 2.50 seconds and 2.99 seconds.
2.72  2.58  2.87  2.85  2.83  2.83  2.87  2.88  2.84  2.60
2.87  2.61  2.79  2.96  2.84  2.85  2.63  2.63  2.74  2.54
2.76  2.93  2.84  2.51  2.62  2.70  2.73  2.75  2.89  2.80
2.54  2.73  2.52  2.96  2.86  2.92  2.65  2.98  2.80  2.75
2.90  2.58  2.98  2.70  2.61  2.79  2.99  2.75  2.87  2.59
2.61  2.93  2.96  2.66  2.76  2.89  2.81  2.89  2.87  2.58
2.58  2.93  2.89  2.78  2.83  2.76  2.50  2.71  2.64  2.52
2.95  2.85  2.58  2.82  2.51  2.85  2.59  2.96  2.52  2.66
2.83  2.87  2.70  2.54  2.95  2.66  2.86  2.90  2.87  2.56
2.54  2.56  2.74  2.86  2.91  2.75  2.51  2.85  2.59  2.73
Grouped Frequencies
Needed when
number of values is large OR
values are continuous.
To calculate group intervals
First find the range.
Determine a “good” interval based on
number of resulting intervals,
meaning of data, and
common, regular numbers.
List intervals from largest to smallest.
Grouped Frequencies
Range = 2.99 - 2.50 = .49 ~ .50

i = .1, #i = 5
Reading Time    Frequency
2.90-2.99       16
2.80-2.89       31
2.70-2.79       20
2.60-2.69       12
2.50-2.59       21

i = .05, #i = 10
Reading Time    Frequency
2.95-2.99       9
2.90-2.94       7
2.85-2.89       20
2.80-2.84       11
2.75-2.79       10
2.70-2.74       10
2.65-2.69       4
2.60-2.64       8
2.55-2.59       10
2.50-2.54       11
Either is acceptable.
Use whichever display seems most
informative.
In this case, the smaller intervals and the
10-category table seem more informative.
Sometimes it goes the other way, and a less
detailed presentation is necessary to
prevent the reader from missing the forest
for the trees.
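Grouping can be sketched in Python by assigning each score to the interval that contains it. The function below is an illustration, not the procedure used to build the slides; its name, its parameters, and the choice to work in whole hundredths of a second (to avoid floating-point surprises at interval edges) are all assumptions of this example.

```python
from collections import Counter

# The 100 reading times from the example, in seconds.
times = [
    2.72, 2.58, 2.87, 2.85, 2.83, 2.83, 2.87, 2.88, 2.84, 2.60,
    2.87, 2.61, 2.79, 2.96, 2.84, 2.85, 2.63, 2.63, 2.74, 2.54,
    2.76, 2.93, 2.84, 2.51, 2.62, 2.70, 2.73, 2.75, 2.89, 2.80,
    2.54, 2.73, 2.52, 2.96, 2.86, 2.92, 2.65, 2.98, 2.80, 2.75,
    2.90, 2.58, 2.98, 2.70, 2.61, 2.79, 2.99, 2.75, 2.87, 2.59,
    2.61, 2.93, 2.96, 2.66, 2.76, 2.89, 2.81, 2.89, 2.87, 2.58,
    2.58, 2.93, 2.89, 2.78, 2.83, 2.76, 2.50, 2.71, 2.64, 2.52,
    2.95, 2.85, 2.58, 2.82, 2.51, 2.85, 2.59, 2.96, 2.52, 2.66,
    2.83, 2.87, 2.70, 2.54, 2.95, 2.66, 2.86, 2.90, 2.87, 2.56,
    2.54, 2.56, 2.74, 2.86, 2.91, 2.75, 2.51, 2.85, 2.59, 2.73,
]

def grouped_frequencies(values, width, start=250):
    """Count scores per class interval.

    width and start are in hundredths of a second (width=10 means i = .1).
    Returns a dict mapping each interval's lower limit (in hundredths) to its count.
    """
    counts = Counter()
    for v in values:
        h = round(v * 100)                              # score in whole hundredths
        lower = start + ((h - start) // width) * width  # lower limit of its interval
        counts[lower] += 1
    return dict(counts)

def show(counts, width):
    """Print intervals from largest to smallest, as on the slides."""
    for lower in sorted(counts, reverse=True):
        upper = lower + width - 1
        print(f"{lower / 100:.2f}-{upper / 100:.2f}  {counts[lower]}")

show(grouped_frequencies(times, 10), 10)   # i = .1, five intervals
print()
show(grouped_frequencies(times, 5), 5)     # i = .05, ten intervals
```

Running both widths side by side makes it easy to judge which table is more informative before committing to one.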
Stem and Leaf Displays
Used when seeing all of the values is
important.
Shows
data grouped
all values
visual summary
Stem and Leaf Display
Reading time data
i = .05, #i = 10
Reading Time    Leaves
2.9             5,5,6,6,6,6,8,8,9
2.9             0,0,1,2,3,3,3
2.8             5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.8             0,0,1,2,3,3,3,3,4,4,4
2.7             5,5,5,5,6,6,6,8,9,9
2.7             0,0,0,1,2,3,3,3,4,4
2.6             5,6,6,6
2.6             0,1,1,1,2,3,3,4
2.5             6,6,8,8,8,8,8,9,9,9
2.5             0,1,1,1,2,2,2,4,4,4,4
Stem and Leaf Display
Reading time data
i = .1, #i = 5
Reading Time    Leaves
2.9             0,0,1,2,3,3,3,5,5,6,6,6,6,8,8,9
2.8             0,0,1,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.7             0,0,0,1,2,3,3,3,4,4,5,5,5,5,6,6,6,8,9,9
2.6             0,1,1,1,2,3,3,4,5,6,6,6
2.5             0,1,1,1,2,2,2,4,4,4,4,6,6,8,8,8,8,8,9,9,9
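A stem-and-leaf display with i = .1 can be sketched in Python by splitting each score into its first two digits (the stem) and its last digit (the leaf). To keep the example short it uses only the first ten reading times, not the full data set; the function name stem_and_leaf is an assumption of this sketch.

```python
from collections import defaultdict

# First ten reading times only, to keep the illustration short.
sample = [2.72, 2.58, 2.87, 2.85, 2.83, 2.83, 2.87, 2.88, 2.84, 2.60]

def stem_and_leaf(values):
    """Map each stem (score in tenths, e.g. 28 for 2.8) to its sorted leaves."""
    stems = defaultdict(list)
    for v in values:
        h = round(v * 100)              # score in whole hundredths, e.g. 287
        stems[h // 10].append(h % 10)   # stem 28, leaf 7
    return {stem: sorted(leaves) for stem, leaves in stems.items()}

display = stem_and_leaf(sample)

# List stems from largest to smallest, as on the slides.
for stem in sorted(display, reverse=True):
    leaves = ",".join(str(leaf) for leaf in display[stem])
    print(f"{stem / 10:.1f} | {leaves}")
```

Every original value can be read back from the display, which is the advantage over a grouped frequency table.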
Transition to Histograms
[Figure: the leaves of the i = .05 stem-and-leaf display stacked in columns above their class intervals, 2.50-2.54 through 2.95-2.99, so that the height of each column shows the class frequency.]
Histogram of reading times
[Figure: histogram. Vertical axis: Frequency, 0 to 20. Horizontal axis: Reading Time (seconds), classes 2.50-2.54 through 2.95-2.99.]
Histogram concepts - 1
Used to display continuous data.
Discrete data are shown on a bar graph.
But most psychology data are continuous,
even if they are measured with integers.
Histogram concepts - 2
Use bar graphs, not histograms, for discrete
data.
You rarely see data that is really discrete.
Discrete data are categories or rankings.
If you have continuous data, you can use
histograms, but remember real class limits.
Histograms can be used for relative frequencies
as well.
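As a rough illustration, the ten-class reading time frequencies can be drawn as a text histogram, with the relative frequency printed beside each bar. This is only a sketch: the helper histogram_rows is a name invented here, and a row-per-class text display cannot show the bars meeting at the real class limits the way a drawn histogram does.

```python
# Class frequencies from the i = .05 grouped table, lowest class first.
classes = [
    ("2.50-2.54", 11), ("2.55-2.59", 10), ("2.60-2.64", 8), ("2.65-2.69", 4),
    ("2.70-2.74", 10), ("2.75-2.79", 10), ("2.80-2.84", 11), ("2.85-2.89", 20),
    ("2.90-2.94", 7), ("2.95-2.99", 9),
]
total = sum(f for _, f in classes)      # N

def histogram_rows(rows, n):
    """Return (label, bar, relative frequency) for each class."""
    return [(label, "#" * f, f / n) for label, f in rows]

for label, bar, rel in histogram_rows(classes, total):
    # Bar length shows the absolute frequency; the number is the relative frequency.
    print(f"{label} | {bar:<20} {rel:.2f}")
```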
What are the real limits of
each class?
[Figure: histogram of reading times. Vertical axis: Frequency, 0 to 20. Horizontal axis: classes 2.50-2.54 through 2.95-2.99.]
Real limits of the fifth class are ???? - ????
Real limits of the highest class are ???? - ????
Real limits of the fifth class are 2.695 - 2.745.
Real limits of the highest class are 2.945 - 2.995.
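The real limits extend each stated class limit by half of the unit of measurement, here half of .01 second. A minimal Python sketch, using the standard decimal module so that values such as 2.695 are exact; the function name real_limits and its unit parameter are assumptions of this example.

```python
from decimal import Decimal

def real_limits(lower, upper, unit="0.01"):
    """Return the real limits of a class given its stated limits.

    lower, upper and unit are passed as strings so Decimal keeps them exact.
    """
    half_unit = Decimal(unit) / 2
    return Decimal(lower) - half_unit, Decimal(upper) + half_unit

fifth_class = real_limits("2.70", "2.74")
highest_class = real_limits("2.95", "2.99")

print("fifth class:", fifth_class[0], "-", fifth_class[1])
print("highest class:", highest_class[0], "-", highest_class[1])
```

The upper real limit of one class equals the lower real limit of the next, which is why histogram bars touch.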
Predicting from Theoretical Distributions
Theoretical distributions show how scores
can be expected to be distributed around
the mean.
(Mean = 2.755 for reading data.)
Distributions are named after the shapes of their
histograms:
Rectangular
J-shaped
Bell (Normal)
many others
Rectangular Distribution of scores
Flipping a coin
100 flips - how many heads and tails do you expect?
[Figure: bar graph. Vertical axis: 0 to 100. Horizontal axis: Heads, Tails.]
Rolling a die
120 rolls - how many of each number do you expect?
[Figure: bar graph. Vertical axis: 0 to 100. Horizontal axis: 1 to 6.]
Rolling 2 dice
Dice Total    Absolute Freq.    Relative Frequency
1             0                 .000
2             1                 .028
3             2                 .056
4             3                 .083
5             4                 .111
6             5                 .139
7             6                 .167
8             5                 .139
9             4                 .111
10            3                 .083
11            2                 .056
12            1                 .028
Total         36                1.001

How many combinations are possible?
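The table can be rebuilt by listing every combination of two dice and counting the totals. A brief Python sketch, with illustrative variable names:

```python
from collections import Counter
from itertools import product

# Every ordered pair (first die, second die), each die showing 1 through 6.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

n_combinations = sum(totals.values())   # number of possible combinations

# Theoretical relative frequency of each total (a total of 1 is impossible).
rel_freq = {t: round(totals[t] / n_combinations, 3) for t in range(1, 13)}

print("combinations:", n_combinations)
for t in range(1, 13):
    print(f"{t:>2}  {totals[t]}  {rel_freq[t]:.3f}")
print("sum of rounded relative frequencies =", round(sum(rel_freq.values()), 3))
```

As in the accident table, the column total differs slightly from 1 only because each proportion is rounded.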
Rolling 2 dice
360 rolls - how many of each number do you expect?
[Figure: bar graph. Vertical axis: 0 to 100. Horizontal axis: dice totals 1 to 12.]
Normal Curve
J Curve
Occurs when socially normative behaviors are measured.
Most people follow the norm,
but there are always a few outliers.
Principles of Theoretical Curves
Expected frequency = Theoretical relative
frequency * N
Expected frequencies are your best estimates
because they are closer, on the average, than
any other estimate when we square the error.
Law of Large Numbers - The more observations
that we have, the closer the relative frequencies
should come to the theoretical distribution.
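The first and third principles can be sketched in Python. The first part applies expected frequency = theoretical relative frequency * N to the coin and die examples; the second part simulates die rolls to illustrate the Law of Large Numbers. The function names, the seed, and the sample sizes are assumptions of this example, and the simulated deviations depend on the random number generator.

```python
import random
from fractions import Fraction

def expected_frequencies(rel_freq, n):
    """Expected frequency of each outcome = theoretical relative frequency * N."""
    return {outcome: float(p * n) for outcome, p in rel_freq.items()}

# Rectangular distributions: every outcome is equally likely.
coin = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_frequencies(coin, 100))   # 100 flips
print(expected_frequencies(die, 120))    # 120 rolls

def max_deviation(n_rolls, seed=0):
    """Largest gap between an observed relative frequency and 1/6 in simulated rolls."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    counts = [0] * 6
    for _ in range(n_rolls):
        counts[rng.randrange(6)] += 1
    return max(abs(c / n_rolls - 1 / 6) for c in counts)

# With more rolls, the observed relative frequencies tend to sit closer to 1/6.
for n_rolls in (60, 6000, 100000):
    print(n_rolls, round(max_deviation(n_rolls), 4))
```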
Q & A: Continuous data
HOW IS THE FACT THAT WE ARE DISPLAYING
CONTINUOUS DATA SHOWN ON A HISTOGRAM AS
OPPOSED TO A BAR GRAPH?
The bars of the graph on a histogram meet at the real
limits of each interval.
IF DATA CAN ONLY BE INTEGERS (SUCH AS NUMBER
OF TRUE/FALSE QUESTIONS ANSWERED CORRECTLY
ON A PSYCH QUIZ), HOW COME IT IS CALLED
CONTINUOUS DATA?
Whether data is continuous or discrete depends on what
you are measuring, not the accuracy of your measuring
instrument. For example, distance is continuous whether
you measure it with a yardstick or a micrometer.
Knowledge, like self-confidence and other psychological
variables, is probably best thought of as a continuous
variable.
Determining “i” (the size of
the interval)
WHAT IS THE RULE FOR DETERMINING
THE SIZE OF INTERVALS TO USE IN
WHICH TO GROUP DATA?
Whatever intervals seem appropriate to
present the data most informatively. It is
a matter of judgement. Usually we use 6
to 12 same-size intervals, each of which uses
intuitively obvious endpoints (e.g., 5s and
0s).