Psychology 202a Advanced Psychological Statistics September 1, 2015

Overview of today’s class
• Introductory comments on the reading
• Start discussion of variables and
distributions (first using R)
• Working definition of a variable
• Working definition of a distribution
• Graphical and numerical methods for
understanding distributions
• Introduce SAS
What is a variable?
• We are going to be using R and SAS as
tools to help us understand the behavior of
variables.
• We need a working definition of “variable.”
• For our purposes, a variable consists of
– numbers
– that convey information
– about some well-defined entity.
Is this a variable?
 69  72  94  64  80  77  96  86  89  69
 92  71  81  90  84 100  76  57  61  84
 81  65  87  92  89  79  91  65  91  81
 86  85  95  93  83  76  84  90  95  67
• Numbers
• that convey information
• about a well-defined entity
Numbers that convey information…
• Peabody Picture Vocabulary scores
• The Peabody is intended to be a test of
verbal ability.
• Items consist of a set of pictures and a word;
the task is to choose the picture that
matches the word.
• Example from YouTube.
…about some well-defined entity.
• These scores are a sample from the “Child
Health and Development Study.”
• 10-year-old children
• Members of the Kaiser-Permanente health
plan in Oakland, CA
• Data were collected at the beginning of the
1970s.
Is this a variable?
 69  72  94  64  80  77  96  86  89  69
 92  71  81  90  84 100  76  57  61  84
 81  65  87  92  89  79  91  65  91  81
 86  85  95  93  83  76  84  90  95  67
• numbers
• that convey information
• about a well-defined entity
• Yes, it’s a variable.
But it’s not easy to say much about
the variable.
 69  72  94  64  80  77  96  86  89  69
 92  71  81  90  84 100  76  57  61  84
 81  65  87  92  89  79  91  65  91  81
 86  85  95  93  83  76  84  90  95  67
• data are not organized
• difficult to see structure
R can help us see the structure.
Peabody <- c(
  69, 72, 94, 64, 80,  77, 96, 86, 89, 69,
  92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
  81, 65, 87, 92, 89,  79, 91, 65, 91, 81,
  86, 85, 95, 93, 83,  76, 84, 90, 95, 67)
(Read aloud: “Peabody gets cee of…”)
How can R help us see the
structure?
• How many scores are there?
length(Peabody)
• What’s a big score or a small score?
sort(Peabody)
• Note that there are lots of scores in the
80s and 90s, fewer in the 60s and 70s, and
very few in the 50s or 100s.
What is a distribution?
• We’ve been looking at what values of
Peabody occur, and how often they occur.
• That’s what a distribution is:
– the values that a variable takes on, together
with…
– …the frequencies (or relative frequencies) of
those values.
Understanding distributions…
• We could interpret that idea very literally.
• There is one score of 57.
• There is one score of 61.
• There is one score of 64.
• There are two scores of 65.
• This would rapidly become tedious…
• …and would not be very useful.
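The literal reading above can be produced in one step. This is a minimal sketch, assuming the forty scores from the earlier slides; table() is base R, and the name freq is chosen here only for illustration.

```r
# The forty Peabody scores from the slides
Peabody <- c(
  69, 72, 94, 64, 80,  77, 96, 86, 89, 69,
  92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
  81, 65, 87, 92, 89,  79, 91, 65, 91, 81,
  86, 85, 95, 93, 83,  76, 84, 90, 95, 67)

# One count per distinct score: the "very literal" distribution
freq <- table(Peabody)
print(freq)

# The counts add back up to the number of scores
sum(freq) == length(Peabody)
```

The printout has one column per distinct score, which is the "too much information" problem in concrete form.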
…by ignoring detail.
• The problem with that approach is that
there is too much information.
• Simplify, ignore detail to see structure.
• Ways to do that:
– group the data
– use pictures
– use summary numbers (descriptive statistics)
Grouping data
• Looking at our sorted data, we can see
that there is (or are)
– one number in the 50s
– seven numbers in the 60s
– six numbers in the 70s
– fourteen numbers in the 80s
– eleven numbers in the 90s
– one number in the 100s.
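The counts by decade can be checked in R. This is a sketch under the assumption that integer division by 10 is an acceptable way to pick out the decade; the names decade and by_decade are illustrative.

```r
# The forty Peabody scores from the slides
Peabody <- c(
  69, 72, 94, 64, 80,  77, 96, 86, 89, 69,
  92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
  81, 65, 87, 92, 89,  79, 91, 65, 91, 81,
  86, 85, 95, 93, 83,  76, 84, 90, 95, 67)

# %/% is integer division, so 86 %/% 10 is 8 and the decade is 80
decade <- 10 * (Peabody %/% 10)

# Count how many scores fall in each decade
by_decade <- table(decade)
print(by_decade)
```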
Grouping data
• We’ve gone too far: that’s not enough
information.
• Here’s a general principle: try to group so
that there are between seven and fifteen
categories.
• (as with any rule of thumb, there will be
exceptions)
Peabody Distribution
Values       Frequency
55 – 59      1
60 – 64      2
65 – 69      5
70 – 74      2
75 – 79      4
80 – 84      8
85 – 89      6
90 – 94      8
95 – 99      3
100 – 104    1
Some details about grouping
• continuous and discrete variables
• real limits
Lower and upper real limits
What we say    What we mean
55 – 59        54.5 – 59.5
60 – 64        59.5 – 64.5
65 – 69        64.5 – 69.5
70 – 74        69.5 – 74.5
75 – 79        74.5 – 79.5
80 – 84        79.5 – 84.5
85 – 89        84.5 – 89.5
90 – 94        89.5 – 94.5
95 – 99        94.5 – 99.5
100 – 104      99.5 – 104.5
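One way to build the grouped frequency distribution in R is to hand the real limits to cut() as break points. This is a sketch, not necessarily how the table on the slide was made; real_limits, labels, grouped, and freq are names chosen here for illustration.

```r
# The forty Peabody scores from the slides
Peabody <- c(
  69, 72, 94, 64, 80,  77, 96, 86, 89, 69,
  92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
  81, 65, 87, 92, 89,  79, 91, 65, 91, 81,
  86, 85, 95, 93, 83,  76, 84, 90, 95, 67)

# Real limits 54.5, 59.5, ..., 104.5 define ten intervals
real_limits <- seq(54.5, 104.5, by = 5)

# "What we say" labels: 55 - 59, 60 - 64, ..., 100 - 104
labels <- paste(seq(55, 100, by = 5), seq(59, 104, by = 5), sep = " - ")

# Assign each score to its interval, then count
grouped <- cut(Peabody, breaks = real_limits, labels = labels)
freq <- table(grouped)
print(freq)
```

Because the scores are whole numbers and the real limits end in .5, no score can land exactly on a boundary, so it does not matter which end of each interval is closed.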
Relative frequency distribution
Peabody Values    Relative Frequency
55 – 59           .025
60 – 64           .050
65 – 69           .125
70 – 74           .050
75 – 79           .100
80 – 84           .200
85 – 89           .150
90 – 94           .200
95 – 99           .075
100 – 104         .025
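Each relative frequency is the frequency divided by the number of scores (for example, 5 / 40 = .125). A sketch of that calculation in R, again assuming cut() with the real limits as breaks; the variable names are illustrative.

```r
# The forty Peabody scores from the slides
Peabody <- c(
  69, 72, 94, 64, 80,  77, 96, 86, 89, 69,
  92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
  81, 65, 87, 92, 89,  79, 91, 65, 91, 81,
  86, 85, 95, 93, 83,  76, 84, 90, 95, 67)

# Group into intervals of width 5 using the real limits
grouped <- cut(Peabody, breaks = seq(54.5, 104.5, by = 5))

# Frequencies, then frequencies divided by the total count
freq <- table(grouped)
rel_freq <- freq / length(Peabody)   # same result as prop.table(freq)
print(round(rel_freq, 3))

# Relative frequencies always sum to 1
sum(rel_freq)
```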
What can we say about the
distribution?
• There is variation in the scores.
• Peabody scores are most frequent in the
80s and 90s.
• Scores at the extremes of the distribution
are much less frequent than scores at the
center.
• But it’s still a little hard to see all this.
A simple graphical technique
• Stem-and-leaf plot
– Divide the numbers into fine-grained and
coarse-grained information.
– coarse = “stem”
– fine = “leaf”
• Manual demonstration
• stem(Peabody)
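The manual demonstration can be mimicked in R by splitting each score into a tens digit (stem) and a ones digit (leaf). This is a rough sketch of the idea, not a reimplementation of stem(), whose layout may differ; the names sorted, stems, leaves, and plot_lines are illustrative.

```r
# The forty Peabody scores from the slides
Peabody <- c(
  69, 72, 94, 64, 80,  77, 96, 86, 89, 69,
  92, 71, 81, 90, 84, 100, 76, 57, 61, 84,
  81, 65, 87, 92, 89,  79, 91, 65, 91, 81,
  86, 85, 95, 93, 83,  76, 84, 90, 95, 67)

sorted <- sort(Peabody)
stems  <- sorted %/% 10   # coarse-grained part: the tens
leaves <- sorted %% 10    # fine-grained part: the ones digit

# For each stem, paste its leaves together in sorted order
plot_lines <- tapply(leaves, stems, paste, collapse = "")

# Print one line per stem, e.g. the stem 5 followed by its single leaf 7
cat(paste(formatC(names(plot_lines), width = 2), "|", plot_lines), sep = "\n")
```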