Psychology 202a
Advanced Psychological Statistics
September 1, 2015

Overview of today’s class
• Introductory comments on the reading
• Start discussion of variables and distributions (first using R)
• Working definition of a variable
• Working definition of a distribution
• Graphical and numerical methods for understanding distributions
• Introduce SAS

What is a variable?
• We are going to be using R and SAS as tools to help us understand the behavior of variables.
• We need a working definition of “variable.”
• For our purposes, a variable consists of
  – numbers
  – that convey information
  – about some well-defined entity.

Is this a variable?
  69  72  94  64  80  77  96  86
  92  71  81  90  84 100  76  57
  81  65  87  92  89  79  91  65
  86  85  95  93  83  76  84  90
  89  61  91  95  69  84  81  67
• Numbers
• that convey information
• about a well-defined entity

Numbers that convey information…
• Peabody Picture Vocabulary scores
• The Peabody is intended to be a test of verbal ability.
• Items consist of a set of pictures and a word; the task is to choose the picture that matches the word.
• Example from YouTube.

…about some well-defined entity.
• These scores are a sample from the “Child Health and Development Study.”
• 10-year-old children
• Members of the Kaiser-Permanente health plan in Oakland, CA
• Data were collected at the beginning of the 1970s.

Is this a variable?
(the same 40 scores as above)
• numbers
• that convey information
• about a well-defined entity
• Yes, it’s a variable.

But it’s not easy to say much about the variable.
(the same 40 scores as above)
• data are not organized
• difficult to see structure

R can help us see the structure.
  # “Peabody gets cee of…”
  Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86,
               92, 71, 81, 90, 84, 100, 76, 57,
               81, 65, 87, 92, 89, 79, 91, 65,
               86, 85, 95, 93, 83, 76, 84, 90,
               89, 61, 91, 95, 69, 84, 81, 67)

How can R help us see the structure?
• How many scores are there?
    length(Peabody)
• What’s a big score or a small score?
    sort(Peabody)
• Note that there are lots of scores in the 80s, not so many in the 70s and 90s, and very few in the 50s or 100s.

What is a distribution?
• We’ve been looking at what values of Peabody occur, and how often they occur.
• That’s what a distribution is:
  – the values that a variable takes on, together with…
  – …the frequencies (or relative frequencies) of those values.

Understanding distributions…
• We could interpret that idea very literally.
• There is one score of 57.
• There is one score of 61.
• There is one score of 64.
• There are two scores of 65.
• This would rapidly become tedious…
• …and would not be very useful.

…by ignoring detail.
• The problem with that approach is that there is too much information.
• Simplify, ignore detail to see structure.
• Ways to do that:
  – group the data
  – use pictures
  – use summary numbers (descriptive statistics)

Grouping data
• Looking at our sorted data, we can see that there is (or are)
  – one number in the 50s
  – seven numbers in the 60s
  – six numbers in the 70s
  – fourteen numbers in the 80s
  – eleven numbers in the 90s
  – one number in the 100s.

Grouping data
• We’ve gone too far: that’s not enough information.
• Here’s a general principle: try to group so that there are between seven and fifteen categories.
• (as with any rule of thumb, there will be exceptions)

Distribution
  Peabody Values   Frequency
  55 – 59              1
  60 – 64              2
  65 – 69              5
  70 – 74              2
  75 – 79              4
  80 – 84              8
  85 – 89              6
  90 – 94              8
  95 – 99              3
  100 – 104            1

Some details about grouping
• continuous and discrete variables
• real limits

Lower and upper real limits
  What we say      What we mean
  55 – 59          54.5 – 59.5
  60 – 64          59.5 – 64.5
  65 – 69          64.5 – 69.5
  70 – 74          69.5 – 74.5
  75 – 79          74.5 – 79.5
  80 – 84          79.5 – 84.5
  85 – 89          84.5 – 89.5
  90 – 94          89.5 – 94.5
  95 – 99          94.5 – 99.5
  100 – 104        99.5 – 104.5

Relative frequency distribution
  Peabody Values   Relative Frequency
  55 – 59          .025
  60 – 64          .050
  65 – 69          .125
  70 – 74          .050
  75 – 79          .100
  80 – 84          .200
  85 – 89          .150
  90 – 94          .200
  95 – 99          .075
  100 – 104        .025

What can we say about the distribution?
• There is variation in the scores.
• Peabody scores are most frequent in the 80s and 90s.
• Scores at the extremes of the distribution are much less frequent than scores at the center.
• But it’s still a little hard to see all this.

A simple graphical technique
• Stem-and-leaf plot
  – Divide the numbers into fine-grained and coarse-grained information.
  – coarse = “stem”
  – fine = “leaf”
• Manual demonstration
• stem(Peabody)
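
The grouping steps in these slides can be sketched in R. This is one possible way to do it, not necessarily how it was done in class: the use of cut(), table(), and integer division, and the names real_limits, lower, interval_labels, groups, by_decade, freq, and rel_freq, are assumptions made for illustration. The scores are re-entered so the block runs on its own, and the interval breaks are the real limits from the slide (54.5, 59.5, …, 104.5).

```r
# The 40 Peabody scores from the slides, entered row by row.
Peabody <- c(69, 72, 94, 64, 80, 77, 96, 86,
             92, 71, 81, 90, 84, 100, 76, 57,
             81, 65, 87, 92, 89, 79, 91, 65,
             86, 85, 95, 93, 83, 76, 84, 90,
             89, 61, 91, 95, 69, 84, 81, 67)

# Coarse grouping by decade (50s, 60s, ...): integer-divide by 10,
# then multiply back to get the decade each score belongs to.
by_decade <- table(Peabody %/% 10 * 10)
print(by_decade)

# Finer grouping into intervals of width 5.
# Breaks are the real limits: 54.5, 59.5, ..., 104.5.
real_limits <- seq(54.5, 104.5, by = 5)

# Labels in the "what we say" form: "55-59", "60-64", ...
lower <- seq(55, 100, by = 5)
interval_labels <- paste(lower, lower + 4, sep = "-")

# cut() assigns each score to the interval whose real limits contain it.
groups <- cut(Peabody, breaks = real_limits, labels = interval_labels)

freq <- table(groups)               # frequency distribution
rel_freq <- freq / length(Peabody)  # relative frequency distribution

print(data.frame(Values = names(freq),
                 Frequency = as.vector(freq),
                 RelativeFrequency = as.vector(rel_freq)))

# Stem-and-leaf plot, as on the last slide.
stem(Peabody)
```

Because the scores are whole numbers, none of them can fall exactly on a real limit such as 59.5, so every score lands in exactly one interval. The frequencies and relative frequencies printed here should agree with the tables above.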