Statistical Analysis for Psychology
PSY 348
Lead Instructor: Greg Jensen
Co-Instructor: Sabrina Schroerlucke
TAs: Kati Wolcott & Gavin Leonard

Class Requirements
• Readings. There is assigned reading associated with most classes, sometimes from the textbook, sometimes from elsewhere.
  o Textbook: Statistics for Psychology, 7th Edition (“Aron et al.”). 6th
  o Additional readings are available electronically on Moodle.
• Class Participation. Everything in this class is cumulative, and applied exercises are built into the class period. As such, I expect you to come to class.
  o You get four free unexcused absences.
  o For each subsequent unexcused absence, your final overall grade in the course will be lowered by 13%.
  o There are many valid reasons to miss class (such as flu-like symptoms, or dealing with emergencies), and absences are more likely to be excused if you contact me in advance.

Class Requirements
• Exams. There are four exams (three midterms and a final), and each is worth 25% of your overall grade.
  o However! You may retake any section of the midterms up to two additional times.
• Each section of an exam is graded on a three-point scale:
  o M (“mastered”)
  o G (“room to grow”)
  o N (“needs work”)
• If you earned a G or an N on a section of the exam, you may retake just that section to improve it.
  o The exams are not intended to be an ordeal. They are opportunities for you to express what you have mastered.

Class Requirements
• Homework is strictly optional! You don’t need to do it!
  o Homework is also graded on the M/G/N scale, to let you know how you’re doing.
  o However! If you wish to retake any section of an exam, you need to turn in the relevant homework first.
  o Additionally, you will be asked to revise homework answers that score less than an M before being allowed to retake the relevant exam section.
• Different people get tripped up on different statistical concepts.
  o We ask that you use the opportunities presented to get as much practice as you can for topics that are initially confusing.

Class Philosophy
• It’s a guarantee that you will find some of the ideas we discuss this semester confusing at first.
  o The course is designed to avoid punishing that confusion.
  o We believe that every one of you can master these concepts, and we are here to support you in doing so.
  o As such, we want to take the pressure off the exams and the homework, so you can focus on your own understanding.
• In turn, we ask that you take advantage of the opportunities for support that the course provides.
  o Come to class regularly and ask questions.
  o Come to office hours to discuss the material further.
  o Do the homework for its own sake, even if you believe you already understand the material.

Math Notation: Finding X
$X = 0, 4, 9, 3, 1, 0, 1$
$X$ is a set of observations.
$X_3 = 9$
$X_N = 1$
$X_i$ is a single observation, the $i$th in the data.
$N$ is the number of elements, so $X_N$ is the last observation.

Math Notation: Finding X
$X_{\text{Doug}} = \text{AB positive}$
$X_{\text{Doug}}$ is the observation associated with Doug. $X$ can consist of a set of non-numbers.
As a rule, subscripts are labels and should be read as the sentence, “The element in [blank] associated with [label].”
Try to read every equation as a sentence.

Math Notation: Greek
$\mu$ = Greek letter “mu”
$\mu$ is used to denote the “arithmetic mean” (i.e., the average).
$\mu_{\text{height}}$ = Mean of height
$\mu_{\text{height}}$ describes something about a variable, making it a statistic.
Usually, in this class, a Greek letter will refer to some statistic.

Math Notation: Summation
$\Sigma X$ = Sum of $X$
$\Sigma$ is an upper-case Greek sigma. We will use it as a summation operator.
$\Sigma X = X_1 + X_2 + \cdots + X_N$
$\Sigma X = 0 + 4 + 9 + 3 + 1 + 0 + 1 = 18$

Math Notation: Summation
$\Sigma$ is an “item-wise” operator.
$\Sigma(X + Y) = X_1 + Y_1 + X_2 + Y_2 + \cdots + X_N + Y_N$
$\Sigma(X \cdot Y) = X_1 \cdot Y_1 + X_2 \cdot Y_2 + \cdots + X_N \cdot Y_N$
$\Sigma(X + 1) = (X_1 + 1) + (X_2 + 1) + \cdots + (X_N + 1) = \Sigma X + N$

Math Notation: Summation
Summation notation can be ambiguous.
$\Sigma X + 1 \neq \Sigma(X + 1)$
$\Sigma X + 1 = X_1 + X_2 + \cdots + X_N + 1$
$\Sigma(X + 1) = (X_1 + 1) + (X_2 + 1) + \cdots + (X_N + 1)$

Math Notation: Order of Operations
Parentheses
Exponents
Multiplication
Division
Addition
Subtraction
5β4 3 4 + 5 β 3 − 6 5 = 8403.8

Math Notation: Rounding
• Provide answers that are correct to two decimal places.
  o Percentages, in particular, should be correct to two decimal places, which means proportions should be correct to four decimal places.
  o Therefore, don’t round $\frac{1}{3}$ to 0.3 (use 0.3333), or you’ll introduce error into your calculations.
• Try to do hand calculations to at least one extra decimal place throughout, and round at the end.
  o Using spreadsheet software protects against rounding errors, and saves a lot of time generally.

Practicing Math
• We’ll be doing some computation by hand.
• If doing math problems isn’t your thing, it gets easier with practice. The most important thing is to show the steps you did, in order.
• Performing a computation is like following a recipe: do the steps in the order prescribed.
  o Showing your work is like writing a recipe for someone else to follow.
  o If we ask you to show your work, we want to see all the steps, whether you used a machine or not.

Machine-Assisted Analysis
• You may use software to do computation.
  o In fact, many standard computations in modern statistics require computers.
  o To show your work clearly, you need to be able to report the steps you took, even if you relied on calculators or spreadsheets.
• You should try solving problems using software, then transcribe the steps you took.
  o Microsoft Excel, Google Sheets, and LibreOffice are all valid tools for making the tedious parts of math the computer’s problem.

A Warning About AI
• Large Language Models such as ChatGPT are an important exception. Do not use them.
  o LLMs are mainly good at producing text that has the right sort of overall look.
  o They are consistently unreliable because that overall look is achieved with no understanding.

Why Are You Here?
• All science depends on measurement.
  o The outputs of our measurements are data.
• All measurement has uncertainty.
• We have to understand and reduce the uncertainty in our estimates.
• Statistics (the field) is the study of what we can say about our measurements.
  o Each expression of that understanding is achieved by computing a relevant statistic.

What’s In A Datum?
• Anything we need to measure is a variable.
  o “Age” is a variable. So is “religious affiliation.”
• Each variable is limited to certain values.
  o “12 years” is a valid age. “Catholic” is not a valid value for age, but it is for religious affiliation.
• The specific value we actually measure is a score.
  o “12 years” could be an age, but it’s not my score.
• Scores are only as reliable as our measurement paradigm. They can be vague, or wrong.

Psychologists Measure Absurd Things
• “Happiness.”
• “Memory.”
• “Intelligence.”
• We need to measure these things in order to discover what they are and how they work.
• How do you measure something you don’t understand?

Levels of Measurement
• When we define a variable, we also define the sorts of things we can do to the resulting scores.
  o Psychologists have widely adopted a “ladder” of variable types, as proposed by Stevens (1946).
  o As you climb to the next “level” of the ladder, you are allowed to do more to your scores.
  o This ladder is specific to psychology; other sciences mostly don’t use this framework.
• For this class, we will consider three levels: nominal, rank-order, and equal-interval variables.

Nominal Data
• A nominal variable consists of discrete categories.
  o These categories are assumed to never overlap; they are always discrete.
• Categories have no ordering.
  o “Vanilla > Chocolate” is not a valid statement.
• Categories can’t have arithmetic done to them.
  o “Vanilla ÷ Chocolate” is not a valid statement.
• The main thing you can do with categories is count how many scores of each kind you received.
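Counting is the one operation the Nominal Data slide allows, so it can be sketched in a few lines of Python. The flavor responses below are hypothetical data invented for illustration; `collections.Counter` from the Python standard library tallies how many scores fall into each category. The sketch deliberately never sorts, ranks, or averages the categories, because a nominal variable supports none of those.

```python
from collections import Counter

# Hypothetical nominal scores (invented for illustration): one flavor per person.
flavors = ["Vanilla", "Chocolate", "Chocolate", "Strawberry", "Vanilla", "Chocolate"]

# Tally how many scores of each kind we received; counting is the main
# thing a nominal variable lets us do.
counts = Counter(flavors)

for flavor, n in counts.items():
    print(f"{flavor}: {n}")
```

Every score lands in exactly one category, so the counts always add up to the number of scores.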
Rank-Order Data
• A rank-order variable consists of values that have some sort of ordering. This is also called an ordinal variable.
  o Gold vs. silver vs. bronze medals
  o First-born, second-born, third-born
• We can compare ordinal values.
  o “Gold Medal > Silver Medal” is valid.
• However, we can’t do arithmetic with them.
  o “Gold Medal ÷ Silver Medal” is not valid.
• Ideal for things that are hard to measure precisely, or for which sensible units don’t exist.

Equal-Interval Data
• When we collect data for an equal-interval variable, the data have units that give the numbers meaning.
  o Height, weight, distance, temperature…
• Equal-interval variables do support arithmetic!
  o “10 > 5” is valid.
  o “10 ÷ 5” is valid (and equals 2).
• Many of the outcomes we’ll talk about in the class are equal-interval variables (also called “scale” variables).
  o Some are not equal-interval, which can be a problem when the field pretends that they are!

Interval vs. Ordinal Data
• Many variables we might assume are at the scale level are really only at the ordinal level.
  o Example: IQ.
  o The gap between 40 and 50 doesn’t mean the same thing as the gap between 140 and 150.
• When a variable is ordinal, the phenomena underlying it are much more obscure.
  o Regular meals might be enough to raise a 3.2 GPA to a 3.4, but won’t necessarily raise a 3.7 to a 3.9.
• The level of measurement puts important limits on how rich our theories can become.

Measurement Is In Our Heads
• People often misinterpret “levels of measurement” as being an objective property of data.
  o Nature itself doesn’t care about measurement.
  o Nearly all variables in nature are continuous.
• Level of measurement is a judgment we make about the relationships between the values in our data.
  o We say the “wrong” level of measurement is used when there is a disconnect between our assumptions (and the math those assumptions lead us to use) and the behavior of the world.

Measures vs. Statistics
• Statistics are extracted from measurements, but need not be on the same level or of the same type.
  o What is the average roll on an ordinary die?
  o What about “% who favor chocolate”?
• Typically, our statistics will be continuous interval values, even when the data are not.
  o However, a statistic is only as informative as the scale it is based on.
  o For example, a sprinter’s average ranking is less informative than their average speed.

Practice! Types of Variables
• Coin flip
• “Are you tall enough to ride?”
• Gender presentation
• % of folks tall enough to ride
• # of drinks per week
• Probability
• % body fat
• Credit rating
• 7-point satisfaction scale
• Semantic similarity
• Age
• Intensity of depression
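The two questions on the Measures vs. Statistics slide can be worked through with a short Python sketch. The average roll of an ordinary six-sided die is 3.5, a value no single roll can produce, and the percentage who favor chocolate is a continuous statistic computed from purely nominal counts. The flavor responses are hypothetical data invented for illustration, not real results. The rounding follows the course rule (four decimal places for proportions, two for percentages), and both values are rounded only at the end from the unrounded ratio.

```python
from statistics import mean

# Faces of an ordinary six-sided die; every face is equally likely,
# so the average roll is the mean of the faces.
die_faces = [1, 2, 3, 4, 5, 6]
average_roll = mean(die_faces)  # 3.5: no single roll can produce this value

# Hypothetical nominal responses (invented for illustration).
responses = ["Chocolate", "Vanilla", "Strawberry", "Chocolate", "Vanilla",
             "Vanilla", "Strawberry", "Chocolate", "Vanilla"]

# Count the nominal scores, then turn the count into continuous statistics.
n_chocolate = responses.count("Chocolate")
ratio = n_chocolate / len(responses)  # keep full precision until the end

proportion = round(ratio, 4)         # proportions: four decimal places
percentage = round(100 * ratio, 2)   # percentages: two decimal places

print(f"Average roll: {average_roll}")
print(f"Proportion favoring chocolate: {proportion}")
print(f"% favoring chocolate: {percentage}%")
```

Because 3 of the 9 hypothetical responses are “Chocolate,” the proportion is $\frac{1}{3}$, which the sketch reports as 0.3333 rather than 0.3.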