Why Statistics?

Two Purposes
1. Descriptive: Finding ways to summarize the important characteristics of a dataset
2. Inferential: How (and when) to generalize from a sample dataset to the larger population

Descriptive Statistics
Firsthand: 3.38 .63 3.13 4.25 .50 3.75 1.50 1.88 .88 2.25 1.13 3.38 1.00 -.25 1.63 1.50 2.00 2.13
Secondhand: 3.88 1.88 2.00 3.88 2.50 3.25 3.13 1.50 3.75 2.00 2.38 3.25 2.88 .88 3.50 4.13 .38 4.63
Figure: frequency histograms of Firsthand Impression, Secondhand Impression, and Difference
Figure: scatterplot of Secondhand Impression against Firsthand Impression; scatterplot of Change against Firsthand Impression

Characterizing a Distribution of Data
Figure: frequency histogram of Voice (1.00 to 7.00)

Comparing Distributions of Data
Figure: frequency histograms of Voice (1.00 to 7.00), one for Men and one for Women
How could you summarize the differences?

Looking for Linear Relationships
Figure: scatterplot of Anger during Conflict against Conflict Significance

Looking for Linear Relationships
Figure: scatterplot of Current relationship satisfaction against Conflict Significance

Comparing Linear Relationships
Figure: the two scatterplots side by side (Current relationship satisfaction and Anger during Conflict, each against Conflict Significance)
How could you summarize the differences?
Complex Linear Relationships
Figure: scatterplot of Current relationship satisfaction against Conflict Significance, with points marked by theory (entity, incremental)

Descriptive Statistics
Provides graphical and numerical ways to organize, summarize, and characterize a dataset.

Types of Studies
Experimental: The predictor variable is manipulated by the researcher.
Observational: The predictor variables are merely observed and recorded by the researcher.

Types of Variables
Predictor variable: The antecedent conditions that are going to be used to predict the outcome of interest. In an experimental study, this is called an “independent variable”.
Outcome variable: The variable you want to be able to predict. In an experimental study, this is called a “dependent variable”.

Types of Variables
Continuous variable: There are an infinite number of possible values that fall between any two observed values.
Discrete variable: Consists of separate, indivisible categories.
  Categorical: A set of categories that have different names
  Ordinal: A set of categories that are organized in an ordered sequence

Summarizing Discrete Data
Name        Eye Color
Janice      brown
Tom         blue
Danielle    green
Ian         brown
Eduardo     brown
Emily       brown
Anja        blue
Cara        brown
Adrian      brown
Eric        blue
Sarah       brown
David       brown

Frequency Tables
Eye Color   Frequency   Relative Frequency
Brown       33          66%
Blue        14          28%
Green       3           6%

Frequency Bar Graph
Figure: bar graph of Frequency by Eye Color (Brown, Blue, Green)

Relative Frequency Bar Graph
Figure: bar graph of Relative Frequency by Eye Color (Brown, Blue, Green)

Summarizing Continuous Data
Name        Hours of Sleep / Night
Janice      6
Tom         7.5
Danielle    10.5
Ian         9
Eduardo     7
Emily       6
Anja        8
Cara        5
Adrian      8.5
Eric        6.5
Sarah       7.5
David       4

Frequency Tables
Hours of Sleep   Frequency
3 - 4 hrs        1
4 - 5 hrs        3
5 - 6 hrs        6
6 - 7 hrs        14
7 - 8 hrs        16
8 - 9 hrs        5
9 - 10 hrs       3
10 - 11 hrs      2

Frequency Histogram
Figure: histogram of Frequency by Nightly Hours of
Sleep

Frequency Tables
Hours of Sleep   Frequency   Relative Frequency
3 - 4 hrs        1           2%
4 - 5 hrs        3           6%
5 - 6 hrs        6           12%
6 - 7 hrs        14          28%
7 - 8 hrs        16          32%
8 - 9 hrs        5           10%
9 - 10 hrs       3           6%
10 - 11 hrs      2           4%

Histogram (Relative Frequency)
Figure: histogram of Relative Frequency by Nightly Hours of Sleep

Frequency Tables
Hours of Sleep   Frequency   Relative Frequency   Cumulative Relative Frequency
3 - 4 hrs        1           2%                   2%
4 - 5 hrs        3           6%                   8%
5 - 6 hrs        6           12%                  20%
6 - 7 hrs        14          28%                  48%
7 - 8 hrs        16          32%                  80%
8 - 9 hrs        5           10%                  90%
9 - 10 hrs       3           6%                   96%
10 - 11 hrs      2           4%                   100%

Stem and Leaf Plots
Name
Janice      54
Tom         59
Danielle    35
Ian         41
Eduardo     46
Emily       25
Anja        47
Cara        60
Adrian      41
Eric        34
Sarah       22
David       45

Stem   Leaves
2      25
3      45
4      11567
5      49
6      0

Back-to-Back Stem and Leaf Plots
men     Stem   women
        2      25
4       3      5
1156    4      7
9       5      4
        6      0

Visual Depictions of Distributions
Summary
Discrete Data: Frequency Tables, Bar Graphs
Continuous Data: Frequency Tables, Histograms, Stem and Leaf Plots

Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Charts
Feelings of Caring
                    With peers   With profs   With women   With men   With familiar   With unfamiliar   Average
Unfriendly Female   -2.61        1.60         -2.37        -1.60      1.83            -3.38             -1.62

Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Bar Graphs
Figure: bar graph of Voicing Response by theory (Entity, Incremental), with bars for Mild Conflict and Extreme Conflict; vertical axis from 0 to 5; data labels 4.9, 3.8, 3.5, 2.5

Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Bar Graphs: Voicing Response
Look at the same graph
differently!
Figure: bar graph of Voicing Response by conflict level (Mild Conflict, Extreme Conflict), with bars for Entity and Incremental; vertical axis from 0 to 5; data labels 4.9, 3.8, 3.5, 2.5

Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Bar Graphs
Look again!
Figure: the same bar graph of Voicing Response by conflict level, with the vertical axis running from 2 to 5

Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Line Graphs
Figure: line graph of Intention-Reading Performance, showing Estimated and Actual Percentile (0 to 100) for the 1st through 4th Quartile

Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Box-plots
Figure: box plots of Voice by Implicit Theory

Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Error-bar plots
Figure: error-bar plot of Person attributions by Implicit Theory

Visual Depictions of Relationships
IV -- continuous; DV -- continuous
• Scatterplots
Figure: scatterplot of Current relationship satisfaction against Conflict Significance

Visual Depictions of Relationships
IV -- continuous; DV -- continuous
• Scatterplots
Figure: the same scatterplot, with points marked by theory (entity, incremental)

Visual Depictions of Relationships
IV -- continuous; DV -- continuous
• Scatterplots with regression lines
Figure: the same scatterplot by theory (entity, incremental), with a regression line for each group

Visual Depictions of Relationships
IV -- categorical; DV -- categorical
• Contingency table
Narcissists * Male Crosstabulation
Count
                     Male
                     .00    1.00   Total
Narcissists  .00     19     53     72
             1.00    15     51     66
Total                34     104    138

Visual Depictions of Relationships
IV -- categorical; DV -- continuous
• Charts, bar graphs, line graphs, box plots, error bar plots
IV -- continuous; DV -- continuous
• Scatterplot (regression line)
IV -- categorical; DV -- categorical
• Contingency table

Inferential Statistics

Inferential Statistics
Population: The set of all individuals of interest (e.g.
all women, all college students)
Sample: A subset of individuals selected from the population, from whom data are collected

Inferential statistics
Figure: histograms of No. of People by Nightly Hours of Sleep, one for Women and one for Men
Are these sample differences simply due to chance?

Some important terms
Parameter: A characteristic of the population. Denoted with Greek letters such as μ or σ.
Statistic: A characteristic of a sample. Denoted with English letters such as X or S.
Sampling Error: The amount of error that exists between a sample statistic and the corresponding population parameter.

We want to know whether Joe is an above average free-throw shooter.
We collect some data
Figure: three sequences of baskets (B) and misses (M), with % baskets = .75, .63, and .58
Would you bet $10.00 that he makes the next shot?

Chance is “Lumpy”
Figure: three sequences of heads (H) and tails (T), with % heads = .75, .63, and .58

So how do we decide?
H T H H
Sample proportion = .75
Inferential Statistics helps us answer the question: Given a fair coin tossed four times, how often would we get the result 75% heads by chance alone?
Answer: If we took a fair coin and repeated this procedure many times, we’d get this result one out of every four times. Pretty often!

So differences we see between samples might not be reliable (especially when the differences are small or the samples are small)
Inferential statistics can tell us whether or not our results are likely to be due to chance alone

Important Point of Clarification
Statistics asks: Was this observed “effect” caused by (lumpy) chance alone?
Inferential statistics separates
Random causes: Fluctuations of chance
Non-random causes: True differences in the population; Bias in the design of the study
A statistically significant result doesn’t mean the results have to be “true”. It means only that they are unlikely to be due to chance alone.
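The coin-toss answer above can be checked with a minimal sketch. It assumes independent tosses of a fair coin and uses the binomial formula; the function name `prob_exact_heads` is hypothetical and introduced only for illustration.

```python
from math import comb


def prob_exact_heads(n_tosses: int, n_heads: int, p_heads: float = 0.5) -> float:
    """Binomial probability of exactly n_heads heads in n_tosses independent tosses."""
    # Number of orderings with n_heads heads, times the probability of any one ordering.
    return comb(n_tosses, n_heads) * p_heads**n_heads * (1 - p_heads) ** (n_tosses - n_heads)


# Exactly 3 heads in 4 tosses of a fair coin (sample proportion = .75).
p_three_of_four = prob_exact_heads(4, 3)
print(p_three_of_four)  # 0.25

# Counting 4 heads out of 4 as well, i.e. a result at least this extreme.
p_three_or_more = prob_exact_heads(4, 3) + prob_exact_heads(4, 4)
print(p_three_or_more)  # 0.3125
```

The first value, 0.25, matches "one out of every four times". Including the even more lopsided result of four heads raises the chance to about 31%, which makes the same point: a sample this small cannot distinguish a fair coin from a biased one.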
Inferential Statistics
Descriptive Statistics
Probability Theory

Types of Analyses
IV -- categorical (groups); DV -- continuous
One Sample T-test. Inferences about the mean of one group.
Two Sample T-test. Differences between the means of two groups.
ANOVA. Differences between the means of three or more groups.
Figure: bar graph of Score for First Grade, Third Grade, and Fifth Grade

Types of Analyses
IV -- continuous; DV -- continuous
Correlation. The linear association between two continuous variables.
Regression. The best fit line of prediction.
Figure: scatterplot of Sleep against Age

Types of Analyses
IV -- categorical (groups); DV -- categorical
Z-test for proportions. The difference between two sample proportions.
Chi-square test. The distribution of counts in each category, compared across groups.
Narcissists * Male Crosstabulation
Count
                     Male
                     .00    1.00   Total
Narcissists  .00     19     53     72
             1.00    15     51     66
Total                34     104    138

Fallibility of Everyday Reasoning

Everyday Statistical Reasoning
1. Something out of nothing: the misperception of random data.
2. Too much from too little: the misinterpretation of incomplete data
3. Seeing what you expect: biased evaluation of ambiguous data

Misperceiving Random Data
“The human understanding supposes a greater degree of order and equality in things than it really finds; and although many things in nature be most irregular, will yet invent parallels and conjugates and relatives where no such thing is.” -Francis Bacon
• The clustering illusion
People do not intuitively expect chance to be lumpy. They reject the possibility that clustering can be random.
“Hot hand” in basketball. “Winning streak” or “hot table” in gambling.

Gilovich et al., 1985
• Interviewed 100 basketball fans
• 91% thought a player has a better chance of making a shot after having just made his last 2-3 shots than he does after having just missed his last 2-3 shots.
• They estimated that a player’s shooting percentage would be 61% after having just made a shot and 42% after having just missed a shot.
• 84% of the respondents thought that it is important to pass the ball to someone who has just made several shots in a row.

The data
Gilovich et al., 1985
• On average, players made 51% of shots after making their previous shot, 54% of shots after missing their previous shot.
• They made 50% of shots after making their previous two shots, 53% after missing their previous two shots.
• They made 46% of shots after making their previous three shots, 56% of shots after missing three in a row.
• There were no more streaks of 4, 5, or 6 hits in a row than chance would have predicted. The players, however, believed that they tended to shoot in streaks.

The data
Gilovich et al., 1985
• A group of college basketball players were asked to take 100 shots. Before each shot they chose either a risky or conservative bet on their ability to make the shot.
• They tended to make risky bets after hitting their previous shot and conservative bets after missing their previous shot.
• However, there was no correlation between the outcomes of consecutive shots, and no correlation between bets and outcomes.

The response
Gilovich et al., 1985
“Who is this guy? So he makes a study. I couldn’t care less.” -Red Auerbach, Celtics
“There are so many variables involved in shooting the basketball that a paper like this doesn’t mean anything.” -Bobby Knight

• Selective Attention
• Post-hoc causal explanations
Dangers of Post-Hoc theorizing!

LAW of LARGE NUMBERS
The correct proportion of heads and tails or hits and misses will be present globally in a long sequence. It will NOT, however, always be present locally, in each of its parts.

Misinterpreting Incomplete Data
“They still cling stubbornly to the idea that the only good answer is a yes answer.
If they say, “Is the number between 5,000 and 10,000?” and I say yes, they cheer; if I say no, they groan, even though they get exactly the same amount of information in either case.” -John Holt

“Are professors particularly likely to be absent-minded?”
                  Absent-Minded   Not Absent-Minded
Professors        600             400
Not Professors    300             200

“Does carrying an umbrella make it less likely to rain?”
                  Rain   No rain
Umbrella
No umbrella

“Does the Cosmo horoscope predict the future?”
                              Event happens   Event doesn’t happen
Cosmo predicts event
Cosmo doesn’t predict event

Can alternative medical technique X help cancer patients who have been diagnosed as “incurable”?
                                        Patient recovers   Patient fails to recover
Patient gets alternative med            500                4000
Patient does not get alternative med    700                3800

• Selective attention
• Available information
• Positive test strategy
Cards: A   B   2   3
“All cards with a vowel on one side have an even number on the other.”

• Selective attention
• Available information
• Positive test strategy
• Under-appreciation of base rates

Watch out for incomplete data!
                        Event occurs   Event does not occur
Event hypothesized
No event hypothesized

III. Projecting onto Ambiguous Data
“I’ll see it when I believe it.” -Thane Pittman
• Illusory correlations
When people “see” an association that is not present in the data.
“Arthritis pain is influenced by the weather.”
“Most women get bad moods before their menstrual periods.”

Chapman et al., 1967
• Why do clinical psychologists continue to use projective tests even though dozens of studies have shown these tests are not valid indicators of personality?
• Showed clinicians a series of Rorschach cards, along with the patient’s response to each card and some information describing the patient’s characteristics (sometimes including sexual orientation).
• Examined the correlations that clinicians “saw” between particular responses and homosexuality.

Chapman et al., 1967
• In truth, there are some counter-intuitive relationships.
Homosexuals are more likely to see a monstrous figure on one card and an ambiguous animal-human figure on another card.
• Many of the intuitive relationships do not hold. Homosexuals are not more likely to see anal content, feminine clothing, or humans of uncertain gender.

Chapman et al., 1967
• In Study 1, researchers designed the materials so that there was no correlation between any of the responses and homosexuality.
• Clinicians did, however, believe the highly intuitive -- but invalid -- correlations.

Chapman et al., 1967
• In followup studies, researchers designed the materials so that there was a negative correlation between the intuitive responses and homosexuality.
• The size of the illusory correlation was not reduced.

• Clinicians may “see” nonexistent correlations between test responses and diagnoses
• Managers may “see” nonexistent correlations between employees’ race or gender and performance
• Parents may “see” nonexistent correlations between children’s sugar consumption and unruly behavior
• Students may “see” nonexistent correlations between their peers’ college majors and personalities.
Much of what we “learn” from experience may reflect our prior theories about reality rather than the actual nature of reality.

Everyday Statistical Reasoning
1. Something out of nothing: the misperception of random data.
   - Drawing strong conclusions from small “lumpy” samples
2. Too much from too little: the misinterpretation of incomplete data
   - Inadequate comparison groups
3. Seeing what you expect: biased evaluation of ambiguous data
   - Illusory correlation based on confirmation bias

But there’s hope …
Following training in probability and statistics, people are less likely to make these errors.

Fallibility of Statistical Reports

Everyday Reasoning
1. Something out of nothing: the misperception of random data.
2. Too much from too little: the misinterpretation of incomplete data (~control groups)
3.
Seeing what you expect: biased evaluation of ambiguous data

Statistical Reports
1. One thing out of something else: overgeneralization from biased samples and measures
2. Too much from too little: the misinterpretation of incomplete data (~control groups)
3. Getting what you expect: biased presentation of ambiguous data

Overgeneralizing from Biased Samples
1936 Election Poll
- In 1936, the Literary Digest predicted that Alf Landon would beat Franklin D. Roosevelt in the presidential election, based on approximately 2 million survey responses
- How could a study with such a large sample be so wrong? Selection bias? But participants were selected randomly from phone books…
- Other polling agencies with smaller samples but more representative methods accurately predicted Roosevelt’s win

Overgeneralizing from Biased Samples
Sperm Study
- In early 1996, media raised the alarm about declining sperm counts, as a result of a book published by Colborn, an environmentalist
- The book relied heavily on a 1992 Danish meta-analysis reviewing 61 papers published between 1938 and 1991, in which a total of 14,947 men had their sperm tested.
- Found a “significant” decline in sperm count: from 113 million sperm per ml in 1940 to 66 million sperm per ml in 1990

Overgeneralizing from Biased Samples
Sperm Study
- Sample:
  Years        Men tested
  Pre-1950     596
  1951         1000 (one study!)
  1952-1970    184
  1970-1991    13,167
- The entire “decline” was carried by the single 1951 sample
- From 1970-1991, sperm counts actually increased

Misinterpretation of Incomplete Data
Crime Study
- Murders fell significantly in NYC over the last decade: from 2,245 in 1990 to 596 in 2003
  Presumed cause: Giuliani
- Murders fell significantly all across the country from 1990 to 2003
- Crime started dropping in NYC in 1990, four years before Giuliani became mayor.
Misinterpretation of Incomplete Data
Unwed Mothers Study
- In October 1996, NCHS issued data showing that the rate of births to unwed mothers had declined from 46.9 per thousand in 1994 to 44.9 per thousand in 1995. The first decline in 20 years. Front page coverage in the NY Times and LA Times.
- Clinton trumpets the results as a success for his new welfare policies (instituted in 1996)
- Not mentioned: from 1993-1994 there was the largest one-year increase in out-of-wedlock births since national figures have been kept

Biased Presentation of Ambiguous Results
• Selective presentation of results
Day Care Study
- In 1996, media publicized the results of a study presented at an NICHD conference claiming that the bond between mothers and babies is not weakened when the child is placed in day care
- Study measured the presence or absence of “secure attachment” in infants
- Overall no difference in day care versus home care babies

Biased Presentation of Ambiguous Results
• Selective presentation of results
Day Care Study
- What the media did not highlight: a more confusing picture emerges when the averages are broken out
- Baby boys were most likely to be insecurely bonded when they were in day care for more than thirty hours a week
- Baby girls were most likely to be insecurely bonded when they were in day care for less than 10 hours a week

Biased Presentation of Ambiguous Results
• Selective presentation of results
Psychology Research
- Researchers sometimes present significant results and fail to present null (or opposing) results.
- Sometimes you can catch them: look at their methods section and see how many tests they must have run and how many they reported.
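Why the number of unreported tests matters can be sketched with a little arithmetic. Assuming, purely for illustration, that every test is independent, that no real effect exists, and that the conventional .05 cutoff is used (none of these assumptions come from the studies above), the chance of at least one “significant” result grows quickly with the number of tests run. The function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_tests independent tests comes out
    'significant' at level alpha when there is no real effect in any of them."""
    # Each test stays non-significant with probability (1 - alpha).
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20):
    print(n_tests, round(chance_of_false_positive(n_tests), 2))
# 1 0.05
# 5 0.23
# 20 0.64
```

Under these assumptions, a report of one significant result out of twenty tests run is closer to what chance alone would produce than to evidence of an effect.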
Biased Presentation of Ambiguous Results
• “Spin” or specialized emphasis of particular results
Mortgage Study
- In 1995, a study by the Federal Reserve Bank of Chicago showed that among people with bad credit ratings, 10% of white applicants are denied mortgages while 20% of black and Hispanic applicants are denied mortgages.
- In the same study, however, it was found that compared to past years, approved mortgages rose by 55% for black applicants, by 45% for Hispanic applicants, and by 16% for white applicants.

Biased Presentation of Ambiguous Results
• “Spin” or specialized emphasis of particular results
Mortgage Study
- In 1995, a study by the Federal Reserve Bank of Chicago showed that among people with bad credit ratings, 90% of white applicants are granted mortgages while 80% of black and Hispanic applicants are granted mortgages.
- In the same study, however, it was found that compared to past years, approved mortgages rose by 55% for black applicants, by 45% for Hispanic applicants, and by only 16% for white applicants.

Biased Presentation of Ambiguous Results
• “Spin” or specialized emphasis of particular results
Mortgage Study
- The NY Times did not report the second finding until the fourth paragraph of the article
- They also reported denial rates rather than approval rates.
- In approval terms, the comparison is 90% versus 80%. In denial terms, the comparison is 10% versus 20%.
- “Twice as likely to be denied”.
(makes you think people of color were half as likely to be accepted, but actually they were about 89% as likely to be accepted: 80% / 90%)

Biased Presentation of Ambiguous Results
• “Spin” or specialized emphasis of particular results
Psychology Research
- Sometimes a p-value of 0.10 is treated as “not significant” (especially if the researcher did not predict the effect)
- Other times the same p-value is emphasized as “marginally significant” (especially if the researcher predicted the effect)

Can’t always trust intuition
- Learn more about possible pitfalls in intuitive decision making
Can’t always trust statistical reports
- Learn more about how to evaluate statistical reports and research findings

Practice
Washington Post, April 12, 2000
“Government-funded medical surveys since 1960 have shown higher rates of at least one type of cancer – varying from thyroid tumors to leukemia – at most of the major facilities that produced nuclear weapons.”
What’s problematic about this statement?