Statistical Reasoning

A. Describing data

1. Frequency distributions: Where are the majority of the scores?

Percentage of students in each grade band:

Band   Multiple Choice   Essay   Composite
A      41%               52%     39%
B      31%               41%     45%
C      19%               5%      11%
D      8%                3%      5%
F      2%                0%      0%
Mean   34.3              10.2    9.3
SD     4.2               2.0     2.3

Grade scale, with the number of students at each essay and composite grade:

Grade   Points   Multiple Choice score   Essay   Composite
A+      13       40                      -       -
A       12       38-39                   23      11
A-      11       37                      10      14
B+      10       36                      15      9
B       9        34-35                   5       12
B-      8        33                      6       8
C+      7        32                      2       2
C       6        30-31                   1       2
C-      5        29                      -       3
D+      4        28                      2       -
D       3        26-27                   -       3
D-      2        25                      -       -
F       1        24 and below            0       -

2. Histograms & frequency polygons: ways of showing your frequency distribution data.

Histogram: uses a bar graph to show data.
[Figure: histogram of percentage of students (0 to 50) by grade (A to D)]

Frequency polygon: uses a line graph to show data.
[Figure: frequency polygon of percentage of students (0 to 50) by grade (A to D)]

B. Measures of central tendency (3 types)

Example scores: 4, 3, 5, 4, 4

1. Mode = most common score = 4 (reports what there is more of; used for data that can't be averaged, e.g. you can't average men and women)
2. Mean = arithmetic average = 20/5 = 4 (has the most statistical value)
3. Median = middle score = 4 (half the scores are higher, half are lower; used when there are extreme scores)

Central Tendency

An extremely high or low price or score can skew the mean. Sometimes the median is better at showing you the central tendency.

1968 Topps Baseball Cards

Card               Price
Nolan Ryan         $1500
Elston Howard      $8
Billy Williams     $5
Luis Aparicio      $5
Harmon Killebrew   $3.50
Orlando Cepeda     $3.50
Maury Wills        $3
Jim Bunning        $3
Tony Conigliaro    $3
Tony Oliva         $3
Lou Piniella       $2.50
Mickey Lolich      $2.25
Jim Bouton         $2
Rocky Colavito     $2
Boog Powell        $2
Luis Tiant         $2
Tim McCarver       $1.75
Tug McGraw         $1.75
Joe Torre          $1.50
Rusty Staub        $1.25
Curt Flood         $1

With Ryan: Median = $2.50, Mean = $74.14
Without Ryan: Median = $2.38, Mean = $2.85

Does the mean accurately portray the central tendency of incomes? No.
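The pull that one extreme value has on the mean can be checked with a short Python sketch. It uses the five example scores and the 21 card prices listed above; the variable names are illustrative, and the standard-library `statistics` module is just one convenient way to do the arithmetic.

```python
from statistics import mean, median, mode

# Example scores from the central tendency section
scores = [4, 3, 5, 4, 4]
print("mode:", mode(scores), "mean:", mean(scores), "median:", median(scores))

# The 21 card prices in dollars; the first entry is the Nolan Ryan card
prices = [1500.0, 8.0, 5.0, 5.0, 3.50, 3.50, 3.0, 3.0, 3.0, 3.0, 2.50,
          2.25, 2.0, 2.0, 2.0, 2.0, 1.75, 1.75, 1.50, 1.25, 1.0]
without_ryan = prices[1:]  # drop the one extreme price

# The mean moves a lot when the outlier is removed; the median barely moves
print("with Ryan:    mean =", round(mean(prices), 2),
      " median =", median(prices))
print("without Ryan: mean =", round(mean(without_ryan), 2),
      " median =", median(without_ryan))
```

Removing a single card changes the mean from about $74 to under $3, while the median shifts by only about 12 cents. That is why the median is often the safer summary when a few values are extreme.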
What measure of central tendency would more accurately show income distribution? The median: most incomes cluster around that number.

C. Measures of variation

1. Range: the difference between the highest and lowest scores (can be distorted by an extreme score)
2. Variance & standard deviation: How spread out is your data?

Calculating Standard Deviation

How spread out (consistent) is your data?

1. Calculate the mean.
2. Take each score and subtract the mean from it.
3. Square each of these deviations to make them positive.
4. Average the squared deviations. This average is the variance.
5. Take the square root of the variance to get back to your original unit of measurement.
6. The smaller the number, the more closely packed the data. The larger the number, the more spread out it is.

Standard Deviation

Punt (yds)   Deviation from mean   Deviation squared
36           36 - 40 = -4          16
38           38 - 40 = -2          4
41           41 - 40 = +1          1
45           45 - 40 = +5          25
Total: 160                         Total: 46

Mean: 160/4 = 40 yds
Variance: 46/4 = 11.5
Standard deviation: √variance = √11.5 ≈ 3.4 yds

Z-Scores

A z-score is a number expressed in standard deviation units that shows an individual score's deviation from the mean. Basically, it shows how you did compared to everyone else. A positive z-score means you are above the mean; a negative z-score means you are below the mean.

Z-score = (your score - average score) / standard deviation

Which class did you perform better in compared to your classmates?

Test      Total   Your score   Average score   S.D.
Biology   200     168          160             4
Psych.    100     44           38              2

Z-score in Biology: 168 - 160 = 8, 8/4 = +2 S.D.
Z-score in Psych: 44 - 38 = 6, 6/2 = +3 S.D.

You performed better in Psych compared to your classmates.

D. Characteristics of the normal curve

Skewed Curves

A positive skew has a tail that goes to the right. A negative skew has a tail that goes to the left.

E. Inference

1. Does the sample represent the population?
a. Non-biased sample: good
b. Low variability: good
c. Larger samples: good

2. Are differences between groups in your results statistically significant?
a. Big differences: good
b. Low variability: good
c. Big groups: good

Statistical Significance

p value = the probability of getting a result at least this extreme by chance alone, that is, if the independent variable had no real effect

• A large p value is bad news for a researcher. They want this number to be as small as possible, because a small p value makes it less plausible that chance or some outside force, rather than the independent variable, produced the change in their experiment.
• By convention, this number can be no greater than 5% for the findings to be considered statistically significant: p ≤ .05
• This means that if chance alone were at work, a result this extreme would turn up no more than 5% of the time.
• Replication of the experiment shows whether the finding holds up or was a chance result.
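Returning to the measures of variation: the standard-deviation steps and the z-score formula can be traced in a short Python sketch, using the punt distances and test scores from the examples. The names `punts`, `mean_yards`, and `z_score` are illustrative. The sketch divides by the number of scores, as the slides do, so it assumes the four punts are treated as the whole population rather than a sample.

```python
from math import sqrt

punts = [36, 38, 41, 45]  # punt distances in yards

# Step 1: calculate the mean
mean_yards = sum(punts) / len(punts)
# Step 2: subtract the mean from each score
deviations = [x - mean_yards for x in punts]
# Step 3: square each deviation to make it positive
squared = [d ** 2 for d in deviations]
# Step 4: average the squared deviations (the variance)
variance = sum(squared) / len(squared)
# Step 5: take the square root to return to yards
sd = sqrt(variance)

print("mean:", mean_yards, "variance:", variance, "SD:", round(sd, 1))


def z_score(score, average, standard_deviation):
    """Distance of a score from the average, in standard deviation units."""
    return (score - average) / standard_deviation


z_biology = z_score(168, 160, 4)
z_psych = z_score(44, 38, 2)
print("Biology z:", z_biology, "Psych z:", z_psych)
```

The results agree with the worked example: a 40-yard mean, a variance of 11.5, and a standard deviation of roughly 3.4 yards. The z-scores of +2 and +3 put the Psych score further above its class average, even though the raw Biology score is higher.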