Textbook Credits
• Textbook: Shavelson, R.J. (1996). Statistical reasoning for the behavioral sciences (3rd ed.). Boston: Allyn & Bacon.
• Supplemental Material: Ruiz-Primo, M.A., Mitchell, M., & Shavelson, R.J. (1996). Student guide for Shavelson statistical reasoning for the behavioral sciences (3rd ed.). Boston: Allyn & Bacon.

Overview
• Example data on Teacher Expectancy Research
• Frequency Distributions
• Measures of Central Tendency & Variability
• The Normal Distribution

Teacher Expectancy Data

Research on Teacher Expectancy
Study Design
• A 2 x 6 x 2 (treatment x grade level x test occasions) randomized experiment
[Schematic of design: Occasion 1 = IQ Pretest; Grades 1 to 6; Treatment = Experimental ("Bloomers") or Control, randomly assigned; Occasion 2 = IQ Posttest]

Teacher Expectancy Data Matrix
• Another convenient way to depict the data

Teacher Expectancy Frequency Distribution
• Posttest scores from the treatment group
[Frequency table: Sum = 30]

Frequency Distribution Using Class Intervals
• Treatment group posttest scores divided into 11 class intervals
• Each class interval has size 3 (score values 123, 124, 125, …)
• Clear patterns emerge: interval 114-116 has the highest frequency (f)

Frequency Distribution Using Class Intervals
• Number of class intervals: use 11 intervals
• Highest – lowest score: 125 – 95 = 30
• Class interval size (i) = (H – L) / number of class intervals: 30/11 = 2.7 (round to 3)
• Rule: the lowest score of the lowest interval must be divisible by the class interval size. The lowest score, 95, is not divisible by 3, so subtract 1. 94 is still not divisible by 3, so subtract 1 again. 93 is divisible by 3, so the lowest class interval begins at 93.

Teacher Expectancy: Histogram
• Histogram showing the class interval posttest scores on the abscissa and frequency on the ordinate
• Lower and upper limit scores with zero frequency are shown.
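The class-interval procedure above can be sketched in a few lines of Python. This is only an illustration of the slide's rule, not code from the textbook; the function name `class_intervals` and its parameters are hypothetical, and it assumes whole-number scores.

```python
def class_intervals(lowest, highest, n_intervals):
    """Return (size, [(lower, upper), ...]) following the slide's procedure.

    Hypothetical helper for illustration; assumes integer scores.
    """
    score_range = highest - lowest
    # Interval size i = (H - L) / number of intervals, rounded to a whole number.
    size = max(1, round(score_range / n_intervals))
    # Step the starting score down until it is divisible by the interval size.
    start = lowest
    while start % size != 0:
        start -= 1
    # Build consecutive intervals until the highest score is covered.
    intervals = []
    lower = start
    while lower <= highest:
        intervals.append((lower, lower + size - 1))
        lower += size
    return size, intervals


size, intervals = class_intervals(95, 125, 11)
print("interval size:", size)
print("first interval:", intervals[0])
print("last interval:", intervals[-1])
print("number of intervals:", len(intervals))
```

With the slide's values (lowest score 95, highest score 125, 11 intervals) the sketch reproduces an interval size of 3 and a lowest interval starting at 93.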

Teacher Expectancy: Polygon
• Polygon showing the class interval posttest scores on the abscissa and frequency on the ordinate
• Lower and upper midpoint values with zero f are shown.

Teacher Expectancy: Stem-and-leaf plot
• Stem-and-leaf plot containing the data matrix posttest scores in increments of 5

Common Frequency Distribution Shapes
• Normal distribution (bell shape): symmetric about the mean
• Unimodal distribution: 1 peak
• Bimodal distribution: 2 peaks
• Multimodal distribution: > 2 peaks
• Positively skewed
• Negatively skewed
• Rectangular distribution (no peaks): symmetric about the mean
• Kurtosis (peakedness): platykurtic (flatter than the normal curve)
• Kurtosis (peakedness): leptokurtic (more peaked than the normal curve)

The Relative Frequency (Probability) Distribution
• Score frequencies are shown as a proportion of the total number of frequencies in the sample: RF = f / total number of subjects
[Relative frequency table: Sum = 10; Sum = 20; Total Sum = 30]

The Relative Frequency Polygon
• Relative frequency polygons are constructed like frequency polygons, except that relative frequency is plotted on the ordinate

The Cumulative Frequency Distribution
• The cumulative frequency distribution shows the number of scores falling below a certain point on the scale of scores
• The cf of a score is defined as the number of cases falling below the upper real limit of the class interval

The Cumulative Frequency Polygon
• The cumulative frequency polygon uses upper real limits and cumulative frequency

Cumulative Proportions and Percentiles
• 80% of the subjects in the experimental group of the expectancy study received a posttest score below 119.5
• CP = CF / total number of subjects
• C% = CP x 100

Percentile Scores
• Example: A raw score of 113.5 has a percentile rank of 57

Measures of Central Tendency
• Central Tendency (Location): the central tendency of the set of measurements, that is, the tendency of the data to cluster, or center, about certain numerical values.
• Variation (Dispersion): the variability of the set of measurements, that is, the spread of the data.

Standard Notation
Measure    Sample    Population
Mean       x̄         μ
Size       n         N

Mean
• Most common measure of central tendency
• Acts as 'balance point'
• Affected by extreme values ('outliers')
• Denoted as $\bar{x}$, where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Mean Example

Median

Median Example: Odd Size Sample
• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5
• Median: 22.6, the value in the middle position (position 3 of 5)

Median Example: Even Size Sample

Mode
• Measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• May be no mode or several modes
• May be used for quantitative or qualitative data

Mode Example
• No Mode. Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode. Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode. Raw Data: 21 28 41 28 31 43 43

Measures of Variability
• Range: a measure of dispersion
• Difference between the largest and smallest observations: Range = x_largest – x_smallest
• Ignores how data are distributed

Standard Notation

Sample & Population Variance
The variance is the average of the squared distances of each value from the mean.

Sample (n – 1 in the denominator!):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

Population (N in the denominator!):
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Standard Deviation
The standard deviation is the square root of the variance.

Sample:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$

Population:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Sample Standard Deviation Formula
s = 2.523

Overview of the Normal Distribution
• Serves as a reasonably good model of many natural phenomena
• Provides a good model for the frequency distribution of scores
• Is of particular importance in inferential statistics as a probability distribution
• There is a close connection between the sample size and the distribution of means calculated for many samples of subjects drawn from the same population
• As the sample size increases, the distribution of sample means approaches the normal distribution
• May provide a good approximation to probabilities of other distributions that are more difficult to work with

Properties of the Normal Distribution
• It is unimodal: the single peak occurs where X equals the mean
• It is symmetric about the mean; ½ the scores fall below the mean and ½ the scores fall above the mean
• The mean, mode, and median are all equal
• It is asymptotic (the curve never touches the abscissa)
• It is continuous for all values of X from -∞ to +∞

Properties of the Normal Distribution
• Unimodal
• Symmetric
• Mode = median = mean
• Asymptotic

Empirical Rule of the Normal Distribution Areas

Interpretation of z-Scores Example
• Approximately 68% of the measurements will have a z-score between –1 and 1.
• Approximately 95% of the measurements will have a z-score between –2 and 2.
• Approximately 99.7% of the measurements will have a z-score between –3 and 3.
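
The three percentages above can be checked numerically. The sketch below is an illustration only, not material from the textbook: it uses the standard identity that the area of a normal distribution within k standard deviations of the mean equals erf(k / √2), and the function name `area_within` is hypothetical.

```python
import math


def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    # For a standard normal variable z, P(-k < z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"z between -{k} and {k}: {area_within(k):.4f}")
```

The printed proportions round to about 68%, 95%, and 99.7%, which is where the rule's approximate figures come from.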

Empirical Rule Example

Computing the z-Score
z = (X - mean) / s

z-Score Example 1

Z-score Example 1
Find the area between the mean and a given raw score
• z score: mean = 0, s = 1
  – Distance between a score (X) and the mean of a distribution in standard deviation (s) units
  – Used to display and interpret areas of the normal distribution
• Assume score is 9, mean = 8, s = 2
• z = (x - mean) / s, so z = (9 - 8)/2 = 0.5
• Next, find the area between the mean and z = 0.5
• From Table B Column 2 in Appendix II we find: 0.1915, or 19.15% of the cases

Z-score Example 2
Find the area below a given raw score
• Assume score is 9, mean = 8, s = 2
• z = (x - mean) / s, so z = (9 - 8)/2 = 0.5
• Mark off the area in the normal distribution (ND)
• Find the area below z = 0.5
• From Table B column 3 we find: 0.6915, or 69.15% of the cases (equivalently, 0.5 + 0.1915)

Z-score Example 3
Find the area above a given raw score
• Assume score is 9, mean = 8, s = 2
• z = (x - mean) / s, so z = (9 - 8)/2 = 0.5
• Mark off the area in the ND
• Find the area above z = 0.5
• From Table B column 3 we find: 0.3085, or 30.85% of the cases

Z-score Example 4
Find the area between two given raw scores
• Assume 1st score is 9, 2nd score is 5.8, mean = 8, s = 2
• For X = 9: z = (9 - 8)/2 = 0.5
• For X = 5.8: z = (5.8 - 8)/2 = -1.1
• Find the area between the mean and z = 0.5: 0.1915
• Find the area between the mean and z = -1.1: 0.3643
• Total = 0.1915 + 0.3643 = 0.5558, or 55.58% of the cases

Practice Exercises
1. Select a hypothetical product or process and create some plausible test data of your choice (no more than 10 values), as shown in the textbook/class
2. Describe your experimental approach
3. Create a detailed frequency distribution table
4. Display your data with different types of graphs
5. Calculate the measures of central tendency and variability
6. Calculate the z-score(s) and indicate the relative position in the normal distribution
7. Provide any other pertinent information from your results

Questions?
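
As a cross-check on the z-score area examples earlier in the deck (score 9 and score 5.8, mean = 8, s = 2), the areas can also be computed without Table B. This is a hedged sketch, not the textbook's method: it replaces the table lookup with the standard normal cumulative distribution written in terms of `math.erf`, and the helper names `z_score` and `area_below` are hypothetical.

```python
import math


def z_score(x, mean, sd):
    """Distance between a score and the mean in standard deviation units."""
    return (x - mean) / sd


def area_below(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


mean, sd = 8, 2
z_9 = z_score(9, mean, sd)     # z for the raw score 9
z_58 = z_score(5.8, mean, sd)  # z for the raw score 5.8

mean_to_z = area_below(z_9) - 0.5              # area between the mean and z = 0.5
below = area_below(z_9)                        # area below z = 0.5
above = 1.0 - area_below(z_9)                  # area above z = 0.5
between = area_below(z_9) - area_below(z_58)   # area between the two scores

print(f"mean to z: {mean_to_z:.4f}")
print(f"below:     {below:.4f}")
print(f"above:     {above:.4f}")
print(f"between:   {between:.4f}")
```

Rounded to four decimals, these agree with the table values used in the examples (0.1915, 0.6915, 0.3085, and 0.5558). The same two helpers could be reused for step 6 of the practice exercises.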