Basic Statistics

Standard Scores and the Normal Distribution

Agenda

Standard Scores: Where does an observation fall in a distribution of scores?
Normal Distribution: What is “normal” and what does it mean?

Any set of data can be arranged in a “distribution.” We describe a set of data by describing the characteristics of its distribution. We do this with tables, graphs, and/or numerical summary measures.

Consider the Following Distribution:

    Score (X)   Frequency (f)
        1             1
        2             2
        3             4
        4             2
        5             1

We have described it here with a table. However, we could also describe it with a graph. What type of graph would be appropriate?

Let’s use a Histogram

[Figure: histogram of the quiz scores; x-axis: Score X (1 to 5), y-axis: Frequency f (0 to 4)]

It is analogous to stacking 10 boxes (one for each score) on top of one another to form the histogram.

We could also use Numerical Descriptive Measures

Scores (X): 1 2 2 3 3 3 3 4 4 5

Mode = 3.0
Median = 3.0

$$\text{Mean} = \bar{X} = \frac{\sum X}{n} = \frac{30}{10} = 3.0$$

$$\text{Variance} = \frac{\sum (X - \bar{X})^2}{n - 1} = 1.333$$

$$\text{Standard Deviation} = \sqrt{\text{Variance}} = \sqrt{1.333} = 1.15$$

These measures describe the full distribution of scores. We will turn now to describing a single score in the distribution.

Suppose Robert makes a 4 on the quiz and we want to describe his performance. Robert’s 4 would be one of the two 4s in the distribution above. Robert’s score is represented by either the red or the green rectangle in the histogram.

To better understand Robert’s score, we can add “relative frequency” and “cumulative frequency” columns.

    Score (X)   Frequency (f)   Relative Frequency (rf)   Cumulative Frequency (cf)   Cumulative Percentage (c%)
        1             1              1/10 = .1                      .1                          10
        2             2              2/10 = .2                      .3                          30
        3             4              4/10 = .4                      .7                          70
        4             2              2/10 = .2                      .9                          90
        5             1              1/10 = .1                     1.0                         100
      Total          10

Relative frequency = f / (sum of f). Example (rf for a score of 4): 2/10 = .2

This table enables us to better interpret Robert’s score.
In fact, we know that he scored as well as or better than 90% of the students who took the quiz. Since this is the definition of a “Percentile Rank” (the percentage of scores equal to or below a particular score), we can not only interpret the score better, we can also compute it: Robert’s percentile rank is 90.

We can see this graphically using the histogram.

[Figure: relative-frequency histogram of the quiz scores; x-axis: Score X (1 to 5), y-axis: rf (.00 to .40); the bars for scores 1 through 4 represent 90% of the total distribution]

Continuous Distributions

Placing a score within a continuous distribution is analogous to placing one within the discrete distribution that we just reviewed. However, the computation is not the same. It is quite easy to identify percentile ranks for scores that fall an exact number of standard deviations above or below (or on) the mean, but we do not yet have the tools for other scores.

RECALL THE EMPIRICAL RULE

• For any symmetrical, bell-shaped distribution:
  • approximately 68% of the observations will lie within 1σ of the mean (μ);
  • approximately 95% within 2σ of the mean (μ); and
  • approximately 99.7% within 3σ of the mean (μ).

Recall that a normal distribution has the following percentages of scores within 1, 2, or 3 standard deviations of the mean.

[Figure: normal distribution with Mean = 50 and SD = 10; the x-axis runs from 20 (μ − 3σ) to 80 (μ + 3σ) in steps of 10; brackets mark the 68%, 95%, and 99% (rounded from 99.7%) regions; percentile labels: μ − 2σ = 2nd, μ − 1σ = 16th, μ = 50th, μ + 1σ = 84th, μ + 2σ = 98th, μ + 3σ = 99th]

Ordinal vs. Interval Standard Scores

A percentile rank is based on an area transformation and results in an ordinal score. A linear transformation (a mathematical term), by contrast, maintains the same type of measurement scale as the original scores.

Consider a distribution with a mean of 50 and a standard deviation of 10. Suppose we transform it to a scale with a mean of 0 and an SD of 1.

    Original scale (mean 50, SD 10):   30   40   50   60   70
    New scale (mean 0, SD 1):          -2   -1    0    1    2

This type of scale change is referred to as a linear transformation. That means we maintain all of the relationships in the data, but change the scale by changing the mean and standard deviation.
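As a rough illustration of the scale change described above, the following Python sketch maps the five tick marks from the mean-50, SD-10 scale onto the mean-0, SD-1 scale. The function name `linear_transform` and its parameters are hypothetical, chosen only for this example.

```python
def linear_transform(scores, old_mean, old_sd, new_mean, new_sd):
    """Rescale scores linearly from one mean/SD to another (illustrative helper)."""
    # Express each score in SD units from the old mean, then stretch and
    # shift it onto the new scale.
    return [(x - old_mean) / old_sd * new_sd + new_mean for x in scores]


original = [30, 40, 50, 60, 70]
rescaled = linear_transform(original, old_mean=50, old_sd=10, new_mean=0, new_sd=1)
print(rescaled)  # [-2.0, -1.0, 0.0, 1.0, 2.0]

# The order and the equal spacing of the scores are preserved.
gaps = [b - a for a, b in zip(rescaled, rescaled[1:])]
print(gaps)  # [1.0, 1.0, 1.0, 1.0]
```

Only the units change: scores that were 10 points apart are now 1 unit apart, and their order is unchanged.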
Suppose X has a distribution with a mean of 80 and a standard deviation of 20. The transformed scores are called “standard scores” and are denoted by “z”:

$$z = \frac{X - 80}{20}$$

First, we subtract the mean. Then, we divide by the standard deviation.

If we did that to every score in the distribution, the distribution of new scores would have a mean of 0 and an SD of 1. This is the most common type of standard score and is often called a z-score. Every other standard score can be derived from a z-score.

$$z = \frac{X - \bar{X}}{s}$$

To summarize, we can convert any distribution to a standard (or z) distribution by subtracting the mean of the distribution from every score and dividing the results by the standard deviation. This will not alter the shape of the distribution; it will look just like the original distribution.

An Example

Suppose the distribution of the time (in minutes) it takes people to complete a task has a mean of 32 and a standard deviation of 5. It took Bob 38 minutes to complete the task. What is his z-score?

$$z_{Bob} = \frac{X_{Bob} - \bar{X}}{S} = \frac{38 - 32}{5} = \frac{6}{5} = 1.2$$

It took Mary only 29 minutes. What was her z-score?

$$z_{Mary} = \frac{X_{Mary} - \bar{X}}{S} = \frac{29 - 32}{5} = \frac{-3}{5} = -0.6$$

How to determine the raw score when given the z-score and the mean and SD of the distribution:

1. Begin with a z-score.
2. Multiply by the SD.
3. Add the mean.

In the previous example, Bob got a z-score of 1.2. Confirm his raw (X) score.

X = 5 × 1.2 + 32 = 6 + 32 = 38

Thus, Bob’s raw score was 38.

Just as we can transform any distribution to a new distribution with a mean of 0 and a standard deviation of 1 (a z-distribution), we can transform any z-distribution to a new distribution with any mean and standard deviation we desire.
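The Bob and Mary calculations, and the claim that standardizing every score gives a mean of 0 and an SD of 1, can be checked with a short Python sketch. The helper names `z_score` and `raw_score` are hypothetical. The quiz data are the ten scores from the earlier frequency table, and `statistics.stdev` uses the n − 1 denominator, matching the variance formula on the earlier slide.

```python
from statistics import mean, stdev


def z_score(x, m, s):
    """Subtract the mean, then divide by the standard deviation."""
    return (x - m) / s


def raw_score(z, m, s):
    """Reverse the process: multiply by the SD, then add the mean."""
    return s * z + m


z_bob = z_score(38, 32, 5)        # about 1.2
z_mary = z_score(29, 32, 5)       # about -0.6
x_bob = raw_score(z_bob, 32, 5)   # about 38, recovering Bob's raw score

# Standardize every score in the quiz distribution from the earlier table.
quiz = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
m, s = mean(quiz), stdev(quiz)    # s is the sample SD (n - 1 denominator)
z_quiz = [z_score(x, m, s) for x in quiz]

print(z_bob, z_mary, x_bob)
print(round(mean(z_quiz), 6), round(stdev(z_quiz), 6))
```

The last line should print values equal to 0 and 1 up to floating-point rounding, which is what “a mean of 0 and an SD of 1” means in practice.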
In general, we can create a distribution with any mean and SD with the following formula:

W = (new SD) · z + (new Mean)

or

$$W = S_W \cdot z + \bar{W}$$

Some Common Score Scales

    Scale     Mean   SD    Formula
    GRE       500    100   GRE = 100 · z + 500
    MAT        50     10   MAT = 10 · z + 50
    IQ        100     15   IQ = 15 · z + 100
    Stanine     5      2   Stanine = 2 · z + 5*

*Rounded to a whole number, with a minimum of 1 and a maximum of 9.

Area Transformations for Normal Distributions

If the area under a curve is adjusted so that the total area is 1.00, then the proportion of the area under the curve over an interval represents the proportion of scores in that interval. We can see that for the previous example:

[Figure: relative-frequency histogram of the quiz scores in which each of the 10 boxes has an area of .1; x-axis: Score X (1 to 5), y-axis: rf (.0 to .4)]

For example, the proportion of students who made a 2 on the quiz is equal to the area in the boxes representing a 2, or 2/10 = .2. Likewise, the proportion of students who scored 4 or less is .90, or 90%. This also indicates that a student who scored 4 had a percentile rank of 90.

For a Normal Distribution

The principle is exactly the same for a normal distribution as for any continuous distribution. However, finding the area under a continuous curve requires a level of mathematics not required for this course. Not to worry! Someone has computed the areas for the standard normal distribution (z-scores) and placed them in a table.

Assume the Total Area Under the Normal Curve is 1.00

[Figure: standard normal curve with a total area of 1.00; x-axis: z]

How do we determine the area under the curve for a z-score of 1.00? The proportion of the area under the curve represents the proportion of scores in that interval.

[Figure: standard normal curve with z = 0 and z = 1.0 marked on the x-axis]

What is the area under the curve below 1.0? In order to determine this value, we must learn how to use Table A, Areas under the Normal Curve (page 525).

Table A has three columns. Column A is the z-score; Column B is the area between the mean (z = 0) and the selected z-score; and Column C is the area beyond the selected z-score.
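For readers without the printed table at hand, the Column B and Column C areas can be approximated in Python. This is a minimal sketch, assuming the standard normal cumulative area is computed from the error function `math.erf`; the function name `table_a_areas` is hypothetical, and the values may differ from a printed table in the last digit because of rounding.

```python
import math


def table_a_areas(z):
    """Return (Column B, Column C) for a z-score, mimicking the table layout.

    Column B: area between the mean (z = 0) and z.
    Column C: area beyond z (the tail).
    The table lists only non-negative z, so the sign is dropped here.
    """
    z = abs(z)
    below = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # total area below z
    return below - 0.5, 1.0 - below


col_b, col_c = table_a_areas(1.00)
print(round(col_b, 4), round(col_c, 4))  # 0.3413 0.1587
```

For any z, Columns B and C add up to .5, because together they cover exactly one half of the curve.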
Note that for z = 1.00, the area in Column B is .3413 and the area in Column C is .1587.

[Figure: standard normal curve with the area below z = 1.0 shaded (.84) and the area beyond it unshaded (.1587)]

What is the area under the curve below 1.0? Since the total area under the curve is 1.00 and the area you do not want is .1587, the red shaded area (the area below 1.00) is 1.00 − .1587 = .8413 ≈ .84. We can convert this to a percentage by multiplying by 100. Thus, a z-score of 1.00 corresponds to a percentile rank of 84.

Find the percentile rank of a student whose z-score is -1.22

[Figure: standard normal curve with z = -1.22 marked to the left of z = 0]

First, we must use the Areas Under the Normal Curve Table to determine the area under the curve in the tail below the z-score. How can we use the table when it does not include negative z-values? Since the normal distribution is symmetrical, the upper end (+ end) and the lower end (- end) are exactly the same!

Thus, the area below z = -1.22 is the Column C area for z = 1.22, which is .1112, or .11 rounded off. Multiplying by 100, we find that the percentile rank is 11.

The Normal Curve Area Table can also be used to find a z-score when the percentile rank of a student is known. To do this, we note the area under the curve below the z-score. If the percentile rank is less than or equal to 50, you can go directly to Column C in Table A and determine the corresponding z-score.

What is the z-score for a student who had a percentile rank of 33?

First, look in Table A and find .3300, or as close to it as possible, in Column C. Note that the z-score is 0.44. Since the percentile is less than 50, the z-score must be negative. Therefore, the answer is -.44.

VERY IMPORTANT! Note that the z-score is negative. All z-scores for percentiles less than 50 will be negative, as they fall to the left of the mean on the graph.

[Figure: z-axis with negative z-scores to the left of z = 0 (the 50th percentile) and positive z-scores to the right]

What if a percentile is greater than 50? How do we find its equivalent z-score?
The simple answer is that we subtract the percentile from 100 and proceed just as for a percentile of less than 50, except that the z-score will be positive. However, it is easier to see when we use a graph.

What is the z-score for a percentile of 78?

[Figure: normal curve with 78% of the area to the left of the unknown z-score and 22% to the right; the 50th percentile is at z = 0]

The 78th percentile has to be to the right of the 50th percentile. Thus, the area to the right of the 78th percentile is 100 − 78 = 22, or .2200 in Column C.

Summary

• To find a z-score from a distribution with mean X̄ and standard deviation S:

$$z = \frac{X - \bar{X}}{S}$$

• To rescale a z-score to a new distribution with a new mean W̄ and a new standard deviation S_W:

$$W = S_W \cdot z + \bar{W}$$

Summary Continued

• To convert a score to a percentile rank in a normal distribution:
  1. Convert the score to a z-score.
  2. Draw a picture of a normal distribution.
  3. Place the z-score on the picture.
  4. Determine the area in the tail of the distribution.
  5. If the z-score is positive, subtract the area from 1.00 and convert to a percent.
  6. If the z-score is negative, convert the area to a percent.

Summary Continued

• To convert a percentile rank to another scale:
  1. Draw a picture of a normal distribution.
  2. Place the percentile rank in its approximate position.
  3. Determine the area in the tail.
  4. Find the z-score in the Normal Curve Area Table.
  5. Determine the sign of the z-score (- if PR < 50, + if PR > 50).
  6. Convert the z-score to the proper scale, i.e., use X = S · z + X̄.

A Comparison of Standard Scores

    z-scores      -3.0   -2.0   -1.0    0.0    1.0    2.0    3.0
    Percentiles      1      2     16     50     84     98     99
    MAT Scores      20     30     40     50     60     70     80
    GRE Scores     200    300    400    500    600    700    800
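The two summary procedures can also be sketched in Python, with the standard library's `statistics.NormalDist` (Python 3.8+) standing in for the printed table. This is an illustration under that assumption, not the table-lookup method the slides teach. The helper names `percentile_rank`, `z_from_percentile`, and `rescale` are hypothetical, and software values can differ slightly from table values found by choosing the closest entry.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def percentile_rank(z):
    """Percentage of the standard normal distribution at or below z."""
    return 100.0 * STANDARD_NORMAL.cdf(z)


def z_from_percentile(pr):
    """z-score whose area below equals pr percent (0 < pr < 100)."""
    return STANDARD_NORMAL.inv_cdf(pr / 100.0)


def rescale(z, new_mean, new_sd):
    """Convert a z-score to another scale: W = (new SD) * z + (new mean)."""
    return new_sd * z + new_mean


print(round(percentile_rank(1.00)))      # 84
print(round(percentile_rank(-1.22)))     # 11
print(round(z_from_percentile(33), 2))   # -0.44

z78 = z_from_percentile(78)              # positive, since 78 > 50
print(round(z78, 2))                     # 0.77
print(round(rescale(z78, 500, 100)))     # same standing on the GRE scale
```

The sign rule from the summary falls out automatically: percentiles below 50 give negative z-scores and percentiles above 50 give positive ones.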