Psych 524, 10/10/05

Normal Distributions and Sampling Distributions (based on Kirk, Ch. 9)

The Normal Distribution (9.1)

The normal distribution has many applications in science. Many populations are approximately normally distributed (e.g., weight, height, IQ scores). Binomial distributions tend toward the normal distribution as the number of trials increases. But binomial distributions are just one class of a larger set of sampling distributions (distributions of sample statistics), which are also often normally distributed. More on this later…

The normal distribution is defined by an equation that tells us the height of the distribution for any given score:

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2 / (2\sigma^2)}$$

(remember, π ≈ 3.142 and e ≈ 2.718)

You will not be asked to apply this formula! However, please note that the formula depends on the values of µ and σ. Thus, the height of the curve will vary with the mean and the standard deviation.

It is also important to know that one standard deviation unit on either side of the mean marks an inflection point, which is where the shape of the curve turns from being concave to convex (bulging in to bulging out) or vice versa.

Converting from Scores to Z-Scores to Areas (9.1)

Because the formula tells us the height of the curve at every score (and, by extension, the area under any interval of the curve), we can convert back and forth between scores and areas. Instead of using the formula, though, we can use computers or tables (e.g., Table D.2) to help us with this.

Because there are infinitely many normal distributions (one for every possible combination of mean and standard deviation!), using tables to perform these conversions would be unwieldy… we would need an infinite number of tables.

But all normal distributions take on the same characteristic shape: symmetric about the mean, with the inflection points occurring one standard deviation from the mean.
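The handout notes that a computer can stand in for both the formula and the tables. As a minimal sketch, assuming Python's standard `statistics.NormalDist` class, the snippet below evaluates the height of the curve directly from the formula and finds the area between two scores for any mean and standard deviation. The function names `normal_height` and `area_between` and the IQ-style values (mean 100, SD 15) are illustrative choices, not part of the course materials.

```python
import math
from statistics import NormalDist


def normal_height(x, mu, sigma):
    """Height of the normal curve at score x, computed from the density formula."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


def area_between(low, high, mu, sigma):
    """Area under the normal curve between two scores (low <= high)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)


# Illustrative values only: an IQ-style scale with mean 100 and SD 15.
height_at_mean = normal_height(100, 100, 15)
height_one_sd_above = normal_height(115, 100, 15)
area_within_one_sd = area_between(85, 115, 100, 15)

print("height at the mean:      ", round(height_at_mean, 4))
print("height one SD above:     ", round(height_one_sd_above, 4))
print("area within one SD:      ", round(area_within_one_sd, 4))
```

The height is largest at the mean and drops off on either side, and the area within one standard deviation of the mean should come out near 0.68. Changing `mu` or `sigma` changes the heights, which is the point the handout makes about the formula depending on both values.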
[Figure: standard normal distribution]

If we convert all scores into standard deviation units (e.g., a score of 115 on an IQ test with a mean of 100 and a standard deviation of 15 is one standard deviation unit above the mean), we will not change the basic shape of the distribution. Instead, this linear transformation changes the scale into one that can be applied consistently to all normal distributions (the mean is 0 and the standard deviation is 1, with the inflection points occurring one SD unit above and below the mean).

The distribution that results from this scale change is referred to as the standard normal distribution. The scale of the scores is now in standard deviation units, and these scores are referred to as standard scores, or z-scores.

The formula for computing z-scores takes the general form

z = (original score − mean) / standard deviation

Again, we’re just quantifying how many standard deviation units a score lies from the mean. These standard scores can be positive (above the mean) or negative (below the mean).

Because normal distributions arise in many situations, the exact symbols we use in the z-score formula vary from context to context. For example:

for a population: $z = \dfrac{X - \mu}{\sigma}$

for a sample: $z = \dfrac{X - \bar{X}}{S}$

for a sampling distribution of the mean: $z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$ …more on this later!

In order to use the tables to convert a score (or a statistic) into an area under the curve, we must first convert the score into a z-score using the appropriate z-score formula. The table will then give you information about the area under the curve.

Refer to page 1 of “In Class Exercise”

Converting from Areas to Z-Scores to Scores (9.1)

Here we use the opposite process. We know an area, look up that area in the table to get the z-score, then convert the z-score to a score. To make this final conversion, we can use the same z-score formulas given above, but we must solve for X.
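Both directions of the conversion (score to z-score to area, and area to z-score to score) can be sketched in a few lines. This is a hedged illustration that uses Python's standard `statistics.NormalDist` in place of a printed table; the helper names are hypothetical, and "area" here is assumed to mean the area below the z-score, which may differ from how a particular table is laid out.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(0.0, 1.0)


def score_to_z(x, mean, sd):
    """How many standard deviation units the score x lies from the mean."""
    return (x - mean) / sd


def z_to_area_below(z):
    """Proportion of the standard normal curve lying below z."""
    return STANDARD_NORMAL.cdf(z)


def area_below_to_z(area):
    """z-score that has the given proportion (0 < area < 1) of the curve below it."""
    return STANDARD_NORMAL.inv_cdf(area)


def z_to_score(z, mean, sd):
    """Solve the z-score formula for the original score."""
    return mean + z * sd


# Score -> z -> area, using the IQ example from the text (mean 100, SD 15).
z = score_to_z(115, 100, 15)
area = z_to_area_below(z)
print("z =", z, "area below =", round(area, 4))

# Area -> z -> score: which IQ score has 90% of the distribution below it?
z_90 = area_below_to_z(0.90)
print("z =", round(z_90, 3), "score =", round(z_to_score(z_90, 100, 15), 1))
```

A score of 115 gives z = 1.0, with about 0.84 of the area below it. The same four helpers work for any of the z-score contexts listed above, as long as the matching mean and standard deviation are passed in.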
For example, for a population: $X = \mu + z\sigma$

Refer to page 2 of “In Class Exercise”

Z-scores and Percentile Ranks

Note that the area under a curve represents a proportion; therefore, z-scores can easily be converted to percentile ranks. For example, about 84% of the area lies below z = 1, so a z-score of 1 corresponds to a percentile rank of about 84. These relationships can be depicted on a normal curve:

[Figure: normal curve showing z-scores and the corresponding percentile ranks]

It’s important to note that if we were to graph percentile rank on the x-axis using even intervals, the resulting distribution would be rectangular:

[Figure: rectangular distribution with percentile rank on the x-axis]

Although percentile ranks are easily interpretable by the general public, they are less tractable than z-scores and are therefore of little use to statisticians.

Sampling Distributions (9.3)

Recall that in statistical inference the goal is to draw a conclusion about a population (e.g., make an estimate or conduct a hypothesis test) based on a sample drawn from that population.

If we repeatedly sample from a given population, the sample statistics (e.g., the mean and SD) will vary from sample to sample; we can therefore generate a distribution that illustrates this variability.

Sampling Distribution: theoretical relative frequency distribution of the values of a statistic (not regular scores!!) that would be obtained by chance from an infinite number of samples of a particular size drawn from a given population.

This is key to understanding inferential statistics! The sampling distribution allows us to determine which sample statistics are likely to occur by chance and with what probability. When we obtain a statistic from a sample, we are interested in knowing where that statistic falls in a hypothesized sampling distribution. When we conduct hypothesis tests, our hypotheses are evaluated based on distributions of hypothetical samples, not distributions of scores.
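The idea of repeatedly sampling and watching a statistic vary can be tried out with a small simulation. This is only a sketch: a true sampling distribution assumes an infinite number of samples, whereas the code draws a finite number (2,000) from an assumed normal population with mean 100 and SD 15. The function name, sample size, and seed are arbitrary choices made for illustration.

```python
import random
import statistics


def simulate_sampling_distribution(statistic, sample_size, num_samples,
                                   mu, sigma, seed=0):
    """Draw many random samples and record the chosen statistic for each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        values.append(statistic(sample))
    return values


# Approximate sampling distributions of the mean and of the SD for n = 10.
means = simulate_sampling_distribution(statistics.mean, 10, 2000, 100, 15)
sds = simulate_sampling_distribution(statistics.stdev, 10, 2000, 100, 15)

print("sample means range from", round(min(means), 1), "to", round(max(means), 1))
print("sample SDs range from  ", round(min(sds), 1), "to", round(max(sds), 1))
print("average of the sample means:", round(statistics.mean(means), 2))
```

Each entry in `means` or `sds` is a statistic computed from one sample, not an individual score, which is the distinction the definition above stresses. A histogram of either list would approximate the corresponding sampling distribution.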
Sampling Distribution of the Mean (SDOM): theoretical relative frequency distribution of all means that would be obtained by chance from an infinite number of samples of a particular size drawn from a given population.

Refer to demo at http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/ (to try this at home, your computer will need to have Java installed)

Relationship between SDOM and the Population Distribution of Scores (9.3)

Mean of the SDOM

The expected value of the mean of the SDOM is simply the mean of the population of scores; we would expect the “mean of the means” to be the mean of the population. Symbolically, we say:

$$\mu_{\bar{X}} = \mu_X$$

Standard Deviation of the SDOM

Thought question: If we take samples of size 2 or larger, would the means we obtain be more or less variable than the actual scores?

It turns out that the standard deviation of the SDOM, which is technically called the standard error, can be quantified as:

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$

Thus, as the sample size gets larger, the denominator of the equation gets larger, which means that, all else being equal, the standard error (standard deviation of the SDOM) gets smaller. This is closely related to the law of large numbers, and it makes sense because larger samples give you more accurate estimates of the mean, which means there will be less variability in those estimates.

Shape of the SDOM

If the population of scores is normally distributed, we would expect the SDOM to be normally distributed, too. But what about populations that are not normally distributed?

Central Limit Theorem: If random samples are selected from a population with a mean of μ and SD of σ, then as the sample size (n) increases, the SDOM approaches a normal distribution with a mean of μ and SD of $\sigma_X / \sqrt{n}$.

So how big does the sample size need to be for the SDOM of a non-normal distribution to approach normal? The answer will depend on how non-normal the distribution is.
Many texts say n ≥ 25 or 30 is large enough, but Kirk says about 100. Experimenting with the demo (http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/) should give you an intuitive sense of the central limit theorem.
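For readers who cannot run the Java demo, a rough simulation can give a similar feel for the central limit theorem and the standard error. The sketch below is an illustration under stated assumptions, not a replacement for the demo: it assumes a strongly skewed exponential population (mean 1, SD 1), draws 5,000 samples at each sample size, and uses a simple skewness measure as a stand-in for "how non-normal" the resulting distribution of means is. The function names and the chosen sample sizes are hypothetical.

```python
import math
import random
import statistics


def sample_means(sample_size, num_samples, rng):
    """Means of repeated random samples from a skewed (exponential) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


def skewness(values):
    """Average cubed standardized deviation; close to 0 for a symmetric shape."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - center) / spread) ** 3 for v in values])


def summarize(sample_size, num_samples=5000, seed=1):
    """Compare the simulated distribution of means with what the theory predicts."""
    rng = random.Random(seed)
    means = sample_means(sample_size, num_samples, rng)
    return {
        "n": sample_size,
        "mean_of_means": statistics.fmean(means),        # theory: population mean (1.0)
        "sd_of_means": statistics.stdev(means),          # simulated standard error
        "predicted_se": 1.0 / math.sqrt(sample_size),    # sigma / sqrt(n), sigma = 1
        "skewness": skewness(means),                     # shrinks as n grows
    }


for n in (2, 25, 100):
    print(summarize(n))
```

As n increases, the mean of the means should stay near the population mean, the simulated standard error should track σ/√n, and the skewness should shrink toward 0. Swapping in a different population (by changing the call inside `sample_means`) is one way to explore how the required sample size depends on how non-normal the population is.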