Chapter 2
Location of a Distribution – How an individual falls within a distribution

Measures of Relative Standing and Density Curves:

#1. Z-Scores – tell how many standard deviations away from the mean the original value falls, and in which direction.
** a z-score is a standardized value
a) Standardizing – converting scores from original values to standard deviation units.
** If x is an observation from a distribution that has a known mean and std. deviation, then:
      z = (x - mean) / (standard deviation)
** must know the mean and std. deviation
** Observation larger than the mean = positive z
** Observation smaller than the mean = negative z
DO EX. 2.1 P. 117
* We standardize observations from symmetric distributions to express them on a common scale
DO EXERCISES P.118 #1-4

#2. Percentiles
a) remember: pth percentile – the value with p percent of the observations less than or equal to it
b) Ex. #2.3 – if Sally got a 72 and only 2 of the 25 test scores in the class are at or below hers, then: 2/25 = .08 = 8%, so she scored at the 8th percentile
c) By this definition the top score is at 25/25 = 100%. However, some people define the pth percentile of a distribution as the value with p percent of observations below it. Under that definition no score can be at the 100th percentile, which is why the highest percentile reported is always the 99th.
d) The % of observations falling at or below a particular z-score depends on the shape of the distribution
** An observation that is equal to the mean has a z-score of 0
** Heavily Left Skewed – mean < median, so this observation (z = 0) will be somewhere below the 50th percentile (the median)

Chebyshev’s Inequality – a result that describes the % of observations in any distribution that fall within a specified # of standard deviations of the mean.
** In any distribution, the % of observations falling within k standard deviations of the mean is at least (100)(1 - 1/k²)
Ex.
k = 3: (100)(1 - 1/3²) = 100(1 - 1/9) ≈ 88.9, so at least about 89% of observations fall within 3 standard deviations of the mean
** gives us insight into how observations are distributed within distributions
------------------------------------------------------------------------------------------------------------
Density Curves

Strategy for exploring data from a single quantitative variable:
1. Always plot your data: make a graph, usually a histogram or a stemplot
2. Look for the overall pattern (shape, center, spread) and for striking deviations such as outliers
3. Calculate a numerical summary to briefly describe center and spread.
4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve – it helps us describe the location of individual observations in a distribution
a) the curve is a MATHEMATICAL MODEL for the distribution
** a description that gives a compact picture of the overall pattern of the data but ignores minor irregularities as well as any outliers.
** It is easier to work with a smooth curve than with a histogram because a histogram depends on our choice of classes; the curve doesn’t depend on any choices we make

Density Curve – a curve that:
a) is always on or above the horizontal axis
b) has area exactly 1 underneath it (areas represent proportions of the total # of observations)
c) is an approximation that is easy to use and accurate enough for practical use
** it describes the overall pattern of a distribution.
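Quick numerical check (not from the book): a minimal Python sketch of properties a) and b), using a made-up triangular density f(x) = 2x on 0 ≤ x ≤ 1. The curve, the function names, and the midpoint-rule step count are all illustrative assumptions.

```python
def density(x):
    """Hypothetical density curve: rises in a straight line from 0 at x=0 to 2 at x=1."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Property a): the curve is never below the horizontal axis (spot check on a grid).
never_negative = all(density(i / 100) >= 0 for i in range(-50, 151))

# Property b): the total area under the curve is 1.
total_area = area_under(density, 0.0, 1.0)

# Areas stand for proportions: the share of observations between 0.5 and 1.
upper_half_share = area_under(density, 0.5, 1.0)

print(never_negative, round(total_area, 6), round(upper_half_share, 6))
```

For this made-up curve, three quarters of the area sits above the interval from 0.5 to 1, so the curve is saying that about 75% of observations fall there.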
** The area under the curve and above any interval of values is the proportion of all observations that fall in that interval

Normal Density Curve ** mean and median at center
Left Skewed Density Curve ** mean to the left of median
Right Skewed Density Curve ** mean to the right of median
Outliers – deviations from the overall pattern; they are not described by the curve

Median of Density Curve – “equal areas point” – the point with half of the area under the curve to its left and the remaining half of the area to its right (divides the area under the curve in half)
a) Quartiles – divide the area under the curve into quarters
b) Symmetric Density Curves – the median is at the center of the curve

Mean of Density Curve – “the balance point” – the point at which the curve would balance if made of solid material.
a) Symmetric Density Curves – mean and median are equal and at the center
b) The mean of a skewed distribution is pulled toward the long tail

Notation:
µ (mu) = Mean of Density Curve
σ (sigma) = Standard Deviation of Density Curve
Do Exercises p. 128 #2.9-2.13, 2.15-2.20
------------------------------------------------------------------------------------------------------------
2.2 Normal Distributions – density curves that are symmetric, single-peaked and bell-shaped (normal curves)
** All normal distributions have the same overall shape
** µ (mean) and median are equal and at the center of the symmetric curve
** Changing µ without changing σ moves the normal curve along the horizontal axis without changing its spread.
** σ controls the spread of the normal curve: the larger the σ, the more spread out the curve
** As we move out in either direction from the center µ, the curve changes from falling ever more steeply to falling ever less steeply
Inflection Points – the points at which this change of curvature takes place; they are located at a distance of σ on either side of µ

Special Properties of Normal Distribution:
1. µ and σ completely specify the distribution
2.
the shape of the density curve reveals σ (the distance from µ to either inflection point)

Why are Normal Distributions Important in Statistics?
1. Normal distributions are good descriptions for some distributions of real data (ex. scores on tests (bell curve), characteristics of biological populations)
2. They are good approximations for the results of many kinds of chance outcomes (such as flipping a coin many times)
3. Many statistical inference procedures based on normal distributions work well for other roughly symmetric distributions.

Normal distributions obey the following rule:
The 68-95-99.7 Rule (also known as the Empirical Rule) – in the normal distribution with mean µ and standard deviation σ:
1. Approximately 68% of observations fall within 1σ of the mean
2. Approximately 95% of observations fall within 2σ of the mean
3. Approximately 99.7% of observations fall within 3σ of the mean
** Abbreviation for the normal distribution with mean µ and std. deviation σ is: N(µ,σ)
Do Ex. p. 137 2.23-2.26, 2.28

Standard Normal Distribution – the normal distribution N(0,1) with mean 0 and standard deviation 1.
** If a variable x has a normal distribution N(µ,σ), then the standardized variable
      z = (x - µ) / σ
   has the standard normal distribution.
Standard Normal Table – gives areas under the standard normal curve (front cover of book); the area listed is to the LEFT of z
Ex. the area to the left of z = 2.22 is .9868
** be sure to pay attention to whether the shading is to the left or right – if to the right, must do 1 - (table area)
Do Ex. p. 142 #2.29-2.30

Solving Problems Involving Normal Distributions:
1. State the problem in terms of the observed variable x. Draw a picture of the distribution and shade the area of interest under the curve.
2. Standardize and draw a picture – standardize x to restate the problem in terms of a standard normal variable z
3. Use the table
4.
Conclusion – write a conclusion in the context of the problem
Do Example 2.9 in book
** In a normal distribution, x > # and x ≥ # give the same area because there is no area under the curve exactly above a single point on the horizontal axis (this isn’t necessarily true of actual data)
** If we meet a z that falls outside the range of Table A, look at the closest value that is in the table: for a z below the table’s range, the area to the left can be taken to be 0 (and for a z above the range, 1) with little loss of accuracy

Finding a Value Given a Proportion – use Table A backwards.
Do Example 2.11 on p. 146
Do Exercises p. 147 #2.31-2.36
------------------------------------------------------------------------------------------------------------
Assessing Normality

Method #1: Construct a histogram or stemplot
a) See if the graph is ≈ bell-shaped and symmetric about the mean
b) These graphs can reveal non-normal features such as outliers and skewness
c) Improve effectiveness by marking the points x̄, x̄ ± s, and x̄ ± 2s on the horizontal axis (x̄ = sample mean, s = sample standard deviation)
d) Compare the number of observations in each interval with the 68-95-99.7 rule

Method #2: Construct a Normal Probability Plot (use calculator or software)
a) Arrange the data from smallest to largest and record what percentile of the data each value occupies. Ex. in a set of 20, the smallest observation is at the 5% point
b) Use the standard normal distribution (Table A) to find the z-scores at the same percentiles. Ex. z = -1.645 is the 5% point of the standard normal distribution
c) Plot each data point x against the corresponding z
** If the data distribution is close to normal, the plotted points will lie close to some straight line

Use of Normal Probability Plots
** If the points on a normal probability plot lie close to a straight line, the plot indicates the data are NORMAL
--Systematic deviations from a straight line indicate a NON-NORMAL distribution.
--Outliers appear as points that are far away from the overall pattern of the plot.
--Right Skewed distribution = the largest observations fall distinctly above a line drawn through the main body of points
--Left Skewed distribution = the smallest observations fall below the line
** Minor wiggles are ok; look for shapes that show a clear departure from Normality
Do Ex. p. 154
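
Quick check with code (not from the book): a minimal Python sketch of the chapter's main calculations, assuming Python 3.8+ and using the standard-library statistics.NormalDist in place of Table A. The function names, the sample numbers, and the (i - 0.5)/n percentile convention in the probability-plot helper are illustrative assumptions; the notes' own example puts the smallest of 20 values at the 5% point instead.

```python
from statistics import NormalDist

# Standard normal distribution N(0,1); its cdf plays the role of Table A.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd


def chebyshev_percent(k):
    """Minimum % of observations within k std. deviations, for any distribution."""
    return 100 * (1 - 1 / k**2)


def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return STD_NORMAL.cdf(z)


def area_within(k):
    """Area within k standard deviations of the mean in a normal distribution."""
    return area_left(k) - area_left(-k)


def normal_plot_points(data):
    """Return (z, x) pairs for a normal probability plot.

    Assumed convention: the i-th smallest of n values sits at percentile
    (i - 0.5) / n, which keeps the largest value away from the 100% point.
    """
    ordered = sorted(data)
    n = len(ordered)
    return [
        (STD_NORMAL.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical score of 86 in a distribution with mean 80 and std. deviation 6.
print(z_score(86, 80, 6))                  # 1.0
print(round(chebyshev_percent(3), 1))      # 88.9
print(round(area_left(2.22), 4))           # 0.9868
print(round(1 - area_left(2.22), 4))       # area to the RIGHT of z = 2.22
print([round(100 * area_within(k), 1) for k in (1, 2, 3)])  # [68.3, 95.4, 99.7]
print(round(STD_NORMAL.inv_cdf(0.05), 3))  # -1.645, the 5% point

# Made-up data: if these (z, x) points fall near a straight line, the data look normal.
for z, x in normal_plot_points([61, 70, 74, 77, 80, 83, 86, 95]):
    print(round(z, 2), x)
```

Plotting the (z, x) pairs with x on the vertical axis matches step c) of Method #2, so the skewness rules above (largest observations above the line, smallest below it) read the same way.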