Chapter 2
Location of a Distribution – How an individual falls within a distribution
Measures of Relative Standing and Density Curves:
#1. Z-Scores (standardized values) – tell how many standard deviations away from the mean the original value falls, and in which direction.
a) Standardizing – converting scores from original values to standard deviation
units.
** If x is an observation from a distribution that has a known mean and std. deviation
then:
$z = \frac{x - \text{mean}}{\text{standard deviation}}$
**must know mean and std. deviation
** Observation larger than the mean = positive z-score
** Observation smaller than the mean = negative z-score
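A minimal Python sketch of the standardizing formula. The class mean of 80 and standard deviation of 5 are made-up numbers for illustration, and `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Hypothetical test with mean 80 and standard deviation 5
print(z_score(72, 80, 5))  # -1.6: below the mean, so negative
print(z_score(90, 80, 5))  # 2.0: above the mean, so positive
```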
DO EX. 2.1 P. 117
* We standardize observations from symmetric distributions to express them on a
common scale
DO EXERCISES P.118 #1-4
#2. Percentiles
a) Remember: the pth percentile is the value with p percent of the observations less than or equal to it
b) Ex. 2.3 – if Sally got a 72 and only 2 of the 25 test scores in the class are at or below hers, then 2/25 = .08 = 8%, so she scored at the 8th percentile
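c) The highest score has all 25 scores at or below it: 25/25 = 100%, so it is at the 100th percentile
--however, some people define the pth percentile of a distribution as the value with p percent of the observations strictly below it; under that definition no value can reach the 100th percentile, which is why the highest percentile reported is often the 99th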
d) The % of observations falling at or below a particular z score depends on the
shape of the distribution
** An observation that is equal to the mean has a z score of 0
** Heavily Left Skewed – mean < median, so an observation at the mean (z = 0) will be somewhere below the 50th percentile (the median)
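Both percentile definitions can be compared with a small Python sketch. The class of 25 scores is hypothetical, built so that, as in Ex. 2.3, only 2 scores are at or below 72; the function names are made up for illustration.

```python
def percentile_at_or_below(scores, x):
    """Percent of observations less than or equal to x."""
    return 100 * sum(s <= x for s in scores) / len(scores)

def percentile_strictly_below(scores, x):
    """Alternative definition: percent of observations strictly less than x."""
    return 100 * sum(s < x for s in scores) / len(scores)

# Hypothetical class of 25 scores: only 65 and 72 are at or below 72
scores = [65, 72] + list(range(75, 98))

print(percentile_at_or_below(scores, 72))     # 8.0 (Sally's 8th percentile)
print(percentile_at_or_below(scores, 97))     # 100.0 (top score)
print(percentile_strictly_below(scores, 97))  # 96.0 (top score never reaches 100)
```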
Chebyshev’s Inequality – a result that describes the % of observations in any distribution that fall within a specified # of standard deviations of the mean.
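** In any distribution, the % of observations falling within k standard deviations of the mean is at least $100\left(1 - \frac{1}{k^2}\right)$ (informative only for k > 1)
Ex. k = 3: $100\left(1 - \frac{1}{3^2}\right) = 100\left(1 - \frac{1}{9}\right) \approx 88.9$, so at least 88.9% (about 89%) of observations fall within 3 standard deviations of the mean
** gives us insight into how observations are spread out in any distribution, whatever its shape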
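A short Python sketch that computes Chebyshev's bound and compares it with the actual percent within k standard deviations for a hypothetical right-skewed data set. It uses the population standard deviation (`statistics.pstdev`), for which the inequality holds exactly for the data set itself; the helper names are hypothetical.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum percent of observations within k standard deviations (k > 1)."""
    return 100 * (1 - 1 / k**2)

def percent_within(data, k):
    """Actual percent of observations within k standard deviations of the mean."""
    m, s = mean(data), pstdev(data)
    return 100 * sum(abs(x - m) <= k * s for x in data) / len(data)

# Hypothetical right-skewed data set
data = [1, 1, 1, 2, 2, 3, 4, 6, 9, 20]
for k in (2, 3):
    print(k, round(chebyshev_bound(k), 1), percent_within(data, k))
```

The actual percents are usually well above the bound; Chebyshev only guarantees a floor that works for every shape.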
------------------------------------------------------------------------------------------------------------
Density Curves
Strategy for exploring data from a single quantitative variable:
1. Always plot your data: make a graph, usually a histogram or a stemplot
2. Look for the overall pattern (shape, center, spread) and for striking deviations such as outliers
3. Calculate a numerical summary to briefly describe center and spread.
4. Sometimes the overall pattern of a large number of observations is so regular that
we can describe it by a smooth curve – it helps us describe the location of
individual observations in a distribution
a) curve is a MATHEMATICAL MODEL for the distribution
**a description that gives a compact picture of the overall pattern of the data but ignores minor irregularities as well as any outliers.
** It is easier to work with a smooth curve than with a histogram, because a histogram depends on our choice of classes, while the curve does not depend on any choices we make
Density Curve – a curve that:
a) is always on or above the horizontal axis
b) has an area of exactly 1 underneath it (area represents proportions of the total # of observations)
c) is an approximation that is easy to use and accurate enough for practical use
**it describes the overall pattern of a distribution. The area under the curve and above
any interval of values is the proportion of all observations that fall in that interval
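To see the area properties numerically, here is a small Python sketch. The density curve f(x) = 2x on [0, 1] is a hypothetical example chosen only because its areas are easy to check by hand (total area 1, area over [0, 0.5] equal to 0.25); `area` is a hypothetical helper that approximates area with the midpoint rule.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], 0 elsewhere (never negative)."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

print(area(density, 0, 1))    # total area under the curve: about 1
print(area(density, 0, 0.5))  # proportion of observations between 0 and 0.5
```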
Normal Density Curves
**mean and median at center
Left Skewed Density Curve
**mean to left of median
Right Skewed Density Curve
**mean to the right of median
Outliers, which are deviations from the overall pattern, are not described by the curve
Median of Density Curve – “equal areas point” – the point with half of the area under
the curve to its left and the remaining half of the area to its right
(divides the area under the curve in half)
a) Quartiles – divide the area under the curve into quarters
b) Symmetric Density Curves – the median of an exactly symmetric density curve is at its center
Mean of Density Curve – “the balance point” – the point at which the curve would
balance if made of solid material.
a) Symmetric Density Curves – mean and median are equal and at the center
b) Mean of a skewed distribution is pulled toward the long tail
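Using the hypothetical density f(x) = 2x on [0, 1], which has a long tail to the left, the sketch below finds the median as the equal-areas point (by bisection) and the mean as the balance point (the area under x·f(x)). The helper names are made up for illustration. The exact answers are √0.5 ≈ 0.707 for the median and 2/3 ≈ 0.667 for the mean, so the mean is pulled toward the long left tail.

```python
def density(x):
    """Hypothetical left-skewed density curve: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, n=10_000):
    """Approximate the area under f between a and b (midpoint rule)."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def median_of_density(f, lo, hi, tol=1e-9):
    """Equal-areas point: bisect for m with half the total area to its left."""
    half = area(f, lo, hi) / 2
    a, b = lo, hi
    while b - a > tol:
        m = (a + b) / 2
        if area(f, lo, m) < half:
            a = m  # not enough area yet: the median is further right
        else:
            b = m
    return (a + b) / 2

def mean_of_density(f, lo, hi):
    """Balance point: the area under x * f(x)."""
    return area(lambda x: x * f(x), lo, hi)

print(round(median_of_density(density, 0, 1), 3))  # about 0.707
print(round(mean_of_density(density, 0, 1), 3))    # about 0.667
```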
Notation: µ (mu) = Mean of Density Curve
σ (sigma) = Standard Deviation of Density Curve
Do Exercises p. 128 #2.9-2.13, 2.15-2.20
------------------------------------------------------------------------------------------------------------
2.2 Normal Distributions – density curves that are symmetric, single-peaked, and bell-shaped (normal curves)
**All normal distributions have the same overall shape
**µ (mean) and median are both at the center of the symmetric curve
** Changing µ (mean) without changing σ moves the normal curve along the horizontal axis without changing its spread.
---The σ controls the spread of the normal curve – the larger the σ, the more
spread out the curve
** as we move out in either direction from the center µ, the curve changes from
falling ever more steeply to falling ever less steeply
Inflection Points – the points at which the change of curvature takes place; they are located at a distance σ on either side of µ
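For readers who know some calculus, a short derivation of where the inflection points sit. It uses the normal density formula, which these notes do not state; the result is the standard one.

```latex
\begin{aligned}
f(x)   &= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)} \\
f'(x)  &= -\frac{x-\mu}{\sigma^2}\, f(x) \\
f''(x) &= f(x)\left[\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right] = 0
          \iff (x-\mu)^2 = \sigma^2 \iff x = \mu \pm \sigma
\end{aligned}
```

Since f(x) > 0 everywhere, f''(x) < 0 for |x - µ| < σ (the curve falls ever more steeply moving away from µ) and f''(x) > 0 for |x - µ| > σ (it falls ever less steeply).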
Special Properties of Normal Distribution:
1. µ and σ completely specify the shape of the distribution
2. The shape of the density curve reveals σ (the inflection points lie at µ ± σ)
Why are Normal Distributions Important in Statistics?
1. Normal distributions are good descriptions for some distributions of real data (ex. test scores (the bell curve), characteristics of biological populations)
2. Good approximations for the results of many kinds of chance outcomes (such as
flipping a coin many times)
3. Many statistical inference procedures based on normal distributions work well for
other roughly symmetric distributions.
Normal distributions obey the following rule:
The 68-95-99.7 Rule – in the normal distribution with mean, µ, and standard deviation σ
1. Approximately 68% of observations fall within 1σ of the mean
2. Approximately 95% of observations fall within 2σ of the mean
3. Approximately 99.7% of observations fall within 3σ of the mean
(also known as Empirical Rule)
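The percentages in the rule are rounded. A quick Python check, assuming the standard fact that for a standard normal variable Z the area within k of 0 equals erf(k/√2); `prob_within` is a hypothetical helper name.

```python
from math import erf, sqrt

def prob_within(k):
    """Area under the standard normal curve between -k and k (within k sigma)."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 2))
```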
** The normal distribution with mean µ and standard deviation σ is abbreviated:
N(µ,σ)
Do Ex. p. 137 2.23-2.26, 2.28
Standard Normal Distribution – is the normal distribution N(0,1) with mean 0
and standard deviation of 1.
**If a variable x has a normal distribution N(µ,σ), then the standardized variable
$z = \frac{x - \mu}{\sigma}$
has the standard normal distribution N(0,1).
Standard Normal Table (Table A) – gives areas under the standard normal curve to the left of z (front cover of book)
Ex. the area to the left of z = 2.22 is .9868
**Pay attention to whether the shaded area is to the left or the right of z – the table gives the area to the left, so for an area to the right use 1 – (table value)
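Table A lookups can be checked in Python with the standard identity Φ(z) = ½(1 + erf(z/√2)) for the area to the left of z. This is a sketch for checking answers, not a replacement for the table; `phi` is a hypothetical helper name.

```python
from math import erf, sqrt

def phi(z):
    """Area under the standard normal curve to the LEFT of z (what Table A lists)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

left = phi(2.22)
right = 1 - left  # area to the RIGHT of z = 2.22
print(round(left, 4), round(right, 4))
```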
Do Ex. p. 142 #2.29-2.30
Solving Problems Involving Normal Distributions:
1. State the problem in terms of the observed variable x. Draw a picture of
distribution and shade the area of interest under the curve.
2. Standardize and draw a picture – standardize x to restate the problem in terms of a standard normal variable z
3. Use the table
4. Conclusion – write a conclusion in the context of the problem
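The four steps, sketched in Python for a hypothetical problem: x has distribution N(100, 15) and we want the proportion of observations above 130. `statistics.NormalDist` (Python 3.8+) stands in for Table A in step 3; the numbers are illustrative only.

```python
from statistics import NormalDist

# Step 1: state the problem in terms of x.
# Hypothetical: x has distribution N(100, 15); find the proportion with x > 130.
mu, sigma, cutoff = 100, 15, 130

# Step 2: standardize.
z = (cutoff - mu) / sigma  # 30 / 15 = 2.0

# Step 3: use the table (the standard normal CDF stands in for Table A).
area_left = NormalDist(0, 1).cdf(z)
area_right = 1 - area_left  # the area of interest is to the RIGHT of z

# Step 4: conclude in context.
print(f"About {area_right:.1%} of observations are above {cutoff}.")
```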
Do Example 2.9 in book
**In a normal distribution, x > # and x ≥ # give the same proportion, because there is no area under the curve exactly above any single point on the horizontal axis (this isn't necessarily true of actual data)
** If z falls outside the range of Table A, compare it with the closest value that is in the table; the tail area beyond such a z is so small that you can take it to be 0 with little loss of accuracy
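Finding a Value Given a Proportion – use Table A backwards: locate the proportion (area to the left) in the body of the table, read off the corresponding z, then unstandardize with x = µ + zσ.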
Do Example 2.11 on p. 146
Do Exercises p. 147 #2.31-2.36
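Reading Table A backwards amounts to inverting the standard normal CDF and then unstandardizing. A minimal sketch using `statistics.NormalDist.inv_cdf` (Python 3.8+); `value_for_proportion` is a hypothetical helper name and N(100, 15) is a made-up example.

```python
from statistics import NormalDist

def value_for_proportion(p, mu, sigma):
    """Return the x in N(mu, sigma) that has proportion p of observations to its left."""
    z = NormalDist().inv_cdf(p)  # Table A backwards: area to the left -> z
    return mu + z * sigma        # unstandardize: x = mu + z * sigma

# Hypothetical: 90th percentile of N(100, 15)
print(round(value_for_proportion(0.90, 100, 15), 1))
```

Here z is about 1.28, so the 90th percentile comes out at roughly 119.2.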
------------------------------------------------------------------------------------------------------------
Assessing Normality
Method #1: Construct a histogram or stemplot
a) See if graph is ≈ bell shaped and symmetric about the mean
b) These can reveal non-normal features such as outliers or skewness
c) Improve effectiveness by marking the points x̄, x̄ ± s, and x̄ ± 2s on the graph
d) Compare the number of observations in each interval with the 68-95-99.7 rule
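A rough numerical version of steps c) and d) for a hypothetical data set: it counts the observations within one and two sample standard deviations of x̄ so the percents can be compared with 68% and 95%. `percent_in_intervals` is a hypothetical helper name.

```python
from statistics import mean, stdev

def percent_in_intervals(data):
    """Percent of observations in x-bar ± s and in x-bar ± 2s."""
    xbar, s = mean(data), stdev(data)

    def percent_within(k):
        return 100 * sum(abs(x - xbar) <= k * s for x in data) / len(data)

    return percent_within(1), percent_within(2)

# Hypothetical sample of 14 measurements
data = [52, 55, 57, 58, 60, 60, 61, 62, 63, 64, 65, 66, 68, 71]
one_sd, two_sd = percent_in_intervals(data)
print(f"within x-bar ± s:  {one_sd:.0f}%  (normal: about 68%)")
print(f"within x-bar ± 2s: {two_sd:.0f}%  (normal: about 95%)")
```

With small samples the percents will not match 68% and 95% exactly; look for large gaps, not small ones.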
Method #2: Construct a Normal Probability Plot (use calculator or software)
a) Arrange the data from smallest to largest and record what percentile of the data each value occupies. Ex. in a set of 20, the smallest observation is at the 5% point
b) Use the standard normal distribution (Table A) to find the z scores at the same percentiles.
Ex. z = -1.645 is the 5% point of the standard normal distribution
c) Plot each data point x against the corresponding z
**if data distribution is close to normal—plotted points will be close to some
straight line
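A sketch of how the plotted points can be computed in Python. One deliberate difference from the example in a): this sketch assigns the i-th smallest of n values the percentile (i - 0.5)/n, a common software convention (an assumption here, not the book's rule), because the plain i/n rule puts the largest value at the 100% point, where z is infinite. `normal_plot_points` is a hypothetical helper name and the data are made up.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Return (z, x) pairs for a normal probability plot.

    The i-th smallest of n observations gets percentile (i - 0.5) / n,
    and z is the standard normal value at that same percentile.
    """
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(data), start=1)]

# Hypothetical data: plot x (vertical) against z (horizontal), look for a line
for z, x in normal_plot_points([4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7]):
    print(f"{z:6.2f}  {x}")
```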
Use of Normal Probability Plots
**If the points on a normal probability plot are close to a straight line, the plot indicates that the data are approximately NORMAL
--Systematic deviations from a straight line indicate a NON NORMAL distrib.
--Outliers appear as points that are far away from the overall pattern of the plot.
--Right Skewed distribution = the largest observations fall distinctly above a line drawn through the main body of points
--Left Skewed distribution = when the smallest observations fall below the line
**Minor wiggles are ok—look for shapes that show clear departure from Normality
Do Ex. p. 154