Standard Normal Distribution

Week 5
Dr. Jenne Meyer



Discuss syllabus
Ground rules
Introductions






Descriptive Statistics and Probability Distributions
Research and Sampling Designs
Research Methods and Business Decisions
Data Collection
Data Analysis
Correlation, Linear Regression, and Multiple Regression
Analysis
Research is…



• A systematic, controlled, empirical, and critical investigation of
hypothetical propositions about the presumed relations among
phenomena. (University of Phoenix (Ed.). (2001). Statistics and
research methods for managerial decisions [University of Phoenix
Custom Edition e-text]. Cincinnati, OH).
• A systematic, controlled, empirical, and critical investigation of
phenomena of interest to decision makers. (book definition).
• A systematic process of collecting and analyzing data or information in
order to increase our understanding of the phenomena about which
we are concerned or interested. (Leedy and Ormrod, 2001)
Research is…
a systematic, controlled, empirical, and
critical investigation of hypothetical
propositions about presumed relations among
phenomena.
• What happened
• How it happened
• Why it happened
(give meaning)




Business research is the primary means of
gathering data for decision making
Understanding of the research process can
lead to better decisions
Research helps make better decisions
Understanding the process can help you ask
the right questions







Identify and effectively solve minor problems in the work setting.
Know how to discriminate good from bad research.
Appreciate and be constantly aware of the multiple influences and
multiple effects of factors impinging on a situation.
Take calculated risks in decision making, knowing full well the
probabilities associated with the different possible outcomes.
Prevent possible vested interests from exercising their influence in
a situation.
Relate to hired researchers and consultants more effectively.
Combine experience with scientific knowledge while making
decisions.

Symbols
Σ (Uppercase Sigma) = Summation
μ (Mu) = Population mean
σ (Lowercase Sigma) = Standard deviation
π (Pi) = Probability of success in a binomial trial
ε (Epsilon) = Maximum allowable error
χ² (Chi Square) = Nonparametric hypothesis test
! = Factorial
H0 = Null hypothesis
H1 = Alternate hypothesis






A single value that summarizes a set of data.
It locates the center of the values
Arithmetic mean
Weighted mean
Median
Mode
Geometric mean
If x1, x2, ..., xn denote a sample of n observations, then the mean of the
sample is called "x-bar" and is denoted by:
$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
The mean of a population is denoted by the Greek letter μ:
$\mu = \frac{\sum X}{N}$





Every set of interval data has a mean
All values are included
Mean is unique - only one
Useful to compare two or more populations
Sum of the deviations of each value from the mean will
always be zero
Disadvantages of arithmetic mean
• Because the mean is sensitive to extreme values, it may not
always be a good representation of the data.
• Can’t use for open-ended (range) data

Example of a mean distorted by an outlier:
Consider the annual incomes of five families in a
neighborhood: $12K $12K $12K $13K $100K
• The mean income in this case: $29.8K
• The mean is pulled upward by the high-value outlier ($100K), so it
does not represent the typical income of this neighborhood well
• What we need in this case is a measurement that is less
sensitive to large values; we can consider using the median.


The midpoint of the values (exactly half are below, half are
above)
• If the number of observations is odd, the median is the “middle
observation”
• If the number of observations is even, the median is the mean or
average of the two middle observations





Used when the mean is not representative due to high value
outliers
Unique number
Not affected by extremely large or small values
Can be used with open-ended range values
Can be used for several measurement types

Using Previous Examples:
Five Incomes: $12K $12K $12K $13K $100K
Median is: $12K (better representation of
neighborhood)
• (# of observations is odd, take the middle value =
$12K)



The value that appears most frequently
▪ Five Incomes Example: $12K $12K $12K $13K $100K
▪ Mode is: $12K




Can be used for any measurement type
Not affected by extremely large or small
values
Sometimes it doesn’t exist
Sometimes it represents more than one value

Consider Previous Example: Neighborhood Income
• Mean income: $29.8K
• Median income: $12K
• Modal income: $12K

If you were trying to promote that this is an affluent neighborhood, you
might prefer to report the mean income.

If you were trying to argue against a tax increase, you might argue that
income is too low to afford a tax increase and report the median and/or
the mode.

Note: these are 3 different measures, each valid and informative in its own way.
Like all statistics, they have the potential to inform or misinform!
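
The three measures for the neighborhood income example can be checked with a short Python sketch. It uses only the standard-library `statistics` module; the variable name `incomes` is chosen here for illustration, and the values are in thousands of dollars.

```python
from statistics import mean, median, mode

# Annual incomes of the five families, in thousands of dollars.
incomes = [12, 12, 12, 13, 100]

income_mean = mean(incomes)      # pulled upward by the $100K outlier
income_median = median(incomes)  # middle value of the sorted data
income_mode = mode(incomes)      # most frequent value

print(f"Mean:   ${income_mean:.1f}K")
print(f"Median: ${income_median}K")
print(f"Mode:   ${income_mode}K")
```

Replacing the $100K income with a value near $13K would move the mean close to the median, which is one quick way to see how much a single outlier drives the mean.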





Range
Mean deviation
Variance
Standard deviation
Range = highest value – lowest value
Mean deviation – the arithmetic mean of the absolute values of the deviations
from the mean
• On average, the values deviate from the mean by this amount
Variance – the arithmetic mean of the squared deviations from the mean
• Used to compare the dispersion of two or more sets of data
Standard deviation – the square root of the variance
• Represents the spread or variability of the data: roughly the typical
distance of a value from the center point
[Figure: Normal curves with equal means and different standard deviations; horizontal axis: values of X, from 15 to 45]



simplest measure of variability or spread
Range = Max value – Min value
Can give a misleading picture of the actual pattern of
variation. Two distributions could have the same range
but different patterns of variation.
Is sensitive to extreme data values
[Figure: two dot plots over the values 20 to 30, showing the same range but different patterns of variation]

Population variance
=varp(…)
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

Sample variance
=var(…)
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Population standard deviation
=stdevp(…)
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Sample standard deviation
=stdev(…)
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

The sample standard deviation is the most commonly used
measure of dispersion:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Example:
Numbers                         Mean    Standard Deviation
100, 100, 100, 100, 100, 100    100     0
90, 90, 100, 110, 110           100     10
Computing the standard deviation:
• find the mean (100)
• find the deviation of each value from the mean (-10, -10, 0, 10, 10)
• square the deviations (100, 100, 0, 100, 100)
• sum the squared deviations (100+100+0+100+100 = 400)
• divide the sum by the # of values minus 1 to get the variance
(5 – 1 = 4, 400/4 = 100)
• take the square root of the variance (10)
(Will be important in research when you are trying to determine how widely
the data are spread.)
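
The steps above can be sketched in Python, one line per step, using the second data set from the example (90, 90, 100, 110, 110). The variable names are illustrative only; this is the sample formula, dividing by the number of values minus 1.

```python
from math import sqrt

values = [90, 90, 100, 110, 110]

m = sum(values) / len(values)                # find the mean
deviations = [x - m for x in values]         # deviation of each value from the mean
squared = [d ** 2 for d in deviations]       # square the deviations
total = sum(squared)                         # sum the squared deviations
sample_variance = total / (len(values) - 1)  # divide by (# of values - 1)
sample_sd = sqrt(sample_variance)            # square root of the variance

print(m, deviations, squared, total, sample_variance, sample_sd)
```

The standard-library call `statistics.stdev(values)` should give the same result in a single step.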
To compare dispersion in data sets with dissimilar
units of measurement (e.g., kilograms and ounces)
or dissimilar means (e.g., home prices in two
different cities) we define the coefficient of variation
(CV), which is a unit-free measure of dispersion:
$CV = 100 \times \frac{s}{\bar{x}}$
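
As a rough illustration of why the CV (100 times the sample standard deviation divided by the sample mean) is unit-free, the sketch below computes it for the same measurements expressed in kilograms and in ounces. The weights are made-up values for illustration, and the helper name `coefficient_of_variation` is an assumption, not something from the course materials.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    return 100 * stdev(data) / mean(data)

# Hypothetical weights, first in kilograms, then converted to ounces.
weights_kg = [2.0, 2.5, 3.0, 3.5, 4.0]
weights_oz = [kg * 35.274 for kg in weights_kg]

cv_kg = coefficient_of_variation(weights_kg)
cv_oz = coefficient_of_variation(weights_oz)

print(f"CV in kilograms: {cv_kg:.2f}%")
print(f"CV in ounces:    {cv_oz:.2f}%")
```

The two printed values agree because the conversion factor multiplies both the standard deviation and the mean, so it cancels in the ratio.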



Two Investments A & B
Which should I pick?
Choices:
Project
A
B
Mean % Std Dev
Return % Return
7.6
3.2
6.8
2.5

Normal distribution
[Figure: normal curve centered on the mean, with nested bands marked 68%, 95%, and 99%]




If all samples of a particular size are selected
from any population, the sampling
distribution of the sample mean is
approximately a normal distribution. This
approximation improves with larger samples.
(The larger the sample, the more closely the sampling
distribution resembles a normal distribution.)
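
A small simulation can make this statement concrete. The sketch below is an illustration under stated assumptions, not part of the course materials: it draws repeated samples from a strongly skewed (exponential) population with mean 1 and standard deviation 1, and compares the spread of the sample means with the theoretical value σ/√n. The seed, the number of repetitions, and the sample sizes are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_means(n, repetitions=2000):
    """Draw `repetitions` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the mean of each sample."""
    return [mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(repetitions)]

results = {n: sample_means(n) for n in (5, 30)}

for n, means in results.items():
    print(f"n={n:2d}: mean of sample means = {mean(means):.3f}, "
          f"sd of sample means = {stdev(means):.3f}, "
          f"sigma/sqrt(n) = {1 / n ** 0.5:.3f}")
```

Plotting a histogram of `results[30]` next to `results[5]` would show the larger-sample distribution looking more symmetric and bell shaped, even though the population itself is skewed.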
• The “chance” or “likelihood” of something happening
• a value between zero and one
▪ zero = “cannot happen”; one = “sure to happen”
▪ expressed as a decimal or fraction
[Figure: probability scale from 0 to 1 showing increasing likelihood of occurrence; at .5 the occurrence of the event is just as likely as it is unlikely]
• Discrete Probability (discrete random variables):
▪ fixed number of clearly separated outcomes
▪ examples: rolling a die (6 outcomes); coin flip (2 outcomes)
▪ Binomial Probability
• Continuous Probability (continuous random variables):
▪ infinite number of outcomes within a certain range
▪ example: life expectancies
▪ Workshop 4: find probabilities under bell shaped curve
Probabilities are individual, singular, and unique values;
the number of outcomes is limited;
graphed as bars or rectangles
[Figure: bar chart of probability P(x), from 0 to .50, against number of cars sold on a Saturday, x, from 0 to 4]
Not a smooth curve, unless the sample size gets large.




Probabilities:
are the area under the standard normal curve
can be an infinite number of values within a certain range
“Z” is a calculated value, indicating the number of standard
deviations from the mean.
• Experiment: a process involving chance or
probability that leads to results called outcomes.
• Outcome: the result of a single trial of an
experiment.
• Event: one or more outcomes of an experiment.
• Sample space: the set of all possible outcomes from
an experiment.






Independent Event: if the probability of one event is not affected or
changed by another
Example: Sampling With Replacement - taking random samples from a
population, then replacing the random sample before taking another. As
a result, each random sample is not affected by another. The population
remains with all data intact.
Dependent Event: if the probability of one event IS affected or changed
by another
Example: Sampling Without Replacement - take a random sample from a
population, then do not replace the sample before taking another. As a
result, each sample taken this way will affect each other. Each removed
sample changes the characteristics of the population.
Trial: the act of testing something

Classical probability: the outcomes of an experiment are equally likely.

Using this classical viewpoint, the probability of an event A is
P(A) = (Number of ways that A can occur) / (Total number of possible outcomes)
Experiment:
A spinner has 4 equal sectors colored yellow, blue,
green, and red. After spinning the spinner, what is the
probability of landing on each color?
Outcomes:
The possible outcomes of this experiment are yellow,
blue, green, and red
Probabilities:
P(yellow) = (number of ways to land on yellow) / (total number of colors) = 1/4
P(blue) = (number of ways to land on blue) / (total number of colors) = 1/4
P(green) = (number of ways to land on green) / (total number of colors) = 1/4
P(red) = (number of ways to land on red) / (total number of colors) = 1/4
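
The spinner example can be written as a short Python sketch of the classical rule (number of ways the event can occur divided by the total number of equally likely outcomes). The function name `classical_probability` is illustrative; `Fraction` keeps the answers as exact fractions.

```python
from fractions import Fraction

def classical_probability(ways, total_outcomes):
    """Classical probability: ways the event can occur / total equally likely outcomes."""
    return Fraction(ways, total_outcomes)

colors = ["yellow", "blue", "green", "red"]  # 4 equal sectors

# Each color can occur in exactly one way out of the four sectors.
probabilities = {color: classical_probability(1, len(colors)) for color in colors}

for color, p in probabilities.items():
    print(f"P({color}) = {p}")

print("Total:", sum(probabilities.values()))
```

The probabilities over the whole sample space add up to 1, which is a useful sanity check for any classical probability calculation.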
Chapter 7
Discrete Variable – each value of X has its own
probability P(X).
• Continuous Variable – events are intervals and
probabilities are areas underneath smooth
curves. A single point has no probability.
• Probability Density Function (PDF) – For a continuous random variable,
the PDF is an equation that shows the height of the curve f(x) at each
possible value of X over the range of X.
[Figure: Normal PDF]
Continuous PDFs:
• Denoted f(x)
• Total area under curve = 1
• Mean, variance and shape depend on the PDF parameters
• Reveals the shape of the distribution
[Figure: Normal PDF]
Probabilities as Areas
Continuous probability functions are smooth curves.
• Unlike discrete distributions, the area at any single point = 0.
• The entire area under any PDF must be 1.
• Mean is the balance point of the distribution.
Normal PDF f(x) reaches a maximum at μ
and has points of inflection at μ ± σ
Bell-shaped curve
Since for every value of μ and σ, there is a
different normal distribution, we transform a
normal random variable to a standard normal
distribution with μ = 0 and σ = 1 using the
formula:
$z = \frac{x - \mu}{\sigma}$
Denoted N(0,1)
A common scale from -3 to +3 is used.
Entire area under the curve is unity.
The probability of an event P(z1 < Z < z2) is a
definite integral of f(z).
However, standard normal tables or Excel
functions can be used to find the desired
probabilities.
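Now find P(Z > 1.96):
The area between 0 and 1.96 is .4750 and the area to the right of 0 is .5000,
so P(Z > 1.96) = .5000 - .4750 = .0250
(equivalently, P(Z < 1.96) = .5000 + .4750 = .9750).
Now find P(-1.96 < Z < 1.96).
Due to symmetry, P(-1.96 < Z) is the same as P(Z
< 1.96).
So, P(-1.96 < Z < 1.96) = .4750 + .4750 = .9500
or 95% of the area under the curve.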
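
Besides tables and Excel, the same areas can be cross-checked in Python. This is a minimal sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

left_tail = Z.cdf(1.96)                # P(Z < 1.96)
upper_tail = 1 - Z.cdf(1.96)           # P(Z > 1.96)
central = Z.cdf(1.96) - Z.cdf(-1.96)   # P(-1.96 < Z < 1.96)

print(f"P(Z < 1.96)         = {left_tail:.4f}")
print(f"P(Z > 1.96)         = {upper_tail:.4f}")
print(f"P(-1.96 < Z < 1.96) = {central:.4f}")
```

Like Excel, `cdf` returns left-tail areas, so upper tails and central areas are obtained by subtraction.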
Suppose John took an economics exam and
scored 86 points. The class mean was 75
with a standard deviation of 7. What
percentile is John in (i.e., find P(X < 86))?
$z_{John} = \frac{x - \mu}{\sigma} = \frac{86 - 75}{7} = \frac{11}{7} = 1.57$
So John’s score is 1.57 standard deviations
above the mean.
normal distribution
p(lower)    p(upper)    z       x     mean    std.dev
.9420       .0580       1.57    86    75      7
John is approximately in the 94th percentile
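
John's z-score and percentile can be reproduced with a few lines of Python. This is a sketch using the standard-library `statistics.NormalDist`, with illustrative variable names; it assumes, as the example does, that exam scores are normally distributed.

```python
from statistics import NormalDist

score, class_mean, class_sd = 86, 75, 7

# Number of standard deviations John's score lies above the class mean.
z = (score - class_mean) / class_sd

# P(X < 86) under a normal distribution with the class mean and standard deviation.
percentile = NormalDist(mu=class_mean, sigma=class_sd).cdf(score)

print(f"z = {z:.2f}")
print(f"P(X < 86) = {percentile:.4f}")
```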
For example, let μ = 2.040 cm and σ = .001
cm. What is the probability that a given
steel bearing will have a diameter
between 2.039 and 2.042 cm?
In other words, P(2.039 < X < 2.042)
Excel only gives left tail areas, so break the
formula into two, find P(X < 2.039) and
P(X < 2.042), then subtract them to find
the desired probability:
P(X < 2.042) = .9773
P(X < 2.039) = .1587
P(2.039 < X < 2.042) = .9773 - .1587 = .8186
or 81.9%
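
The same two-step subtraction of left-tail areas can be sketched in Python with `statistics.NormalDist`, whose `cdf` method also returns left-tail areas. The variable names are illustrative.

```python
from statistics import NormalDist

# Bearing diameters: mean 2.040 cm, standard deviation 0.001 cm.
diameter = NormalDist(mu=2.040, sigma=0.001)

p_upper = diameter.cdf(2.042)   # P(X < 2.042)
p_lower = diameter.cdf(2.039)   # P(X < 2.039)
p_between = p_upper - p_lower   # P(2.039 < X < 2.042)

print(f"P(X < 2.042) = {p_upper:.4f}")
print(f"P(X < 2.039) = {p_lower:.4f}")
print(f"P(2.039 < X < 2.042) = {p_between:.4f}")
```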
Suppose we wanted the probability of selecting a foreman who
earned less than $1,100. In probability notation we write this
statement as P(weekly income < $1,100).
= .8413
The mean of a normal probability distribution is 500; the standard deviation is
10.
a. About 68 percent of the observations lie between what two values?
b. About 95 percent of the observations lie between what two values?
c. Practically all of the observations lie between what two values?
a. 490 and 510, found by 500 +/- 1(10).
b. 480 and 520, found by 500 +/- 2(10).
c. 470 and 530, found by 500 +/- 3(10).
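
The three intervals, and the share of a normal distribution each one actually covers, can be checked with a short Python sketch. It uses the standard-library `statistics.NormalDist`; the list name `intervals` is illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=500, sigma=10)

intervals = []
for k in (1, 2, 3):
    low = dist.mean - k * dist.stdev    # mean minus k standard deviations
    high = dist.mean + k * dist.stdev   # mean plus k standard deviations
    coverage = dist.cdf(high) - dist.cdf(low)
    intervals.append((low, high, coverage))
    print(f"500 +/- {k}(10): {low:.0f} to {high:.0f}, covers {coverage:.4f}")
```

The computed coverages are close to the rounded figures of 68 percent, 95 percent, and "practically all" used in the exercise.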
A normal distribution has a mean of 50 and a standard deviation
of 4.
a. Compute the probability of a value between 44.0 and 55.0.
b. Compute the probability of a value greater than 55.0.
c. Compute the probability of a value between 52.0 and 55.0.
a. 0.8276: First find z = -1.5, found by (44 - 50)/4, and
z = 1.25, found by (55 - 50)/4. The area between -1.5 and 0 is
0.4332 and the area between 0 and 1.25 is 0.3944, both
from Appendix D. Then adding the two areas we find
that 0.4332 + 0.3944 = 0.8276.
b. 0.1056, found by 0.5000 - 0.3944, where z = 1.25.
c. 0.2029: Recall that the area for z = 1.25 is 0.3944, and
the area for z = 0.5, found by (52 - 50)/4, is 0.1915.
Then subtract 0.3944 - 0.1915 and find 0.2029.
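
The three answers can be cross-checked without a table. This is a sketch using the standard-library `statistics.NormalDist`; the variable names are illustrative, and small differences in the fourth decimal place come from rounding in the printed table.

```python
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=4)

p_a = dist.cdf(55.0) - dist.cdf(44.0)   # a. between 44.0 and 55.0
p_b = 1 - dist.cdf(55.0)                # b. greater than 55.0
p_c = dist.cdf(55.0) - dist.cdf(52.0)   # c. between 52.0 and 55.0

print(f"a. {p_a:.4f}")
print(f"b. {p_b:.4f}")
print(f"c. {p_c:.4f}")
```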

Problem 7.25 (p272)
The Layton Tire and Rubber Company
wishes to set a minimum mileage
guarantee on its new MX100 tire.
Tests reveal the mean mileage is
67,900 with a standard deviation of
2,050 miles and that the distribution
of miles follows the normal
distribution. They want to set the
minimum guaranteed mileage so that
no more than 4 percent of the tires
will have to be replaced. What
minimum guaranteed mileage should
Layton announce?
Draw it out
Notice that there are two unknowns, z and X. To find X, we first
find z, and then solve for X. Notice the area under the normal
curve to the left of μ is .5000. The area between μ and X is
.4600, found by .5000 - .0400. Now refer to Appendix C.
Search the body of the table for the area closest to .4600.
The closest area is .4599, which corresponds to z = 1.75. Because X
lies below the mean, we use z = -1.75.
$-1.75 = \frac{X - 67{,}900}{2050}$
-1.75(2050) = X – 67,900
X = 67,900 – 1.75(2050)
X = 64,312
So Layton can advertise that it will replace for free any tire that
wears out before it reaches 64,312 miles, and the company will
know that only 4 percent of the tires will be replaced under this
plan.
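
The guarantee can also be found by asking directly for the mileage with 4 percent of the distribution below it. The sketch below uses `inv_cdf` from the standard-library `statistics.NormalDist`; the variable names are illustrative. Because it does not round z to two decimals, the result differs from the table-based answer by about a mile.

```python
from statistics import NormalDist

# Tire mileage: mean 67,900 miles, standard deviation 2,050 miles.
mileage = NormalDist(mu=67_900, sigma=2_050)

# Mileage below which only 4 percent of tires are expected to fall.
guarantee = mileage.inv_cdf(0.04)

# Corresponding z value, for comparison with the table lookup.
z = (guarantee - mileage.mean) / mileage.stdev

print(f"z = {z:.2f}")
print(f"Minimum guaranteed mileage = {guarantee:,.0f} miles")
```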