mean - LICH

Probability Tables
Normal distribution table
Standard normal table
Unit normal table
It gives values of the
cumulative distribution
function of the normal
distribution.
http://www.math.unb.ca/~knight/utility/NormTble.htm
• What is the probability that the z-score is lower than 2.37, P(z < 2.37)?
• P(z > 1.82)
• P(-1.18 < z < 2.1)
• Be careful which probabilities your table
gives you!
• Cumulative probabilities are most
common.
• However, you can also see tables giving
complementary cumulative (i.e. 1-x, see
above) or cumulative from zero.
• Use the table I gave you in print.
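The three lookups above can be cross-checked without a printed table. The following is a minimal sketch using `statistics.NormalDist` from Python's standard library; the variable names are illustrative and not part of the original slides. It also shows how the "cumulative from zero" convention relates to the ordinary cumulative probability.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(z < 2.37): the cumulative probability, as printed in most tables
p_below = z.cdf(2.37)

# P(z > 1.82): the complementary cumulative probability, 1 - cdf
p_above = 1 - z.cdf(1.82)

# P(-1.18 < z < 2.1): difference of two cumulative probabilities
p_between = z.cdf(2.1) - z.cdf(-1.18)

# "Cumulative from zero" convention for the same z = 2.37:
# the area between the mean (z = 0) and z
p_from_zero = z.cdf(2.37) - 0.5

print(round(p_below, 4), round(p_above, 4),
      round(p_between, 4), round(p_from_zero, 4))
```

If your table gives cumulative-from-zero values, add 0.5 to an entry for a positive z to get the cumulative probability.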
• What percentage of your data lies within ±1
standard deviation of the mean?
0.8413-(1-0.8413) = 0.6826
• How many standard deviations must you
add to and subtract from the mean to cover 80% of
your data?
The central 80% leaves 10% in each tail, so you are looking
for the z-value with a cumulative probability of 0.9. This is 1.28.
[Figure: normal curve with the central 80% of the area marked and 10% in each tail]
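Both answers can be reproduced numerically. This is a small sketch with Python's `statistics.NormalDist`; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Share of the data within one standard deviation of the mean
within_one_sd = z.cdf(1) - z.cdf(-1)

# The central 80% leaves 10% in each tail, so the upper bound
# is the z-value whose cumulative probability is 0.9
z_80 = z.inv_cdf(0.9)

print(round(within_one_sd, 4), round(z_80, 2))
```

The first result differs from the table-based 0.6826 only in the fourth decimal, because the table entry 0.8413 is itself rounded.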
• Scores on the Stanford-Binet IQ test follow a
normal distribution. The mean of this
distribution is 100, the standard deviation is
16.
– This is always true, because an IQ score is just another
transformed score: it is scaled so that it has a mean of
100 and a standard deviation of 16.
So 34.13% of the scores in the population
lie between the mean (100) and one
standard deviation above it (116).
Standard distribution table
• Which proportion of scores is between 100
and 125?
~ 44 %
• And which proportion lies between 116 and
125?
~ 9.9 %
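The IQ proportions above can be checked directly on the unstandardized scale. This is a sketch assuming the mean of 100 and standard deviation of 16 stated earlier; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)  # Stanford-Binet scale from the slides

# Between the mean and one standard deviation above it
mean_to_one_sd = iq.cdf(116) - iq.cdf(100)

# Between 100 and 125
between_100_125 = iq.cdf(125) - iq.cdf(100)

# Between 116 and 125
between_116_125 = iq.cdf(125) - iq.cdf(116)

print(round(mean_to_one_sd, 4),
      round(between_100_125, 4),
      round(between_116_125, 4))
```

The last figure can differ slightly from a table lookup, since the table forces you to round z = (125 - 100) / 16 = 1.5625 to two decimals.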
Excel
• NORMDIST
– find normal distribution areas for a given point, the
distribution is given by its mean and standard deviation
• NORMINV
– the inverse of NORMDIST
– supply a cumulative probability, mean, and standard deviation;
the score is returned
• NORMSDIST, NORMSINV
– the same for the standard normal distribution (z-distribution)
• Try it:
– What is the proportion of IQ scores between 116 and 125?
=NORM.DIST(125,100,16,TRUE)-NORM.DIST(116,100,16,TRUE)
Critical value
• A critical value is the value that a test statistic must
exceed in order for H0 to be rejected.
• Use the table
• A data set has 13 points. What is the critical value of the t-distribution at the 0.05 significance level?
– The values in the table are given as upper-tail probabilities.
For a two-tailed test with 13 - 1 = 12 degrees of freedom, look up 0.05/2 = 0.025. The critical value is 2.18.
• Use NORMSINV to get a critical value
• Use NORMSDIST to get a p-value
• What is the critical z-value using the 0.05 significance
level?
– table?
– Excel?
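As a cross-check outside Excel, the critical z-value and a p-value can be sketched with Python's `statistics.NormalDist`, where `inv_cdf` plays the role of NORMSINV and `cdf` the role of NORMSDIST. The observed statistic `z_observed` is a made-up number used only for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution
alpha = 0.05      # significance level

# Two-tailed test: alpha is split between both tails
z_crit_two_tailed = z.inv_cdf(1 - alpha / 2)

# One-tailed (upper-tail) test: all of alpha sits in one tail
z_crit_one_tailed = z.inv_cdf(1 - alpha)

# Two-tailed p-value for a hypothetical observed test statistic
z_observed = 2.2  # assumed value, not from the slides
p_two_tailed = 2 * (1 - z.cdf(abs(z_observed)))

print(round(z_crit_two_tailed, 2),
      round(z_crit_one_tailed, 3),
      round(p_two_tailed, 4))
```

Because the hypothetical statistic exceeds the two-tailed critical value, its p-value falls below 0.05, so both routes lead to the same decision.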
t-distribution critical values
The entries in this table
are the critical values $t_{n,p}$,
where n represents the
number of degrees of
freedom and p is the
upper tail probability.
The entry for $t_{\infty,0.05}$ should
correspond to which
distribution? Verify if they
are really the same.
Confidence
Sampling distribution
A sampling distribution is the
distribution of all possible values of a
statistic for a given sample size.
A sampling distribution - like any other
group of scores - has a mean and a
standard deviation.
The symbol for the mean of the
sampling distribution of the mean
(yes, I know that’s a mouthful) is $\mu_{\bar{x}}$.
The standard deviation of a sampling
distribution is a pretty hot item. It has
a special name - standard error. For
the sampling distribution of the mean,
the standard deviation is called the
standard error of the mean. Its symbol
is $\sigma_{\bar{x}}$.
Central Limit Theorem
• In the real world, you never take an infinite number
of samples, so you never actually create a sampling
distribution of the mean.
• Typically, you draw one sample and calculate its
statistics.
• So if you have only one sample, how can you ever
know anything about a sampling distribution - a
theoretical distribution that encompasses an
infinite number of samples?
• You can figure out a lot about a sampling
distribution because of the CLT.
1. The sampling distribution of the mean is
approximately a normal distribution if the
sample size is large enough (n > 30).
2. The mean of the sampling distribution of the
mean is the same as the population mean:
$\mu_{\bar{x}} = \mu$
3. The standard deviation of the sampling
distribution of the mean (also known as the
standard error of the mean) is equal to the
population standard deviation divided by the
square root of the sample size:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• The population that supplies the samples doesn’t have to be a
normal distribution for the Central Limit Theorem to hold.
• What if the population is a normal distribution? In that case,
the sampling distribution of the mean is a normal distribution
regardless of the sample size.
J. Schmuller, Statistical Analysis with Excel For Dummies
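The three statements of the Central Limit Theorem can be illustrated with a small simulation. The sketch below is an assumption-laden demonstration, not part of the original slides: it uses an exponential population (mean 1, standard deviation 1, clearly not normal), a sample size of 36, and 5000 repeated samples in place of the infinite number the theory refers to.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(42)  # fixed seed so the run is reproducible

n = 36               # size of each sample (assumed for illustration)
num_samples = 5000   # number of repeated samples (stand-in for "infinite")

# Exponential population with rate 1: mean 1, standard deviation 1, skewed
population_mean = 1.0
population_sd = 1.0

# Draw many samples and record the mean of each one
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)     # should be close to the population mean
se_observed = stdev(sample_means)       # spread of the sample means
se_predicted = population_sd / sqrt(n)  # sigma / sqrt(n) from the CLT

print(round(mean_of_means, 3), round(se_observed, 3), round(se_predicted, 3))
```

Plotting a histogram of `sample_means` would show the roughly bell-shaped curve of statement 1, even though the population itself is strongly skewed.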
The limits of confidence
• Sampling distributions help you to answer the
question: How much confidence can you have in
the estimates you create?
• The idea is to calculate a statistic, and then use
that statistic to establish upper and lower bounds
for the population parameter with, say, 95%
confidence.
• You can only do this if you know the sampling
distribution of the statistic and the standard
error.
Confidence for the mean
• The manufacturer of navigation systems has
developed a new battery to power their
portable model. To help market their system,
they want to know how long, on average, each
battery lasts before it burns out.
• They’d like to estimate that average with 95%
confidence. They test a sample of 100
batteries, and find that the sample mean is 60
hours, with a standard deviation of 20 hours.
• CLT: the sampling distribution of the mean
approximates a normal distribution.
• The standard error of the mean (the standard
deviation of the sampling distribution of the
mean) is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• σ is unknown; its best estimate is the standard
deviation of the sample, s:
$s_{\bar{x}} = s / \sqrt{n} = 20 / \sqrt{100} = 2$
• The best estimate of the population mean is
the sample mean, 60.
• Now you can envision the sampling distribution of the mean.
• Now that you have the sampling distribution, you can establish the
95% confidence limits for the mean.
• This means that, starting at the center of the distribution, how far
out to the sides do you have to extend until you have 95% of the
area under the curve?
• We know the answer: approximately 2 standard errors (from the z-distribution, 1.96 is the more precise value).
• So the upper bound in the sampling distribution is 60 +
1.96 * 2 = 63.92, and the lower bound is 60 - 1.96 * 2 =
56.08.
• This means you can say with 95% confidence that the
battery lasts, on the average, between 56.08 hours and
63.92 hours.
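The battery calculation can be retraced step by step in a few lines. This is a minimal sketch using Python's standard library with the numbers from the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 60   # hours
s = 20             # sample standard deviation, used as the estimate of sigma
n = 100            # number of batteries tested
confidence = 0.95

# Estimated standard error of the mean: s / sqrt(n)
standard_error = s / sqrt(n)

# z-value that leaves (1 - confidence) / 2 in each tail
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Half-width of the interval, then the two confidence limits
margin = z_crit * standard_error
lower = sample_mean - margin
upper = sample_mean + margin

print(round(lower, 2), round(upper, 2))
```

Changing `confidence` to 0.99 widens the interval, because a larger z-value is needed to capture more of the area under the curve.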
How to do this in Excel?
• CONFIDENCE.NORM function
• Try it: mean = 60, s = 20, n = 100
– You actually supply not 95% but the α value, which is 1 - confidence (here 0.05).
– What did you get and how do you calculate the
confidence interval?
You add/subtract the number
you’ve got from
CONFIDENCE.NORM to/from
the mean.
• For small samples, you have to use the t-distribution (with
n - 1 degrees of freedom, DF).
• Suppose the sample consists of 25 batteries; the mean is
still 60 and the standard deviation is still 20.
• What is the estimate of the standard error of the
mean?
– 20/√25 = 4
• DF = 25 – 1 = 24
• TINV - finds the value in the t-distribution that cuts off
the desired area.
• Try it
• What do you have to do now to get the confidence
interval for the mean?
– Multiply TINV’s answer by the standard error of the mean
(4) and find upper/lower limits.
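The small-sample procedure can be sketched as follows. Python's standard library has no t-distribution, so the critical value here is hard-coded from a t-table (24 degrees of freedom, upper-tail probability 0.025, rounded to 2.064); treat that constant as a looked-up assumption. With SciPy installed, `scipy.stats.t.ppf(0.975, 24)` would compute it instead.

```python
from math import sqrt

sample_mean = 60   # hours
s = 20             # sample standard deviation
n = 25             # small sample

# Estimated standard error of the mean: 20 / sqrt(25) = 4
standard_error = s / sqrt(n)

# Degrees of freedom for the t-distribution
df = n - 1

# Critical t for df = 24 and upper-tail probability 0.025,
# copied from a t-table (assumed value, rounded to three decimals)
t_crit = 2.064

# Half-width of the interval, then the two confidence limits
margin = t_crit * standard_error
lower = sample_mean - margin
upper = sample_mean + margin

print(round(lower, 2), round(upper, 2))
```

The interval is much wider than the one for n = 100, both because the standard error doubled and because the t critical value is larger than 1.96.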
• Excel 2007 and earlier had only the CONFIDENCE
function, which gives confidence intervals based on the normal
distribution.
• There was no similar function for the t-distribution, so the previous procedure had to
be used.
• However, since Excel 2010 there are
CONFIDENCE.NORM and CONFIDENCE.T
functions.
• Using CONFIDENCE.T function you should
obtain the same result as before. Try it.