Probability Tables

Normal distribution table
Also called the standard normal table or unit normal table.
It gives values of the cumulative distribution function of the normal distribution.
http://www.math.unb.ca/~knight/utility/NormTble.htm
• What is the probability that the z-score is lower than 2.37, i.e. P(z < 2.37)?
• P(z > 1.82)
• P(-1.18 < z < 2.1)
• Be careful which probabilities you are given!
• Cumulative probabilities are most common.
• However, you can also see tables giving complementary cumulative probabilities (i.e. 1-x, see above) or probabilities cumulative from zero.
• Use the table I gave you in print.
• What percentage of your data lies within ±1 standard deviation of the mean?
0.8413 - (1 - 0.8413) = 0.6826
• How many standard deviations must you add to/subtract from the mean to cover 80% of your data?
You're looking for the z-value with a cumulative probability of 0.9 (80% in the middle, 10% in each tail). This is 1.28.
• Scores on the Stanford-Binet IQ test follow a normal distribution. The mean of this distribution is 100, the standard deviation is 16.
– This is always true, because an IQ score is just another transformed score: it is transformed so that it has a mean of 100 and a standard deviation of 16.
So between the mean (100) and one standard deviation above it (116) lie 34.13% of the scores in the population.

Standard distribution table
• What proportion of scores is between 100 and 125? ~ 44 %
• And what proportion lies between 116 and 125? ~ 9.9 %

Excel
• NORMDIST – finds normal distribution areas for a given point; the distribution is given by its mean and standard deviation
• NORMINV – flip side of NORMDIST – supply a cumulative probability, mean and standard deviation, and the score is returned
• NORMSDIST, NORMSINV – the same for the standard normal distribution (z-distribution)
• Try it:
– What is the proportion of IQ scores between 116 and 125?
=NORM.DIST(125,100,16,TRUE)-NORM.DIST(116,100,16,TRUE)

Critical value
• A critical value is the value that a test statistic must exceed in order for the null hypothesis (H0) to be rejected.
• Use the table.
• A data set has 13 points. What is the critical value of the t-distribution at the 0.05 significance level?
– The values in the table are given as upper-tail probabilities. For a two-tailed test you therefore have to look up 0.05/2 = 0.025, with 13 - 1 = 12 degrees of freedom. The critical value is 2.18.
• Use NORMSINV to get the critical value.
• Use NORMSDIST to get the p-value.
• What is the critical z-value at the 0.05 significance level?
– table?
– Excel?

t-distribution critical values
The entries in this table are the critical values $t_{n,p}$, where n represents the number of degrees of freedom and p is the upper-tail probability.
The entry for $t_{\infty,0.05}$ should correspond to which distribution? Verify that they are really the same.

Confidence

Sampling distribution
A sampling distribution is the distribution of all possible values of a statistic for a given sample size.
A sampling distribution - like any other group of scores - has a mean and a standard deviation.
The symbol for the mean of the sampling distribution of the mean (yes, I know that's a mouthful) is $\mu_{\bar{x}}$.
The standard deviation of a sampling distribution is a pretty hot item. It has a special name - standard error. For the sampling distribution of the mean, the standard deviation is called the standard error of the mean. Its symbol is $\sigma_{\bar{x}}$.

Central Limit Theorem
• In the real world, you never take an infinite number of samples, so you never actually create a sampling distribution of the mean.
• Typically, you draw one sample and calculate its statistics.
• So if you have only one sample, how can you ever know anything about a sampling distribution - a theoretical distribution that encompasses an infinite number of samples?
• You can figure out a lot about a sampling distribution because of the CLT.
1. The sampling distribution of the mean is approximately a normal distribution if the sample size is large enough (n > 30).
2. The mean of the sampling distribution of the mean is the same as the population mean: $\mu_{\bar{x}} = \mu$
3. The standard deviation of the sampling distribution of the mean (also known as the standard error of the mean) is equal to the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• The population that supplies the samples doesn't have to be normally distributed for the Central Limit Theorem to hold.
• What if the population is normally distributed? In that case, the sampling distribution of the mean is a normal distribution regardless of the sample size.
J. Schmuller, Statistical Analysis with Excel For Dummies

The limits of confidence
• Sampling distributions help you to answer the question: How much confidence can you have in the estimates you create?
• The idea is to calculate a statistic, and then use that statistic to establish upper and lower bounds for the population parameter with, say, 95% confidence.
• You can only do this if you know the sampling distribution of the statistic and the standard error.

Confidence for the mean
• The manufacturer of navigation systems has developed a new battery to power their portable model. To help market their system, they want to know how long, on average, each battery lasts before it burns out.
• They'd like to estimate that average with 95% confidence. They test a sample of 100 batteries, and find that the sample mean is 60 hours, with a standard deviation of 20 hours.
• CLT: the sampling distribution of the mean approximates a normal distribution.
• The standard error of the mean (the standard deviation of the sampling distribution of the mean) is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• σ is unknown; its best estimate is the standard deviation of the sample, s: $s_{\bar{x}} = s / \sqrt{n} = 20 / \sqrt{100} = 2$
• The best estimate of the population mean is the sample mean, 60.
• Now you can envision the sampling distribution of the mean.
• Now that you have the sampling distribution, you can establish the 95% confidence limits for the mean.
• In other words: starting at the center of the distribution, how far out to the sides do you have to extend until you have 95% of the area under the curve?
• We know the answer: approx. 2 standard errors (from the z-distribution, 1.96 is the more precise value).
• So the upper bound in the sampling distribution is 60 + 1.96 * 2 = 63.92, and the lower bound is 60 - 1.96 * 2 = 56.08.
• This means you can say with 95% confidence that the battery lasts, on average, between 56.08 hours and 63.92 hours.

How to do this in Excel?
• CONFIDENCE.NORM function
• Try it: mean = 60, s = 20, n = 100
– You supply not 95% but the α value, which is 1 - confidence.
– What did you get, and how do you calculate the confidence interval? You add/subtract the number you got from CONFIDENCE.NORM to/from the mean.
• For small samples, you have to use the t-distribution (with n-1 DF).
• Suppose the sample consists of 25 batteries; the mean is still 60 and s is still 20.
• What is the estimate of the standard error of the mean?
– 20/√25 = 4
• DF = 25 – 1 = 24
• TINV – finds the value in the t-distribution that cuts off the desired area. TINV is two-tailed, so you supply α = 0.05 directly.
• Try it.
• What do you have to do now to get the confidence interval for the mean?
– Multiply TINV's answer by the standard error of the mean (4), then add/subtract the result to/from the mean to find the upper/lower limits.
• Excel 2007 and earlier had only the CONFIDENCE function, which gives confidence intervals for the normal distribution.
• There was no similar function for the t-distribution, so the previous procedure had to be adopted.
• However, since Excel 2010 there are the CONFIDENCE.NORM and CONFIDENCE.T functions.
• Using the CONFIDENCE.T function you should obtain the same result as before. Try it.
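
The interval arithmetic for the battery example can also be cross-checked outside Excel. The following is a minimal Python sketch, not part of the original course material. The function name `mean_confidence_interval` is hypothetical. The t critical value 2.064 (24 DF, upper-tail probability 0.025) is assumed to be read from a printed t table, because Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, s, n, critical_value):
    """Return (lower, upper) limits: mean -/+ critical_value * s / sqrt(n)."""
    margin = critical_value * s / sqrt(n)  # critical value times standard error
    return mean - margin, mean + margin


alpha = 0.05  # 1 - confidence, the value CONFIDENCE.NORM expects

# Large sample (n = 100): critical value from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
low_z, high_z = mean_confidence_interval(60, 20, 100, z_crit)
print(f"n = 100: {low_z:.2f} to {high_z:.2f} hours")

# Small sample (n = 25, 24 DF): critical value taken from a t table.
t_crit = 2.064  # assumed table entry for upper-tail p = 0.025, 24 DF
low_t, high_t = mean_confidence_interval(60, 20, 25, t_crit)
print(f"n = 25:  {low_t:.2f} to {high_t:.2f} hours")
```

The margin computed inside the function plays the role of the number returned by CONFIDENCE.NORM or CONFIDENCE.T. With the smaller sample the interval is noticeably wider, both because the standard error grows from 2 to 4 and because the t critical value is larger than 1.96.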