Last Update: 11th May 2011
SESSION 39 & 40: Continuous Probability Distributions
Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: http://www.hedge-fundanalysis.net/pages/vega.php

Learning Objectives
1. Population and Samples
2. Point Estimates vs. Confidence Interval Estimates
3. Calculating Confidence Intervals

Normal Probabilities
Often it is prohibitively expensive to obtain information on all members of a population. Market researchers therefore usually collect information from a sample, i.e. a sub-set of the population. The sample statistics (e.g. the sample mean) are calculated and used to estimate the population parameters (e.g. the population mean). This process is known as statistical inference.

Notation
The notation for sample statistics and population parameters is given in the table below:

                      Population (Parameter)   Sample (Statistic)
Size                  N                        n
Mean                  μ                        x̄
Standard Deviation    σ                        s
Proportion            P                        p̂

Inference
A point estimator draws inferences about the population by estimating the value of an unknown parameter using a single value (point):

Point Estimate = Sample Statistic

An interval estimator draws inferences about the population by estimating the value of an unknown parameter using an interval (the confidence interval estimate) that is likely to contain the unknown population parameter. Common confidence levels include:
- 90% Weak statistical evidence
- 95% Strong statistical evidence
- 99% Overwhelming statistical evidence

Central Limit Theorem
The sampling distribution of the mean of a random sample drawn from any population is approximately normal for sufficiently large sample sizes. The larger the sample size, the more closely the sampling distribution of x̄ will resemble the normal distribution. This is an important result since it allows the normal distribution to be used to describe the dispersion of sample means. Example: tossing n dice and recording the average result.

Sampling Distribution
It can be shown that the sampling distribution of the sample mean is described as follows: if X is normal, x̄ is normal; if X is non-normal, x̄ is approximately normal for sufficiently large sample sizes, with

E(x̄) = μ and V(x̄) = σ²/n, i.e. standard error σ/√n.

So for the sampling distribution, the standardisation

Z = (X − μ) / σ

changes to:

Z = (x̄ − μ) / (σ/√n)

Example
Suppose that the amount of time to assemble a computer is normally distributed with a mean μ = 50 minutes and a standard deviation σ = 10 minutes.
a) What is the probability that one randomly selected computer is assembled in a time of less than 60 minutes?
b) What is the probability that four randomly selected computers have a mean assembly time of less than 60 minutes?

Solution
a) Z = (60 − 50) / 10 = 1
b) Z = (60 − 50) / (10/√4) = 2
The associated probabilities are P(Z < 1) = 0.8413 and P(Z < 2) = 0.9772 respectively.

Sampling Distribution and Inference
The 95% confidence interval (i.e. the area underneath the graph) for the standard normal distribution is expressed algebraically as:

P(−1.96 < Z < 1.96) = 0.95

With the definition of Z for the sampling distribution:

P(−1.96 < (x̄ − μ)/(σ/√n) < 1.96) = 0.95

Rearrangement yields:

P(x̄ − 1.96·σ/√n < μ < x̄ + 1.96·σ/√n) = 0.95

Or, for the general case:

x̄ ± zα/2 · σ/√n

The smaller term is referred to as the Lower Confidence Limit (LCL) and the larger term as the Upper Confidence Limit (UCL).

Example
Suppose that the average assembly time across n = 25 computers is x̄ = 50 minutes. In addition, we assume that the population standard deviation is known and equal to σ = 10 minutes. What is the 95% confidence interval?
Comment: α = 1 − CL. Here, α = 1 − 0.95 = 0.05 (or 5%). Thus, α/2 = 0.025.

Solution
LCL = 50 − 1.96 · 10/√25 = 50 − 3.92 = 46.08
UCL = 50 + 1.96 · 10/√25 = 50 + 3.92 = 53.92
Thus, the LCL = 46.08 and the UCL = 53.92.
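As a quick check of these results, the following is a minimal sketch in Python (assuming the SciPy library is available; norm.cdf and norm.ppf come from scipy.stats, and the variable names are illustrative only) that reproduces the two assembly-time probabilities and the 95% confidence interval for the mean.

# Minimal sketch: sampling-distribution probabilities and a 95% confidence
# interval for the mean, assuming Python with SciPy is installed.
from math import sqrt
from scipy.stats import norm

mu, sigma = 50, 10          # population mean and standard deviation (minutes)

# a) P(X < 60) for a single computer: standardise with sigma
z_single = (60 - mu) / sigma
print(norm.cdf(z_single))   # ~0.8413

# b) P(x-bar < 60) for n = 4 computers: standardise with sigma / sqrt(n)
z_mean = (60 - mu) / (sigma / sqrt(4))
print(norm.cdf(z_mean))     # ~0.9772

# 95% confidence interval for the mean with n = 25 and x-bar = 50
n, x_bar, cl = 25, 50, 0.95
z_half_alpha = norm.ppf(1 - (1 - cl) / 2)     # ~1.96
half_width = z_half_alpha * sigma / sqrt(n)   # ~3.92
lcl, ucl = x_bar - half_width, x_bar + half_width
print(lcl, ucl)             # ~46.08, ~53.92

Running the sketch should print probabilities of roughly 0.8413 and 0.9772 and confidence limits of roughly 46.08 and 53.92, matching the hand calculations.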
The interpretation is straightforward: for n = 25 with σ = 10, there is a 95% chance that the true population mean μ falls between the LCL = 46.08 and the UCL = 53.92.

Finding zα/2
Since CL = 0.95, α = 1 − 0.95 = 0.05. Then α/2 = 0.025. For one half of the standard normal distribution table, this corresponds to 0.5 − 0.025 = 0.4750 = P(0 < Z < 1.96). Thus, zα/2 = 1.96. The excluded upper tail represents 2.5% of the area underneath the chart. The table values below give P(0 < Z < z):

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07
1.0    0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577
1.1    0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790
1.2    0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980
1.3    0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147
1.4    0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292
1.5    0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418
1.6    0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525
1.7    0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616
1.8    0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693
1.9    0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756
2.0    0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808

Normal Approximation of the Binomial Distribution
The binomial distribution may be approximated using the normal distribution. A graphical derivation is included in most statistics textbooks and is omitted here. The upside is that the normal approximation allows us to calculate confidence intervals for the binomial distribution. It can be shown that the sampling distribution is described as follows:

Z = (p̂ − p) / √(p(1 − p)/n)

where p̂ (p-hat) is the proportion of successes in a Bernoulli trial process estimated from the statistical sample, with E(p̂) = p and standard error √(p(1 − p)/n).

Confidence Interval Binomial Distribution
Replacing μ with E(p̂) and the standard error σ/√n with the standard error of the proportion in the formula for the confidence interval yields:

p̂ ± zα/2 · √(p̂(1 − p̂)/n)

Example
In a survey including 1000 people, a political candidate received 52% of the votes cast. What is the 95% confidence interval associated with this result?

Solution
LCL = 0.52 − 1.96 · √(0.52 × 0.48 / 1000) ≈ 0.52 − 0.031 = 0.489
UCL = 0.52 + 1.96 · √(0.52 × 0.48 / 1000) ≈ 0.52 + 0.031 = 0.551
Thus, the LCL ≈ 0.489 and the UCL ≈ 0.551. Note that the LCL falls slightly below 0.5: although the point estimate favours the candidate, at the 95% confidence level the sample does not provide conclusive evidence that the candidate will win the election.
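For completeness, a similar sketch (again assuming Python with SciPy; variable names are illustrative) computes the confidence interval for the survey proportion using the normal approximation.

# Minimal sketch: 95% confidence interval for a proportion via the normal
# approximation, assuming Python with SciPy is installed.
from math import sqrt
from scipy.stats import norm

n, p_hat, cl = 1000, 0.52, 0.95
z_half_alpha = norm.ppf(1 - (1 - cl) / 2)   # ~1.96
se = sqrt(p_hat * (1 - p_hat) / n)          # standard error of the proportion
lcl = p_hat - z_half_alpha * se
ucl = p_hat + z_half_alpha * se
print(lcl, ucl)                             # ~0.489, ~0.551

It should print limits of roughly 0.489 and 0.551, matching the calculation above.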