STA2023 E.Philias

Part 3 Chapter 7: Probability Distributions for Continuous Random Variables - The Normal Distribution

The normal distribution is the most important and most widely used of all probability distributions. A large number of real-world phenomena are normally distributed, either exactly or approximately. Continuous random variables such as the heights and weights of people, scores on an examination, weights of packages (e.g., cereal boxes, boxes of cookies), the amount of milk in a gallon, the life of an item (such as a light bulb or a television set), and the time taken to complete a certain job have all been observed to have an (approximately) normal distribution.

COMMON PATTERNS OF FREQUENCY DISTRIBUTIONS
1) Bell-shaped (normal) distributions
2) Positively skewed distributions
3) Negatively skewed distributions

Continuous Probability Distribution
The probability distribution for a continuous random variable x can be represented by a smooth curve, a function of x denoted f(x). The curve is called a density function or frequency function. The probability that x falls between two values a and b, i.e., $P(a < x < b)$, is the area under the curve between a and b.

Normal random variable is a continuous random variable with a frequency curve that is smooth, symmetric, and bell-shaped.
Normal (bell) curve is the frequency curve for a normal random variable.
Normal probability distribution is the probability model for a normal random variable.
Parameters of a normal probability distribution are the mean (µ) and the standard deviation (σ) of the associated normal random variable.
Normal population is a population in which a normal random variable has been defined.

Normal Distributions
1) The peak occurs directly above the mean µ (mu).
2) The curve is symmetric about the vertical line through the mean. (That is, if you fold the graph along this line, the left half of the graph fits exactly on the right half.)
3) The curve never touches the x-axis; it extends indefinitely in both directions.
4) The total area under the curve (and above the horizontal axis) is 1.

Notice: A normal distribution is completely determined by its mean µ and standard deviation σ (sigma). The area of the shaded region under the normal curve from a to b is the probability that an observed value will be between a and b.

Standard Normal Distribution
The normal distribution with mean µ = 0 and standard deviation σ = 1 is called the standard normal distribution.

Z Values or Z Scores
Z scores are the values of the standard normal variable. They indicate the number of standard deviations by which a value of a normal random variable deviates from the mean. The units marked on the horizontal axis of the standard normal curve are denoted by z and are called z values or z scores. A specific value of z gives the distance between the mean and the point represented by z in terms of standard deviations.
Example: A point with z = 2 is two standard deviations to the right of (above) the mean. Similarly, a point with z = -2 is two standard deviations to the left of (below) the mean.

Property of the Normal Distribution
If x is a normal random variable with mean µ and standard deviation σ, then the random variable z defined by the formula
$z = \frac{x - \mu}{\sigma}$
has a standard normal distribution. The value z describes the number of standard deviations between x and µ.

Converting an x value to a z value
If a normal distribution has mean µ and standard deviation σ, then the z score for the number x is
$z = \frac{x - \mu}{\sigma}$

Area under a normal curve
The area under a normal curve between x = a and x = b is the same as the area under the standard normal curve between the z score for a and the z score for b.

Procedure for solving problems based on the normal distribution
1) Convert all data items to z scores using the formula $z = (x - \mu)/\sigma$.
2) Make a sketch of the normal curve.
Along the horizontal axis, show the mean (z = 0) and all the z scores obtained in step 1. Shade the region whose area is desired.
3) Use the z table to fill in the appropriate percents under the curve, and answer the question.

Technology Step by Step: The Normal Distribution (TI-83/84 Plus)

Finding Areas under the Normal Curve
Step 1: From the HOME screen, press 2nd VARS to access the DISTR (distribution) menu.
Step 2: Select 2: normalcdf(.
Step 3: With normalcdf( on the HOME screen, type lowerbound, upperbound, µ, σ). For example, to find the area to the left of x = 35 under the normal curve with µ = 40 and σ = 10, type normalcdf(-1E99, 35, 40, 10) and press ENTER.
Note: When there is no lower bound, enter -1E99. When there is no upper bound, enter 1E99. The E is scientific notation; it is entered by pressing 2nd and then the comma key (EE).

Finding Scores Corresponding to an Area
Step 1: From the HOME screen, press 2nd VARS to access the DISTR menu.
Step 2: Select 3: invNorm(.
Step 3: With invNorm( on the HOME screen, type "area to the left", µ, σ). For example, to find the score such that the area under the normal curve to the left of the score is 0.68, with µ = 40 and σ = 10, type invNorm(0.68, 40, 10) and press ENTER.

Part 3 Chapter 8: Sampling and Sampling Distributions

Sampling Distributions: An Illustration
To illustrate the concept of a sampling distribution, let us construct the sampling distribution of the mean for a random sample of size n = 2 drawn from the finite population of size N = 5 whose elements are the numbers 1, 3, 5, 7, and 9.
1) The mean of this population is $\mu = \frac{1 + 3 + 5 + 7 + 9}{5} = 5$.
2) The standard deviation is $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{\frac{16 + 4 + 0 + 4 + 16}{5}} = \sqrt{8} \approx 2.828$.
Note: In actual applications we never know all the values of the population.
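As a quick check of these two parameters, here is a minimal Python sketch using only the standard library. It assumes σ is the population standard deviation (dividing by N), which is why `statistics.pstdev` is used rather than `statistics.stdev` (which divides by N - 1).

```python
import statistics

# The finite population from the illustration (N = 5)
population = [1, 3, 5, 7, 9]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population SD: divides by N, not N - 1

print(mu)               # 5
print(round(sigma, 3))  # 2.828
```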
Now, if we take a random sample of size n = 2 from this population, there are $\binom{5}{2} = 10$ possible samples:
1 and 3, 1 and 5, 1 and 7, 1 and 9, 3 and 5, 3 and 7, 3 and 9, 5 and 7, 5 and 9, 7 and 9.
The means of these samples, $\bar{x} = \frac{x_1 + x_2}{2}$, are
2, 3, 4, 5, 4, 5, 6, 6, 7, and 8.
If sampling is random, each sample has probability 1/10. The sampling distribution (probability distribution) of the mean $\bar{x}$ is:

x̄             2      3      4      5      6      7      8
Probability   1/10   1/10   2/10   2/10   2/10   1/10   1/10

Estimation of µ
If we did not know the mean of the given population and wanted to estimate it with the mean of a random sample of two observations, this sampling distribution would give us some idea of the size of our error. For example, the probability that $\bar{x}$ is within 1 of µ = 5 is $P(4 \le \bar{x} \le 6) = 6/10$.

More information
Further useful information about the sampling distribution of the mean can be obtained by calculating its mean $\mu_{\bar{x}}$ and its standard deviation $\sigma_{\bar{x}}$:
1) Mean of $\bar{x}$: $\mu_{\bar{x}} = 5$
2) Standard deviation of $\bar{x}$: $\sigma_{\bar{x}} = \sqrt{3} \approx 1.732$
Observe that, at least for this example,
1. $\mu_{\bar{x}}$, the mean of the sampling distribution of $\bar{x}$, equals µ, the mean of the population.
2. $\sigma_{\bar{x}}$, the standard deviation of the sampling distribution of $\bar{x}$, is smaller than σ, the standard deviation of the population.

List of Population Parameters and Corresponding Sample Statistics

                      Population parameter   Sample statistic
Mean                  µ                      x̄
Variance              σ²                     s²
Standard deviation    σ                      s
Binomial proportion   p                      p̂

Parameter is a descriptive numerical measure of a population. Parameters are fixed numbers that are usually unknown because the associated population is very large.
Statistic is a descriptive numerical measure of a sample. It varies from sample to sample. Statistics are used to estimate parameters.
Random sampling is a method of sampling in which every possible sample has the same probability of being selected.
Sampling distribution is the probability distribution (model) associated with a statistic when repeated random samples are drawn from the defined population.
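The enumeration above is small enough to reproduce exactly. The Python sketch below (an illustration, not part of the original notes) lists every sample of size n = 2 drawn without replacement, builds the sampling distribution of x̄ with exact fractions, and recomputes its mean and standard deviation.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 5, 7, 9]
n = 2

# All C(5, 2) = 10 equally likely samples drawn without replacement
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of x-bar: each sample carries probability 1/10
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}
for m, p in dist.items():
    print(f"x-bar = {m}: P = {p}")

# Mean and standard deviation of the sampling distribution
mean_xbar = sum(m * p for m, p in dist.items())
var_xbar = sum((m - mean_xbar) ** 2 * p for m, p in dist.items())
sd_xbar = math.sqrt(var_xbar)
print(mean_xbar, round(sd_xbar, 3))  # 5 1.732
```

`Fraction` reduces automatically, so a probability of 2/10 is printed as 1/5.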
Mean and Standard Deviation of x̄
For random samples of size n taken from a population having mean µ and standard deviation σ, the sampling distribution of $\bar{x}$ has mean
$\mu_{\bar{x}} = \mu$
and standard deviation
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
(This standard deviation formula assumes an infinite population or sampling with replacement. For a small finite population sampled without replacement, as in the illustration above, it is multiplied by the finite population correction $\sqrt{(N - n)/(N - 1)}$: $\frac{2.828}{\sqrt{2}} \cdot \sqrt{3/4} \approx 1.732$.)
It is customary to refer to $\sigma_{\bar{x}}$, the standard deviation of the sampling distribution of the mean, as the standard error of the mean.

Shape of the Sampling Distribution of x̄
The shape of the sampling distribution of $\bar{x}$ depends on which of the following cases applies:
1. The population from which samples are drawn has a normal distribution. (Then $\bar{x}$ is normally distributed for any sample size.)
2. The population from which samples are drawn does not have a normal distribution. (Then the central limit theorem applies.)

Central Limit Theorem is a statistical property stating that the sampling distribution of the sample mean is approximately normal when the sample size is large enough.

Central Limit Theorem
According to the central limit theorem, for a large sample size the sampling distribution of $\bar{x}$ is approximately normal, irrespective of the shape of the population distribution. The mean and standard deviation of the sampling distribution are
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The sample size is usually considered large if n ≥ 30.

Chapter 9: Estimation with Confidence Intervals
In statistical inference we make generalizations based on samples; traditionally, such inferences have been divided into problems of estimation and tests of hypotheses.
Estimation is the process of estimating or predicting the value of a population parameter using a random sample and an estimator.
Estimator is a formula or statistic defined on sample data for the purpose of estimating a parameter.
Estimate is the numerical result obtained by substituting the sample data into a given estimator.
Types of estimates: point estimate and interval estimate.
Point estimate consists of a single figure predicting the parameter value.
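The central limit theorem stated above can be checked by simulation. The sketch below is an illustration under arbitrary assumptions: an exponential population with mean 1 and standard deviation 1, samples of size n = 30, and 5000 repetitions. The exponential distribution was picked only because it is clearly skewed rather than normal.

```python
import random
import statistics

random.seed(2023)  # fixed seed so the run is reproducible

MU, SIGMA = 1.0, 1.0  # mean and SD of an exponential population with rate 1
n, reps = 30, 5000

# Draw many samples of size n from the skewed population; keep each sample mean
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

se = SIGMA / n ** 0.5  # CLT standard error, sigma / sqrt(n)
print(round(statistics.fmean(means), 3))                 # should be close to MU
print(round(statistics.stdev(means), 3), round(se, 3))   # both close to 0.183

# If x-bar is approximately normal, about 95% of the means fall within 1.96 SE of MU
within = sum(abs(m - MU) <= 1.96 * se for m in means) / reps
print(round(within, 3))
```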
Structure of Interval Estimates for Means and Proportions
Point estimate ± Margin of error

Maximum Error of Estimate (n ≥ 30)
When we use $\bar{x}$ as an estimate of µ, the probability is 1 - α that this estimate will be "off" in either direction by at most
$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
The value of $z_{\alpha/2}$ used here is read from the standard normal distribution table for the given confidence level. The critical value $z_{\alpha/2}$ denotes the positive value of z for which the area under the standard normal curve to its right equals α/2 (α is the Greek lowercase letter alpha). The three most commonly used values of 1 - α are 0.90, 0.95, and 0.99:

Level of confidence (1 - α)·100%    90%     95%     99%
Area in each tail, α/2              0.05    0.025   0.005
Critical value, z_{α/2}             1.645   1.96    2.575

Sample size for estimating µ (n ≥ 30)
$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2$
(round the result up to the next whole number).

Interval estimate consists of a numerical range in which the parameter is expected to fall with a certain confidence.
Confidence coefficient is a probability that measures the reliability of an interval estimate.
Confidence level is the confidence coefficient expressed as a percentage.
Confidence interval is an interval estimate calculated with a specified confidence level.
Margin of error is a measure of the error of estimation that involves the given confidence level and sample size.
Precision of a confidence interval is associated with the width of the interval estimate: the smaller the margin of error, the better the precision.

Large-Sample Confidence Interval for µ
The (1 - α)100% confidence interval for µ is
$\bar{x} \pm z_{\alpha/2} \, \sigma_{\bar{x}}$ if σ is known
$\bar{x} \pm z_{\alpha/2} \, s_{\bar{x}}$ if σ is not known
where
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ and $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
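The formulas for $z_{\alpha/2}$, the maximum error E, the required sample size, and the large-sample interval can be sketched in Python with the standard library's `statistics.NormalDist`. The function names are hypothetical helpers, and the inputs in the usage lines (x̄ = 50, s = 8, n = 64, E = 1) are made-up numbers for illustration only.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2}: the positive z with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(sd, n, confidence):
    """E = z_{alpha/2} * sd / sqrt(n); pass s for sd when sigma is unknown."""
    return z_critical(confidence) * sd / math.sqrt(n)

def required_sample_size(sigma, E, confidence):
    """n = (z_{alpha/2} * sigma / E)^2, rounded up to a whole number."""
    return math.ceil((z_critical(confidence) * sigma / E) ** 2)

def z_interval(xbar, sd, n, confidence):
    """Large-sample interval xbar ± E."""
    E = margin_of_error(sd, n, confidence)
    return xbar - E, xbar + E

# Critical values for the three common confidence levels
for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z = {z_critical(c):.3f}")

# Hypothetical data: xbar = 50, s = 8, n = 64, 95% confidence
print(z_interval(50, 8, 64, 0.95))        # about (48.04, 51.96)
print(required_sample_size(8, 1, 0.95))   # 246
```

For 99% confidence Python reports 2.576. Tables often list 2.575, midway between the table entries 2.57 and 2.58, while the more precise value is about 2.5758.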
Confidence interval: An interval like $(\bar{x} - z_{\alpha/2}\sigma_{\bar{x}}, \; \bar{x} + z_{\alpha/2}\sigma_{\bar{x}})$ is called a confidence interval.
Confidence limits: $\bar{x} - z_{\alpha/2}\sigma_{\bar{x}}$ and $\bar{x} + z_{\alpha/2}\sigma_{\bar{x}}$ are called confidence limits.
Degree of confidence: The probability 1 - α is called the degree of confidence.
Significance level: α is called the significance level.
Critical value: $z_{\alpha/2}$ is called a critical value.
If n is at least 30, the use of s is recommended, even if the value of σ is claimed to be known.

Interpretation of a Confidence Interval for a Population Mean
When we form a (1 - α)100% confidence interval for µ, we usually express our confidence in the interval with a statement such as "We can be (1 - α)100% confident that µ lies between the lower and upper bounds of the confidence interval."
For example, with a 95% confidence interval of 0.476 < µ < 0.544, we can state: "We are 95% confident that the interval from 0.476 to 0.544 actually does contain the true value of µ."

Confidence Intervals for Means (Small Samples)
The t distribution
The t distribution is a specific type of bell-shaped distribution with a lower height and a wider spread than the standard normal distribution. As the sample size becomes larger, the t distribution approaches the standard normal distribution. The t distribution has only one parameter, called the degrees of freedom (df). The mean of the t distribution is 0, and its standard deviation (for df > 2) is
$\sqrt{\frac{df}{df - 2}}$
The t distribution is used to make a confidence interval about µ if
1. the population from which the sample is drawn is (approximately) normally distributed,
2. the sample size is small (that is, n < 30), and
3. the population standard deviation σ is not known.
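To see that the t distribution spreads more than the standard normal but approaches it as the degrees of freedom grow, a small Python sketch can tabulate the standard deviation formula for a few df values (the df values chosen are arbitrary).

```python
import math

def t_sd(df):
    """Standard deviation of the t distribution, sqrt(df / (df - 2)), valid for df > 2."""
    if df <= 2:
        raise ValueError("the t distribution has a finite SD only for df > 2")
    return math.sqrt(df / (df - 2))

for df in (3, 5, 10, 29, 100):
    print(df, round(t_sd(df), 3))
```

Every value exceeds 1, the standard deviation of the standard normal distribution, and shrinks toward 1 as df increases.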
Small-Sample Confidence Intervals for Means (µ)
The (1 - α)100% confidence interval for µ is
$\bar{x} \pm t_{\alpha/2} \, s_{\bar{x}}$
where
$s_{\bar{x}} = \frac{s}{\sqrt{n}}$
The value $t_{\alpha/2}$ is obtained from the t distribution table for n - 1 degrees of freedom and the given confidence level.

Population and Sample Proportions
The population and sample proportions, denoted by p and p̂ (pronounced "p hat"), respectively, are calculated as
$p = \frac{X}{N}$ and $\hat{p} = \frac{x}{n}$
where
N = total number of elements in the population
n = total number of elements in the sample
X = number of elements in the population that possess a specific characteristic
x = number of elements in the sample that possess a specific characteristic

Sampling distribution of p̂
The probability distribution of the sample proportion p̂ is called its sampling distribution. It gives the various values that p̂ can assume and their probabilities.

Large-Sample Confidence Interval for p
The (1 - α)100% confidence interval for the population proportion p is
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Approximate maximum error of estimate when $\hat{p} = x/n$ is used to estimate p:
$E = z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Technology Step by Step: Confidence Intervals (TI-83/84 Plus)

Confidence Intervals about µ, σ Known (Large Samples)
Step 1: If necessary, enter the raw data in L1.
Step 2: Press STAT, highlight TESTS, and select 7: ZInterval.
Step 3: If the data are raw, highlight DATA. Make sure List is set to L1 and Freq to 1. If summary statistics are known, highlight STATS and enter the summary statistics. Following σ:, enter the population standard deviation.
Step 4: Enter the confidence level following C-Level:.
Step 5: Highlight Calculate and press ENTER.

Confidence Intervals about µ, σ Unknown (Small Samples)
Step 1: If necessary, enter the raw data in L1.
Step 2: Press STAT, highlight TESTS, and select 8: TInterval.
Step 3: If the data are raw, highlight DATA. Make sure List is set to L1 and Freq to 1.
If summary statistics are known, highlight STATS and enter the summary statistics.
Step 4: Enter the confidence level following C-Level:.
Step 5: Highlight Calculate and press ENTER.

Confidence Intervals about p
Step 1: Press STAT, highlight TESTS, and select A: 1-PropZInt.
Step 2: Enter the values of x and n.
Step 3: Enter the confidence level following C-Level:.
Step 4: Highlight Calculate and press ENTER.
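For readers without a TI-83/84, the calculator commands in these notes have rough Python counterparts, sketched below with the standard library only. The function names mirror the TI menu items but are hypothetical helpers, not a real library API. Because the standard library has no t distribution, `t_interval` expects the critical value $t_{\alpha/2}$ to be read from a t table (for example, 2.262 for 95% confidence with 9 degrees of freedom). The x̄, s, n, and x values in the last usage lines are made-up illustrations.

```python
import math
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper (like TI normalcdf)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mu=0.0, sigma=1.0):
    """Score with the given area to its left (like TI invNorm)."""
    return NormalDist(mu, sigma).inv_cdf(area_left)

def t_interval(xbar, s, n, t_crit):
    """Small-sample interval xbar ± t_crit * s / sqrt(n); t_crit from a t table, n - 1 df."""
    E = t_crit * s / math.sqrt(n)
    return xbar - E, xbar + E

def prop_z_interval(x, n, confidence):
    """p-hat ± z_{alpha/2} * sqrt(p-hat * (1 - p-hat) / n) (like TI 1-PropZInt)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    E = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

# Examples from the notes (-math.inf plays the role of -1E99)
print(round(normalcdf(-math.inf, 35, 40, 10), 4))  # 0.3085
print(round(inv_norm(0.68, 40, 10), 2))            # 44.68

# Hypothetical data: xbar = 20, s = 4, n = 10, t_{0.025} with 9 df = 2.262
print(t_interval(20, 4, 10, 2.262))
# Hypothetical data: 120 successes in 200 trials, 95% confidence
print(prop_z_interval(120, 200, 0.95))
```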