Population Mean
Mean = (sum of the values) / (number of values), when every value is equally likely.

Computing the population/sample mean:
1. Collect the data.
2. Sum all the values in the population/sample.
3. Divide the sum by the number of elements in the population/sample.

Median
The median is the center value that divides a sorted list of data into two halves.

Data Array
Data that have been arranged in numerical order.

Mode
The value in a data set that occurs most frequently.

Percentile Location Value
i = (p/100)(n + 1)
p = desired percentile
n = number of values in the data set
The pth percentile in a data array is a value that divides the data set into two parts. The lower segment contains at least p%, and the upper segment contains at least (100 – p)%, of the data. The 50th percentile is the median.

Box and Whisker Plots
1. Sort the data values from low to high.
2. Find the 25th percentile (first quartile, Q1), the 50th percentile (median), and the 75th percentile (third quartile, Q3).
3. Draw a box with its ends at Q1 and Q3. This box will contain the middle 50% of the data values in the population or sample.
4. Draw a vertical line through the box at the median. Half the data values in the box will be on either side of the median.
5. Calculate the interquartile range (IQR = Q3 – Q1). Compute the lower limit for the box and whisker plot as Q1 – 1.5(Q3 – Q1) and the upper limit as Q3 + 1.5(Q3 – Q1). Any data values outside these limits are referred to as outliers.
6. Extend dashed lines (called the whiskers) from each end of the box to the lowest value (on the left) and the highest value (on the right) within the limits.
7. Mark any value outside the limits (an outlier) found in step 5 with an asterisk (*).

Range
R = maximum value – minimum value

Interquartile Range
IQR = Q3 – Q1

Variance
The population variance is the average of the squared distances of the data values from the mean.
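The descriptive measures so far (mean, median, mode, percentile location, box-plot limits, range, population variance) can be sketched in R as follows. This is only a rough illustration: the data vector x is made up, and quantile() is called with type = 6 because that rule interpolates at position (p/100)(n + 1), matching the percentile location formula in these notes; R's default (type = 7) uses a different rule and may give slightly different quartiles.

```r
# Hypothetical data, chosen only for illustration
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 13, 15, 40)
n <- length(x)

x_mean   <- sum(x) / n            # same result as mean(x)
x_median <- median(x)
# Most frequent value; if several values tie, this returns the smallest
x_mode   <- as.numeric(names(which.max(table(x))))

# Percentile location i = (p/100)(n + 1), here for the 25th percentile
i_25 <- (25 / 100) * (n + 1)

# type = 6 interpolates at position (p/100)(n + 1)
q   <- quantile(x, probs = c(0.25, 0.75), type = 6, names = FALSE)
q1  <- q[1]
q3  <- q[2]
iqr <- q3 - q1

# Box-and-whisker limits; values outside them are flagged as outliers
lower_limit <- q1 - 1.5 * iqr
upper_limit <- q3 + 1.5 * iqr
outliers    <- x[x < lower_limit | x > upper_limit]

data_range <- max(x) - min(x)

# Population variance: average squared distance from the mean (divide by n)
pop_var <- sum((x - x_mean)^2) / n
```

Note that R's built-in var() divides by n – 1, so it should not be used for the population variance computed here.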
The sample variance is the “average” (dividing by n – 1 instead of n) of the squared distances of the data values from the sample mean (the “residuals”).

Standard Deviation
The positive square root of the variance.

Coefficient of Variation
CV = (SD / mean) × 100%

Standardized Data Values (Z scores)
1. Compute the population mean and SD, or the sample mean and SD.
2. Use these formulas:
   For populations: Z = (x – population mean) / population SD
   For samples: Z = (x – sample mean) / sample SD

Using Tree Diagram

Independent Events
Two events are independent if the occurrence of one event in no way influences the probability of the occurrence of the other event.

Probability Rule
P(E1 or E2) = P(E1) + P(E2) – P(E1 and E2)
For two mutually exclusive events: P(E1 or E2) = P(E1) + P(E2)

Conditional Probability
P(E1 | E2) = P(E1 and E2) / P(E2)
It reads “probability of event E1 given that event E2 has occurred.” The sample space is reduced to E2, and you find the elements in E1 that are also in E2.

Conditional Probability for Independent Events
P(E1 | E2) = P(E1) and P(E2 | E1) = P(E2)

Binomial
Use R's pbinom to find the probability of q or fewer successes, where size = number of trials and prob = probability of a success at each trial:
pbinom(q, number_of_trials, probability_of_success)
For the probability of exactly x successes, use:
dbinom(x, number_of_trials, probability_of_success)
Expected value for the binomial = number_of_trials × probability_of_success

Poisson
Number of successes when number_of_trials is very large and the probability of a success is very small.
λ = number_of_trials × prob_of_success = expected number of successes
Use R's dpois(x, lambda) = probability of x when the expected value is lambda.

Normal Distribution
Use R's pnorm.

Sample Error
Sample error = sample mean – population mean
Standard error = SD of the sample error = population SD / square root of n
This is the SD of the sampling distribution of the sample mean.
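The R functions named above can be tried out as follows. This is a minimal sketch with made-up numbers: the 10 trials, the success probability of 0.3, the Poisson inputs, and the population SD of 12 are all assumptions chosen for illustration, not values from these notes.

```r
# Binomial: hypothetical 10 trials, success probability 0.3
n_trials  <- 10
p_success <- 0.3
p_at_most_2 <- pbinom(2, size = n_trials, prob = p_success)   # P(X <= 2)
p_exactly_2 <- dbinom(2, size = n_trials, prob = p_success)   # P(X = 2)
binom_expected <- n_trials * p_success                        # expected value

# pbinom is the running total of dbinom, so this should equal p_at_most_2
p_check <- sum(dbinom(0:2, size = n_trials, prob = p_success))

# Poisson: many trials, small success probability (hypothetical values)
lambda   <- 1000 * 0.002          # expected number of successes
p_pois_3 <- dpois(3, lambda)      # P(X = 3)

# Normal: P(Z <= 1.96) for a standard normal variable
p_norm <- pnorm(1.96)

# Standard error of the sample mean: population SD / sqrt(n)
pop_sd    <- 12
n         <- 36
std_error <- pop_sd / sqrt(n)
```

pnorm also accepts mean and sd arguments, so pnorm(x, mean, sd) can be used directly instead of converting x to a Z value first.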
To find probabilities associated with the sampling distribution of xbar for samples of size n from a population with known mean and SD (if the population is normal or if n is large):
1. Compute the sample mean.
2. Define the sampling distribution:
   Mean of the sample mean = population mean
   SD of the sample mean = population SD / square root of n
3. Define the event of interest.
4. Express it in terms of a Z value = (sample mean – population mean) / (SD of the sample mean) and use pnorm to get the probability.

Sample Proportion
1. “Find” p (the true probability).
2. Find pbar.
3. Find the SD of pbar:
   If we have p: sqrt(p(1 – p) / n) [Hypothesis testing] (6.10)
   If only pbar: sqrt(pbar(1 – pbar) / n) [Confidence intervals]
4. Define the event of interest.
5. Find the Z value.
6. Use pnorm.

Confidence Interval Calculation
Point estimate +/- (critical value (Z or t)) × (standard error of the estimate)

Developing a confidence interval estimate for a population proportion
1. Define the population of interest and the variable from which to estimate the population proportion.
2. Determine the sample size and select a simple random sample.
3. Specify the level of confidence and obtain the critical value from qnorm or qt (in R).
4. Calculate pbar, the sample proportion.
5. Construct the interval estimate.

One-tailed test for a hypothesis about a population mean, SD known, large samples
1. Specify the population value of interest.
2. Formulate the null hypothesis and the alternative hypothesis in terms of the population mean.
3. Specify the desired significance level.
4. Construct the rejection region.
5. Compute the test statistic.
6. Draw the conclusion.

© T. Lau 2007
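The confidence interval and one-tailed test procedures in these notes can be sketched in R as follows. Every number here is hypothetical (120 successes in 400 observations, a hypothesized mean of 50, a known population SD of 8, and so on), and the test is assumed to be upper-tailed; a lower-tailed test would flip the inequality and use qnorm(alpha).

```r
# Confidence interval for a population proportion
# Hypothetical sample: 120 successes in 400 observations
n    <- 400
pbar <- 120 / n

conf_level <- 0.95
z_crit  <- qnorm(1 - (1 - conf_level) / 2)    # two-sided critical value
se_pbar <- sqrt(pbar * (1 - pbar) / n)        # uses pbar because p is unknown
ci <- c(pbar - z_crit * se_pbar, pbar + z_crit * se_pbar)

# One-tailed (upper-tail) test for a mean, population SD known
# H0: mu <= 50 versus HA: mu > 50 (hypothetical values)
mu0    <- 50
pop_sd <- 8
n_mean <- 64
xbar   <- 52.5
alpha  <- 0.05

z_cutoff  <- qnorm(1 - alpha)                       # rejection region: z > z_cutoff
z_stat    <- (xbar - mu0) / (pop_sd / sqrt(n_mean)) # test statistic
reject_h0 <- z_stat > z_cutoff                      # conclusion
```

With a small sample and an unknown population SD, the critical value would come from qt(1 - alpha, df = n - 1) instead of qnorm.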