Review of MGT 2110

Descriptive Statistics
Probability distribution
Estimation (Confidence interval)
Inference (Hypothesis testing)


Descriptive Statistics

Numerical measures
  o Mean, Median, Mode
  o Variance and standard deviation
  o Percentiles
  o Quartiles and Interquartile Range
  o Frequency distribution (use the FREQUENCY array function)

Graphical Presentations
  o Histogram
  o Scatter Diagram (for two columns of data)


Probability Distribution

Random Variable (RV): A numerical description of the outcome of an experiment.

Discrete RV: A random variable that can take a countable set of values. For instance, if an experiment consists of inspecting 10 laptops produced by a manufacturer, then a random variable X can be defined as the number of defective laptops in the lot. The possible values for X are the whole numbers from 0 to 10.

Continuous RV: A random variable that can take an uncountable range of values. For instance, if an experiment consists of measuring the amount of toothpaste in a 6 oz. tube, then a random variable X can be defined as the amount of toothpaste in a tube. The possible values for X could be any value between 5.8 oz. and 6.2 oz. The values within the range are not countable.

Probability Distribution: A description of how the probabilities are distributed over the values the random variable can assume.
  o The probability distribution for a discrete RV is called a discrete probability distribution.
  o The probability distribution for a continuous RV is called a continuous probability distribution.

Continuous probability distribution:

Normal Probability Distribution: A continuous probability distribution. The normal distribution is a symmetrical distribution with a mean, μ, and a standard deviation, σ.

Example
A department store has determined that its customers charge an average of $500 per month, with a standard deviation of $80. Assume the amounts of charges are normally distributed.
  a. What percentage of customers charge less than $340 per month?
  b. What percentage of customers charge more than $380 per month?
  c. What percentage of customers charge between $644 and $700 per month?
  d. What is the least dollar amount of the top 10% of customer charges?
  e. What are the minimum and maximum of the middle 95% of customer charges?

Four Excel functions for answering the above questions

To find probabilities using the normal distribution:
  =NORM.S.DIST(z,1)       z must first be calculated before using this function.
                          Returns the cumulative probability for z.
  =NORM.DIST(X,μ,σ,1)     Returns the cumulative probability for X.

To find the value of X, given a normal probability:
  =NORM.S.INV(probability)    Returns the Normal table value of z.
                              Then, X may be computed using X = μ + zσ.
  =NORM.INV(probability,μ,σ)  Returns the value of X for the given cumulative probability.


Estimation (Confidence Interval)

Confidence Interval for population mean (μ)

Assume a simple random sample of size n.

Point Estimation:
                        Sample Statistic    Population Parameter
  Size                  n                   N
  Mean                  x̄                   μ
  Standard deviation    S                   σ

Confidence Interval for μ = x̄ ± SE
SE = Sampling Error = t_{α/2} × S/√n    (Always use t; use Z only if σ is known)
Then, Confidence interval for μ = x̄ ± t_{α/2} × S/√n

Two methods for calculating the confidence interval

Method A: Using the Excel T.INV.2T function
  Step 1  Find the t-table value using the Excel function =T.INV.2T(α,df)
          α = 1 − Confidence level
          df = degrees of freedom = n − 1
  Step 2  Determine the sampling error (SE)
          SE = t_{α/2} × S/√n
  Step 3  Calculate the lower and upper limits of the confidence interval
          LL = x̄ − SE
          UL = x̄ + SE

Method B: Using the Excel Data Analysis command
  Step 1  Run the Descriptive Statistics command from the Data Analysis command with "Confidence Level for Mean" checked.
          The output includes the sampling error, which is the last item of the output table, labeled Confidence Level.
  Step 2  Calculate the lower and upper limits of the confidence interval
          LL = x̄ − SE
          UL = x̄ + SE

Example 1
A sample of 100 cans of coffee showed an average weight of 13 ounces with a standard deviation of 0.8 ounces. Develop and interpret a 98% confidence interval for the mean weight of coffee in the cans.

Example 2
For the Net Income as a % of equity, develop and interpret a 97% confidence interval for the mean.

Confidence Interval for population proportion (p)

Assume a simple random sample of size n.

Point Estimation:
                Sample Statistic    Population Parameter
  Size          n                   N
  Proportion    p̄                   p

Confidence Interval for p = p̄ ± SE
Estimated Sampling Error (SE) = z_{α/2} × √(p̄(1 − p̄)/n)
Then, Confidence interval for p = p̄ ± z_{α/2} × √(p̄(1 − p̄)/n)

  Step 1  Find the z-table value using the Excel function =NORM.S.INV(1 − α/2)
  Step 2  Determine the standard error estimate
          √(p̄(1 − p̄)/n)
  Step 3  Determine the sampling error (SE)
          SE = z_{α/2} × √(p̄(1 − p̄)/n)
  Step 4  Calculate the lower and upper limits of the confidence interval
          LL = p̄ − SE
          UL = p̄ + SE

Example
In a poll, 600 voters were asked whether they were in favor of eliminating plastic bags in grocery stores. 390 of the voters were in favor and 210 of the voters were opposed. Develop a 92% confidence interval estimate for the proportion of all the voters who are opposed to the proposal.


Inference (Hypothesis Testing)

Step 1: Set up the null and the alternative hypotheses.

Three types of hypotheses

  Type          For population mean μ    For population proportion p
  Two-tailed    H0: μ = a                H0: p = p0
                Ha: μ ≠ a                H1: p ≠ p0
  One-tailed    H0: μ ≤ a                H0: p ≤ p0
                Ha: μ > a                H1: p > p0
  One-tailed    H0: μ ≥ a                H0: p ≥ p0
                Ha: μ < a                H1: p < p0

Step 2: Decision rule for testing the hypotheses

Possible results of a Hypothesis Test

                    H0 is true          H0 is false
  H0 is accepted    Correct decision    Type II error
  H0 is rejected    Type I error        Correct decision

Decision rule: Reject H0 if the probability of type I error ≤ α, where α = level of significance, i.e. the maximum tolerable value for the probability of type I error up to which H0 can be rejected.

Note: Probability of type II error = β

Step 3: Compute the p-value and reject H0 if p-value ≤ α.

Case 1: For hypotheses about μ, use the t-distribution for the p-value
  p-value = T.DIST.2T(abs(t),df) for a two-tailed test
          = T.DIST.RT(abs(t),df) for a one-tailed test
  Where t = (x̄ − a) / (S/√n) and df = degrees of freedom = n − 1.

Case 2: For hypotheses about p, use the z-distribution for the p-value
  p-value = 1 - NORMSDIST(abs(z)) for one-tailed tests
  p-value = 2*(1 - NORMSDIST(abs(z))) for two-tailed tests
  Where z = (p̄ − p0) / √(p0(1 − p0)/n).

Example 1:
A sample of 81 account balances of a credit company showed an average balance of $1,200 with a standard deviation of $126. Determine if the mean of all account balances is significantly different from $1,150. Use a .05 level of significance.

Example 2:
It is assumed that at least half the membership of a national trade union is female. A random sample of 400 members showed 168 women. Does the sample show that the proportion of women among the membership is less than 50%? Use a .05 level of significance for this hypothesis test.

Example 3:
It is normally assumed that the net income as % equity for the companies in the population is no more than 13%. However, test whether the sample data shows that the net income as % equity for the companies in the population is now greater than 13%. Use a .01 level of significance.


When to use .INV and .DIST functions

Use .INV to find table values for confidence intervals only.
Use .DIST to find p-values for hypothesis testing only.

Using .INV functions for Confidence Interval
  If Sigmas are known:
    Table value Z_{α/2} = NORM.S.INV(cell containing the value of 1 − α/2)
  If Sigmas are unknown:
    Table value t_{α/2} = T.INV.2T(α,df)

Using .DIST functions for Hypothesis testing
  If Sigmas are known:
    Step 1: Find Z using the formula (don't use functions like NORM...)
    Step 2: p-value for 2-tailed test: (1-NORM.S.DIST(ABS(Z-calculated),1))*2
            p-value for 1-tailed test: (1-NORM.S.DIST(ABS(Z-calculated),1))
  If Sigmas are unknown:
    Step 1: Find t using the formula (don't use functions like T.INV or T.DIST...)
    Step 2: p-value for 2-tailed test: T.DIST.2T(ABS(t-calculated),df)
            p-value for 1-tailed test: T.DIST.RT(ABS(t-calculated),df)
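
Cross-checking the z-based calculations outside Excel

The handout works entirely in Excel. As an optional cross-check, the z-based steps above can be sketched in Python using only the standard library. This is a minimal sketch, not part of the course method: the function names (normal_example, proportion_ci, proportion_p_value) are hypothetical, NormalDist.cdf is assumed to play the role of NORM.DIST / NORM.S.DIST with the cumulative flag set, and NormalDist.inv_cdf the role of NORM.INV / NORM.S.INV. The t-based calculations are left out because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def normal_example(mu=500.0, sigma=80.0):
    """Department store example, parts a to e (charges ~ Normal(mu, sigma))."""
    charges = NormalDist(mu, sigma)
    return {
        "a_below_340": charges.cdf(340),                       # like NORM.DIST(340, mu, sigma, 1)
        "b_above_380": 1 - charges.cdf(380),                   # upper tail = 1 - cumulative
        "c_644_to_700": charges.cdf(700) - charges.cdf(644),   # difference of cumulatives
        "d_top_10_cutoff": charges.inv_cdf(0.90),              # like NORM.INV(0.90, mu, sigma)
        "e_middle_95": (charges.inv_cdf(0.025), charges.inv_cdf(0.975)),
    }


def proportion_ci(count, n, confidence):
    """Confidence interval for p: p_bar +/- z_{alpha/2} * sqrt(p_bar(1 - p_bar)/n)."""
    alpha = 1 - confidence
    p_bar = count / n
    z = Z.inv_cdf(1 - alpha / 2)                 # like NORM.S.INV(1 - alpha/2)
    se = z * sqrt(p_bar * (1 - p_bar) / n)       # sampling error (SE)
    return p_bar - se, p_bar + se


def proportion_p_value(count, n, p0, tails=1):
    """z statistic and p-value for a hypothesis about p (Case 2 formulas)."""
    p_bar = count / n
    z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)
    # As in the handout, abs(z) is used; for a one-tailed test this assumes
    # the sample result falls on the side named in the alternative hypothesis.
    return z, tails * (1 - Z.cdf(abs(z)))


if __name__ == "__main__":
    for part, value in normal_example().items():
        print(part, value)
    # Proportion example: 210 of 600 voters opposed, 92% confidence
    print("CI for p:", proportion_ci(210, 600, 0.92))
    # Hypothesis Example 2: 168 women of 400, H0: p >= 0.5, alpha = .05
    z, p_value = proportion_p_value(168, 400, 0.5, tails=1)
    print("z =", z, "p-value =", p_value, "reject H0:", p_value <= 0.05)
```

The three functions mirror the handout's structure: cumulative probabilities and inverse lookups for the normal example, the four-step interval for a proportion, and the Case 2 p-value formulas. Results should agree with the Excel functions up to rounding.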