QMM 250 EXAM I KEY

1. A parameter is a descriptive measurement of a population. A statistic is a descriptive measurement of a sample. A statistic is used to estimate its corresponding population parameter. Because there are many samples of a given size that can be drawn from a population, a statistic is a variable, its value depending on which observations are in the particular sample that's been drawn from the population. A parameter depends on all observations in the population and, therefore, a parameter is unique and constant, as there is only one population of interest.

2. A Likert scale is a numerical scale applied to a survey question, ranking the response level from low to high. Typically, the response values are integers and run from 1 to 5, 1 to 7, or 1 to 9. The level of measurement of a Likert scale is, at least, ordinal, i.e., the value recorded reflects the intensity of some attitude or feeling. Many argue, however, that a properly constructed Likert scale question achieves the interval level of measurement, in that the intervals between values are the same and therefore meaningful, while ratios do not have meaning. (Refer to the discussion on pp. 28-29 in the textbook.)

3. The Empirical Rule holds that approximately 68% of observations in a data set fall within one standard deviation of the mean (x̄ ± s), approximately 95% of observations fall within two standard deviations of the mean (x̄ ± 2s), and more than 99% of observations fall within three standard deviations of the mean (x̄ ± 3s). The Empirical Rule works best when the data are symmetrically distributed, with greater data frequencies in the middle of the data range than at the extremes (i.e., when the data have a "bell-shaped" distribution). Thus, to the extent that a given set of data doesn't comply with the Empirical Rule, the discrepancy helps inform the data analyst about the shape of the distribution.

4. The classical approach is used when there are n equally likely simple events in the sample space.
In this instance, the probability of simple event i is P(Eᵢ) = 1/n. The relative frequency approach is used when past history can serve as indicative of a simple event's probability. In this instance, the probability of simple event i is determined as P(Eᵢ) = fᵢ/n, where fᵢ is the frequency of occurrence of event i out of n possible outcomes in the past. The subjective approach is used when neither the classical approach nor the relative frequency approach is applicable to the question at hand. In this instance, one relies on, hopefully, informed/expert judgment to assign probabilities to the simple events in the sample space.

5. a. Answers will vary depending on the number of classes chosen and the class widths. Here's a histogram I think sensible:

[Histogram: "Sales Receipt Distribution" — Sales Receipts ($) on the horizontal axis, in $100 classes from $500 to $1,400; Frequency on the vertical axis.]

I used the common-sense approach in setting the class widths. Specifically, intervals of $100 seemed to make the most sense for this particular variable. $100 intervals in turn imply 9 classes given the range of the data.

b. Using the histogram, sales receipts appear to typically be about $1,000 per day. In terms of dispersion, receipts range between $800 and $1,200 a majority of days (3/4 to be specific), but are as low as $500 and as high as $1,400 at the extremes. The distribution of receipts in this histogram is bimodal and isn't particularly symmetric.

c. The modal class is the class in the histogram with the most observations. That's the $901-$1,000 class in my histogram. As noted above, however, my histogram has a second mode (with lower frequency) in the $1,101-$1,200 range.

d.

Sales Receipts ($)   Frequency   Cumulative Frequency   Cumulative Relative Frequency
< 600                    1            1                     0.025
601-700                  1            2                     0.050
701-800                  2            4                     0.100
801-900                  7           11                     0.275
901-1000                11           22                     0.550
1001-1100                4           26                     0.650
1101-1200                8           34                     0.850
1201-1300                4           38                     0.950
1301-1400                2           40                     1.000

e.
The ogive plots the cumulative relative frequencies of the sales receipt data. It shows the variable's cumulative distribution. The ogive can be used to determine what proportion/percentage of sales receipts fall below a particular value. For example, the picture tells me that sales receipts were less than $900 on about 25% of the days. Likewise, I can infer from the ogive that sales fell below about $1,125 on 75% of the days. Note that I've just used the ogive to approximate the first and third quartiles of the data. I can also use the ogive to infer indirectly the proportion of values that fall above a given value. For example, the ogive shows that sales receipts fell below $1,200 about 90% of the days. It stands to reason, therefore, that sales exceeded $1,200 on 10% of the days in the sample.

6. x_min = 12, Q₁ = x_(12.75) = 45, Q₂ = x_(25.5) = 69, Q₃ = x_(38.25) = 85.25, x_max = 150

According to the boxplot, the typical manager earns $69,000 per year. The middle 50% of the managers earn between $45,000 and $85,250. A few managers earn considerably less than $45,000 (earning as little as $12,000) and some earn considerably more than $85,250 (as much as $150,000). The earnings data are fairly symmetric over the inter-quartile range, but there is also considerable right skewness in the upper end of the distribution, as evidenced by the long right whisker of the boxplot.

7. a. x̄ = 15, median = 10, s² = 283.2, s = 16.829. The sample variance is an estimate of the population variance, which is the average squared deviation of the observations from the mean in the population. The sample variance, by virtue of the way it's calculated (the sample sum of squared deviations from the mean is divided by n − 1 rather than by n), is always slightly larger than the average squared deviation from the mean in the sample.
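The point about the n − 1 divisor can be checked numerically. Here is a minimal Python sketch using the standard library's statistics module; the data are hypothetical (a small sample with one outlier, in the spirit of the exam problem — the exam's actual observations aren't reproduced in this key):

```python
import statistics

# Hypothetical sample (NOT the exam's data): several values near the
# mean and one large outlier.
data = [4, 8, 9, 11, 12, 46]

xbar = statistics.mean(data)             # sample mean
s2 = statistics.variance(data)           # divides by n - 1 (sample variance)
avg_sq_dev = statistics.pvariance(data)  # divides by n (in-sample average squared deviation)

# The n - 1 divisor makes the sample variance exceed the in-sample
# average squared deviation, exactly as noted above.
assert s2 > avg_sq_dev
```

The same run also illustrates the outlier's pull on both x̄ and s: dropping the last value shrinks both statistics dramatically.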
The sample standard deviation estimates the population standard deviation, which measures the typical amount of difference between the observations and the mean of the data. At 16.8, the sample standard deviation seems to be suggesting a lot more dispersion in the data than a cursory inspection of the sample values would suggest (five of the observations are within 13 units of the mean in absolute terms). So the last value clearly exerts a lot of influence on s as well as on x̄.

b. The median. The mean is unduly influenced by the outlying last observation.

8. a. P(A ∪ B) = 0.8 by the general law of addition.

b. A and B are not collectively exhaustive because P(A ∪ B) < 1.

c. P(A|B) = P(A ∩ B)/P(B) = .4/.6 = .667.

d. A and B are dependent because .6 = P(A) ≠ P(A|B) = .667.

9. a.

                      Age Group
Outcome          18-34   35-59   > 60    Row Total
Success (S)      .085    .220    .005    .310
Fail (F)         .265    .390    .035    .690
Column Total     .350    .610    .040    1.0

b. P(S) = .31. Obtained from the table by recognizing that this probability is the marginal probability for the Success (S) row in the table. P(S) is the unconditional probability of successfully climbing Mt. Everest.

c. P(S|Age 18-34) = P(S ∩ Age 18-34)/P(Age 18-34) = .085/.350 = .243. The conditional probability of success given that the climber is in the 18 to 34 age group is .243, which, notably, is somewhat below the unconditional probability of success.

d. P(Age ≥ 60) = .04. Obtained from the table by recognizing that this probability is the marginal probability for the > 60 age column.

e. P(Age ≥ 60|S) = P(S ∩ Age ≥ 60)/P(S) = .005/.310 = .016. The conditional probability that a climber is aged 60 or over given that the climber successfully scaled Mt. Everest is only 1.6%. Said alternatively, of the pool of all successful climbers, only 1.6% of them are 60 or older; therefore, 98.4% of successful climbers are younger than 60.

f. Age and success are dependent. Parts b and c show this as .31 = P(S) ≠ P(S|Age 18-34) = .243.
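The marginal and conditional probabilities in parts b through e can all be recovered mechanically from the joint table in part a. A quick Python sketch (the dictionary layout and variable names are my own; the probabilities are the table's):

```python
# Joint probabilities from the table in part a: keys are (outcome, age group).
joint = {
    ("S", "18-34"): 0.085, ("S", "35-59"): 0.220, ("S", ">=60"): 0.005,
    ("F", "18-34"): 0.265, ("F", "35-59"): 0.390, ("F", ">=60"): 0.035,
}

# Marginal probabilities: sum the joint probabilities across a row or column.
p_S = sum(p for (out, _), p in joint.items() if out == "S")          # P(S)
p_18_34 = sum(p for (_, age), p in joint.items() if age == "18-34")  # P(Age 18-34)

# Conditional probabilities: joint probability divided by the marginal.
p_S_given_18_34 = joint[("S", "18-34")] / p_18_34   # P(S | Age 18-34)
p_old_given_S = joint[("S", ">=60")] / p_S          # P(Age >= 60 | S)

assert round(p_S, 2) == 0.31
assert round(p_S_given_18_34, 3) == 0.243
assert round(p_old_given_S, 3) == 0.016
```

The asserts confirm the hand calculations: the marginal P(S) differs from the conditional P(S|Age 18-34), which is the dependence argument of part f.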
Likewise, parts d and e reveal dependence in that .04 = P(Age ≥ 60) ≠ P(Age ≥ 60|S) = .016.

10. a. A false positive occurs when the test shows positive given that the applicant is not a drug user, i.e., P(P|A′) = .1. A false negative occurs when the test is negative given that the applicant is a drug user, i.e., P(P′|A) = .04.

b and c. We're given that the unconditional probability of drug use in the applicant pool is P(A) = .15, which, in turn, implies the unconditional probability that an applicant is not a drug user, P(A′) = 1 − P(A) = 1 − .15 = .85. Furthermore, the false positive probability implies that P(P′|A′) = 1 − P(P|A′) = 1 − .1 = .9. This is the probability that the test comes back negative given that the applicant is not a drug user. Finally, the false negative probability implies P(P|A) = 1 − P(P′|A) = 1 − .04 = .96. This is the probability that the test is positive given that the applicant is a drug user. Accordingly, the probability tree and corresponding joint probabilities (obtained using the general law of multiplication) are as follows:

[Probability tree: first branch A (.15) vs. A′ (.85); second branch P vs. P′ with P(P|A) = .96, P(P′|A) = .04, P(P|A′) = .10, P(P′|A′) = .90. Joint probabilities: P(A ∩ P) = .144, P(A ∩ P′) = .006, P(A′ ∩ P) = .085, P(A′ ∩ P′) = .765.]

d. The question here is asking for the conditional probability that an applicant is a drug user given that they test positive. This is P(A|P) = P(A ∩ P)/P(P) = P(A ∩ P)/[P(A ∩ P) + P(A′ ∩ P)]. The expression after the last equality is Bayes' Theorem. Using Bayes' Theorem and the joint probabilities from part c yields P(A|P) = .144/(.144 + .085) = .629.

e. The prior probability is the unconditional probability of drug use in the applicant pool, P(A) = .15. In other words, there's a 15% chance that a randomly selected applicant is a drug user. The posterior probability is the conditional probability that an applicant is a drug user given that the test is positive, i.e., P(A|P) = .629. Drug use is not generally rampant in the applicant pool, but there's a much higher probability of drug use among those who do test positive. Why is this latter probability not equal to 1 or something close to 1?
Because everyone gets tested, there's a nontrivial false positive rate at 10%, and most of those tested are not drug users. So it stands to reason that the complement of P(A|P), which is P(A′|P), must be well above zero. Specifically, of those who test positive, 1 − .629 = .371 aren't drug users. Again, this follows from the high false positive rate and the fact that the large majority of those tested aren't drug users. The drug test isn't foolproof, but its results should serve as a red flag in the applicant screening process.
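The tree and Bayes' Theorem calculations in parts b through d can be sketched in a few lines of Python (variable names are mine; the probabilities are the ones given in the problem):

```python
# Given probabilities from the problem.
p_A = 0.15               # prior: applicant is a drug user, P(A)
p_pos_given_A = 0.96     # P(P|A) = 1 - false negative rate (.04)
p_pos_given_notA = 0.10  # P(P|A') = false positive rate

# Joint probabilities via the general law of multiplication.
p_A_and_pos = p_A * p_pos_given_A               # P(A and P)  = .15 * .96
p_notA_and_pos = (1 - p_A) * p_pos_given_notA   # P(A' and P) = .85 * .10

# Bayes' Theorem: posterior probability of drug use given a positive test.
p_A_given_pos = p_A_and_pos / (p_A_and_pos + p_notA_and_pos)

assert round(p_A_given_pos, 3) == 0.629
```

Changing p_A shows why the posterior isn't close to 1: with a low prior and a 10% false positive rate, the A′-branch contributes a large share of the positive tests.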