
Math 3307 Lecture Notes
Perkowsky text
Monday format May’13 Jan. 2015
Chapters 8 – 10

Homework Assignments (10 points each problem)

Homework 6   Chapter 8    70 points   2, 4, 6, 8, 10, 14, 16
Homework 7   Chapter 9    40 points   2, 4, 6, 12
Homework 8   Chapter 10   60 points   2, 4, 6, 10, 12, 16

Homework style sheet and rules: Work on one side only; pdf it and upload it before the deadline on the calendar. Work that is poorly scanned or illegible will be given a zero. This includes sideways or upside down scans! Do NOT crowd the work; leave at least 3” between problems. Label the answers carefully so the grader can grade efficiently.

Chapter 8 – Distributions from Random Samples

8.1 Random Sampling

Let’s go with the book’s comment about defining “random” by what it’s NOT: systematic, logical, having a clear pattern or order. In statistics, random has to do with the process of picking a sample: each element in the population is equally likely to be chosen.

Let’s look at Classroom Exploration 8.1 on page 235. Let’s read it. Will there be repetitions in the scenario?

Plan A: 24 cards, one name per card
Plan B: roll a die; the number on top is the row number

For Plan A: how many possible samples? Equally likely?
For Plan B: how many possible samples? Equally likely?

Question 3 and Question 4: Picking Amy?

Let’s now read page 237 at the top: an excerpt…

Note that in this part of the class we are doing inferential statistics. We want to infer some conclusion about the population from our work, and we want to quantify how reliable this conclusion is.

Now let’s read the Focus on Understanding project that starts on page 237, and check out the results from doing it on page 240. What do you notice about the dot plots? What can you conclude about small samples vs bigger samples? Note that we look at a range of values for the mean. Why do we do this? What are we trying to ensure by doing this? See the middle of page 241 for a discussion of these ideas.
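The small-sample versus bigger-sample comparison can also be explored with a short simulation. The sketch below is only an illustration: the population (the faces of a fair die), the sample sizes 5 and 50, and the helper name `sample_means` are assumptions made for this example, not data from the textbook.

```python
import random
import statistics

def sample_means(population, n, num_samples, seed=0):
    """Draw num_samples random samples of size n (with replacement)
    and return the list of their means. Hypothetical helper."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.mean(rng.choices(population, k=n))
            for _ in range(num_samples)]

# Assumed population: the six faces of a fair die (true mean 3.5).
population = [1, 2, 3, 4, 5, 6]

small = sample_means(population, n=5, num_samples=1000)
large = sample_means(population, n=50, num_samples=1000)

# Spread of the sample means for each sample size.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print("n = 5 : mean of sample means", round(statistics.mean(small), 3),
      "spread", round(spread_small, 3))
print("n = 50: mean of sample means", round(statistics.mean(large), 3),
      "spread", round(spread_large, 3))
```

Both sets of sample means should center near the true mean, while the means from the bigger samples cluster much more tightly, which is the pattern to look for in the dot plots.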
8.2 The Distribution of Sample Means

The mean of a random sample is an estimator of the true population mean. It can be a good estimate or a poor estimate. We want to ensure that it’s a good one! How can we do this?

A. We want the mean to be unbiased.
We can check this by finding the expected mean of the SAMPLE means. If the expected mean is the true mean, then the sample mean is unbiased. Operationally, the more perfectly random your samples, the more unbiased your sample means are.

B. We want a large sample size, not a small one.
Operationally, n = 30 is the usual minimum sample size, but more is better if you can afford it!

When we have these, the distribution of sample means is approximately normally distributed about the true mean, \(\mu\). This is so important! And it took so long to discover!

Page 249, The Central Limit Theorem: Regardless of the distribution of the population being sampled, the distribution of sample means taken from random samples of size n is approximately normally distributed when n is large.

See the caution on page 240 at the bottom of the last paragraph.

The mean of the sample means is the true population mean, and the standard deviation is the population standard deviation divided by the square root of n:

\(\mu_{\bar{x}} = \mu\)
\(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\)

Let’s discuss that standard deviation:
Suppose n is small.
Suppose n is large.

Now compare the two dot diagrams on page 240 again.

So now, suppose we have 50 samples (random!) and we calculate the mean of each. We then have a list of sample means as our data. We find the mean of these sample means and the standard deviation of these sample means.

What do we know about the original population? We know the means are the same, and we can multiply our standard deviation by \(\sqrt{n}\) (n is the size of each sample) to get the original population standard deviation. Do you see how?

What DON’T we know? The shape of the original distribution!

ACTIVITIES 8 #1

Let’s look at the example on page 250: Back to Nicky’s free throws! Recall her distribution (page 250; the mean is .96).
Now we’ll look at a simulation of size 50. Let’s go through the calculations to find the mean and standard deviation for the distribution of the sample means. How do you find the mean and the standard deviation? What are the formulas? WHERE are the formulas in the textbook?

Now let’s walk through Lauren’s simulation of doing 50 free throws and calculate the probability that Lauren’s sample mean will be within .1 of the actual mean. See page 251.

Suppose we do this 4 times and take the AVERAGE mean from those 4 attempts. Will this be more accurate than doing it just once? Why or why not?

What we are doing here with that “0.1” is finding an error bound or margin of error. The probability that our estimate is within the given error bound is what we calculated in this example. The probability is called the “confidence level” of our estimate. For a fixed error bound, the confidence level of an estimate goes up as n increases.

Let’s review our procedure from a Big Picture viewpoint.
We got our sample and calculated the mean.
We then went to z-scores* to find the probability “between”.
We used Table 1 or our calculators to get the probability.
We described our confidence level in our estimate.
*And we used the distribution of the SAMPLE MEANS, not the original distribution, in our calculations!

ACTIVITIES 8 #2

SD – Problem 1
A company that specializes in data analysis tests all its applicants for employment by having them solve three short problems that are indicative of the type of work they will be required to perform. An applicant is given a score from 0 to 10 for each problem. From the performances of previous applicants, the sampling distribution of mean scores has been found to be as shown in the table below. Sketch this distribution on the right:

Mean   Prob
0      .001
1      .005
2      .010
3      .045
4      .060
5      .100
6      .150
7      .350
8      .200
9      .070
10     .009

Use your calculator to find the mean (6.570) and standard deviation (1.63).
Page numbers for formulas:
Check the Empirical Rule on your distribution.
What is the z-score for 8?

SD Problem 2
The mean number of patients admitted per day to a medium-sized regional hospital is 35 with a standard deviation of 10. If, on a given day, there are 60 beds available for new patients, do you think the hospital will have to divert emergency vehicles to another hospital?

SD Problem 3
The sampling distribution of X, the number of people who arrive at a cashier’s counter in a bank per minute, is given below:

X    P(X)
0    .36
1    .38
2    .18
3    .06
4    .02

Verify the Empirical Rule.

ACTIVITIES 8 #3

8.3 The Distribution of Sample Proportions

Proportions have a place in statistics. We often use a sample proportion from a random sample to estimate the true proportion of a population that has a specific property.

Let’s look at Classroom Exploration 8.3 on page 253…
Let’s look at “drawing more blocks” on page 254…
And look at the proportion on the bottom of page 254 to see how this differs a bit from a sample mean.

Class discussion: What are the differences?

“Hat” or caret notation is discussed on page 255 at the top. We have special notation to use when we are talking about a sample proportion: \(\hat{p}\).

The expected value of “p-hat”: page 256
The standard deviation of “p-hat”: page 258
The distribution: no surprises here!

ACTIVITIES 8 #4

SP Problem 1
Suppose a warship takes 6 shots at a target, and it takes at least 4 hits to sink the target. If the warship has a record of hitting with 20% of its shots in the long run, what is the probability of sinking the target?
Is this binomial?
Sketch the distribution… make a table first.
Answer the question.

SP Problem 2
Let’s consider the 107th Congress: There are 100 senators (2 per state). At that time, there were 87 males and 13 females. What is the population proportion of each type of senator? (M/F, NOT R/D)

Suppose we take 5 random samples of size 10.

S1   MFMMFMMMMM
S2   MFMMMMMMMM
S3   MMMMMMFMMM
S4   MMMMMMMMMM
S5   MMMMMMMMFM

Calculate the sample proportions.
Now suppose we go on and do 95 more samples, resulting in the following table. Sketch the frequency table:

Prop F   Freq
0.0      26
0.1      41
0.2      24
0.3      7
0.4      1
0.5      1

Check that the mean is 0.119 and the standard deviation is 0.100.

What would the frequency table look like if we did this 10,000 times?

SD Problem 3
Here is the population of all 5 US Presidents who had professions in the military, along with their ages at inauguration:

Eisenhower   62
Grant        46
Harrison     68
Taylor       64
Washington   57

Assume that samples of size 2 are randomly selected WITH REPLACEMENT. How many samples are possible? What is the mean age of each sample?
Make a frequency table for these means. Is this a sampling distribution? What is the distribution for these means?
What is the mean of the table? How does this compare with the actual mean of the presidents?

Chapter 9 – Estimating with Confidence

9.1 Confidence Intervals for Proportions

Now for a bit more reality. What if we really DON’T know the population mean and standard deviation? What if we CAN’T check our sample means or sample proportions against a true mean? This is usually the case, too.

We know that, within some boundaries, \(\bar{x}\) and \(\hat{p}\) are estimators. Let’s put those boundaries on and quantify our certainty or confidence about them.

We know that a sample statistic might not be the true value, of course. So we report the sample statistic with a “margin of error” and a percent. For example, our sample mean might be 87, so we’d report:

87 ± 5 with a confidence level of 95%

This means that we’re 95% sure that the true mean lies in the interval (82, 92).

Note that if you do 100 samples, the true mean would NOT be in the interval about 5 times out of the 100 samples! This is what 95% confidence means: 5% bad news!

We do need a large enough, random sample of course! ( \(n \ge 30\) or \(n \cdot \min(p, q) \ge 5\) ) You need to be able to approximate your distribution of sample statistics as normal.
Typically you start with an industry standard confidence level (90%, 95%, and 99% are the usual). And we’re going to work BACKWARDS. We set a confidence level or use the industry standard, we find the associated z-score, backsolve for the necessary statistic ( \(\bar{x}\) or \(\hat{p}\) ), and get our error bound.

Let’s step through the example on page 269:

You’d do your random draws and establish the distribution of the statistic. This time, n = 40 experiments with a population proportion. You’d find the sample proportion mean. Let’s reproduce their distribution.

Their sample mean is 0.375 and they want a 95% confidence interval. The sample standard deviation is 0.07655.

Let’s get the z-scores for “95% of the data is in between these error bounds”. Sketch the normal curve and place the “CI” on the x-axis.

Half of 5% is 2.5%, which is .025. Look in the chart for area .975 (WHY NOT 95%?), or use 2nd…DISTR….3:invNORM(.975)…enter

You’ll get a z-score of 1.96.

Now we’ll find that 95% of the area, the probability, is between −1.96 and 1.96.

Let’s find our error bounds from this:

z-score: \(z = \dfrac{\hat{p} - p}{\sigma_{\hat{p}}}\)   Let’s discuss what EACH item is.

We KNOW the z-score: 1.96

\(1.96 = \dfrac{p_{\text{boundary value}} - 0.375}{0.07655}\)

Now let’s look at their equation. Do you see where it comes from?

We’ll find that the boundary sample p is 0.525, so our error bound is 0.525 − 0.375 = 0.150. We’ll report that the mean of our sampling distribution is .375 ± .150 with 95% confidence.

Let’s go through the Focus on Understanding at the bottom of page 270…

Note, too, that our confidence interval goes from .225 to .525, that is, from 22.5% to 52.5%. The interval is this wide because we demanded 95% confidence. If we’d drop down to 60% confidence, our range would shorten dramatically…

Sketch this CI as though it were perfect.

In general with confidence intervals:
High confidence: wide range, long interval
Lower confidence: shorter range, shorter interval
See this on the sketch above.
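The backwards procedure (confidence level, then z-score, then error bound) can be checked numerically. The following is a minimal sketch, assuming Python's standard library `statistics.NormalDist` in place of the table or the calculator's inverse-normal key; the function names `critical_z` and `proportion_ci` are made up for this illustration. It uses the numbers from the example: a sample proportion of 0.375 with n = 40.

```python
from math import sqrt
from statistics import NormalDist

def critical_z(confidence):
    """Upper z-score that leaves (1 - confidence)/2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

def proportion_ci(p_hat, n, confidence):
    """Return (lower, upper, error_bound) for a sample proportion.
    Hypothetical helper; uses the sample p-hat in the standard deviation."""
    std_dev = sqrt(p_hat * (1 - p_hat) / n)
    error_bound = critical_z(confidence) * std_dev
    return p_hat - error_bound, p_hat + error_bound, error_bound

# Same sample at three confidence levels: higher confidence, wider interval.
for level in (0.60, 0.90, 0.95):
    low, high, e = proportion_ci(0.375, 40, level)
    print(f"{level:.0%}: z = {critical_z(level):.3f}, "
          f"E = {e:.3f}, interval = ({low:.3f}, {high:.3f})")
```

Running the loop with several confidence levels makes the trade-off visible: the center of the interval never moves, only the error bound changes.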
Page 272

Still working with sample proportions, let’s find a formula for the error bounds with 95% confidence. That’s going to make our computations quite a bit shorter.

We’ll start with their formula from the middle of page 270:

\(\hat{p}_2 = p + z_2 \, \sigma_{\hat{p}}\)

Now let’s do some substituting:

\(\hat{p} = p + 1.96 \sqrt{\dfrac{pq}{n}}\)

Now our error is the difference between our statistic and our mean. We want the error to be positive so we can add/subtract it easily:

\(E = |\hat{p} - p|\)

Substitute in the formula above. The p’s cancel and we get

\(E = 1.96 \sqrt{\dfrac{pq}{n}}\)

This is somewhat unrealistic! We’ll actually use the \(\hat{p}\) and \(\hat{q}\) from our sample rather than the true values that we don’t know!

This formula makes it simple to see how to change over to confidence levels other than 95%. We just change the “1.96” to the z-score that matches the new confidence level.

Let’s look at shortening our interval by dropping back to 90% confidence.

Sketch the standard normal curve. Mark off a symmetric area of 90%. How much area is in each tail?

Translate that to an upper z-score: 2nd…DISTR….3:invNORM(____)…enter

Substitute in the error bound formula:

Let’s find the size of the confidence interval from above when the sampling mean and sd are 0.375 and 0.07655 with n = 40.

These specific z-scores for the confidence levels are called critical values. Let’s make a table with 99%, 95%, and 90% critical values. How will we find the one for 99%?

Let’s wrap up this section by checking out the mayfly experiment on pages 278 and 279. First read the example, then let’s go around the room answering the questions.

ACTIVITIES 9 #2

Confidence Interval for Proportions Problems

Problem 1
Suppose a pollster interviews 1000 voters and finds that 540 favor building a new elementary school in Houston’s east side.
Find an estimate for the proportion in favor of building the school.
Find a 95% confidence interval.
Hints: What is the appropriate z-score and how did you find it?
Use the formula \(\hat{p} \pm z \sqrt{\dfrac{\hat{p}\hat{q}}{n}}\). Why?
What is the Error Bound and how did you find it?
State it in terms of 54% ± E, where \(E = z \sqrt{\dfrac{\hat{p}\hat{q}}{n}}\).

This leads directly to estimating the sample size for a proportion (taking p = q = 0.5, the worst case):

\(n = \left( \dfrac{z}{2E} \right)^2\)

where z is the appropriate z-score for your desired percent confidence and E is your error bound from above.

ACTIVITIES 9 #3

Problem 2
Alcohol abuse has been described by college presidents as the number one problem on campus, and it is a major cause of death in young adults. How common is it? In 2000, an article by Henry Wechsler (and colleagues) in the Journal of American College Health reported the following data. “Binge drinking” is defined as having five or more drinks in a row for men and four or more for women. “Frequent” is defined as having three or more binge drinking times in the past two weeks. The survey included 13,819 students, and 3,140 students were considered frequent binge drinkers. Find a 90% confidence interval for the proportion of frequent binge drinkers in the student population of the USA.

Problem 3
The most recent Walmart retail worker survey found that 65 out of 100 employees agreed that work stress had a negative impact on their personal lives.
What is the 90% confidence interval for this information?
What is the 98% CI for this information?
How do they compare?

Problem 4
One research firm in 2000 interviewed n = 1156 drivers aged 18 – 20 years old and found that 83% enjoyed driving. Construct a 95% confidence interval for the proportion of 18 – 20 year olds who enjoy driving. Identify the error bound.
Do an 80% CI. Compare to the first one. What do you notice?

9.2 Confidence Intervals for Means

This is a way to estimate a population mean: it’s an interval with a margin for error attached. You can have, for example, 92% confidence that the true mean is in the interval.

Finding a 95% confidence interval for the true mean step by step:

Note that 95% of the area under the standard normal curve lies between z = −1.96 and z = +1.96. Let’s look at why!
Next solve this inequality for \(\mu\) (note: it’s the z-score formula):

\(-1.96 \le \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} \le 1.96\)

The hard part will ALWAYS be the standard deviation. Using the adjusted sd will help.

Suppose we have information from the State of Texas Education Coordinating Board that they have selected 100 tests from fifth graders for analysis (randomly, of course). The sample mean for the tests is 74.4 and the sample standard deviation is 12.4. How will we find the true mean for these kids? How will we describe our ERROR BOUND and what will we report to the parents and taxpayers?

What if we wanted an 85% confidence interval? A 99% confidence interval?

ACTIVITIES 9 #4

So what does it mean when the insert in the medicine box says there’s an allergic reaction in .02% of takers plus/minus .01%?

Again, a normal distribution with, say, 95% of the area marked off symmetrically. How much area is in the tails? What is the appropriate z-score?

SKETCH:

Suppose you want to estimate the mean 4th grade STAAR score for the more than 53,000 fourth grade students in a Texas school system statewide. At considerable effort and expense, you give the appropriate STAAR test to an SRS of 300 Texas fourth grade students, and the mean score is 78 with a standard deviation of 15. What can you say about your level of confidence in this statistic?

First, the distribution of sample means is normal with a true mean and an adjusted standard deviation! What are those numbers?

Now, the Empirical Rule says that 95% of the data is within 2 standard deviations of the mean. Let’s look at those numbers and frame the mean with them.

What do we have here? What is our level of confidence here? How can we report this?

Problem 1
A study based on a sample of size 35 reported a mean of 93 with a margin of error of 11 with 95% confidence.
Give the confidence interval.
Discuss where the true mean might be.
What measurement is at the midpoint of the confidence interval?
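Solving the z-score inequality for the mean gives an interval of the form "sample mean plus or minus z times the adjusted standard deviation". Here is a rough sketch of that calculation, using the fifth-grade test numbers from earlier in this section (sample mean 74.4, sample standard deviation 12.4, n = 100). The function name `mean_ci` is an assumption for this illustration, and `statistics.NormalDist` stands in for the table lookup.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(x_bar, s, n, confidence):
    """Return (lower, upper, error_bound) for a large-sample mean.
    Hypothetical helper: uses s / sqrt(n) as the adjusted sd."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    error_bound = z * s / sqrt(n)
    return x_bar - error_bound, x_bar + error_bound, error_bound

# Fifth-grade tests: x-bar = 74.4, s = 12.4, n = 100.
for level in (0.85, 0.95, 0.99):
    low, high, e = mean_ci(74.4, 12.4, 100, level)
    print(f"{level:.0%} CI: {74.4} +/- {e:.2f} -> ({low:.2f}, {high:.2f})")
```

Only the critical value changes from one confidence level to the next; the adjusted standard deviation, 12.4 divided by the square root of 100, stays at 1.24 throughout.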
Problem 2
A survey of 1532 recent UH grads found that 175 had loans in excess of $35,000 for their education. Give a 95% confidence interval for the proportion of all student loan borrowers who have loans this size at UH.
Is this a proportion question or a mean question?
Work the problem!

Problem 3
Is a large sample confidence interval valid if the population from which the sample is taken has a distribution VERY different from a normal one?

Problem 4
A fact long known but little understood is that twins, in their early years, tend to have lower IQs and pick up language more slowly than singletons. Recently, psychologists have found that this may be caused by benign parental neglect: it’s too much for most parents to divide time between two babies.
A random sample of 46 sets of 2 year old twins is taken and, at the end of one week, the attention time given to each pair is recorded. The mean attention time is 22 hours and the sample standard deviation is 16 hours.
Using this data, find a 90% CI for the mean attention time given to each pair.

Technology makes it simpler: See pages 290 – 291 for getting CI’s from TI’s.

9.3 Sample Size

Which is better and why?
A simple random sample of size 1000
A simple random sample of size 45

How do we find the ideal sample size? We use the error bound formula from confidence intervals. Let’s do this: Solve for n.

Proportions: \(E = z \sqrt{\dfrac{pq}{n}}\)
Means: \(E = z \dfrac{s}{\sqrt{n}}\)

Problem 1
The variance of the weight of a Hershey’s kiss is 1.5129. How many kisses are needed to estimate the mean weight of each candy to within 0.1 grams with 85% confidence?
Proportion or Mean? How do you know?
85% confidence: n = _______
What about 90% confidence? n = _______

Problem 2
Suppose you want to estimate the number of people who watched a new television show. How large a sample would you need if you want to be 95% confident that your estimate is within 3% of the true value?
Proportion or mean? Why?
Big hint: see the box in the middle of page 295.

Chapter 10 – Testing Hypotheses

10.1 What is a Hypothesis Test?

One real reason for using statistics is to make a decision. Hypothesis testing provides a format for doing just that when the topic is the numerical value of a statistical parameter.

The STEPS are:

Formulate the null hypothesis (H nought, Ho)
This is generally a claim like “67% of all dentists use Crest at home”: \(p = 0.67\)
And the alternative hypothesis (Ha)
This would be from the competitors.
Both must be in mathematical form and the null hypothesis comes first. For example:
Ho: \(p = 0.67\)
Ha: \(p \ne 0.67\)
Not equal, greater than, or less than all work.

Select the appropriate test statistic
Use an expert here; there are many. We have chosen mean or average, so we’d use a z-score. But a binomial hypothesis would use proportions. There are LOTS of distributions we haven’t studied in this brief course.

TS for means: \(z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}\)
TS for proportions: \(z = \dfrac{\hat{p} - p}{\sqrt{pq / n}}\)

Determine the decision rule (IN ADVANCE!)
Industry standards loom large here. 70% works for most political situations, with 99% for medical situations. Deciding your level of confidence totally determines the outcome!

Collect the data/Evaluate the test statistic
We’ve done some work on this. This is where Sampling Means and proportions show up!

Make a decision
Reject the hypothesis; cannot reject the hypothesis

ACTIVITIES 10 #1

Now a four part chart sums up what’s happening.
Across the top: the hypothesis (Ho) is true or false (in reality)
Down the side: Reject; do not reject

Note that we are trying to minimize errors: a Type 1 error (reject Ho when it is really true) and a Type 2 error (don’t reject it when it is actually false). These are linked! Minimizing one generally increases the other! The other two outcomes are correct decisions!

Let’s look at a courtroom for a non-numerical example.
Ho: the defendant is innocent
Ha: the defendant is guilty
The defendant is convicted or not.
Set up the 2x2 table.
Which error does our system want to minimize?

So let’s look at a numerical hypothesis test about means!

Example
Building specifications in Houston require that residential sewer pipe have a minimum mean breaking strength of 2400 pounds per foot (ppf). A contractor has been having problems with pipe bought from a particular manufacturer, and this contractor thinks the pipe does not meet the minimum standard. In an attempt to substantiate this feeling, the contractor hires a testing lab to test a random sample of 55 sections of pipe and finds the following:

\(\bar{x} = 2340\) ppf
\(s = 200\) ppf

Is there sufficient evidence to conclude that the contractor is correct? Use an alpha of 10%, i.e. a confidence level of 90%.

Note that n = 55 is enough to use the idea that the distribution of SAMPLE means is normally distributed with an adjusted standard deviation.

Ho: the mean = 2400 ppf
Ha: nope, it’s less than 2400 ppf

Picture:

So our critical z-score is z = −1.28. This defines our rejection region! If our test statistic is below this critical value, we’ll reject Ho and go with Ha.

Calculate our test statistic. NOTE the adjusted sd:

\(z = \dfrac{2340 - 2400}{200 / \sqrt{55}} \approx -2.22\)

WOW, decision time! What do we decide? What do we do next? Give up in despair? Mediation? Court?

Note different pictures for different Ha’s:
Not equal: alpha of, for example, 15%, two tailed
Greater than: alpha of 20%, one tailed
Less than: alpha of 5%, one tailed

Again, there are MANY test statistics and MANY distributions, but the gist of the process is here.

ACTIVITIES 10 #2

Another example:
A research psychologist will administer a test designed to measure self-confidence to a random sample of 50 professional athletes. The psychologist thinks that professional athletes are more self-confident than the population at large. Since the national average on the test is known to be 72, the psychologist does his testing. He finds a sample mean of 74.1 and a standard deviation of 13.3. Is he right with an alpha of .05?

What is the picture?
Ho: mean = 72
Ha: mean > 72

Test statistic: \(z = \dfrac{74.1 - 72}{13.3 / \sqrt{50}} \approx 1.12\)

Not very unusual, but is it enough? Well, our alpha is .05. We find that .05 of the area corresponds to a z-score cutoff of 1.65. Check your chart.

Nope. The TS needed to be HIGHER than 1.65 for him to claim he’s right. We do not reject Ho. This doesn’t mean he’s totally wrong and too stupid to live. It does mean that the results of this test with this random sample don’t support his belief. He won’t get a published article out of this research.

Let’s review the types of alternate hypotheses and the rejection regions all in one place.

Alpha: 10%
Less than      z < −1.28                  left-tailed test
Greater than   z > 1.28                   right-tailed test
Not equal      z > 1.65 OR z < −1.65      two-sided test
Why the change in z?
Pictures:

Alpha: 5%
Less than      z < −1.65
Greater than   z > 1.65
Not equal      z > 1.96 OR z < −1.96
Pictures:

Alpha: 1%
Less than      z < −2.33
Greater than   z > 2.33
Not equal      z > 2.58 OR z < −2.58
Pictures:

ACTIVITIES 10 #3

Problem 1
A consumer advocate group thinks that a cereal manufacturer is wrong about the bran content of their cereal. The manufacturer claims that the cereal has 1.2 oz of bran per serving; the advocacy group thinks it’s less than the claimed amount. The group selects 60 boxes randomly from grocery stores all over the country and has an analysis done by an outside lab. The mean is 1.170 and the standard deviation is .111. The group wants an alpha of .05.
Whose claim is supported by the evidence?

Problem 2
Ford thinks the new Focus exceeds the EPA recommended mileage of 43 mpg. They have an independent firm do the testing. 40 cars are selected and tested; the sample mean is 43.6 and the sample standard deviation is 1.3. Are they right?

Problem 3
A new pain reliever is being tested on hospital patients. The pain reliever currently in use is effective in 4.0 minutes. The new drug is randomly administered to 50 patients and the relief time is recorded.
The sample mean is 3.8 with a sample standard deviation of 1.4 minutes. Is the new drug more effective?
Test with an alpha of 10%, then with 1%.

Problem 4
Brides magazine sponsored a survey of 3600 subscribers and found that 62% of them spent more than $750 on their gown. Use an alpha of .05 to test the claim that less than 65% of brides spend more than $750 on their gown. Comment on using subscribers vs the general population of prospective brides.

10.2 Tests about Proportions

The reputation and the sales of a manufacturing company can be damaged by shipments that contain large percentages of damaged goods. For example, an artisan who creates earrings on Etsy for sale to other businesses wants to keep damaged goods under 3%. A random sample of 300 earrings is selected from a very large shipment from her factory in Indonesia; each is tested and 27 are found to be defective. Does this provide sufficient evidence at the .10 level of significance that this shipment should not be sent?

So this is about proportions. What changes? Only the test statistic! Why are we still using a z-score? Everything else is the SAME, including the rejection regions!

TS: \(z = \dfrac{\hat{p} - p}{\sqrt{pq / n}}\)

Let’s list the steps together!

ACTIVITIES 10 #4

Example 1
There is a LOT of debate about mandatory retirement (with an eye toward draining Social Security, for one reason). A survey was conducted to estimate the fraction of workers who were forcibly retired at 65 and would have preferred to stay on the job. In a random sample of 450 Americans in this situation, it was found that 278 would have preferred to keep working. Is there sufficient evidence that most Americans would prefer to keep on working at an alpha of .05?

Problem 2
Of 880 randomly selected drivers, 56% admitted to running red lights. This information was used to write that “the majority of Americans run red lights”. Is this accurate at an alpha of .05?
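The steps of a proportion test can be sketched with the earring shipment from the start of this section (27 defective out of 300, claimed proportion 3%, alpha of .10). This is only an illustration under stated assumptions: the hypotheses Ho: p = 0.03 and Ha: p > 0.03 are one reading of the problem, the function name `one_proportion_z_test` is made up here, and `statistics.NormalDist` replaces the chart.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alpha):
    """Right-tailed z-test for a proportion (hypothetical helper).
    Returns (test statistic, critical value, reject Ho?)."""
    p_hat = successes / n
    # The standard deviation uses the hypothesized p0, as in the TS formula.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    critical = NormalDist().inv_cdf(1 - alpha)  # right-tail cutoff
    return z, critical, z > critical

# Assumed hypotheses: Ho: p = 0.03, Ha: p > 0.03.
z, critical, reject = one_proportion_z_test(27, 300, 0.03, 0.10)
print(f"test statistic z = {z:.2f}, critical value = {critical:.2f}")
print("Reject Ho" if reject else "Do not reject Ho")
```

For a left-tailed or two-tailed alternative the cutoff and the comparison would change, but the test statistic itself is computed the same way.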
10.3 The P-Value for a Test

Another style of hypothesis test gives the reader a chance to make a decision about whether or not the results of the test statistic are significant. This method uses the first steps of the traditional test and then, instead of rejecting or not rejecting Ho, a level of significance, called the p-value, is reported.

Let’s look at one of these using data we’ve already processed.

Remember the researcher who felt that athletes were more self-confident? He got a z-score of 1.12 and not the required 1.65 for an alpha of 0.05. What he could have done was figure out the tail area for his results and publish that.

What is the p-value for a one-sided test that results in a z-score of 1.12? Look in the chart. Do you see .8686? That is the area to the LEFT of 1.12. The p-value is the area in the right tail: 1 − .8686 = .1314. He could report about 13% as the p-value and let the reader decide.

Now this was a one-tailed (right sided) test. What about the p-value with a “not equals” two-tailed test?

Before, with alpha, we split the significance and balanced it between the two tails. For example, an alpha of .05 would have .025 area under the left tail and .025 under the right tail.

With a p-value, we need to DOUBLE the tail area found on one side.

Let’s look at an example:
Suppose we have a test and it is a proportion. Our alternate hypothesis is \(p \ne 0.25\), and when we run the sample statistics we come up with a z-score of 2.34. We look in the chart to find that 2.34 is associated with a tail area of 0.0096. Noting that this is a two-tailed test, we DOUBLE the area to .0192. Note that this is still quite unlikely to occur by chance. This is an excellent p-value to report.

You need to be very careful to check whether you have a one-tailed or two-tailed test before you report a p-value.

Discussion with problem
Suppose we have a simple random sample of 144 body temperatures with a sample mean of 98.15 degrees and a sample standard deviation of 0.65 degrees.
Does this provide evidence that perhaps the body temperature of a healthy adult is not 98.6? Report the p-value.
Ho:
Ha:
One tailed or two?
Test statistic: which formula?
P-value and decision?

10.4 Tests about Means

Problem 1
A single M&M candy is supposed to weigh 0.9085 grams. The Mars Company wants to hit that standard with high reliability. “Every candy every time” is their goal. Over years of testing, the standard deviation is known to be 0.03691.
A simple random sample of 100 candies is taken by the quality control staff. The mean weight was 0.91470 and the sample standard deviation was 0.03680.
The line staff feels they are hitting the standard; the quality control folks are not so sure. Whose feelings are supported by the data?
First compute a p-value, then test at the 0.05 level of significance.

Problem 2
One quick way to see if a data sample is random is to look at the last digit in the sample. These should have a uniform distribution with a mean of 4.5 and a standard deviation of 2.87. This is for data that is NOT rounded; that kind of data is bimodal (0 and 5!).
Picture
Now, using data reporting the last digits in lengths for the 73 home runs hit by Barry Bonds in 2001, we get a sample mean of 1.753 and a standard deviation of 2.650.
Does it appear that the lengths were accurately measured?

Problem 3
The Flesch-Kincaid Grade Level formula gives the grade reading level K of a passage of text. For this formula W is the number of words in the passage, S is the number of sentences, and L is the number of syllables. The formula is

\(K = 0.39 \left( \dfrac{W}{S} \right) + 11.8 \left( \dfrac{L}{W} \right) - 15.59\)

First figure out the grade level for the paragraph above from “The” through “is”.
Now, the publisher of the Harry Potter series analyzed 36 pages from each book in the series and calculated the mean K to be 4.075 with a sample standard deviation of 1.28. The teachers at Devon Williams Middle School won’t use a book unless the reading level for a typical page can be shown to be 4th grade or higher.
Does the evidence support this claim at the 0.05 level or higher?

Problem 4
The NC tobacco company claims their low tar cigarette contains at most 40 grams of tar. A consumer advocacy group thinks it’s higher and tests 49 randomly selected cigarettes:

Frequency table
Tar     Freq
47.3    3
39.3    2
40.3    4
38.3    11
46.3    9
43.3    7
39.2    4

Whose claim is supported by the evidence?

ACTIVITIES 10 #5

THE END