Research Methods Homework 4: Statistics

When submitting this homework assignment to the dropbox “Homework 4” folder, please put the three exercises in one file and name the file using the following format: HW4_your_name.docx (e.g. HW4_Sed_Keller.docx).

Assignment Purpose: This assignment will familiarize you with common statistics used in scientific studies. As scientists, we require a basic understanding of statistics in order to draw conclusions from our data. Here you will be introduced to the mean, standard deviation, and standard error. In later assignments more advanced statistics will be taught. You will want to consider using these statistics in your inquiry 2 data analysis.

The Latest Diet

Dr. Dangane has a new idea for a diet, based on the following medical logic: By definition, 1 calorie is the heat it takes to raise 1 gram of water by 1 degree. If you drink 1 liter of ice water, you are warming up 1000 grams by 37 degrees, burning 37,000 calories. Since the average person eats only 2000 to 2500 Calories per day, Dr. Dangane figures all his patients need to do is drink a liter of ice water, thereby burning more than 10 times their daily intake.

To test this, he gets 2 groups of 30 volunteers who have provided signed consent forms and their current weight. Group A is asked to drink 1 extra liter of warm water at the end of each day. Group B is asked to drink 1 extra liter of ice water each day. After a month, Dr. Dangane weighs each volunteer again and calculates the weight difference. On average, weights in group A (warm water) declined by 1.24 pounds, and weights in group B (ice water) declined by 3.67 pounds.

Do these data justify Dr. Dangane’s belief that the ice water diet will be the next big health revolution? That is, how much can one legitimately conclude given this information and no more? No calculations: just comment on the relation between data and conclusions.

[For extra credit: could this diet really work? If not, what is wrong with it?]
Mean and Standard Deviation

Please write down formulas for the following two values:

Sample mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

Sample variance: $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$

Now, answer the following questions:

1) The variance is a type of average. What is being averaged?

2) Why do you need to square a term in the equation for variance?

3) Taking the square of that term is just one solution to the problem you described in answering part (2). Can you suggest a different mathematical operation that would solve the same problem?

4) The standard deviation is defined to be the square root of the variance. Explain why it is conventional to take the square root, using the following set of measurements of pH in Waller Creek to illustrate your logic:

pH measurements: 7.34 7.48 7.12 7.33 7.28 7.41 7.22 7.37 7.30 7.29

5) In words, explain the difference between the standard deviation and the standard error.

2008 The University of Texas at Austin

Water Safety Detection

The Environmental Protection Agency regulates the amount of mercury that is allowed in drinking water, suggesting that more than 1 ppm (part per million) in water poses a health hazard that can lead to birth defects, and to brain damage in adults. A student measures mercury concentration in water samples from 20 tap water sources in Irvine to see whether Irvine water exceeds EPA standards. She obtained the following values (in ppm):

1.1 1.23 1.07 0.98 0.94 1.29 1.04 0.99 1.34 1.05
1.18 1.2 1.08 1.01 0.94 1.01 0.85 0.99 1.05 1.13

Calculate the sample mean, sample standard deviation, and sample standard error for the mercury data. (Use an Excel spreadsheet with these 20 numbers to calculate the following.)

Sample mean $\bar{x}$ = ________
Sample standard deviation $s$ = ________
Sample standard error of $\bar{x}$ = ________

Based on these numbers, calculate standard error intervals for the following. (Note: these interval ranges are the same as the error bars you graphed on HW 2, and they correspond to ±1, ±2, and ±3 standard errors. Provide your answers accurate to two decimal places.)
1) You can expect that if this experiment were repeated, 68.2% of the time the repeated sample means in ppm would lie between _____ and _____ (+1 and –1 standard error; applies to sample means of data sets). However, if you only had enough time or resources to gather a single data point, your 68.2% expected interval, determined by the standard deviation, would lie between _____ and _____ (+1 and –1 standard deviation; applies to single data points. Assume a true population mean equal to the sample mean you calculated above).

2) 95.5% of the time the sample means would lie between _____ and _____ (+2 and –2 standard errors; applies to sample means of data sets). However, if you only had enough time or resources to gather a single data point, your 95.5% expected interval, determined by the standard deviation, would lie between _____ and _____ (+2 and –2 standard deviations; applies to single data points).

3) 99.7% of the time the sample means would lie between _____ and _____ (+3 and –3 standard errors; applies to sample means of data sets). However, if you only had enough time or resources to gather a single data point, your 99.7% expected interval, determined by the standard deviation, would lie between _____ and _____ (+3 and –3 standard deviations; applies to single data points).

For the following questions, you should consider the water safe when the standard error interval includes concentrations considered safe by the EPA.

4) Based on the 95.5% standard error interval you calculated, is Irvine water safe?

5) Based on the 99.7% standard error interval you calculated, is Irvine water safe?

6) The student could come to a wrong conclusion in two ways. Fill in two of the boxes below with either Type II error (False Negative) or Type I error (False Positive).

True status of Irvine water:

                                      Water really is safe    Water really is unsafe
Student decides water is okay         ____________________    ______________________
Student decides water is dangerous    ____________________    ______________________

7) Which of these types of error is a bigger problem? Why?
8) Based on your answer to part (7), which standard error interval do you feel would be more appropriate? Explain your logic. [Note: consumer advocates and city budget officers might have different opinions on these last two questions!] What do you think about the appropriateness of using the standard deviation for a single data point instead?

A Study of Vitamin C

Consider the question of whether taking vitamin C supplements helps to prevent colds. Starting in July, researchers assign 2000 volunteers to take a placebo pill with no vitamin C in it, and 2000 other people to take 500 mg of vitamin C. At the end of the main cold season (autumn and early winter), they survey the volunteers as to whether or not they had a cold that year. The results look like this:

                                           Vitamin C    No vitamin C
Number of people who got a cold              1200          1250
Number of people who did not get a cold       800           750

Now calculate the fraction of people with colds in each group. Fill in the table:

                                           Vitamin C    No vitamin C
Fraction of people p who got a cold         ________      ________

Consulting the examples on page 55 for hints, calculate the 95% standard error probability interval for the fraction of people getting colds in each group. (Hint: According to eq 3.17, the standard error of a fraction $f$ measured in a group of $N$ people is $\sqrt{f(1-f)/N}$; here $f$ is the fraction p from the table above.)

                                           Vitamin C    No vitamin C
Upper 95% limit (+2 standard errors)        ________      ________
Lower 95% limit (-2 standard errors)        ________      ________

Comparing the confidence intervals for the two groups, what conclusion do you reach about whether vitamin C helped prevent colds? Explain your logic.
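For readers who want to check spreadsheet results another way, the quantities used throughout this assignment can be sketched in Python. This is only an illustration under stated assumptions: the assignment itself asks for Excel, the function names (`summarize`, `interval`, `proportion_se`) are hypothetical, the numbers in the demo are made up and are not the homework data, and the proportion formula is the standard sqrt(f(1 - f)/N), which is assumed to match eq 3.17.

```python
import math
import statistics


def summarize(values):
    """Return (sample mean, sample standard deviation, standard error)."""
    n = len(values)
    mean = statistics.mean(values)
    # statistics.stdev divides by N - 1, matching the sample variance formula.
    s = statistics.stdev(values)
    # The standard error describes the spread of sample means, not single points.
    se = s / math.sqrt(n)
    return mean, s, se


def interval(center, spread, k):
    """Return (lower, upper) limits at center -/+ k * spread."""
    return center - k * spread, center + k * spread


def proportion_se(f, n):
    """Standard error of a fraction f observed in a group of n people."""
    return math.sqrt(f * (1 - f) / n)


# Made-up measurements, for illustration only (not the mercury data).
demo_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, s, se = summarize(demo_values)
print("mean:", mean, "s:", round(s, 2), "SE:", round(se, 2))
print("+/-2 SE interval (sample means):", interval(mean, se, 2))
print("+/-2 SD interval (single points):", interval(mean, s, 2))

# Made-up fraction: 50 of 100 hypothetical people.
f = 50 / 100
print("+/-2 SE interval (fraction):", interval(f, proportion_se(f, 100), 2))
```

The same `interval` helper serves both cases because only the spread changes: pass the standard error for an interval on sample means, or the standard deviation for an interval on single data points.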