Homework 4

Research Methods
Homework 4: Statistics
When submitting this homework assignment to the dropbox “Homework 4” folder, please put the
three exercises in one file and name the file using the following format: HW4_your_name.docx
(e.g. HW4_Sed_Keller.docx).
Assignment
Purpose: This assignment will familiarize you with common statistics used in scientific
studies. As scientists, we require a basic understanding of statistics in order to draw conclusions
from our data. Here you will be introduced to the mean, the standard deviation, and the standard
error. More advanced statistics will be taught in later assignments. You will want to consider
using these in your Inquiry 2 data analysis.
The Latest Diet
Dr. Dangane has a new idea for a diet, based on the following medical logic: By definition, 1
calorie is the heat it takes to raise the temperature of 1 gram of water by 1 degree Celsius. If
you drink 1 liter of ice water, you are warming up 1000 grams by 37 degrees, burning 37,000
calories. Since the average person eats only 2000 to 2500 Calories per day, Dr. Dangane figures
all his patients need to do is drink a liter of ice water, thereby burning more than 10 times their
daily intake. To test this, he recruits 2 groups of 30 volunteers, who provide signed consent
forms and their current weight. Group A is asked to drink 1 extra liter of warm water at the end
of each day. Group B is asked to drink 1 extra liter of ice water each day. After a month,
Dr. Dangane weighs each volunteer again and calculates the weight difference. On average,
group A (warm water) lost 1.24 pounds and group B (ice water) lost 3.67 pounds.
Do these data justify Dr. Dangane’s belief that the ice water diet will be the next big health
revolution? That is, how much can one legitimately conclude given this information and no
more? No calculations: just comment on the relation between data and conclusions. [For extra
credit: could this diet really work? If not, what is wrong with it?]
Mean and Standard Deviation
Please write down formulas for the following two values:
Sample mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample variance: $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$
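As a rough illustration, the two formulas can be translated directly into a few lines of Python. This is only a sketch: the function names and the four data values are hypothetical, chosen to keep the arithmetic easy to follow, and are not part of the assignment data.

```python
def sample_mean(values):
    """Sample mean: the sum of the values divided by N."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sample variance: summed squared deviations from the mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sample_mean(values)
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical measurements, not taken from the assignment.
data = [2.0, 4.0, 4.0, 6.0]

print("mean:", sample_mean(data))
print("variance:", sample_variance(data))
```

Each line of `sample_variance` corresponds to one piece of the formula: the deviation from the mean, the square, the sum over all values, and the division by N - 1.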
Now, answer the following questions:
1) The variance is a type of average. What is being averaged?
2) Why do you need to square a term in the equation for variance?
3) Taking the square of that term is just one solution to the problem you described in
answering part (2). Can you suggest a different mathematical operation that would solve
the same problem?
4) The standard deviation is defined to be the square root of the variance. Explain why it is
conventional to take the square root, using the following set of measurements of pH in
Waller Creek to illustrate your logic: pH measurements: 7.34 7.48 7.12 7.33 7.28 7.41
7.22 7.37 7.30 7.29.
5) In words, explain the difference between the standard deviation and the standard error.
© 2008 The University of Texas at Austin
Water Safety Detection
The Environmental Protection Agency regulates the amount of mercury that is allowed in
drinking water, suggesting that more than 1 ppm (part per million) in water poses a health
hazard that can lead to birth defects and to brain damage in adults. A student measures
mercury concentration in water samples from 20 tap water sources in Irvine to see whether
Irvine water exceeds EPA standards. She obtains the following values (in ppm):
1.1 1.23 1.07 0.98 0.94 1.29 1.04 0.99 1.34 1.05
1.18 1.2 1.08 1.01 0.94 1.01 0.85 0.99 1.05 1.13
Calculate the sample mean, sample standard deviation, and standard error of the mean for the
mercury data. (Use an Excel spreadsheet with these 20 numbers to calculate the following.)
Sample mean: $\bar{x}$ = ________
Sample standard deviation: $s$ = ________
Standard error: $\sigma_{\bar{x}}$ = ________
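If you want to cross-check your spreadsheet, the same three quantities can be sketched in Python using the standard library. This is an optional check, not a replacement for the Excel work the assignment asks for. The function name `summarize` is hypothetical, and the sketch assumes the standard error is the sample standard deviation divided by the square root of N.

```python
import math
import statistics

# Mercury concentrations (ppm) from the 20 tap water sources listed above.
mercury_ppm = [
    1.1, 1.23, 1.07, 0.98, 0.94, 1.29, 1.04, 0.99, 1.34, 1.05,
    1.18, 1.2, 1.08, 1.01, 0.94, 1.01, 0.85, 0.99, 1.05, 1.13,
]


def summarize(values):
    """Return (sample mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD: uses the N - 1 denominator
    se = sd / math.sqrt(n)         # assumed definition: SD / sqrt(N)
    return mean, sd, se


mean, sd, se = summarize(mercury_ppm)
print(f"mean = {mean:.2f} ppm, s = {sd:.2f} ppm, standard error = {se:.2f} ppm")
```

Note that `statistics.stdev` is the sample standard deviation (N - 1 denominator), which matches the sample variance formula earlier in this assignment; `statistics.pstdev` would divide by N instead.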
Based on these numbers, calculate standard error intervals for the following:
(Note: these interval ranges are the same as error bars you graphed on HW 2 and they
correspond to ± 1, 2, and 3 standard errors. Provide your answers accurate to two decimal
places.)
1) You can expect that if this experiment were repeated, 68.2% of the time the repeated
sample means in ppm would lie between _____ and _____ (+1 and –1 standard error;
applies to sample means of data sets).
However, if you only had enough time or resources to gather a single data point, your
68.2% expected interval, determined by the standard deviation, would lie between _____
and _____ (+1 and –1 standard deviation; applies to single data points. Assume a true
population mean equal to the sample mean you calculated above).
2) 95.5% of the time the sample means would lie between _____ and _____ (+2 and –2
standard errors; applies to sample means of data sets).
However, if you only had enough time or resources to gather a single data point, your
95.5% expected interval, determined by the standard deviation, would lie between _____
and _____ (+2 and –2 standard deviations; applies to single data points).
3) 99.7% of the time the sample means would lie between _____ and _____ (+3
and –3 standard errors; applies to sample means of data sets).
However, if you only had enough time or resources to gather a single data point, your
99.7% expected interval, determined by the standard deviation, would lie between _____
and _____ (+3 and –3 standard deviations; applies to single data points).
For the following questions, you should consider the water safe when the standard error interval
includes concentrations considered safe by the EPA.
4) Based on the 95.5% standard error interval you calculated, is Irvine water safe?
5) Based on the 99.7% standard error interval you calculated, is Irvine water safe?
6) The student could come to a wrong conclusion in two ways. Fill in two boxes below with
either "Type II error (False Negative)" or "Type I error (False Positive)".

                                      True status of Irvine water:
                                      Water really is safe    Water really is unsafe
Student decides water is okay         ______________          ______________
Student decides water is dangerous    ______________          ______________
7) Which of these types of error is a bigger problem? Why?
8) Based on your answer to part (7), what standard error interval do you feel would be
more appropriate? Explain your logic. [Note: consumer advocates and city budget
officers might have different opinions on these last two questions!] What do you think
about the appropriateness of using the standard deviation for a single data point
instead?
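The interval arithmetic behind questions 1 to 5 (sample mean plus or minus 1, 2, or 3 standard errors, or standard deviations for a single data point) can be sketched as follows. The summary statistics here are hypothetical and deliberately differ from the Irvine data. The helper names `interval` and `includes_safe_level` are illustrative, and the safety rule encodes one reading of the instruction above: the water counts as safe if the interval reaches down to the 1 ppm limit or below.

```python
import math


def interval(center, spread, k):
    """Return (low, high) for center ± k * spread."""
    return center - k * spread, center + k * spread


def includes_safe_level(low, limit_ppm=1.0):
    """Assumed rule: the interval includes safe water if its lower end is at or below the limit."""
    return low <= limit_ppm


# Hypothetical summary statistics (ppm), not the assignment's answers.
mean, sd, n = 1.10, 0.20, 25
se = sd / math.sqrt(n)  # standard error of the mean

for k in (1, 2, 3):
    se_low, se_high = interval(mean, se, k)
    sd_low, sd_high = interval(mean, sd, k)
    print(
        f"±{k} SE: {se_low:.2f} to {se_high:.2f} "
        f"(includes safe level: {includes_safe_level(se_low)}) | "
        f"±{k} SD: {sd_low:.2f} to {sd_high:.2f}"
    )
```

With these made-up numbers, the ±2 standard error interval excludes the 1 ppm limit while the wider ±3 interval includes it. This is the kind of disagreement that questions 7 and 8 ask you to weigh.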
A study of Vitamin C
Consider the question of whether taking vitamin C supplements helps to prevent colds. Starting
in July, researchers assign 2000 volunteers to take a placebo pill with no vitamin C in it, and
2000 other people to take 500 mg of vitamin C. At the end of the main cold season (autumn and
early winter), they survey the volunteers as to whether or not they had a cold that year. The
results look like this:
                                           Vitamin C    No vitamin C
Number of people who got a cold            1200         1250
Number of people who did not get a cold    800          750
Now calculate the fraction of people with colds in each group. Fill in the table:
                                       Vitamin C    No vitamin C
Fraction of people p who got a cold    ______       ______
Consulting the examples on page 55 for hints, calculate the 95% standard error probability
interval for the fraction of people getting colds in each group:
(Hint: According to eq 3.17, $\sigma = \sqrt{f(1-f)}$)
                Upper 95% limit         Lower 95% limit
                (+2 standard errors)    (-2 standard errors)
Vitamin C       ______                  ______
No vitamin C    ______                  ______
Comparing the confidence intervals for the two groups, what conclusion do you reach about
whether vitamin C helped prevent colds? Explain your logic.
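The interval calculation for a fraction can be sketched in Python as below. This is a hedged illustration, not the textbook's method verbatim: it assumes the standard error of a fraction f measured on N people is sqrt(f(1 - f) / N), that is, the standard deviation from the hint divided by the square root of N. The function name `proportion_interval` is hypothetical.

```python
import math


def proportion_interval(count, total, k=2):
    """Return (f, low, high): the fraction and its ± k standard error limits.

    Assumes SE = sqrt(f * (1 - f) / total).
    """
    f = count / total
    se = math.sqrt(f * (1 - f) / total)
    return f, f - k * se, f + k * se


# Counts of people who got a cold, and group sizes, from the results table.
groups = {
    "Vitamin C": (1200, 2000),
    "No vitamin C": (1250, 2000),
}

for name, (colds, total) in groups.items():
    f, low, high = proportion_interval(colds, total)
    print(f"{name}: f = {f:.3f}, lower limit = {low:.3f}, upper limit = {high:.3f}")
```

Comparing whether the two printed intervals overlap is one way to approach the final question. The interpretation of that comparison is left to you.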