
Topic 1: Statistical Analysis
1.1.1: State that error bars are a graphical representation of the variability of data
1.1.2: Calculate the mean and standard deviation of a set of values
Table 1: Number of caterpillars seen on rose bush samples
Garden A (red roses only) & Garden B (yellow & white roses)
Use your calculator to check the means and standard deviations in this table. Are they correct?
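As a cross-check on the calculator, the same two quantities can be computed in a few lines of Python. This is only a sketch: the counts below are hypothetical and are not the values from Table 1, and the variable names are invented for illustration. Note that there are two versions of the standard deviation; for a sample of bushes taken from a larger garden, the sample version (dividing by n - 1) is normally the one wanted.

```python
import statistics

# Hypothetical caterpillar counts per bush (illustrative only; these are
# NOT the values from Table 1).
garden_a = [2, 4, 4, 5, 7, 8]

mean_a = statistics.mean(garden_a)

# Sample standard deviation: divides by n - 1.
sd_sample = statistics.stdev(garden_a)

# Population standard deviation: divides by n.
sd_population = statistics.pstdev(garden_a)

print(f"mean          = {mean_a:.2f}")
print(f"sample SD     = {sd_sample:.2f}")
print(f"population SD = {sd_population:.2f}")
```

If a calculator result does not match, it is worth checking which of the two standard deviations the calculator is reporting.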



Which mean is more likely to be closest to the actual mean of the population?
Which garden has the most variation in caterpillar number per bush? Suggest a reason for the
variation.
Do you think Garden B has more caterpillars per rose bush than Garden A? What test would you
use to answer this question with confidence?

Does the t-test result prove that there is no difference between the two gardens for the number of
caterpillars per bush?

If the sample sizes were bigger, is it possible that a t-test might find a significant difference
between the gardens in the average number of caterpillars per bush?
1.1.3: State that the standard deviation is used to summarize the spread of values around the mean
State that 68% of the values fall within one standard deviation of the mean of normally distributed data.
State that 95% of the values fall within two standard deviations of the mean of normally distributed data.
1.1.4: Explain how the standard deviation is useful for comparing the means and the spread of data between two or
more samples.






When we collect a sample and then calculate the sample mean, we end up with an estimate of the true
mean; larger samples produce more reliable estimates of the true mean than smaller samples.
The mean is often quoted along with the standard deviation; the standard deviation (σ) is a measure of
how widely spread the data points are from the mean.
A small standard deviation tells us that the data are clustered tightly around the mean whereas a large
standard deviation tells us the values are dispersed more widely.
In a large and normally distributed data set, about 68% of the data points will fall within ± 1 standard
deviation of the mean; and about 95% of the data points will fall within ± 2 standard deviations of the
mean.
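The 68% and 95% figures can be checked directly from the normal distribution. The sketch below uses Python's built-in `statistics.NormalDist` with mean 0 and standard deviation 1; the same proportions hold for any normal distribution, whatever its mean and standard deviation.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Proportion of values lying within 1 and within 2 standard deviations
# of the mean (area under the curve between the two limits).
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

print(f"within 1 SD: {within_1_sd * 100:.1f}%")  # 68.3%
print(f"within 2 SD: {within_2_sd * 100:.1f}%")  # 95.4%
```

The exact values are slightly above the rounded figures of 68% and 95% that are quoted in the syllabus statements.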
When means are plotted on a scatter graph it is common practice to add an error bar to each point. Error
bars are a graphical representation of the variability of data. A smaller error bar indicates higher
confidence in the mean, and a larger error bar indicates lower confidence in the mean. Therefore,
graphs with small error bars provide more reliable information than graphs with large error bars.
The standard deviation is very useful for comparing the means of two data sets using a t-test. The t-test
allows us to conclude, with a known probability of being correct, whether two means are significantly
different from one another.
1.1.5: Deduce the significance of the difference between two sets of data using calculated values for t
and the appropriate tables.
NOTE: values for t will be given to you in an exam; however, this is how you can work it out on your calculator.
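For readers who want to see what goes into a value of t, here is a minimal Python sketch of a two-sample t-test. It is not the calculator procedure, and several things in it are assumptions: the counts are hypothetical (not the values from Table 1), the pooled-variance form of the test is used (which assumes the two gardens have similar spread), and the critical value 2.228 is the standard table entry for a two-tailed test at p = 0.05 with 10 degrees of freedom.

```python
import math
import statistics

# Hypothetical counts per bush (illustrative only; NOT the values from Table 1).
garden_a = [2, 4, 4, 5, 7, 8]
garden_b = [5, 6, 8, 9, 10, 10]

n_a, n_b = len(garden_a), len(garden_b)
mean_a, mean_b = statistics.mean(garden_a), statistics.mean(garden_b)
var_a, var_b = statistics.variance(garden_a), statistics.variance(garden_b)

# Pooled variance: a weighted average of the two sample variances.
# Assumes both populations have a similar spread.
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

# t = difference between the means / standard error of that difference.
t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
df = n_a + n_b - 2  # degrees of freedom

# Critical value read from a t-table: two-tailed, p = 0.05, df = 10.
T_CRITICAL = 2.228
significant = abs(t) > T_CRITICAL

print(f"t = {t:.3f}, df = {df}, significant at p = 0.05: {significant}")
```

The decision rule is the one used with the tables: if the size of the calculated t is larger than the critical value for the chosen probability and degrees of freedom, the difference between the two means is significant at that level.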
1.1.6: Explain that the existence of a correlation does not establish that there is a causal relationship between two
variables.



When a variable Q is correlated with a variable R, we cannot say that one is the cause of the other:
hence the expression, “correlation ≠ causation”. The reasons are:
Two correlated variables can be completely unrelated. e.g. Hitler and Stalin both had moustaches and they
both committed genocide. So there is a 'spurious' correlation between moustaches and genocide.
Two variables may be correlated due to a third variable. e.g. There is a correlation between cancer and
coughing. But coughing isn't a cause of cancer; cancer and coughing are both caused by a third variable
(cigarette-smoking).
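The third-variable case can be illustrated with a small simulation. In the sketch below every probability is invented purely for illustration (it is not real medical data): smoking raises the chance of both coughing and cancer, while coughing has no effect on cancer at all. Coughing and cancer still come out correlated across the whole group, and the correlation largely disappears once smokers and non-smokers are looked at separately.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


random.seed(1)  # fixed seed so the run is repeatable

# Each person is (smoker, coughs, cancer). All probabilities are made up.
people = []
for _ in range(20000):
    smoker = random.random() < 0.3
    coughs = random.random() < (0.6 if smoker else 0.1)
    cancer = random.random() < (0.2 if smoker else 0.02)
    people.append((smoker, coughs, cancer))


def cough_cancer_r(group):
    """Correlation between coughing and cancer within a group of people."""
    return pearson_r([c for _, c, _ in group], [k for _, _, k in group])


r_all = cough_cancer_r(people)
r_smokers = cough_cancer_r([p for p in people if p[0]])
r_nonsmokers = cough_cancer_r([p for p in people if not p[0]])

print(f"everyone:    r = {r_all:.2f}")
print(f"smokers:     r = {r_smokers:.2f}")
print(f"non-smokers: r = {r_nonsmokers:.2f}")
```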
Extended reading:
Example 1: Politicians often correlate positive changes with their own initiatives. For example, when
crime rates drop in a particular city it is common for the city mayor to link the reduction in crime with
changes that she introduced, such as increased funding for the police force. Of course, there are
numerous other variables that can be linked to a drop in crime besides the ones that politicians like to
take credit for, and politicians will skillfully avoid mentioning them. Examples include: national-level
changes in gun regulations; economic improvements at the state level; teenage pregnancy rates 15-20
years before the mayor took office; improvements in forensics such as DNA testing and brain-imaging;
reductions in prosecutions and/or lower conviction rates of the prosecuted.
Example 2: Religious fundamentalists sometimes confront atheists, and vice versa. One recent example
of this occurred on Fox Network when a talk-show host, Bill O’Reilly, interviewed Richard Dawkins, a
renowned evolutionary biologist and atheist. O’Reilly suggested a link between the atheism of Hitler
and Stalin and their terrible acts of genocide. Dawkins’s response to this spurious correlation was, “Hitler
and Stalin had moustaches too”.