Chapter 1 Statistics: The Art and Science of Learning from Data

Chapter 1
Introduction to the
Statistical Process
Section 1.1: Introduction to Statistics
Statistics vs. Anecdotal Evidence
Smoking causes cancer.
Seat belts save lives.
Autism and Vaccines
Nelson says it wasn't long after her son
Parker's shots at 15 months that she noticed
something was wrong.
"He had run a slight fever after the
vaccinations, but I didn't think anything of it,"
said Nelson. "You know kids run fevers all the
time, but about a week after that he just
completely stopped talking."
After months of worrying, wondering, and
going back and forth with doctors, an official
diagnosis was made: autism.
Nelson believes it started with the vaccines.
"Gradually, I started piecing it together. He
got sick after his vaccinations and about a
week later everything changed. He was a
completely different little boy then," said
Nelson.
What is Statistics?

Statistics is the discipline that guides us to produce or
collect data, which are then analyzed in order to draw
inferences or make predictions.

Numerical summaries such as means, percentages,
and standard deviations are called statistics.
Descriptive Statistics
Descriptive Statistics refers to methods for
summarizing data. These summaries
consist of graphs (histograms, scatterplots,
pie charts, etc.) and numbers (means,
standard deviations, regression equations,
percentages, etc.).
Inferential Statistics
Inferential statistics refers to methods of making
decisions or predictions about a population or
a process, based on data obtained from a
sample. We will use tests of significance and
confidence intervals to achieve this.
This semester, we will be looking at and
conducting a number of studies.
Statistical Process
1. Ask a research question
2. Design a study (research conjecture)
3. Collect data
4. Explore the data
5. Draw inferences (logic of inference)
- Significance
- Estimation
6. Formulate conclusions (scope of inference)
- Generalize
- Cause/Effect
7. Communicate findings
Physicians’ Health Study I
1. Research Question: Will taking aspirin help
reduce heart attacks?
2. Design Study: Started in 1982 with 22,071 male
physicians.
• Half took a 325 mg aspirin every other day
(the other half took a placebo).
Physicians’ Health Study I
3. Collect Data:
Intended to run until 1995, the aspirin study was
stopped in 1988 after 189 heart attacks occurred in
the placebo group and 104 in the aspirin group.
Beta carotene, the study's other treatment, had been
hoped to be a wonder drug, but it was found to produce
no benefit or harm. This result allowed investigators
to turn to other, more promising agents.
Physicians’ Health Study I
4. Explore Data: 1.7% in the placebo group had heart
attacks while only 0.9% in the aspirin group had heart
attacks. (45% reduction in heart attacks for the aspirin
group)
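The percentages on this slide can be reproduced with a few lines of arithmetic. This is a minimal sketch: the slide says only that "half" of the 22,071 physicians took aspirin, so the group sizes below (11,034 placebo, 11,037 aspirin) are an assumed near-even split; any near-even split gives the same rounded figures.

```python
# Heart attacks reported on the slide
placebo_attacks = 189
aspirin_attacks = 104

# ASSUMPTION: near-even split of the 22,071 physicians (the slide only says "half")
n_placebo = 11_034
n_aspirin = 11_037

# Proportion of each group that had a heart attack
p_placebo = placebo_attacks / n_placebo
p_aspirin = aspirin_attacks / n_aspirin

# Relative reduction: how much lower the aspirin rate is, as a fraction of the placebo rate
relative_reduction = 1 - p_aspirin / p_placebo

print(f"placebo {p_placebo:.1%}, aspirin {p_aspirin:.1%}, reduction {relative_reduction:.0%}")
# prints: placebo 1.7%, aspirin 0.9%, reduction 45%
```

Note that the 45% is a relative reduction; the absolute difference between the two groups is only about 0.8 percentage points.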
5. Draw Inferences: The probability that chance alone
would produce a difference between the two groups'
proportions of heart attacks as large as the one
observed is very, very small.
Physicians’ Health Study I
6. Formulate Conclusions: They concluded that taking
aspirin does reduce the likelihood of heart attacks in
middle-aged and older males.
7. Report Findings:
Terminology
The individual entities on which data are recorded
are called observational units.
The recorded characteristics of the observational
units are the variables of interest.
What are the observational units and variables in the
Physicians’ Health Study?

Section 1.2
Introduction to the
Logic of Statistical Inference
Dolphin Communication
Can dolphins communicate abstract ideas?
In an experiment done in the 1960s, Doris was
instructed which of two buttons to push. She then
had to communicate this to Buzz (who could not see
Doris). If he picked the correct button, both dolphins
would get a reward.
What are the observational units and variables in
this study?

Dolphin Communication
In one set of trials, Buzz chose the correct button 15
out of 16 times.
Based on these results, do you think Buzz knew
which button to push, or was he just guessing?
How might we justify an answer?
How might we model this situation?

Modeling Buzz and Doris
Flip Coins
Applet
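The coin-flip model can also be run on a computer instead of with real coins or the applet. The sketch below is one possible version (the function name `simulate_guessing` and the choice of 10,000 repetitions are our own, not from the course materials): each of Buzz's 16 attempts is treated as a fair coin flip, with heads meaning a correct button, and we count how often pure guessing gives 15 or more correct.

```python
import random

def simulate_guessing(n_trials=16, n_reps=10_000, seed=1):
    """Return the number of correct guesses in each of n_reps simulated sets
    of n_trials coin flips (heads = correct button, probability 1/2)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.random() < 0.5 for _ in range(n_trials)) for _ in range(n_reps)]

null_dist = simulate_guessing()
observed = 15
# Proportion of simulated sets with at least as many correct as Buzz had
p_value = sum(count >= observed for count in null_dist) / len(null_dist)
print(p_value)
```

Because 15 or more heads in 16 fair flips has probability 17/65,536 (about 0.0003), a run of 10,000 repetitions will typically find only a handful of such sets, or none at all.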

Can Chimps Solve Problems?
http://youtu.be/ySMh1mBi3cI
Exploration 1.2: Can Chimps Solve
Problems?
Sarah, a 30-year-old chimp, is shown videos of a
person struggling with some problem (can’t reach a
banana, cage door locked, record player not working,
etc.).
She is then shown two pictures, one of the solution
and one not.
She then picks one of the pictures.
Does Sarah understand the solution to these
problems, or is she just randomly picking a picture?

Exploration 1.2
(page 15)
Read the first paragraph.
1. State the research question. (This is a broad
statement.)
2. State the research conjecture. (This is more specific
to our test.)
Sarah correctly picked 7 of the 8 pictures. Is this
unlikely if she is just guessing?
Continue working on the exploration.
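For a small number of attempts, the chance-model probability can also be found exactly rather than by simulation. A minimal sketch, assuming (as the chance model says) that each of Sarah's 8 picks is correct with probability 1/2, independently: every one of the 2^8 = 256 right/wrong sequences is then equally likely, so we count the sequences with 7 or 8 correct.

```python
from math import comb

n_picks, observed = 8, 7
# Number of right/wrong sequences with at least `observed` correct picks
ways = sum(comb(n_picks, k) for k in range(observed, n_picks + 1))
# Each of the 2**8 equally likely sequences has probability 1/256
p_value = ways / 2**n_picks
print(ways, p_value)  # 9 0.03515625
```

A simulation with the applet should give a proportion close to this value of about 0.035.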

Section 1.3
Statistical Significance:
Other Random Choice Models
Can dogs sniff out cancer?
Marine sniffing samples
Can Dogs Sniff Out Cancer?
1. Research Question: Can dogs detect a
patient with cancer by smelling their breath?
2. Design a study: Five breath bags were
shown to Marine, one from a cancer patient
and four from non-cancer patients.
3. Collect data: Marine completed 33
attempts at this procedure.
4. Explore the data: Marine identified the
correct bag 30 out of 33 times.

Can Dogs Sniff Out Cancer?
How is the chance model we will use for
this situation different from our previous
ones?
Can we use coins again?

Can Dogs Sniff Out Cancer?

5. Draw Inferences
Three S Strategy
Statistic: Compute the statistic from the observed data.
Simulate: Identify a model that represents a chance
explanation. Use the model to simulate data that “could
have happened” when the chance model is true.
Calculate the value of the statistic from the could-have-been
data. Repeat the simulation process to generate a
distribution of the could-have-been values for the
statistic.
Strength of evidence: Consider whether the value of
the observed statistic is unlikely to occur when the
chance model is true.
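The three steps translate directly into a short program. This is a sketch under stated assumptions, not the applet's implementation: the statistic is a count of successes, the chance model treats each attempt as an independent success with probability `p_chance`, and "at least as extreme" is taken to mean "at least as many successes." The function name `three_s` is our own.

```python
import random

def three_s(observed, n_trials, p_chance, n_reps=10_000, seed=0):
    """Three S strategy for a count of successes.

    Statistic: `observed`, the number of successes in the real data.
    Simulate: n_reps could-have-been data sets of n_trials attempts,
              each attempt a success with probability p_chance.
    Strength of evidence: proportion of simulated counts >= observed.
    """
    rng = random.Random(seed)
    null_dist = [
        sum(rng.random() < p_chance for _ in range(n_trials))
        for _ in range(n_reps)
    ]
    p_value = sum(count >= observed for count in null_dist) / n_reps
    return null_dist, p_value

# Marine: 30 correct out of 33; guessing picks the right bag 1 time in 5
null_dist, p_value = three_s(observed=30, n_trials=33, p_chance=1 / 5)
print(max(null_dist), p_value)
```

With five bags, guessing averages about 33 × 1/5 = 6.6 correct, and the simulated counts stay far below 30, so the simulated p-value comes out as 0.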
Can Dogs Sniff Out Cancer?
We have the statistic: Marine made the
correct identification 30 out of 33 times.
How could we set up a simulation?
Tactile (how could this be done?)
Applet
Strength of evidence: Is 30 out of 33
very unlikely under the chance model?
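If none of the simulated repetitions reaches 30 correct, the simulation tells us only that the p-value is very small (roughly, smaller than 1 divided by the number of repetitions). The exact binomial probability shows how small it really is. A sketch, assuming independent attempts that each have a 1-in-5 chance of success under the null hypothesis:

```python
from math import comb

n, observed, p = 33, 30, 1 / 5
# Exact P(X >= 30) for X ~ Binomial(33, 1/5): sum the probabilities of 30, 31, 32, 33 correct
p_value = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed, n + 1))
print(f"{p_value:.1e}")  # 3.1e-18
```

A probability this small means that guessing essentially never produces 30 or more correct out of 33.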
Can Dogs Sniff Out Cancer?

6. Formulate conclusions:
Can we conclude that Marine can identify
cancerous breath?
Can we conclude that all dogs can do this?
Some dogs?
7. Communicate findings:
Marine, the dog that can
sniff out bowel cancer
By Jeremy Laurance, Health Editor
A labrador retriever called Marine
has been trained to sniff out
cancer with stunning accuracy,
researchers report today.
Terminology: Hypotheses
The null hypothesis is the chance
explanation.
Typically the alternative hypothesis is
what the researchers think is true.

Null hypothesis: Marine is randomly
choosing which bag to sit next to.
Alternative hypothesis: Marine is not
randomly choosing which bag to sit next
to.

Terminology: Null Distribution
We will refer to the distribution of chance
outcomes as the null distribution.
For Marine, we should have gotten a null
distribution similar to the following.

Terminology: P-value
The p-value is the proportion of
outcomes in the null distribution that are
at least as extreme as the value of the
statistic actually observed in the study.
What was our p-value for Marine?
Were they all the same?
Were they all close to the same?
Guidelines for evaluating strength of
evidence from p-values
p-value > 0.10: not much evidence against the null hypothesis
0.05 < p-value < 0.10: moderate evidence against the null hypothesis
0.01 < p-value < 0.05: strong evidence against the null hypothesis
0.001 < p-value < 0.01: very strong evidence against the null hypothesis
p-value < 0.001: extremely strong evidence against the null hypothesis
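These guidelines amount to a simple lookup. A minimal sketch follows; the slide leaves the boundary values (exactly 0.10, 0.05, 0.01, 0.001) unassigned, so placing each boundary in the weaker category is our own assumption.

```python
def evidence_strength(p_value):
    """Map a p-value to the verbal guideline for strength of evidence.

    ASSUMPTION: a p-value exactly on a boundary (0.10, 0.05, 0.01, 0.001)
    is placed in the weaker category.
    """
    if p_value < 0.001:
        return "extremely strong evidence against the null hypothesis"
    if p_value < 0.01:
        return "very strong evidence against the null hypothesis"
    if p_value < 0.05:
        return "strong evidence against the null hypothesis"
    if p_value < 0.10:
        return "moderate evidence against the null hypothesis"
    return "not much evidence against the null hypothesis"

print(evidence_strength(0.035))  # strong evidence against the null hypothesis
```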

Terminology: Statistically Significant
If the observed results provide strong
evidence that the data did not arise by
random chance alone, then the research
result is called statistically significant.
Are Marine’s results statistically
significant?

Let’s play some rock-paper-scissors
Rock smashes scissors
Paper covers rock
Scissors cut paper
Play the novice version at least 30 times
and keep track of all your choices.
Activity 1.4

Now work on Activity 1.4.
Criminal Justice System vs.
Significance Tests
Innocent until proven guilty. We assume
a defendant is innocent, and the
prosecution has to collect evidence to try
to prove the defendant is guilty.
Likewise, we assume our chance model
(or null hypothesis) is true, and we collect
data and calculate a sample proportion.
We then show how unlikely our proportion
is if the chance model is true.

Criminal Justice System vs.
Significance Tests
If the prosecution shows lots of evidence that goes
against this assumption of innocence (DNA,
witnesses, motive, contradictory story, etc.), then
the jury concludes that the assumption of
innocence is wrong.
If, after we collect data, we find that the
probability (p-value) of getting such a proportion is so
small that it would rarely occur by chance if the
null hypothesis were true, then we conclude that our
assumption that the chance model is true is
wrong.
Review
For Sarah the chimp, you could have gotten a
null distribution similar to the one shown here.
• What does a single dot represent?
• What does the whole distribution represent?
• What is the p-value for this simulation?
• What does this p-value mean?
More Review
The null hypothesis is the chance
explanation.
Typically the alternative hypothesis is
what the researchers think is true.
Three S Strategy: Statistic, Simulate, Strength of evidence
The p-value is the proportion of
outcomes in the null distribution that are
at least as extreme as the value of the
statistic actually observed in the study.
Still More Review
A small p-value gives evidence against the
null and for the alternative.
If the observed results provide strong
evidence that the data did not arise by
random chance alone, then the research
result is called statistically significant.

Section 1.4
Other Chance Models
Ron Artest,
choker at the line?
In the 2009-10 basketball
season, Ron Artest made
68.8% of his free throws, similar to his
career average.
In his first 15 attempts in the playoffs, he
made only 7 free throws (46.7%).
Is this evidence that he is “choking” and
performing significantly worse than during
the regular season?

Ron Artest Example

What are the observational units?
Artest’s 15 free throw attempts.

What is the variable?
Whether or not he makes the free throw.

What is the statistic of interest?
7/15
Notation
Our sample proportion (statistic) can be
described using the symbol p̂ (p-hat).
A parameter is a numerical summary of a
variable that is either an unobservable
long-run outcome or a value for an entire
population. It can be described using the
symbol π (pi).
In our example, π = 0.688 and p̂ = 0.467.

Hypotheses


Null hypothesis: Ron Artest’s performance
at the free throw line during the 2010 NBA
finals is the same as his regular season
performance; his probability of making a
basket in the playoffs is 0.688.
Alternative hypothesis: Ron Artest’s
performance at the free throw line during the
2010 NBA finals is worse than his regular
season performance; his probability of
making a basket in the playoffs is less than
0.688.
Simulated Chance Model
Coins, cards, dice, spinners, etc. don’t
really work well here to develop a chance
model of a 68.8% success rate.
But we can still use the magic of an
applet. (While this will be a different
applet than the first two we used, it is
essentially the same.)
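Outside the applet, a 68.8% success rate is easy to simulate with a random number generator: a shot counts as "made" when a uniform random number in [0, 1) falls below 0.688. A sketch (the function name and the repetition count are our own choices); because the alternative hypothesis says Artest is worse, "at least as extreme" here means a proportion at or below 7/15.

```python
import random

def simulate_free_throws(n_shots, p_make=0.688, n_reps=10_000, seed=2):
    """Return simulated proportions of made free throws, one per repetition,
    when every shot is made with probability p_make."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_make for _ in range(n_shots)) / n_shots
        for _ in range(n_reps)
    ]

null_props = simulate_free_throws(15)
observed = 7 / 15  # about 0.467
# Lower-tail p-value: the alternative says Artest is WORSE than 0.688
p_value = sum(prop <= observed for prop in null_props) / len(null_props)
print(round(p_value, 3))
```

The exact binomial probability of 7 or fewer makes in 15 shots is about 0.062, so the simulated value should land near that.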

Ron Artest Continued
So we have moderate evidence against
the null.
Let’s see what would happen if we had
more data.
Suppose he continued to shoot 46.7%
from the free throw line, making 7
of his next 15 attempts as well, for a
total of 14 out of 30. Let’s return to the
applet to see how our p-value would
change.

Ron Artest Continued
As the sample size increases, there is less
variability in our null distribution.
It is still centered around 0.688, but it
becomes narrower and narrower.
As a result, 0.467 falls further and further
out in the tail, and thus the p-value gets
smaller.
This should make intuitive sense: with a
larger sample size, we have more evidence.
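The narrowing can be checked numerically. A sketch using the same kind of simulation (the helper name `null_proportions` is ours): it compares the center and spread of the simulated proportions for 15 and 30 shots with the standard formula sqrt(π(1 − π)/n) for the standard deviation of a sample proportion.

```python
import random
from statistics import mean, pstdev

def null_proportions(n_shots, p_make=0.688, n_reps=10_000, seed=3):
    """Simulated proportions of made shots when each shot succeeds with probability p_make."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_make for _ in range(n_shots)) / n_shots
            for _ in range(n_reps)]

summary = {}
for n in (15, 30):
    props = null_proportions(n)
    # Theoretical SD of a sample proportion under the null model
    formula_sd = (0.688 * (1 - 0.688) / n) ** 0.5
    summary[n] = (mean(props), pstdev(props), formula_sd)
    print(n, round(mean(props), 3), round(pstdev(props), 3), round(formula_sd, 3))
```

Both centers stay near 0.688, while the spread shrinks by a factor of about 1/√2 ≈ 0.71 when the number of shots doubles, which is why 0.467 moves further into the tail.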

Ron Artest Continued

Besides a larger sample size, how else
could we get more evidence against the
null?
Artest could make fewer shots.
Is that what really happened?
No. Artest made 4 of his next 5 shots for
a total of 11 out of 20 (55%) for the
playoffs.
Let’s return to the applet and see how this
changes our p-value.
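For comparison, exact binomial p-values for the three scenarios discussed (7 of 15, 14 of 30, and 11 of 20) can be computed directly. A sketch, assuming independent shots that are each made with probability 0.688 under the null hypothesis; the p-value is a lower-tail probability because the alternative says Artest shot worse. The function name is our own.

```python
from math import comb

def lower_tail_p_value(n_shots, made, p_make=0.688):
    """Exact P(X <= made) for X ~ Binomial(n_shots, p_make)."""
    return sum(comb(n_shots, k) * p_make**k * (1 - p_make) ** (n_shots - k)
               for k in range(made + 1))

results = {}
for n_shots, made in [(15, 7), (30, 14), (20, 11)]:
    results[(n_shots, made)] = lower_tail_p_value(n_shots, made)
    print(f"{made}/{n_shots}: p-value = {results[(n_shots, made)]:.3f}")
```

The values come out at roughly 0.06, 0.01, and 0.14 respectively: making 4 of the next 5 shots weakens the evidence against the null hypothesis considerably.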

Exploration 1.4
Shaky Putting?
Phil Mickelson is one of the best golfers in
the world. He’s won the Masters
Tournament three times.
However, 2011 was not his best year. He
seemed to struggle with his putting and
switched to a “belly putter” late in the
year.

Exploration 1.4
Was Mickelson a poor putter in 2011?
In this exploration, you will compare
Mickelson’s 2011 record of putting from
10 feet away from the hole with that of all
other professional golfers that year.
Was he significantly worse than his peers?

Section 1.5
Modeling More
Complex Situations
Infant preference for helper or hinderer?
Helper Toy
Baby chooses a toy
Helper or Hinderer?
Sixteen babies were shown two
demonstrations, one with a helper toy and one
with a hinderer toy. Which toy played which role,
and the order of the demonstrations, were random.
When presented with the two toys
(with the left/right placement randomized), 14 of
the babies chose the helper toy.
How is this experiment different from any
we have looked at so far?

Helper or Hinderer?
The key difference is that each attempt
was made by a different baby.
Our chance model implies that each baby
has the same chance of choosing the
helper toy (50%).
It could be that some babies randomly
choose and some do not. We will talk
about this in our conclusion.
Let’s run the test.

Helper or Hinderer?
Null Hypothesis: Each baby is randomly
choosing one of two toys. (The babies choose
the helper toy 50% of the time in the long
run.)
Alternative Hypothesis: The babies are not
randomly choosing, but show a preference for
the helper toy. (The babies choose the helper
toy more than 50% of the time in the long
run.)
We can use any of our applets to test this. Remember
that our sample proportion is 14 out of 16.
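Here each observational unit is a different baby, but under the null hypothesis every baby's choice is still a 50-50 random pick, so the same kind of simulation applies. A sketch (the function name is ours) that simulates groups of 16 babies, each choosing between the two toys at random:

```python
import random

def simulate_babies(n_babies=16, n_reps=10_000, seed=4):
    """Number of babies choosing the helper in each simulated group,
    when every baby independently picks one of the two toys at random."""
    rng = random.Random(seed)
    toys = ("helper", "hinderer")
    return [sum(rng.choice(toys) == "helper" for _ in range(n_babies))
            for _ in range(n_reps)]

null_dist = simulate_babies()
observed = 14
# Upper-tail p-value: the alternative says babies prefer the helper toy
p_value = sum(count >= observed for count in null_dist) / len(null_dist)
print(p_value)
```

The exact probability of 14 or more helper choices out of 16 under this model is 137/65,536, about 0.002, so the simulated p-value should be close to that.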
Helper or Hinderer?
So what can we conclude?
Do all the babies prefer the helper toy?
Do some of the babies prefer the helper
toy?
Because we had a low p-value, we can
conclude that not all the babies are
randomly choosing and that at least some
of them prefer the helper toy.
Can we make conclusions beyond these 16
babies?

Which Tire?
Two students missed a chemistry exam
because of excessive partying but blamed
their absence on a flat tire.
The professor allowed them to take a
make-up exam and sent them to
separate rooms to take it.
The first question, worth 5 points, was
quite easy.
The second question, worth 95 points,
asked: Which tire was flat?

Which Tire?

How would you answer this question?
• Driver’s side front
• Passenger’s side front
• Driver’s side rear
• Passenger’s side rear
Exploration 1.5: Tire Story Falls Flat
We will use the data from class to
determine whether students have a preference
for picking one of the four tires.
This is similar to the helper-hinderer
example because our observational units
are different people.
Let’s work on Exploration 1.5 (page 50).
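With four tires, the chance model changes to a 1-in-4 probability per student. The sketch below assumes that, under the null hypothesis, each student picks one of the four tires independently and with equal probability; the function `tire_p_value` and its arguments are hypothetical, and you would plug in your own class's counts (no class data are given here).

```python
import random

TIRES = ("driver's front", "passenger's front", "driver's rear", "passenger's rear")

def tire_p_value(n_students, n_chose_tire, tire, n_reps=10_000, seed=5):
    """Simulated p-value for 'students pick `tire` more often than 1 in 4'.

    Chance model: each student independently picks one of the four tires
    with equal probability. The statistic is the number choosing `tire`.
    """
    if tire not in TIRES:
        raise ValueError(f"unknown tire: {tire!r}")
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_reps):
        # One simulated class: count students who happen to pick `tire`
        count = sum(rng.choice(TIRES) == tire for _ in range(n_students))
        if count >= n_chose_tire:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_reps
```

For example, `tire_p_value(30, 12, "passenger's front")` would estimate how often 12 or more of 30 guessing students pick the passenger's front tire; those numbers are placeholders, not class results.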
