Statistical Hypothesis Testing - Analytica Wiki

Statistical Hypothesis Testing
(8th Session in “Gentle Introduction to Modeling Uncertainty”)
Lonnie Chrisman, Ph.D.
Lumina Decision Systems
Analytica User Group
15 July 2010
Copyright © 2010 Lumina Decision Systems, Inc.
Scope of Today’s Webinar
Included:
• Conceptual underpinnings of classical hypothesis
testing.
• Interpretation of statistical significance (p-values).
• General methodology for applying it in any scenario.
Intended to promote conceptual understanding.
Building on Monte Carlo tools.
Not included:
• Standard canned hypothesis tests (like t-tests, etc)
Outline
• Motivating example
• Statistical significance
• The Statistic
• Methodology
• Modeling the Null hypothesis
• Computing the pValue
• Interpretation of results
• Drawbacks of methodology
• Additional exercise
Does Stock Market Volatility
Vary with Day of Week?
• Randomly selected 100 trading days (from 2000-2010).
• Computed the day change, (close - open)/open, of the S&P 500 index for each.
Day of week   # samples   Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%
Side note: Annualized volatility := SDeviation * sqrt(T),
where T = # trading days/yr = 250
Total volatility: 18.1%
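The side-note formula can be sketched in Python as follows. This is a minimal illustration, not the Analytica model from the talk: the function name and the example day changes are made up, and it assumes SDeviation is the sample standard deviation.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 250  # T from the side note


def annualized_volatility(daily_changes):
    """Sample standard deviation of the day changes, scaled by sqrt(T)."""
    return statistics.stdev(daily_changes) * math.sqrt(TRADING_DAYS_PER_YEAR)


# Hypothetical day changes, (close - open)/open, for illustration only.
example_changes = [0.004, -0.012, 0.009, -0.003, 0.015, -0.008]
print(f"annualized volatility: {annualized_volatility(example_changes):.1%}")

# Going the other way: the daily standard deviation implied by 18.1% per year.
daily_sd = 0.181 / math.sqrt(TRADING_DAYS_PER_YEAR)
print(f"implied daily SD: {daily_sd:.4f}")
```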
• Alice: “This shows that the market volatility does
depend on the day of the week.”
• Bob: “No, the variation is just due to random sampling
variation.”
Download Model with S&P Data
• Please download:
“Hypothesis Test S&P Volatility.ana”
The download link is at the bottom of the
talk abstract on the Analytica Wiki.
• You’ll use this data for exercises…
Statistical Significance
• Alice: “This shows that the market volatility depends on the
day of the week.”
• Alice’s mission: To show that this observed variation is
unlikely if it is just due to random sampling variation.
• Null Hypothesis: The “true” underlying volatility is the same
for every day of the week.
• Level of significance: The probability that at least this much
variation in volatility would be observed if the Null
Hypothesis were true (termed the “p-value”).
Statistical Significance #2
• After her statistical analysis, Alice might say:
“This shows at a significance level p=3% that market
volatility varies with the day of the week.”
• By convention, p ≤ 5% is usually considered to be “statistically
significant”. p>5% is said to be “not statistically significant”.
• What can you conclude if the p-value turns out to be 20%?
The “Statistic”
Day   # samples   Observed Volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%
Total volatility: 18.1%
• We need a scalar metric that summarizes the degree of
conflict with the Null-hypothesis (H0).
Smaller value: more consistent with H0.
Larger value: greater disagreement with H0.
• Examples:
Max(vol,day) – Min(vol,day)
SDeviation(vol,day)
F = Variance(vol,day) / Total_volatility^2
• Exercise: Pick a statistic and compute its value for the
S&P 500 dataset in your Analytica model.
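As a rough cross-check outside Analytica, the three example statistics can be computed directly from the five volatilities in the table. This sketch assumes SDeviation and Variance are the sample (n-1) versions; the variable names are illustrative.

```python
import statistics

# Observed annualized volatility by day of week, taken from the slide's table.
vol = {"Mon": 0.194, "Tue": 0.119, "Wed": 0.215, "Thu": 0.201, "Fri": 0.143}
total_volatility = 0.181

values = list(vol.values())

# Max(vol,day) - Min(vol,day)
stat_range = max(values) - min(values)

# SDeviation(vol,day), assumed here to be the sample standard deviation
stat_sd = statistics.stdev(values)

# F = Variance(vol,day) / Total_volatility^2
stat_f = statistics.variance(values) / total_volatility ** 2

print(f"range = {stat_range:.3f}, sd = {stat_sd:.4f}, F = {stat_f:.4f}")
```

Any of the three is a valid choice; what matters is that the same statistic is later applied to both the measured and the simulated data.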
Methodology
[Diagram: Model of Null Hypothesis → Simulated dataset → Statistic on simulated → pValue;
Measured dataset → Statistic on measured → pValue]
• Construct a model that simulates measurements
given that the null-hypothesis is true.
Typically makes various assumptions.
• Use Monte Carlo simulation to produce several
simulated data sets. Apply the statistic to each.
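• pValue: Pr( Stat_sim ≥ Stat_meas ), i.e., the fraction of simulated
data sets whose statistic is at least as large as the measured one.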
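The three steps above can be sketched as one generic routine. Everything here is illustrative: the function names are made up, and the coin-flip example is a toy stand-in for a real null model, chosen only because it is easy to check.

```python
import random


def monte_carlo_p_value(simulate_null, statistic, measured_data,
                        n_runs=10_000, seed=0):
    """Estimate Pr(Stat_sim >= Stat_meas) by simulating under the null."""
    rng = random.Random(seed)
    stat_meas = statistic(measured_data)
    hits = sum(statistic(simulate_null(rng)) >= stat_meas
               for _ in range(n_runs))
    return hits / n_runs


# Toy example (hypothetical data): 16 heads in 20 flips. Is the coin fair?
def simulate_fair_coin(rng):
    """Null model: 20 flips of a fair coin."""
    return [rng.random() < 0.5 for _ in range(20)]


def heads_excess(flips):
    """Statistic: distance of the head count from the expected 10."""
    return abs(sum(flips) - 10)


measured = [True] * 16 + [False] * 4
p = monte_carlo_p_value(simulate_fair_coin, heads_excess, measured)
print(f"estimated p-value: {p:.4f}")
```

The null model and the statistic are passed in as functions, so the same routine applies to any scenario once those two pieces are written.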
Modeling the Null Hypothesis
• Null Hypothesis: The volatility is 18.1% on every day of the
week.
• How could you simulate the data?
(Hint: There are multiple possible approaches)
What assumptions are you making?
• Some ideas:
Randomly generate each day’s price change from a LogNormal
distribution.
Shuffle existing data.
• Exercise: Implement a model of the null-hypothesis in your
Analytica model. (One random dataset for each item in Run)
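Both ideas can be sketched in Python. This is only one possible reading of each, under loudly stated assumptions: the parametric version assumes day changes are independent, have zero median drift, and that close/open is LogNormal with a log-scale standard deviation of 18.1%/sqrt(250); the shuffle version assumes the day labels are exchangeable under the null. Function names are hypothetical.

```python
import math
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
SAMPLES_PER_DAY = 20
DAILY_SD = 0.181 / math.sqrt(250)  # daily SD implied by 18.1% annualized


def simulate_parametric(rng):
    """Idea 1: draw every day change from one shared distribution.

    exp(Normal) is LogNormal, so close/open is LogNormal and the
    day change is that ratio minus 1 (SD approximately DAILY_SD).
    """
    return {day: [math.exp(rng.gauss(0.0, DAILY_SD)) - 1
                  for _ in range(SAMPLES_PER_DAY)]
            for day in DAYS}


def simulate_shuffle(rng, changes_by_day):
    """Idea 2: pool the measured changes and deal them back out at random."""
    pooled = [x for day in DAYS for x in changes_by_day[day]]
    rng.shuffle(pooled)
    shuffled, start = {}, 0
    for day in DAYS:
        n = len(changes_by_day[day])
        shuffled[day] = pooled[start:start + n]
        start += n
    return shuffled


rng = random.Random(1)
one_dataset = simulate_parametric(rng)          # stands in for measured data
another = simulate_shuffle(rng, one_dataset)
print({day: len(changes) for day, changes in another.items()})
```

The shuffle makes fewer distributional assumptions because it reuses the measured changes, while the parametric version needs no access to the raw data beyond the total volatility.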
Computing Statistic on
Simulated
• Exercise: Apply your statistic to each
simulated dataset.
Note: Larger statistic values occur when the
variation in volatility by day is largest.
• Exercise: What fraction of simulated
datasets have a statistic value at least as
large as that of the actual data?
This is the p-value.
Is Alice’s hypothesis statistically significant?
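Putting the pieces together, here is a hedged end-to-end sketch that uses only the numbers on the slides. It assumes, for simplicity, that day changes are independent Normal draws with zero mean (real returns have heavier tails), and it uses the standard deviation across days as the statistic. The resulting estimate depends on those choices and is not the talk's result.

```python
import math
import random
import statistics

OBSERVED_VOL = [0.194, 0.119, 0.215, 0.201, 0.143]  # Mon..Fri, from the slides
TOTAL_VOL = 0.181
T = 250          # trading days per year
N_PER_DAY = 20   # samples per day of week


def sd_statistic(vols):
    """Statistic: sample standard deviation of volatility across days."""
    return statistics.stdev(vols)


def simulate_vols(rng):
    """One simulated dataset under H0, reduced to five annualized volatilities."""
    daily_sd = TOTAL_VOL / math.sqrt(T)
    vols = []
    for _ in range(5):
        changes = [rng.gauss(0.0, daily_sd) for _ in range(N_PER_DAY)]
        vols.append(statistics.stdev(changes) * math.sqrt(T))
    return vols


def p_value(n_runs=2000, seed=0):
    """Fraction of simulated datasets with statistic >= the measured one."""
    rng = random.Random(seed)
    stat_meas = sd_statistic(OBSERVED_VOL)
    hits = sum(sd_statistic(simulate_vols(rng)) >= stat_meas
               for _ in range(n_runs))
    return hits / n_runs


print(f"estimated p-value: {p_value():.3f}")
```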
Common Misuse of Paradigm:
Multiple Hypotheses
• Scenario:
Alice identifies 20 other plausible hypotheses to
test, e.g.:
Volatility on Tuesday is different from the other 4 days.
Volatility varies by month.
September has a higher volatility than other months.
…
She tests each of these individually and finds one
of them to be statistically significant at a 5%
level.
She publishes this result.
• What’s wrong here?
• What should she do differently?
Interpreting p-Value
• Small value (p ≤ 5%)
Reject the Null-hypothesis and accept the main hypothesis.
Data is inconsistent with the Null-hypothesis.
• Otherwise (p > 5%)
Conclude only that the data sample was too small to
detect a relationship.
The hypothesis may still be true or false:
“Larger research study required”
• P-value is not:
A measure of the strength of relationship.
The probability that the hypothesis is true.
Drawbacks with Statistical
Hypothesis Testing Paradigm
• 1 in 20 false hypotheses are accepted (at 5% significance
level).
Often abused by people testing many hypotheses.
• Nearly any hypothesis is confirmed with a large enough
sample.
Most hypotheses will have at least a minuscule “true” effect.
With enough data, even the most minuscule effect becomes
statistically significant.
• The “uncertainty” about the hypothesis is not available.
Doesn’t provide P(H), which would be useful in models that use
the results.
• Numerous subjective components that are not recognized or
reported explicitly.
• “Cookbook tests” are very often misapplied when assumptions
don’t hold, leading to greater confidence than is warranted by
the data.
New Exercise
Number of subjects (purely fictional data):
                 Parkinson’s   No Parkinson’s
Not exposed      10            140
Exposed to TCE   4             25
• Hypothesis: TCE exposure is associated with an
increased risk of getting Parkinson’s disease.
• Null Hypothesis:
Parkinson’s rates are the same among those exposed
and not exposed to TCE.
• Exercise:
Identify an appropriate statistic.
Model the null-hypothesis
Compute the p-Value
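One possible approach to this exercise, sketched in Python rather than Analytica. The choices are assumptions, not the talk's solution: the statistic is the difference in Parkinson’s rates (exposed minus not exposed), the null model shuffles the disease labels across all 179 subjects, and the test is one-sided because the hypothesis is about increased risk.

```python
import random

EXPOSED_CASES, EXPOSED_TOTAL = 4, 4 + 25          # exposed to TCE
UNEXPOSED_CASES, UNEXPOSED_TOTAL = 10, 10 + 140   # not exposed


def rate_difference(exposed_cases, unexposed_cases):
    """Statistic: Parkinson's rate among exposed minus rate among unexposed."""
    return exposed_cases / EXPOSED_TOTAL - unexposed_cases / UNEXPOSED_TOTAL


def p_value(n_runs=10_000, seed=0):
    """Shuffle disease labels across subjects and count extreme outcomes."""
    rng = random.Random(seed)
    total_cases = EXPOSED_CASES + UNEXPOSED_CASES
    total_subjects = EXPOSED_TOTAL + UNEXPOSED_TOTAL
    # 1 = Parkinson's, 0 = no Parkinson's, pooled over both groups
    labels = [1] * total_cases + [0] * (total_subjects - total_cases)
    stat_meas = rate_difference(EXPOSED_CASES, UNEXPOSED_CASES)
    hits = 0
    for _ in range(n_runs):
        rng.shuffle(labels)
        sim_exposed = sum(labels[:EXPOSED_TOTAL])  # first 29 play "exposed"
        sim_unexposed = total_cases - sim_exposed
        if rate_difference(sim_exposed, sim_unexposed) >= stat_meas:
            hits += 1
    return hits / n_runs


print(f"measured rate difference: {rate_difference(4, 10):.4f}")
print(f"estimated p-value: {p_value():.3f}")
```

Shuffling keeps the total number of cases fixed at 14, which is itself a modeling assumption; drawing each subject's outcome from the pooled rate would be another reasonable null model.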
Summary
• Statistical Hypothesis Testing tests:
Is the support for a hypothesis statistically significant, given
a dataset?
• Significance level (p-value) is:
Probability of seeing data at least as extreme as the actual
data when the Null hypothesis is true.
• p-value ≤ 5% → accept hypothesis
p-value > 5% → conclude nothing,
need more data.
• Methodology:
Identify statistic (scalar metric): A measure of divergence
from null-hypothesis.
Build model of null-hypothesis to “simulate” data sets.
Compute p-value.