P-Value

CSE 5331/7331
Fall 2011
P-Value and Statistical
Significance
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
CSE 5331/7331 F‘11
Outline
Overview
 P-Value
 Statistical Significance
 Test Statistics
 Examples

Data Mining - Remember




When we analyze data we may only be
dealing with a sample of the complete data
(which may be infinite)
We often want to generalize our findings to
the entire data population
Calculation of a P-value can help us to
determine how likely our results are to apply
to the entire population
How do we do this?
Normal Distribution
Will G. Hopkins, “A New View of Statistics; P Values and Statistical Significance,”
2002, http://sportsci.org/resource/stats/pvalues.html
But what are we comparing?


Two different results:
– Employment rate for MSCS vs MBA
– Scores on a standardized test between
male and female or USA and another
country.
– Classification by two different classifiers
Sometimes a correlation coefficient or a test
statistic whose distribution is known (such as
chi squared) is examined.
Correlation Coefficient

When we look at statistical significance we will be
comparing two values. The figure shows the
probability distribution of correlation coefficient values in a
sample of size 20 when the true correlation is 0 (that is, no correlation).*
* Will G. Hopkins, “A New View of Statistics,” 2002,
http://www.sportsci.org/resource/stats/pvalues.html
Statistics Assumptions



Sample data is representative of the entire
population
Sample data is randomly chosen
These assumptions often (usually) do not
hold for a given real world sample (training
data).
– Real world data format is unknown.
– May wish to extrapolate to a larger population
Hypotheses
 Alternative Hypothesis (HA): This is the relationship
between the variables that you expect (hope) your
experiments will demonstrate.
 Null Hypothesis (H0).
This is the opposite claim: there is no relationship between the
variables.
 In significance testing we really determine whether
we should reject the null hypothesis.
P-Value
The probability that a variable takes a
value at least as extreme as (for example, greater than
or equal to) the observed value
 http://en.wikipedia.org/wiki/P-value
 http://sportsci.org/resource/stats/pvalues.html

P-Value



“The probability that a variate would assume
a value greater than or equal to the observed
value strictly by chance.”*
“the probability of obtaining a test statistic at
least as extreme as the one that was actually
observed, assuming that the null hypothesis
is true.”**
The smaller the value, the stronger the evidence against
the Null Hypothesis (which you want to reject).
* Weisstein, Eric W. "P-Value." From MathWorld--A Wolfram Web Resource.
http://mathworld.wolfram.com/P-Value.html , 9/21/11.
** Wikipedia, “P-value”, http://en.wikipedia.org/wiki/P-value, 9/19/11.
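A minimal sketch of the second definition, using an exact binomial calculation. The scenario (8 heads in 10 flips of a supposedly fair coin) and the function name `binomial_upper_tail` are hypothetical, chosen only to make "at least as extreme as the one observed, assuming the null hypothesis is true" concrete:

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, null_prob: float = 0.5) -> float:
    """Return P(X >= successes) when X ~ Binomial(trials, null_prob).

    null_prob is the success probability claimed by the Null Hypothesis.
    """
    return sum(
        comb(trials, k) * null_prob**k * (1.0 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical observation: 8 heads in 10 flips, Null Hypothesis = fair coin.
p_value = binomial_upper_tail(successes=8, trials=10)
print(f"P(at least 8 heads in 10 fair flips) = {p_value:.4f}")
```

Here the P-value is 56/1024, about 0.055, so this hypothetical result would fall just short of the usual 0.05 threshold.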
Finding P-value



May be able to calculate P-value directly or
you may have to convert the data (by using a
test statistic) into a value that can be used.
Example: Find correlation between two
variables. The correlation coefficient can be
used directly if we know its distribution.
However, in some cases a new statistic is
used to provide that P-value.
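When the distribution of the correlation coefficient is not known, one option is to estimate it by shuffling. The sketch below is a permutation test, a technique swapped in here for illustration and not taken from the slides; the data values and function names are hypothetical. It assumes neither variable is constant.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate a two-tailed P-value for the observed correlation.

    Shuffling ys breaks any real relationship, so the shuffled
    correlations approximate the distribution under the Null Hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


# Hypothetical, strongly related data.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
print("r =", round(pearson_r(xs, ys), 4))
print("estimated P-value =", permutation_p_value(xs, ys))
```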
Another Way to Look at It



P-value is a measure of how much evidence
you have against the Null Hypothesis.
Null Hypothesis: Hypothesis of no change
Critical Regions – values of statistics for
which, if they occur you will reject the Null
Hypothesis
Confidence Intervals




Range of values that is likely to contain the true
parameter (at a stated confidence level, e.g. 95%).
Size depends on sample size and variance.
The larger the sample size, the smaller the interval.
The larger the variance, the larger the interval.
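A rough sketch of how sample size and variance drive the interval width. It assumes a confidence interval for a mean under a normal approximation (reasonable for large samples; small samples would normally use a t distribution). The numbers and the function name `mean_confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, std_dev, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width


# Same mean, different sample sizes and spreads (hypothetical values).
for std_dev, n in [(15, 25), (15, 400), (30, 25)]:
    low, high = mean_confidence_interval(mean=100, std_dev=std_dev, n=n)
    print(f"std_dev={std_dev:>2}, n={n:>3}: [{low:.2f}, {high:.2f}]")
```

The interval shrinks when n grows from 25 to 400 and widens when the standard deviation doubles.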
P-Value and Confidence Intervals





Confidence Intervals and P-Values are related.
However, a 95% confidence interval
is not the same thing as P=0.05.
A 95% confidence interval “that doesn’t overlap zero”
corresponds to a P-value below 0.05. [3]
“If the 95% CI includes no
difference between groups, then
the P value is > 0.05.” [1]
“If the 95% CI does not include
no difference between groups
then the P value is < 0.05.” [1]
Figure: 95% confidence intervals.*
* Will G. Hopkins, “A New View of Statistics,” 2002,
http://www.sportsci.org/resource/stats/pvalues.html
So …


So, if we obtain a data value from a
population distribution that is normal AND we
know that it falls in one of the 0.2% wings
of the distribution, then we should be
somewhat convinced that such a value is much less
likely (although still possible) to be seen than
the more commonly occurring data values.
How sure are you that the results you’ve
found in the experiments are actually true?
Statistical Significance






A way to assign a confidence value to a finding.
How confident can we be that a finding holds in the general
population and is not a fluke (not due to random
chance)?
A significance level is a measure of how likely the
result is to be due to chance.
The P-value is a test or measure of statistical significance.
What is the probability that a relationship between two
variables exists?
What is the probability that this relationship (which our
experiments seem to indicate exists) is due only to
random chance?
Using Significance Test








State Alternative Hypothesis
State Null Hypothesis.
Perform research
Identify statistic and its distribution
Decide on alpha threshold
Calculate statistic
Calculate P-Value
Compare P-Value to threshold
– If lower, probability is small that the result was by
chance therefore the finding is significant
– If higher, then finding is not significant
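The steps above can be sketched end to end on a small hypothetical case. Alternative Hypothesis: a coin is biased. Null Hypothesis: the coin is fair. The statistic is a z score, assumed to be approximately standard normal under the Null Hypothesis (a normal approximation to the binomial). The data (60 heads in 100 flips) and the function name `proportion_z_test` are made up for illustration.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, null_proportion, alpha=0.05):
    """Two-tailed z test for a proportion (normal approximation).

    Returns the statistic, its P-value, and whether P-value < alpha.
    """
    # Calculate the statistic: distance from the expected count,
    # measured in standard errors.
    expected = n * null_proportion
    std_error = math.sqrt(n * null_proportion * (1.0 - null_proportion))
    z = (successes - expected) / std_error
    # Calculate the P-value: area in both tails beyond |z|.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Compare the P-value to the alpha threshold.
    return z, p_value, p_value < alpha


z, p_value, significant = proportion_z_test(successes=60, n=100, null_proportion=0.5)
print(f"z = {z:.2f}, P-value = {p_value:.4f}, significant at 0.05: {significant}")
```

With these numbers z is 2.0 and the P-value is about 0.046, so the finding is significant at alpha = 0.05 but would not be at alpha = 0.01.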
Probability of Error





Alpha Level (Threshold)
Threshold value to make decision
Rule of thumb: 0.05
Since you hope to reject the Null Hypothesis,
you hope you’ll find p < 0.05.
This means your statistic falls beyond the Alpha cutoff
(out in the tail of the distribution) and your chance
of a Type I error is acceptable.
Significance Levels





Rule of thumb: 0.95 – The result has a 95% chance
of being true.
However, results are stated in terms of not
being true (i.e. P-value = 1 – Level). So a value of 0.05
or less is good.
So p<0.05 indicates a significant result.
The smaller the P-value the better. P=0.001 is better
than P=0.05.
If we know what the distribution of the statistic is,
then we can estimate the extreme areas in the tails
(furthest from the mean). This can tell us the
significance.
Sample P-Value Levels


Suppose your threshold is P=0.05
P-Value          Wording
> 0.05           Not Significant
0.01 to 0.05     Significant
0.001 to 0.01    Very Significant
< 0.001          Extremely Significant
Table from GraphPad, “Interpreting statistical significance,”
1999, http://www.graphpad.com/articles/interpret/principles/stat_sig.htm .
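The table can be read as a simple lookup. In this sketch the function name `significance_wording` is hypothetical, and the table does not say which row a value exactly on a boundary (0.05, 0.01, 0.001) belongs to; the code assumes the less extreme wording in those cases.

```python
def significance_wording(p_value: float) -> str:
    """Map a P-value to the wording used in the table (threshold 0.05)."""
    if p_value > 0.05:
        return "Not Significant"
    if p_value >= 0.01:  # assumption: 0.01 itself counts as "Significant"
        return "Significant"
    if p_value >= 0.001:  # assumption: 0.001 itself counts as "Very Significant"
        return "Very Significant"
    return "Extremely Significant"


for p in (0.2, 0.03, 0.005, 0.0001):
    print(f"P = {p}: {significance_wording(p)}")
```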
DM & Significance Tests


In MOST data mining experiments we want to
determine if the results we found in the sample can
be generalized to the general population.
So the two variables being compared are the one
found in the sample and the one that should be found
in the population (if we could even get the entire
population set).
Two-Tailed vs One-Tailed


Whether the test considers both tails of the
statistic’s distribution.
If the direction of the difference or relationship is
important, then a one-tailed probability is
used; otherwise two-tailed.
Will G. Hopkins, “A New View of Statistics; P
Values and Statistical Significance,” 2002,
http://sportsci.org/resource/stats/pvalues.html
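A small sketch of the difference, assuming a statistic that is standard normal under the Null Hypothesis. The observed value z = 1.8 and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_tailed_p(z: float) -> float:
    """Area in the upper tail only: the direction of the effect matters."""
    return 1.0 - STD_NORMAL.cdf(z)


def two_tailed_p(z: float) -> float:
    """Area in both tails: an effect in either direction counts."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


z = 1.8  # hypothetical observed statistic
print(f"one-tailed P = {one_tailed_p(z):.4f}")
print(f"two-tailed P = {two_tailed_p(z):.4f}")
```

For this value the one-tailed P-value is below 0.05 while the two-tailed P-value is above it, so the choice of test should be made before looking at the data.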
Types of Errors

Type I: Conclude a relationship exists when in fact it
doesn’t (a false positive).
– The Null Hypothesis is true but was rejected.
– Alpha error

Type II: Conclude a relationship does not exist when in
fact it does (a false negative).
– The Null Hypothesis is false but was not rejected.
– Beta error

To be on the safe side, try to minimize Type I errors.
– Use a low Alpha level
– Alpha level is the probability of making a Type I error that you are
willing to accept.
– Often Alpha=0.05 or even 0.01.
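A simulation sketch of why Alpha is the Type I error rate. It repeatedly draws samples for which the Null Hypothesis is true by construction (mean 0, standard deviation 1) and counts how often a two-tailed z test still rejects it. The setup, sample size, and function name are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, trials=4000, sample_size=30, seed=1):
    """Fraction of true-null experiments that are wrongly declared significant."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    false_positives = 0
    for _ in range(trials):
        # The Null Hypothesis (mean = 0) really is true for this sample.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        z = (sum(sample) / sample_size) * math.sqrt(sample_size)
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            false_positives += 1  # rejected a true Null Hypothesis
    return false_positives / trials


print("alpha = 0.05:", type_i_error_rate(alpha=0.05))
print("alpha = 0.01:", type_i_error_rate(alpha=0.01))
```

The observed rate should land close to the chosen Alpha, and lowering Alpha lowers it.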
Test Statistics
So how do we calculate P-Value?
 We need to convert the raw data
generated by experiments into a single
value which we know follows some
known frequency distribution.
 We can then calculate the P-Value
based on that
 These are called Test Statistics

Standard Test Statistics
Common Test Statistics
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing
 T – Compares distributions of means of two groups
http://www.socialresearchmethods.net/kb/stat_t.php
http://mathworld.wolfram.com/Studentst-Distribution.html
http://www.stattools.net/tTest_Tab.php
 Conversion between
http://www.graphpad.com/quickcalcs/pvalue1.cfm

Chi Square Test



Experiments may actually generate nominal or
ordinal data.
How do you convert these values into
numbers that can be evaluated?
Often a Chi Square Test is used. Here the
collected data is summarized and then
converted into a statistic that can be used to
measure how the values compare to what
would be expected randomly.
Chi Square Distribution
Not normal but approaches normal as
degrees of freedom (number of
independent variables) increase.
http://stattrek.com/lesson3/chisquare.aspx

Chi Square Test Process






Complete contingency table based on two subset
types and possible values.
Even though actual values are not numeric, when we
complete the table we then have counts (frequencies)
which are numeric and can be analyzed.
The Chi Squared Statistic is calculated by comparing
the observed frequency values to the expected
frequency values.
This statistic is what is examined for significance.
It follows a chi squared distribution.
Note: http://epm.sagepub.com/content/52/1/57.short
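A minimal sketch of this process. Expected frequencies come from the row and column totals (what the counts would look like with no relationship), and the statistic sums (observed - expected)^2 / expected over all cells. The function name and the 2x2 table are hypothetical, and no continuity correction is applied.

```python
def chi_square_statistic(observed):
    """Chi Squared statistic and degrees of freedom for a contingency table.

    observed is a list of rows, each a list of counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if rows and columns are unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


table = [[10, 20], [20, 10]]  # hypothetical counts
statistic, dof = chi_square_statistic(table)
print(f"chi square = {statistic:.3f}, degrees of freedom = {dof}")
```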
Chi Squared Example




Suppose we compare the hiring rate for recent MS
CS graduates to MBA graduates. We find that 150
out of 200 MS students have a job and 100 out of
200 MBA students have a job.
Hypothesis: MSCS students are hired at a higher rate
than MBA students.
Null Hypothesis: MBA and MSCS students are hired
at the same rate.
Suppose we use p=0.05 as significance level.
Chi Squared Example (cont’d)






          MSCS   MBA   Total
Job        150   100     250
No Job      50   100     150
Total      200   200     400
250 students have jobs so 250/400=0.625
150 do not have jobs, so 150/400=0.375
We would expect these percentages to hold for each
population if the Null Hypothesis is true.
150/200=0.75 MSCS have jobs
100/200=0.5 MBA have jobs
We would expect 0.625 * 200 = 125 in each group to
have jobs
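The expected counts can be checked with a few lines. The counts are the ones given in the table; the variable names are just for illustration.

```python
# Observed counts from the table.
observed = {
    "Job": {"MSCS": 150, "MBA": 100},
    "No Job": {"MSCS": 50, "MBA": 100},
}
group_sizes = {"MSCS": 200, "MBA": 200}
grand_total = sum(group_sizes.values())

# Under the Null Hypothesis both groups share the overall rate,
# so the expected count is overall rate * group size.
expected = {}
for outcome, counts in observed.items():
    overall_rate = sum(counts.values()) / grand_total
    expected[outcome] = {
        group: overall_rate * size for group, size in group_sizes.items()
    }

for outcome, counts in expected.items():
    print(outcome, counts)
```

This gives 125 expected with jobs and 75 expected without jobs in each group.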
Chi Squared Example (cont’d)

Calculate Chi Square
http://course1.winona.edu/sberg/Equation/chi-squa.gif

http://www.graphpad.com/quickcalcs/contingency1.cfm


Chi Square value is 26.67
How do we convert this to a P-Value?

http://www.statsoft.com/textbook/distribution-tables/#chi

P-Value is <0.05.
Reject Null Hypothesis
Yes it is significant


(Actually p<0.0001 so it is extremely significant)
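The same numbers can be reproduced without a lookup table. A 2x2 table has (2-1)*(2-1) = 1 degree of freedom, and a Chi Squared variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail can be written with the complementary error function. This sketch assumes no continuity correction; the variable names are illustrative.

```python
import math

# Cells in the order: MSCS job, MBA job, MSCS no job, MBA no job.
observed = [150, 100, 50, 100]
expected = [125, 125, 75, 75]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For 1 degree of freedom: P(chi2 >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2.0))

print(f"chi square = {chi_square:.2f}")
print(f"P-value    = {p_value:.2e}")
print("reject Null Hypothesis at 0.05:", p_value < 0.05)
```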
Example 1 – Iris Data
J48 Default Cross Validation
 We have the Kappa statistic but can’t use it to calculate P-value
without variance.
http://twiki.org/p/pub/Main/SigurdurRunarSaemundsson/Interrater_agreement.Kappa_statistic.pdf

Example 1 – Iris Data
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.98     0        1          0.98    0.99       0.99      Setosa
               0.94     0.03     0.94       0.94    0.94       0.952     Versicolor
               0.96     0.03     0.941      0.96    0.95       0.961     Virginica
Weighted Avg.  0.96     0.02     0.96       0.96    0.96       0.968
=== Confusion Matrix ===
  a  b  c   <-- classified as
 49  1  0 | a = Setosa
  0 47  3 | b = Versicolor
  0  2 48 | c = Virginica
View the confusion matrix as a contingency table and
calculate Chi Squared.
Example 1 (cont’d)




http://faculty.vassar.edu/lowry/newcs.html
Chi Square = 266
P < 0.0001
Extremely statistically significant
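A sketch that reproduces this value from the confusion matrix. A 3x3 table has (3-1)*(3-1) = 4 degrees of freedom, and for 4 degrees of freedom the upper tail of the Chi Squared distribution has the closed form exp(-x/2) * (1 + x/2). The variable names are illustrative and no continuity correction is applied.

```python
import math

# Confusion matrix: rows = actual class, columns = predicted class.
confusion = [
    [49, 1, 0],   # Setosa
    [0, 47, 3],   # Versicolor
    [0, 2, 48],   # Virginica
]

row_totals = [sum(row) for row in confusion]
col_totals = [sum(col) for col in zip(*confusion)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(confusion):
    for j, count in enumerate(row):
        # Expected count if predictions were unrelated to the actual class.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

# Upper-tail probability for 4 degrees of freedom.
p_value = math.exp(-chi_square / 2.0) * (1.0 + chi_square / 2.0)

print(f"chi square = {chi_square:.1f}")
print("P < 0.0001:", p_value < 0.0001)
```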
Example 2
Nektarios Leontiadis, Tyler Moore and Nicolas Christin.
"Measuring and Analyzing Search-Redirection
Attacks in the Illicit Online Prescription Drug Trade".
20th USENIX Security Symposium. August 10-12,
2011: San Francisco, Ca.
http://cs.wellesley.edu/~tmoore/usenix11.pdf
References
1. American College of Physicians-American Society of Internal Medicine, “Primer
on 95% Confidence Intervals,” Effective Clinical Practice, September/October
2001, Vol 4, No 6, pp 229-231.
2. Will G. Hopkins, “A New View of Statistics,” 2002,
http://www.sportsci.org/resource/stats/pvalues.html
3. M. A. Saint-Germain, “PPA 696 Research Methods,” 1/1/01,
http://www.csulb.edu/~msaintg/ppa696/696menu.htm