Hypothesis Testing

Hypothesis Testing:
(working from incomplete information)
•Jury deliberations
•Binomial distribution
•Poisson distribution and quantal release
•Normal distribution: standard deviation
•Stdev of samples of size N
•Estimating population statistics from small samples
•Student’s t-test
•Predicting the future
•non-parametric statistics: Difference of Proportions
•The Black Swan…
•Correlation
•Fuzzy Logic and fuzzy controllers…
handouts:
• Selected pages from chapters 7 and 10 of
Loftus & Loftus, Essence of Statistics, 2nd
Ed (Knopf, 1988)
• You may have seen some of this material
from AM0650, or AM1650…
• http://www.stat.brown.edu/
• 12 biostatisticians (ScM level) on call
• “Our mission is to foster research and statistical education at Brown
Medical School and the University at large. Center faculty and staff
conduct methodologic research in Biostatistics and interdisciplinary
research in a broad range of areas of Medicine and Public Health.
The Center is home to the graduate program in Biostatistics and the
undergraduate statistics concentration at Brown, and organizes the
Brown Statistics Seminar Series.”
• my guy: Brad Snyder…
Hypothesis matrix for a jury
Columns: REALITY (past, future). Rows: Judgment (prediction).

                              | Null hypothesis true | Alternative hypothesis true
Accept null hypothesis        |                      |
Accept alternative hypothesis |                      |
Null hypothesis is true
(innocent verdict for innocent person)
Columns: REALITY. Rows: Judgment.

                              | Null hypothesis true                           | Alternative hypothesis true
Accept null hypothesis        | Correct (innocent) but it’s just the status quo |
Accept alternative hypothesis |                                                |
Alternative hypothesis is true
(guilty verdict for guilty criminal)
Columns: REALITY. Rows: Judgment.

                              | Null hypothesis true                           | Alternative hypothesis true
Accept null hypothesis        | Correct (innocent) but it’s just the status quo |
Accept alternative hypothesis |                                                | Correct (guilty) change in the future (jail)
Alternative hypothesis is true
(innocent verdict for guilty criminal)
Columns: REALITY. Rows: Judgment.

                              | Null hypothesis true                           | Alternative hypothesis true
Accept null hypothesis        | Correct (innocent) but it’s just the status quo | Wrong: guilty goes free. Missed opportunity
Accept alternative hypothesis |                                                | Correct (guilty) change in the future (jail)
Null hypothesis is true
(guilty verdict for innocent citizen)
Columns: REALITY. Rows: Judgment.

                              | Null hypothesis true                           | Alternative hypothesis true
Accept null hypothesis        | Correct (innocent) but it’s just the status quo | Wrong: guilty goes free. Missed opportunity
Accept alternative hypothesis | Wrong: very bad result: pursue a dead end      | Correct (guilty) change in the future (jail)
Signal detection theory:
http://wise.cgu.edu/sdtmod/index.asp
http://teachline.ls.huji.ac.il/72633/SDT_intro.pdf
SDT: All about a “detector” making decisions
• Is the “detector” (who can be a human making
decisions/judgments) prone to false positives or
misses? (mistakes)
• Hits and rejections are correct answers…
• Misses → a “conservative” detector
• False Positives → aggressive, optimistic,
paranoid, indefatigable…
• Not just about finding significant differences
between samples A and B…
• A Grand Jury can be classified as prone to
“guilty” or prone to “innocent”…
from WISE cgu.edu SDT website:
• “SDT is a method of modeling the decision
making process for someone who decides
between different classes of items (e.g.,
friend or [foe]) and their bias to favor a
particular type of response.”
• Jury selection (voir dire); jury consultants;
hung juries, mistrials; Louisiana v.
Morgan: “jury of peers”; civil rights…
from WISE SDT website , p.3:
• “Note that Misses and Correct Rejections
are redundant with Hits and False Alarms.
• The miss rate is 10/50 which is .20 or simply
(1 - "hit rate") and the Correct Rejection rate
is 45/50 or .90 or (1 - "false alarm rate").
• Therefore, you can perfectly describe all
four measures of a person's performance in
a signal detection experiment through their
Hit and False Alarm rates.”
Detector sensitivity: d’
• “The most commonly used SDT measure of
sensitivity is d' (d prime), which is the standardized
difference between the means of the Signal
Present and Signal Absent distributions. To
calculate d', we need only know a person's hit and
false alarm rates.
• The formula for d' is as follows: d' = z(FA) - z(H)
• where FA and H are the False Alarm and Hit rates,
respectively, that correspond to right-tail
probabilities on the normal distribution.”
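A minimal Python sketch of the d' calculation, using the rates from the WISE example quoted above (40/50 hits, 5/50 false alarms). It uses left-tail z-scores from the inverse normal CDF, so the formula appears as z(H) - z(FA); this gives the same number as the quoted z(FA) - z(H) read from right-tail probabilities. The function name `d_prime` is an illustrative choice, not something from the WISE site.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity d' from hit and false-alarm rates.

    Left-tail z-scores are used here, so d' = z(H) - z(FA). With right-tail
    z-scores each sign flips and the same value is written z(FA) - z(H).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Rates implied by the quoted example: miss rate 10/50, correct rejections 45/50.
hit_rate = 40 / 50
false_alarm_rate = 5 / 50
sensitivity = d_prime(hit_rate, false_alarm_rate)
print(f"d' = {sensitivity:.2f}")
```

A larger d' means the Signal Present and Signal Absent distributions sit further apart, independent of where the observer places the criterion.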
Criterion
• Criterion is a measure of the willingness of a respondent
to say 'Signal Present' in an ambiguous situation.
• The choice of a criterion may depend on perceived
consequences of outcomes.
• For example, if the consequences are costly for saying
'Signal Present' when the signal actually is absent, then
a respondent may generally be less willing to say 'Signal
Present.'
• On the other hand, if the consequences are more costly
for failing to detect a signal when it is present, then a
respondent may be more willing to say 'Signal Present.'
• Positive Criterion » more willing to say “yes, I saw it”
• The ROC is the locus of Criterion points…
SDT Summary
• “Signal Detection Theory (SDT) allows an analyst to separate
sensitivity from response bias. Observers are assumed to make
decisions based upon information derived from two distributions.
The first (Signal Absent) is assumed to represent a background level
of "noise." The second distribution (Signal Present) represents an
increase to a background level of noise caused by the introduction
of a stimulus. That is why the second distribution is sometimes
referred to as the 'Signal + Noise' distribution.
• An observer's sensitivity, as indexed by d', is how well the observer
can differentiate items coming from the Signal Absent and Signal
Present distributions. Criterion (i.e., response bias) represents the
minimum level of internal certainty needed for the observer to
decide that a signal was present.
• ROCs represent the relationship between hits and false alarms, and
can be used to describe performance in terms of d'. SDT has
applications in fields such as medical diagnosis, bioinformatics,
psychology, and engineering.”
Receiver Operating Characteristics (ROCs)
• “The receiver-operating characteristic
(ROC) is a fundamental plot in signal
detection theory. A ROC is essentially a
scatterplot that shows the relationship
between false alarm rates on the x-axis,
and hit rates on the y-axis. ROCs describe
the relationship between the underlying
Signal Absent and Signal Present
distributions.”
Null hypothesis, the scientific method,
and troubleshooting
• Some independent variable (input) has been changed in the
experiment.
• The output is the dependent variable.
• The null hypothesis: That the independent variable has no
effect on the dependent variable.
• You want to design an experiment to test whether the null or
alternative hypothesis is true.
• Something goes wrong with a circuit: Test your hypothesis
as to why. (Lab ADA example…)
• Horace Barlow: direction selectivity in rabbit retinal ganglion cells:
2 alt hypotheses, test between them, so as not to favor one…
ordered combinations and
Pascal’s triangle
•An ordered combination deals with items that have individual labels,
such as their place in a row…
•The number of ordered combinations of N things taken r at a time is
$C(N, r) = \frac{N!}{r!\,(N-r)!}$
Pascal’s triangle shows C(N, r) with
N as the row and r as the “column”
http://ptri1.tripod.com/
Binomial formula
Consider a random variable that can be in one of two states:
“success” or “failure”
The probability of exactly r successes out of N attempts is
$P(r) = C(N, r)\, p^{r} q^{N-r}$
where p is the probability of success and q = 1 - p is the probability of failure
another use of the formula:
Binomial distribution in EXCEL
• The probability that 3 or fewer coin flips come up heads
out of 10 tosses of a fair coin:
= BINOMDIST(3, 10, 0.5, 1)
= 0.172
(the final argument, 1, is the cumulative flag: it sums the probabilities of 0 through 3 heads)
• Also see =COMBIN(10, 2) for EXCEL version of
number of combinations of 10 things taken 2 at a time…
• or try MATLAB function nchoosek(N, k) say nchoosek(10, 2)...
Or solve using Pascal’s triangle:
• find the row with “10” as the second number
• where 1, 10, 45, 120 are the number of combinations of
10 things taken 0, 1, 2, 3 at a time
• The number 0 represents that none of the coins
came up heads--there’s only 1 way that can happen…
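The coin-flip calculation above can be checked with a short Python sketch. It builds the first four entries of row 10 of Pascal's triangle and sums the binomial probabilities for 0 through 3 heads. The helper names `binom_pmf` and `binom_cdf` are illustrative, not Excel or MATLAB functions.

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(r, n, p):
    """Probability of r or fewer successes in n trials."""
    return sum(binom_pmf(k, n, p) for k in range(r + 1))

# First four entries of row 10 of Pascal's triangle.
pascal_row = [comb(10, k) for k in range(4)]

# Same quantity as =BINOMDIST(3, 10, 0.5, 1): (1 + 10 + 45 + 120) / 2**10.
p_three_or_fewer = binom_cdf(3, 10, 0.5)
print(pascal_row, round(p_three_or_fewer, 3))
```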
Why roulette is my favorite form of gambling
• C:\MatlabR12\work\JDD\roulette13.m
• Pascal’s catastrophe…
• Thomas Bass, Newtonian Casino, Penguin (1991)
Poisson distribution
$P(r) = \frac{(np)^{r}\, e^{-np}}{r!}$
where n*p is the average number of events for n trials…
Or try EXCEL
=poisson(0, 4.7, 1)
=0.009095
looking at the probability that there would be no release for
one stimulation where n*p = 4.7, and say n=470 and p = 0.01
compare to =binomdist(0, 470, 0.01, 1) = 0.008883 …close
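As a rough check of the two Excel results above, the following Python sketch evaluates the Poisson probability of zero releases with mean n*p = 4.7 and the exact binomial probability with n = 470 and p = 0.01. The function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(r, mean):
    """Poisson probability of exactly r events when the average is `mean`."""
    return mean**r * exp(-mean) / factorial(r)

n, p = 470, 0.01
p_none_poisson = poisson_pmf(0, n * p)   # counterpart of =POISSON(0, 4.7, 1)
p_none_binomial = (1 - p) ** n           # counterpart of =BINOMDIST(0, 470, 0.01, 1)
print(f"{p_none_poisson:.6f} {p_none_binomial:.6f}")
```

The two values agree to about 2%, which is the sense in which the Poisson distribution approximates a binomial with large n and small p.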
Vesicle release from synapse
work of Bernard Katz (Nobel Prize, 1970)
Example: epsp = "excitatory post-synaptic potential".
Of 198 stimulus impulses 18 resulted in no epsp
(failure of release).
m     | mV  | calc (n*p = 2.33) | obs (sum of 3 bins)
0     | 0   | 19                | 18
1     | 0.4 | 45                | 44
2     | 0.8 | 52                | 55
3     | 1.2 | 41                | 36
4     | 1.6 | 24                | 25
5     | 2.0 | 11                | 12
6     | 2.4 | 4                 | 5
7     | 2.8 | 1                 | 2
8     | 3.2 | 0                 | 1
9     | 3.6 | 0                 | 0
total |     | 198               | 198
(image from Tepper, at Rutgers...)
presynaptic bouton on left.
78 spontaneous epsp’s were observed
(next slide) average height 0.4mV.
for the event count of m*0.4mV add up 3 neighboring bins, except for 0.
Spontaneous vs evoked epsp’s …
Fitting the data
• Katz’ data are very well fit by a Poisson
distribution with n*p = 2.33, the only free
parameter in the equation.
• What is n in n*p? Not 198, the number of
shocks to the presynaptic axon.
• n is the number of vesicles in the
presynaptic bouton. Estimate: n = 800,
so p = 2.3/800 = 0.002875
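A short Python sketch of the fit described above: it multiplies the Poisson probabilities for mean 2.33 by the 198 stimuli to get the expected number of trials releasing m quanta, for comparison with the observed counts from the table. Variable names are illustrative.

```python
from math import exp, factorial

shocks = 198          # number of stimulus impulses
mean_release = 2.33   # n*p, the only free parameter of the fit

# Expected number of trials that release exactly m quanta, m = 0..9.
expected = [
    shocks * mean_release**m * exp(-mean_release) / factorial(m)
    for m in range(10)
]
observed = [18, 44, 55, 36, 25, 12, 5, 2, 1, 0]  # from the table above

for m, (calc, obs) in enumerate(zip(expected, observed)):
    print(f"m={m}  calc={calc:5.1f}  obs={obs}")
```

Rounded to whole trials, the expected counts reproduce the "calc" column of the table.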
How many vesicles are there per
pre-synaptic bouton?
• Anywhere from hundreds to thousands.
• One estimate says 987 vesicles per cubic micron.
• There are docking vesicles ready to be released,
and reserve vesicles, recently reconstituted and
away from the membrane that is facing the
synaptic cleft.
• At any rate, p is the probability that one vesicle
will be released (due to one pre-synaptic shock...)
Quiz example of quantal release question:
• Suppose there are 700 vesicles at a synapse and
each has 0.002 probability of being released by one
pre-synaptic shock. What is the expected number of
shocks out of 200 that will result in no vesicles being
released?
• n*p = 700*0.002 = 1.4, the mean released…
• =POISSON(0, 1.4, 1) = 0.25
• 0.25*200 = 50 shocks will result in
no vesicle being released
• =BINOMDIST(0, 700, 0.002, 1) = 0.246252
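The quiz arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same calculation; the variable names are illustrative.

```python
from math import exp

vesicles = 700
p_release = 0.002
shocks = 200

mean_released = vesicles * p_release               # n*p = 1.4
p_none_poisson = exp(-mean_released)               # Poisson P(0 released)
p_none_binomial = (1 - p_release) ** vesicles      # exact binomial P(0 released)
expected_failures = shocks * p_none_poisson        # shocks with no release

print(f"{p_none_poisson:.4f} {p_none_binomial:.6f} {expected_failures:.1f}")
```

Without rounding the Poisson probability to 0.25, the expected number of failures comes out slightly under 50.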
A giant has swallowed 6 dwarfs numbered Di;
you hit him on the back and he coughs up N D’s;
how many he coughs up fits a binom dist: avg 3
Binomial example of 2 giants coughing…
% Binom_samp_sze2 11.4.14
% compare std of sample size 2 from binomdist of 6
% assume 50% probability of success
% possible to cough up 0…
pasc_7 = [1 6 15 20 15 6 1];      % row 6 of Pascal's triangle, total of 64
to_6 = [0 1 2 3 4 5 6];           % number coughed up; plain avg = 21/7 = 3
dot_prod = sum(pasc_7 .* to_6);   % counts weighted by number of ways = 192
avg1 = dot_prod/sum(pasc_7)       % weighted avg = 192/64 = 3
% OR: prob of coughing up 5 or 6 two times in a row = (7/64)^2 ≈ 1.2%
Normal (Gaussian, Bell-shaped) Distribution
Say the mean of the data is μ and the standard deviation is σ
cumulative normal probability density function
0 mean, 1 stdev
from z= -1.96 to +1.96 is 95% of the area under the curve
The Black Swan*: How can you tell if
your data are NOT normally distributed?
• mean ≠ median, or
• CPDF not sigmoid-shaped, or
• PDF has “barbell” distribution or
• Fat-tailed asymmetric distribution or
• Data “fails” Chi-squared test…
*The Black Swan: The Impact of the Highly Improbable, Nicholas Taleb, Random House (2007)
Binomial becomes Normal (SAT?)
• Consider a binomial distribution with p =0.5
• p(x) vs x will be a symmetric up-down staircase curve
• As the number of “coin flips” N in the binomial data set
increases, the curve will look smooth and “normal”
• standard deviation of binomial dist = $\sqrt{Npq}$, with q = 1 - p
connecting the dots of
a 40-point binomial plot…
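To illustrate how close a 40-flip binomial already is to a normal curve, here is a small Python sketch. It assumes N = 40 and p = 0.5 and compares the binomial probability at each count with a normal density of mean N*p and standard deviation sqrt(N*p*q). The names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

N, p = 40, 0.5
q = 1 - p
approx = NormalDist(mu=N * p, sigma=sqrt(N * p * q))  # mean 20, stdev sqrt(10)

binom = [comb(N, r) * p**r * q**(N - r) for r in range(N + 1)]
normal = [approx.pdf(r) for r in range(N + 1)]

# Largest gap between the staircase and the smooth curve.
max_gap = max(abs(b - g) for b, g in zip(binom, normal))
print(f"peak binomial {binom[20]:.4f}, peak normal {normal[20]:.4f}, max gap {max_gap:.4f}")
```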
Are you smarter than a 10th grader?
• Sample of one:
• Suppose you score 600 on the SAT math test
• The average 10th grader scores 500
• Standard deviation of the SAT = 100
• What's the probability that you're smarter than a 10th grader?
• You did receive a higher score, but (in EXCEL)
• =NORMDIST(600, 500, 100, 1) = 0.84
• 1-0.84 = 16%
• 16% = one-tailed probability that someone will score 600 or more.
• You're in the 84th percentile.
• You’re not significantly smarter than a 10th grader…
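The NORMDIST step for a sample of one can be mirrored in Python with the standard library's `NormalDist`; this is a sketch of the same one-tailed calculation.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)   # population mean and stdev from the slide
percentile = sat.cdf(600)             # counterpart of =NORMDIST(600, 500, 100, 1)
p_one_tailed = 1 - percentile         # chance that someone scores 600 or more

print(f"percentile {percentile:.2f}, one-tailed p {p_one_tailed:.2f}")
```

Because 0.16 is well above 0.05, a single score of 600 is not significant at the 5% level.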
Are you and your 9 left-handed friends
smarter than a 10th grader?
Say the mean of sample size 10 is 600…
Best est. of mean of the means of many samples
of size 10 is the mean of the one sample, 600…
What is the best estimate of the variance of
the means of many samples of 10 scores
drawn from an SAT distribution?
Note: it doesn’t matter what the particular
standard deviation of the one sample is…if pop. σ known
• Sample of 10, with (average of the 10) = 600 on SAT math test
• The standard deviation of the means of samples of size 10 is
sqrt(10000/10) = sqrt(1000) = 31.6
• =NORMDIST(600, 500, 31.6, 1) = 0.9992, wildly significant
• Example of testing sample against a known population
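A Python sketch of the sample-of-10 test against the known SAT population. The standard deviation of the mean is the population standard deviation divided by sqrt(N); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pop_mean, pop_sd = 500, 100
n = 10
sample_mean = 600

sd_of_means = pop_sd / sqrt(n)   # sqrt(10000/10) = 31.6
p_below = NormalDist(pop_mean, sd_of_means).cdf(sample_mean)
p_one_tailed = 1 - p_below       # chance of a sample mean this high by luck

print(f"sd of means {sd_of_means:.1f}, cdf {p_below:.4f}, p {p_one_tailed:.4f}")
```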
Comparing two variants of a population
• What about comparing two experimental groups
from a known population?
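• Form a normalized z term:
$z = \frac{M_{s1} - M_{s2}}{\sqrt{\sigma^{2}/N_{1} + \sigma^{2}/N_{2}}}$
where Ms1 and Ms2 are the means of the two groups, N1 and N2 are the group
sizes, and σ is the known population standard deviation.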
• We are interested in the difference of the means here
Comparing two variants of a population
(cont)
• Suppose it's known that the average area of maple leaves on the ground in
October is 28 cm-sq, with a standard deviation of 5 cm-sq.
• A sample of 12 Japanese maple leaves has an average area of 34 cm-sq, std 4
cm-sq.
• Someone else comes in and says that a sample of 18 “big leaf” maple leaves had
an average area of 38 cm-sq, unknown standard deviation.
• Is it significant at the 5% level that big leaf maple leaves are larger than
Japanese maple leaves? (When was the hypothesis conceived?)
• From the formula on the previous slide, the estimated std_dev is
sqrt(25/12 + 25/18) = 1.86, → z = 4/1.86 = 2.15
• Without having to actually calculate z, =NORMDIST(4, 0, 1.86, 1) = 0.984
• The significance is 1.6% < 5%. Answer: yes, the difference is significant.
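The maple-leaf comparison can be sketched in Python as a z-test on the difference of two sample means drawn from a population of known standard deviation. It assumes, as the slide does, that the population value of 5 cm-sq applies to both groups; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pop_sd = 5.0                        # known population stdev, cm-sq
n_japanese, mean_japanese = 12, 34.0
n_bigleaf, mean_bigleaf = 18, 38.0

# Standard deviation of the difference of the two sample means.
sd_diff = sqrt(pop_sd**2 / n_japanese + pop_sd**2 / n_bigleaf)
z = (mean_bigleaf - mean_japanese) / sd_diff
p_one_tailed = 1 - NormalDist().cdf(z)

print(f"sd_diff {sd_diff:.2f}, z {z:.2f}, p {p_one_tailed:.3f}")
```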
More Maple Leafs (evs?)
• work\fold23\MapleLeafSizeScript12
• MapleTST.xls
• Tools\Data Analysis\z-test for 2 means
The paradox of two tails
The area in yellow (both tails together) must be less than 5% of the total for
the two-tailed test to be significant.
A two-tailed test is 2x more difficult to pass than a one-tailed test.
Digression for November elections
• A qualifier seen in news articles about political polling:
3.1% margin of error…
• Suppose X voters out of N sampled will vote for your
candidate. What is the number of voters N needed in a
sample to ensure that 95% of the time the actual
percentage of voters underlying your candidate's
percentage of X/N will be within ±3.1 percent of X/N?
• This question, whose answer is N=1000 and whose
derivation is here, is different from the question:
• What should be N such that you're confident at the 95%
level that the range of poll percentages is ±3 percent of
X/N if you repeated the poll many times?
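One common way to arrive at N = 1000 is the normal approximation to the binomial with the worst-case proportion p = 0.5; the derivation the slide links to is not reproduced here, so treat this Python sketch as an assumption about the method rather than the author's own derivation.

```python
from math import ceil, sqrt
from statistics import NormalDist

confidence = 0.95
margin = 0.031    # ±3.1 percentage points
p_worst = 0.5     # assumed worst case: variance p*(1-p) is largest at 0.5

z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96
n_needed = ceil((z / margin) ** 2 * p_worst * (1 - p_worst))

# Margin of error actually achieved with that many voters.
achieved_margin = z * sqrt(p_worst * (1 - p_worst) / n_needed)
print(n_needed, round(achieved_margin, 4))
```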
Number of voters needed in a poll
Men’s height (age 20-40)
and %-age of 7-footers in NBA
• “guys who are just tall…”
• http://www.truthaboutit.net/2012/05/true-or-false-half-of-all-7-footers-are-in-the-nba.html
• CDC data: for age 20, mean = 69.8”, std_dev = 2.8”
• (84-69.8)/2.8 = 5.07 = z (num of std_dev out)
• 1-NORMDIST(84, 69.8, 2.8, 1) = 2 x 10^-7
• 320M/2 = 160M; ¼*160M = 40M men age 20-40
• → 40 x 10^6 * 2 x 10^-7 = 8, too small a number:
fat-tailed distribution…
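A Python sketch of the 7-footer estimate under the normal assumption. It uses `math.erfc` for the upper tail because subtracting a CDF value very close to 1 loses precision; the population figure of 40 million is taken from the slide.

```python
from math import erfc, sqrt

mean_height, sd_height = 69.8, 2.8   # inches, CDC figures quoted above
threshold = 84.0                     # 7 feet
men_20_to_40 = 40e6                  # slide's estimate

z = (threshold - mean_height) / sd_height
upper_tail = 0.5 * erfc(z / sqrt(2))        # P(height >= 84") for a normal
expected_seven_footers = men_20_to_40 * upper_tail

print(f"z {z:.2f}, tail {upper_tail:.2e}, expected {expected_seven_footers:.1f}")
```

The predicted count of about 8 is far below what is observed, which is the slide's argument that the real height distribution has a fatter tail than a normal curve.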
Estimating unknown population variance
• Suppose the statistics of the underlying population are unknown…
• What is the best estimate of population variance?
• Remember from AM65?
t-distributions
• Once we enter a world of unknown population statistics, where we
rely on the small sample data alone, we end up dealing with
t-distributions--examples below for 3 and 6 deg of freedom…
• Contained within EXCEL are the “t-tables” for each
degree of freedom N-1.
Two tails example: Comparing to a standard
with TDIST
• Suppose I do an experiment to see how close people can
come to guessing my weight. I ask ten people.
• I know my exact weight, but don't know the standard deviation
of all guesses, only the stdev of a sample of 10 guesses.
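• Next, I estimate the variance of the population from the sample:
$s^{2} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^{2}$, with N = 10 guesses.
• Then I divide the est. variance by N=10 to find the est. variance of
the means of sample size 10 = σ².
• Now I calculate t = (diff_mean – wt)/σ and have EXCEL compute
• =TDIST(ABS(t), 9, 1) (9 degrees of freedom, 1 tail; TDIST takes the t value, the degrees of freedom, and the number of tails)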
• But what if I don’t care if they’re high or low, just wrong in either direction?
Time for 2-tails? see weight sheet…
•WeightEst12.xls on the screen
Two tailed test example (cont)
• Suppose all the guesses are too high.
• Can I do a one-tailed test concerning the hypothesis that
people overestimate my weight?
• NO!
• The hypothesis was conceived after collecting the data.
• The two-tailed criterion must be used.
• Whatever the one-tailed (normal) significance, I must multiply it by 2.
(considering significance to be a small number…)
• The hypothesis: It is significant that people are wrong about my
weight, guessing either too high or too low.
• May lead to a Difference of Proportions test with threshold.
• Or what about using the absolute value of the “error” as the data?
Example: femurs in lemurs
Suppose a sample of 9 femurs from ring-tailed lemurs shows a mean
length of 20 cm, and that the variance of the length is 9 cm-sq. What is
the probability that the ring-tailed lemur femurs are NOT from a population
of mean length 24 cm? A one-tailed test?
Example 10-5 from Loftus & Loftus, then use =TTEST(A1, A2, tails, type)
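A sketch of the lemur calculation in Python using only the standard library. It assumes the stated variance of 9 is already the N-1 estimate of the population variance, and it obtains the tail area of the t-distribution by numerically integrating its density with Simpson's rule rather than from a table. The helper names `t_pdf` and `t_upper_tail` are illustrative, and the handout's own worked answer is not reproduced here.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """One-tailed area beyond |t|, via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n, sample_mean, est_variance = 9, 20.0, 9.0   # est_variance assumed to be the N-1 estimate
pop_mean = 24.0

sd_of_mean = math.sqrt(est_variance / n)      # = 1 cm
t_value = (sample_mean - pop_mean) / sd_of_mean
p_one_tailed = t_upper_tail(t_value, df=n - 1)
print(f"t {t_value:.1f}, one-tailed p {p_one_tailed:.4f}, two-tailed p {2 * p_one_tailed:.4f}")
```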
How can you use the EXCEL tools
if all you have is sample size, mean and stdev?
• Create your own sample with the same mean and standard
deviation:
• The cpdf--cumulative probability density function--is the integral of
the probability distribution.
• Sample at equal intervals of the cumulative probability y-axis; pick
off the associated z values, then un-normalize the z’s: x = σ*z+μ
• See test code in folder fold23 function
[pdfx, pdfy, samp, samp3, std1, std3] = pdf_tst12(52, 100, 30, 1);
• The result will be a sample with a slightly smaller variance than the
underlying population.
• Tweak that data to get the exact stdev, and use EXCEL on the
resulting synthetic sample.
• rev12 has rand(div, 1) generate from a UNIFORM distribution…
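The recipe above can be sketched in Python as follows. This is not the course's pdf_tst12 function (its arguments are not documented here); it is an independent illustration that samples the cumulative axis at equally spaced midpoints, reads off z values, and then rescales so the sample mean and standard deviation are exact. The values mu = 100, sigma = 15, and n = 30 are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def synthetic_sample(mu, sigma, n):
    """Build n values whose sample mean and stdev (N-1) equal mu and sigma."""
    std_normal = NormalDist()
    # Equal steps along the cumulative-probability axis, at interval midpoints.
    probs = [(i + 0.5) / n for i in range(n)]
    z = [std_normal.inv_cdf(p) for p in probs]
    # The raw z values have a slightly small spread; tweak them to mean 0, stdev 1.
    z_mean, z_sd = mean(z), stdev(z)
    z = [(v - z_mean) / z_sd for v in z]
    # Un-normalize: x = sigma*z + mu.
    return [sigma * v + mu for v in z]

sample = synthetic_sample(mu=100.0, sigma=15.0, n=30)   # hypothetical inputs
print(round(mean(sample), 6), round(stdev(sample), 6))
```

The resulting list can be pasted into a spreadsheet column and fed to tools that expect raw data.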
How long to find the T?
Nonparametric (not normal) data
hypothesis testing: Difference of proportions.
• Example: A study of reaction times for
discovering one T among many L’s.
• Some reaction times can be very long
• None can be very short or negative
• Result: fat-tailed distribution
• mean > median
• Transform data into binary format:
reaction time > 500 msec?
• Now a “binomial” problem…
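Once the reaction times are reduced to yes/no answers, two groups can be compared with a standard difference-of-proportions z-test. The Python sketch below uses a pooled proportion and made-up counts (30 of 50 slow responses in one group, 18 of 50 in the other); neither the counts nor the function name come from the slides or the cited notes.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_proportions(hits_a, n_a, hits_b, n_b):
    """Two-tailed z-test for the difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)          # proportion under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical data: number of reaction times above 500 msec in each group.
z, p_value = diff_of_proportions(30, 50, 18, 50)
print(f"z {z:.2f}, two-tailed p {p_value:.3f}")
```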
Courtesy of S Geman AM65 notes - Adapted from McCabe & Moore (6th ed.)
Other ways to “normalize” your data
• Throw outliers out of data base…
• Take logarithm of data (compression of large values…)
• How to justify either action?
Correlation
Assume we have a set of NORMALIZED paired data { x, y }
• r will be a number between -1 and +1
• A LINEAR correlation
• correlation is not causation!
• In EXCEL use the Pearson or CORREL operation
• = PEARSON(M5:M14, N5:N14)
• Example: dose and response should be positively correlated
• Time-shifting auto-correlation: pattern matching
• Why M-1?
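A Python sketch of the correlation of normalized paired data. Each series is converted to z-scores using the N-1 sample standard deviation, and r is the sum of the products divided by M-1, which is what keeps r between -1 and +1. The dose and response numbers are invented for illustration.

```python
from statistics import mean, stdev

def normalize(values):
    """Convert to z-scores: subtract the mean, divide by the sample stdev."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Linear correlation of M paired values, as in Excel's PEARSON or CORREL."""
    zx, zy = normalize(xs), normalize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

dose = [1, 2, 3, 4, 5]                    # hypothetical data
response = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical data
r = pearson_r(dose, response)
print(f"r = {r:.3f}")
```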
Auto-correlation example from wikipedia
• time series with a hidden sine wave
• autocorrelation reveals hidden pattern...
• “barrel-shifting” the data
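The barrel-shifting idea can be sketched in Python: correlate a series with a copy of itself rotated by some lag, wrapping the end around to the start. The test signal here is an assumed sine wave of period 20 samples buried in seeded Gaussian noise, not the Wikipedia data set.

```python
import math
import random

def circular_autocorrelation(x, lag):
    """Correlate x with a copy of itself barrel-shifted by `lag` samples."""
    n = len(x)
    avg = sum(x) / n
    centered = [v - avg for v in x]
    shifted = centered[lag:] + centered[:lag]    # barrel shift with wrap-around
    num = sum(a * b for a, b in zip(centered, shifted))
    den = sum(a * a for a in centered)
    return num / den

rng = random.Random(0)   # fixed seed so the run is repeatable
period = 20
signal = [math.sin(2 * math.pi * n / period) + rng.gauss(0, 0.5) for n in range(200)]

r_full = circular_autocorrelation(signal, period)        # shift by one full period
r_half = circular_autocorrelation(signal, period // 2)   # shift by half a period
print(f"lag {period}: {r_full:.2f}, lag {period // 2}: {r_half:.2f}")
```

A strong positive value at one full period and a strong negative value at half a period are the signature of the hidden sine wave.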
Black Swan challenge: Power Law
from page 235 of The Black Swan
URL: http://www.engin.brown.edu/courses/en123/eqnSTAT/BlackSwan8020.jpg
For his 80/20 example, what is the underlying power in the power law?
Suppose 100 people own land in Italy, and you order and number them from least to most.
The following matlab script gives frac 21% owned by the first 80%, for a power of 6.
(yes, the integral of the power fcn could be used...)
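ara = (1:100).^6;          % land owned by person nn is nn^6 (the power is 6)
tot = sum(ara);            % total land
tot80 = sum(ara(1:80));    % land owned by the bottom 80%
frac = tot80/tot           % about 0.21
but
tot98 = sum(ara(1:98));    % land owned by the bottom 98%
frac98 = tot98/tot         % about 0.87
so the top 2% own only about 13%, not 50%, of the land...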
Predicting the future:
How many hurricanes will hit the USA next year?
• http://weather.unisys.com/hurricane/atlantic/index.html
• Hypothesis: As the years go by, anthropogenic release of CO2 into
the atmosphere will warm the Atlantic Ocean and cause more
hurricanes to spawn off the coast of Africa...
• 2005: (4) Dennis, Katrina, Rita and Wilma struck the US.
• 2006: none
• 2007: (1½) Humberto and Noel (½)
• 2008: (3) Dolly (½), Gustav, Ike, Kyle (½)
• 2009: (1) Bill
• Explanation: Blame El Nino. (Thank El Nino?)
• 2010: (2) Earl, Igor (Blame volcanic ash from Iceland)
• 2011: (1) Irene
• 2012: (2) Isaac, Sandy
• 2013: none (Blame polar vortex)
• 2014: (1) Arthur (July)
Predicting Global Warming in year 2000:
Probability compared to Fuzzy Logic
• grading fuzzy concepts like
“tall” or “close” or “warm” or “fast” or “guilty”
• Fuzzy set membership from 0 to 1
• fuzzy set membership functions
• Fuzzy logic functions: OR AND…
• fuzziness and existentialism: the refrigerator example
• example: the fuzzy ellipse
• “unlike fuzziness, probability dissipates with increasing
information.” (Kosko p. 267)
• applications: inverted pendulum; backing up 18 wheeler truck…
• Use of fuzzy logic controllers in Japanese rail transport
Lotfi Zadeh, UC Berkeley
(father of fuzzy sets and logic)
Bart Kosko (USC): “Fuzzy Engineering”
http://office.microsoft.com/en-us/excel/HP052042111033.aspx#Statistical%20functions
Rising sea levels:
• From 1900 to 2000 sea levels rose 9”
• Prediction from UN panel:
from 2000-2100 levels will rise 36”!
• From 2000-2014 levels have risen 1.5”
• Linear UN prediction for 2014: 36” × 14/100 ≈ 5.0”
• need 34.5” in 86 years…
Topics for Stat Quiz Question
• Poisson or binomial functions used to solve
release of transmitter vesicle problems
• Significance of difference between two samples
taken from known population
• Sig of diff between two samples without
knowledge of underlying population
• Correlation of time-shifted signals
(stimulus and response waveforms)