n! - OpenWetWare

Genomics, Computing, Economics
10 AM Tue 13-Feb
Harvard Biophysics 101
(MIT-OCW Health Sciences & Technology 508)
http://openwetware.org/wiki/Harvard:Biophysics_101/2007
Binomial, Poisson, Normal
[Figure: the three distributions overlaid. Curves: Normal (m=20, s=4.47), Poisson (m=20), Binomial (N=2020, p=.01). Y axis: 0.00 to 0.10. X axis: 0 to 50.]
Binomial
frequency distribution as a function of X ∈ {int 0 ... n}
p and q, with q = 1 – p: two types of object or event.
Factorials: 0! = 1, n! = n(n-1)!
Combinatorics (C = # subsets of size X possible from a set of total size n):
$C(n,X) = \frac{n!}{X!\,(n-X)!}$
$B(X) = C(n,X)\, p^X q^{n-X}$
$m = np$
$s^2 = npq$
$(p+q)^n = \sum_X B(X) = 1$
B(X: 350, n: 700, p: 0.1) = 1.53148×10^-157
= PDF[ BinomialDistribution[700, 0.1], 350] (Mathematica)
~= 0.00 = BINOMDIST(350,700,0.1,0) (Excel)
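A minimal Python sketch of the binomial formula above, added as an illustration and not part of the original slides. It evaluates B(X) in log space (the function name `binomial_pmf` is an assumption of this sketch), because a direct product such as 0.1**350 underflows double precision before it can be multiplied by C(700, 350).

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """B(X) = C(n, X) p^X q^(n-X), computed via logarithms to avoid underflow."""
    if x < 0 or x > n:
        return 0.0
    if p == 0.0:
        return 1.0 if x == 0 else 0.0
    if p == 1.0:
        return 1.0 if x == n else 0.0
    # log C(n, X) from log-factorials: lgamma(k + 1) = log(k!)
    log_c = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    log_b = log_c + x * math.log(p) + (n - x) * math.log1p(-p)
    return math.exp(log_b)

n, p = 700, 0.1
b350 = binomial_pmf(350, n, p)

# Check the identities on the slide: sum B(X) = 1, m = np, s^2 = npq
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(probs)
mean = sum(x * b for x, b in enumerate(probs))
var = sum((x - mean) ** 2 * b for x, b in enumerate(probs))

print(f"B(350; 700, 0.1) = {b350:.5e}")
print(f"sum = {total:.6f}, mean = {mean:.3f}, variance = {var:.3f}")
```

The printed value for X = 350 should agree with the Mathematica figure quoted above; a spreadsheet cell formatted to two decimals would show it as 0.00.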
Poisson
frequency distribution as a function of X ∈ {int 0 ... ∞}
$P(X) = P(X-1)\, m/X = m^X e^{-m}/X!$
$s^2 = m$
n large & p small → P(X) ≅ B(X), with m = np
For example, estimating the expected number of positives
in a given sized library of cDNAs, genomic clones,
combinatorial chemistry, etc. X = # of hits.
Zero hit term = $e^{-m}$
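The recurrence and the zero-hit term can be sketched in Python as follows. The library size and per-clone hit probability are made-up numbers chosen only for illustration, and `poisson_pmf_table` is a hypothetical helper name.

```python
import math

def poisson_pmf_table(m: float, x_max: int) -> list:
    """P(0), ..., P(x_max), built from the recurrence P(X) = P(X-1) * m / X."""
    probs = [math.exp(-m)]  # zero-hit term e^-m
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * m / x)
    return probs

# Hypothetical library screen (illustrative numbers, not from the lecture)
n_clones = 100_000   # library size n
p_hit = 3e-5         # chance that any one clone is a positive
m = n_clones * p_hit # expected number of hits, m = np

probs = poisson_pmf_table(m, 10)
closed_form = [m**x * math.exp(-m) / math.factorial(x) for x in range(11)]

p_zero = probs[0]                    # probability of finding no hits at all
p_at_least_one = 1.0 - p_zero
binom_zero = (1.0 - p_hit) ** n_clones  # exact binomial zero-hit term q^n

print(f"m = {m:.3f}, P(0 hits) = {p_zero:.4f}, P(>=1 hit) = {p_at_least_one:.4f}")
print(f"binomial q^n = {binom_zero:.4f}")
```

With n large and p small, the binomial zero-hit term q^n and the Poisson term e^-m should be nearly indistinguishable, which is the approximation the slide states.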
Normal
frequency distribution as a function of X ∈ {-∞ ... ∞}
$Z = (X-m)/s$
Normalized (standardized) variables
$N(X) = \exp(-Z^2/2) / (2\pi s^2)^{1/2}$
probability density function
npq large → N(X) ≅ B(X)
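To see how close the three distributions are when n is large, p is small, and npq is large, here is a small Python comparison. It is a sketch under assumed parameters (n = 2000, p = 0.01, so m = 20); these are chosen for illustration and are not necessarily the exact values behind the lecture's plot.

```python
import math

def binomial_pmf(x, n, p):
    """B(X) = C(n, X) p^X q^(n-X), in log space."""
    log_c = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_c + x * math.log(p) + (n - x) * math.log1p(-p))

def poisson_pmf(x, m):
    """P(X) = m^X e^-m / X!, in log space."""
    return math.exp(x * math.log(m) - m - math.lgamma(x + 1))

def normal_pdf(x, m, s):
    """N(X) = exp(-Z^2/2) / sqrt(2 pi s^2), with Z = (X - m)/s."""
    z = (x - m) / s
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi * s * s)

n, p = 2000, 0.01                 # assumed illustration values
m = n * p                         # mean np
s = math.sqrt(n * p * (1 - p))    # standard deviation sqrt(npq)

rows = []
for x in (10, 15, 20, 25, 30):
    rows.append((x, binomial_pmf(x, n, p), poisson_pmf(x, m), normal_pdf(x, m, s)))

for x, b, po, no in rows:
    print(f"X={x:2d}  binomial={b:.4f}  poisson={po:.4f}  normal={no:.4f}")
```

Near the mean the three columns should agree to about two decimal places; the discrepancies grow in the tails, where the normal curve is symmetric and the other two are not.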
Mean, variance, & linear correlation coefficient
Expectation E (rth moment) of random variables X for any distribution f(X)
First moment = mean m; variance s^2 and standard deviation s
$E(X^r) = \sum_X X^r f(X)$
$m = E(X)$
$s^2 = E[(X-m)^2]$
Pearson correlation coefficient $C = \mathrm{cov}(X,Y)/(s_X s_Y) = E[(X-m_X)(Y-m_Y)]/(s_X s_Y)$
Independent X,Y implies C = 0,
but C = 0 does not imply independent X,Y (e.g. $Y = X^2$).
P = TDIST(|C|*sqrt((N-2)/(1-C^2)), N-2, 2), i.e. dof = N-2 and two tails,
where N is the sample size.
www.stat.unipg.it/IASC/Misc-stat-soft.html
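The Y = X^2 example and the t statistic fed to TDIST can be checked with a short Python sketch. The data points are invented for illustration and the function names are assumptions of this sketch; the P value itself would still come from a t distribution with N-2 degrees of freedom (for instance a spreadsheet's TDIST), which the standard library does not provide.

```python
import math

def pearson(xs, ys):
    """C = E[(X - mX)(Y - mY)] / (sX sY), using population moments."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def t_statistic(c, n):
    """t = |C| sqrt((N-2)/(1-C^2)), to be compared with a t distribution, dof = N-2."""
    return abs(c) * math.sqrt((n - 2) / (1 - c * c))

# Y is completely determined by X, yet C = 0 because X is symmetric about 0
xs_sym = [-2, -1, 0, 1, 2]
c_parabola = pearson(xs_sym, [x * x for x in xs_sym])

# Made-up, roughly linear data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
c_linear = pearson(xs, ys)
t_linear = t_statistic(c_linear, len(xs))

print(f"C for Y = X^2 on symmetric X: {c_parabola:.3f}")
print(f"C = {c_linear:.4f}, t = {t_linear:.4f}, dof = {len(xs) - 2}")
```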
One form of HIV-1 Resistance
Association test for CCR-5 & HIV resistance

Alleles   Obs Neg  Obs SeroPos  Total  Expec Neg  Expec Pos
CCR-5+       1278         1368   2646       1305       1341
Δccr-5        130           78    208        103        105
Total        1408         1446   2854

dof = (r-1)(c-1) = 1
ChiSq = sum[(o-e)^2/e] = 15.6
P = 0.00008
Samson et al. Nature 1996 382:722-5
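The expected counts, chi-square statistic, and P value in this table can be reproduced with a few lines of Python. This is an illustrative sketch, not the spreadsheet used in class; the `erfc` shortcut for the tail probability is valid only for 1 degree of freedom.

```python
import math

# Observed allele counts: rows = CCR-5+, delta-ccr-5; columns = seronegative, seropositive
observed = [[1278, 1368],
            [130, 78]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# For dof = 1 only: upper-tail chi-square probability = erfc(sqrt(chi_sq / 2))
p_value = math.erfc(math.sqrt(chi_sq / 2))

print("expected:", [[round(e) for e in row] for row in expected])
print(f"ChiSq = {chi_sq:.1f}, dof = {dof}, P = {p_value:.5f}")
```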
But what if we test more than one locus?
[Figure: three plots of Y = Number of Sib Pairs (Association), based on Risch & Merikangas (1996), "The future of genetic studies of complex human diseases," Science 273: 1516.
 Panel 1: X = Number of Alleles (Hypotheses) Tested; GRR=1.5, p=0.5 (population frequency)
 Panel 2: X = Population frequency (p); GRR=1.5, #alleles=1E6
 Panel 3: X = Genotypic Relative Risk (GRR); #alleles=1E6, p=0.5 (population frequency)]
GRR = Genotypic relative risk
Class outline
(1) Topic priorities for homework since last class
(2) Quantitative exercises so far: psycho-statistics,
combinatorials, exponential/logistic, bits, association &
multi-hypotheses
(3) Project level presentation & discussion
(4) Discuss communication/presentation tools
Spontaneous chalkboard discussions of t-test,
genetic code, non-coding RNAs & predicting
deleteriousness of various mutation types.