Basics of Linkage Analysis

1. Idea of Linkage Analysis
2. Types of Linkage Analysis
3. Parametric Linkage Analysis
4. Nonparametric Linkage Analysis
5. Conclusions
Gene mapping problem
Source: Morgan Genetics Tutorial. http://morgan.rutgers.edu/morganwebframes/level1/page2/karyotype.html
Linkage Analysis
• One of the two main approaches in gene mapping.
• Uses pedigree data.
Genetic linkage and linkage analysis
• Two loci are linked if they lie close together on the same chromosome.
• The task of linkage analysis is to find markers that are linked to the hypothetical disease locus.
• Complex diseases in focus → usually need to search for one gene at a time.
• Requires mathematical modelling of meiosis.
Meiosis and crossover
• The number of crossover sites is thought to follow a Poisson distribution.
• Their locations are generally random and independent of each other.
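A minimal sketch of what the Poisson assumption implies, under the additional assumption (not stated on the slide) that the crossover count between two loci on one gamete is Poisson with mean d, the genetic distance in Morgans. Two loci recombine when that count is odd, which gives Haldane's map function. The function names are illustrative only.

```python
import math

def recombination_fraction(d, max_crossovers=60):
    """P(odd number of crossovers) when the count is Poisson with mean d."""
    return sum(
        math.exp(-d) * d**k / math.factorial(k)
        for k in range(1, max_crossovers, 2)  # odd counts only
    )

def haldane(d):
    """Closed form of the same probability (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

# The recombination fraction grows with distance but never exceeds 0.5.
for distance in (0.01, 0.1, 0.5, 2.0):
    print(distance, recombination_fraction(distance), haldane(distance))
```

For short distances the recombination fraction is close to d itself. For distant loci it approaches 0.5, the value used for unlinked loci.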
The simple idea
[Figure: a disease locus (DIS) and a marker on the same chromosome, separated by recombination fraction θ.]
• Find the θ that maximises L(θ | data).
• Obtain a measure for the degree of evidence in favour of linkage (LOD score).
Markers and inheritance
• Polymorphic loci whose locations are known
• Point mutations (SNP) or lengths of repetitive sequences
• Inherited together with the chromosomal segments
[Figure: marker alleles on the chromosomes of a father, a mother and their child.]
Markers and information
• Two individuals share the same allele label → they share the allele IBS (identical by state).
• Two individuals share an allele with the same grandparental origin → they share the allele IBD (identical by descent).
• IBS sharing can easily be deduced from genotypes.
• IBD sharing provides more information. One can try to deduce IBD sharing based on family structure and inheritance.
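As a small illustration of why IBS sharing is easy to read off genotypes, the sketch below counts the alleles two genotypes have in common by label alone. The function name and the tuple representation of a genotype are assumptions made for this example.

```python
from collections import Counter

def ibs_sharing(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) that two genotypes share by label."""
    shared = Counter(genotype_a) & Counter(genotype_b)  # multiset intersection
    return sum(shared.values())

print(ibs_sharing((1, 2), (1, 3)))  # the sib pair used in the examples of this section
```

No pedigree information is used here. Deciding whether the shared allele is also IBD requires the parental genotypes.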
Markers and information
Parents: 1,2 and 2,3. Children: 1,2 and 1,3.
The children share allele 1 IBS.
They also share it IBD.
Markers and information
Parents: 1,2 and 1,3. Children: 1,2 and 1,3.
The children share allele 1 IBS.
They do not share alleles IBD.
Markers and information
Parents: 1,1 and 2,3. Children: 1,2 and 1,3.
The children share allele 1 IBS.
They either share or do not share it IBD.
Building blocks of linkage analysis
• Marker maps
• Pedigree structures
• Genotypes
• Phenotypes
[Figure: marker maps for chromosomes 1, 2, ..., 22 and a pedigree annotated with genotypes and phenotypes.]
Building blocks of linkage analysis
• Information about disease model (in parametric analysis)
– P(affected | aa) = 0.99, probability of a homozygote being affected
– P(affected | Aa) = 0.8, probability of a heterozygote being affected
– P(affected | AA) = 0.001, probability of a non-carrier being affected (phenocopy rate)
• Information about environmental variables
13 0 0 5
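1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 3 1 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 4 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 5 1 2 1 1 3 1 2 1 4 5 1 6 3 2 3 3 5 4 2 5 4 3 2 5 3 2 4 5
1 6 0 0 2 2 4 4 3 4 2 5 9 7 1 7 1 3 2 5 5 3 4 6 2 5 2 1 2 1
1 7 0 0 1 1 5 1 2 2 5 2 6 6 5 1 1 3 3 2 5 5 3 4 5 5 4 4 2 1
1 8 1 2 2 1 3 3 2 2 4 5 6 6 3 5 1 4 5 4 1 5 4 3 2 5 3 2 4 5
1 9 3 4 1 1 3 4 2 2 5 2 6 6 5 5 4 1 4 5 5 3 3 6 2 5 2 3 2 1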
1 10 3 4 2 1 3 4 2 5 4 3 1 6 3 2 1 3 5 5 2 3 4 2 4 5 4 3 5 1
1 11 5 6 1 1 1 4 1 4 5 5 6 7 2 7 3 3 5 5 2 3 4 6 2 5 3 1 4 1
1 12 5 6 2 2 3 4 2 4 4 5 1 7 3 7 3 3 5 5 2 3 4 6 5 5 2 1 5 1
1 13 5 6 1 2 3 4 1 4 5 5 6 7 2 7 3 3 5 5 2 3 4 6 2 5 3 1 5 1
1 14 7 8 1 1 5 3 2 2 2 5 6 6 5 5 1 4 3 4 5 5 3 3 5 5 4 2 2 5
1 15 7 8 2 1 1 3 2 2 2 5 6 6 5 5 1 4 3 4 5 5 3 3 5 2 4 3 2 4
1 16 0 0 1 1 5 5 2 4 5 6 3 1 1 1 3 4 4 4 3 7 1 3 5 4 3 2 1 4
1 17 16 12 1 2 5 4 2 4 5 5 3 7 1 7 3 3 4 5 7 3 3 6 4 5 2 1 4 5
1 18 16 12 2 1 5 3 4 2 6 4 1 1 1 3 3 3 4 5 3 2 1 6 4 5 2 1 4 1
0 0.0 0.0 0
1 2 3 4 5 6 7 8 9 10 11 12 13
12
0.99 0.01
1
0.001000 0.999000 0.999000
3 5 # M0
0.172 0.036 0.176 0.283 0.333
3 5 # M1
0.100 0.345 0.310 0.164 0.081
(---clip---)
3 5 # M10
0.169 0.432 0.147 0.130 0.122
3 5 # M11
0.397 0.204 0.151 0.043 0.205
00
0.10 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
1 0.1 0.45
Example of linkage analysis results for one chromosome
[Figure: plot of the linkage analysis results along the chromosome.]
Types of linkage analysis
• Parametric vs. non-parametric
• Dichotomous vs. continuous phenotypes
• Elston-Stewart vs. Lander-Green vs. heuristic
• Two-point vs. multipoint
• Genome scan vs. candidate gene
Maximum likelihood estimation
• A common approach in statistical estimation
• Define hypotheses
• Generate likelihood function
• Estimate
• Test hypotheses
• Draw statistical conclusions
Hypotheses in linkage analysis
H0:
– θ = 0.5
– the disease locus is not linked to the marker(s)
HA:
– θ < 0.5
– the disease locus is linked to the marker(s)
Likelihood function for a single nuclear family
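$$L_j = \sum_{g_F} P(g_F)\,P(y_F \mid g_F) \sum_{g_M} P(g_M)\,P(y_M \mid g_M) \prod_{i} \sum_{g_{O_i}} P(g_{O_i} \mid g_F, g_M)\,P(y_{O_i} \mid g_{O_i})$$
The parameter θ is incorporated in the transmission probabilities P(g_Oi | g_F, g_M).
g = genotypes, with P(g) the genotype probabilities
y = phenotypes, with P(y | g) the phenotype probabilities
(F = father, M = mother, O_i = offspring i)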
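The nested sums and products of the family likelihood can be sketched in a few lines. This is a simplified illustration, not the full linkage likelihood: it models the disease locus only, so θ does not appear. It assumes Hardy-Weinberg genotype priors and a hypothetical disease-allele frequency of 0.01, and uses the penetrances 0.001, 0.8 and 0.99 quoted earlier in the slides. Genotypes are coded as the number of disease alleles (0, 1, 2). All function names are illustrative.

```python
from itertools import product

PENETRANCE = {0: 0.001, 1: 0.8, 2: 0.99}  # P(affected | number of disease alleles)
Q = 0.01  # hypothetical disease-allele frequency (assumption)

def prior(g, q=Q):
    """Hardy-Weinberg genotype probability P(g) for a founder."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}[g]

def pheno(y, g):
    """P(y | g); y is True (affected), False (unaffected) or None (unknown)."""
    if y is None:
        return 1.0
    return PENETRANCE[g] if y else 1.0 - PENETRANCE[g]

def transmit(t, g):
    """Probability that a parent with genotype g passes on t disease alleles (0 or 1)."""
    p = g / 2
    return p if t == 1 else 1 - p

def offspring(go, gf, gm):
    """Mendelian transmission probability P(g_O | g_F, g_M)."""
    return sum(transmit(tf, gf) * transmit(tm, gm)
               for tf, tm in product((0, 1), repeat=2) if tf + tm == go)

def family_likelihood(y_father, y_mother, y_children):
    """Sum over parental genotypes, product over children, sum over child genotypes."""
    total = 0.0
    for gf, gm in product(range(3), repeat=2):
        term = prior(gf) * pheno(y_father, gf) * prior(gm) * pheno(y_mother, gm)
        for y in y_children:
            term *= sum(offspring(go, gf, gm) * pheno(y, go) for go in range(3))
        total += term
    return total

print(family_likelihood(True, False, [True, True, False]))
```

With all phenotypes unknown the likelihood is 1, which is a quick sanity check that the priors and transmission probabilities each sum to one.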
Several independent families
• The likelihood functions of multiple independent families are combined:
• $L = \prod_j L_j$ or $\log L = \sum_j \log L_j$
Testing of hypotheses
• Compute the values of the likelihood function under the null and alternative hypotheses.
• Their relationship is expressed by the LOD score (essentially derived from the likelihood ratio test statistic).
$$\mathrm{LOD}(\theta') = \log_{10}\frac{L(\theta=\theta')}{L(\theta=0.5)} = \log_{10} L(\theta=\theta') - \log_{10} L(\theta=0.5)$$
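A minimal sketch of how per-family log-likelihoods add up to a LOD score. It assumes the simplest possible family likelihood, a phase-known family with k recombinants among n informative meioses, so that L_j(θ) = θ^k (1 − θ)^(n − k). That form and the example counts are assumptions for illustration only.

```python
import math

def log10_family_likelihood(theta, recombinants, meioses):
    """log10 of theta^k * (1 - theta)^(n - k) for one phase-known family."""
    k, n = recombinants, meioses
    return k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)

def total_lod(theta, families):
    """LOD(theta) = sum_j log10 L_j(theta) - sum_j log10 L_j(0.5)."""
    alt = sum(log10_family_likelihood(theta, k, n) for k, n in families)
    null = sum(log10_family_likelihood(0.5, k, n) for k, n in families)
    return alt - null

families = [(1, 10), (0, 5), (2, 8)]  # hypothetical (recombinants, meioses) per family
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(total_lod(theta, families), 3))
```

Because the families are independent, the total LOD score is the sum of the per-family LOD scores, and it is exactly zero at θ = 0.5.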
On significance levels
• The p-value threshold (significance level) gives the probability that a null hypothesis is rejected even though it is true.
• A LOD-score threshold of 3 corresponds to a single-test p-value of approximately 0.0001.
• In a genome-wide gene mapping study, one conducts several (partially dependent) statistical tests.
• Applying the aforementioned threshold, the global p-value of 400 mutually independent tests would be 1 − (1 − 0.0001)^400 ≈ 0.039 < 0.05.
• What if one focuses on individual candidate regions?
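The global p-value arithmetic above can be checked directly. The sketch assumes mutually independent tests, as the slide does. The function name is illustrative, and the candidate-region count of 10 is a made-up number for comparison.

```python
def global_p_value(single_test_p, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - single_test_p) ** n_tests

print(global_p_value(0.0001, 400))  # genome scan with 400 independent tests
print(global_p_value(0.0001, 10))   # hypothetical study of 10 candidate regions
```

With fewer tests the same global error rate is reached with a less strict single-test threshold, which is the point behind the question about candidate regions.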
An example of ML estimation
• Single marker, dominant disease
• All genotypes known
[Figure: pedigree with marker genotypes 1,3 and 2,4; 2,3 and 1,4; and 1,2, 1,4, 1,2, 3,4, 1,2.]
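$$\mathrm{LOD}(\theta') = \log_{10}\frac{L(\theta=\theta')}{L(\theta=\tfrac12)} = \log_{10}\frac{\sum_{g_F}\bigl(P(g_F)\prod_{i} P(g_{O_i}\mid g_F,\ \theta=\theta')\bigr)}{\sum_{g_F}\bigl(P(g_F)\prod_{i} P(g_{O_i}\mid g_F,\ \theta=\tfrac12)\bigr)}$$
The sum over g_F runs over the paternal haplotype combinations.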
[Figure: the two possible paternal phases, 1 D / 3 + and 3 D / 1 +, each with probability ½; each transmission is labelled θ (recombinant) or 1 − θ (non-recombinant).]
$$\mathrm{LOD}(\theta) = \log_{10}\frac{\tfrac12\,\theta^{1}(1-\theta)^{6} + \tfrac12\,\theta^{6}(1-\theta)^{1}}{(\tfrac12)^{7}}$$
The denominator (½)^7 is the probability of the haplotype combinations of the children, assuming unlinked loci.
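[Figure: LOD score plotted against the recombination fraction over the range 0.0 to 0.5; the marked values are a LOD score of 0.56 and a recombination fraction of 0.14.]
LOD > 3 is taken as evidence of linkage.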
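A sketch of the maximisation step for a phase-unknown family. It assumes seven informative meioses with one recombinant under one paternal phase and six under the other, each phase having probability ½. These counts are an assumption, chosen because they reproduce the values 0.56 and 0.14 marked in this example. A simple grid search stands in for a proper optimiser.

```python
import math

def lod_phase_unknown(theta, n_meioses=7, n_recombinants=1):
    """LOD score when the parental phase is unknown (both phases equally likely)."""
    k, n = n_recombinants, n_meioses
    phase_a = theta**k * (1 - theta) ** (n - k)   # k recombinants under phase A
    phase_b = theta ** (n - k) * (1 - theta)**k   # n - k recombinants under phase B
    numerator = 0.5 * phase_a + 0.5 * phase_b
    return math.log10(numerator / 0.5**n)         # denominator: theta = 1/2

grid = [i / 1000 for i in range(1, 501)]          # theta from 0.001 to 0.5
best_theta = max(grid, key=lod_phase_unknown)
best_lod = lod_phase_unknown(best_theta)
print(best_theta, round(best_lod, 2))
```

The maximum stays far below the threshold of 3, so a single small family like this cannot establish linkage on its own.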
Idea of nonparametric linkage analysis
• No assumption is made on the disease model.
• The tests measure IBD sharing of alleles among affected relatives.
• ASP (Affected-Sib-Pair test) is the simplest form of NPL
– Requires nuclear families with two affected children
– Extendable to arbitrary pedigrees, missing data, and arbitrary groups of affected relatives
Example analysis for one marker
[Figure: example pedigrees with marker genotypes 12, 13, 34, 13, 12, 23, 34, 23, 13, 34, 24, 34.]
Idea of ASP test
• Collect a large number of families with two affected offspring and deduce the IBD status for each pair of offspring.
• Let n0 denote the number of sib-pairs with IBD status zero. Correspondingly, n1 and n2 are the observed counts of sib-pairs that share 1 or 2 alleles IBD.
• Compare the counts against the expected distribution by computing the value of the χ² test statistic.
Test statistic for ASP
$$S = \sum_{i=0}^{2}\frac{(n_i-e_i)^2}{e_i} = \frac{(n_0-e_0)^2}{e_0}+\frac{(n_1-e_1)^2}{e_1}+\frac{(n_2-e_2)^2}{e_2}$$
• where e0 = 0.25n, e1 = 0.5n and e2 = 0.25n.
• χ² test with 2 degrees of freedom.
• Homozygous parents are a problem.
• Lots of variants and implementations.
Father  Mother  Child  Child
1,4     2,3     1,3    1,3
1,4     2,4     4,4    4,4
2,3     4,4     2,4    3,4
2,5     1,3     2,3    2,3
2,5     1,4     4,5    1,2
3,4     2,4     2,4    2,4
3,5     2,4     2,3    2,5
4,5     2,5     5,5    5,5
2,2     4,5     2,4    2,5
3,3     2,5     3,5    3,5
3,5     4,4     4,5    3,4
1,2     2,4     1,4    1,4
1,5     2,2     2,5    2,5
2,5     1,3     1,5    1,5
Sib-pairs sharing 0, 1 and 2 alleles IBD: 1, 1 and 7 (families with a homozygous parent are not counted).
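The IBD tally for the families listed above can be reproduced with a short sketch. It enumerates which parental allele each child could have received and reports an IBD count only when every consistent explanation gives the same number. Treating ambiguous families as uncounted (None) is an assumption of this sketch that matches the totals 1, 1 and 7. The function names are illustrative.

```python
from collections import Counter
from itertools import product

def transmissions(child, father, mother):
    """All (paternal slot, maternal slot) pairs that can produce the child's genotype."""
    return [(i, j) for i, j in product((0, 1), repeat=2)
            if sorted((father[i], mother[j])) == sorted(child)]

def ibd_sharing(father, mother, child1, child2):
    """Alleles shared IBD (0, 1 or 2), or None if the genotypes leave it ambiguous."""
    counts = {
        (a[0] == b[0]) + (a[1] == b[1])   # same paternal slot + same maternal slot
        for a in transmissions(child1, father, mother)
        for b in transmissions(child2, father, mother)
    }
    return counts.pop() if len(counts) == 1 else None

# (father, mother, child, child) for the 14 families in the table
FAMILIES = [
    ((1, 4), (2, 3), (1, 3), (1, 3)), ((1, 4), (2, 4), (4, 4), (4, 4)),
    ((2, 3), (4, 4), (2, 4), (3, 4)), ((2, 5), (1, 3), (2, 3), (2, 3)),
    ((2, 5), (1, 4), (4, 5), (1, 2)), ((3, 4), (2, 4), (2, 4), (2, 4)),
    ((3, 5), (2, 4), (2, 3), (2, 5)), ((4, 5), (2, 5), (5, 5), (5, 5)),
    ((2, 2), (4, 5), (2, 4), (2, 5)), ((3, 3), (2, 5), (3, 5), (3, 5)),
    ((3, 5), (4, 4), (4, 5), (3, 4)), ((1, 2), (2, 4), (1, 4), (1, 4)),
    ((1, 5), (2, 2), (2, 5), (2, 5)), ((2, 5), (1, 3), (1, 5), (1, 5)),
]

tally = Counter(ibd_sharing(*family) for family in FAMILIES)
print(tally)
```

The five families that come out as None are exactly those with a homozygous parent, which illustrates why such parents are a problem for the ASP test.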
Idea of nonparametric linkage analysis
IBD status  Observed  Expected
0           1         2.25
1           1         4.5
2           7         2.25
Total       9         9
$$\chi^2 = \frac{(1-2.25)^2}{2.25}+\frac{(1-4.5)^2}{4.5}+\frac{(7-2.25)^2}{2.25} \approx 13.44$$
Compare to the χ² cumulative distribution function (with 2 degrees of freedom): P = 0.0012.
The sample is too small for the χ² test to be reliable.
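The test statistic and p-value for the observed counts 1, 1 and 7 can be recomputed as follows. The sketch relies on the fact that, for 2 degrees of freedom, the upper tail probability of the χ² distribution is exp(−x/2), so no statistics library is needed. The function name is illustrative.

```python
import math

def asp_chi_square(n0, n1, n2):
    """Chi-square statistic and p-value (2 df) for affected-sib-pair IBD counts."""
    n = n0 + n1 + n2
    observed = (n0, n1, n2)
    expected = (0.25 * n, 0.5 * n, 0.25 * n)  # IBD 0/1/2 proportions under no linkage
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-statistic / 2.0)      # chi-square upper tail for 2 df
    return statistic, p_value

statistic, p_value = asp_chi_square(1, 1, 7)
print(round(statistic, 2), round(p_value, 4))
```

The statistic evaluates to about 13.44 and the p-value to about 0.0012. With expected counts as low as 2.25, the χ² approximation itself is doubtful.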
Conclusions
• Linkage analysis is a pedigree-based approach to gene mapping.
• Parametric vs. nonparametric methods.
• Hypothesis-driven vs. explorative analysis.
• Meta-analysis becoming increasingly popular.