T - STAT 115

Haplotype Inference
Stat 115
Outline
• Haplotype inference and Clark’s algorithm
• Haplotype inference using EM and Gibbs sampling
• HapMap project
• Affymetrix SNP chip
• SNP chip for LOH studies
Single Nucleotide Polymorphisms
An illustration
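Genotypes of six individuals at three SNPs:
A1A1, A2B2, A3A3
A1B1, B2B2, B3B3
A1A1, B2B2, A3B3
A1B1, B2B2, A3B3
B1B1, B2B2, A3B3
A1B1, B2B2, A3B3
[Figure: the genotype A1B1, B2B2, A3B3 can be phased as haplotypes A1-B2-A3 and B1-B2-B3, or as A1-B2-B3 and B1-B2-A3]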
Haplotype
• Haplotype: cluster of SNPs in linkage disequilibrium (LD)
– A block with 10 SNPs has 2^10 = 1024 possible haplotypes
– In practice only 5-6 haplotypes are observed (covering > 90% of cases)
– Tagging SNPs: a subset of SNPs sufficient to identify a haplotype
• Association (with disease) studies using haplotypes are more accurate than those using single-SNP genotypes
Haplotype Inference
• Genotyping only tells us that an individual is, e.g., Aa BB Cc; it does not tell whether the haplotype pair is ABC + aBc or ABc + aBC
• Haplotypes can often be inferred if the parental genotypes are known
– Similar to blood typing, e.g. father (F): A, mother (M): AB, child (C): B implies F: AO, M: AB, C: BO
• Otherwise, look at the population genotypes and infer common haplotypes
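The ambiguity described above can be made concrete with a small sketch. The function name `compatible_pairs` and the two-letter genotype coding are illustrative assumptions, not part of the lecture; the code simply lists every haplotype pair consistent with an unphased genotype.

```python
from itertools import product

def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of two-character strings, one per SNP, e.g. ["Aa", "BB", "Cc"].
    """
    het = [i for i, g in enumerate(genotype) if g[0] != g[1]]
    pairs = set()
    # Fix the first heterozygous site so that swapping the two haplotypes
    # is not counted as a different phasing.
    for flips in product([0, 1], repeat=max(len(het) - 1, 0)):
        choice = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, g in enumerate(genotype):
            k = choice.get(i, 0)  # 0 for homozygous sites and the first het site
            h1.append(g[k])
            h2.append(g[1 - k])
        pairs.add(("".join(h1), "".join(h2)))
    return sorted(pairs)

print(compatible_pairs(["Aa", "BB", "Cc"]))
```

With k heterozygous sites the function returns 2^(k-1) pairs, which is why individuals heterozygous at two or more SNPs cannot be phased from their genotype alone.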
Ambiguity of SNP-Based Haplotypes
[Figure: two panels, "Population Frequencies of Ambiguous Individuals" and "Population Frequencies of Ambiguous Triads". x-axis: No. of Loci (0 to 25); y-axis: frequency (0 to 1); one curve for each of pi = 0.1, 0.2, 0.3, 0.4, 0.5]
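The model behind the ambiguity curves is not stated on the slide. As a rough sketch only, assume independent loci in Hardy-Weinberg equilibrium with the same allele frequency pi at each locus, and call an individual ambiguous when heterozygous at two or more loci. The function name `ambiguous_fraction` is hypothetical, and the triad panel is not covered.

```python
def ambiguous_fraction(pi, n_loci):
    """P(individual is heterozygous at >= 2 of n_loci SNPs).

    Assumes independent loci in Hardy-Weinberg equilibrium, each with
    allele frequency pi (an assumption, not taken from the slides).
    """
    h = 2 * pi * (1 - pi)                      # heterozygosity at one locus
    p0 = (1 - h) ** n_loci                     # no heterozygous locus
    p1 = n_loci * h * (1 - h) ** (n_loci - 1)  # exactly one heterozygous locus
    return 1 - p0 - p1

for pi in (0.1, 0.3, 0.5):
    print(pi, [round(ambiguous_fraction(pi, n), 3) for n in (5, 10, 25)])
```

Under these assumptions the ambiguous fraction rises with both the number of loci and pi, which matches the qualitative message of the slide title.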
Existing Computational Haplotype
Reconstruction Approaches
• Parsimony Approach
Clark, 1990
• E-M Algorithm
Excoffier & Slatkin, 1995; Chiano & Clayton, 1998; Hawley & Kidd, 1995; Long et al., 1995
• Pseudo-Gibbs Sampler
Stephens et al., 2001
Haplotype Inference
Clark’s Algorithm
1. Construct haplotypes from unambiguous individuals
2. Remove samples that can be explained as combinations of haplotypes discovered already
3. Propose the haplotype that would explain the most remaining samples
4. Iterate 2 & 3 until finished
• Disadvantages:
– Depends on # of m ambiguous subjects
– Cannot get started when n is small
– $\Pr(\text{failure to start}) = \left[1 - \frac{1}{1+4N} - \frac{4N}{(1+4N)^2}\right]^n$
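The four steps can be sketched in code. This is a simplified greedy reading of the slide, not Clark's published implementation: genotypes are coded as tuples of 0/1/2 (copies of one allele per SNP), haplotypes as tuples of 0/1, and the names `clark` and `complement` are illustrative.

```python
def complement(genotype, hap):
    """Return the haplotype that pairs with `hap` to give `genotype`, or None."""
    other = tuple(g - h for g, h in zip(genotype, hap))
    return other if all(x in (0, 1) for x in other) else None

def clark(genotypes):
    """Greedy sketch of Clark's parsimony algorithm.

    Returns (set of known haplotypes, {individual index: (hap1, hap2)}).
    """
    known, phased = set(), {}
    # Step 1: individuals with at most one heterozygous site are unambiguous
    for i, g in enumerate(genotypes):
        if sum(x == 1 for x in g) <= 1:
            h1 = tuple(x // 2 for x in g)  # a het site (if any) gets 0 here
            h2 = complement(g, h1)         # and 1 on the other haplotype
            phased[i] = (h1, h2)
            known.update((h1, h2))
    if not known:
        raise ValueError("cannot start: no unambiguous individual")
    unresolved = [i for i in range(len(genotypes)) if i not in phased]
    while unresolved:
        # Step 2: remove samples explained by two known haplotypes
        for i in list(unresolved):
            for h in sorted(known):
                c = complement(genotypes[i], h)
                if c in known:
                    phased[i] = (h, c)
                    unresolved.remove(i)
                    break
        # Step 3: propose the new haplotype explaining the most remaining samples
        votes = {}
        for i in unresolved:
            for h in known:
                c = complement(genotypes[i], h)
                if c is not None:
                    votes.setdefault(c, []).append(i)
        if not votes:
            break  # leftover individuals stay unresolved
        best = max(sorted(votes), key=lambda c: len(votes[c]))
        known.add(best)
    return known, phased

known, phased = clark([(0, 0, 0), (2, 2, 0), (1, 1, 0), (1, 2, 1)])
print(sorted(known))
print(phased)
```

The `ValueError` branch corresponds to the "cannot get started" disadvantage: without at least one unambiguous individual there is no seed haplotype.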
Statistical Model for Haplotype
[Figure: table of eight candidate haplotypes (numbered 1 to 8) with their frequencies, and a haplotype pool from which pairs of haplotypes are drawn]
• Each individual’s two haplotypes are treated as random draws from a pool of haplotypes with certain frequencies, restricted to pairs that are consistent with the observed genotype
Haplotype Inference
EM and Gibbs Sampler
• Observe genotype Y, estimate haplotype pair Z for each individual and haplotype frequency θ
• Initialize haplotype frequencies
• Iteration:
– Estimate Z given Y, θ
– Estimate θ given Y, Z
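– Example: for a genotype compatible with haplotype pairs (1, 6), (25, 18) and (2, 3): $P(Z = (1, 6) \mid Y, \theta) = \dfrac{\theta_1 \theta_6}{\theta_1 \theta_6 + \theta_{25} \theta_{18} + \theta_2 \theta_3}$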
Gibbs Sampler
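• Pseudo-counts for θ: $\beta = (\beta_1, \ldots, \beta_m)$
• Conditional distributions for the Gibbs sampler:
– Update each person’s haplotypes, $z_i \mid \theta, Y$, $i = 1, \ldots, n$:
$P(z_{i1} = g, z_{i2} = h \mid \theta, Y) = \dfrac{\theta_g \theta_h}{\sum_{g' \oplus h' = y_i} \theta_{g'} \theta_{h'}}$
– Update θ by a draw from its posterior, $\theta \mid Z, Y$:
$P(\theta \mid Z, Y) = \text{Dirichlet}(n(z) + \beta)$, where $n(z)$ is the haplotype count in Z and $\beta$ is the prior for θ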
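The two conditional updates can be sketched as follows. This is an illustration under assumptions, not the lecture's software: genotypes are coded 0/1/2, a single pseudo-count `beta` is used for every haplotype, the Dirichlet draw is built from normalised Gamma draws, and the function names are hypothetical.

```python
import random
from itertools import product

def phasings(genotype):
    """All unordered haplotype pairs consistent with a 0/1/2 genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    out = set()
    for bits in product([0, 1], repeat=len(het)):
        h1 = [g // 2 for g in genotype]
        for i, b in zip(het, bits):
            h1[i] = b
        h1 = tuple(h1)
        h2 = tuple(g - a for g, a in zip(genotype, h1))
        out.add(tuple(sorted((h1, h2))))
    return sorted(out)

def gibbs_haplotypes(genotypes, n_sweeps=200, beta=1.0, seed=0):
    """Return the last sampled (Z, theta) after n_sweeps Gibbs sweeps."""
    rng = random.Random(seed)
    cand = [phasings(g) for g in genotypes]
    haps = sorted({h for pairs in cand for pair in pairs for h in pair})
    theta = {h: 1.0 / len(haps) for h in haps}
    z = [None] * len(genotypes)
    for _ in range(n_sweeps):
        # Update each person's pair: P(z_i = (g, h) | theta, Y) is
        # proportional to theta_g * theta_h over the compatible pairs
        for i, pairs in enumerate(cand):
            w = [theta[a] * theta[b] for a, b in pairs]
            z[i] = rng.choices(pairs, weights=w, k=1)[0]
        # Update theta by a draw from Dirichlet(n(z) + beta)
        n = {h: 0 for h in haps}
        for a, b in z:
            n[a] += 1
            n[b] += 1
        draws = {h: rng.gammavariate(n[h] + beta, 1.0) for h in haps}
        s = sum(draws.values())
        theta = {h: d / s for h, d in draws.items()}
    return z, theta

z, theta = gibbs_haplotypes([(0, 0), (0, 0), (2, 2), (1, 1)])
print(z[3], {h: round(p, 2) for h, p in theta.items()})
```

A real analysis would discard burn-in sweeps and average over the retained samples rather than report only the final state.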
Gibbs Sampler
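• Pseudo-counts for θ: $\beta = (\beta_1, \ldots, \beta_m)$
• Conditional distributions for the Gibbs sampler:
– Update each person’s haplotypes given everyone else’s, $i = 1, \ldots, n$:
$P(z_{i1} = g, z_{i2} = h \mid Z_{[-i]}, Y) = \dfrac{\hat\theta_g \hat\theta_h}{\sum_{g' \oplus h' = y_i} \hat\theta_{g'} \hat\theta_{h'}}$, where $\hat\theta$ denotes the haplotype frequencies estimated without individual i
– Update θ by a draw from its posterior, $\theta \mid Z, Y$:
$P(\theta \mid Z, Y) = \text{Dirichlet}(n(z) + \beta)$, where $n(z)$ is the haplotype count in Z and $\beta$ is the prior for θ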
Example
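Update individual 2’s phase by sampling from the compatible haplotype pairs (1, 6), (25, 18) and (2, 3), e.g. $P(z_2 = (1, 6) \mid \theta, Y) = \dfrac{\theta_1 \theta_6}{\theta_1 \theta_6 + \theta_{25} \theta_{18} + \theta_2 \theta_3}$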
EM and Gibbs Sampling in Motif Finding
• Problem
– Observe: sequence S
– Unknown: motif θ and site location A (alignment), but
given one, can infer the other
• EM and Gibbs Sampler
– Initialize random motif θ
– Iterate:
• Given θ and sequence S, update site location A
• Given A and S, update θ
– EM updates by weighted average
– Gibbs sampling updates by sampling
Haplotype Inference
Partition-Ligation
• When the number of SNPs is large, the number of possible haplotypes is too large, so divide and conquer
– Consider an inferred sub-haplotype as one allele
Partition-Ligation
[Figure: hierarchical ligation from Level 0 up to Level 3 along a 5’ to 3’ sequence, with symbols L and K. Labels on the slide: “Block by block”; “Piece by piece” (Stephens et al., 2001); “faster”]
Hapmap of Human Genome
• HapMap: catalog of common genetic variants in humans
– What are these variants
– Where do they occur in our DNA
– How are they distributed within populations and
between populations around the world
• Goals:
– Define haplotype “blocks” across the genome
– Identify reference set of SNPs: “tag” each haplotype
– Enable unbiased, genome-wide association studies
Affymetrix GeneChip® Human Mapping
100K Set (2 Arrays, now 500K) Coverage
[Figure: coverage summary; values shown: 2.5kb, 5.8kb, 0.30]
40 Probes Used Per SNP
[Figure: probe layout, with probes at offsets -4, -1, 0, +1, +4 relative to the SNP position]
GeneChip Mapping Assay Overview
[Figure: SNP calls (AA, AB, BB) for individuals at chr1]
Genome Aberration Studies
• Karyotype: A picture of the chromosomes in a cell that is
used to check for abnormalities in the chromosomes.
• Often finding a cure for cancer starts with finding the
genetic changes that cause a
cell to grow wildly out of
control.
• Most often, these changes
activate cancer-promoting genes
(oncogenes) or inactivate
cancer-squelching genes
(tumor-suppressors).
Genome Aberration Study Technologies
• ArrayCGH: Array Comparative Genomic Hybridization
– cDNA array with long probes (~MB) for each
genomic region
– Collect normal & cancer samples, differentially
label, mix and hybridize
– Check for regions duplicated or lost in cancer
• Affymetrix SNP chip
SNP Chip for LOH
• Loss of Heterozygosity: tumor suppressor
gene inactivation by allelic loss in cancers
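[Figure: Normal: two functional tumor suppressor copies (T, T) with a heterozygous SNP (A, B). First genetic hit: one copy inactivated (XT, T), SNP still (A, B). Cancer: the remaining functional copy is lost, leaving XT alone or XT XT, and the SNP becomes A or A A (LOH)]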
dChipSNP: From SNP Call to LOH Call
[Figure: dChipSNP display of chromosome 1 (cytobands 1p36.33 to 1q44) for sample H1648, with SNP calls (AA, AB, BB) and the corresponding LOH calls (LOH, RET, Non-info, Conflict). Rows are labeled with SNP IDs and nearby genes; the header lists Type, Gender, Ploidy (numeric) and Contamination (numeric), and a Score track is shown]
Example: bladder cancer data
[Figure: genome-wide view ("Chro All", chromosomes 1 to 22 and X) with a Score track for samples H1648, H1395, H128, H2107, H2141, H2171, H289, HCC1187, HCC1007, HCC1143, HCC1599, HCC1937, HCC2218, HCC38]
LOH Summary Score
• Ming Lin, et al. 2004 Bioinformatics.
– Look at LOH calls within 14MB window in all N patients
– Data: $\{(C_i(t), D_i(t)),\ i = 1, \ldots, N,\ 0 \le t \le L\}$
– $C_i(t) = 1$ if LOSS; $-1$ if RET; $0$ otherwise
– $D_i(t) = 0$ if non-informative; $1$ otherwise
– $R(x) = \sum_{i=1}^{N} \dfrac{\sum_{\{x - b \le t \le x + b\}} C_i(t)}{\sum_{\{x - b \le t \le x + b\}} D_i(t)}$
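A short sketch of the summary score R(x), assuming per-patient lists of C and D values aligned with a list of SNP positions. The function name `loh_score` is hypothetical, and skipping patients with no informative SNP in the window is an assumption made here to avoid dividing by zero.

```python
def loh_score(C, D, positions, x, b):
    """Summary LOH score R(x) over the window [x - b, x + b].

    C, D: one list per patient of per-SNP values
          (C: 1 = LOSS, -1 = RET, 0 otherwise; D: 0 = non-informative, 1 otherwise).
    positions: SNP coordinates, in the same order as the per-SNP values.
    """
    window = [k for k, t in enumerate(positions) if x - b <= t <= x + b]
    score = 0.0
    for Ci, Di in zip(C, D):
        informative = sum(Di[k] for k in window)
        if informative > 0:  # assumption: patients with no informative SNP add 0
            score += sum(Ci[k] for k in window) / informative
    return score

positions = [1, 5, 9, 20]
C = [[1, 1, 0, -1], [-1, 0, 1, 1]]
D = [[1, 1, 0, 1], [1, 0, 1, 1]]
print(loh_score(C, D, positions, x=5, b=5))
```

Each patient contributes a value between -1 and 1, so R(x) is large where many patients show LOSS among their informative SNPs in the window.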
Problems With LOH Analysis
• LOH analyses require the comparison of tumor
genotype to its normal germline counterpart.
But for cell lines, leukemia samples, and
archival samples, paired normal DNA is often
unavailable.
Copy number analysis
• Collect many normal samples, test SNP profile
• Use normal to set standard as 2 copies of each chr
• Compare with cancer sample SNP profile
• Calculate copy number (aberrations) for each chr region
SNPchip Copy Number Analysis
• Add A allele (PM-MM) and B allele (PM-MM)
signal together
– Give AA and AB comparable total signal
• Summarize 40 probes to one signal value for each
SNP marker
$\text{Raw copy number}(\text{SNP}_i) = \dfrac{\text{Sample SNP}_i \text{ signal}}{\text{Avg(all normal sample SNP}_i \text{ signal)}} \times 2$
• Refine copy number with HMM
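The raw copy number formula translates directly into code. The sketch below assumes the per-SNP signals have already been summarized from the probes; the function name `raw_copy_number` and the toy values are illustrative.

```python
def raw_copy_number(sample_signal, normal_signals):
    """Raw copy number per SNP: 2 * sample signal / mean normal signal.

    sample_signal: list of summarized signals, one per SNP, for one sample.
    normal_signals: list of such lists, one per normal reference sample.
    """
    raw = []
    for i, s in enumerate(sample_signal):
        ref = sum(n[i] for n in normal_signals) / len(normal_signals)
        raw.append(2.0 * s / ref)
    return raw

normals = [[100, 100, 100], [100, 100, 100]]
print(raw_copy_number([100, 50, 150], normals))
```

The factor 2 encodes the assumption that normal samples carry two copies of every region.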
SNPchip Copy Number Analysis
• Observed: raw copy number
• Hidden states: true copy number
{0,1,2,…17}
• Emission probability
– $\dfrac{\text{Raw \#} - \text{MeanFold}}{\text{StdevFold}} \sim t(40)$
• Transition probability
– Prefer to stay in same state
• Run HMM to infer the best path
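The HMM step can be illustrated with a small Viterbi decoder. Several choices here are assumptions rather than the lecture's settings: the mean fold for state c is taken to be c itself, a single standard deviation `sd` is shared by all states, transitions use one `p_stay` value, and only states 0 to 4 are modeled to keep the example short.

```python
import math

def log_t_pdf(x, df=40):
    """Log density of Student's t distribution with df degrees of freedom."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi)
            - (df + 1) / 2 * math.log1p(x * x / df))

def viterbi_copy_number(raw, states=(0, 1, 2, 3, 4), sd=0.3, p_stay=0.99):
    """Most likely hidden copy-number path for a sequence of raw copy numbers."""
    k = len(states)
    log_stay = math.log(p_stay)
    log_move = math.log((1 - p_stay) / (k - 1))

    def emit(x, c):
        # (raw - mean) / sd follows t(40); mean = c is an assumption
        return log_t_pdf((x - c) / sd) - math.log(sd)

    score = [math.log(1.0 / k) + emit(raw[0], c) for c in states]
    back = []
    for x in raw[1:]:
        prev, score, ptr = score, [], []
        for j, c in enumerate(states):
            best = max(range(k),
                       key=lambda i: prev[i] + (log_stay if i == j else log_move))
            ptr.append(best)
            step = log_stay if best == j else log_move
            score.append(prev[best] + step + emit(x, c))
        back.append(ptr)
    # Trace back from the best final state
    j = max(range(k), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [states[j] for j in reversed(path)]

print(viterbi_copy_number([2.1, 1.9, 2.0, 1.1, 0.9, 1.0, 1.0, 2.6, 2.0]))
```

Because staying in the same state is strongly preferred, a single noisy value is absorbed into the surrounding segment, while a run of consistently low values is called as a loss.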
From LOH Call to Copy number
Limitations
• Aneuploidy in tumor samples
– Inferred copy numbers are no longer integers because the median copy number of each chip is normalized to around 2.
– E.g. triploid: 0 copies ~ 0, 1 copy ~ 0.67, 2 copies ~ 1.33, 3 copies ~ 2, …
• Sample contamination
– E.g. 30% normal + 70% tumor
– 0 copy in tumor: 0.3*2 (normal cp) +0.7*0 (tumor)=0.6
– 1 copy in tumor: 0.3*2 (normal cp) +0.7*1 (tumor)=1.3
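Both effects are simple arithmetic and can be checked with a few lines. The function names `observed_copy` and `normalized_copy` are hypothetical helpers written for this illustration.

```python
def observed_copy(tumor_copy, normal_fraction, normal_copy=2):
    """Expected observed copy number for a tumor sample mixed with normal cells."""
    return normal_fraction * normal_copy + (1 - normal_fraction) * tumor_copy

def normalized_copy(true_copy, ploidy):
    """Copy number reported once the chip median (the ploidy) is rescaled to 2."""
    return 2.0 * true_copy / ploidy

# 30% normal + 70% tumor
print(round(observed_copy(0, 0.3), 2), round(observed_copy(1, 0.3), 2))
# Triploid tumor: true copy numbers 0, 1, 2, 3
print([round(normalized_copy(c, 3), 2) for c in range(4)])
```

With 30% normal contamination a homozygous deletion in the tumor is observed at about 0.6 copies and a one-copy region at about 1.3 copies.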
Sanger Cancer Genome Project
Summary
• Haplotype inference
– Clark’s: resolve unambiguous individuals first, then propose new haplotypes to explain as many remaining individuals as possible
– EM & Gibbs: iteratively infer haplotype frequencies and individuals’ haplotypes
• Affymetrix SNPchip allows better disease association studies
– Simultaneously genotype thousands of SNPs: call AA, AB, BB
– Detect LOH: sliding-window LOH score to summarize LOH / RET / NoInfo / Conflict
– Detect copy number changes: use normal samples to set the standard of 2 copies, run HMM on cancer samples to obtain accurate copy numbers
Acknowledgement
• Cheng Li & Yuhyun Park
• Jun Liu & Tim Niu
• Kenneth Kidd, Judith Kidd and Glenys
Thomson
• Joel Hirschhorn
• Greg Gibson & Spencer Muse