Haplotype Inference
Stat 115

Outline
• Haplotype inference and Clark's algorithm
• Haplotype inference using EM and Gibbs sampling
• HapMap project
• Affymetrix SNP chip
• SNP chip for LOH studies

Single Nucleotide Polymorphisms

An illustration
• Observed genotypes at three SNPs:
  – A1A1, A2B2, A3A3
  – A1B1, B2B2, B3B3
  – A1A1, B2B2, A3B3
  – A1B1, B2B2, A3B3
  – B1B1, B2B2, A3B3
  – A1B1, B2B2, A3B3
• The genotype A1B1, B2B2, A3B3 can be phased as A1-B2-A3 + B1-B2-B3, or as A1-B2-B3 + B1-B2-A3

Haplotype
• Haplotype: cluster of SNPs in linkage disequilibrium (LD)
  – A block with 10 SNPs has 2^10 = 1,024 possible haplotypes
  – Only 5-6 haplotypes are observed (> 90% of cases)
  – Tagging SNPs: a subset of SNPs sufficient to identify a haplotype
• Association (with disease) studies using haplotypes are more accurate than those using single-SNP genotypes

Haplotype Inference
• Genotyping tells us only that an individual is, e.g., Aa BB Cc; it does not tell whether the haplotypes are ABC + aBc or ABc + aBC
• Haplotypes can often be inferred if the parental genotypes are known
  – Similar to blood typing, e.g. F: A, M: AB, C: B implies F: AO, M: AB, C: BO
• Otherwise, look at the population genotypes and infer common haplotypes

Ambiguity of SNP-Based Haplotypes
[Figure: two panels, "Population Frequencies of Ambiguous Individuals" and "Population Frequencies of Ambiguous Triads"; x-axis: No. of Loci (0 to 25); y-axis: frequency (0 to 1); one curve each for pi = 0.1, 0.2, 0.3, 0.4, 0.5]

Existing Computational Haplotype Reconstruction Approaches
• Parsimony approach: Clark, 1990
• E-M algorithm: Excoffier & Slatkin, 1995; Chiano & Clayton, 1998; Hawley & Kidd, 1995; Long et al., 1995
• Pseudo-Gibbs sampler: Stephens et al., 2001

Haplotype Inference: Clark's Algorithm
1. Construct haplotypes from unambiguous individuals
2. Remove samples that can be explained as combinations of haplotypes discovered already
3. Propose the haplotype that would explain the most remaining samples
4. Iterate steps 2 and 3 until finished
• Disadvantages:
  – Depends on the number of ambiguous subjects (m)
  – Cannot get started when n is small
  – $\Pr(\text{failure to start}) = \left[1 - \frac{1}{1+4N} - \frac{4N}{(1+4N)^2}\right]^n$

Statistical Model for Haplotype

Haplotype      Frequency
T T A C C      θ1
T T A C G      θ2
T T A G C      θ3
T T A G G      θ4
T T C C C      θ5
T T C C G      θ6
T T C G C      θ7
T T C G G      θ8

[Figure: haplotype pool containing draws labeled 2, 1, 4, 8, 2, 6, 3, 6, 6, 5, 7, 6, 1, 1]
• Each individual's two haplotypes are treated as random draws from a pool of haplotypes with certain frequencies, restricted to pairs that satisfy the observed genotype

Haplotype Inference: EM and Gibbs Sampler
• Observe genotype Y; estimate the haplotype pair Z for each individual and the haplotype frequencies θ
• Initialize the haplotype frequencies θ
• Iteration:
  – Estimate Z given Y, θ
  – Estimate θ given Y, Z
[Figure: haplotype pool example; numeric labels not recoverable]

Gibbs Sampler
• Pseudo-counts for θ: β = (β1, …, βm)
• Conditional distribution for the Gibbs sampler: z_i | θ, Y, for i = 1, …, n
• Update each person's haplotypes:
  $P(z_{i1} = g, z_{i2} = h \mid \theta, Y) = \dfrac{\theta_g \theta_h}{\sum_{g' \oplus h' = y_i} \theta_{g'} \theta_{h'}}$, for $g \oplus h = y_i$ (haplotypes g and h combine to give genotype $y_i$)
• Update θ by a draw from its posterior, which combines the prior for θ with the haplotype counts in Z:
  $P(\theta \mid Z, Y) = \mathrm{Dirichlet}(N(Z) + \beta)$

Gibbs Sampler
• Same setup, but each person's haplotypes are updated given the other individuals' current haplotypes $Z_{[-i]}$:
  $P(z_{i1} = g, z_{i2} = h \mid Z_{[-i]}, Y) \propto \hat\theta_g \hat\theta_h$, for $g \oplus h = y_i$
  where $\hat\theta_g$ denotes the frequency of haplotype g estimated from the haplotype counts in $Z_{[-i]}$ (plus the pseudo-counts)

Example
[Figure: haplotype pool example; numeric labels not recoverable]
• Update individual 2's phase by sampling from [formula not recoverable]

EM and Gibbs Sampling in Motif Finding
• Problem
  – Observe: sequence S
  – Unknown: motif θ and site location A (alignment), but given one, we can infer the other
• EM and Gibbs sampler
  – Initialize a random motif θ
  – Iterate:
    • Given θ and sequence S, update site location A
    • Given A and S, update θ
  – EM updates by weighted average
  – Gibbs sampling updates by sampling

Haplotype Inference: Partition-Ligation
• When the number of SNPs is big, the number of possible haplotypes is too big, so divide and conquer
  – Consider an inferred sub-haplotype as one allele

Partition-Ligation
[Figure: SNPs along the sequence from 5' to 3' (total length L) split into units of K SNPs at Level 0 and ligated upward through Levels 1, 2, and 3; "Block by block" ligation contrasted with "Piece by piece" ligation (Stephens et al., 2001)]

HapMap of the Human Genome
• HapMap: catalog of common genetic variants in humans
  – What are these variants
  – Where do they occur in our DNA
  – How are they distributed within populations and between populations around the world
• Goals:
  – Define haplotype "blocks" across the genome
  – Identify a reference set of SNPs that "tag" each haplotype
  – Enable unbiased, genome-wide association studies

Affymetrix GeneChip® Human Mapping 100K Set (2 Arrays, now 500K)
[Figure: coverage summary; values shown: 2.5 kb, 5.8 kb, 0.30]

40 Probes Used Per SNP
[Figure: probe tiling at offsets -4, -1, 0, +1, +4 relative to the SNP position, repeated across probe sets]

GeneChip Mapping Assay Overview
[Figure: assay overview]

SNP call for individuals at chr1
[Figure: per-individual SNP calls on chromosome 1; legend: AA, BB, AB]

Genome Aberration Studies
• Karyotype: a picture of the chromosomes in a cell, used to check for chromosomal abnormalities.
• Finding a cure for cancer often starts with finding the genetic changes that cause a cell to grow wildly out of control.
• Most often, these changes activate cancer-promoting genes (oncogenes) or inactivate cancer-squelching genes (tumor suppressors).

Genome Aberration Study Technologies
• ArrayCGH: Array Comparative Genomic Hybridization
  – cDNA array with long probes (~Mb) for each genomic region
  – Collect normal and cancer samples, label them differentially, mix and hybridize
  – Check for regions duplicated or lost in cancer
• Affymetrix SNP chip

SNP Chip for LOH
• Loss of heterozygosity (LOH): tumor suppressor gene inactivation by allelic loss in cancers
[Figure: progression from Normal to First genetic hit to Cancer for a tumor suppressor T (X marks an inactivated copy), with two alternative cancer outcomes; a heterozygous marker A/B becomes A/A, i.e. LOH]
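The EM iteration from the haplotype inference slides (estimate Z given Y and θ, then θ given Y and Z) can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the 0/1/2 genotype encoding, the function names `compatible_pairs` and `em_haplotype_freqs`, the uniform initialization, and the fixed iteration count are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Return the unordered haplotype pairs consistent with one genotype.

    genotype: tuple with one entry per SNP counting copies of allele "1"
    (0 or 2 = homozygous, 1 = heterozygous). This encoding is an assumption.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    # Each heterozygous site can place allele 1 on either haplotype.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site], h2[site] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate haplotype frequencies theta from unphased genotypes by EM."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    theta = {h: 1.0 / len(haps) for h in haps}  # uniform start (assumption)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: weight each compatible pair by theta_g * theta_h
            # (times 2 for the two orderings when g != h).
            weights = [theta[g] * theta[h] * (1 if g == h else 2)
                       for g, h in pairs]
            total = sum(weights)
            for (g, h), w in zip(pairs, weights):
                counts[g] += w / total
                counts[h] += w / total
        # M-step: expected haplotype counts divided by 2n chromosomes.
        theta = {h: counts[h] / (2 * len(genotypes)) for h in haps}
    return theta


# Two SNPs, four individuals; the last one (heterozygous at both SNPs) is ambiguous.
genotypes = [(0, 0), (0, 0), (2, 2), (1, 1)]
theta = em_haplotype_freqs(genotypes)
for hap, freq in sorted(theta.items()):
    print(hap, round(freq, 3))
```

In this toy data set the unambiguous individuals only carry haplotypes (0, 0) and (1, 1), so EM shifts the ambiguous individual's weight toward that pair. A Gibbs sampler would replace the weighted average in the E-step with a random draw of one compatible pair.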
dChipSNP: From SNP Call to LOH Call
[Figure: chromosome 1 view of sample H1648, cytobands 1p36.33 to 1q44, with gene annotations and SNP identifiers alongside two columns: per-SNP genotype calls (AA, AB, BB) and inferred LOH calls (LOH, RET, Non-info, Conflict); sample attribute rows: Type, Gender, Ploidy (numeric), Contamination (numeric), Score]
Example: bladder cancer data

[Figure: genome-wide LOH display ("Chro All", chromosomes 1 to 22 and X) with one column per sample (H1648, H1395, H128, H2107, H2141, H2171, H289, HCC1187, HCC1007, HCC1143, HCC1599, HCC1937, HCC2218, HCC38), sample attribute rows (Type, Gender, Ploidy (numeric), Contamination (numeric)), and a Score column]

LOH Summary Score
• Ming Lin, et al. 2004 Bioinformatics.
  – Look at LOH calls within a 14 Mb window in all N patients
  – Data: $\{(C_i(t), D_i(t)),\ i = 1, \ldots, N,\ 0 \le t \le L\}$
  – $C_i(t) = 1$ if LOSS; $-1$ if RET; $0$ otherwise
  – $D_i(t) = 0$ if non-informative; $1$ otherwise
  – $R(x) = \sum_{i=1}^{N} \dfrac{\sum_{\{x-b \le t \le x+b\}} C_i(t)}{\sum_{\{x-b \le t \le x+b\}} D_i(t)}$

Problems With LOH Analysis
• LOH analyses require comparing the tumor genotype with its normal germline counterpart. But for cell lines, leukemia samples, and archival samples, paired normal DNA is often unavailable.
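The LOH summary score R(x) from the LOH Summary Score slide can be sketched as a sliding-window computation. This is a hedged illustration only: it assumes the score is a sum over samples of per-sample ratios, the call labels ("LOSS", "RET", "NOINFO") and the function name `loh_score` are hypothetical, and samples with no informative marker in the window are assumed to contribute zero.

```python
def loh_score(calls, positions, x, b):
    """Sum over samples of (sum of C_i(t)) / (sum of D_i(t)) for x-b <= t <= x+b.

    calls: one list per sample, holding one call per marker
           ("LOSS", "RET", "NOINFO", or anything else, e.g. "CONFLICT").
    positions: marker positions t, shared by all samples.
    """
    score = 0.0
    for sample_calls in calls:
        numerator = 0    # sum of C_i(t): +1 for LOSS, -1 for RET, 0 otherwise
        denominator = 0  # sum of D_i(t): 0 for non-informative, 1 otherwise
        for call, t in zip(sample_calls, positions):
            if x - b <= t <= x + b:
                if call == "LOSS":
                    numerator += 1
                elif call == "RET":
                    numerator -= 1
                if call != "NOINFO":
                    denominator += 1
        if denominator > 0:  # assumption: skip samples with no informative marker
            score += numerator / denominator
    return score


# Two samples, four markers; the window [1, 3] covers the first three markers.
calls = [
    ["LOSS", "LOSS", "RET", "NOINFO"],
    ["RET", "RET", "NOINFO", "NOINFO"],
]
positions = [1, 2, 3, 10]
print(loh_score(calls, positions, x=2, b=1))
```

Evaluating the function at a grid of positions x gives a profile along the chromosome in which strongly positive values mark regions of recurrent loss across samples.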
Copy number analysis
• Collect many normal samples, test SNP profile
• Use normal samples to set the standard as 2 copies of each chromosome
• Compare with the cancer sample SNP profile
• Calculate copy number (aberrations) for each chromosomal region

SNPchip Copy Number Analysis
• Add the A allele (PM-MM) and B allele (PM-MM) signals together
  – Gives AA and AB comparable total signal
• Summarize 40 probes into one signal value for each SNP marker
• $\text{Raw copy number}(\text{SNP}_i) = 2 \times \dfrac{\text{Sample SNP}_i \text{ signal}}{\text{Avg}(\text{All normal sample SNP}_i \text{ signal})}$
• Refine copy number with HMM

SNPchip Copy Number Analysis
• Observed: raw copy number
• Hidden states: true copy number {0, 1, 2, …17}
• Emission probability: $\dfrac{\text{Raw \#} - \text{MeanFold}}{\text{StdevFold}} \sim t(40)$
• Transition probability
  – Prefer to stay in the same state
• Run HMM to infer the best path

From LOH Call to Copy number
[Figure]

Limitations
• Aneuploidy in tumor samples
  – Copy numbers are no longer integers because the median copy number of each chip is normalized to around 2.
  – E.g. triploid: 0 copy ~ 0, 1 copy ~ 0.67, 2 copy ~ 1.33, 3 copy ~ 2, …
• Sample contamination
  – E.g. 30% normal + 70% tumor
  – 0 copy in tumor: 0.3*2 (normal cp) + 0.7*0 (tumor) = 0.6
  – 1 copy in tumor: 0.3*2 (normal cp) + 0.7*1 (tumor) = 1.3

Sanger Cancer Genome Project

Summary
• Haplotype inference
  – Clark's: resolve unambiguous individuals first, then propose new haplotypes to maximize explanation
  – EM & Gibbs: iteratively infer haplotype frequencies and individuals' haplotypes
• Affymetrix SNPchip allows better disease association studies
  – Simultaneously genotype thousands of SNPs: call AA, AB, BB
  – Detect LOH: sliding-window LOH score to summarize LOH / RET / NoInfo / Conflict
  – Detect copy number changes: use normal samples to set the standard of 2 copies, run HMM on cancer samples to obtain accurate copy numbers

Acknowledgement
• Cheng Li & Yuhyun Park
• Jun Liu & Tim Niu
• Kenneth Kidd, Judith Kidd and Glenys Thomson
• Joel Hirschhorn
• Greg Gibson & Spencer Muse
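As a worked check of the arithmetic in the SNPchip Copy Number Analysis and Limitations slides, the sketch below computes the raw copy number ratio and the apparent copy number under aneuploidy and under normal-cell contamination. The function names and the example signal values are hypothetical, and the aneuploidy helper assumes the chip median corresponds to the sample's ploidy.

```python
def raw_copy_number(sample_signal, normal_signals):
    """Raw copy number of one SNP: 2 x sample signal / mean normal signal."""
    mean_normal = sum(normal_signals) / len(normal_signals)
    return 2.0 * sample_signal / mean_normal


def observed_under_aneuploidy(true_copies, ploidy):
    """Apparent copy number when the chip median is normalized to 2.

    Assumes the median copy number of the sample equals its ploidy.
    """
    return 2.0 * true_copies / ploidy


def observed_under_contamination(tumor_copies, normal_fraction, normal_copies=2):
    """Apparent copy number for a mixture of normal and tumor DNA."""
    return normal_fraction * normal_copies + (1.0 - normal_fraction) * tumor_copies


# Made-up signals: the sample is 1.5x the normal average, so about 3 copies.
print(round(raw_copy_number(150.0, [100.0, 100.0, 100.0]), 2))

# Triploid sample: true copies 0..3 as they appear after normalization.
print([round(observed_under_aneuploidy(c, ploidy=3), 2) for c in range(4)])

# 30% normal + 70% tumor, with 0 or 1 copies in the tumor.
print([round(observed_under_contamination(c, normal_fraction=0.3), 2) for c in (0, 1)])
```

Both effects compress the observed values toward 2, which is why a homozygous deletion in a contaminated sample does not read as zero copies.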