Tiling Arrays
Ho-Ryun Chung

From gene to protein
[Figure: a gene on DNA (enhancer ... promoter, exon, intron, exon) is transcribed into pre-mRNA (exon, intron, exon); capping, splicing, and polyadenylation give the mRNA (cap, exon, exon, AAAAAAA); nuclear export and translation give the protein.]
• Where are the exons of a transcript?
• Where do proteins bind?

The functional genomics toolbox
Sequence alignment, PCR, Sequencing, Microarray, Pattern recognition, Chromatin immunoprecipitation, Gene prediction, Clustering, RNA immunoprecipitation, Mass spectrometry, Network inference, Yeast two-hybrid, Modeling and simulation, Tandem affinity purification, In situ hybridization, Immunofluorescence, ...

Tiling array
[Figure: probes tile the genome across enhancer ... promoter, exon, intron, exon.]
• RNA hybridized to the probes -> signal
• Chromatin-immunoprecipitated DNA hybridized to the probes -> signal

Chromatin
[Figure: nucleosome, beads on a string, 30 nm fiber.]

Identify protein-bound regions, but how?
1 make sure the protein sticks to the DNA: crosslinking – use formaldehyde
2 break chromatin into small pieces: fragmentation – use sound = sonication
3 fish for the protein of interest: immunoprecipitation – use antibody (coupled to magnetic beads)
4 remove unbound stuff: washing (magnetic beads)
5 release the bound material from the beads (elution)
6 reverse crosslinks
7 isolate DNA
8 amplify & label DNA
9 hybridize to tiling array

Chromatin Immunoprecipitation = ChIP
An approach to enrich DNA fragments bound by the protein of interest
• frequency of protein binding at a site
• efficiency of the antibody
• specificity of the antibody
fold enrichment – some unbound DNA is also isolated

Enrichment versus selection
Selection: the isolated DNA is 100% bound regions.
Enrichment: e.g. bound regions are 1% of the genome and the enrichment factor is 100:
  bound regions:   1% x 100 = 100
  rest of genome: 99% x 1   = 99
The isolated DNA is therefore about 50% bound regions and 50% rest of the genome.

Enriched DNA fragments
$Y_{ik} = a_i + \varepsilon_k + b_i \, b_k \, x_k \exp(\eta_{ik})$
  $Y_{ik}$: measured intensity of probe k on array i
  $a_i$: array-specific background
  $\varepsilon_k$: probe-specific additive noise
  $b_i$: array-specific gain factor
  $b_k$: probe-specific gain factor
  $x_k$: DNA abundance
  $\eta_{ik}$: probe- & array-specific multiplicative noise
Contributing factors: hybridization efficiency, efficiency of ChIP, efficiency of amplification, labeling
$x_k$ is more or less uniform for most of the probes -> can we fit $b_i b_k$?

The nature of the signal
[Figure: intensity along the tiling array probes, Probe 1 versus Probe 2.]
Different concentrations of samples or different hybridization properties?

Probe-specific hybridization properties
Remove bias due to amplification and labeling as well as probe-specific behavior:
$S_{\mathrm{norm}}(i) = I_{\mathrm{experiment}}(i) \, / \, I_{\mathrm{control}}(i)$
BUT
$I(i)$ = specific hybridization + cross-hybridizations of probe i with the genome
Most probes have constant specific hybridization.

Affinity measure $K_U$ for cross-hybridizations
Calculate the contribution $K_s$ of each duplex s between sample and probe: $K_s = \exp[-\Delta G_s]$
Sum the contributions $K_s$ of all duplexes: $K_U = \sum_s K_s \times f_s$
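The sum over duplexes can be written out directly. This is a minimal sketch, assuming $\Delta G_s$ is given in units of RT (the formula uses a bare exp[-∆Gs]) and treating $f_s$ simply as a per-duplex weight, since its meaning is not spelled out here. The function names and example values are hypothetical.

```python
import math


def duplex_contribution(delta_g):
    """K_s = exp(-dG_s); dG_s is assumed to be expressed in units of RT."""
    return math.exp(-delta_g)


def affinity_measure(duplexes):
    """K_U = sum over duplexes s of K_s * f_s.

    `duplexes` is an iterable of (delta_g, f) pairs, one per duplex.
    """
    return sum(duplex_contribution(delta_g) * f for delta_g, f in duplexes)


# Hypothetical duplexes for one probe: (dG_s in units of RT, weight f_s).
example_duplexes = [(-2.0, 1.0), (-1.0, 3.0), (0.0, 10.0)]
k_u = affinity_measure(example_duplexes)
print(f"K_U = {k_u:.3f}, ln(K_U) = {math.log(k_u):.3f}")
```

More stable duplexes (more negative $\Delta G_s$) dominate the sum, so a few strong cross-hybridization partners can outweigh many weak ones.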
Calculate $K_U$ as a sum of all duplexes
Dynamic programming:
$R(j) = R(j-1) \times \mathrm{elongation}(j) + \mathrm{start}(j)$
$K_U(j) = K_U(j-1) + R(j) \times \mathrm{end}(j)$

Measured intensities as function of $K_U$
[Figure: ln(Intensity) versus ln($K_U$) with regression line, slope β, intercept α.]
Pearson correlation r = 0.63
Linear regression: $\ln(I) = \alpha + \beta \ln(K_U)$

Intensity as score of evidence?
[Figure: ln(Intensity) versus ln(affinity measure) with a threshold on intensity.]

Alternative score of evidence
$S(i) = \ln[I(i)] - \ln[\hat{I}(i)]$
Signal above background, with $\hat{I}(i)$ the intensity expected for probe i from the regression on $K_U$.
[Figure: ln(Intensity) versus ln(affinity measure); threshold on signal above background compared with threshold on intensity.]

Other approaches – MAT
Model-based analysis of tiling arrays, for Affymetrix tiling arrays

Other approaches – MA2C
Model-based analysis of two-color arrays
mean, individual channel variance, co-variance

Other approaches – SNN
Standard normal normalization
$\hat{x}_i = \dfrac{\log(I_i) - \overline{\log(I_i)}}{\mathrm{sd}(\log(I_i))}$

Peak-calling

Smooth the data
• running average; problem: outliers
• running average – trimmed mean, discard x % lowest and highest; problem: variable number of probes in a window of given length
• upweight windows with more probes: $\hat{x} = n_p \, TM(x)$, with TM the trimmed mean and $n_p$ the number of probes in the window

Find the peak summits
How would you do this?

Does it help to account for probe-specific biases?
[Figure: comparison for Affymetrix, Agilent, and NimbleGen, with the direction of "better" marked.]
Not really better.
The control removes probe-specific biases, but only if properly scaled.
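The smoothing steps above (running trimmed mean, then upweighting windows with more probes) can be sketched as follows. This is a minimal illustration, not the implementation of MAT or any other tool. The function names, window length, trim fraction, and example probes are hypothetical, and the $n_p$ weighting follows the formula as it is written in this section.

```python
def trimmed_mean(values, trim_fraction):
    """Mean after discarding the lowest and highest trim_fraction of values."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def smooth(positions, scores, window, trim_fraction, weight_by_probes=True):
    """Running trimmed mean over a window of fixed genomic length.

    For each probe, all probes within window/2 of its position are collected.
    If weight_by_probes is True the trimmed mean is multiplied by n_p,
    the number of probes in the window.
    """
    half = window / 2
    smoothed = []
    for center in positions:
        in_window = [s for p, s in zip(positions, scores)
                     if abs(p - center) <= half]
        n_p = len(in_window)  # always >= 1: the center probe itself
        tm = trimmed_mean(in_window, trim_fraction)
        smoothed.append(n_p * tm if weight_by_probes else tm)
    return smoothed


# Hypothetical probes: genomic positions and normalized scores,
# with one outlier (50.0) and one isolated probe (position 1000).
positions = [0, 100, 200, 300, 1000]
scores = [1.0, 1.0, 50.0, 1.0, 2.0]

plain = smooth(positions, scores, window=200, trim_fraction=0.4,
               weight_by_probes=False)
weighted = smooth(positions, scores, window=200, trim_fraction=0.4)
print(plain)
print(weighted)
```

In this toy example, windows with three probes trim the outlier away, while a window with only two probes cannot. That is the variable-probe-number problem that the weighting is meant to address.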