Bioinformatics Lectures at Rice
Lecture 2: High throughput technologies in genomics
By Li Zhang

Microarrays
• Biology: the biological problems
• Technology: microarray mechanism; experimental procedures
• Statistical methods: data analysis, quality checking, exploration, discovery

Microarray technology
• Microarray technology measures the copy number of molecules in a mixture on a small slide.
• Thousands or millions of different kinds of molecules can be measured simultaneously, thus creating large volumes of data per biological sample.
• The molecules can be DNA, RNA or protein.

Major types of microarrays
• Two color short oligo arrays
  http://www.youtube.com/watch?v=VNsThMNjKhM&feature=related
• Single color short oligo arrays, synthesized by photolithography
  http://www.youtube.com/watch?v=ui4BOtwJEXs&feature=related (Eric Lander)
• Bead arrays

The experimental procedure to produce microarray data
Affymetrix gene expression analysis
Sample preparation protocol: RNA isolation, cDNA synthesis, cRNA synthesis, hybridization, amplification, scan
http://www.digizyme.com/competition/examples/genechip.html

Targets of microarray measurements
• mRNA gene expression
• SNP genotyping
• DNA copy number (aneuploidy, chromosomal aberration, LOH)
• DNA methylation
• ChIP-chip: protein-DNA binding sites
• Nucleosome binding sites

Some key aspects of microarray technology
• Parallel. The technology is designed to measure a large number of different molecules.
• Almost comprehensive. It works for most of the molecules but not for all, which results in some missing data.
• Noise and bias. The signals can be affected by unwanted sources, e.g., cross-hybridization, which creates biases. Contamination may also have an asymmetrical distribution.
• Nonlinear response. Saturation causes non-linear behavior.
• Evolving annotation. The assigned identity of the molecules may change over time, reflecting new knowledge.
• No units. The numbers are often on a relative scale, which means the data have not been calibrated.

Next generation sequencing techniques
Sequencing by synthesis on an array
• Illumina/SOLiD/454 Life Sciences
  http://www.youtube.com/watch?v=g0vGrNjpyA8 (1.5 hr video, from a meeting in 2010)
• Illumina’s animation: http://www.youtube.com/watch?v=l99aKKHcxC4&feature=related (3 min)
• SOLiD’s animation: http://www.youtube.com/watch?v=nlvyF8bFDwM
• Complete Genomics (nanoball sequencing)

Nano-ball of Complete Genomics

Some key aspects of next generation sequencing technology
• Compared with microarrays, NGS has less noise, no cross-hybridization, and no saturation.
• Bias remains a problem. Some sequences simply cannot be dealt with properly. These include high GC sequences, repeats, etc.
• Mapping to the genome can be challenging, but paired-end reads help a lot.
• Biases partly come from PCR amplification, whose efficiency differs depending on the sequence.

3rd generation sequencing
• Single molecule, with no PCR amplification.
• No fluorescent dyes, hence lower reagent cost.
• Longer sequences.
• Remaining problem: erratic base calling.
• Ion Torrent (http://www.youtube.com/watch?v=yVf2295JqUg)
• Pacific Biosciences (http://www.youtube.com/watch?v=v8p4ph2MAvI)
• Nanopores (http://www.youtube.com/watch?v=8kPfQNzR4FI&feature=results_main&playnext=1&list=PL0AC36A831CCB8690)

Challenges ahead
• Complexity of human diseases
• Heterogeneity
• Biological samples are fragile, subject to degradation and contamination.
• Biases, batch effects, standards.
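
The sequence-dependent bias noted among the key aspects of NGS (high GC sequences are among those not handled properly) can be checked in a first pass by computing the GC fraction of each read. The following is a minimal Python sketch and not part of the lecture material: the function names, the example reads, and the 0.65 cutoff are all hypothetical choices made for illustration, and a real analysis would pick its threshold from the data.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases among the unambiguous A/C/G/T bases of seq."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    # Ambiguous bases such as N are ignored; an empty read yields 0.0.
    return gc / total if total else 0.0


def flag_high_gc(reads, threshold=0.65):
    """Return reads whose GC fraction exceeds the threshold.

    The default threshold is an illustrative assumption, not a standard value.
    """
    return [read for read in reads if gc_fraction(read) > threshold]


if __name__ == "__main__":
    # Made-up example reads, for illustration only.
    reads = ["ATATATAT", "GCGCGGCC", "ACGTACGT", "GGGCCCAT"]
    for read in reads:
        print(read, gc_fraction(read))
    print("high GC:", flag_high_gc(reads))
```

Comparing how many reads fall into each GC range against what the reference sequence would predict is one simple way to see whether amplification has under-represented some sequences.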