Bioinformatics Lectures at Rice

Lecture 2: High throughput
technologies in genomics
By Li Zhang
Microarrays
•Biology: The biological problems
•Technology: Microarray mechanism;
experimental procedures
•Statistical methods: data analysis, checking
quality, exploration, discovery.
Microarray technology
• Microarray technology measures the copy
number of molecules in a mixture on a small
slide.
• Thousands or millions of different kinds of
molecules can be measured simultaneously,
thus creating large volumes of data per
biological sample.
• The molecules can be DNA, RNA or protein.
Major types of microarrays
• Two-color short oligo arrays
http://www.youtube.com/watch?v=VNsThMNjKhM&feature=related
• Single-color short oligo arrays
Synthesized by photolithography:
http://www.youtube.com/watch?v=ui4BOtwJEXs&feature=related (Eric Lander)
• Bead arrays
The experimental procedure to
produce microarray data
Affymetrix gene expression analysis sample
preparation protocol:
RNA isolation
cDNA synthesis
cRNA synthesis
Hybridization
Amplification
Scan
http://www.digizyme.com/competition/examples/genechip.html
Targets of Microarray measurements
• mRNA gene expression
• SNP genotyping
• DNA copy number (aneuploidy, chromosomal
aberration, LOH)
• DNA methylation
• ChIP-chip: protein-DNA binding sites
• Nucleosome binding sites
Some key aspects of microarray technology
•Parallel. The technology is designed to measure a large number of different
molecules simultaneously.
•Almost comprehensive. It can work for some or most of the molecules,
but not for all, which will result in some missing data.
•Noise and bias. The signals can be affected by unwanted sources, e.g.,
cross-hybridization, which creates biases. Contamination may also have an
asymmetrical distribution.
•Nonlinear response. Saturation causes non-linear behavior.
•Evolving annotation. Identity of the molecules may change, reflecting new
knowledge through time.
•No units. The numbers are often on a relative scale, which means the data
have not been calibrated.
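The saturation and "no units" points can be illustrated with a small sketch. It assumes a simple Langmuir-type binding curve, which is one common simplification and is not taken from the lecture; the function names and the parameters s_max and k_half are hypothetical. It compares the observed log2 ratio for the same true 2-fold change at low and at high abundance.

```python
import math

def langmuir_signal(concentration, s_max=1.0, k_half=1.0):
    """Probe signal under an assumed Langmuir-type saturation model.

    s_max (signal at full saturation) and k_half (concentration giving
    half of s_max) are hypothetical values chosen for illustration.
    """
    return s_max * concentration / (concentration + k_half)

def log2_ratio(intensity_a, intensity_b):
    """Unit-free comparison of two intensities on a log2 scale."""
    return math.log2(intensity_a / intensity_b)

# The same true 2-fold change in concentration, far below and far above k_half.
low = log2_ratio(langmuir_signal(0.02), langmuir_signal(0.01))
high = log2_ratio(langmuir_signal(20.0), langmuir_signal(10.0))

print(f"observed log2 ratio at low abundance:  {low:.3f}")
print(f"observed log2 ratio at high abundance: {high:.3f}")
```

Under this assumed model the low-abundance ratio stays close to the true value of 1, while the high-abundance ratio is compressed toward 0. Because intensities have no calibrated units, ratios between samples are the quantity that carries meaning, and saturation distorts exactly those ratios.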
Next generation sequencing
techniques
Sequence by synthesis on an array
• Illumina/SOLiD/454 Life Sciences
http://www.youtube.com/watch?v=g0vGrNjpyA8 (1.5 hr video,
from a meeting in 2010)
Illumina’s animation.
(http://www.youtube.com/watch?v=l99aKKHcxC4&feature=related) (3
min)
SOLiD’s animation.
http://www.youtube.com/watch?v=nlvyF8bFDwM
Complete Genomics (nanoball sequencing).
Nano-ball of Complete Genomics
Some key aspects of next generation
sequencing technology
• Compared with microarrays, NGS has less noise,
no cross hybridization, and no saturation.
• Bias remains a problem. Some sequences simply
cannot be dealt with properly. These include high
GC sequences, repeats, etc.
• Mapping to the genome can be challenging. But
paired-ends help a lot.
• Biases partly come from PCR amplification,
whose efficiency differs depending on the
sequences.
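As a rough illustration of the GC-related bias mentioned above, the following sketch computes the GC fraction of each read and flags reads above a cutoff. The function names, the example reads, and the 0.65 threshold are all made up for illustration; a real analysis would compare coverage against GC content across the genome.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # no unambiguous bases, e.g. a read of all N
    return (seq.count("G") + seq.count("C")) / counted

def flag_high_gc(sequences, threshold=0.65):
    """Return the sequences whose GC fraction exceeds the threshold.

    The 0.65 cutoff is an arbitrary illustrative value, not a standard.
    """
    return [s for s in sequences if gc_fraction(s) > threshold]

reads = ["ATATATGCAT", "GCGCGGCCGC", "ACGTACGTAC"]  # made-up example reads
for read in reads:
    print(read, round(gc_fraction(read), 2))
print("high GC:", flag_high_gc(reads))
```

Reads flagged this way are the ones most likely to be under-represented after PCR amplification, so their counts deserve extra caution.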
3rd Generation sequencing
• Single molecule, with no PCR amplification.
• No fluorescent dyes, hence lower reagent cost.
• Longer sequences
• Remaining problem: erratic base calling.
Ion Torrent (http://www.youtube.com/watch?v=yVf2295JqUg)
Pacific Biosciences
(http://www.youtube.com/watch?v=v8p4ph2MAvI)
Nanopores
(http://www.youtube.com/watch?v=8kPfQNzR4FI&feature=results_main&
playnext=1&list=PL0AC36A831CCB8690)
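Base-calling reliability is commonly summarized with Phred quality scores, which the slides do not mention. The sketch below uses the standard definition Q = -10 log10(p), where p is the probability that a base call is wrong; the function names and the example quality values are hypothetical.

```python
import math

def phred_from_error(p_error):
    """Phred quality score for a given base-call error probability."""
    return -10.0 * math.log10(p_error)

def error_from_phred(quality):
    """Base-call error probability for a given Phred quality score."""
    return 10.0 ** (-quality / 10.0)

def expected_errors(qualities):
    """Expected number of wrong bases in a read, given per-base Phred scores."""
    return sum(error_from_phred(q) for q in qualities)

read_qualities = [10, 10, 20]  # made-up per-base scores for a 3-base read
print(f"Q for a 1-in-1000 error rate: {phred_from_error(0.001):.1f}")
print(f"expected errors in the read:  {expected_errors(read_qualities):.2f}")
```

Summing per-base error probabilities gives a quick way to compare reads from platforms with very different error rates.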
Challenges ahead
• Complexity of human diseases
• Heterogeneity
• Biological samples are fragile, subject to
degradation, contamination.
• Biases, batch effects, standards.