Intro to Microarray Analysis
Courtesy of Professor Dan Nettleton, Iowa State University (with some edits)

Some Basic Biology
Genes are DNA sequences that code for proteins (gene lengths range from perhaps 1,000 base pairs to 2.5 million base pairs).
An organism is made by its proteins.
Genes -> Proteins -> Organism

Complementary Base Pairing
Each DNA nucleotide has one of four bases (A, T, C, or G).
Each messenger RNA (mRNA) nucleotide has one of four bases (A, U, C, or G).
Complementary base pairing:
DNA:   A    T    C    G
       <->  <->  <->  <->
mRNA:  U    A    G    C

Transcription and Translation
mRNA “reads” the DNA template.
A sequence of three mRNA nucleotides codes for an amino acid.
DNA template sequence: CGTAAGACA...
   (transcription)
mRNA sequence: GCAUUCUGU...
   (translation)
protein sequence: alanine phenylalanine cysteine ...

Microarrays Measure mRNA Abundance in Cells
At any point in time, cells of living organisms contain mRNA sequences waiting to be converted into proteins.
In a microarray experiment, we extract mRNA from biological samples and use microarrays to measure how much of each mRNA sequence is present in each sample.

Why Do Microarray Experiments?
We have the gene sequences. What are their functions?
If we can learn how a gene’s level of activity changes across varying conditions, we gain clues about its function.

Some Examples from Iowa State
Jim Reecy in Animal Science: muscle undergoing hypertrophy (muscle building) vs. stable muscle.
Anne Bronikowski in Genetics: wheel-running mice vs. non-runners.
Roger Wise and Rico Caldo in Plant Pathology: interaction between multiple isolates of powdery mildew fungus and multiple genotypes of barley.

Affymetrix GeneChips
Affymetrix is a company that manufactures GeneChips.
Short sequences representing small pieces of genes of interest are synthetically assembled and attached to a GeneChip.
Probe sequences are chosen to have good and relatively uniform hybridization (‘binding’) characteristics.
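
As an aside on the Complementary Base Pairing and Transcription and Translation slides above, the following minimal Python sketch reproduces the slide's worked example. The function names are illustrative assumptions, and the codon table is deliberately incomplete: the full genetic code has 64 codons, and only the three that appear in the example are included here.

```python
# DNA template base -> complementary mRNA base (from the pairing table above).
DNA_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Partial codon table: only the three codons used in the slide's example.
CODON_TO_AMINO_ACID = {
    "GCA": "alanine",
    "UUC": "phenylalanine",
    "UGU": "cysteine",
}


def transcribe(dna_template):
    """Return the mRNA sequence complementary to a DNA template."""
    return "".join(DNA_TO_MRNA[base] for base in dna_template)


def translate(mrna):
    """Split mRNA into three-base codons and look up each amino acid."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TO_AMINO_ACID[codon] for codon in codons]


mrna = transcribe("CGTAAGACA")
print(mrna)             # GCAUUCUGU
print(translate(mrna))  # ['alanine', 'phenylalanine', 'cysteine']
```

A codon outside the partial table raises a KeyError, which is intentional in this sketch: it signals that the table would need to be extended before use on other sequences.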
A probe is chosen to match a portion of its target mRNA transcript that is unique to that sequence.

Simplified Example
gene 1: oligo probe for gene 1 = ATTACTAAGCATAGATTGCCGTATA
gene 2: oligo probe for gene 2 = GCGTATGGCATGCCCGGTAAACTGG
(Figure: shared blue regions indicate a high degree of sequence similarity throughout much of the transcript.)

Affymetrix GeneChips
Each gene (more accurately, 'sequence of interest') is represented by many short oligo probes.
Each short oligo probe is made up of 25 nucleotides.
Thousands of these probes are placed together on a small chip called a GeneChip.
How is this used to measure a sample’s mRNA?

Affymetrix GeneChips
Only one sample is placed on each GeneChip.
The mRNA that has been extracted from a biological sample can be labeled (dyed) and hybridized to a GeneChip.
During hybridization, the mRNA strands from the sample bind to their respective complementary oligo probes.

Expression Measures
Scanning of an Affymetrix GeneChip yields one intensity value for each probe (cell).
A high intensity value for a probe (cell) implies that many sequences from the biological sample were able to bind to the sequences in the probe (cell).
There is concern that some of the mRNA that binds to a particular probe should not really be there (a mistake, or non-specific binding).
To try to measure this ‘background noise’, a mismatch probe is also created and used for each perfect match probe.

A Probe Set for Measuring Expression Level of a Particular Gene
gene sequence:          ...TGCAATGGGTCAGAAGGACTCCTATGTGCCT...
perfect match sequence:       AATGGGTCAGAAGGACTCCTATGTG
mismatch sequence:            AATGGGTCAGAACGACTCCTATGTG
(Figure labels: probe cell, probe pair, probe set. A probe set is 11 probe pairs representing a gene.)

Different Probe Pairs Represent Different Parts of the Same Gene
(Figure label: gene sequence)
Probes are selected to be specific to the target gene and have good hybridization characteristics.
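
The perfect match / mismatch pair in the probe set slide above can be sketched in a few lines of Python. In the pair shown, the mismatch sequence differs from the perfect match sequence only at the middle (13th of 25) base, where G is replaced by its complement C. The helper name `make_mismatch` and the constants below are hypothetical, chosen for illustration only; this is a sketch of that one rule, not a probe design tool.

```python
# DNA complement used to flip the middle base of a probe.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Sequences copied from the probe set slide.
GENE_FRAGMENT = "TGCAATGGGTCAGAAGGACTCCTATGTGCCT"
PERFECT_MATCH = "AATGGGTCAGAAGGACTCCTATGTG"


def make_mismatch(perfect_match):
    """Return the probe with its middle base replaced by the complement."""
    middle = len(perfect_match) // 2  # index 12 for a 25-base probe
    flipped = DNA_COMPLEMENT[perfect_match[middle]]
    return perfect_match[:middle] + flipped + perfect_match[middle + 1:]


mismatch = make_mismatch(PERFECT_MATCH)

print(len(PERFECT_MATCH))              # 25
print(PERFECT_MATCH in GENE_FRAGMENT)  # True
print(mismatch)                        # AATGGGTCAGAACGACTCCTATGTG
```

Because the mismatch probe differs by a single central base, intensity measured at it is intended to reflect mainly non-specific binding, which is the 'background noise' described above.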
Affymetrix GeneChips
Fluorescence coming from the squares (probes) tells researchers whether a gene is highly expressed (white and red features) or not (blue and black features). (credit: Affymetrix)

Expression Measures
For each probe set (i.e., gene) on a GeneChip, it is often desirable to summarize the probe cell intensities with one number that serves as a measure of the expression of the gene in the biological sample whose RNA was hybridized to the GeneChip.
There are a number of approaches to this summarization procedure (e.g., RMA). Some take the ‘background noise’ into account, while others do not.

Statistical Analysis
Use statistical methods to summarize the expression values on each chip, i.e., get a single expression value for each gene.
Use statistical methods to normalize the expression values, i.e., try to remove variation due to technological sources. (Some procedures do summarization and normalization simultaneously.)
Perform a classical statistical analysis (ANOVA, t-test, etc.) on a gene-by-gene basis.
Account for multiple testing, and provide a list of 'interesting' genes with an estimated False Discovery Rate (FDR).

Microarray facility at U-Iowa
University of Iowa Carver College of Medicine
Holden Comprehensive Cancer Center
http://dna-9.int-med.uiowa.edu/?q=node/12
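
To make the multiple-testing step of the Statistical Analysis slide concrete, here is a minimal Python sketch of the Benjamini-Hochberg adjustment, one common way to control the FDR. The slides do not say which FDR method is intended, so treat this choice as an assumption. The gene names and p-values are made up for illustration; in practice the p-values would come from the gene-by-gene tests.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustrative only, not real data).
p_by_gene = {
    "gene_a": 0.001,
    "gene_b": 0.008,
    "gene_c": 0.039,
    "gene_d": 0.041,
    "gene_e": 0.60,
}

adjusted = benjamini_hochberg(list(p_by_gene.values()))
interesting = [g for g, q in zip(p_by_gene, adjusted) if q <= 0.05]
print(interesting)  # ['gene_a', 'gene_b']
```

Note that gene_c and gene_d have raw p-values below 0.05 but are not selected once the adjustment accounts for five tests being run, which is the point of the multiple-testing step.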