Joshua Wu
SID 11174269
CS6325 Introduction to Bioinformatics
Homework Assignment II
Due on April 8th, 2008

Please type the answers to the questions 1-4.

1. What is the main principle of Microarray?

The main principle of DNA microarrays can be described in the following 4 steps.

A. Preparation of the samples
Samples are first prepared from the genomic source. PCR is then used to amplify the selected DNA.

B. Construction of the arrays
There are three main methods of constructing the array:
    Spotting of DNA fragments directly onto the slide
    Arraying of prefabricated oligonucleotides
    In-situ synthesis of oligonucleotides, done on the chip

C. Preparation of the probes
Fluorescent probes are prepared to hybridize to the microarray. The probes are prepared from messenger RNA from the cells or tissues of interest. The mRNA is first extracted, which is made easier by identifying a key property by which the mRNA can be isolated. It is then converted into DNA using the reverse transcriptase enzyme. During this reaction, the DNA is labeled by the incorporation of fluorescent or radioactive nucleotides. The two samples are labeled using red and green fluorescent dyes. The labeled DNA is then hybridized to the microarray slide.

D. Hybridization
Hybridization is the reaction that occurs between the fluorescent probes and the DNA on the microarray. The conditions depend on the application of the array:
    Detecting mutations: requires high hybridization stringencies: lower salt concentration and higher temperature, over short time periods (hours)
    Expression monitoring: requires lower stringencies, to ensure low-copy number sequences anneal: overnight hybridizations with lower temperatures and higher salt concentrations
After hybridization, the microarray is scanned using a laser confocal scanning microscope, which illuminates each spot of DNA and separately measures the fluorescence for each dye.
This produces data to determine the ratio of the two fluorescence signals, and in turn the relative abundance of the sequence of each specific gene in the messenger RNA or DNA samples. The pattern can be used to identify the genes that are expressed differently in the tissues or cells.

2. What are the three main methods for constructing arrays?
    Spotting of DNA fragments directly onto the slide
    Arraying of prefabricated oligonucleotides
    In-situ synthesis of oligonucleotides, done on the chip

3. Briefly describe two main types of microarray.

cDNA array: cDNA clones are PCR amplified and the PCR products are printed onto slides with micropins. The clones are verified by sequencing. A cDNA array is hybridized with a Cy-dye labeled sample and an identical control for the same set of experiments. Data from different sets of experiments are difficult to compare because different controls are used. Although these are frequently called cDNA arrays, genomic DNA can also be PCR amplified and printed to produce genomic DNA arrays.

Oligonucleotide arrays: Synthetic oligos may be printed using a process similar to cDNA array printing. Compared to Affy arrays, this approach is more flexible and longer oligos may be used. A spotted oligo array is frequently hybridized with a mix of sample and control.

4. What are the three types of experimental replicates?
    Biological replicates: use different cell cultures prepared in parallel
    Technical replicates: use one cell culture, first processed and then split just before hybridization
    Sample replicates: use one cell culture, first split and then processed

5. Given a list of genes ranked by their p-values, use the False Discovery Rate (FDR) to select the informative genes, assuming the FDR is 5% (α = 5%) and the total number of genes is 1000.
Gene rank j    (jα)/m     p-value
1              0.00005    0.000006
2              0.00010    0.000008
3              0.00015    0.0000011
4              0.00020    0.0000020
5              0.00025    0.0000040
6              0.00030    0.000060
7              0.00035    0.0000980
8              0.00040    0.0001003
9              0.00045    0.0002456
10             0.00050    0.0007098
11             0.00055    0.001120
12             0.00060    0.001190
13             0.00065    0.002310
14             0.00070    0.002318
15             0.00075    0.002401

6. Suppose a biologist conducted a microarray experiment to test the gene expression patterns of normal persons and cancer patients. The experimental results are listed in the following table. Based on this information, use the Naïve Bayesian classifier to predict whether a person whose gene expression pattern is G1 > 20, G2 > 200, and G3 > 5 is a cancer patient or not.

Person ID    G1         G2           G3    Cancer?
1            <10        100 … 200    >5    Yes
2            <10        100 … 200    >5    No
3            <10        100 … 200    >5    Yes
4            10 … 20    100 … 200    >5    No
5            10 … 20    100 … 200    <5    Yes
6            10 … 20    >200         <5    No
7            10 … 20    >200         <5    Yes
8            10 … 20    >200         >5    No
9            >20        >200         <5    No
10           >20        100 … 200    <5    Yes

Class:
    C1: has_cancer = "yes"
    C2: has_cancer = "no"

Data sample:
    X = (G1 > 20, G2 > 200, G3 > 5)

P(C):
    P(has_cancer = "yes") = 5/10 = 0.5
    P(has_cancer = "no") = 5/10 = 0.5

Conditional probabilities:
    P(G1 = ">20" | has_cancer = "yes") = 1/5 = 0.2
    P(G1 = ">20" | has_cancer = "no") = 1/5 = 0.2
    P(G2 = ">200" | has_cancer = "yes") = 1/5 = 0.2
    P(G2 = ">200" | has_cancer = "no") = 3/5 = 0.6
    P(G3 = ">5" | has_cancer = "yes") = 2/5 = 0.4
    P(G3 = ">5" | has_cancer = "no") = 3/5 = 0.6

P(X|C):
    P(X | has_cancer = "yes") = 0.2 * 0.2 * 0.4 = 0.016
    P(X | has_cancer = "no") = 0.2 * 0.6 * 0.6 = 0.072

P(X|C) * P(C):
    P(X | has_cancer = "yes") * P(has_cancer = "yes") = 0.016 * 0.5 = 0.008
    P(X | has_cancer = "no") * P(has_cancer = "no") = 0.072 * 0.5 = 0.036

Since 0.036 > 0.008, under the given conditions the person most likely does not have cancer.
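
The arithmetic in question 6 can be double-checked with a short script. The following is only a minimal sketch: it counts frequencies directly from the table with no smoothing, and the names ROWS and naive_bayes_scores, as well as the "100-200" and "10-20" bin labels, are chosen here for illustration and do not come from the assignment.

```python
from collections import Counter

# Training table from question 6: (G1, G2, G3, cancer?).
# Bin labels are illustrative spellings of the ranges in the table.
ROWS = [
    ("<10", "100-200", ">5", "yes"),
    ("<10", "100-200", ">5", "no"),
    ("<10", "100-200", ">5", "yes"),
    ("10-20", "100-200", ">5", "no"),
    ("10-20", "100-200", "<5", "yes"),
    ("10-20", ">200", "<5", "no"),
    ("10-20", ">200", "<5", "yes"),
    ("10-20", ">200", ">5", "no"),
    (">20", ">200", "<5", "no"),
    (">20", "100-200", "<5", "yes"),
]


def naive_bayes_scores(rows, x):
    """Return P(X|C) * P(C) for each class, using raw counts (no smoothing)."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(C)
        for i, value in enumerate(x):
            # Count rows of this class whose i-th feature matches the query.
            matches = sum(
                1 for row in rows if row[-1] == label and row[i] == value
            )
            score *= matches / n_label  # conditional P(x_i | C)
        scores[label] = score
    return scores


scores = naive_bayes_scores(ROWS, (">20", ">200", ">5"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Up to floating-point rounding, the two scores should agree with the hand-computed values 0.008 and 0.036, so the predicted class is "no".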
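
For the table in question 5, the selection step can also be sketched in code. The assignment gives no typed answer for this question, so this is an illustration rather than the submitted solution. It assumes the (jα)/m column refers to the Benjamini-Hochberg step-up thresholds: find the largest rank j whose p-value is at most (jα)/m and select all genes up to that rank. The p-values are copied exactly as printed in the table, even though they are not strictly increasing, and the name bh_cutoff is hypothetical.

```python
ALPHA = 0.05  # FDR level from the question
M = 1000      # total number of genes from the question

# p-values for ranks 1..15, copied as printed in the table.
P_VALUES = [
    0.000006, 0.000008, 0.0000011, 0.0000020, 0.0000040,
    0.000060, 0.0000980, 0.0001003, 0.0002456, 0.0007098,
    0.001120, 0.001190, 0.002310, 0.002318, 0.002401,
]


def bh_cutoff(p_values, alpha, m):
    """Return the largest rank j with p_j <= j * alpha / m (0 if none)."""
    cutoff = 0
    for j, p in enumerate(p_values, start=1):
        if p <= j * alpha / m:
            cutoff = j
    return cutoff


cutoff_rank = bh_cutoff(P_VALUES, ALPHA, M)
selected = list(range(1, cutoff_rank + 1))  # ranks of the selected genes
print(cutoff_rank, selected)
```

Under these assumptions the cutoff is rank 9: the p-value at rank 10 (0.0007098) exceeds its threshold of 0.00050, and no later rank falls below its threshold.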