HW2

Joshua Wu SID 11174269
CS6325 Introduction to Bioinformatics
Homework Assignment II
Due on April 8th, 2008
Please type the answers to the questions 1-4.
1. What is the main principle of microarrays?
The main principle of DNA microarrays can be described in the following four steps.
A. Preparation of the samples
The first step is preparing samples from the genomic source. PCR is then used
to amplify the selected DNA.
B. Construction of the arrays
There are three main methods of constructing the array:
- Spotting of DNA fragments directly onto the slide
- Arraying of prefabricated oligonucleotides
- In-situ synthesis of oligonucleotides, done on the chip
C. Preparation of the probes
Fluorescent probes that hybridize to the microarray are prepared from
messenger RNA (mRNA) from the cells or tissues of interest. The mRNA is
first extracted; this is made easier by exploiting a key property by which
mRNA can be isolated (e.g., its poly(A) tail). The mRNA is then converted
into complementary DNA (cDNA) by the enzyme reverse transcriptase. During
this reaction, the cDNA is labeled by incorporating fluorescent or
radioactive nucleotides. The two samples are labeled with red and green
fluorescent dyes, and the labeled cDNA is then hybridized to the microarray
slide.
D. Hybridization
Hybridization is the reaction between the fluorescent probes and the DNA on
the microarray. The required conditions depend on the application of the array:
- Detecting mutations requires high hybridization stringency: lower salt
  concentration and higher temperature, over short time periods (hours).
- Expression monitoring requires lower stringency, to ensure that
  low-copy-number sequences anneal: overnight hybridization at lower
  temperature and higher salt concentration.
After hybridization, the microarray is scanned with a laser confocal
scanning microscope, which illuminates each DNA spot and measures the
fluorescence of each dye separately. From these measurements the ratio of
the two dye intensities at each spot is determined, which in turn gives the
relative abundance of each gene's sequence in the two mRNA (or DNA)
samples. The resulting pattern can be used to identify genes that are
differentially expressed between the tissues or cells.
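The ratio step can be illustrated with a minimal Python sketch. It assumes each spot has already been reduced to a background-corrected (red, green) intensity pair, and it uses a hypothetical two-fold cutoff to call a gene differentially expressed; neither the intensities nor the cutoff come from this assignment. Taking log2 of the ratio makes up- and down-regulation symmetric around zero.

```python
import math


def log_ratios(spots):
    """Compute log2(red / green) for each spot.

    spots maps a gene name to a (red, green) pair of background-corrected
    intensities; both values are assumed to be positive.
    """
    return {gene: math.log2(red / green) for gene, (red, green) in spots.items()}


def differentially_expressed(ratios, fold_change=2.0):
    """Return genes whose ratio reaches the (hypothetical) fold-change cutoff in either direction."""
    cutoff = math.log2(fold_change)
    return sorted(gene for gene, r in ratios.items() if abs(r) >= cutoff)


# Hypothetical intensities for three spots (illustrative only, not assignment data).
spots = {
    "geneA": (4000.0, 1000.0),  # 4x brighter in the red channel
    "geneB": (1200.0, 1100.0),  # roughly equal in both channels
    "geneC": (500.0, 2000.0),   # 4x brighter in the green channel
}

ratios = log_ratios(spots)
print(differentially_expressed(ratios))  # ['geneA', 'geneC']
```

Positive values mean the sequence is more abundant in the red-labeled sample and negative values mean it is more abundant in the green-labeled one; which biological sample carries which dye is an experimental choice.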
2. What are the three main methods for constructing arrays?
- Spotting of DNA fragments directly onto the slide
- Arraying of prefabricated oligonucleotides
- In-situ synthesis of oligonucleotides, done on the chip
3. Briefly describe the two main types of microarrays.
cDNA arrays: cDNA clones are PCR-amplified, and the PCR products are printed
onto slides with micropins. The clones are verified by sequencing. A cDNA array
is hybridized with a Cy-dye-labeled sample together with an identical control
for the same set of experiments. Data from different sets of experiments are
difficult to compare because different controls are used. Although these are
frequently called cDNA arrays, genomic DNA can also be PCR-amplified and
printed to produce genomic DNA arrays.
Oligonucleotide arrays: Synthetic oligos may be printed using a process similar
to cDNA array printing. Compared with Affymetrix arrays, this approach is more
flexible, and longer oligos may be used. Spotted oligo arrays are frequently
hybridized with a mix of sample and control.
4. What are the three types of experimental replicates?
- Biological replicates: use different cell cultures prepared in parallel.
- Technical replicates: use one cell culture, first processed and then split
  just before hybridization.
- Sample replicates: use one cell culture, first split and then processed.
5. Given a list of genes ranked by their p-values, use the false discovery rate
(FDR) to select the informative genes, assuming the FDR is 5% (α = 5%) and the
total number of genes is 1000.
Gene rank and p-value (m = 1000, α = 0.05)
Rank | (jα)/m  | p-value
1    | 0.00005 | 0.000006
2    | 0.00010 | 0.000008
3    | 0.00015 | 0.0000011
4    | 0.00020 | 0.0000020
5    | 0.00025 | 0.0000040
6    | 0.00030 | 0.000060
7    | 0.00035 | 0.0000980
8    | 0.00040 | 0.0001003
9    | 0.00045 | 0.0002456
10   | 0.00050 | 0.0007098
11   | 0.00055 | 0.001120
12   | 0.00060 | 0.001190
13   | 0.00065 | 0.002310
14   | 0.00070 | 0.002318
15   | 0.00075 | 0.002401
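The (jα)/m column corresponds to the Benjamini-Hochberg step-up rule: find the largest rank j with p(j) ≤ jα/m and select genes 1 through j. The Python sketch below applies that rule to the 15 listed p-values. It rests on two assumptions: the listed values are not strictly ascending in the first few rows, so the sketch sorts them before comparing; and, as the exercise implies, none of the unlisted genes (ranks 16 to 1000) meets its threshold.

```python
def benjamini_hochberg(p_values, alpha, m):
    """Return the p-values selected by the Benjamini-Hochberg step-up procedure.

    m is the total number of genes tested; it may exceed len(p_values) when
    only the smallest p-values are listed.
    """
    ranked = sorted(p_values)  # BH compares p-values in ascending order
    k = 0
    for j, p in enumerate(ranked, start=1):
        if p <= j * alpha / m:
            k = j  # remember the largest rank j with p_(j) <= j * alpha / m
    return ranked[:k]


# p-values copied from the table, in the order listed
P_VALUES = [0.000006, 0.000008, 0.0000011, 0.0000020, 0.0000040,
            0.000060, 0.0000980, 0.0001003, 0.0002456, 0.0007098,
            0.001120, 0.001190, 0.002310, 0.002318, 0.002401]

selected = benjamini_hochberg(P_VALUES, alpha=0.05, m=1000)
print(len(selected))  # 9
```

Under these assumptions the largest passing rank is j = 9 (0.0002456 ≤ 0.00045, while 0.0007098 > 0.00050), so the nine top-ranked genes would be selected.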
6. Suppose a biologist conducted a microarray experiment to test the gene
expression patterns of normal persons and cancer patients. The experimental
results are listed in the following table. Based on this information, use a naïve
Bayesian classifier to predict whether a person whose gene expression pattern is
G1 > 20, G2 > 200, and G3 > 5 is a cancer patient.
Person ID | G1      | G2        | G3 | Cancer?
1         | <10     | 100 … 200 | >5 | Yes
2         | <10     | 100 … 200 | >5 | No
3         | <10     | 100 … 200 | >5 | Yes
4         | 10 … 20 | 100 … 200 | >5 | No
5         | 10 … 20 | 100 … 200 | <5 | Yes
6         | 10 … 20 | >200      | <5 | No
7         | 10 … 20 | >200      | <5 | Yes
8         | 10 … 20 | >200      | >5 | No
9         | >20     | >200      | <5 | No
10        | >20     | 100 … 200 | <5 | Yes
Classes:
C1: has_cancer = "yes"
C2: has_cancer = "no"
Data sample: X = (G1 > 20, G2 > 200, G3 > 5)
P(C): P(has_cancer = "yes") = 5/10 = 0.5
P(has_cancer = "no") = 5/10 = 0.5
P(G1 = ">20" | has_cancer = "yes") = 1/5 = 0.2
P(G1 = ">20" | has_cancer = "no") = 1/5 = 0.2
P(G2 = ">200" | has_cancer = "yes") = 1/5 = 0.2
P(G2 = ">200" | has_cancer = "no") = 3/5 = 0.6
P(G3 = ">5" | has_cancer = "yes") = 2/5 = 0.4
P(G3 = ">5" | has_cancer = "no") = 3/5 = 0.6
P(X|C): P(X | has_cancer = "yes") = 0.2 * 0.2 * 0.4 = 0.016
P(X | has_cancer = "no") = 0.2 * 0.6 * 0.6 = 0.072
P(X|C) * P(C): P(X | has_cancer = "yes") * P(has_cancer = "yes") = 0.016 * 0.5 = 0.008
P(X | has_cancer = "no") * P(has_cancer = "no") = 0.072 * 0.5 = 0.036
Since 0.036 > 0.008, the classifier predicts that this person does not have cancer.
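The hand calculation can be cross-checked with a short Python sketch of a categorical naïve Bayes classifier. It uses plain relative-frequency estimates with no Laplace smoothing, matching the computation by hand, and it encodes the "10 … 20" and "100 … 200" ranges as the strings "10-20" and "100-200" (an encoding chosen only for this example).

```python
from collections import Counter

# Rows of the table as (G1, G2, G3, cancer); ranges encoded as plain strings.
DATA = [
    ("<10", "100-200", ">5", "yes"),
    ("<10", "100-200", ">5", "no"),
    ("<10", "100-200", ">5", "yes"),
    ("10-20", "100-200", ">5", "no"),
    ("10-20", "100-200", "<5", "yes"),
    ("10-20", ">200", "<5", "no"),
    ("10-20", ">200", "<5", "yes"),
    ("10-20", ">200", ">5", "no"),
    (">20", ">200", "<5", "no"),
    (">20", "100-200", "<5", "yes"),
]


def naive_bayes_scores(data, sample):
    """Return P(X|C) * P(C) for each class C using unsmoothed frequency estimates."""
    class_counts = Counter(row[-1] for row in data)
    n = len(data)
    scores = {}
    for c, count in class_counts.items():
        rows = [row for row in data if row[-1] == c]
        score = count / n  # prior P(C)
        for i, value in enumerate(sample):
            # conditional P(x_i | C), assuming features independent given C
            score *= sum(1 for row in rows if row[i] == value) / count
        scores[c] = score
    return scores


x = (">20", ">200", ">5")
scores = naive_bayes_scores(DATA, x)
prediction = max(scores, key=scores.get)
print(prediction)  # no
```

Because both classes have the same prior (0.5), the decision here is driven entirely by P(X|C). With a larger table, adding Laplace smoothing would avoid zero probabilities for attribute values never seen in one class.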