Exam 1 Stat 565

Exam 1 Stat 565, DUE Nov 6 by 3 pm
Attestation: I ________________________________________ have worked on this
exam by myself without consulting or colluding with anybody else.
Signature:___________________________________________________________
Part 1 (40 points, 4 points each): Write TRUE or FALSE for the following WITH reasons
(5 sentence limit on each):
a. When, in a microarray experiment, we become aware that the two conditions come
from distributions with different shapes, we are better off using a non-parametric
method like the Wilcoxon rank sum test, as we do not have equal variance.
b. The commonly used p-value has the same interpretation in the Bayesian and
frequentist frameworks.
c. In general, permutation tests and bootstrap tests give similar results.
d. In a study we have 4 biological replicates for Treatment A and 7 biological
replicates for Treatment B. We decide to do a Welch’s t test. In this case, the
degrees of freedom would be 9.
e. To show “equivalence” in hypothesis testing, i.e. showing that the new drug is as
good as the existing gold standard, we are better off using Bayesian methods of
hypothesis testing.
f. For an Affymetrix© data set on probes, we find that the Signal Intensity of one
probe-set is 2^5.5 and the detection call is: Present. It is not possible for
another probe-set (of the same study) to have a Signal Intensity of 2^5.9 and a
detection call of: Absent.
g. Quantile Normalization forces the data sets to be iid (essentially have the same
exact numbers) and hence it removes features from the data and should be used
sparingly.
h. Fixed circle segmentation is the easiest and the best way to construct data from
images.
i. In the microarray probe design stage statisticians are seldom involved and so it is
safe to say that this stage does not have any statistical issues.
j. We run a microarray experiment with 8400 genes. However, all we are interested
in is the 45 genes that relate to a particular function. So we can analyze the 45 genes
without worrying about the rest of the genes on the array.
Part 2: Short answers for data related problems. You may use software or do these
by hand. Total 50 points. (10 points each)
1. For the given data on 4 replicate arrays (p1-p4) with 15 genes (spots) each (for a
very small micro-array) experiment, compute the quantile normalized values.
p1 3.456 1.123 5.678 0.954 7.234 1.346 3.210 1.234 6.009 1.008 2.567 3.807 2.348 3.502 10.213
p2 4.389 0.595 4.851 2.223 4.456 2.852 3.667 4.235 5.285 1.120 3.011 4.447 2.399 3.194 5.762
p3 2.128 0.001 5.913 1.661 6.675 2.449 5.071 2.975 4.330 1.993 2.748 4.950 1.695 2.847 5.082
p4 5.860 0.243 6.336 4.095 13.370 1.986 4.525 0.739 6.473 0.776 1.316 3.739 2.788 4.376 14.526
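For anyone choosing software over hand computation, the usual quantile normalization recipe (sort each array, average across arrays at each rank, then give every spot the average for its rank) can be sketched as follows. This is a minimal sketch, not a prescribed solution: the function name quantile_normalize is made up for illustration, and it assumes no tied values within an array (true for the data above), breaking any ties by order of appearance.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of spot values per array)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of spots")
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        # order[r] is the position of the r-th smallest value in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # replace the value by the mean for its rank
        normalized.append(out)
    return normalized


# Data from problem 1 (four replicate arrays, 15 spots each).
p1 = [3.456, 1.123, 5.678, 0.954, 7.234, 1.346, 3.210, 1.234,
      6.009, 1.008, 2.567, 3.807, 2.348, 3.502, 10.213]
p2 = [4.389, 0.595, 4.851, 2.223, 4.456, 2.852, 3.667, 4.235,
      5.285, 1.120, 3.011, 4.447, 2.399, 3.194, 5.762]
p3 = [2.128, 0.001, 5.913, 1.661, 6.675, 2.449, 5.071, 2.975,
      4.330, 1.993, 2.748, 4.950, 1.695, 2.847, 5.082]
p4 = [5.860, 0.243, 6.336, 4.095, 13.370, 1.986, 4.525, 0.739,
      6.473, 0.776, 1.316, 3.739, 2.788, 4.376, 14.526]

normalized = quantile_normalize([p1, p2, p3, p4])
for name, row in zip(["p1", "p2", "p3", "p4"], normalized):
    print(name, " ".join(f"{v:.3f}" for v in row))
```

After normalization every array holds the same set of values, only in a different order, which is a quick sanity check on a hand computation.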
2. You are given some raw Affy data on a gene with 16 technical replicates.
Compute the signal value using the Tukey biweight algorithm and the detection
value using the Wilcoxon signed rank test.
MM 193.87 185.97 1092.22 777.89 173.82 59.53 561.35 128.98 667.61 709.07 61.53 105.4 234.5 417.5 80.98 46.77
PM 4358.5 2123 2138 740.4 17393.5 1702.9 1528.4 12863.6 7690.4 814.3 64332.3 1082.5 1216.8 15244.6 1855.1 2927.7
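The two building blocks named in this problem can be sketched in plain Python as below. This is a rough sketch under stated assumptions, not the full MAS5 procedure: the constants c = 5, eps = 0.0001 and tau = 0.015 are the defaults usually quoted for the MAS5 algorithms and should be treated as assumptions; the function names are made up for illustration; and the signed rank p-value is exact only when there are no zero or tied differences. The full signal computation also needs an ideal-mismatch adjustment for probe pairs where MM is at least PM (the fourth pair here), which is not shown.

```python
import math
from statistics import median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a robust mean that downweights outliers."""
    m = median(values)
    mad = median([abs(v - m) for v in values])  # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def signed_rank_p_value(diffs):
    """Exact one-sided p-value P(W+ >= observed) for the Wilcoxon signed rank test.

    Assumes no zero differences and no tied absolute differences.
    """
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    # counts[s] = number of subsets of the ranks {1..n} whose sum is s.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[w_plus:]) / 2 ** n


# Data from problem 2 (16 probe pairs).
mm = [193.87, 185.97, 1092.22, 777.89, 173.82, 59.53, 561.35, 128.98,
      667.61, 709.07, 61.53, 105.4, 234.5, 417.5, 80.98, 46.77]
pm = [4358.5, 2123, 2138, 740.4, 17393.5, 1702.9, 1528.4, 12863.6,
      7690.4, 814.3, 64332.3, 1082.5, 1216.8, 15244.6, 1855.1, 2927.7]

# Detection: discrimination score per probe pair, tested against tau (assumed 0.015).
tau = 0.015
scores = [(p - m) / (p + m) for p, m in zip(pm, mm)]
detection_p = signed_rank_p_value([r - tau for r in scores])

# Robust average of the log2 PM/MM ratios; only one ingredient of the signal.
log_ratios = [math.log2(p) - math.log2(m) for p, m in zip(pm, mm)]
specific_background = tukey_biweight(log_ratios)

print("detection p-value:", detection_p)
print("biweight of log2(PM/MM):", specific_background)
```

A small detection p-value points toward a Present call; the cutoffs that separate Present, Marginal and Absent are not given in the problem and have to be taken from the course material.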
3. You are given 150 p-values (SA_3.xls) from t-tests run for comparing treatment
and control for 150 selected genes. Use alpha=.05 and provide the lists that reject
H_0 using the following criteria: a. Uncorrected (PCER) b. FWER c. FDR d.
Step-down Holm e. Step-up Hommel.
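Three of these criteria are short enough to sketch directly. The sketch below assumes that "FWER" means the Bonferroni correction and "FDR" means the Benjamini-Hochberg step-up procedure, which the problem does not say explicitly. The Hommel procedure is longer and is left out. The p-values are made-up placeholders, not the contents of SA_3.xls, and the function names are illustrative.

```python
def uncorrected(pvals, alpha=0.05):
    """Per-comparison error rate: test each hypothesis at level alpha."""
    return [i for i, p in enumerate(pvals) if p <= alpha]


def bonferroni(pvals, alpha=0.05):
    """Single-step FWER control: test each hypothesis at level alpha / m."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def holm(pvals, alpha=0.05):
    """Step-down Holm: compare the k-th smallest p-value with alpha / (m - k + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = []
    for k, i in enumerate(order):  # k = 0 for the smallest p-value
        if pvals[i] > alpha / (m - k):
            break  # step-down: stop at the first failure
        rejected.append(i)
    return sorted(rejected)


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up FDR control: reject the k smallest, k = largest rank with p <= k*alpha/m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * alpha / m:
            cutoff = k  # step-up: keep the largest rank that passes
    return sorted(order[:cutoff])


# Hypothetical p-values for illustration only (NOT the data in SA_3.xls).
demo = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("uncorrected:", uncorrected(demo))
print("Bonferroni: ", bonferroni(demo))
print("Holm:       ", holm(demo))
print("BH (FDR):   ", benjamini_hochberg(demo))
```

Each function returns the zero-based positions of the rejected hypotheses, so the lists can be mapped back to gene names once the real p-values are loaded.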
4. In an exam, there is a problem for which 40% of students know the correct answer.
However, there is a 15% chance that a student picks the wrong answer even if
he/she knows it, and there is also a 25% chance that a student who does not know the
answer guesses it correctly. If a student did get the problem right, what is the
chance that this student really knows the answer?
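The setup is a direct application of Bayes' rule, which can be sketched as below. The function and argument names are illustrative; the numbers are the ones stated in the problem.

```python
def posterior_knows(p_know, p_slip, p_guess):
    """P(student knows the answer | student answered correctly), by Bayes' rule.

    p_know  : prior probability that a student knows the answer
    p_slip  : probability of answering wrong despite knowing the answer
    p_guess : probability of answering right without knowing the answer
    """
    p_right_and_know = p_know * (1 - p_slip)
    p_right_and_not = (1 - p_know) * p_guess
    # Law of total probability in the denominator.
    return p_right_and_know / (p_right_and_know + p_right_and_not)


posterior = posterior_knows(p_know=0.40, p_slip=0.15, p_guess=0.25)
print(f"P(knows | right) = {posterior:.4f}")
```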
5. Discuss when and where you would use the following tests for the 2-sample
problem: pooled t, paired t, Welch’s t, Wilcoxon Rank sum, permutation test,
bootstrap test.
Part 3: Data Problems: (10 points) EDITED Oct 30
1. For the data on the web zipped as 2015_ex1_prob1.zip you are given 6 files. The
data come from a design comparing two treatments with 3 replicates each (a1-3,
b1-3). The Target file and the descriptor file are also given for your interest.
a. Provide the image plots for the raw data.
b. Please use rma and MAS5 to normalize the affy-batch data. Normalize
within each condition (a and b separately) and across conditions (a and b together).
c. Provide box-plots for the raw and normalized data within and across
conditions.
d. What observations do you make from this?