Exam 1, Stat 565. DUE Nov 6 by 3 pm

Attestation: I ________________________________________ have worked on this exam by myself without consulting or colluding with anybody else.

Signature: ___________________________________________________________

Part 1 (40 points, 4 points each): Write TRUE or FALSE for each of the following WITH reasons (limit: 5 sentences each):

a. When doing a microarray experiment, if we become aware that the two conditions come from distributions with different shapes, we are better off using a non-parametric method like the Wilcoxon rank sum test, as we do not have equal variance.
b. The commonly used p-value has the same interpretation in the Bayesian and frequentist frameworks.
c. In general, permutation tests and bootstrap tests give similar results.
d. In a study we have 4 biological replicates for Treatment A and 7 biological replicates for Treatment B. We decide to do a Welch's t test. In this case, the degrees of freedom would be 9.
e. To show "equivalence" in hypothesis testing, i.e. to show that the new drug is as good as the existing gold standard, we are better off using Bayesian methods of hypothesis testing.
f. For an Affymetrix© data set on probes, we find that the Signal Intensity of one probe set is 2^5.5 and its detection call is: Present. It is not possible for another probe set (in the same study) to have a Signal Intensity of 2^5.9 and a detection call of: Absent.
g. Quantile normalization forces the data sets to be iid (essentially to have exactly the same numbers), and hence it removes features from the data and should be used sparingly.
h. Fixed circle segmentation is the easiest and the best way to construct data from images.
i. In the microarray probe design stage, statisticians are seldom involved, and so it is safe to say that this stage does not have any statistical issues.
j. We run a microarray experiment with 8400 genes. However, all we are interested in are the 45 genes that relate to a particular function.
   So we can analyze the 45 genes without worrying about the rest of the genes on the array.

Part 2: Short answers for data-related problems. You may use software or do these by hand. Total 50 points (10 points each).

1. For the given data on 4 replicate arrays (p1-p4), each with 15 genes (spots), from a (very small) microarray experiment, compute the quantile normalized values.

   Spot   p1      p2     p3     p4
   1      3.456   4.389  2.128  5.860
   2      1.123   0.595  0.001  0.243
   3      5.678   4.851  5.913  6.336
   4      0.954   2.223  1.661  4.095
   5      7.234   4.456  6.675  13.370
   6      1.346   2.852  2.449  1.986
   7      3.210   3.667  5.071  4.525
   8      1.234   4.235  2.975  0.739
   9      6.009   5.285  4.330  6.473
   10     1.008   1.120  1.993  0.776
   11     2.567   3.011  2.748  1.316
   12     3.807   4.447  4.950  3.739
   13     2.348   2.399  1.695  2.788
   14     3.502   3.194  2.847  4.376
   15     10.213  5.762  5.082  14.526

2. You are given some raw Affy data on a gene with 16 technical replicates. Compute the signal and detection values using the Wilcoxon signed rank test and the Tukey biweight algorithm.

   Pair   MM        PM
   1      193.87    4358.5
   2      185.97    2123
   3      1092.22   2138
   4      777.89    740.4
   5      173.82    17393.5
   6      59.53     1702.9
   7      561.35    1528.4
   8      128.98    12863.6
   9      667.61    7690.4
   10     709.07    814.3
   11     61.53     64332.3
   12     105.4     1082.5
   13     234.5     1216.8
   14     417.5     15244.6
   15     80.98     1855.1
   16     46.77     2927.7

3. You are given 150 p-values (SA_3.xls) from t-tests comparing treatment and control for 150 selected genes. Using alpha = 0.05, provide the list of genes for which H_0 is rejected under each of the following criteria:
   a. Uncorrected (PCER)
   b. FWER
   c. FDR
   d. Step-down Holm
   e. Step-up Hommel

4. In an exam, there is a problem for which 40% of students know the correct answer. However, there is a 15% chance that a student picks the wrong answer even if he/she knows it, and there is also a 25% chance that a student does not know the answer but guesses it correctly. If a student did get the problem right, what is the chance that this student really knows the answer?

5. Discuss when and where you would use the following tests for the 2-sample problem: pooled t, paired t, Welch's t, Wilcoxon rank sum, permutation test, bootstrap test.
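The quantile normalization in Part 2, Problem 1 follows a short recipe: sort each array, average the sorted values rank by rank across arrays, then give every spot the average for its within-array rank. Below is a minimal Python sketch of that recipe using only the standard library. The name `quantile_normalize` is illustrative, and ties are broken by position rather than averaged, which is a simplification (the Problem 1 data contain no ties).

```python
from statistics import mean

# Problem 1 data: one list per array, spots 1-15 in the order given.
ARRAYS = {
    "p1": [3.456, 1.123, 5.678, 0.954, 7.234, 1.346, 3.210, 1.234,
           6.009, 1.008, 2.567, 3.807, 2.348, 3.502, 10.213],
    "p2": [4.389, 0.595, 4.851, 2.223, 4.456, 2.852, 3.667, 4.235,
           5.285, 1.120, 3.011, 4.447, 2.399, 3.194, 5.762],
    "p3": [2.128, 0.001, 5.913, 1.661, 6.675, 2.449, 5.071, 2.975,
           4.330, 1.993, 2.748, 4.950, 1.695, 2.847, 5.082],
    "p4": [5.860, 0.243, 6.336, 4.095, 13.370, 1.986, 4.525, 0.739,
           6.473, 0.776, 1.316, 3.739, 2.788, 4.376, 14.526],
}


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its within-array rank. Ties are broken by position.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest values across arrays: the common reference distribution.
    reference = [mean(sc[k] for sc in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # spot indices, smallest value first
        out = [0.0] * n
        for rank, spot in enumerate(order):
            out[spot] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    names = list(ARRAYS)
    result = quantile_normalize([ARRAYS[name] for name in names])
    print("spot  " + "  ".join(f"{name:>7}" for name in names))
    for spot in range(len(result[0])):
        print(f"{spot + 1:>4}  " + "  ".join(f"{col[spot]:7.4f}" for col in result))
```

After normalization every array holds exactly the same 15 numbers, only in a different spot order; for example, each array's smallest value becomes the mean of the four minima, (0.954 + 0.595 + 0.001 + 0.243) / 4 = 0.44825.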
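For Part 2, Problem 2, here is a sketch of MAS5-style signal and detection computations in Python, treating the 16 MM/PM values as the probe pairs of one probe set. It assumes the commonly cited MAS5 default constants (detection: tau = 0.015, alpha1 = 0.04, alpha2 = 0.06; biweight: c = 5, epsilon = 0.0001; ideal mismatch: contrast tau = 0.03, scale tau = 10, delta = 2^-20); these should be checked against Affymetrix's documentation before being relied on. The helper names are hypothetical. The sketch omits the saturated-probe screening and the final scale/normalization factors, so the Signal it reports is unscaled. The signed rank p-value is exact, computed by enumerating the sign-flip null distribution (feasible for 16 pairs).

```python
import math
from statistics import median

MM = [193.87, 185.97, 1092.22, 777.89, 173.82, 59.53, 561.35, 128.98,
      667.61, 709.07, 61.53, 105.4, 234.5, 417.5, 80.98, 46.77]
PM = [4358.5, 2123, 2138, 740.4, 17393.5, 1702.9, 1528.4, 12863.6,
      7690.4, 814.3, 64332.3, 1082.5, 1216.8, 15244.6, 1855.1, 2927.7]


def tukey_biweight(x, c=5.0, eps=1e-4):
    """One-step Tukey biweight: weighted mean around the median, scaled by the MAD."""
    m = median(x)
    s = median([abs(v - m) for v in x])
    u = [(v - m) / (c * s + eps) for v in x]
    w = [(1 - ui * ui) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
    return sum(wi * v for wi, v in zip(w, x)) / sum(w)


def signed_rank_pvalue_greater(d):
    """Exact one-sided Wilcoxon signed rank p-value for H1: median(d) > 0."""
    d = [v for v in d if v != 0]  # zero differences carry no sign information
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n  # twice the (average) rank, so tied ranks stay integers
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):  # positions i..j share ranks i+1..j+1
            ranks2[order[k]] = i + j + 2
        i = j + 1
    w_obs = sum(r for r, v in zip(ranks2, d) if v > 0)
    # Null distribution of W+: each rank enters the sum with probability 1/2.
    counts = {0: 1}
    for r in ranks2:
        nxt = dict(counts)
        for s, cnt in counts.items():
            nxt[s + r] = nxt.get(s + r, 0) + cnt
        counts = nxt
    tail = sum(cnt for s, cnt in counts.items() if s >= w_obs)
    return tail / 2 ** n


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Discrimination scores R = (PM - MM)/(PM + MM), tested against tau."""
    r = [(p - m) / (p + m) for p, m in zip(pm, mm)]
    pval = signed_rank_pvalue_greater([ri - tau for ri in r])
    call = "P" if pval < alpha1 else ("M" if pval < alpha2 else "A")
    return pval, call


def mas5_signal(pm, mm, contrast_tau=0.03, scale_tau=10.0, delta=2.0 ** -20):
    """Unscaled signal: 2 ** biweight(log2(PM - IM)), IM = ideal mismatch."""
    sb = tukey_biweight([math.log2(p) - math.log2(m) for p, m in zip(pm, mm)])
    pv = []
    for p, m in zip(pm, mm):
        if m < p:
            im = m
        elif sb > contrast_tau:
            im = p / 2 ** sb
        else:
            im = p / 2 ** (contrast_tau / (1 + (contrast_tau - sb) / scale_tau))
        pv.append(math.log2(max(p - im, delta)))
    return 2 ** tukey_biweight(pv)


if __name__ == "__main__":
    pval, call = detection_call(PM, MM)
    print(f"signal (unscaled) = {mas5_signal(PM, MM):.1f}")
    print(f"detection p = {pval:.3g}, call = {call}")
```

Pair 4 is the only pair with MM > PM, which is why the ideal-mismatch branch matters for this probe set; it also has the smallest |R - tau|, so only rank 1 is negative and the exact p-value is 2/2^16.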
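The five criteria in Part 2, Problem 3 can each be written as a short function returning the indices of rejected hypotheses. This Python sketch assumes that "FWER" means the single-step Bonferroni rule and "FDR" means the Benjamini-Hochberg step-up rule; Holm's step-down rule and Hommel's procedure are implemented from their textbook definitions. The p-values in the usage block are made-up toy numbers, not the contents of SA_3.xls.

```python
def uncorrected(p, alpha=0.05):
    """PCER: test each hypothesis at level alpha."""
    return [i for i, pi in enumerate(p) if pi <= alpha]


def bonferroni(p, alpha=0.05):
    """Single-step Bonferroni control of the FWER."""
    m = len(p)
    return [i for i, pi in enumerate(p) if pi <= alpha / m]


def holm(p, alpha=0.05):
    """Holm step-down: compare the k-th smallest p (k = 0..m-1) with alpha/(m - k)."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    rejected = []
    for k, i in enumerate(order):
        if p[i] > alpha / (m - k):
            break  # stop at the first non-rejection
        rejected.append(i)
    return sorted(rejected)


def benjamini_hochberg(p, alpha=0.05):
    """BH step-up FDR: reject the k smallest, k = largest rank with p_(k) <= k*alpha/m."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    k_max = 0
    for k, i in enumerate(order, start=1):
        if p[i] <= k * alpha / m:
            k_max = k
    return sorted(order[:k_max])


def hommel(p, alpha=0.05):
    """Hommel: j = largest j with p_(m-j+k) > k*alpha/j for all k = 1..j."""
    m = len(p)
    ps = sorted(p)  # ps[0] <= ... <= ps[m-1]; p_(r) is ps[r-1]
    j_max = None
    for j in range(1, m + 1):
        if all(ps[m - j + k - 1] > k * alpha / j for k in range(1, j + 1)):
            j_max = j
    if j_max is None:
        return list(range(m))  # no such j: reject every hypothesis
    return [i for i, pi in enumerate(p) if pi <= alpha / j_max]


if __name__ == "__main__":
    # Hypothetical toy p-values for illustration only (not SA_3.xls).
    toy = [0.041, 0.001, 0.205, 0.039, 0.06, 0.008, 0.074, 0.042]
    rules = [("PCER", uncorrected), ("FWER (Bonferroni)", bonferroni),
             ("FDR (BH)", benjamini_hochberg), ("Holm", holm), ("Hommel", hommel)]
    for name, rule in rules:
        print(f"{name:>18}: {rule(toy)}")
```

For the real 150 p-values, the `multipletests` function in statsmodels (methods `'bonferroni'`, `'holm'`, `'fdr_bh'`, `'hommel'`) offers an independent cross-check.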
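Part 2, Problem 4 is a direct application of Bayes' rule, and exact fractions keep the arithmetic transparent. This sketch reads the 15% as P(wrong | knows) and the 25% as P(correct | does not know). That conditional reading is an assumption, since the wording could also be taken as a joint probability. The function name `posterior_knows` is illustrative.

```python
from fractions import Fraction


def posterior_knows(p_know, p_slip, p_guess):
    """P(knows | correct) via Bayes' rule.

    p_know = P(knows), p_slip = P(wrong | knows), p_guess = P(correct | does not know).
    """
    correct_and_knows = p_know * (1 - p_slip)       # P(knows and correct)
    correct_and_guesses = (1 - p_know) * p_guess    # P(does not know and correct)
    return correct_and_knows / (correct_and_knows + correct_and_guesses)


if __name__ == "__main__":
    post = posterior_knows(Fraction(40, 100), Fraction(15, 100), Fraction(25, 100))
    print(post, float(post))
```

Under this reading the numerator is 0.4 x 0.85 = 0.34 and the denominator adds 0.6 x 0.25 = 0.15, so the script prints 34/49 followed by its decimal value.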
Part 3: Data Problems (10 points). EDITED Oct 30

1. For the data on the web, zipped as 2015_ex1_prob1.zip, you are given 6 files. The result is from a design comparing two treatments with 3 replicates each (a1-3, b1-3). The Target file and the descriptor file are also given for your interest.
   a. Provide the image plots for the raw data.
   b. Please use rma and MAS5 to normalize the affy-batch data. Normalize within each condition (a and b separately) and across conditions (a and b together).
   c. Provide box plots for the raw and normalized data, within and across conditions.
   d. What observations do you make from this?