Homework 2 – DAVID functional annotation
BMI 6030 – Eilbeck
Chad Hodge
u0584663

Section I

1. How many DAVID IDs have been analyzed?
   a. 1234 DAVID IDs were analyzed.

2. How many gene IDs were submitted?
   a. 1238 + 43 = 1281? (Not 43, but 4?) Screenshot.

3. How many genes from the original gene list will not be included in the DAVID analysis?
   a. 18 (4?)

4. Why are there more genes included in GOTERM_BP_1 analysis than in GOTERM_BP_5?
   a. As noted in the instructions, BP_1 is a more general, higher-level node in the ontology, whereas BP_5 is more specific. By the very nature of the ontology, the more general term is less restrictive and so includes more genes.

5. This dataset is derived from vein endothelial cells. Are there any terms in either chart that you might expect to see with this dataset?
   a. After reading the Hang article and doing a little Googling, I would expect to see terms related to vein endothelial cells such as: angiogenesis, blood vessel, morphogenesis, actin filament-based movement, artery, and perhaps even umbilical and cobalt, given Hang's focus.

6. Which chart provides these 'endothelial cell' relevant terms?
   a. GOTERM_BP_5 has many of these. BP_1 is much more generalized, as one would expect from its level in the ontology.

7. Between the GOTERM_BP_FAT and GOTERM_BP_ALL, which is the most significant term in each chart? Which of these two GO terms is most informative?
   a. The most significant term for GOTERM_BP_FAT, in terms of its p-value, is "RNA biosynthetic process". (M-phase of mitotic cells?)
   b. The most significant term for GOTERM_BP_ALL, in terms of its p-value, is "regulation of transport". (Organelle organization?)
   c. The GOTERM_BP_ALL term "regulation of transport" is much more descriptive, and thus more informative, especially when you drill into the term itself and read its description. (M-phase)

Section II

8. What is the most significant GO Molecular Function term?
   a. Based on p-value, that would be "deoxyribonuclease activity". (Nucleotide binding?)

9. What is the most significant InterPro domain?
   a. "EGF" according to p-value, which is the epidermal growth factor-like domain. (Pleckstrin homology?)

10. Are these terms (or related terms) identified in both analyses?
   a. The molecular function chart does not show EGF or any of its clan members.
   b. The InterPro chart does not list the "deoxyribonuclease activity" term or any of its child terms.
   c. In terms of general overlap between the two, the functional annotation cluster report shows some shared terms, such as protein kinase, serine/threonine protein kinase, insulin-like growth factor binding, aspartyl-tRNA, and aminoacyl-tRNA, as well as several others.
   d. Nucleotide binding appears in both analyses.

11. Look for kinase activity related terms in these two charts. How similar are these terms between the two charts?
   a. There is a lot of kinase overlap, such as protein kinase, serine/threonine, and tyrosine.

Section III

12. What is the most significant KEGG pathway?
   a. Based on p-value, the most significant KEGG pathway is "nucleotide excision repair". (DNA replication)

13. How many genes are associated with this term?
   a. There are 8 genes associated with this term.
   b. (16?)

Section IV

14. We found that 'programmed cell death' (Fisher's exact test, P = 2.1 x 10^-7) is only significantly observed in the upregulated genes, which indicates that apoptosis is initiated in response to mimicked hypoxia in HUVECs. Does your GO Biological Process analysis confirm this statement?
   a. Yes. I do see a programmed cell death entry, but not at the same Fisher's exact value, and I also see cell death in the downregulated genes, not just in the upregulated ones.
      i. GOTERM_BP_FAT programmed cell death 1.8E-5
      ii. Apoptosis?
      iii. P-values are significant for both, but do not match the published value exactly.

15. Are there any other Biological Processes that are identified by the upregulated and downregulated dataset analyses that may be relevant?
   a. Interestingly, when I look at the Fisher's exact score in the downregulated genes, 'programmed cell death' is the most significant biological process. Induction of cell death appears as well.

Section V

16. How many DAVID IDs are in the current 'combined' gene list?
   a. 1524

17. Compare the BP_FAT tables from all three gene lists. What is the benefit of carrying out a combined analysis in comparison to looking at the upregulated and downregulated gene lists independently?
   a. A term must pass a threshold in order to be considered significant. Combining the data sets changes the gene counts behind each term, and thus what is considered significant. This pushes some terms from the edge of significance to one side or the other of that line.
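
The threshold effect described in answer 17 can be illustrated with a small sketch. It uses a plain one-sided Fisher's exact test (a hypergeometric tail) written with the Python standard library. DAVID's own EASE score is a more conservative variant of this test and is not reproduced here. Every count below is hypothetical and chosen only for illustration; none of them come from the homework data.

```python
from math import comb


def fisher_enrichment_p(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher's exact p-value for term enrichment.

    Probability of drawing at least `list_hits` annotated genes when
    `list_total` genes are sampled without replacement from a background
    of `pop_total` genes, `pop_hits` of which carry the term.
    """
    max_hits = min(list_total, pop_hits)
    denominator = comb(pop_total, list_total)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / denominator


# Hypothetical background: 20,000 genes, 400 annotated with one GO term.
POP_TOTAL = 20_000
POP_HITS = 400

# Hypothetical gene lists: (genes annotated with the term, genes in the list).
up_hits, up_total = 17, 600
down_hits, down_total = 17, 600

p_up = fisher_enrichment_p(up_hits, up_total, POP_HITS, POP_TOTAL)
p_down = fisher_enrichment_p(down_hits, down_total, POP_HITS, POP_TOTAL)
p_combined = fisher_enrichment_p(
    up_hits + down_hits, up_total + down_total, POP_HITS, POP_TOTAL
)

for label, p in [("up", p_up), ("down", p_down), ("combined", p_combined)]:
    print(f"{label:>8}: p = {p:.4f}")
```

In this made-up case the term is modestly over-represented in each list, but neither list alone has enough genes to clear a 0.05 cutoff. Pooling the lists doubles the counts at the same proportion, which is enough to move the term across the line.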