BIO 224 Laboratory Oct 4 & 6, 2010
Gene Expression: Lab Assignment 5 (due Wed, October 13th; 15 pts)

1. Go to the UniGene site (http://www.ncbi.nlm.nih.gov/unigene) and answer the following questions:
A. Examine the Homo sapiens entry (click on the name). Why have so many clusters been identified for Homo sapiens? If the genome is projected to contain approximately 35,000 protein-coding genes, why is there such a large discrepancy between that number and the number of clusters identified?
B. Find your favorite organism on the UniGene homepage and click on the species name. Describe the type of information you find in the entry. In addition, explain the “Histogram of cluster sizes” table listed for your species (note: cluster size is on the left, and the number of clusters is on the right).
C. Find the UniGene accession number that corresponds to your assigned human mRNA sequence. Record it below (note: you can either search the UniGene database directly or find its cross-listing in the UniProt entry you worked with in the last assignment).
D. What types of information did you find with regard to gene expression?
E. Specifically, what did you find out about where this gene is expressed? (Note: don't use the GEO profile to answer this question, since it is addressed separately in Question 4.)
F. For the EST profile, explain the information listed for the tissue with the highest expression (e.g., what do the numbers mean?). Why does this provide only relative expression estimates?
G. How many mRNA sequences and EST sequences have been documented for your protein in this database?
H. For the EST sequences, examine the first three listed that come from different cDNA libraries and answer the following:
   a. How long was the sequence read? (Sequence length)
   b. From what tissues did these libraries originate?
   c. What vector was the cDNA cloned into?
      (Hint: you will need to click on the library link to find out how it was constructed.)

2. Go to the Mammalian Gene Collection (http://mgc.nci.nih.gov/) and answer the following questions:
A. Find the full-length MGC clone for your assigned human mRNA and record its IMAGE ID #. (Note: if more than one exists, record the total number of clones and select only one for the next questions.)
B. From what type of tissue did this sequence come? (You can find this out by examining the library link.)
C. What type of vector was used to construct this cDNA library?
D. What “universal” vector primers could you use to sequence the cDNA insert?

3. Go to the Digital Differential Display website (http://www.ncbi.nlm.nih.gov/UniGene/info_ddd.shtml) and answer the following questions:
A. What is Digital Differential Display (DDD) based on? How are differences in gene expression evaluated? Why is it important to know the number of EST sequences examined in each library for DDD analyses? (Note: you will need to go to the following website for more information on DDD: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.section.865)
B. Click on the "Begin DDD for" link at the top of the page and select the species Homo sapiens. Use the information you obtained in Question 1H about the cDNA libraries from which your ESTs were derived. Why are normalized or subtracted libraries not particularly useful in DDD analyses?
C. Using DDD, compare two different cDNA libraries that you found for the above ESTs. Which libraries did you choose to examine, and from what tissues? (Quick notes: within Homo sapiens, edit the library and then find your library in the list. Look for the tissue type, and also cross-reference the library accession to make sure you have the correct library. After choosing a name for the library, accept the changes; then click “New” for a new library and find the second library for comparison.)
D. Was your protein differentially expressed in these tissues? (Note: if your protein was not significantly differentially expressed, it will not be listed in the results of the differential display analysis.)
E. List the top three proteins that were differentially expressed in the comparison of these libraries. In which library was each of these proteins significantly more highly expressed? Explain what these proteins do (hint: use their accession numbers) and try to connect this to the differences between the two libraries; that is, try to come up with an explanation or hypothesis for why they might be differentially expressed, given the tissues you chose.

4. Go back to the UniGene entry for your human protein. Click on the Gene Expression Omnibus (GEO) link in the Gene Expression section and examine some of the entries. Pick one of the microarray experiments listed to examine in more detail. Study the bar graph to the right of the description by clicking on it to view the results, and also examine the description of the experiment (click on the GDS??? accession number listed next to the check box).
A. Briefly describe what the scientists were testing in the microarray experiment; in other words, what was its goal? (e.g., expression profiles of normal, precancerous, and cancer cells. Note: literature is often cited, and you can view the abstract.)
B. What were the results of the experiment with regard to your gene of interest only (i.e., its mRNA expression level)? To answer this, you will need to click on the bar graph shown to the right of the initial GEO entry. The "Graph caption help" function can assist your interpretation of the data.
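For Question 1A, it may help to see how grouping sequences by overlap can split a single gene into more than one cluster. The sketch below is a toy model, not UniGene's actual pipeline: it assumes each EST is reduced to a hypothetical (start, end) position on one transcript and joins reads whose intervals overlap, directly or through a chain of other reads (single-linkage clustering).

```python
from typing import List, Tuple

Read = Tuple[int, int]  # hypothetical (start, end) of a read on one transcript


def cluster_reads(reads: List[Read]) -> List[List[Read]]:
    """Single-linkage clustering of reads by interval overlap.

    Reads that overlap, directly or through a chain of other overlapping
    reads, end up in the same cluster.
    """
    clusters: List[List[Read]] = []
    cluster_end = 0  # rightmost position covered by the current cluster
    for start, end in sorted(reads):  # process reads by start position
        if clusters and start <= cluster_end:
            clusters[-1].append((start, end))  # overlaps the current cluster
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])  # gap in coverage: new cluster
            cluster_end = end
    return clusters


# Hypothetical reads from ONE 3,000-nt transcript: two from the 5' end,
# one from the middle, and two from the 3' end, with gaps in between.
reads = [(0, 450), (300, 800), (1200, 1650), (2400, 2900), (2550, 3000)]
print(len(cluster_reads(reads)))  # 3 clusters for a single gene

# One long read spanning the gaps merges everything into one cluster.
print(len(cluster_reads(reads + [(700, 2500)])))  # 1
```

In this toy model, the number of clusters depends on how completely the reads cover the transcript, not just on how many genes there are.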
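For Question 1B, the “Histogram of cluster sizes” table pairs each cluster size (the number of sequences in a cluster) with the number of clusters of that size. The sketch below builds such a table from a hypothetical list of cluster sizes; the real UniGene table may group sizes into ranges rather than listing each size exactly.

```python
from collections import Counter
from typing import Dict, List


def cluster_size_histogram(cluster_sizes: List[int]) -> Dict[int, int]:
    """Return {cluster size: number of clusters of that size}, sorted by size."""
    return dict(sorted(Counter(cluster_sizes).items()))


# Hypothetical sizes (sequences per cluster) for ten clusters.
sizes = [1, 1, 1, 1, 2, 2, 3, 5, 5, 40]
hist = cluster_size_histogram(sizes)
print(hist)  # {1: 4, 2: 2, 3: 1, 5: 2, 40: 1}

# The right-hand column sums to the number of clusters, and
# size x count sums to the total number of sequences.
print(sum(hist.values()), sum(size * n for size, n in hist.items()))  # 10 61
```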
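For Question 1F, the EST profile numbers can be read as a simple ratio. This sketch assumes the profile lists, for each tissue, the gene's EST count, the total number of ESTs in that tissue's pool, and a transcripts-per-million (TPM) value equal to the gene's share of the pool scaled to one million. The counts used here are hypothetical.

```python
def transcripts_per_million(gene_ests: int, total_ests: int) -> float:
    """Gene's share of the ESTs in a tissue pool, scaled to one million."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return gene_ests * 1_000_000 / total_ests


# Hypothetical pools: a large pool with more gene ESTs, a small pool with fewer.
print(transcripts_per_million(12, 240_000))  # 50.0
print(transcripts_per_million(3, 15_000))    # 200.0
```

The small pool has fewer raw ESTs for the gene but the higher TPM, and because that value rests on only 3 ESTs it is also the noisier estimate.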
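For Question 4B, one way to summarize the GEO profile bar graph for your gene is to average the per-sample values within each experimental group and compare groups as a log2 fold change. The group names and values below are hypothetical, and the sketch assumes the values are on a linear scale; some GEO DataSets report log-transformed values, so check the dataset description first.

```python
from math import log2
from statistics import mean
from typing import Dict, List


def log2_fold_change(groups: Dict[str, List[float]], test: str, reference: str) -> float:
    """log2 of (mean value in the test group / mean value in the reference group).

    Assumes positive values on a linear (not log-transformed) scale.
    """
    test_mean, ref_mean = mean(groups[test]), mean(groups[reference])
    if test_mean <= 0 or ref_mean <= 0:
        raise ValueError("group means must be positive for a log ratio")
    return log2(test_mean / ref_mean)


# Hypothetical per-sample values read off a GEO profile bar graph.
values = {
    "normal": [100.0, 120.0, 110.0],
    "cancer": [400.0, 460.0, 460.0],
}
print(log2_fold_change(values, "cancer", "normal"))  # 2.0 (4-fold higher)
```

A fold change on its own says nothing about statistical significance; the spread of values within each group matters as well.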