BIO 224 Laboratory    CSU, Sacramento
Dr. Tom Peavy    September 27 & 29, 2010

Assignment 4: Protein Structure & Signature Sequences (due Wednesday, October 6)

1. Using your assigned human protein sequence, answer the following questions:

A) Search for your protein's UniProtKB/Swiss-Prot entry and list its accession number: __________________
(UniProtKB database site: http://www.uniprot.org/). You can search with the GenBank protein accession number or the official abbreviation for your gene. If there is more than one entry, list them all and examine what is essentially different between them. Then choose the one that seems the most thorough or appropriate to work with.

Next, examine your UniProt entry for the following information:

B) Describe the types of information provided in the “General annotation/Comments” section of the UniProtKB/Swiss-Prot entry. (Don’t relay the exact information for your particular protein; instead, categorize the topics.) Why might you need to know these pieces of information?

C) What types of information does the “Sequence annotation/features” section provide? Why might you need to know these pieces of information?

D) Find the Gene Ontology (GO) consortium entries for your protein within the UniProt entry. List each GO entry, describe what it means, and state what kind of evidence supports its designation (evidence codes such as “traceable author statement”).

E) Has the 3-D structure of your protein been determined? If so, provide the PDB accession numbers (if there are more than 3 entries, provide only the first 3).

F) Has the 3-D structure of your protein (or a portion of it) been used to generate a theoretical model of another homolog? (Click on the ModBase entry.) If so, examine the entry to determine what was modeled (which species' homolog) and which template was used (which crystal structure or, in essence, which primary database link).
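For question 1D, the same GO annotations also appear in the plain-text (flat-file) view of a UniProtKB entry as "DR   GO;" cross-reference lines. The Python sketch below is a minimal illustration. It assumes lines shaped like "DR   GO; GO:0005634; C:nucleus; TAS:ProtInc.", which hold the GO ID, the aspect letter with the term, and the evidence code with its source. The code decodes the aspect and a few common evidence codes. The function name parse_go_line and the sample lines are hypothetical, and the evidence-code table is deliberately incomplete. The official GO evidence code documentation remains the authority.

```python
# Minimal sketch: decode GO cross-reference lines from a UniProtKB flat-file entry.
# Assumed line shape: "DR   GO; GO:<id>; <aspect>:<term>; <evidence>:<source>."

ASPECTS = {"C": "Cellular Component", "F": "Molecular Function", "P": "Biological Process"}

# A few common GO evidence codes (illustrative, not exhaustive).
EVIDENCE = {
    "EXP": "Inferred from Experiment",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "ISS": "Inferred from Sequence or structural Similarity",
    "IEA": "Inferred from Electronic Annotation",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IC": "Inferred by Curator",
    "ND": "No biological Data available",
}


def parse_go_line(line):
    """Return (go_id, aspect, term, evidence_code, evidence_meaning) for one DR GO line."""
    if not line.startswith("DR"):
        raise ValueError("not a DR cross-reference line")
    # Strip the "DR" tag, surrounding spaces and the final period, then split on ";".
    fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
    if fields[0] != "GO" or len(fields) < 4:
        raise ValueError("not a GO cross-reference line")
    aspect_code, term = fields[2].split(":", 1)
    evidence_code = fields[3].split(":", 1)[0]
    meaning = EVIDENCE.get(evidence_code, "look up in the GO evidence code guide")
    return fields[1], ASPECTS.get(aspect_code, aspect_code), term, evidence_code, meaning


if __name__ == "__main__":
    print(parse_go_line("DR   GO; GO:0005634; C:nucleus; TAS:ProtInc."))
```

Each returned field maps onto part of question 1D: the term (what the entry means), its aspect (component, function or process), and the evidence code that supports it.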
G) Follow the links to your protein on the following sites and describe the information provided at each site. (This need not be exhaustive: describe the emphasis of each site and give some of the key information you learned about your protein from it.) If your entry has no link to one of these sites, please contact the instructor for a suitable substitute.
   i. InterPro
   ii. Pfam
   iii. SMART
   iv. PRINTS
   v. PROSITE

H) Choose one of your Prosite links and list its entry number: __________
Then copy and paste the consensus pattern for this entry and explain what it means (in other words, interpret the pattern). Try to choose a Prosite entry that has a consensus pattern so that you can complete the question.

I) Go to the ScanProsite search engine (http://us.expasy.org/tools/scanprosite/) and enter your human protein sequence directly into the left-hand text box to search the database. What kind of information did you receive? Compare it to the Prosite links given in the UniProt entry for your protein. (For example, did you get the same Prosite hits, and did you receive any additional information?)

2. Using your assigned human mRNA/protein, predict the following physical and chemical properties of the protein:

A) Using the mRNA sequence for your protein (download it or copy it directly from the accession entry), determine which frame your mRNA is translated from to generate the full-length protein sequence (i.e., frame 1, 2, or 3). Use the following program:
Translate http://br.expasy.org/tools/dna.html

B) Does your protein have any transmembrane regions? Does the output appear to be consistent with your protein having a signal peptide?
(Double-check your UniProt entry to see whether a signal peptide is documented, and then discuss your profile output.)
TMpred http://www.ch.embnet.org/software/TMPRED_form.html

C) What are the pI and MW of the mature protein? (Remember to account for the signal peptide if your protein has one.)
Compute pI/Mw tool http://us.expasy.org/tools/pi_tool.html

D) How many peptides would be generated if trypsin were used to proteolytically cleave your protein? (For the trypsin analysis, choose the “sophisticated model”, listed as the third entry under the “Please Select” menu.)
PeptideCutter http://us.expasy.org/tools/peptidecutter/
Peptides derived from their parent proteins in this way are often used in proteomics projects. Why is this so, and how are the peptides utilized?

E) Does your protein have any potential N-linked glycosylation sites?
NetNGlyc: http://www.cbs.dtu.dk/services/NetNGlyc/

F) What is the predicted subcellular location of your protein?
TargetP: http://www.cbs.dtu.dk/services/TargetP/
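For questions 1H and 1I, reading a Prosite consensus pattern is easier once you see that it maps almost one-to-one onto a regular expression:
- x means any residue.
- [..] lists the allowed residues.
- {..} lists the forbidden residues.
- (n) or (n,m) gives a repeat count.
- "<" and ">" anchor the pattern to the N- or C-terminus.

The Python sketch below translates a pattern under those rules and scans a sequence with it, a very small ScanProsite-like search. The helper names prosite_to_regex and scan and the test sequence are hypothetical. The code does not handle rarer syntax, such as a terminal marker inside square brackets. N-{P}-[ST]-{P} is the PROSITE N-glycosylation pattern (PS00001).

```python
import re

# One PROSITE element: x, a residue, [allowed] or {forbidden}, optionally with (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE consensus pattern into an equivalent Python regex string."""
    body = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if body.startswith("<"):   # "<" anchors the pattern at the N-terminus
        prefix, body = "^", body[1:]
    if body.endswith(">"):     # ">" anchors the pattern at the C-terminus
        suffix, body = "$", body[:-1]
    pieces = []
    for element in body.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            piece = "."                          # x = any amino acid
        elif residue.startswith("{"):
            piece = "[^" + residue[1:-1] + "]"   # {..} = any residue except these
        else:
            piece = residue                      # single residue or [..] set, unchanged
        if low is not None:                      # x(3) -> .{3}, x(2,4) -> .{2,4}
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return prefix + "".join(pieces) + suffix


def scan(pattern, protein):
    """Return (1-based start, matched segment) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))   # N[^P][ST][^P]
    print(scan("N-{P}-[ST]-{P}.", "MNGSANPSK"))  # [(2, 'NGSA')]
```

In this example, the Asn at position 6 is skipped because a Pro follows it, which is exactly what {P} forbids.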
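For question 2A, the forward-strand part of what the Translate tool does can be sketched in Python. The sketch translates the mRNA in frames 1, 2 and 3 and reports which one holds the longest open reading frame that starts with Met and ends at a stop codon.

This assumes the standard genetic code and a sequence pasted without a FASTA header or position numbers. The helper names translate and longest_orf are hypothetical, and the real tool also shows the three reverse-strand frames. The longest ORF is only a heuristic; the frame whose translation reproduces your UniProt protein sequence is the one that answers the question.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna, frame):
    """Translate forward frame 1, 2 or 3; '*' marks a stop codon, 'X' an unreadable codon."""
    seq = "".join(mrna.split()).upper().replace("U", "T")  # drop whitespace, RNA -> DNA letters
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(mrna):
    """Return (frame, protein) for the longest Met-initiated, stop-terminated ORF in frames 1-3."""
    best_frame, best_protein = None, ""
    for frame in (1, 2, 3):
        segments = translate(mrna, frame).split("*")
        for segment in segments[:-1]:               # keep only segments that end at a stop codon
            if "M" in segment:
                orf = segment[segment.index("M"):]  # start at the first Met in the segment
                if len(orf) > len(best_protein):
                    best_frame, best_protein = frame, orf
    return best_frame, best_protein


if __name__ == "__main__":
    print(longest_orf("GGATGGCCTAAGG"))  # (3, 'MA')
```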
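For question 2B, it may help to see the basic idea behind transmembrane prediction: slide a window along the sequence and flag stretches that are hydrophobic enough to span a membrane. TMpred uses its own statistically derived scoring, not this method. The sketch below instead uses the older, simpler Kyte-Doolittle hydropathy approach, with a 19-residue window and a 1.6 cutoff, purely as an illustration. The function names and the toy sequence are hypothetical.

Keep in mind that a single hydrophobic stretch near the N-terminus may be a signal peptide rather than a transmembrane helix. That is why the question asks you to check the UniProt entry.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each full window; entry i covers residues i+1..i+window (1-based)."""
    values = [KD.get(aa, 0.0) for aa in protein.upper()]  # non-standard letters scored as 0
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(protein, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy exceeds the threshold (1-based ranges)."""
    segments = []
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score <= threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # extend the current hydrophobic stretch
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 20 + "D" * 10  # hypothetical: one hydrophobic stretch at 11-30
    print(candidate_tm_segments(toy))      # [(6, 35)]
```

In the toy example, the reported range (6-35) is wider than the true hydrophobic stretch (11-30) because windows overlap its edges. Window-based boundaries are approximate, which is one reason dedicated tools refine them.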
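For question 2C, the MW part of the calculation is simple enough to sketch. Remove the signal peptide residues, sum the average residue masses of the mature chain, and add one water for the free termini. The mass values below are standard average residue masses and are worth checking against ExPASy's amino acid mass table. Results should be close to, but are not guaranteed to match, the Compute pI/Mw output. The pI is not sketched here because that tool relies on its own set of pK values.

The helper names, the precursor sequence and the signal peptide length are all hypothetical. For your protein, take the signal peptide range from the "Signal" feature in your UniProt entry.

```python
# Average residue masses (Da), i.e. free amino acid masses minus one water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water accounts for the free N- and C-termini of the chain


def mature_chain(protein, signal_peptide_end=0):
    """Remove residues 1..signal_peptide_end (1-based), e.g. a UniProt 'Signal' feature."""
    return protein[signal_peptide_end:]


def average_mw(protein):
    """Average molecular weight (Da) of an unmodified polypeptide chain."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


if __name__ == "__main__":
    precursor = "MKLLVLAAGSAQA" + "DEKRWHG"  # hypothetical precursor sequence
    mature = mature_chain(precursor, 13)    # hypothetical signal peptide: residues 1-13
    print(mature, round(average_mw(mature), 2))
```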
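For question 2D, the common textbook rule for trypsin is to cut after Lys (K) or Arg (R), except when the next residue is Pro (P). PeptideCutter's "sophisticated model" applies additional exception rules, so its peptide count for your protein may differ from what this simple rule gives. The Python sketch below applies only the simple rule. The function name trypsin_digest and the test sequence are hypothetical.

```python
import re


def trypsin_digest(protein):
    """Cut after every K or R unless the next residue is P; return the resulting peptides."""
    # Zero-width split point: preceded by K/R and not followed by P (needs Python 3.7+).
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if p]  # drop the empty string left by a C-terminal K/R


if __name__ == "__main__":
    peptides = trypsin_digest("MAKRPGKWR")  # hypothetical sequence
    print(len(peptides), peptides)          # 3 ['MAK', 'RPGK', 'WR']
```

Note that the R at position 4 is not cut because a Pro follows it, so "RPGK" stays together.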
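For question 2E, NetNGlyc uses neural networks to judge the sequence context of Asn-Xaa-Ser/Thr sequons. The sequon itself is a simple motif that you can list first and then compare with the NetNGlyc output. The Python sketch below only finds Asn positions that fit N-X-S/T with X not Pro. It says nothing about whether a site is actually glycosylated, and it ignores the extra {P} after S/T that the PROSITE pattern N-{P}-[ST]-{P} adds. The function name sequon_sites and the test sequence are hypothetical.

```python
import re


def sequon_sites(protein):
    """1-based positions of Asn residues in N-X-S/T sequons where X is not Pro."""
    # A zero-width lookahead lets overlapping sequons (e.g. "NNST") all be reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein.upper())]


if __name__ == "__main__":
    print(sequon_sites("MNGSANPSKNNST"))  # [2, 10, 11]
```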