Bioinformatics and Genomics Lab
Dr. Wood
Seattle Pacific University

Purpose: The goal of this lab is to introduce you to the tools and resources used to extract information from genome sequence data and to familiarize you with the information resources that will enable you to apply this information in clinical practice.

Part I: What can we learn from DNA?

Scenario: You are a laboratory assistant who has just taken a job at the Centers for Disease Control in Atlanta, GA. Your supervisor has just begun work on a disease outbreak occurring in Coupeville, a small town in Washington state. A large number of high school athletes have acquired skin infections that are resistant to standard antibiotic treatment. Clinicians on site are confident that they have identified the causative agent of the disease and are working hard to find an effective treatment. Given the size of the outbreak, the CDC has decided to investigate its source, which requires further molecular characterization. You have been given two tasks. First, confirm the identity of the responsible bacterium. Second, determine the cause of the antibiotic resistance. Once this is complete, the laboratory will genetically profile all isolates to determine whether one or more strains are involved in this outbreak (you will not do this during this lab).

Procedure:

Identify the causative agent using rDNA comparison. Your supervisor has requested that the laboratory sequence the rDNA of the bacterium isolated from patients in the outbreak. The sequence is provided below, and you must use it to verify the clinical identification of the agent causing this outbreak (modified from http://rdp.cme.msu.edu/assigngen/basicinstr.jsp):

Find the closest match to your sequence
Go to the Ribosomal Database Project at http://rdp.cme.msu.edu/index.jsp.
Go to the Sequence Match analysis tool (also called SEQMATCH).
Paste your unknown rDNA sequence into the text box.
Change the options below to:
   o Strain: Both
   o Source: Isolates
   o Size: >1200
   o Quality: Good
   o Taxonomy: Nomenclatural
   o KNN matches: 1
Click on "Submit".
Go to "view selectable matches." Only the closest match will be displayed.
Record the genus and species of the closest relative.

Name of organism:______________________________

What diseases are caused by this type of bacterium?

Determine why the organism is resistant to antibiotics. Your supervisor has asked the laboratory to provide you with the sequence of a particular gene that she believes is involved in the ability of this pathogen to resist antimicrobic therapy. This sequence is available below. Determine the identity of the gene encoded by this sequence and investigate the mechanism by which it confers resistance.

a. Identify the unknown gene. Using the BLAST program discussed in lab (see links below), identify the name of the unknown gene and answer the following questions:
   I. From NCBI BLAST (http://www.ncbi.nlm.nih.gov/blast/) select protein blast near the center of the page.
   II. Paste your unknown sequence into the box and select the BLAST button near the bottom of the screen.
   III. Review the information and answer the following questions:
b. Questions:
   I. What is the e-value of the match between your protein and its best match? What does this tell you about these two proteins?
   II. Is your protein identical to the best match? If not, how many of the amino acids are exact matches?
   III. What is the name of your unknown protein based on similarity with its best match?
   IV. What is the name of the gene that makes your protein?
c. Investigate protein domains. Using the PFAM program below, investigate any protein domains in your unknown protein to help you determine its function.
   I. From the PFAM programs at the Sanger center (http://pfam.sanger.ac.uk/) select the Sequence Search link.
   II. Paste your unknown sequence into the box and hit go.
   III. Review the information and answer the following questions.
d.
Questions:
   I. What is a protein domain?
   II. Evaluate the three best-scoring domains. List each below along with the e-value of the match and briefly describe its function.
      i.
      ii.
      iii.
e. Further investigation. You should work on this section at home. You will need to research the function of the gene you identified using internet or other sources. Feel free to work with your lab partner or in groups to answer these questions.
   I. What class of antibiotics would you expect this pathogen to be resistant to?
   II. How does this general type of protein work in normal cells?
   III. How does the presence of this protein provide resistance to antibiotics?
   IV. What antibiotics might work to treat this disease?

Part II: The human genome and disease.

In this part of the laboratory you will investigate the link between genetic alterations in the human genome and disease. Use the information found at http://www.ncbi.nlm.nih.gov/disease/ and other sites you locate on the internet to answer the following questions.

Questions:
1. What gene or genes are mutated in patients with Cystic Fibrosis?
2. How many mutations are associated with CF?
3. Which chromosome contains the gene whose alteration leads to CF?
4. What specific microorganisms are commonly associated with this disease?
5. How is this disease treated?
6. Select one other genetic disease (it does not need to be microbial in nature). Note the chromosome or chromosomes involved, the gene or genes involved, and the specific mutation or mutations, and briefly review the symptoms and treatment for the disease.
Resources:

Your unknown rDNA sequence:

TTTTATGGAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGAACGGACG
AGAAGCTTGCTTCTCTGATGTTAGCGGCGGACGGGTGAGTAACACGTGGATAACCTACCTATAAGACTGGGATAACT
TCGGGAAACCGGAGCTAATACCGGATAATATTTTGAACCGCATGGTTCAAAAGTGAAAGACGGTCTTGCTGTCACTA
TAGATGGATCCGCGCTGCATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCAACGATGCATAGCCGACCTGAGAG
GGTGATCGGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGG
CGAAAGCCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTCTTCGGATCGTAAAACTCTGTTATTAGGGAAGAACAT
ATGTGTAAGTAACTGTGCACATCTTGACGGTACCTAATCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTA
ATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGTAGGCGGTTTTTTAAGTCTGATGTGAAA
GCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGAAAACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGT
AGCGGTGAAATGCGCAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACGCTGATGTG
CGAAAGCGTGGGGATCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGG
TTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAG
GAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAAATCTTG
ACATCCTTTGACAACTCTAGAGATAGAGCCTTCCCCTTCGGGGGACAAAGTGACAGGTGGTGCATGGTTGTCGTCAG
CTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTAAGCTTAGTTGCCATCATTAAGTTGGGCA
CTCTAAGTTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGC
TACACACGTGCTACAATGGACAATACAAAGGGCAGCGAAACCGCGAGGTCAAGCAAATCCCATAAAGTTGTTCTCAG
TTCGGATTGTAGTCTGCAACTCGACTACATGAAGCTGGAATCGCTAGTAATCGTAGATCAGCATGCTACGGTGAATA
CGTTCCCGGGTCTTGTACACACCGCCCGTCACACCACGAGAGTTTGTAAACCCGAAGCCGGTGGAGTAACCTTTTAG
GAGCTAGCCGTCGAAGGTGGGACAAATGATTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGA
TCACCTCCTTTCT

Sequence of unknown protein provided by your supervisor:

mkkikivpli livvvvgfgi yfyaskdkei nntiwaiedk nfkqvykdss yisksdngev
emmereikiy nslgvkdini qdrkikkvsk nkkrvdaqyk iktnygnidr nvqfnfvked
gmwkldwdhs viipgmqkdq sihienlkse rgkiwdrnnv elantgtaye igivpknvsk
kdywaiakel iieedyikqq mqqawvqddt fvplktvkkm deylsdfakk fhlttnetes
rnyplgkats hllgyvgpin seelkqkeyk gykddavigk kgleklydkk lqhedgyrvt
ivddnsntia htliekkkkd gkdiqltida kvqksiynnm kndygsgtai hpqtgellal
vstpsydvyp fmygmsneey nvltedkkep llnkfqitts pgstqkilta miglnnktld
dktsykidgk wwqkdkswgg ynqtryevvn gniqlqqaie ssdfiffarv alelgskkfe
kgmkklgvge diesdypfyn aqisnknldn eilladsgyg qgeilinpvq ilsiysalen
ngninaphll kdtknkvwkk niiskeninl ltmgmmqvvn kthkediyrs yanligksgt
grqigwfisy dkdnpnmmma invkdvqdkg masynakisg kvydelyeng nkkydide
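
Optional helper: the sequences above are printed with spaces and line breaks, and question b.II in Part I asks how many amino acids are exact matches. The following is a minimal Python sketch, assuming you have Python 3 available. It shows one way to collapse a printed sequence into a single string before pasting it into a search box, and what counting exact matches means for a pair of sequences that have already been aligned. The function names clean_sequence and count_identical are illustrative only; they are not part of RDP, BLAST, or PFAM, and BLAST reports the number of identical positions for you.

```python
from typing import Tuple


def clean_sequence(raw: str) -> str:
    """Keep only letters from a pasted sequence and return them upper-cased.

    This drops spaces, line breaks, and any position numbers.
    """
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def count_identical(aligned_a: str, aligned_b: str) -> Tuple[int, int]:
    """Count identical positions in two already-aligned sequences.

    Both strings must have the same length. A gap character ('-') is
    never counted as a match. Returns (matches, alignment_length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return matches, len(aligned_a)


if __name__ == "__main__":
    # Start of the rDNA sequence from the Resources section, including
    # the break between the first two printed chunks.
    raw = (
        "TTTTATGGAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCC"
        "TAATACATGCAAGTCGAGCGAACGGACG\nAGAAGCTTGC"
    )
    cleaned = clean_sequence(raw)
    print(len(cleaned), cleaned)

    # Toy aligned pair (made up for illustration, not lab data).
    matches, length = count_identical("MKK-I", "MKR-I")
    print(f"{matches} of {length} aligned positions are identical")
```

To use it on the full sequences, paste the printed text between triple quotes and pass it to clean_sequence.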