Is it E. coli O157:H7? Using Bioinformatics to Develop and Test Hypotheses

Joanna R. Klein, Northwestern College, St. Paul, MN

Introduction

Bioinformatics is used extensively by researchers, and students need to become competent in it, especially given the rapid advances in genome sequencing. As in any inquiry-based lab, bioinformatics is most meaningful when students learn the tools while using them to test hypotheses. With this goal in mind, this activity was designed so that students learn to use specific bioinformatics tools both to develop a hypothesis and to test whether that hypothesis is correct.

Description of Activity

This activity takes a case study approach in which students are asked to design a PCR-based diagnostic test for E. coli O157:H7 by identifying a gene that is specific to this pathogenic strain. To do this, students are given a set of unknown gene sequences, which they identify by performing BLAST searches at NCBI. They review the functions of the gene products and develop hypotheses about which gene might be unique to O157:H7. They then test their hypotheses by using the Integrated Microbial Genomes (IMG) database to search specific bacterial genomes for each gene.

Step 1: Learn about PCR and Gel Electrophoresis

Case Study Scenario: Elizabeth and Colin, two novice scientists, were asked to test E. coli samples from Lake Johanna to determine whether any are the pathogenic strain O157:H7. How should they go about this task? They have heard that PCR is often used in bacterial identification, but they don’t know much about how it works.

As a pre-lab assignment, students are directed to work through virtual labs on PCR and gel electrophoresis (Figure 1) and to read a section in their textbook about the use of PCR in clinical diagnosis. They then answer a set of questions.
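To make the PCR idea in Step 1 concrete, the following is a minimal sketch of an "in silico PCR": it looks for a forward primer and the reverse complement of a reverse primer in a template sequence and returns the region they would bracket. The function names, the template, and both primer sequences are made-up illustrations and are not part of the activity materials; real primer design also considers mismatches, melting temperature, and both template strands, which this sketch ignores.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted amplicon, or None if either primer site is missing.

    Assumes exact matches only and that `template` is the strand carrying
    the forward primer sequence (a simplification of real PCR).
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None  # forward primer does not anneal
    # The reverse primer anneals to the opposite strand, so on this strand
    # we search for its reverse complement downstream of the forward primer.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None  # reverse primer does not anneal downstream
    return template[start:end + len(reverse_site)]


# Toy example with hypothetical sequences (not from Appendix A).
toy_template = "TTGACCATGGCTAGCTAGGATCCGATTACGGCATT"
toy_forward = "ATGGCTAG"
toy_reverse = "CCGTAATC"

amplicon = in_silico_pcr(toy_template, toy_forward, toy_reverse)
print("Amplicon:", amplicon)
print("No product:", in_silico_pcr("TTTTTTTTTT", toy_forward, toy_reverse))
```

A product is returned only when both primer sites are present, which is the same logic a diagnostic PCR relies on: a band on the gel indicates that the target gene is in the sample.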
Figure 1: Learn Genetics Virtual Labs. http://learn.genetics.utah.edu/content/labs/pcr/

Step 2: Use BLAST to determine the identity of unknown sequences

Case Study Scenario: Now that Elizabeth and Colin have a better understanding of PCR, they need to decide how to apply the technique to their problem. See if you can help them out! They are hoping to use PCR to amplify a gene that is present in O157:H7 but not in other strains of E. coli. But what specific gene should they look for? Their research supervisor provided them with 4 sets of primers that they could potentially use. Each set has two primers (an upstream primer and a downstream primer) to amplify a specific gene. They were given the nucleotide sequence that each primer set amplifies (see Appendix A); however, the gene names are missing from the file, and their supervisor cannot be reached.

Students are emailed a file containing the sequences of 4 unknown genes. These 4 genes were chosen because each is expected to be found differentially in E. coli strains and because students have previously been exposed to them, directly or indirectly, so prior material is reinforced. Students determine the name of the product encoded by each sequence by doing a blastx search at NCBI (Figure 2).

Unknown genes:
1. Glyceraldehyde 3-phosphate dehydrogenase (GAPDH)
2. Cytochrome c oxidase
3. Tryptophanase
4. Shiga toxin subunit A

Figure 2: Tryptophanase BLAST results. A blastx search was done with student unknown sequence 3. The top matches are all to tryptophanase genes from various organisms, so the probable identity of the gene is evident.

Step 3: Develop a hypothesis about which gene is specific to E. coli O157:H7

Case Study Scenario: Elizabeth and Colin have some idea of the function of each of these genes from their previous course work, and they predict which species of bacteria would contain each gene.

Students review or research the function of each gene and write down whether they predict the gene would be found in all species of bacteria, found in all E. coli, found in just E. coli O157:H7, or absent from E. coli. They then write down an appropriate hypothesis to test.

Step 4: Test the hypothesis using IMG

Case Study Scenario: Elizabeth and Colin have heard of a useful database, Integrated Microbial Genomes (IMG), where they might be able to test their hypothesis. They plan to search several genomes to determine whether they contain the gene sequences. Which genomes should they search?

Students are asked to make a list of genomes to search and then to search each genome for the four different genes at IMG (Figures 3 & 4). Students write down a conclusion regarding their hypothesis.

Figure 3: Integrated Microbial Genomes (IMG) database. http://img.jgi.doe.gov/cgi-bin/w/main.cgi

Figure 4: Gene search at IMG. Four bacterial genomes were selected, including E. coli K12 DH1, E. coli O157:H7 Sakai and P. aeruginosa PAO1. The genomes were queried with the gene names “glyceraldehyde”, “cytochrome c oxidase”, “tryptophanase” and “shiga toxin”. Results show that shiga toxin is the only gene specific to O157:H7.

Assessment

Students were given a pre- and post-test on 12 key terms and concepts covered in the activity and showed significant gains (Table 1).

Table 1: Assessment results

Course        Number of Students   Pre-test(a)   Post-test(a)   Gain
Spring 2010   7                    6.4           17.1           10.7
Spring 2011   15                   3             14.8           11.8

(a) Average score out of 24 points.

Practical Implementation

Audience: microbiology course with freshman through senior biology majors.
Format: the computer-based activity is a useful substitute for a lab on molecular methods and can be incorporated into online courses.

Acknowledgements

I’d like to thank Shellie Kieke, Concordia University – St. Paul, and Ruth Gyure, Western Connecticut State University, for piloting this lab in their microbiology courses and providing feedback and assessment data.

Joanna Klein, Ph.D.
Dept. of Biology and Biochemistry
Northwestern College
3003 Snelling Ave. N.
St. Paul, MN 55113
651-286-7468
jrklein@nwc.edu
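The comparison students make in Step 4 (which search term returns hits only in the O157:H7 genome?) can also be sketched in a few lines of code. The presence/absence values below are hypothetical placeholders chosen to mirror the outcome described for Figure 4; they are not actual IMG output, and the names `hits` and `genes_specific_to` are illustrative only.

```python
# Hypothetical search results: for each query term, the set of genomes in
# which a matching gene was found. These values are placeholders for
# illustration, not data exported from IMG.
hits = {
    "glyceraldehyde": {"E. coli K12 DH1", "E. coli O157:H7 Sakai", "P. aeruginosa PAO1"},
    "cytochrome c oxidase": {"P. aeruginosa PAO1"},
    "tryptophanase": {"E. coli K12 DH1", "E. coli O157:H7 Sakai"},
    "shiga toxin": {"E. coli O157:H7 Sakai"},
}


def genes_specific_to(target, search_hits):
    """Return the query terms found in the target genome and in no other."""
    return [gene for gene, genomes in search_hits.items() if genomes == {target}]


candidates = genes_specific_to("E. coli O157:H7 Sakai", hits)
print("Candidate diagnostic genes:", candidates)
```

A gene qualifies as a diagnostic target only if it is present in O157:H7 and absent from every other genome searched, so the conclusion is only as strong as the list of genomes the students chose to compare.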