Bioinformatics Worksheet Instructions
Biology 22 Winter 2009

Helpful hint: Keep the following items open for successful navigation of the databases:
A. For quick links to the databases, use Dr. Colavito’s homepage, Bioinformatics Tools link from the Course page.
B. For an online version of the Bioinformatics worksheet, open the posted version on Dr. Colavito’s homepage from the Lecture Notes page.
C. To store data as you retrieve it from the databases, copy and paste the information into a text file.

1. Accessing the genomic sequence to study

Database: National Center for Biotechnology Information (NCBI)
Go To: http://homepage.smc.edu/colavito_mary/biology22/course.htm
Steps to Take:
    Choose the Bioinformatics Tools link.
    Choose the sequence number shown on your worksheet.
    Copy the nucleotide sequence as it appears on the screen and save it for future use (#2).

2. Locating the gene within the genomic sequence

Search Tool: Genscan at MIT
Go To: http://genes.mit.edu/GENSCAN.html
Steps to Take:
    Paste the nucleotide sequence from Step 1 into the white query box.
    Click on Run Genscan.
    Record the number of exons predicted from the scan.
    Copy the Results information for future use (#3 and #7).
    Copy the largest predicted peptide sequence from the results and save it in your text file.
Information for Summary Tables:
    Number of nucleotides in the genomic sequence
    Number of exons
    Intron-Exon Junction Comparison (see #7)

3. Identifying the predicted protein sequence

Search Tool: BLAST at NCBI (Basic Local Alignment Search Tool)
Go To: http://blast.ncbi.nlm.nih.gov/Blast.cgi
Steps to Take:
    Under Basic BLAST, choose protein blast.
    Paste the predicted peptide sequence in the query box.
    Under Choose Search Set, select Swissprot protein sequences.
    Under Program Selection, choose the blastp algorithm (protein-protein BLAST).
    Click on the blue BLAST button at the bottom of the page.
    Scroll down the results page to “sequences producing significant alignments”.
    Click the link next to the human sequence with the highest score. [If there is no high-scoring human sequence, go back to the Genscan output, select the next largest protein and repeat the BLAST analysis.]
    Copy the amino acid sequence generated by the BLAST search to use in the future (#4a).
    Note the abbreviation for the gene name under the “Features” heading on the sequence page for future use (#5 and #6).
Information for Summary Tables:
    Number of amino acids (be sure to take this from the protein sequence detected by the BLAST search rather than the one predicted by Genscan)
    Protein Name

4a. Determining the functional domains of the protein

Search Tool: Pfam at the Sanger Institute
Go To: http://pfam.sanger.ac.uk/search
Steps to Take:
    Paste the saved human sequence from #3 into the white query box.
    Click on the “Go” button to the right of the box.
    On the results page, note the rows highlighted in green, showing significant matches for subunits of the protein.
    Click on the link on the far left of the row to find a description of the protein subunit.
Information for Summary Tables:
    Protein Domains and their Functions
    Overall Function of Protein in the Cell

4b. Alternative method for determining overall protein function
(Use these steps only if step 4a didn’t provide this information)

Database: Universal Protein Resource (UniProt) at European Bioinformatics Institute, Swiss Institute of Bioinformatics and Protein Information Resource
Go To: http://www.uniprot.org/
Steps to Take:
    Under “Search In”, choose Protein Knowledgebase.
    Type the protein name in the Query box and select search.
    Choose an entry related to humans and click on the accession number.
    Note both the general information and the material presented by keyword.
Information for Summary Table:
    Overall Function of Protein in the Cell

5. Determining Chromosomal Location and Identifying Inherited Disease Relationship

Database: Online Mendelian Inheritance in Man (OMIM) at NCBI
Go To: http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM
Steps to Take:
    Type the name of the gene in the search box and click Go.
    Select an entry that matches the gene name.
    Make note of the abbreviation of the gene name for future use (#6).
    Note the “Gene Map Locus” listed with the entry.
    Note the name of the disease listed with the entry.
    Read the “Genotype/Phenotype Correlations” and/or “Allelic Variants” portions of the entry to learn how the disease is inherited.
Information for Summary Table:
    Gene Location on the Chromosome
    Related Disease
    Inheritance Pattern for Disease Development

6. Detecting Single Nucleotide Polymorphisms

Search Tool: National Center for Biotechnology Information (NCBI)
Go To: http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp
Steps to Take:
    Type the name of the gene (usually a three letter code) in the query box and click Go.
    Select the “Human” tab above the search results.
    Under any of the entries, select “Gene View”.
Information for Summary Table:
    For any missense mutation you observe, provide the:
    Exon Number
    mRNA position
    Change in nucleotide at that position
    Amino Acid position
    Change in amino acid at that position

7. Studying Intron-Exon Junctions

Database: National Center for Biotechnology Information (NCBI)
Go To: http://homepage.smc.edu/colavito_mary/biology22/course.htm
Steps to Take:
    Choose the Bioinformatics Tools link.
    Choose GenBank reference.
    Select the sequence number shown on your worksheet.
    Use this numbered version of your sequence to perform the intron-exon junction comparison.
    Choose any two consecutive exons from the Genscan results in #2.
    Examine the Intron-Exon Junctions at the 5’ and 3’ ends of the intron between these two exons.
    Compare these to the consensus sequence shown in your textbook (duplicated in the figure presented below).
Information for Summary Table:
    Exon Numbers
    Sequence of Nucleotides at junctions
    Comparison to consensus sequences
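
Optional check for steps 2, 3 and 7: the counting and the junction lookup can also be done with a short script instead of by hand. The Python sketch below is only an illustration and is not part of the assigned procedure. The function names and the toy sequence are made up for this example, the exon positions are assumed to be 1-based (as in a numbered GenBank listing), and the check covers only the minimal rule that an intron usually starts with GT and ends with AG, not the longer consensus shown in the textbook.

```python
import re


def clean_sequence(raw: str) -> str:
    """Remove position numbers, spaces and line breaks from a pasted sequence."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def intron_junctions(genomic: str, exon1_end: int, exon2_start: int) -> dict:
    """Return the first and last two bases of the intron between two exons.

    exon1_end and exon2_start are 1-based positions (assumption), so the
    intron runs from position exon1_end + 1 to position exon2_start - 1.
    """
    intron = genomic[exon1_end:exon2_start - 1]
    return {
        "intron_length": len(intron),
        "five_prime_end": intron[:2],    # donor site, expected GT
        "three_prime_end": intron[-2:],  # acceptor site, expected AG
    }


def follows_gt_ag_rule(junctions: dict) -> bool:
    """True if the intron starts with GT and ends with AG."""
    return (junctions["five_prime_end"] == "GT"
            and junctions["three_prime_end"] == "AG")


# Toy sequence (hypothetical, not a real gene):
# exon 1 = positions 1-6, intron = positions 7-19, exon 2 = positions 20-25.
pasted = "1 atggccgtaa gtttttcagg attaa"
genomic = clean_sequence(pasted)
junctions = intron_junctions(genomic, exon1_end=6, exon2_start=20)

print("Number of nucleotides:", len(genomic))
print("5' end of intron:", junctions["five_prime_end"])
print("3' end of intron:", junctions["three_prime_end"])
print("Follows GT...AG rule:", follows_gt_ag_rule(junctions))
```

The same clean_sequence function can be used on the protein sequence saved in step 3: the length of the cleaned text is the number of amino acids. Replace the toy sequence and the two positions with your own sequence and the exon coordinates from your Genscan results.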