Small Brains, BIG Ideas

Bioinformatics
Use of Genomic Tools

The purpose of this laboratory is to become familiar with some of the tools that are used for the analysis of DNA and protein sequences. All these tools are available on the internet (and most require a connection to the internet).

Databases and tools for sequence alignments

http://www.ncbi.nlm.nih.gov
This is the portal for many resources, including PubMed (references to publications: www.pubmed.org), access to some books (follow “Books” link), DNA, mRNA, protein sequence databases, etc.

We will use the following links:

Access to all databases:
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?itool=toolbar

Sequence databases:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore

Database of homologous genes:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=homologene

Site for comparing 2 sequences (“BLAST2”):
http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=megaBlast&BLAST_SPEC=blast2seq

Site for comparing insect sequences (12 Drosophila species, bee, mosquito, etc):
http://flybase.org/blast/

Sites for making phylogenetic trees:
http://www.ebi.ac.uk/Tools/clustalw2/index.html
http://align.genome.jp/

Site for making reverse complement of sequence:
http://www.bioinformatics.org/SMS/rev_comp.html

Site for translating a sequence:
http://www.expasy.ch/tools/dna.html

Site for making primers:
http://frodo.wi.mit.edu/

Source of other sites:
http://www.bioinformaticsonline.org/links/ch_09_t_6.html

Bioinformatics Exercises

The purpose of this exercise is to introduce you to the tools that are used to study the structure of a gene, compare its sequence with that of its homologs in different animals, etc.

1- Choose a gene (e.g., per2, etc).
2- Find the DNA sequence of the gene.
3- Find the RNA sequence of the gene.
4- Make a Word file with the sequence.
Use BLAST 2 to identify exons, introns, and the 5’ and 3’ untranslated regions; identify the transcription start site and the start of the coding region. Mark these features on the genomic sequence.
5- Use BLAST against all genomes to identify homologs using the mRNA sequences (why use the mRNA sequence?).
6- Repeat the same exercise using the protein sequence. Identify conserved regions.
7- Make a Word file with the sequence of homologous genes in FASTA format:

> Name 1 (sequence 1)
Sequence ………………………..
> Name 2 (sequence 2)
Sequence ………………………..
> Name 3 (sequence 3)
Sequence ………………………..
etc…..

Note:
- The “>” is critical for this format, as is the name that is on the first line. The sequence must start on the following line.
- Some programs only consider the first letters of the name, so make sure that the different names differ in their first letters.
- Make sure you use the Courier font (a monospaced font, i.e. one in which each letter uses the same space).

8- Copy the text into one of the programs for making phylogenetic trees. Make a tree first using species that are phylogenetically close, then more distant ones. Repeat the same process using the protein sequence. Comment on the results obtained in both cases.
9- Repeat the BLAST search you did in (5) but using only non-coding sequences (5’, 3’, introns). Investigate the conservation of these sequences in different species and animal groups.
10- Use these tools to answer a question you find interesting.
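
The FASTA file described in step 7 can also be assembled with a short script instead of by hand. The sketch below is only one possible way to do it, not part of the original exercise: the function names, the 60-character line width, and the 10-character prefix length are assumptions chosen for illustration, and the example records are made-up placeholder sequences, not real gene data. It writes each name on a “>” line with the sequence starting on the following line, and it reports names that share the same first letters, since some programs only read the beginning of each name.

```python
def find_shared_prefixes(names, prefix_len=10):
    """Group names by their first prefix_len characters.

    Returns a dict mapping each prefix used by more than one name
    to the list of names that share it. prefix_len=10 is an assumed
    default; check the limit of the tree program you actually use.
    """
    groups = {}
    for name in names:
        groups.setdefault(name[:prefix_len], []).append(name)
    return {prefix: same for prefix, same in groups.items() if len(same) > 1}


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    Whitespace inside each sequence is removed, letters are
    upper-cased, and the sequence is wrapped at `width` characters.
    """
    lines = []
    for name, seq in records:
        clean = "".join(seq.split()).upper()
        lines.append(">" + name)  # header line: ">" followed by the name
        # The sequence starts on the line after the header.
        for start in range(0, len(clean), width):
            lines.append(clean[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical placeholder records, for illustration only.
    records = [
        ("species1_geneX", "atggcc aaattt gggccc"),
        ("species2_geneX", "atggcg aaattc gggcca"),
    ]
    clashes = find_shared_prefixes([name for name, _ in records])
    if clashes:
        print("Names sharing their first letters:", clashes)
    print(to_fasta(records), end="")
```

Saving the output as a plain-text file avoids the font issue mentioned in the note to step 7, because the file contains only the characters themselves.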