Exercise 3. Inspecting the primary structure of a gene

1) Fetching the sequence

Obtain the human beta globin gene (AY260740) from GenBank (Nucleotide database).
Choose the FASTA display and save the sequence using Send.

2) Find exon-intron boundaries

Use HMMgene (http://www.cbs.dtu.dk/services/HMMgene/), a program that predicts genes in vertebrates and C. elegans.
Under “Options”, choose 3 best predictions, check “Predict signals” and submit.
See http://www.cbs.dtu.dk/services/HMMgene/hmmgene1_1.php for more information.
Briefly describe HMMgene.
Inspect the scores. Which lines correspond to the best prediction? How much do the predictions differ from each other? How many exons are predicted? Are all the proposed donor and acceptor sites valid?
Draw the structure of the gene on paper.

Try also FGENESH: http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind
Help: http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=help&subgroup=gfind
Briefly describe FGENESH.
Do the HMMgene and FGENESH results match? What extra information does FGENESH provide compared to HMMgene? What is the difference between an exon and an ORF in the FGENESH results?
Take a look at the pdf file.
Compare the predicted exon-intron boundaries to those annotated in NCBI accession AY260740.

3) Use MEGA 5 to calculate nucleotide frequencies and define exon-intron boundaries

Open MEGA 5 and open the file you saved at the beginning, choosing “analyze”, “nucleotide sequences”, “protein coding” and “standard genetic code”.
Go to Data -> select genes and domains. Remove the existing Data domain (right mouse click) and add introns, exons and UTRs as new domains, using the boundaries predicted by HMMgene and FGENESH.
Mark exons as coding sequence and define the codon start (i.e. which site in the domain corresponds to the first codon position).
After closing the editor, open Sequence Data Explorer. Check whether you see conservation around the acceptor and donor sites and the start codon.
Go to Statistics -> nucleotide composition.
Does nucleotide composition change among domains? What is the GC content, and how does it vary between domains?
Go to Data -> translate sequences. Check the amino acid composition (in Statistics). Does it change among exons?

4) Check the result

Export the data as PHYLIP, copy the amino acid sequence and run BLASTp. Do you get beta globins as a result?
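
The checks in steps 2 to 4 (canonical GT...AG intron ends, GC content per domain, and translation of the joined exons) can also be sketched in a few lines of Python as a cross-check on what MEGA reports. This is a minimal sketch and not part of the exercise tools: the sequence below is a short toy example, not AY260740, and the domain coordinates and all function names are hypothetical. To use it on the real gene, replace SEQ with the saved FASTA sequence and DOMAINS with the boundaries you obtained from HMMgene or FGENESH.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Toy sequence (NOT the real beta globin gene) and hypothetical domains.
# Coordinates are 1-based and inclusive, as in gene prediction output.
SEQ = "ACAT" "ATGGTGCAT" "GTAAGTTTTCAG" "CTGACTTAA" "GCTC"
DOMAINS = [
    ("5'UTR", 1, 4),
    ("exon1", 5, 13),
    ("intron1", 14, 25),
    ("exon2", 26, 34),
    ("3'UTR", 35, 38),
]


def domain_seq(seq, start, end):
    """Return the subsequence for 1-based inclusive coordinates."""
    return seq[start - 1:end]


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")


def spliced_cds(seq, domains):
    """Join all domains whose name starts with 'exon', in the order listed."""
    return "".join(
        domain_seq(seq, start, end)
        for name, start, end in domains
        if name.startswith("exon")
    )


def translate(cds):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X = unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    for name, start, end in DOMAINS:
        part = domain_seq(SEQ, start, end)
        line = f"{name:8s} {start:3d}-{end:3d}  GC = {gc_content(part):.2f}"
        if name.startswith("intron"):
            line += f"  GT...AG: {has_canonical_splice_sites(part)}"
        print(line)
    print("protein:", translate(spliced_cds(SEQ, DOMAINS)))
```

The sketch assumes that the first exon begins exactly at the start codon, so translation starts at codon position 1. If a predicted exon includes 5' UTR sequence, trim it first, just as you set the codon start in MEGA. The translated protein can then be pasted into BLASTp as in step 4.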