Exercise3_2015

Exercise 3. Inspecting the primary structure of a gene
1) Fetching the sequence
Obtain the human beta globin gene (AY260740) from GenBank (Nucleotide database)
Choose FASTA display and save using Send
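The same record can also be fetched programmatically. The sketch below is only an illustration and is not part of the original exercise: it assumes the NCBI E-utilities efetch endpoint and the nuccore database, and the function names and output file name are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (assumed here as an alternative to the web page).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession):
    """Return an efetch URL that requests a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # Nucleotide database
        "id": accession,      # e.g. "AY260740"
        "rettype": "fasta",   # FASTA display
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, out_path):
    """Download the record and save it to out_path (requires network access)."""
    with urlopen(build_efetch_url(accession)) as response:
        text = response.read().decode("utf-8")
    with open(out_path, "w") as handle:
        handle.write(text)
    return text


# Example call (hypothetical file name):
# fetch_fasta("AY260740", "AY260740.fasta")
```

The saved file should be equivalent to the one obtained with Send, so either route can be used for the later steps.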
2) Find exon-intron boundaries
Use HMMgene (http://www.cbs.dtu.dk/services/HMMgene/), a program that predicts genes in vertebrates
and C. elegans. Under “Options”, choose 3 best predictions, check “Predict signals” and
submit. See http://www.cbs.dtu.dk/services/HMMgene/hmmgene1_1.php for more information.






Briefly describe HMMgene.
Inspect the scores. Which lines correspond to the best prediction?
How much do the predictions differ from each other?
How many exons are predicted?
Are all the proposed donor and acceptor sites valid?
Draw the structure of the gene on paper.
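One way to check the proposed donor and acceptor sites is to test each predicted intron against the canonical GT...AG rule (introns usually begin with GT and end with AG, although rare exceptions exist). The sketch below is a minimal illustration, assuming the gene lies on the forward strand and that exon coordinates are 1-based and inclusive. The function names and the toy sequence are hypothetical; the real coordinates come from the prediction output.

```python
def introns_from_exons(exons):
    """Given sorted 1-based inclusive exon (start, end) pairs,
    return the intron (start, end) pairs lying between them."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


def check_splice_sites(seq, exons):
    """For each intron, report its donor and acceptor dinucleotides and
    whether they follow the canonical GT...AG rule."""
    seq = seq.upper()
    report = []
    for start, end in introns_from_exons(exons):
        intron = seq[start - 1:end]                   # convert to 0-based slice
        donor, acceptor = intron[:2], intron[-2:]     # first and last two bases
        is_canonical = donor == "GT" and acceptor == "AG"
        report.append((start, end, donor, acceptor, is_canonical))
    return report


# Toy example (not the real gene): two exons separated by one intron.
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGCTAA"
for row in check_splice_sites(toy, [(1, 6), (19, 24)]):
    print(row)
```

A site that fails this check is not necessarily wrong, but it deserves a closer look against the annotated record.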
Try also FGENESH:
http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind
Help: http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=help&subgroup=gfind





Briefly describe FGENESH.
Do the HMMgene and FGENESH results match?
What extra information does FGENESH provide compared to HMMgene?
What is the difference between an exon and an ORF in the FGENESH results?
Take a look at the PDF file.
Compare the predicted exon-intron boundaries to those annotated in NCBI accession AY260740.
3) Use MEGA 5 to calculate nucleotide frequencies and define exon-intron boundaries
Open MEGA 5 and open the file you saved in step 1. In the dialogs that follow, choose:
“analyze”
“nucleotide sequences”
“protein coding”
“standard genetic code”
Go to Data -> Select Genes and Domains. Remove the existing Data domain (right-click), then add
introns, exons and UTRs as new domains, using the boundaries predicted by HMMgene and FGENESH. Mark
exons as coding sequence and define the codon start (i.e. which site in the domain corresponds to the first
codon position). After closing the editor, open the Sequence Data Explorer.



Check whether you see conservation around acceptor and donor sites and the start codon.
Go to Statistics -> nucleotide composition. Does the nucleotide composition change among
domains? What is the GC content, and how does it vary between domains?
Go to Data -> translate sequences. Check the amino acid composition (in Statistics). Does it
change among exons?
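The composition figures reported by MEGA can be cross-checked by hand. The sketch below is a rough illustration rather than a description of how MEGA computes its statistics. It assumes 1-based inclusive domain coordinates and counts only A, C, G and T; the function names and the domain names in the example are hypothetical.

```python
from collections import Counter


def parse_fasta(text):
    """Return the sequence of the first record in FASTA-formatted text."""
    lines = []
    for line in text.splitlines():
        if line.startswith(">"):
            if lines:          # a second header means the first record ended
                break
            continue
        lines.append(line.strip())
    return "".join(lines).upper()


def composition(seq):
    """Return the fraction of each of A, C, G, T (other symbols are ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Return the GC fraction of a sequence."""
    freqs = composition(seq)
    return freqs["G"] + freqs["C"]


def domain_gc(seq, domains):
    """Return the GC fraction per domain; domains maps name -> (start, end)."""
    return {name: gc_content(seq[start - 1:end])
            for name, (start, end) in domains.items()}


# Toy example with made-up domains:
print(domain_gc("GGGGAAAA", {"exon1": (1, 4), "intron1": (5, 8)}))
```

Replacing the toy input with the saved sequence and the predicted boundaries gives per-domain values to compare with those shown in the Sequence Data Explorer.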
4) Check the result: export the data as PHYLIP, copy the amino acid sequence and run BLASTp. Do
you get beta globins as a result?
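As a cross-check on the sequence exported from MEGA, the coding segments can be spliced together and translated with the standard genetic code. The sketch below is a minimal illustration, assuming a forward-strand gene and 1-based inclusive coordinates for the coding parts of the exons. The function names are hypothetical, and the short test sequence is only a toy example.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice(seq, cds_segments):
    """Join the coding segments (1-based inclusive coordinates) into one CDS."""
    return "".join(seq[start - 1:end] for start, end in cds_segments).upper()


def translate(cds):
    """Translate a CDS from its first base, stopping at the first stop codon.
    Codons with non-ACGT symbols are reported as X."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example: two coding segments separated by one intron.
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGCTAA"
print(translate(splice(toy, [(1, 6), (19, 24)])))  # prints MAG
```

If the exon boundaries and codon start are set correctly, the translation of the real gene should contain no premature stop codon and can be pasted directly into BLASTp.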