Lab Assignment 1

BIO 224 Laboratory
CSU, Sacramento
October 8, 2004
Lab Assignment 6
1. Go to NCBI and search for the human RBP4 gene (use the search menu
for Gene).
a) What is the genomic location (chromosomal location) for RBP4?
b) Which two genes flank it (that is, which two genes lie on either side)?
Use the full gene names, not the gene symbols.
c) How many exons does RBP4 have? In which exon is the start codon
located?
d) Change the Display to “Gene Table” and list the exon sizes and
sequence positions (e.g. exon 1 is 300 bp in length, from bases 1-300; exon 2 is
70 bp in length, from bases 501-570). Preferably, copy and paste the table into this
Word document.
e) On which strand is the gene transcribed? (Note: by convention the upper
strand of the genome is shown 5’ to 3’; the opposite strand is referred to as the
reverse complement.)
f) Examine the Map Viewer link to the homologous region in mouse and
rat. Which chromosome is the RBP4 gene on in mouse and in rat?
g) Is the RBP4 gene flanked by the same two genes that you identified
above for both mouse and rat?
h) If different, how could this be so?
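
The arithmetic behind parts d and e can be sketched in a few lines of Python. This is only an illustration: the coordinates and the sequence are invented, not RBP4 values, and the function names are hypothetical. It shows that an exon given by inclusive positions has length end - start + 1, and how a sequence on the opposite strand is obtained as the reverse complement.

```python
# Illustrative sketch only: the coordinates and sequence below are invented,
# not taken from the RBP4 Gene Table.

def exon_length(start, end):
    """Length in bp of an exon given 1-based inclusive positions."""
    return end - start + 1

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

# Hypothetical exon table: (exon number, start, end)
exons = [(1, 1, 300), (2, 501, 570)]
for number, start, end in exons:
    print(f"exon {number}: {start}-{end}, {exon_length(start, end)} bp")

# A gene on the lower strand reads as the reverse complement of the upper strand.
print(reverse_complement("ATGGCC"))
```

Note that a range such as 500-570 would be 71 bp, not 70, because both end positions are counted.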
2. Download the genomic nucleotide region (link is NC_000010), the mRNA
(NM_006744) and the protein (NP_006735). Copy and paste the FASTA
sequences into a Word document for the following analyses.
a) First, compare the genomic nucleotide sequence to the RefSeq mRNA
sequence using a local alignment program (BLAST 2 Sequences
http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html or LALIGN
http://xylian.igh.cnrs.fr/bin/lalign-guess.cgi ). Can you correctly identify the exons
in this manner? Is the alignment perfect? If not, why might it not be?
b) Use the ORF Finder program within NCBI
(http://www.ncbi.nlm.nih.gov/gorf/gorf.html ) to find the open reading frames in the
genomic nucleotide sequence (NC_000010). Does the result match the exon-intron
structure identified in question 1d, or is it much more complex? If it is more
complex, why? (Hint: exon-intron splice junctions are not stop
codons.)
c) Use the GeneSeqer (http://bioinformatics.iastate.edu/cgi-bin/gs.cgi ) program to
align your mRNA sequence (NM_006744) with the genomic nucleotide sequence
(NC_000010) to determine the exon-intron boundaries. Does this program
correctly predict the exon-intron junctions?
d) Use the Splice Site Prediction program
(http://www.fruitfly.org/seq_tools/splice.html ) to detect the donor and acceptor splice
sites in the genomic nucleotide sequence (NC_000010), using the default
settings. Do the predicted splice sites match up directly with the known exon-intron
structure? Do the splice sites that match have high scores? (Note:
the actual splice site is in the middle of the donor and acceptor sequences
and is shown in a larger font size.)
e) Then predict the gene structure within the genomic nucleotide region
(NC_000010) using the following gene finder programs, and compare
their predictive accuracy (the criteria are the ability to
correctly detect the exon-intron structure and the resulting predicted protein
sequence).
i) Use GeneMark (http://opal.biology.gatech.edu/GeneMark/ ) (use the
Eukaryotic version). Check the box to translate predicted genes into protein. Compare
the exons that it recognized with the known structure from question 1d.
Also, compare the predicted protein sequence with the known protein (NP_006735)
using the global ALIGN program
(http://www2.igh.cnrs.fr/bin/align-guess.cgi ).
ii) Use the program GENSCAN (http://genes.mit.edu/GENSCAN.html ) and
repeat the above process (exon prediction and predicted protein sequence).
iii) Use GrailEXP (http://grail.lsd.ornl.gov/grailexp/ ) and perform the analysis by
checking the box for Galahad EST/mRNA/cDNA alignments (make sure the
drop-down menu is on the first setting, “GrailEXP Database ….”), and
also check the box below it for Gawain Gene Models (using the option below
it, “only use similarities to human ESTs/mRNAs”).
iv) Which gene prediction was the most successful? Why do you think that
it was?
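
One way to make the gene finder comparison above concrete is to score each program's predicted exons against the known exons. The Python sketch below is a minimal illustration under stated assumptions: all coordinates are invented, the program names are placeholders, and `exon_accuracy` is a hypothetical helper, not part of any of the tools listed. It counts an exon as correct only if both boundaries match exactly, then reports sensitivity (fraction of known exons found) and precision (fraction of predicted exons that are correct).

```python
# Illustrative sketch only: every coordinate below is invented and the
# program names are placeholders, not real results from any gene finder.

def exon_accuracy(known, predicted):
    """Exact-match exon sensitivity and precision.

    known, predicted: iterables of (start, end) tuples on the same
    coordinate system. An exon counts as correct only if both
    boundaries match exactly.
    """
    known_set, predicted_set = set(known), set(predicted)
    correct = known_set & predicted_set
    sensitivity = len(correct) / len(known_set) if known_set else 0.0
    precision = len(correct) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, precision

known_exons = [(100, 250), (400, 520), (800, 910), (1200, 1350)]
predictions = {
    "program_A": [(100, 250), (400, 520), (800, 910), (1200, 1350)],
    "program_B": [(100, 250), (410, 520), (800, 910)],
}
for name, exons in predictions.items():
    sn, pr = exon_accuracy(known_exons, exons)
    print(f"{name}: sensitivity={sn:.2f} precision={pr:.2f}")
```

Exact-boundary matching is a strict criterion; a prediction that is off by a few bases at one splice site scores the same as a missed exon, so it is worth also inspecting the predicted protein alignment before ranking the programs.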