BIO 224 Laboratory
CSU, Sacramento
October 8, 2004

Lab Assignment 6

1) Go to NCBI and search for the human RBP4 gene (use the search menu for Gene).
   a) What is the genomic (chromosomal) location of RBP4?
   b) Which two genes flank it (that is, which two genes lie on either side)? Use the full gene names, not the gene acronyms.
   c) How many exons does RBP4 have? In which exon is the start codon located?
   d) Change the Display to “Gene Table” and list the exon sizes and sequence positions (e.g. exon 1 is 300 bp in length, from bases 1-300; exon 2 is 70 bp in length, from bases 501-570). Preferably, copy and paste the table into this Word document.
   e) On which strand is the gene transcribed? (Note: the upper strand of the genome is oriented 5’ to 3’; the opposite strand is referred to as the reverse complement.)
   f) Examine the Map Viewer link to the homologous region in mouse and rat. On which chromosome is the RBP4 gene located in mouse and in rat?
   g) Is the RBP4 gene flanked by the same two genes that you identified above in both mouse and rat?
   h) If the flanking genes differ, how could this be so?

2) Download the genomic nucleotide region (link is NC_000010), the mRNA (NM_006744) and the protein (NP_006735). Copy and paste the FASTA sequences into a Word document for the following analyses.
   a) First, compare the genomic nucleotide sequence to the RefSeq mRNA sequence using a local alignment program (BLAST2 seq http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html or LALIGN http://xylian.igh.cnrs.fr/bin/lalign-guess.cgi ). Can you correctly identify the exons in this manner? Is the alignment perfect? If not, why might it not be?
   b) Use the ORF Finder program within NCBI (http://www.ncbi.nlm.nih.gov/gorf/gorf.html ) to find the open reading frames in the genomic nucleotide sequence (NC_000010). Do they match the exon-intron structure identified in question 1d, or is the result much more complex? If it is more complex, why?
      (Hint: exon-intron splice junctions are not stop codons.)
   c) Use the GeneSeqer program (http://bioinformatics.iastate.edu/cgi-bin/gs.cgi ) to align your mRNA sequence (NM_006744) with the genomic nucleotide sequence (NC_000010) to determine the exon-intron boundaries. Does this program correctly predict the exon-intron junctions?
   d) Use the Splice Site Prediction program (http://www.fruitfly.org/seq_tools/splice.html ) with the default settings to detect the donor and acceptor splice sites in the genomic nucleotide sequence (NC_000010). Does the splice site prediction match the known exon-intron structure? Do the splice sites that match have high scores? (Note: the actual splice site lies in the middle of each donor and acceptor sequence and is shown in a larger font.)
   e) Then predict the gene structure within the genomic nucleotide region (NC_000010) using the following gene finder programs, and compare how well each one performs. The criteria are the ability to correctly detect the exon-intron structure and the resulting predicted protein sequence.
      i) Use GeneMark (http://opal.biology.gatech.edu/GeneMark/ ); use the Eukaryotic version. Check the box for translating predicted genes into protein. Compare the exons that it recognized with the known structure from question 1d. Also, compare the predicted protein sequence with the known protein (NP_006735) using the global ALIGN program (http://www2.igh.cnrs.fr/bin/align-guess.cgi ).
      ii) Use the program GENSCAN (http://genes.mit.edu/GENSCAN.html ) and repeat the above process (exon prediction and predicted protein sequence).
      iii) Use GrailEXP (http://grail.lsd.ornl.gov/grailexp/ ) and perform the analysis by checking the box for Galahad EST/mRNA/cDNA alignments (make sure the drop-down menu is set to the first option, “GrailEXP Database ….”). Also check the box below it for Gawain Gene Models (using the option below it for “only use similarities to human ESTs/mRNAs”).
      iv) Which gene prediction was the most successful? Why do you think that was?
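
Optional: when comparing the gene finders above, it can help to tabulate each program's predicted exons against the known exon table. The Python sketch below is one possible way to do that, not a required part of the assignment. It assumes 1-based inclusive coordinates, as in the Gene Table, and classifies each known exon as found exactly, found only partially (overlapping but with different boundaries), or missed; it also lists predicted exons that overlap no known exon. The function names and the coordinates are hypothetical placeholders; substitute the exon positions you recorded and the output of each program.

```python
# Sketch: score predicted exons against a reference exon table.
# Coordinates are 1-based and inclusive (start, end).

def exon_length(start, end):
    """Length in bp of an exon with inclusive coordinates."""
    return end - start + 1

def overlaps(a, b):
    """True if intervals a and b share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def compare_exons(known, predicted):
    """Classify known exons as exact, partial, or missed, and
    collect predicted exons that overlap no known exon."""
    predicted_set = set(predicted)
    summary = {"exact": [], "partial": [], "missed": [], "extra": []}
    for exon in known:
        if exon in predicted_set:
            summary["exact"].append(exon)
        elif any(overlaps(exon, p) for p in predicted):
            summary["partial"].append(exon)
        else:
            summary["missed"].append(exon)
    summary["extra"] = [
        p for p in predicted if not any(overlaps(p, k) for k in known)
    ]
    return summary

# PLACEHOLDER coordinates for illustration only; NOT the real RBP4 exons.
known_exons = [(1, 300), (501, 570), (900, 1040)]
predicted_exons = [(1, 300), (510, 570), (2000, 2100)]

summary = compare_exons(known_exons, predicted_exons)
for category, exons in summary.items():
    # Print each exon as (start, end, length in bp).
    print(category, [(s, e, exon_length(s, e)) for s, e in exons])
```

Running the comparison once per program gives a simple basis for answering which prediction was most successful: more exact matches and fewer missed or extra exons indicate a better prediction of the exon-intron structure.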