Second bioinformatics lab:Exercise on disease:

Second bioinformatics lab: Exercise on disease (developed in part by
Sarah C. R. Elgin, Washington University)
It is well known that smoking leads to an increased risk for lung cancer, but how
does genetics play into the risk? The transformation of a normal cell into a cancerous cell
can result from many causes. In one model, factors that increase the rate of
mutation in DNA increase the chances that a proto-oncogene (the normal form of a gene)
will be mutated into an oncogene (a cancer-causing gene), causing a normal cell to be
transformed into a cancerous cell.
In this module, you will examine one proto-oncogene, K-Ras, which has been
associated with many cancers, including lung cancer. You will be examining a cDNA
sequence for K-Ras that contains a mutation. You will analyze the mutant K-Ras protein
using the bioinformatics tools presented in lab. You will investigate the mutation and
find out what is known, if anything, about its biological impact.
See our textbook for a discussion of
Ras: pages 407, 412, and 596-597 (6th ed).
Summarizing, insulin or a growth factor binds to a receptor, and two bound receptors
come together (dimerization) and activate each other. Each growth factor receptor is a
tyrosine kinase that phosphorylates (puts a phosphate on) tyrosines (a particular
amino acid) located on the other receptor’s tail. Thus, the receptors put phosphates on
each other to activate each other. Next, adaptor proteins (GRB2, which has an SH2
domain that binds to phosphorylated tyrosines, and SOS, which is a guanine nucleotide
exchange factor or GEF) come in and are activated by binding to phosphorylated
tyrosines located on the receptor. The adaptor proteins then activate Ras. Ras is “off”
when GDP is bound, but “on” when a new GTP comes in and displaces the old GDP. Ras
activates Raf, which activates MEK, which activates MAP kinase (one type of MAP kinase
is ERK, the kinase that we studied earlier). ERK turns on transcription factors that
turn on certain genes required for cell division. Fig. 19-41 is shown (see also
Fig. 14-18). In a cancerous cell, the mutant Ras cannot be shut off, and the cell will
divide and divide to form a tumor.
Working with Primary Protein Structure Information
Using the tools from the last lab, we will search for your k-Ras gene in the Gene
database. Gene will contain the RefSeq sequence for your protein, which you will
download in FASTA format. FASTA format is defined in your Glossary. Be sure to
review the FASTA definition before you begin your work. QUESTIONS ARE POSTED AT
THE END, TAKE GOOD NOTES, TRY TO ANSWER SOME OF THEM AS YOU GO
ALONG (or else you will have to redo some of this work). Ask me about any answers to
these questions (if you do not know the answer and have made a good attempt). Some of
these questions may be on the final exam.
You will continue to learn about your protein using the SwissProt database. The
SwissProt database is maintained by the Swiss Institute of Bioinformatics and
contains entries for thousands of proteins. You can search for the k-Ras protein that you
are studying by using the gene name given in Gene. The SwissProt entry contains some
of the same information that you found in Gene, but also contains a lot of information
about the protein sequence, structure, and function, summarized in a short, easy-to-read
format.
The ultimate goal for today’s lab is to create a multiple sequence alignment for
your protein using Clustal W. You will use this alignment to identify the protein
mutation, to observe regions of high sequence conservation, and to make an evolutionary
tree for the evolution of Ras. Identifying the protein mutation matters because it helps
explain the link between the protein structure and cancer. The
regions of high sequence conservation are important because they often correspond to
regions in the protein that are important to the protein’s function (e.g., active site,
regulatory region).
Part 1 – Obtaining the basics: Getting sequence information and viewing
the SwissProt and GenBank entries for your protein.
Directions: Follow this guide sheet and answer the questions at the end of this
file.
Translating your patient’s cDNA
1. Below is the mutant cDNA sequence for a patient with a cancer caused by a mutant
K-Ras protein. Highlight the entire sequence and then hit control-C to copy it to the
clipboard. BTW, we know that this is DNA since there are no U bases (only RNA has U;
DNA uses T instead). Also, we always report the coding strand sequence (not the template
strand that is actually used to make the mRNA). Furthermore, this cDNA came from
RT-PCR of mRNA, since the sequence ends in poly-A (poly-A tails are found in mRNA).
GGCCGCGGCGGCGGAGGCAGCAGCGGCGGCGGCAGTGGCGGCGGCGAAGGTGGCG
GCGGCTCGGCCAGTACTCCCGGCCCCCGCCATTTCGGACTGGGAGCGAGCGCGGC
GCAGGCACTGAAGGCGGCGGCGGGGCCAGAGGCTCAGCGGCTCCCAGGTGCGGGA
GAGAGGCCTGCTGAAAATGACTGAATATAAACTTGTGGTAGTTGGAGCTTGTGGC
GTAGGCAAGAGTGCCTTGACGATACAGCTAATTCAGAATCATTTTGTGGACGAAT
ATGATCCAACAATAGAGGATTCCTACAGGAAGCAAGTAGTAATTGATGGAGAAAC
CTGTCTCTTGGATATTCTCGACACAGCAGGTCAAGAGGAGTACAGTGCAATGAGG
GACCAGTACATGAGGACTGGGGAGGGCTTTCTTTGTGTATTTGCCATAAATAATA
CTAAATCATTTGAAGATATTCACCATTATAGAGAACAAATTAAAAGAGTTAAGGA
CTCTGAAGATGTACCTATGGTCCTAGTAGGAAATAAATGTGATTTGCCTTCTAGA
ACAGTAGACACAAAACAGGCTCAGGACTTAGCAAGAAGTTATGGAATTCCTTTTA
TTGAAACATCAGCAAAGACAAGACAGGGTGTTGATGATGCCTTCTATACATTAGT
TCGAGAAATTCGAAAACATAAAGAAAAGATGAGCAAAGATGGTAAAAAGAAGAAA
AAGAAGTCAAAGACAAAGTGTGTAATTATGTAAATACAATTTGTACTTTTTTCTT
AAGGCATACTAGTACAAGTGGTAATTTTTGTACATTACACTAAATTATTAGCATT
TGTTTTAGCATTACCTAATTTTTTTCCTGCTCCATGCAGACTGTTAGCTTTTACC
TTAAATGCTTATTTTAAAATGACAGTGGAAGTTTTTTTTTCCTCTAAGTGCCAGT
ATTCCCAGAGTTTTGGTTTTTGAACTAGCAATGCCTGTGAAAAAGAAACTGAATA
CCTAAGATTTCTGTCTTGGGGTTTTTGGTGCATGCAGTTGATTACTTCTTATTTT
TCTTACCAATTGTGAATGTTGGTGTGAAACAAATTAATGAAGCTTTTGAATCATC
CCTATTCTGTGTTTTATCTAGTCACATAAATGGATTAATTACTAATTTCAGTTGA
GACCTTCTAATTGGTTTTTACTGAAACATTGAGGGAACACAAATTTATGGGCTTC
CTGATGATGATTCTTCTAGGCATCATGTCCTATAGTTTGTCATCCCTGATGAATG
TAAAGTTACACTGTTCACAAAGGTTTTGTCTCCTTTCCACTGCTATTAGTCATGG
TCACTCTCCCCAAAATATTATATTTTTTCTATAAAAAGAAAAAAATGGAAAAAAA
TTACAAGGCAATGGAAACTATTATAAGGCCATTTCCTTTTCACATTAGATAAATT
ACTATAAAGACTCCTAATAGCTTTTCCTGTTAAGGCAGACCCAGTATGAAATGGG
GATTATTATAGCAACCATTTTGGGGCTATATTTACATGCTACTAAATTTTTATAA
TAATTGAAAAGATTTTAACAAGTATAAAAAATTCTCATAGGAATTAAATGTAGTC
TCCCTGTGTCAGACTGCTCTTTCATAGTATAACTTTAAATCTTTTCTTCAACTTG
AGTCTTTGAAGATAGTTTTAATTCTGCTTGTGACATTAAAAGATTATTTGGGCCA
GTTATAGCTTATTAGGTGTTGAAGAGACCAAGGTTGCAAGGCCAGGCCCTGTGTG
AACCTTTGAGCTTTCATAGAGAGTTTCACAGCATGGACTGTGTCCCCACGGTCAT
CCAGTGTTGTCATGCATTGGTTAGTCAAAATGGGGAGGGACTAGGGCAGTTTGGA
TAGCTCAACAAGATACAATCTCACTCTGTGGTGGTCCTGCTGACAAATCAAGAGC
ATTGCTTTTGTTTCTTAAGAAAACAAACTCTTTTTTAAAAATTACTTTTAAATAT
TAACTCAAAAGTTGAGATTTTGGGGTGGTGGTGTGCCAAGACATTAATTTTTTTT
TTAAACAATGAAGTGAAAAAGTTTTACAATCTCTAGGTTTGGCTAGTTCTCTTAA
CACTGGTTAAATTAACATTGCATAAACACTTTTCAAGTCTGATCCATATTTAATA
ATGCTTTAAAATAAAAATAAAAACAATCCTTTTGATAAATTTAAAATGTTACTTA
TTTTAAAATAAATGAAGTGAGATGGCATGGTGAGGTGAAAGTATCACTGGACTAG
GAAGAAGGTGACTTAGGTTCTAGATAGGTGTCTTTTAGGACTCTGATTTTGAGGA
CATCACTTACTATCCATTTCTTCATGTTAAAAGAAGTCATCTCAAACTCTTAGTT
TTTTTTTTTTACAACTATGTAATTTATATTCCATTTACATAAGGATACACTTATT
TGTCAAGCTCAGCACAATCTGTAAATTTTTAACCTATGTTACACCATCTTCAGTG
CCAGTCTTGGGCAAAATTGTGCAAGAGGTGAAGTTTATATTTGAATATCCATTCT
CGTTTTAGGACTCTTCTTCCATATTAGTGTCATCTTGCCTCCCTACCTTCCACAT
GCCCCATGACTTGATGCAGTTTTAATACTTGTAATTCCCCTAACCATAAGATTTA
CTGCTGCTGTGGATATCTCCATGAAGTTTTCCCACTGAGTCACATCAGAAATGCC
CTACATCTTATTTCCTCAGGGCTCAAGAGAATCTGACAGATACCATAAAGGGATT
TGACCTAATCACTAATTTTCAGGTGGTGGCTGATGCTTTGAACATCTCTTTGCTG
CCCAATCCATTAGCGACAGTAGGATTTTTCAAACCTGGTATGAATAGACAGAACC
CTATCCAGTGGAAGGAGAATTTAATAAAGATAGTGCTGAAAGAATTCCTTAGGTA
ATCTATAACTAGGACTACTCCTGGTAACAGTAATACATTCCATTGTTTTAGTAAC
CAGAAATCTTCATGCAATGAAAAATACTTTAATTCATGAAGCTTACTTTTTTTTT
TTGGTGTCAGAGTCTCGCTCTTGTCACCCAGGCTGGAATGCAGTGGCGCCATCTC
AGCTCACTGCAACCTCCATCTCCCAGGTTCAAGCGATTCTCGTGCCTCGGCCTCC
TGAGTAGCTGGGATTACAGGCGTGTGCCACTACACTCAACTAATTTTTGTATTTT
TAGGAGAGACGGGGTTTCACCCTGTTGGCCAGGCTGGTCTCGAACTCCTGACCTC
AAGTGATTCACCCACCTTGGCCTCATAAACCTGTTTTGCAGAACTCATTTATTCA
GCAAATATTTATTGAGTGCCTACCAGATGCCAGTCACCGCACAAGGCACTGGGTA
TATGGTATCCCCAAACAAGAGACATAATCCCGGTCCTTAGGTAGTGCTAGTGTGG
TCTGTAATATCTTACTAAGGCCTTTGGTATACGACCCAGAGATAACACGATGCGT
ATTTTAGTTTTGCAAAGAAGGGGTTTGGTCTCTGTGCCAGCTCTATAATTGTTTT
GCTACGATTCCACTGAAACTCTTCGATCAAGCTACTTTATGTAAATCACTTCATT
GTTTTAAAGGAATAAACTTGATTATATTGTTTTTTTATTTGGCATAACTGTGATT
CTTTTAGGACAATTACTGTACACATTAAGGTGTATGTCAGATATTCATATTGACC
CAAATGTGTAATATTCCAGTTTTCTCTGCATAAGTAATTAAAATATACTTAAAAA
TTAATAGTTTTATCTGGGTACAAATAAACAGGTGCCTGAACTAGTTCACAGACAA
GGAAACTTCTATGTAAAAATCACTATGATTTCTGAATTGCTATGTGAAACTACAG
ATCTTTGGAACACTGTTTAGGTAGGGTGTTAAGACTTACACAGTACCTCGTTTCT
ACACAGAGAAAGAAATGGCCATACTTCAGGAACTGCAGTGCTTATGAGGGGATAT
TTAGGCCTCTTGAATTTTTGATGTAGATGGGCATTTTTTTAAGGTAGTGGTTAAT
TACCTTTATGTGAACTTTGAATGGTTTAACAAAAGATTTGTTTTTGTAGAGATTT
TAAAGGGGGAGAATTCTAGAAATAAATGTTACCTAATTATTACAGCCTTAAAGAC
AAAAATCCTTGTTGAAGTTTTTTTAAAAAAAGCTAAATTACATAGACTTAGGCAT
TAACATGTTTGTGGAAGAATATAGCAGACGTATATTGTATCATTTGAGTGAATGT
TCCCAAGTAGGCATTCTAGGCTCTATTTAACTGAGTCACACTGCATAGGAATTTA
GAACCTAACTTTTATAGGTTATCAAAACTGTTGTCACCATTGCACAATTTTGTCC
TAATATATACATAGAAACTTTGTGGGGCATGTTAAGTTACAGTTTGCACAAGTTC
ATCTCATTTGTATTCCATTGATTTTTTTTTTCTTCTAAACATTTTTTCTTCAAAC
AGTATATAACTTTTTTTAGGGGATTTTTTTTTAGACAGCAAAAACTATCTGAAGA
TTTCCATTTGTCAAAAAGTAATGATTTCTTGATAATTGTGTAGTAATGTTTTTTA
GAACCCAGCAGTTACCTTAAAGCTGAATTTATATTTAGTAACTTCTGTGTTAATA
CTGGATAGCATGAATTCTGCATTGAGAAACTGAATAGCTGTCATAAAATGAAACT
TTCTTTCTAAAGAAAGATACTCACATGAGTTCTTGAAGAATAGTCATAACTAGAT
TAAGATCTGTGTTTTAGTTTAATAGTTTGAAGTGCCTGTTTGGGATAATGATAGG
TAATTTAGATGAATTTAGGGGAAAAAAAAGTTATCTGCAGATATGTTGAGGGCCC
ATCTCTCCCCCCACACCCCCACAGAGCTAACTGGGTTACAGTGTTTTATCCGAAA
GTTTCCAATTCCACTGTCTTGTGTTTTCATGTTGAAAATACTTTTGCATTTTTCC
TTTGAGTGCCAATTTCTTACTAGTACTATTTCTTAATGTAACATGTTTACCTGGA
ATGTATTTTAACTATTTTTGTATAGTGTAAACTGAAACATGCACATTTTGTACAT
TGTGCTTTCTTTTGTGGGACATATGCAGTGTGATCCAGTTGTTTTCCATCATTTG
GTTGCGCTGACCTAGGAATGTTGGTCATATCAAACATTAAAAATGACCACTCTTT
TAATTGAAATTAACTTTTAAATGTTTATAGGAGTATGTGCTGTGAAGTGATCTAA
AATTTGTAATATTTTTGTCATGAACTGTACTACTCCTAATTATTGTAATGTAATA
AAAATAGTTACAGTGACAAAAAAAAAAAAAAA
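The two observations from step 1 (no U bases, and a poly-A ending) can also be checked with a few lines of code. This is only a minimal sketch; the function names are made up for illustration, and only the last line of the sequence is used to keep the example short.

```python
def looks_like_dna(seq: str) -> bool:
    """Return True if the sequence uses only A, C, G, and T (no U, so not RNA)."""
    return set(seq.upper()) <= set("ACGT")


def poly_a_tail_length(seq: str) -> int:
    """Count the consecutive A bases at the 3' end of the sequence."""
    s = seq.upper()
    return len(s) - len(s.rstrip("A"))


# Last line of the patient's cDNA, copied from the sequence above.
tail = "AAAATAGTTACAGTGACAAAAAAAAAAAAAAA"
print("Only A/C/G/T:", looks_like_dna(tail))
print("Length of poly-A run at the end:", poly_a_tail_length(tail))
```

Pasting the whole cDNA (with line breaks removed) into `tail` would run the same checks on the full sequence.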
2. Go to the Sequence Manipulation Suite (http://bioinformatics.org/sms/ ). We want to
get the amino acid sequence from this cDNA sequence from the cancer patient. We will
compare this amino acid sequence with the one from a person with the normal gene.
3. In the menu to the left, click on “show translation” found under the heading “DNA
figures.” Paste the above sequence into the first box, and under this box, at “Show the
translation for…”, open the drop-down box and choose “reading frame 2.”
After hitting submit, a new window pops up containing the original nucleotide base
sequence with one-letter amino acid symbols. Note that each amino acid is listed above
its three bases and that amino acid number 61 is M (stands for methionine; see the
table from your text). This is the actual beginning of the protein (all proteins begin
with methionine). Highlight the web page with the info and paste it into a Word file
(remember to take the Word file with you or send it to yourself by email when done).
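Why reading frame 2, and why amino acid number 61? The arithmetic can be sketched as below. It assumes that the ATG start codon begins at nucleotide 182 of the cDNA (the alignment in question 5 numbers the base just before the ATG as 181); the function names are made up for illustration.

```python
def reading_frame(start_pos: int) -> int:
    """Reading frame (1, 2, or 3) whose codons line up with a codon
    that starts at the 1-based nucleotide position start_pos."""
    return (start_pos - 1) % 3 + 1


def codon_number(start_pos: int, frame: int) -> int:
    """1-based index of the codon starting at start_pos when the
    sequence is translated from the given reading frame."""
    offset = start_pos - frame
    if offset < 0 or offset % 3 != 0:
        raise ValueError("start_pos does not begin a codon in this frame")
    return offset // 3 + 1


atg_position = 182  # assumption: first base of the start codon
frame = reading_frame(atg_position)
print("Reading frame:", frame)
print("Codon number of the start M:", codon_number(atg_position, frame))
```

Frame 2 skips the first base, so its codons cover bases 2-4, 5-7, and so on; the codon at 182-184 is then the 61st.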
4. In the menu to the left, click on “Translate” found under the heading “DNA analysis.”
Clear the search box, then paste your patient’s cDNA sequence into the search box.
Choose “Reading Frame 2” from the reading frame pull-down menu, then click “Submit.”
5. You should be able to find the sequence of your protein by finding the first methionine
(M), then continuing until you see the first “*” which is a stop codon. Copy the protein
sequence in that region, starting with the first “M” and paste it into a word document.
Save the results in the same word file that you have started. Now you have saved the file
of the mutant protein sequence.
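For those curious about what the web tool is doing in steps 4 and 5, here is a minimal sketch in Python: translate from a chosen reading frame using the standard genetic code, then keep everything from the first M up to the first stop (*). The function names and the short test sequence are made up for illustration; this is not the Sequence Manipulation Suite's own code.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 1) -> str:
    """Translate the coding strand starting at reading frame 1, 2, or 3.
    Stop codons become '*'; codons with unknown letters become 'X'."""
    s = dna.upper()[frame - 1:]
    return "".join(
        CODON_TABLE.get(s[i:i + 3], "X") for i in range(0, len(s) - 2, 3)
    )


def first_orf(protein: str) -> str:
    """Return the residues from the first M up to (not including) the first '*'."""
    start = protein.find("M")
    if start == -1:
        return ""
    stop = protein.find("*", start)
    return protein[start:] if stop == -1 else protein[start:stop]


# Short made-up example, not the patient's sequence.
toy = "AATGACTGAATATAAATAAGG"
frame2 = translate(toy, frame=2)
print(frame2)
print(first_orf(frame2))
```

To use it on the real data, replace `toy` with the patient's cDNA (no spaces or line breaks) and keep `frame=2`.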
6. Using “Entrez Gene” on the NCBI website
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gene), find the entry
for the protein you are studying by searching with the protein name- note the other
possible ways of searching. Cut and paste in the following gene name: Homo sapiens
KRas2. After the search, you will find 5 entries, pick the one that is “KRAS” and from
Homo sapiens. For this KRAS entry, answer the following:
a. what is the official symbol:
b. note some other names (aliases):
c. located on which chromosome (remember we have 23 pairs of homologous
chromosomes, 46 total, with Mom giving you one of the homologous chromosomes and Dad
giving you the other):
d. Gene ID:
7. Click on the highlighted KRAS and you open Entrez Gene for more info on the gene.
Read the summary info. Can you understand this information? When it says
“Alternative splicing leads to variants encoding two isoforms that differ in the C-terminal
region” this means that one pre-mRNA is made from the gene, but the processing of the
pre-mRNA can differ (different sections are cut out). So two proteins (isoform a and
isoform b) result from the same gene/pre-mRNA. In this case, isoforms are different
proteins from the same gene. Highlight all on this web page, hit control-c to copy and
then paste into your Word file. When you are all done, print off this Word file and put it
in your notebook.
On this page, look at the Genomic Regions: the 5’ is the beginning and the 3’ is
the end. Note that the short vertical red line means a section that is used for the protein
(coding regions), and the blue section is not (called an untranslated region- UTR-- so,
there is a 5’ UTR at the beginning and a 3’ UTR at the end of the sequence). Note the
first three coding regions are the same for both isoform a and b, but the last one differs
(in isoform b, the last coding region is right up against the 3’ UTR). Next, glance
through the Genomic Context (again, note it is on chromosome 12) and then
Bibliography (you will be asked to look at one paper on this list).
Go down to Pathways and look at KEGG pathway: Insulin signaling pathway
04910. In this path, note that insulin activates the receptor, which activates the insulin
receptor substrate (IRS), which turns on GRB2, SOS, Ras, Raf, MEK, ERK1/2 (same
path we looked at before in Xenopus oocytes). MAKE SURE AS MUCH OF THE
TABLE IS IN VIEW and then hit the “print screen” button and go to your Word file and
hit control-v to paste the pathway into your Word file. Note that the insulin path is a little
different from the growth factor path (see text figure on page 1 of this protocol); the
insulin receptor is already dimerized and there is an insulin receptor substrate (IRS)
inserted into the path before Grb2/SOS. The end results in this insulin path are
(1)___________________________________ and (2)___________________________.
On the KRAS page, go down to RefSeq section. Compare variant a and b now (use text
in web site; which exons are used by what isoform/variant)—note which isoform or
variant is rare and which is common:
Click on the mRNA Sequence for “variant (b)” (not variant (a)). Use the RefSeq
entries for the mRNA and protein sequences for K-Ras2 isoform b – also called “variant
(b).” Go down to the protein sequence (/translation=") and save the sequence in
FASTA format in your Word file. Remember this is the un-mutated protein sequence; name
it the “protein sequence for the normal proto-oncogene.” Use the font “Courier
New” and get rid of all gaps. Also, go to the bottom of the web page and copy the gene –
copy the sequence of bases that begins after ORIGIN and goes down to number 5281 – and
put it in your Word file for your lab notebook.
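Removing the position numbers and gaps from a copied ORIGIN block by hand is tedious. A minimal sketch of doing it in Python follows; the function name is made up, and it simply keeps the letters and drops everything else.

```python
import re


def clean_origin_block(text: str) -> str:
    """Strip position numbers, spaces, and line breaks from a GenBank
    ORIGIN block, returning one continuous upper-case sequence."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


# One line in the ORIGIN layout (position number, then blocks of 10 bases).
origin_line = "181 aatgactgaa tataaacttg tggtagttgg"
print(clean_origin_block(origin_line))
```

The same function works for the protein sequence copied from /translation=, since it also contains only letters once the gaps are removed.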
8. Go to the ExPASy website (http://us.expasy.org/) and search for the SwissProt entry
for your protein using “kras2.” Be sure to select the human protein from the list of
results. Make sure the information in the entry is the same as you saw in the Gene entry.
If your protein is an enzyme, the EC number is a good way to double-check—it is
_______________. You may want to record the SwissProt entry number (primary
accession number P01116) in case you want to find this entry again. Note that we could
get the normal gene here also (at bottom, directly in FASTA format)- you want to save
this version also.
Part 2 – Protein-protein BLAST: Finding homologous (similar) proteins
9. Search for similar proteins by a BLAST (from NCBI home page or:
http://www.ncbi.nlm.nih.gov/BLAST) search using the RefSeq or SwissProt protein
FASTA sequence (the unmutated protein sequence). BLAST is a program that
compares your input sequence to all the sequences in a database (that you choose). This
program aligns the most similar segments between the two sequences (using a scoring
matrix similar to BLOSUM - see entry). This scoring method gives penalties for gaps and
gives the highest score for identical residues. Substitutions are scored based on how
conservative the changes are (e.g., a small nonpolar amino acid replaced by a large
nonpolar amino acid). The output shows a list of sequences, with the highest scoring
sequence at the top. The significance of each hit is given as an E-value. The lower the
E-value, the higher scoring the sequence is. Sequences with E-values in the range of
10^-100 to 10^-50 are very similar (or even identical) to the query. Sequences with
E-values of 10^-10 and higher need to be examined by other methods to determine
homology. An E-value of 10^-10 for a sequence can be interpreted, roughly, as “a 1 in
10^10 chance that the sequence was pulled from the database by chance alone (has no
homology to the query sequence).”
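To make the scoring idea concrete, here is a toy sketch of scoring two already-aligned sequences: identical residues score highest, a conservative substitution scores a little, other substitutions lose points, and gaps are penalized. The numbers and the short list of "conservative" pairs are invented for illustration; they are not BLOSUM values and this is not how BLAST itself is implemented.

```python
# Toy values chosen for illustration only (NOT the BLOSUM matrix).
MATCH, CONSERVATIVE, MISMATCH, GAP = 4, 1, -2, -4

# A few commonly cited conservative substitutions (similar side chains).
CONSERVATIVE_PAIRS = {frozenset(pair) for pair in ("VI", "LI", "LV", "KR", "DE")}


def score_alignment(seq_a: str, seq_b: str) -> int:
    """Score two aligned sequences of equal length; '-' marks a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            total += GAP
        elif x == y:
            total += MATCH
        elif frozenset((x, y)) in CONSERVATIVE_PAIRS:
            total += CONSERVATIVE
        else:
            total += MISMATCH
    return total


print(score_alignment("MTEYK", "MTEYK"))  # identical
print(score_alignment("MTEYK", "MTDYK"))  # conservative change (E to D)
print(score_alignment("MTEYK", "MTWYK"))  # non-conservative change
print(score_alignment("MTEYK", "MT-YK"))  # gap
```

The ranking of the four scores, not the exact numbers, is the point: identical > conservative change > other change > gap.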
First, under Protein, select PSI-PHI BLAST. Then paste the FASTA formatted protein
sequence in the search box. Select the nrprotein database. Click “BLAST” to begin.
You may need to wait a few minutes before the results page opens. On the next page that
appears, you will see that putative conserved domains have been found. Select
“Format.” After obtaining the results, choose 5 sequences from various positions in the
results (under Sequences producing significant alignments). Be sure not to choose any
sequences that are human (see SOURCE), since they are the same as your search
sequence. Choose ones for “lower” animals: rat, Tetraodon nigroviridis, Xenopus,
Rivulus marmoratus, Oryzias latipes (Japanese medaka), etc. The goal is to choose a
variety of sequences that greatly differ in evolutionary distance from the human protein.
Be sure to choose a good variety of sequences from the BLAST search. The more varied
the sequences, the more interesting the resulting phylogram will be.
Be sure the wild type human (RefSeq) and mutant sequences only differ by one
amino acid residue. If more differences are found, there may have been a mistake in the
translation of the mutant sequence.
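A quick way to confirm that the two sequences differ at exactly one residue is to compare them position by position. The sketch below uses made-up function names and short made-up sequences (not the K-Ras sequences), and it assumes both sequences have the same length. It also prints each difference in the “Res123Res” style asked for in the questions at the end.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def find_differences(wild_type: str, mutant: str) -> list[tuple[int, str, str]]:
    """List (position, wild-type residue, mutant residue) for every mismatch.
    Positions are 1-based; both sequences must have the same length."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences differ in length; check the translation")
    return [
        (pos, w, m)
        for pos, (w, m) in enumerate(zip(wild_type.upper(), mutant.upper()), start=1)
        if w != m
    ]


def format_mutation(wt: str, pos: int, mut: str) -> str:
    """Write a substitution with three-letter codes, e.g. Leu3Ile."""
    return f"{THREE_LETTER[wt]}{pos}{THREE_LETTER[mut]}"


# Made-up example sequences.
diffs = find_differences("MKLVA", "MKIVA")
print(len(diffs), "difference(s)")
for pos, wt, mut in diffs:
    print(format_mutation(wt, pos, mut))
```

If more than one difference is printed for your real sequences, recheck the reading frame used for the mutant translation.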
For each of the five sequences, click on the sequence name to view the GenBank
entry for the sequence. Then view the sequence in FASTA format. Copy and paste all the
FASTA formatted sequences into the same Word file (get rid of any gaps or numbers). At
the beginning of this file, make sure that you have your mutant protein sequence (see
very start of this exercise), also in FASTA format.
10. This Word file will be used to create the multiple sequence alignment, so the
formatting is very important. Get rid of all gaps, especially those at the end of each line (go to
the end of each line, and hit delete until you start deleting sequence amino acid symbols).
You should end up with a Word file that contains the 5 sequences from the BLAST
search plus the un-mutated human protein sequence and your mutant sequence for a total
of 7 sequences.
Each sequence should be in FASTA format and contain a title line (starting with >, then
text, then a return). Shorten the text to contain JUST the species information so it will fit
in one line!! For example, you should erase the “gi” line and add in something simpler
like “pig,” “cow,” etc. Your mutant sequence should read “>mutant”. At the end of each
title, be sure to press return to separate it from the rest of the sequence.
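The clean-up described in step 10 can also be done with a short script. This is a minimal sketch with a made-up function name: it builds one FASTA record from a short title and a pasted sequence, removing spaces, numbers, and line breaks. Joining the words of the title with underscores is my own precaution so the title stays a single word; it is not a stated ClustalW requirement.

```python
import re


def fasta_record(title: str, sequence: str, width: int = 60) -> str:
    """Build a FASTA record: a '>' title line, then the sequence with all
    gaps and numbers removed, wrapped at `width` letters per line."""
    header = ">" + "_".join(title.split())
    letters = re.sub(r"[^A-Za-z]", "", sequence).upper()
    lines = [letters[i:i + width] for i in range(0, len(letters), width)]
    return "\n".join([header] + lines)


# Made-up fragment pasted with numbers and stray spaces.
print(fasta_record("mutant", "  1 mteyk lvvvg \n 11 a  "))
```

Calling the function once per sequence and pasting the seven results one after another gives a file in the layout described above.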
Part 3 – Multiple Sequence Alignment
11. Go to the ClustalW website (http://www.ebi.ac.uk/clustalw/index.html) and enter (by
using “copy” and “paste”) all your FASTA formatted sequence into the data entry box.
The default parameters will work for us, except for the output order.
a. Select “input” for the Output order
b. Press “run”
12. When the results come up, click on Edit on the top bar of the internet explorer tool
bar, and then highlight Select All. Or you can highlight the central text (getting rid of the
heading at the top of the page) and copy. After items are highlighted, hit control-c to
copy and then paste into your Word file.
Next, let’s center in on the alignment – click on View Alignment File and copy all and
paste into your Word file. It may look broken up. Follow these steps to make it readable
again.
a. Select the alignment text (highlight it with your mouse)
b. Change the font to size 10 and Courier New
c. Change the page set-up (first hit FILE) to landscape
d. Save the file to your desktop
13. To save the “cladogram tree,” make sure that the diagram is in the center of your
monitor, then hit the “Print Screen/SysRq” key on your keyboard. Then go to your Word
file and make sure that you are at the spot where you want the diagram, then hit Control-V or
the paste icon (or under Edit) to put the picture in your Word file. Repeat the process but
first click on “Show as phylogram Tree.” From the web site: “Phylogram is a branching
diagram (tree) assumed to be an estimate of a phylogeny, branch lengths are proportional
to the amount of inferred evolutionary change. A Cladogram is a branching diagram
(tree) assumed to be an estimate of a phylogeny where the branches are of equal length,
thus cladograms show common ancestry, but do not indicate the amount of evolutionary
"time" separating taxa. Tree distances can be shown, just click on the diagram to get a
menu of options. The ".dnd" file is a file that describes the phylogenetic tree.” With your
phylogram, you might note that Xenopus separated into a species about 400 million
years ago. Other animals: birds arose about 170 million years ago; mammals about 220
million years ago; Reptiles about 320 million years ago; Amphibians about 400 million
years ago and fish about 500 million years ago.
Scroll through the file and alignment and make sure none of the blocks of sequences are
separated by a page break. Save and print the alignment- it will be part of your lab
notebook.
OMIM search
14. Search the OMIM database
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM&cmd=search&term=)
by typing in: “lung Cancer” KRAS2. You have to put quotation marks around the two words
to make sure that you link them (or you get pages of breast cancer, etc).
15. Click on the first two items that come up on lung cancer and Ras. Read through
them. If you want, when a new page comes up, use Edit, Find (on this web page) and
type in ras to go right to a note about ras. An outline for the entry is provided to the left
of the window. Go to the references and read the first article listed. Go to the Allelic
Variants section. Scroll until you see the entry for the Gly12Cys mutation (or G12C).
Answer appropriate questions at the end of this exercise.
Examination of K-Ras structure with Kinemage:
1. Open the program Kinemage (should be on C drive) or download it from the internet.
2. Download the file called: c14Recp.kin and save it next to the Kinemage program.
3. See figure below, small part on right: Ras is essentially a flat plane (a beta-pleated
sheet made up of 6 BLUE beta strands), with GREEN alpha helices sitting on both sides
and on the edge (see lower right hand small figure in illustration below).
[Figure: structure of Ras. Labels: amino acid #1; glycine #12 (Gly12), located in the
P/G1 loop, which helps bind GTP.]
Find the structures of Ras noted in the figure above with Kinemage. Open Mage
program, open file c14Recp.kin and then you have to go from the first figure (that of
Src) to the second (showing Ras) – you will not need the third figure.
4. Ras: View1 of 4. The backbone of the amino acid chain is in white, the bound GTP
analogue in pink, and the Mg++ in yellow. Mg is captured by Ras to help the protein
bind to the phosphate part of GTP; thus, Mg is a cofactor (also a “trace element,”
required in trace amounts). Rotate the Ras so that you get the same arrangement as
shown in the PowerPoint slide (GTP on lower right side, N terminus on the left side);
the flat sheet of the 6 beta strands is up and down, with the GTP binding site on top (see
figure above). Make sure you read the captions for each of the four views of Ras. Zoom
out (top right slider bar) to find the N terminus. Remember that the P loop is just down
the chain from the N terminus (the beginning of the protein). In this orientation, which
of the two G’s represents Glycine 12 (versus Glycine 13)? The G on the left or right?
With the GTP site on lower right, can you count down from the N terminus?
5. View2 (remember how you go from the first view to the second?): Glycine 12 (where
the most common mutation occurs) and Glycine 13 are labeled in green and are found on
what is called the G1 or P loop. These 2 Glycines are located in a critical part of one of
the main GTP-binding loops (the G1 or P loop). They are the two major sites of
mutations that convert this enzyme into an oncogene - when these Gly's mutate to Cys,
the GTP cannot be broken down (GTPase activity of ras is reduced) so Ras stays in the
"on" state more of the time- causing cancer.
Draw (block diagram only, not atoms and bonds) GTP structure, then point to and name
the three parts of GTP
Can you identify the 3 parts of GTP in the kinemage image?
6. To see details of interactions at the binding site, go to View3 and turn on "interact"
(click on its box). Now, you can see the R group sidechains in cyan and weak H bonds in
purple. Remember that you can zoom in, and alter the Z slab so that you can see atoms
behind the plane of view.
With view three and “interact” shown, you can also click on the atoms and find the
number and name of the amino acid and which atom of the amino acid you clicked. The
beginning of the amino acid has an amino group (for glycine 12, click on parts of the
white amino acid chain and look for “n”); the tail end of the amino acid has the carboxy
group, noted by “c”; and “ca” stands for the alpha carbon in the middle of the amino
acid. See the figure (Fig. 3-3 in sixth ed) showing how two amino acids are connected.
In the Kinemage image, find the “amino” group NH at the beginning of glycine and the
beginning of
the second amino acid alanine. Note the R group comes off of the alpha (or central)
carbon and, for glycine, the R group consists of two H atoms. The R group of Alanine
has a carbon and three hydrogen atoms (called a methyl group).
7. Glycine 13 interacts with what part of GTP? (see view three, and click on interactions
to see blue lines representing weak H bonds; which of GTP’s 3 parts interacts with
glycine 13?).
8. Does glycine 12 (the one that mutates most commonly) have any weak interactions
(blue line) with the GTP? Yes or No (circle one)
How do you interpret this answer?
9. Which of the three parts of GTP interacts with the cofactor (in yellow)?
10. How many weak bonds link GTP to Ras?
11. Draw and compare the R group of glycine with that of cysteine; which R group is
larger?
12. What type of amino acid are they (remember there are three types of amino acids
based on the R group)? Why might changing from a G to a C cause a problem?
Questions over bioinformatics exercise on Ras and cancer
1. In the first or second reference from step 7 on page 6 (Entrez Gene, bibliography)
above, go to a paper and read the abstract: who is the first author on this first article and
what is the reference for this article (journal name, volume, page numbers, year)?
2. Describe how many and what categories of patients were involved in the study?
3. What did the researchers find out about K-Ras mutations?
4. What conclusion(s) did the researchers come to about K-Ras mutations based on their
data? (Summarize and put into your own words)
5. Using your Word file, look at the gene sequence for the normal k-Ras, and the mutant,
cancer causing k-Ras. Look for the point mutation in the genes. To do this, use your
“The Sequence Manipulation Suite: Show Translation- for the cancer patient’s mutant
ras: Results for 5312 residue sequence starting "GGCCGCGGCG"” and the equivalent
for the normal gene (see hint below). Find the first M or methionine (proteins begin with
this amino acid), and then count down to the 12th amino acid.
What is the three base code in the normal gene for this 12th amino acid--specifies G: ___
What is the three base code in the mutant gene for the 12th amino acid--specifies C:____
Hint: see my alignment below…note that the amino acid is listed above the first base of
the triplet that specifies the amino acid, start with M (methionine) and count down 12
amino acids…
NORMAL GENE:
181  aatgactgaa tataaacttg tggtagttgg agct ??? ggc gtaggcaaga gtgccttgac
Mutant gene:
      61
      M  T  E  Y  K  L  V  V  V  G  A  C  G  V  G  K  S  A  L  T
181  AATGACTGAATATAAACTTGTGGTAGTTGGAGCT???GGCGTAGGCAAGAGTGCCTTGAC
Note that the M is above ATG- this is where
the gene actually starts. Look at your Genetic
code table; why does it say methionine is
AUG? The genetic code table gives the codon
for an amino acid from mRNA- not DNA.
However, gene DNA sequences are given for
the coding strand of DNA (not the template
strand because the coding strand and mRNA
are the same except that T is replaced with a
U in the mRNA).
DNA coding strand:    ---ATG---
DNA template strand:  ---TAC---
mRNA:                 ---AUG---
protein:              methionine
Note how A pairs with T (or U in mRNA) and
G pairs with C.
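The counting in the hint and the strand relationships above can be sketched in a few lines of Python. The function names and the short test sequence are made up for illustration; the example deliberately does not use the K-Ras sequence, so the two blanks in question 5 are still yours to fill in.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def codon_at(cdna: str, start_pos: int, residue_number: int) -> str:
    """Return the three bases coding for a given residue, where start_pos is
    the 1-based position of the A in the ATG start codon (residue 1)."""
    begin = (start_pos - 1) + 3 * (residue_number - 1)
    codon = cdna[begin:begin + 3]
    if len(codon) < 3:
        raise ValueError("sequence is too short for that residue")
    return codon.upper()


def template_strand(coding: str) -> str:
    """Base-pair partner of the coding strand, written directly underneath it
    (A pairs with T, G pairs with C)."""
    return coding.upper().translate(COMPLEMENT)


def transcribe(coding: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding.upper().replace("T", "U")


# Made-up sequence: the ATG starts at position 3.
toy = "CCATGAAATTCGGG"
third = codon_at(toy, start_pos=3, residue_number=3)
print("coding strand:  ", third)
print("template strand:", template_strand(third))
print("mRNA codon:     ", transcribe(third))
```

For the lab data, `start_pos` would be the position of the ATG in your cDNA and `residue_number` would be 12.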
6. List the steps from insulin to the final cellular events; use the KEGG pathway for
insulin.
7. Elk-1, c-Jun, and c-Fos are transcription factors. In general, what do they do as a result
of Ras activation?
8. If Ras were mutated to be always active, what part of the pathway becomes irrelevant?
Concerning its Entrez Gene entry
9. Fill in the following information
a. Write the GeneID number here ________________.
b. What is the gene name?
c. Where in the human genome is this gene located?
d. What is the RefSeq number for the mRNA sequence for isoform b?
e. What is the RefSeq number for the protein sequence for isoform b?
Concerning Swiss-Prot Entry
10. How many splice variants are there of K-Ras2 and what are they called?
11. Describe how K-Ras2 is activated and inactivated.
12. What proteins does K-Ras2 interact with? (Hint: GDP and GTP are not proteins)
Concerning Multiple Sequence Alignment with ClustalW:
13. What is the mutation in the amino acid mutant sequence? Write it in the following
format “Res123Res” where the first Res is the three-letter code for the amino acid in the
un-mutated (wild type) protein and the second Res is the amino acid in the mutated
protein. In place of “123” put the amino acid residue number of the mutation.
14. Is the mutation in a region of conservation? Look for columns with many * or :
(blanks mean no conservation; what does the colon mean?). YES
NO
If the sequence stays the same throughout the evolutionary path, the sequence (or part of
a gene/protein) is said to be conserved- this probably means that the sequence plays a
crucial role in the active site (or in a regulatory region).
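The * marks under the ClustalW alignment can be reproduced for fully identical columns with a short script. This is a minimal sketch with a made-up function name and made-up sequences; it only marks columns where every sequence has the same residue and leaves every other column blank, so it does not try to reproduce the other ClustalW symbols.

```python
def identity_line(aligned: list[str]) -> str:
    """Return a line with '*' under each column where all aligned sequences
    carry the same residue, and a space under every other column."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return "".join(
        "*" if len(set(column)) == 1 and "-" not in column else " "
        for column in zip(*aligned)
    )


# Made-up aligned fragments from three imaginary species.
alignment = ["MTEYK", "MTDYK", "MSEYK"]
for seq in alignment:
    print(seq)
print(identity_line(alignment))
```

Long runs of * in your real alignment are the conserved regions referred to in question 14.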
15. Based on the alignment, what span of amino acids is LEAST conserved? Does this
correlate with the region specified in the Swiss-Prot entry as “hypervariable”? These
sections or sequences can vary over evolutionary time because they may not be important
to the functioning of the protein (they are probably not in the active site; although the
sequence could play a role in regulation).