251-06 Exam1 2-15

Bio/CS 251 Bioinformatics
EXAM 1
Dr. James
February 15, 2006
Spring 2006
1. (10 pts) A Bioinformatics student was asked to draw the chemical structure of an adenine- and thymine-containing dinucleotide derived from DNA. The student’s answer is shown below. The student made more
than six major errors. One of them is circled, numbered, and explained.
a. Find four (4) other errors, circle them, number them, and briefly explain what is wrong and how it would
look if correct.
#1 – extra phosphate should not be present
#2 – Uracil not present in DNA; this base should be thymine (5-methyl-uracil) or cytosine
#3 – Adenine base structure: reverse N8 and C9 to N9 and C8
#4 – Adenine bond to deoxyribose is incorrect: C1’ of deoxyribose should bond to N9 of Adenine
#5 – Terminal sugar is ribose, not deoxyribose; replace 2’ OH with H
#6 – C5’ of 2nd deoxyribose is bonded to 5 atoms. Carbon can bond to only 4 atoms. The –OH group
must be removed.
#7, 8 – Various double bonds are misplaced in both the Adenine and Uracil rings
b. On the diagram, correctly number each ring atom of the sugar.
c. Draw an arrow along the diagram to show the strand polarity, indicating the 5’ and 3’ end of the strand.
2. (14 points) Consider the following peptide sequences, written from NH2- to COOH end:
(refer to your handouts where needed)
Position:  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
Polarity:  H    PC   PC   H    PU   PC   PU   H    PU   PU   H    H    PU   H    H    PC   PU   H
I:         Leu  Arg  Asp  Val  Tyr  His  Gln  Leu  Asn  Ser  Val  Met  Thr  Trp  Leu  Glu  Asn  Ile
Position:  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
Polarity:  o    +    o    +    o    +    o    +    o    +    o    +    +    o    o    +    o    +
II:        Ile  His  Val  Lys  Trp  Arg  Ala  Lys  Leu  Arg  Met  Lys  Glu  Pro  Gly  Arg  Leu  Lys
(o = hydrophobic; + = Charged, basic)
a. Above each aa in the first peptide (I), record the polarity as Hydrophobic (H), Polar Uncharged (PU), or
Polar Charged (PC).
b. Choose the peptide most likely to form an amphipathic α-helix, and do the following:
Peptide I will form an amphipathic α-helix.
(1) Using the rules for formation of an α-helix, record the first five pairs of aa that would interact
via H-bonding to form the α-helix, by filling in the blanks below (e.g., Tyrx – Trpy, etc).
__Leu__ – __Tyr__
__Arg__ – __His__
__Asp__ – __Gln__
__Val__ – __Leu__
__Tyr__ – __Asn__
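The pairs above follow the usual α-helix rule that residue i hydrogen-bonds with residue i+4. A minimal Python sketch of that rule is given here for illustration only; the function name `helix_hbond_pairs` and its parameters are assumptions made for this example, not part of the exam.

```python
# Peptide I from the exam, written from the NH2- end to the COOH end.
PEPTIDE_I = ("Leu Arg Asp Val Tyr His Gln Leu Asn "
             "Ser Val Met Thr Trp Leu Glu Asn Ile").split()


def helix_hbond_pairs(residues, count=5, offset=4):
    """Return the first `count` (i, i+offset) pairs, labelled with 1-based positions."""
    pairs = []
    for i in range(min(count, len(residues) - offset)):
        first = f"{residues[i]}{i + 1}"
        second = f"{residues[i + offset]}{i + offset + 1}"
        pairs.append((first, second))
    return pairs


for first, second in helix_hbond_pairs(PEPTIDE_I):
    print(f"{first} - {second}")
```

The printed pairs carry position numbers so that the two leucines and the repeated Tyr5 are easy to tell apart.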
(2) Based on your understanding of α-helix structure, explain how you know that this α-helix would be
amphipathic.
An amphipathic molecule is one that is charged at one end, or along one side, and uncharged at
the other end, or on the opposite side. An amphipathic α-helix has a hydrophobic face and a
hydrophilic face. An α-helix contains ~3.6 aa per turn. Thus, every 3rd or 4th amino acid will
occupy the same face, or side, of the helix. In an α-helix with a hydrophobic
face, one would expect that hydrophobic aa would occupy, e.g.,
position # 1 -- 3/4 -- 7 -- 11 -- 14 -- 17/18 -- 21 -- etc.
In peptide I above, the occurrence of hydrophobic residues is close to the expected pattern.
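The periodicity argument can be checked with a simple helical-wheel calculation: at ~3.6 residues per turn, each residue is rotated about 100° from the previous one. The sketch below is illustrative only; the name `wheel_angles` and the choice to put residue 1 at 0° are assumptions, and the hydrophobic set is simply the residues marked H in part (a).

```python
PEPTIDE_I = ("Leu Arg Asp Val Tyr His Gln Leu Asn "
             "Ser Val Met Thr Trp Leu Glu Asn Ile").split()
HYDROPHOBIC = {"Leu", "Val", "Met", "Trp", "Ile"}  # residues marked H in part (a)


def wheel_angles(residues, degrees_per_residue=100):
    """Angle (0-359) of each residue around the helix axis; residue 1 sits at 0."""
    return [(i * degrees_per_residue) % 360 for i in range(len(residues))]


angles = wheel_angles(PEPTIDE_I)
hydrophobic_angles = {
    f"{res}{i + 1}": angle
    for i, (res, angle) in enumerate(zip(PEPTIDE_I, angles))
    if res in HYDROPHOBIC
}
print(hydrophobic_angles)
```

Under these assumptions all eight hydrophobic residues fall between 220° and 20° (passing through 0°), that is, within an arc of about 160° on one side of the wheel.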
c. Which peptide, or part of a peptide, would most likely participate in forming a β-sheet? Explain,
pointing out the basic features that would occur in this β-strand. (…get the pun?)
In peptide II, aa 1-12 would form an amphipathic β-strand, with hydrophobic aa (o) extending above
and basic aa (+) extending below the plane of the β-sheet.
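The alternating pattern that underlies this answer can be verified mechanically. The following is a small sketch, assuming the o/+ classification shown on the exam; the function name `alternates` is hypothetical.

```python
PEPTIDE_II = ("Ile His Val Lys Trp Arg Ala Lys Leu Arg Met Lys "
              "Glu Pro Gly Arg Leu Lys").split()
HYDROPHOBIC = {"Ile", "Val", "Trp", "Ala", "Leu", "Met"}  # marked "o" in residues 1-12
BASIC = {"His", "Lys", "Arg"}                             # marked "+" in residues 1-12


def alternates(residues):
    """True if residues strictly alternate hydrophobic, basic, hydrophobic, ..."""
    for i, res in enumerate(residues):
        expected = HYDROPHOBIC if i % 2 == 0 else BASIC
        if res not in expected:
            return False
    return True


print(alternates(PEPTIDE_II[:12]))  # residues 1-12
print(alternates(PEPTIDE_II))       # whole peptide
```

Because side chains of successive residues in a β-strand point to opposite sides, a strict alternation like this places one class of residue on each face.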
d. Which peptide would most likely contain a bend, or U-turn? Explain by drawing the relevant
tetrapeptide, and show how this bend would be stabilized by ionic interactions between two of the four
R-groups.
Peptide II would contain a bend, involving aa 13-16, as shown below. Proline-14 would create the
bend, Glycine-15 would cause least interference with the bend since its R-group is a simple H, and
the U-turn would be stabilized by ionic bonding between the oppositely charged R-groups of
Glu-13 and Arg-16.
3. (6 pts) Examine the two short oligonucleotides below, and answer the following questions:
5’ T G C T A C G A A T C A G T C A C 3’
5’ T A T A A A G G G G G T T T A T A 3’
a. Which one of these molecules is more likely to form a stem-loop secondary structure (a hairpin)?
Draw out this oligonucleotide’s secondary structure as it would appear in two dimensions.
The second oligonucleotide (6-bp stem, 5-nt G loop):

5’ T A T A A A G G
   | | | | | |     G
3’ A T A T T T G G
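One way to see why only the second oligonucleotide folds back on itself is to zip its 5’ end against its 3’ end and count consecutive Watson-Crick pairs. This is a rough sketch, not a folding algorithm: it only looks for a stem formed by the two termini, and the function name and the minimum loop size of 3 nt are assumptions.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminal_stem_length(seq, min_loop=3):
    """Count base pairs formed by pairing the 5' end inward against the 3' end."""
    seq = seq.replace(" ", "").upper()
    k = 0
    # Stop at the first mismatch, or when too few bases remain for a loop.
    while 2 * (k + 1) + min_loop <= len(seq) and PAIR[seq[k]] == seq[-(k + 1)]:
        k += 1
    return k


top = "T G C T A C G A A T C A G T C A C"
bottom = "T A T A A A G G G G G T T T A T A"
print(terminal_stem_length(top), terminal_stem_length(bottom))
```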
b. Which one of these oligonucleotides would have a higher Tm (melting temperature), i.e., if each
molecule was double-stranded, which one would anneal more stably with its complementary
strand? Briefly explain why.
The GC content of the top oligonucleotide is higher (8 G or C) than the bottom oligonucleotide
(5 G). Therefore, the top strand would form a more stable double-stranded molecule than the
bottom strand, which is more AT-rich (and therefore less stable).
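The GC counts can be reproduced in a few lines. The sketch also applies the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a common rough approximation for short oligonucleotides; that rule is brought in here for illustration and is not part of the exam answer.

```python
def gc_count(seq):
    """Number of G or C bases in a DNA sequence (spaces ignored)."""
    seq = seq.replace(" ", "").upper()
    return sum(base in "GC" for base in seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule."""
    seq = seq.replace(" ", "").upper()
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


top = "T G C T A C G A A T C A G T C A C"
bottom = "T A T A A A G G G G G T T T A T A"
for name, oligo in (("top", top), ("bottom", bottom)):
    print(name, gc_count(oligo), wallace_tm(oligo))
```

Both oligonucleotides are 17 nt long, so the difference in the estimate comes entirely from base composition.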
4. (40 points) Please refer to the Genetic Code handouts for this question:
Consider the bacterial DNA sequence below, which codes for part of a gene:
5’ UTR (from the +1 startsite through the T just before the start codon); the RBS is set off by spaces:

                             RBS
coding    5’ ATCCACGGACCGC AGGAGGT CCAAGTGACCG T
template  3’ TAGGTGCCTGGCG TCCTCCA GGTTCACTGGC A
mRNA      5’ AUCCACGGACCGC AGGAGGU CCAAGUGACCG U

Coding region (continues from the line above; codons set off by spaces):

coding    ATG TAC GTT GCC CTA GAG AAT ACC ATA AAC GAA 3’
template  TAC ATG CAA CGG GAT CTC TTA TGG TAT TTG CTT 5’
mRNA      AUG UAC GUU GCC CUA GAG AAU ACC AUA AAC GAA 3’
peptide    M   Y   V   A   L   E   N   T   I   N   E
a. If you know that the bottom strand is the anti-coding (antisense) strand, in which direction must the
gene be transcribed and translated? Briefly justify.
The anti-coding strand is also called the antisense, template, or mRNA-unlike strand.
The top strand must therefore be the coding, or sense, strand. A gene is always oriented from
5’ to 3’ on the coding strand (left to right in this case). RNA polymerase reads the template
strand 3’ to 5’ and synthesizes the transcript 5’ to 3’ (left to right), and the ribosome then
translates the mRNA in the same 5’ to 3’ direction.
b. The first base of the sequence is the +1 startsite.
(1) Does the sequence shown above contain a promoter? Why or why not?
This sequence cannot contain a promoter, since it begins at the +1 startsite of
transcription. The promoter lies upstream (to the left of) the startsite. The promoter
DNA sequence recruits RNA Polymerase to the DNA, and determines where RNA
polymerase will begin transcription (at the +1 startsite).
(2) Write the first base in the RNA sequence here ___A____
The template base T on the bottom strand will serve to specify an A as the 1st nucleotide
of the transcript.
c. On the sequence above, do the following:
(1) Underline and label the sequence in the DNA that enables a ribosome to bind to the
corresponding mRNA. What is the name given to this sequence?
AGGAGGT in the coding strand of the DNA. In the mRNA the corresponding AGGAGGU
would serve as the Ribosome Binding Site (RBS), which is also known as the Shine-Dalgarno sequence.
(2) Find the beginning of the coding region, and underline each codon on the mRNA-like strand of
the DNA.
See the diagram above
(3) Underline and label the 5’ UTR for this gene. The 5’ UTR begins at the +1 startsite and ends
just prior to the START codon (ATG). It is designated by the orange line.
(4) Beginning with the first codon in the DNA sequence, transcribe the coding region, codon by
codon, aligning this transcribed sequence underneath the DNA sequence.
(5) Underneath your transcript, translate the sequence into one-letter amino acid code (e.g., C for
Cysteine). How do you know that you have the correct peptide sequence?
Hint: Happy Valentine’s Day!
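Steps (1) through (5) can be reproduced with a short script. This is a sketch under simple assumptions: the coding strand is copied from the diagram, translation starts at the first AUG in the transcript, and the standard genetic code applies. The function names are made up for this example.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Coding (top) strand from the exam, 5' to 3'.
CODING_STRAND = ("ATCCACGGACCGCAGGAGGTCCAAGTGACCG"
                 "TATGTACGTTGCCCTAGAGAATACCATAAACGAA")


def transcribe(coding_dna):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate_from_first_aug(mrna):
    """Translate codon by codon from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


mrna = transcribe(CODING_STRAND)
print(mrna[0])                      # first base of the transcript
print(mrna.find("AGGAGGU") + 1)     # 1-based position where the RBS begins
print(mrna.find("AUG"))             # length of the 5' UTR in nucleotides
print(translate_from_first_aug(mrna))
```

The fragment contains no stop codon, so translation simply runs to the end of the sequence shown.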
(6) Consider just the first three amino acids:
For each aa, write all of the possible codon sequences here (leave blanks empty, as
appropriate):
aa1            aa2            aa3
___AUG_____    ___UAC_____    ___GUU_____
___________    ___UAU_____    ___GUC_____
___________    ___________    ___GUA_____
___________    ___________    ___GUG_____
(a) Determine the minimal # of tRNAs that could be used to translate these three amino acids,
as follows:
In the space below, draw the anticodon-codon pairing that could occur for
each set of codons that can be served by a single tRNA. In other words, show each unique
tRNA anticodon only once -- do not repeat the same tRNA.
Four tRNAs suffice (one for Met, one for Tyr, two for Val):

             Met             Tyr                 Val                   Val
A-C:     3’ U A C 5’     3’ A U G 5’         3’ C A I 5’           3’ C A C 5’
Codon:   5’ A U G 3’     5’ U A C/U 3’       5’ G U A/C/U 3’       5’ G U G 3’
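The pairings above follow the textbook wobble rules, in which the 5’ base of the anticodon may pair with more than one base at the third codon position. A minimal sketch of those rules follows; the table is the classic simplified version and the function name `codons_read` is an assumption for this example.

```python
# Textbook wobble rules: 5' anticodon base -> codon 3' bases it can read.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon_3to5):
    """Codons (5'->3') read by an anticodon written 3'->5', as on the exam."""
    first, second, wobble = anticodon_3to5
    prefix = WATSON_CRICK[first] + WATSON_CRICK[second]
    return sorted(prefix + base for base in WOBBLE[wobble])


for anticodon in ("UAC", "AUG", "CAI", "CAC"):
    print(f"3'-{anticodon}-5' reads", codons_read(anticodon))
```

Together the four anticodons cover all seven codons listed for Met, Tyr, and Val.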
(b) Determine what fraction of mutations at the first, second, and third codon positions will be
synonymous.
1st position synonymy: 0/9 = 0%
2nd position synonymy: 0/9 = 0%
3rd position synonymy: 4/9 = 44.4%
(c) List the transversion mutations in the three codons above.
GUU → GUA
GUC → GUA
GUU → GUG
GUC → GUG
(d) Does a transversion mutation always cause a non-synonymous substitution? Explain.
No. In 4-fold degenerate codons, transversions at the 3rd position are always
synonymous. However, transversions at the 2nd position are always non-synonymous,
transversions at the 1st position are non-synonymous with rare exceptions (e.g., the
arginine codons CGA ↔ AGA), and in 2-fold degenerate codons transversions are always
non-synonymous. In 3-fold degenerate Ile codons, transversions from
AUA → AUC or AUU are synonymous, and vice versa, but transversions from
AUC or AUU → AUG are non-synonymous.
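The synonymy fractions and the transversion list can be recomputed by enumerating every single-base change. The sketch below assumes the standard genetic code and takes the first codon listed for each amino acid (AUG, UAC, GUU); all function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def mutants(codon, position):
    """All three single-base substitutions of `codon` at `position` (0-based)."""
    return [codon[:position] + b + codon[position + 1:]
            for b in BASES if b != codon[position]]


def synonymous_fraction(codons, position):
    """(synonymous, total) single-base changes at one codon position."""
    all_mutants = [(c, m) for c in codons for m in mutants(c, position)]
    synonymous = sum(CODON_TABLE[c] == CODON_TABLE[m] for c, m in all_mutants)
    return synonymous, len(all_mutants)


def is_transversion(a, b):
    """True for purine <-> pyrimidine changes."""
    return (a in PURINES) != (b in PURINES)


def synonymous_transversions(codon, position=2):
    """Transversions at `position` that leave the amino acid unchanged."""
    return [m for m in mutants(codon, position)
            if is_transversion(codon[position], m[position])
            and CODON_TABLE[m] == CODON_TABLE[codon]]


codons = ["AUG", "UAC", "GUU"]
for pos in range(3):
    print(pos + 1, synonymous_fraction(codons, pos))
print(synonymous_transversions("GUU"), synonymous_transversions("GUC"))
```

Calling `synonymous_transversions("CGA", position=0)` shows one of the few first-position transversions that is silent in the standard code.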
Multiple choice: (2 points apiece, #s 5-7)
5. If Inosine were a legitimate base in DNA
a. the rules for complementary base pairing would be corrupted.
b. the integrity of DNA would be compromised by frequent deaminations of Adenine to Inosine.
c. I:A base pairing would distort the diameter of the double helix.
d. All of the above.
e. None of the above.
6. The chain reaction leading to Severe Combined Immunodeficiency Syndrome (SCID) occurs because
a. Ribonucleotide reductase (RNR) converts too much ATP to dATP.
b. Excess guanine and adenine cannot be reused to make guanosine and adenosine.
c. Adenosine deaminase fails to convert excess deoxyadenosine to deoxyinosine.
d. Excessive amounts of DNA accumulate in the nucleus of white blood cells.
7. In which one of the following ways does the initiation of transcription differ between prokaryotes and
eukaryotes?
a. In prokaryotes only, an AT-rich region near the transcription startsite must be melted apart.
b. In eukaryotes only, the promoter is transcribed along with the rest of the gene.
c. In prokaryotes only, many accessory proteins affect transcription by binding to additional
promoter, enhancer, and silencer elements in the DNA.
d. In eukaryotes only, RNA Polymerase cannot directly recognize promoter DNA sequences,
and instead relies upon the basal transcription complex.
8. (1 pt per answer) Circle all of the correct answers.
Mutations may eventually result from…
a. Copying mistakes that occur during DNA replication
b. DNA damage caused by, e.g., sunlight.
c. Deamination events, e.g., the oxidation of cytosine to uracil.
d. Loss of DNA surveillance proteins such as the mutS/MSH2 gene.
For this question, open a Word document named Exam1_PAH and save it to your H drive. As you
answer the questions, type your answers into the document, clearly identifying which question
each one answers and leaving space between answers for clarity.
We suggest that you begin this question first and work on other questions while you are waiting for
replies from the web sites.
When you have finished this question, save the document in your H drive and also mail it to:
sjames@gettysburg.edu
9. (20 pts) Use the following web sites to locate and analyze the PAH gene from one bacterial species
and humans
http://www.ncbi.nlm.nih.gov/
a. Use NCBI to find the PAH gene from a bacterial species such as Xanthomonas or Bacillus
Provide in your answer (1) the full name of the organism (genus and species), (2) the full name
of the protein, and (3) show the protein sequence in a FASTA format.
I searched “Gene” for “PAH Bacillus cereus”
(1) full name of organism:
[Bacillus cereus E33L]
(2) full name of the protein:
phenylalanine 4-monooxygenase
(phenylalanine-4-hydroxylase)
pah phenylalanine 4-monooxygenase (phenylalanine-4-hydroxylase) [Bacillus cereus E33L]
GeneID: 3026076 Locus tag: BCZK4102 updated 03-Dec-2005
Summary
Gene type: protein coding
Gene name: pah
RefSeq status: Provisional
Organism: Bacillus cereus E33L (strain: E33L)
Lineage: Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus
cereus group
(3) Link to protein sequence:
AAU16167
Protein sequence in a FASTA format:
>gi|51974617|gb|AAU16167.1| phenylalanine 4-monooxygenase (phenylalanine-4-hydroxylase) [Bacillus cereus E33L]
MTKKTEIPSHLKPFVSTQHYDQYTPVNHAVWRYIMRQNHSFLKDVAHPAYVNGLQSSGINIEAIPKVEEM
NECLASSGWGAVTIDGLIPGVAFFDFQGHGLLPIATDIRKVENIEYTPAPDIVHEAAGHAPILLDPTYAK
YVKRFGQIGAKAFSTKEEHDAFEAVRTLTIVKESPTSTPDEVTAAENNVLEKQKLVSGLSEAEQISRLFW
WTVEYGLIGNIDAPKIYGAGLLSSVGESKHCLTDAVEKVPFSIETCTSTTYDVTKMQPQLFVCESFEELT
EALEKFSETMAFKTGGKEGLEKAIRSENHATAELNSGLQITGTFTETIENDAGELIYMRTSSPTALAIHN
KQLANHSTSVHSDGFGTPIGLLTENIALENCTDEQLQSLGITIGNKAAFTFASGIHVKGTVTDIVKNDKK
IALISFINCTVTYNDRVLFDASWGSFDMAVGSTITSVFPGAADAAAFFPMDEEIQEIPAPLVLNELERMY
QTVRDIRNEGILHDAHIEQLVAIQEVLNKFYTKEWLLRLEILELLLEHNKGHETSAALLQQLSTFTTDEA
VTRLINNGLTLLPVKDVKNDATIN
b. Use NCBI to find the human counterpart(s) of the bacterial PAH gene. Into your document
paste the human-bacterial alignment for the PAH gene.
Go to the NCBI homepage, choose BLAST, then do a protein-protein Blast search (Blastp)
Alignment between Bacillus cereus PAH and human PAH (the human gene was …..th in the
list of Descriptions)
>gi|18765885|gb|AAL78816.1| phenylalanine hydroxylase [Homo sapiens]
Length=452
Score = 97.1 bits (240), Expect = 2e-18
Identities = 64/234 (27%), Positives = 105/234 (44%), Gaps = 46/234 (19%)
Query  58   GINIEAIPKVEEMNECLAS-SGWGAVTIDGLIPGVAFFDFQGHGLLPIATDIRKVENIEY  116
            G + + IP++E++++ L + +G+    + GL+    F       +      IR      Y
Sbjct  218  GFHEDNIPQLEDVSQFLQTCTGFRLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMY  277

Query  117  TPAPDIVHEAAGHAPILLDPTYAKYVKRFGQIGAKAFSTKEEHDAFEAVRTLTIVKESPT  176
            TP PDI HE  GH P+  D ++A++ +  G     A
Sbjct  278  TPQPDICHELLGHVPLFSDRSFAQFSQEIGLASLGA------------------------  313

Query  177  STPDEVTAAENNVLEKQKLVSGLSEAEQISRLFWWTVEYGLIGNIDAPKIYGAGLLSSVG  236
              PDE                     E+++ ++W+TVE+GL    D+ K YGAGLLSS G
Sbjct  314  --PDEYI-------------------EKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFG  352

Query  237  ESKHCLTDAVEKVPFSIETCTSTTYDVTKMQPQLFVCESFEELTEALEKFSETM  290
            E ++CL++  + +P  +E      Y VT+ QP  +V ESF +  E +  F+ T+
Sbjct  353  ELQYCLSEKPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATI  406
c. Answer the following questions in your document. Clearly indicate what question you are
answering.
(1) What type of algorithm was used to make the bacterial-human alignment? Blastp
(2) What proportion of the two sequence alignments are gapped? 19% (46/234)
(3) How similar are the two proteins? What two features of the alignment are used to
determine the overall similarity of the two proteins?
44% similar (105/234). The similarity value is calculated by summing the identities (64/234 –
27%) and the conservative amino acid substitutions.
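The percentages quoted from the BLAST summary line can be recomputed from the raw counts. In the sketch below, integer (floor) division is used because it reproduces the figures printed in the report (for example 105/234 is 44.9%, reported as 44%); that truncation is inferred from these numbers and is an assumption, as is the function name.

```python
ALIGNMENT_LENGTH = 234
IDENTITIES = 64
POSITIVES = 105   # identities plus conservative substitutions
GAPS = 46


def blast_percent(count, alignment_length=ALIGNMENT_LENGTH):
    """Whole-number percentage, truncated rather than rounded."""
    return count * 100 // alignment_length


conservative_substitutions = POSITIVES - IDENTITIES
print(blast_percent(IDENTITIES), blast_percent(POSITIVES), blast_percent(GAPS))
print(conservative_substitutions)
```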
d. Find the Entrez Gene page for this human gene from the NCBI site and use it to answer the
following questions:
Here is the Entrez Gene summary:
PAH phenylalanine hydroxylase [Homo sapiens]
GeneID: 5053 Primary source: HGNC:8582
updated 03-Feb-2006
Summary
Official Symbol: PAH and Name: phenylalanine
hydroxylase provided by HUGO Gene Nomenclature Committee
See related: HPRD:08943, MIM:261600
Gene type: protein coding
Gene name: PAH
Gene description: phenylalanine hydroxylase
RefSeq status: Reviewed
Organism: Homo sapiens
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;
Homo
Gene aliases: PKU; PKU1
Summary: PAH encodes the enzyme phenylalanine hydroxylase that is the rate-limiting step in phenylalanine catabolism. Deficiency of this enzyme activity
results in the autosomal recessive disorder phenylketonuria.
(1) What is the length of the gene? 79278 bp, from link to NC_000012
(2) What is the length of the mRNA transcript? Calculate the proportion of the gene that is
codegenic. (express this as a %)
mRNA Sequence NM_000277 : 2680 bp
The proportion of the gene that is codegenic, i.e., composed of sequences that code for
protein, is approximately
2680/79278 = 0.0338 = ~3.4%
(3) How many exons and introns does the human gene contain?
Starting from the Entrez Gene page above, select “Gene Table” in the “Display” box.
You will see that the human PAH gene contains 13 exons and 12 introns, as follows:
Exon information:
NM_000277 length: 2681 bp, number of exons: 13
NP_000268 length: 453 aa, number of exons: 13
EXON                         Coding EXON                  INTRON
coords           length      coords           length      coords           length
1 - 533          533 bp      474 - 533        60 bp       534 - 4705       4172 bp
4706 - 4813      108 bp      4706 - 4813      108 bp      4814 - 22685     17872 bp
22686 - 22869    184 bp      22686 - 22869    184 bp      22870 - 40053    17184 bp
40054 - 40142    89 bp       40054 - 40142    89 bp       40143 - 50940    10798 bp
50941 - 51008    68 bp       50941 - 51008    68 bp       51009 - 62271    11263 bp
62272 - 62468    197 bp      62272 - 62468    197 bp      62469 - 64653    2185 bp
64654 - 64789    136 bp      64654 - 64789    136 bp      64790 - 65847    1058 bp
65848 - 65917    70 bp       65848 - 65917    70 bp       65918 - 70652    4735 bp
70653 - 70709    57 bp       70653 - 70709    57 bp       70710 - 73172    2463 bp
73173 - 73268    96 bp       73173 - 73268    96 bp       73269 - 73824    556 bp
73825 - 73958    134 bp      73825 - 73958    134 bp      73959 - 77088    3130 bp
77089 - 77204    116 bp      77089 - 77204    116 bp      77205 - 78385    1181 bp
78386 - 79278    893 bp      78386 - 78429    44 bp
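The totals implied by the Gene Table can be checked directly from the exon coordinates. The sketch below uses only the coordinates listed above; the variable names are made up for this example. It reports both the exon (mRNA) fraction of the gene and the narrower fraction that falls inside the coding exons, since the two differ by the untranslated regions.

```python
# Exon coordinates (start, end) from the Gene Table, 1-based and inclusive.
EXONS = [
    (1, 533), (4706, 4813), (22686, 22869), (40054, 40142), (50941, 51008),
    (62272, 62468), (64654, 64789), (65848, 65917), (70653, 70709),
    (73173, 73268), (73825, 73958), (77089, 77204), (78386, 79278),
]
CDS_START, CDS_END = 474, 78429  # first and last coding positions in the table


def length(start, end):
    return end - start + 1


# Introns are the gaps between consecutive exons.
introns = [(prev_end + 1, next_start - 1)
           for (_, prev_end), (next_start, _) in zip(EXONS, EXONS[1:])]

exon_total = sum(length(s, e) for s, e in EXONS)
intron_total = sum(length(s, e) for s, e in introns)
coding_total = sum(
    length(max(s, CDS_START), min(e, CDS_END))
    for s, e in EXONS
    if max(s, CDS_START) <= min(e, CDS_END)
)
gene_length = length(EXONS[0][0], EXONS[-1][1])

print(len(EXONS), len(introns))
print(exon_total, intron_total, gene_length)
print(round(100 * exon_total / gene_length, 1))    # exon (mRNA) share of the gene, %
print(round(100 * coding_total / gene_length, 1))  # coding-exon share of the gene, %
```

The exon lengths sum to 2681 bp, matching the Gene Table figure; the answer to (2) quotes 2680 bp for the same transcript, which changes the percentage only in the second decimal place.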
(4) Where did you find this information? From Gene Table, as described above
(5) On what chromosome does this gene reside? Specify the subchromosomal location,
e.g. 15q7.2
chromosome: 12; Location: 12q22-q24.2
e. PAH gene: Is the PAH gene associated with a disease or syndrome? Find the OMIM file for
this gene and list one disease associated with defects in this gene.
+261600 PHENYLKETONURIA
Alternative titles; symbols
PKU
PHENYLALANINE HYDROXYLASE DEFICIENCY
PAH DEFICIENCY
OLIGOPHRENIA PHENYLPYRUVICA
FOLLING DISEASE
PHENYLALANINE HYDROXYLASE, INCLUDED; PAH, INCLUDED
PKU1, INCLUDED
HYPERPHENYLALANINEMIA, MILD, INCLUDED; HPA, INCLUDED
PHENYLALANINEMIA, INCLUDED
Gene map locus 12q24.1
DESCRIPTION
Phenylketonuria is an inborn error of metabolism resulting from a deficiency of
phenylalanine hydroxylase (EC 1.14.16.1) and characterized by mental
retardation. There are other causes of hyperphenylalaninemia; Scriver et al.
(1994) reviewed the hyperphenylalaninemias of man and mouse.
CLINICAL FEATURES
Early diagnosis of phenylketonuria (PKU), a cause of mental retardation, is
important because it is treatable by dietary means. Features other than mental
retardation in untreated patients include a 'mousy' odor; light pigmentation;
peculiarities of gait, stance, and sitting posture; eczema; and epilepsy (Paine,
1957). Kawashima et al. (1988) suggested that cataracts and brain calcification
may be frequently overlooked manifestations of classic untreated PKU. Brain
calcification has been reported in dihydropteridine reductase (DHPR) deficiency
(261630). Pitt and O'Day (1991) found only 3 persons with cataracts among 46
adults, aged 28 to 71 years, with untreated PKU. They concluded that PKU is
not a cause of cataracts. Levy et al. (1970) screened the serum of 280,919
'normal' teenagers and adults whose blood had been submitted for syphilis
testing. Only 3 adults with the biochemical findings of PKU were found. Each
was mentally subnormal. Normal mentality is very rare among patients with
phenylketonuria who have not received dietary therapy.
The basic defect in PKU is phenylalanine hydroxylase deficiency. Evidence of
heterogeneity in phenylketonuria was presented by Auerbach et al. (1967) and
by Woolf et al. (1968).