Problem Set 5

(Due Dec. 9th, Tuesday, 8pm EST)
Please make sure to show your work and calculations and state any assumptions you
make in answering the following questions. Include the names of the people you worked
with at the top of your problem set.
Here’s a summary of the files you need to submit. For example, if your name is John
Harvard and you’re in Lan Zhang’s section, submit:
JohnHarvard_ps5_LZ.doc
JohnHarvard_ps5_LZ.pdb
JohnHarvard_ps5_LZ.xls
1. Protein structure (35 points total)
You may find the following resources useful in answering this part:
• The Protein Data Bank (PDB) - http://www.rcsb.org/pdb/
• MacroMolecular DataBase (MMDB) - http://www.ncbi.nlm.nih.gov/Structure/
• Rasmol - http://www.umass.edu/microbio/rasmol/getras.htm (for v.2.6b2a) or http://www.bernstein-plus-sons.com/software/rasmol/ (for v. 2.7.2.1)
• Biology WorkBench - http://workbench.sdsc.edu/
• Swiss-PdbViewer - http://us.expasy.org/spdbv/
In 1999, the atomic structure of the GCSF-receptor complex was solved. The following
paper describes the discovery of this new cytokine-receptor recognition scheme and will
be useful in answering the questions in this part.
Aritomi M, Kunishima N, Okamoto T, Kuroki R, Ota Y, Morikawa K. Atomic structure of
the GCSF-receptor complex showing a new cytokine-receptor recognition scheme.
Nature. 1999 Oct 14;401(6754):713-7.
1.1. The Protein Data Bank (PDB) and MacroMolecular DataBase (MMDB) (14 pts)
1.1.1 Complex of GCSF with its Receptor. According to the Aritomi et al. paper,
unlike the 1:2 ligand-receptor stoichiometry found in GH and EPO complexes with their
respective receptors, the stoichiometry of GCSF ligand with its receptor is 2:2. Go to the
Protein Data Bank (PDB). Using the accession code found at the very end of the Aritomi
et al. paper, download to your local computer the .pdb crystal structure file that has no
more than one 2:2 complex in the asymmetric unit. (Hint: Examine Table 1 in the article
to determine which crystal form satisfies this criterion and then match the accession code
with that form.)
1.1.1.1 What is the accession code of the .pdb crystal structure file that you retrieved
from PDB (1 pt)?
1.1.1.2 Open the .pdb file you downloaded in a text editor and examine it. Given that you
downloaded the crystal structure with no more than one 2:2 complex in the asymmetric
unit, and applying what you already learned about GCSF and its Receptor, how many
distinct protein chains do you expect to find in this .pdb file, and how many do you find
(1 pt)? For each chain, indicate the chain label and whether it is ligand or receptor (2 pts).
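Inspecting the file by eye can be cross-checked with a short script. The following is a minimal sketch, assuming the standard fixed-column PDB layout in which the chain identifier of an ATOM record sits in column 22; the function name `chain_ids` is made up for illustration.

```python
def chain_ids(pdb_lines):
    """Return the distinct chain identifiers of ATOM records, in order of first appearance."""
    seen = []
    for line in pdb_lines:
        # ATOM records carry the record name in columns 1-6; the chain
        # identifier is the single character in column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen
```

To use it on a downloaded file, pass an open file handle, e.g. `chain_ids(open("structure.pdb"))`, where `structure.pdb` is a placeholder for your own file name.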
1.1.1.3 Examine the .pdb file further in the text editor. A helpful Guide to the PDB File
Format can be found at:
http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html
Scroll down to the “ATOM” coordinate section. There are twelve columns of data,
including the leftmost section label ATOM. What does each of these columns indicate?
List the 12 definitions used in our .pdb file. Be careful: CA in the protein indicates
something different from C. Note also that the Guide indicates the type of data that
should be found in a given column, and some columns in the Guide may not be in our
.pdb file. (6 pts total, 0.5 pt per column)
1.1.2 Related domains of GCSF and GCSF Receptor. Go to the MacroMolecular
DataBase (MMDB) and enter the accession code in the search field to obtain an MMDB
structure summary. What are the conserved and 3D domains for each of the chains in our
.pdb file? List the chain name and any conserved and 3D domains that belong to it by
number and description by filling in the chart below (4 pts total, 1 pt per row). If for one
chain, there is no conserved or 3D domain indicated on the MMDB page, just leave it
blank. [Hint: Notice the symmetry!]
Chain Name | Conserved Domain (CD) Number and Description | 3D Domain Number and Description
1.2 Rasmol (12 pts)
Binding of GCSF to its Receptor. Go to the Rasmol site, then download and install the
Rasmol program appropriate for your computer architecture on your local computer. Start
the Rasmol program (which opens two windows, the Rasmol display window and the
Rasmol command line window) and open our .pdb file. Use the Rasmol Reference Manual
for help on using the command line or mouse controls to view and manipulate the
crystal structures, as well as on other operations.
1.2.1 Let’s examine the overall structure of the 2:2 dimer. Change the Display to
“Cartoons” and Colors to “Structure.” Under Options, enable “Labels” so you know
what structural features are found on which chains. What is the predominant secondary
structural feature of GCSF and of its GCSF Receptor? For each chain, indicate the Chain
Name, Secondary Structural Feature, and whether it is GCSF Ligand or GCSF Receptor
by filling in the chart below (4 pts total, 1 pt each row). [Hint: Notice, again, the
symmetry!]
Chain Name | Secondary Structural Feature | GCSF Ligand or GCSF Receptor?
1.2.2 Now let’s examine the interfaces between these chains, which are sites of interest
for binding and dimerization. Reset your view by typing “reset” at the command line and
then “”. Change the Display to “Wireframe” and Colors to “Chain.” Under Options,
disable “Hetero atoms.” We are particularly interested in amino acids with charged side
groups no more than 2.0 Angstroms from the centers of the interfaces between these
chains. Assume the centers of these interfaces are at the following residues:
A ASP113
B TYR143
C ASP113
D TYR143
If you’re clever and use various commands found in the Rasmol Reference Manual,
under the sections called “Predefined Sets” and “Within Expressions” you can:
1. select everything but the set of charged amino acids and hide it (same as coloring
everything but the charged amino acids black).
2. select and highlight with another color residues no more than 2.0 Angstroms from
the residues above.
3. label only the residues that pass requirements 1 (charged amino acid) AND 2 (within
2.0 Angstroms of ASP113:A / TYR143:B / ASP113:C / TYR143:D) above and
fill in the chart below with those labels.
If you cannot select the appropriate residues from the command line, enable “Slab Mode”
and “Labels” under Options. While in “Slab Mode,” slice through the structure slowly and
methodically from front to back using Ctrl-Left Mouse button, look up how to measure
the distance from atom to atom, and fill in the table below (8 pts total, 2 pts per row).
Indicate residue name, number and chain, e.g. GLY123:A, and sort them by chain, e.g.
GLY123:A would go in the row with Chain Name A. [Hint: Once again, symmetry!]
Chain Name | Charged Residues No More Than 2.0 Å from Center of Interfaces
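The two-part selection described above (charged residues, then a distance cutoff around a center residue) can also be sketched outside Rasmol. The example below is illustrative only: it assumes the standard fixed-column ATOM layout, treats ASP, GLU, ARG, HIS and LYS as charged (an assumption modeled on Rasmol's predefined "charged" set), and uses made-up function names. It does not replace reading the Reference Manual sections named above.

```python
import math

# Residue names treated as charged here (assumption: acidic ASP, GLU plus
# basic ARG, HIS, LYS, modeled on Rasmol's predefined "charged" set).
CHARGED = {"ASP", "GLU", "ARG", "HIS", "LYS"}


def parse_atoms(pdb_lines):
    """Yield (res_name, res_number, chain, (x, y, z)) for each ATOM record."""
    for line in pdb_lines:
        if line.startswith("ATOM"):
            yield (
                line[17:20].strip(),   # residue name, columns 18-20
                int(line[22:26]),      # residue number, columns 23-26
                line[21],              # chain identifier, column 22
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            )


def charged_near(atoms, center, cutoff=2.0):
    """Return labels such as 'GLU19:A' for charged residues with at least one
    atom within `cutoff` Angstroms of any atom of the `center` residue, given
    as (res_name, res_number, chain). The center residue itself is skipped."""
    atoms = list(atoms)
    center_xyz = [xyz for name, num, chain, xyz in atoms
                  if (name, num, chain) == center]
    hits = set()
    for name, num, chain, xyz in atoms:
        if name in CHARGED and (name, num, chain) != center:
            if any(math.dist(xyz, c) <= cutoff for c in center_xyz):
                hits.add(f"{name}{num}:{chain}")
    return sorted(hits)
```

Skipping the center residue is a design choice of this sketch; remove that condition if you want the selection to include the center itself, as an atom is always within zero distance of itself.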
1.3 Swiss-PdbViewer (9 pts)
Examination and modification of the structure of GCSF bound to its Receptor. Go
to the Swiss-PdbViewer site. Download and install the Swiss-PdbViewer program
appropriate for your computer architecture to your local computer. Start the
Swiss-PdbViewer program (which opens one window, DeepView / Swiss-PdbViewer 3.7)
and open our .pdb file.
Swiss-PdbViewer works similarly to RasMol, in that atoms or residues have to be
selected before they can be manipulated. The Control Panel lists the .pdb file being
manipulated, whether that .pdb file is visible in the viewport, whether it can be moved in
the viewport, and characteristic features (the chain; whether a residue is in an alpha helix
(h) or beta sheet (s); whether the backbone, side chain, label, VDW radius electron
density and ribbon are visible; and the color of the selected pulldown (default B for
backbone color)). Clicking on a chain letter highlights all amino acids in that chain, and
similarly for h and s. Peruse the Help menu and the User Guide for information on the
menus, manipulation, display and rendering.
The Biophysics 101 Teaching Staff was able to crystallize a 13 amino acid residue
protein fragment of interest, which they named B101, from the GCSF-GCSF Receptor
system. However, due to miscalculation and experimental error during the x-ray
diffraction process, the resulting B101.pdb coordinate file was slightly damaged. With a
December 9, 2003 8PM EST deadline looming, they are asking for your help in
identifying, spatially transforming and repairing B101.pdb.
Download and open B101.pdb in Swiss-PdbViewer. It may look as though nothing has
happened. In fact, B101 has been centered in the same spatial grid as our original .pdb
file. In the Control Panel, uncheck “visible” and “can move” when our original .pdb is
selected. Now you should be able to see B101.
Let us now highlight our entire protein fragment so we can monitor any manipulations
more easily. Change the protein to B101 in the Control Panel by clicking on the Control
Panel heading and pressing TAB. Pressing TAB again changes the Control Panel back to
our original .pdb file. Select all of B101’s amino acid residues (Ctrl+A or Select→All
from the menu), click on the word “col” in the Control Panel, and select a color for B101.
Now if you rotate and zoom, you should be able to see B101 more clearly as a solid color
wireframe among our original .pdb file, which incidentally is colored CPK.
Note that B101 does not have the same coordinate system and origin as our original .pdb
file, since it was crystallized independently. Our first task is to transform the coordinate
system of B101 to match that of our original .pdb file. We could move and rotate B101
by hand (by checking “visible” and “can move” for B101) while keeping our original
.pdb file fixed (by checking “visible” and unchecking “can move” for our original .pdb
file) and try to match them up by eye. But we have no idea where B101 belongs in our
original .pdb file, and we have a deadline to meet! Fortunately, Swiss-PdbViewer has a
built-in Iterative Magic Fit algorithm, which attempts to spatially overlap one protein
structure on top of another.
Also note that B101 could be a fragment from any part of the GCSF-GCSF Receptor
system, since it was obtained independently. Our second task is to figure out where B101
is in our original .pdb file. Taking a cue from ideas learned earlier in the semester, you
can perform a protein sequence alignment between these two proteins to determine where
the B101 fragment most likely came from. We could certainly use the other alignment
programs introduced during the semester, but to do so we would have to meticulously
extract the amino acid sequences from both B101 and our original .pdb file, perhaps with
a Perl script or by hand. Fortunately, Swiss-PdbViewer has a built-in structural alignment
feature to expedite this process. Click on Window and select Alignment to open the
Alignment window.
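For comparison, the sequence-based route mentioned above can be sketched in a few lines. This is a simplified illustration, not what Swiss-PdbViewer does internally: it converts three-letter residue names to one-letter codes and slides the fragment along a chain without gaps, counting identities. The function names are made up for this example.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(residue_names):
    """Convert residue names such as ['GLY', 'ASP'] to 'GD' ('X' if unknown)."""
    return "".join(THREE_TO_ONE.get(name, "X") for name in residue_names)


def best_ungapped_match(fragment, chain_sequence):
    """Slide `fragment` along `chain_sequence` without gaps and return
    (offset, identities) for the placement with the most identical residues.
    Returns (-1, -1) if the fragment is longer than the chain."""
    best = (-1, -1)
    for offset in range(len(chain_sequence) - len(fragment) + 1):
        window = chain_sequence[offset:offset + len(fragment)]
        identities = sum(a == b for a, b in zip(fragment, window))
        if identities > best[1]:
            best = (offset, identities)
    return best
```

The returned offset is a 0-based position in the extracted sequence string, not a PDB residue number; residue numbering in a .pdb file need not start at 1, so the offset has to be mapped back to the numbers stored in the file.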
By default B101 rests in a different coordinate space and is aligned to the very beginning
of our original .pdb file. Once we perform an Iterative Magic Fit and Generate a
Structural Alignment, we can space transform and identify B101 with respect to our
original .pdb file.
1.3.1 First let’s perform an Iterative Magic Fit:
• Make sure both proteins are visible and can be moved.
• Select Iterative Magic Fit (Shift+Ctrl+M or Fit→Iterative Magic Fit from the menu).
• Click OK to use just the CA (carbon alpha) atoms in performing the fit, as they are the least likely to be inaccurate in B101.
• The structure of B101 should have moved to fit some of the atoms in our original .pdb file. Rotate and zoom in to the area and determine its location with respect to our original .pdb file by viewing only the amino acids from one chain while hiding the rest, and repeating this chain by chain until you find B101 in the visible chain.
Where has B101 been placed? Specify in the table below whether it is near GCSF ligand
or receptor (1pt) and near which chain from our original .pdb file (1 pt).
GCSF ligand or receptor | Near which chain from our original .pdb file
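A common way to judge how well two sets of paired CA atoms overlap after a fit is the root-mean-square deviation (RMSD). The sketch below only evaluates the RMSD of two coordinate lists that are already paired and already superimposed; it does not perform the fit itself, and it makes no claim about how Swiss-PdbViewer's Iterative Magic Fit is implemented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, paired by position."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

A value close to zero means the paired atoms nearly coincide; larger values indicate a poorer overlap.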
1.3.2 Next let’s generate a Structural Alignment:
• Make sure both proteins are visible and can be moved and our original .pdb file is under consideration in the Control Panel.
• Select Generate Structural Alignment (Ctrl+G or Fit→Generate Structural Alignment from the menu).
• The sequence of B101 should have aligned somewhere in the Alignment window. Mouse over the amino acids in the Alignment window to determine its location with respect to our original .pdb file.
Where did B101 align with respect to the amino acids in our original .pdb file? Specify
in the table below the start amino acid name, number and chain (1pt) and the end amino
acid name, number and chain of the original .pdb file (1pt), using the same naming
convention, e.g. GLY123:A.
Start:
End:
1.3.3 Note that as you zoom into the structural alignment of B101, one residue side chain
fails to overlap our original .pdb file. You can identify the side chain by clicking on the
identity toolbar icon and then clicking on the amino acid of interest. Alternatively,
as you mouse over B101’s residue letters in the Alignment window, the structures will
blink in the viewport.
Which amino acid residue in B101 has this non-overlapping side chain (1pt)? What is
the corresponding amino acid residue in our original .pdb file (1pt)? State in the table
below the residue names, numbers and chains using the same naming convention above.
B101 residue name, number and chain with non-overlapping side chain:
Corresponding amino acid residue name, number and chain from the original .pdb file:
1.3.4 Use the torsion feature to make this sidechain overlap our original .pdb file as
follows. After you are done, submit a new .pdb file, called
FirstnameLastname_ps5_TFinitials.pdb (3 pts).
• It may help to hide everything in our original .pdb file except the amino acid that you wish to overlap. To do so, make sure the original .pdb file is under consideration in the Control Panel. Select the corresponding amino acid residue that you found in 1.3.3 above by clicking on it in the Control Panel. Now click “show” and “side” in the Control Panel header. Also uncheck “can move,” since you want this side chain to remain static.
• Now align the non-overlapping side chain. Change the Control Panel so that B101 is under consideration.
• Click on the torsion toolbar icon and then select, in the viewport, the B101 residue that you want to twist. Remember that the B101 residues are one solid color, which you specified earlier in this problem set. To avoid selecting the residue from the original .pdb file, click on the part of the B101 side chain that does not overlap the original .pdb file (the backbones do overlap).
• Four left-right arrows should appear to the right of and below the torsion toolbar icon. Use the topmost left-right arrows to rotate the side chain so that it overlaps the original .pdb file’s side chain.
• Once you have a close overlap, make sure once again that B101 is under consideration in the Control Panel and save your structure (Ctrl+S or File→Save→Layer…) using the naming convention above.
2. Mass spectrometry (26 pts)
2.1 Mass spec basics (13 pts)
2.1.1 What are the three essential modules of every mass spectrometer? Explain
what the MALDI technique is. To which of the three modules does it belong? (6 pts)
2.1.2 Magnets are a classic type of mass analyzer used in mass spectrometers. Name
two newer mass analyzer types and briefly explain (three sentences each) how they
resolve ions of different masses. (4 pts)
2.1.3 How can you use mass spectrometry to locate a disulfide bridge between two
peptides? (3 pts)
2.2 MS/MS (13 pts)
2.2.1 Describe the MS/MS technique in three sentences or fewer. (3 pts)
2.2.2 What’s the difference between b and y ions? (2 pts)
2.2.3 You isolate a peptide through co-immunoprecipitation with the human growth
hormone. After trypsin cleavage, you obtain a mass spectrum for one of the resulting
fragments. The spectrum has the following b-ion series values (5 pts):
263.09 378.12 475.17 588.25 701.34 802.38 903.43 990.46 1089.53 1186.59 1285.65 1448.72 1535.75 1648.83
Use the provided values to solve the partial peptide sequence (if a position is
ambiguous between amino acids X and Y, mark it X/Y). Amino acid masses:
http://i-mass.com/guide/aamass.html
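Reading a b-ion ladder comes down to taking the difference between consecutive peaks and matching each gap to an amino acid residue mass. The following is a minimal sketch of that bookkeeping, assuming monoisotopic residue masses and a tolerance of 0.02 Da; the function name and the tolerance are choices made for this illustration, and the m/z values in the usage note are invented.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_b_ion_ladder(b_ions, tol=0.02):
    """For each pair of consecutive b-ion m/z values, list the residues whose
    mass matches the gap within `tol`. Ambiguous gaps are written as 'X/Y';
    gaps that match nothing are written as '?'."""
    calls = []
    for lighter, heavier in zip(b_ions, b_ions[1:]):
        gap = heavier - lighter
        matches = sorted(aa for aa, mass in RESIDUE_MASS.items()
                         if abs(mass - gap) <= tol)
        calls.append("/".join(matches) if matches else "?")
    return calls
```

For example, `read_b_ion_ladder([58.029, 129.066, 216.098, 329.182])` reports `["A", "S", "I/L"]` for this invented series. The first peak of a series has no predecessor, so the residues it contains are not resolved by differencing alone.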
2.2.4 After performing other enzymatic cleavages, obtaining many more spectra,
obtaining side chain fragmentation data, and piecing everything together, you find a
more complete and polished peptide sequence:
netkwkmmdpilttsvpvyslkvdkeyevrvrskqrnsgnygefsevlyvtlpqmsqft
ceedfyfpwlliifgifgltvmlfvflfskqqrikmlilppvpvpkikgidpdllkegk
leevntilaihdsykpefhsddswvefieldidepdekteesdtdrllssdhekshsnl
g
Use the tools you’ve learned in this course to find what the protein is. (3 pts)
3. Metabolic network, chemical kinetics, Flux Balance Analysis (39 pts)
For this part, the following readings might be helpful in addition to the lecture and
section notes:
Schilling, CH, Edwards, JS, and Palsson, BO. Towards metabolic phenomics:
Analysis of genomic data using flux balances. Biotechnol Prog 15: 288-295 (1999).
Schilling, CH et al. Metabolic pathway analysis: Basic concepts and scientific
applications in the post-genomic era. Biotechnol Prog 15: 296-303 (1999).
Segre, D, Vitkup, D, and Church, GM. Analysis of optimality in natural and
perturbed metabolic networks. Proc. Nat. Acad. Sci USA 99: 15112-7 (2002).
To make it clearer, things you’re asked to do are highlighted in bold.
3.1 Using the formalism of chemical kinetics, formulate a system of differential
equations to describe the change of concentration with respect to time of each of the
species A through F in the figure below. The ith rate constant is denoted symbolically as
ki. (6 pts)
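[Figure: reaction network of species A through F; arrows labeled with rate constants k1 to k9 and reverse rate constants k-2 and k-7; stoichiometric labels 2B and 2D.]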
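As a reminder of the mass-action formalism, consider a hypothetical reversible reaction X + Y ⇌ 2Z that is not part of the network in the figure, with forward rate constant k_f and reverse rate constant k_r (both symbols are introduced here only for illustration). Each species changes at the net reaction rate multiplied by its stoichiometric coefficient:

```latex
\begin{aligned}
\frac{d[X]}{dt} &= -k_f [X][Y] + k_r [Z]^2 \\
\frac{d[Y]}{dt} &= -k_f [X][Y] + k_r [Z]^2 \\
\frac{d[Z]}{dt} &= 2 k_f [X][Y] - 2 k_r [Z]^2
\end{aligned}
```

Note how the coefficient 2 on Z appears twice: as the exponent in the reverse rate term and as the multiplier in the equation for Z.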
3.2 Instead of focusing on the enzyme kinetic constants and concentrations of
metabolites, we can use a more tractable method of analyzing the fluxes (reaction rates).
Given the labeling of fluxes in the figure below, write a system of differential
equations to describe the change of concentration with respect to time of each of the
species A through F. Note a) the reaction rate incorporates all relevant rate constants; b)
v2 and v7 are the net reaction rates of reversible reactions. (6 pts)
[Figure: the same reaction network of species A through F, with arrows labeled by fluxes v1 to v9 and stoichiometric labels 2B and 2D.]
3.3 We are going to examine this system more carefully at steady state. Which two
fundamental assumptions allow us to simplify this model for analysis at steady state
(each in a single sentence) (2 pts)? Rewrite the equations from question 3.2 under the
steady state condition (2 pts).
3.4 Rewrite the equations from question 3.3 in matrix notation (5 pts), i.e.
$S \cdot v = 0$
where $v$ is the 9 x 1 matrix of reaction rates
and $S$ is the 6 x 9 matrix of stoichiometric coefficients.
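To illustrate the matrix form, the sketch below builds a stoichiometric matrix for a hypothetical three-reaction toy network (deliberately not the network in the figure) and checks whether a flux vector satisfies the steady-state condition. All names and the toy network are assumptions made for this example.

```python
def stoichiometric_matrix(species, reactions):
    """Build S with one row per species and one column per reaction.
    Each reaction is a dict mapping species name to its coefficient
    (negative when consumed, positive when produced)."""
    return [[rxn.get(s, 0) for rxn in reactions] for s in species]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


# Hypothetical toy network (NOT the one in the problem set figure):
#   r1: uptake of X,  r2: X -> 2 Y,  r3: export of Y
SPECIES = ["X", "Y"]
REACTIONS = [{"X": 1}, {"X": -1, "Y": 2}, {"Y": -1}]
S = stoichiometric_matrix(SPECIES, REACTIONS)  # [[1, -1, 0], [0, 2, -1]]
```

Here `mat_vec(S, [1, 1, 2])` gives `[0, 0]`, so that flux vector is a steady state of the toy network, whereas `mat_vec(S, [1, 1, 1])` gives `[0, 1]`, meaning Y would accumulate.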
3.5 Assume that v1 is limited between 0 and 15 mmol/hr. Formulate a linear
programming model to maximize the production of metabolite D (i.e. v8) and use Excel to
solve it. Note that in a real system, D may be biomass, ATP production, etc. Provide
below (a) the explicit mathematical model in a format similar to the following
example; and (b) the optimal solution found using Excel. Please submit your Excel
file, named FirstnameLastname_ps5_TFinitials.xls, in which only the answer report
should be generated. Name the corresponding worksheet and answer report (NOT the
Excel file) as “3.5-FBA-Wildtype” and “3.5-FBA-Wildtype-Answer Report”,
respectively. [Hint: The constraints include the mass balance of metabolites, non-negativity
constraints, and the input limit constraint. Refer to the answer key to the Excel
bonus question in Problem Set 1 for how to solve linear programming problems using
Excel.] (7 pts)
Example of a mathematical programming model:
Max x2
subject to
x1 + x2 - x3 = 0
.......
x3 ≤ 15
x1, x2, ... ≥ 0
Example of an optimal solution:
objective = 5
[x1, x2, ...] = [2, 5, ...]
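For orientation, a flux balance problem of this kind is usually stated in the following generic form. The symbols c (a vector that picks out the flux to maximize) and u_j (an upper bound on flux j, infinite when the flux is unbounded) are notation assumed here for illustration; the specific model requested above still has to be written out row by row from the network.

```latex
\begin{aligned}
\max_{v} \quad & c^{T} v \\
\text{subject to} \quad & S v = 0 \\
& 0 \le v_j \le u_j \quad \text{for each reaction } j
\end{aligned}
```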
3.6. Suppose that the system undergoes a mutation that removes the reaction that
converts B and E to D, as shown in the figure below. How can we model such a
mutation in the scheme of flux balance analysis? Assuming the same limit on the
availability of A as in question 3.5, what is now the maximum possible production rate of
metabolite D? Provide (a) the explicit mathematical model; and (b) the optimal
solution found using Excel. Submit your Excel file (the same file for question 3.5,
named FirstnameLastname_ps5_TFinitials.xls), in which only the answer report should
be generated. Name the corresponding worksheet and answer report (NOT the Excel file)
as “3.6-FBA-Mutant” and “3.6-FBA-Mutant-Answer Report”, respectively. (4 pts)
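[Figure: the flux-labeled reaction network of species A through F (fluxes v1 to v9, stoichiometric labels 2B and 2D), with the removed reaction marked STOP.]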
3.7 For the perturbed metabolic network in the previous question, an alternative
assumption one can make, instead of maximizing the production of D, is the
minimization of metabolic adjustment (MOMA). That is, the deviation of the metabolic
fluxes from their wild type values is minimized in a mutant. Find the new flux
distribution under this assumption. Provide (a) the explicit mathematical model; and
(b) the optimal solution. Submit your Excel file (the same file for question 3.5, named
FirstnameLastname_ps5_TFinitials.xls), in which only the answer report should be
generated. Name the corresponding worksheet and answer report (NOT the Excel file) as
“3.7-MOMA-Mutant” and “3.7-MOMA-Mutant-Answer Report”, respectively. Hint:
You will need the answer from question 3.5 as input for this part; use the Euclidean
distance between two vectors to define the deviation and solve a quadratic programming
(QP) problem. (7 pts)
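The deviation named in the hint can be evaluated directly once two flux vectors are available. This is a minimal sketch of the distance calculation only, with a made-up function name; it does not solve the quadratic program, which still needs the constraints of the mutant network.

```python
import math


def metabolic_adjustment(v_wildtype, v_mutant):
    """Euclidean distance between a wild-type and a mutant flux vector."""
    if len(v_wildtype) != len(v_mutant):
        raise ValueError("flux vectors must have the same length")
    return math.sqrt(sum((m - w) ** 2 for w, m in zip(v_wildtype, v_mutant)))
```

Because the square root is monotonic, minimizing the sum of squared differences yields the same flux distribution as minimizing the distance itself, which is why the problem can be posed as a QP.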