Lab 3: Data analysis of a complex HPLC/MS/MS data set

One of the primary uses for HPLC/MS/MS systems is the analysis of complex samples. In biology and biochemistry, the protein composition of a cellular extract presents an extremely complex problem. With thousands of proteins in each cell at widely varying concentrations, much research is being done to find effective ways to sample deep into the proteome.
Proteins can be studied directly by mass spectrometry, but the information we can gather is limited. In electrospray sources, a typical protein can carry charges well into the double digits (+10 or higher). To simplify the ion being studied at any given time, it is useful to break the protein up into smaller pieces, called peptides. This is done by digestion with a protease, and one protease that is used frequently is trypsin. A protein is a long chain composed of the 20 amino acids. Trypsin cleaves this chain after every lysine and arginine, except those immediately followed by proline. A trypsin digestion generates what we call tryptic peptides.
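The cleavage rule above can be sketched as a short in-silico digestion. This is a minimal illustration of the rule, not Mascot's implementation; the function name `tryptic_digest` and its `max_missed` parameter (which mirrors the "Allowed Missed Cleavages" search setting described later) are hypothetical.

```python
def tryptic_digest(protein: str, max_missed: int = 0) -> list[str]:
    """Digest a protein with the trypsin rule: cleave after K or R,
    except when the next residue is P. max_missed is the number of
    internal cleavage sites a peptide may skip (missed cleavages)."""
    # Positions just after each K/R that is not followed by P
    sites = [i + 1 for i in range(len(protein) - 1)
             if protein[i] in "KR" and protein[i + 1] != "P"]
    # Peptide boundaries, including the two protein termini
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


print(tryptic_digest("DFETLKVDFLSK"))                # ['DFETLK', 'VDFLSK']
print(tryptic_digest("DFETLKVDFLSK", max_missed=1))  # ['DFETLK', 'DFETLKVDFLSK', 'VDFLSK']
```

DFETLKVDFLSK, one of the peptides studied later in this lab, contains an internal lysine, so it only appears as a whole peptide when at least one missed cleavage is allowed.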
A complex proteome takes months to analyze and is therefore too complicated for a single lab experiment. Here we will use a standard software package to “search” our data and, using those results, select a couple of cases for further analysis.
I. Generating Mascot Search Results
Mascot is a software package that takes protein sequences, chops them into peptide sequences based on cleavage rules, and then compares the masses of the theoretical fragments of those peptides to the observed masses in an MS/MS spectrum. As we will see later, this is something we can do manually for a single MS/MS spectrum, but it would completely consume us if we attempted to manually assign the 1.2 million MS/MS spectra in complex data sets.
Here we will be studying MPS1, a single human protein. You will find the .RAW file and the .mgf file for this data in C:\5181_lab3. First, open the raw file by double-clicking on it. Notice how complex the chromatogram is. This is not something we want to analyze manually.
The raw file represents a single HPLC run coupled to alternating Orbitrap MS and LTQ MS/MS scans. The instrument is set to select the five most intense peaks from each MS scan and isolate them in succession in the LTQ for MS/MS fragmentation. There is also a Dynamic Exclusion setting, under which a peak can be excluded from selection for MS/MS: if a peak is sequenced twice within 30 s, that mass is excluded for 3.5 min.
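The top-five selection with dynamic exclusion can be modeled in a few lines. This is a simplified sketch, not the instrument firmware: the m/z matching tolerance `MZ_TOL` is an assumed value, and details such as exclusion list size, ppm-based windows, and charge-state screening are ignored. All names are hypothetical.

```python
from dataclasses import dataclass, field

TOP_N = 5               # five most intense peaks per MS scan
REPEAT_COUNT = 2        # sequenced twice...
REPEAT_WINDOW_S = 30.0  # ...within 30 s...
EXCLUSION_S = 3.5 * 60  # ...excludes that mass for 3.5 min
MZ_TOL = 0.01           # assumed m/z matching tolerance (hypothetical)


@dataclass
class DynamicExclusion:
    selections: list = field(default_factory=list)  # (time_s, mz) of past MS/MS
    excluded: list = field(default_factory=list)    # (excluded_until_s, mz)

    def is_excluded(self, mz: float, t: float) -> bool:
        return any(abs(mz - x) <= MZ_TOL and t < until for until, x in self.excluded)

    def record(self, mz: float, t: float) -> None:
        self.selections.append((t, mz))
        # Count selections of this mass inside the repeat window (including this one)
        recent = [x for ts, x in self.selections
                  if abs(x - mz) <= MZ_TOL and t - ts <= REPEAT_WINDOW_S]
        if len(recent) >= REPEAT_COUNT:
            self.excluded.append((t + EXCLUSION_S, mz))


def select_precursors(t: float, peaks: list, dex: DynamicExclusion) -> list:
    """Return up to TOP_N m/z values, most intense first, skipping excluded ones.
    peaks is a list of (mz, intensity) pairs from one MS scan at time t (s)."""
    chosen = []
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == TOP_N:
            break
        if not dex.is_excluded(mz, t):
            chosen.append(mz)
            dex.record(mz, t)
    return chosen
```

Feeding the same toy scan in repeatedly shows the effect: the five strongest peaks are picked twice, then excluded, so the sixth, weaker peak gets its turn until the 3.5 min exclusion expires.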
On the desktop is a shortcut to a webpage. Double-click on it. It will bring up a window like the one shown in Figure 1. You are now linked to our mass spec webserver at bluemoon.colorado.edu/mascot. There are several things we want to set up correctly before pressing “Start Search…”
1. Your name – This will help you identify your searches later.
2. Search title – This is perhaps even more useful for finding your search at a later date.
3. Enzyme – There are many proteases one could use to cut up the protein. We used trypsin, so it should be the selection here.
4. Allowed Missed Cleavages – Sometimes the proteolysis is incomplete. This setting tells Mascot how many missed cleavage sites are allowed within a single peptide assignment.
5. Fixed Modifications – Common practice to facilitate the trypsin digestion is to reduce the disulfide bonds and alkylate the resulting free cysteine thiols. We use iodoacetamide, which creates the Carbamidomethyl modification on cysteines. This amounts to adding ~57 daltons (57.021 Da monoisotopic) to each cysteine residue we observe. (This may be important later.)
6. Variable Modifications – Unlike the fixed modifications, we don’t know whether or not these have occurred. In this case, we are curious about possible phosphorylation of the hydroxyl-containing residues (serine, threonine, and tyrosine). Please select PhosphoNL (STY) on this list.
7. Monoisotopic vs. Average – This depends on how the data was collected. The data you will be searching is monoisotopic.
8. Data File – Here is where you put the mgf file. Select Browse and find it in the 5181_lab3 directory.
9. Start Search – And off we go….
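The fixed and variable modifications change the peptide masses that Mascot compares against the data. The sketch below computes a neutral monoisotopic peptide mass with carbamidomethyl on every cysteine and an optional number of phosphates, using standard monoisotopic residue masses. The function name is hypothetical, and the neutral-loss scoring implied by PhosphoNL is not modeled.

```python
# Standard monoisotopic amino acid residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565            # added once for the peptide termini
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # fixed modification on every C
PHOSPHO = 79.966331          # variable modification (HPO3) on S, T or Y


def peptide_mass(seq: str, n_phospho: int = 0) -> float:
    """Neutral monoisotopic mass M with carbamidomethyl-C and n_phospho phosphates."""
    if n_phospho > sum(seq.count(aa) for aa in "STY"):
        raise ValueError("more phosphates than S/T/Y residues")
    mass = sum(RESIDUE[aa] for aa in seq) + WATER
    mass += CARBAMIDOMETHYL * seq.count("C")
    mass += PHOSPHO * n_phospho
    return mass


m = peptide_mass("TPSSNTLDDYMSCFR")
print(round((m + 2 * PROTON) / 2, 2))  # m/z of the +2 ion: 897.38
```

The +2 value reproduces the 897.38 quoted for TPSSNTLDDYMSCFR in Section III; leaving out the carbamidomethyl term would shift it by about 28.5 m/z.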
After some clicking and whirring, the software will produce a search results page. The top of this page looks like Figure 2. Before looking at the data, we need to format it. There are three things to attend to before looking at peptide assignments. In 1, we want to select “peptide summary”, and in 2, we want the ion score cutoff to be 20. This will remove most of the false positive hits (MS/MS spectra that have been assigned to the wrong peptides). Now click the “format as” button. When the page reloads, you will see, as the first entry below this window, what is highlighted in 3. Click on this to see the sequence coverage. (Question 1)
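Sequence coverage is the percentage of the protein's residues that fall inside at least one identified peptide. A minimal sketch of that idea follows; the function name is hypothetical, and Mascot's own rules for which matches count toward coverage may differ.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percent of residues in protein covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in set(peptides):
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of this peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


print(sequence_coverage("ACDEFGHIKL", ["ACD", "DEF", "KL"]))  # toy input: 70.0
```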
Now we are ready to look at peptides. As you scroll down you will find a giant pile of assignments. How can we use this? Now is the time to select two distinct cases. The first is a peptide that received high-scoring hits (Mowse score greater than 50) in both its phosphorylated and unphosphorylated forms. The second is any peptide with a Mowse score higher than 100 and a charge state of +2. (There are a few of these, and some have many hits over 100; in those cases, select the highest-scoring hit to study.) I selected DFETLKVDFLSK as my phos/unphos case and TPSSNTLDDYMSCFR as my over-100 case, so choose different ones! (One set per person in each group.) Pick yours now. (Question 2)
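The cutoff and the two selection criteria can be written as simple filters over the peptide list. This is a sketch assuming you have copied each hit into a record by hand; the `Hit` fields, the function names, and the inclusive treatment of the score-20 cutoff are assumptions, and the example hits are made-up placeholders, not data from this lab.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One peptide match copied from the Mascot peptide summary (hypothetical fields)."""
    sequence: str
    phospho: bool
    charge: int
    score: float
    scan: int


def apply_cutoff(hits, cutoff=20):
    """Drop low-scoring matches, roughly like the ion score cutoff (assumed inclusive)."""
    return [h for h in hits if h.score >= cutoff]


def phos_unphos_cases(hits, min_score=50):
    """Sequences scoring above min_score in both phosphorylated and unphosphorylated form."""
    strong = [h for h in hits if h.score > min_score]
    phos = {h.sequence for h in strong if h.phospho}
    unphos = {h.sequence for h in strong if not h.phospho}
    return sorted(phos & unphos)


def best_doubly_charged(hits, min_score=100):
    """Highest-scoring +2 hit above min_score for each sequence."""
    best = {}
    for h in hits:
        if h.charge == 2 and h.score > min_score:
            if h.sequence not in best or h.score > best[h.sequence].score:
                best[h.sequence] = h
    return best


# Made-up placeholder hits, only to show the calls
hits = [Hit("PEPTIDEK", False, 2, 65, 1001), Hit("PEPTIDEK", True, 2, 58, 1010),
        Hit("SAMPLER", False, 2, 120, 1500), Hit("SAMPLER", False, 2, 140, 1520)]
print(phos_unphos_cases(apply_cutoff(hits)))    # ['PEPTIDEK']
print(best_doubly_charged(hits)["SAMPLER"].scan)  # 1520
```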
Figure 3 shows how we can extract the relevant information from the hit data. For the further analysis, we need the observed m/z, the observed MW (of the peptide), and the scan number it originated from (this identifies the exact MS/MS spectrum that yielded this peptide hit). Once this is done, we can move away from Mascot and look at our data manually in Qual Browser.
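The observed m/z and the observed MW are tied together by the charge state z. A short worked relation, assuming the observed MW is the neutral (uncharged) peptide mass M and using the proton mass m_p ≈ 1.00728 Da; the same relation gives the m/z expected for any other charge state:

```latex
\begin{aligned}
M &= z\,(m/z)_{\text{obs}} - z\,m_p \\
(m/z)_{z} &= \frac{M + z\,m_p}{z} \\
\text{MH}^{+} &= M + m_p \\
z = 2,\ (m/z)_{\text{obs}} = 897.38:\quad M &= 2(897.38) - 2(1.00728) \approx 1792.75\ \text{Da}
\end{aligned}
```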
** From here, I will take you through a sample analysis. Once you understand what
I am doing, you should do this analysis for your selected peptides.
II. Determining the % Phosphorylation of a Peptide
First we will look at the DFETLKVDFLSK peptide. We must isolate the chromatographic peak that represents this peptide, and to do this we create an extracted ion chromatogram (XIC). (Question 3)
In Qual Browser, select the chromatogram window by pushing in the pushpin in the upper right corner (it will turn green). Right-clicking brings up a menu with several selections. Select the “Ranges…” option and the Chromatogram Ranges window is displayed (see Figure 4). The fields we will use are:
1. Time range – This text box allows us to narrow the displayed time range.
2. Plots display – Here we can select one or multiple plots to be displayed within the chromatogram window. Each plot has a distinct set of properties in the Plot properties box below.
3. Scan filter – A single data set contains both MS scans and MS/MS scans. The scan filter allows you to display only certain scan types.
4. Plot type – This determines how the plotting algorithm calculates the intensity at every time point. The two commonly used selections are Base Peak and TIC. Base Peak defines the intensity as the height of the most intense peak in the mass spectrum at that point. TIC sums all the intensity to give a Total Ion Chromatogram. Base Peak is more useful for finding the large chromatographic peaks, whereas the TIC is useful for finding regions with many lower-intensity ions of different masses.
5. Range(s) – This text box is only active in Base Peak mode and allows us to specify a mass range from which the algorithm must select the base peak. With a very narrow range, we can generate an eXtracted Ion Chromatogram (XIC).
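The difference between the TIC, Base Peak, and XIC plot types is easiest to see as arithmetic on the scans. A sketch assuming the MS scans are available as (retention time, list of (m/z, intensity)) pairs; this data layout and the function name are hypothetical and are not how Qual Browser stores data.

```python
def chromatogram(scans, mode="TIC", mz_range=None):
    """Build an intensity-vs-time trace from MS scans.

    scans: list of (retention_time, [(mz, intensity), ...]) for MS scans only,
           i.e. what a scan filter restricted to MS scans would leave.
    mode: "TIC" sums all intensity; "BasePeak" takes the most intense peak.
    mz_range: optional (low, high); a narrow range in BasePeak mode gives an XIC.
    """
    if mode not in ("TIC", "BasePeak"):
        raise ValueError(f"unknown mode {mode!r}")
    trace = []
    for rt, peaks in scans:
        if mz_range is not None:
            low, high = mz_range
            peaks = [(mz, i) for mz, i in peaks if low <= mz <= high]
        intensities = [i for _, i in peaks]
        if mode == "TIC":
            trace.append((rt, sum(intensities)))
        else:
            trace.append((rt, max(intensities, default=0.0)))
    return trace


# Toy scans: a large background ion at m/z 500 and a small peptide ion near 897.38
scans = [(10.0, [(500.0, 100.0), (897.38, 40.0)]),
         (10.1, [(500.0, 80.0), (897.381, 90.0)]),
         (10.2, [(500.0, 60.0)])]
print(chromatogram(scans, "BasePeak", mz_range=(897.37, 897.39)))  # the XIC
```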
From the calculated m/z values for the different charge states, we can generate XICs that are stacked for easy comparison. The inset of Figure 5 shows what this looks like. The mass accuracy of the Orbitrap scans is very high, so we use a very tight mass window around our calculated m/z values (± 0.01 m/z). This yields the plot shown in the chromatogram of Figure 5. An interesting feature of this plot is the stacking of two or more peaks at the same retention time, as different charge states of the same peptide are represented. This makes it easy to pick out the peak most likely to represent our peptide. Figure 5 shows the XIC for all charge states of unphosphorylated DFETLKVDFLSK. For further confirmation, we can check the scan numbers included in these peaks against the scan number of our positive assignment. Typically, the scan number of the assigned MS/MS falls early in the corresponding peak. (Question 4)
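The calculated m/z values and their ± 0.01 windows can be generated from the neutral peptide mass. A sketch in which the function name and the default set of charge states (+1 to +4) are assumptions; the example mass, about 1440.75 Da, is the neutral monoisotopic mass of unphosphorylated DFETLKVDFLSK computed from standard residue masses.

```python
PROTON = 1.007276  # Da


def xic_windows(neutral_mass: float, charges=(1, 2, 3, 4), tol: float = 0.01):
    """Return (charge, low_mz, high_mz) for a +/- tol window around each charge state."""
    windows = []
    for z in charges:
        center = (neutral_mass + z * PROTON) / z  # m/z of [M + zH]^z+
        windows.append((z, center - tol, center + tol))
    return windows


for z, low, high in xic_windows(1440.75):
    print(f"+{z}: {low:.3f} to {high:.3f}")
```

Each window can be typed into the Range(s) box as a separate plot, which produces the stacked XICs described above.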
With our XIC ready, we can now integrate and label our peaks so we can read off and interpret those values (this process is shown graphically in Figure 6). First, select Peak Detection → Toggle Detection in All Plots… This will integrate all the displayed chromatographic peaks. Second, select Display Options… from the pull-down menu. There are several tabs here, but the one we want is Labels. In the Labels tab, check Area and Height to display them on the plot. Click OK, and the chromatogram is ready for reading off the data. Record the area and height for each charge state of each peak that represents the peptide. We can now repeat this process for the phosphorylated peptide. (Questions 5 – 8)
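One way to turn the recorded areas into a percent phosphorylation is to sum the areas over all charge states of each form and take the phosphorylated fraction. This formula is an assumption, not stated in the handout: it treats the phosphorylated and unphosphorylated peptides as ionizing equally well, which is only an approximation. Peak heights could be substituted for areas. The function name and input values are hypothetical.

```python
def percent_phosphorylation(phos_areas, unphos_areas):
    """Percent of the peptide signal in the phosphorylated form, from XIC peak
    areas summed over all charge states of each form (assumes equal response)."""
    phos, unphos = sum(phos_areas), sum(unphos_areas)
    total = phos + unphos
    if total == 0:
        raise ValueError("no signal recorded for either form")
    return 100.0 * phos / total


# Made-up areas for the +2 and +3 peaks of each form
print(percent_phosphorylation([1.0e6, 2.0e6], [6.0e6, 3.0e6]))  # 25.0
```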
A sample data sheet from Excel for the DFETLKVDFLSK peptide is shown in Figure 7.
III. Manual validation of a peptide hit
Now to our high-scoring peptide. As shown in Figure 8, we first generate an XIC for the peptide, but this time we remove the scan filter. This shows all scans, both MS and MS/MS, on a single plot. From the scan number we found in the Mascot search, we know where the MS/MS was taken. The MS scan immediately preceding the assigned MS/MS contains the precursor peak that was selected and gave the hit. In the case of my peptide, TPSSNTLDDYMSCFR, the m/z of the +2 ion is 897.38. You can see that it is far from the most intense peak in the MS scan, but it is not MS intensity that makes a good assignment; it is MS/MS quality. This MS scan produced 5 MS/MS scans (as they all do in this experiment). This is shown in Figure 10, where I have zoomed in on the time region where the peak was assigned. The 5 MS/MS spectra are of varying quality, some with many fragments and some with very few. A high-scoring assignment, however, will look something like the top MS/MS in Figure 10, which is the MS/MS that received the good assignment for TPSSNTLDDYMSCFR.
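Finding the MS scan that supplied the precursor is a backwards walk from the assigned MS/MS scan number to the nearest MS scan. A sketch assuming you have a mapping from scan number to MS level; the mapping and function name are hypothetical, and the toy layout mimics this experiment's one MS scan followed by five MS/MS scans.

```python
def precursor_scan(ms_levels: dict[int, int], msms_scan: int):
    """Return the closest MS (level 1) scan number before msms_scan, or None."""
    for scan in range(msms_scan - 1, min(ms_levels) - 1, -1):
        if ms_levels.get(scan) == 1:
            return scan
    return None


# Toy layout: one MS scan followed by five MS/MS scans, repeated
levels = {n: 1 if (n - 100) % 6 == 0 else 2 for n in range(100, 118)}
print(precursor_scan(levels, 104))  # 100
```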
Our next step is to make 2 copies of the MS/MS spectrum (simply print with the mass spectrum selected and choose “selected cell only” and “one page”; also make sure that they print in landscape rather than portrait). From this point, our analysis moves away from the computer to a table with our spectra, a pen or pencil, and a calculator.
De Novo Sequencing (Question 9)
The first of the two annotation methods we will be looking at is the way things
were done in the beginning of peptide mass spectrometry. It is still useful,
however, for validation and for finding identifications for high quality yet
unassigned spectra. The method comes from an understanding of how the
peptide backbone fragments. As we move along the chain, there are three places
the peptide can be cleaved:
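[Diagram: fragment ion nomenclature along the peptide backbone, -NH-CHR-CO-NH-CHR-CO-. Cleavage of the CHR-CO bond gives a/x ions, cleavage of the CO-NH peptide bond gives b/y ions, and cleavage of the NH-CHR bond gives c/z ions; a, b, and c ions contain the N-terminus, while x, y, and z ions contain the C-terminus.]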
The abcxyz designation will be useful in the next section, but for now it is useful to know that the most favorable place for cleavage is the peptide bond (recall that a peptide bond is formed when the carboxylic acid of one amino acid reacts with the amine of another to form an amide). If a peptide cleaves at successive peptide bonds, the mass values in the fragmentation spectrum will be spaced according to the masses of the amino acids. This is demonstrated in Figure 11. While some experience helps in this process, it is best to start with the most intense peaks above the parent m/z (the m/z of the ion that was fragmented). For our doubly charged parent, this lets us consider only singly charged fragment ions, because all doubly charged fragments will fall below the parent m/z. I began by finding the difference between the most intense peak, 1093, and the next most intense peak higher in mass, 1206. The difference between these is 113, which corresponds to either leucine or isoleucine. From there I move on to the next peak and find the difference, 101, which is threonine. I repeat this several times to find a TAG sequence. When we already have a peptide assignment, this TAG sequence lets us get our bearings. I found I/LTNSS as my initial TAG. In the sequence TPSSNTLDDYMSCFR, I see this sequence in reverse towards the beginning of the peptide (these fragments contain the C-terminus, so each step up in mass adds the next residue toward the N-terminus). Now my job becomes easier. If the major peaks from 1093 to 1595 represent those amino acids, then the next lower peak should be less by the mass of D, 115. I subtract 115 from 1093 and get 978. I can repeat this process to follow the ladder all the way down to 482. I have now assigned this spectrum, with fair certainty, to the peptide sequence TPSSNTLDDYMSCFR. While this technique was fairly straightforward for this high-scoring case, it can be quite difficult in lower-scoring cases. Also, for unidentified spectra, it can sometimes feel like wandering through a corn maze. Anyhow, on to the second method.
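The gap-reading step can be sketched as matching each difference between consecutive ladder peaks to the residue masses. This assumes a singly charged ladder given in ascending m/z, nominal peak values like those read off the printout (hence the assumed ± 0.5 tolerance), and carbamidomethylated cysteine; the function name is hypothetical.

```python
# Monoisotopic residue masses (Da); C includes the fixed carbamidomethyl (+57.02146)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_tag(peaks, tol=0.5):
    """For each gap between consecutive peaks (ascending m/z, singly charged
    ladder), list the residues whose mass matches the gap within tol."""
    tag = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        hits = [aa for aa, mass in RESIDUE.items() if abs(gap - mass) <= tol]
        tag.append("/".join(hits) if hits else "?")
    return tag


print(read_tag([1093, 1206, 1307, 1421, 1508, 1595]))  # ['L/I', 'T', 'N', 'S', 'S']
```

At this tolerance some gaps stay ambiguous (L/I always, and Q/K near 128), which is one reason a tag is used to get your bearings rather than as a final answer.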
Annotation of the spectra according to specific ions (Question 10)
In this method, we calculate the masses of a specific set of ions and then annotate the spectrum. In the diagram of potential cleavages above, each cut is marked by a vertical line, and the horizontal lines show which fragment carries the observed charge. The best-behaved cleavage and charging gives rise to y ions. A strong y-ion series is typically what leads to high-scoring spectra, so we will see if we can find the y ions in our spectrum. We calculate the masses of the y ions by starting with the first amino acid and assigning it the parent mass, MH+ (your calculated neutral peptide mass + 1, for the added proton). We then move along the amino acid sequence, subtracting each amino acid's mass in turn. In the end you will have a table like the one in Figure 12. To finish off the table, number the ions starting from the lowest mass (y1) and going back up. We should be able to assign these y ions to peaks in the spectrum based on these calculated masses. I have shown how this is done in Figure 12.
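The subtraction procedure translates directly into a loop. A sketch assuming singly charged y ions, monoisotopic masses, and carbamidomethylated cysteine; the function name is hypothetical.

```python
# Monoisotopic residue masses (Da); C includes the fixed carbamidomethyl (+57.02146)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def y_ion_table(seq):
    """Return (label, fragment, m/z) for singly charged y ions, largest first.

    Start from MH+ for the whole peptide and subtract residues from the
    N-terminal end one at a time, as described in the text."""
    mz = sum(RESIDUE[aa] for aa in seq) + WATER + PROTON  # MH+ = y_n
    n = len(seq)
    table = []
    for i, aa in enumerate(seq):
        table.append((f"y{n - i}", seq[i:], round(mz, 2)))
        mz -= RESIDUE[aa]  # remove this fragment's N-terminal residue
    return table


for label, fragment, mz in y_ion_table("TPSSNTLDDYMSCFR")[-3:]:
    print(label, fragment, mz)
```

For TPSSNTLDDYMSCFR, y3 (CFR) comes out near 482.22 and y8 (DDYMSCFR) near 1093.41, matching the 482 and 1093 peaks used in the de novo walkthrough.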
Summary
Well, in this lab we learned how to deal with a real complex data set. We know
how to work with chromatograms and mass spectra of peptides in detail. There is
much more to learn in this field, but for now, this will give you a good introduction
to how biochemists use mass spectrometry to deal with the complexities of
cellular life.