Profinder_Webinar_w_Edits_Final_17Dec13
And now to today's presentation.
We are pleased to have Dr. Theo Sana joining us as our presenter. Dr. Sana is currently a Senior
Scientist in Integrated Metabolomics and Proteomics Applications at Agilent Technologies.
He spent several years in the systems biology group, developing intact protein and metabolomics
workflows using multi-dimensional LC and mass spectrometry. Dr. Sana's current interest is
applying his experience in transcriptomics, proteomics, and metabolomics to help Agilent
advance integrative multi-omic solutions for our customers.
And it looks like we're ready to begin. It gives me great pleasure to welcome Dr. Theo Sana. You
may begin, Theo.
Thank you very much for the introduction, Joan. And welcome to today's webinar for
MassHunter Profinder, the latest feature extraction software tool belonging to Agilent's suite of
MassHunter software.
Now today, I'll introduce you to some of the exciting capabilities of Profinder. And through a
series of slides and brief software demos, hopefully I'll be able to convince you of the advantages
of using Profinder in your metabolomics research.
Profinder is truly the first profiling-centric software workflow for feature extraction. It is the only
commercially available batch data processing tool that's been optimized to minimize false
positive and false negative results.
I'll show you how simple it is to get started and to review your results in only four windows, as
shown on the screen, giving you the control you need in order to step through the compound
groups efficiently.
We continuously listen to our customers' requests for improvements and demands for software
product quality. So the question may be, what is in it for me, using this software tool?
Now, a substantial proportion of metabolomics research is based on LC/MS and GC/MS
workflows, just based on the publications. And so, we decided, a couple of years ago, to
optimize the software, the MassHunter Qual MFE software, so that it can better support batch
processing.
Our customers requested more sophisticated feature extraction software with which they can extract
multiple data files, robustly, for discovery-based LC/MS analyses. And it should support
both untargeted, as well as targeted analyses.
Profinder can be used, for example, in nutrition research, just as much as in biomarker discovery.
The net result is that all these efforts have led to an approximately 50% share of the market for
Agilent in metabolomics. And we continue to improve by offering faster, better data processing
software across all markets.
The Agilent metabolomics post acquisition workflow is summarized simply in five distinct
stages. Each one, of course, is very important, and you're familiar with them, but feature finding
is really key here.
As the old saying goes, garbage in, garbage out. And we really want to minimize that. To do that,
we've come up with batch feature finding. So it was very important for our software team to
design new features in the feature extraction tool that could handle many files at a time. So that
when you went to differential analysis in Mass Profiler Professional, for example, you could
annotate and identify your metabolites more confidently, and then map the results into pathways.
In this particular example, you could see that for a comprehensive metabolomics solution to
work, we've got two main processing software tools. One, of course, is MassHunter Qual, in
which we have molecular feature extraction. And that's very good for looking at data that has MS
and MS/MS where you want to extract MS/MS spectra, and for that kind of processing. And it's
really optimized for a few files, and you can go into depth.
MassHunter Profinder, on the other hand, is this new tool that can do batch extraction of multiple data
files at the MS1 level. And the export portion of the results, the
compound exchange file, can then also be used in Mass Profiler Pro. So both MassHunter Qual
and Profinder produce a common shared file that can be read by Mass Profiler Professional, where you
do all your statistics and pathway analysis.
Now, I've shown GC/MS over here, and that's because even though we're rolling out with
LC/MS support in Profinder, there are plans in place to support GC/MS data as well, in batch
format in Profinder in the next release after that. So we'll be supporting LC/MS and GC/MS
workflows.
Well, we're all aware of the hurdles that we face when we're using feature extraction software.
For our customers, the main goal is to find and correctly extract all chromatographic peaks
in a sample. But the challenges are many.
There's incomplete peak separation, and we face unresolved peaks that can contribute to false
peak detection, excessive missing values, and incorrect identification. All of this can lead to
wasted efforts and decreased productivity. And moreover, even false biomarkers. So we need a
way to minimize that.
What is the MassHunter Profinder workflow solution then? What's new about it?
It's a one-shot process for untargeted and targeted feature extraction under one roof. It has
unprecedented visibility into the feature extraction process, called chemometric profiling, giving
the user greater control.
There's also a new isotope grouper that recognizes the isotope pattern of all the common organic
elements, except for halogens, which have a different pattern. But if you want to use an isotope
grouper with halogen capability, it supports that as well.
It's designed to process many samples. Feature extraction is capable of handling large data sets
simultaneously. In one test, approximately half a million features, 500,000 features, were
extracted with a computer that had 16 gigabytes of RAM. And obviously, the higher the amount
of RAM your computer has, the more features you'll be able to extract.
By including the novel recursive batch molecular feature extractor, or RMFE, we can now
perform a cross-sample analysis, and use a consensus spectrum at two levels, both at the MFE level
and at the find by ion level, which is a separate algorithm. This results in higher quality results.
A compound centric workflow applied across large data sets means the user can manually review
results in Batch mode. And there's only a maximum of four windows. That's a friendly graphical
user interface. And really importantly, it reduces processing time and it's free to our Mass
Profiler Pro customers.
So how do we get started?
Well, in the first example, I'm going to show you a brief video of how you can add or remove
samples and assign grouping information right at the beginning. This will help you later
downstream, when you want to filter and do the overlay of the plots in order to do the editing.
So I'll start the video.
[VIDEO PLAYBACK]
-Importing data files into MassHunter Profinder is really straightforward. You navigate to the directory
where the files are located, and you select the files. In this case, I have 12 with four conditions.
There's three replicates per condition. Simply load the files, and now you can start to assign the
grouping information. As I have three replicates, I will either type them in or you can fill it
down.
-Once you've assigned all the grouping information for your replicates, so CA, CC, FK, Wild
Type-- WT. You simply press OK, and the 12 data files are imported into the program, and
you're ready to perform batch recursive feature extraction.
[END VIDEO PLAYBACK]
OK, so I'm back live again.
The next window that you see, once you've imported your data and assigned the groups, is a
wizard that allows you to select an algorithm, and you are given the choices of three. In effect,
there's two main algorithms. Batch recursive feature extraction, and batch targeted feature
extraction, which I'll describe a little bit later.
But the batch molecular feature extraction is an evolution of molecular feature extraction, which
we use in MassHunter Qual. And it's recursive because it can build a consensus spectrum for each
of the compounds and then go ahead and re-extract the original data files. This reduces the
number of false positives.
So, a good practice would be to do batch molecular feature extraction first on your files, optimize
the settings through a series of wizards, and then go to batch recursive feature extraction, which
is a combination of both batch MFE and find by ion.
Now, how does feature extraction work?
Well, in batch molecular feature extraction, we use the traditional MFE on each individual data
file. From the raw data, it finds all the co-eluting ions that are related. So for those who are not
familiar with this process, it finds all the co-eluting ions, isotopes, adducts, such as sodium,
potassium, and dimers, and groups them into a single feature, and it integrates the values for all
of those different ions into one value.
So the way MFE works is different from some other programs where each feature is just an ion.
So in this case, a feature has the attributes of retention time, an integrated mass, and also, an m/z.
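As a rough illustration of the grouping idea just described, and not Agilent's actual MFE algorithm, the sketch below bins ions whose retention times fall within a tolerance and sums their abundances into a single feature value. The Ion and Feature types, the rt_tol parameter, and the example numbers are all hypothetical. A real extractor would also check that the masses are related as isotopes, adducts, or dimers, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Ion:
    mz: float         # mass-to-charge ratio
    rt: float         # retention time in minutes
    abundance: float  # integrated signal for this ion


@dataclass
class Feature:
    rt: float         # abundance-weighted mean retention time
    abundance: float  # summed abundance of all grouped ions
    ions: list        # the ions that were grouped into this feature


def _to_feature(ions):
    """Collapse a list of co-eluting ions into one feature."""
    total = sum(i.abundance for i in ions)
    rt = sum(i.rt * i.abundance for i in ions) / total
    return Feature(rt=rt, abundance=total, ions=list(ions))


def group_coeluting(ions, rt_tol=0.05):
    """Group ions whose retention time is within rt_tol of the group's first ion."""
    features = []
    current = []
    for ion in sorted(ions, key=lambda i: i.rt):
        if current and ion.rt - current[0].rt > rt_tol:
            features.append(_to_feature(current))
            current = []
        current.append(ion)
    if current:
        features.append(_to_feature(current))
    return features


# Hypothetical ions: three co-elute near 2.5 min, one elutes later.
ions = [
    Ion(132.1019, 2.50, 1000.0),  # assumed protonated ion
    Ion(133.1053, 2.51, 70.0),    # assumed isotope peak
    Ion(154.0838, 2.50, 200.0),   # assumed sodium adduct
    Ion(175.1190, 4.10, 500.0),   # unrelated ion
]
features = group_coeluting(ions)
for f in features:
    print(f"rt={f.rt:.2f} abundance={f.abundance:.0f} ions={len(f.ions)}")
```

The point of the sketch is only that one feature stands for one compound: the three co-eluting ions end up as a single row with one summed value, rather than three separate rows.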
So, we create a compound chromatogram from that group of ions, and one feature, then, is one
compound. But in batch molecular feature extraction, which you use in Profinder, we're now
aligning the features across all sample files to build a consensus spectrum. This allows us to do
the recursive analysis and re-extract the batch files, the original data files. And this greatly
increases the quality of the results that you get.
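The alignment and recursion step can be pictured with a small sketch, assuming a simple mass and retention time tolerance match. This is an assumption made for illustration, not the algorithm Profinder uses. The function names, tolerances, and file names below are hypothetical. The sketch aligns features across files, averages them into consensus targets, and lists the files where a target was not found, which are the files a recursive pass would go back to.

```python
from statistics import mean


def build_consensus(samples, mass_tol=0.005, rt_tol=0.1):
    """Align (mass, rt) features across files and average them into targets.

    samples maps a file name to a list of (mass, rt) tuples.
    """
    groups = []
    for file_name, features in samples.items():
        for mass, rt in features:
            for group in groups:
                if (abs(mean(group["masses"]) - mass) <= mass_tol
                        and abs(mean(group["rts"]) - rt) <= rt_tol):
                    break  # feature belongs to this existing group
            else:
                group = {"masses": [], "rts": [], "files": set()}
                groups.append(group)
            group["masses"].append(mass)
            group["rts"].append(rt)
            group["files"].add(file_name)
    return [
        {"mass": mean(g["masses"]), "rt": mean(g["rts"]), "files": g["files"]}
        for g in groups
    ]


def missing_files(consensus, samples):
    """For each consensus target, list the files where it was not found."""
    return [sorted(set(samples) - target["files"]) for target in consensus]


# Hypothetical first-pass results for three replicate files.
samples = {
    "CA_1.d": [(131.0946, 2.50), (174.1117, 4.10)],
    "CA_2.d": [(131.0947, 2.52)],
    "CA_3.d": [(131.0945, 2.49), (174.1116, 4.12)],
}
consensus = build_consensus(samples)
to_recheck = missing_files(consensus, samples)
print(to_recheck)
```

Here the second target is missing from one replicate, so a recursive pass would re-extract that file using the consensus mass and retention time instead of reporting a missing value straight away.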
So when should I use batch MFE?
As I mentioned before, regular MFE doesn't work across multiple data files in MassHunter Qual.
Batch MFE is based on recursion across multiple data files. And it uses an average consensus
spectrum for re-extracting the data.
So as I mentioned, use batch MFE to optimize the settings prior to using batch recursive feature
extraction. I won't go through all the different wizards that you have for each of the algorithms,
because that's pretty technical and it's for another discussion.
But I can show you an example of how the batch feature extraction works, after I introduce you
to the workflow.
Batch recursive feature extraction, then, is a combination of our RMFE, recursive molecular
feature extraction, and find by ion. At the very end of that, you will get a big reduction in false
positives and false negatives, and you can then edit your compounds.
What's really unique about Profinder is that it has filtering at two different levels. One after batch
molecular feature extraction, and then other filtering parameters that you can set, after find by
ion. So this makes it highly robust.
After RMFE, what we end up with is the quality of your target list has increased, and that gets
fed into find by ion. And after you've done find by ion analysis-- and all this happens in the
background, by the way-- that improves the overall quality of your final compound group list,
and this reduces the amount of manual cleanup.
So I'll give you a brief introduction into the batch recursive feature extraction, and then we'll
regroup in just a couple of minutes.
[VIDEO PLAYBACK]
-In MassHunter Profinder, batch recursive feature extraction is the algorithm that one uses in
order to perform the most robust analysis. It is composed of two programs, actually. It contains
in it batch molecular feature extraction, which is shown above. And another program that's called
find by ion, essentially, find by formula. Both of them are combined into one program under
batch recursive feature extraction.
-There are eight steps in this combined program. Four of the steps are the same MFE parameters.
So you set the settings for those, and you can optimize them in batch molecular feature
extraction first, if you like.
-The advantage of doing that is that it's faster to run, and once you optimize the settings, you can
go to the next steps on Page 5, where the find by ion parameters begin.
-Essentially, you go through two steps of recursive analysis. So you can do two steps of post
alignment filtering, increasing your confidence in the results. Once you're satisfied that the
parameter settings you've chosen are good and optimized, you're ready to begin.
[END VIDEO PLAYBACK]
I'm back live again. And what you see in front of you, in this slide, is the results of batch
recursive feature extraction. This is an example showing four groups of treatments, or four
conditions. They're colored by the conditions-- black, red, green, and blue. And you'll see there's
multiple chromatogram overlays, which are the replicates for each of those groups.
And so that's how you can immediately see whether there's differences between treatments, and
also where there's differences between replicates, and go in and inspect them more closely, and I
will show you how you do that.
Now, as I mentioned before, there's four windows. In the top left-hand corner you have the
compound groups. That's the total number of compounds that were found across all the data files,
and each row is summarized to one compound group. And when you broadcast that compound
group to another table, below, it gives you the details for that compound.
So all the data files are listed on the far left column, and all the extraction information for each
individual data file for that particular highlighted compound above is shown. There's a flag that's
shown as green, if it's a good quality compound. And there's also a score associated with that. So
you can filter to rank based on those parameters as well.
And if you look to the right, then you have your EIC overlays in the chromatograms window,
and you can have control over that. You can see as many of the individual chromatograms as you
want. And you can then also inspect the mass spectra results to the right as well, and edit that if
you need to.
So here's another brief video, showing you the results of batch recursive feature extraction
example, and I'll start that now.
[VIDEO PLAYBACK]
-After batch recursive feature extraction has finished processing its job, the next step is to take
some time in reviewing the results.
-The results are summarized in four different tables that are related to each other or that are very
interactive. The compound groups table tells you the number of compounds that it found across
the 12 data files.
-In this case, these were yeast samples that were extracted. 1,435 compounds were found across
the 12 files. Each row, then, represents a compound, and below it we have the compound
details across the data files.
-So this is very nice for Batch mode where you can actually look at the data files, inspect the
scores and the area-- the height, the mass, and retention time very quickly. Under the
chromatogram results, you can overlay the chromatograms based on the sample grouping
information we entered earlier, or you can look at the compounds individually. So these are the
replicates for the first group, and the replicates for the next group on a different color, et cetera.
-And if you want to look at the mass spectral results, you can take a look here, zoomed in, and
look at the isotopes for each of the different compounds.
-I can also overlay all of the chromatograms for this particular compound for 12 data files like this,
so that I can see the difference in retention time, which is very minimal in this particular case.
-Now, you can sort based on the scores, so that you only look at the low scores and try and sort
those out, and that will save you time.
-If you want to look at all 12 chromatograms you can do that simultaneously. So you'd select 12,
and now you have a summary of all 12 chromatograms. In this scenario, we have four different
groups, with one or more of the replicates in one of the groups being integrated differently from
the consensus.
-So if you want to fix this, you can go switch into List mode, and it will show you the particular
replicate in which that happened, and you can override that if you want to. So you simply drag
across the area that you want to integrate, which is similar to the others, and right-click and
extract manual compounds, and now it has done that.
[END VIDEO PLAYBACK]
OK, so that was batch recursive feature extraction, a short demo on the capabilities. Obviously,
with the larger screen you'll be able to see many more rows, and also the mass spectral results,
too.
Now, the third algorithm is batch targeted feature extraction. By targeted, what we mean is you
have a list of formulae-- annotated formulae-- that reside in a database, or on an Excel
spreadsheet, or some other source where you keep them, or you download them from, so let's say
a pathway database.
Now, we support the compound exchange file format, the CSV file format, or Agilent's personal
compound database, or personal compound database in library format. Each of those can be
uploaded into Profinder, and then you can very specifically extract the compounds in your
database only. So you're not doing a totally untargeted, a priori, naive analysis. You're only
looking for those compounds in the database.
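To make the targeted idea concrete, here is a minimal sketch, assuming the database is just a name-to-formula mapping and that matching is done on neutral monoisotopic mass within a ppm window. The function names, the ppm_tol value, and the observed masses are hypothetical, and this is not how Profinder itself is implemented. Only C, H, N, and O are included in the element table to keep it short.

```python
import re

# Monoisotopic masses of the most abundant isotopes (subset of elements).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
}


def formula_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C6H13NO2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def match_targets(database, observed_masses, ppm_tol=10.0):
    """Return, per database compound, the observed masses within ppm_tol."""
    hits = {}
    for name, formula in database.items():
        target = formula_mass(formula)
        found = [
            m for m in observed_masses
            if abs(m - target) / target * 1e6 <= ppm_tol
        ]
        if found:
            hits[name] = found
    return hits


database = {"leucine": "C6H13NO2", "arginine": "C6H14N4O2"}
observed = [131.0947, 174.1115, 250.0000]  # hypothetical neutral masses
hits = match_targets(database, observed)
print(hits)
```

Anything in the data that does not correspond to a database entry, like the third mass here, is simply ignored, which is what separates this from an untargeted run.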
So, I'll show you an example of how that works in the next video.
[VIDEO PLAYBACK]
-One of the advantages of using batch targeted feature extraction is that you're using a database
of compounds that you already want to look for. In this particular case, I had 30 compounds in
the database, and a total of 11 were found in 12 files or fewer. In some cases, it found, for
example, arginine in seven files, and didn't find them in the remaining five.
-If we take the case of leucine, leucine was found in all 12 files. And then when we look at the
chromatogram results, we see that they're slightly different between sample group CA, CC, FK,
and Wild Type.
-Now, we can either look at these in a List mode, so you look at the individual different
chromatograms, or you can look at them, as I was looking them over here, and make a decision
as to whether you should be integrating both peaks or not.
-So, if for some reason, you think that the program has made the call incorrectly, you can simply
override it and only integrate the peak that is similar to the other groups. So that when you're
doing the differential analysis later on in your statistical software, such as Mass Profiler
Professional, that the areas are going to be based on the same retention time window and the
same peak heights.
[END VIDEO PLAYBACK]
So that was the last algorithm in Profinder. It's a relatively small program with three different
algorithms that processes large number of files very quickly, and it's really batch compound
centric, to give the user maximum flexibility in order to judge whether the extraction, and also
the integrations were correct. It really is a big advancement over the previous software that we
had.
What's the advantage?
Well, together with Mass Profiler Pro, it gives you a complete solution, a robust solution.
In terms of supported instruments, just as a summary, it supports LC/TOF and Q-TOF, with GC/MS
planned. The customer benefit is that you can load raw data files without any preconversion of
those data files before you load them. You can do both untargeted and targeted feature finding in
there. It gives you maximum flexibility for discovery profiling, whether in a targeted way, with a list
that you already have, or totally untargeted.
The ion grouping is compound centric, and this is key. It doesn't just look at single ions and have
them be an individual feature. We actually look at correlated ions and group them together. And
we use the concept of recursive analysis, all in situ, all in the background. The user doesn't see
that, and it gives you higher quality results.
So that before you go into Mass Profiler Professional, for example, or any statistical package,
you have your high quality results. You don't need to do recursion back and forth between--
ping-ponging between programs anymore. So this will reduce your time spent on the analysis.
There's a feature quality score in the compound details table. It's called a Q score, and that is a
combination of different metrics that Agilent uses to look at the quality of an EIC, and it helps
with the ranking approach. So the ones with very high scores, you can review later, but
immediately you want to look at the ones that are borderline scores, because you can set the
score and inspect those first.
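The ranking approach can be sketched in a few lines, assuming each compound carries a single numeric score. The field names, the 0 to 100 scale, and the threshold below are hypothetical, since the webinar does not say how the Q score is computed or scaled.

```python
def review_order(compounds, threshold=80.0):
    """Return compounds scoring below the threshold, lowest score first."""
    flagged = [c for c in compounds if c["score"] < threshold]
    return sorted(flagged, key=lambda c: c["score"])


# Hypothetical compound groups with made-up scores.
compounds = [
    {"name": "compound A", "score": 98.5},
    {"name": "compound B", "score": 61.0},
    {"name": "compound C", "score": 79.9},
    {"name": "compound D", "score": 85.2},
]
queue = review_order(compounds)
print([c["name"] for c in queue])
```

The high-scoring compounds stay out of the queue, so manual review time goes to the borderline integrations first.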
Feature visualization and editing. Yes, it definitely has that. A few tools to let you reintegrate
compounds, if you're not satisfied with what you see, or just to test whether something makes a
difference. And also, to be able to correlate that with the mass spectral results.
And as I mentioned before, we take advantage of two separate peak alignment steps, both for
MFE and recursive MFE, and for find by ion, concatenated together. Therefore, the end result is
a very robust, feature-finding analysis.
So how do you get a copy of MassHunter Profinder? Well, the manufacturing release date is set
for December 6. And after that, it will be available for free for MPP customers. It will be placed
on the MPP supplemental DVD, and it will also be available on the Agilent SubscribeNet. So
when it's available in the Agilent SubscribeNet, then you get an email with a link and you can
download it.
If you want any further information, then, I would suggest contacting your local product
specialist, and they will be able to help you with this.
So after this slide, I will entertain any questions. I wanted to keep it brief today, so that you got a
flavor of MassHunter Profinder, the new tool. And there will be other collateral with more
technical detail, in the form of videos available on the Agilent website in the near future.
And we would like to thank you, Theo, for a very informative presentation.
Before we move to the question and answer session, why don't we bring the audience back into
the standard mode for WebEx, so we can have questions entered a little bit easier. Let's go ahead
and do that.
Well, what I can say to the listeners is that as far as the number of files that one can process, it's
going to be highly dependent on the memory that you have installed on your computer. I have 24
gigs of RAM on my computer, and I can routinely extract between 70 and 75 files at a time. I
haven't tried more.
If you have more memory, and we have customers who do, I would suggest doing some testing
with batch recursive MFE first to see how many features you can extract. It's really a
combination of the number of files times the number of features per file. It's like a matrix table.
So if you have a machine that has close to 100 gigabytes of memory, I'm confident you could do
100 to 200 files at a time. It's just a matter of how long it's going to take to process. So you'll go
from a few hours, to perhaps overnight, but it'll get done when you wake up the next morning.
It's very much a memory-intensive routine. The files are large. And so, this is not an easy
problem to solve and to address, and we're doing it in a step-wise process so that we can
gradually get it to be faster and faster with more and more files.
OK, thank you. And we did get a question about how many samples in total can Profinder
handle for a computer with 16 gigabytes of memory?
So we've tested, for 16 gigs, about 40, comfortably. 40 files that would have a couple of
thousand features each. So you test it yourself. But like I said before, it really depends on how
much memory you've installed, and you can determine it empirically. It's a function of both the file
size and the number of files that you've got.
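The "matrix table" idea can be turned into a quick back-of-the-envelope check. The sketch below simply multiplies files by features per file and compares a planned batch against the 16-gigabyte figure quoted above (about 40 files at roughly 2,000 features each). The function names are hypothetical, and the comparison assumes nothing about how memory actually scales, so it is a starting point for testing, not a guarantee.

```python
def matrix_cells(n_files, features_per_file):
    """Size of the files-by-features table for a batch."""
    return n_files * features_per_file


def max_files_for(budget_cells, features_per_file):
    """Largest file count whose table stays within a given cell budget."""
    return budget_cells // features_per_file


# Reference point quoted in the webinar for a 16 GB machine.
baseline = matrix_cells(40, 2000)

# A planned batch with the same feature density but more files.
planned = matrix_cells(70, 2000)
ratio = planned / baseline

# If each file yields more features, fewer files fit in the same budget.
files_at_5000 = max_files_for(baseline, 5000)
print(baseline, planned, ratio, files_at_5000)
```

A batch that comes out well above the reference table size is a signal to test on a subset first, as suggested above.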
I see a question from the panelist, from David. Does this work on a Mac, too?
No, it does not. It works on Windows 7 only at the moment. And that's the version you need to
run this. Windows 7, which is a multicore instrument is needed to run this. And that's why we
can actually do what we do. Because on a 32-bit system, it would obviously not work. And that's
what we were limited to in the past.
And Theo, what type of data files does Profinder support?
The data files-- these are Agilent raw data files. So those are the data files. They support
Agilent .d files.
Now, right now it supports LC/MS .d files, but it will be supporting GC/MS .d files as well.
And another question that's come in is does the software handle more samples in FBF mode than
MFE mode?
That's a good question. It is one that I actually cannot tell you because I haven't tested that. But I
would believe in MFE mode it would probably do more, just because it is a program that's faster
and can handle more samples. That's been my experience.
So what we try to do is to run an [INAUDIBLE] recursive MFE first, and whittle down the set of
features that get passed on to find by ion for recursive analysis.
And another question is, are there any plans for Profinder to support GC/MS data as well?
Again, yes. I think I addressed that question. There are. So that's in the works already, but I'm not
sure when that would be released, but it's definitely on the plans. We will be supporting GC/MS
data shortly.
So another question that's come in, Theo, is does it matter how many cores the computer has?
Well, yes, it does matter. Because if you have a multicore instrument, you're going to be able to
split the jobs and run it in parallel. So that is an advantage. That's why we're using Windows 7.
OK.
I've got a workstation, a Z800 24-gigs, with a dual core set up.
And does Profinder do multivariate analysis, such as PCA?
Not at the moment.
Another question, Theo, is does the software change the .d file contents after it's processed with
Profinder?
Once you save-- It doesn't, no. What you save is a Profinder project. That is the part that I didn't
show, I believe. So what you do is that you save the project as a .profinder project, and when
you load it next time, it will remember the path to the data files that you used, and load the data
again.
As far as exporting the results, then you can export a CSV format, a CEF format, and a couple of
other formats.
But yeah, that's a very good point. It does not alter the .d file. But what I would do is if you were
going to process using MassHunter Qual and Profinder, I would just create a duplicate of the .d
files in two different directories and process them with those two different programs, and keep
them separate.
And another question, Theo, that's come in-- we've actually had a few questions about
this, I think-- is, are there any plans to configure Profinder to accept data from other sources?
No, actually. Definitely not at this time. There's no discussion about that. Because this tool was
designed around processing Agilent .d files in a very efficient way. And to actually accept other
vendors' raw files would mean that you'd have access to the actual raw files themselves and be
able to read them. And that's not something that our software group has a mandate to do.
OK. Thank you. And another question that's come in is how big is the project file generated by
Profinder?
The Profinder project is actually quite large. We're working on making those smaller.
So, obviously, if you've got 50 or 60 or 70 files, it can be over 100 megabytes, 150 megabytes,
so they're not small. But with the amount of disk space that we have these days, that shouldn't be
an issue.
The main point is irrespective of that fact, when you load the data, the Profinder project at a later
time, once you've saved it and you want to reload it, it loads very fast. So even if you have 70 to
100 files, it loads very quickly.
So yeah, the project file sizes are large, but as far as loading, it goes quick.
While we're waiting, I will say that one of the benefits, one of the new things that Profinder does,
as opposed to MassHunter Qual is that it doesn't keep all the extracted chromatograms in
memory at one time.
When the user is toggling from one row to the next row in the table, it automatically draws the
information for you. So it updates it, it activates it. So that's the real key step in saving
computation and analysis time. So that when you want to go from one row to the next row, the
program updates very fast.
Whereas in the past it would take-- if you had a lot of features, thousands and thousands of
features in your data file, it would take you a long time to go to the next row because of the
nature of the program. So we made improvements in that respect.
So please join me in thanking Theo Sana for his presentation. On behalf of the Life Sciences
Group at Agilent Technologies, we thank you for joining us today. We hope to see you at a
future eSeminar.
We will leave the question box open for a few more minutes to receive any feedback or
additional questions. But we are now going to close the audio lines.
Thank you, once again, for attending. From Agilent Technologies, we wish you a good day.