Profinder_Webinar_w_Edits_Final_17Dec13 And now to today's presentation. We are pleased to have Dr. Theo Sana joining us as our presenter. Dr. Sana is currently a Senior Scientist in Integrated Metabolomics and Proteomics Applications at Agilent Technologies. He spent several years in the systems biology group, developing intact protein and metabolomics workflows using multi-dimensional LC and mass spectrometry. Dr. Sana's current interest is applying his experience in transcriptomics, proteomics, and metabolomics to help Agilent advance integrative multi-omic solutions for our customers. And it looks like we're ready to begin. It gives me great pleasure to welcome Dr. Theo Sana. You may begin, Theo. Thank you very much for the introduction, Joan. And welcome to today's webinar for MassHunter Profinder, the latest feature extraction software tool belonging to Agilent's suite of MassHunter software. Now today, I'll introduce you to some of the exciting capabilities of Profinder. And through a series of slides and brief software demos, hopefully I'll be able to convince you of the advantages of using Profinder in your metabolomics research. Profinder is truly the first profiling-centric software workflow for feature extraction. It is the only commercially available batch data processing tool that's been optimized to minimize false-positive and false-negative results. I'll show you how simple it is to get started and to review your results in only four windows, as shown on the screen, giving you the control you need in order to step through the compound groups efficiently. We continuously listen to our customers' requests for improvements and demand for software product quality. So the question may be, what is in it for me, using this software tool? Now, a substantial proportion of metabolomics research is based on LC/MS and GC/MS workflows, just based on the publications. 
And so, we decided, a couple of years ago, to optimize the software, the MassHunter Qual MFE software, so that it can better support batch processing. Our customers requested more sophisticated feature extraction software that can extract multiple data files, robustly, for discovery-based LC/MS analyses. And that it should support both untargeted, as well as targeted analyses. Profinder can be used, for example, in nutrition research, just as much as in biomarker discovery. The net result is that all these efforts have led to an approximately 50% share of the market for Agilent in metabolomics. And we continue to improve by offering faster, better data processing software across all markets. The Agilent metabolomics post-acquisition workflow is summarized simply in five distinct stages. Each one, of course, is very important, and you're familiar with them, but feature finding is really key here. As the old saying goes, garbage in, garbage out. And we really want to minimize that. To do that, we've come up with batch feature finding. So it was very important for our software team to design new features in the feature extraction tool that could handle many files at a time. So that when you went to differential analysis in Mass Profiler Professional, for example, you could annotate and identify your metabolites more confidently, and then map the results into pathways. In this particular example, you can see that for a comprehensive metabolomics solution to work, we've got two main processing software tools. One, of course, is MassHunter Qual, in which we have molecular feature extraction. And that's very good for looking at data that has MS and MS/MS where you want to extract MS/MS spectra, and for that kind of processing. And it's really optimized for a few files, and you can go into depth. MassHunter Profinder, on the other hand, is this new tool that can do batch extraction of multiple data files at the MS1 level. 
And for the export portion of the results, the compound exchange file can then also be used in Mass Profiler Pro. So both MassHunter Qual and Profinder produce a common shared file that can be read by Mass Profiler Professional, where you do all your statistics and pathway analysis. Now, I've shown GC/MS over here, and that's because even though we're rolling out with LC/MS support in Profinder, there are plans in place to support GC/MS data as well, in batch format in Profinder in the next release after that. So we'll be supporting LC/MS and GC/MS workflows. Well, we're all aware of the hurdles that we face when we're using feature extraction software. For our customers, their main goal is to find and correctly extract all chromatographic peaks in a sample. But the challenges are many. There's incomplete peak separation, and we face unresolved peaks that can contribute to false peak detection, excessive missing values, and incorrect identification. All of this can lead to wasted efforts and decreased productivity. And moreover, even false biomarkers. So we need a way to minimize that. What is the MassHunter Profinder workflow solution then? What's new about it? It's a one-shot process for untargeted and targeted feature extraction under one roof. It has unprecedented visibility into the feature extraction process, called chemometric profiling, giving the user greater control. There's also a new isotope grouper that recognizes the isotope pattern of all the common organic elements, except for halogens, which have a different pattern. But if you want to use an isotope grouper with halogen capability, it supports that as well. It's designed to process many samples. Feature extraction is capable of handling large data sets simultaneously. In one test, approximately half a million features, 500,000 features, were extracted with a computer that had 16 gigabytes of RAM. 
And obviously, the more RAM your computer has, the more features you'll be able to extract. By including the novel recursive batch molecular feature extractor, or RMFE, we can now perform a cross-sample analysis, and use consensus spectra at two levels, both the MFE level and the find by ion level, which is a separate algorithm. This results in higher-quality results. A compound-centric workflow applied across large data sets means the user can manually review results in Batch mode. And there's only a maximum of four windows. That's a friendly graphical user interface. And really importantly, it reduces processing time and it's free to our Mass Profiler Pro customers. So how do we get started? Well, in the first example, I'm going to show you a brief video of how you can add or remove samples and assign grouping information right at the beginning. This will help you later downstream, when you want to filter and do the overlay of the plots in order to do the editing. So I'll start the video. [VIDEO PLAYBACK] -Importing data files into MassHunter Profinder is really straightforward. You navigate to the directory where the files are located, and you select the files. In this case, I have 12 with four conditions. There's three replicates per condition. Simply load the files, and now you can start to assign the grouping information. As I have three replicates, I will either type them in or you can fill it down. -Once you've assigned all the grouping information for your replicates, so CA, CC, FK, Wild Type-- WT. You simply press OK, and the 12 data files are imported into the program, and you're ready to perform batch recursive feature extraction. [END VIDEO PLAYBACK] OK, so I'm back live again. The next window that you see, once you've imported your data and assigned the groups, is a wizard that allows you to select an algorithm, and you are given a choice of three. In effect, there's two main algorithms. 
Batch recursive feature extraction, and batch targeted feature extraction, which I'll describe a little bit later. But the batch molecular feature extraction is an evolution of molecular feature extraction, which we use in MassHunter Qual. And it's recursive because it can build a consensus spectrum for each of the compounds and then go ahead and re-extract the original data files. This reduces the number of false positives. So, a good practice would be to do batch molecular feature extraction first on your files, optimize the settings through a series of wizards, and then go to batch recursive feature extraction, which is a combination of both batch MFE and find by ion. Now, how does feature extraction work? Well, in batch molecular feature extraction, we use the traditional MFE on each individual data file. From the raw data, it finds all the co-eluting ions that are related. So for those who are not familiar with this process, it finds all the co-eluting ions, isotopes, adducts, such as sodium, potassium, and dimers, and groups them into a single feature, and it integrates the values for all of those different ions into one value. So the way MFE works is different from some other programs where each feature is just an ion. So in this case, a feature has the attributes of retention time, an integrated mass, and also, an m/z. So, we create a compound chromatogram from that group of ions, and one feature, then, is one compound. But in batch molecular feature extraction, which you use in Profinder, we're now aligning the features across all sample files to build a consensus spectrum. This allows us to do the recursive analysis and re-extract the batch files, the original data files. And this greatly increases the quality of the results that you get. So when should I use batch MFE? As I mentioned before, regular MFE doesn't work across multiple data files in MassHunter Qual. Batch MFE is based on recursion across multiple data files. 
And it uses an average consensus spectrum for re-extracting the data. So as I mentioned, use batch MFE to optimize the settings prior to using batch recursive feature extraction. I won't go through all the different wizards that you have for each of the algorithms, because that's pretty technical and it's for another discussion. But I can show you an example of how the batch feature extraction works, after I introduce you to the workflow. Batch recursive feature extraction, then, is a combination of our RMFE, recursive molecular feature extraction, and find by ion. At the very end of that, you will get a big reduction in false positives and false negatives, and you can then edit your compounds. What's really unique about Profinder is that it has filtering at two different levels. One after batch molecular feature extraction, and then other filtering parameters that you can set, after find by ion. So this makes it highly robust. After RMFE, the quality of your target list has increased, and that gets fed into find by ion. And after you've done find by ion analysis-- and all this happens in the background, by the way-- that improves the overall quality of your final compound group list, and this reduces the amount of manual cleanup. So I'll give you a brief introduction into the batch recursive feature extraction, and then we'll regroup in just a couple of minutes. [VIDEO PLAYBACK] -In MassHunter Profinder, batch recursive feature extraction is the algorithm that one uses in order to perform the most robust analysis. It is composed of two programs, actually. It contains batch molecular feature extraction, which is shown above. And another program that's called find by ion, essentially, find by formula. Both of them are combined into one program under batch recursive feature extraction. -There are eight steps in this combined program. Four of the steps are the same MFE parameters. 
So you set the settings for those, and you can optimize them in batch molecular feature extraction first, if you like. -The advantage of doing that is that it's faster to run, and once you optimize the settings, you can go to the next steps on Page 5, where the find by ion parameters begin. -Essentially, you go through two steps of recursive analysis. So you can do two steps of post-alignment filtering, increasing your confidence in the results. Once you're satisfied that the parameter settings you've chosen are good and optimized, you're ready to begin. [END VIDEO PLAYBACK] I'm back live again. And what you see in front of you, in this slide, is the results of batch recursive feature extraction. This is an example showing four groups of treatments, or four conditions. They're colored by the conditions-- black, red, green, and blue. And you'll see there's multiple chromatogram overlays, which are the replicates for each of those groups. And so that's how you can immediately see whether there's differences between treatments, and also where there's differences between replicates, and go in and inspect them more closely, and I will show you how you do that. Now, as I mentioned before, there's four windows. In the top left-hand corner you have the compound groups. That's the total number of compounds that were found across all the data files, and each row is summarized to one compound group. And when you broadcast that compound group to the table below, it gives you the details for that compound. So all the data files are listed in the far left column, and all the extraction information for each individual data file for that particular highlighted compound above is shown. There's a flag that's shown as green, if it's a good quality compound. And there's also a score associated with that. So you can filter or rank based on those parameters as well. 
And if you look to the right, then you have your EIC overlays in the chromatograms window, and you can have control over that. You can see as many of the individual chromatograms as you want. And you can then also inspect the mass spectra results to the right as well, and edit that if you need to. So here's another brief video, showing you the results of a batch recursive feature extraction example, and I'll start that now. [VIDEO PLAYBACK] -After batch recursive feature extraction has finished processing its job, the next step is to take some time in reviewing the results. -The results are summarized in four different tables that are related to each other and that are very interactive. The compound groups table tells you the number of compounds that it found across the 12 data files. -In this case, these were yeast samples that were extracted. 1,435 compounds were found across the 12 files. And each row, then, represents a compound, and below it we have the compound details across the data files. -So this is very nice for Batch mode where you can actually look at the data files, inspect the scores and the area-- the height, the mass, and retention time very quickly. Under the chromatogram results, you can overlay the chromatograms based on the sample grouping information we entered earlier, or you can look at the compounds individually. So these are the replicates for the first group, and the replicates for the next group in a different color, et cetera. -And if you want to look at the mass spectral results, you can take a look here, zoomed in, and look at the isotopes for each of the different compounds. -I can also overlay all of the chromatograms for this particular compound across the 12 data files like this, so that I can see the difference in retention time, which is very minimal in this particular case. -Now, you can sort based on the scores, so that you only look at the low scores and try and sort those out, and that will save you time. 
-If you want to look at all 12 chromatograms you can do that simultaneously. So you'd select 12, and now you have a summary of all 12 chromatograms. In this scenario, we have four different groups, with one or more of the replicates in one of the groups being integrated differently from the consensus. -So if you want to fix this, you can switch into List mode, and it will show you the particular replicate in which that happened, and you can override that if you want to. So you simply drag across the area that you want to integrate, which is similar to the others, and right-click and extract manual compounds, and now it has done that. [END VIDEO PLAYBACK] OK, so that was batch recursive feature extraction, a short demo on the capabilities. Obviously, with the larger screen you'll be able to see many more rows, and also the mass spectral results, too. Now, the third algorithm is batch targeted feature extraction. By targeted, what we mean is you have a list of formulae-- annotated formulae-- that reside in a database, or in an Excel spreadsheet, or some other source where you keep them, or you download them from, so let's say a pathway database. Now, we support the compound exchange file format, the CSV file format, or Agilent's personal compound database, or personal compound database and library format. Each of those can be uploaded into Profinder, and then you can very specifically extract the compounds in your database only. So you're not doing a totally untargeted, a priori, naive analysis. You're only looking for those compounds in the database. So, I'll show you an example of how that works in the next video. [VIDEO PLAYBACK] -One of the advantages of using batch targeted feature extraction is that you're using a database of compounds that you already want to look for. In this particular case, I had 30 compounds in the database, and a total of 11 were found in 12 files or fewer. 
In some cases, it found, for example, arginine in seven files, and didn't find it in the remaining five. -If we take the case of leucine, leucine was found in all 12 files. And then when we look at the chromatogram results, we see that they're slightly different between sample groups CA, CC, FK, and Wild Type. -Now, we can either look at these in a List mode, so you look at the individual different chromatograms, or you can look at them, as I was looking at them over here, and make a decision as to whether you should be integrating both peaks or not. -So, if for some reason, you think that the program has made the call incorrectly, you can simply override it and only integrate the peak that is similar to the other groups. So that when you're doing the differential analysis later on in your statistical software, such as Mass Profiler Professional, the areas are going to be based on the same retention time window and the same peak heights. [END VIDEO PLAYBACK] So that was the last algorithm in Profinder. It's a relatively small program with three different algorithms that processes a large number of files very quickly, and it's really batch compound-centric, to give the user maximum flexibility in order to judge whether the extraction, and also the integrations, were correct. It really is a big advancement over the previous software that we had. What's the advantage? Well, together with Mass Profiler Pro, it gives you a complete solution, a robust solution. As for instrument support, just as a summary, it supports LC/TOF and Q-TOF, with GC/MS planned. The customer benefit is that you can load raw data files without any preconversion of those data files before you load them. You can do both untargeted and targeted feature finding in there. It gives you maximum flexibility, whether you do discovery profiling in a targeted way, with a list that you already have, or you want to be totally untargeted. The ion grouping is compound-centric, and this is key. 
It doesn't just look at single ions and have them be an individual feature. We actually look at co-related ions and group them together. And we use the concept of recursive analysis, all in situ, all in the background, where the user doesn't see it, to give you higher quality results. So that before you go into Mass Profiler Professional, for example, or any statistical package, you have your high quality results. You don't need to do recursion back and forth, ping-ponging between programs, anymore. So this will reduce your time spent on the analysis. There's a feature quality score in the compound details table. It's called a Q score, and that is a combination of different metrics that Agilent uses to look at the quality of an EIC, and it helps with the ranking approach. So the ones with very high scores, you can review later, but immediately you want to look at the ones that are borderline scores, because you can set the score and inspect those first. Feature visualization and editing. Yes, it definitely has that. A few tools to let you reintegrate compounds, if you're not satisfied with what you see, or just to test whether something makes a difference. And also, to be able to correlate that with the mass spectral results. And as I mentioned before, we take advantage of two separate peak alignment steps, both for MFE and recursive MFE, and for find by ion, concatenated together. Therefore, the end result is a very robust feature-finding analysis. So how do you get a copy of MassHunter Profinder? Well, the manufacturing release date is set for December 6. And after that, it will be available for free for MPP customers. It will be placed on the MPP supplemental DVD, and it will also be available on the Agilent SubscribeNet. So when it's available in the Agilent SubscribeNet, then you get an email with a link and you can download it. 
If you want any further information, then, I would suggest contacting your local product specialist, and they will be able to help you with this. So after this slide, I will entertain any questions. I wanted to keep it brief today, so that you got a flavor of MassHunter Profinder, the new tool. And there will be other collateral with more technical detail, in the form of videos available on the Agilent website in the near future. And we would like to thank you, Theo, for a very informative presentation. Before we move to the question and answer session, why don't we bring the audience back into the standard mode for WebEx, so we can have questions entered a little bit easier. Let's go ahead and do that. Well, what I can say to the listeners is that as far as the number of files that one can process, it's going to be highly dependent on the memory that you have installed on your computer. I have 24 gigs of RAM on my computer, and I can routinely extract between 70 and 75 files at a time. I haven't tried more. If you have more memory, and we have customers who do, I would suggest doing some testing with batch recursive MFE first to see how many features you can extract. It's really a combination of the number of files times the number of features per file. It's like a matrix table. So if you have a machine that has close to 100 gigabytes of memory, I'm confident you could do 100 to 200 files at a time. It's just a matter of how long it's going to take to process. So you'll go from a few hours, to perhaps overnight, but it'll get done when you wake up the next morning. It's very much a memory-intensive routine. The files are large. And so, this is not an easy problem to solve and to address, and we're doing it in a step-wise process so that we can gradually get it to be faster and faster with more and more files. OK, thank you. 
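[EDITOR'S ILLUSTRATION] The sizing rule described in the answer above, number of files times number of features per file, can be written as a small Python sketch. The function names and the idea of a fixed "cell budget" are hypothetical and are not part of Profinder; real memory use also depends on file size, so treat this as back-of-the-envelope arithmetic only.

```python
def matrix_cells(n_files: int, features_per_file: int) -> int:
    """Cells in a files-by-features table (hypothetical sizing helper)."""
    if n_files < 0 or features_per_file < 0:
        raise ValueError("counts must be non-negative")
    return n_files * features_per_file


def max_files(cell_budget: int, features_per_file: int) -> int:
    """Largest file count whose table fits a given cell budget.

    The budget is an assumption: measure it empirically on your own
    machine, for example from a batch that processed comfortably.
    """
    if features_per_file <= 0:
        raise ValueError("features_per_file must be positive")
    return cell_budget // features_per_file


if __name__ == "__main__":
    # Illustrative round numbers only, not a benchmark.
    budget = matrix_cells(40, 2000)
    print(budget)
    # Doubling the features per file halves the files that fit the budget.
    print(max_files(budget, 4000))
```

The point of the sketch is only that the file count and the per-file feature count trade off against each other: a batch with richer files has to be smaller to stay within the same memory. [END EDITOR'S ILLUSTRATION]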
And we did get a question about how many samples in total Profinder can handle for a computer with 16 gigabytes of memory. So we've tested, for 16 gigs, about 40, comfortably. 40 files that would have a couple of thousand features each. So you test it yourself. But like I said before, it really depends on how much memory you've installed, and you can determine it empirically. It's a function of both the file size and the number of files that you've got. I see a question from the panelist, from David. Does this work on a Mac, too? No, it does not. It works on Windows 7 only at the moment. And that's the version you need to run this. Windows 7, on a multicore machine, is needed to run this. And that's why we can actually do what we do. Because on a 32-bit system, it would obviously not work. And that's what we were limited to in the past. And Theo, what type of data files does Profinder support? The data files-- these are Agilent raw data files. So those are the data files. It supports Agilent .d files. Now, right now it supports LC/MS .d files, but it will be supporting GC/MS .d files as well. And another question that's come in is does the software handle more samples in FBF mode than MFE mode? That's a good question. It is one that I actually cannot answer because I haven't tested that. But I would believe in MFE mode it would probably do more, just because it is a program that's faster and can handle more samples. That's been my experience. So what we try to do is to run an [INAUDIBLE] recursive MFE first, and whittle down the set of features that get passed on to find by ion for recursive analysis. And another question is, are there any plans for Profinder to support GC/MS data as well? Again, yes. I think I addressed that question. There are. So that's in the works already, but I'm not sure when that would be released, but it's definitely in the plans. We will be supporting GC/MS data shortly. 
So another question that's come in, Theo, is does it matter how many cores the computer has? Well, yes, it does matter. Because if you have a multicore machine, you're going to be able to split the jobs and run them in parallel. So that is an advantage. That's why we're using Windows 7. OK. I've got a workstation, a Z800 with 24 gigs, with a dual-core setup. And does Profinder do multivariate analysis, such as PCA? Not at the moment. Another question, Theo, is does the software change the .d file contents after it's processed with Profinder? Once you save-- It doesn't, no. What you save is a Profinder project. That is the part that I didn't show, I believe. So what you do is that you save the project as a .profinder project, and when you load it next time, it will remember the path to the data files that you used, and load the data again. As far as exporting the results, then you can export a CSV format, a CEF format, and a couple of other formats. But yeah, that's a very good point. It does not alter the .d file. But what I would do is if you were going to process using MassHunter Qual and Profinder, I would just create a duplicate of the .d files in two different directories and process them with those two different programs, and keep them separate. And another question, Theo, that's come in-- we've actually, I think, had a few questions about this-- is, are there any plans to configure Profinder to accept data from other sources? No, actually. Definitely not at this time. There's no discussion about that. Because this tool was designed around processing Agilent .d files in a very efficient way. And to actually accept other vendors' raw files would mean that you'd need access to the actual raw files themselves and be able to read them. And that's not something that our software group has a mandate to do. OK. Thank you. And another question that's come in is how big is the project file generated by Profinder? The Profinder project is actually quite large. 
We're working on making those smaller. So, obviously, if you've got 50 or 60 or 70 files, it can be over 100 megabytes, 150 megabytes, so they're not small. But with the amount of disk space that we have these days, that shouldn't be an issue. The main point is, irrespective of that fact, when you load the Profinder project at a later time, once you've saved it and you want to reload it, it loads very fast. So even if you have 70 to 100 files, it loads very quickly. So yeah, the project file sizes are large, but as far as loading, it goes quickly. While we're waiting, I will say that one of the benefits, one of the new things that Profinder does, as opposed to MassHunter Qual, is that it doesn't keep all the extracted chromatograms in memory at one time. When the user is toggling from one row to the next row in the table, it automatically draws the information for you. So it updates it, it activates it. So that's the real key step in saving computation and analysis time. So that when you want to go from one row to the next row, the program updates very fast. Whereas in the past, if you had a lot of features, thousands and thousands of features in your data file, it would take you a long time to go to the next row because of the nature of the program. So we made improvements in that respect. So please join me in thanking Theo Sana for his presentation. On behalf of the Life Sciences Group at Agilent Technologies, we thank you for joining us today. We hope to see you at a future eSeminar. We will leave the question box open for a few more minutes to receive any feedback or additional questions. But we are now going to close the audio lines. Thank you, once again, for attending. From Agilent Technologies, we wish you a good day.