Mallick 10.3 Features

Write Pipeline Analysis Files to Alternate Directory Tree

We will add support for maintaining two separate file trees for pipelines. The first will be read-only and will contain only the original input data files. All files generated by the LabKey Server installation will be written to a second, parallel file structure, which will therefore contain all of the analysis result files, log files, and other outputs.

This will be implemented as a new option when configuring a pipeline override. A pipeline will then be able to be backed by two directories. Because this is not expected to be a common configuration, it will not be emphasized in the UI. The writeable location will be the primary pipeline directory, and the read-only location will be a supplemental location.

When resolving a file, we will first look for it in the writeable location. If it’s not there and a read-only location is available, we will check for it in the read-only location. If no read-only location is defined, or the file doesn’t exist there, we will return the file’s path in the writeable location. This means that new files will only ever be created in the writeable location. WorkDirectoryRemote and WorkDirectoryLocal will also be made aware of the read-only location, so it will need to be serialized as part of the pipeline job.

Cross-Project/Folder Queries

LabKey will train the lab on existing features and augment them to support the lab’s specific workflows, including saving a user-defined set of runs for reuse in the future.

User Editable SILAC Ratios

LabKey will add the ability to eliminate individual peptide-level quantitation results and recalculate the protein-level rollup quantitation results. To produce comparable results, LabKey will need to know the exact algorithm used by the original quantitation engine, for both Q3 and XPRESS. We will implement this using a new boolean value associated with peptide quantitation results in the database.
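A minimal sketch of how an exclusion flag and the rollup recalculation could fit together, assuming the geometric-mean protein group rollup that this section attributes to the TPP tools. The `PeptideQuant` record, its `excluded` field, and the functions `protein_group_ratio` and `affected_by_user` are hypothetical names for illustration, not LabKey's schema or API; the actual Q3 and XPRESS rollups may apply filtering or weighting that this sketch omits.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PeptideQuant:
    """Hypothetical peptide-level quantitation row (not the LabKey schema)."""
    peptide: str
    ratio: float            # quantitation ratio reported by Q3 or XPRESS
    excluded: bool = False  # hypothetical user-set exclusion flag


def protein_group_ratio(quants: Sequence[PeptideQuant]) -> Optional[float]:
    """Recompute a protein group ratio as the geometric mean of the
    peptide ratios that the user has not excluded.

    Returns None when no usable peptide ratios remain."""
    # Non-positive (or NaN) ratios have no logarithm, so this sketch skips
    # them; the real quantitation engines may handle them differently.
    ratios = [q.ratio for q in quants if not q.excluded and q.ratio > 0]
    if not ratios:
        return None
    # Geometric mean computed in log space to avoid overflow and underflow.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def affected_by_user(quants: Sequence[PeptideQuant]) -> bool:
    """True if any peptide in the group was excluded by user action, i.e.
    the group's rollup no longer matches the original pipeline result."""
    return any(q.excluded for q in quants)
```

For example, if a group has peptide ratios 2.0 and 8.0 plus a third peptide the user has excluded, the recomputed group ratio is the geometric mean of 2.0 and 8.0, which is 4.0 up to floating-point rounding, and the group is reported as affected by user edits.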
This flag will be used to mark peptides that received quantitation data during normal pipeline execution but have since been excluded by user action. It will make it possible to determine which peptides and protein groups have been affected by user edits. For the protein group-level rollup, we will reuse the algorithm from the TPP tools, which is a geometric mean of the peptide quantitations.

mspicture Pipeline Support

We will provide an analysis pipeline configuration that runs mspicture as part of the standard workflow and links the result files to the rest of the MS2 run. We will also add code that, for all existing MS2 runs, checks the file system for related mspicture output files based on a naming convention and links them to the run in the same way. This will run as a pipeline job that can only be started by hitting a magic URL, and it will be designed so that it can safely be run multiple times.

Improved Row Count and Grid Interaction

We will work with the lab to prioritize which MS1 and MS2 data views get row counts and easier grid interaction (such as customizing the columns shown or adding custom filters), based on the lab’s specific workflows. Specifically, we’ll focus on the single-run MS2 views and the ProteinProphet comparison view.

For the single-run views, both the Query – Peptides and Query – Protein Groups views will support paging and row counts. A flat peptide list will work like any standard grid. Query – Peptides will include two predefined custom views configured to match the legacy Protein and ProteinProphet default views. They’ll appear in the View dropdown, just like on other grids. If you choose one of them, pagination and row counts will be based on the grouping unit. So, regardless of how many peptides contribute evidence, the protein or protein group will be treated as the “row” for the grid.
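To illustrate counting and paging by grouping unit rather than by peptide, here is a minimal in-memory sketch. The row shape (peptide dicts carrying a grouping column), the function name `page_grouped`, and the `keep` filter callback are hypothetical and only stand in for whatever query mechanism the grid actually uses.

```python
from typing import Callable, Dict, List, Tuple

Peptide = Dict[str, object]  # hypothetical peptide row, e.g. {"protein": "P1", ...}


def page_grouped(
    peptides: List[Peptide],
    group_key: str,
    keep: Callable[[Peptide], bool],
    page: int,
    page_size: int,
) -> Tuple[int, List[Tuple[object, List[Peptide]]]]:
    """Return (total_row_count, rows_for_page), where each grid row is a
    (group_id, surviving_peptides) pair rather than a single peptide."""
    groups: Dict[object, List[Peptide]] = {}
    for pep in peptides:
        # Apply the peptide-level filter before grouping, so a protein or
        # protein group with no surviving peptides never becomes a row.
        if keep(pep):
            groups.setdefault(pep[group_key], []).append(pep)
    # Dicts keep insertion order, so rows follow each group's first
    # surviving peptide; the row count is the number of surviving groups.
    total = len(groups)
    start = page * page_size
    rows = list(groups.items())[start:start + page_size]
    return total, rows
```

With peptides from proteins P1, P2, and P3 and a filter that keeps only high-probability peptides, a protein whose peptides all fail the filter drops out, and the reported row count is the number of proteins left, not the number of peptides.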
You can still filter on peptide criteria. If your filter eliminates all of the peptides for a protein or protein group, that protein or group will disappear from the grid and will not be included in the count.

MS2 Run Comparison Improvements

The current ProteinProphet comparison implementation expands each protein group so that every protein in it appears as a separate row, and relies on the protein group number to indicate ambiguously identified proteins. This is not ideal, because the user cannot depend on the row count to accurately reflect the total number of identifications in each run.

Instead, we will implement a new approach that normalizes protein groups across the runs in the comparison. Protein groups from each run will be assigned to comparison groups that exist only for the scope of the comparison, and each comparison group will contain one or more protein groups. The comparison groups will be the transitive closure of all protein groups across the runs that share protein identifications. That is, if a protein group from run 1 has any proteins in common with a protein group from another run, they will be rolled up into the same comparison group. This may pull many protein groups from a single run into the same comparison group, but we do not expect this to be the common case. Each comparison group will be shown as a row in the grid, and the user will be able to see the total set of proteins in the group across all runs, as well as the proteins and protein groups from each individual run.

Additionally, we will add the ability to compare by fraction within a rollup run. A protein group will be considered identified in a fraction if any peptides from that fraction were assigned to the group.

Associate Metadata with RAW files

We will add support for associating customizable metadata, such as sample properties or sample preparation data, with RAW files. This data will be integrated with the MS2 search results so that the two can be combined in data grids and custom queries.
LabKey will work with the lab to prioritize the list of possible places to add this functionality.
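For the MS2 Run Comparison Improvements described above, the transitive-closure rule for building comparison groups amounts to finding connected components, where two protein groups are connected if they share a protein. The spec does not say how the closure will be computed; this sketch uses a union-find (disjoint-set) structure, one standard choice. It assumes protein groups are keyed by a hypothetical `(run, group_number)` tuple mapped to a set of protein identifiers, and the function name `comparison_groups` is likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

GroupKey = Tuple[str, int]  # hypothetical (run name, protein group number)


def comparison_groups(groups: Dict[GroupKey, Set[str]]) -> List[Set[GroupKey]]:
    """Merge protein groups from all runs into comparison groups: the
    transitive closure of "shares at least one protein"."""
    parent: Dict[GroupKey, GroupKey] = {g: g for g in groups}

    def find(x: GroupKey) -> GroupKey:
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: GroupKey, b: GroupKey) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Union every protein group with the first group seen that contains the
    # same protein; chains of such overlaps give the transitive closure.
    first_owner: Dict[str, GroupKey] = {}
    for key, proteins in groups.items():
        for protein in proteins:
            if protein in first_owner:
                union(first_owner[protein], key)
            else:
                first_owner[protein] = key

    # Collect protein groups by their root; each root is one comparison group.
    merged: Dict[GroupKey, Set[GroupKey]] = defaultdict(set)
    for key in groups:
        merged[find(key)].add(key)
    return list(merged.values())
```

If run 1 group 1 holds proteins A and B, run 2 group 1 holds B and C, and run 3 group 4 holds C and D, all three end up in one comparison group even though the run 1 and run 3 groups share no protein directly; they are linked through the run 2 group. The full protein set for a comparison group is then the union of its member groups' proteins.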