Mallick 10.3 Features

Write Pipeline Analysis Files to Alternate Directory Tree
We will add support for maintaining two separate file trees for pipelines. The first will be read-only and
contain only the original input data files. All files generated by the LabKey Server installation will be
written to a second, parallel file structure. It will therefore contain all of the analysis result files, log files,
and other outputs.
This will be implemented as a new option when configuring a pipeline override. Pipelines will offer the
ability to be backed by two directories. This will not be a common case so it will not be emphasized in
the UI. The writeable location will be the primary pipeline directory, and the read-only location will be a
supplemental location.
We will first look for a file in the writeable location. If it’s not there and a read-only location is available,
we will check for it in the read-only location. If there is no read-only location defined or the file doesn’t
exist there, we will return the file’s path in the writeable location. This means that files will only be
created in the writeable location.
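
The lookup order above can be sketched as follows. This is only an illustration in Python, not the LabKey Server implementation; the function name resolve_pipeline_file and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_pipeline_file(relative_path: str,
                          writeable_root: Path,
                          read_only_root: Optional[Path] = None) -> Path:
    """Return the path to use for a pipeline file under the two-tree layout.

    All names here are hypothetical; only the lookup order comes from the spec.
    """
    writeable_candidate = writeable_root / relative_path
    # 1. Prefer a file that already exists in the writeable tree.
    if writeable_candidate.exists():
        return writeable_candidate
    # 2. Otherwise fall back to the read-only tree, if one is configured.
    if read_only_root is not None:
        read_only_candidate = read_only_root / relative_path
        if read_only_candidate.exists():
            return read_only_candidate
    # 3. Not found anywhere: return the writeable path, so that any new
    #    file is created in the writeable tree.
    return writeable_candidate
```

Because the fallback in step 3 always points into the writeable tree, a caller that creates the returned path never writes into the read-only tree.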
WorkDirectoryRemote and WorkDirectoryLocal will be made aware of this alternate location as well,
which means that it will need to be serialized as part of the pipeline job.
Cross-Project/Folder Queries
LabKey will train the lab on existing features, and augment them to support the lab’s specific workflows,
including saving a user-defined set of runs to reuse in the future.
User Editable SILAC Ratios
LabKey will add the ability to eliminate individual peptide-level quantitation results, and recalculate the
protein-level rollup quantitation results. LabKey will need to know the exact algorithm used by the
original quantitation engine to produce comparable results, for both Q3 and XPRESS.
We will implement this using a new boolean value associated with peptide quantitation results in the
database. It will mark peptides that were assigned quantitation data during normal pipeline execution
but have since been excluded by user action. This will make it possible to determine which peptides
and protein groups have been affected by user edits.
We will use the same algorithm as the TPP tools for the protein group level rollup, which is a geometric
mean of the peptide quantitations.
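
As a rough sketch of the recalculation, the following Python computes a geometric mean over the peptides that have not been excluded. The class and field names (PeptideQuantitation, excluded) are hypothetical. Skipping non-positive ratios is an assumption of this sketch, and the exact Q3 and XPRESS rollup rules may differ.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PeptideQuantitation:
    peptide: str
    ratio: float            # peptide-level quantitation ratio
    excluded: bool = False  # hypothetical name for the new boolean flag


def rollup_ratio(peptides: Iterable[PeptideQuantitation]) -> Optional[float]:
    """Geometric mean of the ratios of peptides that are still included."""
    # Assumption: non-positive ratios are skipped, since their log is undefined.
    kept = [p.ratio for p in peptides if not p.excluded and p.ratio > 0]
    if not kept:
        return None  # nothing left to roll up
    # Geometric mean computed as exp(mean(log(ratio))).
    return math.exp(sum(math.log(r) for r in kept) / len(kept))
```

Toggling the excluded flag on a peptide and calling rollup_ratio again gives the updated protein-level value, while the peptide's original quantitation stays in place.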
mspicture Pipeline Support
We will provide a configuration for the analysis pipeline that runs mspicture as part of the standard
workflow, and links the result files to the rest of the MS2 run.
We will also add code that checks the file system for mspicture output files related to each existing MS2
run, based on a naming convention, and links them with the run in the same way. This will run as a
pipeline job that can only be started by visiting a specific URL directly. It will be designed so that it
can be run multiple times without problems.
Improved Row Count and Grid Interaction
We will work with the lab to prioritize MS1 and MS2 data views to add row counts and make it easier to
interact with the grids (such as customizing the columns shown or adding custom filters), based on the
lab’s specific workflows.
Specifically, we’ll focus on the single-run MS2 views and the ProteinProphet comparison view. For the
single run views, both the Query – Peptides and Query – Protein Groups views will support paging and
row counts. For a flat peptides list, it will work like any standard grid.
Query – Peptides will include two predefined custom views that are configured to be the same as the
legacy Protein and ProteinProphet default views. They’ll be in the View dropdown, just like normal grids.
If you choose one of them, we’ll do pagination and row counts based on the grouping unit. So,
regardless of the number of peptides that might contribute evidence, we’ll treat the protein or protein
group as the “row” for the grid. You can still filter on peptide criteria, and if your filter eliminates all of
the peptides for the protein or protein group, it will disappear and not be included in the count.
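
The grouping behavior can be illustrated with a small Python sketch. This is an in-memory illustration only, since the real grids would do this work in the database. The function name page_by_group and the dictionary keys used for peptides are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Peptide = Dict[str, object]


def page_by_group(peptides: List[Peptide],
                  group_key: str,
                  peptide_filter: Callable[[Peptide], bool],
                  page_size: int,
                  page_number: int) -> Tuple[int, List[Tuple[object, List[Peptide]]]]:
    """Return (total number of groups, groups on the requested page).

    The protein or protein group is the "row"; peptides only supply evidence.
    """
    grouped: Dict[object, List[Peptide]] = {}
    for peptide in peptides:
        # Peptide-level filters apply first; a group whose peptides are all
        # filtered out never gets an entry, so it is not shown or counted.
        if peptide_filter(peptide):
            grouped.setdefault(peptide[group_key], []).append(peptide)
    group_ids = sorted(grouped)
    start = page_number * page_size  # page_number is zero-based
    page_ids = group_ids[start:start + page_size]
    return len(group_ids), [(g, grouped[g]) for g in page_ids]
```

The returned count is the number of proteins or protein groups that survive the filter, not the number of peptides.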
MS2 Run Comparison Improvements
The current ProteinProphet comparison implementation expands each protein in each protein group so
that it shows up as a separate row, and indicates the protein group number as the way to deal with
ambiguously identified proteins. This is not ideal because the user cannot depend on the row count to
accurately reflect the number of total identifications in each run.
Instead, we will implement a new approach that normalizes protein groups across runs within the
comparison. One or more protein groups from each run will be assigned to a group for the scope of the
comparison. The comparison groups will include the transitive closure of all of the protein groups across
the runs that share protein identifications. That is, if a protein group from run 1 has any overlapping
proteins with a protein group from another run, they will be rolled up for comparison purposes. This
may end up pulling many protein groups from a single run into the same comparison group, but this will
not be the common case. Each of these comparison groups will be shown as a row in the grid, and the
user will be able to identify the total set of proteins in the group across all runs, and the proteins and
protein groups from each individual run.
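
One way to compute such a transitive closure is a union-find (disjoint set) structure, sketched below in Python. This is an assumption about a possible approach, not the planned implementation. The function name build_comparison_groups and the (run, group number) keys are hypothetical. The sketch merges any two protein groups that share a protein, without checking whether they come from the same run or from different runs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

GroupId = Tuple[str, int]  # hypothetical key: (run name, protein group number)


def build_comparison_groups(
        protein_groups: Dict[GroupId, Set[str]]) -> List[Set[GroupId]]:
    """Merge protein groups that share any protein, transitively."""
    parent: Dict[GroupId, GroupId] = {g: g for g in protein_groups}

    def find(g: GroupId) -> GroupId:
        # Follow parent links to the root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    def union(a: GroupId, b: GroupId) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every group to the first group seen that contains the same protein.
    first_group_for_protein: Dict[str, GroupId] = {}
    for group_id, proteins in protein_groups.items():
        for protein in proteins:
            if protein in first_group_for_protein:
                union(first_group_for_protein[protein], group_id)
            else:
                first_group_for_protein[protein] = group_id

    # Collect the members of each connected component.
    merged: Dict[GroupId, Set[GroupId]] = defaultdict(set)
    for group_id in protein_groups:
        merged[find(group_id)].add(group_id)
    return list(merged.values())
```

Each returned set corresponds to one row in the comparison grid. The union of the proteins in its member groups gives the total set of proteins for that row.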
Additionally, we will add the ability to compare based on fraction within a rollup run. A protein group
will be considered to be identified in a fraction if any peptides from the fraction were assigned to the
group.
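
The fraction rule can be expressed as a simple grouping step. The Python sketch below assumes each peptide assignment is available as a (fraction name, protein group id) pair, which is a hypothetical representation chosen for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def fractions_identifying_groups(
        assignments: Iterable[Tuple[str, int]]) -> Dict[int, Set[str]]:
    """Map each protein group id to the fractions that identified it.

    Each assignment is a hypothetical (fraction name, protein group id) pair,
    one per peptide assigned to the group.
    """
    identified: Dict[int, Set[str]] = {}
    for fraction, group_id in assignments:
        # A single peptide from a fraction is enough to count the group
        # as identified in that fraction.
        identified.setdefault(group_id, set()).add(fraction)
    return identified
```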
Associate Metadata with RAW files
We will add support for associating customizable metadata with RAW files, such as sample properties or
sample preparation data. This data will be integrated with the MS2 search results so that they can be
combined in data grids and custom queries. LabKey will work with the lab to prioritize the list of possible
places to add this functionality.