Tissue Collection

Interrogating Cancer Biology to Address Clinically Relevant Questions
Clinical Proteomic Tumor Analysis Consortium: Ontology Considerations
Presented by Chris Kinsinger
NCI/CSSI/OCCPR
May 12, 2015
Map proteome/PTMs to each patient’s genome; develop assays for pathways and candidate biomarkers
Inputs
• TCGA tumor collections
• Prospective tumor collections
Analyses
• Analyze >100 tumors/cancer: breast, ovarian, colon
• Proteome
• Phosphoproteome
• Other PTMs
Outputs
• Molecular signatures
• Targeted protein / PTM:
– Cancer pathways
– Protein isoforms
– Protein targets
Biological mechanisms:
• Genome-proteome relationships
• Genome-signaling relationships
• Assays for newly discovered proteins
Clinical applications:
• Are genomic aberrations detectable at the protein level?
• Can proteomic information provide a better molecular taxonomy of cancer?
• What is their effect on protein function?
• Can genotypic information guide protein marker development?
• Which events are drivers? Which are passengers?
The Centers: Broad/FHCRC; PNNL; Vanderbilt; Wash U.; Johns Hopkins
TCGA CrCa (transcriptomic molecular subtypes)
• Identified three transcriptomic subtypes (mRNA clusters): MSI/CIMP, Invasive, and CIN
• MSI/CIMP subtype is enriched for hypermutated tumors
• Question: Can proteomics data rediscover or redefine colorectal cancer subtypes?
The Cancer Genome Atlas Network; Nature 487, 330-337 (2012)
doi:10.1038/nature11252
CrCa: Global proteome reveals 2 new subtypes
[Figure: transcriptome subtypes (MSI/CIMP, Invasive, CIN) and De Sousa subtypes compared with five proteome subtypes (A, B, C, D, E); axes: genes, patient tumors]
CPTAC workflow
Tissue collection → Tissue qualification → Data generation → Data analysis
Tissue Collection
Clinical and biospecimen data
• TCGA forms
• CPTAC Biospecimen working group
• ~ caDSR integration
Tissue collection
• Case report forms (CRF)
– Submission CRF
• Basic demographic, clinical, and biospecimen data to ensure the case meets inclusion criteria
• 27 elements
– Baseline CRF
• Histology
• Diagnostic pathology
• IHC results (e.g., ER/PR for breast cancer)
• 41 elements
– 1-year follow-up CRF
• Status
• Surgical margin
• Chemotherapy
• Additional tumor events
• 24 elements
Additional forms due at 12 months
• Other malignancy form
– Due if a subject has an additional cancer diagnosis
– Diagnostic information
– Surgical and treatment data
– 23 elements
• Pharmaceutical form
– Chemotherapy
– Response to therapy
– 8 elements
• Radiation supplemental treatment form
– Radiation therapy
– 9 elements
• Up to 134 data elements
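The per-form element counts above can be tallied in a few lines. This is a minimal sketch under an assumption not stated on the slide: the three CRFs are treated as filed for every case, and the three additional forms as filed only when they apply. With every form applied, the itemized counts add to 132, so the slide's ceiling of 134 may include elements not itemized here.

```python
# Data elements per form, transcribed from the slide.
CORE_FORMS = {                       # assumed: filed for every case
    "Submission CRF": 27,
    "Baseline CRF": 41,
    "1-year follow-up CRF": 24,
}
ADDITIONAL_FORMS = {                 # assumed: filed only when applicable
    "Other malignancy form": 23,
    "Pharmaceutical form": 8,
    "Radiation supplemental treatment form": 9,
}


def elements_for_case(applicable: set[str]) -> int:
    """Return the number of data elements collected for one case.

    `applicable` names the additional forms that apply to this case.
    """
    unknown = applicable - set(ADDITIONAL_FORMS)
    if unknown:
        raise ValueError(f"unknown forms: {sorted(unknown)}")
    return sum(CORE_FORMS.values()) + sum(ADDITIONAL_FORMS[f] for f in applicable)


if __name__ == "__main__":
    print(elements_for_case(set()))                  # core forms only: 92
    print(elements_for_case(set(ADDITIONAL_FORMS)))  # every form applies: 132
```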
Tissue qualification
• Criteria
– Pathology-based criteria
– Nucleic acid-based criteria
• No public-facing data
• Challenge: how best to give access to pathology images?
Data generation
Experimental protocol
• PSI Mass Spectrometry Ontology [MS] (EBI)
– Database, Vol. 2013, Article ID bat009, doi:10.1093/database/bat009
– 8 main branches:
• Chemical compound
• Contact attribute
• External reference identifier
• File format
• Software
• Spectrum generation information
• Spectrum interpretation
• Standard
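PSI-MS is distributed as an OBO flat file (psi-ms.obo) in which each [Term] stanza carries an id, a name, and is_a links to its parents; branches like those listed above organize the terms into that is_a hierarchy. A minimal sketch of reading stanzas and walking is_a links follows. The three-term excerpt is abbreviated by hand for illustration and should be checked against a current psi-ms.obo release; the parser ignores every tag except id, name, and is_a.

```python
# Hand-abbreviated excerpt in OBO flat-file format (illustrative only).
OBO_EXCERPT = """\
[Term]
id: MS:1000513
name: binary data array

[Term]
id: MS:1000514
name: m/z array
is_a: MS:1000513 ! binary data array

[Term]
id: MS:1000515
name: intensity array
is_a: MS:1000513 ! binary data array
"""


def parse_terms(text: str) -> dict[str, dict]:
    """Parse [Term] stanzas into {id: {"name": str, "is_a": [parent ids]}}."""
    terms: dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "id":
            terms[value] = current
        elif key == "name":
            current["name"] = value
        elif key == "is_a":
            # Drop the trailing "! comment" that repeats the parent's name.
            current["is_a"].append(value.split("!")[0].strip())
    return terms


def ancestors(terms: dict[str, dict], term_id: str) -> set[str]:
    """Return every term reachable from term_id by following is_a links."""
    seen: set[str] = set()
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


if __name__ == "__main__":
    terms = parse_terms(OBO_EXCERPT)
    print(terms["MS:1000514"]["name"], ancestors(terms, "MS:1000514"))
```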
CPTAC Experimental Protocol
• Protocol (pdf)
• Protocol summary (xlsx)
– Analytical Sample Protocol (9 CDE)
• Starting amount
• Proteolysis
• Alkylation
• Enrichment
– Chromatography Protocol (8 CDE)
• Column length
• Injected
• Inside diameter
– Mass Spectrometry Protocol (8 CDE)
• Instrument
• Resolution
• Collision Energy
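A protocol summary like the xlsx above is essentially a table of common data elements (CDEs) grouped by protocol section. In this hypothetical sketch, the section and CDE names are copied from the slide, so each list is partial (the slide shows 4 of 9, 3 of 8, and 3 of 8 CDEs); the check simply flags CDEs that are missing or left blank. The values in the usage example are placeholders.

```python
# CDE names per section, limited to the examples on the slide (partial lists).
PROTOCOL_CDES = {
    "Analytical Sample Protocol": ["Starting amount", "Proteolysis",
                                   "Alkylation", "Enrichment"],
    "Chromatography Protocol": ["Column length", "Injected", "Inside diameter"],
    "Mass Spectrometry Protocol": ["Instrument", "Resolution", "Collision Energy"],
}


def missing_cdes(summary: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Report, per section, the listed CDEs that are absent or blank in `summary`."""
    gaps: dict[str, list[str]] = {}
    for section, cdes in PROTOCOL_CDES.items():
        values = summary.get(section, {})
        absent = [c for c in cdes if not str(values.get(c, "")).strip()]
        if absent:
            gaps[section] = absent
    return gaps


if __name__ == "__main__":
    summary = {
        "Analytical Sample Protocol": {"Starting amount": "placeholder",
                                       "Proteolysis": "placeholder",
                                       "Alkylation": "placeholder",
                                       "Enrichment": "placeholder"},
        "Chromatography Protocol": {"Column length": "placeholder",
                                    "Injected": "",
                                    "Inside diameter": "placeholder"},
    }
    print(missing_cdes(summary))
```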
System map of LC-MS performance metrics: ontology for quality?
[Figure: LC-MS system components (autosampler and pump, LC column, ion source, mass spectrometer with MS and MS/MS scans, computer-automated interpretation of spectra, peptide identifications) annotated with the metrics monitored at each stage]
• Chromatography (11 metrics): peptide resolution, peak widths, elution order
• Ion source (6 metrics): ion intensities, electrospray stability
• MS instrument (20 metrics): MS1 and MS2 signal characteristics, dynamic sampling
• Peptide identification (5 metrics): numbers of identifications, search scores
• Over 40 performance metrics monitored
Rudnick et al. Mol. Cell. Proteomics (2010)
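Peak width is one of the chromatography metrics in the map above. A common definition is the full width at half maximum (FWHM) of a peptide's elution profile; the sketch below computes it by linear interpolation between sampled points. It illustrates the general metric only and is not the implementation behind the metrics of Rudnick et al., whose exact definitions may differ.

```python
def fwhm(times: list[float], intensities: list[float]) -> float:
    """Full width at half maximum of a single chromatographic peak.

    Linearly interpolates the half-height crossing on each side of the apex.
    Assumes one peak whose signal falls below half height on both sides
    within the sampled window.
    """
    if len(times) != len(intensities) or len(times) < 3:
        raise ValueError("need matching time/intensity lists with >= 3 points")
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    half = intensities[apex] / 2.0

    def crossing(i_out: int, i_in: int) -> float:
        # Time at which the signal equals `half`, interpolated between a point
        # below half height (i_out) and a neighbouring point at or above it (i_in).
        t0, y0 = times[i_out], intensities[i_out]
        t1, y1 = times[i_in], intensities[i_in]
        return t0 + (half - y0) * (t1 - t0) / (y1 - y0)

    # Walk outward from the apex while the signal stays at or above half height.
    left = apex
    while left > 0 and intensities[left - 1] >= half:
        left -= 1
    right = apex
    while right < len(times) - 1 and intensities[right + 1] >= half:
        right += 1
    if intensities[left] < half or left == 0 or right == len(times) - 1:
        raise ValueError("peak does not drop below half height inside the window")
    return crossing(right + 1, right) - crossing(left - 1, left)


if __name__ == "__main__":
    # Symmetric triangular peak: apex 10 at t = 2, half height reached at t = 1 and t = 3.
    print(fwhm([0, 1, 2, 3, 4], [0, 5, 10, 5, 0]))  # 2.0
```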
Data analysis
CPTAC Common Data Analysis Pipeline (CDAP) at NIST
Outputs publicly distributed at the DCC
• MS data files (Raw) → File conversion / iTRAQ (ReAdW4Mascot2) → mzML, mzXML, MGF, MS1, METADATA
• Database search (MSGF+) → MZID, TSV
• MS1 data processing (ProMS) → TXT
• PSM report generation; generalized parsimony, gene-based (PSMLab; N. Edwards, Georgetown) → PSM, MZID, PEPTIDES.TSV, SUMMARY.TSV, iTRAQ.TSV, SPECTRA_COUNTS.TSV, PRECURSOR_AREA.TSV
• QC (NIST MSQC) → nist_metrics, MSP, single_file_report
• Phosphosite localization calculations (PhosphoRS) → XML
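The gene-based generalized parsimony step asks for the smallest set of genes that explains all identified peptides. The sketch below shows that idea with a plain greedy set cover, a swapped-in textbook technique rather than the PSMLab algorithm; greedy cover approximates but does not guarantee the true minimum, and the gene and peptide names in the usage example are placeholders.

```python
def parsimonious_genes(peptides_by_gene: dict[str, set[str]]) -> list[str]:
    """Greedy approximation of the minimal gene set explaining every peptide.

    Repeatedly picks the gene that covers the most still-unexplained
    peptides; ties are broken alphabetically so the result is deterministic.
    """
    unexplained = set().union(*peptides_by_gene.values())
    chosen: list[str] = []
    while unexplained:
        # Largest gain first; on equal gain, the alphabetically first gene.
        gene = min(peptides_by_gene,
                   key=lambda g: (-len(peptides_by_gene[g] & unexplained), g))
        chosen.append(gene)
        unexplained -= peptides_by_gene[gene]
    return chosen


if __name__ == "__main__":
    evidence = {
        "GENE_A": {"pep1", "pep2"},
        "GENE_B": {"pep2"},          # shared peptide only: not needed
        "GENE_C": {"pep3"},
    }
    print(parsimonious_genes(evidence))  # ['GENE_A', 'GENE_C']
```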
Peptide spectrum match files
• PSM
– Tab-delimited
– 22 data elements, including:
• MS2 scan number
• Peptide sequence
• Charge, score, precursor peak
– Feeds some higher-level analysis tools
• mzIdentML
– HUPO-PSI compliant format
– 5 elements, including:
• Scan number
• Peptide sequence
• Gene accession number
– Element IDs for spectrum results
– Avoids semantic interpretation
– PSMs can be extracted without random access into sequence collections
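The mzIdentML points above (element IDs for spectrum results; PSM extraction without random access into sequence collections) can be illustrated with Python's standard xml.etree.ElementTree. In mzIdentML 1.1, each SpectrumIdentificationItem points at a Peptide element through its peptide_ref attribute, so the sketch indexes Peptide elements by id once and then resolves every PSM by lookup. The fragment is hand-trimmed, not schema-complete, and uses placeholder values; for full-size files a streaming parser such as ET.iterparse would be more realistic.

```python
import xml.etree.ElementTree as ET

NS = {"mzid": "http://psidev.info/psi/pi/mzIdentML/1.1"}

# Heavily trimmed mzIdentML 1.1 fragment with placeholder values.
MZID_FRAGMENT = """\
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <Peptide id="PEP_1"><PeptideSequence>PEPTIDEK</PeptideSequence></Peptide>
  </SequenceCollection>
  <DataCollection><AnalysisData><SpectrumIdentificationList id="SIL_1">
    <SpectrumIdentificationResult id="SIR_1" spectrumID="index=42" spectraData_ref="SD_1">
      <SpectrumIdentificationItem id="SII_1" rank="1" chargeState="2"
          peptide_ref="PEP_1" passThreshold="true"/>
    </SpectrumIdentificationResult>
  </SpectrumIdentificationList></AnalysisData></DataCollection>
</MzIdentML>
"""


def extract_psms(xml_text: str) -> list[dict]:
    """Return one dict per SpectrumIdentificationItem, with its peptide resolved."""
    root = ET.fromstring(xml_text)
    # Index peptide sequences by element id once; PSMs then resolve by lookup.
    peptides = {
        p.get("id"): p.findtext("mzid:PeptideSequence", namespaces=NS)
        for p in root.iterfind(".//mzid:Peptide", NS)
    }
    psms = []
    for result in root.iterfind(".//mzid:SpectrumIdentificationResult", NS):
        for item in result.iterfind("mzid:SpectrumIdentificationItem", NS):
            psms.append({
                "spectrum": result.get("spectrumID"),
                "rank": int(item.get("rank")),
                "charge": int(item.get("chargeState")),
                "peptide": peptides.get(item.get("peptide_ref")),
            })
    return psms


if __name__ == "__main__":
    for psm in extract_psms(MZID_FRAGMENT):
        print(psm)
```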
Protein reports: summary.tsv
• Goal: identify and quantify proteins in the sample based on identified peptides
• 12 data elements, including:
– Gene (NCBI Gene name)
– Distinct peptides
– Spectral counts
– Gene description
– Organism
– Chromosome
– Locus (cytoband)
– Proteins (proteins associated with the gene)
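Two of the summary.tsv columns, distinct peptides and spectral counts, follow directly from gene-assigned PSM rows. The sketch below reads a tab-delimited PSM table with the standard csv module and collapses it to one row per gene. The column names "Gene" and "Peptide" and the data are hypothetical placeholders, not the actual CDAP headers, and the sketch assumes each PSM has already been assigned to a single gene (for example by a parsimony step).

```python
import csv
import io
from collections import defaultdict

# Hypothetical PSM excerpt; column names and values are placeholders.
PSM_TSV = (
    "Gene\tPeptide\n"
    "GENE_A\tPEPTIDEK\n"
    "GENE_A\tPEPTIDEK\n"
    "GENE_A\tSAMPLER\n"
    "GENE_B\tANOTHERK\n"
)


def summarize(psm_tsv: str) -> list[dict]:
    """Collapse PSM rows into one row per gene.

    Spectral counts = number of PSMs assigned to the gene;
    distinct peptides = number of unique peptide sequences among them.
    """
    counts: dict[str, int] = defaultdict(int)
    peptides: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(psm_tsv), delimiter="\t"):
        counts[row["Gene"]] += 1
        peptides[row["Gene"]].add(row["Peptide"])
    return [
        {"Gene": g, "Distinct peptides": len(peptides[g]), "Spectral counts": counts[g]}
        for g in sorted(counts)
    ]


if __name__ == "__main__":
    for row in summarize(PSM_TSV):
        print(row)
```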
Challenges
• Multi-faceted program: how to keep ontology needs updated?
• How to improve clinical annotation of biospecimens?
• Integrate HUPO-PSI ontologies/controlled vocabularies (CV) with cancer-specific resources
• Updating ontologies/CV in a rapidly evolving technology field
CPTAC Steering Committee & Working Groups
NCI: Henry Rodriguez, Emily Boja, Tara Hiltke, Chris Kinsinger, Mehdi Mesri, Robert Rivers, Jerry Lee, Mandie White, Tim Crilley
Leidos Biomedical Inc.: Linda Hannick, Joy Beveridge, Michelle Hester, Kevin Groch, Kathy Terlesky, Ellen Miller, Lenny Smith, Maureen Dyer, Melissa Borucki, Carla Chorley
ESAC/Georgetown: Karen Ketchum, Nathan Edwards, Ratna Thangudu, Peter McGarvey, Shuang Cai, Mauricio Oberti
Biospecimens WG: Molly Brewer, Sherri Davies, Mike Gillette, Doug Levine, David Ransohoff, Steve Skates, Rob Slebos, Mark Watson, David Tarin
Biospecimen Core Resource: Mark Watson, Dave Mulvihill, George Bijoy, Melissa McKenna, Brian Goetz, Amy Brink
CPTAC Steering Committee: Steve Carr, Daniel Chan, Xian Chen, Matthew Ellis, Daniel Liebler, Amanda Paulovich, Karin Rodland, Dick Smith, Reid Townsend, Bing Zhang, Hui Zhang, Zhen Zhang
Common Data Analysis Pipeline: Paul Rudnick, Steve Stein, Sandy Markey, Jeri Roth, David Tabb, Sam Payne
Thank you