Interrogating Cancer Biology to Address Clinically Relevant Questions
Clinical Proteomic Tumor Analysis Consortium: Ontology Considerations
Presented by Chris Kinsinger, NCI/CSSI/OCPPR
May 12, 2015

Map proteome/PTMs to each patient’s genome; develop assays for pathways and candidate biomarkers

Inputs
• TCGA tumor collections
• Prospective tumor collections

Analyses
• Analyze >100 tumors/cancer
  – breast, ovarian, colon
  – Proteome
  – Phosphoproteome
  – Other PTMs
  – Molecular signatures
• Targeted protein / PTM
  – Cancer pathways
  – Protein isoforms
  – Protein targets

Outputs
• Biological mechanisms:
  – Genome-proteome relationships
  – Genome-signaling relationships
  – Assays for newly discovered proteins
• Clinical applications:
  – Are genomic aberrations detectable at protein level?
  – Can proteomic information provide a better molecular taxonomy of cancer?
  – What is their effect on protein function?
  – Can genotypic information guide protein marker development?
  – Which events are drivers? Which are passengers?

The Centers: Broad/FHCRC; PNNL; Vanderbilt; Wash U.; Johns Hopkins

TCGA CrCa (transcriptomic molecular subtypes)
• Identified three transcriptomic subtypes (mRNA clusters): MSI/CIMP, Invasive, and CIN
• MSI/CIMP subtype is enriched with hypermutated tumors
• Question: Can proteomics data rediscover or redefine colorectal cancer subtypes?
The Cancer Genome Atlas Network; Nature 487, 330-337 (2012) doi:10.1038/nature11252

CrCa: Global proteome reveals 2 new subtypes
[Figure: transcriptome subtypes (MSI/CIMP, Invasive, CIN) compared with proteome subtypes (A, B, C, D, E); axes: genes, patient tumors; additional label: De Sousa]

CPTAC workflow
Tissue collection → Tissue qualification → Data generation → Data analysis

Tissue Collection

Clinical and biospecimen data
• TCGA forms
• CPTAC Biospecimen working group
• ~ caDSR integration

Tissue collection
• Case report forms (CRF)
  – Submission CRF
    • Basic demographic, clinical, and biospecimen data to ensure case meets inclusion criteria
    • 27 elements
  – Baseline CRF
    • Histology
    • Diagnostic pathology
    • IHC results (e.g., ER/PR for breast cancer)
    • 41 elements
  – 1-year follow-up CRF
    • Status
    • Surgical margin
    • Chemotherapy
    • Additional tumor events
    • 24 elements

Additional forms due at 12 months
• Other malignancy form
  – Due if a subject has an additional cancer diagnosis
  – Diagnostic information
  – Surgical and treatment data
  – 23 elements
• Pharmaceutical form
  – Chemotherapy
  – Response to therapy
  – 8 elements
• Radiation supplemental treatment form
  – Radiation therapy
  – 9 elements
• Up to 134 data elements

Tissue qualification
• Criteria
  – Pathology-based criteria
  – Nucleic acid-based criteria
• No public facing data
• Challenge: how best to give access to pathology images?

Data generation

Experiment protocol
• PSI Mass Spectrometry Ontology [MS] (EBI)
  – Database, Vol. 2013, Article ID bat009, doi:10.1093/database/bat009
  – 8 main branches:
    • Chemical compound
    • Contact attribute
    • External reference identifier
    • File format
    • Software
    • Spectrum generation information
    • Spectrum interpretation
    • Standard

CPTAC Experimental Protocol
• Protocol (pdf)
• Protocol summary (xlsx)
  – Analytical Sample Protocol (9 CDE)
    • Starting amount
    • Proteolysis
    • Alkylation
    • Enrichment
  – Chromatography Protocol (8 CDE)
    • Column length
    • Injected
    • Inside diameter
  – Mass Spectrometry Protocol (8 CDE)
    • Instrument
    • Resolution
    • Collision Energy

System map of LC-MS performance metrics – ontology for quality?
[Diagram components: Autosampler and Pump; LC column; Ion Source; Mass Spectrometer (MS scans, MS/MS scans); Computer-automated interpretation of spectra; Peptide Identifications]
• Chromatography (11 metrics)
  – peptide resolution
  – peak widths
  – elution order
• Ion Source (6 metrics)
  – ion intensities
  – electrospray stability
• MS instrument (20 metrics)
  – MS1 and MS2 signal characteristics
  – dynamic sampling
• Peptide Identification (5 metrics)
  – numbers of identifications
  – search scores
• Over 40 performance metrics monitored
Rudnick et al. Mol. Cell. Proteomics (2010)

Data analysis

CPTAC Common Data Analysis Pipeline (CDAP) at NIST
[Pipeline diagram. Stages: MS Data Files; Conversion / iTRAQ; Database Search; MS1 Data Processing; QC Calculations; Phosphosite Localization; PSM Report Generation; Generalized Parsimony (gene-based). Tools named: ReAdW4Mascot2, MSGF+, ProMS, nist_metrics, MSQC, PhosphoRS, single_file_report, PSMLab. File types named: Raw, mzML, mzXML, MGF, MS1, METADATA, MZID, TSV, TXT, XML, MSP, PSM. Publicly distributed at the DCC: PEPTIDES.TSV, SUMMARY.TSV, iTRAQ.TSV, SPECTRA_COUNTS.TSV, PRECURSOR_AREA.TSV. Diagram credit: N. Edwards (Georgetown)]

Peptide spectrum match files
• PSM
  – Tab-delimited
  – 22 data elements, including:
    • MS2 scan number
    • Peptide sequence
    • Charge, score, precursor peak
  – Feeds some higher analysis tools
• mzIdentML
  – HUPO-PSI compliant format
  – 5 elements, including:
    • Scan number
    • Peptide sequence
    • Gene accession number
  – Element ids for spectra results
  – Avoids semantic interpretation
  – Extract PSMs without having to randomly address sequence collections

Protein reports: summary.tsv
• Goal: identify and quantify proteins in sample based on identified peptides
• 12 data elements
  – Gene (NCBI Gene name)
  – Distinct peptides
  – Spectral counts
  – Gene description
  – Organism
  – Chromosome
  – Locus (cytoband)
  – Proteins (proteins associated with the gene)

Challenges
• Multi-faceted program: how to keep ontology needs updated?
• How to improve clinical annotation of biospecimens?
• Integrate HUPO-PSI ontologies/CV with cancer-specific resources
• Updating ontologies/CV in a rapidly evolving technology field

CPTAC Steering Committee & Working Groups

NCI
• Henry Rodriguez
• Emily Boja
• Tara Hiltke
• Chris Kinsinger
• Mehdi Mesri
• Robert Rivers
• Jerry Lee
• Mandie White
• Tim Crilley

Leidos Biomedical Inc.
• Linda Hannick
• Joy Beveridge
• Michelle Hester
• Kevin Groch
• Kathy Terlesky
• Ellen Miller
• Lenny Smith
• Maureen Dyer
• Melissa Borucki
• Carla Chorley

ESAC/Georgetown
• Karen Ketchum
• Nathan Edwards
• Ratna Thangudu
• Peter McGarvey
• Shuang Cai
• Mauricio Oberti

Biospecimens WG
• Molly Brewer
• Sherri Davies
• Mike Gillette
• Doug Levine
• David Ransohoff
• Steve Skates
• Rob Slebos
• Mark Watson
• David Tarin

Biospecimen Core Resource
• Mark Watson
• Dave Mulvihill
• George Bijoy
• Melissa McKenna
• Brian Goetz
• Amy Brink

CPTAC Steering Committee
• Steve Carr
• Daniel Chan
• Xian Chen
• Matthew Ellis
• Daniel Liebler
• Amanda Paulovich
• Karin Rodland
• Dick Smith
• Reid Townsend
• Bing Zhang
• Hui Zhang
• Zhen Zhang

Common Data Analysis Pipeline
• Paul Rudnick
• Steve Stein
• Sandy Markey
• Jeri Roth
• David Tabb
• Sam Payne

Thank you
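
Supplementary sketch: reading a gene-level protein report

The summary.tsv report described in the protein reports slide is gene-centric, with distinct peptides and spectral counts per gene, so a short script can illustrate how such a file might be consumed. This is a minimal Python sketch, not the CDAP's own code. The column headers ("Gene", "Distinct Peptides", "Spectral Counts", "Proteins"), the semicolon separator in the Proteins column, the toy rows, and the helper names read_summary and rank_by_spectral_counts are all assumptions made for illustration; the real header should be checked against a file obtained from the DCC.

```python
import csv
import io

# Hypothetical header and toy rows; real summary.tsv files may use
# different column names and carry more columns (12 data elements).
HEADER = ["Gene", "Distinct Peptides", "Spectral Counts", "Proteins"]
TOY_ROWS = [
    ["GENE_A", "4", "9", "PROT_A1"],
    ["GENE_B", "1", "12", "PROT_B1;PROT_B2"],
    ["GENE_C", "2", "3", "PROT_C1"],
]
TOY_SUMMARY_TSV = "\n".join("\t".join(f) for f in [HEADER] + TOY_ROWS) + "\n"


def read_summary(handle):
    """Parse a tab-delimited, gene-level report into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for rec in reader:
        rows.append({
            "gene": rec["Gene"],
            "distinct_peptides": int(rec["Distinct Peptides"]),
            "spectral_counts": int(rec["Spectral Counts"]),
            # Assumed: proteins associated with the gene are ';'-separated.
            "proteins": rec["Proteins"].split(";"),
        })
    return rows


def rank_by_spectral_counts(rows, min_distinct_peptides=1):
    """Keep genes with enough distinct peptides, highest counts first."""
    kept = [r for r in rows if r["distinct_peptides"] >= min_distinct_peptides]
    return sorted(kept, key=lambda r: r["spectral_counts"], reverse=True)


rows = read_summary(io.StringIO(TOY_SUMMARY_TSV))
ranked = rank_by_spectral_counts(rows, min_distinct_peptides=2)
for r in ranked:
    print(r["gene"], r["spectral_counts"])
```

The two-distinct-peptide threshold used in the example call is an illustrative choice, not a filter stated in the slides. With a real file, the same functions could be called on an open file handle in place of the in-memory toy string.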