Development and Experience with Tissue Banking Tools to Support Cancer Research

Waqas Amin, M.D., Anil V. Parwani, M.D., Ph.D., and Michael J. Becich, M.D., Ph.D.
1 Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
2 Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, PA, USA

Introduction:
Over the last decade, the Department of Biomedical Informatics (DBMI) at the University of Pittsburgh has developed and deployed various tissue banking informatics tools to expedite translational medicine research. Tissue banking informatics deals with the management of clinicopathologic annotation, inventory management, and distribution of biospecimens that are collected and stored for translational research use by the scientific community.

Tissue Banking Informatics:
  Aggregation: The process of associating tissue samples with valuable data, including demographic, epidemiology, pathology, progression, vital status, therapy, and outcomes data.
  Standardization: Collected data must be uniform and shareable. A standardized approach to annotation ensures the uniformity, consistency, and quality of collected data and facilitates information sharing across multiple institutions.
  Searchable: An information model supported by a standardized data collection approach allows annotated tissue samples to be matched with research queries, thereby facilitating better understanding of the experimental design and results.

Data Requirement in Cancer Research:
High-quality, accurate, and comprehensive data are required to support genomic, proteomic, clinical, and translational research. Data must be acquired in accordance with legal and ethical human subject policies.
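The aggregation and searchability ideas above can be illustrated with a minimal Python sketch. It is not the DBMI implementation: the `Specimen` class, the `aggregate` and `search` helpers, and the field names (`pathology.grade`, `outcome.vital_status`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Specimen:
    """One banked tissue sample with its aggregated annotations (hypothetical fields)."""
    specimen_id: str
    organ: str
    annotations: dict = field(default_factory=dict)


def aggregate(specimen, **sources):
    """Attach data from several named sources (e.g. pathology, outcome) to one specimen."""
    for source_name, values in sources.items():
        for key, value in values.items():
            # Prefix each key with its source so fields from different sources cannot collide.
            specimen.annotations[f"{source_name}.{key}"] = value
    return specimen


def search(specimens, organ=None, criteria=None):
    """Return the specimens whose organ and annotations match every criterion."""
    criteria = criteria or {}
    hits = []
    for s in specimens:
        if organ is not None and s.organ != organ:
            continue
        if all(s.annotations.get(k) == v for k, v in criteria.items()):
            hits.append(s)
    return hits


# Made-up example records, not data from any of the tissue banks described here.
bank = [
    aggregate(Specimen("S1", "prostate"), pathology={"grade": "high"}, outcome={"vital_status": "alive"}),
    aggregate(Specimen("S2", "prostate"), pathology={"grade": "low"}, outcome={"vital_status": "alive"}),
    aggregate(Specimen("S3", "breast"), pathology={"grade": "high"}, outcome={"vital_status": "alive"}),
]
matches = search(bank, organ="prostate", criteria={"pathology.grade": "high"})
print([s.specimen_id for s in matches])
```

Because every source writes into the same annotation dictionary under consistent key names, a single query can combine criteria from pathology and outcome data. That is the practical benefit of standardized annotation in this toy setting.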
Type of Data Collection:
  Demographic data
  Patient clinical data
  Pathology block level data
  Patient treatment data
  Outcome and follow-up data
  Biochemical data
  Genomic level data
  Cell and tissue level data

Data Collection Standards:
Development of Common Data Elements (CDEs): CDEs are standardized clinical annotations defined in detail using metadata. They allow uniform, consistent, and shareable data collection across multiple institutes and systems. Development of CDEs is supervised by a multidisciplinary team, and the CDE subcommittee developed consensus CDEs incorporating the following standards applicable to an organ-specific tissue:
  Association of Directors of Anatomic and Surgical Pathology (ADASP) Cancer Reporting Guidelines
  American Joint Committee on Cancer (AJCC) Cancer Staging Manual
  North American Association of Central Cancer Registries (NAACCR) Data Standards for Cancer Registries

Data Sources:
  Data import from automated electronic systems such as AP-LIS, CP-LIS, and Radiology and Registry Information Systems (RIS).
  Patient questionnaires, patient health records and treatment charts, existing databases, consultation with referring physicians, archived data, and pathology reports.

De-Identification of PHI:
The purpose is to ensure proper confidentiality and privacy of human subjects based upon Institutional Review Board approved protocols. De-identification of PHI is done by an Honest Broker according to Health Insurance Portability and Accountability Act (HIPAA) regulations by assigning unique codes to patient-related identifiers.

Specimen Collection and Standardization:
Biospecimens are collected according to standardized pathology and tissue banking protocols. Biospecimens collected and stored for the tissue banking projects include:
  Paraffin blocks
  Fresh frozen tissue
  Blood products, including serum, plasma, buffy coat, RBC, and WBC

Tissue Banking Information Models and Architecture:
Two types of information models have been utilized in the development of the tissue banks:
Organ-specific databases (OSD):
  Cooperative Prostate Cancer Tissue Resource (CPCTR) (www.cpctr.info)
  Pennsylvania Cancer Alliance for Bioinformatics Consortium (PCABC) (www.pcabc.upmc.edu)
  Early Detection Research Network (EDRN) Colorectal and Pancreatic Neoplasm Database
  SPORE Head and Neck Neoplasm Database
Model Driven Approach (Database):
  National Mesothelioma Virtual Bank (NMVB) (www.mesotissue.org)

OSD (Organ Specific Database):
OSD has a three-tiered architecture implemented on Oracle Application Server v10.1.2.3 running on Windows 2003 and Oracle RDBMS v10.2.0.2 running on an AIX 5L virtual host supported by IBM x3850 system hardware. Dynamic web pages are generated for the database users with Oracle HTTP Server and mod_plsql extensions. The data annotation engine is a flexible, dynamic web-based tool, while the data query engine lets investigators search de-identified information within the warehouse through a “point and click” interface.

OSD Multi Tier Architecture:
[Architecture diagram. Component labels: Presentation, Metadata Curation, Admin, Security, Metadata Engine, Physical Data, Common Data Elements (CDE) Definitions, Application Data Layer, HELP Builder, Business Rules Engine, Mapping Engine, Metadata Data Layer, Manual Annotation, Data Query, Security Engine, Registration, Authorization, Data Import/Export, Authentication, Security Data Layer]

OSD (Meta Data Builder Tool):
[Screenshot]

OSD Feature List:
To address the needs of heterogeneous users, we identified numerous criteria for success. Some requirements and features are listed below:
  Quick statistics on overall data.
  Multi-mode search: multiplex search and advanced search.
  Mechanisms for keeping users oriented (e.g., help, persistence of the last entered query text).
  Results in tabular form, with sorting on each column and access to the full case report.
  Both Honest Broker and de-identified (researcher) access.
  Controlled access to subjects for different studies.
  Standard and customized query results of the data.
  Individual research and consent-based access to information.
  Quick search using cases saved in “My Cases”.
  Query Builder interface.
  Online Help Manual Builder.
  The model can support a multi-institutional data enterprise.
  The User Management Module helps create and revoke users and control their access and activities within the database.
  The business layer allows creation of complex or logical data fields based on data interpretation by experts.

OSD Model Based Head and Neck Neoplasm Virtual Biorepository:
  Develops a bioinformatics-driven system that uses multimodal data sets from patient questionnaires and from clinical, pathological, radiology, and molecular systems.
  Results in one architecture supported by a set of CDEs to facilitate basic science, clinical, and translational research.
  Systems are designed to facilitate semantic and syntactic interoperability in the development of data elements (i.e., metadata or data descriptors using controlled vocabulary and ontology).
  Provides data entry, data mining, and analysis tools.
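The Honest Broker and de-identified (researcher) access levels listed among the OSD features can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the OSD security engine: the `HonestBroker` class, the `case_view` function, the role names, and the `PHI_FIELDS` set are all hypothetical, and a real list of identifiers would come from the IRB-approved protocol and HIPAA regulations.

```python
import secrets

# Hypothetical set of identifying fields, used only for this illustration.
PHI_FIELDS = {"name", "mrn", "date_of_birth"}


class HonestBroker:
    """Holds the only link between patient identifiers and research codes."""

    def __init__(self):
        self._code_by_mrn = {}

    def code_for(self, mrn):
        """Return a random code for a medical record number, stable across calls."""
        if mrn not in self._code_by_mrn:
            # A sketch only: a production system would also guard against code collisions.
            self._code_by_mrn[mrn] = "CASE-" + secrets.token_hex(4)
        return self._code_by_mrn[mrn]

    def deidentify(self, record):
        """Copy a record without its PHI fields and label it with the research code."""
        clean = {k: v for k, v in record.items() if k not in PHI_FIELDS}
        clean["case_code"] = self.code_for(record["mrn"])
        return clean


def case_view(record, broker, role):
    """Honest brokers see the identified record; every other role sees the coded copy."""
    if role == "honest_broker":
        return dict(record)
    return broker.deidentify(record)
```

The design choice shown here is that only the broker object can map a code back to a patient. Researcher-facing code receives records that carry the code but none of the identifying fields.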
OSD Integration with other Data Sources:
[Diagram. Labels: Genotype Lab data, BIOS, AP-LIS/CP-LIS, Biomarker data, Radiology (PET/CT) data, Patient Insurance information, SPORE H&N Neoplasm Database, Epidemiology Project-1 questionnaire data, Human Papilloma Virus Questionnaire data, RIS]

Data Collection & Annotation Tool:
[Screenshots, with the following captions]
  User authentication.
  User Management Module: the administrator can create, edit, and revoke users and control their access to different applications.
  Manual data collection module.
  Case summary.
  Users can switch quickly between the available applications according to their access rights.
  Quick overall review of statistics on the collected database.
  Data query template.
  Standard view.
  Descriptions of the different views for reference.
  Data export for statistical analysis packages such as SAS.
  Full Case Report View (identified or de-identified, according to access level).
  Users can keep multiple “My Cases” lists for different studies.
  Users can select any data field to create personalized views and save them under “My Views”.
  The administrator can edit or create data views.

OSD based Databases Accruals:
                                                                            Total Number of Biospecimens
Virtual Biorepository                      Tissue type             Total # Cases   Paraffin Blocks   Frozen Blocks   Blood/Serum/Plasma
CPCTR                                      Prostate                         7000             34641           17508                17508
PCABC                                      Breast                           3645              1760             847                  823
PCABC                                      Melanoma                         1762              1885             168                  112
PCABC                                      Prostate                         7327              5457            1642                  415
EDRN Colorectal and Pancreatic Neoplasm    Pancreas and colon               2459               175             942                 1254
SPORE Head & Neck Neoplasm                 Head and Neck Neoplasm          11622              2237               0                 1038
(Amin et al., Tissue banking informatics, 2010)

Model Driven Database (MDD):
  NMVB is developed using a model-driven development (MDD) approach.
  Application components are generated from UML domain models.
  It is a Java-based application designed using a Model-Driven Development framework.
  Web Tier: Constructs web pages from the metadata dictionary.
  Business Tier: Provides an object/relational mapping mechanism, a metadata interrogation mechanism, an application programming interface, and a set of shared services.
  Data Tier: Consists of the domain database that houses clinically annotated data, indexes to support the query mechanism, and security data.

Virtual Component of NMVB (www.mesotissue.org):
  Statistical Data Query Interface
  Approved Investigator Query Interface
  Data Entry Interface

NMVB Accruals:
Year             Retrospective Cases   Prospective Cases   Overall NMVB Total
2006                             515                   8                  523
2007                             585                  50                  635
2008                             605                 105                  710
2009                             674                 162                  836
2010 (to date)                   674                 183                  865

Conclusion:
Informatics-supported tissue banking initiatives act as a large source of annotated biospecimens and facilitate basic and clinical science research. Tissue banking infrastructure allows efficient governance, standardized capture of data, and detailed standardized annotation at the local institute and across multiple collaborating sites. Finally, the tissue banking tools developed at the Department of Biomedical Informatics (DBMI) provide an important knowledge base for the development of integrated tissue banking efforts and benefit other tissue banking initiatives by providing consultation.

Thank you