Core C: Data Management and Resource Dissemination Core
Program Director/Principal Investigator (Last, First, Middle):

Data Management and Resource Dissemination Core (Core C)

1. SPECIFIC AIMS

The Data Management and Resource Dissemination Core (Core C) will provide data management, data cleaning, data storage, and data release to public databases for all information generated during this project. The core will inventory physical resources and submit them to public bio-repositories. In addition, Core C will provide statistical analysis, develop and audit standard operating procedures (SOP), and monitor compliance with good laboratory practices (GLP).

Leadership: Dr. Dirk Dittmer, a leader in viral genome profiling and sequencing, leads Core C. Dr. Dittmer currently serves as the director of core laboratories (viral load, pathology, biomarkers, genomic) for the NIH-funded AIDS malignancies clinical trials consortium (AMC) and as director of the Oral HIV/AIDS Research Alliance (OHARA) core laboratory. Over the past 10 years the Dittmer group has built an extensive computational infrastructure, which has now become a dedicated core within the UNC Department of Microbiology and Immunology. The group is experienced in the management, analysis, and interpretation of large volumes of complex data, such as expression profiling data, NextGen sequencing data, and high-throughput biochemical assays.

Infrastructure: The Data Management and Resource Dissemination Core (Core C) will be physically located within the UNC Vironomics Core (https://www.med.unc.edu/vironomics). This facility was inaugurated two years ago as a core laboratory for high-throughput and genomics experiments in virology. It is subsidized by the Lineberger Comprehensive Cancer Center (LCCC) and operates under good clinical laboratory practice (GCLP) and good laboratory practice (GLP) standards. Clinical viral load assays, clinical genome profiling projects, and high-throughput experimental projects are conducted through the core.
The same standardized and quality-controlled environment will be used for this project. Core C leverages these resources to connect the investigators from Research Projects 1-3 and Cores A-B, and to support the complex data flow for this U19 program.

Key Features: The Data Management and Resource Dissemination Core (Core C) responds to specific requirements of the RFA.
o The core will establish and maintain a sophisticated laboratory information management system (LIMS) to facilitate the tracking of assay data as well as samples and reagents, the processing of raw data through analysis pipelines such as Z standardization, and communication between all projects and cores. This will be built on top of an open source SQL database environment (LabKey) and modeled after our clinical samples inventory and tracking system.
o The core will provide primary secure data storage on a dedicated server with automatic backup to the central UNC 1800+ core HP Linux cluster (Kure). It will provide resources for computationally intensive processes such as RNAseq analysis, and will develop job scheduling and interfaces to the campus-wide UNC “Kure” computing cluster.
o The core will provide statistical support and central QC/QA for all biochemical assays performed in this project. The core will develop, maintain, and certify SOP in accordance with GLP standards.
o The core will be responsible for timely dissemination of data and reagents to public repositories, NIH databases, and the research community at large.

The Specific Aims of the Data Management and Resource Dissemination Core C derive from the requirements of the RFA and reflect the features and infrastructure of the core:
Aim 1: Develop and maintain a central data, resource, and sample management system.
Aim 2: Ensure that all data is of high quality, conforms to GLP specifications, and is disseminated to all U19 project members and the scientific community at large in a timely fashion.
Aim 3: Support the functional, bioinformatics, and statistical design and analysis of experimental data.

2. SIGNIFICANCE

Successful implementation and execution of the research proposed in this U19 program necessitates a highly organized, central data management and communications system. The role of the Data Management and Resource Dissemination Core (Core C) is to facilitate communication among Projects 1-3 and Cores A-B. The amount of data is not overly large, because the focus of this U19 program is on detailed biochemical data and because expression profiling and NextGen sequencing are used in only some of the aims. Core C fulfills the requirements stipulated in the RFA for a central hub for information and dissemination of data to the researchers in the program and to the scientific community (Figure 1). Core C will also provide statistical assistance in the design and evaluation of individual assays for each research project. We will link all program personnel by facilitating data communication and integration across research projects and cores. As discussed in the Approach section, we will leverage an established and customizable software system and the dedicated hardware infrastructure that is available through the UNC Vironomics Core.
This U19 program benefits significantly from the track record in systems biology and the pre-existing collaborations among the Baric, Heise, Damania, and Dittmer research laboratories (10, 15, 16, 22, 23, 25, 28, 30). Core C will play a central role throughout the project by: 1) providing data management and processing avenues for all aspects of the research; 2) furnishing a communication environment for monitoring the status of experiments and samples and a route for exchanging datasets and results; 3) working interactively with the other projects to provide statistical support; and 4) taking responsibility for resource distribution within the U19 cooperative and to the scientific community. We will work with program investigators to develop standard operating protocols (SOP) for experimental design, sample naming, data processing, annotation, quality control, and upload into a dedicated laboratory information management system (LIMS). This will capitalize on Dr. Dittmer’s expertise in developing laboratory assays and sample flow guidelines for NIH cooperative trial groups (see publ. (4, 7, 14, 20, 21, 26)). Further, we will provide expertise and support during the validation and analysis of statistical and computational designs and will play an important role in facilitating the prioritization of hypothetical open reading frames (ORFs), unknown ORFs, and non-coding RNAs as proposed by the individual projects in this U19.

Figure 1: Outline of data and reagent flow. Candidate uORFs, ncRNAs, and hypothetical gene sequences are submitted to Core C, where they undergo prioritization and a project master file is opened. The sequence is then forwarded to Core B to generate expression constructs and detection reagents. Core B forwards these resources to the U19 projects as well as back to Core C (blue arrow) for submission to public repositories.
Each of the projects delivers assay data to Core C, which performs QC, stores the data, and together with Core A submits the data directly to public repositories. Note the unidirectional flow of data, which facilitates process management. The projects work in parallel, so their throughput is not interdependent. Only validated reagents are released from Core B, and they are used at the same time in all projects.

3. INNOVATION

The Data Management and Resource Dissemination Core C will develop a dedicated LIMS based on the open-source LabKey Server and its underlying SQL database (http://labkey.com). The LabKey Server provides a flexible module-based system that is especially suited to this U19 program because it allows for the integration of diverse data types. This system will facilitate the tracking of experiments and samples and the processing of raw data. The LabKey software will be installed on a dedicated server. Hardware and software will be expanded annually to grow with the project. In addition, this U19-dedicated infrastructure will provide secure data storage, resources for computationally intensive processes such as RNAseq analysis, and the dissemination of data to the projects, cores, and the research community. Core C has experience with configuring and maintaining LIMS. It currently operates several LIMS; some are proprietary, to harmonize with NIH clinical trials, and others are open source. This infrastructure was developed by the Dittmer lab to support real-time qPCR-based high-throughput profiling of mRNA and microRNA (4, 7, 11, 14, 19-21). In addition, the existing LIMS manage data from chemical screens (31) and transgenic mouse experiments (9, 27).
We will draw on this experience and our hardware and software infrastructure to ensure timely implementation and maintenance of a project-specific LIMS environment. The system will be configured to enable data security, referential integrity checks, automatic scheduling of backups, and maintenance. As some of the core project assays will yield qualitative information (Western blot images), the core will also be responsible for unified quantitative scanning and interpretation of the data using open source NIH image software. This will streamline analysis and data access for personnel working on this grant. Rather than maintaining most of the information internally, we plan to submit all suitable data to NIH archives (GEO, SRA, GenBank) on a 6-12 month schedule. Likewise, all reagents generated in this U19 project will be submitted to ATCC or Addgene quarterly. The Data Management and Resource Dissemination Core (Core C) will work closely with Core A to resolve certain administrative aspects (MTA, transport certifications, etc.) of making physical reagents available to public repositories.

4. APPROACH

Aim 1: Develop and maintain a dedicated data management system.

Rationale. Core C will work with Research Projects 1-3 and Cores A-B to define and implement a program-wide data communication plan that ensures the efficient capture and subsequent transparency of all grant-generated resources (this includes data and physical reagents). As a component of this plan, we will configure and maintain a dedicated, central server for data storage and dissemination. The central server will house assay read-outs, genomic information, and sample information provided by the Research Projects. Thus the U19 group has complete control over all software and hardware that supports the project. Core C will act as a central hub for data storage, data management, and internal and external data dissemination.
We are fortunate to have this resource, and institutional support in the form of LCCC subsidies to the core, available for this project. Dongmei Yang will develop a suitable web interface on top of the existing SQL database, using some of the same approaches she developed when designing a Proteomics/MassSpec database (see publ. (32)). In addition, we will draw upon the expertise of the UNC bioinformatics group, which supports large sequencing efforts on campus, such as the ENCODE and TCGA consortium projects.

Design and Methods:

(1) Core C server and task-specific LIMS. Core C is experienced in the implementation of customized data management systems and currently uses multiple LIMS to support NIH clinical trials. First and foremost, these include a dedicated informatics server. This server integrates network-attached storage (NAS) on the Synology platform (http://www.synology.com), a UNIX-based, 16-core, 3 TB file server, and storage space on the UNC 1800-core HP Linux cluster (22 TB workspace, unlimited tape-drive space). This server houses the website, the web-based SOP (http://www.med.unc.edu/vironomics/protocols), and the sample submission web portal. In addition, the server disseminates NextGen sequencing output via a secure ftp site. Secondly, we maintain a proprietary LIMS for high-throughput real-time qPCR data (Roche Inc.) and sample inventory for clinical trials using trial-specific distributed systems (GlobalTrace™ and FSTRF). In sum, we are very familiar with multiple data management approaches and have operated a LIMS for many years.

(2) The open-source LabKey LIMS. For the purpose of this U19 application we plan to rely on the open source LabKey software (http://labkey.com/). From a data management perspective, the LabKey Server contains a number of compelling features that build upon the basic functionalities of the SQL standard. These include a database consistency checker that performs referential integrity checks, property and domain consistency checks, and schema consistency checks. Important for the integration of biochemical assays as proposed here, LabKey supports Excel-based data entry. Excel files are easily uploaded, or data can be copied and pasted into customizable project folders. A useful feature for large data sets deposited on this site is a search function that scans data sets or files for any keyword. This is important when one user uploads files and other users know only what the files contain, not what they were named. Once data is uploaded and integrated into specific projects, U19 members can sort or filter specific columns to view specific data sets. Trends and reports are created directly from the selected data and visualized. LabKey Server runs under Apache Tomcat and accesses a relational database. It can be installed and deployed on Windows, Linux/UNIX, and macOS operating systems. The web-based server allows users to access a workspace via the web with role-based security and user authentication. LabKey Server allows flexible data management and secure data sharing, and also provides users with a set of tools for analyzing the data. These tools include sorting and filtering as well as built-in visualization and analysis packages such as the open source statistical environment R (http://www.r-project.org/). Users can use the full power of the R programming environment to analyze data and create R views using the R script engine. LabKey Server provides NGS (initial focus on 454 instrument) data management and pipeline automation through one or more Galaxy genotyping workflows on a run. Improvement and expansion of the NGS capabilities are expected in the next several releases.
LabKey facilitates intra-group and inter-group collaboration. Our U19 group will manage and follow complex data sets that may require different users to complete different tasks; LabKey calls this feature the “Issue tracker”. From start to finish, this site is set up to enable users to upload hundreds of files, integrate those files and data sets into one database, and analyze the data and create reports, all in one location. We will work with the Administrative Core (Core A) to manage user accounts and define data access security roles. We will also work with Core B and Research Projects 1-3 to customize the environment to meet project-specific needs.

(3) Data encapsulation. The guiding concept of our design will be encapsulation of data and meta information, which is borrowed from object-oriented programming and has become integrated into NextGen sequencing data storage and analysis, e.g. the NCBI SRA archive (http://www.ncbi.nlm.nih.gov/books/NBK47539/#SRA_Overview_BK). For real-time qPCR experiments and data we will follow the MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) standard (2). We will develop similar approaches and standardized reporting formats for the biochemical assays proposed by the different projects.

(4) Dissemination of program information to program investigators. Core C will be responsible for disseminating protocols, experimental data, high-throughput data, and computational results. We will upload all data and results to the LIMS and publish a quarterly newsletter by e-mail, which will include progress measures and program updates.

(5) Integration with and leverage of existing UNC resources. We do not expect to design a bioinformatics and LIMS system from scratch, as we have an outstanding infrastructure already in place at UNC. UNC is part of the NIH Cancer Genome Atlas (http://genomics.unc.edu/projects/tcga.html) and the NIH ENCODE Project (http://genomics.unc.edu/projects/encode.html).
Hence, for NextGen sequencing data warehousing, QC, analysis, and assembly, we already have a large collection of customized modules available using the open source Galaxy programming environment (http://galaxyproject.org/) on the Kure cluster. Similarly, UNC is host to one of the NIH HTS screening centers (http://pharmacy.unc.edu/research/centers/centerfor-integrative-chemical-biology-and-drug-discovery/summary-of-capabilities) and a Center for AIDS Research, which houses an extensive biorepository. Each of these cores has experience with inventory and analysis systems, which we can rely on.

(6) Training and support. Given the importance of central data management, it is imperative that all personnel are able to use the system successfully. We will provide written documentation to support the use of the system and, as needed, provide user training to members of the grant who are unfamiliar with the system. We envision initial trainings that cover topics such as security features within the system, implementation of SOP for data upload, data access, and adherence to pipeline protocols. The format and implementation of these trainings will be determined by the needs of grant personnel and may include onsite or remote training through Web conferences. Core C has abundant experience in providing training to end-users and will also provide ad-hoc user support.

(7) Dissemination of program information to the scientific community. Release of information to the scientific community will occur through the public webpages and through deposit of data into public repositories. The public pages will describe the program, its mission, research projects and personnel, and educational opportunities.
The site will provide access to publications, news and events related to the program, and links to data, SOP, and related information. Rather than being responsible for physical distribution, it will provide links and templates for reagent requests from Addgene (http://www.addgene.org) for program-generated vectors and plasmids, and from the ATCC (http://www.atcc.org/) for biologics including antibodies. The generation of reagents for further studies and external validation is as much a part of our proposal as the generation of specific functional data for each of the unknown and hypothetical ORFs and small and large non-coding RNAs. The advantages of using established biorepositories rather than duplicating their functionality on site are standardization, added value through QC, and administrative experience with handling reagents that may have use and distribution restrictions.

In addition to maintaining the Web portal, the Data Management and Resource Dissemination Core will be responsible for deposition of finished datasets into public archives. We will coordinate the release of high-throughput data into the appropriate public repositories, such as GEO for microarray data (e.g. GSE25839 and GSE28684 from our publication (25)) and GenBank for sequencing data (e.g. GenBank entry JQ619843.1, which describes the first sequence of a clinical isolate of KSHV from our publ. (28)). To ease access to these data, links to these public repositories will be included within the website.

Expected Outcome, pitfalls, and future directions: The level of communication necessary for this program creates some challenges. The amount of data is not a great challenge compared to other programs that are primarily based on sequencing or systems biology. The variety of data is the main challenge, since some of the projects will generate high-throughput data and others conventional biochemical and low-throughput laboratory data. Finally, all projects will generate physical entities, i.e. reagents.
This creates a data management challenge in capturing all appropriate metadata related not only to the experimental procedure, but also to the detailed data and data processing steps. We address this issue by using data and protocol encapsulation as the guiding principle for our database design and by using an existing, open source framework (LabKey) as a unified environment. An ongoing challenge that we will address is the timely communication of results between program personnel. We will implement a tracking system that will ensure that all grant members have access to the real-time status of projects executed at any of the research sites. This tracking system will include centralized status updates and automated email notifications that can be customized for specific groups of personnel.

Aim 2: Ensure that all data is of high quality, conforms to GLP specifications, and is disseminated in a timely fashion to all project members and the scientific community at large.

Rationale. We will leverage an existing infrastructure to develop and implement data processing pipelines as well as SOP for all assay data and reagents that are generated by the Expression and Biochemical Evaluation Core (Core B) and the individual projects. These will include quality control protocols specialized for each of the technical platforms (biochemical assays, next-generation sequencing, and expression profiling). Based on the prior experience of Core C as a central lab for NIAID-sponsored clinical trials, we will follow NIAID DAIDS guidelines for good clinical laboratory practice (GCLP) standards, modified for the specific assays used in the U19 proposal. These pipelines will ensure that all information required for computational analysis is standardized (Z-scored by assay) to the needs of downstream analysis and is readily accessible to all personnel. In addition, these pipelines will play a critical role in the dissemination of grant-related data and resources. To facilitate public access to grant-generated data and resources, we will work closely with the Administrative Core (Core A) to develop a public interface and to submit data and resources (vectors) to public repositories (NCBI, Addgene). All records will be maintained to conform to 21 CFR 312.62.

(1) Program-wide SOP. Prior to the generation of data, we will work with the Research Projects and Cores to develop grant-wide SOP and data templates that will allow information about experiments to be easily uploaded and tracked. These SOP will apply to all grant-generated data, ensuring uniform annotation for data communication and downstream integration. These SOP will facilitate effective project management by the Administrative Core and will outline the specific requirements for experimental design templates, experiment and sample naming and tracking, and manuscript preparation tracking.

(2) QC/QA Implementation and sample tracking. These SOP will guide the implementation of good laboratory practices (GLP) across all projects of this proposal. Core C will perform annual audits to verify implementation, QC, and QA for all individual assays. We will implement Levey-Jennings charts as a central measure of assay performance. We will work closely with the Administrative Core to customize the system using these SOP (see Aim 1, above) and provide the basis for monitoring samples and projects.
This tracking system will incorporate features such as email notifications when data is uploaded to the system, and will facilitate communication between personnel, helping to ensure that timely progress is made toward the goals of the grant.

(3) Application-specific SOP. Given the complexity of the proposed research and the number of sources for data, it is critical that data pipeline protocols are determined prior to the collection and sharing of data. To facilitate the generation of uniform data, we will work with the Research Projects and Cores to generate application-specific SOP. The result will be experimental design templates that capture and annotate each experiment. These templates will include (but are not limited to) information about sample source, treatments, replicates, and analytical methods. Existing infrastructure and protocols within the Research Projects and Cores will streamline this process, ensuring that templates compatible with LabKey are generated and rapidly implemented.

(4) Data processing. We will also work with the Research Projects and Cores to generate specific guidelines and protocols for data processing. These protocols will be application-specific and will delineate requirements for QC, data format, and pre-processing prior to submission to the Core. The guiding principle will be submission of all raw data bundled with experimental details to the Core. To support these pipelines, LabKey is capable of importing data in a variety of formats, including XML, Excel, TSV, CSV, or plain text files. The platform is extensible, and support for new formats or standards can be developed and integrated into the existing framework. At least for automated and high-throughput assays, our experience has been that separating data analysis from data generation increases the robustness of the data, analogous to the double-blind study principle of clinical trials.
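
As an illustration of the per-assay Z-standardization and the Levey-Jennings monitoring named in Aim 2, the following minimal Python sketch shows one way the two calculations could be expressed. It is not the core's pipeline, which is planned around LabKey and R. The function names, the use of the sample standard deviation, and the 2 SD warning and 3 SD rejection thresholds are assumptions made for this example.

```python
from statistics import mean, stdev


def z_scores_by_assay(readouts):
    """Standardize raw readouts within each assay: z = (x - mean) / SD.

    `readouts` maps an assay name to a list of raw values (hypothetical layout).
    The sample standard deviation is used, which is an assumption.
    """
    scored = {}
    for assay, values in readouts.items():
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            raise ValueError(f"assay {assay!r} has zero variance")
        scored[assay] = [(v - mu) / sd for v in values]
    return scored


def levey_jennings_limits(control_values):
    """Derive Levey-Jennings control limits from historical control runs."""
    mu, sd = mean(control_values), stdev(control_values)
    return {
        "mean": mu,
        "sd": sd,
        "warn": (mu - 2 * sd, mu + 2 * sd),    # assumed warning band (2 SD)
        "reject": (mu - 3 * sd, mu + 3 * sd),  # assumed rejection band (3 SD)
    }


def flag_run(value, limits):
    """Classify a new control measurement against the limits."""
    low3, high3 = limits["reject"]
    low2, high2 = limits["warn"]
    if value < low3 or value > high3:
        return "reject"
    if value < low2 or value > high2:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Hypothetical values for demonstration only.
    raw = {"luciferase": [10.0, 12.0, 14.0], "elisa": [1.0, 2.0, 3.0]}
    print(z_scores_by_assay(raw))
    limits = levey_jennings_limits([98.0, 100.0, 102.0])
    print(flag_run(105.0, limits))
```

Standardizing within each assay puts readouts with different units on a common scale, which is what allows comparison across assay types. The control limits are derived only from control runs, so that experimental samples do not influence the acceptance criteria.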
Expected Outcome, pitfalls, and future directions: We are fortunate that Core C members have experience in GCLP and are trained under the much more demanding guidelines for clinical laboratory work. Hence, we do not foresee technical difficulties in implementing a rigorous quality control (QC) and quality assurance (QA) program for this U19 project. Implementing GLP standards across the different projects and biochemical assays will require some lead time. We believe that we have adequate time, as the first batch of unknown ORFs, hypothetical ORFs, and noncoding RNAs must first be cloned and verified before being subjected to the assays. While cloning and reagent building are in progress, Core C will develop and implement the data and quality assurance standards.

Aim 3: Provide support for the functional analysis of experimental data.

Rationale: Core C will provide computational and statistical analysis support to Core B and Projects 1-3 for the interpretation of data and results. Because the different assays yield different types of results, we will provide quantitative QC and QA measures. One approach we will explore is to convert all assay data into Z-standardized scores. This will allow us to compare across biochemical assays and with existing gene expression and short read data. All quality control and processed data will be captured through the unified LIMS structure, thus making them accessible to all research personnel and the scientific community at large.

Design and Methods

Principles of analyses. Having a central data storage and management core facilitates QC/QA in data analysis. Specifically, we will follow the operating guidelines (GLP, GCLP) developed for clinical trials, i.e.
adhere to the same standards of data integrity that are required for research that supports FDA applications. In its simplest form, we will establish a project- and time-dependent data locking procedure; once data are locked, no investigator can change them. Only once the assay data for a particular unidentified ORF, noncoding RNA, or hypothetical gene have passed QC and are locked will we proceed to analysis and publication. This is the standard for clinical trials and safeguards data integrity against “fudging” by individual investigators, which has led to unfortunate situations in the past. Our approach is based on four key organizing principles:

Central Open Source data management using LabKey. Because all data is housed in a central LIMS, this facilitates central analysis and automatic fact and data integrity checking. It also facilitates regular automatic uploads into public databases and ongoing curation and interaction with NIH.

Data encapsulation. We follow the data model established by the SRA sequencing archive (see above) and store data and assay information in linked files.

Open Source analysis using the R statistics environment. We will conduct all our statistical analysis in RStudio (http://www.rstudio.com/). RStudio supports Sweave and knitr, two packages that support dynamic reports and reproducible research. We use these routinely in our expression profiling studies (14, 25, 28).

Reproducible research using Sweave. This is a concept in statistical research, analogous to encapsulation, whereby the analysis code and data become part of the final report or manuscript (pdf generated by LaTeX) and the analysis can be repeated by any reader (13).

Analysis, pitfalls, and future directions: Providing support for the interpretation and integration of the functional data generated by the individual projects is key to the success of the overall program.
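
One way the data locking step described under "Principles of analyses" could be made verifiable is a checksum manifest recorded at lock time. The Python sketch below is illustrative only and assumes a simple in-memory mapping of file names to contents. The function names and manifest layout are hypothetical and are not features of LabKey or of the core's existing LIMS.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, List


def sha256_of(content: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(content).hexdigest()


def lock_dataset(files: Dict[str, bytes], project: str) -> dict:
    """Record a checksum for every file at the moment of locking.

    `files` maps a file name to its raw bytes (hypothetical layout).
    """
    return {
        "project": project,
        "locked_at": datetime.now(timezone.utc).isoformat(),
        "checksums": {name: sha256_of(data) for name, data in sorted(files.items())},
    }


def verify_lock(files: Dict[str, bytes], manifest: dict) -> List[str]:
    """Return the names of files changed, added, or missing since locking."""
    current = {name: sha256_of(data) for name, data in files.items()}
    locked = manifest["checksums"]
    names = set(current) | set(locked)
    return sorted(n for n in names if current.get(n) != locked.get(n))


if __name__ == "__main__":
    # Hypothetical assay file for demonstration only.
    data = {"orf_assay.csv": b"sample,readout\nA,1.0\n"}
    manifest = lock_dataset(data, project="Project 1")
    print(verify_lock(data, manifest))
```

An empty result from `verify_lock` indicates that the locked files are byte-for-byte unchanged. Any name in the result would be a reason to halt analysis until the discrepancy is explained.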
We are aware of the challenges that are associated with trying to combine diverse experimental approaches and systems into a common analysis. Communication across all research personnel will be key, as will ensuring the timely dissemination of results. One of the challenges will be to integrate the different assay formats and data types. Z-standardization as used in high-throughput screens provides one approach, but it is by no means the only one. Pilot experiments and experiments with viral proteins and ncRNAs of previously identified function will provide the opportunity to calibrate the assays across the multiple groups and cores of the application. We are fortunate to have extensive bioinformatics and statistics support on the UNC campus to help us solve data integration and analysis problems as they arise. Importantly, we can look back on our track record and will leverage over 10 years of experience in the interpretation of medium- and high-throughput data on viral and host mRNA and miRNA expression (see publ. (3-8, 11, 14, 16-19, 21, 24, 25)), and in the design and statistical analysis of biochemical and virologic screens in medium- and high-throughput formats (see publ. (1, 12, 15, 20, 29, 31)).

5. TIMELINE

During the first months of Year 1, the Data Management and Resource Dissemination Core will implement and configure the central server and facilitate the development of SOP for data generation and processing pipelines. We will increase our hardware capacity annually. One benefit with respect to timing is that the core will have some lead time while unified expression constructs and reagents are generated for each of the targeted unknown and hypothetical ORFs and non-coding RNAs. This will provide ample time for database development, such that data capture, SOP, analysis methods, and dissemination functionality will be in place before the first biochemical data emerge. Analytical support of functional interpretation, statistical support, and maintenance will occur throughout the 5-year period.

6. VERTEBRATE ANIMALS: n/a
7. PROTECTION OF HUMAN SUBJECTS: n/a, no patient contact, no patient data, no cell lines.
8. INCLUSION ENROLLMENT REPORT: n/a
9. INCLUSION OF WOMEN AND MINORITIES: n/a
10. INCLUSION OF CHILDREN: n/a

11. RESOURCE SHARING:

No GWAS data will be generated during this project. To share resources with the academic research community, we will use the uniform Material Transfer Agreement (MTA), which acknowledges that the materials are proprietary to the institutions of the Cooperative Agreement and permits their use in a manner that is consistent with the Bayh-Dole Act and NIH funding requirements. Our individual NIH research grants require that research be made available to the scientific community and the public. The primary method of data sharing is through peer-reviewed publications in scientific journals and by presentation at scientific meetings. In addition, data and results created from NIH-supported research will be submitted to NIH in annual progress reports per the terms and conditions of this award. Core C together with Core A will be responsible for resource sharing. These two cores will ensure adherence to standards pertaining to dual use research of concern (DURC) and place restrictions on dissemination as described below. This project will generate the following information:

Sequence data: Virus-specific primers and diagnostic assays will be made available as supplemental data to any publication or as posts on our website.

Sequence information: The cores will ensure that all raw and processed datasets are made available to the public.
We will submit consensus sequences to GenBank as they are generated. We will make raw sequence reads available through submission to the NCBI Short Read Archive at http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi as described above. We will make the specialized protocols available to the scientific community via publications in methods journals, as we have done in the past. Any microarray data will be 'Minimum Information About a Microarray Experiment' (MIAME) compliant and will be submitted to the NCBI GEO database at http://www.ncbi.nlm.nih.gov/gds.

Plasmids: Plasmids will be submitted to the Addgene repository (http://www.addgene.org/pgvec1) upon publication, except where select agent or dual use concerns require a more restrictive distribution. In those cases Core A staff will also acquire appropriate letters from the recipient institution's environmental health and safety officers and help coordinate CDC and/or USDA and Department of Commerce permits prior to shipping of plasmids. Core A has a DOT-trained and certified shipper. The program faculty will not send reagents to individuals or institutions that lack appropriate documentation of select agent BSL3 facilities, might harbor ill intentions or are conducting irresponsible research. Core A will work closely with the appropriate institutional Technology Transfer Office, Addgene and individuals involved in these transactions. The goal will be to provide reagents within a few months of receiving a request.

Cell lines and other reagents: Novel cell lines will be submitted to the American Type Culture Collection (ATCC, http://www.atcc.org/) upon publication, unless the parent cell line is covered by an MTA that prohibits public access. In such cases cell lines will be made available upon request.
Other reagents will be made available through the general BEI Resources clearinghouse as appropriate. For all other reagents/requests, a portion of each monthly conference call will involve a discussion of requests for reagents/collaborations. Project members have established a consistent process for evaluating requests for samples and reagents from outside scientists. In order of priority, we consider: 1) requests for reagents that have been published in peer-reviewed journals; 2) requests which enhance/promote a specific agenda of the program projects and faculty; 3) requests that promote scientifically valid collaborations between project faculty and outside scientists; and 4) overall SARS-CoV, IAV, Ebola and HHV8 research and public health needs. The general format involves: a) establishing a working knowledge of the research agenda and credentials of the requestor, b) group discussion and agreement, c) MTA agreement with the appropriate institution, or license agreement with a commercial entity, and d) inventory checking and sending out of reagents. Core A will work closely with the appropriate institutional Technology Transfer Office and individuals involved in these transactions. The goal will be to provide reagents within a few months of receiving a request.

Transgenic mice: No new transgenic mice will be generated in the course of this project. Specialized mice are based on commercially available strains (Jax Lab Inc.). Their distribution is governed by strain-specific MTAs. Mice will be submitted to the NIH mutant mouse resource: http://www.mmrrc.org/index.html. UNC is one of the members of this NIAID approved organization.

Restrictions on resource sharing and compliance to limit information pertaining to dual use research of concern:
Core A staff will also acquire appropriate letters from the recipient institution's environmental health and safety officers and help coordinate CDC and/or USDA and Department of Commerce permits prior to shipping of live SARS-CoV/WNV and influenza H1N1 2009, as appropriate. Dr. Sims is a DOT-trained and certified shipper for Core A, and all other Projects and Cores have their own trained shippers. The program faculty will not send reagents to individuals or institutions that lack appropriate documentation of BSL2 (IAV) or BSL3 facilities, might harbor ill intentions or are conducting irresponsible research.

12. SELECT AGENT RESEARCH: Core C does not handle or store biological or chemical agents, only data. All physical reagents will be stored and handled by Core B and used in the projects. UNC is a registered Select Agent entity and SARS-CoV is in the process of being added to the license. A detailed outline of the containment and biohazard plan is provided in Core A, which administers biohazard and select agent research for the overall U19 project.

13. BIBLIOGRAPHY
1. Bhatt, A. P., S. R. Jacobs, A. J. Freemerman, L. Makowski, J. C. Rathmell, D. P. Dittmer, and B. Damania. 2012. Dysregulation of fatty acid synthesis and glycolysis in non-Hodgkin lymphoma. Proc Natl Acad Sci U S A 109:11818-11823.
2. Bustin, S. A., V. Benes, J. A. Garson, J. Hellemans, J. Huggett, M. Kubista, R. Mueller, T. Nolan, M. W. Pfaffl, G. L. Shipley, J. Vandesompele, and C. T. Wittwer. 2009. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem 55:611-622.
3. Chang, H., D. P. Dittmer, Y. C. Shin, Y. Hong, and J. U. Jung. 2005. Role of Notch signal transduction in Kaposi's sarcoma-associated herpesvirus gene expression. J Virol 79:14371-14382.
4. Chugh, P., K. Tamburro, and D. P. Dittmer. 2010. Profiling of pre-micro RNAs and microRNAs using quantitative real-time PCR (qPCR) arrays. J Vis Exp.
5. Dittmer, D. P. 2011. Restricted Kaposi's sarcoma (KS) herpesvirus transcription in KS lesions from patients on successful antiretroviral therapy. MBio 2:e00138-00111.
6. Dittmer, D. P. 2003. Transcription profile of Kaposi's sarcoma-associated herpesvirus in primary Kaposi's sarcoma lesions as determined by real-time PCR arrays. Cancer Res 63:2010-2015.
7. Dittmer, D. P., C. M. Gonzalez, W. Vahrson, S. M. DeWire, R. Hines-Boykin, and B. Damania. 2005. Whole-genome transcription profiling of rhesus monkey rhadinovirus. J Virol 79:8637-8650.
8. Fakhari, F. D., and D. P. Dittmer. 2002. Charting latency transcripts in Kaposi's sarcoma-associated herpesvirus by whole-genome real-time quantitative PCR. J Virol 76:6213-6223.
9. Fakhari, F. D., J. H. Jeong, Y. Kanan, and D. P. Dittmer. 2006. The latency-associated nuclear antigen of Kaposi sarcoma-associated herpesvirus induces B cell hyperplasia and lymphoma. J Clin Invest 116:735-742.
10. Gregory, S. M., L. Wang, J. A. West, D. P. Dittmer, and B. Damania. 2012. Latent Kaposi's sarcoma-associated herpesvirus infection of monocytes downregulates expression of adaptive immune response costimulatory receptors and proinflammatory cytokines. J Virol 86:3916-3923.
11. Hilscher, C., W. Vahrson, and D. P. Dittmer. 2005. Faster quantitative real-time PCR protocols may lose sensitivity and show increased variability. Nucleic Acids Res 33:e182.
12. Hilton, I. B., and D. P. Dittmer. 2012. Quantitative analysis of the bidirectional viral G-protein-coupled receptor and lytic latency-associated nuclear antigen promoter of Kaposi's sarcoma-associated herpesvirus. J Virol 86:9683-9695.
13. Hothorn, T., and F. Leisch. 2011. Case studies in reproducibility. Brief Bioinform 12:288-300.
14. Lock, E. F., R. Ziemiecke, J. Marron, and D. P. Dittmer. 2010. Efficiency clustering for low-density microarrays and its application to QPCR. BMC Bioinformatics 11:386.
15. Nun, T. K., D. J. Kroll, N. H. Oberlies, D. D. Soejarto, R. J. Case, P. Piskaut, T. Matainaho, C. Hilscher, L. Wang, D. P. Dittmer, S. J. Gao, and B. Damania. 2007. Development of a fluorescence-based assay to screen antiviral drugs against Kaposi's sarcoma associated herpesvirus. Mol Cancer Ther 6:2360-2370.
16. O'Hara, A. J., P. Chugh, L. Wang, E. M. Netto, E. Luz, W. J. Harrington, B. J. Dezube, B. Damania, and D. P. Dittmer. 2009. Pre-micro RNA signatures delineate stages of endothelial cell transformation in Kaposi sarcoma. PLoS Pathog 5:e1000389.
17. O'Hara, A. J., W. Vahrson, and D. P. Dittmer. 2008. Gene alteration and precursor and mature microRNA transcription changes contribute to the miRNA signature of primary effusion lymphoma. Blood 111:2347-2353.
18. O'Hara, A. J., L. Wang, B. J. Dezube, W. J. Harrington, Jr., B. Damania, and D. P. Dittmer. 2009. Tumor suppressor microRNAs are underrepresented in primary effusion lymphoma and Kaposi sarcoma. Blood 113:5938-5941.
19. Papin, J., W. Vahrson, R. Hines-Boykin, and D. P. Dittmer. 2005. Real-time quantitative PCR analysis of viral transcription. Methods Mol Biol 292:449-480.
20. Papin, J. F., W. Vahrson, and D. P. Dittmer. 2004. SYBR green-based real-time quantitative PCR assay for detection of West Nile Virus circumvents false-negative results due to strain variability. J Clin Microbiol 42:1511-1518.
21. Papin, J. F., W. Vahrson, L. Larson, and D. P. Dittmer. 2010. Genome-wide real-time PCR for West Nile virus reduces the false-negative rate and facilitates new strain discovery. J Virol Methods 169:103-111.
22. Peng, X., L. Gralinski, C. D. Armour, M. T. Ferris, M. J. Thomas, S. Proll, B. G. Bradel-Tretheway, M. J. Korth, J. C. Castle, M. C. Biery, H. K. Bouzek, D. R. Haynor, M. B. Frieman, M. Heise, C. K. Raymond, R. S. Baric, and M. G. Katze. 2010. Unique signatures of long noncoding RNA expression in response to virus infection and altered innate immune signaling. MBio 1.
23. Peng, X., L. Gralinski, M. T. Ferris, M. B. Frieman, M. J. Thomas, S. Proll, M. J. Korth, J. R. Tisoncik, M. Heise, S. Luo, G. P. Schroth, T. M. Tumpey, C. Li, Y. Kawaoka, R. S. Baric, and M. G. Katze. 2011. Integrative deep sequencing of the mouse lung transcriptome reveals differential expression of diverse classes of small RNAs in response to respiratory virus infection. MBio 2.
24. Renne, R., C. Barry, D. Dittmer, N. Compitello, P. O. Brown, and D. Ganem. 2001. Modulation of cellular and viral gene expression by the latency-associated nuclear antigen of Kaposi's sarcoma-associated herpesvirus. J Virol 75:458-468.
25. Roy, D., S. H. Sin, B. Damania, and D. P. Dittmer. 2011. Tumor suppressor genes FHIT and WWOX are deleted in primary effusion lymphoma (PEL) cell lines. Blood 118:e32-39.
26. Shiboski, C. H., J. Y. Webster-Cyriaque, M. Ghannoum, J. S. Greenspan, and D. Dittmer. 2011. Overview of the oral HIV/AIDS Research Alliance Program. Adv Dent Res 23:28-33.
27. Sin, S. H., F. D. Fakhari, and D. P. Dittmer. 2010. The viral latency-associated nuclear antigen augments the B-cell response to antigen in vivo. J Virol 84:10653-10660.
28. Tamburro, K. M., D. Yang, J. Poisson, Y. Fedoriw, D. Roy, A. Lucas, S. H. Sin, N. Malouf, V. Moylan, B. Damania, S. Moll, C. van der Horst, and D. P. Dittmer. 2012. Vironome of Kaposi sarcoma associated herpesvirus-inflammatory cytokine syndrome in an AIDS patient reveals coinfection of human herpesvirus 8 and human herpesvirus 6A. Virology 433:220-225.
29. Wang, F. Z., D. Roy, E. Gershburg, C. B. Whitehurst, D. P. Dittmer, and J. S. Pagano. 2009. Maribavir inhibits Epstein-Barr virus transcription in addition to viral DNA replication. J Virol 83:12108-12117.
30. Wen, K. W., D. P. Dittmer, and B. Damania. 2009. Disruption of LANA in rhesus rhadinovirus generates a highly lytic recombinant virus. J Virol 83:9786-9802.
31. Whitby, D., V. A. Marshall, R. K. Bagni, W. J. Miley, T. G. McCloud, R. Hines-Boykin, J. J. Goedert, B. A. Conde, K. Nagashima, J. Mikovits, D. P. Dittmer, and D. J. Newman. 2007. Reactivation of Kaposi's sarcoma-associated herpesvirus by natural products from Kaposi's sarcoma endemic regions. Int J Cancer 120:321-328.
32. Yang, D., K. Ramkissoon, E. Hamlett, and M. C. Giddings. 2008. High-accuracy peptide mass fingerprinting using peak intensity data with machine learning. J Proteome Res 7:62-69.