Core C: Data Management and Resource Dissemination Core
Program Director/Principal Investigator (Last, First, Middle):
Data Management and Resource Dissemination Core (Core C)
1. SPECIFIC AIMS
The Data Management and Resource Dissemination Core (Core C) will provide data management, data cleaning, data storage and data release to public databases for all information generated during this project. The core will inventory physical resources and submit them to public bio-repositories. In addition, Core C will provide statistical analysis, develop and audit standard operating procedures (SOP), and ensure compliance with good laboratory practices (GLP).

Leadership: Dr. Dirk Dittmer, a leader in viral genome profiling and sequencing, leads Core C. Dr. Dittmer currently serves as director of the core laboratories (viral load, pathology, biomarkers, genomics) for the NIH-funded AIDS malignancies clinical trials consortium (AMC) and as director of the Oral HIV/AIDS Research Alliance (OHARA) core laboratory. Over the past 10 years, the Dittmer group has built an extensive computational infrastructure, which has now become a dedicated core within the UNC Department of Microbiology and Immunology. The group is experienced in the management, analysis, and interpretation of large volumes of complex data, such as expression profiling data, NextGen sequencing data and high-throughput biochemical assays.

Infrastructure: The Data Management and Resource Dissemination Core (Core C) will be physically located within the UNC Vironomics Core (https://www.med.unc.edu/vironomics). This facility was inaugurated two years ago as a core laboratory for high-throughput and genomics experiments in virology. It is subsidized by the Lineberger Comprehensive Cancer Center (LCCC) and operates under good clinical laboratory practice (GCLP) and good laboratory practice (GLP) standards. Clinical viral load assays, clinical genome profiling projects and high-throughput experimental projects are conducted through the core. The same standardized, quality-controlled environment will be used for this project. Core C leverages these resources to connect the investigators from Research Projects 1-3 and Cores A-B, and to support the complex data flow for this U19 program.

Key Features: The Data Management and Resource Dissemination Core (Core C) responds to specific requirements of the RFA.
o The core will establish and maintain a sophisticated laboratory information management system (LIMS) to facilitate the tracking of assay data as well as samples and reagents, the processing of raw data through analysis pipelines such as Z-score standardization, and communication between all projects and cores. This will be built on top of an open-source SQL database environment (LabKey) and modeled after our clinical sample inventory and tracking system.
o The core will provide primary secure data storage on a dedicated server with automatic backup to the central UNC 1800+-core HP Linux cluster (Kure). It will provide resources for computationally intensive processes such as RNAseq analysis, and will develop job scheduling and interfaces to the campus-wide UNC Kure computing cluster.
o The core will provide statistical support and central QC/QA for all biochemical assays performed in this project. The core will develop, maintain and certify SOP in accordance with GLP standards.
o The core will be responsible for timely dissemination of data and reagents to public repositories, NIH databases, and the research community at large.
The Specific Aims of the Data Management and Resource Dissemination Core C derive from the requirements
of the RFA and reflect the features and infrastructure of the core:
Aim 1: Develop and maintain a central data, resource and sample management system.
Aim 2: Ensure that all data is of high quality, conforms to GLP specifications, and is disseminated to all
U19 project members and the scientific community at large in a timely fashion.
Aim 3: Support the functional, bioinformatics, and statistical design and analysis of experimental data.
PHS 398/2590 (Rev. 06/09)
Page
Continuation Format Page
2. SIGNIFICANCE
Successful implementation and execution of the research proposed in this U19 program requires a highly organized, central data management and communications system. The role of the Data Management and Resource Dissemination Core (Core C) is to facilitate communication among Projects 1-3 and Cores A-B. The volume of data is not especially large, since the focus of this U19 program is detailed biochemical data and we use expression profiling and NextGen sequencing in only some of the aims; the central system is needed less for volume than for the variety of data types and the coordination of all projects and cores.
Core C fulfills the requirements stipulated in the RFA for a central hub for information and for dissemination of data to the researchers in the program and to the scientific community (Figure 1). Core C will also provide statistical assistance in the design and evaluation of individual assays for each research project. We will link all program personnel by facilitating data communication and integration across research projects and cores. As discussed in the Approach section, we will leverage an established, customizable software system and dedicated hardware infrastructure available through the UNC Vironomics Core.
This U19 program benefits significantly from the track record in systems biology and the pre-existing collaborations between the Baric, Heise, Damania, and Dittmer research laboratories (10, 15, 16, 22, 23, 25, 28, 30).
Core C will play a central role throughout the project by: 1) providing data management and processing avenues for all aspects of the research; 2) furnishing a communication environment for monitoring the status of experiments and samples and a route for exchanging datasets and results; 3) working interactively with the other projects to provide statistical support; and 4) taking responsibility for resource distribution within the U19 cooperative and to the scientific community. We will work with program investigators to develop standard operating procedures (SOP) for experimental design, sample naming, data processing, annotation, quality control, and upload into a dedicated laboratory information management system (LIMS). This will capitalize on Dr. Dittmer’s expertise in developing laboratory assays and sample flow guidelines for NIH cooperative trial groups (see publ. (4, 7, 14, 20, 21, 26)). Further, we will provide expertise and support during the validation and analysis of statistical and computational designs and will help prioritize the hypothetical open reading frames (ORFs), unknown ORFs, and non-coding RNAs proposed by the individual projects in this U19.
Figure 1: Outline of data and reagent flow.
Candidate uORFs, ncRNAs and hypothetical gene sequences are submitted to Core C, where they are prioritized and a project master file is opened. The sequence is then forwarded to Core B to generate expression constructs and detection reagents. Core B forwards these resources to the U19 projects as well as back to Core C (blue arrow) for submission to public repositories. Each project delivers assay data to Core C, which performs QC, stores the data, and, together with Core A, submits the data directly to public repositories. Note the unidirectional flow of data, which facilitates process management. Also, the projects work in parallel, so that their throughput is not interdependent. Only validated reagents are released from Core B, and they are used at the same time in all projects.
3. INNOVATION
The Data Management and Resource Dissemination Core C will develop a dedicated LIMS based on the open-source LabKey Server and its underlying SQL database (http://labkey.com). LabKey Server provides a flexible, module-based system that is especially suited to this U19 program because it allows the integration of diverse data types. This system will facilitate the tracking of experiments and samples and the processing of raw data. The LabKey software will be installed on a dedicated server, and hardware and software will be expanded annually to grow with the project. In addition, this U19-dedicated infrastructure will provide secure data storage,
resources for computationally intensive processes such as RNAseq analysis and the dissemination of data to
the projects, cores, and the research community.
Core C has experience configuring and maintaining LIMS. It currently operates several, some proprietary to harmonize with NIH clinical trials and others open source. This infrastructure was developed by the Dittmer lab to support real-time qPCR-based high-throughput profiling of mRNA and microRNA (4, 7, 11, 14, 19-21). In addition, the existing LIMS manage data from chemical screens (31) and transgenic mouse experiments (9, 27). We will draw on this experience and on our hardware and software infrastructure to ensure timely implementation and maintenance of a project-specific LIMS environment. The system will be configured to enforce data security and referential integrity checks and to schedule backups and maintenance automatically.
Because some of the core project assays yield qualitative information (Western blot images), the core will also be responsible for unified quantitative scanning and interpretation of these data using the open-source NIH Image software. This will streamline analysis and data access for personnel working on this grant.
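As an illustration of what unified quantitative interpretation of blot scans might involve, the sketch below normalizes background-subtracted target band densities to a loading control and then to a reference lane. It assumes integrated band densities have already been measured (for example, in NIH Image); the lane names, values, and the loading-control normalization scheme are hypothetical choices, not a procedure stated in this proposal.

```python
# Hypothetical sketch: relative band intensity from Western blot densitometry.
# Assumes integrated densities were already measured (e.g., in NIH Image);
# lane names and numbers are illustrative, not project data.
from dataclasses import dataclass


@dataclass
class Lane:
    name: str
    target: float      # integrated density of the target band
    loading: float     # integrated density of the loading-control band
    background: float  # local background density, same box size


def relative_expression(lanes: list[Lane], reference: str) -> dict[str, float]:
    """Target/loading-control ratio per lane, scaled so the reference lane is 1.0."""
    ratios: dict[str, float] = {}
    for lane in lanes:
        target = max(lane.target - lane.background, 0.0)
        loading = lane.loading - lane.background
        if loading <= 0:
            raise ValueError(f"loading control not above background in {lane.name}")
        ratios[lane.name] = target / loading
    ref = ratios[reference]
    if ref == 0:
        raise ValueError("reference lane has no detectable target signal")
    return {name: ratio / ref for name, ratio in ratios.items()}


if __name__ == "__main__":
    lanes = [
        Lane("mock", target=1200.0, loading=5200.0, background=200.0),
        Lane("ORF-x", target=3200.0, loading=5200.0, background=200.0),
    ]
    print(relative_expression(lanes, reference="mock"))
```

With these made-up numbers the hypothetical ORF-x lane comes out at about three times the mock lane.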
Rather than maintaining this information internally, we plan to submit all suitable data to NIH archives (GEO, SRA, GenBank) on a 6-12 month schedule. Likewise, all reagents generated in this U19 project will be submitted to ATCC or Addgene quarterly. The Data Management and Resource Dissemination Core (Core C) will work closely with Core A to resolve the administrative aspects (MTA, transport certifications, etc.) of making physical reagents available through public repositories.
4. APPROACH
Aim 1: Develop and maintain a dedicated data management system.
Rationale. Core C will work with Research Projects 1-3 and Cores A-B to define and implement a program-wide data communication plan that ensures the efficient capture and subsequent transparency of all grant-generated resources (both data and physical reagents). As a component of this plan, we will configure and maintain a dedicated, central server for data storage and dissemination. The central server will house assay read-outs, genomic data, and sample information provided by the Research Projects. Because the server is dedicated, the U19 group will have complete control over all software and hardware that supports the project.
Core C will act as a central hub for data storage, data management, and internal and external data dissemination. We are fortunate to have this resource available for this project, together with institutional support in the form of LCCC subsidies to the core. Dongmei Yang will develop a suitable web interface on top of the existing SQL database, using some of the same approaches she developed when designing a Proteomics/MassSpec database (see publ. (32)). In addition, we will draw upon the expertise of the UNC bioinformatics group, which supports large sequencing efforts on campus, such as the ENCODE and TCGA consortium projects.
Design and Methods:
(1) Core C server and task-specific LIMS. Core C is experienced in implementing customized data management systems and currently uses multiple LIMS to support NIH clinical trials. First and foremost among these is a dedicated Informatics Server. This server integrates Network Attached Storage (NAS) on the Synology platform (http://www.synology.com), a UNIX-based 16-core, 3 TB file server, and storage space on the UNC 1800-core HP Linux cluster (22 TB workspace, unlimited tape-drive space). This server houses the website, the web-based SOP (http://www.med.unc.edu/vironomics/protocols), and the sample submission Web portal. The server also disseminates NextGen sequencing output via a secure FTP site.
Second, we maintain a proprietary LIMS for high-throughput real-time qPCR data (Roche Inc.) and sample inventory for clinical trials using trial-specific distributed systems (GlobalTrace™ and FSTRF). In sum, we are very familiar with multiple data management approaches and have operated LIMS for many years.
(2) The open-source LabKey LIMS. For the purpose of this U19 application we plan to rely on the open-source LabKey software (http://labkey.com/). From a data management perspective, LabKey Server offers a number of compelling features that build upon the basic functionality of standard SQL.
These include a database consistency checker that performs referential integrity checks, property and domain
consistency checks, and schema consistency checks.
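Referential integrity can be illustrated with a standard SQL foreign-key constraint, which rejects an assay result that points at a sample the database does not know. The sketch uses Python's built-in sqlite3 module purely as a stand-in; it is not LabKey's consistency checker, and the table and column names are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database where every assay result must reference a known sample."""
    con = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("CREATE TABLE sample (sample_id TEXT PRIMARY KEY, project TEXT NOT NULL)")
    con.execute(
        """CREATE TABLE assay_result (
               result_id INTEGER PRIMARY KEY,
               sample_id TEXT NOT NULL REFERENCES sample(sample_id),
               assay TEXT NOT NULL,
               value REAL)"""
    )
    return con


def add_result(con: sqlite3.Connection, sample_id: str, assay: str, value: float) -> bool:
    """Insert one result; return False if it references an unknown sample."""
    try:
        with con:  # commits on success, rolls back and re-raises on error
            con.execute(
                "INSERT INTO assay_result (sample_id, assay, value) VALUES (?, ?, ?)",
                (sample_id, assay, value),
            )
        return True
    except sqlite3.IntegrityError:  # foreign-key violation: orphan row rejected
        return False


if __name__ == "__main__":
    con = make_db()
    with con:
        con.execute("INSERT INTO sample VALUES ('P1-0001', 'Project 1')")
    print(add_result(con, "P1-0001", "luciferase", 1.8))  # True
    print(add_result(con, "P9-9999", "luciferase", 2.1))  # False
```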

Important for the integration of biochemical assays as proposed here, LabKey supports Excel-based data entry. Excel files are easily uploaded, or data can be copied and pasted into customizable project folders. A useful feature for large data sets is a search function that searches data sets and files for any keyword. This matters when one user uploads files and other users know only what the files contain, not what they were named. Once data are uploaded and integrated into specific projects, U19 members can sort or filter specific columns to view specific data sets, and trends and reports can be created and visualized directly from the selected data.
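The keyword search can be pictured as an inverted index that maps each word to the files containing it, so a user who knows only what a file contains can still find it. This toy, in-memory sketch is not LabKey's search implementation; the file names and contents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric tokens (keeps names such as 'orf57' intact)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of file names whose content contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, content in files.items():
        for token in tokenize(content):
            index[token].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Files containing every word of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


if __name__ == "__main__":
    uploads = {  # hypothetical uploads
        "run_0412.csv": "sample,assay,value\nP1-0001,luciferase,1.8",
        "blot_notes.txt": "Western blot of ORF57 constructs, anti-FLAG",
    }
    idx = build_index(uploads)
    print(search(idx, "luciferase"))    # {'run_0412.csv'}
    print(search(idx, "FLAG western"))  # {'blot_notes.txt'}
```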

LabKey Server runs under Apache Tomcat and accesses a relational database. It can be installed and deployed on Windows, Linux/UNIX and macOS operating systems. Users access their workspace through a web browser, with role-based security and user authentication.

LabKey Server allows flexible data management and secure data sharing, and also provides users with tools for analyzing the data. These tools include sorting and filtering, built-in visualization, and integration with the open-source statistical environment R (http://www.r-project.org/). Users can apply the full power of the R programming environment to analyze data and create R views using the R script engine.

LabKey Server provides NGS data management (with an initial focus on the 454 instrument) and pipeline automation through one or more Galaxy genotyping workflows applied to a run. Improvements and expansion of the NGS capabilities are expected in the next several releases.

LabKey facilitates intra- and inter-group collaboration. Our U19 group will manage and follow complex data sets that may require different users to complete different tasks; LabKey supports this through its “Issue tracker”. From start to finish, this site is set up to enable users to upload hundreds of files, integrate those files and data sets into one database, and analyze them and create reports, all in one location.
We will work with the Administrative Core (Core A) to manage user accounts and define data access security roles. We will also work with Core B and Research Projects 1-3 to customize the environment to meet project-specific needs.
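The data access roles to be defined with Core A could take the following shape: each role grants a fixed set of permissions, each user holds a role per project folder, and access is denied by default. The role names, users, and folders are hypothetical placeholders, not LabKey's built-in role definitions.

```python
from enum import Flag, auto


class Perm(Flag):
    READ = auto()
    INSERT = auto()
    UPDATE = auto()
    ADMIN = auto()


# Hypothetical role definitions, to be agreed with the Administrative Core.
ROLES = {
    "reader": Perm.READ,
    "submitter": Perm.READ | Perm.INSERT,
    "editor": Perm.READ | Perm.INSERT | Perm.UPDATE,
    "core_c_admin": Perm.READ | Perm.INSERT | Perm.UPDATE | Perm.ADMIN,
}

# Hypothetical (user, folder) -> role assignments; folders mirror Projects/Cores.
ASSIGNMENTS = {
    ("alice", "Project1"): "editor",
    ("alice", "Project2"): "reader",
    ("bob", "CoreB"): "submitter",
}


def allowed(user: str, folder: str, needed: Perm) -> bool:
    """True if the user's role in this folder includes every needed permission."""
    role = ASSIGNMENTS.get((user, folder))
    if role is None:
        return False  # no role in this folder: deny by default
    return (ROLES[role] & needed) == needed


if __name__ == "__main__":
    print(allowed("alice", "Project1", Perm.UPDATE))  # True
    print(allowed("alice", "Project2", Perm.UPDATE))  # False
    print(allowed("bob", "Project1", Perm.READ))      # False
```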
(3) Data encapsulation. The guiding concept of our design will be encapsulation of data and meta information, a concept borrowed from object-oriented programming that has been adopted in NextGen sequencing data storage and analysis, e.g. the NCBI SRA archive (http://www.ncbi.nlm.nih.gov/books/NBK47539/#SRA_Overview_BK). For real-time qPCR experiments and data we will follow the MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) standard (2). We will develop similar approaches and standardized reporting formats for the biochemical assays proposed by the different projects.
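Encapsulation can be sketched as a single record that carries raw values together with the metadata needed to interpret them, plus a checksum so that archived or shared copies can be verified later. The field names are hypothetical and only loosely modeled on MIQE-style reporting items; the actual fields would come from the application-specific SOP.

```python
# Sketch of data encapsulation: raw values travel with their metadata and a
# checksum. Field names are hypothetical, loosely modeled on MIQE-style items.
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AssayRecord:
    experiment_id: str
    assay: str
    sop_version: str
    sample_ids: list[str]
    raw_values: list[float]
    metadata: dict[str, str] = field(default_factory=dict)

    def checksum(self) -> str:
        """SHA-256 of a canonical (key-sorted) JSON encoding of the record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def to_bundle(self) -> str:
        """Serialize the record together with its checksum."""
        return json.dumps({"record": asdict(self), "sha256": self.checksum()}, sort_keys=True)


def from_bundle(text: str) -> AssayRecord:
    """Rebuild a record, refusing it if the contents no longer match the checksum."""
    bundle = json.loads(text)
    record = AssayRecord(**bundle["record"])
    if record.checksum() != bundle["sha256"]:
        raise ValueError("bundle contents do not match stored checksum")
    return record


if __name__ == "__main__":
    rec = AssayRecord(
        experiment_id="EXP-0001",
        assay="qPCR",
        sop_version="SOP-qPCR v1.0",
        sample_ids=["P1-ORF0042-QPCR-R1"],
        raw_values=[24.31],
        metadata={"reference_gene": "hypothetical-ref"},
    )
    assert from_bundle(rec.to_bundle()) == rec
```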
(4) Dissemination of program information to program investigators. Core C will be responsible for disseminating protocols, experimental data, high-throughput data, and computational results. We will upload all data and results to the LIMS and publish a quarterly e-mail newsletter that includes progress measures and program updates.
(5) Integration with and leverage of existing UNC resources. We do not need to design a bioinformatics and LIMS system from scratch, as an outstanding infrastructure is already in place at UNC. UNC is part of the NIH Cancer Genome Atlas (TCGA; http://genomics.unc.edu/projects/tcga.html) and the NIH ENCODE Project (http://genomics.unc.edu/projects/encode.html). Hence, for NextGen sequencing data warehousing, QC, analysis and assembly, we already have a large collection of customized modules available in the open-source Galaxy environment (http://galaxyproject.org/) on the Kure cluster.
Similarly, UNC hosts one of the NIH high-throughput screening (HTS) centers (http://pharmacy.unc.edu/research/centers/center-for-integrative-chemical-biology-and-drug-discovery/summary-of-capabilities) and a Center for AIDS Research, which houses an extensive biorepository. Each of these cores has experience with inventory and analysis systems on which we can draw.
(6) Training and support. Given the importance of central data management, it is imperative that all personnel can use the system successfully. We will provide written documentation for the system and, as needed, user training for grant personnel who are unfamiliar with it. We envision initial trainings covering topics such as security features within the system, implementation of SOP for data upload, data access, and adherence to pipeline protocols. The format of these trainings will be dictated by the needs of grant personnel and may include onsite training or remote training through Web conferences. Core C has abundant experience in training end-users and will also provide ad-hoc user support.
(7) Dissemination of program information to the scientific community. Release of information to the scientific community will occur through the public webpages and through deposit of data into public repositories. The public pages will describe the program, its mission, research projects and personnel, and educational opportunities. The site will provide access to publications, news and events related to the program, and links to data, SOP and related information.
Rather than handling physical distribution itself, the core will provide links and request templates for obtaining program-generated vectors and plasmids from Addgene (http://www.addgene.org) and biologics, including antibodies, from the ATCC (http://www.atcc.org/). The generation of reagents for further studies and external validation is as much a part of our proposal as the generation of specific functional data for each of the unknown and hypothetical ORFs and small and large non-coding RNAs. The advantages of using established biorepositories rather than duplicating their functionality on site are standardization, the added value of their QC, and their administrative experience in handling reagents that may carry use and distribution restrictions.
In addition to maintaining the Web portal, the Data Management and Resource Dissemination Core will be responsible for depositing finished datasets into public archives. We will coordinate the release of high-throughput data into the appropriate public repositories, such as GEO for microarray data (e.g. GSE25839 and GSE28684 from our publication (25)) and GenBank for sequence data (e.g. GenBank entry JQ619843.1, which describes the first sequence of a clinical isolate of KSHV from our publ. (28)). To ease access to these data, links to these public repositories will be included on the website.
Expected Outcome, pitfalls, and future directions: The level of communication necessary for this program creates some challenges. The amount of data is not a great challenge compared with programs based primarily on sequencing or systems biology. The variety of data is the main challenge, since some projects will generate high-throughput data and others conventional biochemical and low-throughput laboratory data. Finally, all projects will generate physical entities, i.e. reagents. This creates a data management challenge: capturing all appropriate metadata related not only to the experimental procedure but also to the data and their processing. We address this issue by using data and protocol encapsulation as the guiding principle for our database design and by using an existing, open-source framework (LabKey) as a unified environment.
An ongoing challenge that we will address is the timely communication of results between program personnel.
We will implement a tracking system that will ensure that all grant members have access to the real-time status
of projects executed at any of the research sites. This tracking system will include centralized status updates
and automated email notifications that can be customized for specific groups of personnel.
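The customizable notifications could be driven by a subscription table that maps event types to personnel groups. This sketch only composes messages with Python's standard email library and does not send them; the event names, groups, and addresses are hypothetical.

```python
from email.message import EmailMessage
from typing import Optional

# Hypothetical subscriptions: event type -> groups to notify.
SUBSCRIPTIONS = {
    "data_uploaded": {"core_c", "project_leads"},
    "qc_failed": {"core_c", "assay_owner"},
    "reagent_released": {"core_b", "project_leads"},
}

# Hypothetical group membership.
GROUPS = {
    "core_c": ["corec@example.org"],
    "core_b": ["coreb@example.org"],
    "project_leads": ["p1@example.org", "p2@example.org", "p3@example.org"],
    "assay_owner": ["p2@example.org"],
}


def recipients(event: str) -> list[str]:
    """Unique, sorted addresses of everyone subscribed to this event type."""
    groups = SUBSCRIPTIONS.get(event, set())
    return sorted({addr for g in groups for addr in GROUPS.get(g, [])})


def build_notice(event: str, subject: str, body: str) -> Optional[EmailMessage]:
    """Compose one notification, or return None if nobody is subscribed."""
    to = recipients(event)
    if not to:
        return None
    msg = EmailMessage()
    msg["From"] = "lims-notify@example.org"
    msg["To"] = ", ".join(to)
    msg["Subject"] = f"[U19 {event}] {subject}"
    msg.set_content(body)
    return msg  # handing it to an SMTP server is deliberately left out
```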
Aim 2: Ensure that all data is of high quality, conforms to GLP specifications, and is disseminated in a timely fashion to all project members and the scientific community at large.
Rationale. We will leverage an existing infrastructure to develop and implement data processing pipelines as
well as SOP for all assay data and reagents that are generated by the Expression and Biochemical Evaluation
Core (Core B) and the individual projects. These will include quality control protocols specialized for each of
the technical platforms (biochemical assays, next-generation sequencing, and expression profiling).
Based on the prior experience of Core C as a central lab for NIAID-sponsored clinical trials, we will follow NIAID DAIDS guidelines for good clinical laboratory practice (GCLP), modified for the specific assays used in the U19 proposal.
These pipelines will ensure that all information required for computational analysis is standardized (Z-scored by assay) to meet the needs of downstream analysis and is readily accessible to all personnel. In addition, these pipelines will play a critical role in the dissemination of grant-related data and resources.
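Z-scoring by assay expresses each measurement as its distance from that assay's mean in units of that assay's standard deviation, z = (x - mean_assay) / sd_assay, which places read-outs with different units on a common scale. A minimal standard-library sketch, assuming each (assay, sample) pair occurs once; the assay names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_by_assay(rows: list[tuple[str, str, float]]) -> dict[tuple[str, str], float]:
    """Z-score each (assay, sample, value) row within its own assay.

    Uses the sample standard deviation. An assay needs at least two values
    with non-zero spread, otherwise ValueError is raised. Assumes each
    (assay, sample) pair occurs only once.
    """
    by_assay: dict[str, list[float]] = defaultdict(list)
    for assay, _, value in rows:
        by_assay[assay].append(value)

    params = {}
    for assay, values in by_assay.items():
        if len(values) < 2 or stdev(values) == 0:
            raise ValueError(f"cannot standardize assay {assay!r}")
        params[assay] = (mean(values), stdev(values))

    return {
        (assay, sample): (value - params[assay][0]) / params[assay][1]
        for assay, sample, value in rows
    }


if __name__ == "__main__":
    data = [  # hypothetical read-outs in very different units
        ("luciferase", "ORF-a", 1.0), ("luciferase", "ORF-b", 3.0),
        ("luciferase", "ORF-c", 5.0),
        ("binding_nM", "ORF-a", 200.0), ("binding_nM", "ORF-b", 400.0),
        ("binding_nM", "ORF-c", 600.0),
    ]
    for key, z in z_by_assay(data).items():
        print(key, round(z, 2))
```

In the example both hypothetical assays map to Z-scores of -1, 0 and 1 despite their very different units.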
To facilitate public access to grant-generated data and resources, we will work closely with the Administrative Core (Core A) to develop a public interface and to submit data and resources (vectors) to public repositories (NCBI, Addgene). All records will be maintained to conform to 21 CFR 312.62.
(1) Program-wide SOP. Prior to the generation of data, we will work with the Research Projects and Cores to develop grant-wide SOP and data templates that will allow information about experiments to be easily uploaded and tracked. These SOP will apply to all grant-generated data, ensuring uniform annotation for data communication and downstream integration. They will facilitate effective project management by the Administrative Core and will outline the specific requirements for experimental design templates, experiment and sample naming and tracking, and manuscript preparation tracking.
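A naming SOP is easiest to enforce when the convention is machine-checkable. Purely as an example, the sketch assumes a hypothetical pattern of source, candidate identifier, assay code, and replicate (e.g. P1-ORF0042-LUC-R2); the real convention would be agreed with the Research Projects and Cores.

```python
import re

# Hypothetical convention: <source>-<candidate>-<assay>-R<replicate>
#   source:    P1-P3 (Research Projects) or CA/CB (Cores)
#   candidate: ORF or NC (non-coding RNA) followed by four digits
#   assay:     2-6 upper-case letters; replicate: 1-99
NAME_RE = re.compile(
    r"(?P<source>P[1-3]|C[AB])-"
    r"(?P<candidate>(?:ORF|NC)\d{4})-"
    r"(?P<assay>[A-Z]{2,6})-"
    r"R(?P<replicate>[1-9]\d?)"
)


def parse_sample_name(name: str) -> dict[str, str]:
    """Split a conforming name into its fields; raise ValueError otherwise."""
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"sample name does not follow the SOP: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_sample_name("P1-ORF0042-LUC-R2"))
    for bad in ["p1-orf0042-luc-r2", "P4-ORF0042-LUC-R1", "P2-NC12-WB-R1"]:
        try:
            parse_sample_name(bad)
        except ValueError as err:
            print(err)
```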
(2) QC/QA implementation and sample tracking. These SOP will guide the implementation of good laboratory practices (GLP) across all projects of this proposal. Core C will perform annual audits to verify implementation, QC and QA for all individual assays. We will implement Levey-Jennings charts as a central measure of assay performance. We will work closely with the Administrative Core to customize the system using these SOP (see Aim 1, above), providing the basis for monitoring samples and projects. This tracking system will incorporate features such as email notifications when data are uploaded to the system, and will facilitate communication between personnel, helping to ensure timely progress toward the goals of the grant.
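A Levey-Jennings chart plots each run's control value against the mean and the ±1, 2 and 3 SD lines established from a baseline of in-control runs. The sketch computes those lines and flags runs. The 1-3s, 2-2s and 1-2s decision rules are the widely used Westgard rules, added here as an assumption because the text does not specify which rules will be applied; the control values are hypothetical.

```python
from statistics import mean, stdev


def lj_limits(baseline: list[float]) -> dict[str, float]:
    """Mean and +/-1, 2, 3 SD lines from a baseline of in-control runs."""
    m, s = mean(baseline), stdev(baseline)
    limits = {"mean": m, "sd": s}
    for k in (1, 2, 3):
        limits[f"+{k}sd"] = m + k * s
        limits[f"-{k}sd"] = m - k * s
    return limits


def westgard_flags(values: list[float], m: float, s: float) -> list[str]:
    """Flag each control run in order.

    reject: 1-3s (beyond 3 SD) or 2-2s (this and the previous run beyond
            2 SD on the same side of the mean)
    warn:   1-2s (beyond 2 SD), otherwise ok
    """
    flags = []
    z_prev = None
    for x in values:
        z = (x - m) / s
        if abs(z) > 3:
            flags.append("reject")
        elif z_prev is not None and abs(z) > 2 and abs(z_prev) > 2 and z * z_prev > 0:
            flags.append("reject")
        elif abs(z) > 2:
            flags.append("warn")
        else:
            flags.append("ok")
        z_prev = z
    return flags


if __name__ == "__main__":
    lim = lj_limits([10.0, 10.2, 9.8, 10.1, 9.9])  # hypothetical baseline
    print(westgard_flags([10.1, 10.35, 10.4, 9.0], lim["mean"], lim["sd"]))
```

Running the example flags the hypothetical runs as ok, warn, reject (2-2s), reject (1-3s).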
(3) Application-specific SOP. Given the complexity of the proposed research and the number of data sources, it is critical that data pipeline protocols are determined before data are collected and shared. To facilitate the generation of uniform data, we will work with the Research Projects and Cores to generate application-specific SOP. The result will be experimental design templates that capture and annotate each experiment. These templates will include (but are not limited to) information about sample source, treatments, replicates, and analytical methods. Existing infrastructure and protocols within the Research Projects and Cores will streamline this process, ensuring that templates compatible with LabKey are generated and rapidly implemented.
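Such templates could be enforced at upload time by checking that every row of a comma- or tab-separated template carries the required annotation. The columns below mirror the items listed above (sample source, treatment, replicate, analytical method) plus a hypothetical sample_name column; actual templates will be defined per application.

```python
import csv
import io

# Hypothetical required columns for a design template.
REQUIRED = ["sample_name", "sample_source", "treatment", "replicate", "analytical_method"]


def validate_template(text: str, delimiter: str = ",") -> list[str]:
    """Return human-readable problems in a design template (empty list if OK).

    Line numbers assume one record per line, with the header on line 1.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED if c not in header]
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        rep = (row.get("replicate") or "").strip()
        if rep and not rep.isdecimal():
            problems.append(f"line {line_no}: replicate must be a whole number")
    return problems
```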
(4) Data processing. We will also work with the Research Projects and Cores to generate specific guidelines and protocols for data processing. These protocols will be application-specific and will delineate requirements for QC, data format, and pre-processing prior to submission to the Core. The guiding principle will be submission of all raw data, bundled with experimental details, to the Core. To support these pipelines, LabKey can import data in a variety of formats, including XML, Excel, TSV, CSV, or plain text files. The platform is extensible, and support for new formats or standards can be developed and integrated into the existing framework. At least for automated and high-throughput assays, our experience has been that separating data analysis from data generation increases the robustness of the data, analogous to the double-blind principle of clinical trials.
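One way to separate analysis from data generation, in the spirit of the double-blind analogy, is to replace sample identifiers with random codes before data reach the analyst, while the Core holds the decoding key until analysis is complete. A minimal sketch using Python's secrets module; the code format is an assumption, and shuffling row order before hand-off is also advisable but not shown.

```python
import secrets


def blind(sample_ids: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (forward, key).

    forward maps sample ID -> random code (used to relabel the data sent to
    the analyst); key maps code -> sample ID and stays with the Core.
    """
    forward: dict[str, str] = {}
    key: dict[str, str] = {}
    for sid in sample_ids:
        if sid in forward:
            continue  # repeated IDs share one code
        code = "B-" + secrets.token_hex(4)  # hypothetical code format
        while code in key:  # regenerate on the unlikely collision
            code = "B-" + secrets.token_hex(4)
        forward[sid] = code
        key[code] = sid
    return forward, key


def unblind(results: dict[str, float], key: dict[str, str]) -> dict[str, float]:
    """Map analysed results (keyed by code) back to the original sample IDs."""
    return {key[code]: value for code, value in results.items()}
```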
Expected Outcome, pitfalls, and future directions: We are fortunate that Core C members have experience in GCLP and are trained under the much more demanding guidelines for clinical laboratory work. Hence, we do not foresee technical difficulties in implementing a rigorous quality control (QC) and quality assurance (QA) program for this U19 project. Implementing GLP standards across the different projects and biochemical assays will require some lead time. We believe that we have adequate time, as the first batch of unknown ORFs, hypothetical ORFs and non-coding RNAs must first be cloned and verified before being subjected to the assays. While cloning and reagent building are in progress, Core C will develop and implement the data and quality assurance standards.
Aim 3: Provide support for the functional analysis of experimental data.
Rationale: Core C will provide computational and statistical analysis support to Core B and Projects 1-3 for the interpretation of data and results. Because the different assays yield different types of results, we will provide quantitative QC and QA measures. One approach we will explore is to convert all assay data into Z-standardized scores. This will allow us to compare across
biochemical assays and with existing gene expression and short-read data. All quality control and processed data will be captured through the unified LIMS structure, making them accessible to all research personnel and the scientific community at large.
Design and Methods:
Principles of analyses. Having a central data storage and management core facilitates QC/QA in data analysis. Specifically, we will follow the operating guidelines (GLP, GCLP) developed for clinical trials, i.e. adhere to the same standards of data integrity that are required for research supporting FDA applications. In its simplest form, we will establish a project- and time-dependent data locking procedure, after which no investigator can change the data. Only once the assay data for a particular unidentified ORF, non-coding RNA or hypothetical gene have passed QC and are locked will we proceed to analysis and publication. This is the standard for clinical trials and safeguards data integrity against “fudging” by individual investigators, which has led to unfortunate situations in the past. Our approach is based on four key organizing principles:

o Central open-source data management using LabKey. Housing all data in a central LIMS facilitates central analysis and automatic data integrity checking. It also facilitates regular automatic uploads into public databases and ongoing curation and interaction with NIH.
o Data encapsulation. We follow the data model established by the SRA sequencing archive (see above) and store data and assay information in linked files.
o Open-source analysis using the R statistics environment. We will conduct all our statistical analysis in RStudio (http://www.rstudio.com/). RStudio supports Sweave and knitr, two tools for dynamic reports and reproducible research. We use these routinely in our expression profiling studies (14, 25, 28).
o Reproducible research using Sweave. In this concept from statistical research, analogous to encapsulation, the analysis code and data become part of the final report or manuscript (a PDF generated by LaTeX), so that the analysis can be repeated by any reader (13).
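The data locking procedure described above can be illustrated as a store that refuses edits once a dataset is locked and records a fingerprint at lock time, so that later changes to an exported copy are detectable. This is a minimal sketch under stated assumptions (in-memory storage, SHA-256 fingerprint, hypothetical dataset names); in practice locking would be enforced through LIMS permissions.

```python
import hashlib
import json


class LockedError(RuntimeError):
    """Raised when someone tries to modify a locked dataset."""


class DatasetStore:
    """In-memory stand-in for a LIMS table with a QC lock step."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, float]] = {}
        self._locks: dict[str, str] = {}  # dataset -> SHA-256 at lock time

    @staticmethod
    def _digest(values: dict[str, float]) -> str:
        return hashlib.sha256(json.dumps(values, sort_keys=True).encode()).hexdigest()

    def put(self, dataset: str, sample: str, value: float) -> None:
        if dataset in self._locks:
            raise LockedError(f"{dataset} is locked; changes are not allowed")
        self._data.setdefault(dataset, {})[sample] = value

    def lock(self, dataset: str) -> str:
        """Freeze a dataset after QC and return its fingerprint."""
        digest = self._digest(self._data[dataset])
        self._locks[dataset] = digest
        return digest

    def export(self, dataset: str) -> dict[str, float]:
        return dict(self._data[dataset])  # a copy; the stored data stay untouched

    def verify(self, dataset: str, values: dict[str, float]) -> bool:
        """Check a copy against the fingerprint recorded at lock time."""
        return self._locks.get(dataset) == self._digest(values)
```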
Analysis, pitfalls, and future directions: Providing support for the interpretation and integration of the functional data generated by the individual projects is key to the success of the overall program. We are aware of the challenges associated with combining diverse experimental approaches and systems into a common analysis. Communication across all research personnel will be key, as will the timely dissemination of results.
One of the challenges will be to integrate the different assay formats and data types. Z-standardization as used in high-throughput screens provides one approach, but it is by no means the only one. Pilot experiments, and experiments with viral proteins and ncRNAs of previously identified function, will provide the opportunity to calibrate the assays across the multiple groups and cores of the application. We are fortunate to have extensive bioinformatics and statistics support on the UNC campus to help us solve data integration and analysis problems as they arise.
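Because plain Z-scores use the mean and standard deviation, a few strong hits can inflate the spread and mask one another. A common alternative in high-throughput screening is the robust Z-score, which substitutes the median and the median absolute deviation (MAD), scaled by 1.4826 so that it estimates the standard deviation for normally distributed data. The sketch is offered as one possible alternative, not a method the proposal commits to; the example values are hypothetical.

```python
from statistics import mean, median, stdev


def plain_z(values: list[float]) -> list[float]:
    """Classic Z-scores: (x - mean) / sample SD."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


def robust_z(values: list[float]) -> list[float]:
    """Robust Z-scores: (x - median) / (1.4826 * MAD)."""
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust Z-score undefined")
    scale = 1.4826 * mad
    return [(x - med) / scale for x in values]


if __name__ == "__main__":
    vals = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 8.0]  # one strong hypothetical hit
    print(round(plain_z(vals)[-1], 2), round(robust_z(vals)[-1], 1))
```

With these hypothetical values the outlying well scores only about 2.3 as a plain Z-score, below a common cut-off of 3, but far above it as a robust Z-score.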
Importantly, we will leverage our track record of over 10 years of experience in the interpretation of medium- and high-throughput data on viral and host mRNA and miRNA expression (see publ. (3-8, 11, 14, 16-19, 21, 24, 25)), and in the design and statistical analysis of biochemical and virologic screens in medium- and high-throughput formats (see publ. (1, 12, 15, 20, 29, 31)).
5. TIMELINE
During the first months of Year 1, the Data Management and Resource Dissemination Core will implement and configure the central server and facilitate the development of SOP for data generation and processing pipelines. We will increase our hardware capacity annually. The timeline benefits from the fact that the core will have some lead time while unified expression constructs and reagents are generated for each of the targeted unknown and hypothetical ORFs and non-coding RNAs. This will provide ample time for database development, so that data capture, SOP, analysis methods and dissemination functionality will be in place
PHS 398/2590 (Rev. 06/09)
Page
Continuation Format Page
Core C: Data Management and Resource Dissemination Core
Program Director/Principal Investigator (Last, First, Middle):
before when the first biochemical data emerge. Analytical support of functional interpretation and statistical
support and maintenance will occur throughout the 5-year period.
6. VERTEBRATE ANIMALS:

n/a
7. PROTECTION OF HUMAN SUBJECTS:

n/a, no patient contact, no patient data, no cell lines.
8. INCLUSION ENROLLMENT REPORT:

n/a
9. INCLUSION OF WOMEN AND MINORITIES:

n/a
10. INCLUSION OF CHILDREN:

n/a
11. RESOURCE SHARING:
No GWAS data will be generated during this project.
To share resources with the academic research community, we will use the uniform Material Transfer
Agreement (MTA), which acknowledges that the materials are proprietary to the institutions of the
Cooperative Agreement and permits their use in a manner consistent with the Bayh-Dole Act and NIH
funding requirements. Our individual NIH research grants require that research results be made available to
the scientific community and the public. The primary methods of data sharing are peer-reviewed publications
in scientific journals and presentations at scientific meetings. In addition, data and results from NIH-supported
research will be submitted to NIH in annual progress reports per the terms and conditions of this award.
Core C together with Core A will be responsible for resource sharing. These two cores will ensure adherence
to standards pertaining to dual use research of concern (DURC) and place restrictions on dissemination as
described below.
This project will generate the following information:
Sequence data:

Virus-specific primers and diagnostic assays, which we will make available as supplemental data to any
publication or as posts on our website.

Sequence information. The cores will ensure that all raw and processed datasets are made available to the
public. We will submit consensus sequences to GenBank as they are generated. We will make raw
sequence reads available through submission to the NCBI Sequence Read Archive (SRA) at
http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi, as described above.

We will make the specialized protocols available to the scientific community via publications in methods
journals as we have done in the past.
Any microarray data will be compliant with the Minimum Information About a Microarray Experiment (MIAME)
standard and will be submitted to the NCBI GEO database at http://www.ncbi.nlm.nih.gov/gds.
Plasmids:

Plasmids will be submitted to the Addgene repository (http://www.addgene.org/pgvec1) upon publication,
except where select agent or dual use concerns require a more restrictive distribution. In those cases, Core
A staff will also acquire appropriate letters from the recipient institutions' environmental health and safety
officers and help coordinate CDC and/or USDA and Department of Commerce permits prior to shipping
plasmids. Core A has a DOT-trained and certified shipper. The program faculty will not send reagents to
individuals or institutions that lack documentation of appropriate select agent BSL3 facilities, that might
harbor ill intentions, or that are conducting irresponsible research.

Core A will work closely with the appropriate institutional Technology Transfer Office, Addgene and
individuals involved in these transactions. The goal will be to provide reagents within a few months of
receiving a request.
Cell lines and other reagents:

Novel cell lines will be submitted to the American Type Culture Collection (ATCC, http://www.atcc.org/)
upon publication, unless the parent cell line is covered by an MTA that prohibits public access. In such
cases, cell lines will be made available upon request.

Other reagents will be made available through the BEI Resources clearinghouse, as appropriate.
For all other reagents and requests, a portion of each monthly conference call will be devoted to discussing
requests for reagents and collaborations. Project members have established a consistent process for
evaluating requests for samples and reagents from outside scientists. In order of priority, these include: 1)
requests for reagents that have been published in peer-reviewed journals; 2) requests that
enhance or promote a specific agenda of the program projects and faculty; 3) requests that promote
scientifically valid collaborations between project faculty and outside scientists; and 4) overall SARS-CoV,
IAV, Ebola and HHV8 research and public health needs. The general process involves: a) establishing a
working knowledge of the requestor's research agenda and credentials; b) group discussion and
agreement; c) an MTA with the appropriate institution, or a license agreement with a commercial
entity; and d) inventory checking and shipment of reagents.

Core A will work closely with the appropriate institutional Technology Transfer Office and individuals
involved in these transactions. The goal will be to provide reagents within a few months of receiving a
request.
Transgenic mice:

No new transgenic mice will be generated in the course of this project. Specialized mice are based on
commercially available strains (The Jackson Laboratory). Their distribution is governed by strain-specific MTAs.

Mice will be submitted to the NIH Mutant Mouse Regional Resource Center (MMRRC):
http://www.mmrrc.org/index.html. UNC is one of the members of this NIAID-approved organization.
Restrictions on resource sharing and compliance measures to limit information pertaining to dual use research
of concern:

Core A staff will also acquire appropriate letters from the recipient institutions' environmental health and
safety officers and help coordinate CDC and/or USDA and Department of Commerce permits prior to
shipping live SARS-CoV/WNV and influenza H1N1 2009, as appropriate. Dr. Sims is a DOT-trained and
certified shipper for Core A, and all other Projects and Cores have their own trained shippers. The program
faculty will not send reagents to individuals or institutions that lack documentation of appropriate BSL2 (IAV)
or BSL3 facilities, that might harbor ill intentions, or that are conducting irresponsible research.
12. Select Agent Research:
Core C does not handle or store biological or chemical agents, only data. All physical reagents will be stored
and handled by Core B and used in the projects.
UNC is a registered Select Agent entity, and SARS-CoV is in the process of being added to the license. A
detailed outline of the containment and biohazard plan is provided in Core A, which administers biohazard and
select agent research for the overall U19 project.
13. BIBLIOGRAPHY
1. Bhatt, A. P., S. R. Jacobs, A. J. Freemerman, L. Makowski, J. C. Rathmell, D. P. Dittmer, and B. Damania. 2012. Dysregulation of fatty acid synthesis and glycolysis in non-Hodgkin lymphoma. Proc Natl Acad Sci U S A 109:11818-11823.
2. Bustin, S. A., V. Benes, J. A. Garson, J. Hellemans, J. Huggett, M. Kubista, R. Mueller, T. Nolan, M. W. Pfaffl, G. L. Shipley, J. Vandesompele, and C. T. Wittwer. 2009. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem 55:611-622.
3. Chang, H., D. P. Dittmer, Y. C. Shin, Y. Hong, and J. U. Jung. 2005. Role of Notch signal transduction in Kaposi's sarcoma-associated herpesvirus gene expression. J Virol 79:14371-14382.
4. Chugh, P., K. Tamburro, and D. P. Dittmer. 2010. Profiling of pre-micro RNAs and microRNAs using quantitative real-time PCR (qPCR) arrays. J Vis Exp.
5. Dittmer, D. P. 2011. Restricted Kaposi's sarcoma (KS) herpesvirus transcription in KS lesions from patients on successful antiretroviral therapy. MBio 2:e00138-00111.
6. Dittmer, D. P. 2003. Transcription profile of Kaposi's sarcoma-associated herpesvirus in primary Kaposi's sarcoma lesions as determined by real-time PCR arrays. Cancer Res 63:2010-2015.
7. Dittmer, D. P., C. M. Gonzalez, W. Vahrson, S. M. DeWire, R. Hines-Boykin, and B. Damania. 2005. Whole-genome transcription profiling of rhesus monkey rhadinovirus. J Virol 79:8637-8650.
8. Fakhari, F. D., and D. P. Dittmer. 2002. Charting latency transcripts in Kaposi's sarcoma-associated herpesvirus by whole-genome real-time quantitative PCR. J Virol 76:6213-6223.
9. Fakhari, F. D., J. H. Jeong, Y. Kanan, and D. P. Dittmer. 2006. The latency-associated nuclear antigen of Kaposi sarcoma-associated herpesvirus induces B cell hyperplasia and lymphoma. J Clin Invest 116:735-742.
10. Gregory, S. M., L. Wang, J. A. West, D. P. Dittmer, and B. Damania. 2012. Latent Kaposi's sarcoma-associated herpesvirus infection of monocytes downregulates expression of adaptive immune response costimulatory receptors and proinflammatory cytokines. J Virol 86:3916-3923.
11. Hilscher, C., W. Vahrson, and D. P. Dittmer. 2005. Faster quantitative real-time PCR protocols may lose sensitivity and show increased variability. Nucleic Acids Res 33:e182.
12. Hilton, I. B., and D. P. Dittmer. 2012. Quantitative analysis of the bidirectional viral G-protein-coupled receptor and lytic latency-associated nuclear antigen promoter of Kaposi's sarcoma-associated herpesvirus. J Virol 86:9683-9695.
13. Hothorn, T., and F. Leisch. 2011. Case studies in reproducibility. Brief Bioinform 12:288-300.
14. Lock, E. F., R. Ziemiecke, J. Marron, and D. P. Dittmer. 2010. Efficiency clustering for low-density microarrays and its application to QPCR. BMC Bioinformatics 11:386.
15. Nun, T. K., D. J. Kroll, N. H. Oberlies, D. D. Soejarto, R. J. Case, P. Piskaut, T. Matainaho, C. Hilscher, L. Wang, D. P. Dittmer, S. J. Gao, and B. Damania. 2007. Development of a fluorescence-based assay to screen antiviral drugs against Kaposi's sarcoma associated herpesvirus. Mol Cancer Ther 6:2360-2370.
16. O'Hara, A. J., P. Chugh, L. Wang, E. M. Netto, E. Luz, W. J. Harrington, B. J. Dezube, B. Damania, and D. P. Dittmer. 2009. Pre-micro RNA signatures delineate stages of endothelial cell transformation in Kaposi sarcoma. PLoS Pathog 5:e1000389.
17. O'Hara, A. J., W. Vahrson, and D. P. Dittmer. 2008. Gene alteration and precursor and mature microRNA transcription changes contribute to the miRNA signature of primary effusion lymphoma. Blood 111:2347-2353.
18. O'Hara, A. J., L. Wang, B. J. Dezube, W. J. Harrington, Jr., B. Damania, and D. P. Dittmer. 2009. Tumor suppressor microRNAs are underrepresented in primary effusion lymphoma and Kaposi sarcoma. Blood 113:5938-5941.
19. Papin, J., W. Vahrson, R. Hines-Boykin, and D. P. Dittmer. 2005. Real-time quantitative PCR analysis of viral transcription. Methods Mol Biol 292:449-480.
20. Papin, J. F., W. Vahrson, and D. P. Dittmer. 2004. SYBR green-based real-time quantitative PCR assay for detection of West Nile Virus circumvents false-negative results due to strain variability. J Clin Microbiol 42:1511-1518.
21. Papin, J. F., W. Vahrson, L. Larson, and D. P. Dittmer. 2010. Genome-wide real-time PCR for West Nile virus reduces the false-negative rate and facilitates new strain discovery. J Virol Methods 169:103-111.
22. Peng, X., L. Gralinski, C. D. Armour, M. T. Ferris, M. J. Thomas, S. Proll, B. G. Bradel-Tretheway, M. J. Korth, J. C. Castle, M. C. Biery, H. K. Bouzek, D. R. Haynor, M. B. Frieman, M. Heise, C. K. Raymond, R. S. Baric, and M. G. Katze. 2010. Unique signatures of long noncoding RNA expression in response to virus infection and altered innate immune signaling. MBio 1.
23. Peng, X., L. Gralinski, M. T. Ferris, M. B. Frieman, M. J. Thomas, S. Proll, M. J. Korth, J. R. Tisoncik, M. Heise, S. Luo, G. P. Schroth, T. M. Tumpey, C. Li, Y. Kawaoka, R. S. Baric, and M. G. Katze. 2011. Integrative deep sequencing of the mouse lung transcriptome reveals differential expression of diverse classes of small RNAs in response to respiratory virus infection. MBio 2.
24. Renne, R., C. Barry, D. Dittmer, N. Compitello, P. O. Brown, and D. Ganem. 2001. Modulation of cellular and viral gene expression by the latency-associated nuclear antigen of Kaposi's sarcoma-associated herpesvirus. J Virol 75:458-468.
25. Roy, D., S. H. Sin, B. Damania, and D. P. Dittmer. 2011. Tumor suppressor genes FHIT and WWOX are deleted in primary effusion lymphoma (PEL) cell lines. Blood 118:e32-39.
26. Shiboski, C. H., J. Y. Webster-Cyriaque, M. Ghannoum, J. S. Greenspan, and D. Dittmer. 2011. Overview of the oral HIV/AIDS Research Alliance Program. Adv Dent Res 23:28-33.
27. Sin, S. H., F. D. Fakhari, and D. P. Dittmer. 2010. The viral latency-associated nuclear antigen augments the B-cell response to antigen in vivo. J Virol 84:10653-10660.
28. Tamburro, K. M., D. Yang, J. Poisson, Y. Fedoriw, D. Roy, A. Lucas, S. H. Sin, N. Malouf, V. Moylan, B. Damania, S. Moll, C. van der Horst, and D. P. Dittmer. 2012. Vironome of Kaposi sarcoma associated herpesvirus-inflammatory cytokine syndrome in an AIDS patient reveals coinfection of human herpesvirus 8 and human herpesvirus 6A. Virology 433:220-225.
29. Wang, F. Z., D. Roy, E. Gershburg, C. B. Whitehurst, D. P. Dittmer, and J. S. Pagano. 2009. Maribavir inhibits Epstein-Barr virus transcription in addition to viral DNA replication. J Virol 83:12108-12117.
30. Wen, K. W., D. P. Dittmer, and B. Damania. 2009. Disruption of LANA in rhesus rhadinovirus generates a highly lytic recombinant virus. J Virol 83:9786-9802.
31. Whitby, D., V. A. Marshall, R. K. Bagni, W. J. Miley, T. G. McCloud, R. Hines-Boykin, J. J. Goedert, B. A. Conde, K. Nagashima, J. Mikovits, D. P. Dittmer, and D. J. Newman. 2007. Reactivation of Kaposi's sarcoma-associated herpesvirus by natural products from Kaposi's sarcoma endemic regions. Int J Cancer 120:321-328.
32. Yang, D., K. Ramkissoon, E. Hamlett, and M. C. Giddings. 2008. High-accuracy peptide mass fingerprinting using peak intensity data with machine learning. J Proteome Res 7:62-69.