doc - Emory Integrated Core Facilities

advertisement
FACILITIES & OTHER RESOURCES
Specific Fields Relevant for EICC Users
Laboratory: N/A
Clinical: N/A
Animal:N/A
Computer: The EICC offers comprehensive computational services and bioinformatics pipelines for the
analysis of -omics data. The EICC operates a small HPC system for short computational jobs. The cluster
serves multiple functions related to core projects, including running NGS analysis pipelines, high-performance
and parallel computing for disciplines such as proteomics, metabolomics, and imaging, hosting personal data
analysis platforms such as Galaxy and GenePattern, and data management platforms such as cBioPortal. The
cluster is composed of 1 head node, 5 high-memory compute nodes, 3 medium-memory compute nodes, 2
load-balanced web server nodes, and 1 database node. The cluster has a 89TB local storage array and offers
access to PBNAS - a 1 PB storage system, as well as to Emory Isilon storage (1PB research-grade storage,
500TB of HIPAA compliant storage). A 10 Gbps ethernet switch provides a high-speed Storage Area Network
(SAN) fabric. All storage arrays and compute nodes utilize the SAN for data transfer, and are configured to
connect via the 10 Gbps high-speed network. The cluster is connected to the Internet2 high-speed network for
large data transfers to and from external systems. The cluster runs Centos 6.9 64-bit operating system on all
nodes and utilizes Sun Grid Engine for job submission and management. Configuration: One head node: 2 x
3.3 GHz 8-core CPUs, 64 GB RAM. Five compute nodes: 4 x 2.2 GHz 16-core CPUs, 512 GB RAM, 10 Gbps
ethernet. Three compute nodes: 2 x 2.7 GHz 12-core CPUs, 96 GB RAM, 10 Gbps ethernet. Two web servers:
2 x3.3 GHz 8-core CPUs, 128 GB RAM, 10 Gbps ethernet. One database server: 2 x 3.3 GHz 8-core CPUs,
128 GB RAM, 10 Gbps ethernet.
To meet larger computational needs, the EICC acts acts as the main point of contact and provides
technical assistance for Emory investigators who wish to access the TARDIS High Performance Computing
(HPC) cluster (http://it.emory.edu/catalog/high_performance_computing/), a 12-node high performance cluster
with an estimated throughput of 5 teraflops. Each node features quad 16-core AMD Opteron processors for a
total of 768 cores. Each core runs at 2.5 gigahertz. Each node has 512GB of memory (or 8GB per core). The
filesystem is a high performance 40-terabyte GPFS cluster file system. Scheduling of jobs is handled by
Torque/MOAB. The cluster operating system is RedHat Linux. The Tardis HPC cluster came online in Winter
2014 and has substantial available computing capacity to support large-scale data analyses.
The EICC also provides comprehensive computational services. We divide computational services into
two main categories. The first enables expert users to access existing pipelines or develop their own custom
analyses. The second category provides investigators the ability to have analyses performed by an EICC
computational/bioinformatics expert for a set fee per project. Galaxy provides a wide variety of bioinformatic
tools that allow the analysis, manipulation and visualization of large genome-wide datasets from a wide variety
of platforms, including microarrays and next-generation sequencing instruments. Standard analysis pipelines
using other open-source software packages are implemented for DNA/RNA-seq/ChIP-seq/16S microbiome
sequencing projects for human, animal, and microbial genomes. For the analysis of RNA-seq data, we have
implemented two main pipelines. The first, which includes TopHat, Cufflinks, and CuffDiff, is used when a
reference sequence is available for the targeted organism. We have also implemented a pipeline for De novo
transcript reconstruction for RNA-seq using the Trinity platform. This second pipeline does not require prior
knowledge of a reference sequence. Custom tools and pipelines can be developed for specialized projects
such as fusion transcript detection. For targeted sequencing, exome sequencing, and whole genome
sequencing, we use a custom PEMapper pipeline. For variant annotation, we use the SeqAnt software
package. We have also implemented and analzye data sets with other mapping and variant identification
pipelines (BWA, GATK) Substantial capacity exists for these integrated computing resources to support
computational/bioinformatic analyses for EICC users.
Office: The EICC has 500 sq ft of dedicated office space on the 7th floor of the Woodruff Memorial Research
Building which provides for meeting customers, weekly meetings with the members of the Emory Integrated
Genomics Core (EIGC, http://cores.emory.edu/eigc/), and for monthly meetings of computational service
providers from other cores within the EICF.
Other: The Emory Integrated Core Facilities (EICF, http://cores.emory.edu) provides a number of valuable
core facilities for use by all investigators that include integrated cellular imaging, biomedical systems imaging,
biostatistics and bioinformatics, electron microscopy, a personalized immunotherapy center, flow cytometry,
genomics, proteomics, transgenic mouse and gene targeting, and rodent behavioral characterization.
The EICC offers comprehensive bioinformatics pipelines and services for the analysis of -omics data. Using
this infrastructure, the EICC offers innovative, tailored genomics services for investigators, while also allowing
tighter integration with existing biostatistics/bioinformatics service centers at Emory. These include the Winship
Biostatistics and Bioinformatics Shared Resource (https://winshipcancer.emory.edu/research/sharedresources/biostatistics-bioinformatics.html?nd=669) headed by Dr. Jeanne Kowalski and the Biostatistics
Consulting Center (http://med.emory.edu/research/core_labs/service_centers.html) headed by Dr. Renee
Moore.
Download