doc - Emory Integrated Core Facilities

FACILITIES & OTHER RESOURCES Specific Fields Relevant for EICC Users Laboratory: N/A Clinical: N/A Animal:N/A Computer: The EICC offers comprehensive computational services and bioinformatics pipelines for the analysis of -omics data. The EICC operates a small HPC system for short computational jobs. The cluster serves multiple functions related to core projects, including running NGS analysis pipelines, high-performance and parallel computing for disciplines such as proteomics, metabolomics, and imaging, hosting personal data analysis platforms such as Galaxy and GenePattern, and data management platforms such as cBioPortal. The cluster is composed of 1 head node, 5 high-memory compute nodes, 3 medium-memory compute nodes, 2 load-balanced web server nodes, and 1 database node. The cluster has a 89TB local storage array and offers access to PBNAS - a 1 PB storage system, as well as to Emory Isilon storage (1PB research-grade storage, 500TB of HIPAA compliant storage). A 10 Gbps ethernet switch provides a high-speed Storage Area Network (SAN) fabric. All storage arrays and compute nodes utilize the SAN for data transfer, and are configured to connect via the 10 Gbps high-speed network. The cluster is connected to the Internet2 high-speed network for large data transfers to and from external systems. The cluster runs Centos 6.9 64-bit operating system on all nodes and utilizes Sun Grid Engine for job submission and management. Configuration: One head node: 2 x 3.3 GHz 8-core CPUs, 64 GB RAM. Five compute nodes: 4 x 2.2 GHz 16-core CPUs, 512 GB RAM, 10 Gbps ethernet. Three compute nodes: 2 x 2.7 GHz 12-core CPUs, 96 GB RAM, 10 Gbps ethernet. Two web servers: 2 x3.3 GHz 8-core CPUs, 128 GB RAM, 10 Gbps ethernet. One database server: 2 x 3.3 GHz 8-core CPUs, 128 GB RAM, 10 Gbps ethernet. To meet larger computational needs, the EICC acts acts as the main point of contact and provides technical assistance for Emory investigators who wish to access the TARDIS High Performance Computing (HPC) cluster (http://it.emory.edu/catalog/high_performance_computing/), a 12-node high performance cluster with an estimated throughput of 5 teraflops. Each node features quad 16-core AMD Opteron processors for a total of 768 cores. Each core runs at 2.5 gigahertz. Each node has 512GB of memory (or 8GB per core). The filesystem is a high performance 40-terabyte GPFS cluster file system. Scheduling of jobs is handled by Torque/MOAB. The cluster operating system is RedHat Linux. The Tardis HPC cluster came online in Winter 2014 and has substantial available computing capacity to support large-scale data analyses. The EICC also provides comprehensive computational services. We divide computational services into two main categories. The first enables expert users to access existing pipelines or develop their own custom analyses. The second category provides investigators the ability to have analyses performed by an EICC computational/bioinformatics expert for a set fee per project. Galaxy provides a wide variety of bioinformatic tools that allow the analysis, manipulation and visualization of large genome-wide datasets from a wide variety of platforms, including microarrays and next-generation sequencing instruments. Standard analysis pipelines using other open-source software packages are implemented for DNA/RNA-seq/ChIP-seq/16S microbiome sequencing projects for human, animal, and microbial genomes. For the analysis of RNA-seq data, we have implemented two main pipelines. The first, which includes TopHat, Cufflinks, and CuffDiff, is used when a reference sequence is available for the targeted organism. We have also implemented a pipeline for De novo transcript reconstruction for RNA-seq using the Trinity platform. This second pipeline does not require prior knowledge of a reference sequence. Custom tools and pipelines can be developed for specialized projects such as fusion transcript detection. For targeted sequencing, exome sequencing, and whole genome sequencing, we use a custom PEMapper pipeline. For variant annotation, we use the SeqAnt software package. We have also implemented and analzye data sets with other mapping and variant identification pipelines (BWA, GATK) Substantial capacity exists for these integrated computing resources to support computational/bioinformatic analyses for EICC users. Office: The EICC has 500 sq ft of dedicated office space on the 7th floor of the Woodruff Memorial Research Building which provides for meeting customers, weekly meetings with the members of the Emory Integrated Genomics Core (EIGC, http://cores.emory.edu/eigc/), and for monthly meetings of computational service providers from other cores within the EICF. Other: The Emory Integrated Core Facilities (EICF, http://cores.emory.edu) provides a number of valuable core facilities for use by all investigators that include integrated cellular imaging, biomedical systems imaging, biostatistics and bioinformatics, electron microscopy, a personalized immunotherapy center, flow cytometry, genomics, proteomics, transgenic mouse and gene targeting, and rodent behavioral characterization. The EICC offers comprehensive bioinformatics pipelines and services for the analysis of -omics data. Using this infrastructure, the EICC offers innovative, tailored genomics services for investigators, while also allowing tighter integration with existing biostatistics/bioinformatics service centers at Emory. These include the Winship Biostatistics and Bioinformatics Shared Resource (https://winshipcancer.emory.edu/research/sharedresources/biostatistics-bioinformatics.html?nd=669) headed by Dr. Jeanne Kowalski and the Biostatistics Consulting Center (http://med.emory.edu/research/core_labs/service_centers.html) headed by Dr. Renee Moore.

doc - Emory Integrated Core Facilities

Related documents

Products

Support

doc - Emory Integrated Core Facilities

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib