CLC Genomics Workbench

Features & Benefits
Director of the Einstein Center for Epigenomics at the Albert Einstein College of Medicine, Dr. John Greally:
"CLC bio's tools are going to put sophisticated analytical ability into the hands of molecular biologists at Einstein, and will greatly enhance their ability to explore the massively-parallel sequencing data that we are generating. We see this as a way of lowering barriers for scientists who have not previously performed these high-throughput epigenomic assays, allowing them to explore their data and explore hypotheses."
For Windows, Mac OS X,
and Linux
© Copyright 2013 CLC bio
clcbio.com
Solving the data analysis challenges of
High-Throughput Sequencing
High-Throughput Sequencing machines have made sequencing accessible to a very large group of researchers. However, data analysis represents a serious bottleneck in the next-generation sequencing (NGS) pipelines of most R&D departments, which in turn dramatically reduces the return on investment of current NGS assets.
CLC Genomics Workbench solves this problem and will enable everyone to rapidly analyze and visualize the huge amounts of data generated by NGS machines. The user-friendly and intuitive interface essentially takes High-Throughput Analysis away from hardcore bioinformatics programmers doing command-line scripts, and hands it to scientists searching for biological results. Furthermore, the versatile nature of CLC Genomics Workbench allows it to blend seamlessly into existing sequencing analysis workflows, easing implementation and maximizing return on investment.
Multi technology – multi platform
CLC Genomics Workbench includes High Performance
Computing accelerated assembly of High-Throughput Sequencing data as well as a large number of downstream
analysis tools.
CLC Genomics Workbench is the first comprehensive analysis package which can analyze and visualize data from
all major NGS platforms, like SOLiD, 454, Sanger, Illumina
and Ion Torrent. Collaboration with instrument manufacturers is a natural part of CLC bio’s development process.
Some features of CLC Genomics Workbench
Genomics
• Read mapping of Sanger, 454, Illumina Genome Analyzer and SOLiD sequencing data
• De novo assembly of genomes of any size (only limited
by RAM available)
• Color space mapping
• Advanced visualization, scrolling, and zooming tools
• Resequencing tools (variant detection and downstream
analysis)
• Support for multiplexing with DNA barcoding
Transcriptomics
• RNA-seq incl. support for paired data and transcript-level expression
• Small RNA analysis
• Expression profiling by tags
• EST library construction
• Advanced visualization, scrolling, and zooming tools
• Gene expression analysis
Epigenomics
• ChIP-seq analysis
• Peak finding and peak refinement
• Case/control analysis
Classical sequence analysis tools
• Primer design
• Molecular cloning
• BLAST
• Alignments
• Phylogenetic trees
• Advanced RNA structure prediction and editing
• Integrated 3D molecule analysis
• Secondary protein structure predictions
• And much more...
Like all other Workbenches from CLC bio, CLC Genomics Workbench runs on
Mac OS X, Windows, and Linux platforms. You decide which computer to run
your software on – not us.
Genomics Features
CLC bio's world-renowned scientists have designed completely new and innovative algorithms to power the features of CLC Genomics Workbench. These highly advanced, cutting-edge algorithms use SIMD processor acceleration to yield a significant speed-up of the read mapping as well as the de novo assembly processes.
Multiplexing
When doing batch sequencing of different samples, you can use multiplexing techniques to run different samples in the same run. The data analysis challenge is then to separate the sequencing reads, so that the reads from one sample are analyzed together.
CLC Genomics Workbench supports a large number of multiplexing protocols, covering both multiplexing based on names and multiplexing based on tags or barcodes.
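As a rough illustration of barcode-based de-multiplexing, the Python sketch below groups reads by the barcode found at the start of each read. It is not CLC Genomics Workbench code: the function name, the barcodes, and the reads are hypothetical, and it assumes exact barcode matches at the 5' end with no barcode being a prefix of another.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by the barcode found at the start of each read.

    reads: iterable of (name, sequence) tuples.
    barcode_to_sample: dict mapping a barcode string to a sample name.
    Returns a dict mapping sample name to a list of (name, trimmed sequence).
    Reads that match no barcode are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for name, seq in reads:
        for barcode, sample in barcode_to_sample.items():
            if seq.startswith(barcode):
                # Strip the barcode so only the sample sequence remains.
                by_sample[sample].append((name, seq[len(barcode):]))
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append((name, seq))
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = [
    ("r1", "ACGTTTGACC"),
    ("r2", "TGCAGGATCC"),
    ("r3", "ACGTAAAC"),
    ("r4", "GGGGCCCC"),
]
groups = demultiplex(reads, barcodes)
```

After this step, each entry of `groups` holds the reads of one sample, so they can be analyzed together. Real protocols typically also tolerate sequencing errors in the barcode, which this sketch does not.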
Resequencing
CLC Genomics Workbench supports a complete resequencing pipeline from read mapping through variant detection to downstream analysis.
Some of the features of the resequencing pipeline in CLC Genomics Workbench are:
• Tracks for comparing and displaying genomics data
• Advanced variant detection, also well suited for genomes of higher ploidy
• Trio analysis comparing father-mother-child variants
• Easy download of genomics sequence and annotation data from public databases
In the workbench it is possible to build workflows that combine various tools from the Toolbox into one, e.g. several filtering and annotation steps. Workflows can be run in batch, making them a powerful tool for analyzing a high number of samples through the same pipeline.
Identifying genomic rearrangements
Through the advanced graphical user interface, CLC Genomics Workbench supports the identification of a variety of genomic rearrangements like insertions, deletions, duplications and inversions.
Read mapping
The read mapping functionality of CLC Genomics Workbench supports both short and long reads, paired reads, gapped and ungapped alignments, complex genomes with many repeats, and Sanger, 454, Illumina Genome Analyzer and SOLiD sequencing data.
CLC Genomics Workbench maps reads to genomes of any size as long as the computer has the necessary RAM. A 10 fold human genome read mapping can be carried out on a standard computer with 24 GB of RAM.
Mapping of SOLiD data is carried out in native color space, using a high performance computing based algorithm. Up to 80% more hits have been found when assembling 35mer SOLiD data in color space, compared to assembling the same data in base space.
Fig. 1: A region of low coverage has been found in the assembly view, and the corresponding region of the contig sequence is automatically highlighted.
De novo assembly
The de novo assembly of CLC Genomics Workbench supports both short and long reads, paired reads, and Sanger, 454, Illumina Genome Analyzer, and SOLiD sequencing data.
The de novo assembly process has two stages: Firstly, contig sequences are created by assembling all the reads. Secondly, all the reads are mapped using the contig sequence as reference.
A combination of paired data protocols can be used, mixing paired end and mate pair data with various insert sizes in the same assembly. Depending on the coverage and quality of the data, CLC Genomics Workbench de novo assembles genomes of any size.
Support for analysis of hybrid data
Read mapping as well as de novo assembly support the analysis of different kinds of data at the same time. An example would be the de novo assembly of Sanger data, 454 single read data, and Illumina paired end data in the same analysis. This functionality dramatically reduces manual work for the scientists, facilitating focus on deriving biological results from the data instead of doing tedious data-crunching.
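For readers unfamiliar with the color space mentioned for SOLiD data, the Python sketch below shows the publicly documented two-base encoding, in which each color describes a pair of adjacent bases. It illustrates the data representation only and is not the mapping algorithm used by CLC Genomics Workbench; the function names are hypothetical and the input is assumed to be a non-empty string of A, C, G and T.

```python
# Two-bit codes for the bases. With this coding, the SOLiD color of two
# adjacent bases equals the XOR of their codes (AA=0, AC=1, AG=2, AT=3, ...).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode_color_space(bases):
    """Return the first base followed by one color (0-3) per adjacent base pair."""
    colors = [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(bases, bases[1:])]
    return bases[0] + "".join(str(c) for c in colors)


def decode_color_space(encoded):
    """Rebuild the base sequence from the leading base and the colors."""
    bases = [encoded[0]]
    for color in encoded[1:]:
        previous = BASE_CODE[bases[-1]]
        bases.append(CODE_BASE[previous ^ int(color)])
    return "".join(bases)


read = "ATGGCA"
encoded = encode_color_space(read)
decoded = decode_color_space(encoded)

# Simulate a single miscalled color (position 3: 0 -> 1) and decode it.
with_error = decode_color_space(encoded[:3] + "1" + encoded[4:])
```

Because every decoded base depends on the previous one, a single miscalled color changes all bases after it once the read is translated to base space. This is one commonly cited reason for mapping SOLiD reads directly in color space.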
Transcriptomics Features
CLC Genomics Workbench has tools to support a full workflow in analysis of expression data. These include visual quality control tools, such as principal component plots and box plots, transformation and normalization tools, tools for statistical testing and false discovery rate control, clustering algorithms, heat-map visualization, and tests on gene annotations, such as hypergeometric tests and Gene Set Enrichment Analysis.
The data types supported for expression analysis are RNA-seq, small RNA, tag-based expression profiling and single-color microarray gene expression data.
The interactivity of the multiple available views allows easy navigation and overview of data and analysis results. The complete integration of the expression analysis in the workbench enables the user to carry out downstream analysis of genes of interest with the comprehensive set of sequence analysis tools provided, immediately and without the hassle of switching between software packages.
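To make the hypergeometric test on gene annotations concrete, the Python sketch below computes the probability of seeing at least a given number of annotated genes in a selected gene list by chance. It is a textbook formulation under stated assumptions, not the workbench's implementation; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, selected, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: total number of genes considered.
    annotated:  genes in the population carrying the annotation.
    selected:   genes in the list of interest (drawn without replacement).
    hits:       selected genes that carry the annotation.
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 genes in total, 5 annotated,
# 4 genes selected, 3 of which are annotated.
p_value = hypergeom_enrichment_pvalue(population=20, annotated=5, selected=4, hits=3)
```

A small p-value suggests that the annotation is over-represented in the selected list. When many annotations are tested, the resulting p-values would normally be corrected, for example with a false discovery rate procedure.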
Digital Gene Expression
CLC Genomics Workbench includes mRNA-seq analysis based on the approach from Mortazavi A, et al., "Mapping and quantifying mammalian transcriptomes by RNA-Seq", Nat Methods. 2008 Jul;5(7):621-8.
One of the advantages of this model is that the statistics are based on RPKM (Reads Per Kilobase of exon model per Million mapped reads), which is a good and easy way of normalizing values for the expression level of a gene when using Digital Gene Expression.
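The RPKM measure named in the Digital Gene Expression description can be written as a one-line calculation: the read count is divided by the exon length in kilobases and by the number of mapped reads in millions. The Python sketch below follows that definition; the function name and the example numbers are hypothetical and not taken from the workbench.

```python
def rpkm(exon_reads, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads.

    exon_reads:         reads mapped to the exons of the gene.
    exon_length_bp:     summed exon length of the gene, in bases.
    total_mapped_reads: all mapped reads in the sample.
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return exon_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 exon reads, 2,000 bases of exon, 10 million mapped reads.
example = rpkm(500, 2_000, 10_000_000)
```

Normalizing by both gene length and sequencing depth is what makes the values comparable between genes and between samples.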
Fig. 2: Heat-map visualization tool letting you depict the table of expression values.
Fig. 3: A table view of an expression sample generated from a sequence file of NGS mRNA reads (excerpt; sample brain_sample1).
Feature ID   Expression values   Transcripts   Unique gene reads   Unique exon reads   Total gene reads
ABHD8                 5.416,87             1                 656                 595                695
ABHD9                    21,02             1                  18                   2                 32
AP1M1                 2.749,13             1                 426                 326                468
ARMC6                 1.238,56             1                 201                 149                230
ARRDC2                1.034,80             2                 236                 160                333
ATP13A1               1.332,95             1                 325                 244                341
BRD4                  1.427,11             2                 656                 554                693
BST2                  1.479,23             1                  67                  60                 80
C19orf44                943,97             1                  92                  13                316
C19orf50              2.653,11             1                 264                 195                307
C19orf60              5.254,14             2                 346                 242                359
C19orf62              3.789,73             2                 378                 288                428
CASP14                    0,00             1                   5                   0                  7
CCDC105                   0,00             1                  14                   0                 16
CCDC124               5.040,96             1                 320                 274                342
CHERP                 1.668,09             1                 239                 172                474
CRLF1                   881,29             1                 116                  83                145
CYP4F11                 193,77             1                  50                  31                 58
CYP4F12                  33,09             1                  44                   2                 76
Small RNA analysis
Small RNA sequenced on SOLiD, Illumina or 454 systems can be analyzed using CLC Genomics Workbench. Adapter trimming and, optionally, de-multiplexing are the first steps in the analysis, followed by tag counting and finally powerful tools for annotating the small RNAs using miRBase and other resources. The annotations can be grouped on the precursor or mature miRNA level. The final results can be visualized and analyzed using the expression analysis tools.
Expression profiling by tags
CLC Genomics Workbench includes a powerful tag profiling functionality which is an extension to SAGE, using NGS technology. The full workflow of extracting tags from sequence reads, tag counting, creating a virtual tag list, and annotating tag counts with gene names is supported.
EST library construction
It is possible to construct an EST library using the de novo assembly algorithm, e.g. to be used as reference sequences for mRNA-seq or tag-based transcriptomics.
Epigenomics analyses
CLC Genomics Workbench includes a fully integrated ChIP-seq analysis solution which enables researchers to go easily from raw data, through reference alignment, to advanced visual and tabular output of ChIP-seq results. The analysis can be based on the information contained in a single sample subjected to immunoprecipitation (ChIP sample) or on a comparison of a ChIP sample with a control sample.
Classical Sequence Analysis
In addition to all the High-Throughput Sequencing analysis tools, CLC Genomics Workbench includes all of the more than 100 features of CLC Main Workbench for carrying out downstream analysis and for designing follow-up lab experiments. A few examples are primer design, molecular cloning, BLAST, 5 different types of alignments, 3D molecule viewer, and phylogenetic analyses.
Server Integration
CLC Genomics Workbench integrates smoothly with CLC Genomics Server (Fig. 4). This enables the Genomics Workbench user to run heavy jobs like whole genome assemblies on one or more central, powerful computers while working with downstream analyses of other data on the local computer.
In addition to computational power, CLC Genomics Server offers a flexible job queueing system, easy integration with other applications, easy data sharing opportunities, and a range of other functionalities to ensure that your High-Throughput Sequencing analyses are carried out in a fast, secure, and flexible IT environment.
CLC Genomics Server can be fully integrated with CLC Bioinformatics Database, supporting Oracle, MySQL, PostgreSQL, H2, and Microsoft SQL Server.
Fig. 4: Overview of our three-tier solution, the CLC Genomics Server. People can access the server from their laptop computer and easily work on large projects.
Customization
A new and fast-evolving technology, High-Throughput Sequencing constantly provides researchers with new scientific opportunities and new ways of analyzing the huge amounts of data.
The problem is not lack of ideas or lack of data. The problem is lack of efficient software for carrying out the analyses or for removing manual bottlenecks in the workflow.
CLC bio eliminates these challenges by offering a free Java-based Software Developer Kit (SDK) for CLC Genomics Workbench and for CLC Genomics Server. Using the SDK, you will be able to integrate your own algorithms with our products.
We also design and develop customized add-on modules for CLC Genomics Workbench and CLC Genomics Server based on specific customer requests. This is a quick and cost-effective way of improving both the speed and the quality of your research.
CLC Genomics Workbench is available for Windows, Mac OS X, and Linux (Red Hat 5 or later, SUSE 10.2 or later). For detailed system requirements, please refer to clcbio.com/support/system-requirements.
Data set (Read mapping) and run time
Roche 454, single-end - NA12878
Human GRCh37, 2.8 million reads, read length: 50-2.5 KiloBases, 1.6 GigaBases, 0.5 fold coverage
Run time: 13 mins 7 secs
Ion-Torrent, single-end - NS12911
Human GRCh37, 11.7 million reads, read length: 200-300 bases, 2.9 GigaBases, 0.9 fold coverage
Run time: 21 mins 3 secs
Illumina, paired-end - NA18507
Human GRCh37, 1.34 billion reads, read length: 101 bases, 134 GigaBases, 43.7 fold coverage
Run time: 17 hrs 12 mins 59 secs
PacBio, single-end
Human GRCh37, 1.7 million reads, read length: 50-2.5 KiloBases, 0.9 GigaBases, 0.3 fold coverage
Run time: 1 hr 4 mins 43 secs
Table 1: All benchmarks run on a CLC Genomics Machine, 2x Intel X5650 @ 2.66 GHz, 12 cores (24 logical cores), 48 GB, RAM, RAID5 of SATA-disks.
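The fold coverage figures quoted for the benchmark data sets follow from dividing the sequenced bases by the size of the reference genome. The Python sketch below redoes that arithmetic; the genome size of 3.1 billion bases is an assumed round figure for the human reference, not a value given in this brochure, and the function name is hypothetical.

```python
def fold_coverage(total_bases, genome_size):
    """Average number of times each reference base is covered by reads."""
    return total_bases / genome_size


# Assumption: approximate size of the human reference genome, in bases.
GENOME_SIZE = 3.1e9

# Sequenced bases per data set, as listed in Table 1.
data_sets = {
    "Roche 454": 1.6e9,
    "Ion Torrent": 2.9e9,
    "Illumina": 134e9,
    "PacBio": 0.9e9,
}

coverage = {
    name: round(fold_coverage(bases, GENOME_SIZE), 1)
    for name, bases in data_sets.items()
}
```

With these rounded inputs, the three small data sets come out at the listed 0.5, 0.9 and 0.3 fold. The Illumina set comes out slightly below the listed 43.7 fold, which is expected when both the base count and the genome size are rounded.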
Contact your local sales representative or send
an e-mail to sales@clcbio.com if you would
like to try CLC Genomics Workbench.
CLC bio · EMEA
Silkeborgvej 2 · Prismet
8000 Aarhus C
Denmark
Phone: +45 7022 5509
CLC bio · Americas
10 Rogers St # 101
Cambridge · MA 02142
USA
Phone: +1 (617) 945 0178
CLC bio · AsiaPac
69 · Lane 77 · Xin Ai Road · 7th fl.
Neihu District · Taipei · Taiwan 114
Taiwan
Phone: +886 2 2790 0799
8.09.2014 CLC bio