20120913_KOGO_Poster_NGS_application_Ik-Young_Choi_Abstract

Applications of NGS to Whole-Genome Sequencing and Expression Profiling
Jong-Sung Lim, Jina Kim and Ik-Young Choi
National Instrumentation Center for Environmental Management, Seoul National University,
Seoul 151-921, South Korea
Recently, technologies for detecting DNA sequence variation and profiling gene expression
have been widely adopted as approaches in genome biology and genetics. Their application to
genome studies has advanced considerably since the introduction of the next-generation
sequencing (NGS) platforms Roche/454 and Illumina/Solexa, together with bioinformatic methods
for whole-genome de novo assembly, expression profiling, DNA variation discovery, and
genotyping. One advantage of NGS systems is that they produce high-throughput DNA sequence
data cost-effectively for genome, transcriptome, and miRNAome studies. Both large-scale
whole-genome shotgun paired-end sequencing and mate-pair sequencing are important for de novo
assembly of novel genomes and for resequencing samples against a reference genome sequence.
Read length strongly affects the quality of contig consensus sequences, so the long reads of
the Roche/454 system are valuable for de novo assembly. For a novel genome, an effective and
cost-effective approach is a hybrid assembly that combines data from multiple NGS platforms,
with at least 2x genome coverage from Roche/454 and 30x from Illumina/Solexa. In some cases,
Illumina/Solexa data are also used to build scaffolds during de novo assembly, exploiting
their high coverage depth and mate-pair information from fragments with a range of large
insert sizes, even though the same reads have already been used in assembly to produce many
contigs. Massive amounts of short Illumina/Solexa reads are sufficient for discovering DNA
variation, which reduces the cost of sequencing. MAQ and CLC software are useful for both
SNP discovery and genotyping by comparing resequencing data with a reference genome.
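The genome-coverage targets given for hybrid assembly (at least 2x Roche/454 and 30x Illumina/Solexa) can be turned into sequencing effort with the standard coverage relation C = N·L/G, where C is mean depth, N the number of reads, L the read length, and G the genome size. The sketch below is a back-of-the-envelope planning aid under stated assumptions: the 500 Mb genome size and the 400 bp / 100 bp read lengths are hypothetical values chosen for illustration, not figures from this study; only the 2x and 30x depths come from the text.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, depth: float) -> int:
    """Reads required for a target mean depth, rearranging C = N * L / G."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


def fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered at least once under an idealized
    Poisson (Lander-Waterman) model: 1 - e^(-C)."""
    return 1.0 - math.exp(-depth)


# Hypothetical inputs (assumptions, not from the abstract): a 500 Mb genome,
# 400 bp Roche/454 reads and 100 bp Illumina/Solexa reads.
GENOME_BP = 500_000_000
PLAN = {
    # platform: (assumed read length in bp, target depth from the abstract)
    "Roche/454": (400, 2.0),
    "Illumina/Solexa": (100, 30.0),
}

if __name__ == "__main__":
    for platform, (read_len, depth) in PLAN.items():
        n = reads_needed(GENOME_BP, read_len, depth)
        print(f"{platform}: {depth:g}x -> {n:,} reads, "
              f"{fraction_covered(depth):.1%} of bases covered at least once")
```

Under these assumptions, 2x Roche/454 coverage would need 2.5 million reads and leaves roughly 13.5% of bases uncovered in the idealized model, which is one way to see why long reads alone are paired with deep short-read data. The count is of reads, not pairs, and real libraries have GC and mapping biases, so actual coverage is less uniform than the Poisson estimate. The same arithmetic applies to the transcriptome depths quoted below if G is replaced by an estimated transcriptome size.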
Whole-genome expression profiles support a systems-biology approach to the genome by
quantifying the RNAs expressed across the whole transcriptome in different tissue samples,
such as control and exposed tissues. Combining mRNA sequences from Roche/454 and
Illumina/Solexa is more powerful than either platform alone for finding novel genes through
de novo assembly in any whole-genome-sequenced species. Coverage of 20x with Roche/454 and
50x with Illumina/Solexa, relative to the estimated transcriptome size, is effective for
creating novel expressed reference sequences. By contrast, an average of only 30x
transcriptome coverage with short Illumina/Solexa reads is sufficient for expression
quantification against reference EST sequences. Using in silico methods, both conserved and
novel miRNAs can be discovered from massive miRNAome data in any species. In particular,
identifying the target genes of the discovered miRNAs could provide a robust approach to
genome biology.
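One common first pass for finding candidate miRNA target genes in silico is seed matching: searching 3'UTR sequences for perfect Watson-Crick complements to nucleotides 2-8 of the mature miRNA. The sketch below assumes that rule only; it is not the authors' pipeline, and dedicated predictors such as TargetScan also weigh site type, conservation, and sequence context. The miRNA, UTR sequences, and gene names (`geneA`, `geneB`) are made up for illustration.

```python
# Maps each RNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq: str) -> str:
    """Normalize a DNA or RNA sequence to uppercase RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def seed_site(mirna: str) -> str:
    """Target-site motif for the 7-nt seed (miRNA positions 2-8, 1-based):
    the reverse complement of the seed, written 5'->3' as in the mRNA."""
    seed = to_rna(mirna)[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """0-based start positions of perfect seed-match sites in a 3'UTR."""
    site = seed_site(mirna)
    utr = to_rna(utr)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def predict_targets(mirna: str, utrs: dict[str, str]) -> dict[str, list[int]]:
    """Keep only genes whose 3'UTR has at least one seed-match site."""
    hits = {}
    for gene, utr in utrs.items():
        sites = find_seed_matches(mirna, utr)
        if sites:
            hits[gene] = sites
    return hits


if __name__ == "__main__":
    mirna = "UCAGGAUCCAGUACGAUUAGCU"  # hypothetical mature miRNA
    utrs = {  # hypothetical 3'UTRs (DNA or RNA alphabet accepted)
        "geneA": "AAGATCCTGAAATTTGATCCTGCC",
        "geneB": "ACGUACGUACGU",
    }
    print(seed_site(mirna))
    print(predict_targets(mirna, utrs))
```

For this toy input the seed (positions 2-8) is CAGGAUC, so the searched site is its reverse complement GAUCCUG, which occurs twice in `geneA` and not at all in `geneB`. Exact matching keeps the sketch simple; it deliberately ignores G:U wobble pairs and partial seed sites.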