De novo assembly of a large volume of genome using Next Generation Sequencing data of Toxocara canis

Jung Im Won1), Sangkyoon Hong2), Jin Hwa Kong2), Sun Huh3), Jee Hee Yoon2)

1)Research Institute of Electrical and Computer Engineering, Hanyang University; 2)Department of Computer Engineering, Hallym University; 3)Department of Parasitology, College of Medicine, Hallym University

Purpose: De novo assembly is a method of reconstructing reads into sequences that estimate the original sequence, without using a reference sequence. Next Generation Sequencing (NGS) techniques have the merit of producing large quantities of sequence at low cost. This presentation describes methods for de novo assembly of the whole genome of Toxocara canis, a dog intestinal nematode, using NGS read data.

Methods: Paired-end read data from Toxocara canis with insert sizes of 400 bp (15.9 Gbp), 1900 bp (10.3 Gbp), and 2900 bp (10.4 Gbp) were used. The total size of the read data was 36.6 Gbp, the mean read length was 101 bp, and the coverage was 104X. The de novo assembly algorithm used was SOAPdenovo, which adopts the de Bruijn graph method based on k-mers. To assess the quality of the results, the N50 and mean length of contigs and scaffolds were analyzed. N50, a standard metric for de novo assembly, is the length of the shortest contig in the set of longest contigs that together make up at least half of the total contig length. The hardware used for this experiment was a dual-CPU Xeon E5620 (quad-core, 2.4 GHz) system with 144 GB RAM.

Results: Contig statistics were compared across k-mer sizes using the read data with an insert size of 400 bp; the best mean contig length, N50, and N90 were obtained at a k-mer size of 41. Similarity analysis with a related species was performed using the Caenorhabditis elegans whole-genome sequence as the reference. The read alignment algorithms used were SOAP and GSNAP. k-mismatch alignment was tested while increasing the seed length.
The total number of aligned reads was compared to assess the accuracy of alignment. The larger the k-mer value, the better the alignment score. SOAP showed lower accuracy than GSNAP because its k value was below 2 during k-mismatch alignment. De novo assembly of Toxocara canis produced contigs with a total length of 335 Mbp, an N50 of about 531 bp, and 45.4X coverage at a k-mer size of 41 with the 400 bp insert-size data. Scaffolds totaling 362 Mbp with an N50 of 4.3 Kbp were obtained by joining contigs using mate-pair information from the 1900 bp and 2900 bp read data.

Conclusion: De novo assembly based on NGS was performed for Toxocara canis, whose genome size was about 350 Mbp. The most appropriate k-mer size for contig and scaffold construction could be determined. These results demonstrate the usability of de novo assembly for analyzing large genome sequences by providing a variety of analysis results on contigs and scaffolds.
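
The N50 definition and the coverage figure given in the Methods can be illustrated with a minimal Python sketch. The function names `n_stat` and `coverage` and the toy contig lengths are assumptions made for illustration only; they are not taken from SOAPdenovo or from the study's data. Only the 36.6 Gbp read total and the approximate 350 Mbp genome size come from the abstract.

```python
def n_stat(lengths, percent=50):
    """Return the Nxx statistic (N50 by default) for a list of contig lengths.

    Contigs are sorted from longest to shortest and accumulated; the result
    is the length of the contig at which the running sum first reaches
    `percent` percent of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


def coverage(total_read_bases, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Hypothetical toy contig lengths (not data from the study); total length is 32.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]

print(n_stat(toy_contigs, 50))   # 8: the two longest contigs (8 + 8) cover half of 32
print(n_stat(toy_contigs, 90))   # 2: N90 is reached only after adding short contigs

# Figures from the abstract: 36.6 Gbp of reads, genome of about 350 Mbp.
print(round(coverage(36.6e9, 350e6), 1))  # 104.6, in line with the reported 104X
```

The same function applies to scaffold lengths. Because N90 requires accumulating 90% of the total length, it is never larger than N50 for the same assembly.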