EUAsiaGrid Master Class
6 May 2010

Enabling Multiple Sequence Comparison by Log-Expectation (MUSCLE) on EUAsiaGrid
By: Lee Hong Kai and Thomas Tay
NUHS Molecular Diagnostic Center

Norovirus
• Main pathogen causing non-bacterial outbreaks of gastroenteritis
• Easily transmitted in semi-closed communities such as hospitals and long-term care facilities
• Extremely infectious, and potentially dangerous in immunocompromised patients

Norovirus
• Small, round +ssRNA virus of about 50 nm
• Genome size of about 7.5 kb, encoding:
  - major structural protein, VP1
  - minor capsid protein, VP2
• Genetically and antigenically diverse
• Genotyping of virus strains determines the epidemiological link between infected patients in an epidemic outbreak or transmission event

Purpose
• A need for multiple sequence alignment of norovirus from genogroups I, II and IV
  ‣ Effective primer and probe design
  ‣ Phylogenetic analysis (sequence editing)
  ‣ SNP analysis
  ‣ Checking sequence variability

Problem
• ClustalW takes 9 hrs for about 1000 sequences
• MUSCLE is much faster but still limited by memory

MUSCLE v3.6 by Robert C. Edgar
http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

noro 6423 seqs, max length 7746, avg length 582
00:05:08 10 MB(2%) Iter 1 100.00% K-mer dist pass 1
00:05:10 10 MB(2%) Iter 1 100.00% K-mer dist pass 2
muscle(1066) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
*** OUT OF MEMORY ***
Memory allocated so far 10 MB

Thank You!
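The MUSCLE log above opens with a summary of the input set (6423 seqs, max length 7746, avg length 582). A minimal sketch of how such figures could be computed from a FASTA file before submitting a job is given below. The function names `read_fasta` and `summarize`, the sample sequences, and the choice to round the average down are illustrative assumptions; this is not MUSCLE's own code and may not match its rounding.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (sequence count, max length, average length rounded down)."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    if not lengths:
        return 0, 0, 0
    return len(lengths), max(lengths), sum(lengths) // len(lengths)


# Hypothetical two-record input, used only to exercise the functions.
example = """>strain_A
ATGGCGTCGA
ATG
>strain_B
ATGGCG
""".splitlines()

count, longest, average = summarize(example)
print(f"{count} seqs, max length {longest}, avg length {average}")
```

For a real input, an open file handle can be passed in place of `example`, since the parser reads one line at a time.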