MUSCLE

EUAsiaGrid Master Class
6 May 2010
Enabling Multiple Sequence Comparison by Log-Expectation (MUSCLE) on EUAsiaGrid
By: Lee Hong Kai and Thomas Tay
NUHS Molecular Diagnostic Center
Norovirus
• Leading cause of non-bacterial gastroenteritis outbreaks
• Easily transmitted in semi-closed communities such as hospitals and long-term care facilities
• Extremely infectious, and potentially dangerous in immunocompromised patients
Norovirus
• Small, round +ssRNA virus of about 50 nm
• Genome size of about 7.5 kb, encoding:
  - major structural protein, VP1
  - minor capsid protein, VP2
• Genetically and antigenically diverse
  - genotyping of virus strains is needed to establish the epidemiological link between infected patients in an outbreak or transmission event
Purpose
• Multiple sequence alignment of norovirus sequences from genogroups I, II and IV is needed for:
‣ Effective primer and probe design
‣ Phylogenetic analysis (sequence editing)
‣ SNP analysis
‣ Checking sequence variability
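Once an alignment exists, checking variability and locating candidate SNP sites comes down to scanning it column by column. The minimal Python sketch below assumes the aligned sequences have already been loaded as equal-length strings (for example from MUSCLE's FASTA output), with "-" for gaps. Scoring each column by Shannon entropy, and the names column_entropy and variable_sites, are illustrative choices, not something these slides specify.

```python
import math
from collections import Counter


def column_entropy(column: str, ignore: str = "-N") -> float:
    """Shannon entropy (bits) of one alignment column, skipping gaps and Ns."""
    bases = [c for c in column.upper() if c not in ignore]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def variable_sites(alignment: list[str]) -> list[tuple[int, float]]:
    """Return (1-based position, entropy) for every column with more than one base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for pos in range(length):
        column = "".join(seq[pos] for seq in alignment)
        h = column_entropy(column)
        if h > 0:  # a conserved column has entropy 0
            sites.append((pos + 1, h))
    return sites


# Toy alignment (hypothetical data): only column 3 varies (G/T/G).
aln = ["ACGT-A", "ACTT-A", "ACGTTA"]
print(variable_sites(aln))
```

Columns that contain only gaps plus a single base are treated as conserved here; whether such positions matter for primer design is a separate judgement.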
Problem
• ClustalW takes about 9 hours to align about 1000 sequences
• MUSCLE is much faster but is still limited by memory
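MUSCLE's first stage (the "K-mer dist" passes in the log) estimates how similar two sequences are by counting shared short words (k-mers) rather than aligning them. This is likely part of why it outpaces ClustalW, whose default guide-tree step aligns every pair of sequences. The minimal Python sketch below is modelled on the fractional common k-mer count from Edgar (2004), F = sum over k-mers t of min(n_x(t), n_y(t)) / (min(L_x, L_y) - k + 1), with distance 1 - F. The word length k = 4, the plain nucleotide alphabet and the function names are assumptions for illustration. MUSCLE's own implementation differs in detail.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x: str, y: str, k: int = 4) -> float:
    """1 - fractional common k-mer count; 0.0 = identical k-mer content."""
    shorter = min(len(x), len(y))
    if shorter < k:
        raise ValueError("both sequences must be at least k long")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Each shared k-mer contributes the smaller of its two counts.
    shared = sum(min(n, cy[kmer]) for kmer, n in cx.items() if kmer in cy)
    return 1.0 - shared / (shorter - k + 1)


def distance_matrix(seqs: list[str], k: int = 4) -> list[list[float]]:
    """Symmetric all-pairs k-mer distance matrix (N x N, N = len(seqs))."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return d


seqs = ["ACGTACGT", "ACGTACGT", "AAAAAAAA"]  # hypothetical toy sequences
print(distance_matrix(seqs))
```

Each pair costs time linear in sequence length instead of the quadratic cost of a dynamic-programming alignment. However, the all-pairs loop still makes the matrix grow with N x N, which is where memory starts to bite.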
MUSCLE v3.6 by Robert C. Edgar
http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
noro 6423 seqs, max length 7746, avg length 582
00:05:08 10 MB(2%) Iter 1 100.00% K-mer dist pass 1
00:05:10 10 MB(2%) Iter 1 100.00% K-mer dist pass 2
muscle(1066) malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
*** OUT OF MEMORY ***
Memory allocated so far 10 MB
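A back-of-envelope estimate using the figures in the log above (6423 sequences, maximum length 7746) shows how memory scales with this dataset. The sketch assumes one dense N × N distance matrix and one L × L dynamic-programming matrix at 4 bytes per cell. These are illustrative assumptions, not MUSCLE's documented internals. The point is the quadratic growth in both the number of sequences and their length. Note that the allocation that actually failed was a single 16 MiB block (mmap size=16777216).

```python
# Rough memory estimate for the norovirus run shown in the log.
# ASSUMPTIONS: dense square matrices, 4-byte (single-precision) cells.
BYTES_PER_CELL = 4


def matrix_mib(rows: int, cols: int, bytes_per_cell: int = BYTES_PER_CELL) -> float:
    """Size in MiB of a dense rows x cols matrix."""
    return rows * cols * bytes_per_cell / 2**20


n_seqs = 6423   # "noro 6423 seqs" from the log
max_len = 7746  # "max length 7746" from the log

dist_mib = matrix_mib(n_seqs, n_seqs)   # grows with N^2
dp_mib = matrix_mib(max_len, max_len)   # grows with L^2

print(f"distance matrix: {dist_mib:.0f} MiB")             # distance matrix: 157 MiB
print(f"DP matrix, two longest seqs: {dp_mib:.0f} MiB")   # DP matrix, two longest seqs: 229 MiB
```

Doubling the number of sequences roughly quadruples the first term, which is why splitting the job or running it on machines with more memory, such as grid worker nodes, is attractive.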
Thank You!