Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL

Arunesh Mishra

CMSC 838 Presentation

Authors: Dmitri Mikhailov, Haruna Cofer, Roberto Gomperts (SGI)

Problem Statement

• Multiple Sequence Alignment (MSA)
    - Basis for phylogenetic analysis: infer homology relationships
    - Building protein families: a conserved region may imply a common function
    - Aids in function/structure prediction of new proteins
    - Global MSA: Clustal W
    - Is it computationally expensive? Yes, for 100 or more sequences.

• Goal: Parallelize Clustal W
    - Clustal W takes hours for 100 or more sequences
    - The algorithm can be parallelized

• Contribution of the paper
    - Parallel Clustal W: parallel version of basic Clustal W
    - HT Clustal: parallelizes heterogeneous Multiple Sequence Alignment problems
    - MULTICLUSTAL: parallel version of an optimization on Clustal W

Talk Overview

• Overview of talk
• Motivation
• Background
    - Sequential Clustal W
• Parallel Clustal W
• HT Clustal
    - Problem Statement
    - Optimizations
• MULTICLUSTAL
    - Sequential Algorithm
    - Optimizations
• Observations


Introduction

• Sequential Clustal W Algorithm
• Given N sequences of length M each
• Pairwise Alignment (PW)
    - Creates an N x N distance matrix based on pairwise alignment scores
    - The distances represent evolutionary distance
• Guide Tree (GT) construction (phylogenetic tree)
    - Uses the neighbor-joining algorithm
• Progressive Multiple Alignment (PA)
    - Use the guide tree to align closely related pairs of sequences first
    - Progressively align the next sequence to the existing alignment
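The pairwise stage can be sketched as follows. This is a minimal illustration, not the Clustal W implementation: it assumes equal-length sequences and uses a toy distance (the fraction of mismatched positions) in place of a real pairwise alignment score. The names `toy_distance` and `distance_matrix` are hypothetical.

```python
from itertools import combinations


def toy_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions (stand-in for an alignment-based distance)."""
    if len(a) != len(b):
        raise ValueError("the toy distance assumes equal-length sequences")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def distance_matrix(seqs):
    """Fill a symmetric N x N matrix using N(N-1)/2 pairwise comparisons."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = toy_distance(seqs[i], seqs[j])
        dist[i][j] = dist[j][i] = d
    return dist


seqs = ["ACGT", "ACGA", "TCGA"]
matrix = distance_matrix(seqs)
```

Each entry above the diagonal is computed independently of the others, which is the property the parallel version exploits.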



Parallel Clustal W

• Problem Statement
    - Parallelize the sequential Clustal W
• Execution time breakdown
    - [Figure: execution time per stage; PW = pairwise alignment, GT = guide tree, PA = progressive alignment]



• Pairwise Alignment Stage
    - N(N-1)/2 pairwise alignments
    - Send them randomly to different processors
        - Random assignment is used because the jobs have different loads
        - Random assignment also produces a statistically uniform distribution (over a large set of jobs)
    - 1.8X speedup achieved on a 1000-sequence MSA with 8 CPUs
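One way to picture the random distribution of pairwise jobs is the sketch below. It assumes a shuffle followed by a round-robin deal, which is only one possible reading of "send them randomly"; the function name `assign_pairs_randomly` and the fixed seed are hypothetical choices for illustration.

```python
import random
from itertools import combinations


def assign_pairs_randomly(n_sequences, n_processors, seed=0):
    """Shuffle all N(N-1)/2 sequence pairs, then deal them out to processors."""
    pairs = list(combinations(range(n_sequences), 2))
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    rng.shuffle(pairs)
    # Bucket p receives every n_processors-th job, starting at position p.
    return [pairs[p::n_processors] for p in range(n_processors)]


buckets = assign_pairs_randomly(n_sequences=10, n_processors=4)
```

Because the order is shuffled first, long and short alignments are spread across buckets by chance rather than clustered on one processor.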

• Guide Tree Stage
    - Parallelize "find closest neighbors from distance matrix"
    - Used in the neighbor-joining algorithm
        - Find the minimum element of each row concurrently
        - Use the row minima to find the minimum element of the matrix
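The two-step minimum search can be sketched as below. This is an illustration of the decomposition only: it uses a Python thread pool, which does not give true CPU parallelism in CPython, and it searches the raw matrix values, whereas a neighbor-joining implementation would apply the same pattern to its own selection criterion. The names `row_minimum` and `closest_pair` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def row_minimum(indexed_row):
    """Return (value, i, j) for the smallest off-diagonal entry of row i."""
    i, row = indexed_row
    candidates = [(value, j) for j, value in enumerate(row) if j != i]
    value, j = min(candidates)
    return value, i, j


def closest_pair(dist, workers=4):
    """Step 1: per-row minima, computed concurrently. Step 2: global minimum."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        row_minima = list(pool.map(row_minimum, enumerate(dist)))
    return min(row_minima)


dist = [
    [0, 5, 9],
    [5, 0, 2],
    [9, 2, 0],
]
value, i, j = closest_pair(dist)
```

The per-row searches share no state, so only the final reduction over N row minima has to run in one place.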



• Progressive Alignment Stage
    - The function score(I,J) is precomputed in parallel
        - score(I,J) is the alignment score of sequences I and J
    - Not much parallelization is possible in the third stage
• Overall Speedup
    - Speedup of 10x for a 600-sequence MSA using 16 CPUs
    - Time reduced from 1 hr 7 minutes to 6.5 minutes
    - Relative scaling is better for larger inputs
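The reported overall speedup can be checked with a short calculation. The timings and CPU count are the ones quoted above; the helper names `speedup` and `efficiency` are just illustrative.

```python
def speedup(serial_minutes, parallel_minutes):
    """Ratio of sequential to parallel run time."""
    return serial_minutes / parallel_minutes


def efficiency(speedup_value, cpus):
    """Fraction of the ideal (linear) speedup that was achieved."""
    return speedup_value / cpus


serial = 67.0   # 1 hr 7 minutes
parallel = 6.5  # minutes on 16 CPUs
s = speedup(serial, parallel)
e = efficiency(s, 16)
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")  # speedup = 10.3x, efficiency = 64%
```

The result agrees with the quoted 10x figure and shows that roughly two thirds of the ideal 16x is reached.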


HT Clustal

• Problem Statement
    - Calculate large numbers of MSAs of various sizes (independent problems)
    - Such problems are seen in high-throughput (HT) research environments
• Representative Problem (from paper):
    - Perform independent MSAs over 100 sets of sequences
    - Each set has between 20 and 100 sequences, with an average of 60 sequences
    - Average sequence length = 390


HT Clustal - Optimizations

• Basic Idea
    - Each MSA operation (on one set of sequences) is independent of the others
    - Run Clustal W as a uniprocessor job on one MSA problem
    - Launch multiple Clustal W jobs on different processors
• Job Scheduling
    - Jobs have different durations, depending on the sequence set
    - Two scheduling options explored:
        - Schedule dynamically: when a processor is free, give it an MSA job chosen at random
        - Schedule dynamically with presorting: the sequence sets are presorted (based on file size)
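The difference between the two scheduling options can be illustrated with a small simulation. It rests on assumptions not stated in the slides: job durations are known in advance (file size would be the proxy in practice), and presorting means largest job first. The function name `makespan` and the job durations are hypothetical.

```python
import heapq


def makespan(durations, n_cpus):
    """Dynamic scheduling: each job, in list order, goes to the CPU that frees up first."""
    free_at = [0.0] * n_cpus  # time at which each CPU becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available CPU
        heapq.heappush(free_at, start + d)  # that CPU is busy until start + d
    return max(free_at)


jobs = [1, 1, 1, 3]  # hypothetical durations, with the long job arriving last
arrival_order = makespan(jobs, n_cpus=2)                     # 4.0
largest_first = makespan(sorted(jobs, reverse=True), n_cpus=2)  # 3.0
```

In this toy case the long job arriving last leaves one CPU idle at the end, while starting it first lets the short jobs fill in around it.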


HT Clustal - Performance Numbers

• Speedups
    - Almost linear speedups
        - 31x on 32 CPUs for the representative MSA problem
        - 116x on 128 CPUs for a larger test case
            - Solution time reduced from 18.5 hours to 9.5 minutes
• [Figure: speedup for the example MSA set]


HT Clustal - Effect of Presorting

• Effect of presorting
    - [Figure: effect of presorting for the example MSA set (32 CPUs, 100 sets, ~3 jobs per CPU)]
    - Presorting helps when the average number of jobs per CPU is below 5
    - For a larger number of jobs per CPU, statistical averaging reduces the load imbalance


MULTICLUSTAL

• MULTICLUSTAL Algorithm
    - A Perl script to generate high-quality MSAs with little user intervention
    - Searches for the best combination of Clustal W input parameters
        - Aim: reduce gaps and increase clustering
    - Parameters to vary:
        - Scoring matrices: pairwise and multiple
        - Gap open and extension penalties (pairwise and multiple)
    - Sequential Algorithm:
        1. Until all parameters are sufficiently varied {
        2.     alignment = Run Clustal W ()
        3.     Calculate quality of alignment
        4.     Change parameters }
    - Quality of alignment
        - A numerical quantity based on
            - Identical amino acid matches
            - Conservative amino acid substitutions
            - Gap events and amino acid islands, i.e. -X-, -XX-, -XXX-, -XXXX-
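The search loop above can be sketched in Python (MULTICLUSTAL itself is a Perl script). Everything here is a stand-in: `run_clustalw` is a hypothetical stub that pads sequences instead of calling Clustal W, and `quality` is a toy score (identical columns minus gap characters), not MULTICLUSTAL's actual quality measure.

```python
from itertools import product


def run_clustalw(sequences, matrix, gap_open, gap_ext):
    """Hypothetical stub for one Clustal W run; ignores its parameters."""
    width = max(len(s) for s in sequences)
    return [s.ljust(width, "-") for s in sequences]


def quality(alignment):
    """Toy score: +1 per fully identical gap-free column, -1 per gap character."""
    identical = sum(
        len(set(col)) == 1 and "-" not in col for col in zip(*alignment)
    )
    gaps = sum(row.count("-") for row in alignment)
    return identical - gaps


def best_parameters(sequences, matrices, gap_opens, gap_exts):
    """Try every parameter combination and keep the highest-scoring one."""
    best = None
    for params in product(matrices, gap_opens, gap_exts):
        score = quality(run_clustalw(sequences, *params))
        if best is None or score > best[0]:
            best = (score, params)
    return best


best = best_parameters(["ACG", "AC"], ["BLOSUM62", "PAM250"], [10.0], [0.1])
```

Each iteration is a full alignment run, which is why the cost of a single Clustal W run dominates the total search time.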


MULTICLUSTAL Optimizations

• Optimization on MULTICLUSTAL
    - Run the PW/GT stages of Clustal W once
    - Reuse the tree generated in the PW/GT stages
        - Guide tree calculated only once for multiple runs
        - Results in speedups from 1.5x to 3x
    - Use Parallel Clustal W for each run of Clustal W
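The guide tree reuse amounts to moving the first two stages out of the parameter loop. The sketch below shows only that structure, using hypothetical stubs (`build_guide_tree`, `progressive_align`) and a call counter; it assumes the parameters being varied affect the progressive stage only, since the tree would otherwise have to be rebuilt.

```python
def build_guide_tree(sequences, calls):
    """Stub for the PW + GT stages; returns an ordering of sequence indices."""
    calls["guide_tree"] += 1
    return tuple(sorted(range(len(sequences)), key=lambda i: len(sequences[i])))


def progressive_align(sequences, tree, gap_open, calls):
    """Stub for the PA stage; pads sequences in guide-tree order."""
    calls["progressive"] += 1
    width = max(len(s) for s in sequences)
    return [sequences[i].ljust(width, "-") for i in tree]


def sweep_with_reuse(sequences, gap_opens):
    """Build the guide tree once, then rerun only the progressive stage."""
    calls = {"guide_tree": 0, "progressive": 0}
    tree = build_guide_tree(sequences, calls)
    alignments = [progressive_align(sequences, tree, g, calls) for g in gap_opens]
    return alignments, calls


alignments, calls = sweep_with_reuse(["ACGT", "AC", "ACG"], [5.0, 10.0, 15.0])
```

With three parameter settings the expensive first two stages run once instead of three times.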


Observations

• Parallelizability
    - First (pairwise alignment) and second (guide tree) stages are parallelizable
    - Third stage is mostly sequential, so the overall speedup is limited
• 100-sequence MSAs possible?
    - PIR at NBRF (Georgetown University) accepts a maximum of 20 sequences for MSA
    - Speedup improves user response time, but for 20 sequences a PC would be sufficient
• Probable applications:
    - Research environments?
    - PIR servers?
• Speedup only on a shared-memory SGI 3000 workstation?
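The observation that a mostly sequential third stage limits the speedup can be made concrete with Amdahl's law, which the slides do not name but which describes exactly this effect. The parallel fraction of 0.9 below is a hypothetical value chosen for illustration, not a measurement from the paper.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Upper bound on speedup when only part of the run time can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


bound_16 = amdahl_speedup(0.9, 16)       # 6.4 with 16 CPUs
bound_many = amdahl_speedup(0.9, 10**9)  # approaches, but never reaches, 10
```

Under this assumption adding CPUs beyond a few dozen yields little, because the sequential 10% of the run time sets a ceiling of 10x.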

