Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL

Arunesh Mishra

CMSC 838 Presentation

Authors : Dmitri Mikhailov, Haruna Cofer, Roberto Gomperts

SGI

Problem Statement

Multiple Sequence Alignment (MSA)

Basis for phylogenetic analysis - Infer homology relationships

Building protein families - conserved region may imply common function

Aids in function/structure prediction of new proteins

Global MSA – Clustal W

Is it computationally expensive ? Yes, for 100 or more sequences.

Goal : Parallelize Clustal W

Clustal W takes hours for 100 or more sequences

Parallelization possible for the algorithm

Contribution of the paper

Parallel Clustal W

Parallel version of basic Clustal W

HT Clustal

Parallelize heterogeneous Multiple Sequence Alignment problems

 MULTICLUSTAL

Parallel version of an optimization on Clustal W

CMSC 838T – Presentation

Talk Overview

 Overview of talk

 Motivation

 Background

Sequential Clustal W

 Parallel Clustal W

 HT Clustal

Problem Statement

Optimizations

 MULTICLUSTAL

Sequential Algorithm

Optimizations

 Observations

Introduction

 Sequential Clustal W Algorithm

 Given N sequences of length M each

 Pairwise Alignment (PW)

Creates distance matrix N x N based on pairwise alignment scores

Evolutionary distance

 Guide Tree (GT) construction (Phylogenetic tree)

Use Neighbor-joining algorithm

 Progressive Multiple Alignment (PA)

Use guide tree to align closely related pairs of sequences

Progressively align next sequence to existing alignment
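As a rough illustration of the first stage, the sketch below fills a symmetric N x N distance matrix from all pairs of sequences. The `toy_distance` function is a stand-in assumed for illustration only (fraction of mismatched positions between equal-length strings); Clustal W itself derives its distances from real pairwise alignments.

```python
from itertools import combinations


def toy_distance(a: str, b: str) -> float:
    """Stand-in for a pairwise alignment score (an assumption, not Clustal W's
    scoring): fraction of positions that differ in two equal-length strings."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def distance_matrix(seqs):
    """Build the symmetric N x N matrix; only N(N-1)/2 pairs are computed."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = toy_distance(seqs[i], seqs[j])
        dist[i][j] = dist[j][i] = d
    return dist


matrix = distance_matrix(["ACGT", "ACGA", "TTTT"])
```

The guide tree stage would then consume `matrix`; the progressive alignment stage is not sketched here.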

Parallel Clustal W

 Problem Statement

 Parallelize the Sequential Clustal W

 Execution time breakdown by stage

 PW = pairwise alignment, GT = guide tree, PA = progressive alignment

Parallel Clustal W

 Pairwise Alignment Stage

 N(N-1)/2 pairwise alignments

 Send them randomly to different processors

Random, because the jobs differ in load

Random assignment also gives a statistically uniform distribution over a large set of jobs

 1.8X speedup achieved on a 1000 sequence MSA with 8 CPUs
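One way to picture the random distribution of pairwise jobs is the sketch below. It is an assumed scheme, not necessarily the one used in the paper: the N(N-1)/2 index pairs are shuffled and then dealt out to the processors in turn, so each processor receives a random sample of jobs of nearly equal size. The function name and the `seed` parameter are hypothetical.

```python
import random
from itertools import combinations


def assign_pairs_randomly(n_seqs, n_procs, seed=0):
    """Shuffle all N(N-1)/2 sequence pairs and deal them out to processors.

    Returns one list of (i, j) pairs per processor. The shuffle-then-deal
    scheme is an illustrative assumption.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n_seqs), 2))
    rng.shuffle(pairs)
    # Slice with a stride so bucket sizes differ by at most one.
    return [pairs[p::n_procs] for p in range(n_procs)]


buckets = assign_pairs_randomly(n_seqs=10, n_procs=4)
```

Because expensive and cheap alignments land on each processor at random, the per-processor totals tend to even out when the number of jobs is large.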

 Guide Tree Stage

 Parallelize “find closest neighbors from distance matrix”

 Used in the neighbor joining algorithm

Find minimum element of each row concurrently

Use this to find minimum element of matrix
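The two-step minimum search can be sketched as follows: each row's minimum is found as an independent task, and the global minimum is then taken over the per-row results. This is a minimal sketch that runs on any square matrix; the thread pool only illustrates the task structure (Python threads would not give a real speedup on this CPU-bound work), and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def row_min(indexed_row):
    """Return (value, i, j) for the smallest off-diagonal entry of row i."""
    i, row = indexed_row
    j = min((k for k in range(len(row)) if k != i), key=lambda k: row[k])
    return row[j], i, j


def closest_pair(matrix, workers=4):
    """Find row minima concurrently, then reduce them to the matrix minimum."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        minima = list(pool.map(row_min, enumerate(matrix)))
    return min(minima)  # tuples compare by value first, then by row index


dist = [[0, 5, 2],
        [5, 0, 7],
        [2, 7, 0]]
value, i, j = closest_pair(dist)
```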

Parallel Clustal W

 Progressive Alignment Stage

 Values of a function score(I,J) are precomputed in parallel

Alignment score of sequence I and J

 Not much parallelization in the third stage

 Overall Speedup

 Speedup of 10x for a 600-sequence MSA using 16 CPUs

 Time reduced from 1 hr 7 minutes to 6.5 minutes

 Relative scaling is better for larger inputs
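The quoted speedup follows directly from the two timings. The short check below also derives parallel efficiency (speedup divided by CPU count), a quantity the slides do not state; the helper names are illustrative.

```python
def speedup(serial_minutes, parallel_minutes):
    """Ratio of single-CPU time to parallel time."""
    return serial_minutes / parallel_minutes


def efficiency(serial_minutes, parallel_minutes, cpus):
    """Speedup per CPU; 1.0 would be perfect linear scaling."""
    return speedup(serial_minutes, parallel_minutes) / cpus


serial = 67.0    # 1 hr 7 minutes
parallel = 6.5   # minutes on 16 CPUs
print(round(speedup(serial, parallel), 1))         # 10.3
print(round(efficiency(serial, parallel, 16), 2))  # 0.64
```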

HT Clustal

 Problem Statement

 Calculate large numbers of MSAs of various sizes (independent problems)

 Such problems seen in high-throughput (HT) research environments

 Representative Problem (from paper) :

Perform independent MSAs over 100 sets of sequences

Each set has between 20 and 100 sequences, with an average of 60 sequences

Average length of sequence = 390

HT Clustal - Optimizations

 Basic Idea

 Each MSA operation (on one set of sequences) is independent of the other

 Run ClustalW as a uniprocessor job on one MSA problem

 Launch multiple Clustal W jobs on different processors

 Job Scheduling

 Jobs of different duration – depends on sequence set

 Two scheduling options explored:

Schedule dynamically – if a processor is free, schedule an MSA job chosen randomly

Schedule dynamically – sequence sets are presorted (based on file size)
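The two scheduling options can be compared with a small simulation. This is a sketch under stated assumptions: job durations are known in advance (the slides use file size as a proxy), presorting means largest job first, and each job goes to whichever CPU becomes free earliest. The function names and the sample durations are hypothetical.

```python
import heapq
import random


def makespan(durations, n_cpus):
    """Dynamic scheduling: hand the next job to the CPU that frees up first.

    Returns the time at which the last CPU finishes.
    """
    free_at = [0.0] * n_cpus
    heapq.heapify(free_at)
    for d in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + d)
    return max(free_at)


def compare(durations, n_cpus, seed=0):
    """Return (makespan in random order, makespan when presorted largest first)."""
    shuffled = list(durations)
    random.Random(seed).shuffle(shuffled)
    presorted = sorted(durations, reverse=True)
    return makespan(shuffled, n_cpus), makespan(presorted, n_cpus)


# A long job that happens to arrive last leaves one CPU working alone.
print(makespan([1, 1, 1, 1, 4], n_cpus=2))  # 6.0
print(makespan([4, 1, 1, 1, 1], n_cpus=2))  # 4.0
```

With only a few jobs per CPU, a single long job scheduled late dominates the finish time, which is the case presorting is meant to avoid.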

HT Clustal – Performance Numbers

 Speedups

 Almost linear speedups

31x on 32 CPUs for the representative MSA problem

116X on 128 CPUs for a larger test case

Solution time reduced from 18.5 hours to 9.5 minutes

 Speedup shown for the example MSA set:

HT Clustal – Effect of Presorting

 Effect of presorting

 Figure shows effect of presorting for the example MSA set

32 CPUs, 100 sets, ~3 jobs per CPU

 If the average number of jobs per CPU is < 5, presorting helps

 For a larger number of jobs per CPU, statistical averaging reduces load imbalance

MULTICLUSTAL

 MULTICLUSTAL Algorithm

A Perl script to generate high quality MSA with little user intervention

Searches for best combination of Clustal W input parameters

To reduce gaps, increase clustering

Parameters to vary :

Scoring matrices : pairwise and multiple

Gap open and extension penalties (pairwise and multiple)

Sequential Algorithm :

1. Till all parameters are sufficiently varied {

2. alignment = Run Clustal W ()

3. Calculate quality of alignment

4. Change Parameters }
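The loop above can be sketched as an exhaustive search over parameter combinations. MULTICLUSTAL itself is a Perl script; this Python sketch only mirrors the loop structure. `run_alignment` and `quality` are hypothetical callables supplied by the caller (standing in for a Clustal W run and the quality score), and the parameter names are placeholders rather than real Clustal W options.

```python
from itertools import product


def search_parameters(run_alignment, quality, matrices, gap_opens, gap_exts):
    """Try every parameter combination and keep the best-scoring alignment.

    run_alignment(params) -> alignment and quality(alignment) -> number are
    assumed interfaces, injected so the sketch stays self-contained.
    """
    best = None
    for matrix, gap_open, gap_ext in product(matrices, gap_opens, gap_exts):
        params = {"matrix": matrix, "gap_open": gap_open, "gap_ext": gap_ext}
        alignment = run_alignment(params)   # step 2: run the aligner
        score = quality(alignment)          # step 3: score the result
        if best is None or score > best[0]:
            best = (score, params, alignment)
    return best                             # step 4 is the loop itself
```

Keeping the aligner and the scorer as arguments means the same loop works whether each run is a sequential or a parallel Clustal W job.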

Quality of alignment

A numerical quantity based on

 Identical amino acid matches

Conservative amino acid substitutions

Gap events, amino acid islands, i.e. -X-, -XX-, -XXX-, -XXXX-
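Some of the ingredients listed above can be counted directly from an alignment. The sketch below is an assumption-laden illustration, not the MULTICLUSTAL formula: it counts fully identical columns, gap events (runs of `-`), and islands (runs of one to four residues with gaps on both sides). Conservative substitutions and the weighting that combines the counts into a single score are left out because the slides do not specify them.

```python
import re


def identical_columns(rows):
    """Number of gap-free columns in which every row has the same residue."""
    return sum(1 for col in zip(*rows) if "-" not in col and len(set(col)) == 1)


def gap_events(row):
    """Number of contiguous gap runs in one aligned sequence."""
    return len(re.findall(r"-+", row))


def islands(row, max_len=4):
    """Number of residue runs of length 1..max_len flanked by gaps on both sides."""
    pattern = r"(?<=-)[^-]{1,%d}(?=-)" % max_len
    return len(re.findall(pattern, row))


row = "AC-G--TTA-CCCCC-A"
print(gap_events(row))  # 4
print(islands(row))     # 2  ("G" and "TTA"; "CCCCC" is too long)
```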

MULTICLUSTAL Optimizations

 Optimization on MULTICLUSTAL

 Run Clustal W once

 Reuse tree generated in the PW/GT Stages

Guide tree calculated only once for multiple runs

Results in speedups from 1.5X to 3X

 Use Parallel Clustal W for each run of Clustal W

Observations

Parallelizability

First (pairwise alignment) and second (guide tree) stages are parallelizable

Third stage is mostly sequential – speedup limited
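The limit imposed by a sequential stage is usually expressed with Amdahl's law. The block below is a back-of-envelope reading of the 600-sequence result quoted earlier, under the assumption that the run behaves like a fixed serial fraction $s$ plus a perfectly parallel remainder; the slides do not state $s$, so the value derived here is only an estimate.

```latex
% Amdahl's law: speedup on p processors with serial fraction s
S(p) = \frac{1}{s + \dfrac{1 - s}{p}},
\qquad
\lim_{p \to \infty} S(p) = \frac{1}{s}

% Solving for s from an observed speedup S on p processors
s = \frac{\dfrac{1}{S} - \dfrac{1}{p}}{1 - \dfrac{1}{p}}

% With S \approx 10.3 and p = 16 (the 600-sequence case):
s \approx \frac{0.0971 - 0.0625}{0.9375} \approx 0.037,
\qquad
\frac{1}{s} \approx 27
```

Under this simple model, adding CPUs beyond 16 would keep helping for that input, but the speedup could not exceed roughly 27x.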

100 sequence MSAs possible ?

 PIR at NBRF (Georgetown University) takes maximum of 20 sequences for MSA

 Speedup improves user response time, but for 20 sequences a PC would be sufficient

 Probable applications:

 Research Environments ?

 PIR servers ?

 Speedup only on shared memory SGI 3000 workstation ?
