A Case Study on What Works and What Doesn’t

Eric C. Reed

Nicholas Chen

Ralph E. Johnson

Goal: Identify core programming patterns used in pipeline parallelism

Convert “pipeline-ish” serial programs to parallel ones

Identifying transformations could lead to automation

PARSEC & TBB pipelines

REU project focused on just part of the bigger picture

Always some “pre-transformation” needed before TBB could be used

TBB performed on par with or better than pthreads, making library- and framework-based approaches attractive

TBB Flow Graph had not yet been released

 Resolves some problems we found

 Our work provides empirical evidence for needing more complex constructs than available in TBB pipelines

1. Read in image
2. Break image into segments
3. Extract feature vectors from segments
4. Query database with feature vectors to find candidate images
5. Rank candidate images based on similarity
6. Output best-matching images

    class foo : public tbb::filter {
    public:
        foo() : tbb::filter(tbb::filter::parallel) {}  // mode is fixed at construction
        void* operator()(void* inp) {
            … operate on token …
        }
    };

A single stage of the pipeline

Represented as a function object

Input: void* to output of previous stage

Output: void* to input of next stage

First/Last stage generates/consumes tokens

Serial-in-order, serial-out-of-order, or parallel

A pipeline is a sequence of filters

Specified max number of live tokens

Calls first stage to get a new token

 A NULL pointer signifies no more input

    tbb::pipeline pipe;
    ReadFilter read;
    DoFilter work;
    WriteFilter write;
    pipe.add_filter(read);    // add_filter takes a filter&, not a pointer
    pipe.add_filter(work);
    pipe.add_filter(write);
    pipe.run(10);             // at most 10 live tokens
    pipe.clear();
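The token flow described above can be modeled without TBB. The following is a minimal serial sketch using only the C++ standard library, not TBB's implementation: `Stage`, `run_serial`, `sum_of_doubled`, and the three example stages are hypothetical names introduced for illustration. It shows only the contract: the first stage is called repeatedly to generate tokens, each token is handed from stage to stage as a `void*`, and a null pointer from the first stage ends the run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pipeline stage: takes the previous stage's
// output as void* and returns the next stage's input as void*.
struct Stage {
    virtual ~Stage() = default;
    virtual void* operator()(void* token) = 0;
};

// First stage: ignores its input and generates tokens until the data runs out.
struct ReadStage : Stage {
    std::vector<int>& data;
    std::size_t next = 0;
    explicit ReadStage(std::vector<int>& d) : data(d) {}
    void* operator()(void*) override {
        if (next == data.size()) return nullptr;  // null pointer: no more input
        return &data[next++];
    }
};

// Middle stage: transforms the token in place and passes it on.
struct DoubleStage : Stage {
    void* operator()(void* token) override {
        *static_cast<int*>(token) *= 2;
        return token;
    }
};

// Last stage: consumes the token.
struct SumStage : Stage {
    int total = 0;
    void* operator()(void* token) override {
        total += *static_cast<int*>(token);
        return nullptr;
    }
};

// Serial model of running the pipeline: one token at a time, in order.
void run_serial(const std::vector<Stage*>& stages) {
    if (stages.empty()) return;
    while (void* token = (*stages[0])(nullptr)) {
        for (std::size_t i = 1; i < stages.size(); ++i)
            token = (*stages[i])(token);
    }
}

// Hypothetical helper that wires the three stages together.
int sum_of_doubled(std::vector<int> values) {
    ReadStage read(values);
    DoubleStage twice;
    SumStage sum;
    run_serial({&read, &twice, &sum});
    return sum.total;
}
```

The model deliberately omits what TBB adds: several tokens in flight at once (up to the specified maximum) and concurrent execution of parallel stages.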

1. Read in image (serial-in-order)
2. Break image into segments (parallel)
3. Extract feature vectors from segments (parallel)
4. Query database with feature vectors to find candidate images (parallel)
5. Rank candidate images by similarity (parallel)
6. Output best-matching images (serial-out-of-order)

[Figure: Ferret execution time (seconds, 0 to 700) vs. number of threads (1 to 64). Series: gcc-pthreads, gcc-tbb, icc-pthreads, icc-tbb]

 Frame contents predicted from already encoded reference frames

 Frame processing cannot start until all reference frames are encoded

 Cannot be guaranteed by TBB without blocking

 TBB pipelines are not a suitable representation
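To make the dependency problem concrete, here is a small sketch in standard C++ (no TBB). `Frame` and `earliest_start` are hypothetical names, and the model assumes each frame takes one step to encode, workers are unlimited, and every reference index points to an earlier frame. It computes the earliest step at which each frame could start given its reference frames.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame description: indices of the earlier frames it is predicted from.
struct Frame {
    std::vector<std::size_t> refs;
};

// Returns, for each frame, the earliest step at which encoding can start.
// Assumptions: one step per frame, unlimited workers, refs point to earlier frames.
std::vector<int> earliest_start(const std::vector<Frame>& frames) {
    std::vector<int> start(frames.size(), 0);
    for (std::size_t i = 0; i < frames.size(); ++i) {
        for (std::size_t r : frames[i].refs) {
            // A frame cannot start until each reference has finished (start + 1).
            int ready = start.at(r) + 1;
            if (ready > start[i]) start[i] = ready;
        }
    }
    return start;
}
```

With references 1→0, 2→0, 3→{1, 2} and an independent frame 4, frame 3 cannot start before step 2 while frame 4 could start immediately. The order in which tokens become ready depends on the data, which a fixed linear sequence of filters cannot express unless a stage blocks while waiting.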

Write a file segment once and its hash every other time

1. Read in a block of the file (serial-in-order)
2. Split block into small segments (parallel)
3. Hash the segment and check database (parallel)
   1. If hash found in database go to step 5
   2. Otherwise go to step 4
4. Compress the segment’s data (parallel)
5. Reorder segments into a block. Reorder blocks and write out data (serial-in-order)

Token generating stage (step 2)

Optional stage (step 4)

1. Read in a block from file (serial-in-order)
2. Do the following on the block (parallel)
   1. Split block into segments (serial-in-order)
   2. Compute and check hash (parallel)
   3. Compress segment (parallel)
      Check flag to either compress data or immediately return
   4. Reorder segments into block (serial-in-order)
      TBB handles reordering so we need only append the segment to the block data structure
3. Write out block (serial-in-order)
   TBB handles reordering so we can just write out the block data
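The inner per-block steps above can be sketched as follows. This is a serial, standard-library-only model, not the benchmark's code: `Segment`, `hash_stage`, `compress_stage`, and `process_block` are hypothetical names, `std::hash` stands in for the real hash function, and wrapping the data in brackets stands in for real compression. The point is the flag: the compress stage always runs, and the flag set by the hash stage decides whether it does any work.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical token for the inner pipeline: one segment of a block.
struct Segment {
    std::string data;
    std::size_t hash = 0;
    bool duplicate = false;   // flag set by the hash stage, read by the compress stage
    std::string output;       // what will be written for this segment
};

// Compute the hash and check the database of hashes seen so far.
void hash_stage(Segment& s, std::unordered_set<std::size_t>& seen) {
    s.hash = std::hash<std::string>{}(s.data);
    s.duplicate = !seen.insert(s.hash).second;
}

// Formerly optional stage, now always run; the flag decides whether it does work.
void compress_stage(Segment& s) {
    if (s.duplicate) {
        s.output = "#" + std::to_string(s.hash);  // duplicate: emit only the hash
        return;                                   // return immediately
    }
    s.output = "[" + s.data + "]";                // stand-in for real compression
}

// Inner pipeline over one block: split, hash, compress, append in order.
std::vector<std::string> process_block(const std::string& block, std::size_t seg_len,
                                       std::unordered_set<std::size_t>& seen) {
    std::vector<std::string> out;
    if (seg_len == 0) return out;                 // guard against an endless loop
    for (std::size_t pos = 0; pos < block.size(); pos += seg_len) {
        Segment s;
        s.data = block.substr(pos, seg_len);      // split block into segments
        hash_stage(s, seen);
        compress_stage(s);
        out.push_back(s.output);                  // append segment to the block
    }
    return out;
}
```

In the TBB version the hash and compress stages run in parallel, so the database of seen hashes would need to be safe for concurrent access; the serial model sidesteps that.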

[Figure: Dedup execution time (seconds, 0 to 70) vs. number of threads (1 to 64). Series: gcc-pthreads, gcc-tbb, icc-pthreads, icc-tbb]

 Transformations

Recursive generators become iterators with stacks

 Semi-automation with user identifying state

Optional stages become required stages with flags

 Semi-automation with user identifying conditions

Token generating stages require nested pipelines

 Semi-automation with user specifying how to convert between pipelines

 TBB pipeline unsuitability

Dynamically constructed pipeline

Waiting on earlier tokens to finish first
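The first transformation, recursive generators becoming iterators with stacks, can be illustrated with a small sketch. The tree, `Node`, `visit_leaves`, and `LeafIterator` are hypothetical and not taken from any of the benchmarks; child pointers are assumed to be non-null. A recursive generator pushes every token to a callback in one call, whereas a pipeline's first stage must return one token per call, so the state the user identifies (here, the nodes still to visit) moves from the call stack to an explicit stack.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tree whose leaves are the tokens to generate.
struct Node {
    int value = 0;
    std::vector<const Node*> children;
};

// Original shape: a recursive generator that pushes every leaf to a callback.
template <typename Emit>
void visit_leaves(const Node& n, Emit emit) {
    if (n.children.empty()) { emit(n.value); return; }
    for (const Node* c : n.children) visit_leaves(*c, emit);
}

// Transformed shape: an iterator that keeps the pending nodes on an explicit stack.
class LeafIterator {
public:
    explicit LeafIterator(const Node* root) {
        if (root) stack_.push_back(root);
    }

    // Returns the next leaf, or nullptr when the traversal is finished,
    // matching the "null means no more input" convention of a first stage.
    const Node* next() {
        while (!stack_.empty()) {
            const Node* n = stack_.back();
            stack_.pop_back();
            if (n->children.empty()) return n;
            // Push children in reverse so they are visited left to right.
            for (auto it = n->children.rbegin(); it != n->children.rend(); ++it)
                stack_.push_back(*it);
        }
        return nullptr;
    }

private:
    std::vector<const Node*> stack_;
};
```

Both forms visit the leaves in the same order; only the second can be called once per token from a token-generating stage.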
