Advanced Optimization Techniques
for Complex Problems
Técnicas de Optimización Avanzadas para Problemas Complejos
TRACER::ULL - 2003
Barcelona, October 25th, 2003
http://www.tracer.ull.es
TIC2002-04498-C05-05
University of La Laguna
Outline
• Objectives
• Researchers
• Problems
• Branch and Bound and Divide and Conquer Skeletons
  - Knapsack Problem
  - Matrix Product
  - Constrained two-dimensional cutting stock problem
• CALL and LLAC: tools for Complexity Analysis
  - Symbolic regression problem
• An analytical model for Pipeline and Master-Slave algorithms over heterogeneous clusters
  - Resource allocation problem
  - Prediction of the RNA secondary structure problem
• Results
TRACER::ULL Objectives
• The main objective of TRACER::ULL is to solve the following complex problems efficiently by developing new optimization procedures:
  - Constrained two-dimensional cutting stock problem
  - Symbolic regression problem
  - Prediction of the RNA secondary structure problem
• We propose the design, implementation and evaluation of solving tools based on exact techniques:
  - Divide and Conquer
  - Branch and Bound
  - Dynamic Programming
• A further objective is to provide sequential, parallel and distributed implementations for academic problems:
  - Resource allocation problem
  - Knapsack problem
  - Matrix Product
• A second research track concerns building a methodology, and the associated tool, for the complexity and performance analysis of both sequential and parallel algorithms.
• Another goal is the implementation of:
  - An Internet execution system
  - A problem repository
  - Performance analysis
Web site: http://www.tracer.ull.es
Researchers
• ULL Staff
  - Coromoto León Hernández
  - Isabel Dorta González
  - Daniel González Morales
  - Casiano Rodríguez León
  - Jesús Alberto González Martínez
• Foreign
  - Rumen Andonov
• Students
  - Juan Ramón González González
  - Gara Miranda Valladares
  - María Dolores Medina Barroso
• Research areas and labels shown on the slide
  - Branch and Bound
  - Dynamic Programming
  - Divide and Conquer
  - Performance Analysis Tools and Symbolic regression problem
  - Prediction of the RNA secondary structure problem
  - Two-dimensional cutting stock problem
  - Grants
Shared Memory Branch and Bound Skeletons
// shared variables {bqueue, bstemp, soltemp, data}
// private variables {auxSol, high, low}
// the initial subproblem is already inserted in the global shared queue
while (!bqueue.empty()) {
    nn = bqueue.getNumberOfNodes();
    nt = (nn > maxthread) ? maxthread : nn;
    data = new SubProblem[nt];
    for (int j = 0; j < nt; j++)
        data[j] = bqueue.remove();
    set.num.threads(nt);
    parallel forall (i = 0; i < nt; i++) {
        high = data[i].upper_bound(pbm, auxSol);
        if (high > bstemp) {
            low = data[i].lower_bound(pbm, auxSol);
            if (low > bstemp) {
                // critical region
                // only one thread can change the value at any time
                bstemp = low;
                soltemp = auxSol;
            }
            if (high != low) {
                // critical region
                // just one thread can insert subproblems in the queue at any time
                data[i].branch(pbm, bqueue);
            }
        }
    }
}
bestSol = bstemp;
sol = soltemp;
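The batch loop above can be sketched in Python for the 0-1 knapsack problem. This is a minimal illustration, not the skeleton's actual implementation: the names `SubProblem`, `solve`, `max_threads` and the greedy and fractional bounds are assumptions chosen for the example, and a thread pool stands in for the `parallel forall`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class SubProblem:          # hypothetical node type for this sketch
    level: int             # index of the next object to decide
    capacity: int          # remaining knapsack capacity
    profit: int            # profit accumulated so far

def lower_bound(sp, items):
    """Greedy feasible completion: take every remaining object that fits."""
    cap, val = sp.capacity, sp.profit
    for p, w in items[sp.level:]:
        if w <= cap:
            cap, val = cap - w, val + p
    return val

def upper_bound(sp, items):
    """Fractional relaxation; items must be sorted by profit/weight, descending."""
    cap, val = sp.capacity, float(sp.profit)
    for p, w in items[sp.level:]:
        if w <= cap:
            cap, val = cap - w, val + p
        else:
            return val + p * cap / w
    return val

def branch(sp, items):
    """Children: skip the next object, or take it when it fits."""
    p, w = items[sp.level]
    children = [SubProblem(sp.level + 1, sp.capacity, sp.profit)]
    if w <= sp.capacity:
        children.append(SubProblem(sp.level + 1, sp.capacity - w, sp.profit + p))
    return children

def solve(items, capacity, max_threads=4):
    """items: list of (profit, weight) pairs with positive weights."""
    items = sorted(items, key=lambda pw: pw[0] / pw[1], reverse=True)
    queue = [SubProblem(0, capacity, 0)]   # shared queue with the initial subproblem
    state = {"best": 0}                    # shared incumbent value
    lock = threading.Lock()

    def process(sp):
        high = upper_bound(sp, items)
        if high <= state["best"]:
            return                          # bounded: cannot improve the incumbent
        low = lower_bound(sp, items)
        with lock:                          # critical region
            if low > state["best"]:
                state["best"] = low
            if high != low and sp.level < len(items):
                queue.extend(branch(sp, items))

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        while queue:
            nt = min(len(queue), max_threads)
            batch = [queue.pop() for _ in range(nt)]
            list(pool.map(process, batch))  # wait until the whole batch is done
    return state["best"]
```

For example, `solve([(60, 10), (100, 20), (120, 30)], 50)` explores a handful of nodes and returns the best profit found. The sketch guards both the incumbent update and the queue insertion with a single lock, which is simpler than two separate critical regions.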
0-1 Knapsack Problem
The 0/1 Knapsack Problem can be stated as follows:
"We have been provided with a knapsack of capacity C and with a
set of N objects; p[k] and w[k] are the profit and weight associated
to object k. Without exceeding the capacity of the knapsack, the
objects must be inserted into the knapsack providing the maximum
profit".
$$\max \sum_{k=1}^{N} p_k x_k$$
subject to:
$$\sum_{k=1}^{N} w_k x_k \le C, \qquad x_k \in \{0,1\}, \quad k = 1,\dots,N$$
Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons Ltd. (1990)
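Since Dynamic Programming is one of the exact techniques listed in the objectives, the formulation above can be illustrated with the classical table-based recurrence. This is a minimal sketch that assumes integer, non-negative weights; the function name `knapsack_dp` is hypothetical and not part of the project's skeletons.

```python
def knapsack_dp(p, w, C):
    """Return (best profit, x) for profits p, integer weights w and capacity C."""
    n = len(p)
    # f[k][c] = best profit using only the first k objects with capacity c
    f = [[0] * (C + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        for c in range(C + 1):
            f[k][c] = f[k - 1][c]                       # x_k = 0
            if w[k - 1] <= c:                           # x_k = 1 if it fits
                f[k][c] = max(f[k][c], f[k - 1][c - w[k - 1]] + p[k - 1])
    # Walk the table backwards to recover the 0/1 vector x.
    x, c = [0] * n, C
    for k in range(n, 0, -1):
        if f[k][c] != f[k - 1][c]:
            x[k - 1] = 1
            c -= w[k - 1]
    return f[n][C], x
```

With `p = [60, 100, 120]`, `w = [10, 20, 30]` and `C = 50`, the call returns the profit together with the vector of selected objects. The table needs O(N·C) time and memory, which is why bound-based methods are preferred for large capacities.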
Comparison between MPI and OpenMP skeletons
[Figure: KNP No Sol, N = 100,000. Speedup (axis from 0 to 7) versus number of processors (2, 3, 4, 8, 16, 24, 32) for the MPI and OpenMP skeletons. Machine: Origin 3000, CIEMAT.]
Distributed Branch and Bound skeleton
• Initialization Phase
• Resolution Phase
  - Conditional Communication
    Message Reception
    Avoiding starvation
  - Compute
  - Best bound Propagation
  - Work querying
  - Ending resolution phase
• Solution Building
[Figure: Knapsack, N = 50,000, ULL cluster. Speedup (axis from 0 to 5) versus number of processors (0 to 10) for four configurations: ULL 500 MHz, ULL 800 MHz, ULL 800-500 MHz and ULL 1400 MHz.]
Matrix Product
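Let
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$$
Definition:
$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
Strassen algorithm:
$$P_1 = (A_{12} - A_{22})(B_{21} + B_{22})$$
$$P_2 = (A_{11} + A_{22})(B_{11} + B_{22})$$
$$P_3 = (A_{11} - A_{21})(B_{11} + B_{12})$$
$$P_4 = (A_{11} + A_{12})\, B_{22}$$
$$P_5 = A_{11}\, (B_{12} - B_{22})$$
$$P_6 = A_{22}\, (B_{21} - B_{11})$$
$$P_7 = (A_{21} + A_{22})\, B_{11}$$
$$C = \begin{pmatrix} P_1 + P_2 - P_4 + P_6 & P_4 + P_5 \\ P_6 + P_7 & P_2 - P_3 + P_5 - P_7 \end{pmatrix}$$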
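The seven products and the block layout of C translate directly into a recursive divide-and-conquer routine. The following is a minimal sequential sketch using the usual textbook signs for P1 to P7; it assumes square matrices whose size is a power of two, stored as nested lists, and the helper names `add`, `sub`, `split` and `naive` are illustrative only.

```python
def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def split(M):
    """Return the four quadrants M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B):
    n = len(A)
    if n == 1:                                   # base case: scalar product
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    P1 = strassen(sub(A12, A22), add(B21, B22))  # seven recursive products
    P2 = strassen(add(A11, A22), add(B11, B22))
    P3 = strassen(sub(A11, A21), add(B11, B12))
    P4 = strassen(add(A11, A12), B22)
    P5 = strassen(A11, sub(B12, B22))
    P6 = strassen(A22, sub(B21, B11))
    P7 = strassen(add(A21, A22), B11)
    C11 = add(sub(add(P1, P2), P4), P6)          # combine step
    C12 = add(P4, P5)
    C21 = add(P6, P7)
    C22 = sub(add(sub(P2, P3), P5), P7)
    top = [r + s for r, s in zip(C11, C12)]
    bottom = [r + s for r, s in zip(C21, C22)]
    return top + bottom

def naive(A, B):
    """Definition-based product, used here only to cross-check the result."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

The seven recursive calls are independent of each other, which is what a parallel divide-and-conquer skeleton can exploit by assigning them to different processors.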
Distributed Divide and Conquer skeleton
Two dimensional cutting stock Problem: User Interface
• We are given a large stock rectangle S of dimension $L \times W$ and n types of smaller rectangles (pieces), where the i-th type has dimension $l_i \times w_i$. The problem is to cut from the large rectangle a set of small rectangles such that:
  - All pieces have a fixed orientation, i.e., a piece of length l and width w is different from a piece of length w and width l (l ≠ w).
  - All applied cuts are of guillotine type, i.e., cuts that start from one edge and run parallel to the other two edges.
  - There are at most $b_i$ rectangles of type i in the cutting pattern (the demand constraint of the i-th piece).
  - The overall profit $\sum_{i=1}^{n} c_i x_i$, where $x_i$ denotes the number of rectangles of type i in the cutting pattern, is maximized.
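To make the guillotine and fixed-orientation conditions concrete, here is a small dynamic-programming sketch for the unconstrained relaxation of the problem, that is, with the demand limits on each piece type dropped. It is not the project's solver. Because it ignores the demand constraint, its value is only an upper bound on the constrained optimum. The function name and the `(l, w, c)` tuple layout are assumptions, and integer dimensions are assumed.

```python
from functools import lru_cache

def unconstrained_guillotine(L, W, pieces):
    """Best profit from an L x W stock using guillotine cuts only.

    pieces: list of (l_i, w_i, c_i) tuples with fixed orientation.
    Demand limits b_i are deliberately ignored (relaxation).
    """
    @lru_cache(maxsize=None)
    def best(l, w):
        # Option 1: keep the rectangle whole and place the best single piece.
        value = max((c for pl, pw, c in pieces if pl <= l and pw <= w), default=0)
        # Option 2: a vertical guillotine cut at position x.
        for x in range(1, l // 2 + 1):
            value = max(value, best(x, w) + best(l - x, w))
        # Option 3: a horizontal guillotine cut at position y.
        for y in range(1, w // 2 + 1):
            value = max(value, best(l, y) + best(l, w - y))
        return value

    return best(L, W)
```

For instance, `unconstrained_guillotine(4, 4, [(3, 3, 10), (1, 1, 1)])` places one 3×3 piece and fills the two remaining strips with unit pieces. A 2×3 piece does not fit into a 3×2 stock because rotation is not allowed.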
Performance: CALL & LLAC
• Parallel Architectures: processors, each with its own memory, connected by a communication network
• Standard Libraries: MPI, PVM
• We need a well-accepted Parallel Computing Model: BSP, LogP, ...
CALL & LLAC Architecture
Performance: CALL & LLAC
$C_0 + C_1 N + C_2 N^2 + C_3 N^3$
#pragma cll mp mp[0] + mp[1]*N + mp[2]*N*N + mp[3]*N*N*N
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        sum = 0;
        for (k = 0; k < N; k++)
            sum += A(i,k) * B(k,j);
        C(i,j) = sum;
    }
}
#pragma cll end mp
Square matrix product: A, B and C of dimension N×N.
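The pragma declares a cubic model whose four coefficients have to be estimated from measured running times. One plausible way to do that is ordinary least squares, sketched below in plain Python. This illustrates the idea only and is not a description of how CALL or LLAC compute the coefficients; the function names `fit_polynomial` and `predict` are hypothetical.

```python
def fit_polynomial(ns, times, degree=3):
    """Least-squares coefficients [C0, ..., Cd] of sum_k Ck * N**k."""
    m = degree + 1
    # Normal equations (X^T X) c = X^T t for the Vandermonde matrix X.
    a = [[sum(n ** (i + j) for n in ns) for j in range(m)] for i in range(m)]
    b = [sum(t * n ** i for n, t in zip(ns, times)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    return coef

def predict(coef, n):
    """Evaluate the fitted model, e.g. mp[0] + mp[1]*N + mp[2]*N*N + mp[3]*N*N*N."""
    return sum(c * n ** k for k, c in enumerate(coef))
```

Given at least four distinct problem sizes and their timings, `fit_polynomial` returns values playing the role of `mp[0]` to `mp[3]`, and `predict` extrapolates the running time to sizes that were not measured.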
Measuring and Predicting Performance
while (!bqueue.empty()) {
    auxSp = bqueue.remove();                    // pop a problem from the local queue
#pragma cll code numvis++;
    high = auxSp.upper_bound(pbm, auxSol);      // upper bound
    if (high > bestSol) {
        low = auxSp.lower_bound(pbm, auxSol);   // lower bound
        if (low > bestSol) {
            bestSol = low;
            sol = auxSol;
            outputPacket.send(MASTER, SOLVE_TAG, bestSol, sol);
        }
        if (high != low) {
            // calculate the number of required slaves
            rSlaves = bqueue.getNumberOfNodes();
            op.send(MASTER, BnB_TAG, high, rSlaves);
            inputPacket.recv(MASTER, nfSlaves, bestSol, rank {1,..., nfSlaves});
            if (nfSlaves >= 0) {
                auxSp.branch(pbm, bqueue);      // branch and save in the local queue
                for (i = 0; i < nfSlaves; i++) {
                    // send subproblems to the assigned slaves
                    auxSp = bqueue.remove();
#pragma cll code numvis++;
                    outputPacket.send(rank, PBM_TAG, auxSp, bestSol, sol);
                }
            } // if nfSlaves == DONE the problem is bounded (cut)
        }
    }
}
How to compile?
[Figure: compilation flow. Files and tools shown: kpr.c, call, kpr.cll.c, kpr.cll.h, cc, the executable kpr, and the data files kpr.c.dat, kpr.c.dat.1, ..., kpr.c.dat.n.]
EXPERIMENT: "kps"
BEGIN_LINE: 115
END_LINE: 119
FORMULA: p 0 p 1 v 0 * +
INFORMULA: kps[0]+kps[1]*numvis
MAXTESTS: 131072
DIMENSION: 2
PARAMETERS:
NUMIDENTS: 1
IDENTS: numvis
OBSERVABLES: CLOCK
COMPONENTS: 1 numvis
POSTFIX_COMPONENT_0: 1
POSTFIX_COMPONENT_1: v 0
NUMTESTS: 1
SAMPLE:
CPU  NCPUS  numvis    CLOCK
0    1      261134.0  0.16491100
Number of visited Nodes Study
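The experiment file stores the model twice: as the infix string `kps[0]+kps[1]*numvis` and as the postfix string `p 0 p 1 v 0 * +`. Comparing the two suggests that `p i` pushes parameter i and `v i` pushes variable i. The evaluator below is a sketch built on that reading. It is an assumption about the file format, not documented CALL behaviour, and it handles only the two operators that appear in this sample.

```python
def eval_postfix(formula, params, variables):
    """Evaluate a postfix formula such as 'p 0 p 1 v 0 * +'.

    Assumed token meaning: 'p i' -> params[i], 'v i' -> variables[i].
    """
    tokens = formula.split()
    stack = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("p", "v"):
            index = int(tokens[i + 1])          # operand index follows the tag
            stack.append(params[index] if tok == "p" else variables[index])
            i += 2
        elif tok in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if tok == "+" else left * right)
            i += 1
        else:
            raise ValueError("unsupported token: " + tok)
    return stack.pop()
```

With hypothetical parameter values `[2.0, 3.0]` and `numvis = 5.0`, the postfix formula evaluates to the same number as the infix form `kps[0]+kps[1]*numvis`.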
Measuring and Predicting Performance
int main(int argc, char ** argv) {
    number sol;
    readKnap(data);
#pragma cll code double numvis = 0.0;
#pragma cll kps kps[0]*unknown(numvis) posteriori numvis
    /* arguments: next object, remaining capacity, profit */
    sol = knap(0, M, 0);
#pragma cll end kps
    printf("\nsol = %d\n", sol);   /* %d assumes `number` is an integer type */
#pragma cll report all
    return 0;
}
Symbolic Regression Problem
• Find the unknown complexity formula starting from the experimental data gathered by CALL.
• We can use Symbolic Regression: the induction of mathematical expressions on data. Rather than searching for the values of the regression constants, the object of the search is a symbolic description of the system.
• See Scientific Discovery using Genetic Programming by Maarten Keijzer, 2001. http://www.cs.vu.nl/~mkeijzer/publications/thesis/
• Currently we use a fitness function that measures the error of the predictions “on the asymptotic side” using linear regression on a small sub-sample.
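The fitness idea in the last bullet can be illustrated with a deliberately simplified stand-in. Instead of evolving expressions with genetic programming, the sketch enumerates a fixed, hypothetical list of candidate formulas, fits one scale constant to each by linear regression on the smallest problem sizes, and scores it by its relative prediction error on the larger sizes. The candidate set, the sub-sample size `sub` and all function names are assumptions.

```python
import math

# Hypothetical candidate set; a real symbolic-regression search would
# generate expressions instead of enumerating a fixed list.
CANDIDATES = {
    "N": lambda n: n,
    "N*log(N)": lambda n: n * math.log(n),
    "N^2": lambda n: n ** 2,
    "N^3": lambda n: n ** 3,
}

def fit_scale(g, ns, ts):
    """Least-squares constant c minimizing sum (t - c*g(n))**2."""
    num = sum(t * g(n) for n, t in zip(ns, ts))
    den = sum(g(n) ** 2 for n in ns)
    return num / den

def asymptotic_error(g, ns, ts, sub=3):
    """Fit on the `sub` smallest sizes, score on the remaining larger ones."""
    pairs = sorted(zip(ns, ts))
    small, large = pairs[:sub], pairs[sub:]
    c = fit_scale(g, [n for n, _ in small], [t for _, t in small])
    return sum(((t - c * g(n)) / t) ** 2 for n, t in large)

def best_formula(ns, ts, sub=3):
    """Name of the candidate with the smallest error on the large sizes."""
    return min(CANDIDATES,
               key=lambda name: asymptotic_error(CANDIDATES[name], ns, ts, sub))
```

Scoring on sizes that were not used for fitting penalizes formulas that match small inputs well but grow at the wrong rate.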
Prediction of the RNA Secondary Structure Problem
• RNA molecule: a string of n characters $R = r_1 r_2 \dots r_n$ such that $r_i \in \{A, C, G, U\}$.
• Nucleotides join, releasing free energy, in the pairs A-U, G-U and C-G.
• The iteration space is n × n triangular.
• Dependences are nonuniform: there are dependences among non-consecutive stages.
$$E(S_{i,j}) = \min \left\{ E(S_{i+1,j-1}) + \alpha(r_i, r_j), \ \min_{i < k \le j} \left\{ E(S_{i,k-1}) + E(S_{k,j}) \right\} \right\}$$
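The recurrence can be evaluated diagonal by diagonal over the triangular iteration space. The sketch below does so sequentially. The pairing energies in `PAIR_ENERGY` are invented placeholders, since the slides name the admissible pairs but give no values. The pairing-energy term of the recurrence is called `alpha` in the code, and pairs that are not admissible get infinite energy so that they are never chosen. No minimum loop length is enforced.

```python
import math

# Hypothetical energies for the three admissible pairs (not from the slides).
PAIR_ENERGY = {frozenset("AU"): -2, frozenset("GU"): -1, frozenset("CG"): -3}

def alpha(a, b):
    """Energy released when a pairs with b; infinite if the pair is not allowed."""
    return PAIR_ENERGY.get(frozenset((a, b)), math.inf)

def min_energy(r):
    """Minimum free energy E(S_{1,n}) of the RNA string r."""
    n = len(r)
    if n == 0:
        return 0
    # E[i][j] is 0 for single bases (i == j); empty intervals are treated as 0.
    E = [[0] * n for _ in range(n)]
    for span in range(1, n):                 # one diagonal per stage
        for i in range(n - span):
            j = i + span
            inner = E[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = inner + alpha(r[i], r[j])             # r_i pairs with r_j
            for k in range(i + 1, j + 1):                # split at k
                best = min(best, E[i][k - 1] + E[k][j])
            E[i][j] = best
    return E[0][n - 1]
```

Each entry on diagonal `span` reads entries from all shorter diagonals, not only the previous one, which is the nonuniform dependence pattern mentioned above and the reason a plain pipeline mapping needs care.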
TRACER::ULL 2003 Results
• Journals:
  - Authors: Dorta, León, Rodríguez
    Title: Comparing MPI and OpenMP Implementations of the 0-1 Knapsack Problem
    Journal: Parallel and Distributed Computing Practices. ISSN 1097-2803 (Accepted)
    Date: 2003
  - Authors: Blanco V., García L., González J.A., Rodríguez C., Rodríguez G.
    Title: A Performance Model for the Analysis of OpenMP Programs
    Journal: Parallel and Distributed Computing Practices. ISSN 1097-2803 (Accepted)
    Date: 2003
• International Conferences:
  - Blanco V., González J. A., León C., Rodríguez C., Rodríguez G. “From Complexity Analysis to Performance Analysis”. Euro-Par 2003. International Conference on Parallel and Distributed Computing. Klagenfurt, Austria. 26-29 August, 2003.
  - Dorta I., León C., Rodríguez C., Rojas A. “Parallel Skeletons for Divide and Conquer and Branch and Bound Techniques”. 11th Euromicro Conference on Parallel and Network-Based Processing. ISSN 1066-6192. Genova, Italy. 5-7 February, 2003.
  - Dorta I., León C., Rodríguez C. “A comparison between MPI and OpenMP Branch-and-Bound Skeletons”. 8th International Workshop on High-Level Parallel Programming Models and Supportive Environments. ISBN 0-7695-1880-X. Nice, France. 22 April, 2003.
  - Dorta I., León C., Rodríguez C., Rojas A. “Parallel Skeletons. Branch-and-Bound and Divide-and-Conquer Techniques”. TAM User Group Meeting 2003. Barcelona, Spain. 16 May, 2003.
  - Dorta I., León C., Rodríguez C., Rojas A. “MPI and OpenMP implementations of Branch and Bound Skeletons”. ParCo2003. Dresden, Germany. 2-5 September, 2003.
  - Dorta I., León C., Rodríguez C. “Parallel Branch and Bound Skeletons: Message Passing and Shared Memory Implementations”. 5th International Conference on Parallel Processing and Applied Mathematics. Czestochowa, Poland. 7-10 September, 2003.
  - García L., González J.A., González J.C., León C., Rodríguez C., Rodríguez G. “Complexity Driven Performance Analysis”. 10th EuroPVM/MPI 2003. Venice, Italy. Sep 29 - Oct 2, 2003.
• National Conferences:
  - Dorta I., León C., Rodríguez C., Rodríguez G., Rojas A. “Complejidad Algorítmica: de la Teoría a la Práctica”. JENUI’03 (Jornadas de Enseñanza Universitaria de la Informática). ISBN 84-283-2845-5. Cádiz. 9-11 July, 2003.
  - González J.R., León C., Rodríguez C. “Un esqueleto para Ramificación y Acotación Distribuido”. XIV Jornadas de Paralelismo. Leganés (Madrid). 15-17 September, 2003.
• PFC:
  - González J. R. “Esqueletos Paralelos Distribuidos. Paradigmas de Ramificación y Acotación y Divide y Vencerás”. Documento de Trabajo Interno del DEIOC: DT-03-07. July 2003.