Presentation

Advanced Optimization Techniques for Complex Problems
(Técnicas de Optimización Avanzadas para Problemas Complejos)
TRACER::ULL - 2003
Barcelona, October 25th, 2003
http://www.tracer.ull.es
TIC2002-04498-C05-05
University of La Laguna

Outline
• Objectives
• Researchers
• Problems
• Branch and Bound and Divide and Conquer Skeletons
  - Knapsack Problem
  - Matrix Product
  - Constrained two-dimensional cutting stock problem
• CALL and LLAC: tools for Complexity Analysis
  - Symbolic regression problem
• An analytical model for Pipeline and Master-Slave algorithms over heterogeneous clusters
  - Resource allocation problem
  - Prediction of the RNA secondary structure problem
• Results

TRACER::ULL Objectives
• The main objective of TRACER::ULL is to solve the following complex problems efficiently by developing new optimization procedures:
  - Constrained two-dimensional cutting stock problem
  - Symbolic regression problem
  - Prediction of the RNA secondary structure problem
• We propose the design, implementation and evaluation of solving tools based on exact techniques:
  - Divide and Conquer
  - Branch and Bound
  - Dynamic Programming
• A further objective is to provide sequential, parallel and distributed implementations for academic problems:
  - Resource allocation problem
  - Knapsack problem
  - Matrix Product
• A second research track concerns building a methodology, and an associated tool, for the complexity and performance analysis of both sequential and parallel algorithms.
• Another goal is the implementation of:
  - An Internet execution system
  - A problem repository
  - Performance Analysis
Web site: http://www.tracer.ull.es

Researchers
• ULL Staff
  - Coromoto León Hernández
  - Isabel Dorta González
  - Daniel González Morales
  - Casiano Rodríguez León
  - Jesús Alberto González Martínez
• Foreign
  - Rumen Andonov
• Students
  - Juan Ramón González González
  - Gara Miranda Valladares
  - María Dolores Medina Barroso
• Areas and labels listed alongside the names: Branch and Bound; Dynamic Programming; Performance Analysis Tools and Symbolic regression problem; Prediction of the RNA secondary structure problem; Divide and Conquer; Grants; two-dimensional cutting stock problem

Shared Memory Branch and Bound Skeletons

    // shared variables {bqueue, bstemp, soltemp, data}
    // private variables {auxSol, high, low}
    // the initial subproblem is already inserted in the global shared queue
    while (!bqueue.empty()) {
        nn = bqueue.getNumberOfNodes();
        nt = (nn > maxthread) ? maxthread : nn;
        data = new SubProblem[nt];
        for (int j = 0; j < nt; j++)
            data[j] = bqueue.remove();
        set.num.threads(nt);
        parallel forall (i = 0; i < nt; i++) {
            high = data[i].upper_bound(pbm, auxSol);
            if (high > bstemp) {
                low = data[i].lower_bound(pbm, auxSol);
                if (low > bstemp) {
                    // critical region
                    // only one thread can change the value at any time
                    bstemp = low;
                    soltemp = auxSol;
                }
                if (high != low) {
                    // critical region
                    // just one thread can insert subproblems in the queue at any time
                    data[i].branch(pbm, bqueue);
                }
            }
        }
    }
    bestSol = bstemp;
    sol = soltemp;

0-1 Knapsack Problem
The 0/1 Knapsack Problem can be stated as follows: "We have been provided with a knapsack of capacity C and with a set of N objects; p[k] and w[k] are the profit and weight associated with object k. Without exceeding the capacity of the knapsack, the objects must be inserted into the knapsack providing the maximum profit".

\[ \max \sum_{k=1}^{N} p_k x_k \]
subject to:
\[ \sum_{k=1}^{N} w_k x_k \le C, \qquad x_k \in \{0, 1\}, \quad k = 1, \ldots, N \]

Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations.
John Wiley & Sons Ltd. (1990)

Comparison between MPI and OpenMP skeletons
[Figure: speedup versus number of processors (2, 3, 4, 8, 16, 24, 32) for the MPI and OpenMP skeletons. KNP, No Sol, N = 100,000. Machine: Origin 3000, CIEMAT.]

Distributed Branch and Bound skeleton
• Initialization Phase
• Resolution Phase
  - Conditional Communication
      Message Reception
      Avoiding starvation
  - Compute
  - Best bound Propagation
  - Work querying
  - Ending resolution phase
• Solution Building

Distributed Branch and Bound skeleton
[Figure: speedup versus number of processors, Knapsack N = 50,000, ULL. Series: ULL 500 MHz, ULL 800 MHz, ULL 800-500 MHz, ULL 1400 MHz.]

Matrix Product
Let
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \]
Definition:
\[ C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} \]
Strassen algorithm:
\[
\begin{aligned}
P_1 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\
P_2 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
P_3 &= (A_{11} - A_{21})(B_{11} + B_{12}) \\
P_4 &= (A_{11} + A_{12})\, B_{22} \\
P_5 &= A_{11} (B_{12} - B_{22}) \\
P_6 &= A_{22} (B_{21} - B_{11}) \\
P_7 &= (A_{21} + A_{22})\, B_{11}
\end{aligned}
\]
\[ C = \begin{pmatrix} P_1 + P_2 - P_4 + P_6 & P_4 + P_5 \\ P_6 + P_7 & P_2 - P_3 + P_5 - P_7 \end{pmatrix} \]

Distributed Divide and Conquer skeleton

Two-dimensional cutting stock problem: User Interface
• In this problem we are given a large stock rectangle S of dimension L × W and n types of smaller rectangles (pieces), where the i-th type has dimension l_i × w_i. Furthermore, each type i is associated with a profit c_i and a demand constraint b_i. The problem is to cut from the large rectangle a set of small rectangles such that:
  - All pieces have a fixed orientation, i.e., a piece of length l and width w is different from a piece of length w and width l (l ≠ w).
  - All applied cuts are of guillotine type, i.e., cuts that start from one edge and run parallel to the other two edges.
  - There are at most b_i rectangles of type i in the cutting pattern (the demand constraint of the i-th piece).
  - The overall profit, \(\sum_{i=1}^{n} c_i x_i\), where x_i denotes the number of rectangles of type i in the cutting pattern, is maximized.

Performance: CALL & LLAC
• Parallel architectures: processors, each with its own memory, connected by a communication network
• Standard libraries: MPI, PVM
• We need a well-accepted parallel computing model: BSP, LogP, ...

CALL & LLAC Architecture

Performance: CALL & LLAC
\[ C_0 + C_1 N + C_2 N^2 + C_3 N^3 \]

    #pragma cll mp mp[0] + mp[1]*N + mp[2]*N*N + mp[3]*N*N*N
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            sum = 0;
            for (k = 0; k < N; k++)
                sum += A(i,k) * B(k,j);
            C(i,j) = sum;
        }
    }
    #pragma cll end mp

Square matrix product, with A, B and C of dimension N×N.

Measuring and Predicting Performance

    while (!bqueue.empty()) {
        auxSp = bqueue.remove();  // pop a problem from the local queue
    #pragma cll code numvis++;
        high = auxSp.upper_bound(pbm, auxSol);  // upper bound
        if (high > bestSol) {
            low = auxSp.lower_bound(pbm, auxSol);  // lower bound
            if (low > bestSol) {
                bestSol = low;
                sol = auxSol;
                outputPacket.send(MASTER, SOLVE_TAG, bestSol, sol);
            }
            if (high != low) {
                // calculate the number of required slaves
                rSlaves = bqueue.getNumberOfNodes();
                op.send(MASTER, BnB_TAG, high, rSlaves);
                inputPacket.recv(MASTER, nfSlaves, bestSol, rank {1,..., nfSlaves});
                if (nfSlaves >= 0) {
                    auxSp.branch(pbm, bqueue);  // branch and save in the local queue
                    for i=0, nfSlaves {  // send subproblems to the assigned slaves
                        auxSp = bqueue.remove();
    #pragma cll code numvis++;
                        outputPacket.send(rank, PBM_TAG, auxSp, bestSol, sol);
                    }
                }  // if nfSlaves == DONE the problem is bounded (cut)
            }
        }
    }

How to compile?
Files and tools shown in the compilation diagram: kpr.cll.h, kpr.c, call, kpr.c.dat, kpr, kpr.c.dat.1, ......
kpr.c.dat.n, kpr.cll.c, cc

    EXPERIMENT: "kps"
    BEGIN_LINE: 115
    END_LINE: 119
    FORMULA: p 0 p 1 v 0 * +
    INFORMULA: kps[0]+kps[1]*numvis
    MAXTESTS: 131072
    DIMENSION: 2
    PARAMETERS:
    NUMIDENTS: 1
    IDENTS: numvis
    OBSERVABLES: CLOCK
    COMPONENTS: 1 numvis
    POSTFIX_COMPONENT_0: 1
    POSTFIX_COMPONENT_1: v 0
    NUMTESTS: 1
    SAMPLE: CPU NCPUS numvis CLOCK
    0 1 261134.0 0.16491100

kpr Number of visited Nodes Study

Measuring and Predicting Performance

    int main(int argc, char ** argv) {
        number sol;
        readKnap(data);
    #pragma cll code double numvis = 0.0;
    #pragma cll kps kps[0]*unknown(numvis) posteriori numvis
        /* next object, remaining capacity, profit */
        sol = knap(0, M, 0);
    #pragma cll end kps
        printf("\nsol = ", sol);
    #pragma cll report all
        return 0;
    }

Symbolic Regression Problem
• Find the unknown complexity formula starting from the experimental data gathered by CALL.
• We can use Symbolic Regression: the induction of mathematical expressions on data. Rather than searching for the values of the regression constants, the object of the search is a symbolic description of the system.
• See Scientific Discovery using Genetic Programming by Maarten Keijzer, 2001. http://www.cs.vu.nl/~mkeijzer/publications/thesis/
• Currently we use a fitness function that measures the error of the predictions "on the asymptotic side" using linear regression on a small sub-sample.

Prediction of the RNA Secondary Structure Problem
• RNA molecule: a string of n characters R = r_1 r_2 ... r_n such that r_i ∈ {A, C, G, U}
• Nucleotides pair, releasing free energy: AU, GU, CG
• The iteration space is n × n triangular
• Dependences are nonuniform: there are dependences among non-consecutive stages

\[ E(S_{i,j}) = \min \left\{ E(S_{i+1,j-1}) + \alpha(r_i, r_j), \; \min_{i < k \le j} \left\{ E(S_{i,k-1}) + E(S_{k,j}) \right\} \right\} \]

TRACER::ULL 2003 Results
• Journals:
  - Authors: Dorta, León, Rodríguez
    Title: Comparing MPI and OpenMP Implementations of the 0-1 Knapsack Problem
    Journal: Parallel and Distributed Computing Practices.
    ISSN 1097-2803 (Accepted)
    Date: 2003
  - Authors: Blanco V., García L., González J.A., Rodríguez C., Rodríguez G.
    Title: A Performance Model for the Analysis of OpenMP Programs
    Journal: Parallel and Distributed Computing Practices. ISSN 1097-2803 (Accepted)
    Date: 2003
• International Conferences:
  - Blanco V., González J.A., León C., Rodríguez C., Rodríguez G. "From Complexity Analysis to Performance Analysis". Euro-Par 2003. International Conference on Parallel and Distributed Computing. Klagenfurt, Austria. 26-29 August, 2003.
  - Dorta I., León C., Rodríguez C., Rojas A. "Parallel Skeletons for Divide and Conquer and Branch and Bound Techniques". 11th Euromicro Conference on Parallel and Network-Based Processing. ISSN 1066-6192. Genova, Italy. 5-7 February, 2003.
  - Dorta I., León C., Rodríguez C. "A comparison between MPI and OpenMP Branch-and-Bound Skeletons". 8th International Workshop on High-Level Parallel Programming Models and Supportive Environments. ISBN 0-7695-1880-X. Nice, France. 22 April, 2003.
  - Dorta I., León C., Rodríguez C., Rojas A. "Parallel Skeletons. Branch-and-Bound and Divide-and-Conquer Techniques". TAM User Group Meeting 2003. Barcelona, Spain. 16 May, 2003.
  - Dorta I., León C., Rodríguez C., Rojas A. "MPI and OpenMP implementations of Branch and Bound Skeletons". ParCo2003. Dresden, Germany. 2-5 September, 2003.
  - Dorta I., León C., Rodríguez C. "Parallel Branch and Bound Skeletons: Message Passing and Shared Memory Implementations". 5th International Conference on Parallel Processing and Applied Mathematics. Czestochowa, Poland. 7-10 September, 2003.
  - García L., González J.A., González J.C., León C., Rodríguez C., Rodríguez G. "Complexity Driven Performance Analysis". 10th EuroPVM/MPI 2003. Venice, Italy. Sep 29 - Oct 2, 2003.
• National Conferences:
  - Dorta I., León C., Rodríguez C., Rodríguez G., Rojas A. "Complejidad Algorítmica: de la Teoría a la Práctica".
    JENUI'03 (Jornadas de Enseñanza Universitaria de la Informática). ISBN 84-283-2845-5. Cádiz. 9-11 July, 2003.
  - González J.R., León C., Rodríguez C. "Un esqueleto para Ramificación y Acotación Distribuido". XIV Jornadas de Paralelismo. Leganés (Madrid). 15-17 September, 2003.
• Final-year project (PFC):
  - González J.R. "Esqueletos Paralelos Distribuidos. Paradigmas de Ramificación y Acotación y Divide y Vencerás". DEIOC Internal Working Document: DT-03-07. July 2003.
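
As an illustration of the bound-and-branch loop in the Branch and Bound skeleton slides, here is a minimal sequential sketch for the 0-1 knapsack problem. It is not the TRACER::ULL skeleton code. The `SubProblem` and `Knapsack` types, the function `solve_knapsack`, the FIFO queue, and the choice of bounds (greedy completion as lower bound, fractional relaxation as upper bound) are assumptions made for this example. Only the names `upper_bound`, `lower_bound`, `bqueue`, `high` and `low` echo the slides. Weights are assumed to be positive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <queue>
#include <vector>

// Hypothetical subproblem: items [0, level) are already decided.
struct SubProblem {
    std::size_t level;  // next item to decide
    long weight;        // weight used so far
    long profit;        // profit collected so far
};

// Hypothetical problem instance; items are stored by decreasing profit/weight.
struct Knapsack {
    long capacity;
    std::vector<long> w, p;

    // Greedy completion gives a feasible solution, hence a lower bound.
    long lower_bound(const SubProblem& s) const {
        long room = capacity - s.weight, value = s.profit;
        for (std::size_t k = s.level; k < w.size(); ++k)
            if (w[k] <= room) { room -= w[k]; value += p[k]; }
        return value;
    }

    // Fractional relaxation: fill greedily, then take a fraction of the
    // first item that does not fit. This is an upper bound.
    double upper_bound(const SubProblem& s) const {
        long room = capacity - s.weight;
        double value = static_cast<double>(s.profit);
        for (std::size_t k = s.level; k < w.size(); ++k) {
            if (w[k] <= room) { room -= w[k]; value += p[k]; }
            else { value += static_cast<double>(p[k]) * room / w[k]; break; }
        }
        return value;
    }
};

long solve_knapsack(long capacity, std::vector<long> w, std::vector<long> p) {
    // Sort item indices by decreasing profit/weight ratio (cross-multiplied).
    std::vector<std::size_t> idx(w.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(), [&](std::size_t a, std::size_t b) {
        return p[a] * w[b] > p[b] * w[a];
    });
    Knapsack pbm{capacity, {}, {}};
    for (std::size_t i : idx) { pbm.w.push_back(w[i]); pbm.p.push_back(p[i]); }

    long best = 0;
    std::queue<SubProblem> bqueue;
    bqueue.push({0, 0, 0});  // the initial subproblem
    while (!bqueue.empty()) {
        SubProblem sp = bqueue.front();
        bqueue.pop();
        double high = pbm.upper_bound(sp);
        if (high > best) {                 // otherwise the subproblem is cut
            long low = pbm.lower_bound(sp);
            if (low > best) best = low;    // improved incumbent
            if (high != low && sp.level < pbm.w.size()) {
                std::size_t k = sp.level;  // branch on item k
                if (sp.weight + pbm.w[k] <= capacity)
                    bqueue.push({k + 1, sp.weight + pbm.w[k], sp.profit + pbm.p[k]});
                bqueue.push({k + 1, sp.weight, sp.profit});
            }
        }
    }
    return best;
}
```

For example, `solve_knapsack(50, {10, 20, 30}, {60, 100, 120})` evaluates to 220. A parallel version would need the incumbent update and the queue insertions protected as critical regions, as the shared-memory slide indicates.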

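The recurrence on the RNA secondary structure slide can be evaluated bottom-up over the triangular iteration space, by increasing segment length. The sketch below assumes a simple energy term that the slides do not specify: `alpha` returns -1 for the pairs AU, GU and CG (in either order) and 0 otherwise, and empty or single-nucleotide segments have energy 0. The names `alpha` and `min_energy` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Assumed pairing energy: -1 for AU, GU, CG (either order), 0 otherwise.
int alpha(char a, char b) {
    auto is = [&](char x, char y) {
        return (a == x && b == y) || (a == y && b == x);
    };
    return (is('A', 'U') || is('G', 'U') || is('C', 'G')) ? -1 : 0;
}

// Evaluates
//   E(S_{i,j}) = min{ E(S_{i+1,j-1}) + alpha(r_i, r_j),
//                     min_{i<k<=j} { E(S_{i,k-1}) + E(S_{k,j}) } }
// with 0-based indices. Entries with j <= i stay at 0 (assumed base case).
int min_energy(const std::string& r) {
    const int n = static_cast<int>(r.size());
    if (n < 2) return 0;
    std::vector<std::vector<int>> E(n, std::vector<int>(n, 0));
    for (int len = 2; len <= n; ++len) {          // segment length
        for (int i = 0; i + len - 1 < n; ++i) {   // segment start
            const int j = i + len - 1;            // segment end
            int best = E[i + 1][j - 1] + alpha(r[i], r[j]);
            for (int k = i + 1; k <= j; ++k)      // split point
                best = std::min(best, E[i][k - 1] + E[k][j]);
            E[i][j] = best;
        }
    }
    return E[0][n - 1];
}
```

Each entry E[i][j] depends on entries from all shorter segments in its row and column, which is the nonuniform dependence pattern the slide mentions. Under the assumed energies, `min_energy("ACGU")` is -2 (A pairs with U, C with G).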