Advanced Optimization Techniques for Complex Problems
(Técnicas de Optimización Avanzadas para Problemas Complejos)
TRACER::ULL - 2003
Barcelona, October 25th, 2003
http://www.tracer.ull.es
TIC2002-04498-C05-05
University of La Laguna

Outline
• Objectives
• Researchers
• Problems
• Branch and Bound and Divide and Conquer Skeletons
    • Knapsack Problem
    • Matrix Product
    • Constrained two-dimensional cutting stock problem
• CALL and LLAC: tools for Complexity Analysis
    • Symbolic regression problem
• An analytical model for Pipeline and Master-Slave algorithms over heterogeneous clusters
    • Resource allocation problem
    • Prediction of the RNA secondary structure problem
• Results

TRACER::ULL Objectives
• The main objective of TRACER::ULL is to solve the following complex problems efficiently by developing new optimization procedures:
    • Constrained two-dimensional cutting stock problem
    • Symbolic regression problem
    • Prediction of the RNA secondary structure problem
• We propose to design, implement and evaluate solvers based on exact techniques:
    • Divide and Conquer
    • Branch and Bound
    • Dynamic Programming
• A further objective is to provide sequential, parallel and distributed implementations for academic problems:
    • Resource allocation problem
    • Knapsack problem
    • Matrix Product
• A second research track builds a methodology, and an associated tool, for the complexity and performance analysis of both sequential and parallel algorithms.
• Another goal is to implement:
    • An Internet execution system
    • A problem repository
    • Performance analysis
  Web site: http://www.tracer.ull.es

Researchers
• ULL Staff: Coromoto León Hernández, Isabel Dorta González, Daniel González Morales, Casiano Rodríguez León, Jesús Alberto González Martínez
• Foreign: Rumen Andonov
• Students: Juan Ramón González González, Gara Miranda Valladares, María Dolores Medina Barroso
• Grants
• Research areas:
    • Branch and Bound
    • Dynamic Programming
    • Performance Analysis Tools and Symbolic regression problem
    • Prediction of the RNA secondary structure problem
    • Divide and Conquer
    • Two-dimensional cutting stock problem

Shared Memory Branch and Bound Skeletons

    // shared variables  {bqueue, bstemp, soltemp, data}
    // private variables {auxSol, high, low}
    // the initial subproblem is already inserted in the global shared queue
    while (!bqueue.empty()) {
        // take at most maxthread subproblems out of the shared queue
        nn = bqueue.getNumberOfNodes();
        nt = (nn > maxthread) ? maxthread : nn;
        data = new SubProblem[nt];
        for (int j = 0; j < nt; j++)
            data[j] = bqueue.remove();
        set.num.threads(nt);
        // each thread bounds (and possibly branches) one subproblem
        parallel forall (i = 0; i < nt; i++) {
            high = data[i].upper_bound(pbm, auxSol);
            if (high > bstemp) {
                low = data[i].lower_bound(pbm, auxSol);
                if (low > bstemp) {
                    // critical region
                    // only one thread can change the value at any time
                    bstemp = low;
                    soltemp = auxSol;
                }
                if (high != low) {
                    // critical region
                    // just one thread can insert subproblems in the queue at any time
                    data[i].branch(pbm, bqueue);
                }
            }
        }
    }
    bestSol = bstemp;
    sol = soltemp;

0-1 Knapsack Problem
The 0-1 Knapsack Problem can be stated as follows: we are given a knapsack of capacity C and a set of N objects, where p[k] and w[k] are the profit and weight of object k. The objects to insert into the knapsack must be chosen so that the total profit is maximal and the capacity of the knapsack is not exceeded.

\[ \max \sum_{k=1}^{N} p_k x_k \]
\[ \text{subject to: } \sum_{k=1}^{N} w_k x_k \le C, \qquad x_k \in \{0, 1\}, \quad k = 1, \dots, N \]

    Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons Ltd.
    (1990)

Comparison between MPI and OpenMP skeletons
[Figure: speedup versus number of processors (2, 3, 4, 8, 16, 24, 32) for the MPI and OpenMP skeletons. KNP No Sol - N = 100,000. Origin 3000 - CIEMAT]

Distributed Branch and Bound skeleton
• Initialization Phase
• Resolution Phase
    • Conditional Communication
    • Message Reception
    • Avoiding starvation
    • Compute
    • Best bound Propagation
    • Work querying
    • Ending resolution phase
• Solution Building
[Figure: speedup versus number of processors (0 to 10). Knapsack N = 50,000, ULL. Series: ULL 500 MHz, ULL 800 MHz, ULL 800-500 MHz, ULL 1400 MHz]

Matrix Product
Let
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \]
Definition:
\[ C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} \]
Strassen algorithm:
\[
\begin{aligned}
P_1 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\
P_2 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
P_3 &= (A_{11} - A_{21})(B_{11} + B_{12}) \\
P_4 &= (A_{11} + A_{12})\, B_{22} \\
P_5 &= A_{11}\,(B_{12} - B_{22}) \\
P_6 &= A_{22}\,(B_{21} - B_{11}) \\
P_7 &= (A_{21} + A_{22})\, B_{11}
\end{aligned}
\]
\[ C = \begin{pmatrix} P_1 + P_2 - P_4 + P_6 & P_4 + P_5 \\ P_6 + P_7 & P_2 - P_3 + P_5 - P_7 \end{pmatrix} \]

Distributed Divide and Conquer skeleton

Two-Dimensional Cutting Stock Problem: User Interface
• In this problem we are given a large stock rectangle S of dimension L×W and n types of smaller rectangles (pieces), where the i-th type has dimension l_i×w_i. Each type i also has a profit c_i and a demand bound b_i. The problem is to cut from the large rectangle a set of small rectangles such that:
    • All pieces have a fixed orientation, i.e., a piece of length l and width w is different from a piece of length w and width l (l ≠ w).
    • All applied cuts are of guillotine type, i.e., cuts that start from one edge and run parallel to the other two edges.
    • There are at most b_i rectangles of type i in the cutting pattern (the demand constraint of the i-th piece).
    • The overall profit \( \sum_{i=1}^{n} c_i x_i \), where x_i denotes the number of rectangles of type i in the cutting pattern, is maximized.

Performance: CALL & LLAC
• Parallel Architectures
[Figure: processors, each with its own memory, connected through a communication network]
• Standard Libraries: MPI, PVM
• We need a well-accepted Parallel Computing Model: BSP, LogP, ...
[Figure: CALL & LLAC Architecture]

Performance: CALL & LLAC
\[ C_0 + C_1 N + C_2 N^2 + C_3 N^3 \]

    #pragma cll mp mp[0] + mp[1]*N + mp[2]*N*N + mp[3]*N*N*N
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            sum = 0;
            for (k = 0; k < N; k++)
                sum += A(i,k) * B(k,j);
            C(i,j) = sum;
        }
    }
    #pragma cll end mp

Square matrix product: A, B and C have dimension N×N.

Measuring and Predicting Performance

    while (!bqueue.empty()) {
        auxSp = bqueue.remove();                     // pop a problem from the local queue
    #pragma cll code numvis++;
        high = auxSp.upper_bound(pbm, auxSol);       // upper bound
        if (high > bestSol) {
            low = auxSp.lower_bound(pbm, auxSol);    // lower bound
            if (low > bestSol) {
                bestSol = low;
                sol = auxSol;
                outputPacket.send(MASTER, SOLVE_TAG, bestSol, sol);
            }
            if (high != low) {
                // calculate the number of required slaves
                rSlaves = bqueue.getNumberOfNodes();
                op.send(MASTER, BnB_TAG, high, rSlaves);
                // rank holds the identifiers {1,..., nfSlaves} of the assigned slaves
                inputPacket.recv(MASTER, nfSlaves, bestSol, rank);
                if (nfSlaves >= 0) {
                    auxSp.branch(pbm, bqueue);       // branch and save in the local queue
                    for (i = 0; i < nfSlaves; i++) { // send subproblems to the assigned slaves
                        auxSp = bqueue.remove();
    #pragma cll code numvis++;
                        outputPacket.send(rank[i], PBM_TAG, auxSp, bestSol, sol);
                    }
                }
                // if nfSlaves == DONE the problem is bounded (cut)
            }
        }
    }

How to compile?
kpr.cll.h  kpr.c  call  kpr.c.dat  kpr  kpr.c.dat.1  ......
kpr.c.dat.n  kpr.cll.c  cc

    EXPERIMENT: "kps"
    BEGIN_LINE: 115
    END_LINE: 119
    FORMULA: p 0 p 1 v 0 * +
    INFORMULA: kps[0]+kps[1]*numvis
    MAXTESTS: 131072
    DIMENSION: 2
    PARAMETERS:
    NUMIDENTS: 1
    IDENTS: numvis
    OBSERVABLES: CLOCK
    COMPONENTS: 1 numvis
    POSTFIX_COMPONENT_0: 1
    POSTFIX_COMPONENT_1: v 0
    NUMTESTS: 1
    SAMPLE:
    CPU  NCPUS  numvis    CLOCK
    0    1      261134.0  0.16491100

kpr
Number of Visited Nodes Study

Measuring and Predicting Performance

    int main(int argc, char ** argv) {
        number sol;
        readKnap(data);
    #pragma cll code double numvis = 0.0;
    #pragma cll kps kps[0]*unknown(numvis) posteriori numvis
        /* next object, remaining capacity, profit */
        sol = knap(0, M, 0);
    #pragma cll end kps
        /* the original call had no conversion specifier;
           %ld assumes that number is an integral type */
        printf("\nsol = %ld\n", (long) sol);
    #pragma cll report all
        return 0;
    }

Symbolic Regression Problem
• Find the unknown complexity formula starting from the experimental data gathered by CALL.
• We can use Symbolic Regression: the induction of mathematical expressions on data. Rather than searching for the values of the regression constants, the object of the search is a symbolic description of the system.
• See Scientific Discovery using Genetic Programming by Maarten Keijzer, 2001. http://www.cs.vu.nl/~mkeijzer/publications/thesis/
• Currently we use a fitness function that measures the error of the predictions "on the asymptotic side" using linear regression on a small sub-sample.

Prediction of the RNA Secondary Structure Problem
• RNA molecule: a string of n characters R = r_1 r_2 ... r_n such that r_i ∈ {A, C, G, U}
• Nucleotides pair up and release free energy: A-U, G-U, C-G
• The iteration space is n × n triangular
• Nonuniform dependences: dependences among non-consecutive stages

\[ E(S_{i,j}) = \min \left\{ E(S_{i+1,j-1}) + \alpha(r_i, r_j),\; \min_{i < k \le j} \left\{ E(S_{i,k-1}) + E(S_{k,j}) \right\} \right\} \]
where α(r_i, r_j) is the free energy of the pair (r_i, r_j).

TRACER::ULL 2003 Results
• Journals:
    Authors: Dorta, León, Rodríguez
    Title: Comparing MPI and OpenMP Implementations of the 0-1 Knapsack Problem
    Journal: Parallel and Distributed Computing Practices.
    ISSN 1097-2803 (Accepted)
    Date: 2003

    Authors: Blanco V., García L., González J.A., Rodríguez C., Rodríguez G.
    Title: A Performance Model for the Analysis of OpenMP Programs
    Journal: Parallel and Distributed Computing Practices. ISSN 1097-2803 (Accepted)
    Date: 2003
• International Conferences:
    • Blanco V., González J.A., León C., Rodríguez C., Rodríguez G. “From Complexity Analysis to Performance Analysis”. Euro-Par 2003. International Conference on Parallel and Distributed Computing. Klagenfurt, Austria. 26-29 August 2003.
    • Dorta I., León C., Rodríguez C., Rojas A. “Parallel Skeletons for Divide and Conquer and Branch and Bound Techniques”. 11th Euromicro Conference on Parallel and Network-Based Processing. ISSN 1066-6192. Genova, Italy. 5-7 February 2003.
    • Dorta I., León C., Rodríguez C. “A comparison between MPI and OpenMP Branch-and-Bound Skeletons”. 8th International Workshop on High-Level Parallel Programming Models and Supportive Environments. ISBN 0-7695-1880-X. Nice, France. 22 April 2003.
    • Dorta I., León C., Rodríguez C., Rojas A. “Parallel Skeletons. Branch-and-Bound and Divide-and-Conquer Techniques”. TAM User Group Meeting 2003. Barcelona, Spain. 16 May 2003.
    • Dorta I., León C., Rodríguez C., Rojas A. “MPI and OpenMP implementations of Branch and Bound Skeletons”. ParCo2003. Dresden, Germany. 2-5 September 2003.
    • Dorta I., León C., Rodríguez C. “Parallel Branch and Bound Skeletons: Message Passing and Shared Memory Implementations”. 5th International Conference on Parallel Processing and Applied Mathematics. Czestochowa, Poland. 7-10 September 2003.
    • García L., González J.A., González J.C., León C., Rodríguez C., Rodríguez G. “Complexity Driven Performance Analysis”. 10th EuroPVM/MPI 2003. Venice, Italy. 29 September - 2 October 2003.
• National Conferences:
    • Dorta I., León C., Rodríguez C., Rodríguez G., Rojas A. “Complejidad Algorítmica: de la Teoría a la Práctica”.
      JENUI’03 (Jornadas de Enseñanza Universitaria de la Informática). ISBN 84-283-2845-5. Cádiz. 9-11 July 2003.
    • González J.R., León C., Rodríguez C. “Un esqueleto para Ramificación y Acotación Distribuido”. XIV Jornadas de Paralelismo. Leganés (Madrid). 15-17 September 2003.
• PFC (final-year project):
    • González J.R. “Esqueletos Paralelos Distribuidos. Paradigmas de Ramificación y Acotación y Divide y Vencerás”. DEIOC Internal Working Document DT-03-07. July 2003.
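
Backup: Sequential Branch and Bound Sketch for the 0-1 Knapsack Problem
The Branch and Bound skeletons in this talk drive the search through three operations on a subproblem: an upper bound, a lower bound and a branch step. The following is a minimal sequential sketch of how those operations could look for the 0-1 Knapsack Problem. It is not the skeleton library's code. The names Item, SubProblem, upperBound, lowerBound and solveKnapsack are hypothetical, and the bounds are assumptions chosen for illustration: the fractional (linear relaxation) bound as upper bound and a greedy fill as lower bound. Weights are assumed to be positive, and profits and capacity non-negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration types; these are not the skeleton's own classes.
struct Item {
    long profit;
    long weight;  // assumed to be > 0
};

struct SubProblem {
    std::size_t next;  // index of the next object to decide
    long capacity;     // remaining capacity
    long profit;       // profit accumulated so far
};

// Lower bound: greedily insert every remaining object that still fits.
// The value is reachable, so it is a feasible solution of the subproblem.
long lowerBound(const std::vector<Item>& items, const SubProblem& sp) {
    long cap = sp.capacity;
    long value = sp.profit;
    for (std::size_t k = sp.next; k < items.size(); ++k) {
        if (items[k].weight <= cap) {
            cap -= items[k].weight;
            value += items[k].profit;
        }
    }
    return value;
}

// Upper bound: fill in ratio order and add a fraction of the first object
// that does not fit (rounded down). Requires items sorted by profit/weight.
long upperBound(const std::vector<Item>& items, const SubProblem& sp) {
    long cap = sp.capacity;
    long value = sp.profit;
    for (std::size_t k = sp.next; k < items.size(); ++k) {
        if (items[k].weight <= cap) {
            cap -= items[k].weight;
            value += items[k].profit;
        } else {
            return value + items[k].profit * cap / items[k].weight;
        }
    }
    return value;
}

// Depth-first Branch and Bound with the same control flow as the skeleton
// loop: bound, update the best solution, then branch if the bounds differ.
long solveKnapsack(std::vector<Item> items, long capacity) {
    // Sort by decreasing profit/weight ratio (cross-multiplied to stay integral).
    std::sort(items.begin(), items.end(), [](const Item& a, const Item& b) {
        return a.profit * b.weight > b.profit * a.weight;
    });
    long best = 0;
    std::vector<SubProblem> queue{SubProblem{0, capacity, 0}};
    while (!queue.empty()) {
        SubProblem sp = queue.back();
        queue.pop_back();
        long high = upperBound(items, sp);
        if (high <= best) continue;  // bounded (cut): cannot improve
        long low = lowerBound(items, sp);
        if (low > best) best = low;
        if (high != low && sp.next < items.size()) {
            const Item& it = items[sp.next];
            // Branch 1: leave the object out.
            queue.push_back(SubProblem{sp.next + 1, sp.capacity, sp.profit});
            // Branch 2: insert the object, if it fits.
            if (it.weight <= sp.capacity) {
                queue.push_back(SubProblem{sp.next + 1, sp.capacity - it.weight,
                                           sp.profit + it.profit});
            }
        }
    }
    return best;
}
```

Calling solveKnapsack with a vector of Item values and a capacity returns the best profit only. Recording the chosen objects (the role of sol in the skeleton code) would need an extra field in SubProblem, which is left out to keep the sketch short.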