Co-Synthesis Algorithms: Distributed System Co-Synthesis
Part of HW/SW Codesign of Embedded Systems Course (CE 40-226)
Winter-Spring 2001, Codesign of Embedded Systems

Topics
  - Introduction
  - Preliminaries
  - Hardware/Software Partitioning
  - Distributed System Co-Synthesis

Topics
  - Introduction
  - An Integer Linear Programming Model
  - A Heuristic Algorithm

Introduction to Distributed System Co-Synthesis
  - Does not use an architectural template
  - Instead, creates a multiprocessor architecture during co-synthesis
  - The multiprocessor is usually heterogeneous in
      - Processing Elements
      - Communication Channels
      - Topologies
  - Less emphasis on the design of ASICs
  - More emphasis on the design of the multiprocessor topology

Introduction to Distributed System Co-Synthesis (cont’d)
  - Very common in practice
  - Especially: large CPU + small microcontrollers + small ASICs

Co-Synthesis Algorithms: Distributed System Co-Synthesis
Integer Linear Programming Model

ILP Model
  - Introduction
      - Linear Programming (LP)
          - Minimizing/maximizing a linear target function
          - Subject to a set of linear constraints
          - Current algorithms: find the optimal solution, or else show that the problem is not feasible at all
          - Example: Knapsack problem

ILP Model (cont’d)
  - Introduction (cont’d)
      - Integer Linear Programming (ILP)
          - Integer-solution counterpart of LP
          - Example: Knapsack problem with integer-solution constraint
          - Current algorithms
              - The absolute optimal solution is found
              - Take much CPU time
              - Only feasible for fairly small problems

Prakash-Parker ILP Model
  - By Prakash and Parker, 1992
  - Developed an ILP formulation
      - Used general ILP solvers to solve it
  - Inputs to the algorithm
      - Single-rate task graph
      - Technology model for the PEs, communication channels, and processes’ execution characteristics on them
  - Target function
      - Minimize system implementation cost
  - Constraints
      - Describe the requirements of the system

Prakash-Parker ILP Model (cont’d)
  - Algorithm classification criteria
      - Input Model
          - Single-rate task graph
      - Target Architecture
          - Distributed multiprocessor
      - Quantum
          - Processes of the task graph
      - Cost Estimation
          - Based on technology models provided to the algorithm
          - Represented as the target function of the ILP

Prakash-Parker ILP Model (cont’d)
  - Algorithm classification criteria (cont’d)
      - Performance Estimation
          - Based on technology models provided to the algorithm
      - Scheduling, Allocation
          - Embedded in the ILP formulation constraints
      - Algorithm details

Prakash-Parker ILP Model (cont’d)
  - Algorithm classification criteria (cont’d)
      - Algorithm details
          - Target Function
              - Minimize cost
          - Sets of Constraints
              - Allocation (PEs and communication links)
              - Scheduling (processes on PEs, and communications on links)

Prakash-Parker ILP Model (cont’d)
  - Sets of Constraints (cont’d)
      - Allocation
          - Processor-selection constraints
              - Each process must be assigned to exactly one processor
          - Data-transfer type constraints
              - Each communication must be either local or multi-hop, but not both and not neither

Prakash-Parker ILP Model (cont’d)
  - Sets of Constraints (cont’d)
      - Scheduling
          - Input-availability constraints
              - Data cannot be used by the sink process until after it is produced by the source process
          - Output-availability constraints
              - Data must obey the fractional output generation parameters
          - Process execution start/end constraints
              - Process finish-time depends on its start-time and the PE on which it executes
          - Data-transfer start/end constraints
              - Similar to the previous, but using data transfer times

Prakash-Parker ILP Model (cont’d)
  - Sets of Constraints (cont’d)
      - Scheduling (cont’d)
          - Processor-usage-exclusion
              - Processes on a single PE must not execute simultaneously
          - Communication-usage-exclusion
              - Multiple communications must not be scheduled on the same link simultaneously

Prakash-Parker ILP Model (cont’d)
  - Experimental Results
      - Applied only to relatively small problems
          - Reason: use of general ILP solvers
      - Their largest task graph: 9 processes
          - Took 6000 CPU minutes on an unspecified processor
  - Significance of the work
      - Achieved exactly optimal solutions on the examples they could solve
      - Used as benchmarks for heuristic co-synthesis algorithms

Co-Synthesis Algorithms: Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm

Wolf’s Heuristic Algorithm
  - As ever, topics of importance:
      - System Specification Language/Model
      - Target Architecture
      - Functionality (Allocation/Scheduling)
      - Quantum
      - Allocation Strategy
      - Scheduling Strategy
      - Cost Estimation
      - Performance Estimation
      - Algorithm Details

Wolf’s Heuristic Algorithm (cont’d)
  - System Specification Language/Model
      - Algorithm input: single-rate task graph
  - Target Architecture
      - Heterogeneous multiprocessor architecture
  - Functionality
      - Allocation
      - Scheduling
      - Primal approach: performance is the major objective
  - Quantum
      - Processes in a single-rate task graph

Wolf’s Heuristic Algorithm (cont’d)
  - Performance Estimation
      - Component Technology Library
      - Run-time of each process on each available PE is assumed to be known
  - Cost Estimation
      - Component Technology Library
      - $\text{Total Cost} = \sum_i \text{Cost}(PE_i) + \sum_j \text{Cost}(\text{Comm\_Channel}_j)$
  - Algorithm Details

Wolf’s Heuristic Algorithm Details
  - First ignore communication costs. Later, take them into account
  - Steps:
      1. Create an initial feasible solution, and perform an initial scheduling on it
          - Initial feasible solution: assign each process to a separate PE
      2. Reallocate processes to PEs to minimize total PE cost
          - Possibly eliminate PEs from the initial feasible solution
      3. Reallocate processes again to minimize the amount of communication required between PEs
      4. Allocate communication channels
      5. Allocate I/O devices (internal or external to PEs)

Wolf’s Heuristic Algorithm Details (cont’d)
  - The most important step: step 2, initial reallocation
      - Reason: PE cost is the dominant hardware cost
  - Initial reallocation
      1. PE cost reduction
          1.1 Scan the PEs, starting with the least-utilized PE
          1.2 Try to reallocate that PE’s processes to other existing PEs
          1.3 If no process is left on the PE, eliminate it; otherwise replace the PE with a suitable lower-cost one
      2. Pair-wise merge
          - Merge a pair of PEs into a single, more powerful one
      3. Load balancing

Wolf’s Heuristic Algorithm Details (cont’d)
  - Initial reallocation (cont’d)
      - The “PE cost reduction” phase tries to reallocate multiple processes at a time
      - The above 3 phases are repeated as far as possible
  - Experimental results
      - Finds optimal solutions to most of the ILP-solved examples
      - Finds near-optimal solutions for the remaining examples
      - Showed good results on larger examples
      - Requires very little run-time
          - Due to the multiple-move strategy during the PE cost minimization phase

What we learned today
  - Distributed System Co-Synthesis: the other broad category of co-synthesis algorithms
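
Illustration: the “PE cost reduction” phase of Wolf’s heuristic, as outlined in the slides above, might be sketched roughly as follows. This is only a minimal sketch under strong assumptions, not Wolf’s actual implementation. The PE types, costs, run-times, and the deadline are all made-up values. Feasibility is approximated by requiring that the summed run-times on a PE fit within one common deadline, which stands in for the real scheduling step. Communication costs are ignored, as in the first steps of the algorithm, and the pair-wise merge and load balancing phases are not shown.

```python
from dataclasses import dataclass, field

# Hypothetical component technology library (all numbers are invented):
# cost of each PE type, and run-time of each process on each PE type.
PE_COST = {"big_cpu": 10, "small_mcu": 3}
RUNTIME = {
    "p1": {"big_cpu": 2, "small_mcu": 6},
    "p2": {"big_cpu": 1, "small_mcu": 3},
    "p3": {"big_cpu": 3, "small_mcu": 9},
    "p4": {"big_cpu": 7, "small_mcu": 9},
}
DEADLINE = 10  # assumed: a PE is feasible if its total load fits the deadline


@dataclass
class PE:
    kind: str
    procs: list = field(default_factory=list)

    def load(self, kind=None):
        """Total run-time of this PE's processes on the given PE type."""
        kind = kind or self.kind
        return sum(RUNTIME[p][kind] for p in self.procs)


def total_pe_cost(pes):
    """PE part of the total cost: sum of Cost(PE_i)."""
    return sum(PE_COST[pe.kind] for pe in pes)


def initial_solution():
    """Step 1: one PE per process, picking the fastest PE type for it."""
    return [PE(min(PE_COST, key=lambda k: RUNTIME[p][k]), [p]) for p in RUNTIME]


def reduce_pe_cost(pes):
    """PE cost reduction: scan PEs starting with the least-utilized one."""
    for pe in sorted(pes, key=lambda q: q.load()):
        # Try to move each process of this PE to another non-empty PE.
        for p in list(pe.procs):
            for other in pes:
                if other is pe or not other.procs:
                    continue
                if other.load() + RUNTIME[p][other.kind] <= DEADLINE:
                    other.procs.append(p)
                    pe.procs.remove(p)
                    break
        if pe.procs:
            # Processes remain: switch to the cheapest PE type that still fits.
            fits = [k for k in PE_COST if pe.load(k) <= DEADLINE]
            pe.kind = min(fits, key=PE_COST.get)
    # PEs left without processes are eliminated.
    return [pe for pe in pes if pe.procs]


if __name__ == "__main__":
    start = initial_solution()
    print("initial PE cost:", total_pe_cost(start))
    reduced = reduce_pe_cost(start)
    for pe in reduced:
        print(pe.kind, pe.procs)
    print("reduced PE cost:", total_pe_cost(reduced))
```

The sketch moves one process at a time for simplicity, whereas the slides note that the real phase tries to reallocate multiple processes at a time.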