Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Petru Eles1, Krzysztof Kuchcinski1, Zebo Peng1, Alexa Doboli2 and Paul Pop1 1Dept. Of Computer and Information Science Linkoping University Sweden 2Computer Science and Engineering Department Technical University Timisoara Romania Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 1 Outline Motivation Problem Formulation The Conditional Process Graph The Scheduling Strategy The Schedule Table Generation of The Schedule Table Experimental Results Conclusions Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 2 Motivation System-Level Specification Architecture Selection Scheduling Partitioning High-level Synthesis Compilation Heterogeneous architecture • programmable processors • hardware components • shared buses • local and shared memories Process interaction captures • data flow • flow of control Integration Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 3 Problem Formulation • Generic architecture: programmable processors, hardware components (ASICs), shared buses, local and shared memories • Each process is assigned to a (programmable or hardware) processor. • Each communication channel which connects processes assigned to different processors is assigned to a bus. • Each process or communication task is characterized by a certain execution time. Goals • To derive a worst case delay by which the system completes execution, so that this delay is minimized. • To generate the schedule which guarantees this delay. Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 4 The Conditional Process Graph Subgraph corresponding to DCK P0 P1 C P2 P6 K P5 P7 P8 P9 P14 P12 P13 K P15 P16 P17 P10 First processor Second processor ASIC D P3 C C P4 D P11 P18 Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 5 The Scheduling Strategy • The values of conditions are unpredictable. • At a certain moment during execution, when the values of some conditions are known, they have to be used in order to take the best possible scheduling decisions. • For each individual path there is an optimal schedule of the processes, which produces a minimal delay. The Scheduling Strategy 1. Scheduling of each individual alternative path. 2. Merging of the individual schedules and generation of the schedule table. Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 6 The Schedule Table P1 P2 P10 P11 DCK true D DC DCK 0 3 34 D C K 34 K DC K D C D C D DC 26 26 34 26 22 24 22 10 18 0 P14 P17 P18 (13) P19 (25) P20 (310) DC 35 29 24 37 30 26 3 9 28 20 21 21 6 7 7 15 15 Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 7 Scheduling of The Alternative Paths • Derive a minimal static schedule for a directed, acyclic polar graph. Allocation and execution time of processes is given. 1. List scheduling based heuristic. • Critical Path, • Urgency Based and • Partial Critical Path priority functions. 2. Branch-and-Bound based algorithm. • Branching, • Selection and • Bounding rules. Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 8 Partial Critical Path Scheduling Critical Path Scheduling P0 lPA= tA+A PA lPB= tB+B PB tB tA PX PY Partial Critical Path Scheduling LPA= max(T_curr+tA+A , T_curr+tA+tB+B ) LPB= max(T_curr+tB+B , T_curr+tB+tA+A ) Select the alternative with the smaller delay: L = max(LPA , LPB ) PN A > B LPA < LPB B > A LPB < LPA Use as a priority criterion. Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 9 Branch and Bound Scheduling • Branching Rule Generates new states starting from a given parent state. Generates children as a result of a scheduling decision. It might let a processor idle, even if there are ready processes on it. • Selection Rule From which state should we continue the branching operation? From the state which has the highest PCP priority. • Bounding Rule Should exploration continue further? Three bounding levels: - fast estimation of two weaker bounds, if bounding with them doesn’t succeed: - third bound based on a partial traversal of the process graph Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 10 Branch and Bound Scheduling (cont’d) 1, 2 P1, 2 P4 , P2 P3 , P2 ... 1, P2 P3, 2 P4, 2 ... ... ... P3 , P8 P5 , P8 P3, 2 P5, 2 1, P8 ... ... ... ... ... Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 11 Graphs Used for Experiments • Number of Graphs: 1250 250 for each dimension of 20, 40, 75, 130, 200 nodes. • Graphs Structure: Random and regular (trees, groups of chains). • Architecture: 1 ASIC, up to 10 processors and up to 3 buses. • Mapping: Random and using simple heuristics. Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 12 Experimental Results Average percentage deviation from the optimal schedule lengths. • Urgency Based Priority: • Critical Path Priority: • Partial Critical Path Priority: 4.73% 4.69% 2.35% • Averages are similar for the five graph dimensions. • Deviations for individual graphs are in the range (0%, 47.74%). Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 13 Experimental Results (cont’d) Percentage of final (optimal) results obtained with the BB algorithm. time limit (s) 0.04 0.08 0.3 1 3 5 60 1800 20 processes 40 processes 75 processes 130 processes 200 processes 91.6% 0.0% 0.0% 0.0% 0.0% 95.6% 54.0% 0.0% 0.0% 0.0% 98.8% 83.6% 66.4% 0.0% 0.0% 99.6% 87.6% 77.6% 56.8% 0.0% 99.6% 89.6% 79.2% 70.8% 51.0% 100% 90.4% 79.2% 72.0% 62.1% 100% 92.4% 80.8% 76.8% 71.5% 100% 96.8% 84.8% 80.4% 79.5% Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 14 Experimental Results (cont’d) Percentage deviation of PCP Schedule from intermediate results obtained with BB. time 1 5 60 300 40 processes average maximum 1.94% 17.65% 1.67% 5.50% 1.48% 4.92% 1.39% 4.26% 75 processes average maximum 2.25% 16.11% 2.45% 16.11% 2.68% 19.01% 2.96% 19.01% Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems 130 processes average maximum 2.71% 10.31% 2.76% 8.11% 2.96% 10.73% 3.04% 13.73% 200 processes average maximum 0% 0% 1.99% 21.10% 2.13% 10.58% 2.50% 12.75% Slide 15 The Table Generation Algorithm • Start times of processes are fixed in the table according, with priority, to the schedule of that path which produces the longest delay; • The start time of a process is placed in a column headed by the conjunction of condition values known at that time on the respective processor; • Conflicts have to be avoided at table generation. Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 16 Experimental Results (cont’d) Percentage increase of the worst case delay relative to the delay of the longest path. % 7 6 5 4 3 2 120 processes 80 processes 60 processes 1 0 10 15 20 25 30 35 Number of merged schedules • Real-life example: F4 level of ATM protocol layer. Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 17 Conclusions • An approach to process scheduling for the synthesis of embedded systems. • Process level representation which captures both data flow and the flow of control. • A schedule table is generated. The worst case delay is minimized. • Evaluation based on experiments using a large number of graphs generated for experimental purpose as well as real-life examples. Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 18