Computers as Components
Principles of Embedded Computing System Design
Dr. Prof. Huang Tingle
Group of IIPI, Guilin University of Electronic Technology

Ch1-1 Embedded Computing

Outline
The embedded computing space.
Platforms: system-on-chip, networks.
Architectures, applications, methodologies.
Standards-based design.
Multiple standards.

Example embedded computing systems
[Figure: example products from Motorola, Siemens, Apple, and BMW.]

Early history
Late 1940s: the MIT Whirlwind computer was designed for real-time operations.
  Originally designed to control an aircraft simulator.
The first microprocessor was the Intel 4004, in the early 1970s.
The HP-35 calculator used several chips to implement a microprocessor in 1972.

Early history, cont'd.
Automobiles used microprocessor-based engine controllers starting in the 1970s.
  Control fuel/air mixture, engine timing, etc.
  Multiple modes of operation: warm-up, cruise, hill climbing, etc.
  Provides lower emissions, better fuel efficiency.

Multiprocessor systems-on-chips
Roughly speaking, a system-on-chip with at least two processors.
Usually a heterogeneous multiprocessor:
  CPUs, DSPs, etc.
  Hardwired accelerators.
  Mixed-signal front end.

Consumer electronics categories
Category           2001            2002    2003    2004
Satellite TV       $1.18           $1.12   $1.48   $1.89
DVR                0.14 (40E6)     0.57    0.18    0.54
DVD                2.1             2.43    2.7     2.46
Set-top Internet   0.20            0.12    0.63    0.341
PC                 12.96 (120E6)   12.61   15.58   17.2
Source: Wall Street Journal/EIA.

Consumer electronics prices
Best Buy, November 2003:
[Figure: price list.]

Characteristics of embedded systems
Very high performance.
  Vision + compression + speech + networking all on the same platform.
Multiple tasks, heterogeneous.
Real-time.
Often low power.
Highly reliable.
  I reboot my piano every 4 months, my PC every day.

Mudge et al: Mobile supercomputing
Future mobile platform:
  Speech recognition.
  Cryptography.
  Augmented reality.
  Typical applications (email, etc.).
Requires 16x 2 GHz Pentium 4.
Peak power must not exceed 75 mW.
Assumes 5% battery improvement per year.

Mudge et al: Performance trends for desktop processors
[Figure: performance (SPECInt2000, log scale from 0.1 to 10000) by processor generation: i386, i486, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4, then one, two, and three generations out. Series: technology (relative FO4 delay), pipelining (relative FO4 gates/stage), ILP (relative SPECInt/MHz), and performance. Annotations: Moore's Law speedup, performance gap, 10k SPECInt2000. © 2004 IEEE Computer Society.]

Mudge et al: Power trends for desktop processors
[Figure: power (W, log scale from 0.1 to 1000) over the same processor generations. Series: total power, dynamic power, static power. Annotations: power gap, 75 mW peak power. © 2004 IEEE Computer Society.]

Platforms
An architecture that is designed for an application domain:
  Can be used in several products.
  Allows customization.
Platforms are often customized for their target audience.
Platforms spread out development costs over more products.
Some people hope for a single universal platform…

Why multiple platforms?
People still care about cost.
People care about power consumption.
Sufficiently general solutions don't fit on one chip.

Intel IXP2850 network processor
Packet processing, control processing, security.
Software development environment includes simulator.
[Figure: block diagram with an XScale core, a security processor, and 16 microengines.]

TI OMAP
Targets communications, multimedia.
Multiprocessor with DSP, RISC.
[Figure: OMAP 5910 block diagram: C55x DSP, ARM9, MMU, memory controller, MPU interface, system DMA control, bridge, I/O.]

ST Nomadik
Targets mobile multimedia.
A multiprocessor-of-multiprocessors.
[Figure: ARM9, memory system, and I/O bridges, with an audio accelerator and a video accelerator that are heterogeneous multiprocessors.]

ST MMDSP+
Embedded processor core used in multiple chips:
  Runs at 175 MHz.
  1 cycle per instruction.
  2-level instruction cache.
  16/24-bit fixed point.
  32-bit floating point.
Programmed in C.
Fully synthesizable.

Nomadik video accelerator
[Figure: block diagram: MMDSP+ with instruction RAM and data RAM, Xbus, interrupt controller, picture input processing, video codec, picture post processing, DMA, local data bus, master AHB.]

Automotive embedded systems
Today's high-end automobile may have 100 microprocessors:
  4-bit microcontroller checks seat belt;
  microcontrollers run dashboard devices;
  16/32-bit microprocessor controls engine.

BMW 850i brake and stability control system
Anti-lock brake system (ABS): pumps brakes to reduce skidding.
Automatic stability control (ASC+T): controls engine to improve stability.
ABS and ASC+T communicate.
  ABS was introduced first, so ASC+T needed to interface to the existing ABS module.

BMW 850i, cont'd.
[Figure: ABS unit and hydraulic pump connected to a brake and a sensor at each of the four wheels.]

The eternal triangle
Hardware and software architectures determine capabilities.
Applications guide design decisions.
Methodologies allow repeatable, predictable design.
[Figure: triangle with corners labeled architectures, applications, methodologies.]

Observations and implications
A little domain knowledge helps a lot.
The architectural design space is large and chunky.
  Less synthesis, more analysis.
IP components must be adapted to play together.
  Configurable IP, wrappers.
Supporting tools (compilers, etc.) must be adaptable.

Software in consumer devices (ST)
Modern audio standards (Dolby, MP3, etc.):
  1 million lines of code.
Modern video standards (MPEG2, DV, etc.):
  2 million lines of code and counting.

Software and MPSoC design
The MPSoC must run the application.
  Design verification must include the software running on the hardware.
May not know all possible code at design time.
  Limits design characterization.
  Must provide programming environment.

MPSoCs and standards
Standards enable large markets.
MPSoCs need large markets to justify chip development costs, reduce manufacturing overhead.
MPSoCs provide benefits:
  Low power.
  High performance.
Meeting the standard requires effort:
  Platform must allow multiple implementations.
  Standard is complex and hard to implement.

Design challenges in standards-driven markets
Design and verify methods within the standard.
  Standards allow differentiation.
Design and verify methods outside the standard's scope.
  User interface, etc.
Design and verify interfaces.
  Within standard, connection to extra-standard elements.

Standards-based systems
Reference implementation forms a basis for product.
  Port to platform.
  Enhance performance, features.
Want to minimize unnecessary changes to the software.
Must make some changes to the software.

Characteristics of reference implementations
The specification does not describe hardware or software.
  The spec is in the domain of signal processing, etc.
Designed for and tested on workstations.
  Infinite memory.
  Poor cache behavior.
  Single process.
  Limited real-time behavior.
The executable spec misrepresents some system properties:
  Error handling.
  Buffer management.

H.264 motion estimation, cont'd.
Multiple reference frames increase accuracy.
  Handles occlusion.
Once again, receiver is more complex.

Why are standards so complex?
Algorithm designers like to design algorithms.
Standards are complex.
Standards bodies must embody competing interests, ideas in their standards.
[Figure: MPEG Tampere meeting.]

Design refinement
Bad news: hard to learn the platform in order to change it.
  "Worldwide shipping by UPS ... roughly US$ 50 for CD and US$ 100 for paper copy (1500 pages, heavy!)" (Bluetooth.com)
Good news: an existing design can be measured, analyzed, and refined.

Four types of people
Algorithms people.
  Don't like programming.
  Don't know that hardware exists.
Software people.
  Don't like hardware.
Hardware people.
  Tolerate software.
  Don't know applications exist.
Managers.
  Don't know anything.
  Don't do anything.

Example: MPEG-2 codec
One of the reference MPEG-2 codecs.
Simple algorithms.
Designed for workstation operation.
Implementers must port to chosen platform.
  Limited memory.
  Limited CPU.

MPEG-2 porting challenges
Codec uses a mixture of buffering strategies.
  Some buffers are statically allocated.
  Some buffers are allocated from the heap.
May need to change number representation.
  Integer, double-precision, etc.
Error messages use Unix methods.

Example: H.264 codec
Reference encoder is 700,000 lines of C code.
Uses simple algorithms.
Supports a wide range of:
  Display sizes.
  Features.

H.264 porting challenges
Figure out what code is of interest.
  Large call graph.
May need to change number representation.
  Integer, double-precision, etc.
Buffer management.
  Buffer allocation takes up over 50% of CPU time.

Multiple standards
Many MPSoCs must implement multiple standards:
  Communications.
  Networking.
  Multimedia.
  Security.
Requires running a lot of different types of algorithms.
  Good case for specialization, co-design, configurable CPUs, etc.
  Need some general-purpose computers for load sharing, compatibility.

Platforms, standards, and MPSoCs
A platform allows multiple variations of a system.
  Well-suited to standards.
Programmability is key to platform-based design.

The design productivity gap
[Figure: chart with series "size" and "design", years 2001, 2003, 2006, 2009, vertical scale 0 to 600.]

Two phases of platform-based design
[Figure labels: requirements, past designs.]
Semiconductor house designs the platform.
  Requirements may come from standards, systems houses.
Systems house uses the platform.
  May need to start design before chip is available.
[Figure labels: platform, user needs, product.]

Challenges in platform-based design
Don't have the full application.
  Must estimate characteristics of part of the application.
Must determine the appropriate level of programmability.
  Programmability often costs in area, power.
Must provide programming tools along with the chip.

Transaction-level modeling is not enough
The MPSoC must run the complete application.
Implementing transactions is necessary but not sufficient.
  Transactions are relatively short term.
SoCs have a lot of state in memory.
  Need to thoroughly exercise that state over a long period.

Summary
Chip designers are now system designers.
  Must deal with hardware and software.
Today's applications are complex.
  Reference implementations must be optimized, extended.
Platforms present challenges for:
  Hardware designers: characterization, optimization.
  Software designers: performance/power evaluation, debugging.

CH1-2 CD-PLAYER

Compact disc players
Device characteristics.
Hardware architectures.
Software.

CD audio
44.1 kHz sample rate.
16-bit samples.
Stereo.
Additional data tracks.

Compact disc
Data stored on bottom of disc:
[Figure: disc cross-section with layers labeled substrate, aluminum coating, plastic coating.]

CD medium
Scanning velocity: 1.2-1.4 m/s (constant linear velocity, CLV).
Track pitch: 1.6 microns.
Diameter: 120 mm.
Pit length: 0.8-3 microns.
Pit depth: 0.11 microns.
Pit width: 0.5 microns.
Laser wavelength: 780 nm.

CD layout
Data stored in a spiral, not concentric circles:
[Figure: spiral track.]

CD mechanism
Laser, lens, sled:
[Figure: CD above a lens with focus and track adjustment, diffraction grating, laser, detectors, and sled.]

Laser focus
Focus controlled by vertical position of lens.
Unfocused beam causes irregular spot:
[Figure: spot shapes for out of focus, in focus, out of focus.]

Laser pickup
[Figure: central detectors A, B, C, D with side spot detectors E and F.]
Level: A+B+C+D
Focus error: (A+C)-(B+D)
Tracking error: E-F

Servo control
Four main signals (from the optical pickup):
  focus (laser) @ 245 kHz;
  tracking (laser) @ 245 kHz;
  sled (motor) @ 800 Hz;
  disc motor.

EFM
Eight-to-fourteen modulation:
  Fourteen-bit code guarantees a maximum distance between transitions.
  Example: 00000011 → 00100100000000

Error correction
CD capacity: 6.99 GB raw, 700 MB formatted.
Reed-Solomon code:
  $g(x) = (x - \alpha)(x - \alpha^{2}) \cdots (x - \alpha^{n-k-1})(x - \alpha^{n-k})$
Produces data, erasure bits.
Time to solve varies greatly depending on noise.
CD interleaves Reed-Solomon blocks to reduce effects of large data gaps.

CIRC encoding
Cross-interleaved Reed-Solomon coding.
Interleaves to reduce burst errors.
Each 16-bit sample split into two 8-bit symbols.
Specs:
  Max correctable burst: 4000 bits = 2.5 mm
  Max interpolatable burst: 12,300 bits = 7.7 mm

CIRC algorithm
Sample split into two symbols.
Six samples from each channel (= 24 symbols) are chosen.
Samples are delayed and scrambled.
Parity symbols (Q symbols) are generated.
Values are delayed by various amounts.
P parity symbols are generated.
Even words delayed by one symbol; P and Q words are inverted.
Frame = 32 8-bit symbols.

Control word
8-bit control word for every 32-symbol block:
  P: 1 during music/lead-in, 0 at start of selection.
  Q: track number, time, etc. (spread over 98 bits).
  R, S, T, U, V, W: reserved.

Control and error correction
Skips caused by physical disturbance.
  Wait for disturbance to subside.
  Retry.
Read errors caused by disc/servo problems.
  Detect error.
  Choose location for retry.
  Retry.
  Fail and interpolate.

Retry problems
Data is stored in a spiral.
  Can't seek track as on magnetic disc.
Sled servo is very coarse.
Data is only weakly addressed.
  Must read data to know where to go.

Audio playback
Audio CD needs no audio processing.
Tasks:
  convert to analog;
  amplify.
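
As a worked example of what "convert to analog" has to keep up with, the sketch below computes the PCM data rate implied by the CD audio parameters given earlier (44.1 kHz, 16-bit samples, stereo). The function names `pcm_bit_rate` and `buffer_bytes` are illustrative choices, not part of any CD standard, and the figure covers only decoded audio samples, not the channel bits on the disc.

```python
def pcm_bit_rate(sample_rate_hz: int = 44_100,
                 bits_per_sample: int = 16,
                 channels: int = 2) -> int:
    """Bits per second of decoded audio that the DAC must be fed."""
    return sample_rate_hz * bits_per_sample * channels


def buffer_bytes(seconds: float, bit_rate: int) -> int:
    """Bytes of RAM needed to hold `seconds` of audio at `bit_rate` bits/s."""
    return int(seconds * bit_rate / 8)


if __name__ == "__main__":
    rate = pcm_bit_rate()
    print(rate)                   # 1411200 bits per second
    print(buffer_bytes(1, rate))  # 176400 bytes per second of audio
```

At these parameters one second of stereo audio occupies 176,400 bytes, which gives a rough feel for how much RAM a playback buffer of a given length would need.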

Digital/analog conversion
1-bit MASH conversion:
[Figure: signal chain: interpolation, noise shaping, PWM, integrator.]

MP3
Decoding is easier than encoding, but requires:
  decompression;
  filtering.
Basic CD standard for data discs.
No standards for MP3 disc file structure: player must understand Windows, Mac, Unix discs.

Jog/skip memory
Read samples into RAM, play back from RAM.
Modern RAMs are larger than needed for reasonable jog/skip.
Jog memory saves some power.

CD/MP3 player
[Figure: block diagram. Audio CPU with memory, jog memory, display, DAC (I2S), and amp to analog out; servo CPU with memory, error corrector, and focus, tracking, sled, and motor outputs to the drive; analog in from the head carries FE, TE, and the amplified signal.]

DVD format
Similar to CD, but:
  shorter wavelength laser;
  tighter pits;
  two layers of data.

Audio on DVD
Alternatives:
  MP3 on data DVD (stereo).
  Audio track of video DVD (5.1).
  DVD audio (5.1).
  SACD (5.1).

CH1-3 UML

Introduction
Object-oriented design.
Unified Modeling Language (UML).
© 2000 Morgan Kaufmann

System modeling
Need languages to describe systems:
  useful across several levels of abstraction;
  understandable within and between organizations.
Block diagrams are a start, but don't cover everything.

Object-oriented design
Object-oriented (OO) design: a generalization of object-oriented programming.
Object = state + methods.
  State provides each object with its own identity.
  Methods provide an abstract interface to the object.

OO implementation in C++
    class display {
        pixeltype pixels[IMAX][JMAX];   // state; private by default
    public:
        display() { }
        pixeltype pixel(int i, int j) { return pixels[i][j]; }
        void set_pixel(pixeltype val, int i, int j) { pixels[i][j] = val; }
    };

OO implementation in C
    typedef struct {
        pixeltype pixels[IMAX][JMAX];   /* state */
    } display;

    display d1;

    /* "method": takes the object explicitly as its first argument */
    pixeltype pixelval(display *px, int i, int j) {
        return px->pixels[i][j];
    }

Objects and classes
Class: object type.
Class defines the object's state elements but state values may change over time.
Class defines the methods used to interact with all objects of that type.
  Each object has its own state.

OO design principles
Some objects will closely correspond to real-world objects.
  Some objects may be useful only for description or implementation.
Objects provide interfaces to read/write state, hiding the object's implementation from the rest of the system.

UML
Developed by Booch et al.
Goals:
  object-oriented;
  visual;
  useful at many levels of abstraction;
  usable for all aspects of design.

UML object
[Figure: object box labeled "d1: Display" (object name: class name), with attributes pixels: array[] of pixels, elements, and menu_items, and a comment noting that pixels is a 2-D array.]

UML class
[Figure: class box with class name Display, attributes pixels, elements, and menu_items, and operations mouse_click() and draw_box.]

The class interface
The operations provide the abstract interface between the class's implementation and other classes.
Operations may have arguments, return values.
An operation can examine and/or modify the object's state.

Choose your interface properly
If the interface is too small/specialized:
  object is hard to use for even one application;
  even harder to reuse.
If the interface is too large:
  class becomes too cumbersome for designers to understand;
  implementation may be too slow;
  spec and implementation are probably buggy.

Relationships between objects and classes
Association: objects communicate but one does not own the other.
Aggregation: a complex object is made of several smaller objects.
Composition: aggregation in which owner does not allow access to its components.
Generalization: define one class in terms of another.

Class derivation
May want to define one class in terms of another.
  Derived class inherits attributes, operations of base class.
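
A minimal sketch of class derivation, assuming a `Display` base class with the `pixel` and `set_pixel` operations from the earlier implementation slides. The derived class name `BWDisplay` and its 0/1 clamping behavior are hypothetical, chosen only to show a derived class inheriting state and operations while overriding one of them.

```python
class Display:
    """Base class: owns the pixel state and the operations that access it."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows = rows
        self.cols = cols
        self._pixels = [[0] * cols for _ in range(rows)]

    def pixel(self, i: int, j: int) -> int:
        """Read the pixel at row i, column j."""
        return self._pixels[i][j]

    def set_pixel(self, val: int, i: int, j: int) -> None:
        """Write the pixel at row i, column j."""
        self._pixels[i][j] = val


class BWDisplay(Display):
    """Derived class (hypothetical): inherits attributes and pixel(),
    and overrides set_pixel() so stored values are only 0 or 1."""

    def set_pixel(self, val: int, i: int, j: int) -> None:
        super().set_pixel(1 if val else 0, i, j)


if __name__ == "__main__":
    d = BWDisplay(rows=3, cols=4)
    d.set_pixel(255, 2, 1)
    print(d.pixel(2, 1))           # 1
    print(isinstance(d, Display))  # True
```

Code written against `Display` keeps working when handed a `BWDisplay`, which is the practical payoff of defining one class in terms of another.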
[Figure: UML generalization arrow from Derived_class to Base_class.]

Class derivation example
[Figure: base class Display (attributes pixels, elements, menu_items; operations pixel(), set_pixel(), mouse_click(), draw_box) with derived classes BW_display and Color_map_display.]

Multiple inheritance
[Figure: base classes Speaker and Display with derived class Multimedia_display.]

Links and associations
Link: describes relationships between objects.
Association: describes relationship between classes.

Link example
Link defines the contains relationship:
[Figure: a "message set" object (count = 2) linked to two "message" objects (msg = msg1, length = 1102; msg = msg2, length = 2114).]

Association example
[Figure: class "message" (msg: ADPCM_stream; length: integer) associated by "contains" with class "message set" (count: integer). Multiplicity 0..* is the number of contained messages; multiplicity 1 is the number of containing message sets.]

Stereotypes
Stereotype: recurring combination of elements in an object or class.
Example: <<foo>>

Behavioral description
Several ways to describe behavior:
  internal view;
  external view.

State machines
[Figure: states named a and b connected by a transition.]

Event-driven state machines
Behavioral descriptions are written as event-driven state machines.
  Machine changes state when receiving an input.
An event may come from inside or outside of the system.

Types of events
Signal: asynchronous event.
Call: synchronized communication.
Timer: activated by time.
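
The event-driven style can be sketched as a table-driven machine that changes state only when it receives an event. This is an illustration under stated assumptions, not UML semantics: the `EventKind`, `Event`, and `StateMachine` names, and the example states and event names, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EventKind(Enum):
    SIGNAL = auto()  # asynchronous event
    CALL = auto()    # synchronized communication
    TIMER = auto()   # activated by time


@dataclass(frozen=True)
class Event:
    kind: EventKind
    name: str


class StateMachine:
    """Holds a current state and a (state, event) -> next-state table."""

    def __init__(self, start, transitions):
        self.state = start
        self._transitions = dict(transitions)

    def dispatch(self, event):
        """Take the matching transition, if any; otherwise stay put."""
        self.state = self._transitions.get((self.state, event), self.state)
        return self.state


if __name__ == "__main__":
    press = Event(EventKind.SIGNAL, "button_press")
    start = Event(EventKind.CALL, "start")
    timeout = Event(EventKind.TIMER, "timeout")
    machine = StateMachine("idle", {
        ("idle", press): "armed",
        ("armed", start): "running",
        ("armed", timeout): "idle",
    })
    print(machine.dispatch(press))    # armed
    print(machine.dispatch(timeout))  # idle
```

Ignoring events with no matching transition is one possible policy; a stricter design could raise an error instead.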

Signal event
[Figure: declaration "<<signal>> mouse_click" with attributes leftorright: button and x, y: position; event description: transition from state a to state b labeled mouse_click(x,y,button).]

Call event
[Figure: transition from state c to state d labeled draw_box(10,5,3,2,blue).]

Timer event
[Figure: transition from state e to state f labeled tm(time-value).]

Example state machine
[Figure: state machine from start to finish, with transitions labeled input/output.
  start to "region found": mouse_click(x,y,button) / find_region(region)
  "region found" to "got menu item": region = menu / which_menu(i)
  "got menu item" to "called menu item": call_menu(i)
  "region found" to "found object": region = drawing / find_object(objid)
  "found object" to "object highlighted": highlight(objid)]

Sequence diagram
Shows sequence of operations over time.
Relates behaviors of multiple objects.

Sequence diagram example
[Figure: objects m: Mouse, d1: Display, u: Menu, with time running downward and messages mouse_click(x,y,button), which_menu(x,y,i), call_menu(i).]

Summary
Object-oriented design helps us organize a design.
UML is a transportable system design language.
  Provides structural and behavioral description primitives.

CH1-4 Models of Computation

Topics
Why models of computation?
Structural models.
Finite-state machines.
Turing machines.
Petri nets.
Control flow graphs.
Data flow models.
Task graphs.
Control flow models.

Models of computation
Models of computation affect programming style.
No one model of computation is good for all algorithms.
Large systems may require different models of computation for different parts.
  Models must communicate compatibly.

Processor graph
[Figure: graph with nodes M1, M2, M3, M4 and links L1, L2, L3.]

Finite state machine
State transition graph and table are equivalent:
  input   present state   next state   output
  0       s1              s2           0
  1       s1              s1           0
  0       s2              s2           1
  1       s2              s3           0
  0       s3              s3           0
  1       s3              s1           1
[Figure: the same machine drawn as a graph over s1, s2, s3 with edges labeled input/output.]

Finite state machine properties
Finite state.
Nondeterministic variant.

Nondeterministic FSM
Several transitions out of a state for a given input.
  Equivalent to executing all alternatives in parallel.
[Figure: states s1 and s2 with two transitions labeled a.]
Can allow ε-moves: goes to next state without input.
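
One way to read "executing all alternatives in parallel" is to track the whole set of states the machine could be in. The sketch below does that, including input-free moves. The transition-table layout, the use of an empty string for the ε label, and the example machine are assumptions made for illustration.

```python
EPSILON = ""  # label used here for a move that consumes no input


def epsilon_closure(states, delta):
    """All states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPSILON), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def step(states, symbol, delta):
    """Follow every transition on `symbol` from every current state."""
    moved = set()
    for s in states:
        moved.update(delta.get((s, symbol), ()))
    return epsilon_closure(moved, delta)


def run(start, inputs, delta):
    """Set of states the machine may occupy after consuming `inputs`."""
    current = epsilon_closure({start}, delta)
    for symbol in inputs:
        current = step(current, symbol, delta)
    return current


if __name__ == "__main__":
    # s1 has two transitions on "a"; s2 has an epsilon move to s3.
    delta = {
        ("s1", "a"): {"s1", "s2"},
        ("s2", EPSILON): {"s3"},
    }
    print(sorted(run("s1", "a", delta)))  # ['s1', 's2', 's3']
```

Each distinct set returned by `step` corresponds to one state of an equivalent deterministic machine.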

Deterministic FSM from a nondeterministic FSM
Add states for the various combinations of nondeterminism.
[Figure: a nondeterministic machine over s1, s2, s3, s4 with transitions labeled a, b, c, and its deterministic equivalent over s1, s12, s3, s4, where s12 combines s1 and s2.]

Turing machine
General model of computing:
[Figure: tape of 0/1 cells, a head positioned on the tape, a state register, and a program.]

Turing machine step
1. Read current square.
2. Erase current square.
3. Take state-dependent action:
   1. Print new value.
   2. Move to adjacent cell.
   3. Set machine to next state.

Turing machine properties
Example program:
  If (state = 2 and cell = 0): print 0, move left, state = 4.
  If (state = 2 and cell = 1): print 1, move left, state = 3.
Can be implemented on many physical devices.
Turing machine is a general model of computability.
Can be extended to probabilistic behavior.

Turing machine properties
Infinite tape = infinite state machine.
Basic model of computability.
  Lambda calculus is an alternative model.
  Other models of computing can be shown to be equivalent to, or a proper subset of, the Turing machine.

Control flow graph
Commonly used to model program structure.
[Figure: decision node "i = 0?" with statement nodes x = a, x = a - b, and y = c + d.]

CDFG properties
Finite state model.
Single thread of control.
Can handle subroutines.

Petri net
Parallel model of computation.
[Figure: Petri net with labeled parts: place, token, transition, arc.]

Firing rule
A transition is enabled if each place at its inputs has at least one token.
  A transition doesn't have to fire right away.
Firing a transition removes tokens from inputs and adds a token to each output place.
In general, may require multiple tokens to enable.

Properties of Petri nets
Turing complete.
Arbitrary number of tokens.
  Nondeterministic behavior.
Naturally model parallelism.

Task graph
[Figure: tasks t1 and t2 containing processes P1 through P5.]
Used to model multi-rate systems.

Task graph properties
Not a Turing machine.
No branching behavior.
  May be extended to provide conditionals.
Possible models of execution time:
  Constant.
  Min-max bounds.
  Statistical.
Can model late arrivals, early departures by adding dummy processes.
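
Returning to the Petri net firing rule stated earlier, the sketch below implements it for the simplest case: every arc carries one token, and each place appears at most once among a transition's inputs. The dictionary-based marking and the names `enabled` and `fire` are illustrative assumptions; the general multi-token case is not covered.

```python
def enabled(marking, inputs):
    """A transition is enabled if every input place holds at least one token."""
    return all(marking.get(place, 0) >= 1 for place in inputs)


def fire(marking, inputs, outputs):
    """Return the marking after firing: one token leaves each input place
    and one token is added to each output place. The original marking
    is left unchanged."""
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


if __name__ == "__main__":
    marking = {"p1": 1, "p2": 1, "p3": 0}
    after = fire(marking, inputs=["p1", "p2"], outputs=["p3"])
    print(after)                         # {'p1': 0, 'p2': 0, 'p3': 1}
    print(enabled(after, ["p1", "p2"]))  # False
```

Because `fire` is a separate call, the caller decides when (or whether) an enabled transition fires, matching the point that a transition does not have to fire right away.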

Data flow graph
Partially-ordered computations:
[Figure: data flow graph with +, -, and * nodes.]

Data flow streams
Captures sequence but not time.
Totally-ordered set of values.
  New values are appended at the end as they appear.
May be infinite.
[Figure: streams of values entering and leaving a + node.]

Firing rules
A node may have one or more firing rules.
Firing rules determine when tokens are consumed and produced.
  Firing consumes a set of tokens at inputs, generates token at output.

Example firing rules
Basic rule fires when tokens are available at all inputs:
[Figure: + node with inputs a and b and output c.]
Conditional firing rule depends on control input:
[Figure: node with data inputs a and b and control input T.]

Data flow graph properties
Finite state model.
Basic data flow graph is acyclic.
Scheduling provides a total ordering of operations.

Synchronous data flow
Lee/Messerschmitt: relate data flow graph properties to schedulability.
Synchronous communication between data flow nodes.
Nodes may communicate at multiple rates.

SDF notation
Nodes may have rates at which data are produced or consumed.
Edges may have delays.
[Figure: edge into a + node labeled 2, 1, and 5.]

SDF example
This graph has consistent sample rates:
[Figure: three + nodes connected by edges with rates 1 and 2; one node has separate outputs.]

Delays in SDF graphs
Delays do not change rates, only the amount of data stored in the system.
Changes system start-up.
[Figure: edge into a + node labeled 2, 1, and 50.]

Kahn process network
Process has unbounded FIFO at each input:
[Figure: a process fed by a channel.]
Each channel carries a possibly infinite sequence or stream.
A process maps one or more input sequences to one or more output sequences.

Properties of processes
Processes are usually required to be continuous: the least upper bound can be moved across the function boundary.
Monotonicity: $X \sqsubseteq X' \Rightarrow F(X) \sqsubseteq F(X')$

Networks of processes
A network of processes relates the streams of several processes.
If I = input sequences and X = internal sequences + outputs, then the network behavior is the fixed point X = F(X, I).

Network properties
A network of monotonic processes is a monotonic process.
  Even in the presence of feedback loops.
Can add nondeterminism in several ways:
  allow process to test for emptiness;
  allow process to be internally nondeterminate;
  allow more than one process to consume data from a channel;
  etc.

Statecharts
Ancestor of UML state diagrams.
Provided composite states:
  OR states;
  AND states.
Composite states reduce the size of the state transition graph.

Statechart OR state
[Figure: a traditional state machine over S1, S2, S3, S4 with transitions on i1 and i2, next to the equivalent statechart in which OR state s123 groups S1, S2, and S3.]

Statechart AND state
[Figure: a traditional machine with product states S1-3, S1-4, S2-3, S2-4 and state S5, with transitions on a, b, c, d, and r, next to the equivalent statechart in which AND state sab holds components {S1, S2} and {S3, S4} and r leads to S5.]

TCAS II specification
TCAS II: aircraft collision avoidance system.
  Monitors aircraft and air traffic info.
  Provides audio warnings and directives to avoid collisions.
Leveson et al. used the RSML language to capture the TCAS specification.

RSML
State description:
[Figure: state box "state1" with inputs, a state description, and outputs.]
Transition bus for transitions between many states:
[Figure: states a, b, c, d attached to a shared transition bus.]

TCAS top-level description
Inputs: TCAS-operational-status {operational, not-operational}
[Figure: state CAS with substates power-off and power-on; power-on contains standby and fully-operational; fully-operational contains own-aircraft, other-aircraft i:[1..30], and mode-s-ground-station i:[1..15].]

Own-Aircraft AND state
Inputs:
  own-alt-radio: integer
  standby-discrete-input: {true, false}
  own-alt-barometric: integer, etc.
Components of CAS:
  Effective-SL: 1, 2, ..., 7
  Alt-SL: 1, 2, ..., 7
  Alt-layer ...
  Climb-inhibit ...
  Descend-inhibit ...
  Increase-Descend-inhibit ...
  Increase-climb-inhibit ...
  Advisory-Status ...
Outputs:
  sound-aural-alarm: {true, false}
  aural-alarm-inhibit: {true, false}
  combined-control-out: enumerated, etc.
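
To make the size reduction from AND states concrete, the sketch below flattens a set of concurrent components into the product states a traditional state transition graph would need. The helper name `flatten_and_state` and the hyphenated naming of combined states are illustrative choices, not statechart or RSML notation.

```python
from itertools import product


def flatten_and_state(*components):
    """List every combined state of an AND state, one per combination
    of one substate from each concurrent component."""
    return ["-".join(combo) for combo in product(*components)]


if __name__ == "__main__":
    # Two components of two states each, as in the AND-state figure.
    flat = flatten_and_state(["S1", "S2"], ["S3", "S4"])
    print(flat)  # ['S1-S3', 'S1-S4', 'S2-S3', 'S2-S4']

    # Two seven-valued components, such as Effective-SL and Alt-SL.
    levels = [str(n) for n in range(1, 8)]
    print(len(flatten_and_state(levels, levels)))  # 49 flat states
    print(len(levels) + len(levels))               # 14 states inside the AND state
```

The flat machine grows with the product of the component sizes, while the AND state grows only with their sum, which is why a specification with many concurrent components stays readable in this form.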