Introduction What are embedded computing systems? Challenges in embedded computing system design. Design methodologies. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Definition Embedded computing system: any device that includes a programmable computer but is not itself a general-purpose computer. Take advantage of application characteristics to optimize the design: don’t need all the general-purpose bells and whistles. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Embedding a computer CPU embedded computer © 2008 Wayne Wolf output analog input analog mem Overheads for Computers as Components Examples Cell phone. Printer. Automobile: engine, brakes, dash, etc. Airplane: engine, flight controls, nav/comm. Digital television. Household appliances. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Early history Late 1940’s: MIT Whirlwind computer was designed for real-time operations. Originally designed to control an aircraft simulator. First microprocessor was Intel 4004 in early 1970’s. HP-35 calculator used several chips to implement a microprocessor in 1972. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Early history, cont’d. Automobiles used microprocessor-based engine controllers starting in 1970’s. Control fuel/air mixture, engine timing, etc. Multiple modes of operation: warm-up, cruise, hill climbing, etc. Provides lower emissions, better fuel efficiency. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Microprocessor varieties Microcontroller: includes I/O devices, onboard memory. Digital signal processor (DSP): microprocessor optimized for digital signal processing. Typical embedded word sizes: 8-bit, 16bit, 32-bit. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Application examples Simple control: front panel of microwave oven, etc. Canon EOS 3 has three microprocessors. 32-bit RISC CPU runs autofocus and eye control systems. Digital TV: programmable CPUs + hardwired logic for video/audio decode, menus, etc. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Automotive embedded systems Today’s high-end automobile may have 100 microprocessors: 4-bit microcontroller checks seat belt; microcontrollers run dashboard devices; 16/32-bit microprocessor controls engine. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. BMW 850i brake and stability control system Anti-lock brake system (ABS): pumps brakes to reduce skidding. Automatic stability control (ASC+T): controls engine to improve stability. ABS and ASC+T communicate. ABS was introduced first---needed to interface to existing ABS module. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. BMW 850i, cont’d. sensor sensor brake brake ABS hydraulic pump brake brake sensor sensor © 2008 Wayne Wolf Overheads for Computers as Components Characteristics of embedded systems Sophisticated functionality. Real-time operation. Low manufacturing cost. Low power. Designed to tight deadlines by small teams. © 2008 Wayne Wolf Overheads for Computers as Components Functional complexity Often have to run sophisticated algorithms or multiple algorithms. Cell phone, laser printer. Often provide sophisticated user interfaces. © 2008 Wayne Wolf Overheads for Computers as Components Real-time operation Must finish operations by deadlines. Hard real time: missing deadline causes failure. Soft real time: missing deadline results in degraded performance. Many systems are multi-rate: must handle operations at widely varying rates. © 2008 Wayne Wolf Overheads for Computers as Components Non-functional requirements Many embedded systems are massmarket items that must have low manufacturing costs. Limited memory, microprocessor power, etc. Power consumption is critical in batterypowered devices. Excessive power consumption increases system cost even in wall-powered devices. © 2008 Wayne Wolf Overheads for Computers as Components Design teams Often designed by a small team of designers. Often must meet tight deadlines. 6 month market window is common. Can’t miss back-to-school window for calculator. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Why use microprocessors? Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc. Microprocessors are often very efficient: can use same logic to perform many different functions. Microprocessors simplify the design of families of products. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. The performance paradox Microprocessors use much more logic to implement a function than does custom logic. But microprocessors are often at least as fast: heavily pipelined; large design teams; aggressive VLSI technology. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Power Custom logic uses less power, but CPUs have advantages: Modern microprocessors offer features to help control power consumption. Software design techniques can help reduce power consumption. Heterogeneous systems: some custom logic for well-defined functions, CPUs+software for everything else. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Platforms Embedded computing platform: hardware architecture + associated software. Many platforms are multiprocessors. Examples: Single-chip multiprocessors for cell phone baseband. Automotive network + processors. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. The physics of software Computing is a physical act. Software doesn’t do anything without hardware. Executing software consumes energy, requires time. To understand the dynamics of software (time, energy), we need to characterize the platform on which the software runs. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. What does “performance” mean? In general-purpose computing, performance often means average-case, may not be well-defined. In real-time systems, performance means meeting deadlines. Missing the deadline by even a little is bad. Finishing ahead of the deadline may not help. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Characterizing performance We need to analyze the system at several levels of abstraction to understand performance: CPU. Platform. Program. Task. Multiprocessor. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Challenges in embedded system design How much hardware do we need? How big is the CPU? Memory? How do we meet our deadlines? Faster hardware or cleverer software? How do we minimize power? Turn off unnecessary logic? Reduce memory accesses? © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Challenges, etc. Does it really work? Is the specification correct? Does the implementation meet the spec? How do we test for real-time characteristics? How do we test on real data? How do we work on the system? Observability, controllability? What is our development platform? © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Design methodologies A procedure for designing a system. Understanding your methodology helps you ensure you didn’t skip anything. Compilers, software engineering tools, computer-aided design (CAD) tools, etc., can be used to: help automate methodology steps; keep track of the methodology itself. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Design goals Performance. Overall speed, deadlines. Functionality and user interface. Manufacturing cost. Power consumption. Other requirements (physical size, etc.) © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Levels of abstraction requirements specification architecture component design system integration © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Top-down vs. bottom-up Top-down design: start from most abstract description; work to most detailed. Bottom-up design: work from small components to big system. Real design uses both techniques. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Stepwise refinement At each level of abstraction, we must: analyze the design to determine characteristics of the current state of the design; refine the design to add detail. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Requirements Plain language description of what the user wants and expects to get. May be developed in several ways: talking directly to customers; talking to marketing representatives; providing prototypes to users for comment. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Functional vs. nonfunctional requirements Functional requirements: output as a function of input. Non-functional requirements: time required to compute output; size, weight, etc.; power consumption; reliability; etc. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Our requirements form name purpose inputs outputs functions performance manufacturing cost power physical size/weight © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Example: GPS moving map requirements Moving map obtains position from GPS, paints map from local database. Scotch Road I-78 lat: 40 13 lon: 32 19 © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. GPS moving map needs Functionality: For automotive use. Show major roads and landmarks. User interface: At least 400 x 600 pixel screen. Three buttons max. Pop-up menu. Performance: Map should scroll smoothly. No more than 1 sec power-up. Lock onto GPS within 15 seconds. Cost: $120 street price = approx. $30 cost of goods sold. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. GPS moving map needs, cont’d. Physical size/weight: Should fit in hand. Power consumption: Should run for 8 hours on four AA batteries. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. GPS moving map requirements form name purpose inputs outputs functions performance manufacturing cost power physical size/weight © 2008 Wayne Wolf GPS moving map consumer-grade moving map for driving power button, two control buttons back-lit LCD 400 X 600 5-receiver GPS; three resolutions; displays current lat/lon updates screen within 0.25 sec of movement $100 cost-of-goodssold 100 mW no more than 2: X 6:, 12 oz. Overheads for Computers as Components, 2nd ed. Specification A more precise description of the system: should not imply a particular architecture; provides input to the architecture design process. May include functional and non-functional elements. May be executable or may be in mathematical form for proofs. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. GPS specification Should include: What is received from GPS; map data; user interface; operations required to satisfy user requests; background operations needed to keep the system running. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Architecture design What major components go satisfying the specification? Hardware components: CPUs, peripherals, etc. Software components: major programs and their operations. Must take into account functional and non-functional specifications. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. GPS moving map block diagram GPS receiver search engine database © 2008 Wayne Wolf renderer user interface Overheads for Computers as Components, 2nd ed. display GPS moving map hardware architecture display frame buffer CPU GPS receiver memory © 2008 Wayne Wolf panel I/O Overheads for Computers as Components, 2nd ed. GPS moving map software architecture position © 2008 Wayne Wolf database search renderer user interface timer Overheads for Computers as Components, 2nd ed. pixels Designing hardware and software components Must spend time architecting the system before you start coding. Some components are ready-made, some can be modified from existing designs, others must be designed from scratch. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. System integration Put together the components. Many bugs appear only at this stage. Have a plan for integrating components to uncover bugs quickly, test as much functionality as early as possible. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Summary Embedded computers are all around us. Many systems have complex embedded hardware and software. Embedded systems pose many design challenges: design time, deadlines, power, etc. Design methodologies help us manage the design process. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Introduction Object-oriented design. Unified Modeling Language (UML). © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. System modeling Need languages to describe systems: useful across several levels of abstraction; understandable within and between organizations. Block diagrams are a start, but don’t cover everything. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Object-oriented design Object-oriented (OO) design: A generalization of object-oriented programming. Object = state + methods. State provides each object with its own identity. Methods provide an abstract interface to the object. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Objects and classes Class: object type. Class defines the object’s state elements but state values may change over time. Class defines the methods used to interact with all objects of that type. Each object has its own state. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. OO design principles Some objects will closely correspond to real-world objects. Some objects may be useful only for description or implementation. Objects provide interfaces to read/write state, hiding the object’s implementation from the rest of the system. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. UML Developed by Booch et al. Goals: object-oriented; visual; useful at many levels of abstraction; usable for all aspects of design. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. UML object object name class name d1: Display pixels is a 2-D array pixels: array[] of pixels elements menu_items comment attributes © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. UML class Display class name pixels elements menu_items mouse_click() draw_box © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. operations The class interface The operations provide the abstract interface between the class’s implementation and other classes. Operations may have arguments, return values. An operation can examine and/or modify the object’s state. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Choose your interface properly If the interface is too small/specialized: object is hard to use for even one application; even harder to reuse. If the interface is too large: class becomes too cumbersome for designers to understand; implementation may be too slow; spec and implementation are probably buggy. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Relationships between objects and classes Association: objects communicate but one does not own the other. Aggregation: a complex object is made of several smaller objects. Composition: aggregation in which owner does not allow access to its components. Generalization: define one class in terms of another. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Class derivation May want to define one class in terms of another. Derived class inherits attributes, operations of base class. Derived_class UML generalization Base_class © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Class derivation example Display base class pixels elements menu_items derived class BW_display © 2008 Wayne Wolf pixel() set_pixel() mouse_click() draw_box Color_map_display Overheads for Computers as Components, 2nd ed. Multiple inheritance base classes Speaker Display Multimedia_display derived class © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Links and associations Link: describes relationships between objects. Association: describes relationship between classes. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Link example Link defines the contains relationship: message msg = msg1 length = 1102 message set count = 2 message msg = msg2 length = 2114 © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Association example # containing message sets # contained messages message 0..* msg: ADPCM_stream length : integer © 2008 Wayne Wolf 1 contains Overheads for Computers as Components, 2nd ed. message set count : integer Stereotypes Stereotype: recurring combination of elements in an object or class. Example: <<foo>> © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Behavioral description Several ways to describe behavior: internal view; external view. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. State machines transition a state © 2008 Wayne Wolf b state name Overheads for Computers as Components, 2nd ed. Event-driven state machines Behavioral descriptions are written as event-driven state machines. Machine changes state when receiving an input. An event may come from inside or outside of the system. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Types of events Signal: asynchronous event. Call: synchronized communication. Timer: activated by time. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Signal event <<signal>> mouse_click leftorright: button x, y: position a mouse_click(x,y,button) b declaration event description © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Call event draw_box(10,5,3,2,blue) c © 2008 Wayne Wolf d Overheads for Computers as Components, 2nd ed. Timer event tm(time-value) e © 2008 Wayne Wolf f Overheads for Computers as Components, 2nd ed. Example state machine start input/output mouse_click(x,y,button)/ find_region(region) region found region = drawing/ find_object(objid) got menu item call_menu(I) called menu item highlight(objid) found object © 2008 Wayne Wolf region = menu/ which_menu(i) object highlighted Overheads for Computers as Components, 2nd ed. finish Sequence diagram Shows sequence of operations over time. Relates behaviors of multiple objects. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Sequence diagram example m: Mouse d1: Display mouse_click(x,y,button) which_menu(x,y,i) time call_menu(i) © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. u: Menu Summary Object-oriented design helps us organize a design. UML is a transportable system design language. Provides structural and behavioral description primitives. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Introduction Example: model train controller. © 2000 Morgan Kaufman Overheads for Computers as Components 2nd ed. Purposes of example Follow a design through several levels of abstraction. Gain experience with UML. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Model train setup rcvr motor power supply console ECC © 2008 Wayne Wolf command address Overheads for Computers as Components 2nd ed. header Requirements Console can control 8 trains on 1 track. Throttle has at least 63 levels. Inertia control adjusts responsiveness with at least 8 levels. Emergency stop button. Error detection scheme on messages. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Requirements form name purpose inputs model train controller control speed of <= 8 model trains throttle, inertia, emergency stop, train # outputs train control signals functions set engine speed w. inertia; emergency stop performance can update train speed at least 10 times/sec manufacturing cost $50 power wall powered physical console comfortable for 2 hands; < 2 size/weight lbs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Digital Command Control DCC created by model railroad hobbyists, picked up by industry. Defines way in which model trains, controllers communicate. Leaves many system design aspects open, allowing competition. This is a simple example of a big trend: Cell phones, digital TV rely on standards. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. DCC documents Standard S-9.1, DCC Electrical Standard. Defines how bits are encoded on the rails. Standard S-9.2, DCC Communication Standard. Defines packet format and semantics. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. DCC electrical standard Voltage moves around the power supply voltage; adds no DC component. 1 is 58 ms, 0 is at least 100 ms. logic 1 time 58 ms © 2008 Wayne Wolf logic 0 Overheads for Computers as Components 2nd ed. >= 100 ms DCC communication standard Basic packet format: PSA(sD)+E. P: preamble = 1111111111. S: packet start bit = 0. A: address data byte. s: data byte start bit. D: data byte (data payload). E: packet end bit = 1. © 2000 Morgan Kaufman Overheads for Computers as Components DCC packet types Baseline packet: minimum packet that must be accepted by all DCC implementations. Address data byte gives receiver address. Instruction data byte gives basic instruction. Error correction data byte gives ECC. © 2008 Wayne Wolf Overheads for Computers as Components, 2nd ed. Conceptual specification Before we create a detailed specification, we will make an initial, simplified specification. Gives us practice in specification and UML. Good idea in general to identify potential problems before investing too much effort in detail. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Basic system commands command name parameters set-speed speed (positive/negative) inertia-value (nonnegative) none set-inertia estop © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Typical control sequence :console set-inertia set-speed set-speed estop set-speed © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. :train_rcvr Message classes command set-speed set-inertia value: integer value: unsignedinteger © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. estop Roles of message classes Implemented message classes derived from message class. Attributes and operations will be filled in for detailed specification. Implemented message classes specify message type by their class. May have to add type as parameter to data structure in implementation. © 2008 Wayne Wolf Overheads for Computers as Components Subsystem collaboration diagram Shows relationship between console and receiver (ignores role of track): 1..n: command :console © 2008 Wayne Wolf :receiver Overheads for Computers as Components 2nd ed. System structure modeling Some classes define non-computer components. Denote by *name. Choose important systems at this point to show basic relationships. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Major subsystem roles Console: read state of front panel; format messages; transmit messages. Train: receive message; interpret message; control the train. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Console system classes 1 1 panel console 1 1 formatter 1 1 receiver* © 2008 Wayne Wolf 1 1 transmitter 1 1 sender* Overheads for Computers as Components 2nd ed. Console class roles panel: describes analog knobs and interface hardware. formatter: turns knob settings into bit streams. transmitter: sends data on track. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Train system classes train set 1 receiver 1 1 detector* © 2008 Wayne Wolf 1 1 1..t train 1 1 1 controller Overheads for Computers as Components 2nd ed. 1 motor interface 1 1 pulser* Train class roles receiver: digitizes signal from track. controller: interprets received commands and makes control decisions. motor interface: generates signals required by motor. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Detailed specification We can now fill in the details of the conceptual specification: more classes; behaviors. Sketching out the spec first helps us understand the basic relationships in the system. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Train speed control Motor controlled by pulse width modulation: duty cycle © 2008 Wayne Wolf + V - Overheads for Computers as Components 2nd ed. Console physical object classes knobs* train-knob: integer speed-knob: integer inertia-knob: unsignedinteger emergency-stop: boolean © 2008 Wayne Wolf pulser* pulse-width: unsignedinteger direction: boolean sender* detector* send-bit() read-bit() : integer Overheads for Computers as Components 2nd ed. Panel and motor interface classes panel motor-interface train-number() : integer speed() : integer inertia() : integer estop() : boolean new-settings() © 2008 Wayne Wolf speed: integer Overheads for Computers as Components 2nd ed. Class descriptions panel class defines the controls. new-settings() behavior reads the controls. motor-interface class defines the motor speed held as state. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Transmitter and receiver classes transmitter receiver send-speed(adrs: integer, speed: integer) send-inertia(adrs: integer, val: integer) set-estop(adrs: integer) © 2008 Wayne Wolf current: command new: boolean read-cmd() new-cmd() : boolean rcv-type(msg-type: command) rcv-speed(val: integer) rcv-inertia(val:integer) Overheads for Computers as Components 2nd ed. Class descriptions transmitter class has one behavior for each type of message sent. receiver function provides methods to: detect a new message; determine its type; read its parameters (estop has no parameters). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Formatter class formatter current-train: integer current-speed[ntrains]: integer current-inertia[ntrains]: unsigned-integer current-estop[ntrains]: boolean send-command() panel-active() : boolean operate() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Formatter class description Formatter class holds state for each train, setting for current train. The operate() operation performs the basic formatting task. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Control input cases Use a soft panel to show current panel settings for each train. Changing train number: must change soft panel settings to reflect current train’s speed, etc. Controlling throttle/inertia/estop: read panel, check for changes, perform command. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. change in change in speed/ train number inertia/estop Control input sequence diagram :knobs :panel :formatter :transmitter change in read panel panel-active control panel settings settings send-command read panel send-speed, send-inertia. panel settings send-estop read panel change in panel settings train number new-settings set-knobs © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Formatter operate behavior update-panel() panel-active() new train number idle other © 2008 Wayne Wolf send-command() Overheads for Computers as Components 2nd ed. Panel-active behavior T panel*:read-train() F T panel*:read-speed() current-train = train-knob update-screen changed = true current-speed = throttle changed = true F ... © 2008 Wayne Wolf ... Overheads for Computers as Components 2nd ed. Controller class controller current-train: integer current-speed[ntrains]: integer current-direction[ntrains]: boolean current-inertia[ntrains]: unsigned-integer operate() issue-command() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Setting the speed Don’t want to change speed instantaneously. Controller should change speed gradually by sending several commands. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Sequence diagram for setspeed command :receiver :controller new-cmd cmd-type rcv-speed :motor-interface set-speed set-pulse set-pulse set-pulse set-pulse set-pulse © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. :pulser* Controller operate behavior wait for a command from receiver receive-command() issue-command() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Refined command classes command type: 3-bits address: 3-bits parity: 1-bit set-speed set-inertia estop type=010 value: 7-bits type=001 value: 3-bits type=000 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Summary Separate specification and programming. Small mistakes are easier to fix in the spec. Big mistakes in programming cost a lot of time. You can’t completely separate specification and architecture. Make a few tasteful assumptions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Instruction sets Computer architecture taxonomy. Assembly language. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. von Neumann architecture Memory holds data, instructions. Central processing unit (CPU) fetches instructions from memory. Separate CPU and memory distinguishes programmable computer. CPU registers help out: program counter (PC), instruction register (IR), generalpurpose registers, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CPU + memory address memory 200 PC data CPU 200 ADD r5,r1,r3 © 2008 Wayne Wolf ADD IR r5,r1,r3 Overheads for Computers as Components 2nd ed. Harvard architecture address data memory data address program memory © 2008 Wayne Wolf data Overheads for Computers as Components 2nd ed. PC CPU von Neumann vs. Harvard Harvard can’t use self-modifying code. Harvard allows two simultaneous memory fetches. Most DSPs use Harvard architecture for streaming data: greater memory bandwidth; more predictable bandwidth. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. RISC vs. CISC Complex instruction set computer (CISC): many addressing modes; many operations. Reduced instruction set computer (RISC): load/store; pipelinable instructions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Instruction set characteristics Fixed vs. variable length. Addressing modes. Number of operands. Types of operands. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Programming model Programming model: registers visible to the programmer. Some registers are not visible (IR). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Multiple implementations Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes; etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Assembly language One-to-one with instructions (more or less). Basic features: One instruction per line. Labels provide names for addresses (usually in first column). Instructions often start in later columns. Columns run to end of line. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM assembly language example label1 ADR LDR ADR LDR SUB © 2008 Wayne Wolf r4,c r0,[r4] ; a comment r4,d r1,[r4] r0,r0,r1 ; comment Overheads for Computers as Components 2nd ed. Pseudo-ops Some assembler directives don’t correspond directly to instructions: Define current address. Reserve storage. Constants. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM instruction set ARM ARM ARM ARM ARM ARM versions. assembly language. programming model. memory organization. data operations. flow of control. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM versions ARM architecture has been extended over several versions. We will concentrate on ARM7. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM assembly language Fairly standard assembly language: label © 2008 Wayne Wolf LDR r0,[r8] ; a comment ADD r4,r0,r1 Overheads for Computers as Components 2nd ed. ARM programming model r0 r1 r2 r3 r4 r5 r6 r7 © 2008 Wayne Wolf r8 r9 r10 r11 r12 r13 r14 r15 (PC) Overheads for Computers as Components 2nd ed. 0 31 CPSR NZCV Endianness Relationship between bit and byte/word ordering defines endianness: bit 31 bit 0 byte 3 byte 2 byte 1 byte 0 bit 0 byte 0 byte 1 byte 2 byte 3 little-endian © 2008 Wayne Wolf bit 31 big-endian Overheads for Computers as Components 2nd ed. ARM data types Word is 32 bits long. Word can be divided into four 8-bit bytes. ARM addresses cam be 32 bits long. Address refers to byte. Address 4 starts at byte 4. Can be configured at power-up as either little- or bit-endian mode. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM status bits Every arithmetic, logical, or shifting operation sets CPSR bits: N (negative), Z (zero), C (carry), V (overflow). Examples: -1 + 1 = 0: NZCV = 0110. 231-1+1 = -231: NZCV = 0101. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM data instructions Basic format: ADD r0,r1,r2 Computes r1+r2, stores in r0. Immediate operand: ADD r0,r1,#2 Computes r1+2, stores in r0. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM data instructions ADD, ADC : add (w. carry) SUB, SBC : subtract (w. carry) RSB, RSC : reverse subtract (w. carry) MUL, MLA : multiply (and accumulate) © 2008 Wayne Wolf AND, ORR, EOR BIC : bit clear LSL, LSR : logical shift left/right ASL, ASR : arithmetic shift left/right ROR : rotate right RRX : rotate right extended with C Overheads for Computers as Components 2nd ed. Data operation varieties Logical shift: fills with zeroes. Arithmetic shift: fills with ones. RRX performs 33-bit rotate, including C bit from CPSR above sign bit. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM comparison instructions CMP : compare CMN : negated compare TST : bit-wise test TEQ : bit-wise negated test These instructions set only the NZCV bits of CPSR. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM move instructions MOV, MVN : move (negated) MOV r0, r1 ; sets r0 to r1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM load/store instructions LDR, LDRH, LDRB : load (half-word, byte) STR, STRH, STRB : store (half-word, byte) Addressing modes: register indirect : LDR r0,[r1] with second register : LDR r0,[r1,-r2] with constant : LDR r0,[r1,#4] © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM ADR pseudo-op Cannot refer to an address directly in an instruction. Generate value by performing arithmetic on PC. ADR pseudo-op generates instruction required to calculate address: ADR r1,FOO © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: C assignments C: x = (a + b) - c; Assembler: ADR LDR ADR LDR ADD ADR LDR v r4,a r0,[r4] r4,b r1,[r4] r3,r0,r1 r4,c r2[r4] ; ; ; ; ; ; ; get address for a get value of a get address for b, reusing r4 get value of b compute a+b get address for c get value of c Overheads for Computers as Components 2nd ed. C assignment, cont’d. SUB r3,r3,r2 ADR r4,x STR r3[r4] © 2008 Wayne Wolf ; complete computation of x ; get address for x ; store value of x Overheads for Computers as Components 2nd ed. Example: C assignment C: y = a*(b+c); Assembler: ADR LDR ADR LDR ADD ADR LDR r4,b ; get address for b r0,[r4] ; get value of b r4,c ; get address for c r1,[r4] ; get value of c r2,r0,r1 ; compute partial result r4,a ; get address for a r0,[r4] ; get value of a © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C assignment, cont’d. MUL r2,r2,r0 ; compute final value for y ADR r4,y ; get address for y STR r2,[r4] ; store y © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: C assignment C: z = (a << 2) | (b & 15); Assembler: ADR LDR MOV ADR LDR AND ORR r4,a ; get address for a r0,[r4] ; get value of a r0,r0,LSL 2 ; perform shift r4,b ; get address for b r1,[r4] ; get value of b r1,r1,#15 ; perform AND r1,r0,r1 ; perform OR © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C assignment, cont’d. ADR r4,z ; get address for z STR r1,[r4] ; store value for z © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Additional addressing modes Base-plus-offset addressing: LDR r0,[r1,#16] Loads from location r1+16 Auto-indexing increments base register: LDR r0,[r1,#16]! Post-indexing fetches, then does offset: LDR r0,[r1],#16 Loads r0 from r1, then adds 16 to r1. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM flow of control All operations can be performed conditionally, testing CPSR: EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE Branch operation: B #100 Can be performed conditionally. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: if statement C: if (a > b) { x = 5; y = c + d; } else x = c - d; Assembler: ; compute and test condition ADR r4,a ; get address for a LDR r0,[r4] ; get value of a ADR r4,b ; get address for b LDR r1,[r4] ; get value for b CMP r0,r1 ; compare a < b BGE fblock ; if a >= b, branch to false block © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. If statement, cont’d. ; true block MOV r0,#5 ; generate value for x ADR r4,x ; get address for x STR r0,[r4] ; store x ADR r4,c ; get address for c LDR r0,[r4] ; get value of c ADR r4,d ; get address for d LDR r1,[r4] ; get value of d ADD r0,r0,r1 ; compute y ADR r4,y ; get address for y STR r0,[r4] ; store y B after ; branch around false block © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. If statement, cont’d. ; false block fblock ADR r4,c ; get address for c LDR r0,[r4] ; get value of c ADR r4,d ; get address for d LDR r1,[r4] ; get value for d SUB r0,r0,r1 ; compute a-b ADR r4,x ; get address for x STR r0,[r4] ; store value of x after ... © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: Conditional instruction implementation ; true block MOVLT r0,#5 ; generate value ADRLT r4,x ; get address for STRLT r0,[r4] ; store x ADRLT r4,c ; get address for LDRLT r0,[r4] ; get value of ADRLT r4,d ; get address for LDRLT r1,[r4] ; get value of ADDLT r0,r0,r1 ; compute y ADRLT r4,y ; get address for STRLT r0,[r4] ; store y © 2008 Wayne Wolf for x x c c d d y Overheads for Computers as Components 2nd ed. Example: switch statement C: switch (test) { case 0: … break; case 1: … } Assembler: ADR r2,test ; get address for test LDR r0,[r2] ; load value for test ADR r1,switchtab ; load address for switch table LDR r1,[r1,r0,LSL #2] ; index switch table switchtab DCD case0 DCD case1 ... © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: FIR filter C: for (i=0, f=0; i<N; i++) f = f + c[i]*x[i]; Assembler ; loop initiation code MOV r0,#0 ; use r0 for I MOV r8,#0 ; use separate index for arrays ADR r2,N ; get address for N LDR r1,[r2] ; get value of N MOV r2,#0 ; use r2 for f © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. FIR filter, cont’.d ADR r3,c ; load r3 with base of c ADR r5,x ; load r5 with base of x ; loop body loop LDR r4,[r3,r8] ; get c[i] LDR r6,[r5,r8] ; get x[i] MUL r4,r4,r6 ; compute c[i]*x[i] ADD r2,r2,r4 ; add into running sum ADD r8,r8,#4 ; add one word offset to array index ADD r0,r0,#1 ; add 1 to i CMP r0,r1 ; exit? BLT loop ; if i < N, continue © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM subroutine linkage Branch and link instruction: BL foo Copies current PC to r14. To return from subroutine: MOV r15,r14 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Nested subroutine calls Nesting/recursion requires coding convention: f1 LDR r0,[r13] ; load arg into r0 from stack ; call f2() STR r13!,[r14] ; store f1’s return adrs STR r13!,[r0] ; store arg to f2 on stack BL f2 ; branch and link to f2 ; return from f1() SUB r13,#4 ; pop f2’s arg off stack LDR r13!,r15 ; restore register and return © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Summary Load/store architecture Most instructions are RISCy, operate in single cycle. Some multi-register operations take longer. All instructions can be executed conditionally. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. TI C55x instruction set C55x C55x C55x C55x C55x programming model. assembly language. memory organization. data operations. flow of control. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. TI C55x overview Accumulator architecture: acc = operand op acc. Very useful in loops for DSP. C55x assembly language: Label: MPY *AR0, *CDP+, AC0 MOV #1, T0 C55x algebraic assembly language: AC1 = AR0 * coef(*CDP) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Intrinsic functions Compiler support for assembly language. Intrinsic function maps directly onto an instruction. Example: int_sadd(arg1,arg2) Performs saturation arithmetic addition. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x data types Data types: Word: 16 bits. Longword: 32 bits. Instructions are byte-addressable. Some instructions operate on register bits. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x registers Terminology: Register: any type of register. Accumulator: acc = operand op ac. Most registers are memory-mapped. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x program counter and control flow registers PC is program counter. XPC is program counter extension. RETA is subroutine return address. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x accumulators and status registers Four 40-bit accumulators: AC0, AC1, AC2, and AC3. Low-order bits 0-15 are AC0L, etc. High-order bits 16-31 are AC0H, etc. Guard bits 32-39 are AC0G, etc. ST0, ST1, PMST, ST0_55, ST1_55, ST2_55, ST3_55 provide arithmetic/bit manipulation flags, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x stack pointers SP keeps track of user stack. SSP holds system stack pointer. SPH is extended data page pointer for both SP and SSP. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x auxiliary registers and circular buffer pointers AR0-AR7 are auxiliary instructions. CDP points to coefficients for polynomial evaluation instructions. CDPH is main data page pointer. BK47 is used for circular buffer operations along with AR4-7. BK03 addresses circular buffers. BKC is size register for CDP. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x block repeat registers BRC0 counts block repeat instructions. RSA0L and REA0L keep track of start and end points of blocks. BRC1 and BRS1 are used to repeat blocks of instructions. RSA0 and RSA1 are the start address registers for block repeats. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x addressing mode and interrupt registers DP and DPH set base address for data access. PDP determines base address for I/O. IER0 and IER1 are interrupt mask registers. IFR0 and IFR1 keep track of currently pending registers. DBIER0 and DBIER1 are for debugging. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x memory map 24-bit address space, 16 MB of memory. Data, program, I/O all mapped to same physical memory. Addressability: memory mappedpage registers main data 0 main data page 1 main data page 2 … Program space main data page 127 address is 24 bits. Data space is 23 bits. I/O address is 16 bits. Overheads for Computers as © 2008 Wayne Wolf Components 2nd ed. C55x addressing modes Three addressing modes: Absolute addressing supplies an address in an instruction. Direct addressing supplies an offset. Indirect addressing uses a register as a pointer. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x direct addressing DP addressing accesses data pages: ADP = DPH[22:15](DP+Doffset) SP addressing accesses stack values: ASP = SPH[22:15](SP+Soffset) Register-bit direcdt addressing accesses bits in registers. PDP addressing accesses data pages: APDP = PDP[15:6]PDPoffset © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x indirect addressing AR indirect addressing uses auxiliary register to point to data. Dual AR indirect addressing allows two simultaneous accesses. CDP indirect addressing uses CDP to access coefficients. Coefficient indirect addressing is similar to CDP indirect but for instructions with 3 memory operands per cycle. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x stack operations Two stacks: data and system. Three different stack configurations: Dual 16-bit stack with fast return has independent data and system stacks. Dual 16-bit stack with slow return, independent data and system stacks but RETA and CFCT are not used for slow returns. 32-bit stack with slow return, SP and SSP are both modified by the same amount. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x data operations MOV moves data between registers and memory: MOV src, dst Varieties of ADDs: ADD src,dst ADD dual(LMEM),ACx,ACy Multiplication: MPY src,dst MAC AC,TX,ACy © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x flow of control Unconditional branch: B ACx B label Conditional branch: BCC label, cond Loops: Single-instruction repeat Block repeat © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Efficient loops General rules: Don’t use function calls. Keep loop body small to enable local repeat (only forward branches). Use unsigned integer for loop counter. Use <= to test loop counter. Make use of compiler---global optimization, software pipelining. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Single-instruction repeat loop example STM #4000h,AR2 ; load pointer to source STM #100h,AR3 ; load pointer to destination RPT #(1024-1) MVDD *AR2+,*AR3+ ; move © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x subroutines Unconditional subroutine call: CALL target Conditional subroutine call: CALLCC adrs,cond Two types of return: Fast return gives return address and loop context in registers. Slow return puts return address/loop on Overheads for Computers as stack. © 2008 Wayne Wolf Components 2nd ed. C55x interrupts Handled using subroutine mechanism. Four step handling process: Receive interrupt. Acknowledge interrupt. Prepare for ISR by finishing current instruction, retrieving interrupt vector. Processing the interrupt service routine. 32 possible interrupt vectors, 27 priorities. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CPUs Input and output. Supervisor mode, exceptions, traps. Co-processors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. I/O devices CPU status reg data reg © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. mechanism Usually includes some non-digital component. Typical digital interface to CPU: Application: 8251 UART Universal asynchronous receiver transmitter (UART) : provides serial communication. 8251 functions are integrated into standard PC interface chip. Allows many communication parameters to be programmed. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Serial communication Characters are transmitted separately: no char start bit 0 bit 1 ... bit n-1 stop time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Serial communication parameters Baud (bit) rate. Number of bits per character. Parity/no parity. Even/odd parity. Length of stop bit (1, 1.5, 2 bits). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. 8251 CPU interface CPU status (8 bit) 8251 data (8 bit) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. xmit/ rcv serial port Programming I/O Two types of instructions can support I/O: special-purpose I/O instructions; memory-mapped load/store instructions. Intel x86 provides in, out instructions. Most other CPUs use memory-mapped I/O. I/O instructions do not preclude memorymapped I/O. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM memory-mapped I/O Define location for device: DEV1 EQU 0x1000 Read/write code: LDR LDR LDR STR r1,#DEV1 ; set up device adrs r0,[r1] ; read DEV1 r0,#8 ; set up value to write r0,[r1] ; write value to device © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Peek and poke Traditional HLL interfaces: int peek(char *location) { return *location; } void poke(char *location, char newval) { (*location) = newval; } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Busy/wait output Simplest way to program device. Use instructions to test when device is ready. current_char = mystring; while (*current_char != ‘\0’) { poke(OUT_CHAR,*current_char); while (peek(OUT_STATUS) != 0); current_char++; } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Simultaneous busy/wait input and output while (TRUE) { /* read */ while (peek(IN_STATUS) == 0); achar = (char)peek(IN_DATA); /* write */ poke(OUT_DATA,achar); poke(OUT_STATUS,1); while (peek(OUT_STATUS) != 0); } Overheads for Computers as © 2008 Wayne Wolf Components 2nd ed. Interrupt I/O Busy/wait is very inefficient. CPU can’t do other work while testing device. Hard to do simultaneous I/O. Interrupts allow a device to change the flow of control in the CPU. Causes subroutine call to handle device. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Interrupt interface intr ack data/address © 2008 Wayne Wolf status reg data reg Overheads for Computers as Components 2nd ed. mechanism CPU PC IR intr request Interrupt behavior Based on subroutine call mechanism. Interrupt forces next instruction to be a subroutine call to a predetermined location. Return address is saved to resume executing foreground program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Interrupt physical interface CPU and device are connected by CPU bus. CPU and device handshake: device asserts interrupt request; CPU asserts interrupt acknowledge when it can handle the interrupt. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: character I/O handlers void input_handler() { achar = peek(IN_DATA); gotchar = TRUE; poke(IN_STATUS,0); } void output_handler() { } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: interrupt-driven main program main() { while (TRUE) { if (gotchar) { poke(OUT_DATA,achar); poke(OUT_STATUS,1); gotchar = FALSE; } } } Overheads for Computers as © 2008 Wayne Wolf Components 2nd ed. Example: interrupt I/O with buffers Queue for characters: a head tail © 2008 Wayne Wolf tail Overheads for Computers as Components 2nd ed. Buffer-based input handler void input_handler() { char achar; if (full_buffer()) error = 1; else { achar = peek(IN_DATA); add_char(achar); } poke(IN_STATUS,0); if (nchars == 1) { poke(OUT_DATA,remove_char(); poke(OUT_STATUS,1); } } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. I/O sequence diagram :foreground :input :output :queue empty a empty b bc c © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Debugging interrupt code What if you forget to change registers? Foreground program can exhibit mysterious bugs. Bugs will be hard to repeat---depend on interrupt timing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Priorities and vectors Two mechanisms allow us to make interrupts more specific: Priorities determine what interrupt gets CPU first. Vectors determine what code is called for each type of interrupt. Mechanisms are orthogonal: most CPUs provide both. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Prioritized interrupts device 1 device 2 interrupt acknowledge L1 L2 .. Ln CPU © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. device n Interrupt prioritization Masking: interrupt with priority lower than current priority is not recognized until pending interrupt is complete. Non-maskable interrupt (NMI): highestpriority, never masked. Often used for power-down. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: Prioritized I/O :interrupts :foreground :A B C A A,B © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. :B :C Interrupt vectors Allow different devices to be handled by different code. Interrupt vector table: Interrupt vector table head handler 0 handler 1 handler 2 handler 3 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Interrupt vector acquisition :CPU :device receive request receive ack receive vector © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Generic interrupt mechanism continue execution N N ignore intr? Y intr priority > current priority? Y ack Y bus error Y timeout? N vector? Y call table[vector] © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Assume priority selection is handled before this point. Interrupt sequence CPU acknowledges request. Device sends vector. CPU calls handler. Software processes request. CPU restores state to foreground program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Sources of interrupt overhead Handler execution time. Interrupt mechanism overhead. Register save/restore. Pipeline-related penalties. Cache-related penalties. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM interrupts ARM7 supports two types of interrupts: Fast interrupt requests (FIQs). Interrupt requests (IRQs). Interrupt table starts at location 0. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM interrupt procedure CPU actions: Save PC. Copy CPSR to SPSR. Force bits in CPSR to record interrupt. Force PC to vector. Handler responsibilities: Restore proper PC. Restore CPSR from SPSR. Clear interrupt disable flags. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM interrupt latency Worst-case latency to respond to interrupt is 27 cycles: Two cycles to synchronize external request. Up to 20 cycles to complete current instruction. Three cycles for data abort. Two cycles to enter interrupt handling state. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x interrupts Latency is between 7 and 13 cycles. Maskable interrupt sequence: Interrupt flag register is set. Interrupt enable register is checked. Interrupt mask register is checked. Interrupt flag register is cleared. Appropriate registers are saved. INTM set to 1, DBGM set to 1, EALLOW set to 0. Branch to ISR. Two styles of return: fast and slow. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Supervisor mode May want to provide protective barriers between programs. Avoid memory corruption. Need supervisor mode to manage the various programs. C55x does not have a supervisor mode. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM supervisor mode Use SWI instruction to enter supervisor mode, similar to subroutine: SWI CODE_1 Sets PC to 0x08. Argument to SWI is passed to supervisor mode code. Saves CPSR in SPSR. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Exception Exception: internally detected error. Exceptions are synchronous with instructions but unpredictable. Build exception mechanism on top of interrupt mechanism. Exceptions are usually prioritized and vectorized. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Trap Trap (software interrupt): an exception generated by an instruction. Call supervisor mode. ARM uses SWI instruction for traps. SHARC offers three levels of software interrupts. Called by setting bits in IRPTL register. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Co-processor Co-processor: added function unit that is called by instruction. Floating-point units are often structured as co-processors. ARM allows up to 16 designer-selected coprocessors. Floating-point co-processor uses units 1, 2. C55x uses co-processors as well. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x image/video hardware extensions Available in 5509 and 5510. Equivalent C-callable functions for other devices. Available extensions: DCT/IDCT. Pixel interpolation Motion estimation. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. DCT/IDCT 2-D DCT/IDCT is computed from two 1-D DCT/IDCT. Put data in different banks to maximize throughput. © 2008 Wayne Wolf block Column DCT interim Overheads for Computers as Components 2nd ed. Row DCT DCT C55 DCT/IDCT coprocessor extensions Load, compute, transfer to accumulators: ACy=copr(k8,ACx,Xmem,Ymem) Compute, transfer, mem write: ACy=copr(k8,ACx,ACy), Lmem=ACz Special: ACy=copr(k8,ACx,ACy) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Software pipelined load/compute/store for DCT Iteration i-1 Iteration i Iteration i+1 Dual_load Dual_load Dual_load 4 empty 4 empty 4 empty 3 Dual_load 3 Dual_load 3 Dual_load 8 compute 8 compute 8 compute empty empty empty 4 Long_store © 2008 Wayne Wolf 4 4 Long_store Long_store Overheads for Computers as Components 2nd ed. op_i(0), load_i+1(0,1) op_i(1), store_i-1(0,1) op_i(2), store_i-1(2,3) op_i(2), store_i-1(4,5) op_i(2), store_i-1(6,7) op_i(2), load_i+1(2,3) … C55 motion estimation Search strategy: Full vs. non-full. Accuracy: Full-pixel vs. half-pixel. Number of returned motion vectors: 1 (one 16x16) vs. 4 (four 8x8). Algorithms: 3-step algorithm (distance 4,2,1). 4-step algorithm (distance 8,4,2,1). 4-step with half-pixel refinement. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Four-step motion estimation breakdown d = {8,4,2,1}; for (i=0; i<4; i++) { compute 3 upper differences for d[i]; compute 3 middle differences for d[i]; compute 3 lower differences for d[i]; compute minimum value; move to next d; } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. X C55 motion estimation accelerator Includes 3 16-bit pixel data paths, 3 16-bit absolute differences (ADs). Basic operation: [ACx,ACy] = copr(k8,ACx,ACy,Xmem,Ymem,Coeff) K8 = control bits (enable AD units, etc.) ACx, ACy = accumulated absolute differences Xmem, Ymem = pointers to odd, even lines of the search window Pointer to two adjacent pixels from reference window © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55 pixel interpolation Given four pixels A, B, C, D, interpolate three half-pixels: A C © 2008 Wayne Wolf U B M R D Overheads for Computers as Components 2nd ed. Pixel interpolation coprocessor operations Load pixels and compute: ACy=copr(k8,AC,Lmem) Load pixels, compute, and store: ACy=copr(k8,AACx,Lmem) || Lmem=ACz © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CPUs Caches. Memory management. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Caches and CPUs CPU data © 2008 Wayne Wolf data cache controller address cache address data Overheads for Computers as Components 2nd ed. main memory Cache operation Many main memory locations are mapped onto one cache entry. May have caches for: instructions; data; data + instructions (unified). Memory access time is no longer deterministic. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Terms Cache hit: required location is in cache. Cache miss: required location is not in cache. Working set: set of locations used by program in a time interval. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Types of misses Compulsory (cold): location has never been accessed. Capacity: working set is too large. Conflict: multiple locations in working set map to same cache entry. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Memory system performance h = cache hit rate. tcache = cache access time, tmain = main memory access time. Average memory access time: tav = htcache + (1-h)tmain © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Multiple levels of cache CPU © 2008 Wayne Wolf L1 cache Overheads for Computers as Components 2nd ed. L2 cache Multi-level cache access time h1 = cache hit rate. h2 = hit rate on L2. Average memory access time: tav = h1tL1 + (h2-h1)tL2 + (1- h2-h1)tmain © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Replacement policies Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location. Two popular strategies: Random. Least-recently used (LRU). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Cache organizations Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented). Direct-mapped: each memory location maps onto exactly one cache entry. N-way set-associative: each memory location can go into one of n sets. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Cache performance benefits Keep frequently-accessed locations in fast cache. Cache retrieves more than one word at a time. Sequential accesses are faster after first access. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Direct-mapped cache 1 valid 0xabcd tag byte byte byte ... data cache block tag index offset = hit © 2008 Wayne Wolf value byte Overheads for Computers as Components 2nd ed. Write operations Write-through: immediately copy write to main memory. Write-back: write to main memory only when location is removed from cache. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Direct-mapped cache locations Many locations map onto the same cache block. Conflict misses are easy to generate: Array a[] uses locations 0, 1, 2, … Array b[] uses locations 1024, 1025, 1026, … Operation a[i] + b[i] generates conflict misses. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Set-associative cache A set of direct-mapped caches: Set 1 Set 2 hit © 2008 Wayne Wolf ... data Overheads for Computers as Components 2nd ed. Set n Example: direct-mapped vs. set-associative address 000 001 010 011 100 101 110 111 © 2008 Wayne Wolf data 0101 1111 0000 0110 1000 0001 1010 0100 Overheads for Computers as Components 2nd ed. Direct-mapped cache behavior After 001 access: block 00 01 10 11 © 2008 Wayne Wolf tag 0 - data 1111 - After 010 access: block 00 01 10 11 Overheads for Computers as Components 2nd ed. tag 0 0 - data 1111 0000 - Direct-mapped cache behavior, cont’d. After 011 access: block 00 01 10 11 © 2008 Wayne Wolf tag 0 0 0 data 1111 0000 0110 After 100 access: block 00 01 10 11 Overheads for Computers as Components 2nd ed. tag 1 0 0 0 data 1000 1111 0000 0110 Direct-mapped cache behavior, cont’d. After 101 access: block 00 01 10 11 © 2008 Wayne Wolf tag 1 1 0 0 data 1000 0001 0000 0110 After 111 access: block 00 01 10 11 Overheads for Computers as Components 2nd ed. tag 1 1 0 1 data 1000 0001 0000 0100 2-way set-associtive cache behavior Final state of cache (twice as big as direct-mapped): set blk 0 tag 00 1 01 0 10 0 11 0 © 2008 Wayne Wolf blk 0 data 1000 1111 0000 0110 blk 1 tag 1 1 Overheads for Computers as Components 2nd ed. blk 1 data 0001 0100 2-way set-associative cache behavior Final state of cache (same size as directmapped): set blk 0 tag 0 01 1 10 © 2008 Wayne Wolf blk 0 data 0000 0111 blk 1 tag 10 11 Overheads for Computers as Components 2nd ed. blk 1 data 1000 0100 Example caches StrongARM: 16 Kbyte, 32-way, 32-byte block instruction cache. 16 Kbyte, 32-way, 32-byte block data cache (write-back). C55x: Various models have 16KB, 24KB cache. Can be used as scratch pad memory. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Scratch pad memories Alternative to cache: Software determines what is stored in scratch pad. Provides predictable behavior at the cost of software control. C55x cache can be configured as scratch pad. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Memory management units Memory management unit (MMU) translates addresses: logical address CPU © 2008 Wayne Wolf memory management unit physical address Overheads for Computers as Components 2nd ed. main memory Memory management tasks Allows programs to move in physical memory during execution. Allows virtual memory: memory images kept in secondary storage; images returned to main memory on demand during execution. Page fault: request for location not resident in memory. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Address translation Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses. Two basic schemes: segmented; paged. Segmentation and paging can be combined (x86). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Segments and pages page 1 page 2 segment 1 memory segment 2 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Segment address translation segment base address logical address + segment lower bound segment upper bound range check physical address © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. range error Page address translation page offset page i base concatenate page © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. offset Page table organizations page descriptor page descriptor flat © 2008 Wayne Wolf tree Overheads for Computers as Components 2nd ed. Caching address translations Large translation tables require main memory access. TLB: cache for address translation. Typically small. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM memory management Memory region types: section: 1 Mbyte block; large page: 64 kbytes; small page: 4 kbytes. An address is marked as section-mapped or page-mapped. Two-level translation scheme. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM address translation Translation table base register descriptor 1st level table 1st index 2nd index offset concatenate concatenate descriptor 2nd level table © 2008 Wayne Wolf physical address Overheads for Computers as Components 2nd ed. CPUs CPU performance CPU power consumption. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Elements of CPU performance Cycle time. CPU pipeline. Memory system. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Pipelining Several instructions are executed simultaneously at different stages of completion. Various conditions can cause pipeline bubbles that reduce utilization: branches; memory system delays; etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Performance measures Latency: time it takes for an instruction to get through the pipeline. Throughput: number of instructions executed per time period. Pipelining increases throughput without reducing latency. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM7 pipeline ARM 7 has 3-stage pipe: fetch instruction from memory; decode opcode and operands; execute. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM pipeline execution fetch sub r2,r3,r6 execute fetch decode execute fetch decode cmp r2,#3 1 © 2008 Wayne Wolf add r0,r1,#5 decode 2 3 Overheads for Computers as Components 2nd ed. time execute Pipeline stalls If every step cannot be completed in the same amount of time, pipeline stalls. Bubbles introduced by stall increase latency, reduce throughput. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM multi-cycle LDMIA instruction ldmia fetch decodeex ld r2ex ld r3 r0,{r2,r3} sub r2,r3,r6 cmp r2,#3 fetch decode ex sub fetch decodeex cmp time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Control stalls Branches often introduce stalls (branch penalty). Stall time may depend on whether branch is taken. May have to squash instructions that already started executing. Don’t know what to fetch until condition is evaluated. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM pipelined branch bne foo sub r2,r3,r6 foo add r0,r1,r2 fetch decode ex bne ex bne ex bne fetch decode fetch decode ex add time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Delayed branch To increase pipeline efficiency, delayed branch mechanism requires n instructions after branch always executed whether branch is executed or not. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: ARM execution time Determine execution time of FIR filter: for (i=0; i<N; i++) f = f + c[i]*x[i]; Only branch in loop test may take more than one cycle. BLT loop takes 1 cycle best case, 3 worst case. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. FIR filter ARM code ; loop initiation code MOV r0,#0 ; use r0 for i, set to 0 MOV r8,#0 ; use a separate index for arrays ADR r2,N ; get address for N LDR r1,[r2] ; get value of N MOV r2,#0 ; use r2 for f, set to 0 ADR r3,c ; load r3 with address of base of c ADR r5,x ; load r5 with address of base of x © 2008 Wayne Wolf ; loop body loop LDR r4,[r3,r8] ; get value of c[i] LDR r6,[r5,r8] ; get value of x[i] MUL r4,r4,r6 ; compute c[i]*x[i] ADD r2,r2,r4 ; add into running sum ; update loop counter and array index ADD r8,r8,#4 ; add one to array index ADD r0,r0,#1 ; add 1 to i ; test for exit CMP r0,r1 BLT loop if i < N, continue loop loopend ... Overheads for Computers as Components 2nd ed. ; FIR filter performance by block Block Variable # instructions # cycles Initialization tinit 7 7 Body tbody 4 4 Update tupdate 2 2 Test ttest 2 [2,4] tloop = tinit+ N(tbody + tupdate) + (N-1) ttest,worst + ttest,best Loop test succeeds is worst case Loop test fails is best case © 2000 Morgan Kaufman Overheads for Computers as Components C55x pipeline C55x has 7-stage pipe: fetch; decode; address: computes data/branch addresses; access 1: reads data; access 2: finishes data read; Read stage: puts operands on internal busses; execute. Overheads for Computers as © 2008 Wayne Wolf Components 2nd ed. C55x organization B C, D busses D bus 16 3 data read address busses 24 program address bus 24 3 data read busses program read bus 32 Instruction unit Dual Dual-multiply operand Instruction Data read Single Writes operand read coefficient fetch from memory 2 data write busses 2 data write address busses © 2008 Wayne Wolf Program flow unit 16 24 Overheads for Computers as Components 2nd ed. Address unit Data unit C55x pipeline hazards Processor structure: Three computation units. 14 operators. Can perform two operations per instruction. Some combinations of operators are not legal. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x hazards A-unit ALU/A-unit ALU. A-unit swap/A-unit swap. D-unit ALU,shifter,MAC/D-unit ALU,shifter,MAC D-unit shifter/D-unit shift, store D-unit shift, store/D-unit shift, store D-unit swap/D-unit swap P-unit control/P-unit control © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Memory system performance Caches introduce indeterminacy in execution time. Depends on order of execution. Cache miss penalty: added time due to a cache miss. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Types of cache misses Compulsory miss: location has not been referenced before. Conflict miss: two locations are fighting for the same block. Capacity miss: working set is too large. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CPU power consumption Most modern CPUs are designed with power consumption in mind to some degree. Power vs. energy: heat depends on power consumption; battery life depends on energy consumption. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CMOS power consumption Voltage drops: power consumption proportional to V2. Toggling: more activity means more power. Leakage: basic circuit characteristics; can be eliminated by disconnecting power. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CPU power-saving strategies Reduce power supply voltage. Run at lower clock frequency. Disable function units with control signals when not in use. Disconnect parts from power supply when not in use. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C55x low power features Parallel execution units---longer idle shutdown times. Multiple data widths: 16-bit ALU vs. 40-bit ALU. Instruction caches minimizes main memory accesses. Power management: Function unit idle detection. Memory idle detection. User-configurable IDLE domains allow programmer control of what hardware is shut down. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Power management styles Static power management: does not depend on CPU activity. Example: user-activated power-down mode. Dynamic power management: based on CPU activity. Example: disabling off function units. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Application: PowerPC 603 energy features Provides doze, nap, sleep modes. Dynamic power management features: Uses static logic. Can shut down unused execution units. Cache organized into subarrays to minimize amount of active circuitry. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. PowerPC 603 activity Percentage of time units are idle for SPEC integer/floating-point: unit D cache I cache load/store fixed-point floating-point system register © 2008 Wayne Wolf Specint92 29% 29% 35% 38% 99% 89% Overheads for Computers as Components 2nd ed. Specfp92 28% 17% 17% 76% 30% 97% Power-down costs Going into a power-down mode costs: time; energy. Must determine if going into mode is worthwhile. Can model CPU power states with power state machine. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Application: StrongARM SA-1100 power saving Processor takes two supplies: VDD is main 3.3V supply. VDDX is 1.5V. Three power modes: Run: normal operation. Idle: stops CPU clock, with logic still powered. Sleep: shuts off most of chip activity; 3 steps, each about 30 ms; wakeup takes > 10 ms. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. SA-1100 power state machine Prun = 400 mW run 10 ms 160 ms 90 ms 10 ms idle Pidle = 50 mW © 2008 Wayne Wolf 90 ms sleep Psleep = 0.16 mW Overheads for Computers as Components 2nd ed. CPUs Example: data compressor. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Goals Compress data transmitted over serial line. Receives byte-size input symbols. Produces output symbols packed into bytes. Will build software module only here. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Collaboration diagram for compressor :input 1..m: packed 1..n: input output symbols symbols :data compressor :output © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Huffman coding Early statistical text compression algorithm. Select non-uniform size codes. Use shorter codes for more common symbols. Use longer codes for less common symbols. To allow decoding, codes must have unique prefixes. No code can be a prefix of a longer valid code. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Huffman example character a b c d e f v P .45 .24 .11 .08 .07 .05 P=1 P=.55 P=.31 P=.19 P=.12 Overheads for Computers as Components 2nd ed. Example Huffman code Read code from root to leaves: a 1 b 01 c 0000 d 0001 e 0010 f 0011 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Huffman coder requirements table name data compression module purpose functions code module for Huffman compression encoding table, uncoded byte-size inputs packed compression output symbols Huffman coding performance fast manufacturing cost N/A power N/A physical size/weight N/A inputs outputs © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Building a specification Collaboration diagram shows only steadystate input/output. A real system must: Accept an encoding table. Allow a system reset that flushes the compression buffer. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. data-compressor class data-compressor buffer: data-buffer table: symbol-table current-bit: integer encode(): boolean, data-buffer flush() new-symbol-table() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. data-compressor behaviors encode: Takes one-byte input, generates packed encoded symbols and a Boolean indicating whether the buffer is full. new-symbol-table: installs new symbol table in object, throws away old table. flush: returns current state of buffer, including number of valid bits in buffer. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Auxiliary classes data-buffer symbol-table databuf[databuflen] : character len : integer symbols[nsymbols] : data-buffer len : integer insert() length() : integer value() : symbol load() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Auxiliary class roles data-buffer holds both packed and unpacked symbols. Longest Huffman code for 8-bit inputs is 256 bits. symbol-table indexes encoded verison of each symbol. load() puts data in a new symbol table. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Class relationships 1 data-compressor 1 1 data-buffer © 2008 Wayne Wolf 1 symbol-table Overheads for Computers as Components 2nd ed. Encode behavior input symbol T encode return true buffer filled? F © 2008 Wayne Wolf create new buffer add to buffers add to buffer Overheads for Computers as Components 2nd ed. return false Insert behavior input symbol T pack into this buffer fills buffer? F © 2008 Wayne Wolf pack bottom bits into this buffer, top bits into overflow buffer Overheads for Computers as Components 2nd ed. update length Program design In an object-oriented language, we can reflect the UML specification in the code more directly. In a non-object-oriented language, we must either: add code to provide object-oriented features; diverge from the specification structure. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C++ classes Class data_buffer { char databuf[databuflen]; int len; int length_in_chars() { return len/bitsperbyte; } public: void insert(data_buffer,data_buffer&); int length() { return len; } int length_in_bytes() { return (int)ceil(len/8.0); } int initialize(); ... © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C++ classes, cont’d. class data_compressor { data_buffer buffer; int current_bit; symbol_table table; public: boolean encode(char,data_buffer&); void new_symbol_table(symbol_table); int flush(data_buffer&); data_compressor(); ~data_compressor(); } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C code struct data_compressor_struct { data_buffer buffer; int current_bit; sym_table table; } typedef struct data_compressor_struct data_compressor, *data_compressor_ptr; boolean data_compressor_encode(data_compressor_ptr mycmptrs, char isymbol, data_buffer *fullbuf) ... © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Testing Test by encoding, then decoding: symbol table input symbols encoder decoder compare © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. result Code inspection tests Look at the code for potential problems: Can we run past end of symbol table? What happens when the next symbol does not fill the buffer? Does fill it? Do very long encoded symbols work properly? Very short symbols? Does flush() work properly? © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus-Based Computer Systems Busses. Memory devices. I/O devices: serial links timers and counters keyboards displays analog I/O © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. The CPU bus Bus allows CPU, memory, devices to communicate. Shared communication medium. A bus is: A set of wires. A communications protocol. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus protocols Bus protocol determines how devices communicate. Devices on the bus go through sequences of states. Protocols are specified by state machines, one state machine per actor in the protocol. May contain asynchronous logic behavior. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Four-cycle handshake device 1 enq device 1 device 2 ack device 2 1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. 2 3 4 time Four-cycle handshake, cont’d. 1. 2. 3. 4. Device Device Device Device © 2008 Wayne Wolf 1 2 2 1 raises enq. responds with ack. lowers ack once it has finished. lowers enq. Overheads for Computers as Components 2nd ed. Microprocessor busses Clock provides synchronization. R/W is true when reading (R/W’ is false when reading). Address is a-bit bundle of address lines. Data is n-bit bundle of data lines. Data ready signals when n-bit data is ready. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Timing diagrams © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus read © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. State diagrams for bus read Get data Send data Done See ack Ack Adrs Wait CPU © 2008 Wayne Wolf Release ack Adrs Wait start Overheads for Computers as Components 2nd ed. device Bus wait state © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus burst read © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus multiplexing data enable device data CPU adrs adrs Adrs enable © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. DMA Direct memory access (DMA) performs data transfers without executing instructions. CPU sets up transfer. DMA engine fetches, writes. DMA controller is a separate unit. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus mastership By default, CPU is bus master and initiates transfers. DMA must become bus master to perform its work. CPU can’t use bus while DMA operates. Bus mastership protocol: Bus request. Bus grant. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. DMA operation CPU sets DMA registers for start address, length. DMA status register controls the unit. Once DMA is bus master, it transfers automatically. May run continuously until complete. May use every nth bus cycle. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus transfer sequence diagram © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System bus configurations Multiple busses allow parallelism: CPU A bridge connects two busses. © 2008 Wayne Wolf bridge Slow devices on one bus. memory Fast devices on separate bus. slow device slow device high-speed device Overheads for Computers as Components 2nd ed. Bridge state diagram © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM AMBA bus Two varieties: AHB is high-performance. APB is lower-speed, lower cost. AHB supports pipelining, burst transfers, split transactions, multiple bus masters. All devices are slaves on APB. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Memory components Several different types of memory: DRAM. SRAM. Flash. Each type of memory comes in varying: Capacities. Widths. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Random-access memory Dynamic RAM is dense, requires refresh. Synchronous DRAM is dominant type. SDRAM uses clock to improve performance, pipeline memory accesses. Static RAM is faster, less dense, consumes more power. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. SDRAM operation © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Read-only memory ROM may be programmed at factory. Flash is dominant form of fieldprogrammable ROM. Electrically erasable, must be block erased. Random access, but write/erase is much slower than read. NOR flash is more flexible. NAND flash is more dense. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Flash memory Non-volatile memory. Flash can be programmed in-circuit. Random access for read. To write: Erase a block to 1. Write bits to 0. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Flash writing Write is much slower than read. 1.6 ms write, 70 ns read. Blocks are large (approx. 1 Mb). Writing causes wear that eventually destroys the device. Modern lifetime approx. 1 million writes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Types of flash NOR: Word-accessible read. Erase by blocks. NAND: Read by pages (512-4K bytes). Erase by blocks. NAND is cheaper, has faster erase, sequential access times. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Timers and counters Very similar: a timer is incremented by a periodic signal; a counter is incremented by an asynchronous, occasional signal. Rollover causes interrupt. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Watchdog timer Watchdog timer is periodically reset by system timer. If watchdog is not reset, it generates an interrupt to reset the host. interrupt host CPU © 2008 Wayne Wolf reset watchdog timer Overheads for Computers as Components 2nd ed. Switch debouncing A switch must be debounced to multiple contacts caused by eliminate mechanical bouncing: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Encoded keyboard An array of switches is read by an encoder. N-key rollover remembers multiple key depressions. row © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. LED Must use resistor to limit current: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. 7-segment LCD display May use parallel or multiplexed input. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Types of high-resolution display Liquid crystal display (LCD) is dominant form. Plasma, OLED, etc. Frame buffer holds current display contents. Written by processor. Read by video. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Touchscreen Includes input and output device. Input device is a two-dimensional voltmeter: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Touchscreen position sensing ADC voltage © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Digital-to-analog conversion Use resistor tree: R bn bn-1 bn-2 Vout 2R 4R 8R bn-3 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Flash A/D conversion N-bit result requires 2n comparators: Vin encoder ... © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Dual-slope conversion Use counter to time required to charge/discharge capacitor. Charging, then discharging eliminates non-linearities. Vin © 2008 Wayne Wolf timer Overheads for Computers as Components 2nd ed. Sample-and-hold Samples data: Vin © 2008 Wayne Wolf converter Overheads for Computers as Components 2nd ed. Bus-Based Computer Systems Designing with microprocessors. Development and debugging. System-level performance analysis. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System architectures Architectures and components: software; hardware. Some software is very hardwaredependent. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Hardware platform architecture Contains several elements: CPU; bus; memory; I/O devices: networking, sensors, actuators, etc. How big/fast much each one be? © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Software architecture Functional description must be broken into pieces: division among people; conceptual organization; performance; testability; maintenance. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Hardware and software architectures Hardware and software are intimately related: software doesn’t run without hardware; how much hardware you need is determined by the software requirements: speed; memory. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Evaluation boards Designed by CPU manufacturer or others. Includes CPU, memory, some I/O devices. May include prototyping section. CPU manufacturer often gives out evaluation board netlist---can be used as starting point for your custom board design. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Adding logic to a board Programmable logic devices (PLDs) provide low/medium density logic. Field-programmable gate arrays (FPGAs) provide more logic and multi-level logic. Application-specific integrated circuits (ASICs) are manufactured for a single purpose. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. The PC as a platform Advantages: cheap and easy to get; rich and familiar software environment. Disadvantages: requires a lot of hardware resources; not well-adapted to real-time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Typical PC hardware platform CPU memory CPU bus intr ctrl DMA controller bus interface device high-speed bus timers bus interface low-speed bus device © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Typical busses PCI: standard for high-speed interfacing 33 or 66 MHz. PCI Express. USB (Universal Serial Bus), Firewire (IEEE 1394): relatively low-cost serial interface with high speed. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Software elements IBM PC uses BIOS (Basic I/O System) to implement low-level functions: boot-up; minimal device drivers. BIOS has become a generic term for the lowest-level system software. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: StrongARM StrongARM system includes: CPU chip (3.686 MHz clock) system control module (32.768 kHz clock). • • • • • • © 2008 Wayne Wolf Real-time clock; operating system timer general-purpose I/O; interrupt controller; power manager controller; reset controller. Overheads for Computers as Components 2nd ed. Debugging embedded systems Challenges: target system may be hard to observe; target may be hard to control; may be hard to generate realistic inputs; setup sequence may be complex. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Host/target design Use a host system to prepare software for target system: target system host system © 2008 Wayne Wolf serial line Overheads for Computers as Components 2nd ed. Host-based tools Cross compiler: compiles code on host for target system. Cross debugger: displays target state, allows target system to be controlled. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Software debuggers A monitor program residing on the target provides basic debugger functions. Debugger should have a minimal footprint in memory. User program must be careful not to destroy debugger program, but , should be able to recover from some damage caused by user code. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Breakpoints A breakpoint allows the user to stop execution, examine system state, and change state. Replace the breakpointed instruction with a subroutine call to the monitor program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ARM breakpoints 0x400 0x404 0x408 0x40c MUL r4,r6,r6 ADD r2,r2,r4 ADD r0,r0,#1 B loop uninstrumented code © 2008 Wayne Wolf 0x400 0x404 0x408 0x40c MUL r4,r6,r6 ADD r2,r2,r4 ADD r0,r0,#1 BL bkpoint code with breakpoint Overheads for Computers as Components 2nd ed. Breakpoint handler actions Save registers. Allow user to examine machine. Before returning, restore system state. Safest way to execute the instruction is to replace it and execute in place. Put another breakpoint after the replaced breakpoint to allow restoring the original breakpoint. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. In-circuit emulators A microprocessor in-circuit emulator is a specially-instrumented microprocessor. Allows you to stop execution, examine CPU state, modify registers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Logic analyzers A logic analyzer is an array of low-grade oscilloscopes: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Logic analyzer architecture sample memory UUT system clock microprocessor vector address controller clock gen state or timing mode © 2008 Wayne Wolf keypad Overheads for Computers as Components 2nd ed. display Boundary scan Simplifies testing of multiple chips on a board. Registers on pins can be configured as a scan chain. Used for debuggers, in-circuit emulators. © 2008 Wayne Wolf Overheads for Computers as Components How to exercise code Run on host system. Run on target system. Run in instruction-level simulator. Run on cycle-accurate simulator. Run in hardware/software co-simulation environment. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Debugging real-time code Bugs in drivers can cause nondeterministic behavior in the foreground problem. Bugs may be timing-dependent. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System-level performance analysis Performance depends on all the elements of the system: CPU. Cache. Bus. Main memory. I/O device. © 2008 Wayne Wolf memory CPU cache Overheads for Computers as Components 2nd ed. Bandwidth as performance Bandwidth applies to several components: Memory. Bus. CPU fetches. Different parts of the system run at different clock rates. Different components may have different widths (bus, memory). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bandwidth and data transfers Video frame: 320 x 240 x 3 = 230,400 bytes. Transfer in 1/30 sec. Transfer 1 byte/msec, 0.23 sec per frame. Too slow. Increase bandwidth: Increase bus width. Increase bus clock rate. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus bandwidth T: # bus cycles. P: time/bus cycle. Total time for transfer: O1 D O2 W t = TP. D: data payload length. O1 + O2 = overhead O. © 2008 Wayne Wolf Tbasic(N) = (D+O)N/W Overheads for Computers as Components 2nd ed. Bus burst transfer bandwidth T: # bus cycles. P: time/bus cycle. Total time for transfer: 1 2 B … O W t = TP. D: data payload length. O1 + O2 = overhead O. © 2008 Wayne Wolf Tburst(N) = (BD+O)N/(BW) Overheads for Computers as Components 2nd ed. Memory aspect ratios 16 M 64 M 8M 1 © 2008 Wayne Wolf 4 Overheads for Computers as Components 2nd ed. 8 Memory access times Memory component access times comes from chip data sheet. Page modes allow faster access for successive transfers on same page. If data doesn’t fit naturally into physical words: A = [(E/w)mod W]+1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus performance bottlenecks Transfer 320 x 240 video frame @ 30 frames/sec = 612,000 bytes/sec. Is performance bottleneck bus or memory? © 2008 Wayne Wolf memory Overheads for Computers as Components 2nd ed. CPU Bus performance bottlenecks, cont’d. Bus: assume 1 MHz bus, D=1, O=3: Tbasic = (1+3)612,000/2 = 1,224,000 cycles = 1.224 sec. Memory: try burst mode B=4, width w=0.5. Tmem = (4*1+4)612,000/(4*0.5) = 2,448,000 cycles = 0.2448 sec. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Performance spreadsheet bus clock period W D O N T_basic t © 2000 Morgan Kaufman 1.00E-06 2 1 3 612000 1224000 1.22E+00 memory clock period W D O B N 1.00E-08 0.5 1 4 4 612000 T_mem t 2448000 2.45E-02 Overheads for Computers as Components Parallelism Speed things up by running several units at once. DMA provides parallelism if CPU doesn’t need the bus: DMA + bus. CPU. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Bus-Based Computer Systems Example: alarm clock © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Alarm clock interface Alarm on Alarm off buzzer PM Alarm ready light set time © 2008 Wayne Wolf set alarm hour minute Overheads for Computers as Components 2nd ed. button Operations Set time: hold set time, depress hour, minute. Set alarm time: hold set alarm, depress hour, minute. Turn alarm on/off: depress alarm on/off. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Alarm clock requirements name purpose inputs outputs functions alarm clock 24-hour digital clock with one alarm set time, set alarm, hour, minute, alarm on/off four-digit display, PM indicator, alarm ready, buzzer keep time, set time, set alarm, turn alarm on/off, activate buzzer by alarm performance hours and digits, no seconds; not high precision manufacturing consumer product cost power AC physical fits on stand size/weight © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Alarm clock class diagram 1 Lights* 1 Display 1 1 1 1 Buttons* Speaker* © 2008 Wayne Wolf 1 Overheads for Computers as Components Mechanism 1 Alarm clock physical classes Lights* Buttons* digit-val() digit-scan() alarm-on-light() PM-light() set-time(): boolean set-alarm(): boolean alarm-on(): boolean alarm-off(): boolean minute(): boolean hour(): boolean © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Speaker* buzz() Display class Display time[4]: integer alarm-indicator: boolean PM-indicator: boolean set-time() alarm-light-on() alarm-light-off() PM-light-on() PM-light-off() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Mechanism class Mechanism Seconds: integer PM: boolean tens-hours, ones-hours: boolean tens-minutes, ones-minutes: boolean alarm-ready: boolean alarm-tens-hours, alarm-ones-hours: boolean alarm-tens-minutes, alarm-ones-minutes: boolean scan-keyboard() update-time() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Update-time behavior update seconds with rollover Rollover? display.set-time(current time) F Time >= alarm and alarm-on? T update hh:mm with rollover AM->PM PM=true © 2008 Wayne Wolf T alarm.buzzer(true) PM->AM PM=false Overheads for Computers as Components 2nd ed. F Scan-keyboard behavior compute button activations alarm-ready= true alarm-ready= false alarm.buzzer(false) © 2008 Wayne Wolf Alarm-on Alarm-off save button states Overheads for Computers as Components 2nd ed. Set-time and not set-alarm and hours Increment time tens w. rollover and AM/PM Increment time ones w. rollover and AM/PM Set-time and not set-alarm and minutes System architecture Includes: periodic behavior (clock); aperiodic behavior (buttons, buzzer activation). Two major software components: interrupt-driven routine updates time; foreground program deals with buttons, commands. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Interrupt-driven routine Timer probably can’t handle one-minute interrupt interval. Use software variable to convert interrupt frequency to seconds. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Foreground program Operates as while loop: while (TRUE) { read_buttons(button_values); process_command(button_values); check_alarm(); } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Testing Component testing: test interrupt code on the platform; can test foreground program using a mockup. System testing: relatively few components to integrate; check clock accuracy; check recognition of buttons, buzzer, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Program design and analysis Software components. Representations of programs. Assembly and linking. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Software state machine State machine keeps internal state as a variable, changes state based on inputs. Uses: control-dominated code; reactive systems. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. State machine example no seat/no seat/ buzzer off idle seat/timer on no seat/buzzer belt/ buzzer off © 2008 Wayne Wolf seated Belt/buzzer on belt/belted no belt/timer on Overheads for Computers as Components 2nd ed. no belt and no timer/- C implementation #define IDLE 0 #define SEATED 1 #define BELTED 2 #define BUZZER 3 switch (state) { case IDLE: if (seat) { state = SEATED; timer_on = TRUE; } break; case SEATED: if (belt) state = BELTED; else if (timer) state = BUZZER; break; … } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Signal processing and circular buffer Commonly used in signal processing: new data constantly arrives; each datum has a limited lifetime. time time t+1 d1 d2 d3 d4 d5 d6 d7 Use a circular buffer to hold the data stream. Overheads for Computers as © 2008 Wayne Wolf Components 2nd ed. Circular buffer x1 x2 x3 t1 x4 t2 x5 t3 Data stream x1 x5 x2 x6 x3 x7 x4 Circular buffer © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. x6 Circular buffers Indexes locate currently used data, current input data: input use d1 use d5 d2 input d2 d3 d3 d4 d4 time t1+1 time t1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Circular buffer implementation: FIR filter int circ_buffer[N], circ_buffer_head = 0; int c[N]; /* coefficients */ … int ibuf, ic; for (f=0, ibuff=circ_buff_head, ic=0; ic<N; ibuff=(ibuff==N-1?0:ibuff++), ic++) f = f + c[ic]*circ_buffer[ibuf]; © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Queues Elastic buffer: holds data that arrives irregularly. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Buffer-based queues #define Q_SIZE 32 #define Q_MAX (Q_SIZE-1) int q[Q_MAX], head, tail; void initialize_queue() { head = tail = 0; } void enqueue(int val) { if (((tail+1)%Q_SIZE) == head) error(); q[tail]=val; if (tail == Q_MAX) tail = 0; else tail++; } © 2008 Wayne Wolf int dequeue() { int returnval; if (head == tail) error(); returnval = q[head]; if (head == Q_MAX) head = 0; else head++; return returnval; } Overheads for Computers as Components 2nd ed. Models of programs Source code is not a good representation for programs: clumsy; leaves much information implicit. Compilers derive intermediate representations to manipulate and optiize the program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Data flow graph DFG: data flow graph. Does not represent control. Models basic block: code with no entry or exit. Describes the minimal ordering requirements on operations. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Single assignment form x = a + b; y = c - d; z = x * y; y = b + d; x = a + b; y = c - d; z = x * y; y1 = b + d; original basic block single assignment form © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Data flow graph x = a + b; y = c - d; z = x * y; y1 = b + d; a b c + - y x single assignment form * z DFG © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. d + y1 DFGs and partial orders a b c d + - y x © 2008 Wayne Wolf * + z y1 Partial order: a+b, c-d; b+d x*y Can do pairs of operations in any order. Overheads for Computers as Components 2nd ed. Control-data flow graph CDFG: represents control and data. Uses data flow graphs as components. Two types of nodes: decision; data flow. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Data flow node Encapsulates a data flow graph: x = a + b; y=c+d Write operations in basic block form for simplicity. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Control T v1 v4 cond F value v2 Equivalent forms © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. v3 CDFG example if (cond1) bb1(); else bb2(); bb3(); switch (test1) { case c1: bb4(); break; case c2: bb5(); break; case c3: bb6(); break; } T cond1 F bb2() bb3() c1 c3 test1 c2 bb4() © 2008 Wayne Wolf bb1() Overheads for Computers as Components 2nd ed. bb5() bb6() for loop for (i=0; i<N; i++) loop_body(); i=0 for loop F i<N i=0; while (i<N) { loop_body(); i++; } equivalent © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. T loop_body() Assembly and linking Last steps in compilation: HLL HLL HLL compile link © 2008 Wayne Wolf assembly assembly assembly executable Overheads for Computers as Components 2nd ed. assemble link Multiple-module programs Programs may be composed from several files. Addresses become more specific during processing: relative addresses are measured relative to the start of a module; absolute addresses are measured relative to the start of the CPU address space. © 2008 Wayne Wolf Overheads for Computers as Components Assemblers Major tasks: generate binary for symbolic instructions; translate labels into addresses; handle pseudo-ops (data, etc.). Generally one-to-one translation. Assembly labels: label1 © 2008 Wayne Wolf ORG 100 ADR r4,c Overheads for Computers as Components 2nd ed. Symbol table xx yy ADD r0,r1,r2 ADD r3,r4,r5 CMP r0,r3 SUB r5,r6,r7 assembly code © 2008 Wayne Wolf xx yy 0x8 0x10 symbol table Overheads for Computers as Components 2nd ed. Symbol table generation Use program location counter (PLC) to determine address of each location. Scan program, keeping count of PLC. Addresses are generated at assembly time, not execution time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Symbol table example PLC=0x7 ADD r0,r1,r2 PLC=0x7 xx ADD r3,r4,r5 PLC=0x7 CMP r0,r3 PLC=0x7 yy SUB r5,r6,r7 © 2008 Wayne Wolf xx yy 0x8 0x10 Overheads for Computers as Components 2nd ed. Two-pass assembly Pass 1: generate symbol table Pass 2: generate binary instructions © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Relative address generation Some label values may not be known at assembly time. Labels within the module may be kept in relative form. Must keep track of external labels---can’t generate full binary for instructions that use external labels. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Pseudo-operations Pseudo-ops do not generate instructions: ORG sets program location. EQU generates symbol table entry without advancing PLC. Data statements define data blocks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Linking Combines several object modules into a single executable module. Jobs: put modules in order; resolve labels across modules. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Externals and entry points entry point xxx yyy a ADD r1,r2,r3 B a external reference %1 © 2008 Wayne Wolf ADR r4,yyy ADD r3,r4,r5 Overheads for Computers as Components 2nd ed. Module ordering Code modules must be placed in absolute positions in the memory space. Load map or linker flags control the order of modules. module1 module2 module3 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Dynamic linking Some operating systems link modules dynamically at run time: shares one copy of library among all executing programs; allows programs to be updated with new versions of libraries. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Program design and analysis Compilation flow. Basic statement translation. Basic optimizations. Interpreters and just-in-time compilers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Compilation Compilation strategy (Wirth): compilation = translation + optimization Compiler determines quality of code: use of CPU resources; memory access scheduling; code size. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Basic compilation phases HLL parsing, symbol table machine-independent optimizations machine-dependent optimizations assembly © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Statement translation and optimization Source code is translated into intermediate form such as CDFG. CDFG is transformed/optimized. CDFG is translated into instructions with optimization decisions. Instructions are further optimized. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Arithmetic expressions a*b + 5*(c-d) expression b a c * 5 * + DFG © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. d Arithmetic expressions, cont’d. b a 1 c * 2 5 3 4 * + d - ADR r4,a MOV r1,[r4] ADR r4,b MOV r2,[r4] ADD r3,r1,r2 ADR r4,c MOV r1,[r4] ADR r4,d MOV r5,[r4] SUB r6,r4,r5 MUL r7,r6,#5 ADD r8,r7,r3 DFG © 2008 Wayne Wolf code Overheads for Computers as Components 2nd ed. Control code generation if (a+b > 0) x = 5; else x = 7; a+b>0 x=7 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. x=5 Control code generation, cont’d. 1 a+b>0 3 x=7 © 2008 Wayne Wolf x=5 ADR r5,a LDR r1,[r5] ADR r5,b 2 LDR r2,b ADD r3,r1,r2 BLE label3 LDR r3,#5 ADR r5,x STR r3,[r5] B stmtent LDR r3,#7 ADR r5,x STR r3,[r5] stmtent ... Overheads for Computers as Components 2nd ed. Procedure linkage Need code to: call and return; pass parameters and results. Parameters and returns are passed on stack. Procedures with few parameters may use registers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Procedure stacks growth proc1(int a) { proc2(5); } proc1 FP frame pointer proc2 5 SP stack pointer © 2008 Wayne Wolf accessed relative to SP Overheads for Computers as Components 2nd ed. ARM procedure linkage APCS (ARM Procedure Call Standard): r0-r3 pass parameters into procedure. Extra parameters are put on stack frame. r0 holds return value. r4-r7 hold register values. r11 is frame pointer, r13 is stack pointer. r10 holds limiting address on stack size to check for stack overflows. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Data structures Different types of data structures use different data layouts. Some offsets into data structure can be computed at compile time, others must be computed at run time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. One-dimensional arrays C array name points to 0th element: a a[0] a[1] = *(a + 1) a[2] © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Two-dimensional arrays Column-major layout: a[0,0] N ... a[1,0] a[1,1] © 2008 Wayne Wolf M a[0,1] ... = a[i*M+j] Overheads for Computers as Components 2nd ed. Structures Fields within structures are static offsets: aptr struct { int field1; char field2; } mystruct; field1 field2 struct mystruct a, *aptr = &a; © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. 4 bytes *(aptr+4) Expression simplification Constant folding: 8+1 = 9 Algebraic: a*b + a*c = a*(b+c) Strength reduction: a*2 = a<<1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Dead code elimination Dead code: #define DEBUG 0 if (DEBUG) dbg(p1); 0 0 Can be eliminated by analysis of control flow, constant folding. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. 1 dbg(p1); Procedure inlining Eliminates procedure linkage overhead: int foo(a,b,c) { return a + b - c;} z = foo(w,x,y); z = w + x + y; © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Loop transformations Goals: reduce loop overhead; increase opportunities for pipelining; improve memory system performance. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Loop unrolling Reduces loop overhead, enables some other optimizations. for (i=0; i<4; i++) a[i] = b[i] * c[i]; for (i=0; i<2; i++) { a[i*2] = b[i*2] * c[i*2]; a[i*2+1] = b[i*2+1] * c[i*2+1]; } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Loop fusion and distribution Fusion combines two loops into 1: for (i=0; i<N; i++) a[i] = b[i] * 5; for (j=0; j<N; j++) w[j] = c[j] * d[j]; for (i=0; i<N; i++) { a[i] = b[i] * 5; w[i] = c[i] * d[i]; } Distribution breaks one loop into two. Changes optimizations within loop body. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Loop tiling Breaks one loop into a nest of loops. Changes order of accesses within array. Changes cache behavior. © 2000 Morgan Kaufman Overheads for Computers as Components 2nd ed. Loop tiling example for (i=0; i<N; i++) for (j=0; j<N; j++) c[i] = a[i,j]*b[i]; © 2008 Wayne Wolf for (i=0; i<N; i+=2) for (j=0; j<N; j+=2) for (ii=0; ii<min(i+2,n); ii++) for (jj=0; jj<min(j+2,N); jj++) c[ii] = a[ii,jj]*b[ii]; Overheads for Computers as Components 2nd ed. Array padding Add array elements to change mapping into cache: a[0,0] a[0,1] a[0,2] a[0,0] a[0,1] a[0,2] a[0,2] a[1,0] a[1,1] a[1,2] a[1,0] a[1,1] a[1,2] a[1,2] before © 2008 Wayne Wolf after Overheads for Computers as Components 2nd ed. Register allocation Goals: choose register to hold each variable; determine lifespan of varible in the register. Basic case: within basic block. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Register lifetime graph w = a + b; t=1 x = c + w; t=2 y = c + d; t=3 a b c d w x y 1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. 2 3 time Instruction scheduling Non-pipelined machines do not need instruction scheduling: any order of instructions that satisfies data dependencies runs equally fast. In pipelined machines, execution time of one instruction depends on the nearby instructions: opcode, operands. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Reservation table A reservation table relates instructions/time to CPU resources. © 2008 Wayne Wolf Time/instr instr1 instr2 instr3 instr4 Overheads for Computers as Components 2nd ed. A X X X B X X Software pipelining Schedules instructions across loop iterations. Reduces instruction latency in iteration i by inserting instructions from iteration i+1. © 2008 Wayne Wolf Overheads for Computers as Components Instruction selection May be several ways to implement an operation or sequence of operations. Represent operations as graphs, match possible instruction sequences onto graph. + * expression © 2008 Wayne Wolf * MUL + ADD templates Overheads for Computers as Components 2nd ed. + * MADD Using your compiler Understand various optimization levels (O1, -O2, etc.) Look at mixed compiler/assembler output. Modifying compiler output requires care: correctness; loss of hand-tweaked code. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Interpreters and JIT compilers Interpreter: translates and executes program statements on-the-fly. JIT compiler: compiles small sections of code into instructions during program execution. Eliminates some translation overhead. Often requires more memory. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Program design and analysis Program-level performance analysis. Optimizing for: Execution time. Energy/power. Program size. Program validation and testing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Program-level performance analysis Need to understand performance in detail: Real-time behavior, not just typical. On complex platforms. Program performance CPU performance: Pipeline, cache are windows into program. We must analyze the entire program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Complexities of program performance Varies with input data: Different-length paths. Cache effects. Instruction-level performance variations: Pipeline interlocks. Fetch times. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. How to measure program performance Simulate execution of the CPU. Makes CPU state visible. Measure on real CPU using timer. Requires modifying the program to control the timer. Measure on real CPU using logic analyzer. Requires events visible on the pins. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Program performance metrics Average-case execution time. Typically used in application programming. Worst-case execution time. A component in deadline satisfaction. Best-case execution time. Task-level interactions can cause best-case program behavior to result in worst-case system behavior. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Elements of program performance Basic program execution time formula: execution time = program path + instruction timing Solving these problems independently helps simplify analysis. Easier to separate on simpler CPUs. Accurate performance analysis requires: Assembly/binary code. Execution platform. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Data-dependent paths in an if statement if (a || b) { /* T1 */ if ( c ) /* T2 */ x = r*s+t; /* A1 */ else y=r+s; /* A2 */ z = r+s+u; /* A3 */ } else { if ( c ) /* T3 */ y = r-t; /* A4 */ } © 2008 Wayne Wolf a b c path 0 0 0 T1=F, T3=F: no assignments 0 0 1 T1=F, T3=T: A4 0 1 0 T1=T, T2=F: A2, A3 0 1 1 T1=T, T2=T: A1, A3 1 0 0 T1=T, T2=F: A2, A3 1 0 1 T1=T, T2=T: A1, A3 1 1 0 T1=T, T2=F: A2, A3 1 1 1 T1=T, T2=T: A1, A3 Overheads for Computers as Components 2nd ed. Paths in a loop for (i=0, f=0; i<N; i++) f = f + c[i] * x[i]; i=0 f=0 N i=N Y f = f + c[i] * x[i] i=i+1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Instruction timing Not all instructions take the same amount of time. Multi-cycle instructions. Fetches. Execution times of instructions are not independent. Pipeline interlocks. Cache effects. Execution times may vary with operand value. Floating-point operations. Some multi-cycle integer operations. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Mesaurement-driven performance analysis Not so easy as it sounds: Must actually have access to the CPU. Must know data inputs that give worst/best case performance. Must make state visible. Still an important method for performance analysis. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Feeding the program Need to know the desired input values. May need to write software scaffolding to generate the input values. Software scaffolding may also need to examine outputs to generate feedbackdriven inputs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Trace-driven measurement Trace-driven: Instrument the program. Save information about the path. Requires modifying the program. Trace files are large. Widely used for cache analysis. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Physical measurement In-circuit emulator allows tracing. Affects execution timing. Logic analyzer can measure behavior at pins. Address bus can be analyzed to look for events. Code can be modified to make events visible. Particularly important for real-world input streams. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CPU simulation Some simulators are less accurate. Cycle-accurate simulator provides accurate clock-cycle timing. Simulator models CPU internals. Simulator writer must know how CPU works. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. SimpleScalar FIR filter simulation int x[N] = {8, 17, … }; int c[N] = {1, 2, … }; main() { int i, k, f; for (k=0; k<COUNT; k++) for (i=0; i<N; i++) f += c[i]*x[i]; } © 2008 Wayne Wolf N total sim cycles sim cycles per filter execution 100 25854 259 1,000 155759 156 1,0000 1451840 145 Overheads for Computers as Components 2nd ed. Performance optimization motivation Embedded systems must often meet deadlines. Faster may not be fast enough. Need to be able to analyze execution time. Worst-case, not typical. Need techniques for reliably improving execution time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Programs and performance analysis Best results come from analyzing optimized instructions, not high-level language code: non-obvious translations of HLL statements into instructions; code may move; cache effects are hard to predict. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Loop optimizations Loops are good targets for optimization. Basic loop optimizations: code motion; induction-variable elimination; strength reduction (x*2 -> x<<1). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Code motion for (i=0; i<N*M; i++) z[i] = a[i] + b[i]; i=0; Xi=0; = N*M i<N*M i<X Y z[i] = a[i] + b[i]; i = i+1; © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. N Induction variable elimination Induction variable: loop index. Consider loop: for (i=0; i<N; i++) for (j=0; j<M; j++) z[i,j] = b[i,j]; Rather than recompute i*M+j for each array in each iteration, share induction variable between arrays, increment at end Overheads for Computers as of loop body. © 2008 Wayne Wolf Components 2 ed. nd Cache analysis Loop nest: set of loops, one inside other. Perfect loop nest: no conditionals in nest. Because loops use large quantities of data, cache conflicts are common. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Array conflicts in cache a[0,0] 1024 1024 b[0,0] ... 4099 main memory © 2008 Wayne Wolf 4099 cache Overheads for Computers as Components 2nd ed. Array conflicts, cont’d. Array elements conflict because they are in the same line, even if not mapped to same location. Solutions: move one array; pad array. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Performance optimization hints Use registers efficiently. Use page mode memory accesses. Analyze cache behavior: instruction conflicts can be handled by rewriting code, rescheudling; conflicting scalar data can easily be moved; conflicting array data can be moved, padded. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Energy/power optimization Energy: ability to do work. Most important in battery-powered systems. Power: energy per unit time. Important even in wall-plug systems---power becomes heat. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Measuring energy consumption Execute a small loop, measure current: I while (TRUE) a(); © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Sources of energy consumption Relative energy per operation (Catthoor et al): memory transfer: 33 external I/O: 10 SRAM write: 9 SRAM read: 4.4 multiply: 3.6 add: 1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Cache behavior is important Energy consumption has a sweet spot as cache size changes: cache too small: program thrashes, burning energy on external memory accesses; cache too large: cache itself burns too much power. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Cache sweet spot [Li98] © 1998 IEEE © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Optimizing for energy First-order optimization: high performance = low energy. Not many instructions trade speed for energy. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Optimizing for energy, cont’d. Use registers efficiently. Identify and eliminate cache conflicts. Moderate loop unrolling eliminates some loop overhead instructions. Eliminate pipeline stalls. Inlining procedures may help: reduces linkage, but may increase cache thrashing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Efficient loops General rules: Don’t use function calls. Keep loop body small to enable local repeat (only forward branches). Use unsigned integer for loop counter. Use <= to test loop counter. Make use of compiler---global optimization, software pipelining. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Single-instruction repeat loop example STM #4000h,AR2 ; load pointer to source STM #100h,AR3 ; load pointer to destination RPT #(1024-1) MVDD *AR2+,*AR3+ ; move © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Optimizing for program size Goal: reduce hardware cost of memory; reduce power consumption of memory units. Two opportunities: data; instructions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Data size minimization Reuse constants, variables, data buffers in different parts of code. Requires careful verification of correctness. Generate data using instructions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Reducing code size Avoid function inlining. Choose CPU with compact instructions. Use specialized instructions where possible. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Program validation and testing But does it work? Concentrate here on functional verification. Major testing strategies: Black box doesn’t look at the source code. Clear box (white box) does look at the source code. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Clear-box testing Examine the source code to determine whether it works: Can you actually exercise a path? Do you get the value you expect along a path? Testing procedure: Controllability: rovide program with inputs. Execute. Observability: examine outputs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Controlling and observing programs firout = 0.0; for (j=curr, k=0; j<N; j++, k++) firout += buff[j] * c[k]; for (j=0; j<curr; j++, k++) firout += buff[j] * c[k]; if (firout > 100.0) firout = 100.0; if (firout < -100.0) firout = -100.0; Controllability: Must fill circular buffer with desired N values. Other code governs how we access the buffer. Observability: Want to examine firout before limit testing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Execution paths and testing Paths are important in functional testing as well as performance analysis. In general, an exponential number of paths through the program. Show that some paths dominate others. Heuristically limit paths. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Choosing the paths to test Possible criteria: Execute every statement at least not covered once. Execute every branch direction at least once. Equivalent for structured programs. Not true for gotos. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Basis paths Approximate CDFG with undirected graph. Undirected graphs have basis paths: All paths are linear combinations of basis paths. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Cyclomatic complexity Cyclomatic complexity is a bound on the size of basis sets: e = # edges n = # nodes p = number of graph components M = e – n + 2p. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Branch testing Heuristic for testing branches. Exercise true and false branches of conditional. Exercise every simple condition at least once. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Branch testing example Correct: Test: if (a || (b >= c)) { printf(“OK\n”); } Incorrect: if (a && (b >= c)) { printf(“OK\n”); } © 2008 Wayne Wolf a = F (b >=c) = T Example: Correct: [0 || (3 >= 2)] = T Incorrect: [0 && (3 >= 2)] = F Overheads for Computers as Components 2nd ed. Another branch testing example Correct: if ((x == good_pointer) && x->field1 == 3)) { printf(“got the value\n”); } Incorrect: if ((x = good_pointer) && x->field1 == 3)) { printf(“got the value\n”); } © 2008 Wayne Wolf Incorrect code changes pointer. Assignment returns new LHS in C. Test that catches error: (x != good_pointer) && x->field1 = 3) Overheads for Computers as Components 2nd ed. Domain testing Heuristic test for linear inequalities. Test on each side + boundary of inequality. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Def-use pairs Variable def-use: Def when value is assigned (defined). Use when used on right-hand side. Exercise each def-use pair. Requires testing correct path. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Loop testing Loops need specialized tests to be tested efficiently. Heuristic testing strategy: Skip loop entirely. One loop iteration. Two loop iterations. # iterations much below max. n-1, n, n+1 iterations where n is max. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Black-box testing Complements clear-box testing. May require a large number of tests. Tests software in different ways. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Black-box test vectors Random tests. May weight distribution based on software specification. Regression tests. Tests of previous versions, bugs, etc. May be clear-box tests of previous versions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. How much testing is enough? Exhaustive testing is impractical. One important measure of test quality---bugs escaping into field. Good organizations can test software to give very low field bug report rates. Error injection measures test quality: Add known bugs. Run your tests. Determine % injected bugs that are caught. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Program design and analysis Software modem. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Theory of operation Frequency-shift keying: separate frequencies for 0 and 1. 0 1 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. FSK encoding Generate waveforms based on current bit: 0110101 © 2008 Wayne Wolf bit-controlled waveform generator Overheads for Computers as Components 2nd ed. A/D converter FSK decoding © 2008 Wayne Wolf zero filter detector 0 bit one filter detector 1 bit Overheads for Computers as Components 2nd ed. Transmission scheme Send data in 8-bit bytes. Arbitrary spacing between bytes. Byte starts with 0 start bit. Receiver measures length of start bit to synchronize itself to remaining 8 bits. start (0) © 2008 Wayne Wolf bit 1 bit 2 bit 3 Overheads for Computers as Components 2nd ed. ... bit 8 Requirements Inputs Analog sound input, reset button. Outputs Analog sound output, LED bit display. Functions Transmitter: Sends data from memory in 8-bit bytes plus start bit. Receiver: Automatically detects bytes and reads bits. Displays current bit on LED. 1200 baud. Performance Manufacturing cost Power Physical size/weight © 2008 Wayne Wolf Dominated by microprocessor and analog I/O Powered by AC. Small desktop object. Overheads for Computers as Components 2nd ed. Specification Line-in* 1 1 Receiver sample-in() bit-out() input() Transmitter 1 1 bit-in() sample-out() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Line-out* output() System architecture Interrupt handlers for samples: input and output. Transmitter. Receiver. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Transmitter Waveform generation by table lookup. float sine_wave[N_SAMP] = { 0.0, 0.5, 0.866, 1, 0.866, 0.5, 0.0, -0.5, -0.866, -1.0, 0.866, -0.5, 0}; time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Receiver Filters (FIR for simplicity) use circular buffers to hold data. Timer measures bit length. State machine recognizes start bits, data bits. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Hardware platform CPU. A/D converter. D/A converter. Timer. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Component design and testing Easy to test transmitter and receiver on host. Transmitter can be verified with speaker outputs. Receiver verification tasks: start bit recognition; data bit recognition. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System integration and testing Use loopback mode to test components against each other. Loopback in software or by connecting D/A and A/D converters. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Processes and operating systems Multiple tasks and multiple processes. Specifications of process timing. Preemptive real-time operating systems. Processes and UML. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Reactive systems Respond to external events. Engine controller. Seat belt monitor. Requires real-time response. System architecture. Program implementation. May require a chain reaction among multiple processors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Tasks and processes A task is a functional description of a connected set of operations. (Task can also mean a collection of processes.) A process is a unique execution of a program. Several copies of a program may run simultaneously or at different times. A process has its own state: registers; memory. The operating system manages processes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Why multiple processes? Multiple tasks means multiple processes. Processes help with timing complexity: multiple rates multimedia automotive asynchronous input user interfaces communication systems © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Multi-rate systems Tasks may be synchronous or asynchronous. Synchronous tasks may recur at different rates. Processes run at different rates based on computational needs of the tasks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: engine control Tasks: spark control crankshaft sensing fuel/air mixture oxygen sensor Kalman filter © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. engine controller Typical rates in engine controllers Variable Full range time (ms) Update period (ms) Engine spark timing 300 2 Throttle 40 2 Air flow 30 4 Battery voltage 80 4 Fuel flow 250 10 Recycled exhaust gas 500 25 Status switches 100 20 Air temperature Seconds 400 Barometric pressure Seconds 1000 Spark (dwell) 10 1 Fuel adjustment 80 8 Carburetor 500 25 Mode actuators 100 Overheads for Computers as 100 © 2008 Wayne Wolf Components 2nd ed. Real-time systems Perform a computation to conform to external timing constraints. Deadline frequency: Periodic. Aperiodic. Deadline type: Hard: failure to meet deadline causes system failure. Soft: failure to meet deadline causes degraded response. Firm: late response is useless but some late responses can be tolerated. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Timing specifications on processes Release time: time at which process becomes ready. Deadline: time at which process must finish. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Release times and deadlines deadline P1 initiating event © 2008 Wayne Wolf period aperiodic process periodic process periodic process initiated initiated by event at start of period Overheads for Computers as Components 2nd ed. time Rate requirements on processes Period: interval between process activations. Rate: reciprocal of period. Initiatino rate may be higher than period--several copies of process run at once. © 2008 Wayne Wolf CPU 1 CPU 2 CPU 3 CPU 4 Overheads for Computers as Components 2nd ed. P11 P12 P13 P14 time Timing violations What happens if a process doesn’t finish by its deadline? Hard deadline: system fails if missed. Soft deadline: user may notice, but system doesn’t necessarily fail. © 2000 Morgan Kaufman Overheads for Computers as Components Example: Space Shuttle software error Space Shuttle’s first launch was delayed by a software timing error: Primary control system PASS and backup system BFS. BFS failed to synchronize with PASS. Change to one routine added delay that threw off start time calculation. 1 in 67 chance of timing problem. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Task graphs Tasks may have data dependencies---must execute in certain order. Task graph shows data/control dependencies between processes. Task: connected set of processes. Task set: One or more tasks. © 2008 Wayne Wolf P1 P2 P5 P3 P6 P4 task 1 Overheads for Computers as Components 2nd ed. task 2 task set Communication between tasks Task graph assumes that all processes in each task run at the same rate, tasks do not communicate. In reality, some amount of inter-task communication is necessary. It’s hard to require immediate response for multi-rate communication. © 2008 Wayne Wolf MPEG system layer MPEG audio Overheads for Computers as Components 2nd ed. MPEG video Process execution characteristics Process execution time Ti. Execution time in absence of preemption. Possible time units: seconds, clock cycles. Worst-case, best-case execution time may be useful in some cases. Sources of variation: Data dependencies. Memory system. CPU pipeline. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Utilization CPU utilization: Fraction of the CPU that is doing useful work. Often calculated assuming no scheduling overhead. Utilization: U = (CPU time for useful work)/ (total available CPU time) =[S = T/t © 2008 Wayne Wolf t1 ≤ t ≤ t2 T(t) ] / [t2 – t1] Overheads for Computers as Components 2nd ed. State of a process A process can be in one of three states: executing on the CPU; ready to run; waiting for data. executing gets CPU preempted needs data gets data ready waiting needs data © 2008 Wayne Wolf gets data and CPU Overheads for Computers as Components 2nd ed. The scheduling problem Can we meet all deadlines? Must be able to meet deadlines in all cases. How much CPU horsepower do we need to meet our deadlines? © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Scheduling feasibility Resource constraints make schedulability analysis NP-hard. Must show that the deadlines are met for all timings of resource requests. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. P1 P2 I/O device Simple processor feasibility Assume: No resource conflicts. Constant process execution times. Require: T2 T T ≥ Si Ti Can’t use more than 100% of the CPU. © 2008 Wayne Wolf T1 Overheads for Computers as Components 2nd ed. T3 Hyperperiod Hyperperiod: least common multiple (LCM) of the task periods. Must look at the hyperperiod schedule to find all task interactions. Hyperperiod can be very long if task periods are not chosen carefully. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Hyperperiod example Long hyperperiod: P1 7 ms. P2 11 ms. P3 15 ms. LCM = 1155 ms. Shorter hyperperiod: P1 8 ms. P2 12 ms. P3 16 ms. LCM = 96 ms. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Simple processor feasibility example P1 period 1 ms, CPU time 0.1 ms. P2 period 1 ms, CPU time 0.2 ms. P3 period 5 ms, CPU time 0.3 ms. © 2004 Wayne Wolf LCM P1 P2 P3 5.00E-03 peirod CPU time CPU time/LCM 1.00E-03 1.00E-04 5.00E-04 1.00E-03 2.00E-04 1.00E-03 5.00E-03 3.00E-04 3.00E-04 total CPU/LCM utilization Overheads for Computers as Components 2nd ed. 1.80E-03 3.60E-01 Cyclostatic/TDMA Schedule in time slots. T Same process activation irrespective of workload. 1 T2 P Time slots may be equal size or unequal. © 2004 Wayne Wolf Overheads for Computers as Components 2nd ed. T3 T1 T2 P T3 TDMA assumptions Schedule based on least common multiple (LCM) of the process periods. Trivial scheduler > very small scheduling overhead. © 2008 Wayne Wolf P1 P1 P2 Overheads for Computers as Components 2nd ed. P1 P2 PLCM TDMA schedulability Always same CPU utilization (assuming constant process execution times). Can’t handle unexpected loads. Must schedule a time slot for aperiodic events. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. TDMA schedulability example TDMA period = 10 ms. P1 CPU time 1 ms. P2 CPU time 3 ms. P3 CPU time 2 ms. P4 CPU time 2 ms. © 2008 Wayne Wolf TDMA period CPU time P1 1.00E-03 P2 3.00E-03 P3 2.00E-03 P4 2.00E-03 total 8.00E-03 utilization 8.00E-01 Overheads for Computers as Components 2nd ed. 1.00E-02 Round-robin Schedule process only if ready. Always test processes in the same order. Variations: T1 T2 P Constant system period. Start round-robin again after finishing a round. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. T3 T2 T3 P Round-robin assumptions Schedule based on least common multiple (LCM) of the process periods. Best done with equal time slots for processes. Simple scheduler -> low scheduling overhead. Can be implemented in hardware. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Round-robin schedulability Can bound maximum CPU load. May leave unused CPU cycles. Can be adapted to handle unexpected load. Use time slots at end of period. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Schedulability and overhead The scheduling process consumes CPU time. Not all CPU time is available for processes. Scheduling overhead must be taken into account for exact schedule. May be ignored if it is a small fraction of total execution time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Running periodic processes Need code to control execution of processes. Simplest implementation: process = subroutine. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. while loop implementation Simplest implementation has one loop. No control over execution timing. © 2008 Wayne Wolf while (TRUE) { p1(); p2(); } Overheads for Computers as Components 2nd ed. Timed loop implementation Encapuslate set of all processes in a single function that implements the task set,. Use timer to control execution of the task. void pall(){ p1(); p2(); } No control over timing of individual processes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Multiple timers implementation Each task has its own function. Each task has its own timer. May not have enough timers to implement all the rates. © 2008 Wayne Wolf void pA(){ /* rate A */ p1(); p3(); } void B(){ /* rate B */ p2(); p4(); p5(); } Overheads for Computers as Components 2nd ed. Timer + counter implementation Use a software count to divide the timer. Only works for clean multiples of the timer period. © 2008 Wayne Wolf int p2count = 0; void pall(){ p1(); if (p2count >= 2) { p2(); p2count = 0; } else p2count++; p3(); } Overheads for Computers as Components 2nd ed. Implementing processes All of these implementations are inadequate. Need better control over timing. Need a better mechanism than subroutines. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Processes and operating systems Operating systems. Operating systems The operating system controls resources: who gets the CPU; when I/O takes place; how much memory is allocated. The most important resource is the CPU itself. CPU access controlled by the scheduler. Process state A process can be in one of three states: executing on the CPU; ready to run; waiting for data. executing gets CPU preempted gets data and CPU needs data gets data ready waiting needs data Operating system structure OS needs to keep track of: process priorities; scheduling state; process activation record. Processes may be created: statically before system starts; dynamically during execution. Embedded vs. generalpurpose scheduling Workstations try to avoid starving processes of CPU access. Fairness = access to CPU. Embedded systems must meet deadlines. Low-priority processes may not run for a long time. Priority-driven scheduling Each process has a priority. CPU goes to highest-priority process that is ready. Priorities determine scheduling policy: fixed priority; time-varying priorities. Priority-driven scheduling example Rules: each process has a fixed priority (1 highest); highest-priority ready process gets CPU; process continues until done. Processes P1: priority 1, execution time 10 P2: priority 2, execution time 30 P3: priority 3, execution time 20 Priority-driven scheduling example P3 ready t=18 P2 ready t=0 P1 ready t=15 P2 0 P1 10 20 P2 30 P3 40 60 50 time The scheduling problem Can we meet all deadlines? Must be able to meet deadlines in all cases. How much CPU horsepower do we need to meet our deadlines? Process initiation disciplines Periodic process: executes on (almost) every period. Aperiodic process: executes on demand. Analyzing aperiodic process sets is harder--must consider worst-case combinations of process activations. Timing requirements on processes Period: interval between process activations. Initiation interval: reciprocal of period. Initiation time: time at which process becomes ready. Deadline: time at which process must finish. Timing violations What happens if a process doesn’t finish by its deadline? Hard deadline: system fails if missed. Soft deadline: user may notice, but system doesn’t necessarily fail. Example: Space Shuttle software error Space Shuttle’s first launch was delayed by a software timing error: Primary control system PASS and backup system BFS. BFS failed to synchronize with PASS. Change to one routine added delay that threw off start time calculation. 1 in 67 chance of timing problem. Interprocess communication Interprocess communication (IPC): OS provides mechanisms so that processes can pass data. Two types of semantics: blocking: sending process waits for response; non-blocking: sending process continues. IPC styles Shared memory: processes have some memory in common; must cooperate to avoid destroying/missing messages. Message passing: processes send messages along a communication channel---no common address space. Shared memory Shared memory on a bus: CPU 1 memory CPU 2 Race condition in shared memory Problem when two CPUs try to write the same location: CPU 1 reads flag and sees 0. CPU 2 reads flag and sees 0. CPU 1 sets flag to one and writes location. CPU 2 sets flag to one and overwrites location. Atomic test-and-set Problem can be solved with an atomic test-and-set: single bus operation reads memory location, tests it, writes it. ARM test-and-set provided by SWP: ADR r0,SEMAPHORE LDR r1,#1 GETFLAG SWP r1,r1,[r0] BNZ GETFLAG Critical regions Critical region: section of code that cannot be interrupted by another process. Examples: writing shared memory; accessing I/O device. Semaphores Semaphore: OS primitive for controlling access to critical regions. Protocol: Get access to semaphore with P(). Perform critical region operations. Release semaphore with V(). Message passing Message passing on a network: CPU 1 CPU 2 message message message Process data dependencies One process may not be able to start until another finishes. Data dependencies defined in a task graph. All processes in one task run at the same rate. P1 P2 P3 P4 Other operating system functions Date/time. File system. Networking. Security. Processes and operating systems Scheduling policies: RMS; EDF. Scheduling modeling assumptions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Metrics How do we evaluate a scheduling policy: Ability to satisfy all deadlines. CPU utilization---percentage of time devoted to useful work. Scheduling overhead---time required to make scheduling decision. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Rate monotonic scheduling RMS (Liu and Layland): widely-used, analyzable scheduling policy. Analysis is known as Rate Monotonic Analysis (RMA). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. RMA model All process run on single CPU. Zero context switch time. No data dependencies between processes. Process execution time is constant. Deadline is at end of period. Highest-priority ready process runs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Process parameters Ti is computation time of process i; ti is period of process i. period ti Pi computation time Ti © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Rate-monotonic analysis Response time: time required to finish process. Critical instant: scheduling state that gives worst response time. Critical instant occurs when all higherpriority processes are ready to execute. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Critical instant interfering processes P1 P1 P2 critical instant P3 P1 P1 P2 P1 P2 P3 P4 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. RMS priorities Optimal (fixed) priority assignment: shortest-period process gets highest priority; priority inversely proportional to period; break ties arbitrarily. No fixed-priority scheme does better. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. RMS example P2 period P2 P1 0 P1 period P1 P1 5 10 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. RMS CPU utilization Utilization for n processes is S i Ti / ti As number of tasks approaches infinity, maximum utilization approaches 69%. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. RMS CPU utilization, cont’d. RMS cannot use 100% of CPU, even with zero context switch overhead. Must keep idle cycles available to handle worst-case scenario. However, RMS guarantees all processes will always meet their deadlines. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. RMS implementation Efficient implementation: scan processes; choose highest-priority active process. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Earliest-deadline-first scheduling EDF: dynamic priority scheduling scheme. Process closest to its deadline has highest priority. Requires recalculating processes at every timer interrupt. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. EDF analysis EDF can use 100% of CPU. But EDF may fail to miss a deadline. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. EDF implementation On each timer interrupt: compute time to deadline; choose process closest to deadline. Generally considered too expensive to use in practice. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Fixing scheduling problems What if your set of processes is unschedulable? Change deadlines in requirements. Reduce execution times of processes. Get a faster CPU. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Priority inversion Priority inversion: low-priority process keeps high-priority process from running. Improper use of system resources can cause scheduling problems: Low-priority process grabs I/O device. High-priority device needs I/O device, but can’t get it until low-priority process is done. Can cause deadlock. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Solving priority inversion Give priorities to system resources. Have process inherit the priority of a resource that it requests. Low-priority process inherits priority of device if higher. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Data dependencies Data dependencies allow us to improve utilization. Restrict combination of processes that can run simultaneously. P1 and P2 can’t run simultaneously. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. P1 P2 Context-switching time Non-zero context switch time can push limits of a tight schedule. Hard to calculate effects---depends on order of context switches. In practice, OS context switch overhead is small (hundreds of clock cycles) relative to many common task periods (ms – ms). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Processes and operating systems Interprocess communication. Operating system performance. Power management. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Interprocess communication OS provides interprocess communication mechanisms: various efficiencies; communication power. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Interprocess communication Interprocess communication (IPC): OS provides mechanisms so that processes can pass data. Two types of semantics: blocking: sending process waits for response; non-blocking: sending process continues. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. IPC styles Shared memory: processes have some memory in common; must cooperate to avoid destroying/missing messages. Message passing: processes send messages along a communication channel---no common address space. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Shared memory Shared memory on a bus: CPU 1 © 2008 Wayne Wolf memory Overheads for Computers as Components 2nd ed. CPU 2 Race condition in shared memory Problem when two CPUs try to write the same location: CPU 1 reads flag and sees 0. CPU 2 reads flag and sees 0. CPU 1 sets flag to one and writes location. CPU 2 sets flag to one and overwrites location. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Atomic test-and-set Problem can be solved with an atomic test-and-set: single bus operation reads memory location, tests it, writes it. ARM test-and-set provided by SWP: ADR r0,SEMAPHORE LDR r1,#1 GETFLAG SWP r1,r1,[r0] BNZ GETFLAG © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Critical regions Critical region: section of code that cannot be interrupted by another process. Examples: writing shared memory; accessing I/O device. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Semaphores Semaphore: OS primitive for controlling access to critical regions. Protocol: Get access to semaphore with P(). Perform critical region operations. Release semaphore with V(). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Message passing Message passing on a network: CPU 1 CPU 2 message message message © 2008 Wayne Wolf Overheads for Computers as Components Process data dependencies One process may not be able to start until another finishes. Data dependencies defined in a task graph. All processes in one task run at the same rate. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. P1 P2 P3 P4 Signals in UML More general than Unix signal---may carry arbitrary data: <<signal>> aSig someClass <<send>> sigbehavior() p : integer © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Evaluating RTOS performance Simplifying assumptions: Context switch costs no CPU time,. We know the exact execution time of processes. WCET/BCET don’t depend on context switches. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Scheduling and context switch overhead Process Execution deadline time P1 3 5 P2 3 10 With context switch overhead of 1, no feasible schedule. 2TP1 + TP2 = 2*(1+3)+(1_3)=11 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Process execution time Process execution time is not constant. Extra CPU time can be good. Extra CPU time can also be bad: Next process runs earlier, causing new preemption. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Processes and caches Processes can cause additional caching problems. Even if individual processes are wellbehaved, processes may interfere with each other. Worst-case execution time with bad behavior is usually much worse than execution time with good cache behavior. © 2008 Wayne Wolf Overheads for Computers as Components Effects of scheduling on the cache Process WCET Avg. CPU time P1 8 6 P2 4 3 P3 4 3 Schedule 1 (LRU cache): Schedule 2 (half of cache reserved for P1): © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Power optimization Power management: determining how system resources are scheduled/used to control power consumption. OS can manage for power just as it manages for time. OS reduces power by shutting down units. May have partial shutdown modes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Power management and performance Power management and performance are often at odds. Entering power-down mode consumes energy, time. Leaving power-down mode consumes energy, time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Simple power management policies Request-driven: power up once request is received. Adds delay to response. Predictive shutdown: try to predict how long you have before next request. May start up in advance of request in anticipation of a new request. If you predict wrong, you will incur additional delay while starting up. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Probabilistic shutdown Assume service requests are probabilistic. Optimize expected values: power consumption; response time. Simple probabilistic: shut down after time Ton, turn back on after waiting for Toff. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Advanced Configuration and Power Interface ACPI: open standard for power management services. applications device drivers OS kernel ACPI BIOS Hardware platform © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. power management ACPI global power states G3: mechanical off G2: soft off S1: S2: S3: S4: low wake-up latency with no loss of context low latency with loss of CPU/cache state low latency with loss of all state except memory lowest-power state with all devices off G1: sleeping state G0: working state © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Processes and operating systems Telephone answering machine. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Theory of operation Compress audio using adaptive differential pulse code modulation (ADPCM). analog time ADPCM 3 2 1 -1 -2 -3 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ADPCM coding Coded in a small alphabet with positive and negative values. {-3,-2,-1,1,2,3} Minimize error between predicted value and actual signal value. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ADPCM compression system S quantizer integrator inverse quantizer encoder samples inverse quantizer integrator decoder © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Telephone system terms Subscriber line: line to phone. Central office: telephone switching system. Off-hook: phone active. On-hook: phone inactive. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Real and simulated subscriber line Real subscriber line: 90V RMS ringing signal; companded analog signals; lightning protection, etc. Simulated subscriber line: microphone input; speaker output; switches for ring, off-hook, etc. © 2008 Wayne Wolf Overheads for Computers as Components Requirements Inputs Performance Telephone: voice samples, ring. User interface: microphone, play messages button, record OGM button. Telephone: voice samples, onhook/off-hook command. User interface: speaker, # messages indicator, message light. Default mode: detects ring, signals offhook, pays OGM, records ICM Playback: play all messages, wait 5 seconds for new playback. OGM editing: OGM up to 10 sec. About 30 minutes voice (@ 8kHz). Manufacturing cost Consumer product range ($50) Power Physical size/weight AC plug Comparable to desk phone. Outputs Functions © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Comments on analysis DRAM requirement influenced by DRAM price. Details of user interface protocol could be tested on a PC-based prototype. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Answering machine class diagram 1 1 Microphone* Line-in* Controls 1 1 1 Speaker* 1 1 1 Playback 1 Buttons* © 2008 Wayne Wolf 11 1 Line-out* 1 Record 1 1 1 Lights Overheads for Computers as Components 2nd ed. 1 * Outgoing* message * Incoming* message Physical interface classes Microphone* Line-in* Line-out* sample() sample() ring-indicator() sample() pick-up() Buttons* Lights* record-OGM play messages num-messages © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Speaker* sample() Message classes Message length start-adrs next-msg samples Incoming-message Outgoing-message msg-time length=30 sec © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Operational classes Controls Record Playback operate() record-msg() playback-msg() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Software components Front panel module. Speaker module. Telephone line module. Telephone input and output modules. Compression module. Decompression module. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Controls activate behavior Compute buttons, line activations Activations? Play OGM Record OGM Play ICM Wait for timeout Erase Answer Play OGM Allocate ICM Erase Record ICM © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Record-msg/playback-msg behaviors nextadrs = 0 nextadrs = 0 msg.samples[nextadrs] = sample(source) speaker.samples() = msg.samples[nextadrs]; nextadrs++ F nextadrs=msg.length F End(source) T T playback-msg record-msg © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Hardware platform CPU. Memory. Front panel. 2 A/Ds: subscriber line, microphone. 2 D/A: subscriber line, speaker. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Component design and testing Must test performance as well as testing. Compression time shouldn’t dominate other tasks. Test for error conditions: memory overflow; try to delete empty message set, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System integration and testing Can test partial integration on host platform; full testing requires integration on target platform. Simulate phone line for tests: it’s legal; easier to produce test conditions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Multiprocessors Why multiprocessors? CPUs and accelerators. Multiprocessor performance analysis. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Why multiprocessors? Better cost/performance. Match each CPU to its tasks or use custom logic (smaller, cheaper). CPU cost is a non-linear function of performance. cost performance © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Why multiprocessors? cont’d. Better real-time performance. Put time-critical functions on less-loaded processing elements. Remember RMS utilization---extra CPU cycles must be reserved to meet deadlines. cost deadline w. RMS overhead deadline performance © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Why multiprocessors? cont’d. Using specialized processors or custom logic saves power. Desktop uniprocessors are not power-efficient enough for batterypowered applications. © 2008 Wayne Wolf [Aus04] © 2004 IEEE Computer Society Overheads for Computers as Components 2nd ed. Why multiprocessors? cont’d. Good for processing I/O in real-time. May consume less energy. May be better at streaming data. May not be able to do all the work on even the largest single CPU. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerated systems Use additional computational unit dedicated to some functions? Hardwired logic. Extra CPU. Hardware/software co-design: joint design of hardware and software architectures. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerated system architecture request CPU accelerator result data data memory I/O © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerator vs. coprocessor A co-processor executes instructions. Instructions are dispatched by the CPU. An accelerator appears as a device on the bus. The accelerator is controlled by registers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerator implementations Application-specific integrated circuit. Field-programmable gate array (FPGA). Standard component. Example: graphics processor. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System design tasks Design a heterogeneous multiprocessor architecture. Processing element (PE): CPU, accelerator, etc. Program the system. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerated system design First, determine that the system really needs to be accelerated. How much faster is the accelerator on the core function? How much data transfer overhead? Design the accelerator itself. Design CPU interface to accelerator. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerated system platforms Several off-the-shelf boards are available for acceleration in PCs: FPGA-based core; PC bus interface. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerator/CPU interface Accelerator registers provide control registers for CPU. Data registers can be used for small data objects. Accelerator may include special-purpose read/write logic. Especially valuable for large data transfers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System integration and debugging Try to debug the CPU/accelerator interface separately from the accelerator core. Build scaffolding to test the accelerator. Hardware/software co-simulation can be useful. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Caching problems Main memory provides the primary data transfer mechanism to the accelerator. Programs must ensure that caching does not invalidate main memory data. CPU reads location S. Accelerator writes location S. CPU writes location S. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. BAD Synchronization As with cache, main memory writes to shared memory may cause invalidation: CPU reads S. Accelerator writes S. CPU reads S. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Multiprocessor performance analysis Effects of parallelism (and lack of it): Processes. CPU and bus. Multiple processors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerator speedup Critical parameter is speedup: how much faster is the system with the accelerator? Must take into account: Accelerator execution time. Data transfer time. Synchronization with the master CPU. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerator execution time Total accelerator execution time: taccel = tin + tx + tout Data input Data output Accelerated computation © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerator speedup Assume loop is executed n times. Compare accelerated system to nonaccelerated system: S = n(tCPU - taccel) = n[tCPU - (tin + tx + tout)] Execution time on CPU © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Single- vs. multi-threaded One critical factor is available parallelism: single-threaded/blocking: CPU waits for accelerator; multithreaded/non-blocking: CPU continues to execute along with accelerator. To multithread, CPU must have useful work to do. But software must also support multithreading. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Total execution time Single-threaded: Multi-threaded: P1 P1 P2 A1 P2 P3 P3 P4 P4 © 2008 Wayne Wolf Overheads for Computers as Components A1 Execution time analysis Single-threaded: Count execution time of all component processes. © 2008 Wayne Wolf Multi-threaded: Find longest path through execution. Overheads for Computers as Components 2nd ed. Sources of parallelism Overlap I/O and accelerator computation. Perform operations in batches, read in second batch of data while computing on first batch. Find other work to do on the CPU. May reschedule operations to move work after accelerator initiation. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Data input/output times Bus transactions include: flushing register/cache values to main memory; time required for CPU to set up transaction; overhead of data transfers by bus packets, handshaking, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Scheduling and allocation Must: schedule operations in time; allocate computations to processing elements. Scheduling and allocation interact, but separating them helps. Alternatively allocate, then schedule. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: scheduling and allocation P1 P2 M1 d1 M2 d2 P3 Task graph © 2008 Wayne Wolf Hardware platform Overheads for Computers as Components 2nd ed. First design Allocate P1, P2 -> M1; P3 -> M2. M1 P1 P1C P2 P2C M2 P3 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Second design Allocate P1 -> M1; P2, P3 -> M2: M1 P1 M2 P2 P1C P3 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Example: adjusting messages to reduce delay Task graph: 3 execution time 3 P1 P2 d1 4 Network: allocation M1 d2 P3 Transmission time = 4 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. M2 M3 Initial schedule M1 P1 M2 P2 M3 P3 network d1 d2 Time = 15 0 © 2008 Wayne Wolf 5 10 Overheads for Computers as Components 2nd ed. 15 20 time New design Modify P3: reads one packet of d1, one packet of d2 computes partial result continues to next packet © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. New schedule M1 P1 M2 P2 M3 P3 P3 P3 P3 network d1d2d1d2d1d2d1d2 Time = 12 0 © 2008 Wayne Wolf 5 10 Overheads for Computers as Components 2nd ed. 15 20 time Buffering and performance Buffering may sequentialize operations. Next process must wait for data to enter buffer before it can continue. Buffer policy (queue, RAM) affects available parallelism. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Buffers and latency Three processes separated by buffers: B1 © 2008 Wayne Wolf A B2 B Overheads for Computers as Components 2nd ed. B3 C Buffers and latency schedules A[0] A[1] … B[0] B[1] … C[0] C[1] … © 2008 Wayne Wolf Must wait for all of A before getting any B A[0] B[0] C[0] A[1] B[1] C[1] … Overheads for Computers as Components 2nd ed. Multiprocessors Consumer electronics systems. Cell phones. CDs and DVDs. Audio players. Digital still cameras. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Consumer electronics use cases Multimedia: stored in compressed form, uncompressed on viewing. Data storage and management: keep track of your multimedia, etc. Communication: download, upload, chat. © 2000 Morgan Kaufman Overheads for Computers as Components Non-functional requirements for CE Often battery-operated, strict power budget., Very inexpensive. User interface must be capable but inexpensive. © 2000 Morgan Kaufman Overheads for Computers as Components CE devices and hosts Many devices talk to host system. PC host does things that are hard to do on the device. Increasingly, CE devices communicate directly over the network, avoiding the host for access. © 2000 Morgan Kaufman Overheads for Computers as Components Platforms and operating systems Many CE devices use a DSP for signal processing and a RISC CPU for other tasks. I/O devices include buttons, screen, USB. © 2000 Morgan Kaufman Overheads for Computers as Components Flash file systems Flash is widely used for mass storage. Flash wears out on writing (up to 1 million cycles). Directory is most often written, wears out first. Flash file system has layer that moves contents to levelize wear. Hides wear leveling from API. © 2000 Morgan Kaufman Overheads for Computers as Components Cell phones Most popular CE device in history; most widely used computing device. 1 billion sold per year. Handset talks to cell. Cells hand off handset as it moves. © 2000 Morgan Kaufman Overheads for Computers as Components Cell phone platforms Today’s cell phones use analog front end, digital baseband processing. Future cell phones will perform IF processing with DSP. Baseband processing in DSP: Voice compression. Network protocol. Other processing: Multimedia functions. User interface. File system. Applications (contacts, etc.) © 2000 Morgan Kaufman Overheads for Computers as Components CD/MP3 player Audio CPU memory Jog memory display amp DAC I2S © 2008 Wayne Wolf Error corrector Servo CPU memory Analog out focus, tracking, sled, Analog motor in FE, TE, amp Overheads for Computers as Components 2nd ed. drive head CD medium Rotational speed: 1.2-1.4 m/s (CLV). Track pitch: 1.6 microns. Diameter: 120 mm. Pit length: 0.8 -3 microns. Pit depth: .11 microns. Pit width: 0.5 microns. Laser wavelength: 780 nm. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CD mechanism Laser, lens, sled: CD focus track track © 2008 Wayne Wolf diffraction grating laser sled detectors Overheads for Computers as Components 2nd ed. Laser focus Focus controlled by vertical position of lens. Unfocused beam causes irregular spot: Out of focus © 2008 Wayne Wolf In focus Overheads for Computers as Components 2nd ed. Out of focus Laser pickup Side spot detectors F A B D E © 2008 Wayne Wolf C Overheads for Computers as Components 2nd ed. Level: A+B+C+D Focus error: (A+C)-(B+D) Tracking error: E-F Servo control Four main signals: focus (laser) @ 245 kHz; tracking (laser) @ 245 kHz; sled (motor): @ 800 Hz; Disc motor. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Optical pickup EFM Eight-to-fourteen modulation: Fourteen-bit code guarantees a maximum distance between transitions. 00000011 © 2008 Wayne Wolf 00100100000000 Overheads for Computers as Components 2nd ed. Error correction CD capacity: 6.99 GB raw, 700 MB formatted. Reed-Solomon code: g(x) = (x-a) (x- a2) … (x- an-k-1) (x- an-k) Produces data, erasure bits. Time to solve varies greatly depending on noise. CD interleaves Reed-Solomon blocks to reduce effects of large data gaps. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Control and error correction Skips caused by physical disturbance. Wait for disturbance to subside. Retry. Read errors caused by disc/servo problems. Detect error. Choose location for retry. Retry. Fail and interpolate. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. MPEG audio standards Layer 1: Lossless compression of subbands + optional simple masking model Layer 2: More advanced masking model. Layer 3: Additional processing for lower bit rates. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. MPEG audio rates Input sampling rates: 32, 44.1, 48 kHz. Output bit rates: 23, 48, 64, 96, 112, 128, 192, 256, 384 kbits/sec. Output can be mono, dual-channel (bilingual, etc.), stereo. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Other standards Dolby Digital (AC-3): Uses modified discrete cosine transform. ATRAC (MiniDisc): Uses subband + modified DCT. MPEG-2 AAC. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. MPEG Layer 1 384 samples/block at all frequencies. Equals 8 ms at 48 kHz. Optional masking model. Driven by separate FFT for better accuracy. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. MPEG Layer 1 data frame Bit allocation codes specify word length in each subband. Scale factors give gain for each band. header CRC © 2008 Wayne Wolf bit allocation scale factors subband samples Overheads for Computers as Components 2nd ed. aux data MPEG Layer 1 encoder Choose Scale factor Filter bank mux * requantize 0101.. FFT © 2008 Wayne Wolf Masking model Overheads for Computers as Components 2nd ed. MPEG Layer 1 decoder Scale factor demux 0101.. inverse quantize * * expand Step size © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Inverse filter bank MP3 Decoding is easier than encoding, but requires: decompression; filtering. Basic CD standard for data discs. No standards for MP3 disc file structure: player must understand Windows, Mac, Unix discs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Audio players Audio players may use flash, hard disk, or CD for mass storage. Decompression requires small amount of CPU: 10% of ARM7. File system must be compatible (FAT). © 2000 Morgan Kaufman Overheads for Computers as Components Digital still cameras DSC must determine exposure before taking picture. After taking picture: Improve image quality. Compress. Save as file. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Digital still camera architecture DSC uses CPU for general-purpose processing, DSP for image processing. Internal memory buffers the passes on the image. Display is lower resolution than image sensor. Image must be downsampled. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Image capture Before taking picture: Determine exposure. Determine focus. Optimize white balance. Bayer pattern © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Image processing Must perform basic processing to get usable picture: Bayer->RGB interpolation. DSCs perform many functions formerly performed by photoprocessors for film: Image sharpening. Color balance. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. File management EXIF standard gives format for digital pictures: Format of data in a file. Directory structure. EXIF file includes: Image (JPEG, etc.) Thumbnail. Metadata (camera type, date/time, etc.) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerators Example: video accelerator © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Concept Build accelerator for block motion estimation, one step in video compression. Perform two-dimensional correlation: f2 f2f2 f2f2f2f2f2f2f2 Frame 1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Block motion estimation MPEG divides frame into 16 x 16 macroblocks for motion estimation. Search for best match within a search range. Measure similarity with sum-of-absolutedifferences (SAD): S | M(i,j) - S(i-ox, j-oy) | © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Best match Best match produces motion vector for motion block: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Full search algorithm bestx = 0; besty = 0; bestsad = MAXSAD; for (ox = - SEARCHSIZE; ox < SEARCHSIZE; ox++) { for (oy = -SEARCHSIZE; oy < SEARCHSIZE; oy++) { int result = 0; for (i=0; i<MBSIZE; i++) { for (j=0; j<MBSIZE; j++) { result += iabs(mb[i][j] - search[iox+XCENTER][j-oy-YCENTER]); © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Full search algorithm, cont’d. } } if (result <= bestsad) { bestsad = result; bestx = ox; besty = oy; } } } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Computational requirements Let MBSIZE = 16, SEARCHSIZE = 8. Search area is 8 + 8 + 1 in each dimension. Must perform: nops = (16 x 16) x (17 x 17) = 73984 ops CIF format has 352 x 288 pixels -> 22 x 18 macroblocks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerator requirements name block motion estimator purpose block motion est. in PC inputs macroblocks, search areas outputs motion vectors functions performance compute motion vectors with full search as fast as possible manufacturing cost hundreds of dollars power from PC power supply physical size/weight PCI card © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Accelerator data types, basic classes Motion-vector Macroblock Search-area x, y : pos pixels[] : pixelval pixels[] : pixelval PC Motion-estimator memory[] compute-mv() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Sequence diagram :PC :Motion-estimator compute-mv() Search area memory[] memory[] macroblocks © 2000 Morgan Kaufman memory[] Overheads for Computers as Components 2nd ed. Architectural considerations Requires large amount of memory: macroblock has 256 pixels; search area has 1,089 pixels. May need external memory (especially if buffering multiple macroblocks/search areas). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. © 2008 Wayne Wolf network PE 0 PE 1 comparator ctrl ... network macroblock Address generator search area Motion estimator organization Motion vector PE 15 Overheads for Computers as Components 2nd ed. Pixel schedules PE 0 PE 1 PE 2 |M(0,0)-S(0,0)| M(0,0) |M(0,1)-S(0,1)| |M(0,0)-S(0,1)| |M(0,2)-S(0,2)| |M(0,1)-S(0,2)| |M(0,0)-S(0,2)| S(0,2) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System testing Testing requires a large amount of data. Use simple patterns with obvious answers for initial tests. Extract sample data from JPEG pictures for more realistic tests. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Networking for Embedded Systems Why we use networks. Network abstractions. Example networks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Network elements distributed computing platform: PE PE communication link network PE PEs may be CPUs or ASICs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Networks in embedded systems initial processing more processing PE sensor PE actuator PE © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Why distributed? Higher performance at lower cost. Physically distributed activities---time constants may not allow transmission to central site. Improved debugging---use one CPU in network to debug others. May buy subsystems that have embedded processors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Network abstractions International Standards Organization (ISO) developed the Open Systems Interconnection (OSI) model to describe networks: 7-layer model. Provides a standard way to classify network components and operations. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. OSI model application end-use interface presentation data format session © 2008 Wayne Wolf application dialog control transport connections network end-to-end service data link reliable data transport physical mechanical, electrical Overheads for Computers as Components 2nd ed. OSI layers Physical: connectors, bit formats, etc. Data link: error detection and control across a single link (single hop). Network: end-to-end multi-hop data communication. Transport: provides connections; may optimize network resources. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. OSI layers, cont’d. Session: services for end-user applications: data grouping, checkpointing, etc. Presentation: data formats, transformation services. Application: interface between network and end-user programs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Hardware architectures Many different types of networks: topology; scheduling of communication; routing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Point-to-point networks One source, one or more destinations, no data switching (serial port): PE 1 PE 2 link 1 © 2008 Wayne Wolf PE 3 link 2 Overheads for Computers as Components 2nd ed. Bus networks Common physical connection: PE 1 header © 2008 Wayne Wolf PE 2 address PE 3 data PE 4 ECC Overheads for Computers as Components 2nd ed. packet format Bus arbitration Fixed: Same order of resolution every time. Fair: every PE has same access over long periods. round-robin: rotate top priority among Pes. fixed round-robin A B C A B C A B C B C A A,B,C © 2008 Wayne Wolf A,B,C Overheads for Computers as Components 2nd ed. Crossbar out4 out3 out2 out1 in1 © 2008 Wayne Wolf in2 in3 in4 Overheads for Computers as Components 2nd ed. Crossbar characteristics Non-blocking. Can handle arbitrary multi-cast combinations. Size proportional to n2. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Multi-stage networks Use several stages of switching elements. Often blocking. Often smaller than crossbar. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Message-based programming Transport layer provides message-based programming interface: send_msg(adrs,data1); Data must be broken into packets at source, reassembled at destination. Data-push programming: make things happen in network based on data transfers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. I2C bus Designed for low-cost, medium data rate applications. Characteristics: serial; multiple-master; fixed-priority arbitration. Several microcontrollers come with builtin I2C controllers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. I2C physical layer master 1 data line SDL clock line SCL slave 1 © 2008 Wayne Wolf master 2 Overheads for Computers as Components 2nd ed. slave 2 I2C data format SCL ... SDL ... start © 2008 Wayne Wolf MSB Overheads for Computers as Components 2nd ed. ... ack I2C electrical interface Open collector interface: + SDL + SCL © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. I2C signaling Sender pulls down bus for 0. Sender listens to bus---if it tried to send a 1 and heard a 0, someone else is simultaneously transmitting. Transmissions occur in 8-bit bytes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. I2C data link layer Every device has an address (7 bits in standard, 10 bits in extension). Bit 8 of address signals read or write. General call address allows broadcast. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. I2C bus arbitration Sender listens while sending address. When sender hears a conflict, if its address is higher, it stops signaling. Low-priority senders relinquish control early enough in clock cycle to allow bit to be transmitted reliably. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. I2C transmissions multi-byte write S adrs 0 data data 1 data P 0 data S P read from slave S adrs write, then read S adrs © 2008 Wayne Wolf adrs Overheads for Computers as Components 2nd ed. 1 data P Ethernet Dominant non-telephone LAN. Versions: 10 Mb/s, 100 Mb/s, 1 Gb/s Goal: reliable communication over an unreliable medium. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Ethernet topology Bus-based system, several possible physical layers: A © 2008 Wayne Wolf B Overheads for Computers as Components 2nd ed. C CSMA/CD Carrier sense multiple access with collision detection: sense collisions; exponentially back off in time; retransmit. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Exponential back-off times time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Ethernet packet format preamble start frame © 2008 Wayne Wolf source adrs dest data length padding CRC adrs payload Overheads for Computers as Components 2nd ed. Ethernet performance Quality-of-service tends to non-linearly decrease at high load levels. Can’t guarantee real-time deadlines. However, may provide very good service at proper load levels. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Fieldbus Used for industrial control and instrumentation---factories, etc. H1 standard based on 31.25 MB/s twisted pair medium. High Speed Ethernet (HSE) standard based on 100 Mb/s Ethernet. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. System design techniques Design methodologies. Requirements and specification. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Design methodologies Process for creating a system. Many systems are complex: large specifications; multiple designers; interface to manufacturing. Proper processes improve: quality; cost of design and manufacture. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Product metrics Time-to-market: beat competitors to market; meet marketing window (back-to-school). Design cost. Manufacturing cost. Quality. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Mars Climate Observer Lost on Mars in September 1999. Requirements problem: Requirements did not specify units. Lockheed Martin used English; JPL wanted metric. Not caught by manual inspections. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Design flow Design flow: sequence of steps in a design methodology. May be partially or fully automated. Use tools to transform, verify design. Design flow is one component of methodology. Methodology also includes management organization, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Waterfall model Early model for software development: requirements architecture coding testing maintenance © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Waterfall model steps Requirements: determine basic characteristics. Architecture: decompose into basic modules. Coding: implement and integrate. Testing: exercise and uncover bugs. Maintenance: deploy, fix bugs, upgrade. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Waterfall model critique Only local feedback---may need iterations between coding and requirements, for example. Doesn’t integrate top-down and bottomup design. Assumes hardware is given. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Spiral model system feasibility specification prototype initial system enhanced system design requirements test © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Spiral model critique Successive refinement of system. Start with mock-ups, move through simple systems to full-scale systems. Provides bottom-up feedback from previous stages. Working through stages may take too much time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Successive refinement model specify specify architect architect design design build build test initial system © 2008 Wayne Wolf test refined system Overheads for Computers as Components 2nd ed. Hardware/software design flow requirements and specification architecture software design hardware design integration testing © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Co-design methodology Must architect hardware and software together: provide sufficient resources; avoid software bottlenecks. Can build pieces somewhat independently, but integration is major step. Also requires bottom-up feedback. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Hierarchical design flow Embedded systems must be designed across multiple levels of abstraction: system architecture; hardware and software systems; hardware and software components. Often need design flows within design flows. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Hierarchical HW/SW flow spec spec spec architecture HWSW architecture architecture HW SW detailed detailed design design integrate integration integration test testtest system © 2008 Wayne Wolf hardware software Overheads for Computers as Components 2nd ed. Concurrent engineering Large projects use many people from multiple disciplines. Work on several tasks at once to reduce design time. Feedback between tasks helps improve quality, reduce number of later design problems. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Concurrent engineering techniques Cross-functional teams. Concurrent product realization. Incremental information sharing. Integrated product management. Supplier involvement. Customer focus. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. AT&T PBX concurrent engineering Benchmark against competitors. Identify breakthrough improvements. Characterize current process. Create new process. Verify new process. Implement. Measure and improve. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Requirements analysis Requirements: informal description of what customer wants. Specification: precise description of what design team should deliver. Requirements phase links customers with designers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Types of requirements Functional: input/output relationships. Non-functional: timing; power consumption; manufacturing cost; physical size; time-to-market; reliability. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Good requirements Correct. Unambiguous. Complete. Verifiable: is each requirement satisfied in the final system? Consistent: requirements do not contradict each other. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Good requirements, cont’d. Modifiable: can update requirements easily. Traceable: know why each requirement exists; go from source documents to requirements; go from requirement to implementation; back from implementation to requirement. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Setting requirements Customer interviews. Comparison with competitors. Sales feedback. Mock-ups, prototypes. Next-bench syndrome (HP): design a product for someone like you. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Specifications Capture functional and non-functional properties: verify correctness of spec; compare spec to implementation. Many specification styles: control-oriented vs. data-oriented; textual vs. graphical. UML is one specification/design language. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. SDL Used in telecommunications protocol design. Event-oriented state machine model. telephone on-hook caller goes off-hook dial tone caller gets dial tone © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Statecharts Ancestor of UML state diagrams. Provided composite states: OR states; AND states. Composite states reduce the size of the state transition graph. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Statechart OR state s123 i1 i1 S1 S1 i2 i1 S2 i2 i1 S4 S2 i2 S3 S3 traditional © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. OR state i2 S4 Statechart AND state sab c S1-3 S1 S1-4 S3 d a b b a b a c d c S2-3 S2 S2-4 d r r r S5 AND state traditional © 2008 Wayne Wolf S4 Overheads for Computers as Components 2nd ed. S5 AND-OR tables Alternate way of specifying complex conditions: cond1 or (cond2 and !cond3) OR AND cond1 cond2 cond3 © 2008 Wayne Wolf T - T F Overheads for Computers as Components 2nd ed. TCAS II specification TCAS II: aircraft collision avoidance system. Monitors aircraft and air traffic info. Provides audio warnings and directives to avoid collisions. Leveson et al used RMSL language to capture the TCAS specification. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. RMSL State description: state1 Transition bus for transitions between many states: inputs a state description b c outputs © 2008 Wayne Wolf d Overheads for Computers as Components 2nd ed. TCAS top-level description CAS power-on Inputs: power-off TCAS-operational-status {operational,not-operational} fully-operational own-aircraft other-aircraft i:[1..30] mode-s-ground-station i:[1..15] © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. C standby Own-Aircraft AND state CAS Inputs: own-alt-radio: integer standby-discrete-input: {true,false} own-alt-barometric:integer, etc. Effective-SL Alt-SL 1 1 2 2 ... ... 7 7 Alt-layer Climb-inibit ... Descend-inibit ... ... Increase-Descend-inibit ... Increase-climb-inibit ... Advisory-Status Outputs: sound-aural-alarm: {true,false} aural-alarm-inhibit: {true, false} combined-control-out: enumerated, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ... CRC cards Well-known method for analyzing a system and developing an architecture. CRC: classes; responsibilities of each class; collaborators are other classes that work with a class. Team-oriented methodology. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CRC card format Class name: Superclasses: Subclasses: Responsibilities: Collaborators: Class name: Class’s function: Attributes: front © 2008 Wayne Wolf back Overheads for Computers as Components 2nd ed. CRC methodology Develop an initial list of classes. Simple description is OK. Team members should discuss their choices. Write initial responsibilities/collaborators. Helps to define the classes. Create some usage scenarios. Major uses of system and classes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CRC methodology, cont’d. Walk through scenarios. See what works and doesn’t work. Refine the classes, responsibilities, and collaborators. Add class relatoinships: superclass, subclass. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CRC cards for elevator Real-world classes: elevator car, passenger, floor control, car control, car sensor. Architectural classes: car state, floor control reader, car control reader, car control sender, scheduler. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Elevator responsibilities and collaborators class responsibilities Elevator car* Move up and down Car control, car sensor, car control sender Transmits car Passenger, floor requests control reader Car control* Car state © 2008 Wayne Wolf Reads current position of car Overheads for Computers as Components 2nd ed. collaborators Scheduler, car sensor System design techniques Quality assurance. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Quality assurance Quality judged by how well product satisfies its intended function. May be measured in different ways for different kinds of products. Quality assurance (QA) makes sure that all stages of the design process help to deliver a quality product. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Therac-25 Medical Imager (Leveson and Turner) Six known accidents: radiation overdoses leading to death and serious injury. Radiation gun controlled by PDP-11. Four major software components: stored data; scheduler; set of tasks; interrupt services. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Therac-25 tasks Treatment monitor controlled and monitored setup and delivery of treatment in eight phases. Servo task controlled radiation gun. Housekeeper task took care of status interlocks and limit checks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Treatment monitor task Treat was main monitor task. Eight subroutines. Treat rescheduled itself after every subroutine. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Software timing race Timing-dependent use of mode and energy: if keyboard handler sets completion behavior before operator changes mode/energy data, Datent task will not detect the change, but Hand task will. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Software timing errors Changes to parameters made by operator may show on screen but not be sensed by Datent task. One accident caused by entering mode/energy, changing mode/energy, returning to command line in 8 seconds. Skilled operators typed faster, more likely to exercise bug. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Leveson and Turner observations Performed limited safety analysis: guessed at error probabilities, etc. Did not use mechanical backups to check machine operation. Used overly complex programs written in unreliable styles. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. ISO 9000 Developed by International Standards organization. Applies to a broad range industries. Concentrates on process. Validation based on extensive documentation of organization’s process. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. CMU Capability Maturity Model Five levels of organizational maturity: Initial: poorly organized process, depends on individuals. Repeatable: basic tracking mechanisms. Defined: processes documented and standardized. Managed: makes detailed measurements. Optimizing: measurements used for improvement. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Verification cost to fix Verification and testing are important throughout the design flow. Early bugs are more expensive to fix: © 2008 Wayne Wolf requirements bug coding bug Overheads for Computers as Components 2nd ed. time Verifying requirements and specification Requirements: prototypes; prototyping languages; pre-existing systems. Specifications: usage scenarios; formal techniques. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Design review Uses meetings to catch design flaws. Simple, low-cost. Proven by experiments to be effective. Use other people in the project/company to help spot design problems. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Design review players Designers: present design to rest of team, make changes. Review leader: coordinates process. Review scribe: takes notes of meetings. Review audience: looks for bugs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Before the design review Design team prepares documents used to describe the design. Leader recruits audience, coordinates meetings, distributes handouts, etc. Audience members familiarize themselves with the documents before they go to the meeting. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Design review meeting Leader keeps meeting moving; scribe takes notes. Designers present the design: use handouts; explain what is going on; go through details. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Design review audience Look for any problems: Is the design consistent with the specification? Is the interface correct? How well is the component’s internal architecture designed? Did they use good design/coding practices? Is the testing strategy adequate? © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Follow-up Designers make suggested changes. Document changes. Leader checks on results of changes, may distribute to audience for further review or additional reviews. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed. Measurements Measurements help ground our beliefs: Do our practices really work? Do they work where we think they work? Types of measurements: bugs found at different stages of design; bugs as a function of time; bugs in different types of components; how bugs are found. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.