LECTURE NOTES ON EMBEDDED SYSTEMS
2018-2019
IV B. Tech I Semester (JNTUA-R15)
Mr. M. Jagadeesh Babu, Associate Professor

Unit I: Introduction to Embedded System
1.1 Embedded system introduction
1.2 Host and Target Concept
1.3 Embedded Applications
1.4 Features and Architecture considerations for Embedded systems: ROM, RAM, Timers
1.5 Data and Address Bus concept
1.6 Embedded Processor and their types
1.7 Memory types (Student seminar)
1.8 Overview of design process of embedded systems
1.9 Programming languages and tools for embedded design

Unit II: Embedded Processor Architecture
2.1 CISC Vs RISC design philosophy
2.2 Von-Neumann Vs Harvard architecture
2.3 Introduction to ARM architecture and Cortex-M series
2.4 Introduction to the TM4C family viz. TM4C123x & TM4C129x and its targeted applications
2.5 TM4C block diagram
2.6 Address space
2.7 On-chip peripherals (analog and digital), Register sets
2.8 Addressing modes and instruction set basics

Unit III: Overview of Microcontroller and Embedded Systems
3.1 Embedded hardware and various building blocks
3.2 Processor Selection for an Embedded System
3.3 Interfacing Processor, Memories and I/O Devices
3.4 I/O Devices and I/O interfacing concepts
3.5 I/O Devices and I/O interfacing concepts
3.6 Timer and Counting Devices
3.7 Serial Communication and Advanced I/O
3.8 Buses between the Networked Multiple Devices
3.9 Embedded System Design
3.10 Co-design Issues in System Development Process
3.11 Design Cycle in the Development Phase for an Embedded System
3.12 Uses of Target System or its Emulator and In-Circuit Emulator (ICE)
3.13 Use of Software Tools for Development of an Embedded System
3.14 Design metrics of embedded systems: low power, high performance

Unit IV: Microcontroller fundamentals for basic programming
4.1 I/O pin multiplexing
4.2 Pull up/down registers
4.3 GPIO control
4.4 Memory Mapped Peripherals
4.5 Programming System registers
4.6 Watchdog Timer
4.7 Need of low power for embedded systems
4.8 System clocks and Control
4.9 Hibernation Module on TM4C
4.10 Active Vs Standby current consumption
4.11 Introduction to Interrupts, Interrupt Vector Table, Interrupt Programming
4.12 Basic Timer
4.13 Real Time clock (RTC)
4.14 Motion Control Peripherals: PWM Module
4.15 Quadrature Encoder Interfacing (QEI)

Unit V: Embedded communications protocols and Internet of Things
5.1 Synchronous/Asynchronous interfaces (UART, SPI, I2C, USB)
5.2 Communication Basics and Baud Rate Concepts
5.3 Interfacing digital and analog external device
5.4 Implementing and programming UART, SPI and I2C, SPI interface using TM4C
5.5 Case Study: Tiva based embedded system application using the interface protocols for communication
5.6 IoT overview and architecture
5.7 Overview of wireless sensor networks and design examples
5.8 Adding Wi-Fi capability to the Microcontroller
5.9 Embedded Wi-Fi
5.10 Building IoT Applications using CC3100 user API
5.11 Case study: Tiva Based Embedded Networking Applications

UNIT I
Introduction to Embedded System

1.1 Embedded System

An embedded system is a combination of hardware and software, with some attached peripherals, that performs a specific task or a narrow range of tasks with restricted resources. It is an electronic system that is not directly programmed by the user, unlike a personal computer.

An embedded system is a device that incorporates a computer within its implementation, primarily as a means to simplify the system design and to provide flexibility; the user of the device is not even aware that a computer is present. It is a microcontroller-based, software-driven, reliable, real-time control system, autonomous or human- or network-interactive, operating on diverse physical variables and in diverse environments, and sold in a competitive and cost-conscious market. Generally, an embedded system is a subsystem in a larger system and it is application specific.
The generic block diagram of an embedded system is shown in Figure 1.1. Every embedded system consists of input devices such as keyboards, switches and sensors; output devices such as displays, buzzers and actuators; and a processor along with a control program embedded in the off-chip or on-chip memory, and a real-time operating system (RTOS).

Figure 1.1: Block Diagram of a Generic Embedded System.

An embedded system exhibits characteristics such as single functionality, no reprogrammability, security, reliability, dependability, robustness and efficiency in terms of cost, weight, energy, size and speed. Designing the system to meet these characteristics is very important to the success of the final product.

An embedded system is a specialized computer system that is part of a larger system or machine. Typically, an embedded system is housed on a single microprocessor board with the programs stored in ROM. Virtually all appliances that have a digital interface (watches, microwaves, VCRs, cars) utilize embedded systems. Some embedded systems include an operating system, but many are so specialized that the entire logic can be implemented as a single program.

Embedded systems programming is the development of programs intended to be part of a larger operating system or, in a somewhat different usage, to be incorporated on a microprocessor that can then be included as part of a variety of hardware devices.

Several other definitions are:

A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In some cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car. Contrast with general-purpose computer.

A specialized computer system which is dedicated to a specific task. Embedded systems range in size from a single processing board to systems with operating systems (e.g., Linux, Windows® NT Embedded).
Examples of embedded systems are medical equipment and manufacturing equipment.

A computer system that is a component of a larger machine or system. Embedded systems can respond to events in real time. Most digital appliances, such as watches or cars, utilize an embedded system.

Hardware and software that form a component of some larger system and are expected to function without human intervention. Typically an embedded system consists of a single-board microcomputer with software in ROM, which starts running a dedicated application as soon as power is turned on and does not stop until power is turned off.

An embedded system is some combination of computer hardware and software, either fixed in capability or programmable, that is specifically designed for a particular kind of application device. Industrial machines, automobiles, medical equipment, cameras, household appliances, airplanes, vending machines and toys (as well as the more obvious cellular phone and PDA) are among the myriad possible hosts of an embedded system.

A phrase that refers to a device that contains computer logic on a chip inside it. Such equipment is electrical or battery powered. The chip controls one or more functions of the equipment, such as remembering how long it has been since the device last received maintenance.

An embedded system is a special-purpose computer system which is completely encapsulated by the device it controls. An embedded system has specific requirements and performs pre-defined tasks, unlike a general-purpose personal computer.

1.1.2 Characteristics of an Embedded System

The important characteristics of an embedded system are:
Speed (bytes/sec): should be high.
Power (watts): low power dissipation.
Size and weight: as small and as light as possible.
Accuracy (% error): must be very accurate.
Adaptability: high adaptability and accessibility.
Reliability: must be reliable over a long period of time.
So, an embedded system must perform its operations at high speed so that it can readily be used for real-time applications; its power consumption must be very low; its size should be as small as possible; and its readings must be accurate, with minimum error. The system must also be easily adaptable to different situations.

1.1.3 Categories of Embedded systems

Embedded systems can be classified into the following categories based on their functional and performance requirements.

Based on functional requirements:
1. Stand-alone embedded systems
2. Real-time embedded systems
   a) Hard real-time embedded systems
   b) Soft real-time embedded systems
3. Networked embedded systems
4. Mobile embedded systems

Based on performance requirements:
1. Small scale embedded systems
2. Medium scale embedded systems
3. Large scale (or sophisticated) embedded systems

Stand-alone embedded systems: A stand-alone embedded system works by itself. It is a self-contained device which does not require a host system such as a computer. It takes either digital or analog inputs from its input ports; calibrates, converts and processes the data; and outputs the resulting data to its attached output device, which either displays the data or controls and drives the attached devices. Ex: Temperature measurement systems, video game consoles, MP3 players, digital cameras and microwave ovens.

Real-time embedded systems: An embedded system which gives the required output within a specified time, or which strictly follows the time deadlines for completion of a task, is known as a real-time system. In other words, a real-time system satisfies the time constraints in addition to being functionally correct. There are two types of real-time systems: (i) soft real-time systems and (ii) hard real-time systems.

Soft real-time system: A real-time system in which a violation of the time constraints causes only degraded quality, while the system can continue to operate, is known as a soft real-time system.
In soft real-time systems, the design focus is to offer a guaranteed bandwidth to each real-time task and to distribute the resources among the tasks. Ex: A microwave oven, washing machine, TV remote, etc.

Hard real-time system: A real-time system in which a violation of the time constraints causes critical failure, loss of life, property damage or catastrophe is known as a hard real-time system. These systems usually interact directly with physical hardware instead of through a human being. The hardware and software of hard real-time systems must allow a worst-case execution time (WCET) analysis that guarantees that execution completes within a strict deadline. The chip selection and RTOS selection become important factors in hard real-time system design. Ex: A missed deadline in a missile control embedded system, a delayed alarm during a gas leak, a car airbag control system, a delayed response in a pacemaker, a failure in radar functioning, etc.

Networked embedded systems: Networked embedded systems are connected to a network through network interfaces in order to access resources. The connected network can be a Local Area Network (LAN), a Wide Area Network (WAN) or the Internet. The connection can be either wired or wireless. Networked embedded systems are the fastest growing area of embedded systems applications. The embedded web server is one such system, where all embedded devices are connected to a web server and can be accessed and controlled by any web browser. Ex: A home security system is an example of a LAN networked embedded system, where all sensors (e.g. motion detectors, light sensors or smoke sensors) are wired and run on the TCP/IP protocol.

Mobile embedded systems: Portable embedded devices such as mobile and cellular phones, digital cameras, MP3 players and PDAs (Personal Digital Assistants) are examples of mobile embedded systems. The basic limitation of these devices is their limited memory and other resources.
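The difference between soft and hard real-time behaviour described above can be pictured with a small C sketch. It is only an illustration under stated assumptions: the type and function names (task_spec_t, check_deadline and so on) are hypothetical and do not come from any particular RTOS, and the elapsed time is assumed to have been measured elsewhere.

```c
#include <stdint.h>

/* Hypothetical types for illustration; not taken from any real RTOS. */
typedef enum { DEADLINE_SOFT, DEADLINE_HARD } deadline_kind_t;
typedef enum { TASK_OK, TASK_DEGRADED, TASK_FAILED } task_result_t;

typedef struct {
    deadline_kind_t kind;        /* soft or hard real-time task */
    uint32_t        deadline_us; /* time allowed for one run, in microseconds */
} task_spec_t;

/* Classify one completed run of a task from its measured execution time. */
task_result_t check_deadline(const task_spec_t *task, uint32_t elapsed_us)
{
    if (elapsed_us <= task->deadline_us) {
        return TASK_OK;          /* functionally correct and on time */
    }
    /* Deadline missed: a soft task only degrades, a hard task has failed. */
    return (task->kind == DEADLINE_SOFT) ? TASK_DEGRADED : TASK_FAILED;
}
```

A washing-machine task would be declared with DEADLINE_SOFT and could carry on after a TASK_DEGRADED result, whereas an airbag task would be declared with DEADLINE_HARD, and a TASK_FAILED result is exactly what the WCET analysis is meant to rule out.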
Based on the performance of the microcontroller, embedded systems are also classified into (i) small scale, (ii) medium scale and (iii) large scale embedded systems.

1.1.4 Classifications of Embedded systems

1. Small Scale Embedded Systems: These systems are designed with a single 8- or 16-bit microcontroller; they have little hardware and software complexity and involve board-level design. They may even be battery operated. When developing embedded software for these, an editor, assembler and cross assembler, specific to the microcontroller or processor used, are the main programming tools. Usually, C is used for developing these systems. C programs are compiled into assembly, and the executable code is then located appropriately in the system memory. The software has to fit within the memory available and must keep in view the need to limit power dissipation when the system is running continuously.

2. Medium Scale Embedded Systems: These systems are usually designed with a single or a few 16- or 32-bit microcontrollers, DSPs or Reduced Instruction Set Computers (RISCs). They have both hardware and software complexities. For complex software design, the following programming tools are available: RTOS, source code engineering tool, simulator, debugger and Integrated Development Environment (IDE). Software tools also provide solutions to the hardware complexities. An assembler is of little use as a programming tool. These systems may also employ readily available ASSPs and IPs (explained later) for various functions, for example bus interfacing, encrypting, deciphering, discrete cosine transformation and inverse transformation, TCP/IP protocol stacking and network connecting functions.

3. Sophisticated Embedded Systems: Sophisticated embedded systems have enormous hardware and software complexities and may need scalable processors or configurable processors and programmable logic arrays.
They are used for cutting-edge applications that need hardware and software co-design and integration in the final system; however, they are constrained by the processing speeds available in their hardware units. Certain software functions, such as encryption and deciphering algorithms, discrete cosine transformation and inverse transformation algorithms, TCP/IP protocol stacking and network driver functions, are implemented in hardware to gain speed. Some of the functions of the hardware resources in the system are also implemented in software. Development tools for these systems may not be readily available at a reasonable cost, or may not be available at all. In some cases, a compiler or retargetable compiler might have to be developed for them.

1.2 Host and Target machine

An embedded system is a special-purpose system in which the computer is completely encapsulated by the device it controls. Each embedded system has unique characteristics. The components and functions of hardware and software can be different for each system.

Nowadays, embedded software is used in all kinds of electronic devices, such as watches and cellular phones. Writing embedded software is similar to general programming, but the embedded hardware is unique. The method of communication between interfaces can vary from processor to processor, which makes the software more complex. Engineers therefore need to be aware of the software development process and tools. There are a lot of things that software development tools can do automatically when the target platform is well defined. This automation is possible because the tools can exploit features of the hardware and operating system on which the program will execute. Embedded software development tools can rarely make assumptions about the target platform. Hence the user has to give the tools some explicit instructions about the system.
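One common way of giving the tools such explicit information is through build-time macros. The sketch below is a minimal illustration, assuming two hypothetical macros, TARGET_CPU_HZ and BUILD_FOR_TARGET, that the build system would define (for example with a compiler option such as -DTARGET_CPU_HZ=16000000); neither name comes from a real tool chain, and the 16 MHz default is an assumed value.

```c
#include <stdint.h>

/* Hypothetical build-time settings. The host compiler cannot discover the
 * target's clock frequency by itself, so the developer states it explicitly. */
#ifndef TARGET_CPU_HZ
#define TARGET_CPU_HZ 16000000UL   /* assumed 16 MHz clock, illustration only */
#endif

#ifdef BUILD_FOR_TARGET
#define PLATFORM_NAME "target"     /* cross-compiled for the embedded board */
#else
#define PLATFORM_NAME "host"       /* compiled natively on the development PC */
#endif

/* Report which platform this object code was built for. */
const char *platform_name(void)
{
    return PLATFORM_NAME;
}

/* Timer ticks needed for a delay of `ms` milliseconds at the declared clock.
 * Sketch only: overflow for very long delays is not handled. */
uint32_t ticks_for_ms(uint32_t ms)
{
    return (uint32_t)((TARGET_CPU_HZ / 1000UL) * ms);
}
```

The same source file can then be compiled natively on the host for quick checks and cross-compiled for the target with the macros set to match the real hardware.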
Figure 1.2 shows how an embedded system is developed using a host and a target machine.

Figure 1.2: Embedded system using host and target machine

Host machine

The application program is developed on the host computer. The host computer is also called the Development Platform. It is a general-purpose computer with a higher capability processor, more memory and a variety of input and output devices. The compiler, assembler, linker and locator run on the host computer rather than on the embedded system itself. These tools are extremely popular with embedded software developers because they are freely available (even the source code is free) and support many of the most popular embedded processors. The host contains many development tools to create the output binary image. Once a program has been written, compiled, assembled and linked, it is moved to the target platform.

Program Development Tool Kit
1. Program development tool kit or IDE.
2. Assembly mnemonics or C++ or Java or Visual C++, entered using the keyboard of the host system (PC).
3. GUIs that allow entry, addition, deletion, insertion and appending of previously written lines or files, and merging of records and files at specific positions.
4. A source file is created that stores the edited file.
5. The file is given an appropriate name by the programmer.
6. Previously created files can be used.
7. The various source files can be integrated.
8. Different versions of the source files can be saved.
9. Compiler, cross compiler, assembler.

Target machine

The output binary image is executed on the target hardware platform. It consists of two entities: the target hardware (processor) and the runtime environment (OS). It is needed only for final output.
1. The target system differs from the final system.
2. The target system interfaces with the computer and also works as a standalone system.
3. Code may be downloaded to the target system repeatedly during the development phase.
4. A copy of the target system is made that later functions as the embedded system.
5. The designer later simply copies it into the final system or product.
6. The final system may employ ROM in place of flash, EEPROM or EPROM.

Figure 1.3: Host and Target interfacing

Examples:
1. Philips LPC 21xx development board
2. MSP 430 development board
3. TIVA TM4Cxxx development board

1.3 Embedded Applications

Embedded systems used in various applications are listed in Table 1.1. It shows that embedded systems have rapidly emerged as an important computing discipline because of the technology convergence in computers, consumer electronics, communications, entertainment, etc. Further, new applications in medical electronics, mobile communications, etc., are continuously evolving and being added to fulfil the ever-growing requirements of users.

S.No  Embedded System        Application
1     Home Appliances        Dishwasher, washing machine, microwave, set-top box, security system, HVAC system, DVD, answering machine, garden sprinkler systems, etc.
2     Office Automation      Fax, copy machine, smart phone system, modem, scanner, printers.
3     Security               Face recognition, finger recognition, eye recognition, building security system, airport security system, alarm system.
4     Academia               Smart board, smart room, OCR, calculator, smart card.
5     Instrumentation        Signal generator, signal processor, power supplier, process instrumentation.
6     Telecommunication      Router, hub, cellular phone, IP phone, web camera.
7     Automobile             Fuel injection controller, anti-locking brake system, air-bag system, GPS, cruise control.
8     Entertainment          MP3, video game, Mind Storm, smart toy.
9     Aerospace              Navigation system, automatic landing system, flight attitude controller, space explorer, space robotics.
10    Industrial automation  Assembly line, data collection system, monitoring systems for pressure, voltage, current and temperature, hazard detecting system, industrial robot.
11    Personal               PDA, iPhone, palmtop, data organizer.
12    Medical                CT scanner, ECG, EEG, EMG, MRI, glucose monitor, blood pressure monitor, medical diagnostic device.
13    Banking & Finance      ATM, smart vendor machine, cash register, share market.
14    Miscellaneous          Elevators, treadmill, smart card, security door, etc.

1.4 Features of an Embedded System

Embedded systems are used not only in everyday products but also as indispensable components, wholly or in part, of many high-end applications such as military, scientific research and telecommunication systems. They range in size from hand-held cell phones to components of nuclear missiles. Irrespective of its size, an embedded system necessarily consists of some hardware and of software designed to work on that hardware. Embedded systems are therefore called products of hardware and software co-design. Features of the different hardware and software units of embedded systems are explained in the following sections.

1.4.1 Hardware features of standalone embedded systems

A standalone embedded system includes different types of processors, a power supply unit, a clock, a reset circuit and memories, which are considered the most essential hardware components of standalone embedded systems. A brief discussion of the important features of these components is given below.

1.4.1.1 Different types of processors used

Processor: The processor is the heart of the embedded system. It is responsible for executing instructions and controlling the flow of data to and from the processor. The designer should know the efficiency of the different types of processor and select the appropriate one according to the requirement. The processors available can be grouped into four broad categories: (1) General purpose processor (GPP), (2) Application specific system processor (ASSP), (3) Multiprocessor system and (4) GPP core or ASIP core.
A GPP has advantages in use over other processors because of (i) a predefined, known instruction set, resulting in fast system development; (ii) boards and I/O interfaces designed for a GPP, which can be used for a different system by changing the software; and (iii) the ready availability of programming facilities in a high-level language along with a compiler and debugger, resulting in fast development of a new system.

a) General Purpose Processor (GPP): A GPP may be a microprocessor, microcontroller, embedded processor, Digital Signal Processor (DSP) or media processor.

A microprocessor is a single VLSI chip that has a CPU and may also have caches, a floating point arithmetic unit, and pipelining and superscalar units. The latter units may be present for faster processing. RAM is externally connected to the CPU.

A microcontroller is also a single-chip VLSI unit, with limited computational capability, that keeps all functional units and components inside the chip.

An embedded processor is a microprocessor or microcontroller designed specially to provide fast context switching (resulting in lower latency), atomic ALU operations with no shared data problem, and a RISC core for fast and precise calculations. ARM family processors, the Intel i960, etc. belong to this class.

A DSP as a GPP is a single VLSI chip having the computational capabilities of a microprocessor and one or more Multiply and Accumulate (MAC) units. A DSP is an essential unit of an embedded system, with very long instruction word (VLIW) processing capabilities. It processes Single Instruction Multiple Data (SIMD) operations, the Discrete Cosine Transformation (DCT) and the Inverse Discrete Cosine Transformation (IDCT) very efficiently. DCT and IDCT are most useful in algorithms for signal analysis, coding, filtering, noise cancellation, echo elimination, etc.

b) Application Specific System Processor (ASSP): An ASSP is dedicated to faster processing and is useful for applications such as real-time video processing, which involves a lot of processing before transmission.
It may also include some features of an RTOS. An ASSP provides a hardwired solution for most of its time-consuming tasks. For example, the ASSP chip i2Chip has TCP, UDP, IP, ARP and Ethernet 10/100 MAC (Media Access Control) hardwired logic included in it. In practice, an ASSP is used as an additional processing unit that runs the application-specific tasks in place of processing them in embedded software.

c) Multiprocessor System: As an embedded algorithm has to work within a strict deadline, it may sometimes not be possible to meet that deadline with a single processor. In real-time video processing, the number of MAC operations required may be more than one DSP unit can deliver. In such a case an embedded system may use two or more processors. A similar requirement may arise in modern cell phones, which have to perform a number of tasks. Multiprocessors are used when different tasks have to be performed concurrently. The operations of all processors are synchronized to obtain optimum performance.

d) GPP core or ASIP core: A GPP core or ASIP core is integrated into either an Application Specific Integrated Circuit (ASIC), or a VLSI circuit, or an FPGA (Field Programmable Gate Array) core integrated with processor units. A recent innovation in this area is the System on Chip (SoC). An SoC may be embedded with multiple processors, memories, multiple standard source solutions called IP (Intellectual Property) cores, and other logic and analog units. It may also have a network protocol embedded in it. It can embed DSP applications and an FPGA core.

For a number of applications a GPP core may not be a suitable solution. Applications such as security, smart cards, video games, mobile Internet, Gbps transceivers, Gbps LANs and missile systems need a special processing unit on a VLSI circuit to function as a processor. These units are called Application Specific Instruction Processors (ASIPs).
Sometimes an application needs both a configurable processor (FPGA or ASIP) and a non-configurable processor (DSP, microprocessor or microcontroller) on one chip. This is generally the case in killer applications (applications useful to millions of people) such as HDTV and cell phones.

1.4.1.2 Power supply unit

An embedded system generally has its own power supply unit. Four voltage ranges, (i) 5.0 V ± 0.25 V, (ii) 3.3 V ± 0.3 V, (iii) 2.0 V ± 0.2 V and (iv) 1.5 V ± 0.2 V, are used for the operation of different units. Additionally, a 12 V ± 0.2 V supply is needed for flash or EEPROM and for RS232 serial interfaces. The supply to a chip depends on the number of power pins provided on the chip, which generally come in pairs of supply and ground. A processor may have more than two VDD and VSS pins, which distribute power to all the sections and reduce interference. The supply should separately power (a) the external I/O driving ports, (b) the timers and (c) the clock and reset circuits. The clock and reset circuits should be specially designed to be free from radio frequency interference.

A system that does not have its own power supply is either connected to an external power supply or uses a charge pump for the necessary power. Examples of the first type are the Network Interface Card (NIC) and the graphics accelerator, which do not have their own power supply and are connected to the PC power-supply line. In the second type, a charge pump draws power from a non-supply line. It consists of a diode in series followed by a charging capacitor. The diode gets its forward bias from an external signal, say the RTS (Request to Send) signal in the case of a mouse used with a computer. The charge pump inside the mouse stores charge in the inactive state and dissipates power when the mouse is used.

An embedded system has to perform tasks continuously from power-up to power-off and may even be kept on continuously. Real-time operating systems (RTOS) use Wait and Stop instructions and disable certain units when they are not needed.
This is very important for saving power during program execution. Performing tasks at a reduced clock rate is also a way to control power dissipation. Software performance analysis during the design phase can also take power dissipation into account. A good design must balance the conflicting needs of low power dissipation and fast, efficient program execution.

1.4.1.3 Clock Oscillator

The function of the oscillator circuit is to provide an accurate and stable periodic clock signal to the processor. The processor needs a clock oscillator because the clock controls the various clocking requirements of the CPU, namely the system timers and the CPU machine cycles. The machine cycle includes (i) fetching code and data from memory, decoding and execution, and (ii) transferring results to memory. The clock controls the time taken to execute an instruction. The clock circuit uses either a crystal (external to the processor), a ceramic resonator (internally associated with the processor) or an external IC attached to the processor.

(a) The crystal resonator gives the highest stability in frequency against temperature and drift in the circuit. The crystal is used in association with an appropriate resistance in parallel and a pair of series capacitances at both pins. The crystal is kept as near as feasible to the two pins of the processor.
(b) The internal ceramic resonator, if available in a processor, saves the use of the external crystal and gives a reasonable, though not very high, frequency stability.
(c) The external IC based clock oscillator has a significantly higher power dissipation than the internal processor resonator. It provides a higher driving capability, which might be needed when various circuits of an embedded system are driven concurrently, e.g. in multiprocessor based systems.

1.4.1.4 Real time clock or timer units

A timer suitably configured as the system clock is sometimes referred to as the RTC (Real Time Clock).
The RTC is used by the scheduler for real-time programming. A hardware timer is a counter that is incremented at a fixed rate as the system clock pulses. There are several different types of timers available, and a timer/counter can perform several different tasks. The CPU uses the timer to keep track of time accurately. The timer can generate a stream of pulses or a single pulse at different frequencies. It can be used to start and stop tasks at desired times.

A COP (computer operating properly) or watchdog timer checks for runaway code execution. The hardware implementation of watchdog timers varies considerably between different processors. In general, watchdog timers must be turned on once within the first few cycles after reset and then reset periodically by software. Some watchdog timers can be programmed for different time-out delays. The reset sequence can be as simple as a specialized instruction or as complex as sending a sequence of bytes to a port. Watchdog timers either reset the processor or execute an interrupt when they time out.

More than one timer using the RTC may be needed for the various timing and counting needs. Timers may be implemented in hardware or in software. At least one hardware timer device is a must in a system, where it is used as the system clock. The hardware timer gets its input from a clock-out signal of the processor and activates the system clock according to the number of ticks present at the hardware timer. The number of hardware timers present is generally limited. A software timer is software that executes and increases or decreases a count variable (count value) on an interrupt from a timer output or from a real-time clock. A software timer can also generate an interrupt on overflow of the count value or when the count variable reaches its final value. Software timers are used as virtual timing devices. Each timer device has a number of control bits and time-out status flags.
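As a rough illustration of the software timer and the watchdog timer described above, the following C sketch models both as count variables that are decremented once per tick of a single hardware timer. All names (soft_timer_t, watchdog_t and their functions) are hypothetical, and a real watchdog is a hardware unit whose registers differ from processor to processor; the reset here is only recorded in a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software timer: a count variable driven by the hardware timer tick. */
typedef struct {
    uint32_t reload;     /* ticks between time-outs */
    uint32_t count;      /* ticks remaining until the next time-out */
    bool     timed_out;  /* time-out status flag, cleared by the user */
} soft_timer_t;

void soft_timer_start(soft_timer_t *t, uint32_t ticks)
{
    t->reload = ticks;
    t->count = ticks;
    t->timed_out = false;
}

/* Call once from the hardware timer interrupt (one system clock tick). */
void soft_timer_tick(soft_timer_t *t)
{
    if (t->count > 0 && --t->count == 0) {
        t->timed_out = true;     /* final count value reached */
        t->count = t->reload;    /* restart for periodic operation */
    }
}

/* Watchdog model: requests a reset unless software restarts it in time. */
typedef struct {
    uint32_t timeout;            /* programmed time-out delay in ticks */
    uint32_t count;              /* ticks left before a reset is requested */
    bool     reset_requested;    /* stands in for the processor reset */
} watchdog_t;

void watchdog_enable(watchdog_t *w, uint32_t timeout_ticks)
{
    w->timeout = timeout_ticks;
    w->count = timeout_ticks;
    w->reset_requested = false;
}

/* Healthy software calls this periodically to restart the countdown. */
void watchdog_kick(watchdog_t *w)
{
    w->count = w->timeout;
}

void watchdog_tick(watchdog_t *w)
{
    if (w->count > 0 && --w->count == 0) {
        w->reset_requested = true;   /* runaway code: time-out expired */
    }
}
```

Several soft_timer_t objects can share one hardware tick, which is why software timers are useful when the number of hardware timers is limited.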
A timer device that is given count inputs in place of clock pulses works as a counting device.

1.4.1.5 Interrupt Handlers

A system possesses a number of devices, and the system processor has to control and handle the requirements of these devices by running an appropriate Interrupt Service Routine (ISR) for each. An interrupt handling mechanism must exist in each system to handle interrupts from the various processes in the system. An interrupt is an event that suspends regular program operation while the event is serviced by another program. Interrupts increase the speed of response to external events. Different microcontrollers have different interrupt sources, which can include external, timer and serial port interrupts. When an interrupt is received, the current operation is suspended, the interrupt is identified and the controller jumps (vectors) to an interrupt service routine.

There are two sources of interrupts: hardware and software. Hardware interrupts include a signal on a pin, timer overflow and serial port interrupts. Software interrupts are commands given by the programmer. There are two different interrupt types: maskable and non-maskable. A maskable interrupt can be disabled and enabled, while non-maskable interrupts cannot be disabled and are therefore always enabled. Most 8-bit microcontrollers use vectored arbitration interrupts. Vectored arbitration means that when a specific interrupt occurs, the interrupt handler automatically branches to an address associated with that interrupt. The servicing of interrupts in general is dictated by the status of the GIE (Global Interrupt Enable) bit. GIE is cleared when an interrupt occurs, and all interrupts are delayed until it is set again.

1.4.1.6 Reset circuit and Watchdog timer

A reset instruction starts execution from the starting address; this is also the address from which execution starts when the system is powered up.
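The vectoring described in Section 1.4.1.5 and the start address used on reset can both be pictured as entries in a table of handler addresses. The C sketch below simulates this on a host computer. It is only an illustration under stated assumptions: the vector numbering, the names (vector_table, interrupt_raise and so on) and the two example handlers are hypothetical, and on real hardware the table layout and the GIE bit are fixed by the processor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vector numbering; entry 0 stands for the reset start address. */
enum { VEC_RESET = 0, VEC_EXTERNAL, VEC_TIMER, VEC_SERIAL, VEC_COUNT };

typedef void (*isr_t)(void);

static isr_t vector_table[VEC_COUNT];   /* one handler address per source */
static bool  pending[VEC_COUNT];        /* interrupts waiting for service */
static bool  gie = true;                /* Global Interrupt Enable */

void isr_register(int vec, isr_t handler)
{
    if (vec >= 0 && vec < VEC_COUNT) {
        vector_table[vec] = handler;
    }
}

/* Branch to the handler associated with `vec`, with GIE cleared meanwhile. */
static void dispatch(int vec)
{
    gie = false;                        /* cleared when an interrupt occurs */
    if (vector_table[vec] != NULL) {
        vector_table[vec]();
    }
    gie = true;                         /* set again when the ISR returns */
}

/* Simulate an interrupt request from source `vec`. */
void interrupt_raise(int vec)
{
    bool found;

    if (vec < 0 || vec >= VEC_COUNT) {
        return;
    }
    pending[vec] = true;
    if (!gie) {
        return;                         /* delayed until GIE is set */
    }
    do {                                /* service everything now pending */
        found = false;
        for (int v = 0; v < VEC_COUNT; v++) {
            if (pending[v]) {
                pending[v] = false;
                dispatch(v);
                found = true;
            }
        }
    } while (found);
}

/* Example handlers (hypothetical). */
static int timer_hits;
static int timer_hits_seen_by_serial = -1;

static void timer_isr(void)
{
    timer_hits++;
}

static void serial_isr(void)
{
    interrupt_raise(VEC_TIMER);         /* arrives while GIE is clear */
    timer_hits_seen_by_serial = timer_hits;   /* timer ISR has not run yet */
}
```

After registering the two handlers, raising VEC_SERIAL runs serial_isr first; the timer request made inside it is held pending and is serviced only after serial_isr returns and GIE is set again.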
The reset circuit activates for a fixed period (a few clock cycles) and then deactivates to let the program proceed from a default beginning address. On deactivation of the reset that follows processor activation, the program executes from the start-up address. Reset can be activated by an external reset circuit that activates on power-up, by a software instruction, or by a programmed timer known as a watchdog timer. A watchdog timer is a timing device that resets the system after a predefined timeout. The timeout is usually configurable, and the watchdog timer is activated within the first few clock cycles after power-up. It has many applications. In many embedded systems, reset by a watchdog timer is essential because it rescues the system from program hangs. On restart the program can function normally.

1.4.1.7 Memories

An embedded system makes use of different types of memories based on their features. These can be viewed with the following chart and may be briefly explained based on their functionality:
(i) Internal RAM used for registers, temporary data and stack.
(ii) Internal ROM/PROM/EPROM for the application program.
(iii) External RAM for temporary data and stack.
(iv) Internal cache, available in some microcontrollers and microprocessors.
(v) EEPROM or flash memory for saving the results.
(vi) External ROM or PROM for embedding software, used in non-microcontroller-based systems.
(vii) RAM memory buffers at ports. Caches for superscalar microprocessors.

Figure 1.4: Various forms of system memory

Different types of memory devices in varying sizes are available for use as per requirement. These are:
(1) Masked ROM, EPROM or flash, which stores the embedded software (ROM image). Masked ROM is for bulk manufacturing.
(2) EPROM or EEPROM is used for the testing and design stages.
(3) EEPROM (5 V form) is used to store the results during the system program run time. It is erased byte by byte and written during the system run.
It is useful for storing modifiable bytes, for example the run-time system status, time and date. Flash is very useful when a processed image or voice is to be stored, or when a data set or system configuration data is to be stored that can be upgraded as and when required. In flash, new images can be stored after compression and processing, and the old one is erased from a sector in a single instruction cycle. In boot block flash, an OTP (one-time programmable) sector is reserved that is written once only, at the time of first boot. It stores the boot program and initial data or permanent system configuration data. This OTP sector can be used to store the ROM image.
(4) RAM is mostly used in SRAM form in a system. Advanced systems use RAM in the form of DRAM, SDRAM or RDRAM.
(5) Parameterised distributed RAM is used when I/O devices and subunits require a memory buffer.
(6) Subunits such as a MAC, which operate at high speed, use separate blocks of RAM.

1.4.1.8 Input / Output units and buses

The system gets input from physical devices such as keypads/keyboards, sensors, transducer circuits etc. It gets the values by read operations at the port address. The system has output ports through which it sends output bytes to the real world. It sends the values to the output by a write operation at the port address. In the case of some devices, a port may be used as both an input and an output port. One example is a mobile phone, which sends as well as receives signals. There are two types of I/O ports: (i) parallel port and (ii) serial port. In a serial port, the system gets a serial stream of bits at an input and sends the signal as bits through a modem. A serial port facilitates long-distance communications and interconnections. A serial port may be a serial UART, a serial synchronous port or a serial interfacing port. A system may get inputs from multiple channels or may have to send outputs to multiple channels. A multiplexer takes inputs from various channels and passes the input from a selected channel on to the system. A demultiplexer takes an output from the system and routes it to a selected output channel, for example to another system. A system might have to be connected to a number of other devices and systems. For networking systems there are different types of buses, e.g. I2C, CAN, USB, ISA, EISA and PCI.

1.4.1.9 DAC/ADC

For automatic control and signal processing applications, a system must provide the necessary interfacing circuits and software for a Digital to Analog Conversion (DAC) unit and an Analog to Digital Conversion (ADC) unit. A DAC operation can be done with a combination of a PWM (Pulse Width Modulation) unit in the microcontroller and an external integrator chip. ADC operations are needed in systems for voice processing, instrumentation, data acquisition and automatic control.

1.5 Data and Address Bus concept

We refine the high-level functional diagram to illustrate a typical bus configuration comprising the address, data and control lines.

Address bus and data bus: In computer architecture, a bus is defined as a system that transfers data between hardware components of a computer or between two separate computers. Initially, buses were made up of electrical wires, but the term bus is now used more broadly for any physical subsystem that provides the same functionality as the earlier electrical buses. Computer buses can be parallel or serial and can be connected as multi-drop, daisy chain or by switched hubs. The system bus is a single bus that lets all major components of a computer communicate with each other. It is made up of an address bus, a data bus and a control bus. The data bus carries the data to be stored, while the address bus carries the location where it should be stored.

Address Bus

The address bus is the part of the computer system bus that is dedicated to specifying a physical address.
When the processor needs to read from or write to the memory, it uses the address bus to specify the physical address of the individual memory block it needs to access (the actual data is sent along the data bus). More precisely, when the processor wants to write some data to the memory, it asserts the write signal, sets the write address on the address bus and puts the data on the data bus. Similarly, when the processor wants to read some data residing in the memory, it asserts the read signal and sets the read address on the address bus. After receiving this signal, the memory controller reads the address bus to get the read address, fetches the data from that memory block and places it on the data bus. The size of the memory that can be addressed by the system determines the width of the address bus, and vice versa. For example, if the width of the address bus is 32 bits, the system can address 2^32 memory blocks (equal to 4 GB of memory space, given that one block holds 1 byte of data).

Data Bus

A data bus simply carries data. Internal buses carry information within the processor, while external buses carry data between the processor and the memory. Typically, the same data bus is used for both read and write operations. For a write operation, the processor puts the data to be written on the data bus. For a read operation, the memory controller gets the data from the specific memory block and puts it on the data bus.

What is the difference between Address Bus and Data Bus?

The data bus is bidirectional, while the address bus is unidirectional. That means data travels in both directions, but addresses travel in only one direction. The reason is that, unlike the data, the address is always specified by the processor.
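The relation between address bus width and addressable memory can be checked with a short calculation. The sketch below is illustrative only: the function names `addressable_bytes` and `min_address_bits` are made up for this example, and it assumes each memory block holds one byte unless stated otherwise.

```python
def addressable_bytes(address_bits, bytes_per_block=1):
    """Memory reachable with the given address bus width."""
    return (2 ** address_bits) * bytes_per_block


def min_address_bits(memory_bytes, bytes_per_block=1):
    """Smallest address bus width that can reach every block of the memory."""
    blocks = -(-memory_bytes // bytes_per_block)  # ceiling division
    return max(1, (blocks - 1).bit_length())


GIB = 2 ** 30
# A 32-bit address bus reaches 2^32 one-byte blocks, i.e. 4 GiB.
four_gib = addressable_bytes(32) // GIB
# A 64 KiB memory needs 16 address lines.
lines_for_64k = min_address_bits(64 * 1024)
```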
The width of the data bus is determined by the size of the individual memory block, while the width of the address bus is determined by the size of the memory that should be addressed by the system.

1.6 Embedded Processor and their types

1.7 Memory Types

Data memory types:
1. Random Access Memory, which can be read and written: Static and Dynamic RAM
2. Read Only Memory, which retains data: PROM, EPROM, EEPROM, Flash

Programmable Logic:
1. Programmable Arrays: PLDs, PALs, GALs
2. Complex Programmable Devices: CPLD, FPGA technology

Summary of Characteristics

1.7.1 SRAM, DRAM, SDRAM, DDR SDRAM

There are many kinds of RAM and new ones are invented all the time. One of the aims is to make RAM access as fast as possible in order to keep up with the increasing speed of CPUs.

SRAM (Static RAM) is the fastest form of RAM but also the most expensive. Due to its cost it is not used as main memory but rather for cache memory. Each bit requires a 6-transistor circuit.

DRAM (Dynamic RAM) is not as fast as SRAM but is cheaper and is used for main memory. Each bit uses a circuit of a single capacitor and a single transistor. Since capacitors lose their charge, DRAM needs to be refreshed every few milliseconds. The memory system does this transparently. There are many implementations of DRAM; two well-known ones are SDRAM and DDR SDRAM.

SDRAM (Synchronous DRAM) is a form of DRAM that is synchronised with the clock of the front-side bus (FSB). As an example, if the system bus operates at 167 MHz over an 8-byte (64-bit) data bus, then an SDRAM module could transfer 167 x 8 ~ 1.3 GB/sec.

DDR SDRAM (Double-Data Rate SDRAM) is an optimisation of SDRAM that allows data to be transferred on both the rising edge and the falling edge of the clock signal, effectively doubling the amount of data that can be transferred in a period of time. For example, a PC-3200 DDR SDRAM module operating at 200 MHz can transfer 200 x 8 x 2 ~ 3.2 GB/sec over an 8-byte (64-bit) data bus.
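The two transfer-rate figures above follow from one formula: clock frequency times bus width in bytes times transfers per clock. A minimal sketch of that arithmetic is given below; the function name `peak_transfer_mb_per_s` is hypothetical, and the result is a theoretical peak in MB/sec, not a measured throughput.

```python
def peak_transfer_mb_per_s(clock_mhz, bus_bytes, transfers_per_clock=1):
    """Theoretical peak transfer rate in MB/sec.

    transfers_per_clock is 1 for SDRAM (one clock edge) and
    2 for DDR SDRAM (rising and falling edge).
    """
    return clock_mhz * bus_bytes * transfers_per_clock


sdram_rate = peak_transfer_mb_per_s(167, 8)                       # 1336 MB/sec, about 1.3 GB/sec
ddr_rate = peak_transfer_mb_per_s(200, 8, transfers_per_clock=2)  # 3200 MB/sec, i.e. PC-3200
```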
1.7.1.1 Static RAM (SRAM)

Static Random Access Memory
Static: Data value is retained as long as VDD is present.
[...] sequential addresses)
SRAM can be built using either a D-type latch or a 6-transistor CMOS RAM cell.

D-type Latch: Used for building CPU registers, etc. Derived from the inverted S-R flip-flop.

Inverted S-R flip-flop:

/S  /R  Q
0   0   X
0   1   1
1   0   0
1   1   Q

D-type latch:

[Figure: D-type latch with data input D, enable En, internal lines /S and /R, and outputs Q and /Q]

E  D  /S  /R  Q
0  0  1   1   No Change
0  1  1   1   No Change
1  0  1   0   0
1  1  0   1   1

When the Enable line is zero (En = 0), /S = /R = 1 and the inverted S-R flip-flop retains its previous value.
When the Enable line is high (En = 1), the value of data line D is latched into the flip-flop.
Each bit would need 16 transistors (NAND gate = 4 transistors).
This is not very efficient for large SRAM modules: 1-MB SRAM -> 8-Mb -> 128 million transistors.

1.7.1.2 Transistor Cell (Cross Coupled Inverter)

For larger SRAM modules the above circuit is not very efficient; the transistor count per bit is too high.

TO READ:
BIT lines are charged high.
Enable line WL is pulled high, switching access transistors M5 and M6 on.
If the value stored in /Q is 0, the value is accessed through access transistor M5 on /BL.
If the value stored in Q is 1, the charged value of bit line BL is pulled up to VDD.

TO WRITE:
Apply the value to be stored to bit lines BL and /BL.
Enable line WL is triggered and the input value is latched into the storage cell.
BIT line drivers must be stronger than the SRAM transistor cell to override previous values.

While the Enable line is held low, the inverters retain the previous value.
Could use a tri-state WE line on BIT to drive into a specific state.
Transistor count per bit is only 6 + (line drivers & sense logic).

1.7.1.3 Addressed SRAM

Can view RAM as an N-bit by M-word black box:
N input lines (DIN)
N output lines (DOUT)
A address lines (2^A = M)
WE write enable line

1.7.1.4 Single SRAM Bit

[Figure: single SRAM bit built from a D-type latch (D, Q, EN) with data in DI, write W, address line A and data out DO]

When A = 0, Latch Enable is off. Data cannot be written into the D-type latch. DOUT = 0.
When A = 1, the latch is enabled.
If W = 1 (Data-Write): data at DIN can be written into the D-type latch. Output gate is enabled.
If W = 0: new value on DIN is not stored. Output gate is enabled.
Not very efficient, since a 1-bit address line can access 2 memory locations.
This memory is a 1-bit x 1-word RAM. It stores one 1-bit data value.

A  W  DI  Flip Flop Out  DO
0  0  0   Q(t-1)         0
0  0  1   Q(t-1)         0
0  1  0   Q(t-1)         0
0  1  1   Q(t-1)         0
1  0  0   Q(t)           Q(t)
1  0  1   Q(t)           Q(t)
1  1  0   0              0
1  1  1   1              1

1.7.1.5 1-bit X 2-word SRAM

[Figure: two 1-bit memory cells (Cell 0 and Cell 1) sharing DI and W, selected by address bit A1, with a common Data Out]

When address bit A1 = 0, Cell1 is disabled and Cell0 is enabled.
If W = 1: value of DIN is written to Cell0.
If W = 0: data out is Cell0 OR 0.
When address bit A1 = 1, Cell0 is disabled and Cell1 is enabled.
If W = 1: value of DIN is written to Cell1.
If W = 0: data out is Cell1 OR 0.
Only 1 cell can be active at one time.
The output line is always driven by one cell. This is important for a shared bus.

1.7.1.6 4-bit X 16-word SRAM

[Figure: 4-bit x 16-word SRAM with data inputs DI1-DI4, address lines A1-A4 decoded to row selects a0-a15, chip select CS, write line W to all cells, and data outputs DO1-DO4]

When CS = 1 AND A4 A3 A2 A1 = 0000:
The address decoder decodes A4-A1 to 1000000000000000 (a0 = 1, a1-a15 = 0).
Data at DI1 DI2 DI3 DI4 is written to address 0 when W = 1.
If W = 0, no new data is stored and address 0 drives the output bus. The contents of memory address 0 appear at the output.
The address decoder maps input address bits to row control signals. It should set only one bit for every possible input. There are 2^A states, where A is the number of address lines.
The CS (chip select) line allows the memory to be doubled with only one inverter [+ OR gates].

1.7.1.7 Tri-State Outputs

In the previous examples, one location is enabled during each operation and can drive the output bus.
If the RAM is on a shared bus, it cannot be allowed to drive the bus at all times. There must be a method of removing the RAM from the bus.
The solution is to use tri-state logic.

[Figure: two 4-bit x 16-word RAM blocks sharing address lines A1-A4, data inputs DI0-DI3 and data outputs DO0-DO3 on a common data bus, with A5 driving the CS inputs]

Outputs from each cell are tri-state outputs. When not active, the outputs are in high impedance.
Either the CS line or a separate output enable (OE) line controls when the outputs are in the high-impedance (Hi-Z) state.
This allows both other RAM cells and other devices to control the data bus.

1.7.2 Dynamic RAM (DRAM)

SRAM requires a number of transistors per bit, which makes it difficult to scale cost-effectively to larger memories.
DRAM utilises MOSFET capacitance to store a data bit. The transistor-per-bit cost is approximately 1.

[Figure: DRAM storage cell with X row select, Y column select and Data I/O]

SiO2 insulates the gate and the substrate, creating a dielectric capacitor between gate and substrate. The data bit is stored in this capacitance.
Each bit now requires only 1 MOSFET.
However, the charge stored in the cell dissipates over time and must be recharged periodically to avoid corruption.

DRAM Refresh

Must read the data bit and write the value back to the cell.
JEDEC standardises DRAM row refreshes at least every 64 ms.
All bits in a row must be refreshed.
Dedicated hardware controls DRAM refresh; refresh is transparent to the user.
Above 64 Kbits, DRAM is more economic than SRAM logic, even with refresh.

Write Operation

X  Y  Data I/O  C
0  X  X
X  0  X
1  1  0         0
1  1  1         1

Read Operation

X  Y  Data I/O  C
0  X  X         C
X  0  X         C
1  1  0         0
1  1  1         1

1.7.2.1 DRAM Organization

A matrix stores N 1-bit words, where N is determined by the number of address lines available.
Each matrix is parallelised to create word-size memories, i.e. 8 parallel 4K x 1-bit DRAM matrices create a 4K x 8-bit RAM module.

Example: An 8x8 array forms a 64 x 1 dynamic RAM.

[Figure: 8x8 DRAM array with row and column address (CAS) select logic]

The row and column select logic are composed of address decoders. 8 rows and 8 columns need 3 address bits each.
The above block is a 64 x 1-bit DRAM. The diagram omits it, but the matrix has 1 data I/O line. The row and column addresses control which bit is active.

1.7.3 ROM, PROM, EPROM, EEPROM, Flash

In addition to RAM, there is also a range of other semiconductor memories that retain their contents when the power supply is switched off.
ROM (Read Only Memory) is a form of semiconductor memory that can be written to once. It was typically used to hold the start-up program (so-called firmware) that a computer executes when powered on, although it has now fallen out of favour to more flexible memories that support occasional writes. ROM is still used in systems with fixed functionality, e.g. controllers in cars, household appliances etc.

PROM (Programmable ROM) is like ROM but allows end-users to write their own programs and data. It requires special PROM writing equipment. Note: users can write to a PROM only once.

EPROM (Erasable PROM). With EPROM we can erase the contents of the chip (using strong ultra-violet light) and rewrite it with new contents, typically several thousand times. On PCs the start-up firmware is the BIOS (Basic I/O System). Other systems use Open Firmware. Intel-based Macs use EFI (Extensible Firmware Interface).

EEPROM (Electrically Erasable PROM). As the name implies, the contents of EEPROMs are erased electrically. EEPROMs are also limited in the number of erase-writes that can be performed (e.g. 100,000), but they support updates (erase-writes) to individual bytes, whereas EPROM updates the whole memory and supports only around 10,000 erase-write cycles.

FLASH memory is a cheaper form of EEPROM where updates (erase-writes) can only be performed on blocks of memory, not on individual bytes. Flash memories are found in USB sticks and flash cards and typically range in size from 32 MB to 2 GB. The number of erase/write cycles to a block is typically several hundred thousand before the block can no longer be written.

Characteristics of the various memory types

Type        Volatile?  Writeable?                      Erase Size   Max Erase Cycles             Cost (per Byte)             Speed
SRAM        Yes        Yes                             Byte         Unlimited                    Expensive                   Fast
DRAM        Yes        Yes                             Byte         Unlimited                    Moderate                    Moderate
Masked ROM  No         No                              n/a          n/a                          Inexpensive                 Fast
PROM        No         Once, with a device programmer  n/a          n/a                          Moderate                    Fast
EPROM       No         Yes, with a device programmer   Entire Chip  Limited (consult datasheet)  Moderate                    Fast
EEPROM      No         Yes                             Byte         Limited (consult datasheet)  Expensive                   Fast to read, slow to erase/write
Flash       No         Yes                             Sector       Limited (consult datasheet)  Moderate                    Fast to read, slow to erase/write
NVRAM       No         Yes                             Byte         Unlimited                    Expensive (SRAM + battery)  Fast

1.8 Overview of design process of embedded systems

Figure 1.3 shows a high-level flow through the development process and identifies the major elements of the development life cycle.

Figure 1. Embedded system life cycle

The traditional design approach has been to traverse the two sides of the accompanying diagram separately, that is:
Design the hardware components.
Design the software components.
Bring the two together.
Spend time testing and debugging the system.

The major areas of the design process are:
Ensuring a sound software and hardware specification.
Formulating the architecture for the system to be designed.
Partitioning the h/w and s/w.
Providing an iterative approach to the design of h/w and s/w.

1.8.1 Requirements

Informal descriptions gathered from the customer are known as requirements. The requirements are refined into a specification in order to begin designing the system architecture. Requirements can be functional or non-functional. Functional requirements describe the output as a function of the input. Non-functional requirements include performance, cost, physical size, weight and power consumption. Performance may be a combination of soft performance metrics, such as the approximate time to perform a user-level function, and hard deadlines by which a particular operation must be completed.
Cost includes the manufacturing cost, nonrecurring engineering (NRE) cost and other costs of designing the system. Physical size and weight are the physical aspects of the final system. These can vary greatly depending upon the application. Power consumption can be specified at the requirements stage in terms of battery life.

1.8.2 Specification

The requirements gathered are refined into a specification. The specification serves as the contract between the customers and the architects. A specification is essential for creating working systems with a minimum of designer effort. It must be specific and understandable, and it must accurately reflect the customer's requirements.

Example: Considering the example of the GPS system, the specification would include details for several components:
Data received from the GPS satellite constellation
Map data
User interface
Operations that must be performed to satisfy customer requests
Background actions

1.8.3 Architecture Design

The specification describes only the functions of the system. The implementation of the system is described by the architecture. The architecture is a plan for the overall structure of the system. It will be used later to design the components. The architecture is illustrated using block diagrams as shown below.

Example: This block diagram (figure 3) is an initial architecture that is based neither on hardware nor on software alone but on a combination of both. It describes a GPS navigation system in which the GPS receiver gets the current position, the destination is taken from the user, and the digital map from source to destination is found in the database and displayed by the renderer.

The system block diagram may be refined into two block diagrams: hardware and software.

1.8.3.1 Hardware block diagram

The hardware consists of one central CPU surrounded by memory and I/O devices. We have chosen to use two memories: a frame buffer for the pixels to be displayed and a separate program/data memory for general use by the CPU.
The GPS receiver is used to get the GPS coordinates, and the panel I/O is used to get the destination from the user.

1.8.3.2 Software block diagram

The software block diagram closely follows the system block diagram. We have added a timer to control when we read the buttons on the user interface and render data onto the screen. To have a truly complete architectural description, we require more details, such as where units in the software block diagram will be executed in the hardware block diagram and when the operations will be performed in time.

Architectural descriptions must be designed to satisfy the functional and non-functional requirements. Not only must all the required functions be present, but we must also meet cost, speed, power and other non-functional constraints. Starting out with a system architecture and refining it into hardware and software architectures is one good way to ensure that we meet all specifications. We can concentrate on the functional elements in the system block diagram and then consider the non-functional constraints when creating the hardware and software architectures.

How do we know that our hardware and software architectures in fact meet constraints on speed, cost, and so on? We estimate the properties of the components in the block diagrams (example: the search and rendering functions in the moving map system). Accurate estimation derives in part from experience, both general design experience and particular experience. All the non-functional constraints are estimated. If the decisions are based on bad data, the consequences will show up only during the final phases of design.

1.8.4 Hardware and Software components

The architectural description tells us what components we need. The component design effort builds those components in conformance with the architecture and specification. The components in general include both hardware and software modules. Some of the components will be ready-made (example: CPU, memory chips).
Example: In the moving map, the GPS receiver is a predesigned standard hardware component. The topographic software is a standard software module which uses standard routines to access the database. The printed circuit board is a component which needs to be designed. A lot of custom programming is required. When creating these embedded software modules, ensure that the system runs properly in real time and that it does not take up more memory space than allowed. Power consumption is particularly important in the moving map example. You may need to be very careful about how you read and write memory to minimize power. For example, memory transactions must be carefully planned to avoid reading the same data several times, since memory accesses are a major source of power consumption.

1.8.5 System integration

After the components are built, they are integrated. Bugs are typically found during system integration. Good planning can help us to find the bugs quickly. By debugging a few modules at a time, simple bugs can be uncovered. By fixing the simple bugs early, more complex or obscure bugs can be uncovered. System integration is difficult because it usually uncovers problems. The debugging facilities for embedded systems are usually much more limited than those of desktop systems. Careful attention is needed to insert appropriate debugging facilities during design, which can help to ease system integration problems.

1.9 Programming languages and tools for embedded design

The software is the most important aspect of an embedded system; the hardware performs its task according to the software instructions. The software is effectively the brain of the system. An embedded system processor and the system need software that is specific to the given application of that system. The processor of the system processes the coded instructions and the data. In the final stage these are placed in the memory (ROM) for all the tasks that have to be executed. Assembly as well as high-level languages like C, C++ and Java etc.
are used for software development. The challenges in designing and implementing embedded software come from reliability, performance and cost. The reliability expectation brings a greater responsibility to eliminate bugs and to be fault tolerant, as many embedded systems have to run 24 hours a day, 7 days a week and 365 days a year. Sometimes rebooting is not possible, so good programming and thorough testing are a must for embedded software development. Performance issues may come from different considerations; for example, proper multitasking and scheduling can considerably affect the performance. At the same time, systems using sensors depend on how accurately the sensor value is converted into a real-world value. Input/output devices may affect speed, complexity and cost. For better productivity it may sometimes be necessary to program directly in assembly instead of a high-level language. Embedded consumer products are produced in large volumes, so it is possible to keep the production cost minimal, and no modification is made once production has started.

1.9.1 Creation of ROM image

In the final stage the processed codes and instructions are placed in ROM; this is called creation of the ROM image. All tasks are executed from there. The creation of a ROM image from assembly and from a high-level language is described briefly below.

There are different stages in converting an assembly language program into a machine-implementable software file and finally obtaining the ROM image file. These steps are explained with the following figure 1.9. In the assembling step, the assembler translates the assembly software into machine codes. Next, in the linking phase, the linker links a number of codes with other assembled codes. Certain codes have a fixed beginning address. Linking produces the final binary file by linking all of these. On a computer the linked file is commonly known as the .exe file. In the third phase, relocation of the codes is done by a program called the loader, which places them in physical memory.
The loader finds an appropriate position in RAM so that the program is ready to run. Finally, in the locating phase, the ROM image is permanently placed at the actually available addresses of the ROM. Since an embedded system has only one program, the designer has to define the available addresses to load and create files for permanent location. The locator places the I/O tasks and hardware device driver codes at unchanged addresses, as the port addresses of these are fixed. In the last phase, the device programmer takes the ROM image and burns it into the PROM or EPROM.

Figure 1.9.1: Process of converting assembly language program into ROM image

In the conversion of a high-level language such as C into a ROM image file, the compiler first generates the object codes. The compiler assembles the codes according to the processor instructions, and code optimization is then carried out by the code optimizer. Optimization is carried out before linking. After compilation, the linker links the codes, including various standard codes like printf and scanf and the device driver codes. After linking, the subsequent steps for creating the ROM image are the same as explained for assembly language.

Figure 1.9.2: Process of converting C Program into ROM image

A comparative view of the build and load process of desktop and embedded applications is depicted in the following figures 1.9.3 and 1.9.4.

Figure 1.9.3: The build and load process for desktop application program

Figure 1.9.4: The build and load process for embedded application program

1.9.2 Software for embedded system: device driver, multiple tasks, RTOS

There may be a number of physical devices attached to an embedded system. A device driver is the program needed to drive these devices. A driver uses the hardware status flags and control register. It controls three functions: (a) initializing the device by placing appropriate bits in the control register, (b) calling the interrupt service routine (ISR) when the status flag is set, and (c) resetting the status flag after the interrupt has been serviced.
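The three driver functions (a) to (c) can be illustrated with a small simulation. This is a sketch under stated assumptions, not a real driver: the register layout, the bit masks and the names `SimulatedDevice`, `Driver` and `hardware_receives` are all invented for the example, and the device registers are modelled as plain Python attributes instead of memory-mapped addresses.

```python
class SimulatedDevice:
    """Hypothetical device with one control register and one status register."""

    ENABLE_BIT = 0x01        # control bit: device enabled
    INT_ENABLE_BIT = 0x02    # control bit: interrupt generation enabled
    DATA_READY_FLAG = 0x01   # status flag: a byte is waiting in the data register

    def __init__(self):
        self.control = 0
        self.status = 0
        self.data = 0

    def hardware_receives(self, byte):
        """Hardware side: latch a byte and set the status flag.

        Returns True when an interrupt request is raised.
        """
        self.data = byte & 0xFF
        self.status |= self.DATA_READY_FLAG
        return bool(self.control & self.INT_ENABLE_BIT)


class Driver:
    def __init__(self, device):
        self.device = device
        self.received = []

    def init(self):
        # (a) Initialize by placing the appropriate bits in the control register.
        self.device.control |= (SimulatedDevice.ENABLE_BIT
                                | SimulatedDevice.INT_ENABLE_BIT)

    def isr(self):
        # (b) Interrupt service routine, run when the status flag is set.
        if self.device.status & SimulatedDevice.DATA_READY_FLAG:
            self.received.append(self.device.data)
            # (c) Reset the status flag after servicing the interrupt.
            self.device.status &= ~SimulatedDevice.DATA_READY_FLAG
```

In use, the main program would call `init()` once and the interrupt mechanism would call `isr()` each time `hardware_receives` reports an interrupt request.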
Device driver code is written using operating system functions so that the underlying hardware is hidden. The device management software module provides code for detecting the presence of devices. In designing software of this category, two types of devices are considered: physical and virtual. Physical devices include keyboards, printers, display matrices etc. A virtual device could be a file which may be used for reading and writing a stream of bytes. The operating system has modules for inserting both device drivers and device management modules.

Sometimes an embedded system has to control multiple devices and schedule multiple functions (tasks). To implement this, the embedded system must have a multitasking operating system below the application level, which is generally a Real Time Operating System (RTOS). In a multitasking OS each process (task) has its own memory allocation, and a task has one or more procedures for a specific job [12]. A task may share memory (data) with other tasks. The processor may process different tasks separately or concurrently. An OS or RTOS has a kernel which is responsible for scheduling the transition of tasks from the ready state to the running state. The kernel may select a task for processing, out of many ready-state tasks, based on its priority value. On an ISR call, the kernel may temporarily halt a running task, allow another task to run, and resume the halted task after the new task has completed. An embedded system in a multitasking environment does not always require an RTOS. An RTOS is required in a multitasking environment when real-time constraints become a must (i.e. a task has to be completed within a defined deadline). The main functions of an RTOS include real-time task scheduling, interrupt latency control, time allocation and de-allocation to attain efficiency, predictable timing behaviour, priority management and time slicing of processes for soft real-time operation.
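The priority-based selection of a ready task by the kernel can be sketched as follows. This is a toy model, not the scheduler of any real RTOS: the class name `Kernel`, its methods and the convention that a lower number means a higher priority are assumptions made for the example.

```python
import heapq


class Kernel:
    """Toy priority scheduler: moves tasks from the ready state to running."""

    def __init__(self):
        self._ready = []     # heap of (priority, arrival order, task name)
        self._order = 0      # tie-breaker so equal priorities run first-come
        self._running = None

    def make_ready(self, name, priority):
        """Put a task in the ready state (lower number = higher priority)."""
        heapq.heappush(self._ready, (priority, self._order, name))
        self._order += 1

    def schedule(self):
        """Run the highest-priority task, preempting a lower-priority one."""
        if self._ready:
            top_priority = self._ready[0][0]
            if self._running is None or top_priority < self._running[0]:
                if self._running is not None:
                    # The preempted task goes back to the ready state.
                    heapq.heappush(self._ready, self._running)
                self._running = heapq.heappop(self._ready)
        return self._running[2] if self._running else None

    def finish_running(self):
        """The running task completes; pick the next ready task."""
        self._running = None
        return self.schedule()
```

For example, if a task named "motor" is running and a more urgent task becomes ready, the next call to `schedule()` halts "motor", runs the urgent task, and "motor" resumes once `finish_running()` is called.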
Hard real-time systems adhere strictly to the task schedule, whereas in soft real-time systems only the precedence and sequence of tasks are defined.

1.9.3 Tools for designing embedded software

The different software tools for assembly language programming, high-level language programming, RTOS, debugging and integrated tools can be summarized as given below.

Editor: It enables users to write code in high-level as well as assembly language on a computer. Features such as addition, deletion, copy and insertion are available for easy writing. It saves the content in a file with a user-defined or default extension. The user can modify saved files as and when required.

Compiler: It takes the whole high-level source code as input and converts it to machine-readable object code. It may include functions, library routines etc. in the compilation.

Interpreter: It converts high-level code to machine-readable form line by line. Like a compiler, it may also include functions, library routines etc. in the conversion.

Assembler: It is used for converting assembly language programs to executable binary files. It creates the list file, which contains the addresses, source code and hexadecimal object codes. It is processor specific.

Cross assembler: A cross assembler runs on the processor of the PC used in system development and assembles the assembly code of the target processor. It provides the object codes for the target processor. These will be the final codes used in the developed system.

Simulator: It is a program which can simulate all the functions of an embedded system circuit, including additional memory and peripherals. It is independent of a particular target system.

RTOS: Explained above.

Stethoscope: This program is used to keep track of dynamic changes in program variables and parameters. It can show the sequences of multiple processes, tasks and threads that execute, and it keeps the entire time history.

Trace scope: It traces the changes in a module over time.
Accordingly, a list of actions to be initiated at the desired times is also prepared.

Integrated Development Environment (IDE): The total software and hardware environment, consisting of the simulator, compiler, assembler, cross assembler, logic analyser, EPROM/EEPROM, application codes and burners, defines the integrated development environment of the system.

Locator: The locator program takes the cross-assembler output and a memory allocation map and produces the locator program output.

Unit -II Embedded Processor Architecture

2.1 CISC Vs RISC design philosophy

The architectural designs of CPU are RISC (Reduced instruction set computing) and CISC (Complex instruction set computing). CISC has the ability to execute addressing modes or multi-step operations within one instruction. It is a CPU design in which one instruction performs many low-level operations, for example a load from memory, an arithmetic operation and a memory store. RISC is a CPU design strategy based on the insight that a simplified instruction set gives higher performance when combined with a microprocessor architecture that can execute instructions using few microprocessor cycles per instruction.

Fig 2.1: CISC Vs RISC

This section discusses the RISC and CISC architectures with suitable diagrams.
1. Intel hardware is termed Complex Instruction Set Computer (CISC).
2. Apple hardware is Reduced Instruction Set Computer (RISC).

What are RISC and CISC Architectures?

Instruction Set Architecture

The instruction set can be defined as the communication interface between the processor and the programmer. Every processor has its own instruction set implemented in the hardware to execute instructions such as move, add or multiply data in a definite way. Programmers can use either a high-level language such as C, C++, Java etc. or assembly language to write the program.
Accordingly, a compiler or assembler can be used to translate the program into machine-understandable language following the processor instruction set. There are two classic architectures of instruction set implementation: the complex instruction set computer (CISC) and the reduced instruction set computer (RISC). Each has its own advantages and disadvantages. The CISC architecture puts more complexity in the hardware itself, while the RISC architecture shifts more complexity to the software. The features of each architecture are summarized below.

Features of Complex Instruction Set Computer (CISC):
o Most of the instructions are complex in type.
o Instructions require multiple clock cycles for execution.
o More addressing modes are available in the instruction set.
o Fewer working registers and more frequent memory access.
o Load and Store operations are incorporated in instructions.
o High code density is achieved because of the availability of multifunctional instructions.
o Pipeline implementation is difficult.
o More complexity is given to the hardware design.

Features of Reduced Instruction Set Computer (RISC):
o Most of the instructions are simple in nature.
o Most instructions are designed to execute in a single clock cycle.
o The addressing modes available are fewer than in the case of CISC.
o The instruction set has a separate Load/Store architecture.
o Higher number of working registers, so less frequent memory access.
o Most of the data transfer happens from register to register.
o Large code size compared to CISC architecture.
o Performance of a RISC architecture is often better than that of a CISC architecture.
o Pipeline implementation is easier compared to CISC.
o More complexity is shifted to the compiler design.

Memory Block

The memory block consists of program and data memory. ROM is used as the program memory and RAM is used as the data memory. There are two memory architectures: Harvard and Von Neumann. In the Harvard architecture, the program and data memories are segregated, with a separate address and data bus drawn to each.
So there can be parallel access to both, and the performance of the system can be improved at the cost of hardware complexity. On the other hand, the Von Neumann architecture has one unified memory used for both program and data. The system is comparatively slower, but the design implementation is simple and cost effective for an embedded system. Various ROM and RAM devices are used in embedded systems based on the applications.

ARM Architecture

ARM cores are designed specifically for embedded systems. The needs of embedded systems can be satisfied only if features of RISC and CISC are considered together for processor design. So the ARM architecture is not a pure RISC architecture. It has a blend of both RISC and CISC features.

Table 1.1. ARM Architecture Features and Benefits

Features | Benefits to embedded system
High performance | Ensures the system has a fast response
Low power consumption | Makes the system more energy efficient
Low silicon area | Reduces the size and also consumes less power
High code density | Helps the embedded system to have a smaller memory footprint
Load/store architecture | Used to load data from the memory to the ARM CPU register or store data from the CPU register to the memory; enables memory access when required
Register bank with large number of working registers | Required to perform most of the operations within the CPU and provides a faster context switch in multitasking applications

A Basic architecture of the ARM7 core

ARM7, the basic architecture of the ARM series of cores, is introduced in this section. A brief introduction to each functional block of the ARM7 core architecture shown in Figure 1.2 is presented below. The Register Bank has sixteen general purpose registers (R0-R15) and a current program status register (CPSR), which are accessible by user applications. In addition, it has twenty banked registers specifically used for the different operating modes of the ARM core. These are invisible to user applications.
The register bank has two read ports to read operand1 and operand2 and one write port to write back the result of the operation to any register specified in the instruction. It has an additional bidirectional port to update the program counter with the address register and incrementer. The address register content is incremented at every sequential byte access by the incrementer, but the program counter is incremented by four in the ARM state of the core, or by 2 in the Thumb state of the core, at every instruction access. ARM and Thumb states of the core are discussed in section 1.3. The address register is directly connected to the address bus. The barrel shifter can shift or rotate operand 2 by a specified number of bits prior to arithmetic or logic operations. The 32 bit ALU performs the arithmetic and logic functions. The data in and data out registers hold the input and output data from and to the memory. The instruction decoder and associated control logic generate appropriate control signals for the data path after decoding the fetched instruction. The MAC unit multiplies two register operands and accumulates the product with another register holding the partial sum of the products.

The encoded instruction byte of the program saved in the code memory is fetched through the data bus and first enters the data-in register of the ARM architecture, from where it is delivered to the instruction decoder. After the instruction is decoded, appropriate control signals are generated for the data path. The required registers are activated in the register bank and the operands flow out from the two read ports of the register bank to the ALU: operand1 through the A-bus and operand2 through the B-bus after preprocessing at the barrel shifter. The result of the operation at the ALU is written back to the result register through the write port of the register bank.
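The path "operand2 through the barrel shifter, then both operands into the ALU" can be modelled in a few lines of C. This is a simplified sketch under stated assumptions: the function and enum names are hypothetical, only four ALU operations are shown, shift amounts are limited to 0..31, and condition flags are not modelled.

```c
#include <stdint.h>

typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_t;
typedef enum { ALU_ADD, ALU_SUB, ALU_AND, ALU_MOV } alu_op_t;

/* Barrel shifter model: shift or rotate a 32-bit operand by 0..31 bits. */
uint32_t barrel_shift(uint32_t value, shift_t type, unsigned amount)
{
    amount &= 31u;                 /* simplification: keep the amount in 0..31 */
    if (amount == 0)
        return value;
    switch (type) {
    case SHIFT_LSL: return value << amount;
    case SHIFT_LSR: return value >> amount;
    case SHIFT_ASR:                /* arithmetic shift: replicate the sign bit */
        return (value >> amount) |
               ((value & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0u);
    case SHIFT_ROR: return (value >> amount) | (value << (32u - amount));
    }
    return value;
}

/* 32-bit ALU model; a is operand1 (A-bus), b is the shifted operand2 (B-bus). */
uint32_t alu(alu_op_t op, uint32_t a, uint32_t b)
{
    switch (op) {
    case ALU_ADD: return a + b;
    case ALU_SUB: return a - b;
    case ALU_AND: return a & b;
    case ALU_MOV: return b;        /* MOV ignores operand1 */
    }
    return 0;
}

/* One data-processing step: shift operand2 first, then apply the ALU operation. */
uint32_t data_path(alu_op_t op, uint32_t operand1, uint32_t operand2,
                   shift_t type, unsigned amount)
{
    return alu(op, operand1, barrel_shift(operand2, type, amount));
}
```

For instance, data_path(ALU_MOV, 0, r2, SHIFT_LSL, 3) mirrors what an instruction such as MOV R0, R2, LSL #3 computes before the result is written back to the register bank.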
For Load/Store instructions, after decoding the instruction, the data memory address is first calculated at the ALU as specified in the instruction, and the pointer register is updated at the register bank. The address in the pointer register is given to the address register to access the memory and transfer data. If it is a load multiple or store multiple instruction, the core does not halt before completing the required number of data transfers unless there is a reset exception.

Migration to Cortex Series

In the path of architectural evolution, ARM has contributed many versions of IP cores to the embedded computing world. ARM-pioneered embedded products now appear across virtually every application area. Since its inception, ARM has followed a long road map, starting from the v4T ARM7TDMI to the v7 Cortex series of architectures, achieving many strong milestones in between. It is currently the era of the feature rich ARM Cortex series architectures empowering the embedded computing world.

ARM architecture evolution

Fig 1.13. Performance and capability graph of Classic ARM and Cortex application processors

The ARM architecture has been improved a lot along the road map from classic ARM to ARM Cortex. Fig 1.7 and Fig 1.17 depict the performance and capability comparison of classic ARM with the embedded Cortex and application Cortex series of processors. Even though ARM had earlier versions of products, i.e. v1, v2, v3 and v4, the classic group of ARM starts with v4T. The classic group is divided into four basic families called ARM7, ARM9, ARM10 and ARM11.

ARM7 has a three-stage (fetch, decode, execute) pipeline and a Von Neumann architecture in which both instructions and data use the same bus. It executes the v4T instruction set. T stands for Thumb.

ARM9 has a five-stage (fetch, decode, execute, memory, write) pipeline with higher performance and a Harvard architecture with separate instruction and data buses. ARM9 executes the v4T and v5TE instruction sets. E stands for enhanced instructions.
ARM10 has a six-stage (fetch, issue, decode, execute, memory, write) pipeline with an optional vector floating point unit and delivers high floating point performance. ARM10 executes the v5TE instruction set.

Microcontroller profile (Cortex-M)

The Cortex-M series of architectures has v6-M as Cortex M0, M0+ and M1, and v7-M with Cortex M3, M4 and other successors. This series of architectures, developed for the deeply embedded microcontroller profile, offers the lowest gate count and so the smallest silicon area. These are flexible and powerful designs with completely predictable and deterministic interrupt handling capabilities, achieved by introducing the nested vector interrupt controller (NVIC). The small instruction sets support high code density and simplified software development. Developers are able to achieve 32-bit performance at 8-bit price. The very low gate count of Cortex M0 facilitates its deployment in analog and mixed mode devices. Due to further demanding applications requiring even better energy efficiency, Cortex M0+ was designed with a two-stage pipeline and achieved high performance with very low dynamic power consumption, reduced branch shadow and a reduced number of flash memory accesses. Cortex M1 was designed for implementation in FPGA. It is functionally a subset of Cortex M3 and runs the ARM v6 instruction set with OS extension options. It has a 32-bit AHB lite bus interface, a separate tightly coupled memory interface and a JTAG interface to facilitate debug options. It has a three-stage pipeline implementation and a configurable NVIC for reducing interrupt latency.

Introduction to TIVA Microcontrollers

In this text book, TIVA platforms and launch pads are used to develop various embedded applications. So in this section two TIVA series microcontrollers are introduced.

TIVA TM4C123GH6PM Microcontroller

The microcontroller block diagrams shown in Fig 1.20 and Fig 1.21 have six functional units: the Cortex M4F core, on-chip memory, analog block, serial interface, motion control and system integration.
Features:
o The TM4C123GH6PM microcontroller has a 32 bit ARM Cortex M4 CPU core with 80 MHz clock rate. The memory protection unit provides protected operating system functionality and the floating point unit supports IEEE single precision operations.
o JTAG/SWD/ETM for serial wire debug and trace.
o Nested vector interrupt controller (NVIC) reduces interrupt response latency.
o System control block holds the system configuration information.
o The microcontroller has a set of memories integrated in it: 256 KB flash memory, 32 KB SRAM, 2 KB EEPROM and ROM loaded with the TivaWare software library and bootloader.
o Serial communication peripherals such as: 2 CAN controllers, full speed USB controller, 8 UARTs, 4 I2C modules and 4 synchronous serial interface modules.
o On chip voltage regulator, two analog comparators and two 12 channel 12-bit analog to digital converters with a sample rate of 1 million samples per second are the analog functions built into the device.
o Two quadrature encoders with index module and two PWM modules are the advanced motion control functions integrated into the device that facilitate wheel and motor controls.
o Various system functions integrated into the device are: Direct Memory Access controller, clock and reset circuitry with 16 MHz precision oscillator, six 32-bit timers, six 64-bit timers, twelve 32/64 bit capture compare PWM, battery backed hibernation module and RTC hibernation module, 2 watchdog timers and 43 GPIOs.

Few Applications:
o Building automation system
o Lighting control system
o Data acquisition system
o Motion control
o IoT and sensor networks

1.2.16.2 TIVA TM4C129CNCZAD Microcontroller

Features:
o The TM4C129CNCZAD microcontroller has a 32 bit ARM Cortex M4F CPU core with 120 MHz clock rate.
o The memory protection unit provides a privileged mode for protected operating system functionality and the floating point unit supports IEEE 754 compliant single precision operations.
o JTAG/SWD/ETM for serial wire debug and trace.
o Nested vector interrupt controller (NVIC) reduces interrupt response latency and provides high performance interrupt handling for time critical applications.
o The microcontroller has a set of memories integrated in it: 1 MB flash memory, 256 KB SRAM, 6 KB EEPROM and ROM loaded with the TivaWare software library and bootloader.
o Serial communication peripherals such as: 2 CAN controllers, full speed and high speed USB controller, 8 UARTs, 10 I2C modules and 4 synchronous serial interface modules.
o On chip voltage regulator, three analog comparators, two 12 channel 12-bit analog to digital converters with a sample rate of 2 million samples per second and a temperature sensor are the analog functions built into the device.
o One quadrature encoder and one PWM module with 8 PWM outputs are the advanced motion control functions integrated into the device that facilitate wheel and motor controls.
o Various system functions integrated into the device are: Micro Direct Memory Access controller, clock and reset circuitry with 16 MHz precision oscillator, eight 32-bit timers, low power battery backed hibernation module and RTC hibernation module, 2 watchdog timers and 140 GPIOs.
o Cyclic Redundancy Check (CRC) computation module is used for message transfer and safety system checks. The CRC module can be used in combination with the AES and DES modules.
o Advanced Encryption Standard (AES) and Data Encryption Standard (DES) accelerator module provides hardware accelerated data encryption and decryption functions.
o Secure Hash Algorithm/Message Digest Algorithm (SHA/MD5) provides hardware accelerated hash functions for secured data applications.

Registers

Registers are for temporary data storage within the processor architecture. As shown in Fig. 1.1, the ARM processor has sixteen general purpose registers, R0-R15, and a current program status register (CPSR) defined for the user mode of operation. Each of these registers is 32 bits wide.
Out of these registers, R13, R14 and R15 have special purposes:

R13: Used as the stack pointer that holds the address of the top of the stack in the current processor mode.
R14: Used as the link register that saves the content of the program counter when control is transferred due to the occurrence of exceptions or the use of branch instructions in the program.
R15: Used as the program counter that points to the next instruction to be executed. In ARM state, all instructions are 32 bits (four bytes) wide, for which the PC is always aligned to a word boundary. This means that the least significant two bits of the PC are always zero. The PC can also be halfword (16 bit) aligned for Thumb state (16 bit instructions) or byte aligned for Jazelle state (8-bit instructions), as supported by different versions of the ARM architecture.

Current Program Status Register (CPSR)

CPSR, a 32-bit status register, holds the current state of the ARM core. As shown in Fig 1.4, the register is divided into four different fields (flags, status, extension and control), each of 8 bits. The flag field has the bit specification for four condition flags, N, Z, C and V, and is used for arithmetic and logic instructions.

N (Negative flag): 1 indicates a negative result from the ALU.
Z (Zero flag): 1 indicates a zero result from the ALU.
C (Carry flag): 1 indicates the ALU operation generated a carry.
V (Overflow flag): 1 indicates the ALU operation overflowed.

Most of the ARM instructions are conditionally executed. Based on the status of these condition flags, condition codes are used along with instruction mnemonics to control whether or not the instruction will be executed. The status and extension fields are reserved for future usage. In the control field, the least significant five bits are used to save the mode of operation of the ARM core. The processor mode can be changed by directly modifying these control bits. The most significant three bits I, F and T have significance as below:

I = 1 indicates IRQ is disabled; 0 indicates IRQ is enabled.
F = 1 indicates FIQ is disabled; 0 indicates FIQ is enabled.
T = 1 indicates the Thumb state is active; 0 indicates ARM state is active.

These are processor specific features.

Addressing modes

An addressing mode is the way of addressing data or an operand in the instruction. Every processor instruction set offers different addressing modes to determine the address of operands. Some fundamental addressing modes used by most processors are: register addressing, immediate addressing, direct addressing and register indirect addressing. In register addressing mode, the operand is held in a register which is specified in the instruction. In immediate addressing mode, the operand is held in the instruction. In direct addressing mode, the operand resides in the memory whose address is specified in the instruction. Similarly, in register indirect addressing mode, the operand is held in the memory whose address resides in a register that is specified in the instruction.

ARM Addressing modes:

Register Addressing: The operands are in the registers.
    MOV R1, R2          // move content of R2 to R1
    SUB R0, R1, R2      // subtract content of R2 from R1 and move the result to R0

Relative Addressing: Address of the memory directly specified in the instruction.
    B subroutine1       // branch to subroutine1
    BEQ LOOP            // branch to LOOP if previous instruction sets the zero flag, i.e. Z = 1

Immediate Addressing: Operand2 is an immediate value.
    SUB R0, R0, #1      // save (R0 - 1) to R0
    MOV R0, #0xFF00     // put 0xFF00 to R0

Register Indirect Addressing: Address of the memory location that holds the operand is in a register.
    LDR R1, [R2]        // load R1 with the data pointed to by register R2
    LDR R3, [R2]        // ARM data processing instructions act only on registers, so the memory operand is loaded first
    ADD R0, R1, R3      // add R1 to the data that was pointed to by R2 and put the result into R0

Register Offset Addressing: Operand2 is in a register with some offset calculation.
    MOV R0, R2, LSL #3          // (R2 << 3), then move to R0
    AND R0, R1, R2, LSR R3      // (R2 >> R3), logically AND with R1 and move result to R0

Register based with Offset Addressing: The effective memory address has to be calculated from a base address and an offset. The offset can be an immediate offset, register offset or scaled register offset.

Pre-Indexed Addressing
    LDR R2, [R3, #0x0F]         // Immediate offset: take the value in R3, add 0x0F, use it as address and load data from that address to R2
    STR R1, [R0, -R2]           // Register offset: use (R0-R2) as address of the memory and store data of R1 to that address
    LDR R3, [R1, R2, LSR #1]    // Scaled register offset: use (R1 + (R2>>1)) as address and load the data from that address to R3

Pre-Indexed with write back, also called auto-indexing with pre-indexed addressing. The ! symbol indicates that the instruction saves the calculated address in the base address register.
    LDR R0, [R1, #4]!           // Immediate offset: use (R1+4) as address, load the data from that address to R0 and update R1 to (R1+4)
    STR R1, [R2, R0]!           // Register offset: use (R2+R0) as address, store the data from R1 to that address and update R2 to (R2+R0)
    STR R3, [R1, R2, LSL #4]!   // Scaled register offset: use (R1 + (R2<<4)) as address, store the data from R3 to that address and update R1 to (R1 + (R2<<4))

Post-Indexed, also called auto-indexing with post-indexed addressing.
    LDR R0, [R1], #4            // Immediate offset: load the data pointed to by R1 to R0 and then update R1 to (R1+4)
    STR R1, [R3], R4            // Register offset: store the data in R1 to the memory location pointed to by R3 and then update R3 to (R3+R4)
    LDR R2, [R0], -R3, LSR #4   // Scaled register offset: load the data from the address pointed to by R0 to R2 and then update R0 to (R0 - (R3>>4))
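The pre-indexed, pre-indexed with write back, and post-indexed forms differ only in which address is used for the access and whether the base register changes afterwards. The following C sketch models just that bookkeeping; the struct and function names are hypothetical, and the offset is assumed to be already computed (immediate, register, or scaled register).

```c
#include <stdint.h>

/* Outcome of one load/store address calculation. */
typedef struct {
    uint32_t address;     /* address used for the memory access */
    uint32_t base_after;  /* value left in the base register afterwards */
} ea_result_t;

/* [Rn, offset] : access at base+offset, base register unchanged. */
ea_result_t pre_indexed(uint32_t base, int32_t offset)
{
    ea_result_t r = { base + (uint32_t)offset, base };
    return r;
}

/* [Rn, offset]! : access at base+offset, then base register = base+offset. */
ea_result_t pre_indexed_writeback(uint32_t base, int32_t offset)
{
    ea_result_t r = { base + (uint32_t)offset, base + (uint32_t)offset };
    return r;
}

/* [Rn], offset : access at base, then base register = base+offset. */
ea_result_t post_indexed(uint32_t base, int32_t offset)
{
    ea_result_t r = { base, base + (uint32_t)offset };
    return r;
}
```

With a base of 0x1000 and an offset of 4, the three forms access 0x1004, 0x1004 and 0x1000 respectively, and leave 0x1000, 0x1004 and 0x1004 in the base register.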
ARM Instruction Set

In any processor architecture, an instruction includes an opcode that specifies the operation to perform, such as adding the contents of two registers or moving data from a register to memory, together with specified operands, which may specify registers, memory locations, or immediate data. The instruction set of a processor gives information about the instructions, addressing modes and the timing requirement for the execution of each instruction. The instruction set is always specified by the processor designer. Every processor implements its instruction set in the architecture. Since ARM Ltd is the processor core designer and not the silicon manufacturer, it defines the instruction set to be implemented by the chip manufacturers.

Features

The ARM architecture has two instruction sets: the ARM instruction set and the Thumb instruction set. In the ARM instruction set, all instructions are 32 bits wide and are aligned at 4-byte boundaries in memory. In the Thumb instruction set, on the other hand, all instructions are 16 bits wide and are aligned at even (two-byte) boundaries in memory. The important features of the ARM and Thumb instruction sets are:
o Most of the instructions are executed in one cycle.
o Load/Store architecture for accessing data from external memory with powerful auto-indexing addressing modes.
o Inclusion of load and store multiple register instructions.
o 3-address instructions: two source operand registers and the result register are all distinctly specified.
o Data processing instructions act only on registers.
o Every instruction can be conditionally executed, which improves the performance and code density by reducing the number of branch instructions.
o The ability to execute a barrel shift operation and an ALU operation of a single complex instruction in a single clock cycle.
o Inclusion of advanced DSP instructions in the ARM instruction set for the multiply and accumulate (MAC) unit replaces the need for a separate digital signal processor.
o Implementation of a coprocessor instruction set with extension of the programming model.
o The Thumb instruction set is a 16-bit compressed representation of the ARM instructions that provides high code density.

ARM instructions can be categorized into the following broad classes:
1. Data movement instructions
2. Data processing instructions
   o Arithmetic/logic instructions
   o Barrel shifting instructions
   o Comparison instructions
   o Multiply instructions
3. Branch instructions
4. Load and store instructions
   o Load and store register instructions
   o Load and store multiple register instructions
   o Stack instructions
   o Swap register and memory content
5. Program status register instructions
   o Set the values of the conditional code flags
   o Set the values of the interrupt enable bits
   o Set the processor mode
6. Exception generating instructions
   o Software interrupt instruction
   o Software break point instruction

UNIT-III Overview of Microcontroller and Embedded Systems

3.1 Embedded hardware and various building blocks:-
Fig. 1 Components of Embedded system hardware
Fig. 2 Various Building blocks of embedded system
3.2 Processor Selection for an Embedded System:-
3.2.1. Microcontroller Selection:
3.3 Interfacing Processor, Memories and I/O Devices:-
Features:
3.4. Timer & Counting Devices:-
Most embedded systems need a timing device.
Timing Device:
Counting Device:
Timer cum Counting Device:
Uses of Timer Devices:
States in a Timer:
Ten Forms of a Timer:
Variables for control bits and status in a software timer:
3.5. Serial Communication and advanced I/O:-
I/O Types & Examples:
Serial Bus Communication Protocols:-
3.6 Buses between the Networked multiple Devices:-
3.7 Embedded System Design and Co-Design Issues in System Development Process:-
3.8 Design Cycle in the Development Phase for an Embedded System:-
3.9. Uses of Target System or its Emulator and In-Circuit Emulator:-
3.10 Use of software tools for Development of an Embedded System:-
3.11 Design Metrics of Embedded Systems:-

UNIT-4 MICROCONTROLLER FUNDAMENTALS FOR BASIC PROGRAMMING

This unit describes the I/O pin configurations for the TM4C123 microcontrollers. The regular function of a pin is to perform parallel I/O. Most of the pins have an alternative function. Joint Test Action Group (JTAG) is a standard test access port used to program and debug the microcontroller board. Each microcontroller uses five port pins for the JTAG interface. I/O pins on Tiva microcontrollers have a wide range of alternative functions:

• UART: Universal asynchronous receiver/transmitter
• SSI: Synchronous serial interface
• I2C: Inter-integrated circuit
• Timer: Periodic interrupts, input capture, and output compare
• PWM: Pulse width modulation
• ADC: Analog to digital converter, measure analog signals
• Analog Comparator: Compare two analog signals
• QEI: Quadrature encoder interface
• USB: Universal serial bus
• Ethernet: High-speed network
• CAN: Controller area network

The UART can be used for serial communication between computers. It is asynchronous and allows for simultaneous communication in both directions. The SSI is alternately called serial peripheral interface (SPI). It is used to interface medium-speed I/O devices. I2C is a simple I/O bus that we will use to interface low speed peripheral devices. Input capture and output compare will be used to create periodic interrupts and measure period, pulse width, phase, and frequency. PWM outputs will be used to apply variable power to motor interfaces. In a typical motor controller, input capture measures rotational speed, and PWM controls power. A PWM output can also be used to create a DAC. The ADC will be used to measure the amplitude of analog signals and will be important in data acquisition systems.
The analog comparator takes two analog inputs and produces a digital output depending on which analog input is greater. The QEI can be used to interface a brushless DC motor. USB is a high-speed serial communication channel. The Ethernet port can be used to bridge the microcontroller to the Internet or a local area network. The CAN creates a high-speed communication channel between microcontrollers and is commonly found in automotive and other distributed control applications.

4.1 Tiva TM4C123 LaunchPad I/O pins

Pins on the TM4C family can be assigned to as many as eight different I/O functions. Pins can be configured for digital I/O, analog input, timer I/O, or serial I/O. For example, PA0 can be digital I/O or serial input. There are two buses used for I/O. The digital I/O ports are connected to both the advanced peripheral bus and the advanced high-performance bus. Because of the multiple buses, the microcontroller can perform I/O bus cycles simultaneously with instruction fetches from flash ROM. The TM4C123GH6PM adds up to 16 PWM outputs. There are 43 I/O lines. There are twelve ADC inputs; each ADC can convert up to 1M samples per second. Table 6.1 lists the regular and alternate names of the port pins.

Figure: I/O port pins for the TM4C123GH6PM microcontrollers.

Each pin has one configuration bit in the GPIOAMSEL register. We set this bit to connect the port pin to the ADC or analog comparator. For digital functions, each pin also has four bits in the GPIOPCTL register, which we set to specify the alternative function for that pin (0 means regular I/O port). Not every pin can be connected to every alternative function. Pins PC3 – PC0 were left off Table 4.1 because these four pins are reserved for the JTAG debugger and should not be used for regular I/O. Notice that most alternate function modules (e.g., U0Rx) exist on only one pin (PA0).
Other functions can be mapped to two or three pins (e.g., CAN0Rx can be mapped to one of PB4, PE4, or PF0).

The microcontroller board provides an integrated In-Circuit Debug Interface (ICDI), which allows programming and debugging of the onboard TM4C123 microcontroller. One USB cable is used by the debugger (ICDI), and the other USB allows the user to develop USB applications (device). The user can select board power to come from either the debugger (ICDI) or the USB device (device) by setting the Power selection switch.

Pins PA1 – PA0 create a serial port, which is linked through the debugger cable to the PC. The serial link is a physical UART as seen by the TM4C and is mapped to a virtual COM port on the PC. The USB device interface uses PD4 and PD5. The JTAG debugger requires pins PC3 – PC0. The LaunchPad connects PB6 to PD0, and PB7 to PD1. If you wish to use both PB6 and PD0 you will need to remove the R9 resistor. Similarly, to use both PB7 and PD1 remove the R10 resistor.

Figure: Tiva LaunchPad based on the TM4C123GH6PM.

The Tiva LaunchPad evaluation board has two switches and one 3-color LED. See Figure 4.3. The switches are negative logic and require activation of the internal pull-up resistors. In particular, you will set bits 0 and 4 in the GPIO_PORTF_PUR_R register. The LED interfaces on PF3 – PF1 are positive logic. To use the LED, make the PF3 – PF1 pins outputs. To activate the red color, output a one to PF1. The blue color is on PF2, and the green color is controlled by PF3. The 0-Ω resistors (R1, R2, R11, R12, R13, R25, and R29) can be removed to disconnect the corresponding pin from the external hardware.

The LaunchPad has four 10-pin connectors, labeled J1, J2, J3 and J4 in Figures 4.2 and 4.4, to which you can attach your external signals. The top side of these connectors has male pins, and the bottom side has female sockets.

Figure 4.3. Switch and LED interfaces on the Tiva LaunchPad Evaluation Board.
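The Port F bit assignments described above (pull-ups on bits 0 and 4 for the switches, LED colors on PF1 to PF3) can be captured as bit masks. The sketch below is host-side C that only computes the values one would write to the registers; the macro and function names are hypothetical and no hardware register is touched.

```c
#include <stdint.h>

/* Port F bit positions taken from the text above. */
#define PF0        0x01u   /* switch input (negative logic) */
#define PF1_RED    0x02u   /* red LED   (positive logic) */
#define PF2_BLUE   0x04u   /* blue LED  (positive logic) */
#define PF3_GREEN  0x08u   /* green LED (positive logic) */
#define PF4        0x10u   /* switch input (negative logic) */

/* Bits to set in the pull-up register (GPIO_PORTF_PUR_R) for both switches. */
#define PORTF_SWITCH_PULLUPS  (PF0 | PF4)

/* Bits to configure as outputs so the LED can be driven. */
#define PORTF_LED_OUTPUTS     (PF1_RED | PF2_BLUE | PF3_GREEN)

/* Build the PF3..PF1 output pattern for a given color mix. */
uint32_t led_bits(int red, int blue, int green)
{
    uint32_t bits = 0;
    if (red)   bits |= PF1_RED;
    if (blue)  bits |= PF2_BLUE;
    if (green) bits |= PF3_GREEN;
    return bits;
}

/* Negative logic: a pressed switch pulls its pin low, so the bit reads 0. */
int switch_pressed(uint32_t portf_data, uint32_t switch_pin)
{
    return (portf_data & switch_pin) == 0;
}
```

On the real board the value from led_bits would be written to the Port F data register and portf_data would be read from it; that register access is deliberately left out so the example can run anywhere.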
The zero ohm resistors can be removed so the corresponding pin can be used for its regular purpose.

4.2 GPIO

GPIO stands for General Purpose Input/Output, meaning that it is a module capable of receiving and transmitting signals. The pins work with digital signals but can be muxed to other peripheral functions (ADC, SSI, UART, etc).

Tiva GPIOs

The TM4C123GH6PM has 6 GPIO blocks, each with its own GPIO port (Port A, Port B, Port C, Port D, Port E, Port F).

o Up to 43 GPIOs, depending on configuration
o Highly flexible pin muxing allows use as GPIO or one of several peripheral functions
o 5-V-tolerant in input configuration
o Ports A-G accessed through the Advanced Peripheral Bus (APB)
o Fast toggle capable of a change every clock cycle for ports on AHB, every two clock cycles for ports on APB
o Programmable control for GPIO interrupt
   - Interrupt generation masking
   - Edge-triggered on rising, falling, or both
   - Level-sensitive on High or Low values
o Bit masking in both read and write operations through address lines
o Can be used to initiate an ADC sample sequence or a μDMA transfer
o Pin state can be retained during Hibernation mode
o Pins configured as digital inputs are Schmitt-triggered
o Programmable control for GPIO pad configuration
   - Weak pull-up or pull-down resistors
   - 2-mA, 4-mA, and 8-mA pad drive for digital communication; up to four pads can sink 18-mA for high-current applications
   - Slew rate control for 8-mA pad drive
   - Open drain enables
   - Digital input enables

Note that PD4, PD5, PB0 and PB1 are not 5V tolerant and are limited to a 3.6V input. Each GPIO port has 8 pins, which would make a total of 48 pins, but some of those are internal and cannot be used, so the maximum is 43. The launchpads usually have fewer since some are not physically available. The TM4C123 launchpad has just 37 GPIO pins.

Alternate functions

The GPIO allows digital inputs or outputs and also allows alternate functions.
The alternate functions can be analog readings, obtained by muxing the ADC to a pin, or UART communication, obtained by selecting the corresponding mux setting.

Very Important GPIO Pins With Special Considerations

Some pins are locked to a certain configuration and can only be used if you unlock them. To do that, write the unlock key to the GPIOLOCK register and then uncommit the pin by setting its bit in the GPIOCR register. If you use TivaWare this should work; just choose the right base (0x80 selects pin 7, so use the bit of the pin you need):

    HWREG(GPIO_PORTx_BASE + GPIO_O_LOCK) = GPIO_LOCK_KEY;
    HWREG(GPIO_PORTx_BASE + GPIO_O_CR) |= 0x80;

TM4C123 GPIO Programming

The TI LaunchPad uses the TM4C123GH6PM microcontroller, which has 256 KB of on-chip Flash memory for code, 32 KB of on-chip SRAM for data, and a large number of on-chip peripherals. The ARM Cortex-M4 has 4 GB (gigabytes) of memory space. It uses memory-mapped I/O, which means that the I/O peripheral ports are mapped into the 4 GB memory space.

          Allocated size        Allocated address
  Flash   256 KB                0x0000.0000 to 0x0003.FFFF
  SRAM    32 KB                 0x2000.0000 to 0x2000.7FFF
  I/O     All the peripherals   0x4000.0000 to 0x400F.FFFF

The General Purpose I/O ports (GPIO) on the TM4C123GXL LaunchPad are designated Port A to Port F. The address range assigned to each GPIO port is as follows:

  Port A: 0x4000.4000 to 0x4000.4FFF
  Port B: 0x4000.5000 to 0x4000.5FFF
  Port C: 0x4000.6000 to 0x4000.6FFF
  Port D: 0x4000.7000 to 0x4000.7FFF
  Port E: 0x4002.4000 to 0x4002.4FFF
  Port F: 0x4002.5000 to 0x4002.5FFF

Each GPIO port is assigned 4 KB of memory space. The reason is that each GPIO port has a large number of special function registers associated with it, and furthermore the GPIO Data Register supports bit-specific addressing, which allows collective access to 1 to 8 bits in a data port.

To initialize an I/O port for general use, seven steps need to be performed.

1. Activate the clock for the port in the Run Mode Clock Gating Control Register 2 (RCGC2).
2. Unlock the port (LOCK = 0x4C4F434B).
   This step is only needed for pins PC3 – PC0, PD7 and PF0 on the TM4C123GXL LaunchPad.
3. Disable the analog function of the pin in the Analog Mode Select register (AMSEL), because we want to use the pin for digital I/O. If the pin is connected to the ADC or analog comparator, its corresponding bit in AMSEL must be set to 1. In our case the pin is used as digital I/O, so its corresponding bit must be set to 0.
4. Clear bits in the port control register (PCTL) to select the regular digital function. Each GPIO pin needs four bits in its corresponding PCTL register. Not every pin can be configured to every alternate function. Figure 2.2 shows which pins can be used for which alternate functions.
5. Set its direction register (DIR). A DIR bit of 0 means input, and 1 means output.
6. Clear bits in the Alternate Function Select register (AFSEL).
7. Enable the digital port in the Digital Enable register (DEN).

We need to add a short delay between activating the clock and setting the port registers.

Figure: Registers used to configure GPIO.

Figure: PMCx bits in the GPIOPCTL register on the TM4C specify alternate functions. PD4 and PD5 are hardwired to the USB device. PA0 and PA1 are hardwired to the serial port.

The GPIO Data Register is located at offset 0x000 from the base address of its port. As mentioned before, the data register supports bit-specific addressing. In order to write to a bit of this register, the corresponding bit in the mask, taken from address bus bits [9:2], must be set. Otherwise, the bit value remains unchanged by the write. For example, writing to address 0x40004038 means that only bits 1, 2 and 3 of Port A can be changed: the base address of Port A is 0x40004000, and the offset 0x038 = 0x008 + 0x010 + 0x020 sets address bits 3, 4 and 5, which select data bits 1, 2 and 3. The following table helps you calculate the offset address for the bits of a port that you want to access.
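The initialization steps and the bit-specific address calculation described above can be sketched as follows. This is a host-side model under stated assumptions: `gpio_port_t` is a hypothetical stand-in for one port's registers (on hardware these are memory-mapped locations, not fields of an ordinary struct), the function names are illustrative, and the pin roles are the Port F ones from the LaunchPad description (PF4 and PF0 as switch inputs, PF3 – PF1 as LED outputs).

```c
#include <stdint.h>

/* Hypothetical stand-in for the configuration registers of one GPIO port. */
typedef struct {
    uint32_t lock, cr, amsel, pctl, dir, afsel, den, pur;
} gpio_port_t;

#define GPIO_LOCK_KEY 0x4C4F434Bu   /* unlock value given in step 2 */

/* Steps 2 to 7 for Port F. Step 1 (enabling the port clock in RCGC2 and
   waiting briefly) is assumed to have been done by the caller. */
void portf_init(gpio_port_t *p)
{
    p->lock   = GPIO_LOCK_KEY;              /* 2. unlock (needed for PF0)      */
    p->cr    |= 0x1Fu;                      /*    uncommit PF4-PF0             */
    p->amsel &= ~0x1Fu;                     /* 3. analog function off          */
    p->pctl  &= ~0x000FFFFFu;               /* 4. 4 bits per pin: regular GPIO */
    p->dir    = (p->dir & ~0x1Fu) | 0x0Eu;  /* 5. PF3-PF1 out, PF4 and PF0 in  */
    p->afsel &= ~0x1Fu;                     /* 6. no alternate function        */
    p->den   |= 0x1Fu;                      /* 7. digital enable               */
    p->pur   |= 0x11u;                      /*    pull-ups for the switches    */
}

/* Bit-specific addressing: address bits [9:2] carry the 8-bit mask, so the
   mask is shifted left by two and added to the port base address. */
uint32_t gpio_data_addr(uint32_t port_base, uint8_t bit_mask)
{
    return port_base + ((uint32_t)bit_mask << 2);
}
```

With the Port A base address, `gpio_data_addr(0x40004000, 0x0E)` reproduces the 0x40004038 example from the text, and a mask of 0xFF gives the all-bits offset. The per-bit offsets that this shift produces are the constants tabulated next.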
  Bit to access   Offset constant
  7               0x200
  6               0x100
  5               0x080
  4               0x040
  3               0x020
  2               0x010
  1               0x008
  0               0x004

If we want to read and write all 8 bits of a port, we sum all 8 offset constants, which gives the offset address 0x3FC (0011 1111 1100 in binary).

4.3 Peripheral and Memory Address

A 32-bit processor can address 4 GB (= 2^32 bytes) of address space. How this address space is divided between memory and peripherals depends on the architecture of the CPU.

Peripheral Addressing

There are two complementary methods of addressing I/O devices for input and output between the CPU and a peripheral. These are known as memory-mapped I/O (MMIO) and port-mapped I/O (PMIO).

In MMIO, the same address bus is used to address both memory and peripheral devices. The address bus of the CPU is shared between the peripheral devices and memory devices attached to the CPU. Thus, any address accessed by the CPU may denote a location in memory or a register of an attached peripheral. In these architectures, the same CPU instructions used for memory access can also be used for I/O access.

In PMIO, peripheral devices are addressed separately from general memory devices. In most architectures this is accomplished by providing a separate address bus dedicated to the peripheral devices attached to the CPU. In these CPUs, the instruction set includes separate instructions to perform I/O access.

The TM4C123GH6PM chip employs MMIO, which implies that the peripherals are mapped into the 32-bit address space.

4.4 Memory Mapped Peripherals

The TM4C123GH6PM chip contains 256 KB of Flash memory and 32 KB of SRAM. Table 5 shows the memory map of the TM4C123GH6PM chip with addresses.

Flash Memory

Flash memory is structured into multiple 1 KB blocks which can be individually written to and erased. Flash memory is used to store program code. Constant data used in a program can also be stored in this memory.
Lookup tables are used in many designs for performance improvement. These lookup tables are also stored in this memory.

Table: Memory Mapping in TM4C123GH6PM Chip

SRAM

The on-chip SRAM starts at address 0x2000.0000 of the device memory map. ARM provides a technology called bit-banding to reduce occurrences of read-modify-write (RMW) operations. This technology aliases SRAM and peripheral addresses so that individual bits of the same memory can be accessed in a single atomic operation. For SRAM, the bit-band base is located at address 0x2200.0000. Bit-band aliases are computed according to the following formula:

  bit-band alias = bit-band base + (byte offset × 32) + (bit number × 4)     (2.1)

Note: Bit-banding is a technique for accessing and modifying individual bits in a register. It allows a read-modify-write operation to complete as a single atomic access. The region of memory whose bits are to be modified is known as the bit-band region, and the region of memory to which the device maps those bits is known as the bit-band alias.

The SRAM is implemented using two 32-bit wide SRAM banks (separate SRAM arrays). The banks are partitioned so that one bank contains all even words (the even bank) and the other contains all odd words (the odd bank). A write access that is followed immediately by a read access to the same bank incurs a stall of a single clock cycle.

Internal ROM

The internal ROM of the TM4C123GH6PM device is located at address 0x0100.0000 of the device memory map. The ROM contains:

- Boot loader
- Peripheral Driver Library for product-specific peripherals and interfaces functionality
- Advanced Encryption Standard (AES) tables
- Cyclic Redundancy Check (CRC) functionality

The boot loader is used as an initial program loader (when the Flash memory is empty) as well as an application-initiated firmware upgrade mechanism (by calling back to the boot loader). The Peripheral Driver Library APIs in ROM can be called by applications, reducing Flash memory requirements and freeing the Flash memory to be used for other purposes (such as additional features in the application).
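Formula (2.1) for the SRAM bit-band alias, given in the SRAM discussion above, can be checked with a short calculation. The sketch below uses the two SRAM addresses quoted in the text; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BASE          0x20000000u  /* start of on-chip SRAM (from the text) */
#define SRAM_BITBAND_BASE  0x22000000u  /* SRAM bit-band alias base (from the text) */

/* alias = bit-band base + byte offset * 32 + bit number * 4 */
uint32_t sram_bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(bit < 8u);                            /* a byte has bits 0..7 */
    uint32_t byte_offset = byte_addr - SRAM_BASE;
    return SRAM_BITBAND_BASE + byte_offset * 32u + bit * 4u;
}
```

For instance, bit 3 of the byte at 0x2000.1000 maps to the alias word at 0x2202.000C. Each bit gets its own 32-bit word in the alias region, which is why the byte offset is multiplied by 32 and the bit number by 4.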
Advanced Encryption Standard (AES) is a publicly defined encryption standard used by the U.S. Government, and Cyclic Redundancy Check (CRC) is a technique to validate whether a block of data has the same contents as when previously checked.

Peripheral

All peripheral devices, timers, and ADCs are mapped as MMIO in the address space 0x40000000 to 0x400FFFFF. Since the number of supported peripherals differs among ICs of the ARM families, the upper limit of 0x400FFFFF varies.

Memory Layout in Tiva™ LaunchPad

To observe the memory layout of the TM4C123GH6PM, users can run an experiment on the board with the simple code outlined below, which makes the green LED glow.

Example:

Fig : Flowchart to glow onboard LED

Pseudo code:
  Start:
    Set clock (division | PLL | 16 MHz | main OSC)
    Configure the pins (Pin 1, 2, 3)
  Output:
    Toggle the LED (Pin 1, 2, 3)
    Delay generation (in nanoseconds)
    Run infinite

Once this code is compiled, expanding <the project>/Debug under the workspace shows the memory map file.

4.5 Watchdog Timer

Every CPU has a system clock which drives instruction execution. At each step, the CPU fetches and executes the instruction that the program counter points to in the flash memory of the microcontroller. These instructions are executed sequentially. A remotely installed system may freeze or run into an unplanned situation that triggers an infinite loop. In such situations, a system reset or execution of an interrupt subroutine remains the only option. A watchdog timer provides a solution to this.

A watchdog timer's counter lapses (times out) when it reaches a certain count. Under normal operation, the program running the system continuously resets the watchdog timer. When the system enters an infinite loop or stops responding, it fails to reset the watchdog timer. In due time, the watchdog timer's counter lapses.
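The reset-or-lapse behaviour described above can be modelled in a few lines. This is a hypothetical software model for illustration only (a real watchdog is a hardware counter), and the type and function names are assumptions, not part of any TI library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a watchdog down-counter. */
typedef struct {
    uint32_t load;     /* value reloaded each time the program resets the timer */
    uint32_t count;    /* current down-counter value */
    bool     expired;  /* set once the counter has lapsed */
} watchdog_t;

void wdt_start(watchdog_t *w, uint32_t load)
{
    w->load = load;
    w->count = load;
    w->expired = false;
}

/* Called by healthy software to reset ("kick") the watchdog. */
void wdt_kick(watchdog_t *w)
{
    w->count = w->load;
}

/* One watchdog clock tick: count down and flag a lapse at zero. */
void wdt_tick(watchdog_t *w)
{
    if (w->expired) {
        return;
    }
    if (w->count > 0u) {
        w->count--;
    }
    if (w->count == 0u) {
        w->expired = true;
    }
}
```

As long as `wdt_kick` is called more often than every `load` ticks, `expired` stays false; a program stuck in an infinite loop stops calling it and the flag is eventually set.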
This timeout will trigger a reset signal to the system or call an interrupt service routine (ISR).

Fig : Operation of Watchdog Timer

The TM4C123GH6PM microcontroller has two Watchdog Timer modules: one (Watchdog Timer 0) is clocked by the system clock, and the other (Watchdog Timer 1) is clocked by the PIOSC and therefore requires synchronizers.

Features of the Watchdog Timer in the TM4C123GH6PM controller:

- 32-bit down counter with a programmable load register
- Lock register protection from runaway software
- User-enabled stalling when the microcontroller asserts the CPU halt flag during debug

The watchdog timer can be configured to generate an interrupt to the controller on its first time-out, and to generate a reset signal on its second time-out. Once the watchdog timer has been configured, the lock register can be written to prevent the timer configuration from being inadvertently altered.

4.6 Low Power Microcontroller

Need for Low Power Microcontroller

It is imperative for an embedded design to have low power consumption. Most embedded systems and devices run on batteries. Power demands are increasing rapidly, but battery capacity cannot keep pace. Therefore, a microcontroller that inherently consumes very little power is always preferable. However, embedded systems engineers usually need to trade off power against performance: lower power generally comes at the cost of performance, and vice versa. Let us consider an example where we are to design a system to monitor the water level in a tank. When the water level falls below a particular level, water should be pumped in. There are many ways to go about this design.

Hibernation Module on Tiva™ Microcontrollers

This module manages the removal and restoration of power to the microcontroller and its associated peripherals. This provides a means of reducing system power consumption. When the processor and peripherals are idle, power can be completely removed, leaving the Hibernation module as the only block powered.
Fig : Block diagram of Hibernation module

To achieve this, the Hibernation (HIB) module provides the following features:

(i) A Real-Time Clock (RTC) to be used for wake events
(ii) A battery-backed SRAM for storing and restoring processor state

The SRAM consists of sixteen 32-bit words. The RTC is a 32-bit seconds counter with a 15-bit sub-seconds counter. It also has an add-in trim capability for precision control over time. The microcontroller has a dedicated pin for waking on an external signal. The RTC and the SRAM are operational only if there is a valid battery voltage.

There is a VDD3ON mode, which retains the GPIO pin state during hibernation of the device. In the lowest power mode, power to the device is actually shut off, so it is safe to assume that on wake-up the device is coming out of reset. VDD3ON mode nevertheless allows the device to keep the GPIO pins in their state without resetting them.

A power control mechanism is used to shut down the part. The TM4C123GH6PM has an on-chip power controller which controls power for the CPU only. There is also a pin output from the microcontroller which is used for system power control. Note that on the Tiva LaunchPad the battery voltage is directly connected to the processor voltage and is therefore always valid. In a custom design with a TM4C123GH6PM running on a battery, however, the device will not go into hibernation mode if the battery voltage is not valid.

The Hibernation module of the TM4C123GH6PM provides two mechanisms for power control. The first mechanism uses the on-chip power controller to control power to the Cortex-M4F.

Table : Power Modes of Tiva

The second mechanism controls the power to the microcontroller with a control signal (HIB) that signals an external voltage regulator to turn on or off. The Hibernation module power source is determined dynamically: the supply voltage of the Hibernation module is the larger of the main voltage source (VDD) and the battery voltage source (VBAT).
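Two of the Hibernation module facts above lend themselves to a small numerical sketch: reading the RTC as a time value and picking the module's supply. The code below is illustrative only. The function names are hypothetical, and it assumes that the 15-bit sub-seconds counter divides each second into 2^15 = 32768 equal steps, which the text implies but does not state.

```c
#include <stdint.h>

/* Assumption: a 15-bit sub-seconds counter gives 2^15 counts per second. */
#define RTC_SUBSEC_PER_SEC 32768u

/* Combine the 32-bit seconds count and the 15-bit sub-seconds count
   into a single time in milliseconds (fraction truncated). */
uint64_t rtc_to_ms(uint32_t seconds, uint16_t subseconds)
{
    uint32_t ss = subseconds & 0x7FFFu;   /* keep only the 15 valid bits */
    return (uint64_t)seconds * 1000u + ((uint64_t)ss * 1000u) / RTC_SUBSEC_PER_SEC;
}

/* The module is powered from whichever of VDD and VBAT is larger (millivolts). */
uint32_t hib_supply_mv(uint32_t vdd_mv, uint32_t vbat_mv)
{
    return (vdd_mv > vbat_mv) ? vdd_mv : vbat_mv;
}
```

With VDD removed (0 mV) and a 3000 mV battery, `hib_supply_mv` returns the battery voltage, which matches the case where the rest of the chip is unpowered but the Hibernation module keeps running.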
Hibernate mode can be entered in one of two ways:

- The user initiates hibernation by setting the HIBREQ bit in the Hibernation Control (HIBCTL) register.
- Power is arbitrarily removed from VDD while a valid VBAT is applied.

Power Modes

The TM4C123GH6PM operates in six power modes, as shown in the table below. They are Run, Sleep, Deep Sleep, Hibernate with VDD3ON, Hibernate with RTC, and Hibernate without RTC. To understand and compare these modes, it is necessary to analyze them under a common condition. Let us consider the device operating at a 40 MHz system clock with the PLL.

Programming Hibernation Module

This code can be compiled and executed on a Tiva LaunchPad. When the code executes, the green LED glows continuously. After 4 s, the system automatically goes to sleep and the LED stops glowing. When SW2 (the switch in the bottom right-hand corner of the LaunchPad) is pressed, it triggers a wake event and the green LED starts glowing again. After another 4 s, the system goes to sleep again. This shows that the wake-up process is the same as powering up. When the code starts, it can determine that the processor woke from hibernation and restore the processor state from the memory.

Fig : Flowchart for programming hibernation module

4.7 Interrupts

A microprocessor is connected to several input and output devices. It is important at this point to know how a microprocessor manages these devices efficiently.

Introduction to Interrupts and Polling

A microprocessor executes instructions sequentially. Alongside, it is also connected to several devices. Data flow between these devices and the microprocessor has to be managed effectively. There are two ways this is done in a microprocessor: either by using interrupts or by using polling.

Polling

Polling is a simple method of I/O access. In this method, the microcontroller continuously probes whether the device requires attention, i.e.
if there is data to be exchanged. A polling function or subroutine is called repeatedly while a program is being executed. When the device being polled responds to the interrogation, a data exchange is initiated. The polling subroutine consumes processing time from the presently executing task. This is very inefficient because I/O devices do not always need attention from the microprocessor, yet the microprocessor wastes valuable processing time polling them unnecessarily.

Optimizing for low power in embedded MCU designs

Low-power embedded design is motivated by the need to run applications for as long as possible while consuming minimum power. In a battery-powered system, this need is magnified. Furthermore, low power implies lower cost of operation and a smaller battery size, which makes applications more mobile. When energy comes at a premium, as it does with today's green initiatives, ensuring that an embedded design consumes as little energy as possible is important even for wall-powered applications. Designing power-efficient applications also reduces the overhead of managing thermal dissipation, because heat generation is controlled at the source by optimizing the power consumed.

Given these advantages, embedded systems engineers can no longer ignore the problem of optimizing power. This article focuses on the major factors contributing to power consumption in an embedded system by analyzing the various power modes which most microcontrollers offer. It then analyzes a real-life example of an embedded application in terms of power consumption and how its efficiency can be maximized.

MCU Power Consumption

There are several points to be aware of when selecting an MCU or external components.
The overall power consumption of an MCU is defined by its power consumption in different modes, typically active and standby (which includes sleep, hibernate, etc.), taking into account the power consumed to transition from one mode to another.

Active power consumption is the power consumed when the MCU is running. As almost all controllers are based upon CMOS logic, power is consumed primarily during the switching of transistors. As a starting point, we will analyze the power consumption of a CMOS inverter, which is the basic building block of any CMOS design.

Figure: CMOS inverter

CMOS circuits dissipate power by charging the various load capacitances whenever they are switched. Inside a chip this is mainly the gate capacitance, but there are drain and source capacitances too. Power is dissipated across the PMOS transistor while the load capacitor is being charged and across the NMOS transistor while the load capacitor is being discharged. Instantaneous power dissipation across the PMOS transistor of a CMOS inverter is given by:

$P_{PMOS,i} = i_L (V_{dd} - V_o)$

After substituting the load current $i_L = C_L \, dV_o/dt$:

$P_{PMOS,i} = C_L (V_{dd} - V_o) \, \frac{dV_o}{dt}$

The total dissipation across the PMOS to switch the output from low to high (strictly an energy per transition) is found by integrating this expression while the load capacitor charges from 0 V to $V_{dd}$:

$E_{PMOS} = \int_0^{V_{dd}} C_L (V_{dd} - V_o) \, dV_o = \tfrac{1}{2} C_L V_{dd}^2$

Similarly, to switch the output from high to low, the total dissipation across the NMOS is:

$E_{NMOS} = \tfrac{1}{2} C_L V_{dd}^2$

For one switching cycle, then, the dissipation is:

$E_{Total} = E_{NMOS} + E_{PMOS} = C_L V_{dd}^2$

If we define the average power in terms of the switching frequency $f$, we get:

$P = f C_L V_{dd}^2$

From this equation we can see that power consumption depends upon the switching frequency, load capacitance, and supply voltage. Load capacitance is determined by the technology parameters and the design layout, and is therefore beyond the control of the embedded system designer.
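The relation P = f·C_L·Vdd² derived above can be explored numerically. The sketch below is a direct transcription of that formula; the function name is hypothetical, and the 10 pF load used in the usage note is an arbitrary illustrative value, not a figure from these notes.

```c
/* Average dynamic power of a switched CMOS node: P = f * C_L * Vdd^2.
   Units: hertz, farads and volts in, watts out. */
double cmos_dynamic_power(double f_hz, double c_load_f, double vdd_v)
{
    return f_hz * c_load_f * vdd_v * vdd_v;
}
```

For example, `cmos_dynamic_power(40e6, 10e-12, 1.8)` evaluates to roughly 1.3 mW. Halving the frequency halves the result, while halving the voltage cuts it to a quarter, which is why the supply voltage term matters so much wherever it can actually be changed.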
However, the other two factors, switching frequency and supply voltage, are ones a system designer can modify to improve power efficiency for a given microcontroller. Of course, the values of these parameters are also heavily dependent on the application of the design.

Modern controllers, however, run at an internally regulated voltage irrespective of the input voltage on the supply pins. There are controllers available in the market that can be operated from 0.5 V to 5.5 V, but the internal core runs at a fixed regulated voltage such as 1.8 V, no matter what the supply voltage is. Therefore, this parameter is not as important in the case of modern controllers as it was in the past. It is still good practice to keep the supply voltage at the minimum the regulator requires, or near the voltage where the regulator is bypassed.

This leaves system designers with just one parameter for controlling power: switching frequency. Hence, in active mode the minimum required operating speed for the MCU should be calculated, and higher clock speeds should be avoided.

Standby Power

The other major factor which determines battery life is the standby power consumption of an embedded system. Most applications can spend significant periods of time in standby mode. In these systems, the major contributor to total system power consumption is the standby current rather than the active current. Standby current is the sum of leakage current and the current consumed by power management circuits, clocking systems, power regulators, RTC, I/Os, interrupt controllers, and so on. It varies from controller to controller, based upon the particular features and peripherals supported in standby mode.

Finally, the power consumed while transitioning from low-power mode to active mode should not be overlooked. Devices may end up wasting a significant amount of power while transitioning between these two modes.
Based on these power modes, an MCU's average consumption, expressed as a current, is:

  MCU average current = (Active charge + Sleep charge + Transition charge) / Total time

where

  Active charge     = time for which the MCU is active × active current
  Sleep charge      = time for which the MCU is in sleep × sleep current
  Transition charge = charge consumed while making the transition from sleep to active mode

The amount of time the system remains in active and standby mode is application-dependent. Some applications may need to have the MCU running all the time, while some may need to have it running only occasionally. There are MCUs on the market that come with power-down modes other than sleep, such as hibernate, deep sleep, or shut down, in which current consumption can be on the order of tens of nA. System designers need to look at the power consumption of the mode in which the system operates for the majority of the time to ensure that the overall design is as power efficient as it can be.

If we look deeper, there are some vital tradeoffs that must be considered. For some applications, it could prove beneficial for the system to run at a higher speed so it can finish its job faster and return to a low-power mode. Other systems may do better running at a slower speed to keep active power consumption low. Here, the system designer has to analyze the best case for the application, considering the current at different operating speeds, the time it takes to come out of low-power mode, the current consumption in low-power mode, and the frequency with which the system needs to switch between active and sleep modes.

Peripheral Power

MCU power consumption is only one factor when considering system power consumption. Some engineers tend to concentrate too much on the MCU and ignore the power consumed by external peripherals. If the objective is to optimize the power consumption of the entire embedded system solution, one cannot afford to do this.
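The average-consumption expression for the MCU given earlier in this section can be turned into a small calculation. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the transition is treated as a lump of charge whose duration is negligible compared with the active and sleep times, and the second helper applies the usual rule that a battery's mAh rating divided by the average current in mA gives its life in hours.

```c
/* Duty-cycle weighted average current over one active/sleep period.
   Times in seconds, currents in mA, transition charge in mA*s.
   Assumption: the transition itself takes negligible time. */
double average_current_ma(double t_active_s, double i_active_ma,
                          double t_sleep_s, double i_sleep_ma,
                          double transition_charge_mas)
{
    double total_time = t_active_s + t_sleep_s;
    double charge = t_active_s * i_active_ma
                  + t_sleep_s * i_sleep_ma
                  + transition_charge_mas;
    return charge / total_time;
}

/* Battery life in hours = battery rating (mAh) / average current (mA). */
double battery_life_hours(double rating_mah, double i_avg_ma)
{
    return rating_mah / i_avg_ma;
}
```

As a usage example with made-up numbers, an MCU that is active for 10 ms at 10 mA and asleep for 990 ms at 1 µA averages about 0.101 mA, even though it sleeps 99% of the time. The short active burst dominates, which is why both the active current and the time spent active deserve attention.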
Consider a simple temperature measurement system for home use (Figure 2).

Figure : Temperature monitoring system

This system has one ADC to measure the sensor voltage, one DAC to generate a reference, one LCD module to display data, and one MCU to process the data. Power should be saved beginning at the individual block level. The total power consumption of this system is given by:

  Total power consumption = MCU power consumption + ADC power consumption + DAC power consumption + LCD power consumption

For this system, the sample rate need not be very high, since temperature does not change rapidly. Power consumption can be kept to a minimum by switching on the ADC and DAC only when required and by optimizing the ratio of time the MCU spends in active and standby modes. If this system is made up of discrete components, this can be quite challenging to coordinate. Power consumption for discrete component-based architectures will look similar to Figure 3.

Figure: Power consumption system using discrete-based designs

In Figure 3, standby current is made up of the MCU standby current and the active current of the ADC, DAC and LCD. To save ADC power, the ADC may have an option by which the MCU can stop ADC conversion before it goes to sleep. However, there will still be some standby current for the ADC, and similarly for the DAC.

Alternatively, the system could be implemented using a system-on-chip (SoC) architecture, where all of the peripherals are integrated onto a single chip along with the ability to control the power of each individual peripheral. Power consumption of such a system will look similar to that shown in Figure 4 and can lead to a dramatic reduction in power consumption compared to a discrete component-based implementation.

Figure: Power consumption of SoC-based solution vs. discrete solution

When designing any system, we should use what is needed rather than simply using what is available.
Choosing a faster or more sophisticated component than is needed results in higher cost and lower power efficiency. For example, a 20-bit ADC running at 1 Msps is clearly more than is needed for a temperature measurement application. In addition, the ADC needs a high-frequency operating clock to sample at this rate.

Advancements in SoC technology give developers access to a wide range of on-chip peripherals such as filters, ADCs, DACs, op-amps, and programmable analog and digital blocks. For example, PSoC devices from Cypress Semiconductor have a wide operating frequency range with programmable clock sources for different blocks, including the MCU, and support numerous power management modes. These modes range from active mode, where all the features on the device are available, to hibernate modes, where current can be as low as 100 nA while retaining the contents of configuration registers and RAM.

As complex as SoC architectures are, they represent almost the complete system, making it more straightforward to compute power consumption. For example, if the system is doing nothing, then the standby current of the whole system can be as low as 100 nA. Since peripherals and the MCU can be switched on or off individually, only the appropriate blocks need to resume operation after the next wake-up event. This is one of the key benefits an SoC can have from a system point of view. In some systems, there are periods in which the only hardware functions needed do not require an MCU, such as when generating a waveform using a DAC. This task can be completed by the DMA (Direct Memory Access) controller and the DAC without the MCU, and so the MCU can be switched off. SoCs enable developers to design ultra-low-power embedded systems that are also cost and space efficient, with the added advantage of fast time to market.
Calculating a system's average power in an SoC-based system becomes more complex since, along with the average MCU current, we need to consider the operating state of each individual peripheral on the chip. Average system current is:

Battery Life

Battery life is a critical specification for any battery-powered application. Battery ratings are given in units of mAh, meaning that a battery rated at X mAh can supply X mA of current for one hour. If we know the average current, we can calculate battery life:

  Battery life = Battery rating / Iavg

This equation gives the battery life in hours if the rating is given in mAh and Iavg in mA.

Most consumers care about power consumption in both wall-powered and battery-powered devices. In today's competitive market, designing a product that consumes more power or costs more than competing products can result in reduced market success. When optimizing power consumption is a major criterion, designers should look at critical factors such as choosing appropriate components, making sure they are not overrated for the desired end application, and making sure the system does not operate at higher speeds than required. In addition, developers will want to seriously consider how long the system spends in active and standby modes and the relative power consumption in each.

Interrupts

In the interrupt method, by contrast with polling, a device signals the microprocessor whenever it requires attention. This signal is called an interrupt signal or sometimes an interrupt request (IRQ). Every IRQ is associated with a subroutine that needs to be executed by the microprocessor. This subroutine is called the interrupt service routine (ISR) or sometimes the interrupt handler. The microprocessor halts current program execution and attends to the IRQ by executing the ISR. Once execution of the ISR completes, the microprocessor resumes the halted task.
The current state of the microprocessor must be saved before it attends to the IRQ so that it can continue from where it was before the interrupt. To achieve this, the contents of its internal registers, both general purpose and special registers, are saved to a memory section called the stack. On completion of the interrupt call, these register contents are restored from the stack. This allows the microprocessor to resume its originally halted task.

There are two types of interrupts, namely software-driven interrupts (SWI) and hardware-driven interrupts (HWI). SWIs are generated from within a currently executing program. They are triggered by the interrupt opcode. A SWI calls a subroutine that allows a program to access certain lower-level services. HWIs are signals from a device to the microprocessor. The device sets an interrupt line in the control bus high.

Microprocessors have two types of hardware interrupts, namely the non-maskable interrupt (NMI) and the interrupt request (INTR). An NMI has a very high priority and demands immediate execution. There is no option to ignore an NMI. NMI is used exclusively for events that are regarded as having a higher priority or catastrophic consequences for system operation. For example, an NMI can be initiated by an interruption of the power supply, a memory fault, or pressing of the reset button.

An INTR may be generated by a number of different devices, all of which are connected to the single INTR control line. An INTR may or may not be attended to by the microprocessor. If the microprocessor is attending to an interrupt, then no further interrupts, other than an NMI, will be entertained until the current interrupt has been completed. A control signal is used by the microprocessor to acknowledge an INTR. This control signal is called ACK or sometimes INTA.

Interrupt vector table

It is discussed in the previous section that when an interrupt occurs, the microprocessor runs an associated ISR.
IRQ is an input signal to the microprocessor. When a microprocessor receives an IRQ, it pushes the PC register onto the stack and loads the address of the ISR into the PC register. This makes the microprocessor execute the ISR. The ISRs corresponding to every interrupt become part of the executable program, which is loaded into the memory of the device. It therefore becomes easier to manage the ISRs if there is a lookup table where the addresses of all ISRs are listed. This lookup table is called the interrupt vector table. Table 2.9 shows an interrupt vector table for an ARM Cortex-M microcontroller.

In an ARM Cortex-M microcontroller, the vector table can hold up to 256 entries. Of these, some are hardware or peripheral generated IRQs and some are software generated IRQs. The entries INT0 to INT15 are called the predefined interrupts. In ARM Cortex-M microcontrollers, interrupts are managed by an on-chip module called the Nested Vectored Interrupt Controller (NVIC). NVIC is an on-chip interrupt controller for ARM Cortex-M series microcontrollers. No other ARM series has this on-chip NVIC. This means that interrupt handling is fundamentally different in ARM Cortex-M microcontrollers compared to other ARM microcontrollers.

Table : Interrupt Vector Table for ARM Cortex M4

Predefined Interrupts (INT0-INT15)

RESET

All ARM devices have a RESET pin which is asserted on device power-up or in case of a warm reset. This exception is a special exception and has the highest priority. On the assertion of the Reset signal, execution stops immediately. When the Reset signal is released, execution starts from the address provided by the Reset entry in the vector table, which is located at 0x0000.0004. Hence, to run a program on Reset, the start address of the program must be placed at memory address 0x0000.0004.

NMI

In the ARM microcontroller, some pins are associated with hardware interrupts. They are often called IRQs (interrupt request) and NMI (non-maskable interrupt).
IRQ can be controlled by software masking and unmasking. Unlike IRQ, NMI cannot be masked by software. This is why it is named non-maskable interrupt. As shown in Table 2.9, "INT 02" in ARM Cortex-M is used only for NMI. On activation of NMI, the microcontroller loads the contents of memory location 0x0000.0008 into the program counter.

Hard Fault
All classes of fault whose corresponding fault handler cannot be activated are treated as a hard fault. This may be a result of the fault handler being disabled or masked.

Memory Management Fault
It is caused by a memory protection unit violation. The violation can be caused by attempting to write into a read-only memory. An instruction fetch is invalid when it is made from a non-executable region of memory. In an ARM microcontroller with an on-chip MMU, the page fault can also be mapped into the memory management fault.

Bus Fault
A bus fault is an exception that arises due to a memory-related fault for an instruction or data memory transaction, such as a pre-fetch fault or a memory access fault. This fault can be enabled or disabled.

Usage Fault
A usage fault is an exception that occurs due to a fault associated with instruction execution. This includes an undefined instruction, an illegal unaligned access, an invalid state on instruction execution, or an error on exception return. An unaligned address on a word or half-word memory access or a division by zero can also cause a usage fault.

SVCall
A supervisor call (SVC) is an exception that is activated by the SVC instruction. In an operating system, applications can use SVC instructions to access OS kernel functions and device drivers. This is a software interrupt since it is raised from software, and not from a hardware or peripheral exception.

PendSV
PendSV is a pendable, interrupt-driven request for system-level service. PendSV is used for context switching when no other exception is active. The Interrupt Control and State (INTCTRL) register is used to trigger PendSV.
The PendSV is an interrupt and can wait until the NVIC has time to service it while other urgent, higher-priority interrupts are being taken care of.

SysTick
A SysTick exception is generated by the system timer when it reaches zero and is enabled to generate an interrupt. The software can also produce a SysTick exception using the Interrupt Control and State (INTCTRL) register.

User Interrupts
An interrupt is an exception signaled either by a peripheral or by a software request and fed through the NVIC based on its priority. All interrupts are asynchronous to instruction execution. In the system, peripherals use interrupts to communicate with the processor. An ISR can also be invoked as a result of an event at a peripheral device. This may include a timer time-out or completion of an analog-to-digital converter (ADC) conversion. Each peripheral device has a group of special function registers that must be used to access the device for configuration. For a given peripheral interrupt to take effect, the interrupt for that peripheral must be enabled.

4.8 Timers
Timers are basic constituents of most microcontrollers. Today, just about every microcontroller comes with one or more built-in timers. These are extremely useful to the embedded programmer - perhaps second in usefulness only to GPIO. A timer is essentially counter hardware that can usually be configured to count either regular or irregular clock pulses. Depending on this usage, it acts as a timer or a counter respectively. Timers are sometimes called "hardware timers" to distinguish them from software timers. Software timers are pieces of software that implement some timing function.

The TM4C123GH6PM General-Purpose Timer Module (GPTM) contains six 16/32-bit GPTM blocks and six 32/64-bit Wide GPTM blocks. These programmable timers can be used to count or time external events that drive the Timer input pins.
Timers can also be used to trigger μDMA transfers and to trigger analog-to-digital conversions (ADC) when a time-out occurs in periodic and one-shot modes. The GPT Module is one timing resource available on the Tiva™ C Series microcontrollers. Other timer resources include the System Timer (SysTick) and the PWM timer in the PWM modules.

The General-Purpose Timer Module (GPTM) contains blocks with the following functional options:

16/32-bit operating modes:
1. 16- or 32-bit programmable one-shot timer
2. 16- or 32-bit programmable periodic timer
3. 16-bit general-purpose timer with an 8-bit prescaler
4. 32-bit Real-Time Clock (RTC) when using an external 32.768-kHz clock as the input
5. 16-bit input-edge count- or time-capture modes with an 8-bit prescaler
6. 16-bit PWM mode with an 8-bit prescaler and software-programmable output inversion of the PWM signal

32/64-bit operating modes:
1. 32- or 64-bit programmable one-shot timer
2. 32- or 64-bit programmable periodic timer
3. 32-bit general-purpose timer with a 16-bit prescaler
4. 64-bit Real-Time Clock (RTC) when using an external 32.768-kHz clock as the input
5. 32-bit input-edge count- or time-capture modes with a 16-bit prescaler
6. 32-bit PWM mode with a 16-bit prescaler and software-programmable output inversion of the PWM signal

Count up or down
Twelve 16/32-bit Capture Compare PWM pins (CCP)
Twelve 32/64-bit Capture Compare PWM pins (CCP)
Daisy chaining of timer modules to allow a single timer to initiate multiple timing events
Timer synchronization allows selected timers to start counting on the same clock cycle
ADC event trigger
User-enabled stalling when the microcontroller asserts CPU Halt flag during debug (excluding RTC mode)
Ability to determine the elapsed time between the assertion of the timer interrupt and entry into the interrupt service routine
Efficient transfers using Micro Direct Memory Access Controller (μDMA)
1. Dedicated channel for each timer
2. Burst request generated on timer interrupt

Fig : GPTM block diagram

The below table lists the external signals of the GP Timer module and describes the function of each.

Table : General purpose Timer signals

Basic Timers/Counters
A standard timer comprises a pre-scaler, an N-bit timer/counter register, and one or more N-bit capture and compare registers. Usually N is 8, 16 or 32 bits. Along with these, there are control and status registers used to configure and monitor the timer. The fundamental hardware is an up-counter that counts incoming pulses. A counter becomes a timer when the incoming pulses have a fixed, known frequency. Also note that the size in bits of a timer is not directly related to the size in bits of the CPU architecture. An 8-bit microcontroller can have 16-bit timers (in fact most do), and a 32-bit microcontroller can have 16-bit timers (and some do).

Pre-scaler
The pre-scaler takes the basic timer clock frequency as an input and divides it by some value before feeding it to the timer, according to how the pre-scaler register(s) are configured. This configuration might be limited to a few fixed values (powers of 2), or to integers from 1 to 2^m, where m is the number of pre-scaler bits. The pre-scaler is used to set the clock rate of the timer as desired. This gives flexibility in trading resolution (a high clock rate implies better resolution) against range (a high clock rate causes quicker overflow).

Timer Register
The timer register is an N-bit up-counter in hardware, with the ability to read and write the current count value and to stop or reset the counter. As discussed, the timer is driven by the pre-scaler output. The regular pulses which drive the timer, irrespective of their source, are often called "ticks". Note that a timer does not necessarily time in seconds or milliseconds; it times in ticks.
Counting in ticks gives us the flexibility to control the tick rate, depending upon the hardware and software configuration. We may configure the tick to some human-friendly value such as 1 millisecond or 1 microsecond, or to any other value the design specifies.

Capture Registers
A capture register is a hardware register that can be automatically loaded with the current counter value upon the occurrence of some event, usually a change on an input pin. The capture register is therefore used to capture a "snapshot" of the timer at the instant when the event occurs. A capture event can also be configured to produce an interrupt, and the Interrupt Service Routine (ISR) can save or otherwise use the just-captured timer snapshot. Because the capture occurs in hardware, the snapshot value does not suffer from the latency that would be present if the capture were done in software. Capture registers can be used to time intervals between pulses or input signals, and to determine the high and low times of input signals.

Compare/Match Registers
Compare or match registers hold a value against which the current timer value is continuously compared, and an event is triggered when the two values match.
1. If the timer/counter is configured as a timer, we can generate events at known and precise times. Events can be output pin changes and/or interrupts and/or timer resets.
2. If the timer/counter is configured as a counter, the compare registers can generate events based on preset counts being reached.
For instance, the compare registers can be used to generate a timer "tick", a fixed timer interrupt used for system software timing. For example, if a 2ms tick is desired, and the timer is configured with a 0.5us clock, setting a compare register to 4000 will cause a compare event after 2ms. If we set the compare event to generate an interrupt as well as to reset the timer to 0, the result will be an endless stream of 2ms interrupts.
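The arithmetic behind the 2ms example can be checked with a short calculation. The following is a minimal sketch in Python rather than code for any particular microcontroller. The function names and the 16 MHz clock with a divide-by-8 pre-scaler are assumptions, chosen only so that the tick comes out at the 0.5us used in the text.

```python
def tick_period_us(timer_clock_hz, prescaler):
    """Return the duration of one timer tick in microseconds."""
    return prescaler * 1_000_000 / timer_clock_hz


def compare_value(interval_us, tick_us):
    """Return the number of ticks that make up the desired interval."""
    ticks = interval_us / tick_us
    if ticks != int(ticks):
        raise ValueError("interval is not a whole number of ticks")
    return int(ticks)


def fits_in_timer(value, n_bits):
    """Check that a compare value fits in an N-bit timer register."""
    return 0 <= value < 2 ** n_bits


# Assumed figures: a 16 MHz timer clock divided by 8 gives a 0.5 us tick.
tick = tick_period_us(16_000_000, 8)
# A 2 ms (2000 us) interval then needs 4000 ticks, as in the text.
match = compare_value(2000, tick)
print(tick, match, fits_in_timer(match, 16))
```

The last check shows why timer width matters: 4000 fits comfortably in a 16-bit register, whereas a much longer interval at the same tick rate would need a larger pre-scaler or a wider timer.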
Another notable use of a compare register is to generate a pulse of variable width. Set an output high/low when the timer is at 0, load the compare register with the pulse width, and on the compare event set the output low/high. We may use a second compare register with a larger value to set the pulse interval by resetting the timer on compare.

Real Time Clock (RTC)
An RTC is a computer clock that keeps track of the current time. RTCs are present in almost every electronic device that needs to maintain accurate time. The term RTC is used to avoid confusion with regular hardware clocks, which are merely signals that govern digital electronics and do not count time in human units.

Benefits of using RTC:
1. Low power consumption
2. Frees the main system for time-critical tasks
3. Better accuracy than other methods

A GPS receiver can cut down its startup time by comparing the current time, as per its RTC, with the moment of the last valid signal. If it has been less than a few hours, then the previous ephemeris is still usable. RTCs often have an alternative power source, so they can continue to keep time while the primary power source is unavailable. This alternate source may be a lithium battery or a supercapacitor.

Fig: Real Time Clock with external power source
Fig: Frequency Based Measurement System

Timing Generation and Measurement
In many microprocessor systems, it is desirable to make measurements using frequency rather than the digital output of an ADC. Motivations for using frequency measurement include:
1. In systems with ground offsets, signals can be capacitively coupled or optically isolated to reduce ground loops and other damaging effects.
2. Noise introduced during analog transmission may be eliminated by transmitting a logic-level frequency signal instead.
3. Measuring frequency instead of analog values may allow a simpler microprocessor to be used, since an ADC is not required.
Most analog (physical quantity) inputs, such as temperature, can be converted to a time-based signal that can be measured with a microprocessor.

Microprocessor with Capture Capability
In a microprocessor with capture capability, the sensor output can be connected to a microprocessor input for pulse capture. The block diagram below describes one such capture system. Here, a 16-bit register captures the value of a free-running 16-bit counter when the input signal changes from low to high. At the same instant, a short pulse is triggered to reset the counter. In the illustration shown in Fig 3.4 below, the first period of the input is 90µs and the second period is 100µs. The counter counts 90 counts for the first period and 100 counts for the second period (which implies a counter clock of 1 MHz, that is, one count per microsecond).

Microprocessor without Capture Capability
To perform similar measurements on microprocessors without capture capability, we let a counter free-run and connect the frequency signal to an interrupt input. The counter can be an external or internal part that is clocked from a derivative of the microprocessor clock. When the interrupt triggers, software reads and resets the counter. Due to variable interrupt latency, this method is somewhat less accurate than the capture method. If the measurement must not be affected by the latency of other interrupts, and the microprocessor has a non-maskable interrupt input, then that input should be used for the frequency input.

Alternatively, the frequency input can be connected to the input of a timer, and the timer programmed to increment on an external clock. The microprocessor then reads the timer value on a periodic basis to get the number of counts that arose in the measurement period. Issues like interrupt latency can be reduced by connecting a period-based signal to a counter that runs on the microprocessor clock but counts only when the input is high.
The counter will:
1. Count up while the input is high
2. Hold the count while the input is low
The processor can read the count while the input is low. As long as the processor reads the count before the input goes high again, the count will be accurate.

Measuring Period Based Inputs with Free Running Counter
Fig: Period based input Read with free running counter
Fig: Period based input Read with counter that increments only while gate input is HIGH (Gate connected to Period Based Input)

4.9 Pulse Width Modulation
Pulse width modulation (PWM) is a simple but powerful technique of using a rectangular digital waveform to control an analog variable, or simply of controlling analog circuits with a microprocessor's digital outputs. PWM is employed in a wide variety of applications, from measurement and communications to power control and conversion.

PWM using TIVA TM4C123GH6PM
The TM4C123GH6PM PWM module provides a great deal of flexibility and can generate simple PWM signals, such as those required by a simple charge pump, as well as paired PWM signals with dead-band delays, such as those required by a half-H bridge driver. Three generator blocks can also generate the full six channels of gate controls required by a 3-phase inverter bridge.

Each PWM generator block has the following features:
1. One fault-condition handling input to quickly provide low-latency shutdown and prevent damage
o One 16-bit counter
o Runs in Down or Up/Down mode
o Output frequency controlled by a 16-bit load value
o Load value updates can be synchronized
o Comparator value updates can be synchronized
o Produces output signals
o Output PWM signal is constructed based on actions taken as a result of the counter and PWM comparator output signals
o Produces two independent PWM signals
2. Dead-band generator
o Produces two PWM signals with programmable dead-band delays suitable for driving a half-H bridge
o Can be bypassed, leaving input PWM signals unmodified
3. Can initiate an ADC sample sequence

The control block determines the polarity of the PWM signals and which signals are passed through to the pins. The outputs of the PWM generation blocks are managed by the output control block before being passed to the device pins.

Fig 3.14. PWM Module Block Diagram

Block Diagram
The TM4C123GH6PM controller contains two PWM modules, each with four generator blocks that generate eight independent PWM signals or four paired PWM signals with dead-band delays inserted.

Fig 3.15. PWM Generator Block Diagram

Functional Description

Clock Configuration
The PWM has two clock source options:
1. The System Clock
2. A pre-divided System Clock
The clock source is selected by programming the USEPWMDIV bit in the Run-Mode Clock Configuration (RCC) register. The PWMDIV bit field specifies the divisor of the system clock that is used to create the PWM Clock.

PWM Timer
The timer in each PWM generator runs in one of two modes: Count-Down mode or Count-Up/Down mode. In Count-Down mode, the timer counts from the load value to zero, goes back to the load value, and continues counting down. In Count-Up/Down mode, the timer counts from zero up to the load value, back down to zero, back up to the load value, and so on. Generally, Count-Down mode is used for generating left- or right-aligned PWM signals, while the Count-Up/Down mode is used for generating center-aligned PWM signals. The timers output three signals that are used in the PWM generation process: the direction signal (this is always Low in Count-Down mode, but alternates between Low and High in Count-Up/Down mode), a single-clock-cycle-width High pulse when the counter is zero, and a single-clock-cycle-width High pulse when the counter is equal to the load value.
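The two counting modes and the three timer output signals can be illustrated with a small software model. This is only a sketch of the behaviour described above, not the hardware itself: the generator names, the tuple layout (count, dir, zero, load) and the choice of dir = 1 while counting up are assumptions made for the illustration.

```python
def count_down(load, cycles):
    """Yield (count, dir, zero, load_pulse) for a Count-Down timer."""
    count = load
    for _ in range(cycles):
        # dir stays low (0) for the whole time in Count-Down mode
        yield count, 0, int(count == 0), int(count == load)
        count = load if count == 0 else count - 1


def count_up_down(load, cycles):
    """Yield (count, dir, zero, load_pulse) for a Count-Up/Down timer."""
    count, direction = 0, 1            # assumed: dir is 1 while counting up
    for _ in range(cycles):
        yield count, direction, int(count == 0), int(count == load)
        if count == load:
            direction = 0              # turn around at the load value
        elif count == 0:
            direction = 1              # turn around at zero
        count += 1 if direction else -1


down = list(count_down(3, 8))
up_down = list(count_up_down(3, 8))
print([s[0] for s in down])       # counter values in Count-Down mode
print([s[0] for s in up_down])    # counter values in Count-Up/Down mode
```

With a load value of 3, the Count-Down model repeats every 4 clocks and the Count-Up/Down model every 6 clocks, which is why the same load value gives different output frequencies in the two modes.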
Note that in Count-Down mode, the zero pulse is immediately followed by the load pulse. In the figures in this chapter, these signals are labelled "dir," "zero," and "load."

PWM Comparators
Each PWM generator has two comparators that monitor the value of the counter. When either comparator matches the counter, it outputs a single-clock-cycle-width High pulse, labeled "cmpA" and "cmpB" in the figures in this chapter. When in Count-Up/Down mode, these comparators match both when counting up and when counting down, and thus are qualified by the counter direction signal. These qualified pulses are used in the PWM generation process. If either comparator match value is greater than the counter load value, then that comparator never outputs a High pulse.

Figure (a): PWM Count-Down Mode
Figure (b): PWM Count-Up/Down Mode

PWM Signal Generator
Each PWM generator takes the load, zero, cmpA, and cmpB pulses (qualified by the dir signal) and generates two internal PWM signals, pwmA and pwmB. In Count-Down mode, there are four events that can affect these signals: zero, load, match A down, and match B down. In Count-Up/Down mode, there are six events that can affect these signals: zero, load, match A down, match A up, match B down, and match B up. The match A or match B events are ignored when they coincide with the zero or load events. If the match A and match B events coincide, the first signal, pwmA, is generated based only on the match A event, and the second signal, pwmB, is generated based only on the match B event.

Dead-Band Generator
The pwmA and pwmB signals produced by each PWM generator are passed to the dead-band generator. If the dead-band generator is disabled, the PWM signals simply pass through to the pwmA' and pwmB' signals unmodified. If the dead-band generator is enabled, the pwmB signal is lost and two PWM signals are generated based on the pwmA signal.
The first output PWM signal, pwmA', is the pwmA signal with the rising edge delayed by a programmable amount. The second output PWM signal, pwmB', is the inversion of the pwmA signal with a programmable delay added between the falling edge of the pwmA signal and the rising edge of the pwmB' signal. The resulting signals are a pair of active high signals where one is always high, except for a programmable amount of time at transitions where both are low. These signals are therefore suitable for driving a half-H bridge, with the dead-band delays preventing shoot-through current from damaging the power electronics.

Fig: QEI Input Signal Logic

4.10 Quadrature Encoder Interface (QEI)
A quadrature encoder, also known as a 2-channel incremental encoder, converts linear displacement into a pulse signal. By monitoring both the number of pulses and the relative phase of the two signals, you can track the position, direction of rotation, and speed. In addition, a third channel, or index signal, can be used to reset the position counter. A classic quadrature encoder has a slotted wheel, to which the shaft of the motor is attached, and a detector module that captures the movement of the slots in the wheel.

Interfacing QEI using Tiva TM4C123GH6PM
The TM4C123GH6PM microcontroller includes two quadrature encoder interface (QEI) modules. Each QEI module interprets the code produced by a quadrature encoder wheel to integrate position over time and determine direction of rotation. In addition, it can capture a running estimate of the velocity of the encoder wheel.
The TM4C123GH6PM microcontroller includes two QEI modules, providing control of two motors at the same time, with the following features:
1. Position integrator that tracks the encoder position
2. Programmable noise filter on the inputs
3. Velocity capture using a built-in timer
4. Input frequency of the QEI inputs as high as 1/4 of the processor frequency (for example, 12.5 MHz for a 50-MHz system)
5. Interrupt generation on:
o Index pulse
o Velocity-timer expiration
o Direction change
o Quadrature error detection

Functional Description
The QEI module interprets the two-bit gray code produced by a quadrature encoder wheel to integrate position over time and determine direction of rotation. In addition, it can capture a running estimate of the velocity of the encoder wheel. The position integrator and velocity capture can be independently enabled, though the position integrator must be enabled before the velocity capture can be enabled. The two phase signals, PhAn and PhBn, can be swapped before being interpreted by the QEI module.

Fig : QEI Block Diagram

The QEI module input signals have a digital noise filter on them that can be enabled to prevent spurious operation. The noise filter requires that the inputs be stable for a specified number of consecutive clock cycles before updating the edge detector. The filter is enabled by the FILTEN bit in the QEI Control (QEICTL) register. The frequency of the input update is programmable using the FILTCNT bit field in the QEICTL register.

The QEI module supports two modes of signal operation:
1. Quadrature phase mode: the encoder produces two clocks that are 90 degrees out of phase, and the edge relationship is used to determine the direction of rotation.
2. Clock/direction mode: the encoder produces a clock signal to indicate steps and a direction signal to indicate the direction of rotation.
The mode is determined by the SIGMODE bit of the QEICTL register. When the QEI module is set to use the quadrature phase mode (SIGMODE bit is clear), the capture mode for the position integrator can be set to update the position counter on every edge of the PhA signal or to update on every edge of both PhA and PhB.
Updating the position counter on every PhA and PhB edge provides more positional resolution at the cost of less range in the positional counter. When edges on PhA lead edges on PhB, the position counter is incremented. When edges on PhB lead edges on PhA, the position counter is decremented. When a rising and falling edge pair is seen on one of the phases without any edges on the other, the direction of rotation has changed.

The positional counter is automatically reset on one of two conditions: sensing the index pulse or reaching the maximum position value. The reset mode is determined by the RESMODE bit of the QEICTL register.

When RESMODE is set, the positional counter is reset when the index pulse is sensed. This mode limits the positional counter to the values [0: N-1], where N is the number of phase edges in a full revolution of the encoder wheel. The QEI Maximum Position (QEIMAXPOS) register must be programmed with N-1 so that the reverse direction from position 0 can move the position counter to N-1. In this mode, the position register contains the absolute position of the encoder relative to the index (or home) position once an index pulse has been seen.

When RESMODE is clear, the positional counter is constrained to the range [0: M], where M is the programmable maximum value. The index pulse is ignored by the positional counter in this mode.

Velocity capture uses a configurable timer and a count register. The count register counts the number of phase edges (using the same configuration as for the position integrator) in a given time period. The edge count from the previous time period is available to the controller via the QEI Velocity (QEISPEED) register, while the edge count for the current time period is being accumulated in the QEI Velocity Counter (QEICOUNT) register. As soon as the current time period is complete, the total number of edges counted in that time period is made available in the QEISPEED register (overwriting the previous value), the QEICOUNT register is cleared, and counting commences on a new time period.
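The position integration described above can be sketched in software. The following Python model is an illustration only and does not use the QEI registers: the names `FORWARD`, `decode` and `edge_rate` are hypothetical, every edge of both phases is counted, and the counter wraps within [0, max_pos] in the way described for the maximum-position reset mode.

```python
# Order of (PhA, PhB) states when PhA leads PhB, taken as the forward direction.
FORWARD = [(0, 0), (1, 0), (1, 1), (0, 1)]


def decode(samples, max_pos):
    """Integrate position over (PhA, PhB) samples, counting every edge.

    Returns (position, errors); position is kept in the range [0, max_pos].
    """
    position, errors = 0, 0
    prev = samples[0]
    for cur in samples[1:]:
        step = (FORWARD.index(cur) - FORWARD.index(prev)) % 4
        if step == 1:                      # PhA leads PhB: increment
            position = 0 if position == max_pos else position + 1
        elif step == 3:                    # PhB leads PhA: decrement
            position = max_pos if position == 0 else position - 1
        elif step == 2:                    # both phases changed at once
            errors += 1
        prev = cur
    return position, errors


def edge_rate(edge_count, period_s):
    """Edges per second for a count accumulated over one time period."""
    return edge_count / period_s


forward = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(decode(forward, 99))          # four edges forward
print(decode(forward[::-1], 99))    # four edges in reverse, wrapping below 0
```

A simultaneous change of both phases cannot occur in a valid gray-code sequence, so the model counts it as an error rather than guessing a direction. Converting the edge rate into a shaft speed would additionally need the number of edges per revolution of the particular encoder.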
The number of edges counted in a given time period is directly proportional to the velocity of the encoder.

Case Study: TIVA based embedded system application using ADC & PWM
This case study covers PWM-based speed control of a DC motor using a potentiometer. In this study all the sensors are initialized and then synchronized with the synchronization clock pulse. The sensor used here is a potentiometer, which is connected to the ADC of the Tiva C Series Launchpad, and the motor is connected to a PWM pin of the Launchpad as shown in the below diagram. The value read from the potentiometer is used to vary the duty cycle of the PWM signal that drives the motor, so the duty cycle changes as the potentiometer is rotated. In this way, the speed of the motor can be controlled by turning the potentiometer.

Fig 3.19. Schematic for motor control using TIVA
Fig 3.20. Flowchart for DC motor control using PWM

Unit-V : Embedded communications protocols and Internet of Things
Many serial communication interfaces compete for use in embedded systems. The right serial interface for your system depends on several key factors. In this article I will describe seven of the most common serial interfaces, to help you decide which bus is right for your next project.

Why serial?
There are many different reasons to use a serial interface. One of the most common is the need to interface with a PC, during development and/or in the field. Most, if not all, PCs have some sort of serial bus interface available to connect peripherals. For embedded systems that must interface with a general-purpose computer, a serial interface is often easier to use than the ISA or PCI expansion bus. A benefit of serial communications is low pin counts. Serial communications can be performed with just one I/O pin, compared to eight or more for parallel communications.
Many common embedded system peripherals, such as analog-to-digital and digital-to-analog converters, LCDs, and temperature sensors, support serial interfaces. Serial buses can also provide for inter-processor communication: a network, if you will. This allows large tasks that would normally require larger processors to be tackled with several inexpensive smaller processors. Serial interfaces allow processors to communicate without the need for shared memory and semaphores, and the problems they can create.

This isn't to say that parallel buses have no use. For opcode fetches, address and data buses, and other microprocessor control, parallel buses have always been the clear winner. "Memory-mapping" peripherals has been a technique commonly used for systems with address and data buses. This practice allows parallel access to off-chip peripherals. However, with many 8-bit microcontrollers (let alone 8-pin) having no external address/data bus available for designs, memory-mapping is not an option.

Terminology
Before we get into the individual interface details, we should define several terms:
On an asynchronous bus, data is sent without a timing clock. A synchronous bus sends data with a timing clock.
Full-duplex means data can be sent and received simultaneously. Half-duplex is when data can be sent or received, but not at the same time.
Master/slave describes a bus where one device is the master and others are slaves. Master/slave buses are usually synchronous, as the master often supplies the timing clock for data being sent in both directions.
A multi-master bus is a master/slave bus that may have more than one master. These buses must have an arbitration scheme that can settle conflicts when more than one master wants to control the bus at the same time.
Point-to-point or peer interfaces are where two devices have a peer relation to each other; there are no masters or slaves. Peer interfaces are most often asynchronous.
The term multi-drop describes an interface in which there are several receivers and one transmitter.
Multi-point describes a bus in which there are more than two peer transceivers. This is different from a multi-drop interface as it allows bidirectional communication over the same set of wires.

RS-232
TIA/EIA-232-F (typically referred to as RS-232) is a common interface that can be found on almost every personal computer. RS-232 is a complete standard, not only including electrical characteristics, but physical and mechanical characteristics as well, such as connection hardware, pin-outs, and signal names. A point-to-point interface, RS-232 is capable of moderate distances at speeds up to 20Kbps. While not specifically called out in the specification, speeds of greater than 115.2Kbps are possible, provided that connections are short and proper grounding is used. Cable lengths of 30 feet are common, and cables of over 200 feet can be attained with low-capacitance cable.

An RS-232 bus is an unbalanced bus capable of full-duplex communication between two receiver/transmitter pairs, named data terminal equipment (DTE) and data communication equipment (DCE). Each one has a transmit signal that is connected to the receive signal on the other end. As such, there is a pin difference between the two sides. (Your PC is a DTE, while the connected peripheral is DCE.) Each transmitter sends data by varying the voltage on the line. A voltage higher than 3V is a binary zero, while a voltage less than -3V is a binary one. Between these voltages, the value is undefined. To convert from logic levels (0 and 5V) to these levels and back, an RS-232 conversion IC, such as the 1488, 1489, or ubiquitous MAX232, can be used.

Typical RS-232 communication consists of a start bit, data bits, parity bits (if any), and stop bit(s). When communicating with PCs, the typical format is eight data bits, no parity, and one stop bit (8N1).
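The character format just described can be sketched as a short function that builds the sequence of bits placed on the line. This is an illustrative model, not a UART driver. The names `uart_frame` and `rs232_level` are hypothetical, and the conventions that the start bit is 0, the stop bit is 1 and the data bits go out least-significant bit first are the usual UART conventions, assumed here rather than taken from the RS-232 specification.

```python
def uart_frame(value, data_bits=8, parity=None, stop_bits=1):
    """Return the line bits for one character, in transmission order."""
    data = [(value >> i) & 1 for i in range(data_bits)]   # LSB first
    frame = [0] + data                                    # start bit is 0
    if parity == "even":
        frame.append(sum(data) % 2)        # makes the count of ones even
    elif parity == "odd":
        frame.append(1 - sum(data) % 2)    # makes the count of ones odd
    return frame + [1] * stop_bits                        # stop bit(s) are 1


def rs232_level(bit):
    """Map a logic bit to the polarity of the RS-232 line voltage."""
    # Per the text: above +3V is a binary zero, below -3V is a binary one.
    return "negative" if bit else "positive"


frame_8n1 = uart_frame(0x41)                               # 'A' sent as 8N1
frame_7e1 = uart_frame(0x43, data_bits=7, parity="even")   # 'C' sent as 7E1
print(frame_8n1)
print(frame_7e1)
print([rs232_level(b) for b in frame_8n1])
```

Both formats put 10 bits on the line for every character, so at a given bit rate the character rate is one tenth of the bit rate.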
Seven data bits, even parity, and one stop bit (7E1) is also common. A start bit is often a zero and a stop bit is often a one, as shown in Figure 1. The official specification does not delineate any communications protocol, including the use of start/stop bits.

Figure 1: RS-232

Many embedded systems that use the RS-232 bus either interface with PCs or PC peripherals such as modems. Other systems use RS-232 so that bus traffic can be monitored easily with an inexpensive protocol analyzer or a PC equipped with two serial ports. Almost every microcontroller vendor has products that include hardware support for RS-232, called Universal Asynchronous Receiver Transmitters (UARTs). UARTs are often interrupt-driven and capable of speeds up to 115.2Kbps with little software overhead, although this varies by architecture.

RS-422 and RS-485
TIA/EIA-422-B (typically referred to as RS-422) and TIA/EIA-485-A (typically referred to as RS-485) are balanced, twisted-pair interfaces capable of speeds up to 10Mbps and distances up to 4,000 feet. Being differential buses, each uses signals from 1.5V to 6V to transmit the data. (With a differential, balanced bus, noise immunity is increased over a comparable single-ended, unbalanced bus such as RS-232.)

The RS-422 interface is a multi-drop interface, giving unidirectional communication over a pair of wires from one transmitter to several receivers, up to 10 unit loads (UL). If the devices receiving the data wish to communicate back to the transmitter, the designer must use a separate, dedicated bus between each receiver and the transmitter. (Using this return bus will allow full-duplex transmissions.) For that reason, RS-422 is seldom used between more than two nodes.

The RS-485 interface, on the other hand, provides bidirectional communication over one pair of wires between several transceivers. The specification states that the bus can include up to 32 UL worth of transceivers.
Many manufacturers produce fractional-UL transceivers, thereby increasing the maximum number of devices to well over 100.

The RS-422 and RS-485 interfaces often use the same start bit/data/stop bit format as RS-232. In fact, several converters exist to go from RS-232 to RS-485 and back. Do keep in mind, however, that RS-232 is a full-duplex interface, while RS-485 is half-duplex. Several microcontroller manufacturers provide built-in UARTs that boast special RS-485 abilities.

I2C

The Inter-Integrated Circuit bus (I2C) is a patented interface developed by Philips Semiconductors. (In order for an IC manufacturer to implement the I2C bus in hardware, they must obtain licensing from Philips.) The I2C bus is a half-duplex, synchronous, multi-master bus requiring only two signal wires: data (SDA) and clock (SCL). These lines are pulled high via pull-up resistors and controlled by the hardware via open-drain drivers, giving a wired-AND interface.

I2C uses an addressable communications protocol that allows the master to communicate with individual slaves using a 7-bit or 10-bit address. Each device has an address that is assigned by Philips to the manufacturer of the device. In addition, several special addresses exist, including a "general call" address (which addresses every device on the bus) and a high-speed initiation address.

During communication with slave devices, the master generates all clock signals for both communication to and from the slave. Each communication begins with the master generating a start condition, an 8-bit data word, an acknowledge bit, followed by a stop condition or a repeated start. Each data bit transition takes place while SCL is low, except for the start and stop conditions. The start condition is a high-to-low transition of the SDA line while the SCL line is high. A stop condition is a low-to-high transition of the SDA line while the SCL line is high (see Figure 2).
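
The start and stop rules above can be checked mechanically. The following is a minimal sketch, assuming the bus has been sampled into a list of (SCL, SDA) level pairs; the function name and the sample trace are hypothetical and exist only for illustration.

```python
def find_conditions(samples):
    """Locate I2C START/STOP conditions in a list of (scl, sda) samples.

    A START is SDA falling while SCL stays high; a STOP is SDA rising
    while SCL stays high. SDA changes while SCL is low are ordinary
    data transitions and are ignored.
    """
    events = []
    for i in range(1, len(samples)):
        prev_scl, prev_sda = samples[i - 1]
        scl, sda = samples[i]
        if prev_scl == 1 and scl == 1:
            if prev_sda == 1 and sda == 0:
                events.append((i, "START"))
            elif prev_sda == 0 and sda == 1:
                events.append((i, "STOP"))
    return events


# Hypothetical trace: idle, START, one data bit clocked, STOP.
trace = [
    (1, 1),  # bus idle, both lines pulled high
    (1, 0),  # SDA falls while SCL is high -> START
    (0, 0),  # SCL low
    (0, 1),  # SDA changes while SCL is low (data transition)
    (1, 1),  # SCL high, data bit is valid
    (0, 1),  # SCL low
    (0, 0),  # SDA changes while SCL is low
    (1, 0),  # SCL high
    (1, 1),  # SDA rises while SCL is high -> STOP
]
print(find_conditions(trace))  # [(1, 'START'), (8, 'STOP')]
```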
The acknowledge bit is generated by the receiver of the message by pulling the SDA line low while the master releases the line and allows it to float high. If the master reads the acknowledge bit as high, it should consider the last communication word not received and take appropriate action, including possibly resending the data.

Figure 2: I2C

I2C has a rather interesting feature called clock stretching, which is done when the slave device is unable to process the bit and wishes for more time. When this happens, the slave pulls the SCL line low. Since the signal behaves as a wired-AND, when the master releases the SCL line while the slave is "stretching" the clock, the master should notice that the line stays low. Upon seeing this, the master waits until the slave has processed the data bit and released the line. Once released by the slave, the SCL line floats back high, signaling to the master to send the next data bit.

The I2C bus has three speeds: slow (under 100Kbps), fast (400Kbps), and high-speed (3.4Mbps), each downward compatible. Philips has specified a recommended wiring arrangement should the signals need to leave the circuit board. I2C bus distances are often limited to on-board communications, although I have heard of developers using I2C successfully over distances of 50 feet! The true limit to I2C distances is the bit rate and capacitance of the bus. As such, for off-board communications, I2C is practically limited to under 10 feet for moderate speeds. For more details on I2C, read David and Roee Kalinsky's "Beginner's Corner: I2C" (August 2001).

SPI

The Serial Peripheral Interface (SPI) is a synchronous serial bus developed by Motorola and present on many of their microcontrollers. The SPI bus consists of four signals: master out slave in (MOSI), master in slave out (MISO), serial clock (SCK), and active-low slave select (/SS).
As a multi-master/slave protocol, communications between the master and selected slave use the unidirectional MISO and MOSI lines to achieve data rates over 1Mbps in full-duplex mode. The data is clocked simultaneously into the slave and master based on SCK pulses the master supplies. The SPI protocol allows for four different clocking types, based on the polarity and phase of the SCK signal. It is important to ensure that these are compatible between master and slave.

In addition to the 1Mbps data rate, another advantage of SPI is that, if only one slave device is used, the slave's /SS line can be tied low and the /SS signal does not have to be generated by the master. (This capability is, however, dependent on the phase selection of the SCK.)

A disadvantage of SPI is the requirement to have separate /SS lines for each slave. Provided that extra I/O pins are available, or extra board space for a demultiplexer IC, this is not a problem. But for small, low-pin-count microcontrollers, a multi-slave SPI interface might not be a viable solution. For more detail on SPI, read David and Roee Kalinsky's "Beginner's Corner: Serial Peripheral Interface" (February 2002).

Microwire

Microwire is a three-wire synchronous interface developed by National Semiconductor and present on their COP8 processor family. Similar to SPI, Microwire is a master/slave bus, with serial data out of the master (SO), serial data in to the master (SI), and signal clock (SK). These correspond to SPI's MOSI, MISO, and SCK, respectively. There is also a chip select signal, which acts similarly to SPI's /SS. A full-duplex bus, Microwire is capable of speeds of 625Kbps and faster (capacitance permitting).

Microwire devices from National come with different protocols, based on their data needs. Unlike SPI, which is based on an 8-bit byte, Microwire permits variable-length data, and also specifies a "continuous" bitstream mode.
Microwire has the same advantages and disadvantages as SPI with respect to multiple slaves, which require multiple chip select lines. In some instances, an SPI device will work on a Microwire bus, as will a Microwire device work on an SPI bus, although this must be reviewed on a per-device basis. Both SPI and Microwire are generally limited to on-board communications and traces of no longer than 6 inches, although longer distances (up to 10 feet) can be achieved given proper capacitance and lower bit rates.

1-Wire

Dallas Semiconductor's 1-Wire bus is an asynchronous, master/slave bus with no protocol for multimaster. Like the I2C bus, 1-Wire is half-duplex, using an open-drain topology on a single wire for bidirectional data transfer. However, the 1-Wire bus also allows the data wire to transfer power to the slave devices, although this is somewhat limited. Though limited to a maximum speed of 16Kbps, bus length can be upwards of 1,000 feet, given the proper pull-up resistor. For more detail on the 1-Wire bus, read H. Michael Willey's "One Cheap Network Topology" (January 2001).

Bit banging

Should you not have hardware support for any of the above, it is possible to use general-purpose I/O pins. The act of software controlling a serial communication is often referred to as "bit banging," as the software is truly "banging away" at the adopted "serial port." Bit banging requires the software to be cognizant of the exact timing required for each bit, for it must toggle an output line for every bit change (as well as monitor the receive pin for incoming data, if such interface is full-duplex).

Luckily for embedded developers, quite a few bit-banging routines are available on the Internet for every serial bus described here, and for use in almost every microcontroller architecture. In fact, several microcontroller manufacturers have developed and published their own such routines.
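
As a rough illustration of bit banging, the sketch below sends one byte as an 8N1 asynchronous frame by driving a single output and waiting one bit period between changes. It assumes the usual UART convention of sending the least significant bit first. The `set_pin` callback is a hypothetical stand-in for whatever GPIO write your platform provides. Python's `time.sleep` is far too imprecise for real line timing, so treat this as a model of the logic rather than production code.

```python
import time


def uart_tx_levels(byte):
    """Line levels, one per bit period, for a single 8N1 frame (LSB first)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    levels = [0]                        # start bit is low
    for i in range(8):
        levels.append((byte >> i) & 1)  # data bits, least significant first
    levels.append(1)                    # stop bit is high
    return levels


def bit_bang_send(byte, set_pin, bit_time_s, sleep=time.sleep):
    """Drive set_pin(level) once per bit period to transmit one frame."""
    for level in uart_tx_levels(byte):
        set_pin(level)
        sleep(bit_time_s)


def uart_rx_byte(levels):
    """Rebuild the byte from ten sampled levels, checking start/stop bits."""
    if len(levels) != 10 or levels[0] != 0 or levels[9] != 1:
        raise ValueError("framing error")
    value = 0
    for i in range(8):
        value |= levels[1 + i] << i
    return value


print(uart_tx_levels(0x41))  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

In a test, `set_pin` can simply be the `append` method of a list and `sleep` a function that does nothing, which records the transmitted levels without touching any hardware.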
Catching the right bus

As you can see, there is a multitude of serial communication buses to choose from. (And we didn't even discuss wireless, networks, Firewire, and USB protocols.) Your choice in a serial bus should not only meet the needs of the product today, but also be available as well as viable for the life of the product. I hope this has helped you decide which serial interface is proper for your current embedded design.

Table 1: Protocol comparison

Name       Sync/Async  Type          Duplex  Max devices  Max speed (Kbps)  Max distance (feet)  Pin count (1)
RS-232     async       peer          full    2            20 (2)            30 (3)               2 (4)
RS-422     async       multi-drop    half    10 (5)       10,000            4,000                1 (6)
RS-485     async       multi-point   half    32 (5)       10,000            4,000                2
I2C        sync        multimaster   half    (7)          3,400             <10                  2
SPI        sync        multimaster   full    (7)          >1,000            <10                  3+1 (8)
Microwire  sync        master/slave  full    (7)          >625              <10                  3+1 (8)
1-Wire     async       master/slave  half    (7)          16                1,000                1

Notes
(1) Not including ground.
(2) Faster speeds available but not specified.
(3) Dependent on capacitance of the wiring.
(4) Software handshaking. Hardware handshaking requires additional pins.
(5) Device count given in unit loads (UL). More devices are possible if fractional-UL receiv[...]
(6) Unidirectional communication only. Additional pins needed for each bidirectional commu[...]
(7) Limitation based on bus capacitance and bit rate.
(8) Additional pins needed for every slave if slave count is more than one.

What is Communication?

Before we move on to serial communication, let's discuss a bit about communication in general. In simple terms, communication is an exchange of ideas between two individuals. Ideas can be anything and in any form: they could be written or spoken words, media like audio or video, or, if you like sci-fi, even telepathy! ;) But what does communication between two microcontrollers mean? It's simple! An exchange of data (bits)!
There are many protocols for communication (which will be discussed later), but all of them are based on either serial communication or parallel communication.

Why do we need Communication?

Let's take an example. As kids, we all must have played with those remote-controlled toy cars and airplanes. It was pretty fun and fascinating at that time. I am sure that most of us at that time didn't try to figure out how it was possible! How could the remote control device in your hand control the car or the aeroplane? Well, of course, the device in your hand sends some data, which is received by the car/aeroplane. There is a microcontroller onboard the toy, which interprets the signals and acts accordingly. Correct! So far so good, but it doesn't end here. As grown-ups, there are a few more questions which should arise! How does the device send the signal? From where is the signal being sent? What is actually being sent? Who receives it? How is it processed?

Let's take another, more common example. You have a file on your mobile and you would like to share it with your friend who is sitting next to you. How would you do it: Bluetooth, IR, NFC, LAN or email? Most people would use Bluetooth. IR is obsolete, NFC is still in its developmental phase and isn't available in most devices, LAN needs a WiFi/LAN network, whereas email requires an active Internet connection. The same questions can be put forth here as well: how is it sent, from where is it sent and to where, what is being sent, and how is it processed?

Well, this is why communication is required! And to answer all those questions, several communication protocols have been developed! Now let's discuss a little about serial and parallel communication.

Serial Communication

Figure: Serial Transfer

In telecommunication and computer science, serial communication is the process of sending/receiving data one bit at a time.
It is like firing bullets from a machine gun at a target: that's one bullet at a time! ;)

Parallel Communication

Figure: Parallel Transfer

Parallel communication is the process of sending/receiving multiple data bits at a time through parallel channels. It is like firing a shotgun at a target, where multiple bullets are fired from the same gun at a time! ;)

Serial vs Parallel Communication

Now let's have a quick look at the differences between the two types of communication.

Serial Communication                                  Parallel Communication
1. One data bit is transceived at a time              1. Multiple data bits are transceived at a time
2. Slower                                             2. Faster
3. Fewer cables required to transmit data             3. Higher number of cables required

Figure: Serial vs Parallel

So these were the basic differences between serial and parallel communication. From the above differences, one would obviously think that parallel communication is far better than serial communication. But wait, these are just the basic differences. Before we proceed further, we need to be acquainted with a few terminologies:

1. Bit Rate: It is the number of bits that are transmitted (sent/received) per unit time.

2. Clock Skew: In a parallel circuit, clock skew is the time difference in the arrival of the clock signal at two sequentially adjacent registers. To explain it further, let us take the gun example again. When, say, around 5 people are firing at the same time, there is bound to be a time difference between the arrival of the bullet from the first shooter and that from the second shooter, and so on. This time difference is what we call clock skew. This is better illustrated in the picture below:

Figure: There is a time lag in the data bits through different channels of the same bus.

Clock skew is inevitable due to differences in the physical conditions of the channels, like temperature, resistance, path length, etc.

3. Crosstalk: The phenomenon by which a signal transmitted on one channel of a transmission bus creates an undesired effect in another channel. Crosstalk is usually caused by undesired capacitive, inductive, or conductive coupling from one circuit, part of a circuit, or channel to another. It can be seen from the following diagram that clock skew and crosstalk are inevitable.

Major Factors Limiting Parallel Communication

Before the development of high-speed serial technologies, the choice of parallel links over serial links was driven by these factors:

1. Speed: Superficially, the speed of a parallel link is equal to bit rate * number of channels. In practice, clock skew reduces the speed of every link to the slowest of all of the links.

2. Cable length: Crosstalk creates interference between the parallel lines, and the effect only magnifies with the length of the communication link. This limits the length of the communication cable that can be used.

These two are the major factors which limit the use of parallel communication.

Advantages of Serial over Parallel

Although a serial link may seem inferior to a parallel one, since it can transmit less data per clock cycle, it is often the case that serial links can be clocked considerably faster than parallel links in order to achieve a higher data rate. A number of factors allow serial to be clocked at a higher rate:

- Clock skew between different channels is not an issue (for un-clocked asynchronous serial communication links).
- A serial connection requires fewer interconnecting cables (e.g. wires/fibers) and hence occupies less space. The extra space allows for better isolation of the channel from its surroundings.
- Crosstalk is less of an issue, because there are fewer conductors in proximity.
- In many cases, serial is a better option because it is cheaper to implement. Many ICs have serial interfaces, as opposed to parallel ones, so that they have fewer pins and are therefore less expensive.
It is because of these factors that serial communication is preferred over parallel communication.

How is Data sent Serially?

Since we already know what registers and data bits are, we will now talk in these terms only. If not, I would recommend that you first take a detour and go through the introduction of this post by Mayank.

When a particular data set is in the microcontroller, it is in parallel form, and any bit can be accessed irrespective of its bit number. When this data set is transferred into the output buffer to be transmitted, it is still in parallel form. The output buffer converts this data into serial data (PISO, Parallel In Serial Out), MSB (Most Significant Bit) first or LSB (Least Significant Bit) first, according to the protocol. Now this data is transmitted in serial mode. When this data is received by another microcontroller in its receiver buffer, the receiver buffer converts it back into parallel data (SIPO, Serial In Parallel Out) for further processing. The following diagram should make it clear.

Figure: Data Transfer in Serial Communication

This is how serial communication works! But it is not as simple as it looks. There is a catch in it, which we will discuss a little later in the same post. For now, let's discuss the two modes of serial data transfer: synchronous and asynchronous.

Serial Transmission Modes

Serial data can be transferred in two modes: asynchronous and synchronous.

Asynchronous Data Transfer

Data transfer is called asynchronous when data bits are not "synchronized" with a clock line, i.e. there is no clock line at all! Let's take an analogy. Imagine you are playing a game with your friend where you have to throw colored balls (let's say we have only two colors: red (R) and yellow (Y)). Let's assume you have an unlimited number of balls. You have to throw a combination of these colored balls to your friend. So you start throwing the balls. You throw R, then R, then Y, then R again and so on.
So you start your sequence RRYR… and then you end your round and start another round. How will your buddy on the other side know that you have finished sending the first round of balls and that you are already sending the second round? He/she will be completely lost! How nice it would be if you both sat together and fixed a protocol that each round consists of 8 balls! After every 8 balls, you will throw two R balls to ensure that your friend has caught up with you, and then you start your second round of 8 balls. This is what we call asynchronous data transfer.

Asynchronous data transfer has a protocol, which is usually as follows: the first bit is always the START bit (which signifies the start of communication on the serial line), followed by DATA bits (usually 8 bits), followed by a STOP bit (which signals the end of the data packet). There may be a Parity bit just before the STOP bit. The Parity bit was earlier used for error checking, but is seldom used these days. The START bit is always low (0) while the STOP bit is always high (1). The following diagram explains it.

Figure: Asynchronous Data Transfer Timing Diagram

Synchronous Data Transfer

Synchronous data transfer is when the data bits are "synchronized" with a clock pulse. We will take the same analogy as before. You are still playing the throw-ball game, but this time you have set a timer on your watch such that it beeps every minute. You will not throw a ball unless you hear a beep from your watch. As soon as you hear a beep, both you and your friend know that you are going to throw a ball. Both of you can keep track of time using this; say you start a new round after every 8 beeps. Isn't it a much better approach? This approach is what we call synchronous data transfer.
The concept of synchronous data transfer is simple: data bit sampling (or in other words, 'recording') is done with respect to clock pulses, as you can see in the timing diagrams. Since data is sampled on clock pulses, and since the clock sources are very reliable, there is much less error in synchronous transfer than in asynchronous transfer.

Figure: Synchronous Data Transfer Timing Diagram

Serial Communication Terminologies

Now it's time to learn some new words, which we will use frequently in the next few posts. There are many terminologies, or 'keywords', associated with serial communication. We will discuss all of them one by one:

1. MSB/LSB: This stands for Most Significant Bit (or Least Significant Bit). You can refer to this post by Mayank for more information on MSB and LSB. Since data is transferred bit by bit in serial communication, one needs to know which bit is sent out first: MSB or LSB.

2. Simplex Communication: In this mode of serial communication, data can only be transferred from transmitter to receiver and not vice versa.

3. Half Duplex Communication: This means that data transmission can occur in only one direction at a time, i.e. either from master to slave, or from slave to master, but not both.

4. Full Duplex Communication: This means that data can be transmitted from the master to the slave, and from the slave to the master, at the same time!

Figure: Types of Transmission

5. Baud Rate: According to Wikipedia, baud is synonymous with symbols per second or pulses per second. It is the unit of symbol rate, also known as baud or modulation rate. However, though technically incorrect, in the case of modem manufacturers baud commonly refers to bits per second.

Importance of Baud Rate

For two microcontrollers to communicate serially they should have the same baud rate, else serial communication won't work.
This is because when you set a baud rate, you direct the microcontroller to transmit/receive the data at that particular rate. So if you set different baud rates, the receiver might miss the bits the transmitter is sending (because it is configured to receive and process data at a different speed!).

Different baud rates are available for use. The most common ones are 2400, 4800, 9600, 19200, 38400, etc. In practice you pick one of these standard values rather than an arbitrary rate. Please note that on these links each symbol carries one bit, so the baud rate is numerically equal to the bit rate in bps (bits per second).

The Catch in Serial Communication

Now it's all clear to you. You have data. You decide how to send your data (synchronous/asynchronous). You send your data by following proper protocols. The transmitter converts your parallel data to serial, sends it across the channel, then the receiver converts your serial data to parallel. Bingo!

But that's not sufficient for proper serial communication. There are two things which still need to be taken care of:

1. Baud Rate: Unless the baud rates of the transmitter and the receiver are the same, serial communication cannot work. The reason is given in the previous section.

2. Address: If you are trying to send multiple data together over the same channel and/or you are sharing the same channel with other users sending their own data, then you need to take care to properly address your data. We won't discuss it in this post, but we will surely discuss it in one of our upcoming posts.

These two are the main reasons for an unsuccessful serial link. If you take care of them, your serial communication will be established and your data will go through properly.

UART and USART

UART stands for Universal Asynchronous Receiver Transmitter, whereas USART stands for Universal Synchronous Asynchronous Receiver Transmitter.
They are basically just a piece of computer hardware that converts parallel data into serial data (and back). The only difference between them is that UART supports only asynchronous mode, whereas USART supports both asynchronous and synchronous modes. Unlike Ethernet, Firewire etc., there is no specific port for UART/USART. They are commonly used in conjunction with electrical standards such as RS-232 and RS-485 (RS-232, for example, does specify its own connector).

In synchronous transmission, the clock data is recovered separately from the data stream and no start/stop bits are used. This improves the efficiency of transmission on suitable channels, since more of the bits sent are usable data and not character framing.

The USART has the following components:

- A clock generator, usually running at a multiple of the bit rate to allow sampling in the middle of a bit period
- Input and output shift registers
- Transmit/receive control
- Read/write control logic
- Transmit/receive buffers (optional)
- Parallel data bus buffer (optional)
- First-in, first-out (FIFO) buffer memory (optional)

Serial Communication Protocols

A variety of communication protocols based on serial communication have been developed in the past few decades. Some of them are:

1. SPI (Serial Peripheral Interface): It is a three-wire communication system: one wire each for master-to-slave and slave-to-master data, and one for clock pulses. There is an additional SS (Slave Select) line, which is mostly used when we want to send/receive data between multiple ICs.

2. I2C (Inter-Integrated Circuit): Pronounced eye-two-see or eye-square-see, this is a synchronous, addressable two-wire bus. Transmission speeds are 400Kbps in fast mode and up to 3.4Mbps in high-speed mode. The I2C bus has two wires: one for clock, and the other is the data line, which is bidirectional. This is the reason it is also sometimes (not always, there are a few conditions) called Two Wire Interface (TWI). It was invented by Philips.

3. FireWire: Developed by Apple, these are high-speed buses capable of audio/video transmission. The bus contains a number of wires depending upon the port, which can be a 4-pin, a 6-pin, or a 9-pin one.

4. Ethernet: Used mostly in LAN connections, the bus consists of 8 lines, or 4 Tx/Rx pairs.

5. Universal Serial Bus (USB): This is the most popular of all. It is used for virtually all types of connections. The bus has 4 lines: VCC, Ground, Data+, and Data-.

Figure: USB Pins

6. RS-232 (Recommended Standard 232): RS-232 is typically connected using a DB9 connector, which has 9 pins, out of which 5 are input, 3 are output, and one is Ground. You can still find this so-called "Serial" port in some old PCs.

In our upcoming posts, we will mainly discuss RS-232 and the USART of AVR microcontrollers.
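
Returning to the baud rate requirement discussed earlier, the following sketch estimates how far a receiver's sampling point drifts when its baud rate differs from the transmitter's. It is an idealized model under stated assumptions: the receiver resynchronizes exactly on the leading edge of the START bit, aims for the middle of each bit using its own clock, and a frame is 10 bits long (8N1). The function names are illustrative, and real UARTs add further error sources such as oversampling granularity.

```python
def max_sample_drift(tx_baud, rx_baud, bits_per_frame=10):
    """Worst sampling offset within one frame, in units of a transmitter bit.

    0.0 means every sample lands exactly mid-bit; at 0.5 or more the
    sample has slipped out of the bit it was meant to read.
    """
    tx_bit = 1.0 / tx_baud
    rx_bit = 1.0 / rx_baud
    worst = 0.0
    for n in range(bits_per_frame):
        ideal_time = (n + 0.5) * tx_bit   # true middle of bit n
        sample_time = (n + 0.5) * rx_bit  # where the receiver samples
        worst = max(worst, abs(sample_time - ideal_time) / tx_bit)
    return worst


def frame_is_readable(tx_baud, rx_baud, bits_per_frame=10):
    """True if every sample still falls inside its intended bit."""
    return max_sample_drift(tx_baud, rx_baud, bits_per_frame) < 0.5


print(frame_is_readable(9600, 9600))   # True
print(frame_is_readable(9600, 19200))  # False
```

In this model a mismatch of about 2% (9600 against 9800) still reads correctly, while the drift accumulated by the last bit of the frame breaks the link once the mismatch grows to several percent.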