RAPID PROTOTYPING OF EMBEDDED SYSTEMS

USING

FIELD PROGRAMMABLE GATE ARRAYS

Summa Cum Laude Thesis

Bhavya Daya

Bachelor of Science in Electrical Engineering

Bachelor of Science in Computer Engineering

Spring 2009

© 2009 Bhavya Daya

To:

God for granting me patience

My mom, dad and brother for their unwavering support

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Professor Herman Lam, for his assistance throughout the honors research, Professor Eric Schwartz for obtaining the Xilinx development board for the project, and Professor Ann Gordon-Ross and Professor Prabhat Mishra for being members of my supervisory committee.

I would also like to thank Mr. Steve Permann, student advisor, for his guidance and support throughout my undergraduate studies at the University of Florida.

Table of Contents

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER 1

INTRODUCTION

What is an Embedded System?

Design Considerations when Developing an Embedded System

Importance of Rapid Prototyping of Embedded Systems using FPGAs

Scope of The Project

Outline of Chapters

CHAPTER 2

EMBEDDED SYSTEMS DESIGN

Embedded Systems Design Flow

Three Generations of Embedded System Design

Trends affecting Embedded System Design

Overview of Embedded System Hardware and Software

CHAPTER 3

EMBEDDED SYSTEM HARDWARE

Peripherals

Processor

Microcontroller-Based

ASIC-Based

DSP Processor-Based

FPGA-Based

Memory

CHAPTER 4

EMBEDDED SYSTEM SOFTWARE

Intellectual Property

Stages of Software Development

Embedded Operating System

Xilinx and Altera Software Tools

CHAPTER 5

RAPID PROTOTYPING OF EMBEDDED SYSTEMS

Rapid System Prototyping

Prototyping of Embedded Hardware and Software Systems

CHAPTER 6

BOARD-LEVEL RAPID PROTOTYPING OF EMBEDDED SYSTEMS

Board Level Prototyping Methodology

Prototyping Platforms using FPGAs

Altera DE2 Development and Education Board

Xilinx FX12 PowerPC and MicroBlaze Embedded Development Kit

CHAPTER 7

EMBEDDED SYSTEM DEVELOPMENT

Embedded System Design

Altera DE2 Board

USB and Embedded Operating System

Choosing an Embedded Operating System

uCLinux Operating System

Porting uCLinux to Nios II Processor and Cyclone II FPGA

Means of Implementing Photo Frame Application Using uCLinux

Design of Application Software

Porting Application to Nios II processor

SD Card and Nios Embedded Processor

Research of IP Cores

Nios II Hardware Design

SD Card Interface

VGA Interface

SRAM Controller

JPEG Decoder

Application Software Design

Xilinx PowerPC and MicroBlaze Development Kit FX12 Edition

Choosing an Embedded Processor

MicroBlaze Processor

PowerPC Processor

Research IP Cores Available

Compact Flash Interface

VGA Interface

Embedded Processor Hardware Design

Embedded Processor Software Design

Embedded System Implementation

Altera DE2 Board

USB and Embedded Operating System

SD Card and Nios Embedded Processor

Xilinx PowerPC and MicroBlaze Development Kit FX12 Edition

Altera DE2 and Xilinx FX12

CHAPTER 8

FUTURE AND SIGNIFICANCE OF EMBEDDED SYSTEMS

CHAPTER 9

CONCLUSION

APPENDIX

SD_Card.h

Xsysace_selftest_example.c

LIST OF REFERENCES

LIST OF TABLES

Table 1: Comparison of Embedded Processor Cores within FPGAs

Table 2: FPGA comparison to ASIC

Table 3: Intellectual Property Added at Different Phases

Table 4: Common Embedded Operating Systems and Applications

Table 5: Demand for Rapid System Prototyping

Table 6: Nios II CPU Cores and Key Features

Table 7: PowerPC Processor Features

Table 8: Embedded Processor Design Checklist [36]

Table 9: Comparison of Operating Systems for Altera Development Boards [15]

Table 10: SPI Commands [20]

LIST OF FIGURES

Figure 1: Time-to-market and market window

Figure 2: Design Cycles for FPGAs and ASICs

Figure 3: Hardware Software Partitioning and Co-design [5]

Figure 4: Compression Technique Hardware Changes

Figure 5: Modern Embedded System Components on a Single Chip

Figure 6: Processor Type versus Increasing Flexibility

Figure 7: FPGA Underlying Fabric [5]

Figure 8: Embedded RAM and Multipliers [5]

Figure 9: Hardcore processor within a FPGA [5]

Figure 10: Main Memory Options

Figure 11: Intellectual Property Incorporation into the FPGA Design Cycle [5]

Figure 12: Real-time kernel (left) vs. general-purpose operating system (right) [3]

Figure 13: Debug Information View in Nios II IDE

Figure 14: Incremental Development Model [9]

Figure 15: Prototyping Design Cycle

Figure 16: DE2 Development Board [10]

Figure 17: DE2 Development Board Peripherals and FPGA [10]

Figure 18: JTAG Programming of Cyclone II FPGA

Figure 19: Active Serial Programming of Cyclone II FPGA

Figure 20: Cyclone FPGAs

Figure 21: Nios II Processor Core

Figure 22: Avalon Switch Fabric

Figure 23: ML403 Development Board [7]

Figure 24: ML403 Board and Virtex 4 FPGA Connections

Figure 25: Different Methods of Programming of Virtex 4 FPGA

Figure 26: MicroBlaze Processor

Figure 27: PowerPC Processor Architecture

Figure 28: Application Design and Implementation Choices

Figure 29: Hardware Design – High Level

Figure 30: Hardware Design – SD Card

Figure 31: SD Card Connected to FPGA [17]

Figure 32: SPI Command Structure [20]

Figure 33: Initialization of Card into Different Modes [20]

Figure 34: Block diagram of VGA Core [18]

Figure 35: VGA monitor with 640 columns × 480 rows. [18]

Figure 36: Horizontal and vertical synchronization signals timing diagram [18]

Figure 37: VGA Controller Circuit

Figure 38: Software Design Flowchart

Figure 39: Hardware View of ML403 Embedded MicroBlaze System [24]

Figure 40: Hardware View of ML403 Embedded PPC405 System [23]

Figure 41: Compact Flash 50 Pin Female Connector [26]

Figure 42: System ACE Controller Block Diagram [25]

Figure 43: Embedded Processor Hardware Design

Figure 44: Software Design Flowchart

Figure 45: Nios II Processor Hardware Components

Figure 46: uCLinux Operating System Running on DE2 Board

Figure 47: Hardware and Software Development in Xilinx Platform Studio

Figure 48: Embedded Processor Hardware Implementation

Figure 49: Transition to System on Silicon or System on Chip

Figure 50: Rapid System Prototyping and Rapid Application Development

ABSTRACT

The objectives of the project were to review developments in embedded system design and future trends, and to explore board-level rapid prototyping using FPGAs. The embedded system design flow consists of many important steps, each of which is essential to achieving a functioning final product within the allocated design time. Making appropriate decisions requires knowledge of embedded system hardware and software. The embedded hardware decisions are analyzed in terms of processor, memory, and peripheral requirements and limitations. The embedded software process is reviewed, along with the considerations the designer must make. The rapid prototyping strategy and the board-level prototyping method are described as a significant piece of the embedded system design flow. As a complement to simulation, prototyping provides functional and performance verification. The development platforms that were researched are the Altera DE2 board and the Xilinx MicroBlaze and PowerPC FX12 Development Kit. A test application was designed following the principles of embedded system design and development. The future of embedded systems is shifting from system on board to system on silicon, and designers have to take different limitations into account. With many-core processors emerging, designers need to be able to reach peak performance by utilizing the full potential of many cores.

CHAPTER 1

INTRODUCTION

The development of an embedded system involves many stages and decisions. The decisions are based on the application and on the standard challenges posed when developing any embedded system. Before development, it is vital to understand what an embedded system is. The term is used frequently without much thought given to its definition. Once a product idea is established, the stages of development depend on the product. Is the product an embedded system? If it is not, different design considerations and stages of development are followed. The embedded system market imposes many time constraints. The product should be deployed while demand for it still exists; if the demand ceases, the product will not yield any profit for the company. Rapid prototyping is needed in order for the product to be released within the allotted time frame. The steps of embedded system design and development are outlined and applied to a test application in later chapters.

What is an Embedded System?

An embedded system is a set of circuitry that is lodged within another device. The presence of the internal computer or system is not immediately obvious, but the embedded system market is the fastest growing portion of the computer market. Embedded devices range from everyday devices to advanced systems used for complex applications. A more formal definition is that an embedded system is a digital system with at least one processor that implements a hardware function that is part or all of the digital system. The processor used in an embedded system is called an embedded processor.

An embedded system usually performs a single function. Some systems, such as PDAs, are programmable and provide a couple of different functions.

Applications of embedded systems can be broken down into four types: signal processing, mission critical, distributed control, and small systems. Signal processing could encompass all embedded systems, but here its scope is limited to radar, sonar, and real-time video applications. Mission critical systems include avionics, spacecraft control, and nuclear plant control. Distributed control systems consist of large networks, routers, and transit systems. Small systems are what usually come to mind when embedded systems are mentioned, but it is important not to forget the other types. The most well-known small embedded system today is the cell phone. Other examples are digital cameras, sensors, and mp3 players. The future and trends of embedded systems are discussed in a later chapter.

Compared to desktop and server systems, embedded systems span a wider range of processing power. Unlike desktop systems, embedded systems are tightly constrained in price, which is a key factor when designing products for this part of the computer market. The typical characteristics of an embedded system are as follows.

1. Designed to perform a single or application-specific task, rather than multiple tasks. Many embedded systems consist of small parts that fit within a larger device. The larger device could be a general purpose system. An embedded system is usually part of a larger system.

2. Many embedded systems have real-time constraints. The design requirements vary by application, but usually power, cost, reliability and performance are emphasized. The amount of heat produced by the device may be of importance. The weight of the device should be minimized for most embedded system applications.

3. The embedded system should not cease operation. This is a far-fetched goal, but the power usage and battery life should be managed appropriately. Reducing power usage greatly increases battery life, so the system operates for a longer period of time.

4. Embedded systems usually interact with the outside world through LCD displays, speakers, keyboards, and other visual and auditory signals. These interactions allow users to operate the system and to issue commands.

5. Although embedded systems are application specific, some degree of re-programmability is desired and essential. Re-programmability assists when upgrading devices: it is much easier to change the software slightly than to develop the entire hardware from scratch.

6. The program written for an embedded system, the firmware, is stored in a limited amount of memory. Designers need to consider the limited memory and computer hardware resources when developing embedded systems.

The main goals when designing an embedded system are to minimize memory and power usage. The cost of the device decreases when these parameters are optimized. Design tradeoffs are also encountered; they are discussed in the next section.

Design Considerations when Developing an Embedded System

Embedded systems are found in every industry, from aerospace to consumer applications. With new advances in embedded systems design, more complex applications can be implemented. During the development of an embedded system, certain process models are followed. These models usually include the development of a working prototype of the final system. Embedded systems are single-function systems that are tightly constrained by power and cost, and are reactive and real-time.

Embedded problems can be solved using different approaches. Approaches that are used in practice are as follows.

1. The designer can use a combined hardware/software approach that contains some custom hardware and an embedded processor core integrated within the custom hardware.

2. The designer can create custom software that runs on an off-the-shelf embedded processor.

3. The designer can use a processor other than a general purpose embedded processor, such as a digital signal processor, together with custom software.

Since embedded systems usually perform a single function, an Application Specific Integrated Circuit (ASIC) is usually used in the final product. When designing an embedded system, many design challenges emerge. These challenges determine the type of chip that will be used. In order to design a near-optimal system, the following need to be considered in addition to the functionality and safety of the system.

1) Cost

2) Performance

3) Power

4) Maintainability

5) Size

6) Time-to-Market

Many embedded systems have substantially different design constraints than desktop computing applications. A single characterization cannot apply to the diverse spectrum of embedded systems, and the considerations are weighed differently depending on the type of application and its consumers.

The cost of the embedded system is a very important factor during the design process. Both the affordability of the product for consumers and the profit that the device can generate matter while designing. Cost is weighed according to the application at hand and varies with the product requirements. Performance is a factor that is always considered. An embedded system should complete its functions quickly and accurately. High performance is especially emphasized in many embedded systems, because the people using them expect the functions of the system to be optimal. Low power is an important requirement for embedded systems. Embedded systems usually run on batteries and should last a long time before those batteries need to be changed. An ultra-low power design is needed for long-term battery operation.
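The link between power usage and battery life can be made concrete with a first-order estimate. The following is a minimal sketch, not part of the project described here: it assumes an ideal battery (no self-discharge or conversion losses) and a system that alternates between an active state and a sleep state. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
def average_current_ma(active_ma, sleep_ma, duty_cycle):
    """Time-weighted average current for a system that is active for
    `duty_cycle` (0..1) of the time and asleep for the rest."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma


def battery_life_hours(capacity_mah, avg_current_ma):
    """Ideal battery life: capacity divided by average current draw."""
    if avg_current_ma <= 0:
        raise ValueError("average current must be positive")
    return capacity_mah / avg_current_ma


if __name__ == "__main__":
    # Hypothetical figures: a 1000 mAh battery, 20 mA when active,
    # 0.01 mA when asleep.
    capacity_mah = 1000.0
    active_ma, sleep_ma = 20.0, 0.01
    for duty in (1.0, 0.1, 0.01):
        avg = average_current_ma(active_ma, sleep_ma, duty)
        hours = battery_life_hours(capacity_mah, avg)
        print(f"duty {duty:>5.2f}: {avg:7.4f} mA average, {hours:9.1f} h")
```

Under these assumed numbers, a system that is always active lasts 50 hours, while the same hardware sleeping 99% of the time lasts several thousand hours, which is why duty cycling is a common route to ultra-low power designs.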

In many cases embedded systems must be repairable within a few minutes to a few hours, which implies that spare components and maintenance personnel must be located close to the system. A fast repair time may also imply that extensive diagnosis and data collection capabilities must be built into the system, which may conflict with the goal of keeping production costs low. A system self-test can be built into the design to lower the maintenance and diagnosis costs that might be incurred later.

Typically, embedded computers are physically located within some larger device or casing. Therefore, their shape and size may be dictated by the space available and the connections to the mechanical components. Time-to-market (TTM) is the length of time from the product idea conception until it is available for sale. TTM is important in industries where products are outdated quickly, such as the technology industry. The market window, shown in Figure 1, is crucial to deploying a product in the embedded systems technology industry. The typical TTM is eight months. The company needs to deploy when the peak revenue can be attained.
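The cost of missing the market window can be estimated with a simplified triangular market model of the kind often used in embedded systems textbooks. This model is an assumption introduced here for illustration and is not taken from the project: demand is assumed to rise linearly until a peak at time W and then fall linearly to zero at time 2W, and a product that enters D time units late is assumed to rise at the same rate but from a later start. Revenue is the area under each triangle, so the fraction of revenue lost is D(3W - D) / (2W^2). The function name and example values are hypothetical.

```python
def revenue_loss_fraction(peak_time, delay):
    """Fraction of revenue lost by entering the market `delay` units late.

    Simplified triangular model (an illustrative assumption):
    on-time revenue is the area of a triangle with base 2 * peak_time
    and height proportional to peak_time; a delayed entry yields a
    smaller triangle with base 2 * peak_time - delay and height
    proportional to peak_time - delay.
    """
    if peak_time <= 0:
        raise ValueError("peak_time must be positive")
    if not 0 <= delay <= peak_time:
        raise ValueError("delay must be between 0 and peak_time")
    on_time_area = 0.5 * (2 * peak_time) * peak_time
    delayed_area = 0.5 * (2 * peak_time - delay) * (peak_time - delay)
    return (on_time_area - delayed_area) / on_time_area


if __name__ == "__main__":
    # Hypothetical product life of 52 weeks, so the peak is at week 26.
    peak_week = 26
    for delay_weeks in (0, 4, 10):
        loss = revenue_loss_fraction(peak_week, delay_weeks)
        print(f"delay {delay_weeks:2d} weeks: {loss:.1%} of revenue lost")
```

In this model, with an assumed 52-week product life, a 4-week delay costs roughly 22% of the revenue and a 10-week delay roughly 50%, which illustrates why a short time-to-market matters so much.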

Importance of Rapid Prototyping of Embedded Systems using FPGAs

The significance of rapid prototyping of embedded systems can best be explained by briefly reviewing the trends seen in the embedded system development process.

Figure 1: Time-to-market and market window

The first trend is that the life cycle of embedded products is becoming increasingly shorter. This leads to new developments taking place more frequently to replace outdated products. The second trend is that the complexity of embedded systems is rapidly increasing. With this increase in functionality and complexity, the embedded system design cycle may become longer and require more time and manpower. The consumer’s demand for increasing functionality translates directly into increased complexity of the embedded system on a chip.

There exists a complexity gap between the application requirements and the capabilities of current silicon technologies. Real-world system-on-chip (SOC) complexities lag behind the capabilities of the silicon hardware even though the demand for high-complexity functionality is increasing tremendously. The tools needed to exploit the hardware fully have not yet been developed. Rapid prototyping of embedded systems may alleviate the complexity gap problem and assist with the current trends in the embedded system market. Rapid system prototyping allows designers to explore design alternatives and to uncover design errors as early as possible, given the short development period. The short time-to-market window of embedded systems greatly benefits from the rapid development of prototypes.

Rapid system prototyping is especially useful when new hardware and software developments are being researched. The development model for embedded system creation should include a prototyping phase for feasibility studies and for testing of the final product. The development model should lead to a quick and well-tested product. With rapid prototyping the product can be developed, tested, and deployed more easily.

The devices used vary according to the requirements of the application and the degree to which the design challenges are satisfied. One device with a significant impact on embedded systems is the Field Programmable Gate Array (FPGA). FPGAs affect both the prototyping phase of development and final product development. The prototyping of embedded systems using FPGAs is considered throughout this thesis. Embedded systems can be developed using microcontrollers, microprocessors, ASICs and FPGAs. These methods of implementation usually require hardware to be designed and built. An alternative is to use a board-based system. The main advantage of the board-based method is the reduced workload involved in development. Therefore, a board-based embedded system design can be developed rapidly and is ideal for prototyping.

A key question is why FPGAs should be used instead of microprocessors, microcontrollers and ASICs. Each technology, as it applies to embedded systems, is discussed in detail later. Microprocessors and microcontrollers are already being applied in many systems. FPGAs and ASICs can be placed along a spectrum that ranges from configurable to “frozen in silicon.” The functionality of an FPGA can be customized in the field. An ASIC cannot be changed once a certain point in the design process has passed. The disadvantage of ASICs is that designing and building the device is very time-consuming and expensive. The final design created for the ASIC cannot be modified without going through the long development process again.

FPGAs are of great interest for prototyping a system because of their short system development time. The design flow for each device is shown in Figure 2. The development of a prototype should be efficient so that the final product can be marketed quickly. Therefore, an FPGA can be used for the prototype and an ASIC can be used for the final product.

It may seem that ASICs are the riskiest of the available silicon implementation choices. Traditional cell-based ASICs can be expensive to design, manufacture, and change. Many view this inflexibility as the biggest reason to avoid ASICs. But the ASIC problem has been greatly exaggerated and misunderstood. ASICs have been portrayed in the market as expensive, unreliable, and unpredictable. This is true for very high-complexity ASICs (e.g., 10 million gates at 90 nm), but it is not true for most ASIC projects.

Since the development of the embedded system contains many challenges, the most prominent being time, a rapid prototype is essential to evaluate the practical necessity and functionality of the system.

Figure 2: Design Cycles for FPGAs and ASICs

Due to their efficient design cycle, FPGAs are considered for developing rapid prototypes during the embedded system design process. Alongside the use of prototypes, it is also important that simulations be performed to further speed up the design process. Simulations are useful even before a prototype of the device or system is developed. It is preferable that both be used during development.

Scope of The Project

The objectives of the project were to review developments in embedded system design and future trends, and to explore board-level rapid prototyping using FPGAs.

Although board-based system design is advantageous, it also has the disadvantages of cost and of restrictions on the functionality that is available. The use of boards in the prototyping stage leads to a quick prototype of the system for evaluation and testing. The final design may not be a board-based design because of the constraints on the final product, such as power, memory, and performance.

Outline of Chapters

The embedded system design flow was briefly mentioned in this introductory chapter. Chapter 2 contains an in-depth description of the embedded system design flow and its considerations. Chapters 3 and 4 contain more information about the hardware and software aspects of the embedded system. The design flow requires the designer to make appropriate decisions with regard to the embedded processor, memory, peripherals, tools, compiler, and so on. These chapters provide the details necessary to make these decisions in an informed way. Chapter 5 continues the discussion of rapid prototyping and its benefits. Chapter 6 moves on to board-level rapid prototyping, its benefits, and its drawbacks. Knowing the different platforms and the processor, peripheral, and memory capabilities of each board greatly assists in choosing the correct platform for the application. Chapter 7 applies the concepts to an application and gives feedback on the design process and its usefulness. Chapter 8 concludes with the future and long-term trends in embedded system design and development.

CHAPTER 2

EMBEDDED SYSTEMS DESIGN

Embedded systems are very diverse and one particular approach does not easily apply to all. When a successful embedded system is developed, the lessons learned when developing that system cannot be generalized to all embedded systems. The design steps emphasized in this section are used to develop a good embedded system design, but it may not be optimal. The tools for creating an optimal design have not been developed yet, and with the short design time allotted it is difficult to even create a good design.

The general design flow taken by all embedded system designers is outlined. The three generations of embedded system design depict the trend taking place in design practices. Design practices are moving away from the specificity that implementation platforms and programming languages offer and more toward greater generality. [1]

Besides the general design flow, other methodologies have been developed. Some of these general methodologies are discussed in detail in order to understand the shift taking place in embedded systems design.

Embedded Systems Design Flow

The embedded system design flow consists of the following steps: modeling, refining, hardware-software partitioning, scheduling, and mapping. Modeling is the process of designing the system and experimenting with algorithms involved in the embedded application. During the initial stage of the design, product planning and requirements engineering are performed. Virtual prototypes and “mock-models” are used to explore the functional and software specification with the client. It is better to adjust the design specification early in the design process to ensure the customer’s needs are met.

The application design is further refined into smaller pieces during the refining phase, also called the partitioning phase. The pieces interact to perform the required function. Hardware-software partitioning separates the pieces into hardware and software units. Each piece of the application can be implemented either in custom hardware or in software running on a programmable processor. The crucial aspect of this step is the co-design and joint optimization of hardware and software. The next step is the scheduling of functions. Several sets of instructions may need to access the same hardware; therefore the scheduling has to be accurate for correct functionality. The mapping phase is the last part of the design flow. It involves the mapping of the functional description into software that runs on a processor and/or custom or semi-custom hardware.

Embedded system design can be broken down into two main parts, hardware and software. The hardware aspect of the design is implemented using hardware packages, hardware description language programs, and/or gates. The software aspect deals with the high-level C or C++ program that performs the sequence of steps necessary for the system to operate as specified. The decision of how to separate the design into software and hardware parts is known as hardware/software partitioning. This is a difficult task when using FPGAs because it is not apparent which modules should be implemented in hardware and which in software. For other embedded system technologies, where the hardware is fixed, the hardware/software partitioning step is not necessary.

Figure 3 displays the design flow as it partitions into the hardware and software aspects of the embedded system design. One of the main partitioning criteria is the speed of the individual functions composing the entire system. If the logic is in the picosecond and nanosecond range, the FPGA fabric implements it. If the logic is in the microsecond range, implementation can be performed in hardware as well as software. For millisecond logic, implementation in software is easier to accomplish than in hardware, because the hardware would have to be slowed down to implement this type of function. The majority of the hardware/software partitioning decisions are made when the function’s speed allows the flexibility of implementation in either software or hardware.
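The speed criterion above can be sketched as a small decision helper. This is only an illustration: the function name `suggest_partition` and the exact threshold values (1 microsecond and 1 millisecond) are assumptions chosen to match the picosecond/nanosecond, microsecond, and millisecond ranges mentioned in the text, not values taken from a specific design.

```python
def suggest_partition(response_time_s: float) -> str:
    """Suggest an implementation domain from a function's required response time.

    Thresholds are illustrative assumptions, not fixed rules.
    """
    if response_time_s <= 0:
        raise ValueError("response time must be positive")
    if response_time_s < 1e-6:
        # Picosecond to nanosecond range: only the FPGA fabric is fast enough.
        return "hardware"
    if response_time_s < 1e-3:
        # Microsecond range: hardware or software can both meet the timing.
        return "either"
    # Millisecond range and slower: software is the easier choice.
    return "software"
```

For example, a 5 ns function would be suggested for hardware, while a 2 ms function would be suggested for software. The real partitioning decisions concentrate on the "either" case.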

Considerable research is being performed with regard to the hardware-software partitioning problem. Many methods, ranging from highly mathematical and theoretical to highly practical, are being developed. The importance of the partitioning will be further emphasized when a way to create an optimal partition is found. Hardware-software partitioning research focuses on partitioning as applied to a specific set of embedded systems; a general methodology is not present.

The steps discussed are a high-level overview of the processes required to develop a completed embedded system. Aside from the five steps, other decisions, not mentioned, need to be made. The decision of which hardware to use is made even before the design is refined and partitioned between hardware and software.

Figure 3: Hardware Software Partitioning and Co-design [5]

The choice of hardware limits the implementation, and thus different decisions need to be made along the design flow. As mentioned before, the hardware/software partitioning step can be eliminated depending on the hardware used for the system design. The integration of hardware and software seems simple, but in many cases it can prove to be quite difficult.

A prototype is any form of specification, hardware and/or software that is built and designed for evaluation purposes. The prototype should be executable, and a rapid prototype should be easier and faster to develop than the final product. A prototype is a way for the designers to evaluate the product idea by collecting data and obtaining feedback, unveil any deviations from the requirements developed initially with the stakeholders, and improve on any flaws in the design.

The iterative steps of prototyping and simulation are essential when developing an embedded system. It may seem that once a prototype is developed, the need for simulations diminishes, but simulations are crucial to the design process. Simulations allow the evaluation of the design, both hardware and software, before the prototype is even developed. Before rapid prototyping was used, simulations were extremely important because the schedule did not allow for many errors. If an error occurred while developing a prototype, adjusting and recreating the prototype was very time-consuming. Simulations help ensure that the developed prototype functions correctly and that the time-to-market goal can be met.

Formal methods in embedded design have not been properly established, and much research remains to be done. In [2] formal design methods are labeled the “unexplored frontier of embedded system design.” The author expects the need for formal design methods in embedded system design to accelerate within five years. In many cases it is not essential for the embedded system design to be optimized for power or throughput; instead, business-driven and life-cycle factors influence the design decisions.

Three Generations of Embedded System Design

Originally, the design created for an embedded system relied on the implementation tools that would be used. The evolution of embedded systems shows a separation taking place between the design and the implementation details.

The first generation consists of language-based and synthesis-based designs. [1] Language-based designs relate to the software aspect of the design process and synthesis-based designs relate to the hardware aspect of the design process. The particular programming language is mainly considered when designing using the language-based approach. The programming language is dependent on the target system. A synthesis-based approach is taken by first developing a system description in a form easily transferable to a hardware description language. The system description is used to develop the embedded system.

The second generation begins the separation of the design and implementation levels early in the design process. This allows independence from the execution platform. Implementation technologies have emerged in this generation which can be applied to different platforms. SystemC combines synchronous hardware semantics with asynchronous execution mechanisms from software (C++). The implementation requires hardware-software partitioning.

The third generation is based on modeling languages. [1] Modeling languages allow the next step in achieving design independence from the platform. Examples of modeling languages are the Unified Modeling Language (UML) and Architecture Analysis and Design Language (AADL). The languages are independent of the programming language and platform that will be used for implementation.

The goal in any model-based design approach is to describe system components within a modeling language that does not commit the designer early on either to a specific execution and interaction semantics or to specific implementation choices. The current general methodology is a model-based approach. Systems engineering methodologies are either critical or best effort. Critical methods try to guarantee system safety at all costs, even when the system operates under extreme conditions. Best-effort methods try to optimize system performance (and cost) when the system operates under expected conditions. The critical approach views design as a constraint-satisfaction problem, and the best-effort approach views it as an optimization problem.

Meeting hard constraints and making the best possible use of available resources work against each other. Critical and best-effort engineering do not easily coincide. Critical systems engineering can lead to the underutilization of resources, best-effort systems engineering to temporary unavailability. The gap between the two approaches will continue to widen as the uncertainties in embedded systems design increase. [1]

Embedded systems are becoming more widespread with a range of applications. Because of this, the design environments are not well known and depend largely on the application. The behavior cannot be accurately predicted, and a disparity is seen between the worst-case and expected behaviors. Considerable progress has been seen in VLSI design with multi-core architectures, pipelines, and speculative execution. Embedded systems are being developed on these sophisticated hardware architectures.

As the gap between critical and best-effort designs increases, partitioned architectures are likely to become more prevalent. Partitions physically separate critical and noncritical system parts, letting each run in dedicated memory space during dedicated time slots. Partitioning the system into parts allows each part to be optimized appropriately.
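The idea of dedicated time slots can be sketched as a fixed cyclic schedule. The partition names, slot lengths, and the `partition_at` helper below are hypothetical and chosen only for illustration; a real partitioned system would enforce the slots (and the memory separation) in hardware or in the operating system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    partition: str  # hypothetical partition name
    length_ms: int  # length of the dedicated time slot in milliseconds


# Illustrative schedule: one critical partition and two noncritical ones.
SCHEDULE = [
    Slot("critical_control", 6),
    Slot("noncritical_ui", 3),
    Slot("noncritical_logging", 1),
]
MAJOR_FRAME_MS = sum(slot.length_ms for slot in SCHEDULE)


def partition_at(time_ms: int) -> str:
    """Return the partition that owns the processor at the given time."""
    offset = time_ms % MAJOR_FRAME_MS  # the schedule repeats every major frame
    for slot in SCHEDULE:
        if offset < slot.length_ms:
            return slot.partition
        offset -= slot.length_ms
    raise AssertionError("unreachable: offset always falls inside the frame")
```

Because the critical partition always receives the same slot, its timing does not depend on how the noncritical partitions behave.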

Although a general design method is good practice for embedded system designers, many designers still use the platform-based design methodology. The platform-based approach does not separate the design and implementation details. It is often easier for the designer to develop the system around a particular platform instead of designing for any platform. The reuse of hardware and software components helps when designing different systems with similar components. This allows some generality even though the design method is specific.

Trends affecting Embedded System Design

A designer has to consider not only the architectures to be used for the embedded system, but also the trends at the chip level. The decisions made by the designer affect the future of the embedded system being designed. Choosing processors and an architecture that follow the market trends will keep the embedded system design useful for quite a few years before another design needs to be made with newer technology. Microelectronics issues and trends contribute to the design issues and future trends of embedded systems. Other commercial developments are also affecting the embedded systems industry. A good embedded system designer considers many aspects when designing and makes choices that are consistent with the direction in which the technology is moving.

Overview of Embedded System Hardware and Software

The hardware portion of the design process consists of using Verilog or VHDL to define custom hardware, using parts from the library, or gate level parts. Often a design environment or board contains intellectual property that can be integrated into the design. Configurable regions or parts are developed and available in design environments. These are used for commonly used components such as arithmetic functions. The use of embedded processors in the design is a major decision. The decision depends on the application and design parameters.

The software portion of the design process consists of developing the program either in machine language or in a chosen high-level language. The software process choices depend on the tools available to the designer. Using a supported processor core helps when designing the software for the embedded system because of the software tools that come with the processor. If the designer uses his or her own processor, the tools would not be available to develop the software in C or C++. In this case, machine language is the most logical choice for software implementation.

Despite the choices made when designing the hardware and software, the embedded system will essentially be a hardware block with inputs and outputs. When completing the software part, it can also be seen as a hardware block with inputs and outputs. The external connections to the software block either connect to an external port of the embedded system or there are interconnections between the hardware and software parts of the system.

CHAPTER 3

EMBEDDED SYSTEM HARDWARE

Design of embedded systems has evolved from the transistor level to the gate level and the register transfer level. Having a higher level of abstraction is beneficial when implementing complex hardware systems. When working at an even higher level, known as the system level, the designer can concentrate on the functionality of the system being designed. At this higher level of abstraction, the designer can specify the functionality of the system using a procedural language, such as the C language.

Embedded systems contain many elements as listed below:

1. Peripherals
2. Processor
3. Memory
4. Software
5. Algorithms

Peripheral devices allow the embedded system to communicate with the external world. These devices are used for sending and receiving information or signals, debugging purposes, and timing. Timing is controlled by changing a certain output at a fixed time using counters and timers.

The main criterion for an embedded processor is that it provides enough processing power to perform the tasks. Memory is an important part of an embedded system. Memory provides storage for the software that will run on the processor or entire system. Non-volatile memory must be used so that the program contents are retained when power is no longer applied to the device. Random access memory is much more expensive than read only memory; therefore many embedded systems have less RAM available than ROM. Programs written for embedded systems sometimes have to minimize their usage of RAM.

The software component encompasses the entire functionality of the system. It includes the operating system or run-time environment of the embedded system. The configuration and error handling of the system are performed by the software. Algorithms are the part of the embedded software that describes, as a sequence of steps, the function of the embedded system.

The individual hardware components that make up embedded systems are discussed below. The design decisions that are not outlined in the high-level design methodology are considered in a stepwise fashion.

Peripherals

Peripherals consist of inputs and outputs to the embedded system. The design of the system relies heavily on an understanding of the communication interfaces in and out of the device. Only when the interface devices are known can the system be designed to communicate appropriately with them. There are different types of devices that can be used as inputs and outputs. Sensors, LCD displays, speakers, keyboards, and infrared devices are used to communicate with the outside world. The communication interfaces are used for output display, for input needed from the user, and for debugging purposes. The user interface can range from simple LED systems to complex graphical and touch-sensitive features.

There are many different types of sensors. Sensors are designed for many physical quantities, such as water, image, pressure, infrared, sound, and biometrics. [3] Smart systems can only be developed due to the use of sensors. The other communication interfaces include serial, USB, and Ethernet. The communication interfaces can be categorized as wireless media, optical media, and wires. The wireless media include radio frequency and infrared signal based communication.

The communication interfaces are essential when the embedded system connects to other devices to communicate data for processing. If the system is independent of other devices, the user interfaces are mainly used, and the serial and USB communication are encapsulated within the system.

Usually the customer requirements include the peripherals that are sought. The designer is limited by the requirements provided during the first design steps. The communication interfaces within the system and the user interfaces for debugging are the choice of the designer. As interfaces are added to the design, the complexity also increases. Usually, for debugging purposes, a simple LCD display or LED lights are used.

The simplest wired communication is RS232, or serial, communication. USB and wireless communication can be more complicated, especially if the designer is using them for debugging. The speed of the communication interface influences the decision of which technology to use. Speed is a parameter that also needs to be considered when designing a high-performance system. Any user will be pleased with a fast embedded system, and thus considerations of speed are always important.

Processor

The hardware design doesn’t only consist of designing with embedded processors. The basics of hardware design consist of using VHDL and/or Verilog to create custom hardware that will complete certain hardware functions of the system.

The first design decision is whether to use a processor or not. The choice has to be made based on the application and the design time constraints. Relatively simple hardware functions can be developed in VHDL and/or Verilog. The same function could also be implemented by writing a program that will run on an existing or custom processor. If the design is very simple, an entirely hardware-based design is the better approach. The embedded system design process reviewed here assumes that the application is complex enough for prototypes, simulation, and testing to be essential to launching a successful product. When designing with an embedded processor, the design process involves: [4]

1. Selection of a processor
2. Design and configuration of the processor memory and interfaces
3. Developing the software for the processor to perform the hardware function

The reason for choosing a processor mostly depends on the application, but there are some general and important requirements that apply to all embedded systems. Factors such as efficiency and cost are major concerns for users of the product and designers. Developing the system requires attention to the interfaces and memory requirements. The second step in the processor design process is not separate from the first step. When selecting a processor, the memory and interfaces are also of utmost importance.

For high-performance applications and for large markets, application-specific integrated circuits (ASICs) can be designed. The cost of manufacturing the ASICs is quite high. Thus, microcontrollers are dominating the embedded system industry due to their programmability and their applicability to many embedded systems. The use of FPGAs is mainly considered for the prototyping stage, but many researchers claim a trend towards FPGA-based embedded systems. The advantages of using FPGAs for the prototyping stage were discussed briefly earlier. The FPGA and ASIC design cycles are considerably different. The FPGA design cycle time is shorter than the ASIC design cycle time. Other than time, the cost of the prototype should be kept minimal because the prototype may be discarded once it is no longer of use. FPGAs are quick and cheap to develop with and are close to ideal for prototype development.

Another decision besides the processor for the prototype is the processor for the final system. The choices depend on the application. The advantages and disadvantages of each of the common processor types must be understood in order to make the correct choices when design and implementation begin. Embedded system hardware can range from traffic controllers to cellular phones. The underlying hardware can consist of a few choices. These are chosen based on the requirements of the system, such as fault tolerance or high processing capability. Standard hardware that meets the requirements of the system can be used. This hardware can consist of an FPGA with an embedded hard or soft microprocessor.

According to [3], the key advantage of processors is their flexibility. The author is referring to processors other than ASICs. The overall behavior of the embedded system can be changed by adjusting the software running on the processor or processors. The flexibility of microprocessors has made them a popular tool for developing embedded systems. Microcontrollers, which contain microprocessors, are the main processing units in embedded systems. Efficiency is another factor that is important when choosing embedded processors. Efficiency can be considered in terms of energy efficiency, code-size efficiency, and run-time efficiency. [3]

Energy usage is important for embedded systems. The system usually runs on batteries and must operate for a long time. Energy efficiency is considered at all abstraction levels. From the design of the instruction set down to the design of the chip manufacturing process, energy efficiency needs to be considered.

Power dissipation and energy efficiency can be considered at the chip level. Different methods to optimize the chip design exist. Chips can enter different power-saving states, extending the battery life of the embedded system. The three states are run, idle, and sleep. [3] The run state occurs when the processor is performing some computation. The idle state is when the processor is waiting for interrupts. The input interfaces and timers cause the processor to transition from the idle state to the run state. During the sleep state, the processor is no longer active, and the power consumption decreases tremendously when the processor enters the sleep state from another state. To reduce the power consumption of a processor, either the supply voltage or the load capacitance should be decreased. Lowering the supply voltage has the greater effect because of the quadratic relationship between the supply voltage and the power consumption of the processor.
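The quadratic dependence on supply voltage can be illustrated with the commonly used approximation for CMOS dynamic power, P = a · C · V² · f, where a is the switching activity, C the load capacitance, V the supply voltage, and f the clock frequency. The function name and the numeric values below are assumptions chosen only for illustration; static (leakage) power is ignored.

```python
def dynamic_power(c_load_f: float, v_supply: float, freq_hz: float,
                  activity: float = 1.0) -> float:
    """Approximate CMOS dynamic power in watts: activity * C * V^2 * f."""
    return activity * c_load_f * v_supply ** 2 * freq_hz


# Illustrative values (assumptions): 1 nF switched capacitance, 3.3 V, 50 MHz.
baseline = dynamic_power(1e-9, 3.3, 50e6)

# Halving the load capacitance halves the power (linear relationship).
half_capacitance = dynamic_power(0.5e-9, 3.3, 50e6)

# Halving the supply voltage quarters the power (quadratic relationship).
half_voltage = dynamic_power(1e-9, 1.65, 50e6)
```

Comparing `half_capacitance` and `half_voltage` against `baseline` shows why the supply voltage is the more effective parameter to reduce.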

Code-size efficiency means the minimization of the code size. Embedded systems rely on a limited amount of memory and do not usually have a hard disk, so when the code size is decreased, the amount of memory needed for the system also decreases. Smaller memory is faster and consumes less power. The area occupied by the memory is also of importance. The more compact the memory, the faster a memory element can be retrieved. A large percentage of a chip is usually taken up by memory. Code size and the memory size needed are crucial to energy optimization. For a system on a chip, the memory and processors are implemented on the same chip. In this case, the memory is considered embedded memory. [3]

In order to reduce the size of the program, either the designer of the software should be very cautious when programming and reduce the number of lines of code or compression techniques can be used. The software for complex embedded systems is bound to contain many lines of code. The programmer can reduce the code size slightly if careful programming practices are used. On the other hand, compression techniques seem more promising due to the reduced effort needed by the programmer.

Compression techniques are methods in which the program is compressed and then stored in memory. This reduces the energy usage, the area, and the time to fetch instructions. [3] When compression is used, a de-compressor needs to be present within the system to decode the compressed instructions. The designer needs to weigh the benefits and drawbacks of adding a decoder and using compression. When the program and system are not complex and the decoder would only add complexity to the system, compression should not be used. Figure 4 shows a simplified model of the embedded system with and without the compression technique. The memory shown in Figure 4 could also be read-only memory (ROM). Saving ROM and random-access memory (RAM) area is the main goal of using compression. The cost of memory increases with its size, and therefore reducing the size also reduces the cost of the embedded system.

Figure 4: Compression Technique Hardware Changes
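The store-compressed, decode-on-fetch arrangement described above can be sketched in software. This is a minimal sketch that uses the general-purpose zlib compressor from the Python standard library as a stand-in; actual embedded code compression schemes operate on instructions and use a hardware decoder, which is not modeled here. The function names and the sample program image are hypothetical.

```python
import zlib


def store_compressed(program_image: bytes) -> bytes:
    """Compress a program image before it is written to memory."""
    return zlib.compress(program_image, 9)


def fetch_decompressed(stored: bytes) -> bytes:
    """Play the role of the de-compressor between memory and processor."""
    return zlib.decompress(stored)


# Hypothetical program image: a repetitive byte pattern standing in for
# machine code, which often contains repeated instruction sequences.
program_image = bytes([0x12, 0x34, 0x56, 0x78]) * 256
stored = store_compressed(program_image)
```

The trade-off discussed in the text is visible here: `stored` occupies less memory than `program_image`, but every fetch has to pass through `fetch_decompressed`.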

The requirements posed by the stakeholders and users usually contain strict time constraints. High clock frequencies speed up the embedded system, but other methods can also be used. The architecture of the processor plays a huge role in the speed of the entire system. This returns to the importance of choosing the appropriate processor when designing the embedded system. The application domain influences the processor type to use in order to optimize the run-time efficiency of the system.

More options are available when choosing embedded processors. The options allow the designer to tailor the embedded processor to best fit the hardware function of the application.

Different embedded processors and the reasons for using them when designing are reviewed in the sections below. The microcontroller, ASIC, DSP, and FPGA based embedded system hardware design process is examined. The processors that can be contained within FPGAs are briefly discussed in the FPGA based embedded system section. The architecture of the processors within the FPGA is discussed in a later chapter. The overview of each processor will allow a designer to compare and choose the appropriate processor for the application, once the decision to use a hardcore processor has been made.

An embedded system may also contain a combination of several of the processors outlined here. An embedded system may have application-specific hardware for better performance and low power, software on programmable processors such as DSPs and microcontrollers for flexibility, and mechanical transducers, actuators, or other peripherals. A modern embedded hardware system, Figure 5, usually contains many different components because emerging applications are complex and multi-purpose. Figure 5 shows the different elements integrated on a single chip.

Figure 5: Modern Embedded System Components on a Single Chip

The embedded processors for final implementation range on the scale of flexibility and performance. The majority of the comparison occurs between FPGAs and ASICs. The microcontroller and digital signal processors are mentioned to create awareness that FPGAs and ASICs are not the only processor options. The designer’s needs and use for the processor determine which processor to utilize. The figure below shows the four processors and the placement along the spectrum of flexibility. Flexibility is emphasized here because for rapid prototyping flexibility is more important than performance. The flexibility of the processor will allow quick modifications to be performed.

Figure 6: Processor Type versus Increasing Flexibility (axis: Flexibility of Processor; labels as extracted: DSP, Microcontroller, FPGA, ASIC)

Microcontroller-Based

A large number of processors used in embedded systems are in fact microcontrollers. Microcontrollers can be used easily in a design. The difference between a microprocessor and microcontroller is that a microprocessor is contained within a microcontroller. A microcontroller is essentially a computer system on a chip and it contains a processor core, memory, and programmable input/output peripherals.

It is mandatory that microcontrollers provide real time response to events in the embedded system they are controlling. When a certain event occurs, it triggers an interrupt signal. The signal causes the current operation to be suspended and the interrupt service routine (ISR) begins. The ISR will perform certain processes that depend on the source of the interrupt. Once the routine has finished, the processor returns to the program it was performing before the interrupt occurred. Possible interrupt sources are device dependent, and often include events such as an internal timer overflow, completing an analog to digital conversion, a logic level change on an input such as from a button being pressed, and data received on a communication link.

Where power consumption is important as in battery operated devices, interrupts may also wake a microcontroller from a low power sleep state where the processor is halted until required to do something by a peripheral event.
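The interrupt sequence described above (event, suspension of the current operation, ISR, return to the main program) can be modeled in software. The class `InterruptController`, its method names, and the interrupt source names are hypothetical; a real microcontroller performs this dispatch in hardware through its interrupt vector table, and this sketch does not model priorities or nesting.

```python
from typing import Callable, Dict, List


class InterruptController:
    """Toy model: maps interrupt sources to interrupt service routines."""

    def __init__(self) -> None:
        self._vector_table: Dict[str, Callable[[], None]] = {}
        self._pending: List[str] = []

    def register(self, source: str, isr: Callable[[], None]) -> None:
        self._vector_table[source] = isr

    def raise_interrupt(self, source: str) -> None:
        self._pending.append(source)

    def service_pending(self) -> None:
        # Run the ISR for each pending source in arrival order.
        while self._pending:
            source = self._pending.pop(0)
            self._vector_table[source]()


def run_demo() -> List[str]:
    log: List[str] = []
    controller = InterruptController()
    controller.register("button", lambda: log.append("isr:button"))
    controller.register("timer_overflow",
                        lambda: log.append("isr:timer_overflow"))
    for step in range(3):
        log.append(f"main:{step}")           # main program work
        if step == 0:
            controller.raise_interrupt("button")
        if step == 1:
            controller.raise_interrupt("timer_overflow")
        controller.service_pending()         # main is suspended while ISRs run
    return log
```

Calling `run_demo()` returns a log in which each ISR entry appears between two main-program entries, mirroring the suspend-and-resume behavior.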

Many embedded microprocessors include a variety of timers as well. One of the most common types of timers is the Programmable Interval Timer, or PIT for short. A PIT just counts down from some value to zero. Once it reaches zero, it sends an interrupt to the processor indicating that it has finished counting.
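The countdown behavior of a PIT can be sketched as follows. The class and function names are hypothetical, and the automatic reload after reaching zero is an assumption (periodic operation); the text only states that the timer counts down to zero and then raises an interrupt.

```python
from typing import Callable, List


class ProgrammableIntervalTimer:
    """Toy PIT: counts down once per tick and signals when it reaches zero."""

    def __init__(self, reload_value: int, on_expire: Callable[[], None]) -> None:
        if reload_value <= 0:
            raise ValueError("reload value must be positive")
        self.reload_value = reload_value
        self.count = reload_value
        self.on_expire = on_expire

    def tick(self) -> None:
        self.count -= 1
        if self.count == 0:
            self.on_expire()                # stands in for the interrupt line
            self.count = self.reload_value  # assumption: periodic reload


def ticks_with_interrupt(reload_value: int, total_ticks: int) -> List[int]:
    """Return the tick numbers (starting at 1) on which the timer fired."""
    fired: List[bool] = []
    timer = ProgrammableIntervalTimer(reload_value, lambda: fired.append(True))
    result: List[int] = []
    for tick in range(1, total_ticks + 1):
        before = len(fired)
        timer.tick()
        if len(fired) > before:
            result.append(tick)
    return result
```

With a reload value of 3, the timer in this sketch fires on every third tick.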

Microcontroller programs must fit in the available on-chip program memory, since it would be costly to provide a system with external memory. Compilers and assemblers are used to turn high-level and assembly language programs into compact machine code for storage in the microcontroller's memory. Depending on the device, the program memory may be permanent, read-only memory that can only be programmed at the factory, or it may be field-alterable flash or erasable read-only memory.

Since embedded processors are usually used to control devices, they sometimes need to accept input from the device they are controlling. This is the purpose of the analog to digital converter. Since processors are built to interpret and process digital data, i.e. 1s and 0s, they cannot do anything with the analog signals that are sent to them by a device. The analog to digital converter is used to convert the incoming data into a form that the processor can recognize. There is also a digital to analog converter that allows the processor to send data to the device it is controlling.
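The two conversions can be sketched numerically. The function names, the 3.3 V reference voltage, and the 10-bit resolution are assumptions chosen for illustration; an actual converter is a hardware peripheral whose reference and resolution depend on the device.

```python
def adc_convert(voltage: float, v_ref: float = 3.3, bits: int = 10) -> int:
    """Quantize an analog voltage into an unsigned digital code."""
    max_code = (1 << bits) - 1
    clamped = min(max(voltage, 0.0), v_ref)  # inputs outside 0..v_ref saturate
    return round(clamped / v_ref * max_code)


def dac_convert(code: int, v_ref: float = 3.3, bits: int = 10) -> float:
    """Map a digital code back to an analog output voltage."""
    max_code = (1 << bits) - 1
    return code / max_code * v_ref
```

Converting a voltage with `adc_convert` and back with `dac_convert` recovers the original value only to within one quantization step, which is the resolution limit of the converter.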

ASIC-Based

An ASIC can be easily used for an embedded system that has a single purpose.

According to [5], ASICs can be categorized, in increasing complexity, as gate arrays, structured ASICs, standard cell, and full custom.

Gate arrays are based on basic cells consisting of a collection of unconnected transistors and resistors. The vendor determines the optimum mix of components to be provided in a basic cell. The intended use of the basic cell also influences this mix: a pure CMOS cell will contain PMOS transistors, NMOS transistors, and resistors, whereas a BiCMOS cell will contain the components of the CMOS cell plus BJTs. The proportions of the components are the vendor's choice. These basic cells are arranged into single-column or dual-column arrays. Each chip contains the array of basic cells and the I/O pads. The required logic is implemented by creating metallization layers that link the components inside the basic cells, link the basic cells to each other, and link them to the inputs and outputs of the device. The cost advantages of gate arrays are evident because many components are prefabricated and only one step, the metallization, is required to create the ASIC. The disadvantage is that most designs do not utilize all the resources and the routing is not optimal. [5] These disadvantages negatively impact the performance and power consumption of the device.

Structured ASICs contain a mixture of prefabricated generic logic (implemented as gates, multiplexers, or LUTs), one or more registers, and possibly small local RAM. The idea is that the device can be customized using only metallization layers. The difference, when compared to gate arrays, is the greater sophistication of the structured ASIC tile and the fact that most of the metallization layers are also predefined. Only some metallization layers need to be customized. This form of ASIC device yields the same benefits as the gate array, and the time for developing the masks for the device is greatly reduced compared to the gate array approach. The drawbacks of this method are mainly low performance and large power consumption.

Standard cells address the problems associated with gate arrays. This method was developed prior to structured ASICs. Unlike gate arrays and structured ASICs, standard cells do not use the concept of a basic cell and no components are prefabricated on the chip. Special tools are used to place each logic gate in the netlist individually and to determine the optimum way in which the gates are to be routed. Every layer in the device's fabrication requires custom masks. As a side note, masks are needed for integrated circuit fabrication because they allow the creation of transistors and gates at the appropriate locations. They are among the design-dependent parameters of chip fabrication: as the chip design varies, the appropriate masks have to be created. The standard cell approach provides a closer-to-optimal utilization of silicon than gate arrays and structured ASICs. [5]

Full custom ASIC devices contain more complex implementations than the standard cells. The approach is similar to standard cells, but the full-custom ASIC contains a specific, complex functionality while the standard cells address simpler logic implementations.

ASICs are very useful when the embedded system application contains a single function. The definition of an embedded system clearly states that the system performs a single function. With the new trends towards a system on chip, single functionality may become a thing of the past. ASICs are useful for developing the single-function embedded systems of today, but the systems on chip of tomorrow will require greater flexibility and reconfigurability. Microcontrollers are most commonly used due to the programmability of the function of the chip, a feature that is not available for ASICs. Some hybrid processors contain both ASIC hardware and additional programmable hardware. This provides some flexibility as well as higher performance when compared to the other processor choices.

DSP Processor-Based

Digital Signal Processors (DSPs) are best suited for handling digital signal processing applications. Digital signal processing is the branch of electronics that is concerned with the representation and manipulation of signals in digital form. When the main requirement of the embedded system is to perform signal processing, such as processing wireless signals, a DSP should be chosen. Besides DSPs, there are other methods of implementing a signal processing application. The alternative choices are a general purpose microprocessor, dedicated ASIC hardware, or dedicated FPGA hardware.

DSPs are superior to these alternatives because the chip is designed to perform digital signal processing tasks much faster, more efficiently, and at lower cost. The alternatives remain viable: a microprocessor can implement the appropriate signal processing algorithm in software, and a digital signal processing core can be added to an ASIC or FPGA to implement the application.

The choice of processor for digital signal processing applications mainly depends on the consumer’s needs and requirements. If a certain efficiency and speed is required, the best approach would be to use a digital signal processor. When using a microprocessor or DSP chip, software DSP realization is important. The design is greatly dependent on the software aspect of the design. The algorithms require a large number of mathematical operations to be performed quickly on a set of data.


Many digital signal processing applications have a constraint on latency. The time constraints imposed by the application determine the choice of processor, as mentioned before. The memory architecture of a DSP allows more than one data item or instruction to be fetched at the same time. A DSP also supports specialized operations that are used frequently in the application set it caters to. An example is the multiply and accumulate (MAC) operation.
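To see why the MAC operation matters, consider a finite impulse response (FIR) filter, a common signal processing kernel chosen here purely as an illustration. Every output sample is a chain of multiply-and-accumulate steps, which a DSP can typically issue as single instructions. The function names below are hypothetical.

```python
def multiply_accumulate(acc, a, b):
    """One MAC step: add the product a * b to the running accumulator."""
    return acc + a * b


def fir_filter(samples, coefficients):
    """Direct-form FIR filter built entirely from MAC steps."""
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(coefficients):
            if n - k >= 0:  # samples before the start are treated as zero
                acc = multiply_accumulate(acc, coeff, samples[n - k])
        outputs.append(acc)
    return outputs


# Two-tap filter that sums each sample with its predecessor.
filtered = fir_filter([1, 2, 3, 4], [1, 1])
```

The inner loop performs one MAC per filter tap for every output sample, so hardware that completes a MAC per cycle directly shortens the latency of the whole filter.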

Real-time operations are usually handled by DSPs due to the efficiency and lower cost. The FPGA, ASIC, and microcontroller can also perform digital signal processing applications. The ultimate choice is dependent on the stakeholder’s and consumers’ needs and requirements. With further advances in the FPGA and ASIC technology, DSPs could be a thing of the past.

FPGA-Based

The FPGA can be used as the main processor in an embedded system, a coprocessor, or a processor used for quick prototyping. The FPGA design cycle was shown to be short, quick, and affordable for a prototype and maybe for a final implementation.

The FPGA is capable of implementing very large and complex functions, and it can also perform DSP, ASIC, and microcontroller functions. The flexibility offered by FPGAs is considerable because the design can be changed without much effort from the designer. This flexibility is not available when designing with ASICs and DSPs. Microcontrollers offer some flexibility, but only with regard to the software of the system; the hardware cannot be changed.


Most of the FPGA fabric consists of many relatively simple programmable logic blocks and interconnect, as shown in Figure 7. The majority of FPGAs are based on SRAM configuration cells, which means they can be reconfigured many times. [5] New designs can be quickly implemented with an SRAM-based FPGA architecture, placing it at the forefront of technology and making it useful for research and development (R&D). The SRAM-based FPGA architecture has the pitfall that when power to the FPGA is turned off, the logic blocks no longer retain the previous design data. The FPGA has to be reprogrammed each time the system is powered up.

There are two classes of FPGA architectures. [5] FPGAs can be either fine-grained or coarse-grained. The underlying fabric, shown in Figure 7, remains the same; the two classes differ in how each programmable block is used.

Figure 7: FPGA Underlying Fabric [5]

When designing, the designer needs to consider which of the two classes of FPGA architecture to use for the application. Fine-grained implementations use the programmable logic blocks for simple functions, so many interconnects are required to perform the overall function. A large number of connections is incurred when using the fine-grained approach. Parallelizable applications benefit from a fine-grained approach because the application is broken into many simple functions that can perform computations simultaneously.

The coarse-grained approach results in the programmable logic blocks being used for more complex purposes than in the fine-grained approach. Less communication between the logic blocks is needed when the blocks implement more complex functions; therefore the number of connections between the blocks is smaller than in the fine-grained approach. Increased communication, and hence more interconnect, causes larger delay in the implementation. The coarse-grained approach may therefore be faster than the fine-grained approach due to the reduced interconnect.

The internals of the FPGA logic blocks can be further categorized as MUX (multiplexer)-based or LUT (lookup table)-based architectures. A multiplexer can be used to implement the logic gate schematic required of the logic block. A lookup table is essentially a truth table of the digital schematic design that is stored in memory. The output is found by looking up the value stored in the table for the inputs provided; the inputs are used as an index or pointer into the lookup table. A multiplexer-based approach, in contrast, actually performs computations at each multiplexer until the final result is obtained. The MUX approach was used when designers were handcrafting the circuits, prior to the advent of sophisticated CAD tools. More recently, complex applications began emerging for FPGAs and the programmable logic blocks required quite complex implementations. Handcrafting the designs is no longer feasible, and with sophisticated tools a LUT-based design can be easily developed for the implementation. The majority of today's FPGA architectures are LUT-based. [5]
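The idea of using the inputs as an index into a stored truth table can be sketched in a few lines. The class below is a software illustration only: the names are hypothetical, and the list of cells merely stands in for the configuration memory that holds the table.

```python
class LookupTable:
    """Toy model of a LUT: a truth table indexed by the input bits."""

    def __init__(self, num_inputs, logic_function):
        self.num_inputs = num_inputs
        # "Configuration": precompute one output bit per input combination.
        self.cells = [
            logic_function(*self._index_to_bits(index)) & 1
            for index in range(2 ** num_inputs)
        ]

    def _index_to_bits(self, index):
        # The first input is the most significant bit of the index.
        return [(index >> bit) & 1 for bit in reversed(range(self.num_inputs))]

    def evaluate(self, *inputs):
        """Look up the output: the inputs form the address, nothing is computed."""
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.cells[index]


# Configure a 3-input LUT to implement y = (a AND b) OR c.
lut = LookupTable(3, lambda a, b, c: (a & b) | c)
```

Changing the implemented function only requires rewriting the eight stored cells, which is why the same physical block can realize any 3-input logic function.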

Essentially, a LUT-based architecture is developed on an SRAM-based FPGA device. The SRAM configuration cells are used to create a LUT and can also be used to develop other useful alternatives. The primary role of the programmable logic block is to act as a lookup table, but the cells can also be used for other purposes without moving away from the LUT-based architecture. The multifaceted LUT used in current FPGA logic blocks can serve as a LUT, as RAM, or as a shift register (SR). The design of the logic blocks varies with the FPGA vendor. Some logic blocks contain other logic besides the multifaceted LUT.

With the increasing complexity of applications, the FPGA design needs to be modified to account for them. Embedded memory, multipliers, adders, and processor cores are all incorporated into the FPGA. Applications usually utilize a lot of memory, even after compression and other code size reduction techniques are performed. Embedded RAM within the FPGA can be used independently or combined to create larger blocks of memory. The position of the RAM is essential because the performance improvement potential depends on local access to memory. Figure 8 shows the RAM blocks arranged as columns along the entire FPGA fabric. The blocks could also be scattered or arranged along the periphery of the FPGA. The multipliers are also shown in Figure 8. Multiplication implemented in programmable logic blocks is inherently slow, so Figure 8 displays special hard-wired blocks that are embedded multipliers. This approach is quite common among FPGA designs. Since multipliers are used frequently after accessing data from the RAM, the RAM and multipliers are placed within close proximity of each other. Embedded processor cores are essentially microprocessors within the FPGA fabric.

Figure 8: Embedded RAM and Multipliers [5]

The embedded processor for an FPGA could be hardcore, softcore, or a custom design, also called an HDL processor core. A softcore processor contains certain fixed features and can be customized by choosing from the variable features. In a softcore processor, the designer has the flexibility to choose the instruction set, hardware features, and the data and address size. Hardcore processors have a fixed instruction set and architecture and can be designed for a specific purpose. When a hard processor core is within the FPGA, the processor core is separate from the FPGA fabric. The soft processor core is actually within the FPGA fabric, and the logic blocks are used to act as a microprocessor. Figure 9 shows the hardcore processor within the FPGA chip and the separation between the reconfigurable fabric and the fixed fabric.

Figure 9: Hardcore processor within a FPGA [5]

The softcore, hardcore, and HDL processor core all are useful for different applications. The choice of which to utilize comes before actually choosing the processor. The different categories of processors and the advantages and disadvantages of each are shown in Table 1.

The FPGA is very flexible when designing and it offers many options to the designer, as discussed. In the past, the performance of FPGAs lagged far behind that of ASICs. With advances in FPGA development, it may become possible to meet the performance requirements of applications using FPGAs. Besides performance, power usage and logic density are also important parameters for consideration. The use of a DSP is greatly dependent on the application. The FPGA can also perform DSP applications quite efficiently.


Table 1: Comparison of Embedded Processor Cores within FPGAs

| Processor Core | Key Features | Advantages | Disadvantages |
|---|---|---|---|
| Soft-core | Customization of instructions, hardware features, and data and address size | 1. Instantiate a core when it is needed; 2. Simpler than the hard-core counterparts; 3. Microprocessor developed is customized for application and performance needs | Slower than the hard-core counterparts |
| Hardcore | 1. Fixed architecture; 2. Designed for a specific application | 1. FPGA fabric identical for devices with and without the embedded microprocessor core; 2. Faster than softcore processors | 1. Fixed architecture; 2. Cannot discard extra, unused features |
| HDL Processor Core | Designer develops a custom processor to suit the application needs | Designer is able to restructure processor for better power and space utilization | 1. Lack of availability of a compiler and other software tools; 2. Lack of support when developing and testing the HDL code for the processor |

In [6], a CMOS 90 nm FPGA and a CMOS 90 nm standard cell ASIC are compared to measure the gap between FPGAs and ASICs. The comparison covers logic density, circuit speed, and power consumption. All these metrics are essential for embedded system design. A system architect can use the measurements in [6] to assess whether implementation in an FPGA is feasible. These measurements can also be useful for those building ASICs that contain programmable logic, by quantifying the impact of leaving part of a design to be implemented in the programmable fabric. An SRAM-based FPGA was used for the comparison due to its dominance in the market. The results obtained from [6] are summarized in Table 2. The table shows FPGAs lagging in every parameter compared.

Table 2: FPGA comparison to ASIC

| Metric (FPGA vs. ASIC) | Logic only | Logic and DSP | Logic and Memory | Logic, Memory, and DSP |
|---|---|---|---|---|
| Area (mean) | 40 times larger | 28 times larger | 37 times larger | 21 times larger |
| Speed (mean) | 3.2 times slower | 3.4 times slower | 2.3 times slower | 2.1 times slower |
| Dynamic Power Consumption (mean) | consumes 12 times more | consumes 12 times more | consumes 9.2 times more | consumes 9 times more |

Another advantage of using FPGAs when developing is the shorter design cycle. If an error occurs in the prototype or final product, changes can be made far more easily with an FPGA than with an ASIC. The DSP and microcontroller contain fixed hardware and variable software. Changes to the system software can be easily performed without hardware adjustments, as long as the processor supports the operations needed.


The advantages of the use of the FPGA for prototype development in the embedded system design flow are straightforward and apparent. The final implementation hardware design choices are based on the application, speed, power and performance needs, and cost. Tradeoffs have to be made when weighing one parameter highly over another.

Memory

The data and program need to be stored in some kind of memory, and the storage needs to be efficient. Run-time, code-size, and energy efficiency all need to be considered. A good compiler and compression techniques assist in achieving code-size efficiency. Memory hierarchies can be utilized to achieve good run-time and energy efficiency. [3] Memory used for embedded systems can consist of on-chip memory and/or external memory such as SRAM, DRAM, SDRAM, etc. The trend towards the system on a chip makes memory a primary consideration: when designing the integrated circuit, the memory occupies the majority of the chip. Reducing the amount of memory needed will greatly increase the performance and reduce the area of the system on a chip. The gap between memory speed and processor speed is also widening. The speed of memory is only increasing by a factor of 1.07 per year, while processor speed is increasing by a factor of 1.5 to 2 per year. [3] To account for this gap, smaller and faster memories are added between the processor and main memory. This need led to the introduction of caches. Caches provide good run-time efficiency and energy efficiency by providing local access to frequently used data and instructions.
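The effect of these growth rates compounds quickly, as a short calculation shows. The sketch below uses the figures quoted from [3]; the function name is hypothetical, and the assumption that both rates stay constant from year to year is a simplification.

```python
def speed_gap(years, memory_growth=1.07, processor_growth=1.5):
    """Factor by which the processor outpaces memory after the given years."""
    return (processor_growth / memory_growth) ** years


# Gap after five years at the lower and upper processor growth rates.
gap_low = speed_gap(5, processor_growth=1.5)
gap_high = speed_gap(5, processor_growth=2.0)
```

Even at the lower processor growth rate of 1.5 per year, the relative gap grows by about 40 percent annually and exceeds a factor of five after five years, which is the pressure that motivates caches.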


Many types of memory devices are available for use in modern computer systems. An embedded system designer must be aware of the differences between them and understand how to use each type effectively. Keep in mind that the development of these devices took several decades and that their underlying hardware differs significantly. Figure 10 shows the classifications of main memory.

Figure 10: Main Memory Options

The RAM family includes two important memory devices, static RAM (SRAM) and dynamic RAM (DRAM). The primary difference between them is the lifetime of the data they store. SRAM retains its contents as long as electrical power is applied to the chip. If the power is turned off or lost temporarily, its contents will be lost forever. DRAM, on the other hand, has an extremely short data lifetime, typically about four milliseconds. This is true even when power is applied constantly. In short, SRAM has all the properties of the memory you think of when you hear the word RAM. The DRAM is useless by itself. However, a DRAM controller can be used to make DRAM behave more like SRAM. The job of the DRAM controller is to periodically refresh the data stored in the DRAM. By refreshing the data before it expires, the contents of memory can be kept alive for as long as they are needed. So DRAM is as useful as SRAM after all.
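The role of the DRAM controller can be illustrated with a toy simulation. The four millisecond retention time is the figure quoted above; the one millisecond time step, the single-cell model, and all names are assumptions made for illustration.

```python
RETENTION_MS = 4  # data lifetime quoted in the text


class DramCell:
    """Toy DRAM cell whose stored value decays unless it is refreshed."""

    def __init__(self, value):
        self.value = value
        self.age_ms = 0

    def advance(self, ms):
        self.age_ms += ms
        if self.age_ms >= RETENTION_MS:
            self.value = None  # the charge has leaked away; data is lost

    def refresh(self):
        if self.value is not None:
            self.age_ms = 0    # rewriting the value restarts its lifetime


def run(total_ms, refresh_period_ms=None):
    """Simulate one cell, with an optional controller refreshing it."""
    cell = DramCell(1)
    for t in range(1, total_ms + 1):
        cell.advance(1)
        if refresh_period_ms and t % refresh_period_ms == 0:
            cell.refresh()
    return cell.value


without_controller = run(20)                    # data decays
with_controller = run(20, refresh_period_ms=2)  # data survives
```

A refresh period that is not shorter than the retention time, such as 8 ms in this model, is as bad as having no controller at all.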

When deciding which type of RAM to use, a system designer must consider access time and cost. SRAM devices offer extremely fast access times (approximately four times faster than DRAM) but are much more expensive to produce. Generally, SRAM is used only where access speed is extremely important. A lower cost-per-byte makes DRAM attractive whenever large amounts of RAM are required. Many embedded systems include both types, a small block of SRAM along a critical data path and a much larger block of DRAM for noncritical paths.

Memories in the ROM family are distinguished by the methods used to write new data to them and by the number of times they can be rewritten. This classification reflects the evolution of ROM devices from hardwired to programmable to erasable-and-programmable. A common feature of all these devices is their ability to retain data and programs forever, even when power fails. The contents of a hardwired ROM had to be specified before chip production, so that the actual data could be used to arrange the transistors inside the chip. Hardwired memories are still used, though they are now called "masked ROMs" to distinguish them from other types of ROM. The primary advantage of a masked ROM is its low production cost. Unfortunately, the cost is low only when large quantities of the same ROM are required.

One step up from the masked ROM is the PROM (programmable ROM), which is purchased in an un-programmed state. If you were to look at the contents of an un-programmed PROM, you would see that the data is made up entirely of 1's. The process of writing your data to the PROM involves a special piece of equipment called a device programmer. The device programmer writes data to the device one word at a time by applying an electrical charge to the input pins of the chip. Once a PROM has been programmed in this way, its contents can never be changed. If the code or data stored in the PROM must be changed, the current device must be discarded. Thus, PROMs are also known as one-time programmable (OTP) devices.

An EPROM (erasable-and-programmable ROM) is programmed in exactly the same manner as a PROM. However, EPROMs can be erased and reprogrammed repeatedly. To erase an EPROM, you simply expose the device to a strong source of ultraviolet light. By doing this, you essentially reset the entire chip to its initial unprogrammed state. Though more expensive than PROMs, their ability to be reprogrammed makes EPROMs an essential part of the software development and testing process.

As memory technology has matured in recent years, the line between RAM and ROM has blurred. Now, several types of memory combine features of both. These devices do not belong to either group and can be collectively referred to as hybrid memory devices. Hybrid memories can be read and written as desired, like RAM, but maintain their contents without electrical power, just like ROM. Two of the hybrid devices, EEPROM and flash, are descendants of ROM devices. These are typically used to store code. The third hybrid, NVRAM, is a modified version of SRAM. NVRAM usually holds persistent data.


EEPROMs are electrically-erasable-and-programmable. Internally, they are similar to EPROMs, but the erase operation is accomplished electrically, rather than by exposure to ultraviolet light. Any byte within an EEPROM may be erased and rewritten. Once written, the new data will remain in the device until it is electrically erased. The primary tradeoff for this improved functionality is higher cost, though write cycles are also significantly longer than writes to a RAM. So you wouldn't want to use an EEPROM for your main system memory.

Flash memory combines the best features of the memory devices described thus far. Flash memory devices are high density, low cost, nonvolatile, fast (to read, but not to write), and electrically reprogrammable. These advantages are overwhelming and, as a direct result, the use of flash memory has increased dramatically in embedded systems. From a software viewpoint, flash and EEPROM technologies are very similar. The major difference is that flash devices can only be erased one sector at a time, not byte-by-byte. Typical sector sizes are in the range 256 bytes to 16 KB. Despite this disadvantage, flash is much more popular than EEPROM and is rapidly displacing many of the ROM devices as well.
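The practical consequence of sector-level erasure is that changing a single byte involves saving, erasing, and reprogramming a whole sector. The sketch below models this under two assumptions that go beyond the text: that an erased byte reads as 0xFF, and that programming can only clear bits (1 to 0), as is typical of many flash devices. The sector size is the low end of the range quoted above, and all names are hypothetical.

```python
SECTOR_SIZE = 256  # bytes; low end of the typical range quoted in the text


class FlashDevice:
    """Toy flash model: erase works per sector, programming only clears bits."""

    def __init__(self, num_sectors):
        # Assumption: erased flash reads as all 1s (0xFF in every byte).
        self.data = bytearray([0xFF] * (SECTOR_SIZE * num_sectors))

    def erase_sector(self, sector):
        start = sector * SECTOR_SIZE
        self.data[start:start + SECTOR_SIZE] = bytes([0xFF]) * SECTOR_SIZE

    def program_byte(self, address, value):
        # Assumption: programming can turn 1s into 0s but never 0s into 1s.
        self.data[address] &= value

    def rewrite_byte(self, address, value):
        """Change one byte by saving, erasing, and reprogramming its sector."""
        sector = address // SECTOR_SIZE
        start = sector * SECTOR_SIZE
        saved = bytearray(self.data[start:start + SECTOR_SIZE])
        saved[address - start] = value
        self.erase_sector(sector)
        for offset, byte in enumerate(saved):
            self.program_byte(start + offset, byte)


flash = FlashDevice(num_sectors=2)
flash.program_byte(10, 0x0F)
flash.rewrite_byte(10, 0xF0)  # requires a full sector erase behind the scenes
```

An EEPROM, by contrast, could update the byte at address 10 directly, which is the byte-level flexibility that flash trades away for density and cost.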

The third member of the hybrid memory class is NVRAM (non-volatile RAM). Non-volatility is also a characteristic of the ROM and hybrid memories discussed previously. However, an NVRAM is physically very different from those devices. An NVRAM is usually just an SRAM with a battery backup. When the power is turned on, the NVRAM operates just like any other SRAM. When the power is turned off, the NVRAM draws just enough power from the battery to retain its data. NVRAM is fairly common in embedded systems. However, it is even more expensive than SRAM, because of the battery, so its applications are typically limited to the storage of a few hundred bytes of system-critical information that can't be stored in any better way.

Apart from the main memory choices, the designer needs to utilize cache appropriately to increase the speed and energy efficiency of the system. Compression techniques have been briefly reviewed and a more detailed understanding of the developments in memory usage reduction is required when designing embedded systems. The techniques to optimize the system are essential when attempting to reach the requirements provided by the stakeholders and consumers.


CHAPTER 4

EMBEDDED SYSTEM SOFTWARE

When designing with FPGAs or any other processor, reuse of older design components will greatly speed up the design process. Designs are also becoming more complex, so it is useful if previous designs can be utilized. Any existing functional blocks are typically referred to as Intellectual Property (IP). The sources of IP are internally created blocks from previous designs, FPGA vendors or other chip vendors, and third party IP providers. [5]

Intellectual Property

Intellectual property is the key to coping with the increasing complexity of designs. The platform-based methodology utilizes IP tremendously and usually the vendor of the platform provides an IP library. Examples of IP are embedded operating systems, real-time databases, and middleware. Middleware is the software that is the intermediate layer between the operating system and the application software. [3]

There are three phases of FPGA development where IP can be purchased and used, as shown in Figure 11. The phases are, from the earliest to the latest in the design flow, Register Transfer Level (RTL), Unplaced and Unrouted Netlist Level (UUNL), and Placed and Routed Netlist Level (PRNL). When the IP is purchased and used at the RTL phase, the block can be reused on another platform if the IP is unencrypted. Most vendors do not want the internal design to be available to the users; therefore most IP is not available in unencrypted form at such an early phase of development. When IP is utilized at the other phases of FPGA development, the IP is in encrypted form, thus viewing and modifying the source code is out of the question.

The advantages and disadvantages of using the IP at the three phases are summarized in Table 3. Figure 11 shows how the IP is incorporated into the FPGA design flow. The stages are labeled as (a), (b), and (c) in Figure 11. The designer needs to make appropriate choices when selecting which IP to purchase.

Figure 11: Intellectual Property Incorporation into the FPGA Design Cycle [5]

Stages of Software Development

When beginning software development, four key tasks are performed along the design path to a final product. These four stages, in order of development, are software design and entry, compilation and/or assembly, linking, and execution.


Table 3: Intellectual Property Added at Different Phases – Advantages and Disadvantages

| Phase | Advantages | Disadvantages |
|---|---|---|
| RTL Level (a) | Can modify code to remove unnecessary functions motivated by design | 1. Expensive and hard to find; 2. Implementation is less efficient in terms of resource requirements and performance |
| Unplaced-and-Unrouted Netlist (b) | Optimal implementation in terms of resource utilization and performance | 1. Cannot remove unwanted functionality; 2. Device tied to FPGA vendor and device family |
| Placed-and-Routed Netlist (c) | 1. Obtain highest level of performance; 2. IP vendor provides cycle-accurate C/C++ model for functional verification | 1. Placement for elements may be absolute; 2. Cannot modify the code to remove extra functionality |

The software design and entry stage deals with the choice of which programming language to develop the system with. The conversion of the algorithm into a programming language is completed at this phase. The next step is the compilation and/or assembly of the program. The compilation transforms the code into assembler modules so that they can be linked to the appropriate libraries in the next step. The linking step compares commands and variables that could not be resolved earlier against the run-time libraries. If the commands and variables match entries in the libraries, they are linked during this stage. If not, the unresolved commands will yield an error in the software. Once these three stages are properly performed, the last step is execution of the program.


The software design and entry stage contains the choice of programming language. The designer is able to use a high level language and compile it in the next stage of development, or the designer could code the algorithm directly in assembly or machine language. The second approach is more time consuming and demands considerable effort from the programmer/designer. As design complexity increases, using low level languages should be avoided. If a supported processor core is used, the software tools and compilers are readily available when designing and thus high level languages are the more appropriate choice. If a custom processor is used by the designer, the lack of software resources could push the programmer/designer to implement the algorithm in assembly or machine language. Sometimes simulators also accompany the processor when purchased. If not, the designer can create an instruction set simulator if he or she knows the instruction set of the embedded processor. The usefulness of simulation was also emphasized earlier. Prototyping is important, but if simulation is performed first, the software is far more likely to perform as expected when run on a prototype.
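An instruction set simulator need not be elaborate to be useful. The sketch below simulates a hypothetical accumulator machine with five invented instructions (LOADI, ADDI, STORE, JNZ, HALT); it does not correspond to any real processor and is meant only to show the fetch, decode, and execute structure such a simulator follows.

```python
def simulate(program, max_steps=1000):
    """Run a program for a hypothetical accumulator machine."""
    acc = 0      # accumulator register
    pc = 0       # program counter
    memory = {}  # data memory, address -> value
    for _ in range(max_steps):
        opcode, operand = program[pc]  # fetch
        pc += 1
        if opcode == "LOADI":          # load an immediate value
            acc = operand
        elif opcode == "ADDI":         # add an immediate value
            acc += operand
        elif opcode == "STORE":        # store the accumulator to memory
            memory[operand] = acc
        elif opcode == "JNZ":          # jump if the accumulator is not zero
            if acc != 0:
                pc = operand
        elif opcode == "HALT":
            return acc, memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    raise RuntimeError("step limit reached; program may not terminate")


# Count down from 3 to 0, then store the result at address 0.
countdown = [
    ("LOADI", 3),
    ("ADDI", -1),
    ("JNZ", 1),
    ("STORE", 0),
    ("HALT", None),
]
final_acc, final_memory = simulate(countdown)
```

The step limit is a design choice for the simulator itself: it lets software bugs such as endless loops surface as an error on the host machine rather than as a hung prototype.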

Compilation results in the source code being converted into assembler modules ready for the linker. Symbol and relocation tables are created to hold the occurrences in the source code that remain unresolved: the compiler could not resolve some of the variables and commands, and thus stores them in these tables. The symbol table contains the data types and variables that are used throughout the program but not defined. The relocation table is set up for the linker to populate; any references to external routines are specified in it. Object files are created by the compilation process.

When a program comprises multiple object files, the linker combines these files into a unified executable program, resolving the symbols as it goes along. The object files are taken by the linker and the libraries are searched in order to find the routines that the program references. The linker also takes care of arranging the objects in a program's address space. This may involve relocating code that assumes a specific base address to another base. Since a compiler seldom knows where an object will reside, it often assumes a fixed base location (for example, zero). If an error occurs when linking or compiling the program, the designer needs to modify the code and perform the second and third steps again. Once the program is functioning, execution can begin.
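The two jobs of the linker described above, arranging objects in the address space and resolving symbols, can be sketched with a toy model. The dictionary layout used for object files here is invented for illustration and is far simpler than any real object file format.

```python
def link(object_files, base_address=0):
    """Place object files in memory and resolve their symbol references."""
    symbol_table = {}
    layout = []
    address = base_address
    # Pass 1: assign each object a start address and record its symbols.
    for obj in object_files:
        layout.append((obj, address))
        for name, offset in obj["defines"].items():
            if name in symbol_table:
                raise ValueError(f"duplicate symbol: {name}")
            symbol_table[name] = address + offset
        address += obj["size"]
    # Pass 2: patch every relocation entry with the final symbol address.
    patches = {}
    for obj, start in layout:
        for offset, name in obj["relocations"].items():
            if name not in symbol_table:
                raise LookupError(f"unresolved symbol: {name}")
            patches[start + offset] = symbol_table[name]
    return symbol_table, patches


# Hypothetical objects: main calls delay, which lives in a library object.
main_obj = {"size": 16, "defines": {"main": 0}, "relocations": {4: "delay"}}
lib_obj = {"size": 8, "defines": {"delay": 0}, "relocations": {}}

symbols, patches = link([main_obj, lib_obj], base_address=0x100)
```

Both objects were written as if they started at address zero; the linker relocates them to 0x100 and 0x110 and patches the call in main accordingly. Linking main_obj alone fails with an unresolved symbol, which corresponds to the linker error discussed above.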

Embedded Operating System

Except for simple applications, the usage of embedded operating systems assists in the design and implementation process. Embedded operating systems offer support for scheduling, task switching, and I/O handling. Embedded operating systems (EOSs) are essentially operating systems for embedded devices. The constraints when developing an EOS differ considerably from those for PC operating systems. EOSs are designed to be very compact and efficient, and many features of non-embedded operating systems are forsaken. Usually EOSs are developed to cater to specialized applications, therefore the features are limited. The typical characteristics of an embedded operating system are [3]:


1. Configurability

2. Lack of Protection Mechanisms

3. Limited I/O Drivers

4. Interrupts Enabled by any Process

Embedded applications are very diverse and their requirements differ. An embedded operating system is useful across many embedded systems only when it is flexible enough to be tailored to the application at hand.

Configurability is therefore a key characteristic that makes embedded operating systems versatile across many designs. [3] Choosing an existing operating system also spares the designer the time and effort of creating some low-level drivers.

The embedded operating systems available are usually deemed reliable because of the thorough testing they undergo. [3] In addition, only safe programs are loaded onto the operating system. Protection mechanisms are important for safety and security, but most embedded operating systems do not have them. Protection also consumes memory, which increases the storage capacity the operating system requires. The embedded operating system needs to occupy as little memory as possible while still performing its tasks correctly. Because the operating system usually occupies the majority of the available memory, its use has to be weighed carefully. If the application requires software capabilities beyond those the operating system provides, the remaining memory becomes a major consideration.

Embedded systems usually lack interface devices such as a keyboard, screen, or mouse, although some applications do require such interfaces. The drivers needed for I/O are not usually integrated into the kernel of the operating system. Because of the large variety of peripheral devices that can be used in embedded systems, the designer adds the necessary drivers instead of relying on the embedded operating system to contain every possible driver.

Since embedded operating systems and embedded software are thoroughly tested, any process can be allowed to use interrupts. These processes can raise interrupts that stop and restart running tasks. Testing is essential because the software must never reach a state from which it cannot return to normal operation. For that reason, some systems grant interrupt rights only to certain processes, such as I/O. For greater reliability, the interrupt state should terminate after a certain time has elapsed, so that the system continues operating even if testing did not uncover every bug.
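The idea of terminating the interrupt state after a time limit can be sketched as follows. This is a simplified model rather than real interrupt handling: the handler is represented as a sequence of steps, each step is assumed to cost one timer tick, and the names service_interrupt and max_ticks are hypothetical.

```python
import itertools

def service_interrupt(steps, max_ticks):
    """Run handler steps until they finish or the tick budget is used up.

    steps: iterable of zero-argument callables, one per assumed timer tick.
    Returns ("completed", n) or ("timed_out", n), where n is the number of
    steps that were actually executed.
    """
    executed = 0
    for step in steps:
        if executed == max_ticks:
            # Budget exhausted: leave the interrupt state so that normal
            # operation can resume even though the handler is not done.
            return "timed_out", executed
        step()
        executed += 1
    return "completed", executed


log = []
well_behaved = [lambda: log.append("acknowledge"), lambda: log.append("read")]
stuck = itertools.repeat(lambda: None)   # a faulty handler that never ends

normal_result = service_interrupt(well_behaved, max_ticks=5)
faulty_result = service_interrupt(stuck, max_ticks=3)
```

The well-behaved handler finishes inside its budget, while the faulty one is cut off after three ticks and control returns to the caller.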

Embedded systems are used by ordinary people every day. The operating system used within those devices rarely enters anyone’s mind. In order for those technologies to operate, the embedded operating system plays a large role. Examples of embedded operating systems and their typical uses are outlined in Table 4.

Embedded systems that are real-time systems require a real-time operating system (RTOS). A real-time system is an embedded system that is subject to a real-time constraint: it must complete its task within a time limit. The constraint usually follows from the application, and the deadline should be met regardless of the load on the system. An RTOS assists in developing real-time embedded systems because of its multitasking capabilities.


Table 4: Common Embedded Operating Systems and Applications

Operating systems: Windows CE, Inferno, Embedded Linux, iPhone OS, Pixo, eCos, uClinux

Application categories: PDAs, Music Players, Smartphones, Real-Time, Microcontroller

In order for an operating system to be an RTOS, three requirements need to be satisfied. The first requirement is that the timing behavior of the operating system must be predictable. For each operation, an upper bound on the execution time must be guaranteed in order for the real-time constraints of the system to be met. There are times when interrupts need to be disabled so that a process can complete its task without interference. The periods in which interrupts are disabled have to be short in order to avoid unpredictable delays in the processing of critical events. [3] The second requirement is that the RTOS should handle the timing and scheduling of tasks. Scheduling means the mapping of processes to intervals of execution time. When the operating system is in charge of scheduling, the task deadlines can be managed and met. The third requirement is that the RTOS should be fast, because of the real-time applications it caters to.
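To make the notion of scheduling concrete, the sketch below maps tasks to unit time slots and reports any missed deadlines. It uses earliest-deadline-first (EDF) ordering, which is one common policy chosen here purely for illustration; the text does not prescribe a particular policy. The task names, release times, execution-time bounds, and deadlines are hypothetical.

```python
import heapq

def edf_schedule(tasks):
    """Simulate earliest-deadline-first scheduling in unit time slots.

    tasks: list of (name, release_time, wcet, deadline). wcet is the
    guaranteed upper bound on execution time (a positive integer) and
    deadline is an absolute time. Task names are assumed to be unique.
    Returns (timeline, missed): the task run in each slot (None when idle)
    and the names of tasks that finished after their deadline.
    """
    if any(wcet <= 0 for _, _, wcet, _ in tasks):
        raise ValueError("every task needs a positive execution-time bound")
    pending = sorted(tasks, key=lambda t: t[1])   # ordered by release time
    ready = []                                    # heap of [deadline, name, remaining]
    timeline, missed = [], []
    now, i = 0, 0
    while i < len(pending) or ready:
        # Move every task that has been released into the ready heap.
        while i < len(pending) and pending[i][1] <= now:
            name, _, wcet, deadline = pending[i]
            heapq.heappush(ready, [deadline, name, wcet])
            i += 1
        if not ready:
            timeline.append(None)                 # idle slot
            now += 1
            continue
        job = ready[0]                            # earliest deadline runs
        timeline.append(job[1])
        job[2] -= 1
        now += 1
        if job[2] == 0:
            heapq.heappop(ready)
            if now > job[0]:
                missed.append(job[1])
    return timeline, missed


tasks = [("sample", 0, 2, 4), ("control", 1, 1, 3), ("log", 0, 2, 8)]
timeline, missed = edf_schedule(tasks)
```

In this example the "control" task preempts "sample" at time 1 because its deadline is earlier, and all three tasks still finish on time. Because the simulation relies on the execution-time bounds, it only predicts deadlines correctly when those bounds are actually guaranteed, which is the first requirement stated above.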

The RTOS usually contains a real-time kernel. The kernel is the resource manager in an operating system; it manages the processor, memory, and system timer. [3] Protection mechanisms are not usually included, as discussed previously. There are two types of kernels: a general-purpose type and a real-time type. [3] The differences between the two can be seen in Figure 12. The main difference lies in the device drivers. In the general-purpose type, the device drivers are embedded within the kernel. In the real-time type, only the necessary drivers are included, and they are implemented on top of the kernel rather than embedded in it.

Middleware, defined previously in this text, is the intermediate software between the operating system and the application software. The distinction between operating systems and middleware is not sharply defined: some functionality that was previously provided by separate middleware is now integrated into operating systems.

Xilinx and Altera Software Tools

A brief overview of the Xilinx and Altera software tools is provided here. Knowledge of the tools used in industry for developing FPGA prototypes is essential for developing well-designed prototypes or even final implementations. Xilinx and Altera are the two main FPGA vendors, and both provide software designed to assist the designer with FPGA design considerations.

Figure 12: Real-time kernel (left) vs. general-purpose operating system (right) [3]


The Nios II Integrated Development Environment (IDE) is the main software tool provided by Altera for designing with Altera FPGAs. It is mainly used when the FPGA is capable of housing the Nios II soft-core embedded processor. The Nios II IDE provides the following main functions for software development:

1. Project Manager

2. Editor and Compiler

3. Debugger

4. C-to-Hardware (C2H) Acceleration Compiler

5. Flash Programmer

6. Nios II Software Build Tools

The Nios II IDE is based on the open, extensible Eclipse IDE project and the Eclipse C/C++ Development Tools project. The project manager automates the setup of the C/C++ application project and system library projects. The Nios II IDE provides software code examples, in the form of project templates, to bring up working systems as quickly as possible. Each template is a collection of software files and project settings. Custom source code can be added by placing it in the project directory or by importing the files into the project. A full-featured source editor and C/C++ compiler are included in the Nios II IDE.

The Nios II IDE build environment automatically produces a makefile based on the specific system configuration. Changes made to the compiler and linker settings in the Nios II IDE are automatically reflected in this auto-generated makefile. The Nios II IDE debugger offers basic debug features such as software breakpoints, a disassembly code view, a debug information view, and an instruction set simulator (ISS) target. Advanced debugging capabilities are also supported: hardware breakpoints are available for debugging code in ROM or flash, and data triggers and instruction trace are also useful for debugging the embedded system software.

Figure 13 shows the debug information view that provides access to local variables, registers, memory, breakpoints, and expression evaluation functions.

Figure 13: Debug Information View in Nios II IDE

The C-to-Hardware (C2H) Acceleration Compiler boosts the performance of time-critical C functions by converting them into hardware accelerators in the FPGA. Hardware accelerators can take full advantage of the parallel processing structure of the FPGA to perform more computations per clock cycle than general-purpose CPUs and can deliver orders-of-magnitude increases in performance.


The Nios II IDE also includes a convenient method of programming flash memory. Any common flash interface (CFI)-compliant flash device connected to the FPGA can be programmed using the Nios II IDE flash programmer. In addition to CFI flash, the flash programmer can program any Altera serial configuration device connected to the FPGA.

The Nios II software build tools allow the creation, modification, and building of Nios II programs entirely from the command line. The supported project types include C/C++ application projects, library projects, and board support package (BSP) projects. Embedded software partners provide additional compiler and debugger tools.

Xilinx supplies a wide variety of embedded systems design products to improve the development process and shorten the time-to-market. The term “embedded software tools” usually refers to the tools used to create, edit, compile, link, load, and debug high-level language code, usually C or C++, for execution on a processor engine. With the Virtex Platform FPGA, engineers can target design modules for either silicon hardware (FPGA gates) or software applications that run on the embedded PowerPC or MicroBlaze. The embedded design products provided by Xilinx are the Embedded Development Kit (EDK), which contains Platform Studio, and ChipScope Pro.

The Embedded Development Kit is an all-encompassing solution for designing embedded programmable systems. This preconfigured kit includes the Platform Studio Tool Suite, documentation, and IP. EDK is used for designing with Xilinx Platform FPGAs that have embedded IBM PowerPC hard processor cores and/or Xilinx MicroBlaze soft processor cores.

ChipScope Pro allows analysis of any internal FPGA signal, including embedded processor buses. The software allows verification of the FPGA on the board at or near operating speed. ChipScope Pro leverages FPGA re-programmability: problems are identified and the design is adjusted in minutes or hours, not weeks or months as in traditional ASIC design. A built-in software logic analyzer, with advanced triggering, filter, and display options, helps identify and debug problems.

The Xilinx PowerPC and MicroBlaze processor environments can run embedded or real-time operating systems, and Altera’s Nios processor also supports embedded operating systems. Support for these EOSs and RTOSs is not provided by Xilinx or Altera; it is supplied directly by the third-party vendors.

Embedded software development requires high-level language compiler and Integrated Development Environment (IDE) support. Xilinx and its partners provide a number of different development environments for the creation of embedded software. Development and debug tools need to be connected from host computers to the embedded target systems under development. JTAG probes for Xilinx tools can be used to download software code and FPGA configurations. They can also perform a range of other functions, such as software debug, register initialization, hardware diagnostics, flash programming, and hardware breakpoints.

The Xilinx and Altera software tools greatly assist in the design and development of FPGA-based applications. Further support for the tools and development boards is available online.


CHAPTER 5

RAPID PROTOTYPING OF EMBEDDED SYSTEMS

Today, there are approximately 25 billion embedded processors in use, and the number is growing continually. [6] Embedded processors are found in many devices and are virtually everywhere. Embedded chips appear to belong in the hardware domain but are actually software products, since microprocessors and DSPs make up the majority of embedded processors in use. According to [6], “the market for software-dominated embedded systems is enormous and growing steadily.” There is a shift away from dedicated hardware and toward flexible systems based on small processors and software.

Combining computing elements, software, fixed hardware, and reconfigurable hardware into an embedded system requires that functional prototypes be built and refined throughout the design phases. [6] The prototypes developed help determine the customer requirements and feature preferences. Testing and design verification can also be performed on them to ensure the design functions properly before final implementation. A prototype can be connected to external I/O interfaces so that customers can see the product features and learn which of them will appear in the final product. Ideally the prototype is indistinguishable from the final product in the features it demonstrates, so that the feedback obtained leads to a final product that functions as the customers expect.

Rapid prototyping is the activity in which a technical system is developed from the specifications and operated for evaluation, testing, and refinement. Prototyping is invaluable for design engineers of embedded systems.


Rapid System Prototyping

The majority of embedded systems developed today are multifunctional. Most modern embedded systems contain video processing, audio streaming, and internet browsing. The ability of the system to adapt to different operating modes is a crucial consideration during development. System prototyping benefits from FPGA technology with high gate counts. The designer is able to map the prototype design developed in the FPGA onto a one-time programmable or ASIC device. Many systems are developed this way because only one code base needs to be written for both the prototype and the final product. [6] Using this methodology, the step from prototype to product is inexpensive and quick. Because reconfigurable devices such as the FPGA can be reused, prototyping with them conserves resources: the same device can host many different embedded system prototypes at separate times.

The type of prototype developed depends on the development cycle chosen. Prototypes can either be developed as temporary designs that are used for evaluation, or they can evolve directly into a product version of the design. The three main types of prototypes are throw-away, incremental, and evolutionary prototypes.

As the name implies, throw-away prototypes are discarded after use. The prototype is used mainly during the requirements phase and only rarely during the implementation phase, where it serves as a way to explore technology capabilities. The throw-away prototype is most useful when a potential project is being analyzed: it demonstrates the feasibility of the concept and can be used to convince possible funders. The prototype has to be a low-cost design in case the project isn’t feasible. The disadvantage is that the prototype does not contribute to the final product, so the implementation effort is not used to its full potential.

Documentation for throw-away code is commonly abbreviated or skipped. This is harmful because the lessons learned from the prototyping effort are lost if they are not recorded, and the final implementation might then contain misunderstandings that require extra time to resolve. [7]

Incremental prototypes are a more sophisticated development approach. [7] As development progresses, the prototype moves closer to the final product. It begins with simulated components, progresses to a mix of implemented and simulated components, and becomes the first version of the new product once all the components are implemented. The prototype requirements and design components are defined as early as possible in the development process. The methodology for incremental development is easy to follow and can lead to a successful final product. The drawback is that the prototype and component design are heavily relied upon. [7] If the design isn’t robust, the final product will be lacking. Therefore, for incremental prototyping the components and architecture should be well designed.
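In software terms, the progression from simulated to implemented components relies on both versions of a component sharing one interface. The sketch below illustrates that idea only; the class names, the sensor example, and the 0.25 scale factor are hypothetical and are not taken from [7].

```python
class SimulatedTemperatureSensor:
    """Stand-in used early in the prototype; returns a fixed value."""
    simulated = True

    def read_celsius(self):
        return 25.0


class ImplementedTemperatureSensor:
    """Hypothetical real component that converts a raw count to degrees."""
    simulated = False

    def __init__(self, read_raw_count):
        self._read_raw_count = read_raw_count

    def read_celsius(self):
        return self._read_raw_count() * 0.25   # hypothetical scale factor


class Prototype:
    """Holds named components that all expose the same interface."""

    def __init__(self, components):
        self.components = dict(components)

    def replace(self, name, component):
        if name not in self.components:
            raise KeyError(name)
        self.components[name] = component

    def simulated_components(self):
        return sorted(n for n, c in self.components.items() if c.simulated)

    def is_first_product_version(self):
        # The first product version exists once nothing is simulated.
        return not self.simulated_components()


prototype = Prototype({"sensor": SimulatedTemperatureSensor()})
before = prototype.is_first_product_version()
prototype.replace("sensor", ImplementedTemperatureSensor(lambda: 100))
after = prototype.is_first_product_version()
```

Because the rest of the prototype only calls read_celsius, swapping the simulated sensor for the implemented one requires no other change, which is why a robust component design matters so much in this approach.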

Figure 14 displays the incremental development model, with the prototyping steps that lead to the final operational device. The model is broken into four main steps: determine objectives, identify and resolve risks, develop and test, and plan the next iteration. The requirements plan and risk analysis lead directly into prototype development. Following prototype development, Figure 14 shows that the prototype is validated and verified and that, based on the information obtained, the next prototype development is planned. Due to time-to-market constraints, it is critical that the number of prototypes developed is kept to a minimum.

Figure 14: Incremental Development Model [9]

Evolutionary prototypes are slightly different from incremental prototypes. The evolutionary prototype starts with a model that is verified and validated, and the implementation is developed from the model at the prototype level. This approach is more flexible than the incremental approach because the architecture can change from one version to the next (the code is generated automatically). [7] Evolutionary prototyping depends on sophisticated tools and techniques, which are currently emerging.


Rapid prototyping is different from Rapid Application Development (RAD) or Extreme Programming (XP). RAD is a technique that aims at a quick implementation and focuses on short-term issues. RAD or XP rarely produces code that is reusable and maintainable, and the documentation for the implementation is limited. In hardware systems, the prototypes developed are never sold, and the initial cost of developing them is lost. Software systems allow maintenance to be performed on the code, but the simulation process can be slow. Combined software-hardware techniques such as FPGA-based development allow quick elaboration of a prototype that can be used for higher-performance evaluation of the system. Companies use prototyping as a way to combat the challenges faced in embedded system design and development.

The motivation for using a prototype varies according to the application and the company’s interest. According to [7], there are two levels of prototyping: one that reduces the time-to-market and cost of a system, and another that increases the security and reliability of a system. The demand for rapid system prototyping is outlined in Table 5. [7]

The demand for hard real-time embedded systems is increasing, and the strict requirements of accuracy, safety, and efficiency are emphasized more than ever. Meeting timing constraints is a significant part of developing real-time embedded systems. The requirements of the system consist of the constraints imposed by the application, and the feasibility of the system depends on its ability to meet the timing constraints under worst-case operating conditions. [8] Prototyping is especially useful for real-time embedded system design because it is difficult to meet the timing requirements without extensive prototyping and testing.

Table 5: Demand for Rapid System Prototyping

Interest: Level 1
Description: Reduction of cost and time-to-market of a system
Systems: Complex Systems – embedded, distributed, real-time, etc.
Benefits: Cost of skilled engineers increases rapidly. Automated development could reduce the need for skilled engineers; standard engineers can work the prototyping tools.

Interest: Level 2
Description: Increases security and reliability of a system
Systems: Safety Critical Systems
Benefits: Prototyping allows formal verification to be operated when required. This method allows high levels of reliability in system design and implementation.

Prototyping is considered at both the hardware and software level. Although the techniques and constraints differ in hardware and software communities, the objectives and methodologies can be quite similar. In order for rapid prototyping to be successful, the development process has to be well-defined as well as the prototype architecture or design. The design of the prototype relies on the convergence of hardware and software prototyping methods.


Prototyping of Embedded Hardware and Software Systems

Prototyping plays a major role in the design process in terms of decision making, design validation, feature exploration, and design verification at all stages of the product development cycle. Prototyping branches into key research areas: platforms, design space exploration, architecture selection, performance estimation, interfaces and communications, timing validation, and product lifetime extension. [6]

As discussed previously, rapid prototyping establishes an iterative process between the user and the designer to concurrently define requirements and specifications for the critical aspects of the envisioned system. The prototype is a partial representation of the final system, usually only containing the key attributes that require user feedback and evaluation. The prototype is demonstrated and the user or customer evaluates if the expected and actual behavior match. If certain requirements weren’t met, then the designer adjusts the designs and modifies the prototype, in the case of incremental and evolutionary prototyping. This process continues until the customer is satisfied and final implementation can begin. The process mentioned is typically how prototyping is utilized. The other forms of prototyping will yield different processes. Therefore, the design steps to develop a prototype vary depending on the type of prototype being used. The steps that are common to all prototypes will be discussed, with emphasis on how the hardware and software aspects are combined into the process.

The design begins with defining the project, and the funders need to be convinced of the need for and feasibility of the product. Once the project has obtained the necessary funding, the design development cycle is chosen. There are many development cycles, but the most popular were discussed earlier. Generally, the development cycle begins with the requirements engineering phase, in which the specifications for the product are created. It is important to integrate the customers and stakeholders into the specification process because tailored solutions are needed. According to [6], a survey showed that products were not well received when customer expectations weren’t met. A prototype developed during the requirements phase can be used as a tool to uncover the client’s subconscious expectations. The next steps are planning, designing, and constructing the system product. These steps are broken into intermediate stages in which the prototype is developed and evaluated, followed by modifications of the specification and implementation, leading to the final product development. Throughout all phases of the engineering process, prototyping plays a major role because implementation details of the embedded system are worked out and validated prior to the final implementation. In general, prototyping is the sum of all the activities required to build and evaluate a prototype in a real-world environment.

The prototyping platform should allow a working prototype to be implemented quickly, and choosing the right platform is one of the first steps when prototyping. The platform and processor need to be chosen according to the application. When specialized high-performance hardware is required, the funds available may be a limiting factor for useful prototyping. Designers can also create custom platforms consisting of DSPs, FPGAs, microcontrollers, etc. The FPGA is useful for prototyping due to its versatility and short design cycle time. Based on the design limits placed by the consumers, a prototype architecture that accurately reflects the final design needs to be chosen. A design implemented mostly in hardware consumes a lot of power, while a software-dominated design requires a high system clock. The best choice for the prototype architecture lies between these extremes. Performance estimation is important, yet it is almost impossible to measure the true performance of an embedded hardware/software system by means of simulation and/or emulation. A working embedded prototype must be built to determine bottlenecks and validate the design.

Real-time systems are more difficult to prototype and develop. Generally, prototypes are used to test the functionality of the design. For real-time systems, the prototype has to do more than ensure that the functionality and requirements are met; it also has to show that the timing constraints are satisfied. The steps in the high-level design process are shown in Figure 15.

Today’s computer systems typically consist of both hardware and software components. For instance, in an embedded signal processing application it is common to use both application-specific hardware accelerator circuits and general-purpose, programmable units with the appropriate software. This is beneficial because application-specific hardware is usually much faster than software, but it is also significantly more expensive. Software, on the other hand, is cheaper to create and to maintain, but slower. Hence, performance-critical components of the system should be realized in hardware, and non-critical components in software. A good trade-off between cost and performance can be achieved this way.

However, this kind of system design is not without challenges. The usual hardware and software design methodologies are in many respects inadequate for such design tasks.


Figure 15: Prototyping Design Cycle


The composition of hardware and software elements also creates new problems, e.g. in the communication between hardware and software components, as well as system architecture issues. In order to address these problems, hardware-software codesign (HSCD) methods have to be used. One of the most crucial design steps in HSCD is partitioning, which means deciding which components of the system should be realized in hardware and which in software. Clearly, this is the step in which the above-mentioned optimal trade-off between cost and performance is to be found. Therefore, partitioning has a dramatic impact on the cost and performance of the whole system.
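The partitioning decision can be illustrated with a small search. The sketch below assumes a deliberately simple cost model that is not taken from the text: each component has a software execution time, a hardware execution time, and a hardware area cost, and the goal is the lowest total execution time within an area budget. The component names and all numbers are hypothetical, and the exhaustive search is only practical for a handful of components.

```python
from itertools import combinations

def partition(components, area_budget):
    """Choose the components to realize in hardware by exhaustive search.

    components: dict mapping name -> (sw_time, hw_time, hw_area).
    Returns (hardware_set, total_time) with the lowest total time whose
    hardware area fits within area_budget.
    """
    names = sorted(components)
    # Baseline: everything in software, which uses no hardware area.
    best_set = frozenset()
    best_time = sum(sw for sw, _, _ in components.values())
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            area = sum(components[n][2] for n in subset)
            if area > area_budget:
                continue
            total = sum(components[n][1] if n in subset else components[n][0]
                        for n in names)
            if total < best_time:
                best_set, best_time = frozenset(subset), total
    return best_set, best_time


components = {
    "fir_filter": (90, 10, 40),   # (sw_time, hw_time, hw_area)
    "fft": (120, 20, 70),
    "ui": (15, 12, 30),
    "protocol": (30, 25, 20),
}
hardware, total_time = partition(components, area_budget=100)
```

With a budget of 100 area units the search moves "fft" and "protocol" into hardware; raising the budget to 110 makes room for both signal-processing blocks instead, which shows how sensitive the outcome is to the cost constraint.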

Prototyping of embedded hardware/software systems is important because it shortens the path from specification to final product. Prototyping is part of product planning, requirements engineering, and product development. The trends that dominate hardware/software embedded system prototyping are the advent of very large reprogrammable devices with other processing elements on a single chip, or system on a chip, the expansion of embedded systems into virtually all electronic application fields, and the reuse of preexisting components together with new hardware and software. These trends will influence the need for rapid system prototyping, and make it a permanent part of the embedded system design process.


CHAPTER 6

BOARD-LEVEL RAPID PROTOTYPING OF EMBEDDED SYSTEMS

The selection of a prototype platform requires careful attention because the application specifications will influence the design partitioning. The platform should allow the application to be implemented with any range of partitioning choices. The FPGA is a useful tool because of the flexibility it provides in hardware and software partitioning: the partitioning can be revised as the requirements are modified during the iterative prototyping cycle. Using a microcontroller for prototyping allows only the software to change because the hardware is fixed. An ASIC with programmable hardware has a long design cycle time, and thus FPGAs are a more efficient choice of prototyping chip. The FPGA prototype developed can be mapped onto the final implementation, which can be made using ASICs, DSPs, and microcontrollers.

Board Level Prototyping Methodology

The prototype developed should be flexible because the design specifications are subject to change along the entire design flow. Board-level prototyping refers to creating the prototype on a development platform that is usually available off the shelf. Custom boards can also be designed for the needs of the application prototype. The development boards provide vendor support and many input and output ports. The support available for the board features greatly shortens the prototype implementation time. It comes in the form of intellectual property, compilers, instruction set simulators for the processor, debugging tools, and software for efficient hardware and software design and development. Custom boards are useful but usually lack the support offered by commercial off-the-shelf development boards.

Two development boards were analyzed and used for the development of the test application. The two boards vary considerably in terms of the performance of the chip, the peripherals available, the internals of the chip, and the embedded processor options. Clearly, cost increases as the peripherals and processing power increase. The two development boards chosen are from the two main vendors of FPGAs, Xilinx and Altera. The Xilinx development board is built around the Virtex 4 FPGA, and the Altera development board has the Cyclone II FPGA. The architectures of these FPGAs differ, but both are built on the common FPGA architecture components discussed in Chapter 3.

Prototyping Platforms using FPGAs

Altera DE2 Development and Education Board

The Altera DE2 board is a student development board that contains the Cyclone II FPGA chip. The board and the hardware it supports are shown in Figure 16. The software tools available for the Altera DE2 board are Altera Quartus, SOPC Builder, and the Nios II IDE. Quartus is the foundation for FPGA logic design. The SOPC (system on programmable chip) Builder is accessed through Quartus and is used to customize the Nios II soft processor core. Once the hardware has been specified, the C/C++ software application can be developed using the Nios II IDE.

Figure 16: DE2 Development Board [10]

The peripherals of the development board range from the JTAG programming interface to the VGA ports. The FPGA, as shown in Figure 16, is the Cyclone II FPGA. The board provides two on-board clock sources, a 50 MHz oscillator and a 27 MHz oscillator, and the appropriate frequency can be chosen when designing. A phase-locked loop can be used to derive a different frequency from the oscillator frequencies. An external clock input is also accommodated by the development board, which gives the designer more flexibility in the applications that can be prototyped using the board.
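The arithmetic behind deriving a new frequency with a phase-locked loop can be sketched as follows. The sketch assumes the idealized relation f_out = f_in × M / N with integer multiply and divide factors; the limit of 32 on each factor is a hypothetical value chosen for the example and is not a documented Cyclone II constraint. Real PLLs impose further restrictions, such as an allowed internal oscillator range, that are ignored here.

```python
from fractions import Fraction

def pll_settings(f_in_hz, f_target_hz, max_factor=32):
    """Find integer factors (M, N) so that f_in * M / N is closest to the target.

    max_factor is a hypothetical limit on both factors. When several pairs
    give the same error, the pair with the smallest M found first is kept.
    Returns (M, N, f_out) with f_out as an exact Fraction in hertz.
    """
    best = None
    for m in range(1, max_factor + 1):
        for n in range(1, max_factor + 1):
            f_out = Fraction(f_in_hz * m, n)
            error = abs(f_out - f_target_hz)
            if best is None or error < best[0]:
                best = (error, m, n, f_out)
    _, m, n, f_out = best
    return m, n, f_out


# Derive 100 MHz and 75 MHz from the 50 MHz oscillator,
# and 108 MHz from the 27 MHz oscillator.
double_rate = pll_settings(50_000_000, 100_000_000)
three_halves = pll_settings(50_000_000, 75_000_000)
from_27mhz = pll_settings(27_000_000, 108_000_000)
```

Exact fractions are used instead of floating point so that a target that can be reached exactly is reported with zero error.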

Expansion headers are another part of the development board that gives the designer flexibility; for example, the designer can connect to IDE hard drives using the expansion headers. The peripherals and their connections to the Cyclone II FPGA are shown in Figure 17. The figure also indicates the programming methods available on the DE2 board.

Figure 17: DE2 Development Board Peripherals and FPGA [10]

The Cyclone II FPGA can be programmed by way of the JTAG programming method or the EPCS16 configuration device programming method. JTAG programming is performed by downloading a configuration bit stream directly into the Cyclone II FPGA through a USB cable. When the device is programmed using this method, the FPGA retains the configuration only as long as power is applied to the board. The configuration stored in the EPCS16 device, in contrast, is loaded when power is turned on. Figure 18 shows how JTAG programming is performed. The Quartus II programmer connects to the USB Blaster circuit on the DE2 board, and when the switch on the board is set to “RUN”, the configuration signals are sent to the FPGA. On startup, the default setting is for the serial configuration device to load the configuration files.

Figure 18: JTAG Programming of Cyclone II FPGA

Active Serial (AS) programming is performed by downloading the configuration bit stream into the Altera EPCS16 serial EEPROM chip. The chip provides non-volatile storage of the bit stream, so the information is retained even when the power supply to the DE2 board is turned off. As mentioned before, when the board's power is turned on, the configuration data in the EPCS16 device is automatically loaded into the Cyclone II FPGA.

This method of programming the FPGA ensures that the designer doesn’t have to reload the design every time power is switched off. Figure 19 illustrates the process the designer uses to program the FPGA with the active serial method. The Quartus programmer must first be put in AS mode; the USB Blaster cable is also used for this programming method. The switch on the DE2 board has to be set to “PROG” mode in order for the serial configuration device to be programmed.

Figure 19: Active Serial Programming of Cyclone II FPGA

The Cyclone series of FPGAs (Figure 20) all support Nios II embedded processors, graphics hardware acceleration, memory interfaces, and the Avalon switch fabric (ASF). The embedded processor available for the Cyclone II FPGA is a soft-core processor. Nios II processors implement a 32-bit instruction set based on a RISC architecture. Because it is a soft-core processor, FPGA developers can choose from a myriad of system configurations, picking the best-fit CPU core as well as selecting processor peripherals.

There are three types of Nios II CPU cores. The choice of CPU core lets the designer tailor the design to the demands of the application. Table 6 summarizes the types of cores and the features of each one.

Figure 20: Cyclone FPGAs

Table 6: Nios II CPU Cores and Key Features

Nios II/f (fast)

1. Designed for maximum performance at the expense of core size
2. Single-cycle hardware multiply and barrel shifter
3. Optional hardware divide
4. Dynamic branch prediction

Nios II/e (economy)

1. Designed for the smallest possible logic utilization of the FPGA

Nios II/s (standard)

1. Designed to maintain a balance between performance and cost
2. Static branch prediction
3. Hardware multiply, divide, and shift options

The soft-core nature of the Nios II processor lets designers integrate custom logic into the arithmetic logic unit (ALU). By using custom instructions, system designers can fine-tune the system hardware to meet performance goals. System designers can also create their own custom peripherals that can be integrated with Nios II processor systems. For performance-critical systems that spend most CPU cycles executing a specific section of code, it is a common technique to create a custom peripheral that implements the same function in hardware. The standard Nios II processor is shown in Figure 21.

Figure 21: Nios II Processor Core

The register file contains general purpose registers and control registers. Additional floating point registers can be added by the designer. The arithmetic logic unit (ALU) executes the arithmetic and logic instructions that are implemented in hardware. Some unimplemented instructions can be emulated, and the designer can also create custom instructions and floating point instructions. The register file and ALU are common elements of processors. The interrupt controller suspends the normal operation of the processor when an interrupt occurs, for example because an I/O device has data available. Software determines the priority of the interrupts in case more than one hardware device raises its interrupt flag simultaneously; the one with the higher priority is allowed to perform its interrupt routine first. The Nios II processor provides 32 external hardware interrupt request (IRQ) inputs. The Avalon switch fabric contains masters and slaves, and the instruction and data buses are specified as Avalon masters. A data cache is included, as in many processors. The tightly coupled memory is an addition to the processor that allows low-latency memory access without the unpredictability of the cache. It is used to hold interrupt routines, for which the indeterminacy of the cache is highly undesirable. The JTAG debug module allows the designer to debug the design directly on the chip. It works with PC-based software debugging tools and also provides on-chip emulation features.
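The software-determined interrupt priority described above can be sketched as a scan over a 32-bit mask of pending interrupts. The sketch assumes that a lower IRQ number means a higher priority, which is a common convention but an assumption here; the function name and mask layout are hypothetical and are not part of any Nios II API.

```python
NUM_IRQS = 32  # the text states 32 external hardware IRQ inputs

def highest_priority_irq(pending_mask, enabled_mask=0xFFFFFFFF):
    """Return the number of the pending, enabled IRQ to service first, or None.

    Assumption: IRQ 0 has the highest priority and IRQ 31 the lowest.
    """
    active = pending_mask & enabled_mask & 0xFFFFFFFF
    for irq in range(NUM_IRQS):
        if active & (1 << irq):
            return irq
    return None
```

If IRQ 1 and IRQ 3 are raised simultaneously (`pending_mask = 0b1010`), IRQ 1 is serviced first under this assumed ordering; masking IRQ 1 with `enabled_mask` lets IRQ 3 through instead.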

The Avalon switch fabric is one of the key features that differentiate the Cyclone FPGA from other vendors' FPGA products. The ASF is a high-bandwidth interconnect structure that offers greater flexibility than a shared bus. Its switched interconnect structure connects the master and slave ports. Multiple masters are present in a typical system, as shown in Figure 22. The DMA controller, as well as the CPU, is given access to the data memory. This allows data retrieval to take place without relaying the instruction through the system CPU. It is also worth noting that the DMA controller can access the VGA controller.

The advantages of the Nios II processor stem from both its soft-core nature and the tools that are available. The hardware-assisted debug module is especially useful for tracing through an application. The software development environment allows the user to implement a prototype rapidly. For higher performance the Nios II fast CPU core can be chosen. It contains the single-cycle hardware multiplier and dynamic branch prediction features that increase the performance of the application. Dynamic branch prediction uses the run-time behavior of branches to predict their outcome, so that instructions along the predicted path can be fetched ahead of time and the pipeline kept as active as possible.
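To illustrate what dynamic branch prediction means in general, the sketch below models a two-bit saturating counter, a textbook prediction scheme. It is not a description of the predictor actually used in the Nios II/f core, which is not detailed here; the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter per branch address (illustrative only)."""

    def __init__(self):
        self.counters = {}  # branch address -> counter value in 0..3

    def predict(self, addr):
        # Counter values 2 and 3 predict "taken"; unseen branches start at 1.
        return self.counters.get(addr, 1) >= 2

    def update(self, addr, taken):
        # Move the counter toward the observed outcome, saturating at 0 and 3.
        c = self.counters.get(addr, 1)
        self.counters[addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(outcomes, addr=0x100):
    """Fraction of correctly predicted outcomes for a single branch."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(addr) == taken
        predictor.update(addr, taken)
    return hits / len(outcomes)
```

For a loop branch that is taken nine times and then falls through, this predictor mispredicts only the first and the last iteration, which is the kind of behavior that keeps the pipeline busy in loop-heavy code.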

Figure 22: Avalon Switch Fabric

Multipliers usually require more than one clock cycle, so highly mathematical applications can benefit from the single-cycle hardware multiplier. This brief overview shows the usefulness of the DE2 board for rapidly prototyping a range of applications.

Xilinx FX12 PowerPC and Microblaze Embedded Development Kit

The development kit comes with an ML403 development board. The development board contains the Virtex-4 FPGA chip and consists of the following [11]:

1. Xilinx Devices – XC4VFX12-FF668-10C, XC95144XL, XCCACE, XCF32P
2. Memory – 64 MB DDR SDRAM, 1 MB ZBT SRAM, 512 MB Compact Flash, 8 MB Linear Flash, and 4 kb IIC EEPROM
3. Clocks – 100 MHz Oscillator and 2 Clock Sockets
4. Display – 16 x 2 Character LCD
5. Connectors and Interfaces – 4 SMA Connectors, 2 PS/2 Connectors, 2 Audio, RS-232 Serial Port, 3 USB Ports, PC4 JTAG, DB15 VGA, RJ-45 Ethernet, and GPIO
6. Embedded Processors – Microblaze and PowerPC

The software tools that are available for the board are Xilinx ISE and the Xilinx Platform Studio embedded tool suite. Xilinx ISE is the foundation for FPGA logic design. Xilinx Platform Studio consists of the Embedded Development Kit (EDK) and Software Development Kit (SDK). The embedded processor hardware specifications are chosen, and possibly customized if Microblaze is being used, by means of the EDK. The SDK is used to develop the C/C++ application software using the Eclipse open source framework. The development board being used is shown in Figure 23.

The peripherals supported by the ML403 board are similar to those of the DE2 board. The IIC bus expansion allows low-speed peripherals to be attached to the embedded system. SMA connectors are used to manage the clock. An external function generator or other clock source can directly feed the global clock input pins of the FPGA. The clock can also be used as an output through the SMA connectors. Figure 24 shows the connections to the FPGA and the memory and peripherals available.

The Virtex 4 on the development board can be configured in three different ways: JTAG (System ACE controller), Platform Flash memory, and linear flash/CPLD. The configuration source selector switch selects between the System ACE, Platform Flash, and linear flash/CPLD methods of programming the FPGA. The PC4 JTAG connection to the JTAG chain allows a host PC to download bit streams to the FPGA. PC4 also allows debug tools such as the ChipScope™ Pro Analyzer or a software debugger to access the FPGA.

Figure 23: ML403 Development Board [7]

The System ACE controller can also program the FPGA through the JTAG port. Using an inserted CompactFlash card device, configuration information can be stored and played out to the FPGA. When set correctly, the System ACE controller programs the FPGA upon power-up if a CompactFlash card is present.

Figure 24: ML403 Board and Virtex 4 FPGA Connections

The Platform Flash memory can also be used to program the FPGA. The Platform Flash memory can hold up to four configuration images, which are selectable by the two least significant bits of the configuration address DIP switches. When set correctly, the Platform Flash memory programs the FPGA upon power-up or whenever the Prog button is pressed. Similarly, data stored in the linear flash can be read by the CPLD and used to program the FPGA. This method also programs the FPGA on power-up because the CPLD holds the data even when power is switched off.
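The image selection described above amounts to masking the DIP switch value down to its two least significant bits. A minimal sketch follows; the function name and the representation of the switch bank as an integer are assumptions made for illustration only.

```python
def platform_flash_image(dip_switches):
    """Return the configuration image index (0..3) selected by the DIP switches.

    Per the text, only the two least significant bits of the configuration
    address DIP switches select the image; higher bits are ignored here.
    """
    return dip_switches & 0b11
```

With the switches read as `0b110110`, for instance, the two low bits are `10`, so image 2 of the four stored images would be selected.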

Figure 25: Different Methods of Programming of Virtex 4 FPGA

The Virtex 4 FPGA on the ML403 board supports two types of processors, the Microblaze and the PowerPC. The Microblaze is the soft-core processor and the PowerPC is the hard-core processor. The Microblaze processor has a fixed feature set that includes:

1. Thirty-two 32-bit general purpose registers
2. 32-bit instruction word with three operands and two addressing modes
3. 32-bit address bus
4. Single issue pipeline

When executing from slower memory, instruction fetches may take multiple cycles. This additional latency directly affects the efficiency of the pipeline. Microblaze implements an instruction pre-fetch buffer that reduces the impact of such multi-cycle instruction memory latency. The designer can choose between a three-stage and a five-stage pipeline. The architecture of the Microblaze processor doesn’t differ much from that of ordinary processors. The block diagram of the processor is shown in Figure 26. In Figure 26, the block showing the different operations such as addition, subtraction, and multiplication is essentially the ALU of the processor. The fast simplex links (FSL) shown to the right of the figure consist of master and slave links. Each link is a uni-directional point-to-point communication channel used for fast communication between any two design elements on the FPGA. Microblaze can be configured with a maximum of eight FSL interfaces, each 32 bits wide.

Figure 26: Microblaze Processor

Similar to the DE2 board, the development kit provides a debug interface to support JTAG-based software debugging. Debug interfaces are connected to the Microblaze Microprocessor Debug Module (MDM). Multiple instances can be interfaced to a single MDM to enable multiprocessor debugging.

The PowerPC processor has a fixed instruction set and fixed features. Depending on the application, it may be necessary to use the PowerPC processor rather than the Microblaze processor. Although the hard-core processor cannot be customized, it is the preferred processor for high performance and for applications that require a real-time operating system. The features of the PowerPC processor are outlined in Table 7. These key features can be visualized in Figure 27.

Figure 27: PowerPC Processor Architecture

The selection of the platform depends on performance, cost, and obsolescence. A development platform can be purchased either as a custom board or as a general-purpose board. Custom boards become obsolete much faster than general-purpose boards because of the limited range of applications they can cater to. The cost of the platform is a large determining factor in the choice of development board. The performance of the FPGA platform should be accurately measured when deciding which platform to use.

Table 7: PowerPC Processor Features

Architecture Component: CPU
Description:

1. A 5-stage pipeline consisting of fetch, decode, execute, write-back, and load write-back stages
2. Instructions are queued in the fetch queue if execution stalls. The fetch queue consists of three elements, two pre-fetch buffers and a decode buffer
3. If the pre-fetch buffers are empty, instructions flow directly to the decode buffer
4. Single-issue execute unit containing the general-purpose register file (GPR), arithmetic-logic unit (ALU), and the multiply-accumulate unit (MAC)

Architecture Component: Memory Management Unit (MMU)
Description:

1. Enables RTOS implementation

Architecture Component: Cache Units
Description:

1. Separate instruction-cache and data-cache units

Architecture Component: Debug and Timers
Description:

1. Debug support, including a JTAG interface
2. Three programmable timers: Programmable Interval, Fixed Interval, Watchdog

FPGA performance can be benchmarked, but a poor methodology can skew the results and lead to false conclusions. An Altera white paper [27] describes how to benchmark an FPGA accurately and provides an example of a false conclusion that can be reached. The complexity of today’s designs and of the computer-aided design (CAD) tools makes benchmarking very difficult. Meaningful benchmarks can be obtained when the designer understands the FPGA device and the CAD tools [27].

The platform choice is very important when prototyping a system. First of all, the peripherals and capabilities of the platform should match the application needs. For example, a board that doesn’t have a VGA connector or a means to connect other custom peripherals would not be well suited for a video application. The cost of the platform should fit within the budget allocated for the prototype. Expensive platforms may yield high performance, but if the application doesn’t require that performance, the purchase is wasted. On the other hand, a single general-purpose platform with high performance potential and many peripherals can be used for a range of embedded system applications. Prototypes are now used for more than just functional verification of the application. Performance is measured on the prototype in order to adjust the implementation before development of the final product begins. FPGA performance usually lags behind ASIC performance. To obtain a good performance metric, the FPGA development board should be comparable to the ASIC processor. This implies that high-performance prototyping tools are essential when the designer analyzes the performance that can be achieved.

CHAPTER 7

EMBEDDED SYSTEM DEVELOPMENT

The consumer applications that can be prototyped using the development boards reviewed range from real-time to internet-based applications. The few that will be discussed are the digital oscilloscope, the internet radio, and the web server.

The digital oscilloscope can be easily implemented on both platform boards. The implementation can be divided into three parts: the embedded processor, the on-chip programmable part, and the external expanded circuitry. A digital oscilloscope requires external channel signal conditioning and analog-to-digital (A/D) conversion. The embedded processor is useful for this application because of the control it establishes over the entire application; system and interface control are its primary tasks. The on-chip programmable part controls the storage of signal samples. In other words, the digital signal input into the FPGA requires memory to hold all the sampling information obtained during the A/D conversion. The frequency of the signal is also measured using the programmable part. The expanded external circuitry converts the signals to digital form. Based on this understanding of the development platforms, both boards are able to prototype this application effectively.

When prototyping internet-based applications, it is more useful for the designer to consider the use of an embedded operating system. Design time decreases because the cumbersome effort of developing an Ethernet driver is avoided. The downside of an embedded operating system is the storage space it consumes on the platform. If the application doesn’t require additional circuitry, the embedded operating system is a good method. For the internet radio, the Ethernet controller is used with the real-time operating system to establish a TCP connection to a particular website. The packets obtained through the Ethernet have to be buffered and decoded, which can be performed using a decoder library. The resulting stream is fed into the audio controller, which connects to a speaker.

The web server also requires the embedded real-time operating system and the embedded processor. The website is stored in flash memory so that a web browser can access it. Different protocols need to be used for a web server, but essentially a processor and an operating system are needed in all three applications.

Since these varied embedded systems usually require an embedded processor, an embedded processor design checklist for designing an FPGA-based prototype is provided (Table 8). The steps greatly assist new designers by ensuring that the necessary parameters are considered.

The embedded processor design checklist was roughly followed when implementing the chosen embedded system. The checklist assists in addressing the key decisions that need to be made when developing the system.

Table 8: Embedded Processor Design Checklist [36]

1. Know and understand performance and functional requirements
2. Develop detailed and accurate requirements for a more efficient processor selection process
3. OS/RTOS selection can impact design efficiency and performance
4. Embedded FPGA processor core selection can significantly impact design performance and design schedule
5. Processor bus implementation selection can significantly impact design performance
6. Assignment of processor peripherals to processor bus is a critical design factor
7. Research and take advantage of available Intellectual Property
8. Evaluate hard versus soft processor core implementation choice carefully
9. Evaluate support for and overhead of multiprocessing implementations
10. Consider implementing specialty co-processing functionality such as floating-point processing
11. Estimate memory requirements
12. Develop detailed processor power-up/boot-cycle sequence
13. Develop detailed code update strategy
14. Consider specialized processor debug needs and requirements
15. Develop a design floorplan for the processor core relationship to high performance peripherals according to a data flow analysis
16. Adopt and follow team-wide coding guidelines
17. Evaluate processor loading and options for hardware co-processing
18. Evaluate availability of low-level device drivers
19. Develop an interrupt structure implementation plan
20. Fully define required peripheral performance and potential future enhancements
21. Develop a plan for peripheral interface and implementation
22. Understand available design trade-off options (cache memory, MMU, DMA, coprocessor)
23. Evaluate processor use model options
24. Work out the details of the processor core speed and required relationship to peripheral bus speeds
25. Develop a detailed bus implementation plan including bus relationships, bridges, speeds, burst modes, EDAC
26. Determine planned usage of internal and external memory
27. Estimate resource requirements for processor core, peripherals, processor buses and bridges, memory controllers and coprocessors
28. Estimate performance requirements and the projected processor system performance level
29. Define the complete system memory map
30. Evaluate the processor power consumption at different operational and bus speeds
31. Evaluate features, cost, support, usability of software development tools
32. Evaluate co-design tool flow and availability of design wizards

Embedded System Design

The test application, a digital picture frame, was chosen to be implemented on both the Altera DE2 board and the Xilinx ML403 board. The embedded processor and system design process was followed in order to prototype the test application.

Altera DE2 Board

The first design decision was the source of the image. The image could come directly from a computer via RS-232 communications. To obtain a stand-alone embedded system, however, a USB flash drive or SD card was chosen to hold the image to be displayed on the monitor. Since the application is a prototype, a VGA monitor is sufficient to display the image. The USB flash drive or SD card allows the user to change the image shown on the screen, which makes the application user friendly and allows for proper functional testing of the prototype.

The type of image that can be displayed is decided during the design stage. The image types of interest are bitmaps and JPEGs, of which JPEG images are predominant. The image type is chosen according to the design limitations.

The main design challenge is separating the application into hardware design and software design. Before doing so, the possible approaches to the problem were considered. Since the application was not well-defined, there is room for flexibility. Two paths can be taken to implement the application. Figure 28 shows the different paths in the design and development of the photo frame.

The paths are distinguished by the way the image is input into the embedded system. A USB flash drive can hold a JPEG image, but implementing the driver is very time consuming and cumbersome. An embedded operating system allows a completed prototype to be launched quickly. The problem with using an operating system is memory: since the operating system uses most of the SDRAM, complex programs cannot be implemented without external memory.

An SD card can also hold a JPEG image, and its driver implementation is likewise very time consuming. An embedded operating system could be used to interface to the SD card; unfortunately, that functionality is not already present in the operating system. Based on the research performed, implementing the SD card driver from scratch is more efficient than modifying the operating system to mount the SD card and read from it. The two methods decided upon are therefore a USB interface with an embedded operating system, and an SD card interface with the driver and software developed from scratch. Each method has different requirements and different design and implementation steps, as can be seen in Figure 28.

USB and Embedded Operating System

As mentioned before, developing a USB driver is time consuming and cumbersome. Using an operating system speeds up the development of a prototype. The operating system needs to have specific features that allow it to comply with the hardware limitations.

Figure 28: Application Design and Implementation Choices

Choosing an Embedded Operating System

The operating systems that are compatible with the Altera DE2 board are mostly commercial operating systems (OS), as seen in Table 9. The two that are open source are eCos and uCLinux.

eCos is a real-time operating system, and real-time features were not desired for this project. uCLinux was chosen because it is not a real-time operating system, which makes it a better fit for this application. The choice of operating system depends on availability and on features that are compatible with the hardware and software needs.

Most of the other operating systems in Table 9 are real-time operating systems (RTOS). An RTOS is an OS that has a fixed upper bound on interrupt latency and service time. A real-time system must respond to external events or interrupts within a limited amount of time or the system fails. Examples of RTOS applications are multimedia players and video applications.

Table 9: Comparison of Operating Systems for Altera Development Boards [15]

Since the Nios II processor does not include Memory Management Unit (MMU) hardware, virtual addressing is not supported. The operating system must therefore be MMU-less and use flat memory addressing. Based on the Nios II processor requirements and the application requirements, the uCLinux embedded operating system was chosen for use on the Cyclone II FPGA.

UCLinux Operating System

The original “micro-controller” Linux, uCLinux, was derived from the Linux 2.0 kernel and is intended for microcontrollers that do not have a Memory Management Unit. The Nios II processor is therefore well suited to running this operating system. Currently, uCLinux includes kernel releases 2.0, 2.4, and 2.6 as well as libraries and tool chains.

Work has recently been done on porting uCLinux to the Nios II processor. The uCLinux project community develops patches and supporting tools for using Linux on microcontrollers and embedded processors. The documentation is limited with regard to the Nios II processor. There is a Nios forum (www.niosforum.org) that assists with using uCLinux on Nios.

The uCLinux distribution lags behind the Altera Nios II distributions, so uCLinux may not work with the latest releases of Nios II. To get uCLinux working on the development board, an older version of the Nios II IDE needs to be used.

The kernel for the operating system can be customized for a specific use. Prebuilt kernels are available, but custom kernels are very effective at decreasing memory usage.

The main features of the OS that are of interest are:

1. Mount command: used to mount and access different file systems, including USB flash drives
2. JPEG: contains a built-in JPEG decoder
3. Nxview: displays a JPEG image in a window

Porting uCLinux to Nios II Processor and Cyclone II FPGA

In order to load the uCLinux system into the Nios II processor, a Linux operating system is required to install, customize and build the distribution of the uCLinux Kernel. When using a prebuilt kernel, Linux is not required to load the uCLinux kernel into the Nios II processor.

The hardware design of the Nios processor has to be specified in order to support uCLinux. During the uCLinux build process, the Nios hardware system file (.sopc) is used to generate a list of peripheral names, addresses and IRQs. Many of the device drivers within uCLinux use hardcoded names for the peripherals being accessed. Thus, designing a Nios II system to be compatible with uCLinux requires that specific names be used for peripherals in the SOPC Builder. Failure to do this carefully results in loss of functionality when running the operating system. This process needs to be performed whether a prebuilt kernel or a customized kernel is used.

The SOPC Builder configuration should consist of the following peripherals and naming [16]:

cpu (Nios II processor)
jtag_uart (JTAG UART)
uart0 (UART)
timer0 (Interval Timer)
timer1 (Interval Timer)
buttons (Parallel IO : PIO)
switches (Parallel IO : PIO)
leds (Parallel IO : PIO)
sram (SRAM)
sdram (SDRAM Controller)
ext_bus (Avalon-MM Tristate Bridge)
flash (CFI Flash Memory)
DM9000A (DM9000A)
ps2_0 (PS2 Serial Port)
vga_controller_0 (VGA Controller)
ISP1362 (ISP1362_IF)
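Because the drivers rely on hardcoded names, it can help to check the instance names of a design against the list above before building. The sketch below is one possible check; it assumes the names have already been extracted from the design as a list of strings and that matching is case sensitive, neither of which is stated in the source. The function name is hypothetical.

```python
# Instance names required by the uCLinux drivers, as listed above [16].
REQUIRED_PERIPHERALS = {
    "cpu", "jtag_uart", "uart0", "timer0", "timer1", "buttons", "switches",
    "leds", "sram", "sdram", "ext_bus", "flash", "DM9000A", "ps2_0",
    "vga_controller_0", "ISP1362",
}

def check_peripheral_names(design_names):
    """Return (missing, case_mismatches) for the instance names of a design.

    missing: required names absent from the design, sorted.
    case_mismatches: required name -> design name differing only in case.
    """
    present = set(design_names)
    missing = sorted(REQUIRED_PERIPHERALS - present)
    lowered = {name.lower(): name for name in present}
    mismatches = {
        req: lowered[req.lower()] for req in missing if req.lower() in lowered
    }
    return missing, mismatches
```

A design that names the USB controller `isp1362` instead of `ISP1362` would be reported both as missing the required name and as a likely case mismatch, which is the kind of slip that would otherwise show up only as lost functionality at run time.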

The uCLinux kernel image needs to be ported to the embedded processor. A Linux kernel image is typically called a zImage. The hardware design of the embedded processor and the zImage are sent to the FPGA using a JTAG UART cable and the Nios II Command Shell.

Means of Implementing Photo Frame Application Using uCLinux

The main aspects of the application are a picture source, a decoder, and a viewer. The operating system provides all three. The Nios processor hardware can be adjusted to remove the DM9000A peripheral (the Ethernet controller) and any other unused peripherals.

The hardware design should consist of at least the main components needed for the application: a VGA controller, the ISP1362 (USB controller), and the memory needed to hold the program. The SDRAM should hold the program and the embedded operating system. Figure 29 shows the high-level view of the hardware design for the application.

The hardware design and the kernel image are both needed for the application. Preferably, the application starts when the operating system starts up. The application needs to use the uCLinux capabilities for mounting a USB flash drive, decoding JPEG images, and viewing images.

Design of Application Software

Since the embedded operating system takes care of the drivers for the peripherals, the software aspect of the design is greatly reduced. The picture viewer, nxview, is a very helpful tool in this application. The application simply uses the tools available in uCLinux to read the image from the connected USB flash drive and display it using a graphical windowing environment that runs the nxview program. What remains to be determined is how to run the program on the operating system.

Figure 29: Hardware Design – High Level

Porting Application to Nios II processor

Due to the lack of documentation, it was difficult to understand how to port the application to the processor. A possible method is to create a customized kernel that includes the application code.

To customize the kernel, the downloaded uCLinux distribution is configured through a configuration menu on a Linux operating system. On a Windows host, a Linux environment has to be created first. The method works in theory, but the implementation may be unsuccessful.

SD Card and Nios Embedded Processor

The basic idea is to have two peripherals: one to control the on-board SD card reader and one to control the VGA DAC. The first peripheral communicates with the SD card reader and reads data from it. The second interacts with the VGA DAC and provides it with the proper signals (pixel data, H SYNC, V SYNC, BLANKING). The main function of the software is to initialize and control the peripherals and to decode the JPEG image once it has been read from the SD card. At the top level there are two memories: the SDRAM, where the program resides, and the SRAM, where the image buffer is kept so that the VGA controller can read from it quickly.
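The sync and blanking signals that the VGA peripheral must generate can be modeled from a pair of horizontal and vertical counters. The sketch below assumes the standard 640x480 at 60 Hz mode (800 pixel clocks per line, 525 lines per frame, active-low sync pulses); the display mode is not stated in the source, and the function and constant names are illustrative.

```python
# Assumed mode: standard 640x480 @ 60 Hz timing, counted in pixels and lines.
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48   # 800 clocks per line
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33    # 525 lines per frame
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK

def vga_signals(h_count, v_count):
    """Return (hsync_n, vsync_n, blank) for the given counter values.

    Sync outputs are active low in this mode; blank is True whenever the
    counters are outside the visible 640x480 area.
    """
    hsync_active = H_VISIBLE + H_FRONT <= h_count < H_VISIBLE + H_FRONT + H_SYNC
    vsync_active = V_VISIBLE + V_FRONT <= v_count < V_VISIBLE + V_FRONT + V_SYNC
    blank = h_count >= H_VISIBLE or v_count >= V_VISIBLE
    return (0 if hsync_active else 1, 0 if vsync_active else 1, blank)
```

In hardware the same comparisons would be made on two free-running counters clocked at the pixel rate, with pixel data fetched from the image buffer only while `blank` is false.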

The hardware is broken into four main components:

1. SDRAM
2. VGA Controller
3. SD Card Controller
4. Nios II System

The four main components communicate over the Avalon bus. The image is stored on the SD card. The Nios II processor reads in this image and decompresses it. It then sends the image to the VGA controller. The VGA controller stores the data sent from the Nios II processor in the SRAM, and then reads the SRAM for each new pixel to display on the screen.
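Whether a decoded frame fits in the SRAM image buffer depends on the resolution and pixel depth, so a quick sizing calculation is useful before committing to a format. The sketch below assumes a 512 KB SRAM and a row-major buffer layout; both are assumptions for illustration, as are the function names.

```python
SRAM_BYTES = 512 * 1024  # assumed size of the on-board SRAM

def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed for one uncompressed frame, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8

def pixel_address(x, y, width, bytes_per_pixel=1, base=0):
    """Byte offset of pixel (x, y) in a row-major frame buffer."""
    return base + (y * width + x) * bytes_per_pixel

def fits_in_sram(width, height, bits_per_pixel, sram_bytes=SRAM_BYTES):
    """True if one frame at the given format fits in the SRAM."""
    return frame_bytes(width, height, bits_per_pixel) <= sram_bytes
```

Under these assumptions a 640x480 frame needs 307,200 bytes at 8 bits per pixel and fits, while the same frame at 16 bits per pixel needs 614,400 bytes and does not, so the pixel format or resolution would have to be reduced.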

Intellectual Property (IP) cores are important for speeding up the implementation process. SD controller and VGA cores were researched.

Research of IP Cores

The IP core research revealed that the SOPC Builder supports a VGA controller, except in the 7.1 edition of Quartus and Nios. A VGA core is also available with DE2 boards and can be used for the implementation.

The SD card IP core has to be obtained from another company under specific licenses. It only works with recent editions of Quartus and Nios.

A fast color JPEG decoder core exists that can assist with the design. It is compatible with Cyclone II FPGA devices.

Nios II Hardware Design

As mentioned earlier, the hardware should incorporate the SDRAM, SRAM, and VGA controllers. An SD card controller is also required to read the SD card. The SD card controller reads from the connected card and stores the image in the SRAM. The SRAM controller then reads the SRAM so that the image can be displayed one pixel at a time at a fast rate. The hardware design in Figure 30 allows this application to be developed properly.


Figure 30: Hardware Design – SD Card

SD Card Interface

Communication with the MMC/SD card is performed in SPI mode. Since the SD card is backward compatible with the MMC card, the interface is the same. SPI mode needs all four pins of the interface. Figure 31 shows how the FPGA is connected to the SD card.


Figure 31: SD Card Connected to FPGA [17]

Four pins on the SD card are used in SPI mode: SD_DAT3, SD_CMD, SD_CLK, and SD_DAT. These pins are driven to carry out an initialization procedure, a read procedure, and possibly a write procedure.

Communication between the host (here the FPGA) and the SD card is initiated by commands (Figure 32) sent from the FPGA. All commands are 6 bytes long and are transmitted MSB first.

The command consists of a command field, an argument field, and a CRC field. The CRC field is used for error checking. The interface to the card allows CRC error checking to be activated by a specific command. Many other commands are useful when writing the controller for the SD card.
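The 6-byte frame described above can be sketched in C as follows. This is a minimal illustration, not the controller used in this work: the layout details it relies on (a leading 01 bit pair in front of the 6-bit command index, a 7-bit CRC with polynomial x^7 + x^3 + 1, and a trailing stop bit) are taken from the SD card SPI-mode specification rather than from this text, and the function names sd_crc7 and sd_build_command are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CRC-7 over a byte string, MSB first, polynomial x^7 + x^3 + 1.
 * The 7-bit remainder is kept in bits 6..0 of crc; bit 7 briefly holds
 * the bit that was shifted out. */
uint8_t sd_crc7(const uint8_t *data, int len)
{
    uint8_t crc = 0;
    for (int i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            /* Feedback = incoming data bit XOR the bit shifted out. */
            if ((d ^ crc) & 0x80)
                crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Pack a command index (0..63) and a 32-bit argument into a 6-byte frame:
 * byte 0 = 01 + command index, bytes 1..4 = argument (MSB first),
 * byte 5 = CRC-7 followed by a stop bit of 1. */
void sd_build_command(uint8_t cmd, uint32_t arg, uint8_t frame[6])
{
    assert(cmd < 64);
    frame[0] = (uint8_t)(0x40 | cmd);
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 0x01);
}
```

Calling sd_build_command(0, 0, frame) yields the bytes 40 00 00 00 00 95, the CMD0 frame used to reset the card.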


Figure 32: SPI Command Structure [20]

The SD card can communicate with the FPGA using the commands shown in Table 10. The response field in Table 10 is the response that the card sends back to the controller. This response token is sent by the card after every command. It is 1 byte long; the MSB is always zero and the other bits are error indications. A 1 signals an error.
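A controller has to interpret this response byte before it continues. The sketch below shows one way to do so in C. The bit names follow the R1 response format of the SD specification, which is an assumption beyond what Table 10 states here; in that format bit 0 reports that the card is still in its idle state, which is expected right after a reset and is therefore not treated as a failure. The function name sd_r1_ok is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the 1-byte response token (R1 format, bit 7 is always 0). */
enum {
    R1_IDLE_STATE      = 0x01, /* card is idle (normal after reset) */
    R1_ERASE_RESET     = 0x02,
    R1_ILLEGAL_COMMAND = 0x04,
    R1_COM_CRC_ERROR   = 0x08,
    R1_ERASE_SEQ_ERROR = 0x10,
    R1_ADDRESS_ERROR   = 0x20,
    R1_PARAMETER_ERROR = 0x40
};

/* True when the byte is a well-formed response (MSB clear) and none of
 * the error bits 1..6 is set. A byte of 0xFF, which is what the host
 * reads while the card has not answered yet, is rejected. */
bool sd_r1_ok(uint8_t r1)
{
    if (r1 & 0x80)
        return false;
    return (r1 & 0x7E) == 0;
}
```

For example, sd_r1_ok(0x01) accepts the idle response that follows a reset, while sd_r1_ok(0x05) rejects a response that has the illegal-command bit set.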

CMD0 in the table is used to specify the mode of the SD card. The initialization procedure for the SD card depends on the mode in which the device is set up. At startup, the device is in SD mode. The FPGA communicates with the SD card using the Serial Peripheral Interface (SPI) standard.

The initialization flowchart is shown in Figure 33. It can be seen that in order to switch to SPI mode, certain command signals need to be asserted. In order to set up the card to operate in SPI mode, CMD0 needs to be used and CS needs to be asserted low.
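The switch to SPI mode can be sketched as a short routine. The version below is an illustration under stated assumptions, not the IP core's implementation: the spi_bus structure and its two callbacks are hypothetical stand-ins for whatever drives the SD pins, the requirement of at least 74 clock cycles with CS high before the first command comes from the SD specification, and the sim_card code is only a stand-in card so that the routine can be exercised on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bus abstraction: drive CS and exchange one byte. */
struct spi_bus {
    void (*set_cs)(void *ctx, bool high);
    uint8_t (*xfer)(void *ctx, uint8_t out);
    void *ctx;
};

/* Send CMD0 with CS low; the card answers 0x01 (idle) once in SPI mode. */
bool sd_enter_spi_mode(const struct spi_bus *bus)
{
    static const uint8_t cmd0[6] = { 0x40, 0x00, 0x00, 0x00, 0x00, 0x95 };
    uint8_t r1 = 0xFF;

    bus->set_cs(bus->ctx, true);
    for (int i = 0; i < 10; i++)          /* 10 bytes = 80 clocks, CS high */
        bus->xfer(bus->ctx, 0xFF);

    bus->set_cs(bus->ctx, false);         /* CS low selects SPI mode */
    for (int i = 0; i < 6; i++)
        bus->xfer(bus->ctx, cmd0[i]);

    /* The response arrives within a few bytes; its MSB is 0. */
    for (int i = 0; i < 8 && (r1 & 0x80); i++)
        r1 = bus->xfer(bus->ctx, 0xFF);

    bus->set_cs(bus->ctx, true);
    return r1 == 0x01;
}

/* Stand-in card used only to try the routine without hardware. */
struct sim_card {
    bool cs_high;
    int idle_bytes;       /* bytes clocked while CS was high */
    uint8_t rx[6];
    int rx_len;
    int reply_delay;      /* filler bytes before the response */
    bool in_spi_mode;
};

void sim_set_cs(void *ctx, bool high)
{
    ((struct sim_card *)ctx)->cs_high = high;
}

uint8_t sim_xfer(void *ctx, uint8_t out)
{
    struct sim_card *c = ctx;
    if (c->cs_high) { c->idle_bytes++; return 0xFF; }
    if (c->rx_len < 6) { c->rx[c->rx_len++] = out; return 0xFF; }
    if (c->reply_delay-- > 0) return 0xFF;
    c->in_spi_mode = c->idle_bytes >= 10 && c->rx[0] == 0x40 && c->rx[5] == 0x95;
    return c->in_spi_mode ? 0x01 : 0x05;
}
```

On real hardware the two callbacks would toggle SD_DAT3 (used as CS) and shift bytes over SD_CMD, SD_CLK, and SD_DAT; the remaining steps of the flowchart in Figure 33 would follow a successful return.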


Table 10: SPI Commands [20]

Figure 33: Initialization of Card into Different Modes [20]


The SD mode has other options that can be activated using the flowchart in Figure 33. The simplified SPI mode should suffice for this application.

Once the SD card is initialized to operate in SPI mode, a certain set of instructions needs to be issued to prepare the SD card for reading and writing data.

Using this physical specification information and the SD card controller IP core specification, communication with the SD card can be established.

VGA Interface

To understand how a video image can be generated using an FPGA board, it is first necessary to understand the components of a video signal. A VGA video signal contains five active signals. Two signals compatible with TTL logic levels, horizontal sync and vertical sync, are used for synchronization of the video. Three analog signals are used to control the color: red, green, and blue. They are referred to as the RGB signals. By changing the analog levels of the three RGB signals, all colors are produced.

In standard VGA format, the screen contains 640 by 480 pixels. The video signal must redraw the entire screen 60 times per second to reduce flicker in the image. The color of each pixel is determined by the value of the RGB signals when the signal scans across that pixel. In 640 by 480 pixel mode with a 60 Hz refresh rate, there is approximately 40 ns per pixel. A 25 MHz clock has a period of 40 ns.


The DE2 board includes a 15-pin D-SUB connector for VGA output. The VGA synchronization signals are provided directly from the Cyclone II FPGA, and the Analog Devices ADV7123 triple 10-bit high-speed video DAC is used to produce the analog data signals (red, green, and blue). Although the DAC can support resolutions of up to 1600 x 1200 pixels (100 Hz refresh), the DE2 board is limited by its system clock to 640 x 480 VGA resolution (60 Hz refresh).
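Because the DAC accepts 10 bits per color channel, image samples of a different width have to be widened before they are presented to it. The sketch below assumes 8-bit samples, as is usual for decoded baseline JPEG data, and uses bit replication so that 0 maps to 0 and 255 maps to the full-scale code 1023. The function name is hypothetical, and the VGA core may already perform an equivalent conversion internally.

```c
#include <stdint.h>

/* Widen an 8-bit color sample to the 10-bit range of the video DAC by
 * appending the two most significant bits as the new low bits. */
uint16_t expand_8_to_10(uint8_t sample)
{
    return (uint16_t)((sample << 2) | (sample >> 6));
}
```

Simply shifting left by two would leave the brightest input at 1020 instead of 1023; replicating the top bits avoids that small loss of range.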

Since there is a core for the VGA, the timing information does not need to be understood in detail. When a core is instantiated using the SOPC builder, the timing information is already created. The core can be controlled using programs written in Nios IDE. Figure 34 shows the diagram of the VGA core.

In order for the core to work properly, a 25 MHz clock has to be provided to the VGA core. The VGA core has three modes: pixel mode, character mode, and character overlay mode. The pixel mode uses the SRAM chip on the DE2 board as the pixel buffer.

The software tools in Nios assist in developing the programs that drive the VGA core. The VGA core has a Hardware Abstraction Layer (HAL) interface already created. The HAL assists in rapidly developing programs without knowing the details of the underlying registers.


Figure 34: Block diagram of VGA Core [18]

If the VGA controller is implemented without the IP core, a controller needs to be developed. The controller uses the physical layer specification of the VGA interface to communicate with the VGA monitor. The details of the physical layer specification need to be understood and are provided here.

A VGA monitor is loaded with pixels by sweeping across the screen in row-major order, as shown in Figure 35. When developing a driver or controller, the VGA timing has to be well understood. For the VGA monitor to work properly, it must receive data at specific times with specific pulses. Horizontal and vertical sync pulses are used to synchronize the monitor so that color data can be sent.


Figure 35: VGA monitor with 640 columns × 480 rows. Scan starts from row 0, column 0, and moves to the right and down until row 479, column 639. [18]
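The row-major scan order in Figure 35 determines how a pixel's position maps to its place in a linear pixel buffer. A minimal sketch in C follows, assuming one buffer entry per pixel; the function names are hypothetical.

```c
#include <stdint.h>

enum { VGA_COLS = 640, VGA_ROWS = 480 };

/* Position of pixel (row, col) in a buffer stored in scan order. */
uint32_t pixel_index(uint32_t row, uint32_t col)
{
    return row * VGA_COLS + col;
}

/* Inverse mapping: recover the row and column from a buffer index. */
void pixel_position(uint32_t index, uint32_t *row, uint32_t *col)
{
    *row = index / VGA_COLS;
    *col = index % VGA_COLS;
}
```

The first pixel of the scan (row 0, column 0) has index 0 and the last one (row 479, column 639) has index 307199, so the buffer holds 640 x 480 = 307200 entries.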

The timing diagram for the horizontal and vertical synchronization signals is shown in Figure 36. When inactive, both synchronization signals are at a logical 1 value. A row scan begins with the horizontal sync signal going low for 3.77 µsec, as shown by region B in Figure 36. This is followed by a 1.79 µsec high on the signal, as shown by region C. Next, the data for the three color signals are sent, one pixel at a time, for the 640 columns, as shown in region D, for 25.42 µsec. Finally, after the last column pixel, there is another 0.79 µsec of inactivity on the RGB signal lines, as shown in region E, before the horizontal sync signal goes low again for the next row scan. The total time to complete one row scan is 31.77 µsec.

The timing for the vertical synchronization signal is analogous to that of the horizontal sync signal. The 64 µsec active-low vertical sync pulse resets the scan to the top left corner of the screen, as shown in region P, followed by a 1020 µsec high on the signal, as shown by region Q. Next come the 480 row scans of 31.77 µsec each, giving a total of 15250 µsec, as shown in region R. Finally, after the last row scan, there is another 450 µsec, as shown in region S, before the vertical sync signal goes low again to start another complete screen scan at the top left corner. The total time to complete one scan of the screen is 16784 µsec.
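A hardware controller counts pixel-clock cycles rather than microseconds, so the durations above have to be converted into counts. The sketch below does this arithmetic in C using only the region durations quoted in the text and the 25.175 MHz clock named in the caption of Figure 36. The structure, field, and function names are hypothetical, and rounding to the nearest whole clock is an assumption.

```c
#include <stdint.h>

/* Pixel clock from the caption of Figure 36. */
#define PIXEL_CLOCK_MHZ 25.175

/* One sync period split into its four regions (B, C, D, E for a row;
 * P, Q, R, S for a frame), in microseconds. */
struct sync_regions_us {
    double sync_low;
    double high_before_data;
    double data;
    double idle_after_data;
};

double regions_total_us(const struct sync_regions_us *t)
{
    return t->sync_low + t->high_before_data + t->data + t->idle_after_data;
}

/* Number of pixel-clock cycles closest to a duration in microseconds. */
uint32_t us_to_clocks(double us)
{
    return (uint32_t)(us * PIXEL_CLOCK_MHZ + 0.5);
}

/* Full-screen redraws per second for a given frame period. */
double refresh_rate_hz(const struct sync_regions_us *frame)
{
    return 1.0e6 / regions_total_us(frame);
}

/* Level of the horizontal sync output at clock count x within a row:
 * low during region B, high for the rest of the row. */
int hsync_level(uint32_t x, const struct sync_regions_us *row)
{
    return x < us_to_clocks(row->sync_low) ? 0 : 1;
}
```

With the row regions {3.77, 1.79, 25.42, 0.79} the counts come to 95, 45, 640, and 20 clocks, or 800 clocks per row, and the frame regions {64, 1020, 15250, 450} give a refresh rate just under 60 Hz.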

Figure 36: Horizontal and vertical synchronization signals timing diagram for a 25.175MHz clock. [18]


Figure 36 also shows that the vertical timings are multiples of the horizontal cycles. The circuit for the VGA monitor controller is shown in Figure 37.

Figure 37: VGA Controller Circuit

SRAM Controller

Communication with the SRAM chip is straightforward according to the Nios handbook. Simple instructions can be used to write the image data to the SRAM and to read the image data from the SRAM for display on the VGA monitor.


JPEG Decoder

The JPEG IP core, a fast color JPEG decoder, is compatible with the hardware platform being used for the application.

The JPEG IP core is intended for high-speed decoding of gray-scale, color, or multi-scan images coded with the ISO/IEC 10918-1 baseline coding standard. The decoder supports all features of the baseline standard.

With its autonomous behavior, simple FIFO-like interfaces, and 100% synchronous structure, the IP core can integrate in a complex system with little effort. This is reinforced by the stand-alone ability of the decoder that can be instantiated in systems without CPU intervention.

Application Software Design

The software design itself is straightforward, but there is uncertainty about how to use the HAL and direct register functions. Figure 38 contains the flowchart for the algorithm.

The SD card IP core contains specific functions to perform the initialization of the card into a specific mode and to perform reading and writing.

The VGA controller that is available for the DE2 board offers only a limited set of functions. It is not yet clear how to control the VGA, but a better understanding can be gained during implementation.


Other aspects of the implementation need to be investigated as implementation proceeds. Once experience is gained in using the functions and libraries within the Nios II IDE, implementation of the application software will be quickly completed.

Figure 38: Software Design Flowchart


Xilinx PowerPC and MicroBlaze Development Kit FX12 Edition

Several sources for the image in the embedded system were considered. The image could come directly from the computer via RS232 communication. In order to obtain a stand-alone embedded system, a USB flash drive or Compact Flash card was chosen to hold the image to be displayed on the monitor. Since the application is a prototype, a VGA monitor is sufficient to display the image.

The USB flash drive and the Compact Flash card allow the user to change the image that is shown on the screen. This makes the application user friendly and allows proper functional testing of the prototype.

During the design stage, the type of image that can be displayed will be discussed. The image types of interest are bitmaps and JPEGs. Bitmaps appear to be the appropriate image type for the image device and for the software design.
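One reason bitmaps suit a prototype is that the file needs almost no decoding: a short header is followed by raw pixel data. The sketch below reads the fields a display program would need from the header of an uncompressed Windows BMP file held in memory. The field offsets follow the common 14-byte file header plus 40-byte information header layout, which is an assumption about the files that would be used, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct bmp_info {
    uint32_t data_offset;    /* where the pixel data starts in the file */
    int32_t  width;          /* pixels per row */
    int32_t  height;         /* rows (positive: stored bottom-up) */
    uint16_t bits_per_pixel;
    uint32_t compression;    /* 0 means uncompressed */
};

/* BMP fields are stored little-endian. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill *info from the first 54 bytes of a BMP file. Returns false when
 * the buffer is too short or does not look like a BMP file. */
bool bmp_parse_header(const uint8_t *buf, size_t len, struct bmp_info *info)
{
    if (len < 54 || buf[0] != 'B' || buf[1] != 'M')
        return false;
    if (rd_le32(buf + 14) < 40)      /* information header size */
        return false;
    info->data_offset    = rd_le32(buf + 10);
    info->width          = (int32_t)rd_le32(buf + 18);
    info->height         = (int32_t)rd_le32(buf + 22);
    info->bits_per_pixel = rd_le16(buf + 28);
    info->compression    = rd_le32(buf + 30);
    return true;
}
```

After a successful call, the pixel data can be copied from data_offset onward into the display buffer, whereas a JPEG image would first have to pass through a decoder.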

The Compact Flash card was chosen as the device that holds the images for the digital picture frame. The SanDisk Compact Flash card provided with the development kit and the Xilinx ML403 board has a capacity of 512 MB, which is sufficient for holding one or more images. The additional components needed to accompany the embedded processor are the VGA controller, System ACE controller, and SDRAM.

Choosing an Embedded Processor

The development board contains support for both the PowerPC processor and the MicroBlaze processor. The MicroBlaze processor is a soft core processor that can be customized according to the developer’s requirements. The PowerPC processor is a hard core processor with features that cannot be changed by the developer.

The ML403 Embedded Processor Reference System is an example of a large Virtex-4 based system. An IBM CoreConnect infrastructure connects the CPU to numerous peripherals using Processor Local Bus (PLB), On-Chip Peripheral Bus (OPB), and Device Control Register (DCR) buses to build a complete system.

The PLB protocol generally supports higher bandwidths, so the high bandwidth devices are placed there. The OPB connects the lower-performance peripheral devices to the CPU. The OPB offers a less complex protocol relative to the PLB, making it easier to design peripherals that do not require the highest performance. The OPB also has the advantage that it can support a greater number of devices. DCR is used with control and status registers for simplicity when performance is not important.

MicroBlaze Processor

The MicroBlaze soft core processor is highly configurable, allowing you to select a specific set of features required by your design. The processor’s fixed feature set includes:

Thirty-two 32-bit general purpose registers


32-bit instruction word with three operands and two addressing modes

32-bit address bus

Single issue pipeline

The example system shown in Figure 39 is the MicroBlaze system with the extra peripherals and components added to it.

Figure 39: Hardware View of ML403 Embedded MicroBlaze System [24]


PowerPC Processor

The PowerPC 405 processor core is an implementation of the PowerPC embedded environment architecture. The processor provides fixed-point embedded applications with high performance at low power consumption.

Figure 40: Hardware View of ML403 Embedded PPC405 System [23]


The PowerPC 405 contains many elements that would not be used in the application development. One particular example is the Memory Management Unit. Since the application is being developed without an embedded operating system, the extra features of the PowerPC processor are not necessary. The example system shown in Figure 17 is the PowerPC 405 system.

Research IP Cores Available

Most of the IP cores are included in the EDK and may not be available in an evaluation version of the software. Because most of the IP in the design is attached to the CoreConnect infrastructure under EDK, adding or removing devices is a fairly straightforward process. The main IP cores that will be of use for the application development are:

1. opb_sysace – System ACE MPU

2. plb_ddr – DDR SDRAM

3. plb_tft_cntlr_ref – VGA Controller

There is a 640x480 VGA controller that can be instantiated on the PLB, and there is a VGA TFT LCD Controller that can be used to control the VGA using the DCR buses.


Compact Flash Interface

Compact Flash is a mass storage device format used in portable electronic devices. For storage, Compact Flash typically uses flash memory in a standardized enclosure. The format was first specified and produced by SanDisk in 1994. The physical format is now used for a variety of devices.

Compact Flash became a popular storage medium for digital cameras. In recent years it has been widely replaced by smaller Secure Digital cards on the consumer end, but it is still a preferred format for its superior capacity and reliability.

The Compact Flash pin layout is shown in Figure 41. The pin descriptions change according to the mode in which the card operates. The three modes are PC Card Memory Mode, PC Card I/O Mode, and True IDE Mode.

Figure 41: Compact Flash 50 Pin Female Connector [26]

The System ACE solution allows for programming of the Compact Flash through the ACE Controller MPU Interface. The System ACE controller is shown in Figure 42 and it interacts with the Compact Flash, Microprocessor, and the configuration JTAG.


The Compact Flash interface is comprised of two pieces: a Compact Flash Controller and a Compact Flash Arbiter. The Compact Flash Controller detects the presence and maintains the status of the Compact Flash device. This Controller also handles all Compact Flash device access bus cycles, and abstracts and implements Compact Flash commands such as soft reset, identify drive, and read/write sector(s). The Compact Flash Arbiter controls the interface between the MPU and the Configuration JTAG Controller for access to the Compact Flash data buffer.

Figure 42: System ACE Controller Block Diagram [25]

The MPU Interface provides a useful means of monitoring the status of and controlling the System ACE Controller, as well as ACE Flash card READ/WRITE data. This interface enables communication between an MPU device and a Compact Flash module and the FPGA target system. The MPU interface is composed of a set of registers that provide a means for communicating with Compact Flash control logic, configuration control logic, and other resources in the ACE Controller. Specifically, this interface can be used to read the identity of a Compact Flash device and read/write sectors from or to a Compact Flash device. Two important issues should be understood when using the microprocessor port:

1. For the controller to be properly synchronized, the MPU must provide the clock.

2. The MPU must comply with System ACE timing diagrams.

In essence, the System ACE controller can be used to read and write the Compact Flash card. For this application, only reading the card is required. Writing matters only when the user loads the image onto the card. The controller is programmed using the application software.

VGA Interface

If the VGA controller is developed fully in hardware, it can be implemented using the same concepts as in the Altera VGA section. There is also a VGA controller core in the Xilinx EDK.


Embedded Processor Hardware Design

The PLB connects the CPU to high-performance devices, such as memory controllers. Highlights of the PLB protocol include synchronous architecture, independent read/write data paths, and split transaction address/data buses.

The OPB connects lower-performance peripheral devices to the system. OPB and PLB devices can communicate by way of an OPB-to-PLB Bridge or a PLB-to-OPB Bridge. The components needed on each bus are listed below.

The DCR bus offers a very simple interface protocol and is used for accessing control and status registers in various devices. It allows for register access to various devices without overloading the OPB and PLB interfaces. The DCR specification requires that the DCR master and slave clocks be synchronous to each other and related in frequency by an integer multiple. It is important to be aware of the clock domains of each of the DCR devices to ensure proper functionality.
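The frequency part of this requirement is easy to check when clock domains are being assigned. The helper below is a small sketch of such a check; the function name and the example frequencies are hypothetical, and the check covers only the integer-multiple relation. Whether the two clocks are actually synchronous depends on how they are generated and cannot be decided from the frequencies alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when one clock frequency is an integer multiple of the other. */
bool dcr_clocks_integer_related(uint32_t master_hz, uint32_t slave_hz)
{
    if (master_hz == 0 || slave_hz == 0)
        return false;
    uint32_t fast = master_hz > slave_hz ? master_hz : slave_hz;
    uint32_t slow = master_hz > slave_hz ? slave_hz : master_hz;
    return fast % slow == 0;
}
```

For instance, a 100 MHz master with a 50 MHz slave passes the check, while a 100 MHz master with a 66 MHz slave does not.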

Below are components that will be needed for the embedded processor hardware.

1. PLB Masters
   640x480 VGA Controller
   OPB-to-PLB Bridge (MicroBlaze system)

2. PLB Slaves
   Double Data Rate (DDR) SDRAM Controller

3. OPB Masters
   MicroBlaze Processor Instruction-Side Interface (MicroBlaze system)
   MicroBlaze Processor Data-Side Interface (MicroBlaze system)

4. OPB Slaves
   General-Purpose Input/Output (GPIO) x3
   Microprocessor Debug Module (MicroBlaze system)
   OPB-to-DCR Bridge
   System ACE™ MPU Interface

5. DCR
   OPB-to-PLB Bridge-In (MicroBlaze system)
   VGA TFT LCD Controller

The System ACE configuration management system allows the user to store hardware and software information on a Compact Flash device and use it to program one or more devices via JTAG. The ML403 platform uses the System ACE chip in conjunction with standard Compact Flash cards to enable hardware and software programming of the FPGA.

Figure 43 is the entire system with the necessary components for application hardware development. The Parallel I/O is also added for debugging purposes. This will allow a better understanding of where the problem is and what needs to be performed. This would possibly decrease development time.


Figure 43: Embedded Processor Hardware Design


Embedded Processor Software Design

The function of the software is described from a high-level perspective. The flowchart of the software algorithm is shown in Figure 44.

Figure 44: Software Design Flowchart


Embedded System Implementation

Altera DE2 Board

The hardware part of the design has been fully understood. The implementation should be straightforward when using the information obtained in the design phase of development. The software aspect is less clear because of inexperience in using Nios to communicate with the peripherals.

The software for the application is written in C or C++ using the Nios II IDE. Nios II peripherals can be accessed and communicated with in three general ways: direct register access, the hardware abstraction layer (HAL) interface, and standard ANSI C library functions. Depending on the complexity of the operation and the specific device being used, a programmer will often use each of the three methods at one point or another.

For direct register access, each peripheral register is accessed through read and write macros that are defined in the component’s device driver header file. A layer of software called the hardware abstraction layer has been created to reside between the user code and the peripheral hardware registers. The HAL interface contains many useful functions that allow the user to communicate with peripherals at a higher functional level. Standard library functions sometimes allow access to Nios peripherals. Memory devices can be accessed easily when using this method.
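The difference between the first two methods can be illustrated with a small sketch. The REG_RD and REG_WR macros below imitate the word-indexed style of the Nios II IORD and IOWR macros so that the example can run on a PC against an ordinary array; they are not the Altera macros themselves, which additionally bypass the data cache. The register map and the led_* functions are hypothetical and stand for the kind of wrapper a HAL driver provides.

```c
#include <stdint.h>

/* Word-indexed register access: register regnum lives at
 * base + 4 * regnum. Written here with a plain volatile pointer. */
#define REG_RD(base, regnum) \
    (*((volatile uint32_t *)(base) + (regnum)))
#define REG_WR(base, regnum, data) \
    (*((volatile uint32_t *)(base) + (regnum)) = (uint32_t)(data))

/* Hypothetical register map of a simple output port. */
enum { LED_DATA_REG = 0, LED_DIRECTION_REG = 1 };

/* Driver-style functions: the caller no longer needs to know which
 * register numbers or bit patterns are involved. */
void led_init(void *base)
{
    REG_WR(base, LED_DIRECTION_REG, 0xFF);   /* all eight pins as outputs */
}

void led_write(void *base, uint8_t pattern)
{
    REG_WR(base, LED_DATA_REG, pattern);
}

uint8_t led_read(void *base)
{
    return (uint8_t)REG_RD(base, LED_DATA_REG);
}
```

User code written against led_write() keeps working if the register layout changes, which is the convenience the HAL offers; code written against the macros has to be revisited whenever the hardware does.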


USB and Embedded Operating System

The hardware for the embedded processor is created according to the specifications required to make it compatible with the uCLinux operating system.

The Nios II processor was set up using the SOPC builder. The Nios II processor components are selected and added as shown in Figure 45.

Figure 45: Nios II Processor Hardware Components

A prebuilt kernel was used to port the operating system to the embedded processor. The settings were chosen to ensure that the operating system runs on the processor. The operating system was explored by connecting a keyboard, mouse, and VGA monitor to the DE2 board. The operating system applications nxview and nano-x and the mounting of a USB flash drive file system were tried out in order to gain experience with them.

The following command is run in the Nios II Command Shell:

    nios2-configure-sof project.sof

The project.sof file is the file created to program the DE2 board with the Nios II hardware design and the appropriate pin assignments. The next command downloads the kernel image to the board and starts it:

    nios2-download -g zImage

The zImage file should be located in the current directory of the command shell. It is a prebuilt kernel image that is used for experimental purposes. Once these commands have been run, the operating system should become visible on the VGA monitor connected to the DE2 board. The embedded operating system can also be navigated from the PC via an RS232 cable connected to the DE2 board. Figure 46 shows the uCLinux operating system accessed using HyperTerminal.

Corresponding to the two different μClinux releases (Microtronix and original), there are two possible ways to compile a uCLinux kernel:

Nios II developing IDE (includes Cygwin) for Microtronix μClinux

Real Linux environment for original μClinux


Figure 46: uCLinux Operating System Running on DE2 Board

The second option for compiling the kernel and file system is to work under Linux instead of Windows and Cygwin, which is a great advantage because some makefiles need only slight modifications from the Linux sources.

There were many problems compiling applications in the Nios II IDE and porting the Makefiles. Since the hardware of the application is created using Quartus for Windows, it was important to have Linux running on top of the Windows operating system on the computer.

The Linux environment was therefore run on Windows using the Cooperative Linux software. The file system chosen was a Gentoo file system.


Cooperative Linux is the first working free and open source method for running Linux natively on Microsoft Windows. More generally, Cooperative Linux (coLinux for short) is a port of the Linux kernel that allows it to run cooperatively alongside another operating system on a single machine. For instance, it allows one to run Linux freely on Windows 2000/XP without commercial PC virtualization software such as VMware, and more efficiently than with general purpose PC virtualization software.

SD Card and Nios Embedded Processor

The Nios II library functions were investigated using simple programs. The Nios software was found to behave unpredictably: sometimes the test software would run properly and sometimes it would not. It also seemed that the 7.1 edition of Quartus and Nios contained some errors that were not present in earlier and later versions of the program.

The software was therefore upgraded. Different procedures were then carried out to understand and work with different aspects of the DE2 board. The SRAM, LCD, VGA, SD card, and SDRAM were investigated by running experimental software. During the design phase, it was noticed that the SRAM can be written to using a straightforward methodology. The code below was used to test the SRAM. SRAM_MAX_WORDS is defined to be the size of the SRAM, which is set to 8000.


    alt_u32 i, val;
    alt_u32 errors = 0;
    alt_u32 *buffer = (alt_u32 *)SRAM_BASE;

    // Write data to SRAM
    for ( i = 0; i < SRAM_MAX_WORDS; i++ )
    {
        buffer[i] = i + 1000;
    }

    // Check output from SRAM
    for ( i = 0; i < SRAM_MAX_WORDS; i++ )
    {
        if ( buffer[i] != (i + 1000) )
            errors++;
    }

    return ( errors );

The LCD could be written to correctly after researching the LCD module that is connected to the DE2 board. Different initialization procedures exist for different modules. Part of the LCD program is shown below. LCD_WR_COMMAND_REG is a constant set to 0 and LCD_WR_DATA_REG is set to 2. These register values and the LCD information were obtained from [16].

    void lcd_init( void )
    {
        // Set Function Code Four Times -- 8-bit, 2 line, 5x7 mode
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x38 );
        usleep(4100); /* Wait 4.1 ms */
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x38 );
        usleep(100);
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x38 );
        usleep(5000);
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x38 );
        usleep(100);

        // Set Display to OFF
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x08 );
        usleep(100);

        // Set Display to ON
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x0C );
        usleep(100);

        // Set Entry Mode -- Cursor increment, display doesn't shift
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x06 );
        usleep(100);

        // Set the cursor to the home position
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x02 );
        usleep(2000);

        // Clear the display
        IOWR( LCD_BASE, LCD_WR_COMMAND_REG, 0x01 );
        usleep(2000);
    }

    alt_u32 test_lcd( void )
    {
        int i;
        char message[17] = "Counting... ";

        // Write a simple message on the first line.
        // Stop at the string terminator so that no NUL bytes are sent.
        for ( i = 0; i < 16 && message[i] != '\0'; i++ )
        {
            IOWR( LCD_BASE, LCD_WR_DATA_REG, message[i] );
            usleep(100);
        }

        return (0);
    }
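The command bytes written in lcd_init() are easier to follow when they are built from their bit fields. The sketch below assumes that the LCD module uses an HD44780-compatible command set, which the values in the listing (0x38, 0x08, 0x0C, 0x06) are consistent with; the function names are hypothetical and are not part of the Nios II libraries.

```c
#include <stdbool.h>
#include <stdint.h>

/* Function set: 001 DL N F x x (DL: 8-bit bus, N: two lines, F: large font). */
uint8_t lcd_function_set(bool eight_bit, bool two_lines, bool large_font)
{
    return (uint8_t)(0x20 | (eight_bit ? 0x10 : 0) |
                     (two_lines ? 0x08 : 0) | (large_font ? 0x04 : 0));
}

/* Display control: 0000 1 D C B (display, cursor, blink). */
uint8_t lcd_display_control(bool display_on, bool cursor_on, bool blink_on)
{
    return (uint8_t)(0x08 | (display_on ? 0x04 : 0) |
                     (cursor_on ? 0x02 : 0) | (blink_on ? 0x01 : 0));
}

/* Entry mode: 0000 01 I/D S (cursor increment, display shift). */
uint8_t lcd_entry_mode(bool increment, bool shift_display)
{
    return (uint8_t)(0x04 | (increment ? 0x02 : 0) | (shift_display ? 0x01 : 0));
}

/* Set the display data address: 1 followed by a 7-bit address. */
uint8_t lcd_set_ddram_address(uint8_t address)
{
    return (uint8_t)(0x80 | (address & 0x7F));
}
```

With these helpers, lcd_function_set(true, true, false) produces the 0x38 written four times during initialization, and lcd_display_control(true, false, false) produces the 0x0C that turns the display on. On such controllers, lcd_set_ddram_address(0x40) typically moves the cursor to the start of the second line.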

The VGA core when instantiated in the SOPC builder did not allow for a resolution of 640x480. The resolution chosen was 320x240. The VGA core function to draw pixels across the screen was tested. The test program code is shown below.


    int i, j;
    alt_up_vga_dev *myvga;
    int ret;
    unsigned int colorval;

    myvga = alt_up_vga_open_dev(VGA_CONTROLLER_0_NAME);

    while (1)
    {
        printf( "Enter color value: " );
        scanf( "%u", &colorval );

        for ( i = 0; i < 320; i++ )
        {
            for ( j = 0; j < 240; j++ )
            {
                ret = alt_up_vga_draw_pixel_with_back_buffer(myvga, colorval, i, j, 0);
                if ( ret != 0 )
                    printf( "alt_up_vga_draw_pixel returned %d\n", ret);
            }
        }

        alt_up_vga_draw_pixel_with_back_buffer(myvga, 0x0, 0, 0, 1);
    }

The SDRAM was written to in the same way as the SRAM. The only difference is in how the buffer is specified. Because the program itself resides in SDRAM, an ordinary array is already located there, so the SRAM line alt_u32 *buffer = (alt_u32 *)SRAM_BASE can be replaced by the declaration alt_u32 buffer[SDRAM_MAX_WORDS].

The SD Card communication procedure is more complicated. The header file is shown in the Appendix.


Xilinx PowerPC and MicroBlaze Development Kit FX12 Edition

The hardware part of the design has been fully understood. The implementation would be straightforward when using the information obtained in the design phase of development. The software aspect is vague due to inexperience in using SDK to communicate with the peripherals. The main aspect of implementation will be the embedded processor hardware specification, as developed in the design phase, and the embedded processor software development. A high level view of the software’s functionality was developed when designing the application and this is used to develop the application software.

The hardware and software implementation follows the flowchart in Figure 47. The figure is displayed when Xilinx Platform Studio is started.

Figure 47: Hardware and Software Development in Xilinx Platform Studio


During the development of a new project in Xilinx Platform Studio, the board type, embedded processor, memory, clock and peripherals need to be specified. Once the hardware is chosen, the wizard offers to develop certain C applications to test the memory and peripherals. These applications are useful for understanding how to communicate with the memory and peripherals chosen. The first three steps of hardware development were completed. The design was populated with the components necessary. The implementation is shown in Figure 48.

Figure 48: Embedded Processor Hardware Implementation


The Platform Studio SDK was designed to facilitate the development of embedded software application projects. SDK has its own GUI and is based on the Eclipse open-source tool suite.

An SDK project must be created for each software application. The project directory contains the C/C++ source files, the executable output file, and associated utility files, such as the make files used to build the project. Each SDK project directory is typically located under the XPS project directory tree for the embedded system that the application targets. Each SDK project produces just one executable file, <project_name>.elf. Therefore, there can be more than one SDK project targeting a single XPS embedded system.

The header file that is needed to communicate with the System ACE controller is “xsysace.h.” The example program, xsysace_selftest_example.c, in the appendix shows how to communicate with the Compact Flash.

The communication with DDR SDRAM memory can be tested using the header file “xutil.h.” This file makes it easy to test if memory is working normally.

Altera DE2 and Xilinx FX12

The Xilinx and Altera tools are compared in order to determine how conducive they are to rapidly prototyping an embedded system. The digital picture frame test application was designed for both prototyping boards and the results were compared. The only difference is that the Xilinx ML403 board uses a Compact Flash card and the Altera DE2 board uses an SD card to store the image that is displayed on the monitor.


During design and development of the application on both boards, some drawbacks were found in both vendors’ tools. The Altera development tools were clear-cut when developing the embedded processor hardware for the device. The drawback occurred when writing the software for the application. Although the software is written in C or C++, the libraries are not documented well enough to understand how certain functions are used and which parameters they require. The software development mostly consisted of becoming familiar with the details of each function’s implementation in order to understand how to use it appropriately. Fortunately, some IP cores provided an easier way of communicating with certain devices without developing programs that interact with specific registers.

The Xilinx development tools are easy to use if the design is completed before implementation. The specifics of the embedded processor have to be specified as soon as Platform Studio is started. The same is true of the Altera tools. Both tools encourage rapid development because they make a completed design a precondition for success. The libraries associated with the Xilinx software development are commented, which makes them easy to use.


CHAPTER 8

FUTURE AND SIGNIFICANCE OF EMBEDDED SYSTEMS

In the past, most embedded research and development was associated with real-time systems and industrial settings, but this has been changing. With the wider deployment of computers, the need for embedded systems has increased. This is evident because embedded systems are now found in most household appliances, cars, electrical devices, and industrial devices and tools. The trend appears to be long-lasting, although such widespread use imposes hard requirements on embedded systems: they must be as reliable and robust as other household appliances, as easy to use and as available, connected with other devices, compliant with standards of some kind, and low in cost.

Some of the earlier requirements might not be as relevant in the embedded space. The requirements may have to be adjusted through trade-offs such as size versus flexibility, robustness versus richness of functionality, and power consumption versus performance. The producers of each particular system will define the exact trade-offs.

One consequence of producing embedded systems in large numbers is that the cost of the software per unit is greatly reduced.

Connecting embedded devices together will extend the scalability limits of today’s systems. One user may have hundreds or even thousands of embedded devices communicating with each other. Traditional wired technologies would not be viable at that scale, and wireless is the attractive alternative.


The recent trends are essentially increasing computational demands, increasing networking, and an increasing need for flexibility. The future of embedded systems is closely tied to these key trends. A paradigm shift is occurring in the embedded system market: the shift from ASICs to SOC design. Ninety percent of new ASICs developed in 130 nm technology already include a CPU [14]. Heterogeneous cores are exploited to meet tight performance and cost constraints, and this trend of building heterogeneous multiprocessor SOCs will accelerate further. SOCs will be composed of multiple, possibly highly parallel processors for applications such as mobile terminals, set-top boxes, game processors, video processors, and network processors. Moreover, these chips will contain very sophisticated communication networks called networks-on-chip (NOC).

Figure 49: Transition to System on Silicon or System on Chip

The new generations of designs are software intensive and combine a short market window with ever more complex functionality and reliability requirements. Current ASIC design approaches are hard to scale to such highly parallel multiprocessor SOCs. Designing these new systems by means of classical methods results in unacceptable realization costs and delays. Designers will have a significant problem delivering competitive multiprocessor SOC products that achieve, at the same time, short time-to-market, low cost, and complex and reliable designs. The challenge clearly lies in innovating design methodologies. The abstraction layers and the key enabling technologies are shown in the figure below. The figure, from [14], shows a trend toward automatic partitioning and execution model generation.

Hardware-software partitioning was shown to be a difficult task; with automated partitioning, the design of complex embedded systems is made easier. Before the goal of automatic partitioning can be attained, the intermediate steps of communication-computation partitioning, task allocation, and scheduling automation have to be completed.

Figure 50: Trends in Embedded System Design with More Automated Design Methodology
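To make the idea of automated hardware-software partitioning concrete, the sketch below shows one very simple heuristic: tasks are moved into hardware greedily, in order of time saved per unit of logic area, until the area budget is used up. This is a toy illustration and not the method of [14] or of any particular tool; the task structure, field names, and the assumption that tasks run sequentially are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical task description; the fields are illustrative only. */
struct task {
    double sw_time;  /* execution time when run in software            */
    double hw_time;  /* execution time when implemented in hardware    */
    double hw_area;  /* logic area needed in hardware, must be > 0     */
    int    in_hw;    /* output: 1 if the task was assigned to hardware */
};

/* Greedy partitioning: repeatedly move the task with the largest time
 * saving per unit area into hardware while it still fits in the remaining
 * area budget. Returns the total execution time of the final partition,
 * assuming the tasks execute one after another. */
double partition_greedy(struct task *t, size_t n, double area_budget)
{
    double total = 0.0;
    size_t i;

    /* Start with everything in software. */
    for (i = 0; i < n; i++) {
        t[i].in_hw = 0;
        total += t[i].sw_time;
    }

    for (;;) {
        size_t best = n;
        double best_gain = 0.0;

        for (i = 0; i < n; i++) {
            double gain;
            if (t[i].in_hw || t[i].hw_area > area_budget)
                continue;
            gain = (t[i].sw_time - t[i].hw_time) / t[i].hw_area;
            if (gain > best_gain) {
                best_gain = gain;
                best = i;
            }
        }
        if (best == n)
            break;  /* no remaining task both fits and saves time */

        t[best].in_hw = 1;
        area_budget -= t[best].hw_area;
        total -= t[best].sw_time - t[best].hw_time;
    }
    return total;
}
```

A greedy choice like this is fast but not optimal in general, and it ignores communication cost between the two partitions, which is one reason the steps listed above have to be automated together.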


The key issue when integrating the parts of a SOC is the creation of a continuum between embedded software and hardware [14]. This requires a new technology, namely Hardware dependent Software (HdS), for integrating embedded software with hardware platforms. The HdS concept is a promising approach to SOC integration.

This concept facilitates:

i. Concurrent design of both hardware and embedded software, leading to a shorter time-to-market

ii. Modular design of hardware and software parts, leading to a better mastering of complex systems

iii. Validation of SOCs including hardware and embedded software, leading to higher reliability and a better quality of service
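One common way to obtain this kind of separation in C is to place the hardware-dependent code behind a small table of function pointers. The sketch below is only an illustration of that idea and is not taken from [14]; the names storage_ops, load_image, and ram_read_sector are hypothetical. The application code calls the interface only, so a compact flash backend and an SD card backend could be swapped without changing it, and a RAM-backed stand-in allows the application logic to be validated before the hardware is available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hardware-dependent layer: one implementation per storage device. */
struct storage_ops {
    int (*read_sector)(void *ctx, uint32_t lba, uint8_t *buf);
    void *ctx;  /* device-specific state passed back to read_sector */
};

/* Hardware-independent application code: copies `count` sectors starting
 * at sector `lba` into `dst`. Returns 0 on success, -1 on a read error. */
int load_image(const struct storage_ops *dev, uint32_t lba,
               uint32_t count, uint8_t *dst)
{
    uint32_t i;

    for (i = 0; i < count; i++)
        if (dev->read_sector(dev->ctx, lba + i,
                             dst + (size_t)i * SECTOR_SIZE) != 0)
            return -1;
    return 0;
}

/* Stand-in backend for host testing: "reads" sectors from a RAM buffer. */
struct ram_disk {
    const uint8_t *data;
    uint32_t sectors;
};

int ram_read_sector(void *ctx, uint32_t lba, uint8_t *buf)
{
    const struct ram_disk *d = (const struct ram_disk *)ctx;

    if (lba >= d->sectors)
        return -1;  /* read past the end of the device */
    memcpy(buf, d->data + (size_t)lba * SECTOR_SIZE, SECTOR_SIZE);
    return 0;
}
```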

Trends in silicon technology will greatly affect the embedded system industry, because hardware capabilities directly determine which applications can be implemented. Technology nodes cannot continue to scale down indefinitely because of fabrication, leakage, and cooling difficulties. These scaling difficulties and the resulting performance limitations are driving the trend toward multi-core chip designs. Developing embedded systems on these processors requires an understanding of the hardware and of how to obtain peak performance by utilizing the many cores.


CHAPTER 9

CONCLUSION

The objectives of the project were to review developments in embedded system design and future trends, and to explore board-level rapid prototyping using FPGAs. The trends observed in the embedded system field are essential to consider when designing new systems, which will have to keep pace with those trends and cater to emerging demands.

The embedded system design flow was reviewed to obtain an understanding of the process. Before this project, applications were developed ad hoc, without a set methodology. This approach worked for simple applications, but when complex applications were encountered it was hard to determine where to begin. Embedded system design methodology was therefore researched and then followed when developing the test application, the digital picture frame. When the process is closely followed and the correct decisions are made, a correct implementation results. Many research papers mention that hardware-software partitioning is the most significant aspect of the design flow, and this difficulty was experienced when developing the test application. The partitioning was difficult, and two methods were explored. The first method was to use an embedded operating system, so that the hardware aspect was minimal. The second method was to implement the system from scratch, without an embedded operating system, and to decide on a high-level or low-level implementation for each system component. The first method was definitely easier at the design stage because minimal design was required. The problem was that the embedded operating system occupied the majority of the memory space available on the development board, so only a few images could be added for the picture frame application before the memory was full. The implementation was also more difficult because the open source embedded operating system lacked documentation, and porting the operating system to the development board was a difficult and tedious task. The second method was more design-intensive because the application was developed from scratch.
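The memory pressure described for the first method can be estimated with simple arithmetic: the number of uncompressed images that fit is the memory left after the operating system divided by the size of one image. The helper below is a minimal sketch of that calculation; the function name is hypothetical and all sizes are parameters, since the actual memory and image sizes are not restated here.

```c
#include <stdint.h>

/* Number of uncompressed images that fit in the memory remaining after the
 * operating system image has been loaded. Every figure is supplied by the
 * caller; none is taken from the boards used in this project. */
uint32_t images_that_fit(uint32_t total_bytes, uint32_t os_bytes,
                         uint32_t width, uint32_t height,
                         uint32_t bytes_per_pixel)
{
    uint64_t image_bytes = (uint64_t)width * height * bytes_per_pixel;

    if (os_bytes >= total_bytes || image_bytes == 0)
        return 0;
    return (uint32_t)((total_bytes - os_bytes) / image_bytes);
}
```

As a purely hypothetical example, with 8 MiB of memory, a 6 MiB operating system image, and 640 x 480 images at 2 bytes per pixel (614,400 bytes each), only 3 images fit, compared with 13 when no operating system is present.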

Developing the test application gave an understanding of the software and hardware aspects of embedded system design. Memory considerations are becoming more important as the trend toward system-on-chip technology takes hold. The SoC trend introduces a need for more compact memory, because memory occupies the majority of the area of an integrated circuit with on-chip memory. With more compact memory, increased performance and reduced area can be achieved. Compression techniques and smaller memory technologies can be used to meet the memory requirements of an SoC.

When deciding on memory and possible optimizations, run-time, code size, and energy efficiency have to be considered. Memory speed increases by a factor of 1.07 per year, whereas processor speed increases by a factor of 1.5 to 2 per year. To bridge this gap, smaller and faster memories, namely caches, are placed between the processor and main memory. A cache provides good run-time and energy efficiency because local access to commonly used data and instructions speeds up the application and reduces the energy that would otherwise be spent fetching them from main memory.
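The effect of these growth factors can be worked out directly. The sketch below compounds the yearly factors to show how quickly the gap grows, and evaluates the standard textbook formula for average memory access time with a single cache level (hit time plus miss rate times miss penalty). The function names are hypothetical, and the cache figures used in the note that follows are illustrative, not measurements from this project.

```c
/* Ratio of processor speed to memory speed after `years`, if the processor
 * improves by `cpu_factor` per year and memory by `mem_factor` per year. */
double speed_gap(double cpu_factor, double mem_factor, unsigned years)
{
    double gap = 1.0;
    unsigned i;

    for (i = 0; i < years; i++)
        gap *= cpu_factor / mem_factor;
    return gap;
}

/* Average memory access time for one cache level: every access pays the
 * hit time, and the fraction that misses also pays the miss penalty. */
double avg_access_time(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```

With the factors quoted above, speed_gap(1.5, 1.07, 10) is roughly 29, so even at the lower processor growth rate the gap widens by more than an order of magnitude in ten years. For an assumed 1-cycle hit time, 5 percent miss rate, and 100-cycle miss penalty, the average access time is about 6 cycles instead of 100.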

Different main memory options were researched to evaluate the advantages and disadvantages of using each one. This comparison can be found in chapter 3.

Choosing the correct development platform for the application is another major decision when creating an embedded system. The Xilinx and Altera tools were compared to determine how well each supports rapid prototyping of an embedded system. The designs for the two boards varied only slightly. During design and development of the application on both boards, drawbacks were found in both vendors’ tools.

The Altera development tools were clear-cut when developing the embedded processor hardware for the device. The drawback appeared when writing the software for the application. Although the software is written in C or C++, the libraries are not documented well enough to show how certain functions are used or which parameters they require. The internals of the functions had to be analyzed to understand their usage. Altera did supply some IP cores that made it easier to communicate with certain devices.

The Xilinx development tools are easy to use if the design is completed before implementation begins. The specifics of the embedded processor have to be specified as soon as Platform Studio is started; the Altera tools have the same property. Both tool sets therefore encourage rapid development by requiring the design of the application to be settled up front. The libraries associated with Xilinx software development are well commented.

The comparison between rapid application development and rapid prototyping was observed to be the following.

Figure 51: Comparison of Rapid System Prototyping and Rapid Application Development

The purpose of this project was met by reviewing the developments in the field of embedded systems and by experimenting with a test application to put those concepts to practical use. The understanding gained during this project will assist in future embedded systems design projects.


APPENDIX

SD_Card.h

#ifndef __SD_Card_H__
#define __SD_Card_H__

// IOWR/IORD are the Nios II HAL register access macros; the SD_*_BASE
// addresses are expected to come from the generated system.h.
#include <io.h>
#include "system.h"

//-------------------------------------------------------------------------
// SD Card Set I/O Direction
#define SD_CMD_IN    IOWR(SD_CMD_BASE, 1, 0)
#define SD_CMD_OUT   IOWR(SD_CMD_BASE, 1, 1)
#define SD_DAT_IN    IOWR(SD_DAT_BASE, 1, 0)
#define SD_DAT_OUT   IOWR(SD_DAT_BASE, 1, 1)
// SD Card Output High/Low
#define SD_CMD_LOW   IOWR(SD_CMD_BASE, 0, 0)
#define SD_CMD_HIGH  IOWR(SD_CMD_BASE, 0, 1)
#define SD_DAT_LOW   IOWR(SD_DAT_BASE, 0, 0)
#define SD_DAT_HIGH  IOWR(SD_DAT_BASE, 0, 1)
#define SD_CLK_LOW   IOWR(SD_CLK_BASE, 0, 0)
#define SD_CLK_HIGH  IOWR(SD_CLK_BASE, 0, 1)
// SD Card Input Read
#define SD_TEST_CMD  IORD(SD_CMD_BASE, 0)
#define SD_TEST_DAT  IORD(SD_DAT_BASE, 0)
//-------------------------------------------------------------------------
#define BYTE   unsigned char
#define UINT16 unsigned int
#define UINT32 unsigned long
//-------------------------------------------------------------------------
void Ncr( void );
void Ncc( void );
BYTE response_R(BYTE);
BYTE send_cmd(BYTE *);
BYTE SD_read_lba(BYTE *,UINT32,UINT32);
BYTE SD_card_init( void );
//-------------------------------------------------------------------------
// These objects are defined in the header, so it must be included from
// exactly one source file.
BYTE read_status;
BYTE response_buffer[20];
BYTE RCA[2];
BYTE cmd_buffer[5];
// Command frames: command byte followed by a 4-byte argument.
const BYTE cmd0[5]   = {0x40,0x00,0x00,0x00,0x00};
const BYTE cmd55[5]  = {0x77,0x00,0x00,0x00,0x00};
const BYTE cmd2[5]   = {0x42,0x00,0x00,0x00,0x00};
const BYTE cmd3[5]   = {0x43,0x00,0x00,0x00,0x00};
const BYTE cmd7[5]   = {0x47,0x00,0x00,0x00,0x00};
const BYTE cmd9[5]   = {0x49,0x00,0x00,0x00,0x00};
const BYTE cmd16[5]  = {0x50,0x00,0x00,0x02,0x00};
const BYTE cmd17[5]  = {0x51,0x00,0x00,0x00,0x00};
const BYTE acmd6[5]  = {0x46,0x00,0x00,0x00,0x02};
const BYTE acmd41[5] = {0x69,0x0f,0xf0,0x00,0x00};
const BYTE acmd51[5] = {0x73,0x00,0x00,0x00,0x00};

//-------------------------------------------------------------------------
// Release the CMD line and issue two clocks before reading the response.
void Ncr( void )
{
    SD_CMD_IN;
    SD_CLK_LOW;
    SD_CLK_HIGH;
    SD_CLK_LOW;
    SD_CLK_HIGH;
}

//-------------------------------------------------------------------------
// Issue eight clocks between consecutive commands.
void Ncc( void )
{
    int i;
    for (i=0;i<8;i++)
    {
        SD_CLK_LOW;
        SD_CLK_HIGH;
    }
}

//-------------------------------------------------------------------------

// Initialize the card. Returns 0 on success, 1 on a response error.
BYTE SD_card_init( void )
{
    BYTE x,y;
    SD_CMD_OUT;
    SD_DAT_IN;
    SD_CLK_HIGH;
    SD_CMD_HIGH;
    SD_DAT_LOW;
    read_status=0;
    // Clock the card (80 cycles) before the first command.
    for (x=0;x<40;x++)
        Ncr();
    // CMD0: reset the card to the idle state.
    for (x=0;x<5;x++)
        cmd_buffer[x]=cmd0[x];
    y = send_cmd(cmd_buffer);
    // Repeat CMD55 + ACMD41 until the card reports that it is ready.
    do
    {
        for (x=0;x<40;x++);   // short delay
        Ncc();
        for (x=0;x<5;x++)
            cmd_buffer[x]=cmd55[x];
        y = send_cmd(cmd_buffer);
        Ncr();
        if (response_R(1)>1)  // response too long or crc error
            return 1;
        Ncc();
        for (x=0;x<5;x++)
            cmd_buffer[x]=acmd41[x];
        y = send_cmd(cmd_buffer);
        Ncr();
    } while (response_R(3)==1);
    // CMD2: request the card identification (CID).
    Ncc();
    for (x=0;x<5;x++)
        cmd_buffer[x]=cmd2[x];
    y = send_cmd(cmd_buffer);
    Ncr();
    if (response_R(2)>1)
        return 1;
    // CMD3: request the relative card address (RCA) and store it.
    Ncc();
    for (x=0;x<5;x++)
        cmd_buffer[x]=cmd3[x];
    y = send_cmd(cmd_buffer);
    Ncr();
    if (response_R(6)>1)
        return 1;
    RCA[0]=response_buffer[1];
    RCA[1]=response_buffer[2];
    // CMD9: request the card-specific data (CSD) of the addressed card.
    Ncc();
    for (x=0;x<5;x++)
        cmd_buffer[x]=cmd9[x];
    cmd_buffer[1] = RCA[0];
    cmd_buffer[2] = RCA[1];
    y = send_cmd(cmd_buffer);
    Ncr();
    if (response_R(2)>1)
        return 1;
    // CMD7: select the card.
    Ncc();
    for (x=0;x<5;x++)
        cmd_buffer[x]=cmd7[x];
    cmd_buffer[1] = RCA[0];
    cmd_buffer[2] = RCA[1];
    y = send_cmd(cmd_buffer);
    Ncr();
    if (response_R(1)>1)
        return 1;
    // CMD16: set the block length to 512 bytes.
    Ncc();
    for (x=0;x<5;x++)
        cmd_buffer[x]=cmd16[x];
    y = send_cmd(cmd_buffer);
    Ncr();
    if (response_R(1)>1)
        return 1;
    read_status =1;           // sd card ready
    return 0;
}

//-------------------------------------------------------------------------

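// Read seccnt 512-byte sectors starting at sector lba into buff.
BYTE SD_read_lba(BYTE *buff,UINT32 lba,UINT32 seccnt)
{
    BYTE c=0;
    BYTE bit;
    UINT32 i,j;
    lba+=101;                 // fixed sector offset applied to every read
    for (j=0;j<seccnt;j++)
    {
        // CMD17: read a single block; the argument is the byte address
        // (lba * 512) split into four bytes.
        Ncc();
        cmd_buffer[0] = cmd17[0];
        cmd_buffer[1] = (lba>>15)&0xff;
        cmd_buffer[2] = (lba>>7)&0xff;
        cmd_buffer[3] = (lba<<1)&0xff;
        cmd_buffer[4] = 0;
        lba++;
        send_cmd(cmd_buffer);
        Ncr();
        // Wait for the start bit (DAT low).
        while (1)
        {
            SD_CLK_LOW;
            SD_CLK_HIGH;
            if (!(SD_TEST_DAT))
                break;
        }
        // Shift in 512 data bytes, most significant bit first.
        for (i=0;i<512;i++)
        {
            for (bit=0;bit<8;bit++)
            {
                SD_CLK_LOW;
                SD_CLK_HIGH;
                c <<= 1;
                if (SD_TEST_DAT)
                    c |= 0x01;
            }
            *buff=c;
            buff++;
        }
        // Clock out the 16-bit data CRC without checking it.
        for (i=0; i<16; i++)
        {
            SD_CLK_LOW;
            SD_CLK_HIGH;
        }
    }
    read_status = 1;          // SD data next in
    return 0;
}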

//-------------------------------------------------------------------------

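// Read a response of type s into response_buffer.
// Returns 0 if OK, 1 if the card is still busy (s == 3), 2 on timeout or
// CRC error.
BYTE response_R(BYTE s)
{
    BYTE a=0,b=0,c=0,r=0,crc=0;
    BYTE i,j=6,k;
    // Wait for the start bit (CMD low); give up after about 100 clocks.
    while (1)
    {
        SD_CLK_LOW;
        SD_CLK_HIGH;
        if (!(SD_TEST_CMD))
            break;
        if (crc++ >100)
            return 2;
    }
    crc =0;
    if (s == 2)
        j = 17;               // long response: 17 bytes instead of 6
    for (k=0; k<j; k++)
    {
        c = 0;
        if (k > 0)            // for CRC calculation
            b = response_buffer[k-1];
        for (i=0; i<8; i++)
        {
            SD_CLK_LOW;
            if (a > 0)
                c <<= 1;
            else
                i++;          // first byte: start bit was read in the wait loop
            a++;
            SD_CLK_HIGH;
            if (SD_TEST_CMD)
                c |= 0x01;
            if (k > 0)
            {
                crc <<= 1;
                if ((crc ^ b) & 0x80)
                    crc ^= 0x09;
                b <<= 1;
                crc &= 0x7f;
            }
        }
        if (s==3)
        {
            if ( k==1 &&(!(c&0x80)))
                r=1;          // ready bit clear: card still initializing
        }
        response_buffer[k] = c;
    }
    if (s==1 || s==6)
    {
        if (c != ((crc<<1)+1))
            r=2;              // CRC mismatch
    }
    return r;
}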

//-------------------------------------------------------------------------

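// Send the 5-byte command in[] followed by its CRC-7 and end bit.
// Returns the CRC byte that was sent.
BYTE send_cmd(BYTE *in)
{
    int i,j;
    BYTE b,crc=0;
    SD_CMD_OUT;
    // Shift out the command bytes MSB first, updating the CRC per bit.
    for (i=0; i < 5; i++)
    {
        b = in[i];
        for (j=0; j<8; j++)
        {
            SD_CLK_LOW;
            if (b&0x80)
                SD_CMD_HIGH;
            else
                SD_CMD_LOW;
            crc <<= 1;
            SD_CLK_HIGH;
            if ((crc ^ b) & 0x80)
                crc ^= 0x09;
            b<<=1;
        }
        crc &= 0x7f;
    }
    // Final byte: 7-bit CRC followed by the end bit (1).
    crc =((crc<<1)|0x01);
    b = crc;
    for (j=0; j<8; j++)
    {
        SD_CLK_LOW;
        if (crc&0x80)
            SD_CMD_HIGH;
        else
            SD_CMD_LOW;
        SD_CLK_HIGH;
        crc<<=1;
    }
    return b;
}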

//-------------------------------------------------------------------------

#endif
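The CRC and address arithmetic in send_cmd() and SD_read_lba() can be checked on a host without the card. The sketch below repeats the same CRC-7 update (generator x^7 + x^3 + 1) as a standalone function and builds the 6-byte frame for a single-block read. The names sd_crc7 and sd_build_cmd17 are hypothetical and are not part of the listing above; the fixed sector offset used in SD_read_lba() is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-7 over `len` bytes, using the same bitwise update that send_cmd()
 * performs while shifting a command out. */
uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        uint8_t b = data[i];
        for (bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if ((crc ^ b) & 0x80)
                crc ^= 0x09;
            b <<= 1;
        }
        crc &= 0x7f;
    }
    return crc;
}

/* Build the frame for CMD17 (read single block) of sector `lba`.
 * The argument is the byte address lba * 512, which is what the shifts
 * by 15, 7, and 1 in SD_read_lba() compute one byte at a time. */
void sd_build_cmd17(uint32_t lba, uint8_t frame[6])
{
    uint32_t addr = lba << 9;

    frame[0] = 0x51;
    frame[1] = (uint8_t)(addr >> 24);
    frame[2] = (uint8_t)(addr >> 16);
    frame[3] = (uint8_t)(addr >> 8);
    frame[4] = (uint8_t)addr;
    /* 7-bit CRC followed by the end bit. */
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 0x01);
}
```

For CMD0 with a zero argument, sd_crc7 returns 0x4A, so the final byte on the wire is 0x95. Byte addressing as shown applies to standard-capacity cards; high-capacity cards take a block address instead.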


xsysace_selftest_example.c

// With TESTAPP_GEN defined, main() below is compiled out and
// SysAceSelfTestExample() is expected to be called from elsewhere.
#define TESTAPP_GEN

#include "xparameters.h"
#include "xstatus.h"
#include "xsysace.h"

#ifndef TESTAPP_GEN
#define SYSACE_DEVICE_ID XPAR_SYSACE_DEVICE_ID
#endif

XStatus SysAceSelfTestExample(Xuint16 DeviceId);

XSysAce SysAce; //an instance of the device

#ifndef TESTAPP_GEN
int main( void )
{
    XStatus Status;
    Status = SysAceSelfTestExample(SYSACE_DEVICE_ID);
    if (Status != XST_SUCCESS)
    {
        return XST_FAILURE;
    }
    return XST_SUCCESS;
}
#endif

// Initialize the System ACE driver and run its self-test.
XStatus SysAceSelfTestExample(Xuint16 DeviceId)
{
    XStatus Status;
    Status = XSysAce_Initialize(&SysAce, DeviceId);
    if (Status != XST_SUCCESS)
    {
        return XST_FAILURE;
    }
    Status = XSysAce_SelfTest(&SysAce);
    if (Status != XST_SUCCESS)
    {
        return XST_FAILURE;
    }
    return XST_SUCCESS;
}

158

LIST OF REFERENCES

[1] Henzinger, T., Sifakis, J. “The Discipline of Embedded System Design.”

[2] Johnson, S. “Formal Methods in Embedded Design.” Indiana University.

[3] Marwedel, P. “Embedded System Design.” Springer, 2006.

[4] Navabi, Zainalabedin. Embedded Core Design with FPGAs. McGraw-Hill Professional, 2006.

[5] Maxfield, C. “The Design Warrior’s Guide to FPGAs.” Elsevier, 2004.

[6] Buchenrieder, K. “Rapid Prototyping of Embedded Hardware/Software Systems.”

[7] Kordon, F., Henkel, J. “An Overview of Rapid System Prototyping Today.”

[8] Luqi. “Handling Timing Constraints in Rapid Prototyping.”

[9] Software Engineering Book.

[10] “DE2 User Manual.” Altera Corporation. <ftp://ftp.altera.com/up/pub/Webdocs/DE2_UserManual.pdf>

[11] “PowerPC and MicroBlaze Development Kit Virtex-4 FX12 Edition.” Xilinx Corporation. <www.xilinx.com/publications/prod_mktg/pn0010871.pdf>

[12] “MicroBlaze Processor Reference Guide.” Xilinx Corporation. <www.xilinx.com/support/documentation/sw_manuals/edk92i_mb_ref_guide.pdf>

[13] “ML40x EDK Processor Reference Design.” Xilinx Corporation. <www.xilinx.com/support/documentation/boards_and_kits/ug082.pdf>

[14] Jerraya, A. “Long Term Trends for Embedded System Design.”


[15] “Embedded Software.” Altera Corporation. <www.altera.com/products/ip/processors/nios2/tools/embed-partners/ni2-embed-partners.html>

[16] Hamblen, J.O., Hall, T.S., and Furman, M.D. Rapid Prototyping of Digital Systems SOPC Edition. Springer, 2007.

[17] “DE2 Development and Education Board.” Altera Corporation. DE2 CD-ROM. DE2 Schematics PDF. <www.altera.com/education/univ/materials/boards/unv-de2-board.html#boardinfo>

[18] “VGA Core for Altera DE2/DE1 Boards.” Altera Corporation. <ftp://ftp.altera.com/up/pub/University_Program_IP_Cores/VGA.pdf>

[19] Kuon, I., Rose, J. “Measuring the Gap Between FPGAs and ASICs.”

[20] “SD/MMC SPI Core.” El Camino GmbH. <www.altera.com/products/ip/ampp/elcamino/documents/sd-mmc-spi-core.pdf>

[21] “PowerPC and MicroBlaze Development Kit Virtex-4 FX12 Edition.” Xilinx Corporation. <www.xilinx.com/publications/prod_mktg/pn0010871.pdf>

[22] “DE2 User Manual.” Altera Corporation. <ftp://ftp.altera.com/up/pub/Webdocs/DE2_UserManual.pdf>

[23] “PowerPC 405 Processor Block Reference Guide.” Xilinx Corporation. <www.xilinx.com/support/documentation/user_guides/ug018.pdf>

[24] “MicroBlaze Processor Reference Guide.” Xilinx Corporation. <www.xilinx.com/support/documentation/sw_manuals/edk92i_mb_ref_guide.pdf>


[25] “ML40x EDK Processor Reference Design.” Xilinx Corporation. <www.xilinx.com/support/documentation/boards_and_kits/ug082.pdf>

[26] “EDK Concepts, Tools and Techniques.” Xilinx Corporation. <www.xilinx.com/support/documentation/sw_manuals/edk92i_ctt.pdf>

[27] “FPGA Performance Benchmarking Methodology.” Altera Corporation. <http://www.altera.com/literature/wp/wpfpgapbm.pdf>

[28] “Quartus II Version 7.1 Handbook Volume 1: Design and Synthesis.” Altera Corporation.

[29] “Quartus II Version 7.1 Handbook Volume 2: Design Implementation and Optimization.” Altera Corporation.

[30] “Quartus II Version 7.1 Handbook Volume 3: Verification.” Altera Corporation.

[31] “Quartus II Version 7.1 Handbook Volume 4: SOPC Builder.” Altera Corporation.

[32] “Quartus II Version 7.1 Handbook Volume 5: Altera Embedded Peripherals.” Altera Corporation.

[33] “Nios II 7.1 Processor Reference Handbook.” Altera Corporation.

[34] “Nios II 7.1 Software Developer’s Handbook.” Altera Corporation.

[35] “Cyclone II Device Handbook.” Altera Corporation.

[36] Cofer, R., Harding, B. “Rapid System Prototyping with FPGAs.” Elsevier Science and Technology Books, Inc., 2006.

[37] Salewski, F., Kowalewski, S. “Hardware Platform Design Decisions in Embedded Systems – A Systematic Teaching Approach.” Embedded Software Laboratory, RWTH Aachen University.
