
COMP22111 Laboratory Manual

The STUMP Processor Chip Project

School of Computer Science

University of Manchester

September 2010


INDEX

Chapters

1. Introduction

2. COMP22111 Laboratory Organisation

3. The Design Process

4. Design Tasks

5. The Processor Specification

6. Programming the STUMP Processor

Appendices

A. Verilog Top Level Behavioural Model

B. Verilog Control Description

C. STUMP Processor : RTL Design

Answer Sheets

Exercise 1

Exercise 4

Chapter 1

Introduction

Laboratory Aims

• To learn about the process of designing on silicon by ‘doing’.

• To design a simple RISC processor, comprising the datapath and control, at all levels of the design hierarchy ranging from a high level specification down to the Register Transfer Level (RTL).

• To have exposure to industry standard CAD tools.

• To use assembler test programs which can be used to test all levels of the design.

• To simulate the complete processor at the different levels of the design hierarchy.

• To support the lectures with a practical example.

Learning Outcomes

After completing the laboratory a student will

• be able to specify functionality in the Verilog hardware description language

• gain experience of the different stages of the VLSI design process down to the RTL level.

• gain experience of composing and running test programs, and checking their results

• be able to use the Cadence CAD tool to find hardware/software errors

• gain experience of the appropriate CAD tools available for use at the different stages of the design process.

VLSI Systems Design requires inspiration and imagination as well as a sound technical background. Most of the technical background can be imparted by means of lectures, but when it comes to design there is no substitute for experience. We believe that, in the words of Albert Camus, “You cannot create experience - you must undergo it”. The COMP22111 course has therefore been structured so that the technical background is covered in a taught course consisting of lectures, while the design process itself is taught by means of a design project in laboratory classes. In the laboratory course students design a small ASIC (Application Specific Integrated Circuit) which, if completed successfully, could be implemented on silicon. This experience should give some feeling for the trials, tribulations and satisfactions of designing systems on silicon.


The objective of the project is to learn some of the methodology of VLSI design by carrying out the design of a small RISC processor. It is also intended to help students appreciate and understand the operation and architecture of a RISC processor. The STUMP processor has been fully specified by D. A. Edwards and parts of it have been designed. Thus students should regard themselves as members of a design team whose job it is to do a significant part of the design; you will complete a partially done design and simulate the whole design.

In contrast to design work carried out in previous Computer Science laboratories the work will not consist of designing gate-level circuits - the emphasis in this project is rather on “systems on silicon”. The work starts with the high level behavioural modelling of the chip and proceeds to Register Transfer Level, where again behavioural modelling is used. Tools are available to automatically translate a RTL description into logic and then to generate layout of the chip onto silicon; this will be described in lectures but not performed in the lab.

The methodology described in lectures and used in the lab is typical of that adopted by many designers in industry for ASICs since only a small proportion of the ASICs produced nowadays are designed using full-custom methods.

This manual includes basic information about the laboratory organisation (chapter 2), a description of the design process (chapter 3) and stage-by-stage details of the design tasks (chapter 4). Chapter 3 is intended to give a background description of the design process while chapter 4 describes how the process is applied to the specific design project being carried out in the laboratory. The specification of the STUMP processor is in Chapter 5. Chapter 6 gives information on how to program the processor.

The appendices contain information that you will need in undertaking the design tasks.

Appendix A contains a copy of the top level (algorithmic) Verilog behavioural description of the processor which is to be completed as part of the second exercise. Appendix B contains a (very) incomplete Verilog description of the Control for the STUMP at the RTL design level. Appendix C shows the RTL design of the STUMP datapath which has been done for you.

Answer Sheets which students fill in and hand in for laboratory exercises 1 and 4 are to be found following the appendices.

The emphasis of the manual is on “how to do it” and it does not attempt to give a comprehensive account of the many different facets of chip design. A fuller picture should emerge when the design work is taken together with the taught course material.

References:

For a description of the top-down design approach see chapter 3 of this manual. For a text on top-down design using Verilog see T. R. Padmanabhan and B. Bala Tripura Sundari, Design Through Verilog HDL, Wiley-IEEE Press, 2003.


Chapter 2

COMP22111 Laboratory Organisation

Schedule:

The COMP22111 lab comprises eight 2-hour sessions in weeks 3 to 11 excluding reading week (week 6). There will be 3 lectures a week in weeks 1, 2 and 12, when there is no formal lab. The schedule of lectures and labs in 2010 is shown in Table 2.1.

TABLE 2.1. Schedule for Lectures and Lab in 2010

Semester  Lab in Tootill 1       Lectures in 1.4        Lecture in 1.3
Week      Tuesday 10:00-12:00    Tuesday 10:00-12:00    Thu 14:00-15:00

wk 1      -                      Sept 28 (Lects 1, 2)   Sept 30 (Lect 3)
wk 2      -                      Oct 5 (Lects 4, 5)     Oct 7 (Lect 6)
wk 3      Oct 12 (Ex 1)          -                      Oct 14 (Lect 7)
wk 4      Oct 19 (Ex 2)          -                      Oct 21 (Lect 8)
wk 5      Oct 26 (Ex 2)          -                      Oct 28 (Lect 9)
wk 6      reading week
wk 7      Nov 9 (Ex 3)           -                      Nov 11 (Lect 10)
wk 8      Nov 16 (Ex 3)          -                      Nov 18 (Lect 11)
wk 9      Nov 23 (Ex 4)          -                      Nov 25 (Lect 12)
wk 10     Nov 30 (Ex 5)          -                      Dec 2 (Lect 13)
wk 11     Dec 7 (Ex 6)           -                      Dec 9 (Lect 14)
wk 12     Dec 14 (written        Dec 14 (Lects 15, 16)  Dec 16 (Lect 17)
          work/code hand-in
          deadline 12:00)
          Dec 15 (demo
          deadline 15:00)

The final deadline for handing in written work or code for marking is 12:00 on Tuesday December 14th in week 12. The final deadline for demonstrating work is on the afternoon of Wednesday December 15th. Students wishing to demonstrate work must put their names on a list between 14:00 and 15:00. Names will be taken randomly from the list and students given one opportunity to demonstrate their work. Note no work will be accepted or demonstrated after the deadlines unless the student concerned has a lab mark of less than 40%.

The project:

The lab work consists of designing and testing a simple 16-bit RISC processor down to the Register Transfer Level.

Preparation:

Preparation outside the timetabled laboratory classes is necessary and expected.

Students who wish to make good progress in the laboratory time when help is available should not only read the relevant material for each week before coming to the lab but should also do further work on stages of the design outside this. Remember, you are expected to spend the same amount of time on preparation as you spend in the scheduled lab time. In addition, the lab work and lectures are closely integrated, and important and useful information about lab exercises is given in lectures, so attendance at lectures is closely linked to good progress in the lab!

Deadlines:

The exercise is divided into a number of stages with deadlines as indicated in Table 2.2. The details of the deliverables for each stage are given in Chapter 4 of this manual. Due to the incremental nature of the laboratory, an extension system is not operated and you do not need to request an extension. However, to complete the project work, you should adhere to or be ahead of the deadlines given.

Marks:

This course has more labs and fewer lectures than other courses and the overall lab and exam mark is weighted accordingly. Students are expected to work individually and independently. Hence work resulting from collaborative efforts will result in the mark awarded for the work being equally split amongst the contributors. As the COMP22111 lab forms a significant contribution to the overall course mark, it is in your interests to invest the time in obtaining a good lab mark!

TABLE 2.2. COMP22111 Schedule

Design Level   Exercise  Design Stage                             No. of    Semester Week(s)          Exercise handin week   Max
                                                                  Sessions                                                   Mark

Specification  -         Read Lab Manual especially chapter 5     -
Programmer     Ex. 1     STUMP assembler                          1         wk 3, Oct 12              wk 3, Oct 12           15
Top Level      Ex. 2     Top level model in Verilog and entry     2         wks 4 & 5, Oct 19 & 26    wk 5, Oct 26           25
Top Level      Ex. 3     Simulation of top level model            2         wks 7 & 8, Nov 9 & 16     wk 8, Nov 16           20
RTL            Ex. 4     Signal usage charts                      1         wk 9, Nov 23              wk 9, Nov 23           10
RTL            Ex. 5     Verilog specification of control RTL     1         wk 10, Nov 30             wk 10, Nov 30          15
RTL            Ex. 6     Testing RTL design                       1         wk 11, Dec 7              wk 11, Dec 7           15
-              -         Deadline for written work/code hand-in   -         wk 12, Dec 14             wk 12, 12:00 Dec 16    -
-              -         Deadline to sign up for demo             -         wk 12, Dec 15             wk 12, 15:00 Dec 15    -

Chapter 3

The Design Process

3.1 Introduction

This chapter describes the sequence of steps and abstractions (levels of detail) that are used in transforming a circuit requirement into a silicon layout when a semi-custom integrated circuit is being designed. A semi-custom design is one in which libraries of pre-defined gates and logic components are provided by the circuit manufacturer; the silicon layout is then carried out using automated CAD tools.

Table 3.1 shows the large number of different representations involved in the design of a semi-custom ASIC. It will be seen that the representations are divided into three domains, behavioural, structural and physical, and into six levels. The design of very large chips may include an extra level; for example there could be a ‘subsystem’ level between the ‘top’ level and the ‘chip architecture’ level. The table indicates that the structural representation consists of a series of schematic diagrams. Although most engineers prefer to work with schematic diagrams, the structure could also be described using a hardware description language (HDL).

It is important to understand that this classification into different representations and levels of detail is not the same as circuit hierarchy. A circuit hierarchy represents a circuit decomposition into successive levels of detail while Table 3.1 shows design decomposition into different levels of abstraction. The design of a small circuit with no hierarchy would still involve several levels of design abstraction.

The chip design process consists of creating a sequence of different abstractions at successively lower levels. It starts with a chip requirement and specification and proceeds until a representation of the masks required for silicon fabrication is obtained.

The solid arrows in Table 3.1 indicate the main design steps which will be described in this chapter. This sequence of steps constitutes a design methodology. The particular methodology shown is suitable for standard cell ASICs. Verification of each stage of the design is carried out by means of simulation. The test patterns needed as simulation stimuli are summarised under Tests for each design level. Note how the same test patterns are used in successive levels to ensure the correct decomposition of one level to the next. The dashed arrows show where the test patterns and the behavioural models for each design stage come from.

The use of a pure top-down design methodology requires considerable experience if the effects of high-level decisions on performance and on lower level implementation are to be anticipated. In practice it is common to carry out low-level feasibility studies before finalising high level specifications and descriptions. The COMP22111 design exercise will, nonetheless, be undertaken in a straightforward top-down manner.


TABLE 3.1. Design representations involved in the design of a semi-custom ASIC

The Three Design Representation Domains: Behavioural, Structural, Physical

TOP LEVEL
Components: Whole chip
Behavioural: Written specification; executable behavioural description
Structural: Schematic shows core logic connected to input and output pads
Physical: Chip architecture shown as pads, core logic outline and power distribution
Tests: Input to behavioural model should reflect all possible system conditions -> whole chip test patterns

CHIP ARCHITECTURE
Components: Major functional blocks
Behavioural: Behaviour of functional blocks described in a HDL
Structural: Block diagram schematic of chip shows interconnectivity of functional blocks
Physical: Floorplan shows size and shape of rectangular blocks with routing channels
Tests: Test for each functional block + test for whole chip

REGISTER TRANSFER (RTL)
Components: Register, ALU, FSM, MUX, adder etc.
Behavioural: Behaviour of RTL components described in a HDL
Structural: Schematic diagrams of each functional block show interconnectivity of RTL components
Physical: Components represented as areas of standard cells or as blocks of special cells e.g. RAM, PLA, datapath
Tests: Test for each RTL block + each functional block + whole chip

LOGIC
Components: Logic gates
Behavioural: Behaviour of gates as simulation models provided by the silicon vendor
Structural: Each RTL component is shown as a schematic of interconnected gates
Physical: Outline for each cell + interconnection tracks
Tests: Same test as for RTL

TRANSISTOR
Components: Transistors e.g. CMOS
Behavioural: Electrical models e.g. SPICE models, of transistors used by the silicon vendor
Structural: Circuit diagrams show transistors connected to form gates
Physical: Polygons represent mask shapes used for fabrication transistors and interconnect
Tests: Tests in form of analogue waveforms

PRODUCTION
Physical: Masks or reticules of pattern for each layer of fabrication
Tests: Test patterns designed to find structural production faults

3.2 Specification

A chip design starts with a set of requirements from which a specification is drawn. The specification defines precisely what the chip does - its function - not how to do it. It is the user’s view of what the chip does.

In the real world the specification is needed to make sure that the designer and customer agree on the function of the chip, and to define the interaction, or interface, of the chip with the external system of which it forms a part. Cost and performance criteria are also a part of the requirements.

In an educational exercise there are no customer requirements to determine design constraints. The main constraint in the class context is for a design which can be completed within a limited time.

When deciding what to put in a specification and how to write it, it is useful to consider what information will be needed in the data sheet of the completed device because the two are very similar. The main contents of a specification can be summarised as:

• a summary description of what a chip does

• a list of the chip’s input and output pins

• required performance (clock rate) and power dissipation

• a list of the major modes in which the chip operates

• for each mode:

- signals which control the mode

- function executed in that mode

- performance constraints on execution such as minimum and maximum times between inputs and outputs

3.3 Top-level Behavioural Description

The top-level behavioural description is written as an executable program using a suitable programming language. It provides a means of simulating the function of the chip and is much more precise than the specification which is written in English.

The program should accept inputs and compute the appropriate outputs, then wait for the next set of inputs. The way the program works does not describe how the chip itself works - all that matters is that it captures the intended function of the device and that running it checks that what is specified is what is wanted.

This high level simulation is an extremely useful step. It clarifies the specification, brings to light potential difficulties and hidden assumptions and helps identify the major internal states of the chip. It is not surprising that the functional simulation often leads to a revised specification.


Programming languages have been developed especially for the behavioural and structural modelling of integrated circuits; they are known as Hardware Description Languages, or HDLs. Two examples are VHDL and Verilog. Verilog is a widely used standard and is used in preference to VHDL in most CAD tools. Thus Verilog will be used in this course. Other general purpose languages, such as C, Java or C++, could also be used for the top-level behavioural modelling of a chip.

When Verilog is being used to model a whole chip, a common procedure is to connect it to a test bench as shown in Figure 3.1. The test bench circuit contains the module representing the chip, which holds a model of the chip, and the “Tester module” (the test bench), which emulates the external environment of the chip. During a simulation the Tester module reads some form of input from an external file and extracts data to be applied as inputs to the chip model. It also captures the chip output data and writes it to an external file. The form of the external test file will depend on the type of chip being modelled. For example if the chip is a processor the test file could be in the form of the binary representation of a program to be executed by the processor.

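[Figure: a Verilog test bench circuit containing the Chip model and the Tester module. The Tester module reads the Test file, drives the Chip inputs, receives the Chip outputs and writes the Output file.]

FIGURE 3.1. A Verilog Modelling System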

The form of the chip model will depend on the stage reached in the design. A purely behavioural model, which describes the function of the chip but not its internal structure, is used for the Top Level design stage. At lower levels of the design the model contains internal information usually in the form of a behavioural model describing the internal data flow and operations but it can also contain a structural (gate) description. The same test bench and test program should be used at all levels of the design to ensure that each level of the design decomposition carries out exactly the same function as the top level description.
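The test bench arrangement of Figure 3.1 can be illustrated in software. The following is a minimal sketch in Python rather than Verilog, offered only to show the flow of data from test file to output file. The names `chip_model` and `run_test_bench`, the one-hex-value-per-line file format and the stand-in "chip" (which simply adds one to its input) are all assumptions made for illustration; they are not part of the STUMP design.

```python
import os
import tempfile

def chip_model(value):
    """Stand-in behavioural model (hypothetical): input + 1, truncated to 16 bits."""
    return (value + 1) & 0xFFFF

def run_test_bench(model, test_path, output_path):
    """Tester: read inputs from the test file, apply them to the model, log the outputs."""
    with open(test_path) as test_file, open(output_path, "w") as output_file:
        for line in test_file:
            line = line.strip()
            if not line:
                continue                     # skip blank lines
            chip_input = int(line, 16)       # assumed format: one hex value per line
            chip_output = model(chip_input)
            output_file.write(f"{chip_output:04x}\n")

# Usage: build a small test file, run the bench, and read back the captured outputs.
workdir = tempfile.mkdtemp()
test_path = os.path.join(workdir, "test.txt")
output_path = os.path.join(workdir, "out.txt")
with open(test_path, "w") as f:
    f.write("0000\n00ff\nffff\n")
run_test_bench(chip_model, test_path, output_path)
with open(output_path) as f:
    results = f.read().split()
print(results)
```

Because the tester only sees the model through its inputs and outputs, a more detailed model of the same chip could be passed to `run_test_bench` with the same test file, which is the property the paragraph above relies on.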

3.4 Chip architecture design

Having decided on the overall chip function, the design is partitioned into major functional blocks. For example a processor chip might be partitioned into input and output interface blocks, a datapath, RAM, control etc. The means by which data is transferred between blocks must also be decided. This will usually be in the form of a bus structure.

The structure of the architecture level of design can be captured as a schematic diagram or described using a HDL. The design is then simulated using behavioural models of each of the functional blocks written using a HDL.

In a small design this level of design may well be omitted, or a simple block diagram might be produced as an intermediate stage but no simulation carried out.

3.5 Register-Transfer Level (RTL) design

3.5.1 What is Register-Transfer?

A register-transfer system is specified as a set of memory elements (e.g. registers) and combinational logic functions between the memory elements as shown in Figure 3.2. The basic memory elements used in student designs are usually D-type edge triggered flip-flops. All operations in an RTL system take place between clocked registers.

[Figure: a chain of clocked registers (D input, Q output, clk) with a block of combinational logic between successive registers.]

FIGURE 3.2. A register-transfer system

On each active clock-edge data is clocked from the D inputs of the flip-flops (FFs) to the Q outputs which form the inputs to the following block(s) of logic; see Figure 3.2. After a short delay the outputs of each combinational logic (CL) block change as a result of the change to the block inputs.

The elements in the RTL design are usually represented as boxes, or blocks, in a block diagram which shows the interconnections between the blocks. The internal logic structure of the combinational logic is not defined at this stage but the function, or behaviour, is described as a model which can be used in a simulation of the RTL design.


Thus it can be seen that a register-transfer design gives a complete specification of what the chip will do on every clock cycle.
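The clock-by-clock behaviour of a register-transfer system can be sketched in a few lines of Python. This is an illustrative model only, not Verilog and not part of the STUMP design: the register names `count` and `delayed`, and the functions `clock_edge` and `example_logic`, are hypothetical. The point it shows is that all D inputs are computed from the current Q outputs, and then every register updates together on the clock edge.

```python
def clock_edge(registers, combinational_logic):
    """One active clock edge: every register captures its D input simultaneously."""
    d_inputs = combinational_logic(registers)   # D values settle from the current Q outputs
    return dict(d_inputs)                       # all Q outputs update together

def example_logic(q):
    """Hypothetical combinational logic: a 4-bit counter feeding a one-stage delay register."""
    return {
        "count": (q["count"] + 1) & 0xF,   # D input of the counter register
        "delayed": q["count"],             # D input of a register that copies the old count
    }

state = {"count": 0, "delayed": 0}
trace = []
for _ in range(4):
    state = clock_edge(state, example_logic)
    trace.append((state["count"], state["delayed"]))
print(trace)
```

Note that `delayed` always receives the value `count` held before the edge, never the newly incremented one; this is the register-transfer behaviour described above.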

Students may already have come across most typical combinational RTL elements in earlier courses: adders, multiplexers, comparators, ALUs etc. In addition to these there will be ‘designer’ elements i.e. blocks of random logic designed to carry out arbitrary combinations of functions not included in standard libraries; the combinational logic block of a FSM (Finite State Machine) will be of this type.

Sequential elements consist of either straightforward storage registers - a set of D-type flip-flops for example, or more complex assemblages such as counters or state machines. Counters and FSMs contain combinational logic in addition to memory elements. Thus the separate combinational logic and register blocks of an RTL block diagram will not always be obvious because some of the RT structure is hidden within these more complex blocks. However each block in the diagram should only contain one register.

3.5.2 Starting an RTL design - ASM diagrams

At Register Transfer Level the operation of the circuit is described as operations between clocked registers, where each clocking of registers corresponds to moving from one state to another. In the COMP22111 exercise, the RTL design of the datapath has been done for you and is shown in Appendix C.

If you need to do an RTL datapath design then one starting point is to summarise the design in a diagram which shows both the major states of the circuit and the operations. The ASM (Algorithmic State Machine) diagram is such a diagram. It is a form of flow chart in which states are represented by rectangular boxes and decisions by angled boxes as shown in Figure 3.3. Note the “two-way” and “multi-way” decision boxes. The operations to be carried out within a state are written inside the state box.

3.5.3 How to carry out a RTL design:

Most first-time designers find it difficult to know where to start on decomposing a behavioural description of a chip into a RTL design and, having made a start, go through many cycles of trial and error before arriving at a satisfactory design. This is because, even in a small design, there are many different ways in which events can be scheduled and functions allocated to different blocks.

The following three-stage (and many-step) procedure can be used by first-time designers. Stages 1 and 2 are carried out on paper; the design is then transferred to the CAD system in Stage 3.


The circuit specification:

The circuit is a 4-bit triangle waveform generator. The output ramps from 0 to 15 then from 15 to 0 and so on. The waveform period depends on the clock period - it is 32x clock period. The circuit has a RESET input and a 4-bit output.

The ASM diagram:

[Figure: ASM diagram of the waveform generator. The reset state box contains acc=0, inc=+1. A multi-way decision box selects one of three state boxes:

15: inc=-1, acc=acc+inc, wave=acc

1 -> 14: acc=acc+inc, wave=acc

0: inc=+1, acc=acc+inc, wave=acc

A two-way decision box tests reset (Yes / No).]

acc and wave are 4-bit variables

inc and reset are 1-bit variables

wave forms the output

FIGURE 3.3. An ASM Example
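One possible reading of the ASM example can be stepped through in software. The sketch below is in Python, not Verilog, and makes two assumptions that the figure text alone does not settle: the multi-way decision is taken on the current value of `acc`, and the operations inside a state box are applied in the order written. The exact number of clocks per waveform period depends on how the figure times the register updates, so the sketch makes no claim about it.

```python
def triangle_wave(cycles):
    """Return the `wave` output for the given number of clock cycles after reset."""
    acc, inc = 0, +1              # reset box: acc=0, inc=+1
    samples = []
    for _ in range(cycles):
        # Multi-way decision (assumed here to test the current acc value).
        if acc == 15:
            inc = -1              # top of the ramp: start counting down
        elif acc == 0:
            inc = +1              # bottom of the ramp: start counting up
        # Every state box performs the same data operations.
        acc = (acc + inc) & 0xF   # acc is a 4-bit variable
        wave = acc
        samples.append(wave)
    return samples

print(triangle_wave(20))
```

The single `inc` variable plays the role of the 1-bit direction flag in the diagram; only the two end states change it, which is why the middle branch (1 -> 14) contains no assignment to `inc`.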

Experienced designers will spend time and effort optimising their designs for silicon area, performance etc. but a first-time designer will be happy with a completed design which works!

It will probably be helpful to think of the design as made up of three parts:

• memory storage - registers, RAM etc.

• datapath functions - e.g. logical and arithmetic functions

• control - a block which includes an FSM to control the state sequence of the circuit.


STAGE 1 - Preliminary design:

1. Draw an ASM diagram of the design.

2. Draw an outline block diagram including memory storage registers and datapath functions but omitting the control block, as follows:

- from the ASM diagram identify all the registers and memory needed

- select combinational functions to carry out the data operations

- draw in the connections (wires) needed to transfer data between blocks and add multiplexers where necessary

- check the block diagram against the ASM diagram.

3. List all the control signals which will be needed to control the operations of the blocks and orchestrate the clocking of registers. Also identify the signals needed as inputs to the control block.

4. Define the functions of the control block and extract a state transition graph for the FSM from the ASM diagram.

5. Complete the block diagram by adding the control block and the control signals and write out a detailed specification of the control functions.

STAGE 2 - Refining the design:

The design should now be checked, critically examined and revised:

1. Work through the design, comparing it with your top level model and ASM diagram to check for the correct sequence, the correct production of control signals and correct data operations.

2. Modify the design if necessary.

3. Examine the design to see if there are any obvious simplifications that can be made.

It will often be found that step (2) will have led to additions and modifications which are rather clumsy. A re-examination may show that a design revision will give a simpler solution.

4. Repeat steps (2) to (4) until satisfied with the design.

STAGE 3 - Verifying the design:

Before the design can be verified by simulation it must be entered into the CAD system.

1. The structure can be entered as a schematic block diagram. In this case, great care should be taken to avoid errors and inconsistencies in the labelling of pins and bus signals. Careless labelling can make nonsense of simulations. Alternatively, the entire design can be entered as a HDL description.

2. If the design is entered as a block diagram then the functional descriptions of the blocks must be entered using a HDL i.e. Verilog in the present exercise. Models will already exist for library blocks.

3. The behavioural/functional models of each of the RTL blocks are tested for correct functionality by simulation. A set of test patterns will be needed (see below).


4. When the functional models of all the blocks have been verified the whole design is simulated using the same chip test that was used for the top level behavioural simulation.

5. Corrections are made if needed and the design is re-simulated until correct outputs are obtained.

3.5.4 Simulating the RTL design:

Whole chip simulation:

The same test program and test bench that were used for the Top Level behavioural simulation are used for the RTL simulation, but with one important difference - a clock signal will now be used by the chip model. There was no need for a clock signal at the Top Level because the Register Transfer structure had not then been defined. Note that the chip model now consists of a structural description, derived from the schematic block diagram, and behavioural models of each of the individual blocks in the schematic diagram.

The output file which is produced by the RTL simulation should be compared with, and agree exactly with, the output file which was produced by the Top Level simulation.
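The comparison of the two output files can be automated. The following Python sketch is one simple way to do it, assuming each simulation writes one output value per line; the function name `first_mismatch` and the sample data are hypothetical and the lists stand in for the contents of the two files.

```python
def first_mismatch(top_level_lines, rtl_lines):
    """Return the 1-based line number of the first difference, or None if the outputs agree exactly."""
    for number, (expected, actual) in enumerate(zip(top_level_lines, rtl_lines), start=1):
        if expected != actual:
            return number
    if len(top_level_lines) != len(rtl_lines):
        # One simulation produced more output than the other.
        return min(len(top_level_lines), len(rtl_lines)) + 1
    return None

# Usage with made-up data standing in for the two output files.
top_level = ["0001", "0100", "0000"]
rtl_good = ["0001", "0100", "0000"]
rtl_bad = ["0001", "0101", "0000"]
print(first_mismatch(top_level, rtl_good))
print(first_mismatch(top_level, rtl_bad))
```

Reporting the first differing line is usually more useful than a plain pass/fail, because it points at the earliest instruction or cycle where the RTL design departs from the Top Level model.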

Testing individual blocks within the RTL design:

In a large design, it is important to test the behavioural models that have been written for each of the functional blocks before testing the whole design. A test bench is now needed to supply the test input stimuli which emulate the signals that the block will “see” when embedded in the whole circuit. The test input stimuli consist of a set of test vectors representing the input signals to the block under test for each clock cycle. The test vectors also include the expected output signals for each clock cycle.
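As an illustration of per-cycle test vectors, the Python sketch below applies a small vector set to a block model and reports any cycle whose output differs from the expected value. The block (`CounterBlock`, a 4-bit counter with synchronous reset and enable), the vector layout and the function `run_vectors` are assumptions chosen for the example; they do not correspond to any particular STUMP block.

```python
class CounterBlock:
    """Hypothetical block under test: 4-bit counter with synchronous reset and enable."""
    def __init__(self):
        self.q = 0

    def clock(self, reset, enable):
        """Apply one clock edge and return the new register value."""
        if reset:
            self.q = 0
        elif enable:
            self.q = (self.q + 1) & 0xF
        return self.q

# One test vector per clock cycle: (reset, enable, expected q after the clock edge).
TEST_VECTORS = [
    (1, 0, 0),
    (0, 1, 1),
    (0, 1, 2),
    (0, 0, 2),
    (1, 1, 0),
]

def run_vectors(block, vectors):
    """Apply each vector in turn; return (cycle, expected, actual) for every mismatch."""
    failures = []
    for cycle, (reset, enable, expected) in enumerate(vectors):
        actual = block.clock(reset, enable)
        if actual != expected:
            failures.append((cycle, expected, actual))
    return failures

failures = run_vectors(CounterBlock(), TEST_VECTORS)
print(failures)
```

An empty list means every cycle matched. Keeping the expected outputs in the same table as the inputs makes each vector self-checking, as the paragraph above describes.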

In a simple design, the individual testing of blocks can usually be omitted. Thus in the COMP22111 design exercise, it should only be necessary to test the complete design.

3.5.5 Design for testability

It is important during the design of a chip to consider the ease with which it can be tested after fabrication to find manufacturing faults. A common method of ensuring testability is the use of scan paths. A scan path is made by using registers and flip-flops which can be re-configured in test-mode to act in a ‘serial-in serial-out’ mode. They can then be connected in long chains into which a test pattern can be shifted from a test pin or pins (refer to course notes for details). Scan path testability can be incorporated at the RTL stage by selecting library registers, counters etc. which are configurable as scan registers and by using FFs with multiplexed inputs.
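The ‘serial-in serial-out’ behaviour of a scan path can be sketched as follows. This is a simplified Python model, not Verilog; the class `ScanRegister` and its `test_mode`, `data_in` and `scan_in` parameters are hypothetical names, and a real scan chain would link many such registers through the chip.

```python
class ScanRegister:
    """Hypothetical scan register: parallel load in normal mode, serial shift in test mode."""
    def __init__(self, width):
        self.bits = [0] * width

    def clock(self, test_mode, data_in=None, scan_in=0):
        """One clock edge. Returns the scan-out bit (last flip-flop's value before the edge)."""
        scan_out = self.bits[-1]
        if test_mode:
            # Multiplexers select the neighbouring flip-flop's output.
            self.bits = [scan_in] + self.bits[:-1]
        else:
            # Multiplexers select the functional D inputs.
            self.bits = list(data_in)
        return scan_out

reg = ScanRegister(4)
for bit in [1, 0, 1, 1]:                 # shift a test pattern in from the test pin
    reg.clock(test_mode=True, scan_in=bit)
loaded = list(reg.bits)

shifted_out = [reg.clock(test_mode=True) for _ in range(4)]   # shift the contents back out
print(loaded, shifted_out)
```

The `test_mode` argument plays the part of the multiplexed flip-flop inputs mentioned above: the same storage elements serve the normal design when it is false and form a shift register when it is true.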

The inclusion of a scan path adds more complexity to the design and will be omitted in the STUMP design. Thus, the elements used in the COMP22111 project are not configured for scan path operation. However students who do final year chip design projects are expected to include scan paths.

3.6 Logic Level Design

In the first year students learn how to design logic circuits using basic gates and flip-flops. For large designs, of tens or hundreds of thousands of gates, this approach is too slow. Nowadays engineers use a number of different CAD tools to create gate level circuits automatically from RTL designs so that whole ASICs can be designed without any “by hand” gate level design.

The methods which are commonly used for the design of standard cell ASICs are summarised below.

1. Use of library components: Many widely used RTL elements, e.g. registers, multiplexers, adders, can be pre-designed and stored as library components.

2. Logic synthesis: Automatic synthesis tools can be used to create gate-level logic designs from internal behavioural descriptions. Tools for synthesising combinational logic and FSMs are well established and widely available. Tools for synthesising whole RTL designs are also available and are now highly sophisticated so as to be able to optimise performance, power or area. However, this sophistication requires user interaction and usually design iteration.

3. Logic block compilers: Compilers are used to generate blocks which have some form of regular geometry. Most ASIC vendors supply compilers for ROM, RAM, PLAs and datapaths.

In COMP22111, library components have been used to define the datapath and a logic synthesis tool is used for the control block.

It is sensible to arrange that the behavioural descriptions of RTL logic elements which have already been written for the RTL design can also be used as inputs to the synthesis tools. In the processor design the Verilog program describing the control block is used as the input for the synthesis software.

The logic-level design is seen to be an almost automatic decomposition from the RTL design. After decomposition to logic, the whole design is then re-simulated using the same chip test that was used for the Top Level and the RTL simulations. There may be a few problems at this stage because:

1. Synthesis tools do not always do the ‘sensible’ thing and may misinterpret a description which was adequate for RTL simulation but not sufficiently specified for the unambiguous decomposition to gate level.

2. Simulations at higher levels do not take any account of gate delays. The gate level simulation models do include information about the delay characteristics of the gates and the simulation results show gate delays; the logic simulator can also make worst case predictions of the effect of the wiring between gates (but the actual wiring delays cannot be calculated/known until after layout). The simulator results may show that some delays are unacceptable or that the active clock edges occur too close to data transitions.

Tests of individual blocks may be needed in addition to the whole chip simulation in order to sort out problems. RTL block test patterns are needed for these logic level simulations.

3.7 Field Programmable Gate Arrays (FPGAs)

Although the design is normally targeted at semi-custom chip design, it can also be aimed at an FPGA at this point in the design. This is because the design process down to the end of the logic design stage is independent of the medium it is implemented on.

An FPGA consists of preformed silicon comprising functional logic blocks and interconnections which are programmable. Here, the logic needs to be mapped onto the logic blocks of the particular FPGA and these are then placed and routed. This can all be done automatically by CAD tools. The design can then be downloaded onto the FPGA, again using appropriate available tools. To check the operation of the downloaded design, a test program is run. This should be the same as that used in the top level behavioural simulation.

Unlike semi-custom design, any design errors are not fatal. They only require that the design process is repeated from the highest level that was amended, followed by downloading and testing of the updated design.

3.8 Layout

For a semi-custom design, there are still a number of design stages following the logic design which need to be performed. These are described in the remaining sections of this chapter.

Layout is the process of placing geometrical representations of gates on the surface representing the chip and interconnecting them with tracks. When a semi-custom chip is being designed, layout is carried out using automatic ‘Place and Route’ CAD tools.

Each gate is represented as a rectangular shape of a standard height on a chip which uses the standard cell architecture. The internal representation of each gate as a set of polygons is added at a later stage by the manufacturer before making the masks for fabrication. The cells are abutted together in rows with channels between the rows for routing the interconnections (Figure 3.4).

FIGURE 3.4. Some rows of a standard cell layout

The layout of a small chip will consist of a single rectangle containing a number of rows and channels of the same length but a floorplan is needed for a large chip. A floorplan subdivides the total surface of the chip into separate areas for the placement of the different functional blocks and for the routing of signals between the blocks.

Although CAD tools can be used to assist the creation of a floorplan it is a difficult process to automate. The objective will usually be to obtain a layout with as small an area as possible that maintains the signal integrity.

The Place and Route procedure for a standard cell chip consists of carrying out a sequence of separate steps. First, I/O pads must be added to the top-level schematic. The circuit description will usually be held in the CAD database in a hierarchical format but the layout tools need a flattened description containing every instance to be used in the layout. The next step is therefore to ‘flatten the netlist’. Further steps define floorplan areas, assign cells to rows and carry out local channel routing and global routing.

3.9 Postlayout

The layout stage is not the end of the story for the ASIC designer. Having obtained a layout, a ‘Design Rule Check’ (DRC) is carried out to ensure that none of the fabrication process design rules are broken. If the software is well designed and bug-free there should be no errors at this stage - regrettably it is sometimes necessary to make a few edits to the layout by hand. When the DRC passes, then a ‘Layout versus Schematic’ (LVS) check is performed. This checks that every feature extracted from the layout appears on the schematics generated from the logic. Any mismatches need to be investigated and fixed until the components in the logic correspond exactly with the layout features.


When this check passes, the next step is to use a program to calculate (extract) the parasitic capacitances of all the interconnection tracks. The whole chip is then re-simulated and the effects of the extra track capacitances are included in the delay calculations in order to get a fairly accurate estimation of performance.

Further testing is normally done to ensure that the design functions correctly despite the maximum allowable variations in transistor characteristics and in environmental parameters (temperature, voltage etc.). This is referred to as testing in the ‘corners’.

When the designer is satisfied that the design functions correctly under all conditions and meets its performance/power/area specification under typical conditions, a final ‘Design Rule Check’ (DRC) and ‘Layout versus Schematic’ (LVS) check are undertaken. Once these have been done and have passed, the chip design files can be shipped to the manufacturer for the fabrication of the chip.

3.10 Packaging and test

Although the main design task is now complete the designer has more work to do:-

1. A ‘bonding diagram’ showing how the chip is to be packaged must be sent to the manufacturer. The bonding diagram shows the connections between the bond pads on the chip and the ‘lead frame’ pins of the package.

2. A set of test vectors for the testing of the chip after fabrication must be provided. This should be the same test program as used in the top level behavioural simulation.


Chapter 4

Design Tasks

Ex. 1 - Operation of STUMP Assembler

Aim: to familiarise you with the STUMP assembler and to gain an understanding of the programmer’s top-level view of the STUMP operation.

Hand in: - the completed charts for Ex. 1 to be found at the back of the manual.

Read: Chapters 5 and 6.

Sessions: 1

Assessment: 15 marks

Learning Outcomes: understanding of operation of assembler code and practice in relating it to machine behaviour, practice at handling binary, hex and decimal quantities.

INSTRUCTIONS

You will be given a sheet with a few consecutive lines of assembler code together with the initial state of the Register Bank and Condition Code register prior to executing the code. For each instruction fill in the sheets to specify the register state after executing the instruction. Hand in your sheets for marking on completion.

To help you, note that

1. The Program Counter is incremented directly after fetching an instruction and before the instruction is executed.

2. The Result Register holds the result computed by the ALU.
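Note 1 can be pictured with a minimal Python sketch of the fetch step. This is only an illustration, not part of the STUMP tools: the function name fetch, the made-up memory contents and the 16-bit wrap-around of the Program Counter are all assumptions.

```python
def fetch(memory, pc):
    """Fetch the instruction at address pc and return it with the new PC.

    The PC is incremented here, before the instruction is executed, so
    any instruction that reads the PC sees the address of the NEXT line.
    """
    instruction = memory[pc]
    pc = (pc + 1) & 0xFFFF  # assumption: the PC wraps at 16 bits
    return instruction, pc


# Hypothetical memory contents, chosen only for illustration.
memory = [0x1234, 0x5678]
instruction, pc = fetch(memory, 0)
```

After fetching from line 0, the instruction that is about to execute sees a Program Counter value of 1.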


Ex. 2 - Top Level Model of STUMP in Verilog

Aim: to complete the Verilog model of the algorithmic view of the processor chip and to enter it into the Cadence CAD system.

Hand in: - a listing of the Verilog code for the top level model

Read: Chapter 3, section 3.3 and chapter 5

References:

T. R. Padmanabhan and B. Bala Tripura Sundari, Design Through Verilog HDL, Wiley-IEEE Press, 2003.

M. D. Ciletti, Advanced Digital Design with Verilog HDL, Pearson 2002.

J. Bhasker, A Verilog HDL Primer, 2nd ed., Star Galaxy Press, 1999.

Sessions: 2

Assessment: 25 marks in total will be allocated to the Verilog code and syntactic errors will to a large extent be ignored.

Learning Outcomes: familiarity with Verilog description of a specification, experience of writing Verilog.

INSTRUCTIONS:

You may not have met the hardware description language Verilog before. However, you should find that the examples and templates provided should be a sufficient guide as to what is needed to complete the exercises. Demonstrators will be able to give help and advice.

The amount of Verilog code to be written is not very long - about one page without comment lines. In order to write this code and to attempt the other exercises well, you need to have a thorough understanding of the processor specification (Chapter 5). You should remember that the high level behavioural model of a chip is in effect an executable specification. It is the user’s view of the chip. It is very important to get it right because it is what will finally be made - all the lower levels of design are tested against this specification.

Your first task is to complete the Verilog behavioural model of the STUMP processor chip. Most of the model is provided - a listing of the code is given in Appendix A of this manual. It includes all the functions and tasks that you will need, and the main program includes reset, instruction fetch and instruction decoding. The part left for you to do is the execution part of the instruction i.e. the reading of the Register Bank, the setting up of operands to the ALU, the execution of the instructions in the ALU and the setting of condition codes in the Condition Code Register. The writeback phase following execution has been done for you. You can complete the program using CASE and IF constructs together with variable assignments and task and function calls. You will find examples of all the syntax needed in other parts of the program.


Those parts of the Cadence CAD systems needed to complete the exercise are described below:

CADENCE:

Accessing and Modifying the Top Level Model

The Cadence CAD system is used in this laboratory for the design work. Create the COMP22111 Cadence directory structure by typing mk_cadence 22111. This should only be done once. Thereafter start a Cadence session by typing start_cadence 22111.

Eventually, an icds window opens. Choose File->Open. This brings up an Open File window. In the Open File window set the Library Name to comp22111, Cell Name to processor, View Name to algorithmic and then click OK. This brings up a verilog.v window containing the incomplete Verilog code describing the top level of the STUMP. Type in this window to add your code and then save it using File->Save from the window’s toolbar.

PRINTING:

Print out your code from the verilog.v window toolbar using File->Print. This brings up a Printer window. Enter lpr -Pugpr3 and click on Print. Dismiss the verilog.v file using File->Exit. If the HDL Parser Errors/Warnings window comes up, click No.

EXIT:

To exit from Cadence, click on File in the icds window and then select Exit. In the Exit icds? window this brings up, click yes.


Ex. 3 - Behavioural Simulation of the STUMP

Aim: To use the test programs provided to simulate and debug your top level behavioural model of the STUMP.

Demonstrate: that the test programs work correctly on your Verilog model.

Read: Chapter 6

Sessions: 2

Assessment: This exercise is assessed in the lab and is worth 20 marks, with 4 marks awarded for passing the Register Bank tests, 6 marks for the ALU tests, 3 marks for the shifter tests and 7 marks for the branch tests.

Learning Outcomes: how a test strategy evolves for complex hardware, how this translates to test programs, experience of using CAD tools to control and simulate a design using a test bench and experience in debugging hardware specifications.

INSTRUCTIONS

Parsing (syntactic analysis)

It is first necessary to parse a design that satisfies the Verilog syntax. Start Cadence (start_cadence 22111). This brings up the icds window. Choose File->Open. This brings up an Open File window. In the Open File window set the Library Name to comp22111, Cell Name to processor, View Name to algorithmic and then click OK. This brings up a verilog.v window containing the Verilog code. You need to perform at least one edit on it and then save it. Now, in the verilog.v window, select File->Exit.

The edit operation on the file causes all the Verilog code to be checked for syntax errors on Exit. If you have errors, an HDL Parser Error/Warnings window comes up telling you that parsing of the Verilog file failed. A failed design check indicates syntax errors. By clicking Yes in the HDL Parser Error/Warnings window you can inspect the error report to gain some indication of where the error is, and the verilog.v window will reopen. Correct any errors in the top level description, save it and then exit to recheck for syntax errors. Repeat this until the code correctly passes the checks.

Test Files

In the COMP22111 directory, you will find 4 test files (test1.s to test4.s) written in the STUMP assembly code. These four tests provide a fairly good test of most of the STUMP and are used to test the STUMP at all stages of the design from top level to layout. The tests would also be used to test the fabricated design. The tests are incremental i.e. test 2 assumes that test 1 works, test 3 relies on tests 1 and 2 etc. The tests start at line 0 and all write results back to memory, starting at line 0, and thus overwrite the program! Test 1 is a basic test which checks that the internal buses are connected correctly, that the Register Bank can be correctly addressed, that instructions can be fetched, and that data can be written back to memory (for checking). If test 1 does not work, something fundamental is wrong and this should be fixed before running any other tests.


Test 2 checks that the ALU operates correctly for various data combinations and different logical and arithmetic operations. It only checks the ALU and does not use the shifter. It aims to identify any signals in the fabricated ALU which are unable to change state (because they are stuck at ‘1’ or ‘0’) and pinpoint any adjacent signals (bits i and i+1) which are shorted together.
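The idea behind such a test can be sketched in Python. This is not the content of test2.s: the alternating patterns 0x5555 and 0xAAAA and the helper names are assumptions chosen for illustration. A set of data words can expose stuck-at faults only if every bit is seen at both ‘0’ and ‘1’, and can expose a short between bits i and i+1 only if some word drives those two bits to different values.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1

# Assumed illustrative patterns: alternating bits and their complement.
PATTERNS = [0x5555, 0xAAAA]


def stuck_bits(observed_words):
    """Return (bits never seen at 1, bits never seen at 0)."""
    seen_one = 0
    seen_zero = 0
    for word in observed_words:
        seen_one |= word & MASK
        seen_zero |= ~word & MASK
    stuck_at_0 = [i for i in range(WIDTH) if not (seen_one >> i) & 1]
    stuck_at_1 = [i for i in range(WIDTH) if not (seen_zero >> i) & 1]
    return stuck_at_0, stuck_at_1


def adjacent_pairs_exercised(words):
    """True if, for every i, some word gives bits i and i+1 different values."""
    return all(
        any(((w >> i) ^ (w >> (i + 1))) & 1 for w in words)
        for i in range(WIDTH - 1)
    )


# Simulate a fault: bit 3 of the observed result is stuck at '0'.
faulty = [w & ~(1 << 3) for w in PATTERNS]
```

Here stuck_bits(faulty) reports bit 3 as never reaching ‘1’. The pair 0x0000 and 0xFFFF would also toggle every bit, but could not reveal a short between neighbouring bits.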

Test 3 is a rudimentary program which checks the different shifting operations, and test 4 checks the branch operations. As these are the test programs used throughout this laboratory, you are advised to peruse them carefully. Furthermore, you will be using them to debug your design so familiarity will certainly be necessary if the test programs indicate any errors in your design.

The assembler is fully described in chapter 6 and instructions to convert the assembler programs into a format suitable for the processor memory are given in section 6.2.1. They are repeated here for convenience: in a shell window, change the directory to COMP22111 using cd $HOME/Cadence/COMP22111. Then type sasm <filename.s> to create 3 files. Binary is in <filename.bin> while hex versions are in <filename.hex> and <filename.mem>. The file for the processor memory is called xc4000mem.ram and is created by typing loadmem.sh <filename.mem> in the terminal window; this creates the file in the $HOME/Cadence/COMP22111/test_bench directory.

Waveform Viewer

In the icds window, select Tools->Verilog Integration->NC-Verilog. This brings up a Virtuoso Verilog Environment etc. window. Fill in its fields with test_bench for the Run Directory, and in the Top Level Design section enter comp22111 for Library, processor_test_bench for cell, and schematic for View. Click on the top left icon (of the running man) to initialise the simulator. When the icds window shows that initialisation is complete, click on the second icon down on the left (of three separate ticks) to generate a netlist. When the icds window indicates this is complete, click on the Simulate icon which is the third icon down in the Virtuoso Verilog Environment window. This launches SimVision and (eventually) brings up two windows: a Design Browser 1-SimVision window containing the processor and a Console-SimVision window which is a command window.

You will probably want to view signals on the Waveform Viewer. To do so, select the fifth icon from the right (showing waveforms) in the Design Browser 1-SimVision window to bring up a Waveform 1-SimVision window with the signals to be displayed listed down the left. To monitor signals/buses within the test bench, expand test (press its + button) to reveal top in the Design Browser 1-SimVision window and then click on the top symbol to list the signals to and from the processor. Select the signals you want to monitor (the address, data and clock lines are particularly recommended) and send them to the Waveform Viewer (fifth icon from right). Continue this process of signal expansion and selection until all the signals you require have been sent to the Waveform Viewer. A good starting point for monitored signals would be the input/output signals to the processor.


Command Files

The simulation can be run by using the menus to issue commands to the simulator with the commands given appearing in the Console-SimVision window. However, this is tedious and prone to errors, so normally these commands are placed in a file and the simulator instructed to take this file as its input. Command files have a .sv extension and the command files for the first three tests, test1.sv to test3.sv, are in your COMP22111 directory. test1.sv is shown below:

    force test.RESET = 1
    run 10 ns
    deposit test.RESET = { 1'b0} -after 2 us -absolute -release
    stop -create -condition { #test.top.A = 45 & #test.top.OE = 0 }
    stop -create -time 10 ms -absolute
    run
    stop -delete *
    deposit test.DUMP = { 1'b1} -after 1 ns -relative
    stop -create -time 10 ns -relative
    run
    stop -delete *

The first three lines define the RESET signal, setting it to ‘1’ at 10nsecs and removing it by taking it to ‘0’ at 2 µsecs. The next two lines define breakpoints where the simulation is to be stopped. The first is when the address lines are 45 with the Memory Output Enable signal at ‘0’. Remembering that the test program starts at line 0 and that instructions are placed in consecutive lines, this statement recognises the request to memory to fetch the program’s final instruction (bal Fin). If your processor description is correct, the simulation always stops at this point. The second stop command is only required if your processor is stuck in an infinite loop, in which case the simulation terminates at 10msecs. The simulation is then instructed to run up to the time a breakpoint is encountered. There is a stop -delete * after each run statement; this line is necessary to remove the breakpoint, enabling the simulator to run past this time. The DUMP signal is activated 1ns after the current time, when the current values in the memory will be dumped to the file $HOME/Cadence/COMP22111/test_bench/xc4000mem.dump. The final three statements in the command file continue the simulation for 10nsecs beyond the last breakpoint before stopping.

Using the template for the three provided command files, you need to create a command file test4.sv for the branch test.

Running Simulation from Command Files

To run the simulation from a command file, reset the simulation using Simulation->Reset to Start in the Design Browser 1-SimVision window. Run the simulation from the command file by selecting File->Source Command Script from the Design Browser 1-SimVision window to bring up a Source Command Script window. Here select the browse button (...) which opens a Source Script File window in which <name of file.sv> in the COMP22111 directory is selected followed by Open; this closes the Source Script File window. This filename now appears in the Source Command Script window. Select simulator Console (NC-Sim) for the Send commands to: field and press OK to cause the simulator to run (as can be observed by the change in the icon displayed by the Play button in the Design Browser 1-SimVision). You can now inspect the file dumped to memory and you should compare the created outputs with those expected from the test program. If the contents are not correct, you need to pinpoint the cause of the fault; in this case, you will find the Waveform Viewer a valuable tool in debugging your design. Correcting any faults will involve modifying the Verilog code for the processor. After modifying the Verilog code, you need to repeat the syntax checks as previously described.

When your top level design description simulates correctly for all the supplied test programs, you should show the results of running the assembler code for each test to a demonstrator who will mark them off. You should complete and hand-in the top level code and demonstrate the simulation of the processor operation before moving on to the next exercises.

Stopping The Simulation

If for any reason the simulation fails to complete within a few seconds and the time is galloping on uncontrollably, you need to stop the simulation! In the Design Browser 1-SimVision window the button next to the play button (having two parallel vertical lines) is a stop button and will halt the simulation. Alternatively, use Simulation->Stop.

Exit from Simulation

To exit from the Waveform Viewer, select File->Exit SimVision in the Waveform 1-SimVision window. To exit from the simulator, in the Design Browser 1-SimVision window select File->Exit SimVision. This removes both the Design Browser 1-SimVision window and the Console-SimVision window. To exit from the simulator environment, in the Virtuoso Schematic Composer etc. window select Commands->Close. Note that the created netlist does not need to be recreated if further simulations are performed unless there are changes to the top level Verilog code. Finally, exit from Cadence by clicking on File in the icds window and then selecting Exit. In the Exit icds? window this brings up, click yes.

Ex. 4 - Signal Usage Charts for the Control

Aim: to produce Signal Usage Charts to aid with the composing of the Verilog description of the STUMP control at the RTL level.

Hand in: - the Signal Usage Charts for Ex. 4 to be found at the back of the manual.

Read: Chapters 5, 6 and Appendix C.

Sessions: 1

Assessment: Worth 10 marks. Marks will be awarded based on the correct operation of the STUMP with 2 marks for Fetch signals, 4 for Execute and 4 for Writeback.

Learning Outcomes: understanding how to specify control in a formal way.

INSTRUCTIONS:

The processor design can be thought of as comprising a 16-bit datapath (which does the actual computation) plus a control block. The RTL level design of the datapath has been done for you and is shown in Appendix C.

The control logic is all about ensuring that the right things happen at the correct time by activating control signals at the appropriate time in the instruction cycle. Each instruction takes three clock cycles to complete; instruction fetch is performed on the first cycle, execute which reads the operands and performs the arithmetic is done in the second cycle, and a writeback phase which operates on the contents of the Result Register occupies the third cycle. In general, a control signal is required by each multiplexer (to select the desired input at the appropriate time) and by each register (to enable data to be loaded into a particular register) in the datapath. In addition, there are control signals to access the desired read and write locations in the Register Bank and signals to control the access of information in the memory.

At the RTL level, the STUMP processor requires 12 different lots of signals to control the datapath and the memory. These are:

BR indicates a branch instruction. When BR is high the Sign Extender element extends the least significant 8 bits of the instruction; when BR is low it extends the least significant 5 bits.

FETCH indicates an instruction fetch is occurring

EXE indicates the Execute phase of operation i.e. data is read from the Register Bank, operated upon and the result placed in the Result Register.

SRPA[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port A of the Register Bank

SRPB[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port B of the Register Bank

SWP[2:0] 3-bit address specifying the register in the Register Bank (Reg0 to Reg7) to be written to

IMMED selects the extended immediate operand rather than port B of the Register Bank as the B-input to the ALU

LDCC enables the loading of the 4-bit Condition Code Register

LDREG enables writing to a register in the Register Bank

SALUD (Select ALU Data) selects data from the Result Register for writing into the Register Bank. If SALUD is low, data from the memory (MDIN[15:0]) is selected instead.

MEMWR active-low signal to Bus Interface when Memory is to be written to

MEMOE active-low signal to Bus Interface when Memory is to be read from
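The effect of three of these signals (BR, IMMED and SALUD) on the datapath can be sketched in Python. This is a behavioural illustration only, not the RTL in Appendix C: the function names are assumptions, and the sketch assumes that the Sign Extender replicates the top bit of the selected field across the remaining bits of a 16-bit word.

```python
def sign_extend(instr, br):
    """Extend the low 8 bits (BR high) or low 5 bits (BR low) to 16 bits."""
    width = 8 if br else 5
    field = instr & ((1 << width) - 1)
    if field & (1 << (width - 1)):      # top bit of the field is set
        field -= 1 << width             # treat the field as negative
    return field & 0xFFFF


def alu_b_input(immed, extended_immediate, port_b):
    """IMMED high selects the extended immediate, otherwise port B."""
    return extended_immediate if immed else port_b


def writeback_data(salud, result_register, mdin):
    """SALUD high selects the Result Register, otherwise memory data MDIN."""
    return result_register if salud else mdin
```

For example, sign_extend(0x0010, 0) gives 0xFFF0 because bit 4 is the top bit of the 5-bit field, whereas sign_extend(0x0010, 1) leaves the value as 0x0010.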

The Control forms these signals based on the contents of the Instruction Register (INSTR[15:0]) and on a State Register which is internal to the Control. The State Register is updated on each clock and when running instructions has 3 states: FETCH, EXECUTE and WRITEBACK. FETCH indicates the Instruction Fetch phase, EXE (i.e. EXECUTE) indicates the Execute phase and LDREG indicates Writeback when the Register Bank is written to; naturally, only one of FETCH, EXE and LDREG can be a ‘1’ at any time and all three may be ‘0’ during the writeback phase of a store instruction. The action required in each phase (which takes one clock period) is summarised in Table 4.1. below:

Phase 1 (Fetch): The next instruction is fetched from memory and +1 is added to the Program Counter.

Phase 2 (Decode/Execute): The instruction is decoded and executed. This phase finishes when the result from the ALU is clocked into the Result Register. The Condition Code Register may also be updated in this phase.

Phase 3 (Writeback): ALU operations are written back into the register bank. Load/Store operations access memory. A branch instruction that is taken will update the Program Counter.

Table 4.1. Instruction Phases
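The three-phase cycle can be pictured with a small Python model of the State Register. It is a sketch under stated assumptions, not the Control template of Appendix B: the state numbering (0 Reset, 1 Fetch, 2 Execute, 3 Writeback) follows the description in Ex. 5, while the behaviour of the reset input and the writes_register flag are assumptions introduced for illustration.

```python
RESET, FETCH, EXECUTE, WRITEBACK = 0, 1, 2, 3

# Assumed transition order: Reset -> Fetch -> Execute -> Writeback -> Fetch.
NEXT = {RESET: FETCH, FETCH: EXECUTE, EXECUTE: WRITEBACK, WRITEBACK: FETCH}


def next_state(state, reset=False):
    """State after the next clock edge (reset behaviour is an assumption)."""
    return RESET if reset else NEXT[state]


def phase_signals(state, writes_register=True):
    """Phase signals for a state; at most one of them is '1' at any time.

    writes_register is False for an instruction, such as a store, whose
    writeback phase does not write the Register Bank, so LDREG stays '0'.
    """
    return {
        "FETCH": int(state == FETCH),
        "EXE": int(state == EXECUTE),
        "LDREG": int(state == WRITEBACK and writes_register),
    }


# Walk four clock edges starting from the Reset state.
state = RESET
trace = []
for _ in range(4):
    state = next_state(state)
    trace.append(state)
```

The trace visits Fetch, Execute and Writeback and then returns to Fetch, one state per clock period.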

In specifying the control signals, it is useful to make Signal Usage Charts which show the state of control signals at any time. Signal Usage Charts sheets can be found on pages 77 and 78 at the back of the manual. Your task is to determine how each control signal is formed in each phase of the instruction for each instruction type. The control signal may be ‘0’, ‘1’, ‘don’t care’ or formed from bits in the Instruction Register INSTR[15:0] and/or the phase signals (FETCH, EXE and LDREG) and/or other signals. Hand in the completed sheets on pages 77 and 78 for your Signal Usage Charts.

The Signal Usage Charts sheets on pages 75 and 76 are for you to make a copy of what you hand in, as this will help you in writing the Verilog code required in the next exercise.


Ex. 5 - Verilog Specification of Control Block

Aim: to produce a Verilog specification of the control for your processor and enter it into Cadence.

Hand in: - Verilog listing of the Control

Read: Section 3.5 and Appendix B and C of this manual

Sessions: 1

Assessment: Worth 15 marks. Marks will be awarded based on the correct operation of the Control, the program structure and the comments provided; syntactic errors will be penalised.

Learning Outcomes: A complete understanding of the operation of the STUMP RISC processor, more practice in writing substantial hardware descriptions in Verilog but this time at the RTL level.

INSTRUCTIONS:

Introduction

The processor comprises the Bus Interface which forms the memory to processor interface, the Datapath elements which perform the computational part of the STUMP, and the Control which generates the signals at the correct time required to perform instructions. In this exercise you are going to complete the specification of the Control block in Verilog. The interconnection of the Bus Interface, Control and the Datapath components is shown in Appendix C. The Datapath and Bus Interface have been designed for you (down to the gate level).

RTL Design of the Control

Advice on how to proceed with this stage of the design is given in section 3.5 - but the most important thing is to know precisely what you want your design to do. In determining the signals that have to be asserted for each of the three phases by Control, the Signal Usage Charts you generated in the last exercise (if correct) should provide you with a precise specification of the control for the Fetch, Execute and Writeback phases. These need to be translated into a Verilog control block design specification and entered into the functional view of the control cell. This is provided in the form of a template which is listed in Appendix B.

Accessing and Modifying the Control

From the icds window, choose File->Open. This brings up an Open File window. In the Open File window set the Library Name to comp22111, Cell Name to control, View Name to functional and then click OK. This brings up a verilog.v window containing the incomplete Verilog code describing the control for the STUMP datapath. The code contains an always code block, written for you, which defines the processor state and advances the state on each positive clock edge. State 0 is the Reset state, State 1 Fetch, State 2 Execute, and State 3 is the Writeback state. If the state is in none of these states, then a Default state is entered which sets all signals to ‘don’t care’. It is strongly recommended that the signals listed in the Default state are explicitly set in each of the four states you need to write. At the end of the code, a function Testbranch, written for you, takes as its input parameters bits 11 to 8 of the Instruction Register and the bits in the Condition Code Register returning a ‘1’ if the branch is to be taken (and ‘0’ if a branch is not taken). Use the Edit facilities of the window to add to your control code and then save it using File->Save.

Print and exit as described in Ex. 2. Don’t forget to hand your code in.


Ex. 6 - Test of RTL Design

Aim: to simulate and debug the whole RTL design.

Demonstrate: the simulation of the whole processor design with the test programs.

Sessions: 1

Assessment: Assessed in the lab. 15 marks allocated on the basis of demonstrating that your RTL description is correct and passes the tests; 3 marks are awarded for passing the Register Bank test, 5 marks for the ALU test, 3 marks for the shifter test and 4 marks for the branch test.

Learning Outcomes: practice in testing and fault finding in a large digital system at the RTL level using the Cadence CAD tools, experience of the design iteration process.

INSTRUCTIONS

In this exercise, you will run the test programs we have given you on the RTL description of the processor. This will allow you to identify and debug faults in your Verilog code of the Control. You can assume that the test programs we provide are correct and that the RTL datapath we provide is correct. Therefore any errors in operation are due to faults in your Verilog control description!

When you are satisfied that your design is working correctly, show your simulation results for each test to a demonstrator.

The Cadence procedures needed for this exercise are similar to those in Ex. 3 and are briefly summarised below:

Parsing (syntactic analysis)

It is first necessary to parse a design that satisfies the Verilog syntax. Start Cadence (start_cadence 22111). This brings up the icds window. Choose File->Open. This brings up an Open File window. In the Open File window set the Library Name to comp22111, Cell Name to control, View Name to functional and then click OK. This brings up a verilog.v window containing the Verilog code. Make at least one edit then save. Now, in the verilog.v window, select File->Exit to check the Verilog code for syntax errors. Correct any errors and repeat the parsing process until correct.

Test Files

As before the test files are in test1.s to test4.s and you need to create a file for the processor memory called xc4000mem.ram for use in the simulation. See Ex. 3 or Ch. 6 for instructions on this.


Simulation

In the icds window, select Tools->Verilog Integration->NC-Verilog. This brings up a Virtuoso Verilog Environment etc window. Fill in its fields with test_bench for the Run Directory, and in the Top Level Design section enter comp22111 for Library, processor_test_bench for cell, and schematic for View. Click on the top left icon (of the running man) to initialise the simulator. When the icds window shows that initialisation is complete, click on Setup->Netlist in the Virtuoso Verilog Environment etc window along its top toolbar. This brings up a Netlist Setup window. In the Netlist These Views line, remove algorithmic at the beginning of the line (to enable the netlister to pick up the Verilog code for control). Click OK along the top toolbar of the Netlist Setup window. Back in the Virtuoso Verilog Environment etc window, click on the second icon down on the left (of three separate ticks) to generate a netlist. When the icds window indicates this has completed correctly, click on the Simulate icon which is the third icon down in the Virtuoso Verilog Environment etc window. This brings up the Design Browser 1-SimVision and Console-SimVision windows. As described in Ex. 3, place any signals you wish to observe on the Waveform Viewer, then run the simulation from a command file (reset simulation, use File->Source Command Script to select an input command file <filename.sv>, send commands to the Simulator Console (NC-Sim) and press OK to run the simulation).

When your processor simulates correctly using your Verilog description of Control for all the supplied test programs, you should show the results of running the assembler code to a demonstrator who will mark them off.


Chapter 5

The Processor Specification

5.1 Architecture

The processor is a 16-bit machine with a RISC style architecture. Operands for ALU operations come from registers inside the processor and the result is returned to a register. Separate instructions are provided to move data between the registers and external memory.

There are 8 registers. R0 is always zero and can be used as a source operand, allowing Move instructions to be synthesised from an Add instruction. R0 may be written to, but the result is always discarded, allowing Compare instructions to be synthesised from Subtract instructions. Register R7 is the program counter and, from a programmer’s view of the machine, has equal status with the other registers, allowing PC-relative addressing to be supported.
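The register conventions above can be illustrated with a small Python model. This is only a sketch: the class name RegisterBank and its read and write methods are assumptions made for illustration and do not appear in the lab's Verilog sources.

```python
class RegisterBank:
    """Eight 16-bit registers; R0 reads as zero, R7 is the program counter."""

    def __init__(self):
        self.regs = [0] * 8

    def read(self, n):
        return 0 if n == 0 else self.regs[n]

    def write(self, n, value):
        if n != 0:                      # writes to R0 are discarded
            self.regs[n] = value & 0xFFFF


bank = RegisterBank()
bank.write(2, 0x1234)
bank.write(1, bank.read(2) + bank.read(0))   # MOV r1, r2 done as ADD r1, r2, r0
bank.write(0, 0xBEEF)                        # result is thrown away
print(hex(bank.read(1)), bank.read(0))
```

Because R0 always supplies zero and swallows writes, no dedicated Move or Compare opcode is needed.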

Bit     Flag
CC 3    Sign Flag (N)
CC 2    Zero Flag (Z)
CC 1    Overflow Flag (V)
CC 0    Carry-Out Flag (C)

Table 5.1: Condition Code Bits

The processor has a 4-bit condition code register, shown in Table 5.1. It holds status information relating to the ALU output. The four status bits indicate whether the ALU result:

• is negative (N bit is ‘1’ if the ALU result is -ve; N is ‘0’ for +ve or zero)

• is zero (Z bit is ‘1’ if the ALU result is all ‘0’s; Z is ‘0’ if non-zero)

• overflows (V bit is ‘1’ if adding two +ve numbers yields a -ve result, or if adding two -ve numbers yields a +ve result; V is ‘0’ if the result is within range)

• has a carry out (C bit is ‘1’ if there is a carry out of bit 15, the most significant bit of the ALU result; C is ‘0’ if there is no carry out of bit 15).

Each arithmetic and logical instruction has the option of updating or not updating the condition code register. If an arithmetic/logical instruction does not update the condition code register then the register keeps its existing state. Load, Store and Branch instructions never update the condition code register and so do not change its existing state.
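As a rough illustration of how the four flags follow from a 16-bit addition, the Python sketch below computes them for an add. The function name add16 is an assumption for illustration; the overflow rule used (carry out of bit 14 differs from carry out of bit 15) is the one given in Table 5.5.

```python
def add16(a, b, cin=0):
    """Add two 16-bit words; return (sum, N, Z, V, C)."""
    a &= 0xFFFF
    b &= 0xFFFF
    low = (a & 0x7FFF) + (b & 0x7FFF) + cin   # add bits 14..0 only
    c14 = low >> 15                           # carry out of bit 14
    total = a + b + cin
    c15 = total >> 16                         # carry out of bit 15
    s = total & 0xFFFF
    n = s >> 15                               # sign bit of the result
    z = int(s == 0)
    v = c14 ^ c15                             # overflow when the carries differ
    return s, n, z, v, c15


print(add16(0x7FFF, 0x0001))   # two positive numbers give a negative result
print(add16(0xFFFF, 0x0001))   # wraps round to zero with a carry out
```

The first call sets N and V, the second sets Z and C.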

There are 3 instruction formats in a fixed-length 16-bit instruction. The machine operates on 16-bit words only. Byte addressing is not supported.


5.2 Instruction Set

Instruction Code   Instruction   Explanation
000                ADD           2’s complement add
001                ADC           2’s complement add with carry-in
010                SUB           2’s complement subtract
011                SBC           2’s complement subtract with borrow
100                AND           Bitwise AND of two 16-bit words
101                OR            Bitwise OR of two 16-bit words
110                LD/ST         Load register from memory or Store register to memory
111                Bcc           Branch if condition cc is satisfied.

Table 5.2: Basic Instruction Set

There are 8 basic instructions, shown in Table 5.2, enabling arithmetic, logical, load, store and branch operations to be performed. As is common in processors, all the arithmetic is performed by an adder, since the subtraction A - B can be performed as A + NOT B + ‘1’ (where NOT B is the bitwise inverse of B). Some other instructions such as Cmp, Nop and Mov can be expressed directly in terms of the basic instructions and are supported by the assembler. Other instructions may be synthesised from combinations of the basic instructions as shown in Chapter 6.

Shift instructions are somewhat special. Shift-left instructions can be derived from the basic instruction set. Shift-right instructions have been added as a rather ugly “kludge” and are dealt with in section 5.4.

5.3 Instruction Formats

There are just 3 instruction formats which are shown below:

Type 1: 2 source registers:

Bits     15-13    12    11      10-8    7-5      4-2      1-0
Field    INSTR    0     LDCC    DST     SRC A    SRC B    SHIFT

Type 2: 1 source register, 1 immediate value:

Bits     15-13    12    11      10-8    7-5      4-0
Field    INSTR    1     LDCC    DST     SRC A    Immediate

Type 3: Conditional branch:

Bits     15-12      11-8         7-0
Field    1 1 1 1    Condition    Offset

The processor is a 3-address machine specifying two source operands and a destination operand. In the case of arithmetic and logical instructions (instruction codes 0 to 5), the two source operands are either two registers (Type-1 instructions) or a register and a 5-bit signed immediate value (Type-2 instructions). The result of the operation is returned to the destination register (DST) and the condition codes are updated depending on the state of bit 11 (if LDCC is ‘1’ then update the condition codes; if LDCC is ‘0’, do not update the condition codes).

Branch (code 7) and load/store instructions (code 6) do not update the condition-code register. In the case of a LD/ST instruction, bit 11 is used to determine the direction of the data transfer: if LDCC is ‘1’, the operation is store to memory; if LDCC is ‘0’, the operation is load from memory. The memory address is constructed from the sum of the two source operands, i.e. the two registers specified by SRC A and SRC B for Type-1 instructions or the register specified by SRC A and the 5-bit signed immediate for Type-2 instructions. The register specified by DST is the register to be written into memory for a Store operation or the register to be loaded from memory for a Load operation.

Type 3 instructions are branch instructions (code 7). Here, the 8-bit signed offset is added to the Program Counter to compute the address of the instruction to be jumped to if the branch is taken. This is written into the Program Counter if the branch is taken but is ignored otherwise. In branch instructions, bits 8 to 11 specify the conditions under which a branch is taken. These usually involve bit(s) in the condition code register. The branch conditions are described in section 5.5.
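The three formats can be summarised by a small field decoder. The Python below is a sketch under the bit positions described in this section; the function names decode and sign_extend and the dictionary keys are assumptions chosen for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def decode(word):
    """Split a 16-bit instruction word into named fields."""
    op = (word >> 13) & 0x7
    if op == 7:                                   # Type 3: conditional branch
        return {"type": 3, "cond": (word >> 8) & 0xF,
                "offset": sign_extend(word, 8)}
    fields = {"op": op, "ldcc": (word >> 11) & 1,
              "dst": (word >> 8) & 0x7, "src_a": (word >> 5) & 0x7}
    if (word >> 12) & 1:                          # Type 2: 5-bit signed immediate
        fields.update(type=2, imm=sign_extend(word, 5))
    else:                                         # Type 1: two source registers
        fields.update(type=1, src_b=(word >> 2) & 0x7, shift=word & 0x3)
    return fields


print(decode(0x195D))   # an ADD with LDCC set, DST = 1, SRC A = 2, immediate -3
print(decode(0xF7FE))   # a branch with condition 0111 and offset -2
```

Testing bits 15:13 for 111 first mirrors the order of the tests in the behavioural model in Appendix A.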


5.4 Shift operations

Bits 1 and 0 in Type-1 instructions are used to control various shift-right operations. If a shift is specified then a one-bit right shift of operand A from the Register Bank is performed before it reaches the ALU. The shifts that can be specified are an arithmetic shift right (ASR) with the sign bit copied to bit 15, a clockwise circular shift (ROR) with bit 0 moving to bit 15, and a clockwise circular shift through the carry (RRC) with the C-bit in the condition code register moving to bit 15. These shift operations are summarised in Table 5.3 below.

Operation   Instr Bit 1   Instr Bit 0   Shifter Output, bit 15:=   Shifter Carry-out (CSH):=
No shift    0             0             A15                        0
ASR         0             1             A15                        A0
ROR         1             0             A0                         A0
RRC         1             1             CC0                        A0

Table 5.3: Shift Operations

Refer to section 5.6 for information on how the shifter carry-out is used.

Assuming that the data input to the Shifter is A<15:0>, then the effect of the shift operations can be summarised as follows:

no shift   output = A15 ... A0                  CSH = ‘0’
ASR        output = A15, A15, A14 ... A1        CSH = A0
ROR        output = A0, A15, A14 ... A1         CSH = A0
RRC        output = CC0, A15, A14 ... A1        CSH = A0

FIGURE 5.1 Shift Instructions
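The shifter behaviour summarised above can be sketched in Python as follows. The function name shift and its argument names are assumptions for illustration; the mode values 0 to 3 follow the case statement of the Shift task in Appendix A.

```python
def shift(a, mode, cc0):
    """Model the one-bit right shifter; return (shifter output, CSH)."""
    a &= 0xFFFF
    if mode == 0:                       # no shift: operand passes through
        return a, 0
    top = {1: a >> 15,                  # ASR: sign bit copied into bit 15
           2: a & 1,                    # ROR: bit 0 moves to bit 15
           3: cc0 & 1}[mode]            # RRC: carry flag moves to bit 15
    return (top << 15) | (a >> 1), a & 1


print([hex(x) for x in shift(0x8001, 1, 0)])   # ASR keeps the sign
print([hex(x) for x in shift(0x0001, 2, 0)])   # ROR wraps bit 0 round
print([hex(x) for x in shift(0x0002, 3, 1)])   # RRC brings the carry in
```

In every shifting mode the bit that falls off the bottom (A0) becomes the shifter carry-out.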


5.5 Conditional branch instructions

Type-3 instructions implement the 16 conditional branch instructions shown in Table 5.4. The range of the branch target address is PC + 1 ± the 8-bit signed offset.

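Mnemonic   Bits 11:8   Branch Condition        Use
BAL        0000        Always branch
BNV        0001        Never
BHI        0010        C+Z=0                   comparison: unsigned arithmetic
BLS        0011        C+Z=1                   comparison: unsigned arithmetic
BCC        0100        C=0                     overflow test: unsigned arithmetic
BCS        0101        C=1                     overflow test: unsigned arithmetic
BNE        0110        Z=0                     zero test
BEQ        0111        Z=1                     zero test
BVC        1000        V=0                     overflow test: signed arithmetic
BVS        1001        V=1                     overflow test: signed arithmetic
BPL        1010        N=0
BMI        1011        N=1
BGE        1100        N.!V+!N.V=0             comparison: signed arithmetic
BLT        1101        N.!V+!N.V=1             comparison: signed arithmetic
BGT        1110        (N.!V+!N.V)+Z=0         comparison: signed arithmetic
BLE        1111        (N.!V+!N.V)+Z=1         comparison: signed arithmetic

In the branch conditions, ‘+’ denotes logical OR, ‘.’ logical AND and ‘!’ logical NOT.

Table 5.4: Conditional Branch Instructions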
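The branch conditions can be checked with a short Python sketch. The function name branch_taken is an assumption for illustration; the conditions themselves follow the Testbranch function listed in Appendices A and B, with the condition code register packed as {N, Z, V, C}.

```python
def branch_taken(cond, cc):
    """Return True if branch condition `cond` (bits 11:8) holds for flags cc."""
    n, z, v, c = (cc >> 3) & 1, (cc >> 2) & 1, (cc >> 1) & 1, cc & 1
    base = [
        True,              # 000x  BAL / BNV
        not (c or z),      # 001x  BHI / BLS
        not c,             # 010x  BCC / BCS
        not z,             # 011x  BNE / BEQ
        not v,             # 100x  BVC / BVS
        not n,             # 101x  BPL / BMI
        n == v,            # 110x  BGE / BLT
        n == v and not z,  # 111x  BGT / BLE
    ][cond >> 1]
    return bool(base) != bool(cond & 1)   # each odd code inverts the even code before it


print(branch_taken(0b0111, 0b0100))   # BEQ with Z set
print(branch_taken(0b1101, 0b1000))   # BLT with N set and V clear
```

The encoding pairs each condition with its opposite, so bit 8 of the instruction simply inverts the result selected by bits 11:9.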


5.6 Condition Codes

The following table summarises the conditions under which the various condition bits are set. The column labelled C in shows where the carry-in comes from for the ADC and SBC instructions. C in is the carry into the least significant bit of the adder. The column labelled CC0 shows how the carry bit in the Condition Code register is derived if the register is updated.

Instr   C in      CC3 Sign   CC2 Zero   CC1 Overflow   CC0 Carry             Update codes if LDCC is set
ADD     0         S15        S=0        C14 != C15     C15                   yes
ADC     CC0       S15        S=0        C14 != C15     C15                   yes
SUB     1         S15        S=0        C14 != C15     NOT C15               yes
SBC     NOT CC0   S15        S=0        C14 != C15     NOT C15               yes
AND     0         S15        S=0        0              CSH if shift else 0   yes
OR      0         S15        S=0        0              CSH if shift else 0   yes
LD/ST   0         δ          δ          δ              δ                     no
BR      0         δ          δ          δ              δ                     no

Table 5.5: Condition code settings

where:

• S is the 16-bit result of an arithmetic or logic operation, i.e. the ALU result

• C14 and C15 are the carry bits from bits 14 and 15 respectively of an arithmetic operation

• CSH is the shifter carry-out (see Table 5.3) and is only used to update the condition code register for a logical instruction which performs an ASR, ROR or RRC shift

• shift is TRUE for type 1 instructions when bits 1 and 0 are NOT equal to “00”.

• SUB and SBC are done as an addition with the CC0 and C in settings shown in the above table. CC0 is stored as a borrow and is NOT C15, since a borrow = NOT carry. Thus A - B - borrow = A + NOT B + NOT borrow. For SUB, there is no borrow, so A - B = A + NOT B + ‘1’, while for SBC A - B - borrow = A + NOT B + NOT borrow = A + NOT B + NOT CC0. Here NOT denotes bitwise inversion.
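The subtraction rule in the last bullet can be checked numerically. The Python below is a sketch which assumes that the carry flag holds a borrow after SUB and SBC (i.e. the inverse of the adder's carry out); the function names add_with_carry and sub are illustrative assumptions.

```python
MASK = 0xFFFF


def add_with_carry(a, b, cin):
    """16-bit adder; return (sum, carry out of bit 15)."""
    total = (a & MASK) + (b & MASK) + cin
    return total & MASK, total >> 16


def sub(a, b, borrow_in=0):
    """Compute A - B - borrow as A + NOT B + NOT borrow.

    Returns (result, borrow out). Use borrow_in=0 for SUB and the stored
    carry flag for SBC.
    """
    s, c15 = add_with_carry(a, ~b & MASK, 1 - borrow_in)
    return s, 1 - c15                   # a borrow is the inverse of the carry


print(sub(5, 3))        # no borrow needed
print(sub(3, 5))        # result is -2 as a 16-bit word, borrow set
print(sub(5, 3, 1))     # SBC with an incoming borrow
```

Chaining sub on the low words and then on the high words, passing the borrow between them, gives a multi-word subtraction in the same way that ADD followed by ADC gives a multi-word addition.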

5.7 Processor interface

The processor is shown in Figure 5.2 in a system with a clock, reset and memory. Communication with memory is controlled by the processor signals Ram_Cs (memory select), Bus_Rd (enable memory output) and Bus_Wr (enable memory write). These are all active low signals, i.e. their normal inactive state is high and they go low to activate the memory. The data is transferred on a 16-bit bidirectional bus, Bus_D(15:0), and the address at which reading or writing takes place is specified by the processor on the 16-bit address bus, Bus_A(15:0).

If reading, the address to be read is placed on the address lines and the chip is enabled together with the output enable signal. After the access time of the chip has elapsed, the data from that address appears on the data bus lines. If writing, the address of the location to be written is driven onto the address lines and the chip is enabled together with the write signal. The data on the bidirectional data bus is then written into this location.

You can assume that data can be read from or written to memory within one clock cycle, provided that the clock period is 400 ns.
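The active-low protocol described above can be mimicked by a tiny functional memory model. This Python sketch ignores timing altogether; the class name Memory, the method access and the 65536-word size (taken from the width of the address bus) are assumptions for illustration, not a description of the lab's Verilog memory model.

```python
class Memory:
    """Word-addressed 64K x 16 memory with active-low control inputs."""

    def __init__(self):
        self.words = [0] * 65536

    def access(self, cs, oe, wr, addr, data=None):
        """Return the word read, or None if nothing is driven onto the bus."""
        if cs != 0:                    # chip select high: memory inactive
            return None
        if wr == 0:                    # write enable low: store the bus data
            self.words[addr & 0xFFFF] = data & 0xFFFF
            return None
        if oe == 0:                    # output enable low: drive the data bus
            return self.words[addr & 0xFFFF]
        return None


mem = Memory()
mem.access(cs=0, oe=1, wr=0, addr=0x0010, data=0xABCD)    # write cycle
print(hex(mem.access(cs=0, oe=0, wr=1, addr=0x0010)))     # read cycle
print(mem.access(cs=1, oe=0, wr=1, addr=0x0010))          # chip not selected
```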

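[Figure 5.2: block diagram. The clk and reset signals drive the BusClk and GSR inputs of the STUMP processor. The processor connects to the memory through Bus_D(15:0) (the 16-bit bidirectional data bus, memory port Data(15:0)), Bus_A(15:0) (memory port Addr(15:0)), Ram_Cs (memory cs), Bus_Rd (memory oe) and Bus_Wr (memory wr). The memory also has dump and reload inputs.]

FIGURE 5.2 The processor system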


Chapter 6

Programming the STUMP Processor

6.1 Introduction

The STUMP processor design will be tested by supplying a program in binary form for loading into the memory model. An assembler, the SASM assembler (see footnote 1), has been produced which can be used to create a binary program from an assembly language program. The SASM assembler is described in the following section. Section 6.3 shows how the eight basic instructions described in Chapter 5 can be used to synthesise a further fifteen instructions.

6.2 Using the SASM assembler

6.2.1 Usage:

sasm <filename.s>

The input file is parsed on a line-by-line basis. Each line should contain a single instruction, assembler directive or full-line comment. Three output files are produced:

<filename.mem> contains code suitable for loading into the processor memory model using the loadmem.sh command as detailed below.

<filename.hex> contains code suitable for down-loading into the memory on the Xilinx board

<filename.bin> contains the binary of the assembled code

A Verilog memory model is used and this can only be loaded from a file named xc4000mem.ram and dumped to a file named xc4000mem.dump. To create the xc4000mem.ram file from the <filename.mem> created by sasm, use

loadmem.sh <filename.mem>

Thus, for example,

sasm test1.s

loadmem.sh test1.mem

will create a memory file xc4000mem.ram containing the assembled code of test program 1.

Footnote 1: The assembler was written by Andrew Bardsley, who also devised the original STUMP architecture.


6.2.2 Assembler Instruction Format

The format of instruction lines is:

[<label>[:]] <instruction name> <operands>

The label must begin in column 1 of the line and can optionally be terminated by a colon. It is valid to omit the label and also to place the first character of the instruction name in column 1.

Labels consist of one of the characters [a-zA-Z_] followed by any number of the characters [a-zA-Z_0-9]. Labels may not be any of the following reserved words (in either upper or lower case, although a mixed case version (e.g. Nop) of any of these keywords is a valid label):

adc add adcs adds align and ands asr bal bcc bcs beq bge bgt bhi bhs ble blo bls blt bne bnv bmi bpl bvc bvs cmp data equ idem include ld mov movs nop or org ors pc r0 r1 r2 r3 r4 r5 r6 r7 ror rrc sbc sbcs st sub subs

Register Names

Valid register names are (pc is an alias for r7): r0 r1 r2 r3 r4 r5 r6 r7 pc

Mnemonics

Instruction mnemonics are listed below by instruction type. Instruction names that end in an “s” have the effect of setting the condition code bits based on the result of the instruction.

<shift> is one of ror, asr, rrc and indicates that the value of <src_reg1> or <offset_reg> is modified by the specified shift operation before the specified instruction is carried out

<expr> is a value in the range -16 to +15. An error is reported if the value is out of range.

Expressions

Expressions are similar to expressions in C. Supported operators are:

+ - * / % & | ^ << >> - ~ ()

6.2.3 Dyadic Arithmetic/Logical Instructions

Valid Instructions

adc, adcs, add, adds, and, ands, or, ors, sbc, sbcs, sub, subs

Instruction Formats

<instruction> <dst_reg>, <src_reg1>, <src_reg2>

<instruction> <dst_reg>, <src_reg1>, <src_reg2>, <shift>

<instruction> <dst_reg>, <src_reg1>, #<expr>
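To show how these assembler forms map onto the instruction formats of Chapter 5, the Python sketch below encodes a dyadic instruction as a 16-bit word. It is an illustration only and is not the SASM implementation: the function name encode, its keyword arguments and the two lookup tables are assumptions, with the opcodes taken from Table 5.2 and the shift codes from the Shift task in Appendix A.

```python
OPCODES = {"add": 0, "adc": 1, "sub": 2, "sbc": 3, "and": 4, "or": 5}
SHIFTS = {None: 0, "asr": 1, "ror": 2, "rrc": 3}


def encode(mnemonic, dst, src_a, src_b=None, imm=None, shift=None):
    """Encode a dyadic instruction as a 16-bit word (Type 1 or Type 2)."""
    name = mnemonic.lower()
    ldcc = 0
    if name not in OPCODES and name.endswith("s"):   # e.g. "adds": set LDCC
        name, ldcc = name[:-1], 1
    word = (OPCODES[name] << 13) | (ldcc << 11) | (dst << 8) | (src_a << 5)
    if imm is not None:                              # Type 2: register + immediate
        if not -16 <= imm <= 15:
            raise ValueError("immediate out of range -16..+15")
        return word | (1 << 12) | (imm & 0x1F)
    return word | (src_b << 2) | SHIFTS[shift]       # Type 1: two registers


print(hex(encode("adds", 1, 2, imm=-3)))             # adds r1, r2, #-3
print(hex(encode("sub", 3, 0, src_b=3)))             # sub r3, r0, r3
print(hex(encode("add", 1, 1, src_b=0, shift="rrc")))  # add r1, r1, r0, rrc
```

The range check on the immediate corresponds to the error the assembler reports when <expr> lies outside -16 to +15.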


6.2.4 Load/Store Instructions

Valid Instructions

ld st

Instruction Formats

<instruction> <src/dest_reg>, [<base_reg>, <offset_reg>]

<instruction> <src/dest_reg>, [<base_reg>, <offset_reg>, <shift>]

<instruction> <src/dest_reg>, [<base_reg>, #<expr>]

<instruction> <src/dest_reg>, [<base_reg>, <label>]

In the last form of the instruction, the offset is calculated by subtracting the address of the label <label> from the address of the current instruction. It is used with <base_reg> = r7 to allow pc-offset index addressing. For examples, see the code of the test programs.

6.2.5 Branch instructions

Valid Instructions

bal bnv bhi bls bcc bhs bcs blo bne beq bvc bvs bpl bmi bge blt bgt ble

Instruction Format

<instruction> <label>

<label> is translated into an offset from the address of the next instruction and must be in the range of -127 to 128 from the current address.
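The offset calculation can be sketched in Python as below. The function names branch_offset and branch_target are assumptions for illustration; the arithmetic follows the statement above that the offset is measured from the address of the next instruction and is held as an 8-bit signed value.

```python
def branch_offset(current, target):
    """Return the 8-bit offset field for a branch at `current` to `target`."""
    offset = target - (current + 1)          # relative to the next instruction
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range")
    return offset & 0xFF


def branch_target(current, field):
    """Recover the target address from the 8-bit offset field."""
    signed = field - 256 if field & 0x80 else field
    return (current + 1 + signed) & 0xFFFF


print(hex(branch_offset(0x0010, 0x0010)))    # branch to itself: offset -1
print(hex(branch_offset(0x0010, 0x0090)))    # furthest forward: current + 128
print(hex(branch_target(0x0100, 0x80)))      # furthest back: current - 127
```

An offset of -128 to +127 from the next instruction is the same as -127 to +128 from the current one, which is the range quoted above.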

6.2.6 Instruction Aliases

Some common instructions, not visible in the basic instruction set, are available as aliases:

nop
cmp <src1_reg>, <src2_reg>
cmp <src1_reg>, <src2_reg>, <shift>
cmp <src1_reg>, #<expr>
mov <dst_reg>, <src_reg>
mov <dst_reg>, <src_reg>, <shift>
mov <dst_reg>, #<expr>

Similar movs instructions are also allowed.

6.2.7 Assembler Directives

org

[<label>[:]] org <expr>

Set the current program address to the value of <expr>. An error is reported if this expression evaluates to less than zero or greater than 65535. The optional <label> is assigned to the new address. The value of <expr> must be resolvable in the 1st pass of the assembler.

equ

<label>[:] equ <expr>

Bind the value from evaluating <expr> to the identifier <label>. The value of <expr> can take any 32 bit value but must be resolvable in the 1st pass of the assembler.

data

[<label>[:]] data <list of data_items>

Inserts constants at the current program address onwards. <list of data_items> is a comma separated list of the elements:

<expr> Any expression. An unadorned expression is truncated to a 16 bit value and occupies a single word. Expressions with a suffix .b, .w (or .s) and .l represent a byte, a word, and a long word respectively. Long words are stored as two words in a little-endian format. Byte expressions are packed two to a word, least-significant byte first.

“<string>” Any sequence of characters except “ stored as signed 6-bit values unless the string is suffixed with .b, in which case the characters are byte packed as above

align

[<label>[:]] align .w

[<label>[:]] align .s

[<label>[:]] align .l

[<label>[:]] align <expr>

The program counter is aligned to the nearest word (.w and .s) or long word (.l) or <expr> words. The last form is useful for reserving a block of memory. Word alignment is only useful between data statements which are byte packed. Using a label with a data statement has the side-effect of word-aligning the first data element.

Comments

[<instruction or directive>] ;< comment>

Any line can be appended with a comment. However only comments that start in column 1 are echoed to the listing file. Other comments are discarded.

Constants

The ‘C’ forms of constants are allowed, with the addition of binary constants which are introduced by ‘0b’.

6.3 Extending the instruction set

There are 8 basic instructions + the modifiers that affect operand-A shifting and the conditional updating of condition codes. Other instructions can be synthesised from the basic instruction set. Note only NOP, MOV and CMP are recognized by the assembler.

Instruction   Synthesised as             Description
NOP           ADD r0, r0, r0             No-Op: do nothing
MOV ra, rb    ADD ra, rb, r0             Move register rb to ra
CMP ra, rb    SUBS r0, ra, rb            Compare registers ra and rb
ASL ra        ADDS ra, ra, ra            Arithmetic Shift Left
LSL ra        ADDS ra, ra, ra            Logical Shift Left
RLC ra        ADCS ra, ra, ra            Rotate Left through Carry
ROL ra        ADDS ra, ra, ra            Rotate Left
              ADC ra, ra, r0
LSR ra        ANDS r0, r0, r0            Logical Shift Right
              ADD ra, ra, r0, rrc
CCF           ANDS r0, r0, r0            Clear Carry Flag
SCF           ADD r1, r0, #1             Set Carry Flag
              ANDS r1, r1, r0, asr
NEG ra        SUB ra, r0, ra             2’s complement of ra
CPL ra        AND r0, r0, r0             1’s complement of ra
              SBC ra, r0, ra
BL            ADD r5, pc, #1             Branch & Link for leaf procedures
              BAL <label>                r5 is link register = ret address
RL            ADD pc, r5, #0             Return from link
RET           ADD r6, r6, #-1            General return
              LD r5, [r6, #0]
              ADD pc, r5, #1

Table 6.1: Extending the Instruction Set
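A few of the sequences in Table 6.1 can be checked by modelling the data operations in Python. This is a sketch only: the function names rrc, lsr, neg and asl are illustrative assumptions, and only the result register is modelled, not the condition codes left behind by each sequence.

```python
MASK = 0xFFFF


def rrc(a, carry):
    """Rotate right through carry: return (result, new carry)."""
    return ((carry << 15) | (a >> 1)) & MASK, a & 1


def lsr(ra):
    """LSR ra: ANDS r0, r0, r0 clears the carry, then ADD ra, ra, r0, rrc."""
    carry = 0                      # ANDS with no shift writes 0 to the C flag
    shifted, _ = rrc(ra, carry)    # operand A passes through the shifter
    return (shifted + 0) & MASK    # r0 supplies zero as operand B


def neg(ra):
    """NEG ra: SUB ra, r0, ra."""
    return (0 - ra) & MASK


def asl(ra):
    """ASL ra: ADDS ra, ra, ra."""
    return (ra + ra) & MASK


print(hex(lsr(0x8001)), hex(neg(1)), hex(asl(0x4001)))
```

The LSR case shows why the carry must be cleared first: RRC moves whatever is in the C flag into bit 15.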


Appendix A

Verilog Top Level Behavioural Model of the STUMP Processor

Introduction

The Verilog listing of the top-level processor module, processor, is given below. It is a behavioural description, i.e. the programmer’s view of the processor’s operation. The algorithmic code describes a fetch phase, an execute phase where the instruction is decoded and the specified arithmetic/logical operation is performed, and a writeback phase where the result computed during the execute phase is used. The ALU result is either written back to the Register Bank or used as a memory address, in which case data is either stored to memory from the Register Bank or loaded from memory into the Register Bank, depending on the instruction. The code given to students is incomplete and the task in the second exercise is to complete the high level code for the processor so that the model runs the given test programs successfully. The code for the instruction fetch and for the writeback is complete and should not be altered by students. However, the code for the Execute phase is missing and this is the code which needs to be added by students.

The code for the high level model calls functions and tasks. These are listed after the high level model. Functions and tasks are passed parameters and perform some operation. Functions return a result. Tasks are similar to procedures in that they operate on parameters and can modify them during task execution. Both functions and tasks may declare local variables to assist with their operation.

N.B. If the code stored in the files differs from that shown here, it should be assumed that the stored code is correct.


// Verilog HDL for STUMP processor "processor_v"

module processor (BUS_A, RAM_CS, BUS_RD, BUS_WR, BUS_D, BUSCLK, GSR);

//processor to memory signals :

output [15:0] BUS_A; //address bus

output RAM_CS;// memory chip select

output BUS_RD;// memory read

output BUS_WR;// memory write

inout [15:0] BUS_D; // data bus

// processor signals

input BUSCLK; //processor clock

input GSR; // reset signal to processor

reg [15:0] D_OUT;

reg [15:0] BUS_A;

reg RAM_CS, BUS_RD, BUS_WR;

reg [15:0] INSTR;

reg [15:0] REG_BANK [7:0];

reg [15:0] RD_A, ALUA, ALUB, S;

reg [3:0] CC ;

reg [15:14] C;

reg CSH;

wire [15:0] PC;


assign BUS_D = D_OUT;

assign PC = REG_BANK[7]; // PC is an alias for REG_BANK[7]

// Used for debug only - do not use PC in your code

always

begin

if (GSR == 0)

begin

// Fetch State

Memory_Read(REG_BANK[7], INSTR); // Get instr pointed to by PC

REG_BANK[7] = REG_BANK[7] + 1; // add +1 to PC as soon as instr fetched

// Execute State

if(INSTR[15:13] == 3'b111) // branch instruction

begin

//

// put your code here to form ALU inputs ALUA and ALUB

// note that the ALUA input is the output from the shifter

//

end

else if(INSTR[12] == 1'b1) // type 2 instruction

begin

//

// put your code here to form ALU inputs ALUA and ALUB for type 2 instrs

//


end

else // type 1 instruction

begin

//

// put your code here to form ALU inputs ALUA and ALUB

//

end

// op decode

case (INSTR[15:13])

0 : Add(ALUA, ALUB, 1'b0, S, C); //add instr done for you

//

// put your code here to form ALU result S and carry bits C14 and C15 if needed

//

endcase

//

// put your code here to update the condition code register.

//

// Write state

case (INSTR[15:13])

3'b111 : if (Testbranch(INSTR[11:8], CC) == 1) REG_BANK[7] = S;

3'b110 : if (INSTR[11] == 1) Memory_Write(S, REG_BANK[INSTR[10:8]]);

else

begin

Memory_Read(S, REG_BANK[INSTR[10:8]]);

REG_BANK[0] = 0;

end


default : begin REG_BANK[INSTR[10:8]] = S; REG_BANK[0] = 0; end

endcase

end

else // reset state

begin

RAM_CS = 1;

wait (GSR == 0)

begin

REG_BANK[7] = 0;

REG_BANK[0] = 0;

CC = 0;

end

end

end // end of always

// start of tasks and functions ////////////////////////////////////////////////////////////////////////////

task Memory_Write;

// writes data on DMW to memory address AMW

input [15:0] AMW, DMW;

begin

RAM_CS = 0; BUS_RD = 1; BUS_WR = 1; BUS_A = AMW; D_OUT = DMW ;

#25 BUS_WR = 0;

#50 BUS_WR = 1;

#25 RAM_CS = 1;


end

endtask

task Memory_Read;

//reads memory address AMR and places data on DMR

input [15:0] AMR;

output [15:0] DMR;

begin

RAM_CS = 0; BUS_RD = 1; BUS_WR = 1; BUS_A = AMR; D_OUT = 16'hzzzz ;

#25 BUS_RD = 0;

#50 DMR = BUS_D ;

#25 BUS_RD = 1; RAM_CS = 1;

end

endtask

task Add;

//adds a 1-bit carry in Cin to two 16-bit quantities A and B

// produces 16-bit sum S and carry out C from addition in bits 14 and 15

input [15:0] A, B;

input CIN ;

output [15:0] S; output [15:14] C;

reg [16:0] RESULT;


begin

RESULT = A[14:0] + B[14:0] + CIN ;

C[14] = RESULT[15];

RESULT[16:15] = A[15] + B[15] + C[14];

S = RESULT[15:0];

C[15] = RESULT[16];

end

endtask

task Shift;

//shifts input A and a carry in Cin according to shift type INSTR[1:0]

// produces shifter output ASH and shifter carry out CSH

input [15:0] A;

input [1:0] INSTR;

input CIN;

output [15:0] ASH;

output CSH;

begin

case (INSTR)

0 : begin CSH = 0; ASH = A ; end

1 : begin CSH = A[0]; ASH = A >> 1 ; ASH[15] = A[15] ; end

2 : begin CSH = A[0]; ASH = A >> 1 ; ASH[15] = A[0] ; end

3 : begin CSH = A[0]; ASH = A >> 1 ; ASH[15] = CIN ; end

endcase


end

endtask

function Testbranch;

//compares branch condition INSTR[11:8] with cond code reg

// returns ‘1’ if branch to be taken, returns ‘0’ if jump not taken

input [11:8] BRANCH_INSTR;

input [3:0] CC;

reg N, Z, V, C;

begin

{N,Z,V,C} = CC;

case (BRANCH_INSTR)

0 : Testbranch = 1 ;

1 : Testbranch = 0 ;

2 : Testbranch = ~(C | Z);

3 : Testbranch = C | Z ;

4 : Testbranch = ~C ;

5 : Testbranch = C ;

6 : Testbranch = ~Z ;

7 : Testbranch = Z ;

8 : Testbranch = ~V ;

9 : Testbranch = V ;

10 : Testbranch = ~N ;

11 : Testbranch = N ;

12 : Testbranch = V ~^ N ;


13 : Testbranch = V ^ N ;

14 : Testbranch =~((V ^ N) | Z) ;

15 : Testbranch = ((V ^ N) | Z) ;

endcase

end

endfunction

endmodule

Appendix B

Verilog Control Description

The following is very incomplete Verilog code for the RTL description of a non-pipelined control unit for the STUMP.

It is your task in exercise 5 to complete the code required.

// Stump Control unit

// Original:ADP 9/5/06

// Last modified:ADP 11/5/06

module control ( LDCC, LDREG, EXE, BR, FETCH, IMMED, MEMOE, MEMWR,

SALUD, SRPA, SRPB, SWP, CC, CLK, INSTR, RESET );

//--------Input ports--------

input CLK, RESET;

input [15:2] INSTR;

input [3:0] CC;

//-------Output ports--------

output BR, FETCH, EXE, IMMED, LDCC, LDREG, MEMOE, MEMWR, SALUD;

reg BR, FETCH, EXE, IMMED, LDCC, LDREG, MEMOE, MEMWR, SALUD;

output [2:0] SRPA, SRPB, SWP;

reg [2:0] SRPA, SRPB, SWP;

//--------Internals---------

reg [1:0] state;


// Control of finite state machine

always @ (posedge CLK)

begin

if (RESET == 1) state = 0;

else if (state == 3) state = 1;

else state = state + 1;

end

// Control of state driven combinatorial logic

always @ (state, INSTR, CC)

begin

case (state)

0: // Reset cycle

begin

// put your code here for signals set in reset phase

// put your code here for don’t care signals in the reset state

end

1: // Fetch cycle

// Fetches instruction from memory, loads into

// instruction register & increments program counter

begin

// put your code here for signals that need to be set in the fetch phase

// put your code here for don’t care signals in the fetch state

end

2: // Execute cycle

// Instruction is decoded and executed

// Instruction may be: load/store (Type I or Type II)

// or a Branch

// or an ALU operation (Type I or Type II)

begin

// you need to decode the instruction and then set signals appropriately

// to a value or don't care for that instruction

end

3: // Write back cycle

begin

// Instruction is decoded to determine whether

// the output of the ALU should be written back

// to memory or register bank or discarded and signals are set to

// a value or don't care appropriately

end

default: // All signals set to 'don't care'

begin

MEMWR = 'bx; MEMOE = 'bx;

FETCH = 'bx; EXE = 'bx; LDREG = 'bx;

LDCC = 'bx; IMMED = 'bx; BR = 'bx; SALUD = 'bx;

SRPA = 3'bxxx; SRPB = 3'bxxx; SWP = 3'bxxx;

end

endcase

end

function Testbranch; //returns ‘1’ if branch taken, ‘0’ otherwise

input [11:8] BRANCH_INSTR;

input [3:0] CC;

reg N, Z, V, C;


begin

{N,Z,V,C} = CC;

case (BRANCH_INSTR)

0 : Testbranch = 1 ; // BAL

1 : Testbranch = 0 ; // BNV

2 : Testbranch = ~(C | Z); // BHI

3 : Testbranch = C | Z ; // BLS

4 : Testbranch = ~C ; // BCC

5 : Testbranch = C ; // BCS

6 : Testbranch = ~Z ; // BNE

7 : Testbranch = Z ; // BEQ

8 : Testbranch = ~V ; // BVC

9 : Testbranch = V ; // BVS

10 : Testbranch = ~N ; // BPL

11 : Testbranch = N ; // BMI

12 : Testbranch = V ~^ N ; // BGE

13 : Testbranch = V ^ N ; // BLT

14 : Testbranch =~((V ^ N) | Z) ; // BGT

15 : Testbranch = ((V ^ N) | Z) ; // BLE

endcase

end

endfunction

endmodule
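The state sequencing of the control unit above (the first always block) can be traced with a few lines of Python. This sketch models only the supplied state register logic, not the control outputs that form the exercise; the function name next_state and the state names in NAMES are assumptions taken from the comments in the listing.

```python
def next_state(state, reset):
    """Next value of the 2-bit state register on a positive clock edge."""
    if reset:
        return 0                   # RESET forces the reset state
    if state == 3:
        return 1                   # write back is followed by the next fetch
    return state + 1               # reset -> fetch -> execute -> write back


NAMES = {0: "reset", 1: "fetch", 2: "execute", 3: "writeback"}

state, trace = 0, []
for reset in [1, 0, 0, 0, 0, 0, 0]:
    state = next_state(state, reset)
    trace.append(NAMES[state])
print(trace)
```

After reset is released the machine cycles through fetch, execute and write back indefinitely and never re-enters state 0 unless RESET is asserted again.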


Appendix C

STUMP Processor : RTL Design

This appendix describes the RTL (Register Transfer Level) design of the processor. It contains an RTL schematic of the processor showing its component parts. The Bus Interface and datapath components have already been designed for you. Your task in Ex. 5 is to complete the Verilog specification of the Control Block. The entire processor design will then be complete and can be simulated with the same test programs as were used for the top level design.

1.0 RTL Datapath Design:

Figure C.1 shows an RTL schematic of the STUMP processor. The address for the instruction fetch is kept in the Program Counter (Reg7). This is sent to memory when a fetch is performed. Instructions from memory are clocked into the Instruction Register (INSTR) at the end of the clock phase (on the positive clock edge); the Program Counter is also incremented at this time.

Instruction execution is split into two phases. In the execute phase, a register (Reg0 to Reg7) is read on each of ports A and B of the Register Bank. The port A data is optionally shifted to form the ALU A operand. The ALU B operand is either supplied by port B of the Register Bank or by the immediate data in the instruction register, which is sign extended to 16 bits. The ALU operation specified in the instruction is performed and its result is clocked into the Result Register (ALUR) at the end of the execute phase (on the positive clock edge). The condition bits are also clocked into the 4-bit Condition Code Register (CC) at this time if it is enabled.

The writeback phase completes the instruction execution. Usually the ALU result is written back to the Register Bank. Here, the register to be written is specified as a 3-bit address (Reg0 to Reg7) and this register is enabled by the Write signal (LDREG). Since Reg0 contains zero, writing to Reg0 has no effect. The write occurs at the end of the phase (on the positive clock edge). A branch instruction which is not taken can write to Reg0 instead of Reg7 (or can make the write enable to the Register Bank inactive).

Load and store instructions operate differently in their writeback phase. Here, the ALU result is used as the memory address, MA(15:0). For a load instruction, the memory is read (MEMOE = ‘0’ i.e. is active) and the memory output, MDIN(15:0), is placed in the specified register in the Register Bank. For a store instruction, the destination register specified in the instruction is read onto Port A and written to memory (MEMWR = ‘0’) at the address given by the ALU Result Register.

Notes: 1. The clock is applied to all registers at all times but the clock is ignored unless the register is enabled.


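[Figure C.1: RTL schematic. Data in from memory (MDIN) and the Result Register output (ALUR) feed a multiplexer controlled by SALUD which supplies the Register Bank write data (WR_D). The Register Bank is addressed by SRPA, SRPB and SWP, is write-enabled by LDREG, and has read ports A (A_RD, also the data out to memory) and B (B_RD), a PC output and an incPC input. A_RD passes through the SHIFTER to form ALUA. A multiplexer controlled by IMMED selects B_RD or the sign-extended immediate (XIMMED, from the INSTR REG via SIGN EXTEND, controlled by BR) to form ALUB. The ALU drives the RESULT REG (ALUR) and, through CCI, the CC REG (enabled by LDCC and EXE, output CC). A multiplexer controlled by FETCH selects PC or ALUR as the memory address MA. MEMOE and MEMWR go to memory.]

Figure C.1. Register Transfer Level Design of the STUMP Processor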

2. The control signals to the memory are all active low.

3. Signal names have been defined for the Control Block. Please also use these names in your Verilog code.

4. Although Verilog is case sensitive with regard to signal names, other tools in the flow are not. Hence, you should be consistent with the use of upper/lower case in your names and signal names should be unique.

2.0 Bus Interface Component:

All processor signals to/from memory or the outside world proceed via the Bus Interface Component. The Bus Interface signals to/from memory or other external devices are shown in Figure C.2 and are described below:

[Figure C.2: the Bus Interface block. Signals to/from the processor: A_RD<15:0>, MDIN<15:0>, MA<15:0>, MEMOE, MEMWR, CLK, RESET. Signals to/from memory: BUS_D<15:0>, BUS_A<15:0>, BUS_RD, BUS_WR, BUS_CLK, GSR.]

BUS INTERFACE SIGNALS TO/FROM MEMORY AND OUTSIDE WORLD

BUS_D(15:0) bidirectional bus between the Bus Interface and the memory

BUS_A(15:0) 16-bit address to memory

BUS_WR active low signal which writes to memory

BUS_RD active low signal which reads from memory

RAM_CS active low signal which enables the memory. It is tied low so the memory is enabled all the time.

BUSCLK clock generated by a Clock Module

GSR global reset

Apart from the clock, the signals above are generated either by the datapath elements or the Control Block. They are listed below:

DATAPATH SIGNALS TO BUS INTERFACE

MA(15:0) 16-bit address to Bus Interface

MDIN(15:0) 16-bits of data from Bus Interface

A_RD(15:0) 16-bits of data to Bus Interface

CONTROL BLOCK SIGNALS TO BUS INTERFACE

MEMWR active-low signal to Bus Interface when memory is to be written

MEMOE active-low signal to Bus Interface when memory is to be read

BUS INTERFACE SIGNALS TO DATAPATH & CONTROL BLOCK

CLK the clock signal which goes to all flip-flops in the processor.

RESET reset signal - high when active
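The signal lists above give the ports of the Bus Interface but not its internals. The sketch below is one possible reading, assuming a simple pass-through with a tri-state driver on BUS_D. Those assumptions are not from the manual: the module name is hypothetical, the write data is assumed to be driven onto BUS_D only while MEMWR is low, and GSR is assumed to have the same active-high polarity as RESET. RAM_CS is omitted because the text says only that it is tied low.

```verilog
// Hypothetical pass-through Bus Interface (a sketch, not the course design).
module bus_interface_sketch (
    // processor side
    input  wire [15:0] MA,      // address from the datapath
    input  wire [15:0] A_RD,    // write data from the datapath
    output wire [15:0] MDIN,    // read data to the datapath
    input  wire        MEMOE,   // active-low read request
    input  wire        MEMWR,   // active-low write request
    output wire        CLK,     // processor clock
    output wire        RESET,   // active-high processor reset
    // memory / outside-world side
    inout  wire [15:0] BUS_D,   // bidirectional data bus
    output wire [15:0] BUS_A,   // address to memory
    output wire        BUS_RD,  // active-low read strobe
    output wire        BUS_WR,  // active-low write strobe
    input  wire        BUS_CLK, // clock from the Clock Module
    input  wire        GSR      // global reset (polarity assumed active high)
);
    assign BUS_A  = MA;
    assign BUS_RD = MEMOE;
    assign BUS_WR = MEMWR;

    // Assumption: drive the bus only during a write, otherwise release it
    // so that the memory can drive read data.
    assign BUS_D  = (MEMWR == 1'b0) ? A_RD : 16'hzzzz;
    assign MDIN   = BUS_D;

    assign CLK    = BUS_CLK;
    assign RESET  = GSR;
endmodule
```

The tri-state assignment is what lets BUS_D be shared: with MEMWR high the interface presents high impedance, and MDIN simply follows whatever the memory places on the bus.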

3.0 Control Block Signals:

Apart from the RESET signal, which is a global signal applied to both the datapath and the Control Block before the processor starts operating, the signals to/from the Control Block are internal to the processor. They can be partitioned into signals to and from the datapath, and are summarised below:

DATAPATH SIGNALS TO CONTROL BLOCK

INSTR[15:2] the most significant 14 bits of the Instruction Register

CC[3:0] the 4-bit Condition Code Register

CONTROL BLOCK SIGNALS TO DATAPATH

BR indicates a branch instruction. It is used by the Sign Extender element: when BR is high the least significant 8 bits are sign-extended, and when BR is low the least significant 5 bits are sign-extended.

FETCH indicates an instruction fetch is occurring

EXE indicates the Execute phase of operation, i.e. data is read from the Register Bank, operated upon and the result placed in the Result Register.

SRPA[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port A of the Register Bank

SRPB[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port B of the Register Bank

SWP[2:0] 3-bit address specifying the register in the Register Bank (Reg0 to Reg7) to be written to

IMMED selects the sign-extended immediate operand rather than the Register Bank port B operand as the B-input to the ALU

LDCC enables the loading of the 4-bit Condition Code Register

LDREG enables writing to a register in the Register Bank

SALUD (Select ALU Data) selects data from the Result Register for writing into the Register Bank. If SALUD is low, data from the memory (MDIN[15:0]) is selected instead.
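The descriptions of BR, IMMED and SALUD above can be sketched in Verilog as follows. This is an illustration under assumptions rather than the reference design. The module name is hypothetical. The net names XIMMED, B_RD, ALUB, ALUR and WR_D are taken from the labels in Figure C.1, but their exact connectivity is assumed here, as is the use of the low-order bits of INSTR as the immediate field.

```verilog
// Hypothetical sketch of the sign extender and the two selection multiplexers.
module operand_select_sketch (
    input  wire [15:0] INSTR,   // Instruction Register contents
    input  wire [15:0] B_RD,    // Register Bank port B data
    input  wire [15:0] ALUR,    // Result Register contents
    input  wire [15:0] MDIN,    // data from memory
    input  wire        BR,      // high: branch, extend 8 bits; low: extend 5 bits
    input  wire        IMMED,   // high: use the immediate as the ALU B input
    input  wire        SALUD,   // high: write back Result Register; low: MDIN
    output wire [15:0] XIMMED,  // sign-extended immediate
    output wire [15:0] ALUB,    // B input of the ALU
    output wire [15:0] WR_D     // data written to the Register Bank
);
    // Replicate the sign bit of the selected field up to 16 bits.
    assign XIMMED = BR ? {{8{INSTR[7]}},  INSTR[7:0]}
                       : {{11{INSTR[4]}}, INSTR[4:0]};

    assign ALUB = IMMED ? XIMMED : B_RD;
    assign WR_D = SALUD ? ALUR   : MDIN;
endmodule
```

For example, with INSTR[7:0] = 8'h9A the sketch gives XIMMED = 16'hFF9A when BR is high (bit 7 is the sign bit) and XIMMED = 16'hFFFA when BR is low (bit 4 is the sign bit of the 5-bit field 5'b11010).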


COMP22111 lab - Ex. 1 Answer Sheets

Name:

Assembler Code Sequence :

Initial Register States:

R0[15:0]=’0000000000000000’=0x0000=0

R1[15:0]=

R2[15:0]=

R3[15:0]=

R4[15:0]=

R5[15:0]=

R6[15:0]=

R7[15:0]=

CC[3:0]=

Memory Address of first instruction in sequence=


Assembler Instruction:

Memory Address of Instruction (decimal):

                     | binary             | hex
Instruction Register |                    | 0x
R0                   |                    | 0x
R1                   |                    | 0x
R2                   |                    | 0x
R3                   |                    | 0x
R4                   |                    | 0x
R5                   |                    | 0x
R6                   |                    | 0x
R7 = PC              |                    | 0x
CC                   |                    | 0x
Result Register      |                    | 0x

For a Store Instruction:

Data Written         |                    | 0x
Memory Address (decimal):





Copy of Signal Usage Charts

(for your use in Exercise 5)

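Phase 1 : Instruction Fetch

Signal     | ALU Operation  | Branch         | Load           | Store          |
FETCH      |                |                |                |                |
EXE        |                |                |                |                |
LDREG      |                |                |                |                |
SRPA[2:0]  |                |                |                |                |
SRPB[2:0]  |                |                |                |                |
SWP[2:0]   |                |                |                |                |
BR         |                |                |                |                |
IMMED      |                |                |                |                |
LDCC       |                |                |                |                |
SALUD      |                |                |                |                |
MEMOE      |                |                |                |                |
MEMWR      |                |                |                |                |

Phase 2 : Decode/Execute

Signal     | ALU Operation                   | Branch         | Load                            | Store                           |
           | Reg op Reg     | Reg op Immed   | PC + Immed     | Reg + Reg      | Reg + Immed    | Reg + Reg      | Reg + Immed    |
           | ⇒ Reg          | ⇒ Reg          | ⇒ PC           | ⇒ Addr;        | ⇒ Addr;        | ⇒ Addr;        | ⇒ Addr;        |
           |                |                |                | [Addr] ⇒ Reg   | [Addr] ⇒ Reg   | Reg ⇒ [Addr]   | Reg ⇒ [Addr]   |
FETCH      |                |                |                |                |                |                |                |
EXE        |                |                |                |                |                |                |                |
LDREG      |                |                |                |                |                |                |                |
SRPA[2:0]  |                |                |                |                |                |                |                |
SRPB[2:0]  |                |                |                |                |                |                |                |
SWP[2:0]   |                |                |                |                |                |                |                |
BR         |                |                |                |                |                |                |                |
IMMED      |                |                |                |                |                |                |                |
LDCC       |                |                |                |                |                |                |                |
SALUD      |                |                |                |                |                |                |                |
MEMOE      |                |                |                |                |                |                |                |
MEMWR      |                |                |                |                |                |                |                |

Phase 3 : Writeback

Signal     | ALU Operation                   | Branch         | Load                            | Store                           |
           | Reg op Reg     | Reg op Immed   | PC + Immed     | Reg + Reg      | Reg + Immed    | Reg + Reg      | Reg + Immed    |
           | ⇒ Reg          | ⇒ Reg          | ⇒ PC           | ⇒ Addr;        | ⇒ Addr;        | ⇒ Addr;        | ⇒ Addr;        |
           |                |                |                | [Addr] ⇒ Reg   | [Addr] ⇒ Reg   | Reg ⇒ [Addr]   | Reg ⇒ [Addr]   |
FETCH      |                |                |                |                |                |                |                |
EXE        |                |                |                |                |                |                |                |
LDREG      |                |                |                |                |                |                |                |
SRPA[2:0]  |                |                |                |                |                |                |                |
SRPB[2:0]  |                |                |                |                |                |                |                |
SWP[2:0]   |                |                |                |                |                |                |                |
BR         |                |                |                |                |                |                |                |
IMMED      |                |                |                |                |                |                |                |
LDCC       |                |                |                |                |                |                |                |
SALUD      |                |                |                |                |                |                |                |
MEMOE      |                |                |                |                |                |                |                |
MEMWR      |                |                |                |                |                |                |                |

Note: in Signal Usage charts, use δ for ‘don’t care’.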

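COMP22111 - Ex. 4 Signal Usage Charts

Name:

Phase 1 : Instruction Fetch

Signal     | ALU Operation  | Branch         | Load           | Store          |
FETCH      |                |                |                |                |
EXE        |                |                |                |                |
LDREG      |                |                |                |                |
SRPA[2:0]  |                |                |                |                |
SRPB[2:0]  |                |                |                |                |
SWP[2:0]   |                |                |                |                |
BR         |                |                |                |                |
IMMED      |                |                |                |                |
LDCC       |                |                |                |                |
SALUD      |                |                |                |                |
MEMOE      |                |                |                |                |
MEMWR      |                |                |                |                |

Phase 2 : Decode/Execute

Signal     | ALU Operation                   | Branch         | Load                            | Store                           |
           | Reg op Reg     | Reg op Immed   | PC + Immed     | Reg + Reg      | Reg + Immed    | Reg + Reg      | Reg + Immed    |
           | ⇒ Reg          | ⇒ Reg          | ⇒ PC           | ⇒ Addr;        | ⇒ Addr;        | ⇒ Addr;        | ⇒ Addr;        |
           |                |                |                | [Addr] ⇒ Reg   | [Addr] ⇒ Reg   | Reg ⇒ [Addr]   | Reg ⇒ [Addr]   |
FETCH      |                |                |                |                |                |                |                |
EXE        |                |                |                |                |                |                |                |
LDREG      |                |                |                |                |                |                |                |
SRPA[2:0]  |                |                |                |                |                |                |                |
SRPB[2:0]  |                |                |                |                |                |                |                |
SWP[2:0]   |                |                |                |                |                |                |                |
BR         |                |                |                |                |                |                |                |
IMMED      |                |                |                |                |                |                |                |
LDCC       |                |                |                |                |                |                |                |
SALUD      |                |                |                |                |                |                |                |
MEMOE      |                |                |                |                |                |                |                |
MEMWR      |                |                |                |                |                |                |                |

Phase 3 : Writeback

Signal     | ALU Operation                   | Branch         | Load                            | Store                           |
           | Reg op Reg     | Reg op Immed   | PC + Immed     | Reg + Reg      | Reg + Immed    | Reg + Reg      | Reg + Immed    |
           | ⇒ Reg          | ⇒ Reg          | ⇒ PC           | ⇒ Addr;        | ⇒ Addr;        | ⇒ Addr;        | ⇒ Addr;        |
           |                |                |                | [Addr] ⇒ Reg   | [Addr] ⇒ Reg   | Reg ⇒ [Addr]   | Reg ⇒ [Addr]   |
FETCH      |                |                |                |                |                |                |                |
EXE        |                |                |                |                |                |                |                |
LDREG      |                |                |                |                |                |                |                |
SRPA[2:0]  |                |                |                |                |                |                |                |
SRPB[2:0]  |                |                |                |                |                |                |                |
SWP[2:0]   |                |                |                |                |                |                |                |
BR         |                |                |                |                |                |                |                |
IMMED      |                |                |                |                |                |                |                |
LDCC       |                |                |                |                |                |                |                |
SALUD      |                |                |                |                |                |                |                |
MEMOE      |                |                |                |                |                |                |                |
MEMWR      |                |                |                |                |                |                |                |

Note: in Signal Usage charts, use δ for ‘don’t care’.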