School of Computer Science
University of Manchester
September 2010
Laboratory Aims
• To learn about the process of designing on silicon by ‘doing’.
• To design a simple RISC processor, comprising the datapath and control, at all levels of the design hierarchy ranging from a high level specification down to the Register Transfer Level (RTL).
• To have exposure to industry standard CAD tools.
• To use assembler test programs which can be used to test all levels of the design.
• To simulate the complete processor at the different levels of the design hierarchy.
• To support the lectures with a practical example.
Learning Outcomes
After completing the laboratory a student will
• be able to specify functionality in the Verilog hardware description language
• gain experience of the different stages of the VLSI design process down to the RTL level.
• gain experience of composing and running test programs, and of checking their results
• be able to use the Cadence CAD tool to find hardware/software errors
• gain experience of the appropriate CAD tools available for use at the different stages of the design process.
VLSI Systems Design requires inspiration and imagination as well as a sound technical background. Most of the technical background can be imparted by means of lectures, but when it comes to design there is no substitute for experience. We believe that, in the words of Albert Camus, “You cannot create experience - you must undergo it”. The COMP22111 course has therefore been structured so that the technical background is covered in a taught course consisting of lectures, while the design process itself is taught by means of a design project in laboratory classes. In the laboratory course students design a small ASIC (Application Specific Integrated Circuit) which, if completed successfully, could be implemented on silicon. This experience should give some feeling for the trials, tribulations and satisfactions of designing systems on silicon.
The objective of the project is to learn some of the methodology of VLSI design by carrying out the design of a small RISC processor. It is also intended to help students appreciate and understand the operation and architecture of a RISC processor. The STUMP processor has been fully specified by D. A. Edwards and parts of it have been designed. Thus students should regard themselves as members of a design team whose job it is to do a significant part of the design; you will complete a partially done design and simulate the whole design.
In contrast to design work carried out in previous Computer Science laboratories, the work will not consist of designing gate-level circuits - the emphasis in this project is rather on “systems on silicon”. The work starts with the high level behavioural modelling of the chip and proceeds to Register Transfer Level, where again behavioural modelling is used. Tools are available to automatically translate an RTL description into logic and then to generate the layout of the chip on silicon; this will be described in lectures but not performed in the lab.
The methodology described in lectures and used in the lab is typical of that adopted by many designers in industry for ASICs since only a small proportion of the ASICs produced nowadays are designed using full-custom methods.
This manual includes basic information about the laboratory organisation (chapter 2), a description of the design process (chapter 3) and stage-by-stage details of the design tasks (chapter 4). Chapter 3 is intended to give a background description of the design process while chapter 4 describes how the process is applied to the specific design project being carried out in the laboratory. The specification of the STUMP processor is in Chapter 5. Chapter 6 gives information on how to program the processor.
The appendices contain information that you will need in undertaking the design tasks.
Appendix A contains a copy of the top level (algorithmic) Verilog behavioural description of the processor which is to be completed as part of the second exercise. Appendix B contains a (very) incomplete Verilog description of the Control for the STUMP at the RTL design level. Appendix C shows the RTL design of the STUMP datapath which has been done for you.
Answer Sheets which students fill in and hand in for laboratory exercises 1 and 4 are to be found following the appendices.
The emphasis of the manual is on “how to do it” and it does not attempt to give a comprehensive account of the many different facets of chip design. A fuller picture should emerge when the design work is taken together with the taught course material.
References:
For a description of the top down design approach see chapter 3 of this manual. For a text on top down design using Verilog see T. R. Padmanabhan and B. Bala Tripura Sundari, Design Through Verilog HDL, Wiley-IEEE Press, 2003.
Schedule:
The COMP22111 lab comprises eight 2-hour sessions in weeks 3 to 11, excluding reading week (week 6). There will be 3 lectures a week in weeks 1, 2 and 12, when there is no formal lab. The schedule of lectures and labs in 2010 is shown in Table 2.1.
TABLE 2.1. Schedule for Lectures and Lab in 2010

Semester   Lab in Tootill 1                  Lectures in 1.4        Lecture in 1.3
week       Tuesday 10:00-12:00               Tuesday 10:00-12:00    Thu 14:00-15:00
wk 1       -                                 Sept 28 (Lects 1, 2)   Sept 30 (Lect 3)
wk 2       -                                 Oct 5 (Lects 4, 5)     Oct 7 (Lect 6)
wk 3       Oct 12 (Ex 1)                     -                      Oct 14 (Lect 7)
wk 4       Oct 19 (Ex 2)                     -                      Oct 21 (Lect 8)
wk 5       Oct 26 (Ex 2)                     -                      Oct 28 (Lect 9)
wk 6       reading week                      reading week           reading week
wk 7       Nov 9 (Ex 3)                      -                      Nov 11 (Lect 10)
wk 8       Nov 16 (Ex 3)                     -                      Nov 18 (Lect 11)
wk 9       Nov 23 (Ex 4)                     -                      Nov 25 (Lect 12)
wk 10      Nov 30 (Ex 5)                     -                      Dec 2 (Lect 13)
wk 11      Dec 7 (Ex 6)                      -                      Dec 9 (Lect 14)
wk 12      Dec 14 (written work/code         Dec 14 (Lects 15, 16)  Dec 16 (Lect 17)
           hand-in deadline 12:00);
           Dec 15 (demo deadline 15:00)
The final deadline for handing in written work or code for marking is 12:00 on Tuesday December 14th in week 12. The final deadline for demonstrating work is on the afternoon of Wednesday December 15th. Students wishing to demonstrate work must put their names on a list between 14:00 and 15:00. Names will be taken randomly from the list and students given one opportunity to demonstrate their work. Note that no work will be accepted or demonstrated after the deadlines unless the student concerned has a lab mark of less than 40%.
The project:
The lab work consists of designing and testing a simple 16-bit RISC processor down to the Register Transfer Level.
Preparation:
Preparation outside the timetabled laboratory classes is necessary and expected.
Students who wish to make good progress in the laboratory time, when help is available, should not only read the relevant material for each week before coming to the lab but should also do further work on stages of the design outside this time. Remember, you are expected to spend the same amount of time on preparation as you spend in the scheduled lab time. In addition, the lab work and lectures are closely integrated, so important and useful information about lab exercises is given in lectures; attendance at lectures is therefore closely linked to good progress in the lab!
Deadlines:
The exercise is divided into a number of stages with deadlines as indicated in Table 2.2. The details of the deliverables for each stage are given in Chapter 4 of this manual. Due to the incremental nature of the laboratory, an extension system is not operated and you do not need to request an extension. However, to complete the project work, you should adhere to or be ahead of the deadlines given.
Marks:
This course has more lab sessions and fewer lectures than other courses, and the lab and exam marks are weighted accordingly in the overall mark. Students are expected to work individually and independently. Hence the mark awarded for any work resulting from collaborative efforts will be split equally amongst the contributors. As the COMP22111 lab forms a significant contribution to the overall course mark, it is in your interests to invest the time in obtaining a good lab mark!
TABLE 2.2. COMP22111 Schedule

Design level   Exercise  Design stage                             Sessions  Semester week(s)          Hand-in week            Max mark
Specification  -         Read Lab Manual, especially chapter 5    -         -                         -                       -
Programmer     Ex. 1     STUMP assembler                          1         wk 3, Oct 12              wk 3, Oct 12            15
Top Level      Ex. 2     Top level model in Verilog and entry     2         wks 4 & 5, Oct 19 & 26    wk 5, Oct 26            25
Top Level      Ex. 3     Simulation of top level model            2         wks 7 & 8, Nov 9 & 16     wk 8, Nov 16            20
RTL            Ex. 4     Signal usage charts                      1         wk 9, Nov 23              wk 9, Nov 23            10
RTL            Ex. 5     Verilog specification of control RTL     1         wk 10, Nov 30             wk 10, Nov 30           15
RTL            Ex. 6     Testing RTL design                       1         wk 11, Dec 7              wk 11, Dec 7            15
-              -         Deadline for written work/code hand-in   -         wk 12, Dec 14             wk 12, 12:00 Dec 14     -
-              -         Deadline to sign up for demo             -         wk 12, Dec 15             wk 12, 15:00 Dec 15     -
This chapter describes the sequence of steps and abstractions (levels of detail) that are used in transforming a circuit requirement into a silicon layout when a semi-custom integrated circuit is being designed. A semi-custom design is one in which libraries of pre-defined gates and logic components are provided by the circuit manufacturer; the silicon layout is then carried out using automated CAD tools.
Table 3.1 shows the large number of different representations involved in the design of a semi-custom ASIC. It will be seen that the representations are divided into three domains, behavioural, structural and physical, and into six levels. The design of very large chips may include an extra level; for example there could be a ‘subsystem’ level between the ‘top’ level and the ‘chip architecture’ level. The table indicates that the structural representation consists of a series of schematic diagrams. Although most engineers prefer to work with schematic diagrams, the structure could also be described using a hardware description language (HDL).
It is important to understand that this classification into different representations and levels of detail is not the same as circuit hierarchy. A circuit hierarchy represents a circuit decomposition into successive levels of detail, while Table 3.1 shows design decomposition into different levels of abstraction. The design of a small circuit with no hierarchy would still involve several levels of design abstraction.
The chip design process consists of creating a sequence of different abstractions at successively lower levels. It starts with a chip requirement and specification and proceeds until a representation of the masks required for silicon fabrication is obtained.
The solid arrows in Table 3.1 indicate the main design steps which will be described in this chapter. This sequence of steps constitutes a design methodology. The particular methodology shown is suitable for standard cell ASICs. Verification of each stage of the design is carried out by means of simulation. The test patterns needed as simulation stimuli are summarised in the rightmost column. Note how the same test patterns are used in successive levels to ensure the correct decomposition of one level to the next. The dashed arrows show where the test patterns and the behavioural models for each design stage come from.
The use of a pure top-down design methodology requires considerable experience if the effects of high-level decisions on performance and on lower-level implementation are to be anticipated. In practice it is common to carry out low-level feasibility studies before finalising high level specifications and descriptions. The COMP22111 design exercise will, nonetheless, be undertaken in a straightforward top-down manner.
TABLE 3.1. Design representations involved in the design of a semi-custom ASIC
(Each design level is described in the three design representation domains, behavioural, structural and physical, together with the tests used at that level.)

TOP LEVEL - components: whole chip
  Behavioural: written specification; executable behavioural description
  Structural: schematic shows core logic connected to input and output pads
  Physical: chip architecture shown as pads, core logic outline and power distribution
  Tests: input to behavioural model should reflect all possible system conditions -> whole chip test patterns

CHIP ARCHITECTURE - components: major functional blocks
  Behavioural: behaviour of functional blocks described in a HDL
  Structural: block diagram schematic of chip shows interconnectivity of functional blocks
  Physical: floorplan shows size and shape of rectangular blocks with routing channels
  Tests: test for each functional block + test for whole chip

REGISTER TRANSFER (RTL) - components: register, ALU, FSM, MUX, adder etc.
  Behavioural: behaviour of RTL components described in a HDL
  Structural: schematic diagrams of each functional block show interconnectivity of RTL components
  Physical: components represented as areas of standard cells or as blocks of special cells, e.g. RAM, PLA, datapath
  Tests: test for each RTL block + each functional block + whole chip

LOGIC - components: logic gates
  Behavioural: behaviour of gates as simulation models provided by the silicon vendor
  Structural: each RTL component is shown as a schematic of interconnected gates
  Physical: outline for each cell + interconnection tracks
  Tests: same test as for RTL

TRANSISTOR - components: transistors, e.g. CMOS
  Behavioural: electrical models, e.g. SPICE models, of transistors used by the silicon vendor
  Structural: circuit diagrams show transistors connected to form gates
  Physical: polygons represent mask shapes used for fabricating transistors and interconnect
  Tests: tests in form of analogue waveforms

PRODUCTION
  Physical: masks or reticles of pattern for each layer of fabrication
  Tests: test patterns designed to find structural production faults
A chip design starts with a set of requirements from which a specification is drawn up. The specification defines precisely what the chip does - its function - not how it does it. It is the user’s view of what the chip does.
In the real world the specification is needed to make sure that the designer and customer agree on the function of the chip, and to define the interaction, or interface, of the chip with the external system of which it forms a part. Cost and performance criteria are also a part of the requirements.
In an educational exercise there are no customer requirements to determine design constraints. The main constraint in the class context is for a design which can be completed within a limited time.
When deciding what to put in a specification and how to write it, it is useful to consider what information will be needed in the data sheet of the completed device, because the two are very similar. The main contents of a specification can be summarised as:
• a summary description of what a chip does
• a list of the chip’s input and output pins
• required performance (clock rate) and power dissipation
• a list of the major modes in which the chip operates
• for each mode:
  - signals which control the mode
  - function executed in that mode
  - performance constraints on execution, such as minimum and maximum times between inputs and outputs
The top-level behavioural description is written as an executable program using a suitable programming language. It provides a means of simulating the function of the chip and is much more precise than the specification which is written in English.
The program should accept inputs and compute the appropriate outputs, then wait for the next set of inputs. The way the program works does not describe how the chip itself works - all that matters is that it captures the intended function of the device and that running it checks that what is specified is what is wanted.
This high level simulation is an extremely useful step. It clarifies the specification, brings to light potential difficulties and hidden assumptions and helps identify the major internal states of the chip. It is not surprising that the functional simulation often leads to a revised specification.
Programming languages have been developed especially for the behavioural and structural modelling of integrated circuits; they are known as Hardware Description Languages, or HDLs. Two examples are VHDL and Verilog. Both are widely used standards supported by most CAD tools; Verilog is the one used in this course. Other general purpose languages, such as C, Java or C++, could also be used for the top-level behavioural modelling of a chip.
When Verilog is being used to model a whole chip, a common procedure is to connect it to a test bench as shown in Figure 3.1. The test bench circuit contains a model of the chip and a “Tester module”, which emulates the external environment of the chip. During a simulation the Tester module reads some form of input from an external file and extracts data to be applied as inputs to the chip model. It also captures the chip output data and writes it to an external file. The form of the external test file will depend on the type of chip being modelled. For example, if the chip is a processor, the test file could be the binary representation of a program to be executed by the processor.
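[Figure: the Verilog test bench circuit contains the chip model and the Tester module. The Tester module reads the test file, drives the chip inputs, receives the chip outputs and writes the output file.]
FIGURE 3.1. A Verilog Modelling System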
The form of the chip model will depend on the stage reached in the design. A purely behavioural model, which describes the function of the chip but not its internal structure, is used for the Top Level design stage. At lower levels of the design the model contains internal information usually in the form of a behavioural model describing the internal data flow and operations but it can also contain a structural (gate) description. The same test bench and test program should be used at all levels of the design to ensure that each level of the design decomposition carries out exactly the same function as the top level description.
Having decided on the overall chip function, the design is partitioned into major functional blocks. For example a processor chip might be partitioned into input and output interface blocks, a datapath, RAM, control etc. The means by which data is transferred between blocks must also be decided. This will usually be in the form of a bus structure.
The structure of the architecture level of design can be captured as a schematic diagram or described using a HDL. The design is then simulated using behavioural models of each of the functional blocks written using a HDL.
In a small design this level of design may well be omitted, or a simple block diagram might be produced as an intermediate stage but no simulation carried out.
A register-transfer system is specified as a set of memory elements (e.g. registers) and combinational logic functions between the memory elements, as shown in Figure 3.2.
The basic memory elements used in student designs are usually D-type edge triggered flip-flops. All operations in an RTL system take place between clocked registers.
[Figure: three stages in sequence, each a D-type register (D, Q, clk) whose Q outputs feed a block of combinational logic that drives the D inputs of the next register.]
FIGURE 3.2. A register-transfer system
On each active clock edge, data is clocked from the D inputs of the flip-flops (FFs) to the Q outputs, which form the inputs to the following block(s) of logic; see Figure 3.2. After a short delay the outputs of each combinational logic block change as a result of the change to the block inputs, and these new values are clocked into the next registers on the following active edge.
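The timing rule of a register-transfer system (every register samples its D input at the same active edge, using values computed from the Q outputs held before that edge) can be sketched in Python. The two-register example and the names `clock_edge`, `r1` and `r2` are hypothetical.

```python
from typing import Callable, Dict

Registers = Dict[str, int]


def clock_edge(regs: Registers, logic: Dict[str, Callable[[Registers], int]]) -> Registers:
    """One active clock edge of a register-transfer system.

    Every D input is computed from the current Q outputs first (the
    combinational logic has settled), then all registers load together.
    """
    d_inputs = {name: f(regs) for name, f in logic.items()}
    return d_inputs  # new Q values for every register


# Hypothetical system: r1 <- r1 + 1 and r2 <- 2 * r1 (8-bit registers).
logic = {
    "r1": lambda q: (q["r1"] + 1) & 0xFF,
    "r2": lambda q: (2 * q["r1"]) & 0xFF,
}
regs = {"r1": 0, "r2": 0}
history = []
for _ in range(3):
    regs = clock_edge(regs, logic)
    history.append(dict(regs))
print(history)
```

Note that `r2` always receives twice the value `r1` held before the edge; updating the registers one at a time would wrongly let `r2` see the new value of `r1`.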
The elements in the RTL design are usually represented as boxes, or blocks, in a block diagram which shows the interconnections between the blocks. The internal logic structure of the combinational logic is not defined at this stage but the function, or behaviour, is described as a model which can be used in a simulation of the RTL design.
Thus it can be seen that a register-transfer design gives a complete specification of what the chip will do on every clock cycle.
Students may already have come across most typical combinational RTL elements in earlier courses: adders, multiplexers, comparators, ALUs etc. In addition to these there will be ‘designer’ elements i.e. blocks of random logic designed to carry out arbitrary combinations of functions not included in standard libraries; the combinational logic block of a FSM (Finite State Machine) will be of this type.
Sequential elements consist of either straightforward storage registers - a set of D-type flip-flops for example, or more complex assemblages such as counters or state machines. Counters and FSMs contain combinational logic in addition to memory elements. Thus the separate combinational logic and register blocks of an RTL block diagram will not always be obvious because some of the RT structure is hidden within these more complex blocks. However each block in the diagram should only contain one register.
At Register Transfer Level the operation of the circuit is described as operations between clocked registers, where each clocking of registers corresponds to moving from one state to another. In the COMP22111 exercise, the RTL design of the datapath has been done for you and is shown in Appendix C.
If you need to do an RTL datapath design then one starting point is to summarise the design in a diagram which shows both the major states of the circuit and the operations. The ASM (Algorithmic State Machine) diagram is such a diagram. It is a form of flow chart in which states are represented by rectangular boxes and decisions by angled boxes, as shown in Figure 3.3. Note the “two-way” and “multi-way” decision boxes. The operations to be carried out within a state are written inside the state box.
Most first-time designers find it difficult to know where to start on decomposing a behavioural description of a chip into an RTL design and, having made a start, go through many cycles of trial and error before arriving at a satisfactory design. This is because, even in a small design, there are many different ways in which events can be scheduled and functions allocated to different blocks.
The following three-stage (and many-step) procedure can be used by first-time designers. Stages 1 and 2 are carried out on paper; the design is then transferred to the CAD system in Stage 3.
The circuit specification:
The circuit is a 4-bit triangle waveform generator. The output ramps from 0 to 15, then from 15 to 0, and so on. The waveform period depends on the clock period: it is 32 clock periods. The circuit has a RESET input and a 4-bit output.
The ASM diagram:
[Figure: ASM chart. A decision box tests reset (Yes: acc=0, inc=+1). Otherwise the chart branches on the value of acc: for acc = 15, inc=-1, acc=acc+inc, wave=acc; for acc = 1 to 14, acc=acc+inc, wave=acc; for acc = 0, inc=+1, acc=acc+inc, wave=acc. acc and wave are 4-bit variables; inc and reset are 1-bit variables; wave forms the output.]
FIGURE 3.3. An ASM Example
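A cycle-by-cycle model of the triangle waveform generator can be sketched in Python as below. It is written from the circuit specification rather than transcribed from Figure 3.3, and it assumes one particular reading of that specification: 0 and 15 are each output for two consecutive clock periods, which makes the period exactly 32 clock periods (16 rising, 16 falling). The class name and the `up` flag, which plays the role of `inc`, are illustrative.

```python
class TriangleGenerator:
    """Cycle-by-cycle model of the 4-bit triangle waveform generator.

    Assumption: 0 and 15 are each output for two consecutive clock periods,
    giving 16 clocks rising plus 16 falling, i.e. a period of 32 clocks.
    """

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.acc = 0    # 4-bit register; its value is the output (wave)
        self.up = True  # direction flag, playing the role of inc (+1 / -1)

    def clock(self, reset: bool = False) -> int:
        """Apply one active clock edge and return the output after the edge."""
        if reset:
            self.reset()
        elif self.up:
            if self.acc == 15:
                self.up = False   # turn round: 15 is output once more
            else:
                self.acc += 1
        else:
            if self.acc == 0:
                self.up = True    # turn round: 0 is output once more
            else:
                self.acc -= 1
        return self.acc


gen = TriangleGenerator()
wave = [gen.clock() for _ in range(32)]   # one full period after reset
print(wave)
```

Because the output is already 0 when reset is released, each 32-clock window read from the model starts at 1 and ends with the second 0.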
Experienced designers will spend time and effort optimising their designs for silicon area, performance etc. but a first-time designer will be happy with a completed design which works!
It will probably be helpful to think of the design as made up of three parts:
• memory storage - registers, RAM etc.
• datapath functions - e.g. logical and arithmetic functions
• control - a block which includes an FSM to control the state sequence of the circuit.
STAGE 1 - Preliminary design:
1. Draw an ASM diagram of the design.
2. Draw an outline block diagram including memory storage registers and datapath functions but omitting the control block, as follows:
- from the ASM diagram identify all the registers and memory needed
- select combinational functions to carry out the data operations
- draw in the connections (wires) needed to transfer data between blocks and add multiplexers where necessary
- check the block diagram against the ASM diagram.
3. List all the control signals which will be needed to control the operations of the blocks and orchestrate the clocking of registers. Also identify the signals needed as inputs to the control block.
4. Define the functions of the control block and extract a state transition graph for the FSM from the ASM diagram.
5. Complete the block diagram by adding the control block and the control signals and write out a detailed specification of the control functions.
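Steps 3 to 5 can be illustrated with the triangle waveform generator of Figure 3.3, split into a control FSM and a datapath (a 4-bit up/down counter holding `acc`). This is a hedged Python sketch: the state names `UP`/`DOWN`, the status signals `at_top`/`at_bottom` and the control signals `count_en`/`count_up` are hypothetical, and the transitions follow the reading of the specification in which 0 and 15 are each held for two clock periods.

```python
from typing import Dict, Tuple

# State transition table for the control FSM (Mealy outputs):
# (state, at_top, at_bottom) -> (next state, count_en, count_up)
TRANSITIONS: Dict[Tuple[str, bool, bool], Tuple[str, bool, bool]] = {
    ("UP", False, False): ("UP", True, True),       # ramping up
    ("UP", False, True): ("UP", True, True),        # leave 0 upwards
    ("UP", True, False): ("DOWN", False, False),    # hold 15 for one cycle
    ("DOWN", False, False): ("DOWN", True, False),  # ramping down
    ("DOWN", True, False): ("DOWN", True, False),   # leave 15 downwards
    ("DOWN", False, True): ("UP", False, True),     # hold 0 for one cycle
}


def step(state: str, acc: int) -> Tuple[str, int]:
    """One clock edge: the state register and the acc register load together."""
    at_top, at_bottom = acc == 15, acc == 0          # status signals into control
    next_state, count_en, count_up = TRANSITIONS[(state, at_top, at_bottom)]
    if count_en:                                     # datapath: up/down counter
        acc = (acc + 1 if count_up else acc - 1) & 0xF
    return next_state, acc


state, acc = "UP", 0          # state after reset
wave = []
for _ in range(32):
    state, acc = step(state, acc)
    wave.append(acc)
print(wave)
```

Keeping the transition table separate from the counter mirrors the split into control, datapath and storage: the table is what would be handed to a synthesis tool as the FSM, while the counter is a standard RTL element.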
STAGE 2 - Refining the design:
The design should now be checked, critically examined and revised:
1. Work through the design, comparing it with your top level model and ASM diagram to check for the correct sequence, the correct production of control signals and correct data operations.
2. Modify the design if necessary.
3. Examine the design to see if there are any obvious simplifications that can be made.
It will often be found that step (2) will have led to additions and modifications which are rather clumsy. A re-examination may show that a design revision will give a simpler solution.
4. Repeat steps (1) to (3) until satisfied with the design.
STAGE 3 - Verifying the design:
Before the design can be verified by simulation it must be entered into the CAD system.
1. The structure can be entered as a schematic block diagram. In this case, great care should be taken to avoid errors and inconsistencies in the labelling of pins and bus signals. Careless labelling can make nonsense of simulations. Alternatively, the entire design can be entered as a HDL description.
2. If the design is entered as a block diagram then the functional descriptions of the blocks must be entered using a HDL i.e. Verilog in the present exercise. Models will already exist for library blocks.
3. The behavioural/functional models of each of the RTL blocks are tested for correct functionality by simulation. A set of test patterns will be needed, (see below).
4. When the functional models of all the blocks have been verified the whole design is simulated using the same chip test that was used for the top level behavioural simulation.
5. Corrections are made if needed and the design is re-simulated until correct outputs are obtained.
Whole chip simulation:
The same test program and test bench that were used for the Top Level behavioural simulation are used for the RTL simulation, but with one important difference - a clock signal will now be used by the chip model. There was no need for a clock signal at the Top Level because the Register Transfer structure had not then been defined. Note that the chip model now consists of a structural description, derived from the schematic block diagram, and behavioural models of each of the individual blocks in the schematic diagram.
The output file which is produced by the RTL simulation should be compared with, and agree exactly with, the output file which was produced by the Top Level simulation.
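Checking that the two output files agree exactly is easy to automate, and reporting the first differing line points at the step where the RTL design first diverges from the Top Level model. A minimal Python sketch follows; the function name and the trace-line format in the example are assumptions.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

Mismatch = Tuple[int, Optional[str], Optional[str]]


def first_mismatch(top_level: Iterable[str], rtl: Iterable[str]) -> Optional[Mismatch]:
    """Compare two simulation output files line by line.

    Returns None if they agree exactly, otherwise (line number, top-level
    line, RTL line) for the first difference; a missing line is given as None.
    """
    for n, (expected, actual) in enumerate(zip_longest(top_level, rtl), start=1):
        if expected != actual:
            return n, expected, actual
    return None


# Hypothetical trace lines; open text files can be passed instead of lists,
# since iterating over a text file yields its lines.
print(first_mismatch(["PC=0000 R1=0001\n", "PC=0001 R1=0002\n"],
                     ["PC=0000 R1=0001\n", "PC=0001 R1=0003\n"]))
```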
Testing individual blocks within the RTL design:
In a large design, it is important to test the behavioural models that have been written for each of the functional blocks before testing the whole design. A test bench is now needed to supply the test input stimuli which emulate the signals that the block will “see” when embedded in the whole circuit. The test input stimuli consist of a set of test vectors representing the input signals to the block under test for each clock cycle. The test vectors also include the expected output signals for each clock cycle.
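Test vectors of this kind (inputs plus expected outputs for each clock cycle) can be sketched in Python as follows. The block under test, a 4-bit register with a load enable, and the names `Vector`, `LoadRegister` and `run_vectors` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vector:
    """One clock cycle of stimulus plus the expected response."""
    load: int        # input: load enable
    d: int           # input: 4-bit data
    expected_q: int  # expected output after the active clock edge


class LoadRegister:
    """Behavioural model of a hypothetical 4-bit register with load enable."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, load: int, d: int) -> int:
        if load:
            self.q = d & 0xF
        return self.q


def run_vectors(block: LoadRegister, vectors: List[Vector]) -> List[int]:
    """Apply one vector per clock cycle and return the cycles whose output differs."""
    failures = []
    for cycle, v in enumerate(vectors):
        if block.clock(v.load, v.d) != v.expected_q:
            failures.append(cycle)
    return failures


vectors = [Vector(1, 0x5, 0x5), Vector(0, 0xA, 0x5), Vector(1, 0xA, 0xA)]
print(run_vectors(LoadRegister(), vectors))  # an empty list means every cycle matched
```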
In a simple design, the individual testing of blocks can usually be omitted. Thus in the COMP22111 design exercise, it should only be necessary to test the complete design.
It is important during the design of a chip to consider the ease with which it can be tested after fabrication to find manufacturing faults. A common method of ensuring testability is the use of scan paths. A scan path is made by using registers and flip-flops which can be re-configured in test-mode to act in a ‘serial-in serial-out’ mode. They can then be connected in long chains into which a test pattern can be shifted from a test pin or pins (refer to course notes for details). Scan path testability can be incorporated at the RTL stage by selecting library registers, counters etc. which are configurable as scan registers and by using FFs with multiplexed inputs.
The inclusion of a scan path adds more complexity to the design and will be omitted in the STUMP design. Thus, the elements used in the COMP22111 project are not configured for scan path operation. However, students who do final year chip design projects are expected to include scan paths.
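The serial-in serial-out behaviour of a scan path can be sketched in Python: a test pattern is shifted in, one functional clock captures the response of the logic, and the result is shifted out for comparison. The chain length, the pattern and the “logic under test” (a bitwise inversion) are hypothetical.

```python
from typing import List, Optional, Sequence


class ScanChain:
    """Flip-flops with multiplexed inputs: in normal mode each flip-flop loads
    its functional D input; in test mode they form one serial-in serial-out
    shift register."""

    def __init__(self, length: int) -> None:
        self.ffs: List[int] = [0] * length

    def clock(self, test_mode: bool, scan_in: int = 0,
              d_inputs: Optional[Sequence[int]] = None) -> int:
        """Apply one clock edge; return the scan-out bit present before the edge."""
        scan_out = self.ffs[-1]
        if test_mode:
            self.ffs = [scan_in] + self.ffs[:-1]   # shift one place along the chain
        else:
            if d_inputs is None:
                raise ValueError("normal mode needs the functional D inputs")
            self.ffs = list(d_inputs)              # normal (functional) capture
        return scan_out


chain = ScanChain(4)
pattern = [1, 0, 1, 1]
for bit in reversed(pattern):                 # the first bit shifted in travels furthest
    chain.clock(test_mode=True, scan_in=bit)
# One functional clock; the hypothetical logic under test inverts every bit.
chain.clock(test_mode=False, d_inputs=[1 - b for b in chain.ffs])
response = [chain.clock(test_mode=True) for _ in range(4)]  # shift the result out
print(response)
```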
In the first year students learn how to design logic circuits using basic gates and flip-flops. For large designs of tens or hundreds of thousands of gates, this approach is too slow. Nowadays engineers use a number of different CAD tools to create gate level circuits automatically from RTL designs, so that whole ASICs can be designed without any “by hand” gate level design.
The methods which are commonly used for the design of standard cell ASICs are summarised below.
1. Use of library components: Many widely used RTL elements, e.g. registers, multiplexers, adders, can be pre-designed and stored as library components.
2. Logic synthesis: Automatic synthesis tools can be used to create gate-level logic designs from internal behavioural descriptions. Tools for synthesising combinational logic and FSMs are well established and widely available. Tools for synthesising whole RTL designs are also available and are now sophisticated enough to optimise for performance, power or area. However, this sophistication requires user interaction and usually design iteration.
3. Logic block compilers: Compilers are used to generate blocks which have some form of regular geometry. Most ASIC vendors supply compilers for ROM, RAM, PLAs and datapaths.
In COMP22111, library components have been used to define the datapath and a logic synthesis tool is used for the control block.
It is sensible to arrange that the behavioural descriptions of RTL logic elements which have already been written for the RTL design can also be used as inputs to the synthesis tools. In the processor design the Verilog program describing the control block is used as the input for the synthesis software.
The logic-level design is seen to be an almost automatic decomposition from the RTL design. After decomposition to logic, the whole design is then re-simulated using the same chip test that was used for the Top Level and the RTL simulations. There may be a few problems at this stage because:
1. Synthesis tools do not always do the ‘sensible’ thing and may misinterpret a description which was adequate for RTL simulation but not sufficiently specified for the unambiguous decomposition to gate level.
2. Simulations at higher levels do not take any account of gate delays. The gate level simulation models do include information about the delay characteristics of the gates, and the simulation results show gate delays; the logic simulator can also make worst case predictions of the effect of the wiring between gates (but the actual wiring delays cannot be known until after layout). The simulator results may show that some delays are unacceptable or that the active clock edges occur too close to data transitions.
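The problem of active clock edges occurring too close to data transitions is usually quantified as setup slack. The sketch below, in Python, uses the standard single-path setup relation with hypothetical delay values in picoseconds; hold-time checks are not shown.

```python
def setup_slack_ps(clock_period: int, clk_to_q: int, logic_delay: int,
                   wire_delay: int, setup_time: int) -> int:
    """Setup slack (ps) of one register-to-register path.

    Data launched at one active edge must reach the next register at least
    setup_time before the following edge; negative slack is a violation.
    """
    arrival = clk_to_q + logic_delay + wire_delay   # wire_delay: worst-case estimate before layout
    required = clock_period - setup_time
    return required - arrival


# Hypothetical path delays for a 100 MHz clock (10 000 ps period).
print(setup_slack_ps(10_000, 500, 6_000, 1_000, 300))  # 2200: the path meets timing
print(setup_slack_ps(10_000, 500, 9_000, 1_000, 300))  # -800: data arrives too late
```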
Tests of individual blocks may be needed in addition to the whole chip simulation in order to sort out problems. RTL block test patterns are needed for these logic level simulations.
Although the design is normally targeted at semi-custom chip design, it can also be aimed at an FPGA at this point in the design. This is because the design process down to the end of the logic design stage is independent of the medium it is implemented on.
An FPGA consists of preformed silicon comprising functional logic blocks and interconnections which are programmable. Here, the logic needs to be mapped onto the logic blocks of the particular FPGA and these are then placed and routed. This can all be done automatically by CAD tools. The design can then be downloaded onto the FPGA, again using appropriate available tools. To check the operation of the downloaded design, a test program is run. This should be the same as that used in the top level behavioural simulation.
Unlike semi-custom design, design errors are not fatal: they only require the design to be amended at the highest level affected and the design process repeated from there, followed by downloading and testing of the updated design!
For a semi-custom design, there are still a number of design stages following the logic design which need to be performed. These are described in the remaining sections of this chapter.
Layout is the process of placing geometrical representations of gates on the surface representing the chip and interconnecting them with tracks. When a semi-custom chip is being designed layout is carried out using automatic ‘Place and Route’ CAD tools.
Each gate is represented as a rectangular shape of a standard height on a chip which uses the standard cell architecture. The internal representation of each gate as a set of polygons is added at a later stage by the manufacturer, before the masks for fabrication are made. The cells are abutted together in rows, with channels between the rows for routing the interconnections (Figure 3.4).
FIGURE 3.4. Some rows of a standard cell layout
The layout of a small chip will consist of a single rectangle containing a number of rows and channels of the same length, but a large chip needs a floorplan. A floorplan subdivides the total surface of the chip into separate areas for the placement of the different functional blocks and for the routing of signals between the blocks.
Although CAD tools can assist in creating a floorplan, floorplanning is a difficult process to automate. The objective is usually a layout with the smallest possible area that still maintains signal integrity.
The Place and Route procedure for a standard cell chip consists of carrying out a sequence of separate steps. First, I/O pads must be added to the top-level schematic.
The circuit description will usually be held in the CAD database in a hierarchical format but the layout tools need a flattened description containing every instance to be used in the layout. The next step is therefore to ‘flatten the netlist’. Further steps define floorplan areas, assign cells to rows and carry out local channel routing and global routing.
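The following Python sketch illustrates what "flattening the netlist" means. It expands a hierarchical description into a flat list of leaf cell instances, each with a hierarchical path name. The netlist representation, the cell names (ADDER2, FULL_ADDER, XOR2, ...) and the dotted naming scheme are all hypothetical and are not the Cadence database format. Python is used only so that the idea can be run and checked quickly.

```python
from typing import Dict, List, Tuple

# Hypothetical toy netlist: cell name -> {instance name: master cell}.
# A master that is not itself a key of the netlist is a leaf (library) gate.
Netlist = Dict[str, Dict[str, str]]


def flatten(netlist: Netlist, cell: str, prefix: str = "") -> List[Tuple[str, str]]:
    """Return (hierarchical path, leaf gate type) for every leaf instance under `cell`."""
    leaves: List[Tuple[str, str]] = []
    for inst, master in netlist[cell].items():
        path = prefix + inst
        if master in netlist:
            # Hierarchical instance: descend and extend the path prefix.
            leaves.extend(flatten(netlist, master, path + "."))
        else:
            leaves.append((path, master))
    return leaves


example = {
    "ADDER2": {"fa0": "FULL_ADDER", "fa1": "FULL_ADDER"},
    "FULL_ADDER": {"x1": "XOR2", "x2": "XOR2", "a1": "AND2", "a2": "AND2", "o1": "OR2"},
}
flat = flatten(example, "ADDER2")
print(len(flat), flat[0])  # 10 ('fa0.x1', 'XOR2')
```

A real flattener must also rename the nets inside each instance so that they stay unique across the whole chip; this sketch tracks instances only.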
The layout stage is not the end of the story for the ASIC designer. Having obtained a layout, a ‘Design Rule Check’ (DRC) is carried out to ensure that none of the fabrication process design rules are broken. If the software is well designed and bug-free there should be no errors at this stage; regrettably, it is sometimes necessary to make a few edits to the layout by hand. When the DRC passes, a ‘Layout versus Schematic’ (LVS) check is performed. This checks that every feature extracted from the layout appears on the schematics generated from the logic. Any mismatches need to be investigated and fixed until the components in the logic correspond exactly with the layout features.
When this check passes, the next step is to use a program to calculate (extract) the parasitic capacitances of all the interconnection tracks. The whole chip is then re-simulated and the effects of the extra track capacitances are included in the delay calculations in order to get a fairly accurate estimation of performance.
Further testing is normally done to ensure that the design functions correctly despite the maximum allowable variations in transistor characteristics and in environmental parameters (temperature, voltage etc.). This is referred to as testing in the ‘corners’.
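The corners are the combinations of the extreme values of each parameter that varies. The minimal Python sketch below shows how such a list can be enumerated, assuming three axes (process, supply voltage and temperature). The axis values are made up for illustration and are not the figures for the process used in this laboratory.

```python
from itertools import product

# Illustrative corner axes only; real corners come from the foundry's process
# data and the chip's operating specification.
PROCESS = ("slow", "typical", "fast")   # transistor characteristics
SUPPLY_V = (1.62, 1.8, 1.98)            # hypothetical nominal supply +/-10%
TEMPERATURE_C = (-40, 25, 125)          # hypothetical operating range


def extremes(axis):
    """Keep only the two ends of an axis; the corners use its extreme values."""
    return (axis[0], axis[-1])


def corners():
    """Every combination of extreme process, voltage and temperature."""
    return list(product(extremes(PROCESS), extremes(SUPPLY_V), extremes(TEMPERATURE_C)))


for corner in corners():
    print(corner)  # 8 combinations, e.g. ('slow', 1.62, 125)
```

The typical values are not corners. They are used separately to check the performance/power/area specification.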
When the designer is satisfied that the design functions correctly under all conditions and meets its performance/power/area specification under typical conditions, a final ‘Design Rule Check’ (DRC) and ‘Layout versus Schematic’ (LVS) check are carried out. Once these have been done and have passed, the chip design files can be shipped to the manufacturer for fabrication.
Although the main design task is now complete, the designer has more work to do:
1. A ‘bonding diagram’ showing how the chip is to be packaged must be sent to the manufacturer. The bonding diagram shows the connections between the bond pads on the chip and the ‘lead frame’ pins of the package.
2. A set of test vectors for the testing of the chip after fabrication must be provided.
This should be the same test program as used in the top level behavioural simulation.
Aim: to familiarise you with the STUMP assembler and to gain an understanding of the programmer’s top-level view of the STUMP operation.
Hand in: - the completed charts for Ex. 1 to be found at the back of the manual.
Read: Chapters 5 and 6.
Sessions: 1
Assessment: 15 marks
Learning Outcomes: understanding of operation of assembler code and practice in relating it to machine behaviour, practice at handling binary, hex and decimal quantities.
You will be given a sheet with a few consecutive lines of assembler code together with the initial state of the Register Bank and Condition Code register prior to executing the code. For each instruction fill in the sheets to specify the register state after executing the instruction. Hand in your sheets for marking on completion.
To help you, note that
1. The Program Counter is incremented directly after fetching an instruction and before the instruction is executed.
2. The Result Register holds the result computed by the ALU.
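When filling in the charts, it helps to see each 16-bit quantity in all three bases at once. The Python helper below is hypothetical and not part of the laboratory software. It formats a value as a 16-bit two's complement word in hex, binary, unsigned decimal and signed decimal. It also shows the effect of the first note: because the Program Counter is incremented straight after the fetch, an instruction fetched from address 0x0007 sees R7 = 0x0008 while it executes.

```python
def show16(value: int) -> str:
    """Format a value as a 16-bit word: hex, binary, unsigned and signed decimal."""
    word = value & 0xFFFF                               # keep only 16 bits
    signed = word - 0x10000 if word & 0x8000 else word  # two's complement reading
    return f"0x{word:04X}  {word:016b}  {word:5d}  {signed:6d}"


pc = 0x0007              # address the instruction was fetched from
pc = (pc + 1) & 0xFFFF   # incremented before the instruction executes
print(show16(pc))        # fields: 0x0008, 0000000000001000, 8, 8
print(show16(-3))        # fields: 0xFFFD, 1111111111111101, 65533, -3
```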
Aim: to complete the Verilog model of the algorithmic view of the processor chip and to enter it into the Cadence CAD system.
Hand in: - a listing of the Verilog code for the top level model
Read: Chapter 3 (Section 3.3) and Chapter 5
References:
T. R. Padmanabhan and B. Bala Tripura Sundari, Design Through Verilog HDL, Wiley-IEEE Press, 2003.
M. D. Ciletti, Advanced Digital Design with Verilog HDL, Pearson 2002.
J. Bhasker, A Verilog HDL Primer, 2nd ed., Star Galaxy Press, 1999.
Sessions: 2
Assessment: 25 marks in total will be allocated to the Verilog code and syntactic errors will to a large extent be ignored.
Learning Outcomes: familiarity with Verilog description of a specification, experience of writing Verilog.
You may not have met the hardware description language Verilog before. However, the examples and templates provided should be a sufficient guide to what is needed to complete the exercises, and demonstrators will be able to give help and advice.
The amount of Verilog code to be written is small: about one page without comment lines. To write this code, and to attempt the other exercises well, you need a thorough understanding of the processor specification (Chapter 5). Remember that the high level behavioural model of a chip is in effect an executable specification: it is the user’s view of the chip. It is very important to get it right because it is what will finally be made; all the lower levels of design are tested against this specification.
Your first task is to complete the Verilog behavioural model of the STUMP processor chip. Most of the model is provided; a listing of the code is given in Appendix A of this manual. It includes all the functions and tasks that you will need, and the main program includes reset, instruction fetch and instruction decoding. The part left for you to do is the execution phase of each instruction, i.e. reading the Register Bank, setting up the operands to the ALU, executing the instruction in the ALU and setting the condition codes in the Condition Code Register. The writeback phase following execution has been done for you. You can complete the program using case and if constructs together with variable assignments and task and function calls. You will find examples of all the syntax needed in other parts of the program.
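Before the Register Bank can be read, the register numbers and the immediate value must be extracted from the instruction word. The Python sketch below mirrors the Verilog part-selects you will need. It assumes the Type 1 and Type 2 field positions given in the instruction formats of Chapter 5: INSTR in bits 15-13, the format bit in bit 12, LD/CC in bit 11, DST in 10-8, SRC A in 7-5, then either SRC B and SHIFT or a 5-bit immediate. The function and field names are hypothetical, and branches (Type 3) are not handled.

```python
def decode(instr: int) -> dict:
    """Split a Type 1 or Type 2 instruction word into named fields."""
    instr &= 0xFFFF
    fields = {
        "opcode": (instr >> 13) & 0b111,  # bits 15-13: INSTR (code from Table 5.2)
        "type2": (instr >> 12) & 1,       # bit 12: 0 = Type 1, 1 = Type 2
        "ld_cc": (instr >> 11) & 1,       # bit 11: LD/CC field
        "dst": (instr >> 8) & 0b111,      # bits 10-8: destination register
        "src_a": (instr >> 5) & 0b111,    # bits 7-5: source register A
    }
    if fields["type2"]:
        fields["imm5"] = instr & 0b11111        # bits 4-0, not yet sign-extended
    else:
        fields["src_b"] = (instr >> 2) & 0b111  # bits 4-2: source register B
        fields["shift"] = instr & 0b11          # bits 1-0: shift field
    return fields


print(decode(0x4B28))
# {'opcode': 2, 'type2': 0, 'ld_cc': 1, 'dst': 3, 'src_a': 1, 'src_b': 2, 'shift': 0}
```

Opcode 2 (binary 010) is SUB in Table 5.2. In the Verilog model, the same choice between SRC B and the immediate becomes an if on bit 12, and the operation becomes a case on bits 15-13.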
The parts of the Cadence CAD system needed to complete the exercise are described below:
Accessing and Modifying the Top Level Model
The Cadence CAD system is used in this laboratory for the design work. Create the COMP22111 Cadence directory structure by typing mk_cadence 22111; this should only be done once. Thereafter, start a Cadence session by typing start_cadence 22111.
Eventually, an icds window opens. Choose File->Open to bring up an Open File window. In the Open File window set the Library Name to comp22111, Cell Name to processor and View Name to algorithmic, and then click OK. This brings up a verilog.v window containing the incomplete Verilog code describing the top level of the STUMP. Type your code into this window and then save it using File->Save from the window’s toolbar.
Print out your code from the verilog.v window toolbar using File->Print. This brings up a Printer window. Enter lpr -Pugpr3 and click on Print. Dismiss the verilog.v file using File->Exit. If the HDL Parser Errors/Warnings window comes up, click No.
To exit from Cadence, click on File in the icds window and then select Exit. In the Exit icds? window that this brings up, click yes.
Aim: To use the test programs provided to simulate and debug your top level behavioural model of the STUMP.
Demonstrate: that the test programs work correctly on your Verilog model.
Read: Chapter 6
Sessions: 2
Assessment: This exercise is assessed in the lab and is worth 20 marks, with 4 marks awarded for passing the Register Bank tests, 6 marks for the ALU tests, 3 marks for the shifter tests and 7 marks for the branch tests.
Learning Outcomes: how a test strategy evolves for complex hardware, how this translates to test programs, experience of using CAD tools to control and simulate a design using a test bench and experience in debugging hardware specifications.
Parsing (syntactic analysis)
First, the design must be parsed to check that it satisfies the Verilog syntax. Start Cadence (start_cadence 22111). This brings up the icds window. Choose File->Open to bring up an Open File window. In the Open File window set the Library Name to comp22111, Cell Name to processor and View Name to algorithmic, and then click OK. This brings up a verilog.v window containing the Verilog code. You need to perform at least one edit on it and then save it. Now, in the verilog.v window, select File->Exit.
The edit operation causes all the Verilog code to be checked for syntax errors on Exit. If there are errors, an HDL Parser Error/Warnings window comes up telling you that parsing of the Verilog file failed. A failed check indicates syntax errors; by clicking Yes in the HDL Parser Error/Warnings window you can inspect the error report for an indication of where the errors are, and the verilog.v window will reopen. Correct any errors in the top level description, save it and then exit to recheck for syntax errors. Repeat this until the code passes the checks.
Test Files
In the COMP22111 directory you will find 4 test files (test1.s to test4.s) written in STUMP assembly code. These four tests provide a fairly good test of most of the STUMP and are used to test the STUMP at all stages of the design, from the top level to layout. The tests would also be used to test the fabricated design. The tests are incremental, i.e. test 2 assumes that test 1 works, test 3 relies on tests 1 and 2, and so on. The tests start at line 0 and all write their results back to memory starting at line 0, thus overwriting the program! Test 1 is a basic test which checks that the internal buses are connected correctly, that the Register Bank can be correctly addressed, that instructions can be fetched, and that data can be written back to memory (for checking). If test 1 does not work, something fundamental is wrong and this should be fixed before running any other tests.
Test 2 checks that the ALU operates correctly for various data combinations and for different logical and arithmetic operations. It checks only the ALU and does not use the shifter. It aims to identify any signals in the fabricated ALU which are unable to change state (because they are stuck at ‘1’ or ‘0’) and to pinpoint any adjacent signals (bits i and i+1) which are shorted together.
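The reasoning behind the choice of data in a test like test 2 can be checked mechanically. The Python sketch below is a generic illustration, not a description of how test2.s is actually written. It checks two things about a set of 16-bit patterns. First, every bit must be driven to both ‘0’ and ‘1’, which is needed to expose a stuck-at fault. Second, every neighbouring pair of bits must be driven to opposite values by some pattern, which is needed to expose a short between bits i and i+1.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def covers_stuck_at(patterns) -> bool:
    """True if every bit is driven to both '0' and '1' by some pattern."""
    seen_one = seen_zero = 0
    for p in patterns:
        seen_one |= p & MASK
        seen_zero |= ~p & MASK
    return seen_one == MASK and seen_zero == MASK


def covers_adjacent_shorts(patterns) -> bool:
    """True if every pair (i, i+1) is driven to opposite values by some pattern."""
    pair_mask = MASK >> 1  # bit i stands for the pair (i, i+1), i = 0..14
    differing = 0
    for p in patterns:
        differing |= (p ^ (p >> 1)) & pair_mask
    return differing == pair_mask


print(covers_stuck_at([0x0000, 0xFFFF]), covers_adjacent_shorts([0x0000, 0xFFFF]))  # True False
print(covers_stuck_at([0x5555, 0xAAAA]), covers_adjacent_shorts([0x5555, 0xAAAA]))  # True True
```

All-zeros and all-ones catch every stuck-at fault but no shorts between neighbours, because adjacent bits always carry the same value. Alternating patterns such as 0x5555 and 0xAAAA catch both kinds of fault.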
Test 3 is a rudimentary program which checks the different shifting operations, and test 4 checks the branch operations. As these are the test programs used throughout this laboratory, you are advised to peruse them carefully. Furthermore, you will be using them to debug your design so familiarity will certainly be necessary if the test programs indicate any errors in your design.
The assembler is fully described in Chapter 6, and instructions for converting the assembler programs into a format suitable for the processor memory are given in Section 6.2.1. They are repeated here for convenience. In a shell window, change to the COMP22111 directory using cd $HOME/Cadence/COMP22111. Then type sasm <filename.s> to create 3 files: the binary is in <filename.bin>, while hex versions are in <filename.hex> and <filename.mem>. The file for the processor memory is called xc4000mem.ram and is created by typing loadmem.sh <filename.mem> in the terminal window; this creates the file in the $HOME/Cadence/COMP22111/test_bench directory.
Waveform Viewer
In the icds window, select Tools->Verilog Integration->NC-Verilog. This brings up a Virtuoso Verilog Environment etc. window. Fill in its fields with test_bench for the Run Directory, and in the Top Level Design section enter comp22111 for Library, processor_test_bench for Cell and schematic for View. Click on the top left icon (of the running man) to initialise the simulator. When the icds window shows that initialisation is complete, click on the second icon down on the left (of three separate ticks) to generate a netlist. When the icds window indicates this is complete, click on the Simulate icon, which is the third icon down in the Virtuoso Verilog Environment window. This launches SimVision and (eventually) brings up two windows: a Design Browser 1-SimVision window containing the processor and a Console-SimVision window, which is a command window.
You will probably want to view signals on the Waveform Viewer. Select the fifth icon from the right (showing waveforms) in the Design Browser 1-SimVision window to bring up a Waveform 1-SimVision window, with the signals to be displayed listed down the left. To monitor signals and buses within the test bench, expand test (press its + button) in the Design Browser 1-SimVision window to reveal top, then click on the top symbol to list the signals to and from the processor. Select the signals you want to monitor (the address, data and clock lines are particularly recommended) and send them to the Waveform Viewer (fifth icon from the right). Continue expanding and selecting until all the signals you require have been sent to the Waveform Viewer. The input/output signals of the processor are a good starting set.
Command Files
The simulation can be run by using the menus to issue commands to the simulator, with the commands appearing in the Console-SimVision window. However, this is tedious and error-prone, so normally the commands are placed in a file and the simulator is instructed to take this file as its input. Command files have a .sv extension; the command files for the first three tests, test1.sv to test3.sv, are in your COMP22111 directory. test1.sv is shown below:

    force test.RESET = 1
    run 10 ns
    deposit test.RESET = { 1'b0} -after 2 us -absolute -release
    stop -create -condition { #test.top.A = 45 & #test.top.OE = 0 }
    stop -create -time 10 ms -absolute
    run
    stop -delete *
    deposit test.DUMP = { 1'b1} -after 1 ns -relative
    stop -create -time 10 ns -relative
    run
    stop -delete *

The first three lines define the RESET signal: RESET is forced to ‘1’, the simulation runs for 10 ns, and RESET is then scheduled to be released to ‘0’ at an absolute time of 2 µs. The next two lines define breakpoints where the simulation is to be stopped. The first is when the address lines carry 45 with the Memory Output Enable signal (OE) at ‘0’. Since the test program starts at line 0 and instructions are placed in consecutive lines, this condition recognises the request to memory to fetch the program’s final instruction (bal Fin). If your processor description is correct, the simulation always stops at this point. The second stop command is only needed if your processor is stuck in an infinite loop, in which case the simulation terminates at 10 ms. The simulation is then instructed to run until a breakpoint is encountered. Each run statement is followed by stop -delete *; this line removes the breakpoints so that the simulator can run past this time. The DUMP signal is activated 1 ns after the current time, when the current contents of the memory are dumped to the file $HOME/Cadence/COMP22111/test_bench/xc4000mem.dump. The final three statements in the command file continue the simulation for 10 ns beyond the last breakpoint before stopping.
Using the template for the three provided command files, you need to create a command file test4.sv for the branch test.
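Since test4.sv follows the same template, one way to avoid typing mistakes is to generate it. This Python sketch assumes that only the stop address differs between command files; check this against test2.sv and test3.sv before relying on it. make_command_file is a hypothetical helper. The address of the final instruction of test4.s is not given here: work it out from the program, just as 45 was obtained for test1.s.

```python
TEMPLATE = """force test.RESET = 1
run 10 ns
deposit test.RESET = {{ 1'b0}} -after 2 us -absolute -release
stop -create -condition {{ #test.top.A = {stop_address} & #test.top.OE = 0 }}
stop -create -time 10 ms -absolute
run
stop -delete *
deposit test.DUMP = {{ 1'b1}} -after 1 ns -relative
stop -create -time 10 ns -relative
run
stop -delete *
"""


def make_command_file(stop_address: int) -> str:
    """Fill in the (decimal) fetch address of the program's final instruction."""
    if not 0 <= stop_address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return TEMPLATE.format(stop_address=stop_address)  # {{ }} become literal braces


print(make_command_file(45))  # should reproduce the commands of test1.sv
```

Write the result for your test4 address to test4.sv in the COMP22111 directory.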
Running Simulation from Command Files
To run the simulation from a command file, first reset the simulation using Simulation->Reset to Start in the Design Browser 1-SimVision window. Then select File->Source Command Script in the Design Browser 1-SimVision window to bring up a Source Command Script window. Here select the browse button (...), which opens a Source Script File window. Select <name of file.sv> in the COMP22111 directory and click Open, which closes the Source Script File window. The filename now appears in the Source Command Script window. Select simulator Console (NC-Sim) for the Send commands to: field and press OK to run the simulation. You can see that it is running from the change in the icon displayed by the Play button in the Design Browser 1-SimVision window. You can now inspect the dumped memory file and compare its contents with the results expected from the test program. If the contents are not correct, you need to pinpoint the cause of the fault; the Waveform Viewer is a valuable tool for this. Correcting any faults will involve modifying the Verilog code for the processor, after which you need to repeat the syntax checks described previously.
When your top level design description simulates correctly for all the supplied test programs, show the results of running the assembler code for each test to a demonstrator, who will mark them off. You should complete and hand in the top level code and demonstrate the simulation of the processor operation before moving on to the next exercises.
Stopping The Simulation
If for any reason the simulation fails to complete within a few seconds and the simulation time is running on uncontrollably, you need to stop the simulation! In the Design Browser 1-SimVision window, the button next to the Play button (showing two parallel vertical lines) is a stop button and will halt the simulation. Alternatively, use Simulation->Stop.
Exit from Simulation
To exit from the Waveform Viewer, select File->Exit SimVision in the Waveform 1-SimVision window. To exit from the simulator, in the Design Browser 1-SimVision window select File->Exit SimVision. This removes both the Design Browser 1-SimVision window and the Console-SimVision window. To exit from the simulator environment, in the Virtuoso Schematic Composer etc. window select Commands->Close.
Note that the netlist does not need to be recreated for further simulations unless the top level Verilog code has changed. Finally, exit from Cadence by clicking on File in the icds window and then selecting Exit. In the Exit icds? window that this brings up, click yes.
Aim: to produce Signal Usage Charts to aid with the composing of the Verilog description of the STUMP control at the RTL level.
Hand in: - the Signal Usage Charts for Ex. 4 to be found at the back of the manual.
Read: Chapters 5, 6 and Appendix C.
Sessions: 1
Assessment: Worth 10 marks. Marks will be awarded based on the correct operation of the STUMP with 2 marks for Fetch signals, 4 for Execute and 4 for Writeback.
Learning Outcomes: understanding how to specify control in a formal way.
The processor design can be thought of as comprising a 16-bit datapath (which does the actual computation) plus a control block. The RTL design of the datapath has been done for you and is shown in Appendix C.
The control logic is all about ensuring that the right things happen at the right time, by activating control signals at the appropriate point in the instruction cycle. Each instruction takes three clock cycles to complete. The instruction is fetched in the first cycle. The execute phase, which reads the operands and performs the arithmetic, takes the second cycle. A writeback phase, which operates on the contents of the Result Register, occupies the third cycle. In general, a control signal is required by each multiplexer in the datapath (to select the desired input at the appropriate time) and by each register (to enable data to be loaded into that register). In addition, there are control signals to select the desired read and write locations in the Register Bank and signals to control access to the memory.
At the RTL level, the STUMP processor requires 12 groups of signals to control the datapath and the memory. These are:
BR indicates a branch instruction. It is used by the Sign Extender: when BR is high the least significant 8 bits of the instruction are sign-extended, and when BR is low the least significant 5 bits are extended.
FETCH indicates an instruction fetch is occurring
EXE indicates the Execute phase of operation, i.e. data is read from the Register Bank, operated upon and the result placed in the Result Register.
SRPA[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port A of the Register Bank
SRPB[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port B of the Register Bank
SWP[2:0] 3-bit address specifying the register in the Register Bank (Reg0 to Reg7) to be written to
IMMED selects the extended immediate operand rather than port B of the Register Bank as the B-input to the ALU
LDCC enables the loading of the 4-bit Condition Code Register
LDREG enables writing to a register in the Register Bank
SALUD (Select ALU Data) selects data from the Result Register for writing into the Register Bank. If SALUD is low, data from the memory (MDIN[15:0]) is selected instead.
MEMWR active-low signal to Bus Interface when Memory is to be written to
MEMOE active-low signal to Bus Interface when Memory is to be read from
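The behaviour of the Sign Extender under BR can be modelled in a few lines. The Python sketch below assumes that the 8-bit branch offset and the 5-bit immediate are the least significant bits of INSTR[15:0], as in the instruction formats of Chapter 5. sign_extend is a hypothetical name, not the name of the datapath block.

```python
def sign_extend(instr: int, br: bool) -> int:
    """Sign-extend INSTR[7:0] when BR is high, INSTR[4:0] when BR is low, to 16 bits."""
    bits = 8 if br else 5
    field = instr & ((1 << bits) - 1)  # the least significant `bits` bits
    sign = 1 << (bits - 1)             # weight of the field's sign bit
    value = (field ^ sign) - sign      # two's complement value of the field
    return value & 0xFFFF              # re-encode as a 16-bit word


print(hex(sign_extend(0x0080, br=True)))   # 0xff80
print(hex(sign_extend(0x0080, br=False)))  # 0x0
```

With INSTR[7:0] = 0x80, the result is 0xFF80 when BR is high. It is 0x0000 when BR is low, because bit 4 of the instruction is then the sign bit.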
The Control forms these signals from the contents of the Instruction Register (INSTR[15:0]) and from a State Register which is internal to the Control. The State Register is updated on each clock and, when running instructions, has 3 states: FETCH, EXECUTE and WRITEBACK. FETCH indicates the Instruction Fetch phase, EXE (i.e. EXECUTE) indicates the Execute phase, and LDREG indicates Writeback, when the Register Bank is written to. Naturally, only one of FETCH, EXE and LDREG can be ‘1’ at any time, and all three may be ‘0’ during the writeback phase of a store instruction. The action required in each phase (each of which takes one clock period) is summarised in Table 4.1 below:
Phase 1, Fetch: The next instruction is fetched from memory and +1 is added to the Program Counter.
Phase 2, Decode/Execute: The instruction is decoded and executed. This phase finishes when the result from the ALU is clocked into the Result Register. The Condition Code Register may also be updated in this phase.
Phase 3, Writeback: ALU operations are written back into the Register Bank. Load/Store operations access memory. A branch instruction that is taken will update the Program Counter.
Table 4.1. Instruction Phases
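The phase sequencing in Table 4.1 is a simple cyclic state machine. The Python sketch below models only the State Register and the FETCH and EXE phase signals, using the state numbers of the control template (1 Fetch, 2 Execute, 3 Writeback). LDREG is deliberately left out, because whether it is ‘1’ in the writeback phase depends on the instruction; for a store it is ‘0’, for example. The sketch illustrates the sequencing only and is not a substitute for your Signal Usage Charts.

```python
from enum import Enum


class Phase(Enum):
    FETCH = 1      # numbering as in the control template:
    EXECUTE = 2    # 1 Fetch, 2 Execute, 3 Writeback
    WRITEBACK = 3  # (state 0, Reset, is not modelled here)


NEXT = {Phase.FETCH: Phase.EXECUTE,
        Phase.EXECUTE: Phase.WRITEBACK,
        Phase.WRITEBACK: Phase.FETCH}


def trace(cycles: int, start: Phase = Phase.FETCH):
    """List (phase, FETCH, EXE) for successive clocks starting from `start`."""
    rows = []
    phase = start
    for _ in range(cycles):
        rows.append((phase, int(phase is Phase.FETCH), int(phase is Phase.EXECUTE)))
        phase = NEXT[phase]  # the State Register advances on every clock
    return rows


for phase, fetch, exe in trace(4):
    print(phase.name, fetch, exe)
```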
In specifying the control signals, it is useful to make Signal Usage Charts, which show the state of the control signals at any time. Signal Usage Chart sheets can be found on pages 77 and 78 at the back of the manual. Your task is to determine how each control signal is formed in each phase of the instruction for each instruction type. A control signal may be ‘0’, ‘1’, ‘don’t care’, or formed from bits in the Instruction Register INSTR[15:0] and/or the phase signals (FETCH, EXE and LDREG) and/or other signals. Hand in your completed Signal Usage Charts on pages 77 and 78.
The Signal Usage Charts sheets on pages 75 and 76 are for you to make a copy of what you hand in, as this will help you in writing the Verilog code required in the next exercise.
Aim: to produce a Verilog specification of the control for your processor and enter it into Cadence.
Hand in: - Verilog listing of the Control
Read: Section 3.5 and Appendix B and C of this manual
Sessions: 1
Assessment: Worth 15 marks. Marks will be awarded based on the correct operation of the Control, the program structure and the comments provided; syntactic errors will be penalised.
Learning Outcomes: a complete understanding of the operation of the STUMP RISC processor, and more practice in writing substantial hardware descriptions in Verilog, this time at the RTL level.
Introduction
The processor comprises three parts. The Bus Interface forms the interface between memory and processor. The Datapath elements perform the computational part of the STUMP. The Control generates the signals required to perform instructions at the correct times. In this exercise you are going to complete the specification of the Control block in Verilog. The interconnection of the Bus Interface, the Control and the Datapath components is shown in Appendix C. The Datapath and Bus Interface have been designed for you (down to the gate level).
RTL Design of the Control
Advice on how to proceed with this stage of the design is given in Section 3.5, but the most important thing is to know precisely what you want your design to do. The Signal Usage Charts you generated in the last exercise (if correct) should give you a precise specification of the signals the Control must assert in each of the Fetch, Execute and Writeback phases. These need to be translated into a Verilog description of the control block and entered into the functional view of the control cell, which is provided as a template listed in Appendix B.
Accessing and Modifying the Control
From the icds window, choose File->Open to bring up an Open File window. In the Open File window set the Library Name to comp22111, Cell Name to control and View Name to functional, and then click OK. This brings up a verilog.v window containing the incomplete Verilog code describing the control for the STUMP datapath. The code contains an always block, written for you, which defines the processor state and advances the state on each positive clock edge. State 0 is the Reset state, State 1 Fetch, State 2 Execute and State 3 the Writeback state. If the state is none of these, a Default state is entered which sets all signals to ‘don’t care’. It is strongly recommended that the signals listed in the Default state are explicitly set in each of the four states you need to write. At the end of the code, a function Testbranch, written for you, takes as its input parameters bits 11 to 8 of the Instruction Register and the bits of the Condition Code Register. It returns ‘1’ if the branch is to be taken and ‘0’ if it is not. Use the editing facilities of the window to add your control code and then save it using File->Save.
Print and exit as described in Ex. 2. Don’t forget to hand your code in.
Aim: to simulate and debug the whole RTL design.
Demonstrate: the simulation of the whole processor design with the test programs.
Sessions: 1
Assessment: Assessed in the lab. 15 marks allocated on the basis of demonstrating that your RTL description is correct and passes the tests; 3 marks are awarded for passing the Register Bank test, 5 marks for the ALU test, 3 marks for the shifter test and 4 marks for the branch test.
Learning Outcomes: practice in testing and fault finding in a large digital system at the RTL level using the Cadence CAD tools, experience of the design iteration process.
In this exercise, you will run the test programs we have given you on the RTL description of the processor. This will allow you to identify and debug faults in your Verilog code of the Control. You can assume that the test programs we provide are correct and that the RTL datapath we provide is correct. Therefore any errors in operation are due to faults in your Verilog control description!
When you are satisfied that your design is working correctly, show your simulation results for each test to a demonstrator.
The Cadence procedures needed for this exercise are similar to those in Ex. 3 and are briefly summarised below:
First, the design must be parsed to check that it satisfies the Verilog syntax. Start Cadence (start_cadence 22111). This brings up the icds window. Choose File->Open to bring up an Open File window. In the Open File window set the Library Name to comp22111, Cell Name to control and View Name to functional, and then click OK. This brings up a verilog.v window containing the Verilog code. Make at least one edit, then save. Now, in the verilog.v window, select File->Exit to check the Verilog code for syntax errors. Correct any errors and repeat the parsing process until the code passes.
As before, the test programs are test1.s to test4.s, and you need to create a file for the processor memory called xc4000mem.ram for use in the simulation. See Ex. 3 or Chapter 6 for instructions on this.
In the icds window, select Tools->Verilog Integration->NC-Verilog. This brings up a Virtuoso Verilog Environment etc window. Fill in its fields with test_bench for the Run Directory, and in the Top Level Design section enter comp22111 for Library, processor_test_bench for Cell and schematic for View. Click on the top left icon (of the running man) to initialise the simulator. When the icds window shows that initialisation is complete, click on Setup->Netlist in the top toolbar of the Virtuoso Verilog Environment etc window. This brings up a Netlist Setup window. In the Netlist These Views line, remove algorithmic from the beginning of the line, so that the netlister picks up the Verilog code for the control. Click OK along the top toolbar of the Netlist Setup window. Back in the Virtuoso Verilog Environment etc window, click on the second icon down on the left (of three separate ticks) to generate a netlist. When the icds window indicates this has completed correctly, click on the Simulate icon, which is the third icon down in the Virtuoso Verilog Environment etc window. This brings up the Design Browser 1-SimVision and Console-SimVision windows. As described in Ex. 3, place any signals you wish to observe on the Waveform Viewer and then run the simulation from a command file. To do this, reset the simulation, use File->Source Command Script to select an input command file <filename.sv>, send commands to the Simulator Console (NC-Sim) and press OK.
When your processor simulates correctly using your Verilog description of Control for all the supplied test programs, you should show the results of running the assembler code to a demonstrator who will mark them off.
The processor is a 16-bit machine with a RISC style architecture. Operands for ALU operations come from registers inside the processor and the result is returned to a register. Separate instructions are provided to move data between the registers and external memory.
There are 8 registers. R0 is always zero and can be used as a source operand, allowing Move instructions to be synthesised from an Add instruction. R0 may be written to, but the result is always discarded, allowing Compare instructions to be synthesised from Subtract instructions. Register R7 is the program counter and, from a programmer’s view of the machine, has equal status with the other registers, allowing PC-relative addressing to be supported.
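The special treatment of R0 can be captured in a tiny behavioural model. In this Python sketch (the class and method names are hypothetical), reads of R0 always return zero and writes to it are discarded. A Move therefore behaves as an Add with R0 as one source, and a Compare as a Subtract whose result is thrown away.

```python
class RegisterBank:
    """Eight 16-bit registers R0-R7: R0 reads as zero and ignores writes; R7 is the PC."""

    def __init__(self) -> None:
        self._regs = [0] * 8

    def read(self, index: int) -> int:
        return 0 if index == 0 else self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index != 0:  # a result written to R0 is discarded
            self._regs[index] = value & 0xFFFF


rb = RegisterBank()
rb.write(5, 0x1234)
rb.write(3, rb.read(5) + rb.read(0))  # "move R5 to R3" done as an add with R0
rb.write(0, rb.read(3) - rb.read(5))  # "compare" done as a subtract into R0
print(hex(rb.read(3)), rb.read(0))    # 0x1234 0
```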
CC 3   Sign Flag (N)
CC 2   Zero Flag (Z)
CC 1   Overflow Flag (V)
CC 0   Carry-Out Flag (C)
Table 5.1: Condition Code Bits
The processor has a 4-bit condition code register, shown in Table 5.1, which holds status information relating to the ALU output. The four status bits indicate whether the ALU result is negative (N is ‘1’ if the ALU result is -ve, i.e. bit 15 is ‘1’; N is ‘0’ for +ve or zero), zero (Z is ‘1’ if the ALU result is all ‘0’s; Z is ‘0’ if non-zero), has overflowed (V is ‘1’ if adding two +ve numbers yields a -ve result, or if adding two -ve numbers yields a +ve result; V is ‘0’ if the result is within range), or has a carry out (C is ‘1’ if there is a carry out of the most significant ALU result bit, bit 15; C is ‘0’ if bit 15 of the ALU result has no carry out). Each arithmetic and logical instruction has the option of updating or not updating the condition code register. If an arithmetic/logical instruction does not update the condition code register, its state remains as is. Load, Store and Branch instructions never update the condition code register and so do not change its existing state.
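The flag definitions can be made concrete with a short sketch. It assumes the same split at bit 15 as the Add task in the Verilog listing, so that the carries out of bits 14 and 15 are both available; V is then the exclusive-OR of these two carries, which is equivalent to the "two +ve numbers give a -ve result" rule above. The module name stump_add_flags_sketch is hypothetical; with cin taken from CC0 the same block would serve ADC.

```verilog
// Hypothetical sketch: 16-bit add producing the N, Z, V, C flags.
module stump_add_flags_sketch (
  input  wire [15:0] a, b,
  input  wire        cin,
  output wire [15:0] s,
  output wire [3:0]  cc       // {N, Z, V, C}, ordered as in Table 5.1
);
  wire [15:0] low;   // low[15] is the carry out of bit 14 (C14)
  wire [1:0]  high;  // {C15, S15}

  assign low  = a[14:0] + b[14:0] + cin;  // 16-bit context keeps C14
  assign high = a[15] + b[15] + low[15];  // 2-bit context keeps C15
  assign s    = {high[0], low[14:0]};

  wire n = s[15];              // sign of the result
  wire z = (s == 16'h0000);    // result is all zeros
  wire v = low[15] ^ high[1];  // C14 != C15: signed overflow
  wire c = high[1];            // carry out of bit 15
  assign cc = {n, z, v, c};
endmodule
```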
Instructions have a fixed length of 16 bits and come in 3 formats. The machine operates on 16-bit words only; byte addressing is not supported.
Instruction Code   Instruction   Explanation
000                ADD           2’s complement add
001                ADC           2’s complement add with carry-in
010                SUB           2’s complement subtract
011                SBC           2’s complement subtract with borrow
100                AND           Bitwise AND of two 16-bit words
101                OR            Bitwise OR of two 16-bit words
110                LD/ST         Load register from memory or store register to memory
111                Bcc           Branch if condition cc is satisfied
Table 5.2: Basic Instruction Set
There are 8 basic instructions, shown in Table 5.2, enabling arithmetic, logical, load, store and branch operations to be performed. As is common in processors, all the arithmetic is performed by an adder, since the subtraction A-B can be performed as A+~B+‘1’, where ~B is the bitwise (1’s) complement of B. Some other instructions such as Cmp, Nop and Mov can be expressed directly in terms of the basic instructions and are supported by the assembler. Other instructions may be synthesised from combinations of the basic instructions, as shown in Chapter 6.
Shift instructions are somewhat special. Shift-left instructions can be derived from the basic instruction set. Shift-right instructions have been added as a rather ugly “kludge” and are dealt with in the next section.
There are just 3 instruction formats which are shown below:
Type 1: 2 source registers:
 15 14 13 | 12 | 11   | 10  9  8 |  7  6  5 |  4  3  2 |  1  0
   INSTR  |  0 | LDCC |    DST   |   SRC A  |   SRC B  | SHIFT

Type 2: 1 source register, 1 immediate value:
 15 14 13 | 12 | 11   | 10  9  8 |  7  6  5 |  4  3  2  1  0
   INSTR  |  1 | LDCC |    DST   |   SRC A  |   Immediate

Type 3: Conditional branch:
 15 14 13 12 | 11 10  9  8 |  7  6  5  4  3  2  1  0
  1  1  1  1 |  Condition  |          Offset
The processor is a 3-address machine, specifying two source operands and a destination operand. In the case of arithmetic and logical instructions (instruction codes 0 to 5), the two source operands are either two registers (Type-1 instructions) or a register and a 5-bit signed immediate value (Type-2 instructions). The result of the operation is returned to the destination register (DST), and the condition codes are updated depending on the state of bit 11 (if LDCC is ‘1’, the condition codes are updated; if LDCC is ‘0’, they are not).
Branch (code 7) and load/store instructions (code 6) do not update the condition-code register. In the case of a LD/ST instruction, bit 11 is used to determine the direction of the data transfer: if LDCC is ‘1’, the operation is a store to memory; if LDCC is ‘0’, the operation is a load from memory. The memory address is the sum of the two source operands, i.e. the two registers specified by SRC A and SRC B for Type-1 instructions, or the register specified by SRC A and the 5-bit signed immediate for Type-2 instructions. The register specified by DST is the register to be written to memory for a Store operation, or the register to be loaded from memory for a Load operation.
Type 3 instructions are branch instructions (code 7). Here, the 8-bit signed offset is added to the Program Counter to compute the address of the instruction to be jumped to if the branch is taken. This is written into the Program Counter if the branch is taken but is ignored otherwise. In branch instructions, bits 8 to 11 specify the conditions under which a branch is taken. These usually involve bit(s) in the condition code register. The branch conditions are described in section 5.5.
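A sketch of how the fields of the three formats sit in the instruction word (the module and port names are hypothetical). Bits 15:5 have the same meaning for Type 1 and Type 2; only bits 4:0 are interpreted differently, and branches are recognised by code 111 in bits 15:13 before bit 12 is examined, as in the Verilog listing.

```verilog
// Hypothetical sketch: split a STUMP instruction word into its fields.
module stump_fields_sketch (
  input  wire [15:0] instr,
  output wire [2:0]  op,        // bits 15:13, code from Table 5.2
  output wire        is_branch, // code 111: Type 3
  output wire        is_type2,  // bit 12 set on a non-branch: register + immediate
  output wire        ldcc,      // bit 11: update CC (store for LD/ST)
  output wire [2:0]  dst,       // bits 10:8
  output wire [2:0]  src_a,     // bits 7:5
  output wire [2:0]  src_b,     // bits 4:2, Type 1 only
  output wire [1:0]  shift,     // bits 1:0, Type 1 only
  output wire [4:0]  imm5,      // bits 4:0, Type 2 only (signed)
  output wire [3:0]  cond,      // bits 11:8, Type 3 only
  output wire [7:0]  offset8    // bits 7:0, Type 3 only (signed)
);
  assign op        = instr[15:13];
  assign is_branch = (instr[15:13] == 3'b111);
  assign is_type2  = instr[12] & ~is_branch;
  assign ldcc      = instr[11];
  assign dst       = instr[10:8];
  assign src_a     = instr[7:5];
  assign src_b     = instr[4:2];
  assign shift     = instr[1:0];
  assign imm5      = instr[4:0];
  assign cond      = instr[11:8];
  assign offset8   = instr[7:0];
endmodule
```

For example, 16'h4828 splits into op = 010, LDCC = 1, DST = r0, SRC A = r1 and SRC B = r2, i.e. SUBS r0, r1, r2.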
Bits 1 and 0 in Type-1 instructions are used to control various shift-right operations. If a shift is specified, a one-bit right shift of operand A from the Register Bank is performed before it reaches the ALU. The shifts that can be specified are an arithmetic shift right (ASR), with the sign bit copied back into bit 15; a clockwise circular shift (ROR), with bit 0 moving to bit 15; and a clockwise circular shift through the carry (RRC), with the C bit of the condition code register moving to bit 15. These shift operations are summarised in Table 5.3 below.
Operation   Instr Bit 1   Instr Bit 0   Shifter Output, bit 15 :=   Shifter Carry-out (CSH) :=
No shift    0             0             A15                         0
ASR         0             1             A15                         A0
ROR         1             0             A0                          A0
RRC         1             1             CC0                         A0
Table 5.3: Shift Operations
Refer to section 5.6 for information on how the shifter carry-out is used.
Assuming that the data input to the Shifter is A<15:0>, the effect of the shift operations can be summarised as follows (shifter output from bit 15 down to bit 0, followed by the shifter carry-out):
no shift   A15 A14 --> A1 A0     CSH = ‘0’
ASR        A15 A15 A14 --> A1    CSH = A0
ROR        A0  A15 A14 --> A1    CSH = A0
RRC        CC0 A15 A14 --> A1    CSH = A0
FIGURE 5.1 Shift Instructions
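The Shift task in the Verilog listing implements Table 5.3 procedurally. The same behaviour can be sketched as a combinational block, in which the concatenation makes explicit which bit enters at position 15; the module and port names are hypothetical.

```verilog
// Hypothetical sketch: combinational form of the Table 5.3 shifter.
module stump_shifter_sketch (
  input  wire [15:0] a,
  input  wire [1:0]  sh,   // instruction bits 1:0
  input  wire        cc0,  // carry flag from the CC register
  output reg  [15:0] ash,  // operand A after shifting
  output reg         csh   // shifter carry-out
);
  always @* begin
    case (sh)
      2'b00:   begin ash = a;                csh = 1'b0; end // no shift
      2'b01:   begin ash = {a[15], a[15:1]}; csh = a[0]; end // ASR
      2'b10:   begin ash = {a[0],  a[15:1]}; csh = a[0]; end // ROR
      default: begin ash = {cc0,   a[15:1]}; csh = a[0]; end // RRC (2'b11)
    endcase
  end
endmodule
```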
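Type-3 instructions implement the 16 conditional branch instructions shown in Table 5.4. The branch target address is PC + 1 plus the sign-extended 8-bit offset, where PC is the address of the branch instruction; the target therefore lies between PC + 1 - 128 and PC + 1 + 127.

Mnemonic   Bits 11:8   Branch Condition
BAL        0000        Always branch
BNV        0001        Never
BHI        0010        C+Z=0                  comparison: unsigned arithmetic
BLS        0011        C+Z=1                  comparison: unsigned arithmetic
BCC        0100        C=0                    overflow test: unsigned arithmetic
BCS        0101        C=1                    overflow test: unsigned arithmetic
BNE        0110        Z=0                    zero test
BEQ        0111        Z=1                    zero test
BVC        1000        V=0                    overflow test: signed arithmetic
BVS        1001        V=1                    overflow test: signed arithmetic
BPL        1010        N=0
BMI        1011        N=1
BGE        1100        N.~V+~N.V=0            comparison: signed arithmetic
BLT        1101        N.~V+~N.V=1            comparison: signed arithmetic
BGT        1110        (N.~V+~N.V)+Z=0        comparison: signed arithmetic
BLE        1111        (N.~V+~N.V)+Z=1        comparison: signed arithmetic
Table 5.4: Conditional Branch Instructions
(In the branch conditions, + denotes OR, . denotes AND and ~ denotes NOT.)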
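Because the Verilog listing increments REG_BANK[7] as soon as the instruction has been fetched, the PC value available during the execute phase is already PC + 1, which is why the offset is relative to the next instruction. A minimal sketch of the next-PC selection, assuming the condition test of Table 5.4 has already produced a taken signal (the module and signal names are hypothetical):

```verilog
// Hypothetical sketch: next-PC selection for a Type 3 branch.
module stump_branch_target_sketch (
  input  wire [15:0] pc_plus1, // PC, already incremented after the fetch
  input  wire [7:0]  offset,   // instruction bits 7:0, two's complement
  input  wire        taken,    // result of the Table 5.4 condition test
  output wire [15:0] next_pc
);
  // Sign-extend the 8-bit offset to 16 bits and add it to PC + 1.
  wire [15:0] target = pc_plus1 + {{8{offset[7]}}, offset};

  // A branch that is not taken leaves the PC pointing at the next instruction.
  assign next_pc = taken ? target : pc_plus1;
endmodule
```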
The following table summarises the conditions under which the various condition bits are set. The column labelled C in shows where the carry-in comes from; C in is the carry into the least significant bit of the adder, and is taken from the condition code register for the ADC and SBC instructions. The column labelled CC0 shows how the carry bit in the Condition Code register is derived if the register is updated.

Instr   C in   CC3 Sign   CC2 Zero   CC1 Overflow   CC0 Carry             Update codes if LDCC is set
ADD     0      S15        S=0        C14 != C15     C15                   yes
ADC     CC0    S15        S=0        C14 != C15     C15                   yes
SUB     1      S15        S=0        C14 != C15     ~C15                  yes
SBC     ~CC0   S15        S=0        C14 != C15     ~C15                  yes
AND     0      S15        S=0        0              CSH if shift else 0   yes
OR      0      S15        S=0        0              CSH if shift else 0   yes
LD/ST   0      δ          δ          δ              δ                     no
BR      0      δ          δ          δ              δ                     no
Table 5.5: Condition code settings

where:
• S is the 16-bit result of an arithmetic or logic operation, i.e. the ALU result, and S15 is its most significant bit
• C14 and C15 are the carries out of bits 14 and 15 respectively of an arithmetic operation
• CSH is the shifter carry-out (see Table 5.3) and is only used to update the condition code register for a logical operation which performs an ASR, ROR or RRC shift
• shift is TRUE for Type-1 instructions when bits 1 and 0 are NOT equal to “00”
• ~ denotes the inverse (NOT) of a bit
• δ marks a flag that is not affected, since LD/ST and branch instructions never update the condition code register
• SUB and SBC are done as an addition with the CC0 and C in settings shown in the above table. CC0 is stored as a borrow and is ~C15, since a borrow = ~carry. Thus A - B - borrow = A + ~B + ~borrow. For SUB there is no borrow, so A - B = A + ~B + ‘1’, while for SBC A - B - borrow = A + ~B + ~borrow = A + ~B + ~CC0.
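To see how Tables 5.4 and 5.5 fit together, the sketch below computes the flags that CMP ra, rb (i.e. SUBS r0, ra, rb) would set and evaluates a few of the branch conditions. It assumes the borrow convention described above, with CC0 = ~C15 after a subtraction; the module name and outputs are hypothetical. Note that ~b is formed as a full 16-bit value before bits are selected, because inverting a part-select inside a wider expression would also invert its zero-extended upper bits.

```verilog
// Hypothetical sketch: flags set by SUBS (CMP) and some Table 5.4 tests,
// assuming the borrow convention of Table 5.5 (CC0 = ~C15 after SUB).
module stump_cmp_sketch (
  input  wire [15:0] a, b,
  output wire [3:0]  cc,                   // {N, Z, V, C}
  output wire        beq, bhi, blo, bgt, blt
);
  // SUB is done as A + ~B + 1 (C in = 1 for SUB in Table 5.5).
  wire [15:0] nb   = ~b;
  wire [15:0] low  = a[14:0] + nb[14:0] + 1'b1; // low[15] = C14
  wire [1:0]  high = a[15] + nb[15] + low[15];  // high[1] = C15
  wire [15:0] s    = {high[0], low[14:0]};

  wire n = s[15];
  wire z = (s == 16'h0000);
  wire v = low[15] ^ high[1];  // C14 != C15
  wire c = ~high[1];           // stored as a borrow
  assign cc = {n, z, v, c};

  // Same expressions as the Testbranch function
  assign beq = z;                  // BEQ: Z = 1
  assign bhi = ~(c | z);           // BHI: a > b, unsigned
  assign blo = c;                  // BCS/BLO: a < b, unsigned
  assign blt = n ^ v;              // BLT: a < b, signed
  assign bgt = ~((n ^ v) | z);     // BGT: a > b, signed
endmodule
```

For example, comparing 16'hFFFF with 16'h0001 takes BHI (65535 > 1 unsigned) and also BLT (-1 < 1 signed).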
The processor is shown in Figure 5.2 in a system with a clock, reset and memory.
Communication with memory is controlled by the processor signals Ram_Cs (memory select), Bus_Rd (enable memory output) and Bus_Wr (enable memory write). These are all active-low signals, i.e. their normal inactive state is high and they go low to activate the memory. The data is transferred on the 16-bit bidirectional data bus Bus_D(15:0), and the address at which reading or writing takes place is specified by the processor on the 16-bit address bus Bus_A(15:0).
If reading, the address to be read is placed on the address lines and the chip is enabled together with the output enable signal. After the access time of the chip has elapsed, the data from that address appears on the data lines of the bus. If writing, the address of the location to be written is driven onto the address lines and the chip is enabled together with the write signal. The data on the bidirectional data bus is then written into this location.
You can assume that data can be read from or written to memory within one clock cycle if the clock period is 400 ns.
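A sketch of a memory that responds to these active-low controls, useful for following the read and write sequences of the Memory_Read and Memory_Write tasks in the Verilog listing. It is not the supplied xc4000mem model: the port names are hypothetical, and it assumes a 64K-word array that drives the data bus only while selected and output-enabled, and that captures a write on the rising edge of the write signal while the chip is still selected.

```verilog
// Hypothetical sketch: asynchronous memory with active-low controls.
module stump_ram_sketch (
  inout  wire [15:0] data,
  input  wire [15:0] addr,
  input  wire        cs_n,  // chip select (cf. Ram_Cs)
  input  wire        oe_n,  // output enable (cf. Bus_Rd)
  input  wire        wr_n   // write enable (cf. Bus_Wr)
);
  reg [15:0] mem [0:65535];

  // Read: drive the bus only while selected and output-enabled.
  assign data = (!cs_n && !oe_n) ? mem[addr] : 16'hzzzz;

  // Write: capture the bus when wr_n rises while the chip is selected,
  // which is when the Memory_Write task releases BUS_WR.
  always @(posedge wr_n)
    if (!cs_n)
      mem[addr] <= data;
endmodule
```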
[FIGURE 5.2 The processor system: the STUMP PROCESSOR (clk, reset; BusClk, GSR) connected to the MEMORY (Addr(15:0), Data(15:0), cs, oe, wr, dump, reload) by the 16-bit bidirectional data bus Bus_D(15:0), the address bus Bus_A(15:0) and the control signals Ram_Cs, Bus_Rd and Bus_Wr]
The STUMP processor design will be tested by supplying a program in binary form for loading into the memory model. An assembler, SASM¹, has been produced which can be used to create a binary program from an assembly language program. The SASM assembler is described in the following section. Section 6.3 shows how the eight basic instructions described in Chapter 5 can be used to synthesise a further fifteen instructions. The assembler is invoked as:
sasm <filename.s>
The input file is parsed on a line-by-line basis. Each line should contain a single instruction, an assembler directive or a full-line comment. Three output files are produced:
<filename.mem> contains code suitable for loading into the processor memory model using the loadmem.sh command, as detailed below.
<filename.hex> contains code suitable for downloading into the memory on the Xilinx board.
<filename.bin> contains the binary of the assembled code.
A Verilog memory model is used, and this can only be loaded from a file named xc4000mem.ram and dumped to a file named xc4000mem.dump. To create the xc4000mem.ram file from the <filename.mem> file created by sasm, use
loadmem.sh <filename.mem>
Thus, for example,
sasm test1.s
loadmem.sh test1.mem
will create a memory file xc4000mem.ram containing the assembled test program 1.
¹ The assembler was written by Andrew Bardsley, who also devised the original STUMP architecture.
The format of instruction lines is:
[<label>[:]] <instruction name> <operands>
The label must begin in column 1 of the line and can optionally be terminated by a colon. It is valid to omit the label and also to place the first character of the instruction name in column 1.
Labels consist of a character from [a-zA-Z_] followed by any number of characters from [a-zA-Z_0-9]. Labels may not be any of the following reserved words (in either upper or lower case, although a mixed-case version (e.g. Nop) of any of these keywords is a valid label): adc add adcs adds align and ands asr bal bcc bcs beq bge bgt bhi bhs ble blo bls blt bne bnv bmi bpl bvc bvs cmp data equ idem include ld mov movs nop or org ors pc r0 r1 r2 r3 r4 r5 r6 r7 ror rrc sbc sbcs st sub subs
Register Names
Valid register names are (pc is an alias for r7): r0 r1 r2 r3 r4 r5 r6 r7 pc
Mnemonics
Instruction mnemonics are listed below by instruction type. Instruction names that end in an “s” set the condition code bits based on the result of the instruction.
<shift> is one of ror, asr or rrc, and indicates that the value of <src_reg1> or <offset_reg> is modified by the specified shift operation before the specified operation is carried out
<expr> is a value in the range -16 to +15. An error is reported if the value is out of range.
Expressions
Expressions are similar to expressions in C. Supported operators are:
+ - * / % & | ^ << >> - ~ ()
Valid Instructions adc, adcs, add, adds, and, ands, or, ors, sbc, sbcs, sub, subs
Instruction Formats
<instruction> <dst_reg>, <src_reg1>, <src_reg2>
<instruction> <dst_reg>, <src_reg1>, <src_reg2>, <shift>
<instruction> <dst_reg>, <src_reg1>, #<expr>
Valid Instructions ld st
Instruction Formats
<instruction> <src/dest_reg>, [<base_reg>, <offset_reg>]
<instruction> <src/dest_reg>, [<base_reg>, <offset_reg>, <shift>]
<instruction> <src/dest_reg>, [<base_reg>, #<expr>]
<instruction> <src/dest_reg>, [<base_reg>, <label>]
In the last form of the instruction, the offset is calculated from the difference between the address of the label <label> and the address of the current instruction. It is used with <base_reg> = r7 to allow pc-offset indexed addressing. For examples, see the test programs.
Valid Instructions bal bnv bhi bls bcc bhs bcs blo bne beq bvc bvs bpl bmi bge blt bgt ble
Instruction Format
<instruction> <label>
<label> is translated into an offset from the address of the next instruction and must be in the range of -127 to 128 from the current address.
Some common instructions, not visible in the basic instruction set, are available as aliases:
nop
cmp <src1_reg>, <src2_reg>
cmp <src1_reg>, <src2_reg>, <shift>
cmp <src1_reg>, #<expr>
mov <dst_reg>, <src_reg>
mov <dst_reg>, <src_reg>, <shift>
mov <dst_reg>, #<expr>
Similar movs instructions are also allowed.
org
[<label>[:]] org <expr>
Set the current program address to the value of <expr>. An error is reported if this expression evaluates to less than zero or greater than 65535. The optional <label> is assigned to the new address. The value of <expr> must be resolvable in the 1st pass of the assembler.
equ
label[:] equ <expr>
Bind the value from evaluating <expr> to the identifier <label>. The value of <expr> can take any 32 bit value but must be resolvable in the 1st pass of the assembler.
data
[<label>[:]] data <list of data_items>
Inserts constants at the current program address onwards. <list of data_items> is a comma separated list of the elements:
<expr> Any expression. An unadorned expression is truncated to a 16-bit value and occupies a single word. Expressions with a suffix .b, .w (or .s) or .l represent a byte, a word and a long word respectively. Long words are stored as two words in little-endian format. Byte expressions are packed two to a word, least-significant byte first.
“<string>” Any sequence of characters except “, stored as signed 6-bit values unless the string is suffixed with .b, in which case the characters are byte packed as above.
align
[<label>[:]] align .w
[<label>[:]] align .s
[<label>[:]] align .l
[<label>[:]] align <expr>
The program counter is aligned to the nearest word (.w and .s) or long word (.l), or to <expr> words. The last form is useful for reserving a block of memory. Word alignment is only useful between data statements which are byte packed. Using a label with a data statement has the side-effect of word-aligning the first data element.
Comments
[<instruction or directive>] ; <comment>
Any line can be appended with a comment. However only comments that start in column 1 are echoed to the listing file. Other comments are discarded.
Constants
C-style constants are allowed, with the addition of binary constants, which are introduced by ‘0b’ (e.g. 0b1010).
There are 8 basic instructions, plus the modifiers that control operand-A shifting and the conditional updating of the condition codes. Other instructions can be synthesised from the basic instruction set, as shown in Table 6.1. Note that only NOP, MOV and CMP are recognised by the assembler.
Instruction   Implementation          Explanation
NOP           ADD  r0, r0, r0         No-Op: do nothing
MOV ra, rb    ADD  ra, rb, r0         Move register rb to ra
CMP ra, rb    SUBS r0, ra, rb         Compare registers ra and rb
ASL ra        ADDS ra, ra, ra         Arithmetic Shift Left
LSL ra        ADDS ra, ra, ra         Logical Shift Left
RLC ra        ADCS ra, ra, ra         Rotate Left through Carry
ROL ra        ADDS ra, ra, ra         Rotate Left
              ADC  ra, ra, r0
LSR ra        ANDS r0, r0, r0         Logical Shift Right
              ADD  ra, ra, r0, rrc
CCF           ANDS r0, r0, r0         Clear Carry Flag
SCF           ADD  r1, r0, #1         Set Carry Flag
              ANDS r1, r1, r0, asr
NEG ra        SUB  ra, r0, ra         2’s complement of ra
CPL ra        AND  r0, r0, r0         1’s complement of ra
              SBC  ra, r0, ra
BL            ADD  r5, pc, #1         Branch & Link for leaf procedures
              BAL  <label>            (r5 is the link register = return address)
RL            ADD  pc, r5, #0         Return from link
RET           ADD  r6, r6, #-1        General return
              LD   r5, [r6, #0]
              ADD  pc, r5, #1
Table 6.1: Extending the Instruction Set
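When checking instruction register values by hand, it can help to pack the fields of the instruction formats from Chapter 5 mechanically. A minimal sketch (hypothetical module and port names) that builds Type 1 and Type 2 words from their fields:

```verilog
// Hypothetical sketch: hand-encode Type 1 and Type 2 STUMP instructions.
module stump_encode_sketch (
  input  wire [2:0]  op,     // Table 5.2 code
  input  wire        ldcc,   // 1 for the "s" forms (e.g. SUBS)
  input  wire [2:0]  dst,
  input  wire [2:0]  src_a,
  input  wire [2:0]  src_b,  // Type 1 second source register
  input  wire [1:0]  shift,  // Type 1 shift (00 = none)
  input  wire [4:0]  imm5,   // Type 2 immediate, -16..+15
  output wire [15:0] type1,
  output wire [15:0] type2
);
  // 3 + 1 + 1 + 3 + 3 + 3 + 2 = 16 bits
  assign type1 = {op, 1'b0, ldcc, dst, src_a, src_b, shift};
  // 3 + 1 + 1 + 3 + 3 + 5 = 16 bits
  assign type2 = {op, 1'b1, ldcc, dst, src_a, imm5};
endmodule
```

With the fields from Table 6.1, NOP (ADD r0, r0, r0) encodes as 16'h0000, CMP r1, r2 (SUBS r0, r1, r2) as 16'h4828 and ADD r6, r6, #-1 as 16'h16DF.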
The Verilog listing of the processor module, processor, at the top level is given below.
It is a behavioural description, i.e. the programmer’s view of its operation. The algorithmic code describes a fetch phase; an execute phase, in which the instruction is decoded and the specified arithmetic/logical operation is performed; and a writeback phase, in which the result computed during the execute phase is used. The ALU result is either written back to the Register Bank or used as a memory address, in which case data is either stored to memory from the Register Bank or loaded from memory into the Register Bank, depending on the instruction. The code given to students is incomplete, and the task in the second exercise is to complete the high-level code for the processor so that the model runs the given test programs successfully. The code for instruction fetch and writeback is complete and should not be altered by students. However, the code for the execute phase is missing, and this is the code which needs to be added by students.
The code for the high-level model calls functions and tasks. These are listed after the high-level model. Functions and tasks are passed parameters and perform some operation.
Functions return a result. Tasks are similar to procedures in that they operate on parameters and can modify them during task execution. Both functions and tasks may declare local variables to assist with their operation.
N.B. If the code stored in the files differs from that shown here, it should be assumed that the stored code is correct.
// Verilog HDL for STUMP processor “processor_v”
module processor (BUS_A, RAM_CS, BUS_RD, BUS_WR, BUS_D, BUSCLK, GSR);
//processor to memory signals :
output [15:0] BUS_A; //address bus
output RAM_CS;// memory chip select
output BUS_RD;// memory read
output BUS_WR;// memory write
inout [15:0] BUS_D; // data bus
// processor signals
input BUSCLK; //processor clock
input GSR; // reset signal to processor
reg [15:0] D_OUT;
reg [15:0] BUS_A;
reg RAM_CS, BUS_RD, BUS_WR;
reg [15:0] INSTR;
reg [15:0] REG_BANK [7:0];
reg [15:0] RD_A, ALUA, ALUB, S;
reg [3:0] CC ;
reg [15:14] C;
reg CSH;
wire [15:0] PC;
assign BUS_D = D_OUT;
assign PC = REG_BANK[7]; // PC is an alias for REG_BANK[7]
// Used for debug only - do not use PC in your code
always
begin
if (GSR == 0)
begin
// Fetch State
Memory_Read(REG_BANK[7], INSTR); // Get instr pointed to by PC
REG_BANK[7] = REG_BANK[7] + 1; // add +1 to PC as soon as instr fetched
// Execute State
if(INSTR[15:13] == 3'b111) // branch instruction
begin
//
// put your code here to form ALU inputs ALUA and ALUB
// note that the ALUA input is the output from the shifter
//
end
else if(INSTR[12] == 1'b1) // type 2 instruction
begin
//
// put your code here to form ALU inputs ALUA and ALUB for type 2 instrs
//
end
else // type 1 instruction
begin
//
// put your code here to form ALU inputs ALUA and ALUB
//
end
// op decode
case (INSTR[15:13])
0 : Add(ALUA, ALUB, 1'b0, S, C); //add instr done for you
//
// put your code here to form ALU result S and carry bits C14 and C15 if needed
//
endcase
//
// put your code here to update the condition code register.
//
// Write state
case (INSTR[15:13])
3'b111 : if (Testbranch(INSTR[11:8], CC) == 1) REG_BANK[7] = S;
3'b110 : if (INSTR[11] == 1) Memory_Write(S, REG_BANK[INSTR[10:8]]);
else
begin
Memory_Read(S, REG_BANK[INSTR[10:8]]);
REG_BANK[0] = 0;
end
default : begin REG_BANK[INSTR[10:8]] = S; REG_BANK[0] = 0; end
endcase
end
else // reset state
begin
RAM_CS = 1;
wait (GSR == 0)
begin
REG_BANK[7] = 0;
REG_BANK[0] = 0;
CC = 0;
end
end
end // end of always
// start of tasks and functions ////////////////////////////////////////////////////////////////////////////
task Memory_Write;
// writes data on DMW to memory address AMW
input [15:0] AMW, DMW;
begin
RAM_CS = 0; BUS_RD = 1; BUS_WR = 1; BUS_A = AMW; D_OUT = DMW ;
#25 BUS_WR = 0;
#50 BUS_WR = 1;
#25 RAM_CS = 1;
end
endtask
task Memory_Read;
//reads memory address AMR and places data on DMR
input [15:0] AMR;
output [15:0] DMR;
begin
RAM_CS = 0; BUS_RD = 1; BUS_WR = 1; BUS_A = AMR; D_OUT = 16'hzzzz ;
#25 BUS_RD = 0;
#50 DMR = BUS_D ;
#25 BUS_RD = 1; RAM_CS = 1;
end
endtask
task Add;
//adds a 1-bit carry in Cin to two 16-bit quantities A and B
// produces 16-bit sum S and carry out C from addition in bits 14 and 15
input [15:0] A, B;
input CIN ;
output [15:0] S; output [15:14] C;
reg [16:0] RESULT;
begin
RESULT = A[14:0] + B[14:0] + CIN ;
C[14] = RESULT[15];
RESULT[16:15] = A[15] + B[15] + C[14];
S = RESULT[15:0];
C[15] = RESULT[16];
end
endtask
task Shift;
//shifts input A and a carry in Cin according to shift type INSTR[1:0]
// produces shifter output ASH and shifter carry out CSH
input [15:0] A;
input [1:0] INSTR;
input CIN;
output [15:0] ASH;
output CSH;
begin
case (INSTR)
0 : begin CSH = 0; ASH = A ; end
1 : begin CSH = A[0]; ASH = A >> 1 ; ASH[15] = A[15] ; end
2 : begin CSH = A[0]; ASH = A >> 1 ; ASH[15] = A[0] ; end
3 : begin CSH = A[0]; ASH = A >> 1 ; ASH[15] = CIN ; end
endcase
end
endtask
function Testbranch;
//compares branch condition INSTR[11:8] with cond code reg
// returns ‘1’ if branch to be taken, returns ‘0’ if jump not taken
input [11:8] BRANCH_INSTR;
input [3:0] CC;
reg N, Z, V, C;
begin
{N,Z,V,C} = CC;
case (BRANCH_INSTR)
0 : Testbranch = 1 ;
1 : Testbranch = 0 ;
2 : Testbranch = ~(C | Z);
3 : Testbranch = C | Z ;
4 : Testbranch = ~C ;
5 : Testbranch = C ;
6 : Testbranch = ~Z ;
7 : Testbranch = Z ;
8 : Testbranch = ~V ;
9 : Testbranch = V ;
10 : Testbranch = ~N ;
11 : Testbranch = N ;
12 : Testbranch = V ~^ N ;
13 : Testbranch = V ^ N ;
14 : Testbranch =~((V ^ N) | Z) ;
15 : Testbranch = ((V ^ N) | Z) ;
endcase
end
endfunction
endmodule
The following is very incomplete Verilog code for the RTL description of a non-pipelined control unit for the STUMP.
It is your task in exercise 5 to complete the code required.
// Stump Control unit
// Original:ADP 9/5/06
// Last modified:ADP 11/5/06
module control (LDCC, LDREG, EXE, BR, FETCH, IMMED, MEMOE, MEMWR,
                SALUD, SRPA, SRPB, SWP, CC, CLK, INSTR, RESET);

//--------Input ports--------
input CLK, RESET;
input [15:2] INSTR;
input [3:0] CC;

//--------Output ports-------
output BR, FETCH, EXE, IMMED, LDCC, LDREG, MEMOE, MEMWR, SALUD;
reg BR, FETCH, EXE, IMMED, LDCC, LDREG, MEMOE, MEMWR, SALUD;
output [2:0] SRPA, SRPB, SWP;
reg [2:0] SRPA, SRPB, SWP;

//--------Internals----------
reg [1:0] state;

// Control of finite state machine
always @ (posedge CLK)
begin
  if (RESET == 1) state = 0;
  else if (state == 3) state = 1;
  else state = state + 1;
end

// Control of state driven combinatorial logic
always @ (state, INSTR, CC)
begin
  case (state)
    0: // Reset cycle
      begin
        // put your code here for signals set in reset phase
        // put your code here for don't care signals in the reset state
      end
    1: // Fetch cycle
       // Fetches instruction from memory, loads into
       // instruction register & increments program counter
      begin
        // put your code here for signals that need to be set in the fetch phase
        // put your code here for don't care signals in the fetch state
      end
    2: // Execute cycle
       // Instruction is decoded and executed
       // Instruction may be: load/store (Type I or Type II)
       // or a Branch
       // or an ALU operation (Type I or Type II)
      begin
        // you need to decode the instruction and then set signals appropriately
        // to a value or don't care for that instruction
      end
    3: // Write back cycle
       // Instruction is decoded to determine whether
       // the output of the ALU should be written back
       // to memory or register bank or discarded and signals are set to
       // a value or don't care appropriately
      begin
      end
    default: // All signals set to 'don't care'
      begin
        MEMWR = 'bx; MEMOE = 'bx;
        FETCH = 'bx; EXE = 'bx; LDREG = 'bx;
        LDCC = 'bx; IMMED = 'bx; BR = 'bx; SALUD = 'bx;
        SRPA = 3'bxxx; SRPB = 3'bxxx; SWP = 3'bxxx;
      end
  endcase
end
function Testbranch; //returns ‘1’ if branch taken, ‘0’ otherwise
input [11:8] BRANCH_INSTR;
input [3:0] CC;
reg N, Z, V, C;
begin
{N,Z,V,C} = CC;
case (BRANCH_INSTR)
0 : Testbranch = 1 ; // BAL
1 : Testbranch = 0 ; // BNV
2 : Testbranch = ~(C | Z); // BHI
3 : Testbranch = C | Z ; // BLS
4 : Testbranch = ~C ; // BCC
5 : Testbranch = C ; // BCS
6 : Testbranch = ~Z ; // BNE
7 : Testbranch = Z ; // BEQ
8 : Testbranch = ~V ; // BVC
9 : Testbranch = V ; // BVS
10 : Testbranch = ~N ; // BPL
11 : Testbranch = N ; // BMI
12 : Testbranch = V ~^ N ; // BGE
13 : Testbranch = V ^ N ; // BLT
14 : Testbranch =~((V ^ N) | Z) ; // BGT
15 : Testbranch = ((V ^ N) | Z) ; // BLE
endcase
end
endfunction
endmodule
This appendix describes the RTL (Register Transfer Level) design of the processor. It contains an RTL schematic of the processor showing its component parts. The Bus Interface and datapath components have already been designed for you. Your task in Ex. 5 is to complete the Verilog specification of the Control Block. The entire processor design will then be complete and can be simulated with the same test programs as were used for the top-level design.
The schematic on the next page shows an RTL schematic of the STUMP processor. The address for the instruction fetch is kept in the Program Counter (Reg7), which is sent to memory when a fetch is performed. Instructions from memory are clocked into the Instruction Register (INSTR) at the end of the fetch cycle (on the positive clock edge); the Program Counter is also incremented at this time.
Instruction execution is split into two phases. In the execute phase, registers (Reg0 to Reg7) are read on ports A and B of the Register Bank. The port A data is optionally shifted to form the ALU A operand. The ALU B operand is supplied either by port B of the Register Bank or by the immediate data in the instruction register, sign-extended to 16 bits. The ALU operation specified in the instruction is performed and its result is clocked into the Result Register (ALUR) at the end of the execute phase (on the positive clock edge). The condition bits are also clocked into the 4-bit Condition Code Register (CC) at this time if it is enabled.
The writeback phase completes the instruction execution. Usually the ALU result is written back to the Register Bank. Here, the register to be written is specified as a 3-bit address (Reg0 to Reg7) and this register is enabled by the Write signal (LDREG).
Since Reg0 contains zero, writing to Reg0 has no effect. The write occurs at the end of the phase (on the positive clock edge). A branch instruction which is not taken can write to Reg0 instead of Reg7 (or can make the write enable to the Register Bank inactive).
Load and store instructions operate differently in their writeback phase. Here, the ALU result is used as the memory address, MA(15:0). For a load instruction, the memory is read (MEMOE = ‘0’, i.e. active) and the memory output, MDIN(15:0), is placed in the specified register in the Register Bank. For a store instruction, the destination register specified in the instruction is read onto Port A and written to memory (MEMWR = ‘0’) at the address given by the ALU Result Register.
Notes: 1. The clock is applied to all registers at all times but the clock is ignored unless the register is enabled.
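[Figure: RTL schematic of the STUMP processor. Labelled signals and blocks include MDIN, WR_D, incPC, Register Bank ports A and B (A_RD, B_RD), PC, ALUA, ALUB, XIMMED, CCI, CC, ALUR and MA.]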
2. The control signals to the memory are all active low.
3. Signal names have been defined for the Control Block. Please also use these names in your Verilog code.
4. Although Verilog is case sensitive with regard to signal names, other tools in the flow are not. Hence, you should be consistent with the use of upper/lower case in your names and signal names should be unique.
All processor signals to/from memory or the outside world pass through the Bus Interface component. The Bus Interface signals to/from memory or other external devices are shown in Figure C.2 and are described below.
[Figure C.2: Bus Interface signals. Processor side: A_RD<15:0>, MDIN<15:0>, MA<15:0>, MEMOE, MEMWR, CLK, RESET. Memory side: BUS_D<15:0>, BUS_A<15:0>, BUS_RD, BUS_WR, BUS_CLK, GSR.]
BUS INTERFACE SIGNALS TO/FROM MEMORY AND OUTSIDE WORLD
BUS_D(15:0) bidirectional bus between the Bus Interface and the memory
BUS_A(15:0) 16-bit address to memory
BUS_WR active low signal which writes to memory
BUS_RD active low signal which reads from memory
RAM_CS active low signal which enables the memory. It is tied low so the memory is enabled all the time.
BUSCLK clock generated by a Clock Module
GSR global reset
Apart from the clock, the signals above are generated either by the datapath elements or the Control Block. They are listed below:
DATAPATH SIGNALS TO BUS INTERFACE
MA(15:0) 16-bit address to Bus Interface
MDIN(15:0) 16 bits of data from Bus Interface
A_RD(15:0) 16 bits of data to Bus Interface
CONTROL BLOCK SIGNALS TO BUS INTERFACE
MEMWR active-low signal to Bus Interface when memory is to be written
MEMOE active-low signal to Bus Interface when Memory is to be read
BUS INTERFACE TO DATAPATH & CONTROL BLOCK
CLK the clock signal which goes to all flip flops in the processor.
RESET reset signal - high when active
Apart from the RESET signal which is a global signal applied to both the datapath and Control Block prior to operating the processor, the remaining signals to/from the Control Block are internal to the processor. They can be partitioned into signals to and from the datapath, and they are summarised below:
DATAPATH SIGNALS TO CONTROL BLOCK
INSTR[15:2] the most significant 14 bits of the Instruction Register
CC[3:0] the 4-bit Condition Code Register
CONTROL BLOCK SIGNALS TO DATAPATH
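BR indicates a branch instruction. It is used by the Sign Extender element: when BR is high, the least significant 8 bits of the instruction (the branch offset) are sign-extended; when BR is low, the least significant 5 bits (the immediate value) are sign-extended.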
FETCH indicates an instruction fetch is occurring
EXE indicates the Execute phase of operation, i.e. data is read from the Register Bank, operated upon and the result placed in the Result Register.
SRPA[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port A of the Register Bank
SRPB[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port B of the Register Bank
SWP[2:0] 3-bit address specifying the register in the Register Bank (Reg0 to Reg7) to be written to
IMMED selects the sign-extended immediate operand rather than the Register Bank port B operand as the B-input to the ALU
LDCC enables the loading of the 4-bit Condition Code Register
LDREG enables writing to a register in the Register Bank
SALUD (Select ALU Data) selects data from the Result Register for writing into the Register Bank. If SALUD is low, data from the memory (MDIN[15:0]) is selected instead.
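The BR description above can be sketched as a small multiplexer in front of the ALU B input (XIMMED in the schematic). This assumes the Sign Extender sees instruction bits 7:0; the module and port names are hypothetical.

```verilog
// Hypothetical sketch of the Sign Extender controlled by BR.
module stump_sign_extend_sketch (
  input  wire [7:0]  instr_lo,  // instruction bits 7:0
  input  wire        br,        // high for branch instructions
  output wire [15:0] ximmed     // sign-extended operand for ALU input B
);
  assign ximmed = br ? {{8{instr_lo[7]}},  instr_lo}       // 8-bit branch offset
                     : {{11{instr_lo[4]}}, instr_lo[4:0]}; // 5-bit immediate
endmodule
```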
Name:
Assembler Code Sequence :
Initial Register States:
R0[15:0]=’0000000000000000’=0x0000=0
R1[15:0]=
R2[15:0]=
R3[15:0]=
R4[15:0]=
R5[15:0]=
R6[15:0]=
R7[15:0]=
CC[3:0]=
Memory Address of first instruction in sequence=
Assembler Instruction:
Memory Address of Instruction
Instruction Register
R0
R1
R2
R3
R4
R5
R6
R7 = PC
CC
Result Register for a Store Instruction Data Written
(decimal) binary hex
0x
0x
0x
0x
0x
0x
0x
0x
0x
0x
0x
0x
Memory Address
(decimal) decimal
(for your use in Exercise 5)
Phase 1 : Instruction Fetch

Signal        ALU Operation   Branch   Load   Store
FETCH
EXE
LDREG
SRPA[2:0]
SRPB[2:0]
SWP[2:0]
BR
IMMED
LDCC
SALUD
MEMOE
MEMWR

Phase 2 : Decode/Execute

Columns:
• (a) ALU Operation: Reg op Reg ⇒ Reg
• (b) ALU Operation: Reg op Immed ⇒ Reg
• (c) Branch: PC + Immed ⇒ PC
• (d) Load: Reg + Reg ⇒ Addr; [Addr] ⇒ Reg
• (e) Load: Reg + Immed ⇒ Addr; [Addr] ⇒ Reg
• (f) Store: Reg + Reg ⇒ Addr; Reg ⇒ [Addr]
• (g) Store: Reg + Immed ⇒ Addr; Reg ⇒ [Addr]

Signal        (a)   (b)   (c)   (d)   (e)   (f)   (g)
FETCH
EXE
LDREG
SRPA[2:0]
SRPB[2:0]
SWP[2:0]
BR
IMMED
LDCC
SALUD
MEMOE
MEMWR

Phase 3 : Writeback

Columns (a) to (g) are as for Phase 2.

Signal        (a)   (b)   (c)   (d)   (e)   (f)   (g)
FETCH
EXE
LDREG
SRPA[2:0]
SRPB[2:0]
SWP[2:0]
BR
IMMED
LDCC
SALUD
MEMOE
MEMWR

Note: in Signal Usage charts, use δ for ‘don’t care’.
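The column headings of the Signal Usage charts summarise the register transfer each instruction class performs. The following Python sketch restates those transfers executably, which can help when predicting register and memory values for the trace forms. It is an illustration under stated assumptions, not the laboratory's specification: registers and addresses are taken as 16 bits wide, memory is a dictionary of 16-bit words, R7 serves as the PC, branches are treated as unconditional (condition codes are not modelled), and no PC increment during Instruction Fetch is modelled. All function names are hypothetical.

```python
import operator

MASK16 = 0xFFFF  # assumption: 16-bit registers; results and addresses wrap to 16 bits
PC = 7           # R7 doubles as the program counter (R7 = PC)


def alu_reg_reg(regs, op, rd, ra, rb):
    """Reg op Reg => Reg"""
    regs[rd] = op(regs[ra], regs[rb]) & MASK16


def alu_reg_imm(regs, op, rd, ra, imm):
    """Reg op Immed => Reg"""
    regs[rd] = op(regs[ra], imm) & MASK16


def branch(regs, imm):
    """PC + Immed => PC (taken unconditionally here; CC is not modelled)"""
    regs[PC] = (regs[PC] + imm) & MASK16


def load(regs, mem, rd, ra, offset):
    """Reg + Reg (or Reg + Immed) => Addr; [Addr] => Reg

    Pass regs[rb] as offset for the Reg + Reg form, or an immediate otherwise.
    Unwritten memory locations read as 0 in this model.
    """
    addr = (regs[ra] + offset) & MASK16
    regs[rd] = mem.get(addr, 0)


def store(regs, mem, rs, ra, offset):
    """Reg + Reg (or Reg + Immed) => Addr; Reg => [Addr]"""
    addr = (regs[ra] + offset) & MASK16
    mem[addr] = regs[rs]


if __name__ == "__main__":
    regs = [0] * 8
    regs[1], regs[2] = 5, 7
    mem = {}
    alu_reg_reg(regs, operator.add, 3, 1, 2)  # R3 = R1 + R2
    store(regs, mem, 3, 1, regs[2])           # [R1 + R2] = R3
    load(regs, mem, 4, 1, 7)                  # R4 = [R1 + 7]
    print(regs[3], mem, regs[4])              # 12 {12: 12} 12
```

Passing the second operand as a plain `offset` lets one function cover both the Reg + Reg and Reg + Immed columns, mirroring how the charts pair them.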
Name:
Phase 1 : Instruction Fetch

Signal        ALU Operation   Branch   Load   Store
FETCH
EXE
LDREG
SRPA[2:0]
SRPB[2:0]
SWP[2:0]
BR
IMMED
LDCC
SALUD
MEMOE
MEMWR

Phase 2 : Decode/Execute

Columns:
• (a) ALU Operation: Reg op Reg ⇒ Reg
• (b) ALU Operation: Reg op Immed ⇒ Reg
• (c) Branch: PC + Immed ⇒ PC
• (d) Load: Reg + Reg ⇒ Addr; [Addr] ⇒ Reg
• (e) Load: Reg + Immed ⇒ Addr; [Addr] ⇒ Reg
• (f) Store: Reg + Reg ⇒ Addr; Reg ⇒ [Addr]
• (g) Store: Reg + Immed ⇒ Addr; Reg ⇒ [Addr]

Signal        (a)   (b)   (c)   (d)   (e)   (f)   (g)
FETCH
EXE
LDREG
SRPA[2:0]
SRPB[2:0]
SWP[2:0]
BR
IMMED
LDCC
SALUD
MEMOE
MEMWR

Phase 3 : Writeback

Columns (a) to (g) are as for Phase 2.

Signal        (a)   (b)   (c)   (d)   (e)   (f)   (g)
FETCH
EXE
LDREG
SRPA[2:0]
SRPB[2:0]
SWP[2:0]
BR
IMMED
LDCC
SALUD
MEMOE
MEMWR

Note: in Signal Usage charts, use δ for ‘don’t care’.
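The δ (don't care) convention matters when a completed chart is compared with the control signals observed in simulation: a δ entry accepts any value, while every other entry must match exactly. The Python sketch below shows that comparison for one chart column. The function name `mismatches` and the representation of signal values as bit strings (for example "1", or "010" for SRPA[2:0]) are assumptions for illustration; the signal names are those listed in the charts.

```python
DONT_CARE = "δ"  # chart notation for 'don't care'

SIGNALS = ("FETCH", "EXE", "LDREG", "SRPA[2:0]", "SRPB[2:0]", "SWP[2:0]",
           "BR", "IMMED", "LDCC", "SALUD", "MEMOE", "MEMWR")


def mismatches(chart_entry: dict, observed: dict) -> list:
    """Return, in chart order, the signals whose observed value contradicts the chart.

    chart_entry -- one column of a Signal Usage chart: signal name -> bit string
                   such as "1" or "010", or DONT_CARE
    observed    -- signal name -> bit string seen in simulation
    Signals missing from chart_entry are treated as don't care.
    """
    bad = []
    for signal in SIGNALS:
        expected = chart_entry.get(signal, DONT_CARE)
        if expected == DONT_CARE:
            continue  # any observed value is acceptable
        if observed.get(signal) != expected:
            bad.append(signal)
    return bad
```

Treating signals absent from an entry as don't care keeps a partially filled chart usable; a stricter checker could report them instead.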