Action Systems in Pipelined
Processor Design
J. Plosila
University of Turku, Department of Applied Physics,
Laboratory of Electronics and Information Technology,
FIN-20014 Turku, Finland,
tel: +358-2-333 6657,
fax: +358-2-333 5070,
email: juplos@utu.fi
K. Sere
Åbo Akademi University, Department of Computer Science,
FIN-20520 Turku, Finland,
tel: +358-2-265 4537,
fax: +358-2-265 4732,
email: Kaisa.Sere@abo.fi
Turku Centre for Computer Science
TUCS Technical Report No 54
October 1996
ISBN 951-650-858-8
ISSN 1239-1891
Abstract
We show that the action systems framework combined with the refinement calculus is a powerful method for handling a central problem in hardware design, the design of pipelines. We present a methodology for developing asynchronous pipelined microprocessors relying on this framework. Each functional unit of the processor is brought about stepwise, which leads to a structured and modular design. The handling of different hazard situations is realized when verifying refinement steps. Our design is carried out with circuit implementation using speed-independent techniques in mind.
Keywords: action systems, refinement, microprocessors, pipelines, asynchronous circuits
TUCS Research Group
Programming Methodology Research Group
1 Introduction
The design of pipelines is an important and complicated basic activity in hardware design [17, 14]. We show how the action systems framework combined with the refinement calculus is used to design an asynchronous pipelined microprocessor. Action systems have proved to be very well suited to the design of parallel and distributed systems [2, 4, 3]. They are similar to UNITY programs [6], which have an associated temporal logic. The design of and reasoning about action systems is carried out within the refinement calculus, which is based on the use of predicate transformers. The refinement calculus for sequential programs has been studied by several researchers [1, 12, 13].
Our starting point is a conventional sequential program describing the behaviour of the processor. Via a number of refinement steps we end up with a non-trivial pipelined processor where several components can work concurrently. The main refinement technique used is atomicity refinement [4], where the granularity of an action is changed. We also utilize the idea of decomposing an action system into a number of systems working in parallel and communicating asynchronously via channels. Compared to our previous work [3] we have here given a lower-level derivation of the pipeline. Furthermore, we here present ways to handle pipeline hazards and stalling within the framework.
We show how a realistic pipelined processor emerges step by step. At each step we concentrate on the design of one functional unit of the processor. The steps follow a very clear pattern and are easily mechanizable. The basic strategy in the derivation is to realize five pipeline stages: (1) instruction fetch, (2) instruction decode, (3) execution, (4) memory access, and (5) write back. The parallel operation of these stages is made possible by inserting buffering assignments into the data path, which corresponds to adding storage devices, i.e., registers, between the stages. The decomposition process does not directly take a stand on whether the data path should be realized using the single-rail bundled data convention or the delay-insensitive dual-rail code [8]. Both approaches are possible in principle. However, the style in which the control and data paths are here separated from each other is characteristic of the speed-independent bundled data approach.
Asynchronous microprocessors have been modelled and designed by several research groups [11, 7, 10, 16]. What makes our approach different is the use of refinement calculus-based transformation rules. We have identified the relevant language constructs and refinement rules for pipeline design. We demonstrate that the same design method can be used for extracting the functional components, dividing the control and data paths, and taking care of the hazard situations. Only a small, limited set of transformation rules and refinement techniques is necessary. The main advantage we gain is that in addition to having a uniform formalism to rely on during the design, each design step can be formally verified correct within the refinement calculus. This can be done at every design phase, not only in the high-level derivation, but also in transforming a program into an implementable gate-level form containing boolean variables only [15]. Furthermore, we believe that the derivation presented here is useful in understanding the ideas behind pipelined processors and the difficulties of managing different hazard situations.
We proceed as follows. In section 2 we present the action systems framework together with the needed refinement calculus concepts. The design methodology is sketched in section 3. The design of the pipelined processor is presented in sections 4-7. We end with some concluding remarks in section 8.
2 Action systems
Actions An action is a guarded command of the form g → S where g is a boolean condition, the guard, and S, the body, is any statement in the guarded command language [9]. This language includes assignment, sequential composition, conditional choice and iteration, and is defined using weakest precondition predicate transformers. The action A is said to be enabled when the guard is true. The guard of an action A is denoted by gA and the body by sA.
We also use the following constructs:
Choice: The action A1 [] A2 tries to choose an enabled action from A1 and A2, the choice being nondeterministic when both are enabled.
Sequencing: The sequential composition A1; A2 of two actions first behaves as A1 if this is enabled, then as A2 if this is enabled at the termination of A1. Sequencing forces A1 and A2 to be exclusive. The scope of the sequential composition is indicated with parentheses, for example A; (B [] C).
Action systems An action system has the form:

A = |[ var x; I; do A1 [] ... [] Am od ]| : z.

The action system A is initialised by the action I. Then, repeatedly, an enabled action from A1, ..., Am is nondeterministically selected and executed. The action system terminates when no action is enabled.
The local variables of A are the variables x and the global variables of A are the variables z. The local and global variables are assumed to be distinct. The state variables of A consist of the local variables and the global variables. The actions are allowed to refer to all the state variables of an action system.
The actions are taken to be atomic. Therefore two actions that do not have any variables in common can be executed in any order, or simultaneously. Hence, we can model parallel programs with action systems taking the view of interleaving action activations.
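The execution model described here (repeatedly pick any enabled action until none is enabled) can be sketched as a small interpreter. This is an illustrative sketch only; the function `run`, the state dictionary, and the example actions are hypothetical names, not notation from the paper.

```python
import random

def run(state, actions, rng=random.Random(0)):
    """Execute an action system: actions is a list of (guard, body) pairs.
    Repeatedly pick an enabled action nondeterministically; terminate when
    no guard holds, mirroring the do ... od loop semantics."""
    while True:
        enabled = [body for guard, body in actions if guard(state)]
        if not enabled:
            return state
        rng.choice(enabled)(state)

# Two independent counters: since the two actions share no variables,
# every interleaving reaches the same final state.
system = [
    (lambda s: s["x"] < 3, lambda s: s.update(x=s["x"] + 1)),
    (lambda s: s["y"] < 2, lambda s: s.update(y=s["y"] + 1)),
]
final = run({"x": 0, "y": 0}, system)
# final == {"x": 3, "y": 2}
```

The seeded random generator makes a run reproducible while still exercising the nondeterministic choice between enabled actions.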
Parallel composition Consider two action systems A and B,

A = |[ var x; I; do A1 [] ... [] Am od ]| : z
B = |[ var y; J; do B1 [] ... [] Bn od ]| : v,

where x ∩ y = ∅. We define the parallel composition A ∥ B of A and B to be the action system

C = |[ var x, y; I; J; do A1 [] ... [] Am [] B1 [] ... [] Bn od ]| : z, v.

Thus, parallel composition will combine the state spaces of the two constituent action systems, merging the global variables and keeping the local variables distinct.
The behaviour of a parallel composition of action systems is dependent on how the individual
action systems, the reactive components, interact with each other via the global variables that
are referenced in both components. We have for instance that a reactive component does not
terminate by itself: termination is a global property of the composed action system.
Hiding and revealing variables In the sequel we will explicitly denote with a bullet notation the action system where the global variables are declared, as follows: The action system v • A : z accesses the global variables v and z. The variables v are declared within A, whereas the variables z are declared in some reactive component of A. We can hide some of the variables v by removing them from the list v. Hiding the variables makes them inaccessible from other actions outside A in a parallel composition. Hiding thus has an effect only on the global variables declared in A. The opposite operation, revealing, is also useful.
In connection with the parallel composition below we will use the following convention. Let a1 • A : a2 and b1 • B : b2 be two action systems. Then their parallel composition is the action system a1, b1 • A ∥ B : a2, b2. Sometimes there is no need to reveal all these identifiers, i.e., when they are exclusively accessed by the two component action systems A and B. This effect is achieved with the following construct, which turns out to be extremely useful later: v • |[ A ∥ B ]| : c. Here the identifiers v are a subset of a1, b1. In the sequel we will often omit the variable lists v and c when they are clear from the context.
Refinement Action systems are intended to be developed in a stepwise manner within the refinement calculus. In the processor derivation, atomicity refinement is used as a main tool. Here we briefly describe these techniques.
The refinement calculus is based on the following definition. Let S, S′ be statements. The statement S is refined by the statement S′, denoted S ≤ S′, if

∀Q. (wp(S, Q) ⇒ wp(S′, Q)).

Refinement between actions is defined similarly, where the weakest precondition for an action A is defined as wp(A, Q) ≜ gA ⇒ wp(sA, Q). This usual refinement relation is reflexive and transitive. It is also monotonic with respect to most of the action constructors used here, e.g. choice and sequencing, see [5]. (Refinement between actions does not necessarily imply refinement between action systems.)
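Over a finite state space, the refinement condition ∀Q. (wp(S, Q) ⇒ wp(S′, Q)) can be checked exhaustively by enumerating all postconditions. The following sketch is an illustrative finite-state check, not the paper's definitions: the names `wp` and `refines`, and the demonic set-valued model of statements, are assumptions.

```python
from itertools import chain, combinations

STATES = range(4)  # a deliberately tiny state space

def wp(step, post):
    # step models a statement as state -> set of possible next states;
    # wp(S, Q) holds in s when every outcome of S from s lands in Q.
    return lambda s: all(t in post for t in step(s))

def refines(S, T):
    # S is refined by T iff wp(S, Q) implies wp(T, Q) for every postcondition Q.
    subsets = chain.from_iterable(
        combinations(STATES, r) for r in range(len(STATES) + 1))
    return all(wp(T, set(Q))(s)
               for Q in subsets
               for s in STATES
               if wp(S, set(Q))(s))

S = lambda s: {(s + 1) % 4, (s + 2) % 4}  # nondeterministic increment
T = lambda s: {(s + 1) % 4}               # deterministic increment
# refines(S, T) is True: T resolves S's nondeterminism.
# refines(T, S) is False: S allows outcomes T did not permit.
```

Enumerating all 2^|STATES| postconditions is obviously only feasible for toy state spaces; it serves here to make the quantifier over Q concrete.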
Properties of actions We define some properties of actions that will be useful in describing the atomicity refinement rule for action systems.
A predicate I is invariant over an action A if I ⇒ wp(A, I) holds. The way in which actions can enable and disable each other is captured by the following definitions:

A cannot enable B  ≜  ¬gB ⇒ wp(A, ¬gB)
A cannot disable B ≜  gB ⇒ wp(A, gB)
A excludes B       ≜  gA ⇒ ¬gB.

Another important set of properties has to do with commutativity of actions. We say that A commutes with B if A; B ≤ B; A. A sufficient condition for two actions A and B to commute is that there are no read-write conflicts for the variables that they access: none of the variables written by A is read or written by B, and vice versa.
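The sufficient condition for commuting can be phrased directly on read and write sets. A small sketch follows; the `Action` record and the example variable names are illustrative assumptions, not the paper's notation.

```python
from dataclasses import dataclass

@dataclass
class Action:
    reads: set
    writes: set

def surely_commute(a, b):
    # No read-write conflicts: neither action writes a variable that the
    # other reads or writes, so both execution orders give the same state.
    return (not (a.writes & (b.reads | b.writes))
            and not (b.writes & (a.reads | a.writes)))

A = Action(reads={"x"}, writes={"y"})
B = Action(reads={"z"}, writes={"z"})
C = Action(reads={"y"}, writes={"x"})
# A and B touch disjoint variables, so they surely commute;
# A writes y, which C reads, so the condition fails for A and C.
```

Note that the condition is only sufficient: actions with conflicts may still happen to commute semantically, which is why the paper's proofs argue at the level of wp rather than variable sets.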
Atomicity refinement The refinement calculus can be used to derive special purpose transformation rules to be used within program development. The following theorem expresses one such rule that will be applied repeatedly in our design. It is the so-called atomicity refinement rule, which is a powerful tool when developing parallel and distributed systems [4]. It gives the conditions under which the atomicity of an action can be changed by transforming one big atomic action (b0 → S0; do A1 od) into a number of smaller actions (b0 → S0 and A1).

Theorem 1 {Q} do (b0 → S0; do A1 od) [] L [] R od ≤ do (b0 → S0) [] A1 [] L [] R od

provided that
(i) Q ⇒ ¬gA1,
(ii) {L, R} cannot enable or disable A1,
(iii) A0 = b0 → S0 is excluded by A1, and
(iv) the actions in {L, R} that are not excluded by {A0, A1} are such that
  (a) for each i = 0, 1, L commutes with Ai,
  (b) A1 commutes with R,
  (c) L commutes with R, and
  (d) do R od always terminates.
Observe that above both A1, L and R can model a nondeterministic choice of actions. Furthermore, when the rule is applied to an action system A in parallel composition with another action system B, the actions L and R above might also come from the system B. Hence, we need to consider the entire system |[ A ∥ B ]|.
The atomicity refinement rule is very general. It can, however, be used to derive more specialized rules, dedicated towards circuit design. Below we give one such rule:
Corollary 1 {Q} do (b0 → S0; do A1; A2 od) [] L [] R od ≤ do ((b0 → S0); A1; A2) [] L [] R od

provided that
(i) Q ⇒ ¬gA1, and
(ii) the actions in {L, R} that are not excluded by {A0 = b0 → S0, A1, A2} are such that
  (a) for each i = 0..2, L commutes with Ai,
  (b) for each i = 1..2, Ai commutes with R,
  (c) L commutes with R, and
  (d) do R od always terminates.
3 Initial specification and the design approach
The approach we use in our design is as follows. We start from an initial specification that describes the behavior of the microprocessor as one big action that is continuously executed. This system is stepwise refined into a number of smaller, atomic actions, and the single system is decomposed into a number of reactive components modelling the different functional parts of the microprocessor. At a design step one functional component is identified and extracted from the rest of the design. The following refinements are carried out:
Buffering assignment: The state of the relevant data variables is copied to the new component: x := f(y) ≤ |[ var y′; y′ := y; x := f(y′) ]|.
Communication channel: A communication channel is introduced between the component and the rest of the design. Here a channel is a variable c which in the simplest case has two values: the value req ("request") is assigned by the active system and the value ack ("acknowledgement") by the passive system, respectively. The initial state of such a channel is ack. The sequential composition operator `;' is used as a powerful tool for separating the requesting action from the action that waits for the acknowledgement.
Atomicity refinement: The action system is brought into a form where the atomicity refinement theorem can be applied, and thereafter the rule is applied. When applying the theorem and proving it correct, the different hazard situations arise as actions that do not commute as required by the theorem.
Decomposition: The system is decomposed using the definition of parallel composition of action systems, making the necessary adjustments for revealing variables.
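The req/ack channel discipline of the communication-channel step can be sketched as two guarded actions sharing one channel variable. This is an illustrative sketch: the counters and the bound of three requests are assumptions added for the example, not part of the paper's protocol.

```python
def active(s):
    # Active side: issue a new request once the previous one is acknowledged.
    if s["c"] == "ack" and s["sent"] < 3:
        s["c"], s["sent"] = "req", s["sent"] + 1
        return True
    return False

def passive(s):
    # Passive side: serve a pending request and acknowledge it.
    if s["c"] == "req":
        s["c"], s["served"] = "ack", s["served"] + 1
        return True
    return False

chan = {"c": "ack", "sent": 0, "served": 0}  # a channel starts in state ack
while active(chan) or passive(chan):         # run until neither action is enabled
    pass
# chan == {"c": "ack", "sent": 3, "served": 3}
```

Because the guards on the channel value are mutually exclusive, the two sides alternate strictly: every request is acknowledged before the next one is issued.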
In the following derivation the atomic entities, actions, are enclosed in the angle brackets `< >' in order to make reading of the programs easier. The initial description of the microprocessor is given as

P :: imem, dmem •
|[ var i, pc, imem[0..l], dmem[0..m];
   pc := pc0;
   do < true → i := imem[pc];
        if i.t = R → pc, reg[i.c] := pc + 1, aluf(reg[i.a], reg[i.b], i.f)
        [] i.t = ld → pc, reg[i.c] := pc + 1, dmem[reg[i.a] + i.offs]
        [] i.t = st → pc, dmem[reg[i.a] + i.offs] := pc + 1, reg[i.b]
        [] i.t = be →
             if reg[i.a] ≠ reg[i.b] → pc := pc + 1
             [] reg[i.a] = reg[i.b] → pc := pc + i.offs
             fi
        fi >
   od
]|

where the variable i contains an instruction fetched from the instruction memory imem. The variable pc is the program counter of the processor, pointing to the instruction to be fetched. An instruction is a record of six fields: i = (t, a, b, c, f, offs). The field t identifies the type of the instruction: R denotes an arithmetic-logical or R operation, specified in the field f which is a parameter of the function aluf; ld is a load operation from the data memory dmem to the register bank reg; st is a store operation from the register bank to the data memory; and, finally, be denotes a "branch when equal" operation. The fields a and b contain the numbers of the read registers needed by the instruction, and c the number of the write register, respectively. The offset field offs contains the relative address used by the load, store and branch instructions.
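The initial specification P can also be read as a sequential reference model. The sketch below is illustrative only: the `Instr` record, the two ALU functions, and the explicit step bound (the paper's loop runs forever) are assumptions made for the example.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "t a b c f offs")

def aluf(x, y, f):
    return {"add": x + y, "sub": x - y}[f]  # illustrative ALU functions

def run(imem, dmem, reg, steps):
    pc = 0
    for _ in range(steps):
        i = imem[pc]                  # instruction fetch
        if i.t == "R":                # arithmetic-logical operation
            pc, reg[i.c] = pc + 1, aluf(reg[i.a], reg[i.b], i.f)
        elif i.t == "ld":             # load from data memory
            pc, reg[i.c] = pc + 1, dmem[reg[i.a] + i.offs]
        elif i.t == "st":             # store to data memory
            pc, dmem[reg[i.a] + i.offs] = pc + 1, reg[i.b]
        elif i.t == "be":             # branch when equal
            pc = pc + i.offs if reg[i.a] == reg[i.b] else pc + 1
    return pc

# reg[2] := reg[0] + reg[1], then store reg[2] at dmem[reg[3] + 0]
prog = [Instr("R", 0, 1, 2, "add", 0), Instr("st", 3, 2, 0, None, 0)]
reg, dmem = [1, 2, 0, 4], [0] * 8
pc = run(prog, dmem, reg, 2)
# reg[2] == 3, dmem[4] == 3, pc == 2
```

Each branch of the `if` mirrors one guarded alternative of the big atomic action; the pipelined derivation that follows must preserve exactly this observable behaviour.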
4 Instruction fetching and decoding
Extracting the fetch and program counter units We start the derivation by separating the fetch unit F and its close partner, the program counter unit Pc, from the initial system P. In this case, auxiliary buffering assignments are not needed, because the variables i and pc can be regarded as buffers. Let us consider the fetch unit more closely.
We introduce an auxiliary variable fd with the values req and ack by refining the system P into

P1 :: imem, dmem •
|[ var fd, i, pc, imem[0..l], dmem[0..m];
   fd, pc := ack, pc0;
   do < fd = ack → i, fd := imem[pc], req; if as before; fd := ack > od
]|

This is a trivial transformation, because at this point the variable fd is, even though it will later model a communication channel, only an internal variable.
We bring the system P1 into the form P2, where the atomicity refinement theorem can be applied:

P2 :: imem, dmem •
|[ var fd, i, pc, imem[0..l], dmem[0..m];
   fd, pc := ack, pc0;
   do < fd = ack → i, fd := imem[pc], req; do fd = req → if as before; fd := ack od > od
]|
The above operation is only a syntactic trick: the systems P1 and P2 are actually equivalent. Referring to Theorem 1, we have that Q, A0, and A1 correspond to (fd = ack ∧ pc = pc0), < fd = ack → i, fd := imem[pc], req >, and < fd = req → if >, respectively. The actions L and R do not exist in this case. Hence we only have to check the conditions (i) and (iii) of Theorem 1:

(i) (fd = ack ∧ pc = pc0 ⇒ ¬(fd = req)) ≡ true
(iii) (fd = req ⇒ ¬(fd = ack)) ≡ true

Hence, according to the theorem, we can refine the single action of P2 into two atomic actions, yielding the system

P3 :: imem, dmem •
|[ var fd, i, pc, imem[0..l], dmem[0..m];
   fd, pc := ack, pc0;
   do < fd = ack → i, fd := imem[pc], req > [] < fd = req → if as before; fd := ack > od
]|
The last step is quite straightforward: the above system P3 is decomposed into two separate systems by the definition of parallel composition. In other words, P3 ≤ |[ F ∥ P¹ ]|, where

F :: fd, i, imem •
|[ var i, fd, imem[0..l];
   fd := ack;
   do < fd = ack → i, fd := imem[pc], req > od
]| : pc

P¹ :: dmem •
|[ var pc, dmem[0..m];
   pc := pc0;
   do < fd = req → if as before; fd := ack > od
]| : fd, i
The program counter unit Pc is extracted, beginning from the above system P¹, basically in a similar manner. We have that P¹ ≤ |[ Pc ∥ P² ]|, where

Pc :: pc •
|[ var pc;
   pc := pc0;
   do < p = inc → pc, p := pc + 1, ack > [] < p = load → pc, p := iaddr, ack > od
]| : p, iaddr

P² :: p, iaddr, dmem •
|[ var p, reg[0..k], iaddr, dmem[0..m];
   p := ack;
   do < fd = req →
        if i.t = R → p, reg[i.c] := inc, aluf(reg[i.a], reg[i.b], i.f)
        [] i.t = ld → p, reg[i.c] := inc, dmem[reg[i.a] + i.offs]
        [] i.t = st → p, dmem[reg[i.a] + i.offs] := inc, reg[i.b]
        [] i.t = be →
             if reg[i.a] ≠ reg[i.b] → p := inc
             [] reg[i.a] = reg[i.b] → iaddr, p := pc + i.offs, load
             fi
        fi >;
      < p = ack → fd := ack >
   od
]| : fd, i, pc

The case is now somewhat more complex, because the single action of P¹ is actually divided into three parts instead of two as in the previous refinement. However, the same atomicity refinement theorem can be applied. The required channel variable is now p, which has three values: the requests inc (increment the counter) and load (load the counter), and an acknowledgement ack. Observe that in P² the sending of a request and the receipt of the corresponding acknowledgement take place in two separate actions, separated by the semicolon operator.
Extracting the decode unit Next we separate the decode unit D from P². The job of this system is to separate the fields of the incoming instruction and to read the needed registers. By introducing the channel dx we refine, using the same approach as above, P² ≤ |[ D ∥ P³ ]|, where

D :: dx, p, t, c, f, offs, A, B, pc′ •
|[ var i′, dx, p, t, a, b, c, offs, f, A, B, pc′;
   p, dx := ack, ack;
   do < fd = req ∧ dx = ack → i′, pc′ := i, pc >;
      ((< i′.t ≠ be →
           if i′.t = R → t, a, b, c, f := i′.t, i′.a, i′.b, i′.c, i′.f; A, B := reg[a], reg[b]
           [] i′.t = ld → t, a, c, offs := i′.t, i′.a, i′.c, i′.offs; A := reg[a]
           [] i′.t = st → t, a, b, offs := i′.t, i′.a, i′.b, i′.offs; A, B := reg[a], reg[b]
           fi;
           p := inc >;
         < p = ack → dx, fd := req, ack >)
       [] (< i′.t = be → t, a, b, offs := i′.t, i′.a, i′.b, i′.offs; A, B, dx := reg[a], reg[b], req >;
           < dx = ack → fd := ack >))
   od
]| : fd, i, reg, pc
P³ :: iaddr, reg, dmem •
|[ var reg[0..k], iaddr, dmem[0..m];
   do < dx = req ∧ t ≠ be →
        if t = R → reg[c] := aluf(A, B, f)
        [] t = ld → reg[c] := dmem[A + offs]
        [] t = st → dmem[A + offs] := B
        fi;
        dx := ack >
      [] (< dx = req ∧ t = be →
           if A ≠ B → p := inc
           [] A = B → iaddr, p := pc′ + offs, load
           fi >;
         < p = ack → dx := ack >)
   od
]| : dx, p, t, c, f, offs, A, B, pc′
Note that now we have used a buffering assignment, where the values of the variables i and pc are copied into the local variables i′ and pc′, respectively. The effect is that the systems F and P³ can execute concurrently, if i′.t ≠ be. Furthermore, communication through the channel p is distributed to both of the above systems, because in the case of the branch instruction we must postpone the program counter update until the comparison between the registers reg[a] and reg[b] has been completed. In the case of the R, load, or store instruction we can update the counter earlier. This distribution of the channel p is valid since the p-communications in the systems are mutually exclusive.
5 Execution and hazard handling
Extracting the execution unit The execution unit X, given below, contains the ALU functions and the memory address computations. Observe that the jump address (iaddr) computation skips the buffering assignment and uses the old variables t, A, B and offs instead of the new ones t′, A′, B′, and offs′. This arrangement makes the execution of the branch instruction more efficient in practice. The trick in question is possible because a new instruction is not fetched before the branch instruction has been completed.
The system X is obtained from the system P³ by introducing the new channel xm and refining |[ D ∥ P³ ]| ≤ |[ D ∥ X ∥ P⁴ ]| with

X :: xm, t′, iaddr, daddr, c′, C1 •
|[ var xm, iaddr, daddr, t′, c′, offs′, f′, A′, B′, C1;
   xm := ack;
   do < dx = req ∧ t ≠ be ∧ xm = ack →
        if t = R → t′, c′, f′, A′, B′ := t, c, f, A, B; C1 := aluf(A, B, f)
        [] t = ld → t′, c′, offs′, A′ := t, c, offs, A; daddr := A + offs
        [] t = st → t′, offs′, A′, B′ := t, offs, A, B; daddr := A + offs
        fi;
        xm, dx := req, ack >
      [] (< dx = req ∧ t = be →
           if A ≠ B → p := inc
           [] A = B → iaddr, p := pc′ + offs, load
           fi >;
         < p = ack → dx := ack >)
   od
]| : dx, p, t, c, f, offs, A, B, pc′
P⁴ :: reg, dmem •
|[ var reg[0..k], dmem[0..m];
   do < xm = req →
        if t′ = R → reg[c′] := C1
        [] t′ = ld → reg[c′] := dmem[daddr]
        [] t′ = st → dmem[daddr] := B′
        fi;
        xm := ack >
   od
]| : xm, t′, c′, B′, C1, daddr
Application of the atomicity refinement theorem reveals that the action in P⁴ and a corresponding L-action in the system D, a register read operation in fact, do not exclude each other, but they do not commute. This violates condition (iv)(a) of Theorem 1, which requires that such an L-action commutes with the inner action A1, which in this case is the action of P⁴. Hence, the above refinement is valid only under certain constraints, i.e., if the processor satisfies the invariant

Ihaz ≜ ¬(((i′.t ≠ ld) ∧ haz1) ∨ ((i′.t = ld) ∧ haz2))

where

haz1 ≜ (xm = req) ∧ (t′ = R ∨ t′ = ld) ∧ (i′.a = c′ ∨ i′.b = c′)
haz2 ≜ (xm = req) ∧ (t′ = R ∨ t′ = ld) ∧ (i′.a = c′)

This is justified as follows. The violation occurs when (1) the type of the instruction in the system P⁴, the variable t′, is either R or ld, which indicates a register write operation, and (2) the write register in P⁴, the variable c′, is the same as a read register in D, the variable i′.a or i′.b. In other words, an incoming instruction in D wants to read the register into which the executing instruction in P⁴ is going to write. This read-write conflict is called a pipeline hazard. The above predicate haz1 applies to a case where the incoming instruction is of the R, store, or branch type, because such an instruction needs the two read registers reg[a] and reg[b]. The predicate haz2, in turn, applies to an incoming load instruction, which reads the register reg[a] only.
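The two hazard predicates translate directly into boolean functions over the pipeline state. In this sketch the state layout is an assumption: `t1` and `c1` stand for t′ and c′, and `i1` for the decoded instruction i′.

```python
from types import SimpleNamespace

def haz1(s):
    # R, store and branch instructions read both reg[a] and reg[b]
    return (s.xm == "req" and s.t1 in ("R", "ld")
            and (s.i1.a == s.c1 or s.i1.b == s.c1))

def haz2(s):
    # load instructions read reg[a] only
    return s.xm == "req" and s.t1 in ("R", "ld") and s.i1.a == s.c1

def must_stall(s):
    # The invariant I_haz is violated exactly when the relevant predicate holds.
    return haz2(s) if s.i1.t == "ld" else haz1(s)

# Executing load writes register 3; the incoming R instruction reads register 3.
s = SimpleNamespace(xm="req", t1="ld", c1=3,
                    i1=SimpleNamespace(t="R", a=3, b=4))
# must_stall(s) is True; once the write completes (xm = ack) it becomes False.
```

This is exactly the check that the hazard detection unit of the next subsection performs before acknowledging on the channel hz.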
Adding the hazard detection unit It is not a very practical approach to assume that the system describing the processor always satisfies the above invariant Ihaz. Instead, the decode unit D must deal with a conflict state by postponing the execution of the incoming instruction (stalling the pipeline) until the conflict has been resolved, i.e., until the executing R or load instruction has completed its register write operation. For this we add the hazard or conflict detection unit Haz into the above composition by introducing the channel hz and refining |[ D ∥ X ∥ P⁴ ]| ≤ |[ Haz ∥ D1 ∥ X ∥ P⁴ ]|, where the initial composition satisfies the invariant Ihaz, and

Haz ::
|[ do < hz = req ∧ t ≠ ld ∧ ¬haz1′ → hz := ack >
   [] < hz = req ∧ t = ld ∧ ¬haz2′ → hz := ack >
   od
]| : t, a, b, t′, c′, hz
D1 :: dx, p, t, c, f, offs, A, B, pc′ •
|[ var i′, dx, p, hz, t, a, b, c, offs, f, A, B, pc′;
   p, dx, hz := ack, ack, ack;
   do < fd = req ∧ dx = ack → i′, pc′ := i, pc >;
      ((< i′.t ≠ be →
           if i′.t = R → t, a, b, c, f := i′.t, i′.a, i′.b, i′.c, i′.f
           [] i′.t = ld → t, a, c, offs := i′.t, i′.a, i′.c, i′.offs
           [] i′.t = st → t, a, b, offs := i′.t, i′.a, i′.b, i′.offs
           fi;
           hz := req >;
         < hz = ack → if t ≠ ld → A, B := reg[a], reg[b] [] t = ld → A := reg[a] fi; p := inc >;
         < p = ack → dx, fd := req, ack >)
       [] (< i′.t = be → t, a, b, offs := i′.t, i′.a, i′.b, i′.offs; hz := req >;
           < hz = ack → A, B, dx := reg[a], reg[b], req >;
           < dx = ack → fd := ack >))
   od
]| : fd, i, reg, pc
where haz1′ ≜ haz1[a, b / i′.a, i′.b] and haz2′ ≜ haz2[a / i′.a]. The resulting composition satisfies dx = req ⇒ Ihaz[t, haz1′, haz2′ / i′.t, haz1, haz2], which indicates that there is no conflict present whenever the execution unit X is active. Furthermore, the hazard detection, with the possible waiting period, takes place after the incoming instruction has been decoded in D1, but before the register bank is read and the program counter incremented. This ensures that the right register values are copied to the variables A and B.
6 Memory and register access
Extracting the memory access and write back units The system P⁴ is split into two parts: the memory access unit M and the write back unit W, which communicate via the channel mw. Hence, we refine |[ P⁴ ∥ Haz ]| ≤ |[ M ∥ W ∥ Haz1 ]|, where

M :: mw, t′′, c′′, C1′, C2, dmem •
|[ var mw, t′′, B′′, c′′, C1′, daddr′, C2, dmem[0..m];
   mw := ack;
   do < xm = req ∧ t′ ≠ st ∧ mw = ack →
        if t′ = R → t′′, c′′, C1′ := t′, c′, C1
        [] t′ = ld → t′′, c′′, daddr′ := t′, c′, daddr; C2 := dmem[daddr′]
        fi;
        mw, xm := req, ack >
      [] < xm = req ∧ t′ = st ∧ mw = ack →
        t′′, B′′, daddr′ := t′, B′, daddr; dmem[daddr′], xm := B′′, ack >
   od
]| : xm, t′, B′, c′, daddr
W :: reg •
|[ var reg[0..k], c′′′, C;
   do < mw = req →
        if t′′ = R → c′′′, C := c′′, C1′
        [] t′′ = ld → c′′′, C := c′′, C2
        fi;
        reg[c′′′], mw := C, ack >
   od
]| : mw, t′′, c′′, C1′, C2

Because this atomicity refinement splits the initial register write operation into two atomic actions connected by the channel mw, we have to weaken the hazard conditions into haz1′′ ≜ haz1′ ∨ (mw = req ∧ (a = c′′ ∨ b = c′′)) and haz2′′ ≜ haz2′ ∨ (mw = req ∧ a = c′′). This yields the system Haz1 mentioned in the above refinement relation.
Extracting the register bank The register access operations are separated from the systems D1 and W. As the result, we get the register system Rgs. The register read operations are activated through the channel regr and the write operations through regw, respectively. We refine |[ D1 ∥ W ∥ Haz1 ]| ≤ |[ Rgs ∥ D2 ∥ W1 ∥ Haz2 ]|, where

Rgs :: reg •
|[ var reg[0..k], A, B;
   do < regw = req → reg[c′′′], regw := C, ack >
      [] < regr = req →
           if t ≠ ld → A, B := reg[a], reg[b]
           [] t = ld → A := reg[a]
           fi;
           regr := ack >
   od
]| : regw, regr, c′′′, a, b, C
D2 :: dx, p, t, c, f, offs, A, B, pc′ •
|[ var i′, dx, p, hz, t, a, b, c, offs, f, A, B, pc′;
   p, dx, hz := ack, ack, ack;
   do < fd = req ∧ dx = ack → i′, pc′ := i, pc >;
      ((< i′.t ≠ be →
           if i′.t = R → t, a, b, c, f := i′.t, i′.a, i′.b, i′.c, i′.f
           [] i′.t = ld → t, a, c, offs := i′.t, i′.a, i′.c, i′.offs
           [] i′.t = st → t, a, b, offs := i′.t, i′.a, i′.b, i′.offs
           fi;
           hz := req >;
         < hz = ack → p, regr := inc, req >;
         < p = ack ∧ regr = ack → dx, fd := req, ack >)
       [] (< i′.t = be → t, a, b, offs := i′.t, i′.a, i′.b, i′.offs; hz := req >;
           < hz = ack → regr := req >;
           < regr = ack → dx := req >;
           < dx = ack → fd := ack >))
   od
]| : fd, i, reg, pc
W1 :: regw, c′′′, C •
|[ var regw, C, c′′′;
   regw := ack;
   do < mw = req ∧ regw = ack →
        if t′′ = R → c′′′, C := c′′, C1′
        [] t′′ = ld → c′′′, C := c′′, C2
        fi;
        regw, mw := req, ack >
   od
]| : mw, t′′, c′′, C1′, C2

The atomicity refinement using the channel regw forces us to weaken the hazard conditions into haz1′′′ ≜ haz1′′ ∨ (regw = req ∧ (a = c′′′ ∨ b = c′′′)) and haz2′′′ ≜ haz2′′ ∨ (regw = req ∧ a = c′′′), which gives the mentioned system Haz2.
Extracting the memory blocks As done with the register operations, the memory manipulations are also placed into separate systems. The instruction memory system Im is separated from the fetch unit F, and the data memory system Dm from the memory access unit M, respectively. The system Im needs only a read channel imr, while Dm requires both a read channel dmr and a write channel dmw. We refine |[ F ∥ M ∥ Haz2 ]| ≤ |[ Im ∥ Dm ∥ F1 ∥ M1 ∥ Haz3 ]|, where

Im :: imem, i •
|[ var imem[0..l], i;
   do < imr = req → i, imr := imem[pc], ack > od
]| : imr, pc

Dm :: C2, dmem •
|[ var dmem[0..m], C2;
   do < dmw = req → dmem[daddr′], dmw := B′′, ack >
      [] < dmr = req → C2, dmr := dmem[daddr′], ack >
   od
]| : dmw, dmr, daddr′, B′′

F1 :: fd, imr •
|[ var fd, imr;
   fd, imr := ack, ack;
   do < fd = ack → imr := req >;
      < imr = ack → fd := req >
   od
]|
M1 :: mw, dmw, t′′, B′′, c′′, daddr′, em •
|[ var mw, dmw, t′′, B′′, c′′, daddr′, em;
   mw, dmw, dmr, em := ack, ack, ack, false;
   do (< xm = req ∧ t′ = R ∧ ¬em ∧ dmw = ack →
         t′′, c′′, C1′ := t′, c′, C1; em, mw, xm := true, req, ack >;
       < mw = ack → em := false >)
      [] (< xm = req ∧ t′ = ld ∧ ¬em ∧ dmw = ack →
         t′′, c′′, daddr′ := t′, c′, daddr; em, dmr, xm := true, req, ack >;
       < dmr = ack → mw := req >;
       < mw = ack → em := false >)
      [] < xm = req ∧ t′ = st ∧ ¬em ∧ dmw = ack →
         B′′, daddr′ := B′, daddr; dmw, xm := req, ack >
   od
]| : xm, t′, B′, c′, daddr

In this case, the atomicity of the involved actions is changed in such a way that the hazard conditions haz1′′′ and haz2′′′ cannot be preserved, i.e., weakened enough, without introducing an auxiliary variable. The reason for this is that in the system M1 the assignments xm := ack and mw := req are executed sequentially, in two separate actions, if t′ = ld. The problem is solved by using the boolean variable em in M1 and weakening the hazard conditions into haz1′′′′ ≜ haz1′′′[em/(mw = req)] and haz2′′′′ ≜ haz2′′′[em/(mw = req)]. This update yields the hazard detection system Haz3.
7 Extracting the other data path components
By combining the above refinement steps we have that

P ≤ |[ F1 ∥ D2 ∥ X ∥ M1 ∥ W1 ∥ Pc ∥ Im ∥ Rgs ∥ Dm ∥ Haz3 ]|
The block diagram of this composition is shown in Fig. 1.

Figure 1: Intermediate block diagram of the processor
We complete the decomposition process by separating the four pipeline registers Pr1, ..., Pr4 from the stages. In addition, the decode unit and the execution unit are split into control and data parts. The control systems are D.c and X.c, and the data systems D.d and X.d, respectively. These transformations follow exactly the same principles as the above derivation. To summarize, we perform the refinements

D2 ≤ |[ D.c ∥ D.d ∥ Pr1 ]|
|[ X ∥ Haz3 ]| ≤ |[ X.c ∥ X.d ∥ Pr2 ∥ Haz4 ]|
M1 ≤ |[ M2 ∥ Pr3 ]|
W1 ≤ |[ W2 ∥ Pr4 ]|
As an example, for the execution unit we have that
X :c ::
xm xd r2w ex
j var xm xd r2w ex xm xd r2w ex := ack ack ack false
do (< dx = req ^ :ex ^ t 6= be ! r2w := req >
< r2w = ack ! ex xd dx := true req ack >
< xd = ack ! xm := req >
< xm = ack ! ex := false >)
] (< dx = req ^ t = be ! xd := req >
(< xd = inc ! p := inc >
] < xd = load ! p := load >)
< p = ack ! dx := ack >)
od
]j: dx t
X :d :: iaddr daddr C 1
j var iaddr daddr C 1
do < xd = req !
if t = R ! C 1 xd := aluf(A B f ) ack
] t = ld ! daddr xd := A + o s ack
] t = st ! daddr xd := A + o s ack
] t = be !
if A 6= B ! xd := inc
] A = B ! iaddr xd := pc + o s load
0
0
0
0
0
0
0
0
0
0
0
>
od
]j: xd t t f o s A B A B
P r2 :: t A B c f o s
j var t A B f o s c do < r2w = req !
if t = R ! t A
] t = ld ! t A
] t = st ! t A
r2w := ack >
0
0
0
0
0
0
0
0
0
0
0
0
pc
0
0
0
0
0
0
0
0
0
0
0
0
B c f := t A B c f
c o s := t A c o s
B o s := t A B c f
0
0
0
0
0
0
0
od
]j: t A B c f o s r 2w
where two new communication channels, xd and r2w, are introduced. Note that the auxiliary
boolean variable ex is used to make updating of the hazard conditions possible in the above
atomicity refinement. This procedure corresponds to the earlier discussion concerning the auxiliary variable em of the memory access system M1. The new hazard detection unit Haz4 is
obtained by substituting haz1 ≜ haz1[ex/(xm = req)] and haz2 ≜ haz2[ex/(xm = req)].
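To give an operational feel for these handshaking systems, the following Python sketch (our illustration, not the report's formalism) simulates one non-branch instruction passing through X.c, Pr2, X.d and the memory-access stage. The data computations are abstracted away; only the req/ack sequencing on the channels r2w, xd and xm, and the lifetime of the auxiliary flag ex, are modelled.

```python
REQ, ACK = "req", "ack"

# Shared channels and variables (names follow the report; channels start idle):
s = {"dx": REQ, "t": "R", "r2w": ACK, "xd": ACK, "xm": ACK, "ex": False}
log = []

def wait(cond):
    """Suspend the calling process until the guard `cond` holds."""
    while not cond():
        yield

def x_control(s):
    # Non-branch path of X.c: latch operands via r2w, start X.d via xd,
    # then hand the result over to the memory stage via xm.
    yield from wait(lambda: s["dx"] == REQ and not s["ex"] and s["t"] != "be")
    s["r2w"] = REQ; log.append("X.c: r2w := req")
    yield from wait(lambda: s["r2w"] == ACK)
    s["ex"], s["xd"], s["dx"] = True, REQ, ACK; log.append("X.c: xd := req, dx := ack")
    yield from wait(lambda: s["xd"] == ACK)
    s["xm"] = REQ; log.append("X.c: xm := req")
    yield from wait(lambda: s["xm"] == ACK)
    s["ex"] = False; log.append("X.c: ex := false")

def pr2(s):
    # Pipeline register Pr2: on request, latch the decode outputs and acknowledge.
    yield from wait(lambda: s["r2w"] == REQ)
    s["r2w"] = ACK; log.append("Pr2: latched, r2w := ack")

def x_data(s):
    # Data part X.d: on request, compute (ALU / address) and acknowledge.
    yield from wait(lambda: s["xd"] == REQ)
    s["xd"] = ACK; log.append("X.d: computed, xd := ack")

def mem(s):
    # Downstream memory-access stage: acknowledge the execute stage's request.
    yield from wait(lambda: s["xm"] == REQ)
    s["xm"] = ACK; log.append("M: xm := ack")

# Round-robin scheduler: run until every process has terminated.
procs = [x_control(s), pr2(s), x_data(s), mem(s)]
alive = True
while alive:
    alive = False
    for p in procs:
        try:
            next(p)
            alive = True
        except StopIteration:
            pass
```

The resulting log shows the strict order r2w, xd, xm imposed by the sequential composition inside X.c, and ex stays true from the moment the operands are latched until the memory stage has acknowledged.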
The block diagram of the final composition is shown in Fig. 2.
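The role of ex in Haz4 can also be made concrete with a schematic comparison. The predicates below are hypothetical stand-ins for the hazard conditions (whose exact form is fixed earlier in the report), written only to show the window in which ex detects an in-flight result while the raw test xm = req does not.

```python
REQ, ACK = "req", "ack"

def haz_xm(dest, src, xm):
    # Stand-in for the pre-refinement condition: a result counts as pending
    # only while the execute stage is actively requesting the memory stage.
    return dest == src and xm == REQ

def haz_ex(dest, src, ex):
    # Stand-in for the refined condition haz[ex/(xm = req)]: ex is true for
    # the whole time an instruction occupies the execute stage.
    return dest == src and ex

# After X.c has latched the operands (ex = true) but before it has issued
# xm := req, only the refined condition sees the pending result:
window = {"ex": True, "xm": ACK}
print(haz_xm(3, 3, window["xm"]))  # False: the raw test misses the hazard
print(haz_ex(3, 3, window["ex"]))  # True: the auxiliary flag covers the gap
```

Between the latching of the operands and the action xm := req, an instruction occupies the execute stage although no memory request is pending; substituting ex for xm = req keeps the hazard condition true across that gap.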
8 Concluding remarks
We have presented a formal framework for the design of asynchronous pipelined microprocessors.
The main tools used were the atomicity refinement rule for action systems and the spatial
decomposition of an action system into a number of reactive components communicating via
shared variables.

Figure 2: Final block diagram of the processor
We did not give the low-level descriptions of the pipeline stages in this paper; the
emphasis was on identifying and separating the functional units by introducing an abstract
communication mechanism which can easily be transformed into a concrete, realizable form
[15].
Acknowledgements Our work was inspired by discussions with the participants (especially
with Mark Josephs) at the ACiD-WG working group meeting in Groningen in September 1996.
The research is supported by the Academy of Finland.
References
[1] R. J. R. Back. On the Correctness of Refinement Steps in Program Development. PhD thesis, Department
of Computer Science, University of Helsinki, Helsinki, Finland, 1978. Report A-1978-4.
[2] R. J. R. Back and R. Kurki-Suonio. Decentralization of process nets with centralized control. In Proc. of
the 2nd ACM SIGACT-SIGOPS Symp. on Principles of Distributed Computing, pages 131–142, 1983.
[3] R. J. R. Back, A. J. Martin, and K. Sere. Specifying the Caltech asynchronous microprocessor. Science of
Computer Programming, North-Holland. Accepted for publication.
[4] R. J. R. Back and K. Sere. Stepwise refinement of action systems. Structured Programming, 12:17–30, 1991.
[5] R. J. R. Back and J. von Wright. Refinement calculus, part I: Sequential nondeterministic programs. In J.
W. de Bakker, W.-P. de Roever, and G. Rozenberg, editors, Stepwise Refinement of Distributed Systems:
Models, Formalisms, Correctness. Proceedings 1989, volume 430 of Lecture Notes in Computer Science,
pages 42–66. Springer-Verlag, 1990.
[6] K. Chandy and J. Misra. Parallel Program Design: A Foundation. Addison-Wesley, 1988.
[7] I. David, R. Ginosar, and M. Yoeli. Self-timed architecture of a reduced instruction set computer. In
S. Furber and M. Edwards, editors, Asynchronous Design Methodologies, pages 29–43. North-Holland, 1993.
[8] A. Davis and S. M. Nowick. Asynchronous circuit design: motivation, background and methods. In G.
Birtwistle and A. Davis, editors, Asynchronous Digital Circuit Design, pages 1–49. Springer, 1995.
[9] E. W. Dijkstra. A Discipline of Programming. Prentice-Hall International, 1976.
[10] S. Furber. Computing without clocks: micropipelining the ARM processor. In G. Birtwistle and A. Davis,
editors, Asynchronous Digital Circuit Design, pages 211–262. Springer, 1995.
[11] A. J. Martin, S. Burns, T. Lee, D. Borkovic, and P. Hazewindus. The first asynchronous microprocessor: the
test results. Computer Architecture News, pages 95–110, 1989.
[12] C. C. Morgan. The specification statement. ACM Transactions on Programming Languages and Systems,
10(3):403–419, July 1988.
[13] J. M. Morris. A theoretical basis for stepwise refinement and the programming calculus. Science of Computer
Programming, 9:287–306, 1987.
[14] D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface.
Morgan Kaufmann Publishers, 1994.
[15] J. Plosila, R. Rukšėnas, K. Sere, and T. Kuusela. Synthesis of DI Circuits within the Action Systems
Framework. Manuscript, 1996.
[16] R. Sproull, I. Sutherland, and C. Molnar. Counterflow pipeline architecture. Technical Report SMLI TR-94-25,
Sun Microsystems Laboratories Inc., April 1994.
[17] I. E. Sutherland. Micropipelines. Communications of the ACM, 32(6):720–738, June 1989.
Turku Centre for Computer Science
Lemminkaisenkatu 14
FIN-20520 Turku
Finland
http://www.tucs.abo.
University of Turku
Department of Mathematical Sciences
Abo Akademi University
Department of Computer Science
Institute for Advanced Management Systems Research
Turku School of Economics and Business Administration
Institute of Information Systems Science