Blackfin Reference Sheet Developed 1st December 2003, smithmr

Blackfin (ADSP-BFXXX) Reference, V1.3, developed 1st Nov. 2004, smithmr@ucalgary.ca
PROGRAMMING MODEL
R0 to R7      Data registers
P0 to P5      Pointer registers
FP            Frame pointer
SP            Stack pointer
A0, A1        Accumulator registers
LC0, LC1      Loop counters
R0, R1, R2 volatile
P0, P1 volatile
DSP REGISTERS – ALL VOLATILE
I0 to I3 index registers (Ireg)
M0 to M3 modify registers (Mreg)
B0 to B3 base registers
L0 to L3 length registers
Circular buffers: Breg holds the start address of a buffer of length Lreg (in bytes); Ireg is the index register that walks through it, post-modified by Mreg and wrapped back into the buffer.
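The wrap rule can be sketched in C as below. This is a model for illustration, assuming |Mreg| < Lreg and that Ireg starts inside the buffer; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a post-modify with circular wrap:
 * index = Ireg, modify = Mreg, base = Breg, length = Lreg (bytes).
 * Assumes |modify| < length and base <= index < base + length. */
uint32_t circ_post_modify(uint32_t index, int32_t modify,
                          uint32_t base, uint32_t length)
{
    int64_t next = (int64_t)index + modify;
    if (length == 0)                      /* Lreg = 0: plain linear post-modify */
        return (uint32_t)next;
    if (next >= (int64_t)base + length)
        next -= length;                   /* stepped past the end: wrap to the start */
    else if (next < (int64_t)base)
        next += length;                   /* stepped below the start: wrap to the end */
    return (uint32_t)next;
}
```

For example, with Breg = 0x1000, Lreg = 16 and Mreg = 4, Ireg visits 0x1000, 0x1004, 0x1008, 0x100C and then returns to 0x1000.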
NOTATION CONVENTION
imm       signed immediate
uimm      unsigned immediate
imm3      -4 to +3
uimm3     0 to 7
reg       any register R0 to R7, P0 to P5
dreg      any data register R0 to R7
preg      any pointer register P0 to P5
statbit   AZ, AN, AC0, AC1, V, VS, AV0, AV0S, AV1, AV1S, AQ
reg_lo    low part of register (R0.L)
reg_hi    high part of register (P0.H)
PARAMETER PASSING EXAMPLE
// Relative to FP
#define INPAR4_ON_STACK          20   // On stack, NOT in R3
#define INPAR3_SPACE_ON_STACK    16   // In R2
#define INPAR2_SPACE_ON_STACK    12   // In R1
#define INPAR1_SPACE_ON_STACK     8   // In R0
#define RETS_LOCATION_ON_STACK    4
#define OLD_FP_LOCATION_ON_STACK  0

// Relative to SP
#define SAVED_P3                 20
#define SAVED_P4                 16
#define OUTPAR4_ON_STACK         12   // On stack, NOT in R3
#define OUTPAR3_SPACE_ON_STACK    8   // In R2
#define OUTPAR2_SPACE_ON_STACK    4   // In R1
#define OUTPAR1_SPACE_ON_STACK    0   // In R0

.section program;
.extern _Somewhere;
.extern _Subroutine;
.global _Foo;

// void Foo(INPAR1, INPAR2, INPAR3, INPAR4)
_Foo:  LINK 24;                            // 16 bytes of outgoing-parameter space + 2 saved registers
       [SP + SAVED_P4] = P4;               // Save non-volatile registers on the stack
       P4.L = _Somewhere;                  // Point to memory location _Somewhere;
       P4.H = _Somewhere;                  // reference resolved by linker since .extern
       [FP + INPAR1_SPACE_ON_STACK] = R0;  // Save for later (R0 is overwritten below)
       [FP + INPAR3_SPACE_ON_STACK] = R2;  // Save for later (R2 is overwritten below)
       R0 = [FP + INPAR4_ON_STACK];        // OUTPAR4 = INPAR4
       [SP + OUTPAR4_ON_STACK] = R0;
       R2 = 0xFFFF (X);                    // Sign extend OUTPAR3 value: 0xFFFF becomes 0xFFFFFFFF
       R1 = R1;                            // OUTPAR2 = INPAR2
       R0 = 0xFFFF (Z);                    // Zero extend OUTPAR1 value: 0x0000FFFF
       CALL _Subroutine;                   // Subroutine(0x0000FFFF, INPAR2, 0xFFFFFFFF, INPAR4)
       W[P4] = R0;                         // Store return value as 16-bit
       P4.L = lo(FIO_FLAG_D);              // Constant from <defBF533.h>
       P4.H = hi(FIO_FLAG_D);              // requires the hi/lo macros
       P4 = [SP + SAVED_P4];               // Restore non-volatile register
       UNLINK;
       RTS;

// Also seen as an alternative exit:
       P0 = [FP + 4];                      // Get RETS
       UNLINK;
       JUMP (P0);                          // Not clear why used
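The FP-relative offsets used by _Foo follow from what LINK and UNLINK do to the stack. The C model below is a sketch (the Cpu type, word-only memory and function names are hypothetical) that replays LINK 24 and checks where the old FP, RETS and the caller's fourth argument end up.

```c
#include <stdint.h>

/* Hypothetical model of the stack: byte addresses, 32-bit words only. */
#define MEM_WORDS 64

typedef struct {
    uint32_t mem[MEM_WORDS];
    uint32_t sp, fp, rets;
} Cpu;

static uint32_t *word_at(Cpu *c, uint32_t addr) { return &c->mem[addr / 4]; }

/* LINK n: [--SP] = RETS; [--SP] = FP; FP = SP; SP -= n; */
static void do_link(Cpu *c, uint32_t n)
{
    c->sp -= 4; *word_at(c, c->sp) = c->rets;
    c->sp -= 4; *word_at(c, c->sp) = c->fp;
    c->fp = c->sp;
    c->sp -= n;
}

/* UNLINK: SP = FP; FP = [SP++]; RETS = [SP++]; */
static void do_unlink(Cpu *c)
{
    c->sp = c->fp;
    c->fp = *word_at(c, c->sp);   c->sp += 4;
    c->rets = *word_at(c, c->sp); c->sp += 4;
}
```

With SP = 200 at the call, LINK 24 leaves FP = 192 and SP = 168, so the fourth argument the caller stored at [SP + 12] (address 212) is read back at [FP + 20], and the saved P4 slot is at [SP + 16].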
PROGRAM FLOW INSTRUCTIONS
JUMP User_Label;          PC replaced by address of User_Label
JUMP (Preg);              PC replaced by value in P-register
IF CC JUMP User_Label;    if CC = 1, PC replaced by address of User_Label
IF !CC JUMP User_Label;   if CC = 0, PC replaced by address of User_Label
IF CC JUMP User_Label (bp); and IF !CC JUMP User_Label (bp); are versions where the branch is predicted to be taken. Correctly predicting branches improves pipeline performance.
CALL User_Label;          PC replaced by address of User_Label; address of next instruction saved in RETS
CALL (Preg);              PC replaced by value in P-register; address of next instruction saved in RETS
RTS   return from subroutine (RETS)
RTX   return from exception (RETX)
RTE   return from emulation (RETE)
RTI   return from interrupt (RETI)
RTN   return from NMI (RETN)
(the return register used is shown in brackets)
Loop loop_name loopcounter;
Loop_begin loop_name;     marks the 1st instruction
Loop_end loop_name;       marks the last instruction
Lsetup (Label_1st_instruction, Label_last_instruction) loopcounter;
   loopcounter can be written as LCn, LCn = Preg or LCn = Preg >> 1
LTn, LBn, LCn (Loop_Top, Loop_Bottom, Loop_Counter) can be set directly
LOAD / STORE INSTRUCTIONS
reg_lo = uimm16; reg_hi = uimm16; half-word loads
reg = uimm16 (Z); zero extended to 32 bits
reg = imm16 (X); sign extended to 32 bits (also imm7 version)
Loading 32-bit values
reg.L = uimm32 & 0xFFFF; reg.H = (uimm32 >> 16) & 0xFFFF;
BUT for a .IMPORT value: reg.L = value; reg.H = value; (the correct half-word is used automatically)
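The masking and shifting in the first form can be checked in C. The helper names below are hypothetical; the sketch shows that loading the two halves separately reassembles the original constant, since each half-word load leaves the other half untouched.

```c
#include <stdint.h>

/* 16-bit immediate for reg.L = ... */
uint16_t half_lo(uint32_t value) { return (uint16_t)(value & 0xFFFF); }

/* 16-bit immediate for reg.H = ... */
uint16_t half_hi(uint32_t value) { return (uint16_t)((value >> 16) & 0xFFFF); }

/* Register contents after both half-word loads. */
uint32_t join_halves(uint16_t hi, uint16_t lo) { return ((uint32_t)hi << 16) | lo; }
```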
Preg = [indirect_address]; [indirect_address] = Preg;
   where indirect_address is Preg, Preg++, Preg--, Preg + offset, Preg - offset, FP - offset; offsets are multiples of 4
Dreg = [indirect_address]; [indirect_address] = Dreg;
   where indirect_address is Preg, Preg++, Preg--, Preg + small / large offset, Preg - large offset, FP - offset, Preg ++ Preg, Ireg, Ireg++, Ireg--, Ireg ++ Mreg
Dreg = W [ indirect address ] (Z); zero-extend half word fetch
Dreg = W [ indirect address ] (X); sign-extend half word fetch
Dreg = B[indirect_address] (Z); Dreg = B[indirect_address] (X);
   where indirect_address is Preg, Preg++, Preg--, Preg + offset, Preg - offset
   Word (W) accesses only: Preg ++ Preg is also allowed, and offsets are multiples of 2
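The (Z) and (X) suffixes differ only in how the upper bits are filled. A C sketch of the four byte and half-word variants (function names hypothetical), written with explicit masks so it does not depend on implementation-defined signed conversions:

```c
#include <stdint.h>

uint32_t load_b_z(uint8_t b)  { return (uint32_t)b; }                          /* B[...] (Z) */
uint32_t load_b_x(uint8_t b)  { return (b & 0x80u) ? b | 0xFFFFFF00u : b; }    /* B[...] (X) */
uint32_t load_w_z(uint16_t w) { return (uint32_t)w; }                          /* W[...] (Z) */
uint32_t load_w_x(uint16_t w) { return (w & 0x8000u) ? w | 0xFFFF0000u : w; }  /* W[...] (X) */
```

The same rule applies to the immediate forms reg = uimm16 (Z) and reg = imm16 (X).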
Dreg_lo = W[indirect_address]; Dreg_hi = W[indirect_address];
W[indirect_address] = Dreg_lo; W[indirect_address] = Dreg_hi;
   where indirect_address is Ireg, Ireg++, Ireg--, Preg, Preg ++ Preg
COMPARE INSTRUCTIONS
CC = Operand_1 == Operand_2;
CC = Operand_1 <= Operand_2;        signed compare
CC = Operand_1 <= Operand_2 (IU);   unsigned compare
CC = Operand_1 < Operand_2;         signed compare
CC = Operand_1 < Operand_2 (IU);    unsigned compare
Compare Data Registers -- Not parallel (16-bit)
   Operand_1: Dreg
   Operand_2: Dreg or small constant, where small constant is imm3 or uimm3
Compare Pointer Registers -- Not parallel (16-bit)
   Operand_1: Preg
   Operand_2: Preg or small constant, where small constant is imm3 or uimm3
Compare Accumulator Registers -- Not parallel (16-bit)
   Operand_1: A0
   Operand_2: A1
   Always signed compares
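The signed and unsigned compares can give opposite CC results for the same bit patterns. A C sketch (function names hypothetical) that computes both; the signed versions flip the sign bit of each operand so an unsigned comparison yields the signed ordering without implementation-defined casts:

```c
#include <stdint.h>

/* Signed: flipping bit 31 maps INT32_MIN..INT32_MAX onto 0..UINT32_MAX in order. */
int cc_lt_signed(uint32_t a, uint32_t b)   { return (a ^ 0x80000000u) <  (b ^ 0x80000000u); }
int cc_le_signed(uint32_t a, uint32_t b)   { return (a ^ 0x80000000u) <= (b ^ 0x80000000u); }

/* Unsigned: plain comparison of the bit patterns. */
int cc_lt_unsigned(uint32_t a, uint32_t b) { return a <  b; }
int cc_le_unsigned(uint32_t a, uint32_t b) { return a <= b; }
```

For example, with R0 = 0xFFFFFFFF and R1 = 1, the signed compare R0 < R1 sets CC (-1 < 1), while the unsigned form clears it.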
MOVE CC INSTRUCTIONS
Dest OP CC;     where Dest is Dreg or statbit, and OP is =, |=, &=, ^= (e.g. R0 |= CC;)
CC OP Source;   where Source is Dreg or statbit
Note: CC = Dreg; sets CC = 1 if Dreg != 0
NEGATE CC INSTRUCTIONS
CC = ! CC;
MOVE INSTRUCTIONS
genreg = genreg ;
genreg = dagreg ;
dagreg = genreg ;
dagreg = dagreg ;
genreg = USP ;
USP = genreg ;
Dreg = sysreg ; /* sysreg to 32-bit D-register */
sysreg = Dreg ; /* 32-bit D-register to sysreg */
sysreg = Preg ; /* 32-bit P-register to sysreg */
sysreg = USP;
A0 = A1 ;
/* move 40-bit Accumulator value */
A1 = A0 ;
/* move 40-bit Accumulator value */
A0 = Dreg ;
/* 32-bit D-register to 40-bit A0, sign extended */
A1 = Dreg ;
/* 32-bit D-register to 40-bit A1, sign extended */
Accumulator to D-register Move:
Dreg_even = A0 (opt_mode) ; /* move 32-bit A0.W to even Dreg */
Dreg_odd = A1 (opt_mode) ; /* move 32-bit A1.W to odd Dreg */
Dreg_even = A0, Dreg_odd = A1 (opt_mode) ;
/* move both Accumulators to a register pair */
Dreg_odd = A1, Dreg_even = A0 (opt_mode) ;
/* move both Accumulators to a register pair */
IF CC DPreg = DPreg ;     /* move if CC = 1 */
IF ! CC DPreg = DPreg ;   /* move if CC = 0 */
   where DPreg is any of Dreg, Preg, SP, FP
Dreg = Dreg_lo (Z) ;  Dreg = Dreg_lo (X) ;   /* low 16 bits, zero- or sign-extended */
Dreg = Dreg.B (Z) ;   Dreg = Dreg.B (X) ;    /* lowest 8 bits, zero- or sign-extended */
Acc.X = Dreg_lo ;   /* least significant 8 bits moved */
Dreg_lo = Acc.X ;   /* 8 bits moved, sign extended */
Acc.L = Dreg_lo ;   /* least significant 16 bits moved */
Dreg_lo = Acc.L ;   /* 16 bits moved */
Acc.H = Dreg_hi ;   /* most significant 16 bits moved */
Dreg_hi = Acc.H ;   /* 16 bits moved */
Accumulator to Half D-register Move supports the following options
Signed fraction format (default).
Unsigned fraction format (saturated) (FU).
Signed and unsigned integer formats (IS) (IU).
Signed fraction with truncation (T).
Signed fraction with scaling and rounding (S2RND).
Signed integer with scaling (ISS2).
Signed integer with high word extract (IH).
MORE INFO TO BE ADDED
STACK INSTRUCTIONS
LINK uimm (Manual says minimum value is 8, but LINK 0 and LINK 4 seem OK)
Saves RETS and FP on the stack, copies SP into FP and then decrements SP by uimm (the local frame size in bytes)
UNLINK copies FP into SP, then FP = [SP++], then RETS = [SP++]
ARITHMETIC INSTRUCTIONS
dest_pntr = (dest_pntr + src_reg) << 1; Down shift not allowed
dest_pntr = (dest_pntr + src_reg) << 2;
dest_reg = (dest_reg + src_reg) << 1;
dest_reg = (dest_reg + src_reg) << 2;
dest_pntr = adder_pntr + ( src_pntr << 1 );
dest_pntr = adder_pntr + ( src_pntr << 2 );
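A typical use of these shift-and-add forms is forming the address of element i in an array of 16-bit or 32-bit values. The helpers below are hypothetical C equivalents, shown only to make the arithmetic concrete.

```c
#include <stdint.h>

/* dest_pntr = adder_pntr + (src_pntr << 1): element i of a 16-bit array */
uint32_t elem_addr_2(uint32_t base, uint32_t i) { return base + (i << 1); }

/* dest_pntr = adder_pntr + (src_pntr << 2): element i of a 32-bit array */
uint32_t elem_addr_4(uint32_t base, uint32_t i) { return base + (i << 2); }

/* dest_reg = (dest_reg + src_reg) << shift, shift being 1 or 2 */
uint32_t add_then_shift(uint32_t dest, uint32_t src, unsigned shift) { return (dest + src) << shift; }
```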
dest_reg = ABS src_reg;
dest_reg = src_reg_1 + src_reg_2;
NOTE: dest_reg.LorH = src_reg1.LorH + src_reg2.LorH (mode); mode = (NS) or (S)
// Arithmetic is saturating or non-saturating (normal math is NS)
NOTE: dest_reg = src_reg_1 +|- src_reg_2; the high-half and low-half operations are both done (here H + H and L - L)
// Can also do +|+, +|-, -|+, -|-
Dreg_lo_hi = Dreg + Dreg (RND20) ; Dreg_lo_hi = Dreg - Dreg (RND20) ;
   STEP 1: downshift by 4, STEP 2: perform the operation and round the top 16 bits, STEP 3: use the top 16 bits (fractional number)
Dreg_lo_hi = Dreg + Dreg (RND12) ; Dreg_lo_hi = Dreg - Dreg (RND12) ;
   STEP 1: upshift by 4, STEP 2: perform the operation, STEP 3: round and use the top 16 bits
Dreg = MAX ( Dreg , Dreg ) ;
Dreg = MIN ( Dreg , Dreg ) ;
Preg -= Preg ;
Ireg -= Mreg ;
Preg += Preg (BREV) ;
Ireg += Mreg (opt_brev) ;
dest_reg = src_reg_0 * src_reg_1 (opt_mode) (16 bit mult)
Dreg *= Dreg ; (32 bit mult)
accumulator = src_reg_0 * src_reg_1 (opt_mode)
accumulator += src_reg_0 * src_reg_1 (opt_mode)
accumulator -= src_reg_0 * src_reg_1 (opt_mode)
dest_reg_half = (accumulator = src_reg_0 * src_reg_1) (opt_mode)
dest_reg_half = (accumulator += src_reg_0 * src_reg_1) (opt_mode)
dest_reg_half = (accumulator -= src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator = src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator += src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator -= src_reg_0 * src_reg_1) (opt_mode)
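In the default signed-fraction mode each 16-bit operand is read as a 1.15 fraction, and the integer product is shifted left by one to give a 1.31 result. The C sketch below models the 32-bit result only; the function name is hypothetical, and treating -1.0 x -1.0 (0x8000 x 0x8000) as saturating to 0x7FFFFFFF is an assumption of this model.

```c
#include <stdint.h>

/* 1.15 x 1.15 -> 1.31 fractional multiply (model, not hardware-exact). */
int32_t frac_mult_1_15(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN)
        return INT32_MAX;            /* assumed saturation of the only overflow case */
    return (int32_t)a * b * 2;       /* same value as << 1, without shifting a negative */
}
```

For example, 0x4000 x 0x4000 (0.5 x 0.5) gives 0x20000000, which is 0.25 in 1.31 format.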
dest_reg = - src_reg;
dest_accumulator = - src_accumulator
dest_reg = src_reg (RND) (32 bit to 16 bit round and saturate)
accumulator = accumulator (S)
dest_reg = SIGNBITS sample_register
dest_reg = src_reg_1 - src_reg_2;
Ireg -= 2 ;
Ireg -= 4 ;
Push / pop: SP is pre-decremented on a push, so it always points to the last used stack location
[--SP] = allreg;    // push
allreg = [SP++];    // pop
[--SP] = (R7:Dreglim, P5:Preglim);   // push multiple, or the Dreg or Preg group on its own
LOGICAL INSTRUCTIONS
Dreg = Dreg1 LOGICAL_OP Dreg2;
LOGICAL_OP - &, |, ^
Dreg = ~Dreg1; complement
Also BXOR and BXORSHIFT -- more later
BIT INSTRUCTIONS
BitInstruction (Dreg, bit_position); where bit_position is 0 to 31
BitInstruction is BITCLR (clear), BITSET (set) or BITTGL (toggle)
CC = BITTST (Dreg, bit_position);   bit test (CC = 1 if the bit is set)
CC = !BITTST (Dreg, bit_position);  inverted bit test (CC = 1 if the bit is clear)
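The bit instructions map directly onto C mask operations. A sketch with hypothetical function names; bit is assumed to be 0 to 31.

```c
#include <stdint.h>

uint32_t bitset(uint32_t r, unsigned bit) { return r |  ((uint32_t)1 << bit); }   /* BITSET */
uint32_t bitclr(uint32_t r, unsigned bit) { return r & ~((uint32_t)1 << bit); }   /* BITCLR */
uint32_t bittgl(uint32_t r, unsigned bit) { return r ^  ((uint32_t)1 << bit); }   /* BITTGL */
int      bittst(uint32_t r, unsigned bit) { return (int)((r >> bit) & 1u); }      /* CC = BITTST */
```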
R0 = R1.B (X); R0 = R1.B (Z); // Extract the low byte, sign-extended (X) or zero-extended (Z)
// CAN'T DO MATH ON A BYTE VALUE DIRECTLY
Dreg = DEPOSIT ( backgroundDreg, foregroundDreg ) ;
Dreg = DEPOSIT ( Dreg, Dreg ) (X) ; /* sign-extended */
Foreground format: bits 31 to 16, bit pattern to be deposited;
bits 15 to 8, position in backgroundDreg of the pattern's rightmost (least significant) bit;
bits 7 to 0, number of pattern bits to deposit (length)
R7 = DEPOSIT (R4, R3) ;        /* pattern 0x0000, position 7, length 3 */
R4=0b1111 1111 1111 1111 1111 1111 1111 1111
R3=0b0000 0000 0000 0000 0000 0111 0000 0011
R7=0b1111 1111 1111 1111 1111 1100 0111 1111
R7 = DEPOSIT (R4, R3) (X) ;    /* sign-extended; pattern 0x5A5A, position 7, length 3 */
R4=0b1111 1111 1111 1111 1111 1111 1111 1111
R3=0b0101 1010 0101 1010 0000 0111 0000 0011
R7=0b0000 0000 0000 0000 0000 0001 0111 1111
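The two DEPOSIT examples can be reproduced with the C sketch below. It assumes the length sits in the low byte (bits 7 to 0) of the foreground register, which is what the examples imply, and that length <= 16 and position + length <= 32; the function name is hypothetical.

```c
#include <stdint.h>

/* Model of DEPOSIT (background, foreground) and its (X) form. */
uint32_t deposit(uint32_t background, uint32_t foreground, int sign_extend)
{
    uint32_t pattern = foreground >> 16;            /* bits 31..16 */
    unsigned pos = (foreground >> 8) & 0xFFu;       /* bits 15..8  */
    unsigned len = foreground & 0xFFu;              /* bits 7..0   */

    uint64_t field_mask = ((1ULL << len) - 1) << pos;          /* bits being replaced */
    uint64_t field = ((uint64_t)pattern << pos) & field_mask;  /* low len bits of pattern */
    uint64_t result = (background & ~field_mask) | field;

    if (sign_extend) {
        /* (X): every bit above the field copies the field's top bit. */
        uint64_t upper_mask = 0xFFFFFFFFULL & ~((1ULL << (pos + len)) - 1);
        int sign = len > 0 && ((pattern >> (len - 1)) & 1u);
        result = (result & ~upper_mask) | (sign ? upper_mask : 0);
    }
    return (uint32_t)result;
}
```

In the second example the deposited field is 010, whose top bit is 0, so (X) clears bits 10 to 31 even though the background had them set.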
Dreg = EXTRACT ( sceneDreg, patternDreg_lo ) (Z) ; /* zero-extended */
Dreg = EXTRACT ( Dreg, Dreg_lo ) (X) ; /* sign-extended */
patternDreg_lo format: bits 15 to 8, position in sceneDreg of the field's rightmost bit;
bits 7 to 0, length of the field to extract from sceneDreg
R7 = EXTRACT (R4, R3.L) (Z) ; /* zero-extended; position 7, length 4 */
R4=0b1010 0101 1010 0101 1100 0011 1010 1010
R3=0bxxxx xxxx xxxx xxxx 0000 0111 0000 0100
R7=0b0000 0000 0000 0000 0000 0000 0000 0111
R7 = EXTRACT (R4, R3.L) (X) ; /* sign-extended; the field's top bit is 0, so the result matches (Z) */
R4=0b1010 0101 1010 0101 1100 0011 1010 1010
R3=0bxxxx xxxx xxxx xxxx 0000 0111 0000 0100
R7=0b0000 0000 0000 0000 0000 0000 0000 0111
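A matching C sketch of EXTRACT, under the same assumed layout (position in bits 15 to 8, length in bits 7 to 0 of the low half-word), with 1 <= length <= 32 and position < 32; the function name is hypothetical. A second field (position 4, length 4, value 1010) is used in the usage check to show (X) producing a negative result where (Z) does not.

```c
#include <stdint.h>

/* Model of EXTRACT (scene, pattern_lo) with (Z) or (X). */
uint32_t extract(uint32_t scene, uint16_t pattern_lo, int sign_extend)
{
    unsigned pos = (pattern_lo >> 8) & 0xFFu;
    unsigned len = pattern_lo & 0xFFu;
    uint64_t mask = (1ULL << len) - 1;
    uint64_t field = ((uint64_t)scene >> pos) & mask;

    if (sign_extend && ((field >> (len - 1)) & 1u))
        field |= ~mask;              /* (X): copy the field's top bit upward */
    return (uint32_t)field;          /* (Z): upper bits stay zero */
}
```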
BITMUX ( source_0 , source_1 , A0 ) (ASR) ; /* shift right, LSB is shifted out */
BITMUX ( source_0 , source_1 , A0 ) (ASL) ; /* shift left, MSB is shifted out */
(source_0 and source_1 are Dregs)
In the Shift Right version, the processor performs the following sequence.
1. Right shift Accumulator A0 by one bit. Right shift the LSB of source_1 into the MSB of the Accumulator.
2. Right shift Accumulator A0 by one bit. Right shift the LSB of source_0 into the MSB of the Accumulator.
In the Shift Left version, the processor performs the following sequence.
1. Left shift Accumulator A0 by one bit. Left shift the MSB of source_0 into the LSB of the Accumulator.
2. Left shift Accumulator A0 by one bit. Left shift the MSB of source_1 into the LSB of the Accumulator.
Dreg.L = ONES Dreg; returns the number of bits set in Dreg (population count)
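What ONES computes can be sketched in C with the usual clear-lowest-set-bit loop; the function name is hypothetical.

```c
#include <stdint.h>

/* Population count: the value ONES leaves in the low half of the destination. */
uint16_t ones(uint32_t r)
{
    uint16_t count = 0;
    while (r != 0) {
        r &= r - 1;      /* clear the lowest set bit */
        count++;
    }
    return count;
}
```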
SHIFT / ROTATE INSTRUCTIONS
ARITHMETIC SHIFT
ASHIFT or >>>
dest_reg >>>= shift_magnitude;
dest_reg = src_reg >>> shift_magnitude (opt_sat);
dest_reg = src_reg << shift_magnitude (S);
accumulator = accumulator >>> shift_magnitude;
dest_reg = ASHIFT src_reg BY shift_magnitude (opt_sat);
accumulator = ASHIFT accumulator BY shift_magnitude;
LOGICAL SHIFT
LSHIFT or >>
dest_pntr = src_pntr >> 1;
dest_pntr = src_pntr << 1;
dest_pntr = src_pntr >> 2;
dest_pntr = src_pntr << 2;
dest_reg >>= shift_magnitude; dest_reg <<= shift_magnitude;
dest_reg = src_reg >> shift_magnitude;
dest_reg = src_reg << shift_magnitude;
dest_reg = LSHIFT src_reg BY shift_magnitude;
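The difference between the arithmetic (>>>) and logical (>>) right shifts is only in what fills the vacated top bits. In C, >> on a negative signed value is implementation-defined, so this sketch works on uint32_t and fills the sign bits explicitly; function names are hypothetical and shift counts are assumed to be 0 to 31.

```c
#include <stdint.h>

/* Arithmetic right shift (>>>): vacated top bits copy the sign bit. */
uint32_t ashift_right(uint32_t r, unsigned n)
{
    uint32_t shifted = r >> n;
    if (r & 0x80000000u)
        shifted |= (uint32_t)~(UINT32_MAX >> n);   /* fill the top n bits with 1s */
    return shifted;
}

/* Logical right shift (>>): vacated top bits are 0. */
uint32_t lshift_right(uint32_t r, unsigned n) { return r >> n; }
```

For 0x80000000 shifted right by 4, the arithmetic form gives 0xF8000000 and the logical form gives 0x08000000.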
ROTATE
dest_reg = ROT src_reg BY rotate_magnitude;
accumulator_new = ROT accumulator_old BY rotate_magnitude;
PARALLEL OPERATION EXAMPLES
32-bit ALU/MAC instruction || 16-bit instruction || 16-bit instruction ;
saa (r1:0, r3:2) || r0=[i0++] || r2=[i1++] ;
mnop || r1 = [i0++] || r3 = [i1++] ;
r7.h=r7.l=sign(r2.h)*r3.h + sign(r2.l)*r3.l || i0+=m3 || r0=[i0] ;
NOTE: If there are two parallel memory operations, at most one can use a Preg
NOTE: If there are two parallel memory operations, at most one can be a write
EXTERNAL EVENT MANAGEMENT
NOP            16-bit NOP
MNOP           32-bit NOP, e.g. MNOP || NOP || NOP ;
IDLE;
CSYNC;         (core sync)
SSYNC;         (system sync)
CLI Dreg;      (clear interrupts, and save the old interrupt mask to Dreg)
STI Dreg;      (set interrupts from Dreg)
RAISE uimm4    (force an interrupt: effectively a software trigger for any interrupt)
EXCPT uimm4    (force an exception: effectively a software trigger for any exception)
TESTSET (Preg); The Test and Set Byte (Atomic) instruction loads an indirectly addressed memory byte, tests whether it is zero, then sets the most significant bit of the memory byte without affecting any other bits. If the byte is originally zero, the instruction sets the CC bit; if the byte is originally nonzero, it clears the CC bit.
The memory transaction is atomic: an interrupt cannot split it, unlike the equivalent multi-instruction sequence (read memory into R0, test R0, if it is zero set R0 = 1, store R0 back to memory), where an interrupt between the read and the store could let another task claim the same byte.
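The test-and-set behaviour can be mirrored in portable C11 with atomic_fetch_or. This is an analogy for illustration, not how the Blackfin tools implement it, and the lock helpers are hypothetical: the byte reads as free when 0 and taken once its MSB is set.

```c
#include <stdatomic.h>

/* Model of TESTSET: atomically set the MSB, return CC (1 if the byte was zero). */
int testset(atomic_uchar *byte)
{
    unsigned char old = atomic_fetch_or(byte, 0x80);
    return old == 0;
}

/* Hypothetical spin lock built on it. */
void lock_acquire(atomic_uchar *lock) { while (!testset(lock)) { /* spin */ } }
void lock_release(atomic_uchar *lock) { atomic_store(lock, 0); }
```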
VIDEO PIXEL INSTRUCTIONS
ALIGN8, ALIGN16, ALIGN24, DISALGNEXCPT, BYTEOP3P (Dual 16-Bit Add / Clip),
Dual 16-Bit Accumulator Extraction with Addition, BYTEOP16P (Quad 8-Bit Add),
BYTEOP1P (Quad 8-Bit Average – Byte), BYTEOP2P (Quad 8-Bit Average – Half-Word),
BYTEPACK (Quad 8-Bit Pack), BYTEOP16M (Quad 8-Bit Subtract), SAA (Quad 8-Bit
Subtract-Absolute-Accumulate), BYTEUNPACK (Quad 8-Bit Unpack)
VECTOR INSTRUCTIONS (basically two 16-bit operations done in parallel)
Add on Sign, VIT_MAX (Compare-Select), Vector Arithmetic Shift, Vector Logical Shift,
Vector MIN, Vector Multiply, Vector Multiply and Multiply-Accumulate, Vector Negate
(Two’s Complement), Vector PACK, Vector SEARCH
Example Vector Add / Subtract
dest = src_reg_0 +|+ src_reg_1;
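The dual 16-bit add/subtract can be modeled in C as two independent half-word operations packed into one 32-bit result. This sketch assumes the default non-saturating behaviour, where each half wraps modulo 2^16; the function and its flags are hypothetical.

```c
#include <stdint.h>

/* Model of dest = a +|+ b, a +|- b, a -|+ b or a -|- b (non-saturating). */
uint32_t vec_addsub(uint32_t a, uint32_t b, int hi_sub, int lo_sub)
{
    uint16_t ah = a >> 16, al = a & 0xFFFF;
    uint16_t bh = b >> 16, bl = b & 0xFFFF;
    uint16_t rh = hi_sub ? (uint16_t)(ah - bh) : (uint16_t)(ah + bh);
    uint16_t rl = lo_sub ? (uint16_t)(al - bl) : (uint16_t)(al + bl);
    return ((uint32_t)rh << 16) | rl;    /* no carry crosses between halves */
}
```

For example, 0x0001FFFF +|+ 0x00010001 gives 0x00020000: the low half wraps to 0 without carrying into the high half.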
Example Vector MAX
dest_reg = MAX ( src_reg_0, src_reg_1 ) (V) ;
Example Vector ABS
dest_reg = ABS source_reg (V) ;
Programmable flags (PF) registers
INTERRUPT CONTROL
Note that FIO_FLAG_D bits are set during edge-triggered interrupts and must be cleared by software
NOTE: The following have a similar format
FIO_MASKA_C (Clear – W1C)
FIO_MASKA_T (Toggle – W1T)
There are also FIO_MASKB registers with the same functionality
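W1C and W1T mean that only the bits written as 1 have an effect; bits written as 0 leave the register unchanged. A C model of the resulting register value (function names hypothetical):

```c
#include <stdint.h>

/* Write-one-to-clear (e.g. a _C register): each 1 clears that bit. */
uint32_t write_w1c(uint32_t reg, uint32_t written) { return reg & ~written; }

/* Write-one-to-toggle (e.g. a _T register): each 1 flips that bit. */
uint32_t write_w1t(uint32_t reg, uint32_t written) { return reg ^ written; }
```

This is why a single bit can be cleared or toggled with one write, without a read-modify-write sequence that an interrupt could split.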
WATCH-DOG TIMER
IPEND has the same format as ILAT but is read-only
CORE TIMER
SPI INTERFACE
TIMER0, TIMER1, TIMER2
All three timers have equivalent registers
There is also an equivalent Timer disable register (write one to clear)
EVENT TABLE
SPI transmit and receive registers