Blackfin (ADSP-BFXXX) Reference V1.3 developed 1st Nov. 2004, smithmr@ucalgary.ca

PROGRAMMING MODEL
R0 to R7      Data registers (R0, R1, R2 volatile)
P0 to P5      Pointer registers (P0, P1 volatile)
FP            Frame pointer
SP            Stack pointer
A0, A1        Accumulator registers
LC0, LC1      Loop counters

DSP REGISTERS (ALL VOLATILE)
I0 to I3      Index registers (Ireg)
M0 to M3      Modify registers (Mreg)
B0 to B3      Base registers (Breg)
L0 to L3      Length registers (Lreg)
Breg holds the start of a circular buffer of length Lreg. The buffer is accessed through index register Ireg, which is post-modified by Mreg.

NOTATION CONVENTION
imm           signed immediate
uimm          unsigned immediate
imm3          -4 to +3
uimm3         0 to 7
reg           any register R0 to R7, P0 to P5
Dreg          any data register R0 to R7
Preg          any pointer register P0 to P5
reg_lo        low part of register (R0.L)
reg_hi        high part of register (P0.H)
statbit       AZ, AN, AC0, AC1, V, VS, AV0, AV0S, AV1, AV1S, AQ

PARAMETER PASSING EXAMPLE
// Offsets relative to FP (incoming frame of _Foo)
#define INPAR4_ON_STACK            20   // NOT in R3: 4th parameter is passed on the stack
#define INPAR3_SPACE_ON_STACK      16   // In R2
#define INPAR2_SPACE_ON_STACK      12   // In R1
#define INPAR1_SPACE_ON_STACK       8   // In R0
#define RETS_LOCATION_ON_STACK      4
#define OLD_FP_LOCATION_ON_STACK    0
// Offsets relative to SP (frame created by LINK 24)
#define SAVED_P3                   20
#define SAVED_P4                   16
#define OUTPAR4_ON_STACK           12   // NOT in R3: 4th parameter is passed on the stack
#define OUTPAR3_SPACE_ON_STACK      8   // In R2
#define OUTPAR2_SPACE_ON_STACK      4   // In R1
#define OUTPAR1_SPACE_ON_STACK      0   // In R0

section program;
.extern _Somewhere;
.extern _Subroutine;
.global _Foo;

// void Foo(INPAR1, INPAR2, INPAR3, INPAR4)
_Foo:
    LINK 24;                             // 16 bytes of outgoing parameter space + 8 bytes for 2 saved registers
    [SP + SAVED_P4] = P4;                // Save non-volatile register on the stack
    P4.L = _Somewhere;                   // Point to memory location _Somewhere
    P4.H = _Somewhere;                   // Reference resolved by linker since .extern
    [FP + INPAR1_SPACE_ON_STACK] = R0;   // Save INPAR1 for later
    [FP + INPAR3_SPACE_ON_STACK] = R2;   // Save INPAR3 (arrives in R2) for later
    R0 = [FP + INPAR4_ON_STACK];         // OUTPAR4 = INPAR4
    [SP + OUTPAR4_ON_STACK] = R0;
    R2 = 0xFFFF (X);                     // Sign-extend: OUTPAR3 = 0xFFFFFFFF (-1)
    // R1 = R1;                          // OUTPAR2 = INPAR2 (already in R1)
    R0 = 0xFFFF (Z);                     // Zero-extend: OUTPAR1 = 0x0000FFFF
    CALL _Subroutine;                    // Subroutine(0x0000FFFF, INPAR2, -1, INPAR4)
    W[P4] = R0;                          // Store return value as 16-bit
    P4.L = lo(FIO_FLAG_D);               // Constant from <defsBF533.h>
    P4.H = hi(FIO_FLAG_D);               //   requires the hi/lo macros
    P4 = [SP + SAVED_P4];                // Restore P4 (the FIO_FLAG_D address above is discarded)
    UNLINK;
    RTS;

Also see this alternative epilogue (not clear why it is used):
    P0 = [FP + 4];                       // Get RETS
    UNLINK;
    JUMP (P0);

PROGRAM FLOW INSTRUCTIONS
JUMP User_Label;               PC replaced by address of User_Label
JUMP (Preg);                   PC replaced by value in P-register
IF CC JUMP User_Label;         jump if CC = 1
IF !CC JUMP User_Label;        jump if CC = 0
IF CC JUMP User_Label (bp);    versions where the branch is predicted to be taken
IF !CC JUMP User_Label (bp);
Correctly predicting branches improves pipeline performance.
CALL User_Label;               PC replaced by address of User_Label; address of next instruction saved in RETS
CALL (Preg);                   PC replaced by value in P-register; address of next instruction saved in RETS
RTS     return from subroutine (RETS)
RTI     return from interrupt (RETI)
RTX     return from exception (RETX)
RTN     return from NMI (RETN)
RTE     return from emulation (RETE)
The return address comes from the register shown in brackets.

Hardware loops:
LOOP loop_name loop_counter;
LOOP_BEGIN loop_name;          // 1st instruction of loop body
LOOP_END loop_name;            // last instruction of loop body
LSETUP (Label_1st_instruction, Label_last_instruction) loop_counter;
loop_counter can be LCn, LCn = Preg or LCn = Preg >> 1.
LTn, LBn, LCn (Loop_Top, Loop_Bottom, Loop_Counter) can also be set directly.

LOAD / STORE INSTRUCTIONS
reg_lo = uimm16; reg_hi = uimm16;     half-word loads
reg = uimm16 (Z);                      zero-extended to 32 bits
reg = imm16 (X);                       sign-extended to 32 bits (also imm7 version)
Loading 32-bit values:
reg.L = uimm32 & 0xFFFF; reg.H = (uimm32 >> 16) & 0xFFFF;
BUT for an imported symbol (.IMPORT value;), reg.L = value; reg.H = value; selects the correct half-words.

Preg = [indirect_address]; [indirect_address] = Preg;
  indirect_address: Preg, Preg++, Preg--, Preg + offset, Preg - offset, FP - offset (offsets are multiples of 4)
Dreg = [indirect_address]; [indirect_address] = Dreg;
  indirect_address: Preg, Preg++, Preg--, Preg + small/large offset, Preg - large offset, FP - offset,
                    Preg ++ Preg, Ireg, Ireg++, Ireg--, Ireg ++ Mreg
Dreg = W[indirect_address] (Z);        zero-extended half-word fetch
Dreg = W[indirect_address] (X);        sign-extended half-word fetch
Dreg = B[indirect_address] (Z);        zero-extended byte fetch
Dreg = B[indirect_address] (X);        sign-extended byte fetch
  indirect_address: Preg, Preg++, Preg--, Preg + offset, Preg - offset;
                    Preg ++ Preg for word (W) access only (W offsets are multiples of 2)
Dreg_lo = W[indirect_address]; Dreg_hi = W[indirect_address];
W[indirect_address] = Dreg_lo; W[indirect_address] = Dreg_hi;
  indirect_address: Ireg, Ireg++, Ireg--, Preg, Preg ++ Preg

COMPARE INSTRUCTIONS
CC = Operand_1 == Operand_2;
CC = Operand_1 < Operand_2;            signed compare
CC = Operand_1 <= Operand_2;           signed compare
CC = Operand_1 < Operand_2 (IU);       unsigned compare
CC = Operand_1 <= Operand_2 (IU);      unsigned compare
Compare data registers (not parallel, 16-bit):
  Operand_1 Dreg; Operand_2 Dreg or small constant (imm3 for signed, uimm3 for unsigned compares)
Compare pointer registers (not parallel, 16-bit):
  Operand_1 Preg; Operand_2 Preg or small constant (imm3 for signed, uimm3 for unsigned compares)
Compare accumulator registers (not parallel, 16-bit):
  Operand_1 A0; Operand_2 A1; always signed compares

MOVE CC INSTRUCTIONS
Dest = CC;           Dest is Dreg or statbit, e.g. R0 = CC;
statbit OP CC;       OP is |=, &=, ^=, e.g. AZ |= CC;
CC = Source;         Source is Dreg or statbit
CC OP statbit;       OP is |=, &=, ^=
Note: CC = Dreg sets CC = 1 if Dreg != 0, otherwise CC = 0.

NEGATE CC INSTRUCTIONS
CC = !CC;

MOVE INSTRUCTIONS
genreg = genreg;  genreg = dagreg;  dagreg = genreg;  dagreg = dagreg;
genreg = USP;  USP = genreg;
Dreg = sysreg;          /* sysreg to 32-bit D-register */
sysreg = Dreg;          /* 32-bit D-register to sysreg */
sysreg = Preg;          /* 32-bit P-register to sysreg */
sysreg = USP;
A0 = A1;                /* move 40-bit Accumulator value */
A1 = A0;                /* move 40-bit Accumulator value */
A0 = Dreg;              /* 32-bit D-register to 40-bit A0, sign-extended */
A1 = Dreg;              /* 32-bit D-register to 40-bit A1, sign-extended */
Accumulator to D-register move:
Dreg_even = A0 (opt_mode);                  /* move 32-bit A0.W to even Dreg */
Dreg_odd = A1 (opt_mode);                   /* move 32-bit A1.W to odd Dreg */
Dreg_even = A0, Dreg_odd = A1 (opt_mode);   /* move both Accumulators to a register pair */
Dreg_odd = A1, Dreg_even = A0 (opt_mode);   /* move both Accumulators to a register pair */
Conditional move (DPreg is Dreg, Preg, SP or FP):
IF CC DPreg = DPreg;    /* move if CC = 1 */
IF !CC DPreg = DPreg;   /* move if CC = 0 */
Dreg = Dreg_lo (Z);     least significant 16 bits moved, zero-extended
Dreg = Dreg_lo (X);     least significant 16 bits moved, sign-extended
Dreg = Dreg.B (Z);      lowest 8 bits moved, zero-extended
Dreg = Dreg.B (X);      lowest 8 bits moved, sign-extended
Acc.X = Dreg_lo;        least significant 8 bits moved
Dreg_lo = Acc.X;        8 bits moved, sign-extended
Acc.L = Dreg_lo;        least significant 16 bits moved
Dreg_lo = Acc.L;        16 bits moved
Acc.H = Dreg_hi;        most significant 16 bits moved
Dreg_hi = Acc.H;        16 bits moved
Accumulator to half D-register moves support the following options:
  (default)    signed fraction format
  (FU)         unsigned fraction format (saturated)
  (IS), (IU)   signed and unsigned integer formats
  (T)          signed fraction with truncation
  (S2RND)      signed fraction with scaling and rounding
  (ISS2)       signed integer with scaling
  (IH)         signed integer with high-word extract
MORE INFO TO BE ADDED

STACK INSTRUCTIONS
LINK uimm;          Saves RETS and FP on the stack, copies SP into FP, then decrements SP by uimm
                    (Manual says minimum value is 8, but LINK 0 and LINK 4 seem OK)
UNLINK;             SP = FP, then FP = Mem[SP++], then RETS = Mem[SP++]
[--SP] = allreg;    push (SP points to the last used location)
allreg = [SP++];    pop
[--SP] = (R7:Dreglim, P5:Preglim);   push multiple registers; the Dreg or Preg range can also be used on its own

ARITHMETIC INSTRUCTIONS
dest_reg = ABS src_reg;
dest_reg = src_reg_1 + src_reg_2;
dest_reg = src_reg_1 - src_reg_2;
dest_reg.LorH = src_reg_1.LorH + src_reg_2.LorH (mode);
  mode is (NS) non-saturating or (S) saturating; normal math is (NS)
dest_reg = src_reg_1 +|- src_reg_2;
  both halves are computed at once (H + H and L - L); +|+, -|+ and -|- are also available
Dreg_lo_hi = Dreg + Dreg (RND20);
Dreg_lo_hi = Dreg - Dreg (RND20);
  Step 1: downshift by 4; step 2: perform operation, round top 16 bits; step 3: use top 16 bits (fractional number)
Dreg_lo_hi = Dreg + Dreg (RND12);
Dreg_lo_hi = Dreg - Dreg (RND12);
  Step 1: upshift by 4; step 2: perform operation; step 3: round and use top 16 bits
Dreg = MAX (Dreg, Dreg);
Dreg = MIN (Dreg, Dreg);
Preg -= Preg;
Ireg -= Mreg;
Ireg -= 2;  Ireg -= 4;
Preg += Preg (BREV);                                bit-reversed addition
Ireg += Mreg (opt_brev);
dest_reg = src_reg_0 * src_reg_1 (opt_mode);        16-bit multiply
Dreg *= Dreg;                                       32-bit multiply
accumulator = src_reg_0 * src_reg_1 (opt_mode);
accumulator += src_reg_0 * src_reg_1 (opt_mode);
accumulator -= src_reg_0 * src_reg_1 (opt_mode);
dest_reg_half = (accumulator = src_reg_0 * src_reg_1) (opt_mode);
dest_reg_half = (accumulator += src_reg_0 * src_reg_1) (opt_mode);
dest_reg_half = (accumulator -= src_reg_0 * src_reg_1) (opt_mode);
dest_reg = (accumulator = src_reg_0 * src_reg_1) (opt_mode);
dest_reg = (accumulator += src_reg_0 * src_reg_1) (opt_mode);
dest_reg = (accumulator -= src_reg_0 * src_reg_1) (opt_mode);
dest_reg = -src_reg;
dest_accumulator = -src_accumulator;
dest_reg = src_reg (RND);                           32-bit to 16-bit round and saturate
accumulator = accumulator (S);                      saturate accumulator
dest_reg = SIGNBITS sample_register;

LOGICAL INSTRUCTIONS
Dreg = Dreg1 LOGICAL_OP Dreg2;      LOGICAL_OP is &, | or ^
Dreg = ~Dreg1;                      complement
Also BXOR and BXORSHIFT (more later)

BIT INSTRUCTIONS
BITCLR (Dreg, bit_position);        clear bit
BITSET (Dreg, bit_position);        set bit
BITTGL (Dreg, bit_position);        toggle bit
CC = BITTST (Dreg, bit_position);   bit test, CC = 1 if the bit is set
CC = !BITTST (Dreg, bit_position);  bit test, CC = 1 if the bit is clear
  where bit_position is 0 to 31
R0 = R1.B (X);    // Extract a byte value and sign-extend it
R0 = R1.B (Z);    // Extract a byte value and zero-extend it
// CAN'T DO MATH ON A BYTE VALUE DIRECTLY

Dreg = DEPOSIT (backgroundDreg, foregroundDreg);
Dreg = DEPOSIT (backgroundDreg, foregroundDreg) (X);   /* sign-extended */
Foreground format: bits 31 to 16, pattern to be deposited; bits 15 to 8, position in backgroundDreg
where the last (rightmost) pattern bit is placed; bits 4 to 0, number of pattern bits to deposit.
R7 = DEPOSIT (R4, R3);
  R4 = 0b1111 1111 1111 1111 1111 1111 1111 1111
  R3 = 0b0000 0000 0000 0000 0000 0111 0000 0011   (pattern 0x0000, position 7, length 3)
  R7 = 0b1111 1111 1111 1111 1111 1100 0111 1111   (bits 9 to 7 cleared)
R7 = DEPOSIT (R4, R3) (X);   /* sign-extended */
  R4 = 0b1111 1111 1111 1111 1111 1111 1111 1111
  R3 = 0b0101 1010 0101 1010 0000 0111 0000 0011   (pattern 0x5A5A, position 7, length 3)
  R7 = 0b0000 0000 0000 0000 0000 0001 0111 1111   (field 010 placed at bits 9 to 7, sign-extended from bit 9)

Dreg = EXTRACT (sceneDreg, patternDreg_lo) (Z);   /* zero-extended */
Dreg = EXTRACT (sceneDreg, patternDreg_lo) (X);   /* sign-extended */
Pattern format: bits 15 to 8, position in sceneDreg of the rightmost bit extracted;
bits 4 to 0, number of bits to extract from sceneDreg.
R7 = EXTRACT (R4, R3.L) (Z);   /* zero-extended */
  R4 = 0b1010 0101 1010 0101 1100 0011 1010 1010
  R3 = 0bxxxx xxxx xxxx xxxx 0000 0111 0000 0100   (position 7, length 4)
  R7 = 0b0000 0000 0000 0000 0000 0000 0000 0111   (bits 10 to 7 of R4)
R7 = EXTRACT (R4, R3.L) (X);   /* sign-extended */
  R4 = 0b1010 0101 1010 0101 1100 0011 1010 1010
  R3 = 0bxxxx xxxx xxxx xxxx 0000 0111 0000 0100
  R7 = 0b0000 0000 0000 0000 0000 0000 0000 0111   (field MSB is 0, so no sign extension)

BITMUX (source_0, source_1, A0) (ASR);   /* shift right, LSB is shifted out */
BITMUX (source_0, source_1, A0) (ASL);   /* shift left, MSB is shifted out */
In the shift-right version, the processor performs the following sequence:
  1. Right shift Accumulator A0 by one bit. Right shift the LSB of source_1 into the MSB of the Accumulator.
  2. Right shift Accumulator A0 by one bit. Right shift the LSB of source_0 into the MSB of the Accumulator.
In the shift-left version, the processor performs the following sequence:
  1. Left shift Accumulator A0 by one bit. Left shift the MSB of source_0 into the LSB of the Accumulator.
  2. Left shift Accumulator A0 by one bit. Left shift the MSB of source_1 into the LSB of the Accumulator.

Dreg_lo = ONES Dreg;     returns the number of bits set in Dreg

SHIFT / ROTATE INSTRUCTIONS
Add with shift (down shift not allowed):
dest_pntr = (dest_pntr + src_reg) << 1;
dest_pntr = (dest_pntr + src_reg) << 2;
dest_reg = (dest_reg + src_reg) << 1;
dest_reg = (dest_reg + src_reg) << 2;
dest_pntr = adder_pntr + (src_pntr << 1);
dest_pntr = adder_pntr + (src_pntr << 2);

ARITHMETIC SHIFT (ASHIFT or >>>)
dest_reg >>>= shift_magnitude;
dest_reg = src_reg >>> shift_magnitude (opt_sat);
dest_reg = src_reg << shift_magnitude (S);
accumulator = accumulator >>> shift_magnitude;
dest_reg = ASHIFT src_reg BY shift_magnitude (opt_sat);
accumulator = ASHIFT accumulator BY shift_magnitude;

LOGICAL SHIFT (LSHIFT or >>)
dest_pntr = src_pntr >> 1;   dest_pntr = src_pntr << 1;
dest_pntr = src_pntr >> 2;   dest_pntr = src_pntr << 2;
dest_reg >>= shift_magnitude;
dest_reg <<= shift_magnitude;
dest_reg = src_reg >> shift_magnitude;
dest_reg = src_reg << shift_magnitude;
dest_reg = LSHIFT src_reg BY shift_magnitude;

ROTATE
dest_reg = ROT src_reg BY rotate_magnitude;
accumulator_new = ROT accumulator_old BY rotate_magnitude;

PARALLEL OPERATION EXAMPLES
32-bit ALU/MAC instruction || 16-bit instruction || 16-bit instruction ;
saa (r1:0, r3:2) || r0 = [i0++] || r2 = [i1++] ;
mnop || r1 = [i0++] || r3 = [i1++] ;
r7.h = r7.l = sign(r2.h) * r3.h + sign(r2.l) * r3.l || i0 += m3 || r0 = [i0] ;
NOTE: If there are two parallel memory operations, only one can involve a Preg.
NOTE: If there are two parallel memory operations, only one can be a write.

EXTERNAL EVENT MANAGEMENT
NOP;            16-bit NOP
MNOP;           32-bit NOP, e.g. MNOP || NOP || NOP ;
IDLE;
CSYNC;          core sync
SSYNC;          system sync
CLI Dreg;       clear (disable) interrupts and save the old interrupt mask to Dreg
STI Dreg;       set (re-enable) interrupts from Dreg
RAISE uimm4;    force an interrupt (effectively a software trigger for any interrupt)
EXCPT uimm4;    force an exception (effectively a software trigger for any exception)
TESTSET (Preg);
The Test and Set Byte (Atomic) instruction loads an indirectly addressed memory byte and tests whether it is zero. It then sets the most significant bit of the memory byte without affecting any other bits. If the byte was originally zero, the instruction sets the CC bit. If it was originally nonzero, the instruction clears the CC bit. The memory transaction is atomic, so it cannot be interrupted partway through. The equivalent instruction sequence can be interrupted between its steps: read memory into R0, test R0, if zero set the MSB of R0, store R0 back to memory.
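The atomicity argument above can be tried out on a hosted machine. The following is a minimal C11 sketch, not Blackfin code. It models TESTSET's semantics (set the byte's MSB, report whether the byte was zero) with the standard `atomic_fetch_or`, which performs the read and the write as one indivisible step. The names `testset`, `spin_lock`, `spin_unlock` and `testset_demo` are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of TESTSET (Preg): atomically OR 0x80 (the MSB) into the byte and
 * return true (CC = 1) only if the byte was zero before the update.
 * atomic_fetch_or returns the old value, so read-test-write is one step. */
static bool testset(atomic_uchar *byte)
{
    unsigned char old = atomic_fetch_or(byte, 0x80);
    return old == 0;
}

/* Hypothetical spin lock built on testset(): 0 = free, nonzero = taken. */
static void spin_lock(atomic_uchar *lock)
{
    while (!testset(lock)) {
        /* busy-wait until testset() observes a zero byte */
    }
}

static void spin_unlock(atomic_uchar *lock)
{
    assert(atomic_load(lock) != 0);   /* the lock must currently be held */
    atomic_store(lock, 0);
}

/* Usage sketch (side effects are kept outside assert so NDEBUG is safe). */
void testset_demo(void)
{
    atomic_uchar flag = 0;
    bool cc1 = testset(&flag);        /* byte was 0: CC = 1, byte is now 0x80 */
    bool cc2 = testset(&flag);        /* byte already nonzero: CC = 0 */
    assert(cc1 && !cc2 && atomic_load(&flag) == 0x80);

    atomic_uchar other = 0x05;
    bool cc3 = testset(&other);       /* nonzero byte: CC = 0, but MSB still set */
    assert(!cc3 && atomic_load(&other) == 0x85);

    spin_unlock(&flag);               /* byte back to 0: lock is free */
    spin_lock(&flag);                 /* acquired on the first attempt */
    assert(atomic_load(&flag) == 0x80);
    spin_unlock(&flag);
    assert(atomic_load(&flag) == 0);
}
```

The lock works because only one contender can observe the zero before the MSB is set. Releasing the lock is a plain store of 0.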
VIDEO PIXEL INSTRUCTIONS
ALIGN8, ALIGN16, ALIGN24
DISALGNEXCPT
BYTEOP3P (Dual 16-Bit Add / Clip)
Dual 16-Bit Accumulator Extraction with Addition
BYTEOP16P (Quad 8-Bit Add)
BYTEOP1P (Quad 8-Bit Average, Byte)
BYTEOP2P (Quad 8-Bit Average, Half-Word)
BYTEPACK (Quad 8-Bit Pack)
BYTEOP16M (Quad 8-Bit Subtract)
SAA (Quad 8-Bit Subtract-Absolute-Accumulate)
BYTEUNPACK (Quad 8-Bit Unpack)

VECTOR INSTRUCTIONS (basically two 16-bit operations at once)
Add on Sign, VIT_MAX (Compare-Select), Vector Arithmetic Shift, Vector Logical Shift, Vector MIN, Vector Multiply, Vector Multiply and Multiply-Accumulate, Vector Negate (Two's Complement), Vector PACK, Vector SEARCH
Example Vector Add / Subtract:   dest_reg = src_reg_0 +|+ src_reg_1;
Example Vector MAX:              dest_reg = MAX (src_reg_0, src_reg_1) (V);
Example Vector ABS:              dest_reg = ABS src_reg (V);

PROGRAMMABLE FLAGS (PF) REGISTERS
Note that FIO_FLAG_D bits are set during edge-triggered interrupts and must be cleared.
The following have a similar format: FIO_MASKA_C (Clear, W1C) and FIO_MASKA_T (Toggle, W1T).
There are also FIO_MASKB registers with the same functionality.

INTERRUPT CONTROL
IPEND has the same format as ILAT but is read-only.

EVENT TABLE

WATCH-DOG TIMER

CORE TIMER

SPI INTERFACE
SPI transmit and receive registers

TIMER0, TIMER1, TIMER2
All three timers have equivalent registers. There is also an equivalent timer disable register (write one to clear).
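This C sketch models the W1C (write one to clear) and W1T (write one to toggle) conventions mentioned for FIO_MASKA_C, FIO_MASKA_T and the timer disable register. It shows the rule the hardware applies to each written bit; it is not driver code. The helper names `write_w1c`, `write_w1t` and `w1c_demo` are hypothetical. The register is simulated with an ordinary 16-bit variable rather than a memory-mapped address, and that width is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a W1C port: each 1 bit in the written mask clears the matching
 * register bit; each 0 bit leaves the matching register bit unchanged. */
static void write_w1c(uint16_t *reg, uint16_t mask)
{
    *reg = (uint16_t)(*reg & ~mask);
}

/* Model of a W1T port: each 1 bit in the written mask toggles the matching
 * register bit; each 0 bit leaves the matching register bit unchanged. */
static void write_w1t(uint16_t *reg, uint16_t mask)
{
    *reg = (uint16_t)(*reg ^ mask);
}

/* Usage sketch: change individual bits of a simulated mask register
 * without reading it first. */
void w1c_demo(void)
{
    uint16_t mask_a = 0x00F0;       /* bits 7..4 set */

    write_w1c(&mask_a, 0x0010);     /* clear bit 4 only */
    assert(mask_a == 0x00E0);

    write_w1c(&mask_a, 0x0000);     /* writing zeros changes nothing */
    assert(mask_a == 0x00E0);

    write_w1t(&mask_a, 0x0101);     /* toggle bits 8 and 0 */
    assert(mask_a == 0x01E1);

    write_w1t(&mask_a, 0x0001);     /* toggling bit 0 again clears it */
    assert(mask_a == 0x01E0);
}
```

With this convention, one write clears or toggles a single bit. The alternative is a read-modify-write sequence, which can be interrupted between the read and the write and lose another update to the same register.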