VHDL Coding The Golden Rules

VHDL Coding Exercise 4: FIR Filter

Where to start?
• Design flow: Algorithm -> Architecture -> RTL block diagram -> VHDL code
• Feedback between the steps: design space exploration, optimization

Algorithm
• High-level system diagram
  - Context of the design
  - Inputs and outputs
  - Throughput/rates
  - Algorithmic requirements
• Algorithm description
  - Mathematical description
  - Performance criteria: accuracy, optimization constraints
  - Implementation constraints: area, speed
• FIR filter: $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
  [Block diagram: x(k) -> FIR -> y(k)]

Architecture (1)
• Isomorphic architecture: straightforward implementation of the algorithm
  [Block diagram: input x(k), coefficients b_0, b_1, b_2, ..., b_{N-2}, b_{N-1}, b_N, output y(k)]

Architecture (2)
• Pipelining/retiming: improve timing
  - Insert register(s) at the inputs or outputs
  - Increases latency
  - Perform retiming: move registers through the logic without changing functionality (backward or forward)

Architecture (3)
• Retiming and simple transformation: optimization
  - Reverse the adder chain
  - Perform retiming

Architecture (4)
• More pipelining: add one pipelining stage to the retimed circuit
  - The longest path is given by the multiplier
  - Unbalanced: the delay from the input to the first pipeline stage is much longer than the delay from the first to the second stage

Architecture (5)
• More pipelining: add one pipelining stage to the retimed circuit
  - Move the pipeline registers into the multiplier
  - Paths between pipeline stages are balanced
  - Improved timing: Tclock = (Tadd + Tmult)/2 + Treg
Architecture (6)
• Iterative decomposition: reuse hardware
  - Identify regularity and reusable hardware components
  - Add control: multiplexers, storage elements, control
  - Increases cycles/sample

RTL-Design
• Choose an architecture under the following constraints:
  - It meets ALL timing specifications/constraints: throughput, latency
  - It consumes the smallest possible area
  - It requires the least possible amount of power
  [Figure: iterative decomposition architecture]
• Decide which additional functions are needed and how they can be implemented efficiently:
  - Storage of samples x(k) => MEMORY
  - Storage of coefficients b_i => LUT
  - Address generators for MEMORY and LUT => COUNTERS
  - Control => FSM

RTL-Design
• RTL block diagram: datapath for $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$
• FSM: interface protocols, datapath control

RTL-Design
• How it works:
  - IDLE: wait for new sample; store to input register
  - NEW DATA: store new sample to memory
  - RUN: compute $y(k) = \sum_{i=0}^{N} b_i \, x(k-i)$; store result to output register
  - DATA OUT: output result / wait for ACK
  - IDLE: …

Translation into VHDL
• Some basic VHDL building blocks: signal assignments
  - Outside a process: [Figure: AxD assigned to YxD]
  - [Figure: AxD and BxD both assigned to YxD] This is NOT allowed!
  - Within a process (sequential execution): [Figure: AxD, BxD, YxD]
    Statements execute sequentially; the last assignment is kept when the process terminates.

Translation into VHDL
• Some basic VHDL building blocks:
  - Multiplexer with default assignment [Figure: AxD, BxD, CxD, SELxS, YxD]
  - Conditional statements [Figure: AxD, BxD, CxD, DxD, SelAxS, SelBxS, STATExDP, OUTxD]

Translation into VHDL
• Common mistakes with conditional statements:
  [Example figure: AxD, BxD, SelAxS, SelBxS, STATExDP, OUTxD]
  - NO default assignment
  - NO else statement
  - ASSIGNING NOTHING TO A SIGNAL IS NOT A WAY TO KEEP ITS VALUE! => Use flip-flops!

Translation into VHDL
• Some basic VHDL building blocks:
  - Register [Figure: DataREGxDN -> DataREGxDP]
  - Register with ENABLE [Figure: DataREGxDN, DataREGxDP]

Translation into VHDL
• Common mistakes with sequential processes:
  - [Figure: DataREGxDN, DataREGxDP, CLKxCI] Cannot be translated into hardware and is NOT allowed
  - [Figure: DataRegENxS, DataREGxDN, DataREGxDP, CLKxCI] Clocks are NEVER generated within any logic
  - Gated clocks are more complicated than this. Avoid them!
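The building blocks above (multiplexer with default assignment, register with ENABLE) can be sketched as follows. The code shown on the original slides is not recoverable, so this is an illustrative reconstruction under assumptions: the entity name, the 8-bit width, the reset signal RSTxRBI and the output port DataREGxDO are hypothetical, while AxD, BxD, YxD, SELxS, DataRegENxS, DataREGxDN, DataREGxDP and CLKxCI follow the names used on the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity combining a multiplexer and a register with enable
entity reg_en_example is
  port (
    CLKxCI      : in  std_logic;
    RSTxRBI     : in  std_logic;  -- assumed active-low asynchronous reset
    DataRegENxS : in  std_logic;
    SELxS       : in  std_logic;
    AxD         : in  std_logic_vector(7 downto 0);
    BxD         : in  std_logic_vector(7 downto 0);
    DataREGxDO  : out std_logic_vector(7 downto 0)
  );
end entity reg_en_example;

architecture rtl of reg_en_example is
  signal YxD                    : std_logic_vector(7 downto 0);
  signal DataREGxDN, DataREGxDP : std_logic_vector(7 downto 0);
begin

  -- Multiplexer: the default assignment guarantees that YxD is
  -- driven on every path through the process (no latch is inferred)
  mux : process (AxD, BxD, SELxS)
  begin
    YxD <= AxD;                   -- default assignment
    if SELxS = '1' then
      YxD <= BxD;                 -- the last assignment wins
    end if;
  end process mux;

  -- Enable: a multiplexer in front of the flip-flop selects between
  -- the present value and the new value; the clock is left untouched
  enable : process (DataRegENxS, YxD, DataREGxDP)
  begin
    DataREGxDN <= DataREGxDP;     -- default: keep the stored value
    if DataRegENxS = '1' then
      DataREGxDN <= YxD;
    end if;
  end process enable;

  -- Register: only clock and reset in the sensitivity list
  reg : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      DataREGxDP <= (others => '0');
    elsif rising_edge(CLKxCI) then
      DataREGxDP <= DataREGxDN;
    end if;
  end process reg;

  DataREGxDO <= DataREGxDP;

end architecture rtl;
```

Keeping the stored value is done by feeding DataREGxDP back into DataREGxDN, so the value is held by the flip-flop rather than by an unassigned signal or a gated clock.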
Translation into VHDL
• Some basic rules:
  - Sequential processes (flip-flops)
    - Only CLOCK and RESET in the sensitivity list
    - Logic signals are NEVER used as clock signals
  - Combinatorial processes
    - Multiple assignments to the same signal are ONLY possible within the same process => ONLY the last assignment is valid
    - Something must be assigned to each signal in any case, OR there MUST be an ELSE for every IF statement
• More rules that help to avoid problems and surprises:
  - Use separate signals for the PRESENT state and the NEXT state of every flip-flop in your design.
  - Use variables ONLY to store intermediate results, or avoid them altogether whenever possible in an RTL design.

Translation into VHDL
• Write the ENTITY definition of your design to specify inputs, outputs and generics.
• Describe the functional units in your block diagram one after another in the architecture section:
  - Register with ENABLE
  - Register with CLEAR
  - Counter
• The FSM is described with one sequential process and one combinatorial process (MEALY).

Translation into VHDL
• Complete and check the code:
  - Declare the signals and components
  - Check and complete the sensitivity lists of ALL combinatorial processes with ALL signals that are:
    - used as a condition in any IF or CASE statement
    - being assigned to any other signal
    - used in any operation with any other signal
  - Check that the sensitivity lists of ALL sequential processes contain ONLY one global clock and one global async. reset signal, and no other signals

Other Good Ideas
• Keep things simple
• Partition the design (divide et impera):
  - Example: start processing the next sample while the previous result is waiting in the output register: just add a FIFO at the output of your filter
• Do NOT try to optimize each gate or flip-flop
• Do not try to save cycles if not necessary
• VHDL code
  - is usually long, and that is good!
  - is just a representation of your block diagram
  - does not mind hierarchy
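The FSM description recommended in the "Translation into VHDL" slides (one sequential process plus one combinatorial process, Mealy outputs) could look roughly like the sketch below for the IDLE / NEW DATA / RUN / DATA OUT sequence. This is an assumption-laden illustration rather than the code from the slides: the entity name and all handshake and control port names (NewSamplexSI, LastTapxSI, AckxSI, InRegENxSO, MemWExSO, AccENxSO, OutRegENxSO, OutValidxSO, RSTxRBI) are hypothetical; only the state names, CLKxCI and STATExDP come from the source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical controller for the iterative FIR datapath
entity fir_fsm is
  port (
    CLKxCI       : in  std_logic;
    RSTxRBI      : in  std_logic;  -- assumed active-low asynchronous reset
    NewSamplexSI : in  std_logic;  -- assumed: a new input sample is available
    LastTapxSI   : in  std_logic;  -- assumed: tap counter reached N
    AckxSI       : in  std_logic;  -- assumed: receiver acknowledges the result
    InRegENxSO   : out std_logic;  -- enable of the input register
    MemWExSO     : out std_logic;  -- write enable of the sample memory
    AccENxSO     : out std_logic;  -- enable of the accumulator and counters
    OutRegENxSO  : out std_logic;  -- enable of the output register
    OutValidxSO  : out std_logic   -- result is valid at the output
  );
end entity fir_fsm;

architecture rtl of fir_fsm is
  type state_t is (IDLE, NEW_DATA, RUN, DATA_OUT);
  signal STATExDN, STATExDP : state_t;  -- next state / present state
begin

  -- Combinatorial process: next-state logic and Mealy outputs
  comb : process (STATExDP, NewSamplexSI, LastTapxSI, AckxSI)
  begin
    -- Default assignments: every signal is driven on every path
    STATExDN    <= STATExDP;
    InRegENxSO  <= '0';
    MemWExSO    <= '0';
    AccENxSO    <= '0';
    OutRegENxSO <= '0';
    OutValidxSO <= '0';

    case STATExDP is
      when IDLE =>                 -- wait for new sample
        if NewSamplexSI = '1' then
          InRegENxSO <= '1';       -- store to input register (Mealy output)
          STATExDN   <= NEW_DATA;
        end if;
      when NEW_DATA =>             -- store new sample to memory
        MemWExSO <= '1';
        STATExDN <= RUN;
      when RUN =>                  -- accumulate b_i * x(k-i)
        AccENxSO <= '1';
        if LastTapxSI = '1' then
          OutRegENxSO <= '1';      -- store result to output register
          STATExDN    <= DATA_OUT;
        end if;
      when DATA_OUT =>             -- output result / wait for ACK
        OutValidxSO <= '1';
        if AckxSI = '1' then
          STATExDN <= IDLE;
        end if;
    end case;
  end process comb;

  -- Sequential process: state register, only clock and reset in the list
  seq : process (CLKxCI, RSTxRBI)
  begin
    if RSTxRBI = '0' then
      STATExDP <= IDLE;
    elsif rising_edge(CLKxCI) then
      STATExDP <= STATExDN;
    end if;
  end process seq;

end architecture rtl;
```

The default assignments at the top of the combinatorial process are what make it safe to omit ELSE branches inside the CASE arms, and the sensitivity list contains every signal read by that process.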
