ENCM515 Lecture Notes Details

First Draft to identify problems
1) Build and save as .doc file and then save as .htm file
2) Ignore problem with apparent path to c:\temp
3) Upload to web page – Check Path
4) Modify .htm file with emacs to change path into correct path – changing \ into / or removing path to
file:c:\\\temp etc. depending on how the .htm file was formed.
5) Many font changes are needed when cutting and pasting between PowerPoint slides and the .doc file
6) Watch for name changes as you save as .htm file because of the way FrontPage handles temporary files
7) You can’t build the .doc file outside of FrontPage as the links don’t get cut over.
Lecture 1 -- Course outline showing marking scheme and student ethics, outline of laboratories and
timing of quizzes



ENCM515 Course Outline 2002 (Power Point presentation)
Course Handout -- 2002
Expected Student Ethics
Lecture 2 -- Student Self Evaluation approach (Self study in ENCM515)
ENCM515 -- Earned Mark Analysis 2002 (Power Point presentation) 02earnedmarkanalysis.ppt

Provide both student and instructor with information about expectations and performance of a student
in the class.
 See Student Prediction and Tracking of Learning Progress (ASEE paper) for more explanation

Answer such questions as
 What are the student's expectations during this course -- a good pass or an excellent pass?
 Is the student performing as well as expected during all the course work, given the course learning
curve?
 Is the student’s performance consistent, or erratic?
 What can be done to bring a mark to a level needed for a particular career path?
 Was the quiz set too hard relative to the instructor’s intended level of difficulty?



Process for Earned Mark Analysis (How to complete the EMA spreadsheet)
Planning for initial EMA estimation -- web form submission (Details needed for completing the web
form)
Expected and Tracking Report Web Form
Evaluation Spreadsheet.xls

../oldencm515_02/02planning_tracking/02ema_info/515ema6January2002.xls
LECTURES LEADING TO LAB. 0
Familiarization with the VisualDSP development environment
Lecture 3 -- Basic familiarization with SHARC 2106X architecture -- Programmer's model for
registers, ALU operations, memory operations, introduction to SISD, SIMD and MIMD processors.
Overview SHARC processor (Power Point presentation) 02overviewSHARCarchitecture.ppt
 Reference sources
 Register file and operations
 Memory configuration and operations
 Sample instructions
 Program Flow
 Some warnings of expected errors
o Code review and code review standards
 Some recent architectural advances
o Tiger-SHARC and Hammerhead-SHARC


SHARC 2106X Processor -- Quick reference sheet for assembly language programming (Use legal
sized paper)
SHARC Navigator Tutorial (download) -- needs pointing to latest version on Analog Devices Web.
Lecture 4 -- Setting up the VisualDSP++ 1.0 development environment for ADSP2106X ICE. Note
that we are using the older VisualDSP++ 1.0 environment at the moment, as its compiler generates
assembly code that is easier to optimize than the code generated by the newer VisualDSP++ 2.0 environment.
We are also using the Summit ICE (ICT210) and Mountain ICE (ENA305) interfaces for the 21061.
Material regarding the use of the serial lines for ADSP21061, 21065, 21161 boards will be added later
once the appropriate libraries have been created.
 Setting up VisualDSP++ 1.0 ICE environment -- Lab. 0 (Power Point presentation) with Summit
ICE capability.
 Some stations in A305 have ICE capability, but it is a different ICE. Stations throughout the
Department support the 20 floating licenses for VisualDSP++ 1.0 and the ADSP-21061 simulator
 Build a VisualDSP project -- OFF_LINE source -- your own version (in C) of a DSP Temperature
conversion algorithm. This will be used later in the course to demonstrate parallelism issues
 Test out VisualDSP software simulation environment
 Set up and test out VisualDSP hardware environment
 Start/finish work on ‘C’ and assembly code version of FMSTEREO_DEMOD for Post-Lab 0
Quiz.
 Laboratory 0 -- VisualDSP Familiarity (Laboratory 0 Home Page)
 Visual DSP tutorial.zip -- needs pointing to latest version on Analog Devices Web.
 R1 -- Files needed for Lab. 0
Lectures 5 and 6 -- Introduction to taking "C" code used in an algorithm's design and
converting it to ADSP21061 assembly code. The main reason for doing this is to provide a starting
point to "optimize the code speed" or "optimize code size" beyond what the compiler can handle.
The optimization "may be possible" because you know characteristics of the algorithm that the
'general' compiler would not recognize. The code we examine forms a major part of the Post Lab. 0
Quiz (Take Home)
Process for systematic conversion of "C" to assembly code on ADSP21061 SHARC (Power Point
presentation) 02C2assembly.ppt

Setting up special processor constants and registers to gain speed during assembly language
constructs

Review of use of index and modify registers

Prologue, Body and Epilogue of “C” program translated to assembly code (NO
DIFFERENCE by hand or by compiler)

Example conversion of “C” program into ADSP21061 using a standard procedure
 Take into account register architecture
 Take into account LOAD/STORE architecture
 Take into account standard assembly code problems
 Handle Program Flow Constructs
 Then do conversion of code on a line-by-line basis
 Learning why to avoid calling “C” from assembly
 Familiarization exercise 1 -- Due Tuesday 22nd. Be prepared to hand in "electronic form" at short
notice -- .doc file
 WARNING -- SHARC processor Delayed Branch Operations
 SHARC (21k) and 68k Register Comparison
 Comparison of basic SHARC (21k) and 68k MOVE instructions
 On-line tutorials from Analog Devices and Universities
 "The SHARC in the C", Circuit Cellar Online Magazine, April 2000.
Tutorial 1 -- "Extra worked example for 'C-design' to SHARC Assembler code" (Power Point
presentation) 02ExtraC2assemblyExample.ppt
 Need to set up review process to look for, and remove, common errors when writing assembly
code
 Process to translate a “C” program involving arrays into SHARC code
 Comparison of timings for non-optimized code, optimized code, hardware loops, super-scalar
architecture
Lecture 7 -- Post Laboratory 0 Quiz (Take Home). Further work with the VisualDSP++ 1.0
Development Environment. We will implement a simple DSP algorithm in "C" and assembly code to
perform FM_STEREO demodulation. This also provides an introduction to the audio channel
modeling laboratory development environment. The only function that needs modifying is
FM_STEREO_DEMODULATION( ) in the file channelmodels_lab0.c. The other
portions of the code will be explained later.
Details for Post Lab 0 Quiz (Power Point presentation) 02PostLabQuiz0.ppt
 Test out VisualDSP software environment using “OFFLINE_SOURCE_DEMO.exe” -- get from
ENCM515 Web
 Test out VisualDSP hardware environment using “LOCAL_SOURCE_DEMO.exe” -- get from
ENCM515 Web
YOU’LL NEED HEADPHONES
 Start/finish work on ‘C’ and assembly code version of FMSTEREO_DEMOD for Post-Lab 0
Quiz.
o Code examples provided
o Test Off-Line version using VisualDSP++ 1.0 and VisualDSP++ 2.0
 Files needed for Post Lab. 0 Quiz
 Post Lab. 0 Take Home Quiz
Tutorial 2
 Familiarization tutorial based on Familiarization exercise 1
 Familiarization Exercise 2 -- due Tuesday 29th. Be prepared to hand-in "electronic form" at short
notice -- .doc file
LECTURES LEADING TO LABORATORIES 1 AND 2
Familiarization with 21061 syntax
CISC, RISC and DSP Loops using software control – Laboratory 1
Hardware loops, Hardware circular buffers – Laboratory 2
Lecture 8 -- "Background information on Audio Channel Modeling" 02initialaudiomodelling.ppt
 Audio channel modelling concepts as detailed in Bessinger’s thesis on improved sound stage
 Sound re-positioning through delay lines
 Post Lab. 0 Quiz -- Familiarization with VisualDSP++ 1.0 Tool set. Implementation of FMSTEREO demodulation (“C” and assembly)
 Lab. 1 -- Implementation in “C” (with and without pointers)
 Lab. 2 -- Implementation in “assembly”, with and without pointers, with specialized SHARC
architecture (hardware circular buffers)
 Sound colouration through FIR filters
 Lab. 3 -- In “C”, assembly, custom assembly (hardware loops) and VERY custom assembly
(highly parallel algorithm)
 Additional Audio Channel Modeling
 Lab. 4 -- Multi-tasking environment -- SHARC RTOS -- Room colouration through IIR filters -- Student project?
Lecture 9 -- "Efficient Loop Handling for DSP algorithms on CISC, RISC and DSP processors"
02customloops.ppt
 Performing multiple memory accesses to an array
 Loop overhead can steal many cycles
 Loop overhead -- depends on implementation
o Standard loop with test at the start -- while ( )
o Initial test with additional test at end -- do-while( )
o Down-counting loops
 Special Efficiencies
o CISC -- hardware
o RISC -- intelligent compilers
o DSP -- hardware
 Example loop code for 68k, 29k and 21k processors -- processorexample.doc
 "Code Optimization Techniques -- the case of 'The SHARC versus the Minnow' -- Part 1 -- The
Minnow's Viewpoint", Electronic Design Magazine, September 2000.
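The three loop forms listed for Lecture 9 can be sketched in "C" as follows. This is only an illustration: the function names are hypothetical, and how much overhead each form really costs depends on the compiler and processor, as the lecture points out.

```c
#include <stddef.h>

/* Standard loop with the test at the start -- while ( ) */
long sum_while(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;
    while (i < n) {            /* compare against n on every pass */
        sum += a[i];
        i++;
    }
    return sum;
}

/* Initial test, then the loop test at the end -- do-while ( ) */
long sum_do_while(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;
    if (n == 0)                /* single guard before entering the loop */
        return 0;
    do {
        sum += a[i];
        i++;
    } while (i < n);           /* one branch per pass, at the bottom */
    return sum;
}

/* Down-counting loop: the counter is tested against zero */
long sum_down_count(const int *a, size_t n)
{
    long sum = 0;
    const int *p = a;          /* pointer walks the array */
    size_t count = n;
    while (count != 0) {
        sum += *p++;
        count--;
    }
    return sum;
}
```

All three return the same sum; the point of comparing them is the number of instructions spent on loop control rather than on the array accesses.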
Lecture 10 -- "Investigation of code optimizing procedures for DSP algorithms written in 'C/C++'" --
"Details of Lab. 1" 02detailsLab1.ppt
 Concept of Lab. 1
 Build variants of algorithms for FIFO (Delay Buffer)
o Mass Memory Move (written in “C” -- provided)
o Mass Memory Move (written in “asm” -- direct translation)
o FIFO using software circular buffer (written in “C” -- provided)
o FIFO using software circular buffer (written in “asm” -- direct translation)
 Test that algorithms work correctly using “OFF-LINE”
(using the board in the lab. and the simulator outside)
 Time the various algorithms -- How good is the “optimizing compiler” compared to hand-coding?
 Test the effect in an “audio-sense” using “LOCAL” and “CODEC”. Here the effect of “length of
time in ISR” becomes important for sound quality
 Laboratory 2 -- same as Lab. 1 but using custom DSP features of the processor for implementing
“circular buffers”
 Details of “2001main.c”, “channelmodels.c” and audio libraries.
 Compiler and algorithm issues on the DSP performance of various implementations of FIFO buffers
(Delay lines).
 Pre Lab 1 Quiz (either in class or start of lab). Solutions to Prelab 1 Quiz.
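The two FIFO (delay buffer) variants named for Lab. 1 can be sketched in "C" as below. This is not the code provided for the lab; the type names, function names and the delay length DELAY_LEN are hypothetical, chosen only to show the difference between the mass memory move and the software circular buffer.

```c
#define DELAY_LEN 4            /* hypothetical delay, in samples */

typedef struct { int line[DELAY_LEN]; } MoveFifo;
typedef struct { int line[DELAY_LEN]; int index; } CircFifo;

/* Mass memory move: every sample is shifted one place on each call. */
int move_fifo_step(MoveFifo *f, int in)
{
    int out = f->line[DELAY_LEN - 1];        /* oldest sample leaves */
    for (int i = DELAY_LEN - 1; i > 0; i--)
        f->line[i] = f->line[i - 1];         /* DELAY_LEN - 1 moves per sample */
    f->line[0] = in;
    return out;
}

/* Software circular buffer: only the index moves, the data stays put. */
int circ_fifo_step(CircFifo *f, int in)
{
    int out = f->line[f->index];             /* oldest sample sits at index */
    f->line[f->index] = in;                  /* newest sample overwrites it */
    f->index++;
    if (f->index == DELAY_LEN)               /* software wrap-around test */
        f->index = 0;
    return out;
}
```

With both structures cleared to zero, each function returns the input from DELAY_LEN calls earlier, so the two can be checked against each other OFF-LINE before being timed.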
Tutorial 3 -- Workshop on "ADSP2106x and audio channel modelling"
SHARC2000Workshop_LabsForADSP21065.ppt
 Gain some experience with the VisualDSP IDE environment and 21065L evaluation board
 Simple examples involving “C”, assembly code and associated linkages
 Explore capabilities present in these Lab. Modules for your own courses
Lecture 11 -- "Learning from 'C' compilers" 02learningfromcompilers.ppt
 The “C” compiler knows how to generate assembler
o What can we learn from the “C” compiler as tutor?
 “C” routines can use many parameters
o How does Wind River DiabData 68K compiler do it?
o How does White Mountain SHARC 21K compiler do it?
 Process to generate assembler from “C” (general)
 VisualDSP requirements
 Use the -S compiler option and look at the .s file
o Printing (Best directly from VisualDSP, NOT Notepad)
o Reverse engineering the .s file for easier reading by using the
reverse_clanguage_register_defines.i file and the assembler preprocessor to produce a .is
file.
"Code Optimization Techniques -- the case of 'The SHARC versus the Minnow' -- Part 2 -- The Byte
of the SHARC", Electronic Design Magazine, pp 121 -- 138, October 2000.
Tutorial 4 -- Please bring questions. Issues associated with Post Lab.0 Take Home Quiz
 Quick Quiz associated with PreLab 0
Lectures 12 and 13 -- "SHARC number representations" 02sharcnumbers.ppt
o Number Representations are varied
o Make sure you understand them
o Can solve many coding errors by recognizing improper use of number representations
o SHARC default number representation for integers is not what is expected.
o Understanding Number Representations allows for extra speed in that 1 in 1000 situation
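As a general illustration of why number representations matter (not taken from the slides, and not a statement of what the SHARC default is), the sketch below interprets one 32-bit word three ways: as a signed integer, as a signed 1.31 fraction, and as a 32-bit floating point value. The function names are hypothetical, and the code assumes the host "C" compiler uses 32-bit IEEE 754 floats and two's complement integers.

```c
#include <stdint.h>
#include <string.h>

/* The bit pattern read as a two's complement integer. */
int32_t as_signed_int(uint32_t word)
{
    int32_t value;
    memcpy(&value, &word, sizeof value);     /* copy the bits unchanged */
    return value;
}

/* The same bits read as a signed fraction (1.31 format): value / 2^31. */
double as_fraction_1_31(uint32_t word)
{
    return (double)as_signed_int(word) / 2147483648.0;
}

/* The same bits read as a 32-bit float (assumes IEEE 754 on the host). */
float as_ieee_float(uint32_t word)
{
    float value;
    memcpy(&value, &word, sizeof value);
    return value;
}
```

For the word 0x40000000 the three readings are 1073741824, 0.5 and 2.0, which is the kind of mismatch that shows up as a coding error when the wrong representation is assumed.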
Tutorial 5 -- "Extra worked example for 'C-design' to SHARC Assembler code"
02ExtraC2assemblyExample.ppt
o Need to set up review process to look for, and remove, common errors when writing assembly
code
o Process to translate a “C” program involving arrays into SHARC code
o Comparison of timings for non-optimized code, optimized code, hardware loops, super-scalar
architecture
Lectures 13 and 14 -- "Program Flow control in a pipelined processor environment" 02sequencing.ppt
o Parts of the SHARC program sequencer
o Similarity to “old” micro-sequencers used when designing custom byte-slice array
processors back in the early ’80s
o Pipelining issues
o Resource conflict between instructions
o Delayed branches -- use NOPs or find useful instructions
o Loops, restrictions and “short loops”
o Counter and non-counter based loops
o interrupt concepts -- see later lecture
o Instruction Cache
Lectures 15 and 16 -- "Hardware circular buffer operations to support audio modeling of multiple sound
source positions" (Lab. 2) 02detailsLab2.ppt
o Concept of Lab. 1 -- Software FIFO stack
o Software Circular Buffers -- 2 approaches
o FIFO stacks allowing the modeling of audio channels associated with sound positioning through
delays
o Concept of Lab. 2 -- Hardware FIFO stack
o Same code except for variants of new routine
o Compare software and hardware circular buffers
o Developing new code in Assembly code
o Delay line as FIR, FIR coeffs in dm or pm space
o Hardware circular Buffer Concepts introduced
o Recap of hand-in for Laboratories 1 and 2
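A minimal sketch of two ways to wrap a software circular buffer index, assuming the "2 approaches" are the "if" test and the mask operation mentioned later for Lab. 3. The names and the buffer length BUF_LEN are hypothetical. A hardware circular buffer removes this wrap code from the loop altogether, because the address generator performs the wrap.

```c
#define BUF_LEN 8u             /* hypothetical length; a power of two for the mask form */

/* Approach 1: explicit test and reset ("if" statement). */
unsigned advance_if(unsigned index)
{
    index++;
    if (index == BUF_LEN)      /* compare and branch on every sample */
        index = 0;
    return index;
}

/* Approach 2: mask operation, valid only when BUF_LEN is a power of two. */
unsigned advance_mask(unsigned index)
{
    return (index + 1u) & (BUF_LEN - 1u);    /* no branch, one AND */
}
```

The mask form trades the branch for a restriction on buffer length, which is worth checking against the delay actually needed before choosing it.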
LECTURES LEADING TO LABORATORIES 3 and 4
Practical use of Parallel Instructions and other DSP architectural features
Implementation of high speed FIR filters – Lab. 3
Implementation of high speed specialized DSP algorithm (e.g. Burg) – Lab. 4
Lecture 17 -- "Concepts of parallel processing on the SHARC 21061 -- Possibilities and Limitations"
02allowedinstructions.ppt
o Limitations of instruction sets -- Why needed?
o CISC processor example
o Recognizing possible limitations in the instruction set of SHARC processor
o Standard operations
o Memory accesses -- parallel and non-parallel
o Parallel COMPUTE instructions
o Parallel COMPUTE instructions with multiple memory accesses
Lectures 18 and 19 -- "Process for parallel instructions on 21061" 02parallelinstructionsprocess.ppt
o What’s the problem?
o Standard Code Development of “C”-code
o Process for “Code with parallel instruction”
o Rewrite with specialized resources
o Move to “resource chart”
o Unroll the loop
o Adjust code
o Reroll the loop
o Check if worth the effort
 "Code Optimization Techniques -- the case of 'The SHARC versus the Minnow' -- Part 2 -- The
Byte of the SHARC", Electronic Design Magazine, pp 121 -- 138, October 2000.
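The "unroll, adjust, reroll" steps can be sketched in "C" on a dot product. This is an illustration under assumptions, not the worked example from the slides: the function names are hypothetical, and "C" itself cannot express that two statements issue in the same cycle. The rerolled form simply arranges the loads for the next element beside the multiply for the current one, which is the arrangement a resource chart is used to find.

```c
#include <stddef.h>

/* Straightforward form: load, load, multiply, add, one element per pass. */
long dot_plain(const int *x, const int *h, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        int a = x[i];
        int b = h[i];
        acc += (long)a * b;
    }
    return acc;
}

/* Rerolled form: the loads for element i sit beside the multiply for i-1. */
long dot_rerolled(const int *x, const int *h, size_t n)
{
    long acc = 0;
    if (n == 0)
        return 0;
    int a = x[0];                      /* prologue: first loads only */
    int b = h[0];
    for (size_t i = 1; i < n; i++) {
        long prod = (long)a * b;       /* compute for element i-1 ...     */
        a = x[i];                      /* ... next loads are independent, */
        b = h[i];                      /*     so could run in parallel    */
        acc += prod;
    }
    acc += (long)a * b;                /* epilogue: last multiply-add */
    return acc;
}
```

Both forms must give the same result; the "check if worth the effort" step then compares the loop body cycle counts against the extra prologue and epilogue code.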
Tutorial 6 – Additional background for post-lab 1 quiz -- "Compare 68K and 21K instructions"
02compare_68_SHARC.ppt
o When to use assembly code
o Useful sub-set of 68K CISC instructions
o Recap Effective addressing modes
o Load/Store Programming style for 68K
o Load/Store Architecture of 21K by comparison with 68K
Tutorial 7 -- Post Lab. 1 Quiz
Tutorial 8 (Self Study) – Controlling the ADSP1847 Codec -- ADSP1847 CODEC User Manual,
ADSP1847 CODEC Training Manual
Tutorial 9 -- Comparing the architectural characteristics of 68HC11, 2106X, 218X, 2116X, RISC, 680X0
processors -- 02CompareArchitectures.ppt
o Processor Architectures to be covered
o 6809, 68HC11, 68332, 68020, 68040, 5206e
o ADSP218X, ADSP2106X, ADSP2116X
o 29k, PowerPC
o How to program various processors (in the broad sense) when you can program 68332 and 21061
processors.
o Basic Implications of Architectures on program performance
Tutorial 10 -- "Compare 68K and 21K instructions" -- timing calculations and instruction set
02compare_68_SHARC_updated.ppt
o When to use assembly code
o Useful sub-set of 68K CISC instructions
o Recap Effective addressing modes
o Load/Store Programming style for 68K
o Load/Store Architecture of 21K by comparison with 68K
Tutorial 11 – Retake of Post-Lab Quiz 1
Lectures 20/21 -- "Basic architectural characteristics of DSP processors needed to support highly optimized
DSP algorithms" 02processorrequirements.ppt
o Characteristics of DSP algorithms
o Specialized handling of
o Multiplication
o Division (21K has no division instruction)
o ENCM515 Reference Material
o How RISCy Is DSP, IEEE Micro (Jan-10)
o Simply Signal Processing (Jan-40)
o Fast Scaling, CCI (Apr-10)
o Saturation Arithmetic (Apr-20)
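Two of the specialized operations listed above can be sketched in portable "C". Neither routine is taken from the reference articles; the function names are hypothetical. The first shows saturation arithmetic (clamping instead of wrapping on overflow), the second shows the shift-and-subtract style of division that software falls back on when there is no divide instruction. The division sketch assumes the caller never passes a zero divisor.

```c
#include <stdint.h>

/* Saturating 32-bit add: clamp to the largest or smallest value on overflow. */
int32_t saturating_add32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (wide > INT32_MAX)
        return INT32_MAX;
    if (wide < INT32_MIN)
        return INT32_MIN;
    return (int32_t)wide;
}

/* Unsigned division by shift and subtract, one quotient bit per pass.
   Assumption: den is not zero. */
uint32_t divide_shift_subtract(uint32_t num, uint32_t den)
{
    uint64_t remainder = 0;
    uint32_t quotient = 0;
    for (int bit = 31; bit >= 0; bit--) {
        remainder = (remainder << 1) | ((num >> bit) & 1u);  /* bring down next bit */
        if (remainder >= den) {
            remainder -= den;
            quotient |= (uint32_t)1 << bit;
        }
    }
    return quotient;
}
```

The 32 passes of the division loop show why a divide is expensive compared with a single-cycle multiply, and why DSP code often avoids it.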
Lecture 22 -- "Highly parallel implementation of FIR filters on ADSP21061" 02highlyparallelfir.ppt
Essentially Lab 3
o Compare performance of
o optimized “C” code -- coded the “best way” (software circular buffer using “if” statements or
using pointer “mask operations” -- your choice)
o hand coded non-parallel code
o hand coded parallel code for FIR filter operations
o Compare your optimized code with what is available in DSP library files in VisualDSP directories
o Need to show filter works -- OFFLINE
o Test audio performance
o Write a suitable report discussing results.
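As a starting point for the "C" version compared in this lecture, here is a minimal sketch of a FIR filter whose delay line is a software circular buffer using the mask operation. It is not the Lab. 3 code or the VisualDSP library routine; FirState, fir_step and NUM_TAPS are hypothetical names, and NUM_TAPS is assumed to be a power of two so that the mask works.

```c
#define NUM_TAPS 4u            /* hypothetical filter length, power of two */

typedef struct {
    int delay[NUM_TAPS];       /* circular delay line */
    unsigned newest;           /* position of the most recent sample */
} FirState;

/* Process one input sample and return y[n] = sum of coeff[k] * x[n-k]. */
long fir_step(FirState *s, const int coeff[NUM_TAPS], int in)
{
    s->newest = (s->newest + 1u) & (NUM_TAPS - 1u);   /* advance with wrap */
    s->delay[s->newest] = in;

    long acc = 0;
    unsigned idx = s->newest;
    for (unsigned k = 0; k < NUM_TAPS; k++) {
        acc += (long)coeff[k] * s->delay[idx];        /* coeff[k] * x[n-k] */
        idx = (idx - 1u) & (NUM_TAPS - 1u);           /* step back one sample */
    }
    return acc;
}
```

A simple OFFLINE check is to feed in a unit impulse with the state cleared to zero: the outputs should reproduce the coefficients in order, followed by zeros.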
Tutorial 12 – Post-Lab 2. Quiz
Tutorial 13 -- Tutorial Notes for "Process for parallel instructions on 21061"
02tutorialparallelinstructionsprocess.ppt
o Rewrite the “C” code using “LOAD/STORE” techniques
o Accounts for the SHARC super scalar RISC DSP architecture
o Write the assembly code using a hardware loop
o Rewrite the assembly code using instructions that could be used in parallel if you could find the
correct optimization approach
o Move algorithm to “Resource Usage Chart”
o Optimize using techniques
o Compare and contrast time -- setup and loop
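The first step, rewriting the "C" code using "LOAD/STORE" techniques, can be sketched as below. The example routine is hypothetical (it is not the tutorial's code), and the local variable names r0, r1, src and dst are only meant to suggest registers and index registers.

```c
#include <stddef.h>

/* Design form: memory operands appear directly inside the expression. */
void scale_offset_direct(int *out, const int *in, size_t n, int gain, int offset)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * gain + offset;
}

/* LOAD/STORE form: one operation per line, memory touched only by
   explicit loads and stores, loop written as a down-counter. */
void scale_offset_loadstore(int *out, const int *in, size_t n, int gain, int offset)
{
    const int *src = in;       /* stands in for an input index register  */
    int *dst = out;            /* stands in for an output index register */
    size_t count = n;          /* loop counter, counted down to zero     */

    while (count != 0) {
        int r0 = *src++;       /* LOAD with post-modify of the pointer */
        int r1 = r0 * gain;    /* register-to-register multiply        */
        r1 = r1 + offset;      /* register-to-register add             */
        *dst++ = r1;           /* STORE with post-modify               */
        count--;
    }
}
```

Once the code is in this form, each line maps onto roughly one assembly instruction, which makes the later move to a hardware loop and to the "Resource Usage Chart" much more mechanical.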
Tutorial 14 -- Tutorial on "SquishDSP -- a tool for optimization of highly parallel code"
02SHARCEcology201.ppt
o Efficiency of assembly code produced by the optimizing VisualDSP++ compiler depends on
design/form of the “C/C++” algorithm.
o Simple code example and a variety of design formats for speed
o Need to further improve speed of code developed by optimizing compiler or through custom
development processes
o Use of the tool SquishDSP to assist in identifying dependencies in your code and possibly finding
parallelization of instructions
o Speed improvement is algorithm and design dependent, but we have doubled the speed of code
produced by the VisualDSP++ compiler.
o Further tests are needed to see if the improvements scale for more complex DSP algorithms.
o This tutorial was developed for teaching purposes and some parts “may provide BGOs” for people
familiar with concepts

Paper on Optimization of microprocessor resources using a big-business tool -- SHARC2001
Boston
Lecture 23 -- Examination of the FFT Algorithm on DSP processors
 Additional Files needed -- dft.txt, fft.txt
Lectures 24/25 -- "Custom not speed -- better often means faster" -- 02customNOTspeed.ppt
o Introduction
o Industrial Example of DFT/FFT
o DFT -- FFT Theory
o Straight application
o Proper application
o “The KNOW-WHEN” application
o Future Talks
o The implications on DSP processor architecture
o How are actual DSP processors optimized for FFT operations?
Lectures 26/27 -- Comparison of Integer and Floating point DSP processors -- 02intfloat.ppt
o Fast instruction cycle -- not clock speed
o Fast hardware multiplier
o Floating point for easier design -- avoids scaling and overflow
o High precision
o wide busses for register, memory, processing units
o Fast loop operation
Lecture 28 -- "Quantization and Truncation Effects of DSP processor implementations -- Avoiding
introducing distortions into your results" 02QuantizationSHARC99.ppt
o Why worry?
o Finite Precision Effects
o Multiplier Coefficient Quantization
o Signal Quantization
o Filter Structure Effects
o DIGICAP -- Tool details not covered in paper
o Filter Response Calculation
o Quantization Effect Calculation
o Availability
o Conclusion
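Multiplier coefficient quantization can be illustrated with a short "C" sketch. This is not the DIGICAP tool and not the method from the paper; the function name is hypothetical, and the code assumes coefficients in the range -1 to just under +1 stored as signed fractions, with a word length between 2 and 16 bits.

```c
/* Round a coefficient to the nearest value representable as a signed
   fraction with `bits` total bits (assumed range: 2 to 16 bits). */
double quantize_coefficient(double value, int bits)
{
    double scale = (double)(1L << (bits - 1));      /* 2^(bits-1) steps per unit */
    double scaled = value * scale;
    long code = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);  /* round to nearest */

    if (code > (long)scale - 1)                     /* largest positive code */
        code = (long)scale - 1;
    if (code < -(long)scale)                        /* most negative code */
        code = -(long)scale;
    return (double)code / scale;                    /* value actually used by the filter */
}
```

For example, 0.3 stored in 8 bits becomes 38/128 = 0.296875, so the filter that runs is not quite the filter that was designed; comparing the frequency response before and after quantization shows how much that matters.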
Tutorial 15 – Review for Midterm -- Provide me with questions and answers before the start of classes
Tutorial 16 – Midterm Exam -- arrive by 12:30 if possible to gain extra time for doing the midterm
Lecture 29 – Concept of the Burg Algorithm for analysis of spectral data using minimum data sets – Lead
up to Lab. 4
Lecture 30 -- Programming on VLIW processors.
 Class presentations on DSP processor characteristics (for marks associated with prelab 3 quiz)?
Lecture 31 -- "Review of interrupts -- CISC and DSP approaches" 02interrupts1.ppt
Subroutines and Interrupts
Example “C” code (68K)
subroutine assembly code
interrupt service routine assembly
Example “C” code (21K)
The “C” wrapper
interrupts using IRQ1 button
interrupts using 21K timer

"Putting a SHARC amongst the Sailors", Circuit Cellar Online Magazine, December 1999.
Lecture 32 -- "21k interrupts -- the hard way or behind the 'C-wrapper'" 02interrupts2.ppt
o Review Subroutines and Interrupts
o Architectural Issues regarding 21K interrupts
o Programming issues regarding 21K interrupts
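The idea of working "behind the C-wrapper" can be illustrated on a host computer with the standard "C" signal calls: a plain "C" function is registered as the handler, and the run-time library supplies the code that saves and restores context around it. This is only an analogue. It uses SIGINT and raise( ) from standard "C" to stand in for a button interrupt; the SHARC run-time's own registration call and interrupt names are not shown here, and the function names are hypothetical.

```c
#include <signal.h>

/* Shared between the handler and the main program, so it must be
   volatile and of a type that can be written atomically. */
static volatile sig_atomic_t button_presses = 0;

/* The "interrupt service routine": an ordinary C function. */
static void button_handler(int sig)
{
    (void)sig;                 /* signal number not needed here */
    button_presses++;          /* keep the handler short */
}

/* Register the handler; returns 1 on success, 0 on failure. */
int install_button_handler(void)
{
    return signal(SIGINT, button_handler) != SIG_ERR;
}

/* Read the count from the main program. */
int presses_seen(void)
{
    return (int)button_presses;
}
```

On the host, calling raise(SIGINT) after install_button_handler( ) plays the part of pressing the button. Writing the same thing "the hard way" means providing the register save and restore and the return-from-interrupt yourself in assembly.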
Tutorial 17 – Visit to Seaman MRI Centre at Foothills Hospital to see DSP algorithms at work
Tutorial 18 -- issues associated with Lab. 4
 Useful DSP related articles by BDTI
 Class presentations on DSP processor characteristics (for marks associated with prelab 3 quiz)?
Lecture 33 -- Embedded Software Process 02SQEpresentation.ppt
Need for Humphrey’s Personal Software Process (PSP).
Relationship between PSP and CMM.
PROBE method of estimating effort needed to implement a design.
Concept of Abowd’s Embedded Software Process (ESP)
Testing of Abowd’s ESP process
Problems of extending ESP into DSP environment
Lecture 34 -- Cache Thrashing CacheDSPTalk.ppt
Concept behind 2106X instruction cache
Cache operation
Introduction of CACHE THRASHING
Solutions to avoid a Cache Thrash without delaying product release
Basis of Cache-DSP tool
Acknowledgements


PDF presentation on Cache optimization Presented at SHARC2001 Boston, September 2001
Class presentations on DSP processor characteristics (for marks associated with prelab 3 quiz)?
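Cache thrashing can be demonstrated with a small model of a set-associative cache. The sketch below is not the Cache-DSP tool and does not claim the 2106X cache geometry: NUM_SETS, WAYS, the replacement rule and all names are illustrative assumptions. It shows only the mechanism, that when more addresses compete for one set than the set has entries, every access can miss.

```c
#define NUM_SETS 16u           /* illustrative, not from the data sheet */
#define WAYS 2u                /* model assumes exactly two entries per set */

typedef struct {
    unsigned tag[NUM_SETS][WAYS];
    int valid[NUM_SETS][WAYS];
    unsigned next_victim[NUM_SETS];    /* least recently used way in each set */
} Cache;

/* Look up one address; returns 1 on a hit, 0 on a miss (and fills the entry). */
int cache_access(Cache *c, unsigned address)
{
    unsigned set = address % NUM_SETS;
    unsigned tag = address / NUM_SETS;

    for (unsigned w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->next_victim[set] = 1u - w;      /* the other way is now oldest */
            return 1;
        }
    }
    unsigned victim = c->next_victim[set];     /* replace the oldest entry */
    c->tag[set][victim] = tag;
    c->valid[set][victim] = 1;
    c->next_victim[set] = 1u - victim;
    return 0;
}

/* Run a loop body (a list of addresses) several times and count the misses. */
unsigned count_misses(Cache *c, const unsigned *addresses, unsigned count, unsigned passes)
{
    unsigned misses = 0;
    for (unsigned p = 0; p < passes; p++)
        for (unsigned i = 0; i < count; i++)
            if (!cache_access(c, addresses[i]))
                misses++;
    return misses;
}
```

In this model, a loop touching two addresses that share a set misses only twice over ten passes, while a loop touching three such addresses misses on all thirty accesses. Moving one instruction so that it falls in a different set removes the thrash.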
Lecture 35 -- K7 and Pentium chips -- RISC, CISC or DSP processors 01k5discussion.ppt
Want to compare Motorola 68332 CISC Processor (based on 68000, era 1978/81) with an AMD K5
CISC Processor (era 1996)
Look at common features present between AMD K5 CISC and 21K DSP
Comment on paper “Microprocessors outperform DSP 2:1”
Lecture 36 -- Outside Speaker -- Brian Howse -- Overview of latest issues in DSP processors
Tutorial 19 -- What the final exam format will be and what the exam will cover
 Class presentations on DSP processor characteristics (for marks associated with prelab 3 quiz)?
Tutorial 20 -- Class presentations on DSP processor characteristics (for marks associated with prelab 3
quiz)?
Lecture 37 (Self Study) Pipelining issues with the program sequencer control. Comparison of 21k
sequencer unit with byte slice processor sequencer units of the ’80s -- Microcoded CCU 01microcodedccu.ppt
Look at what a “microcoded” processor means
Difference between microcoding and assembly code
Development of ever increasing complexity in CCU for different control tasks
Advantages of pipelining -- in context of CCU
Comparison of a microcoded CCU and the branch control logic of 21k