Eric Dubois

[Block diagram: Information Source → (signal) → Encoder → (binary stream, aka 'data') → Channel → (binary stream) → Decoder → (signal) → Information Receiver; an error measure compares the source signal with the reconstructed signal at the receiver.]

Examples of information sources: speech, image, video, text file, music, radiograph, binary executable computer program, computer graphics primitives, weather radar map.

Examples of channels: airwaves (EM radiation), cable, telephone line, hard disk, CD, DVD, flash memory device, optical path, Internet.

Examples of information receivers: TV screen and viewer, audio system and listener, computer file, image printer and viewer, compute engine.

Measures of fidelity:
◦ No errors permitted (lossless coding)
◦ Numerical measures of error, e.g. mean-squared error (MSE), signal-to-noise ratio (SNR)
◦ Numerical measures of perceptual difference
◦ Mean opinion scores from human users

Measures of rate:
◦ Data rate (bits per second)
◦ Transmission time (seconds)
◦ File size (bytes)
◦ Average number of bits per source symbol

There is usually a 'natural' representation for the source data at a given level of fidelity and sampling rate. Examples:
◦ 8 bits per character in ASCII data
◦ 24 bits per RGB color pixel
◦ 16 bits per audio signal sample
This natural representation leads to a certain raw channel rate, which is generally too high. Compression involves reducing the channel rate for a given level of distortion (which may be zero for lossless coding):

    compression ratio = raw channel rate / compressed channel rate

Example: HDTV, 1080i
◦ Raw channel rate: 1493 Mbit/s (1920 × 1080 × 30 × 24)
◦ Compressed channel rate: ~20 Mbit/s
◦ Compression ratio: ~75
(These figures are worked out in a short calculation sketch below.)

Categories of sources:
◦ continuous time or domain: x(t), x(h,v)
◦ discrete time or domain: x[n], x[m,n]
◦ continuous amplitude or value: x ∈ ℝ
◦ discrete amplitude or value: x ∈ A = {a1, a2, …, aM}
We will only consider discrete-domain sources. We assume that continuous-domain signals can be sampled with negligible loss; sampling is not considered in this course. We will mainly concentrate on one-dimensional signals such as text, speech, audio, etc. Extensions to images are covered in ELG5378.

A source signal is a sequence of values drawn from a source alphabet A: x[1], x[2], …, x[n] ∈ A. A source coder transforms a source sequence into a coded sequence whose values are drawn from a code alphabet G: u[1], u[2], …, u[i] ∈ G. Normally G = {0,1}, and we will limit ourselves to this case. Note that the time indices for the source sequence x[n] and the coded sequence u[i] do not correspond. The decoder must estimate the source signal on the basis of the received coded sequence û[i]. This may differ from u[i] if there are transmission errors; we will generally assume that there are no transmission errors.

Lossless coding: The source sequence has discrete values, and these must be reproduced without error. Examples where this is required are text, data, executables, and some quantized signals such as X-rays.

Lossy coding: The source sequence may be either continuous or discrete valued. There exists a distortion criterion. The decoded sequence may be mathematically different from the source sequence, but the distortion should be kept sufficiently small. Examples are speech and images. Often a perceptual distortion criterion is desired. Lossless coding methods are often a component of a lossy coding system.
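To make the raw-rate and compression-ratio arithmetic concrete, here is a minimal calculation sketch (in Python; the language choice is ours, not part of the course material) that reproduces the HDTV 1080i figures quoted above. The compressed rate of ~20 Mbit/s is the typical value stated in the example.

```python
# Raw and compressed channel rates for the HDTV 1080i example.

width, height = 1920, 1080       # pixels per frame
frames_per_second = 30           # full frames per second
bits_per_pixel = 24              # 8 bits per R, G, B component

raw_rate = width * height * frames_per_second * bits_per_pixel  # bits per second
compressed_rate = 20e6                                           # ~20 Mbit/s (typical)

print(f"Raw channel rate:        {raw_rate / 1e6:.0f} Mbit/s")   # ~1493 Mbit/s
print(f"Compressed channel rate: {compressed_rate / 1e6:.0f} Mbit/s")
print(f"Compression ratio:       {raw_rate / compressed_rate:.0f}")  # ~75
```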
There are two variants of the compression problem:
1. For a given source and distortion measure, minimize the channel rate for a given level of distortion D0 (which can be zero).
2. For a given source and distortion measure, minimize the distortion (or maximize the quality) for a given channel rate R0.
In a coding system there is typically a tradeoff between rate R and distortion D, described by a rate-distortion curve: fixing the distortion at D0 determines the minimum achievable rate, and fixing the rate at R0 determines the minimum achievable distortion.

Compression is possible in two situations:
1. When there is statistical redundancy.
◦ For example, for a sequence of outcomes of a fair 16-sided die, we need 4 bits to represent each outcome; no compression is possible.
◦ In English text, some letters occur far more often than others. We can assign shorter codes to the common ones and longer codes to the uncommon ones and achieve compression (e.g., Morse code); a small coding sketch illustrating this idea appears below.
◦ There are many types of statistical redundancy. For example, in English text we are pretty sure that the next letter after a Q will be a U, so we can exploit this.
The key to successful compression will be to formulate models that capture the statistical redundancy in the source.
2. When there is irrelevancy.
◦ In many cases, the data is specified more precisely than it needs to be for the intended purpose.
◦ The data may be oversampled, or quantized more finely than it needs to be, either everywhere or in some parts of the signal.
◦ This particularly applies to data meant only for consumption and not further processing.
To exploit irrelevancy, we need a good model of the requirements of the receiver, e.g., human vision, hearing, etc. We also need a suitable representation of the data, e.g., transform or wavelet representations. Again, the key to success will be the formulation of appropriate models.

The main operations in a compression system are:
◦ Change of representation
◦ Quantization (not for lossless coding)
◦ Binary code assignment
All will depend on good models of the source and the receiver.
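As an illustration of binary code assignment that exploits statistical redundancy, the sketch below builds a Huffman code (a technique covered later in the course) for a small source. The four-symbol alphabet and its probabilities are assumed for the example only; with these probabilities the average number of bits per source symbol drops from 2 (fixed-length code) to 1.75.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary prefix code for {symbol: probability} via Huffman's algorithm."""
    tiebreak = count()  # unique counter so heap comparisons never reach the symbol lists
    heap = [(p, next(tiebreak), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    code = {s: "" for s in probs}
    while len(heap) > 1:
        p0, _, syms0 = heapq.heappop(heap)   # two least probable groups
        p1, _, syms1 = heapq.heappop(heap)
        for s in syms0:
            code[s] = "0" + code[s]          # prepend a bit as the groups are merged
        for s in syms1:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (p0 + p1, next(tiebreak), syms0 + syms1))
    return code

# Illustrative source: common symbols get short codewords, rare ones longer ones.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_bits = sum(probs[s] * len(code[s]) for s in probs)

print(code)                                  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(f"average bits/symbol: {avg_bits}")    # 1.75, versus 2 for a fixed-length code
```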
Instructor: Eric Dubois, CBY A-512, Tel: 562-5800 x6400, edubois@uottawa.ca, www.eecs.uottawa.ca/~edubois/courses/ELG5126

Textbook: K. Sayood, Introduction to Data Compression, third edition, Morgan Kaufmann Publishers, 2006.

Prerequisites: Basic probability and signal processing as typically obtained in an undergraduate Electrical Engineering program (e.g., at uOttawa:
◦ ELG3125 Signal and System Analysis
◦ ELG3126 Random Signals and Systems).

The objective of this course is to present the fundamental principles underlying data and waveform compression. The course begins with the study of lossless compression of discrete sources. These techniques are applicable to compression of text, data, programs and any other type of information where no loss is tolerable. They also form an integral part of schemes for lossy compression of waveforms such as audio and video signals, which is the topic of the second part of the course. The main goal of the course is to provide an understanding of the basic techniques and theories underlying popular compression systems and standards such as ZIP, FAX, MP3, JPEG, MPEG and so on, as well as the principles underlying future systems. Some of the applications will be addressed in student projects.

Course outline:
◦ Lossless coding: Discrete sources, binary codes, entropy, Huffman and related codes, Markov models, adaptive coding.
◦ Arithmetic coding: Principles, coding and decoding techniques, implementation issues.
◦ Dictionary techniques: Principles, static dictionary, adaptive dictionary.
◦ Waveform coding: Distortion measures, rate-distortion theory and bounds, models.
◦ Quantization: Formulation, performance, uniform and non-uniform quantizers, quantizer optimization, vector quantization.
◦ Predictive coding: Prediction theory, differential coding (DPCM), adaptive coding.
◦ Transform and subband coding: Change of basis, block transforms and filter banks, bit allocation and quantization.
◦ Applications (student projects)

Grading:
◦ 20% Assignments: Several assignments, to be handed in during class on the specified due date. There will be a 5% penalty for each day late, and no assignment will be accepted after one week.
◦ 30% Project: An individual project on an application of data compression involving some experimental work. A project report and a presentation at the end of the course will be required. More details will follow early in the course.
◦ 20% Midterm exam: Closed-book exam, 80 minutes in length.
◦ 30% Final exam: Closed-book exam, 3 hours in length, covering the whole course.