Waveform Speech Coding Algorithms: An Overview
Outline
 Introduction
 Concepts
 Quantization
 PCM
 DPCM
 ADPCM
 Standards & Applications
 G.711
 G.726
 Performance Comparison & Examples
 Summary & Conclusion
Introduction
Motivation
What is Speech Coding?
It is the procedure of representing a digitized speech signal as efficiently as possible,
while maintaining a reasonable level of speech quality.
Why would we want to do that?
To answer this, let's have a look at the structure of the coding system.
Introduction
Motivation
Filtering & Sampling
 Most of the speech content lies between 300 and 3400 Hz
 According to the Nyquist theorem, Fs >= 2·fm (to avoid aliasing)
 A value of 8 kHz is selected (8 >= 2 × 3.4)
 For good quality, 16 bits are used to represent each sample
 Bit-rate = 8 kHz × 16 bits = 128 kbps
Input Rate
The input rate could be even higher; Skype, for example, uses a 16 kHz sampling frequency,
resulting in an input rate of 192 kbit/s. This is a waste of bandwidth that could instead be
used by other services and applications.
Source Coding (Speech Coding in this Context)
[1]
Introduction
Motivation
Desirable Properties of a Speech Coder
 Low Bit-Rate: By using a lower bit-rate, a smaller bandwidth is needed for transmission,
leaving room for other services and applications.
 High Speech Quality: Speech quality is the rival of "low bit-rate". It is important for the
decoded speech quality to be acceptable for the target application.
 Low Coding Delay: The process of speech coding introduces extra delay, which might
affect applications that have real-time requirements.
[1]
Introduction
Speech Coding Categories
What are the different categories of speech coding?
Speech coding is divided into three categories:
 Waveform codecs (PCM, DM, APCM, DPCM, ADPCM)
 Vocoders (LPC, homomorphic, etc.)
 Hybrid codecs (CELP, SELP, RELP, APC, SBC, etc.)
[2]
Concepts
Quantization
What Is Quantization?
Quantization is the process of transforming the sample amplitude of a message into a
discrete amplitude from a finite set of possible amplitudes.
[3]
Each sampled value is approximated with a quantized pulse; the approximation results in an
error no larger than q/2 in the positive direction or q/2 in the negative direction, where q is
the quantization step size.
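To make the q/2 bound concrete, here is a minimal C sketch (not from the slides) of a uniform midtread quantizer; the function name and the chosen values of q and x are illustrative assumptions.

```c
/* Minimal illustrative sketch: a uniform midtread quantizer with step size q.
   The error |x - quantize(x)| never exceeds q/2. */
#include <math.h>
#include <stdio.h>

static double quantize_uniform(double x, double q)
{
    return q * round(x / q);   /* snap to the nearest multiple of q */
}

int main(void)
{
    double q = 0.5;            /* assumed step size, for illustration only */
    double x = 3.117;
    double v = quantize_uniform(x, q);
    printf("input=%.3f  quantized=%.3f  error=%.3f\n", x, v, x - v);
    return 0;
}
```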
Concepts
Quantization
Understanding Quantization
To understand quantization a bit more, let's have a look at the following example:
Concepts
Quantization
Classification Of Quantization Process
The Quantization process is classified as follows:
 Uniform Quantization: The representation levels are equally spaced (Uniformly spaced)
Midtread type
Midrise type
 Non-Uniform Quantization: The representation levels have variable spacing from one
another.
[4]
But why do we need such a classification?
Concepts
Quantization
Human Speech – Excursion & Recap (1)
Speech can be broken into two different categories:
 Voiced (zzzzz)
 Un-Voiced (sssss)
Naturally occurring speech signals are composed of a combination of the above categories;
take the word "goat" for example:
[4]
"Goat" contains two voiced signals followed by a partial closure of the vocal tract and then an
un-voiced signal. These occur at 3400-3900, 3900-5400, and 6300-6900, respectively.
Concepts
Quantization - why do we need such a classification? (1)
Human Speech – Excursion & Recap (2)
It should be noted that:
The peak-to-peak amplitude of voiced signals is approximately ten times that of un-voiced
signals.
Un-voiced signals contain more information, and thus higher entropy, than voiced signals.
The telephone system must provide higher resolution for lower-amplitude signals.
Statistics of speech signals: [Figure: probability of occurrence vs. amplitude of speech signals]
[3]
[6]
Concepts
Quantization - why do we need such a classification? (2)
Quantization Noise
 The quantization process is lossy (erroneous).
 The error E is defined as the difference between the input signal M and the output signal V.
This error is called the quantization noise.
 Consider the simple example:
 M = (3.117, 4.56, 2.31, 7.82, 1)
 V = (3, 3, 2, 7, 2)
 E = M - V = (0.117, 1.56, 0.31, 0.82, -1)
 How do we calculate the noise power?
 Consider an input m of continuous amplitude in the range (-M_max, M_max)
 For a uniform quantizer with L levels, the step size is q = 2·M_max / L, and the quantization
noise power is N_q = q^2 / 12
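A quick way to sanity-check the q^2/12 result is to measure the noise power of a uniform quantizer numerically. This minimal C sketch does that; the values M_max = 1 and L = 256 are assumptions chosen only for the demonstration.

```c
/* Sketch: measure the quantization noise power of a uniform quantizer on
   random input and compare it with the theoretical value q^2/12.
   Assumes the input is uniformly distributed in (-M_max, M_max). */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    const double M_max = 1.0;
    const int L = 256;                 /* number of levels (8 bits) */
    const double q = 2.0 * M_max / L;  /* step size */
    const int N = 1000000;
    double noise_power = 0.0;

    for (int i = 0; i < N; i++) {
        double m = -M_max + 2.0 * M_max * rand() / (double)RAND_MAX;
        double v = q * round(m / q);   /* uniform midtread quantizer */
        double e = m - v;
        noise_power += e * e;
    }
    noise_power /= N;

    printf("measured noise power = %.3e, q^2/12 = %.3e\n",
           noise_power, q * q / 12.0);
    return 0;
}
```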
Concepts
Quantization - why do we need such a classification? (3)
Comparison – Uniform vs. Non-Uniform Usage
 Speech signals do not require high quantization resolution at high amplitudes
(50% vs. 15%).
 Isn't it wasteful to use a uniform quantizer?
The goal is to increase the SQNR: more levels for low amplitudes, fewer levels for high ones.
 Maybe use a non-uniform quantizer?
[3]
Concepts
Quantization
More About Non-Uniform Quantizers (Companding)
 Non-uniform quantizer = use more levels where you need them.
 The human ear follows a logarithmic process in which high-amplitude sounds don't
require the same resolution as low-amplitude sounds.
 One way to achieve non-uniform quantization is to use what is called "companding"
 Companding = "Compressing + Expanding"
Compressor function → Uniform quantization → Expander function (inverse of the compressor)
Concepts
Quantization
What is the purpose of a Compander?
 The purpose of a compander is to equalize the histogram of speech signals so that the
reconstruction levels tend to be equally used.
[6]
There are two famous companding techniques that follow a logarithmic encoding law:
A-Law Companding
µ-Law Companding
[6]
Concepts
Quantization
A-Law Encoding:
F(x) = sgn(x) · A·|x| / (1 + ln A), for 0 <= |x| < 1/A
F(x) = sgn(x) · (1 + ln(A·|x|)) / (1 + ln A), for 1/A <= |x| <= 1
µ-Law Encoding:
F(x) = sgn(x) · ln(1 + µ·|x|) / ln(1 + µ), for 0 <= |x| <= 1
[3]
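For concreteness, here is a minimal C sketch of the continuous µ-law compressor and its inverse (the expander) for samples normalized to [-1, 1], with µ = 255. The function names are illustrative assumptions, not a standard API.

```c
/* Sketch of continuous mu-law compression and expansion (mu = 255),
   operating on samples normalized to [-1, 1]. */
#include <math.h>
#include <stdio.h>

#define MU 255.0

/* mu-law compressor F(x) */
static double mulaw_compress(double x)
{
    double s = (x < 0) ? -1.0 : 1.0;
    return s * log(1.0 + MU * fabs(x)) / log(1.0 + MU);
}

/* mu-law expander, the inverse of F */
static double mulaw_expand(double y)
{
    double s = (y < 0) ? -1.0 : 1.0;
    return s * (pow(1.0 + MU, fabs(y)) - 1.0) / MU;
}

int main(void)
{
    double x = 0.01;                  /* a low-amplitude sample */
    double y = mulaw_compress(x);     /* boosted before uniform quantization */
    printf("x=%.4f  F(x)=%.4f  F^-1(F(x))=%.4f\n", x, y, mulaw_expand(y));
    return 0;
}
```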
Concepts
Quantization
Companding Approximation
 Logarithmic functions are slow to compute, so why not approximate them?
 3 bits are used for the 8 segments (chords) of the approximation
 P is the sign bit of the output
 The S bits form the segment code
 The Q bits form the quantization code
[3]
Concepts
Quantization
Companding Approximation – Algorithm
Encoding
 Add a bias of 33 to the absolute value of the input sample
 Determine the position of the most significant set bit among bits 5 to 12 of the biased value
 Subtract 5 from that position; this is the segment code
 Finally, the 4-bit quantization code is the 4 bits immediately following that most significant
bit position
Decoding
 Multiply the quantization code by 2 and add the bias of 33 to the result
 Multiply the result by 2 raised to the power of the segment code
 Subtract the bias from the result
 Use the P bit to determine the sign of the result
 Example? See the sketch below and the worked example on the following slides.
[3]
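Below is a rough C sketch of the segmented µ-law approximation described above. The 1-3-4 codeword layout (sign, segment, quantization bits), the helper names, and the omission of G.711's final bit inversion are simplifying assumptions; this is not the standard's reference implementation.

```c
/* Sketch of the segmented mu-law approximation: bias of 33, segment code from
   the most significant set bit among bits 5..12, 4-bit quantization code.
   Input magnitude is assumed to fit in 13 bits. */
#include <stdio.h>
#include <stdlib.h>

#define BIAS 33

static unsigned char mulaw_encode(int sample)
{
    unsigned p = (sample < 0) ? 1 : 0;        /* sign bit P */
    int mag = abs(sample) + BIAS;             /* add the bias of 33 */

    int pos = 5;                              /* MSB among bits 5..12 */
    for (int b = 12; b >= 5; b--) {
        if (mag & (1 << b)) { pos = b; break; }
    }

    unsigned seg = pos - 5;                   /* 3-bit segment code */
    unsigned q = (mag >> (pos - 4)) & 0x0F;   /* 4 bits just below the MSB */

    return (unsigned char)((p << 7) | (seg << 4) | q);
}

static int mulaw_decode(unsigned char code)
{
    unsigned p = (code >> 7) & 1;
    unsigned seg = (code >> 4) & 7;
    unsigned q = code & 0x0F;

    int mag = ((int)q * 2 + BIAS) << seg;     /* (2Q + 33) * 2^segment */
    mag -= BIAS;                              /* remove the bias */
    return p ? -mag : mag;
}

int main(void)
{
    int x = -656;                             /* the worked example input */
    unsigned char c = mulaw_encode(x);
    int y = mulaw_decode(c);
    printf("in=%d code=0x%02X out=%d noise=%d\n", x, c, y, abs(x - y));
    return 0;
}
```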
Concepts
Quantization
µ-Law Encoding - Example
 Example input: -656
Resulting codeword (sign bit P | segment code S | quantization code Q): 1 | 1 0 0 | 0 1 0 1
 The sample is negative, so bit P becomes 1
 Add the bias of 33 to the absolute value (the bias ensures the most significant set bit is at
least at position 5)
 The result after adding is 689 = 1010110001 in binary
 The most significant 1 bit in positions 5 to 12 is at position 9
 Subtracting 5 from that position yields 4, the segment code (100)
 Finally, the 4 bits following that position (0101) are inserted as the quantization code
Concepts
Quantization
µ-Law Decoding - Example
 Example input: the codeword produced above from -656
Codeword (P | S | Q): 1 | 1 0 0 | 0 1 0 1
 The quantization code is 0101 = 5, so 5 · 2 + 33 = 43
 The segment code is 100 = 4, so 43 · 2^4 = 688
 Subtract the bias: 688 - 33 = 655
 But P is 1, so the final result is -655
 The quantization noise is 656 - 655 = 1 (very small)
Concepts
Quantization
µ-Law Encoding
 Approximately linear for small input values & logarithmic for high input values
The value of µ used in practice is 255
Used for speech signals
Used for PCM telephone systems in the US, Canada and Japan
A-Law Encoding
 Linear segments for low-level inputs & a logarithmic segment for high-level inputs
The value of A used in practice is 87.6
 Used for PCM telephone systems in Europe and the rest of the world
Concepts
Pulse Code Modulation (PCM)
PCM Description
 Sampling results in PAM (pulse amplitude modulation)
 PCM uniformly quantizes the PAM samples
 The result of PCM is a sequence of PCM words
 Each PCM word is l = log2(L) bits, where L is the number of quantization levels
[3]
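As a small illustration of the l = log2(L) relationship, here is a hypothetical C sketch that maps a sample to an l-bit PCM word; L = 256 and M_max = 1.0 are assumed values chosen only for the example.

```c
/* Minimal PCM encoding sketch: map a sample in [-M_max, M_max] to one of
   L levels, i.e. an l-bit code word with l = log2(L). */
#include <math.h>
#include <stdio.h>

static unsigned pcm_encode(double x, double M_max, unsigned L)
{
    double q = 2.0 * M_max / L;                 /* step size */
    long idx = (long)floor((x + M_max) / q);    /* level index */
    if (idx < 0) idx = 0;                       /* clip to 0..L-1 */
    if (idx > (long)L - 1) idx = L - 1;
    return (unsigned)idx;
}

int main(void)
{
    unsigned L = 256;                           /* l = log2(256) = 8 bits */
    unsigned word = pcm_encode(0.3, 1.0, L);
    printf("PCM word = %u (8 bits)\n", word);
    return 0;
}
```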
Concepts
Differential Pulse Code Modulation (DPCM)
DPCM Description
 Signals that are sampled at a high rate have high correlation between consecutive samples.
 The difference between consecutive samples will therefore not be large
 Instead of quantizing each sample, why not quantize the difference?
 This results in a quantizer that needs far fewer bits
[7]
 This is the simplest (first-order) form, where the prediction is just the previous sample
 More than one previous sample can be used in the prediction (N-th order)
 Problems with this approach?
[7]
Concepts
Differential Pulse Code Modulation (DPCM)
DPCM Example
[7]
 In the worked example, the quantization errors add up and produce an output signal that is
completely different from the original one
Concepts
Differential Pulse Code Modulation (DPCM)
DPCM Prediction
 Previously, input to predictor in the encoder was different than the one in the decoder.
 The difference between the predictor led to reconstruction error e(n) = x[n] – x’[n].
 To solve this problem completely the same predictor that was used in the decoder will also
be used in the decoder
 Therefore the reconstruction error at the decoder output is the same as the quantization
error at the encoder.
 There is no accumulation of quantization error (see the sketch below).
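A minimal C sketch of this closed-loop arrangement follows. The first-order prediction, the uniform difference quantizer, and the sample values are assumptions chosen only for illustration; the point is that the reconstruction error stays bounded by the quantization error instead of accumulating.

```c
/* Sketch of first-order DPCM where the encoder predicts from its own
   reconstructed sample (the same predictor as the decoder), so quantization
   error does not accumulate. */
#include <stdio.h>
#include <math.h>

static double quantize_diff(double d, double q)
{
    return q * round(d / q);   /* uniform quantizer applied to the difference */
}

int main(void)
{
    const double x[] = {0.0, 0.8, 1.5, 1.9, 2.0, 1.7, 1.1};  /* example signal */
    const int n = sizeof x / sizeof x[0];
    const double q = 0.25;                                    /* assumed step size */

    double enc_pred = 0.0, dec_pred = 0.0;
    for (int i = 0; i < n; i++) {
        double d  = x[i] - enc_pred;            /* prediction residual */
        double dq = quantize_diff(d, q);        /* value that is transmitted */
        enc_pred += dq;                         /* encoder's own reconstruction */

        double xr = dec_pred + dq;              /* decoder reconstruction */
        dec_pred  = xr;
        printf("x=%.2f  x'=%.2f  error=%.3f\n", x[i], xr, x[i] - xr);
    }
    return 0;
}
```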
Concepts
Adaptive Differential Pulse Code Modulation (ADPCM)
ADPCM Description
 As can be inferred from the name, ADPCM builds on PCM and DPCM and adds adaptation
 The "A" in ADPCM stands for "Adaptive"
 In DPCM, the difference between x[k] and x[k-1] is transmitted instead of x[k]
 To further reduce the number of bits per sample, ADPCM adapts the quantization levels to
the characteristics of the analog signal. The original 32 kbps ADPCM used 4 bits per sample
(a simplified sketch of the adaptation idea follows below)
[9]
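The following C sketch illustrates only the adaptation idea: the step size of the difference quantizer grows when recent coded differences are large and shrinks when they are small. The adaptation rule, constants, and signal values are assumptions for illustration and do not follow the standardized G.726 adaptation logic or its adaptive predictor.

```c
/* Highly simplified sketch of adaptive step-size control in a DPCM-style loop. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double adapt_step(double step, int code_magnitude)
{
    if (code_magnitude >= 2)
        step *= 1.25;            /* signal changing fast: coarser steps */
    else
        step *= 0.9;             /* signal changing slowly: finer steps */
    if (step < 0.01) step = 0.01;   /* keep the step in a sane range */
    if (step > 1.0)  step = 1.0;
    return step;
}

int main(void)
{
    const double x[] = {0.0, 0.05, 0.1, 0.9, 1.8, 2.0, 2.05, 2.1};
    const int n = sizeof x / sizeof x[0];
    double step = 0.1, pred = 0.0;

    for (int i = 0; i < n; i++) {
        double d = x[i] - pred;
        int code = (int)lround(d / step);   /* coded difference */
        pred += code * step;                /* reconstruction (same in decoder) */
        step  = adapt_step(step, abs(code));
        printf("x=%.2f  x'=%.2f  step=%.3f\n", x[i], pred, step);
    }
    return 0;
}
```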
Standards, Examples & Applications
G.711
G.711 Description
 A waveform codec that was released in 1972
 Its formal name is Pulse Code Modulation (PCM), since it uses PCM for its encoding
 G.711 achieves a 64 kbps bit rate (8 kHz sampling frequency × 8 bits per sample)
 G.711 defines two main compression algorithms
µ-Law (used in North America & Japan)
A-Law (used in Europe and the rest of the world)
 µ-law and A-law take 14-bit and 13-bit signed linear PCM samples as input, respectively, and
compress them to 8-bit samples
Applications
 Public Switching Telephone Network (PSTN)
 WiFi phones VoWLAN
 Wideband IP Telephony
 Audio & Video Conferencing
 H.320 & H.323 specifications
Standards, Examples & Applications
G.726
G.726 Description
 G.726 converts a 64 kbps A-law or µ-law PCM channel to and from a 40, 32, 24
or 16 kbps channel.
 The conversion is applied to the PCM samples using the ADPCM encoding technique
 The different rates are achieved by adapting the number of quantization levels
 4 levels (2 bits, 16 kbps)
 7 levels (3 bits, 24 kbps)
 15 levels (4 bits, 32 kbps)
 31 levels (5 bits, 40 kbps)
 Supersedes the earlier G.721 and G.723 standards
[12]
Performance Comparison
[1]
Summary & Conclusion
Summary
 We talked about quantization concepts in all their flavors
 We discussed the waveform coding category (PCM, DPCM and ADPCM)
 We presented the ITU standards (G.711 and G.726) and mentioned some examples and
applications
 Finally, we compared the most prominent speech codecs out there.
Conclusion
 Speech coding is an important concept that is required to use the existing bandwidth
efficiently
 There are many important metrics to keep in mind when doing speech coding, and a good
speech coder has to balance them. The most important ones are
 Data rate
 Speech quality
 Delay
 Waveform codecs achieve the best speech quality as well as low delays.
 Vocoders achieve low data rates, but at the cost of delay and speech quality
 Hybrid coders achieve acceptable speech quality with acceptable delay and data rate.
References
1. Wai C. Chu: Speech Coding Algorithms: Foundation and Evolution of Standardized Coders
2. Speech Coding: http://www-mobile.ecs.soton.ac.uk/speech_codecs/
3. Sklar: Digital Communications: Fundamentals and Applications
4. A-Law and mu-Law Companding Implementations Using the TMS320C54x
5. Michael Langer: Data Compression – Introduction to Lossy Compression
6. Signal Quantization and Compression Overview: http://www.ee.ucla.edu/~dsplab/sqc/over.html
7. Wajih Abu-Al-Saud: Ch. VI Sampling & Pulse Code Mod., Lecture 25
8. Yuli You: Audio Coding: Theory and Applications
9. Tarmo Anttalainen: Introduction to Telecommunications Network Engineering
10. Wikipedia, G.711: http://en.wikipedia.org/wiki/G.711
11. David Salomon: Data Communication: The Complete Reference
12. ITU-T (CCITT) Recommendation G.726, ADPCM
Questions & Discussion
Thank you!!