JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN
ELECTRONICS AND COMMUNICATION ENGINEERING
A Comparative Review of Various Types of
Speech Coder for Mobile Communication
1 PROF. R. N. RATHOD, 2 PROF. M. V. MAKWANA, 3 MR. V. C. TOGADIYA
1 Asst. Professor, Power Electronics Department, Morbi, Gujarat
2 Asst. Professor and Head, Power Electronics Department, Morbi, Gujarat
3 M.E. [EC] Student, R.K. University, Rajkot, Gujarat
rathod_45@yahoo.com, shafimakwana@gmail.com, vctogadiya@gmail.com
ABSTRACT: Speech coding is the process of obtaining a compact representation of voice signals for efficient
transmission over band-limited wired and wireless channels and/or storage. It is also defined as the process
of reducing the bit rate of the digital speech representation for transmission or storage, while maintaining a speech
quality that is acceptable for the application. To achieve high quality speech at a low bit rate, coding
algorithms apply sophisticated methods to reduce the redundancies, that is, to remove the irrelevant information
from the speech signal. Speech coding is an important aspect of modern telecommunications. A lower bit rate
implies that a smaller bandwidth is required for transmission, and in wireless and satellite communications
bandwidth is limited. At the same time, multimedia communications and some other speech-related applications
need to store the digitized voice. Reducing the bit rate implies that less memory is needed for storage. These two
applications of speech compression make speech coding an attractive field of research.
Keywords— Analysis by Synthesis Coding, Excitation Detector, Vocal Cords, Quasi Periodic Sound.
I: INTRODUCTION
Speech Coding is classified as the process of
reducing the bit rate of digital speech representation
for transmission or storage, while maintaining a
speech quality that is acceptable for the application.
Speech coding is an important aspect of modern
telecommunications. Speech coding is the process of
digitally representing a speech signal. The primary
objective of speech coding is to represent the speech
signal with the fewest number of bits, while
maintaining a sufficient level of quality of the
retrieved or synthesized speech with reasonable
computational complexity. To achieve high quality
speech at a low bit rate, coding algorithms apply
sophisticated methods to reduce the redundancies,
that is, to remove the irrelevant information from the
speech signal.
In addition, a lower bit rate implies that a smaller
bandwidth is required for transmission. Although in
wired communications very large bandwidths are
now available as a result of the introduction of optical
fiber, in wireless and satellite communications
bandwidth is limited. At the same time, multimedia
communications and some other speech related
applications need to store the digitized voice.
Reducing the bit rate implies that less memory is
needed for storage. These two applications of speech
compression make speech coding an attractive field
of research. [5] Fig.1 depicts the block diagram of
speech coding method.
Fig.1. Block diagram of speech coding
Fig (2)
ISSN: 0975 – 6779| NOV 11 TO OCT 12 | VOLUME – 02, ISSUE - 01
Voiced signal
The voiced signal shown in Fig (2) illustrates the
relationship between the amplitude of speech and
frequency, and also the relationship between the
amplitude of speech and time.
Unvoiced signal
The unvoiced signal shown in Fig (3) illustrates the
relationship between the amplitude of speech and
frequency, and also the relationship between the
amplitude of speech and time.
Fig (3)
II: SPEECH CODING METHODS
1. Waveform Coding
2. Source Coding
3. Hybrid Coding
Waveform Coding
Waveform coders attempt to code the exact shape of
the speech signal waveform, without considering in
detail the nature of human speech production and
speech perception. Waveform coders are most useful
in applications that require the successful coding of
both speech and non-speech signals. In the public
switched telephone network (PSTN), for example,
successful transmission of modem and fax signaling
tones and of switching signals is nearly as important as
the successful transmission of speech. The most
commonly used waveform coding algorithms are
uniform 16-bit PCM, companded 8-bit PCM, and
ADPCM.
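To make the idea of companded 8-bit PCM concrete, the following is a minimal sketch of logarithmic (mu-law style) companding followed by uniform 8-bit quantization. It uses the continuous mu-law formula with mu = 255 and is not bit-exact with any telephony standard; the function names are hypothetical and chosen only for illustration.

```python
import math

MU = 255  # companding constant (assumed value, typical for mu-law)

def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def encode_8bit(x: float) -> int:
    """Clip, compand, then quantize uniformly to a code in [-127, 127]."""
    y = mulaw_compress(max(-1.0, min(1.0, x)))
    return int(round(y * 127))

def decode_8bit(code: int) -> float:
    """Map an 8-bit code back to an approximate sample value."""
    return mulaw_expand(code / 127)

if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        code = encode_8bit(sample)
        print(sample, code, decode_8bit(code))
```

Because the quantization steps are finer near zero, low-amplitude samples are reconstructed more accurately than they would be with a uniform 8-bit quantizer over the same range.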
Source Coding
Vocoders are parametric digitizers that use certain
properties of the human speech production mechanism.
Human speech is produced by emitting sound
pressure waves, which are radiated primarily from the lips,
although significant energy can also emanate from the
nostrils, throat, etc. In human speech, the air
compressed by the lungs excites the vocal cords in two
types of modes. [2]
When generating voiced sounds, the vocal cords
vibrate and generate a quasi-periodic voiced sound. In
the case of lower-energy unvoiced sounds, the vocal
cords do not participate in the voice production and
the source acts like a noise generator.
In the case of vocoders, instead of producing a close
replica of the input signal at the output, an
appropriate set of source parameters is generated to
characterize the input signal as closely as possible for a
given period of time. [3]
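The two excitation modes described above can be sketched as a simple source-filter synthesizer: a pulse train stands in for voiced excitation, white noise for unvoiced excitation, and an all-pole filter stands in for the vocal tract. This is only an illustrative sketch; the function names, the pitch period of 80 samples, and the filter convention A(z) = 1 + a1 z^-1 + ... are assumptions, not taken from the paper.

```python
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Return a pulse train (voiced) or Gaussian noise (unvoiced)."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def synthesize(exc, a, gain=1.0):
    """All-pole filter: s[n] = gain * e[n] - sum_k a[k] * s[n - k - 1]."""
    out = []
    for n, e in enumerate(exc):
        s = gain * e
        for k, ak in enumerate(a):
            if n - k - 1 >= 0:
                s -= ak * out[n - k - 1]
        out.append(s)
    return out

if __name__ == "__main__":
    coeffs = [-0.9]  # hypothetical one-pole "vocal tract"
    voiced = synthesize(excitation(160, voiced=True), coeffs)
    unvoiced = synthesize(excitation(160, voiced=False), coeffs)
    print(voiced[:5], unvoiced[:5])
```

In a real vocoder only the parameters (voiced/unvoiced flag, pitch period, gain, filter coefficients) would be transmitted for each frame, and the decoder would run a synthesis step like the one above.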
Hybrid Coding
Hybrid coding is an attractive trade-off between
waveform coding and vocoders, both in terms of
speech quality and transmission bit rate, although
generally at the price of higher complexity. This
type of coder is also known as an analysis-by-synthesis
(ABS) codec. Examples include LPAS, CELP,
etc.
In an LPAS coder, the decoded speech is produced
by filtering the signal produced by the excitation
generator through both a long-term and a short-term
predictor synthesis filter.
In a CELP coder, both the encoder and the decoder store the
same collection of possible code vectors of length L in a
codebook. The excitation for each frame is described
completely by the index of an appropriate vector.
This index is found by an exhaustive search over all
possible code vectors, using the one that gives the
smallest error between the original and decoded
signals. [6]
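The exhaustive codebook search can be illustrated with a short sketch. For simplicity it compares each code vector directly against a target vector with an optimal gain, and it omits the synthesis filter and perceptual weighting that a real CELP encoder would apply before measuring the error. The function name and the toy codebook are hypothetical.

```python
def search_codebook(target, codebook):
    """Return (index, gain, error) of the code vector that best matches target."""
    best_index, best_gain, best_err = -1, 0.0, float("inf")
    for i, c in enumerate(codebook):
        energy = sum(v * v for v in c)
        if energy == 0.0:
            continue  # an all-zero vector cannot be scaled to match anything
        # Gain that minimizes the squared error for this code vector
        gain = sum(t * v for t, v in zip(target, c)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, c))
        if err < best_err:
            best_index, best_gain, best_err = i, gain, err
    return best_index, best_gain, best_err

if __name__ == "__main__":
    codebook = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0],
        [1.0, -1.0, 1.0, -1.0],
    ]
    print(search_codebook([2.0, 2.0, 2.0, 2.0], codebook))
```

Only the winning index (and a quantized gain) needs to be sent, since the decoder holds the same codebook.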
CELP coding can operate below 8 kbps. Its goal is
to represent the speech signal with as few codeword
bits as possible while keeping the error in the
synthesized speech signal to a minimum [2]. The major application
of speech compression in mobile communication
is at the encoder side, to transmit the speech signal at a
low bit rate. This allows a longer message to be carried by the same
number of bits, and it also allows more users to share the same
bandwidth [3]. Code-excited linear prediction
(CELP) was introduced by B. S. Atal and M. R.
Schroeder in 1984 [4]. To further improve transmission
efficiency, one more coding
technology was introduced in wireless communication:
algebraic code-excited linear prediction (ACELP).
With ACELP the speech signal can be transmitted at
rates as low as 4 kbps, so the bit rate is reduced
further than with other coding
technologies. A MATLAB tool is used to design the
ACELP algorithm. The tool is user-friendly and provides a
graphical user interface (GUI) that allows the student
to study and verify through graphics the various
aspects of the algorithm, such as the LP analysis, the
open-loop pitch search, the adaptive codebook search
(pitch search), the fixed codebook search, and the bit
allocation patterns. MATLAB is chosen as the
implementation platform because it allows the user to
easily understand the complex parts of the algorithm,
including those whose function is not a major one [10].
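As an illustration of the LP analysis step mentioned above, the following sketch computes linear prediction coefficients from a frame using the autocorrelation method and the Levinson-Durbin recursion. It is written in Python rather than MATLAB and is not the tool described in [10]; the function names and the convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p are assumptions made for this example.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame of samples."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; return (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)  # remaining prediction error energy
    return a, err

if __name__ == "__main__":
    frame = [0.5 ** n for n in range(40)]  # toy decaying signal
    coeffs, residual_energy = levinson_durbin(autocorr(frame, 2), 2)
    print(coeffs, residual_energy)
```

The resulting coefficients define the short-term predictor; the residual energy indicates how much of the frame the predictor could not explain.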
The Adaptive Multi-Rate Wideband (AMR-WB)
speech codec was selected by the Third
Generation Partnership Project (3GPP) and the European
Telecommunication Standards Institute (ETSI) for
GSM and the third generation mobile communication
WCDMA system for providing wideband speech
services. The literature on AMR-WB describes its
standardization history, the algorithm,
including novel techniques for efficient ACELP
wideband speech coding, and its subjective quality
performance. The AMR-WB speech codec algorithm
was selected in December 2000, and the
corresponding specifications were approved in March
2001.
The codec was also selected by ITU-T in its
standardization activity for wideband speech coding
around 16 kbit/s. The adoption of AMR-WB by ITU-T
is of significant importance since it is the first time
that the same codec has been adopted for wireless as well as
wire line services. AMR-WB extends the audio
bandwidth from 3.4 kHz to 7 kHz and gives superior
speech quality and voice naturalness compared to
existing 2nd and 3rd generation mobile
communication systems. The wideband speech
service provided by the AMR-WB codec will give
mobile communication a speech quality that also
substantially exceeds (narrowband) wire line quality.
III: SPEECH CODEC ATTRIBUTES
Speech coders attempt to minimize the bit rate for
transmission or storage of the signal while
maintaining required levels of speech quality,
communication delay, and complexity of
implementation (power consumption). We will now
provide brief descriptions of the above parameters
of performance, with particular reference to speech.
Speech Quality
Speech quality is usually evaluated on a five-point
scale, known as the mean-opinion score (MOS)
scale, in speech quality testing---an average over a
large number of speech data, speakers, and
listeners. The five points of quality are: bad, poor,
fair, good, and excellent. Quality scores of 3.5 or
higher generally imply high levels of
intelligibility, speaker recognition and naturalness.
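As a small worked example of the averaging behind a MOS value, the sketch below averages listener ratings on the five-point scale. The ratings used are invented for illustration only and do not come from any listening test.

```python
from statistics import mean

# Five-point quality scale as described in the text
LABELS = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}

def mean_opinion_score(ratings):
    """Average a list of integer ratings in the range 1..5."""
    if any(r not in LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)

if __name__ == "__main__":
    ratings = [4, 4, 3, 5]  # hypothetical listener scores
    print(mean_opinion_score(ratings))
```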
Transmission Bit Rate
Since the speech codec shares the communications
channel with other data, the peak bit rate should be as
low as possible so as not to use a disproportionate
share of the channel. Codecs operating below 64 kbps were
developed to increase the capacity of equipment used
for narrow bandwidth links.
Communication Delay
Speech coders often process speech in blocks and
such processing introduces communication delay.
Depending on the application, the permissible total
delay could be as low as 1 msec, as in network
telephony, or as high as 500 msec, as in video
telephony. Communication delay is irrelevant for
one-way communication, such as in voice mail.
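The delay introduced by block processing can be estimated with simple arithmetic: the encoder must wait for a full frame (plus any look-ahead) before it can start coding. The sketch below computes this buffering delay only; the frame and look-ahead sizes are illustrative assumptions, and processing and transmission time are not included.

```python
def algorithmic_delay_ms(frame_samples, lookahead_samples, sample_rate_hz):
    """Buffering delay in milliseconds caused by frame-based processing."""
    return 1000.0 * (frame_samples + lookahead_samples) / sample_rate_hz

if __name__ == "__main__":
    # Hypothetical coder A: 160-sample frames, no look-ahead, 8 kHz sampling
    print(algorithmic_delay_ms(160, 0, 8000))
    # Hypothetical coder B: 80-sample frames with 40 samples of look-ahead
    print(algorithmic_delay_ms(80, 40, 8000))
```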
Complexity
The complexity of a coding algorithm is the
processing effort required to implement the
algorithm, and it is typically measured in terms of
arithmetic capability and memory requirement, or
equivalently in terms of cost. A large complexity
can result in high power consumption in the
hardware.
IV: CURRENT CAPABILITIES IN SPEECH CODING
Fig. 2 shows the speech quality that is currently
achievable at various bit rates from 2.4 to 64 kbps
for narrowband telephone (300--3400 Hz) speech.
The intelligibility of coded speech is sufficiently
high at these bit rates and is not an important issue.
The speech quality is expressed on the five-point
MOS scale along the ordinate of the figure. PCM
(pulse-code modulation) is the simplest coding
system, a memoryless quantizer, and provides
essentially transparent coding of telephone speech
at 64 kbps. With a simple adaptive predictor,
adaptive differential PCM (ADPCM) provides
high-quality speech at 32 kbps. The speech quality is
slightly inferior to that of 64 kbps PCM, although
the telephone handset receiver tends to minimize the
difference. ADPCM at 32 kbps is widely used for
expanding the number of speech channels by a
factor of two, particularly in private networks and
international circuits. It is also the basis of
low-complexity speech coding in several proposals for
personal communication networks, including CT2
(Europe), UDPCS (USA) and Personal Handyphone
(Japan). [8]
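The 64 kbps and 32 kbps figures, and the factor-of-two gain in channels, follow from simple arithmetic on the sampling rate and the number of bits per sample. The sketch below assumes 8 kHz sampling for telephone speech, 8 bits per sample for PCM and 4 bits per sample for ADPCM; the function name is hypothetical.

```python
def bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of a sample-by-sample coder in kbps."""
    return sample_rate_hz * bits_per_sample / 1000

if __name__ == "__main__":
    pcm = bit_rate_kbps(8000, 8)    # companded PCM
    adpcm = bit_rate_kbps(8000, 4)  # ADPCM with 4-bit codes
    print(pcm, adpcm, pcm / adpcm)  # the ratio is the channel expansion factor
```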
For rates of 16 kbps and lower, high speech quality
is achieved by using more complex adaptive
prediction, such as linear predictive coding (LPC)
and pitch prediction, and by exploiting auditory
masking and the underlying perceptual limitations
of the ear. Important examples of such coders are
multi-pulse excitation, regular-pulse excitation, and
code-excited linear prediction (CELP) coders.[1]
The CELP algorithm combines the high quality
potential of waveform coding with the compression
efficiency of model-based vocoders. At present, the
CELP technique is the technology of choice for
coding speech at bit rates of 16 kbps and lower. At
16 kbps, a low-delay CELP (LD-CELP) algorithm
provides both high quality, close to PCM, and low
communication delay and has been accepted as an
international standard for transmission of speech
over telephone networks. [1]
At 8 kbps, which is the bit rate chosen for
first-generation digital cellular telephony in North
America, speech quality is good, although
significantly lower than that of the 64 kbps PCM
speech. Both North American and Japanese
first-generation digital standards are based on the CELP
technique. The first European digital cellular
standard is based on a regular-pulse excitation
algorithm at 13.2 kbps.
Fig.2. The speech quality mean opinion score for
various bit rates.
The present research is focused on meeting the
critical need for high quality speech transmission
over digital cellular channels at 4 and 8 kbps. Low
bit rate speech coders are fairly complex, but the
advances in VLSI and the availability of digital
signal processors have made possible the
implementation of both encoder and decoder on a
single chip [7].
V: EXPERIMENTAL DATA
Speech file   Algorithm used   File size   Results
a_rajni       LPC, RELP        257 KB      The wave file is 5.84 sec long
i_rajni       LPC, RELP        147 KB      The wave file is 3.34 sec long
s1 of wb      LPC, RELP        256 kbps    The wave file is 2.85 sec long
s2 of wb      LPC, RELP        256 kbps    The wave file is 2.94 sec long
t_8k_2s       LPC, RELP        128 kbps    The wave file is 1.95 sec long
a_bina        LPC, RELP        352 kbps    The wave file is 4.75 sec long
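The table mixes file sizes (KB) and bit rates (kbps). The two are related through the duration of the file, as the sketch below shows. It takes the first listed size (257 KB) and the first listed duration (5.84 sec) and assumes they belong to the same wave file; it also assumes 1 KB = 1000 bytes and ignores the file header, so the result is only approximate.

```python
def average_bit_rate_kbps(size_kb, duration_s, bytes_per_kb=1000):
    """Average bit rate implied by a file size and a duration."""
    bits = size_kb * bytes_per_kb * 8
    return bits / duration_s / 1000

if __name__ == "__main__":
    # 257 KB over 5.84 sec (pairing assumed from the order of the table)
    print(round(average_bit_rate_kbps(257, 5.84), 1))
    # 147 KB over 3.34 sec (same assumption)
    print(round(average_bit_rate_kbps(147, 3.34), 1))
```

Under these assumptions both files come out close to 352 kbps, which is the same order as the 352 kbps entry listed in the table.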
Simulation Results Based on LPC
[Figure: original signal "s1ofwb" and synthesized speech of "s1ofwb" using LPC algo; original signal "s2ofwb" and synthesized speech of "s2ofwb" using LPC algo. Horizontal axis scaled by 10^4.]
Simulation Results Based on RELP
[Figure: original signal "t 8k2s" and RELP compressed output; original signal "dantaleabrr" and RELP compressed output.]
VI: CONCLUSION
From the above experiment and result analysis we can
say that using a coding algorithm brings the benefits
of reduced memory and reduced data rate. We
can also obtain the length of a wave file by utilizing
various algorithms.
In this paper the simulation is performed for various
wave files by applying various coding algorithms to
them. We conclude that the synthesized speech differs
from one coding algorithm to another, but the decoded speech of
all the coding algorithms is the same.
Simulation result of SUB BAND coder of
Wave file “dantaleabrr.wav”
REFERENCES
[1] A. M. Kondoz, "Digital Speech-Coding for Low
Bit Rate Communication Systems", John Wiley &
Sons Ltd., 2001.
[2] L. R. Rabiner and R. W. Schafer, "Digital
Processing of Speech Signals".
[3] B. S. Atal, M. R. Schroeder, and V. Stover,
"Voice-Excited Predictive Coding System for Low
Bit-Rate Transmission of Speech".
[4] C. J. Weinstein, "A Linear Predictive Vocoder
with Voice Excitation".
[5] M. H. Johnson and A. Alwan, "Speech Coding:
Fundamentals and Applications", to appear as a
chapter in the Encyclopedia of Telecommunications,
Wiley, December 2002.
[6] T. S. Rappaport, "Wireless Communication".
[7] J. Benesty, M. M. Sondhi, and Y. Huang,
"Springer Handbook of Speech Processing".
[8] V. K. Garg, "Principles and Applications of
GSM".
[9] ITU-T Recommendation G.728, "Coding of
Speech at 16 kb/s Using Low-Delay CELP", Sep. 1992.
[10] L. Shadhan, "ITU-T G.729/G.729A CS-ACELP
8 kbps Speech Coder".