INTERNATIONAL TELECOMMUNICATION UNION
TELECOMMUNICATION
STANDARDIZATION SECTOR
STUDY PERIOD 2009-2012
COM 12 – C 13 – E
February 2009
English only
Original: English
Question(s): 8, 13, 15/12
STUDY GROUP 12 – CONTRIBUTION 13
Source: France Telecom
Title: Comments on test vectors generation in ITU-T Recommendation P.564
Abstract
The need for parametrical models of voice quality evaluation is undeniable. They are commonly applied in telecommunication systems wherever quality must be measured on live communications but a lack of resources does not allow the use of signal-based methods. To date, few parametrical models of quality evaluation exist, and only two of them have been standardized by ITU-T (the E-model [1] and the Call Clarity Index [2]). In order to evaluate the performance of parametrical voice quality evaluation models that have not been standardized, ITU-T has produced Recommendation P.564. Hardly any of the existing models have been validated; among others, the E-model standardized in Recommendation ITU-T G.107 did not pass validation according to ITU-T P.564 [3]. We believe that one of the possible causes of this, besides the known weaknesses of the E-model, lies in partially improper test vector generation, which leads to MOS scores being estimated with inappropriate values of network parameters. This is caused by imprecise generation of the packet loss ratio for short test vectors. As the length of test vectors cannot be increased (it is limited to 30 s by the PESQ method [4]), we propose a revision of the network impairment profile in ITU-T P.564, which is currently based on the network model described in ITU-T G.1050.
Contact: Anna Czuczman, France Telecom, France
Tel: +33 296 05 22 90
Fax: +33 296 05 35 30
Email: anna.czuczman@orange-ftgroup.com
Attention: This is not a publication made available to the public, but an internal ITU-T Document intended only for use by the
Member States of ITU, by ITU-T Sector Members and Associates, and their respective staff and collaborators in their ITU related
work. It shall not be made available to, and used by, any other persons or entities without the prior written consent of ITU-T.
1. Introduction
Recommendation ITU-T P.564 defines a series of tests which verify that a candidate parametrical model correctly estimates the voice quality of a VoIP service. The validation of model conformity consists of several phases described in the Recommendation:
■ Choice of test vectors: ITU-T P.564 recommends the use of four speech files (frf1, spm1, smp2, ukf1)
■ Generation of degraded test vectors
■ Evaluation of conformity
Test vectors are generated for various network scenarios by applying several values of network parameters. Both the network parameters and the scenarios are described in Recommendation ITU-T G.1050, which also defines the network architecture and the means of simulating network parameters.
Before the actual test vector generation specified in ITU-T P.564 is conducted, pre-tests should be performed. Their objective is to determine the parameter values needed to obtain MOS-LQ scores (i.e. in a listening-only context) that fall within each of the five quality ranges.
The evaluation of conformity according to ITU-T P.564 is based on comparing the voice quality estimated by the candidate parametrical model (MOS-LQE) with the reference MOS-LQO score measured by PESQ. As the number of tests to be conducted is quite large, the results of an objective measurement method, rather than of subjective tests, were chosen as the reference MOS score.
Figure 1: G.1050 - IP Network Impairment model (Source Device A, LAN A, Local Access A, Core IP Network, Local Access B, LAN B, Destination Device B; impairment parameters shown: LAN data rate and occupancy, access data rates and occupancy, MTU size, route flapping, link failure, one-way delay, jitter, packet loss and reordered packets)
While conducting validation according to ITU-T P.564, we observed that the MOS scores calculated by PESQ differed significantly (by 1 to 2 points on the MOS scale) for the same set of parameter values (when comparing the results of the pre-tests with those of the actual tests). After close investigation, we believe that this stems from the fact that, for short files, the generated packet loss ratio differs significantly from the demanded packet loss ratio. As a result, even though the demanded model parameters used to generate the test vectors were identical, the resulting packet loss ratios were not, and neither were the MOS scores for those files.
The second chapter of this contribution explains the theoretical foundations of this issue. The following chapters describe its implications for Recommendations ITU-T G.1050 (chapter 3) and ITU-T P.564 (chapter 4). The last chapter concludes this document and presents two proposals for addressing the problem.
ITU-T\COM-T\COM12\C\13E.DOC
2. Desired packet loss ratio
The simplest packet loss generation algorithm is based on one parameter: the packet loss ratio Ppl, i.e. the demanded proportion of lost packets in a test vector of N packets. The algorithm is as follows:
Let N be the number of packets
for n = 1:N do
    generate a random number p uniformly distributed between 0 and 1
    if p < Ppl then loss(n) = TRUE
    else loss(n) = FALSE
    endif
end
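The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the tool used by the authors; the function names and the fixed seed are hypothetical.

```python
import random

def generate_loss_pattern(n_packets: int, ppl: float, rng: random.Random) -> list[bool]:
    """Per-packet loss flags (True = packet dropped).

    Each packet is lost independently: draw p uniformly in [0, 1) and
    drop the packet if p < ppl, exactly as in the pseudocode.
    """
    return [rng.random() < ppl for _ in range(n_packets)]

def measured_loss_ratio(pattern: list[bool]) -> float:
    """Ppm: number of lost packets divided by the total number of packets."""
    return sum(pattern) / len(pattern)

# One 30 s vector of 20 ms packets (N = 1500) with a demanded Ppl of 2 %.
rng = random.Random(1)  # arbitrary fixed seed so the run is reproducible
pattern = generate_loss_pattern(1500, 0.02, rng)
print(f"Ppm = {measured_loss_ratio(pattern):.4f}")
```

Re-running with different seeds shows Ppm scattering around 0.02 rather than hitting it exactly.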
The fraction of packets actually lost tends to Ppl if N is large enough (N > 500 000, see footnote 1). However, the test vectors used in ITU-T P.564 validation are quite short. In fact, they cannot exceed 30 s, as PESQ (which P.564 uses as the reference method) does not allow the use of longer files. The maximum number of packets is therefore 1500 for a packet duration of 20 ms (30 s / 20 ms = 1500). This maximum length is far too small, and as a result the observed (applied) packet loss ratio (Ppm) may differ significantly from Ppl, as demonstrated below.
Theoretically, the probability of losing n packets when applying the above algorithm with a defined Ppl is equal to:
$$P_{\text{lose}}(n \text{ out of } N \text{ packets}) = \binom{N}{n} P_{pl}^{\,n} \left(1 - P_{pl}\right)^{N-n}$$
Let the measured packet loss ratio Ppm be defined as the number of lost packets n divided by the total number of packets N in the test vector. Using the above formula, we can obtain the relation between the measured (Ppm) and demanded (Ppl) packet loss ratios. Figure 2 illustrates the probability density function of the measured packet loss ratio (Ppm) given that the demanded packet loss ratio was equal to Ppl.
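The binomial formula above can be evaluated directly to quantify how far Ppm may stray from Ppl. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the computation is done in log space (math.lgamma) so it stays finite for large N, and N = 1500 corresponds to the 30 s / 20 ms case.

```python
import math

def loss_count_pmf(n_packets: int, ppl: float) -> list[float]:
    """Entry n is P(exactly n of n_packets are lost) for independent losses.

    Uses log-factorials (math.lgamma) so values stay finite for large N.
    """
    if not 0.0 < ppl < 1.0:
        raise ValueError("ppl must be strictly between 0 and 1")
    log_p, log_q = math.log(ppl), math.log1p(-ppl)
    log_n_fact = math.lgamma(n_packets + 1)
    return [
        math.exp(log_n_fact - math.lgamma(n + 1) - math.lgamma(n_packets - n + 1)
                 + n * log_p + (n_packets - n) * log_q)
        for n in range(n_packets + 1)
    ]

def prob_lost_between(n_packets: int, ppl: float, n_lo: int, n_hi: int) -> float:
    """P(n_lo <= n <= n_hi), i.e. P(n_lo / N <= Ppm <= n_hi / N)."""
    return sum(loss_count_pmf(n_packets, ppl)[n_lo:n_hi + 1])

# N = 1500 packets (30 s of 20 ms packets), Ppl = 2 %: 30 packets lost on average.
N, PPL = 1500, 0.02
print("P(Ppm <= 1 %) =", prob_lost_between(N, PPL, 0, 15))   # half of Ppl or less
print("P(Ppm >= 3 %) =", prob_lost_between(N, PPL, 45, N))   # 50 % above Ppl or more
```

Working with integer loss counts rather than ratios avoids floating-point ambiguity at the interval boundaries.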
Figure 2: Probability density function of actually obtained packet loss ratio for a given test vector
1 For Ppl = 2 %, there is a probability of 98.76 % that the measured packet loss ratio will be within 2 % ± 0.05 %.
As can be observed in Figure 2, the standard deviation of the probability density function is significant. For example, for Ppl equal to 2 %, there is a non-zero probability that the observed packet loss ratio (Ppm) will be half the demanded value or 50 % higher.
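The spread follows directly from the binomial formula: the number of lost packets has variance N·Ppl(1 − Ppl), so Ppm = n/N has the standard deviation given below. The two numerical cases are our own worked illustration for 20 ms packets (N = 1500 for 30 s, N = 400 for 8 s), not values read from Figure 2.

$$\sigma_{pm} = \sqrt{\frac{P_{pl}\left(1 - P_{pl}\right)}{N}}$$
$$N = 1500,\ P_{pl} = 0.02:\quad \sigma_{pm} = \sqrt{\frac{0.02 \times 0.98}{1500}} \approx 0.0036\ \ (0.36\ \text{percentage points})$$
$$N = 400,\ P_{pl} = 0.02:\quad \sigma_{pm} = \sqrt{\frac{0.02 \times 0.98}{400}} = 0.0070\ \ (0.70\ \text{percentage points})$$

For N = 400, a deviation of two standard deviations already spans roughly 0.6 % to 3.4 % around a demanded 2 %.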
Practical experiments confirm these theoretical results. The packet loss generation algorithm presented above was used to generate packet loss vectors, and 10 000 tests were performed for demanded loss ratios (Ppl) of 2 %, 5 % and 10 %.
Figure 3: Experimentally observed probability density function of packet loss ratio generation
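An experiment of this kind can be reproduced with a short Monte Carlo sketch. The vector length used for Figure 3 is not stated, so N = 1500 (30 s of 20 ms packets) is an assumption here; function and constant names are hypothetical. The empirical standard deviation of Ppm is compared with the binomial prediction sqrt(Ppl(1 − Ppl)/N).

```python
import random
import statistics

def simulate_ppm(n_packets: int, ppl: float, n_trials: int, seed: int = 0) -> list[float]:
    """Measured loss ratio Ppm for n_trials independently generated loss vectors."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < ppl for _ in range(n_packets)) / n_packets
        for _ in range(n_trials)
    ]

N_PACKETS = 1500   # assumption: 30 s of 20 ms packets (not stated for Figure 3)
N_TRIALS = 10_000  # number of tests per demanded ratio, as in the experiment

if __name__ == "__main__":
    for ppl in (0.02, 0.05, 0.10):
        ppm = simulate_ppm(N_PACKETS, ppl, N_TRIALS)
        theory_sd = (ppl * (1 - ppl) / N_PACKETS) ** 0.5
        print(f"Ppl={ppl:.2f}  mean Ppm={statistics.fmean(ppm):.4f}  "
              f"sd={statistics.pstdev(ppm):.4f} (binomial theory {theory_sd:.4f})  "
              f"range=[{min(ppm):.4f}, {max(ppm):.4f}]")
```

The run takes a few seconds in pure Python; the printed min/max range makes the worst-case deviations visible.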
Therefore, when generating packet loss on short files, one has to be aware that the obtained packet loss ratio can differ from the demanded one. These effects are visible even for the maximum test vector length allowed by PESQ (30 s).
3. Implications for G.1050
The model defined by ITU-T G.1050 uses a more complex packet loss model (Gilbert-Elliott), but its loss application algorithm is basically the same as the one presented above (loss is applied packet by packet). As the model uses several parameters to generate the loss sequence (loss and transition probabilities), each of these parameters is subject to the effects described in the previous chapter.
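To illustrate that a two-state model is still applied packet by packet, here is a minimal Gilbert-Elliott sketch. It uses one common textbook parameterization (per-packet transition probabilities and a loss probability per state); G.1050's own parameter names and state definitions may differ, and the parameter values below are hypothetical, chosen so that the long-run loss ratio is 2 %.

```python
import random

def gilbert_elliott_losses(n_packets, p_gb, p_bg, loss_good, loss_bad, rng):
    """Two-state Gilbert-Elliott loss pattern (True = lost).

    p_gb: P(Good -> Bad) per packet; p_bg: P(Bad -> Good) per packet;
    loss_good / loss_bad: loss probability while in each state.
    The initial state is drawn from the chain's stationary distribution.
    """
    in_bad = rng.random() < p_gb / (p_gb + p_bg)
    losses = []
    for _ in range(n_packets):
        losses.append(rng.random() < (loss_bad if in_bad else loss_good))
        # State transition after each packet.
        if in_bad:
            in_bad = rng.random() >= p_bg   # stay Bad with probability 1 - p_bg
        else:
            in_bad = rng.random() < p_gb    # move to Bad with probability p_gb
    return losses

def stationary_loss_ratio(p_gb, p_bg, loss_good, loss_bad):
    """Long-run loss ratio: stationary state probabilities weighted by per-state loss."""
    pi_bad = p_gb / (p_gb + p_bg)
    return (1 - pi_bad) * loss_good + pi_bad * loss_bad

# Hypothetical parameters (not taken from G.1050): pi_bad = 0.04, long-run loss 2 %.
params = dict(p_gb=0.01, p_bg=0.24, loss_good=0.0, loss_bad=0.5)

rng = random.Random(7)
print("long-run loss ratio:", stationary_loss_ratio(**params))
for _ in range(5):
    pattern = gilbert_elliott_losses(400, rng=rng, **params)
    print("400-packet vector:", sum(pattern) / 400)
```

Because losses come in bursts, short vectors scatter even more widely around the long-run ratio than in the independent-loss case.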
In order to simulate a real network, G.1050 takes several parts of the network into consideration (home, access and core). In each part, packet loss and jitter are applied according to mechanisms specific to that part of the network. However, this also means that packet loss is applied several times, once for every part of the network, and that each time the actual packet loss ratio applied may differ from the demanded ratio. Therefore, the resulting packet loss ratio at the egress of the model may differ significantly from the ratio that could have been calculated theoretically and, more problematically, between two consecutive runs of the same simulation scenario. This is not a problem in itself when test vectors of significant length are used (the theoretical and obtained packet loss ratios are then similar); however, it has important implications for validation with P.564.
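The cascading effect can be sketched by applying independent per-packet loss once per network segment, where a packet dropped in one segment never reaches the next. This is a simplified illustration, not the G.1050 simulator: the per-segment ratios below are hypothetical, and real segments use their own loss mechanisms.

```python
import math
import random

# Hypothetical per-segment loss ratios (LAN A, access A, core, access B, LAN B);
# illustrative values only, not taken from G.1050 or P.564.
SEGMENT_LOSS = [0.005, 0.01, 0.01, 0.01, 0.005]

def end_to_end_loss_pattern(n_packets, segment_loss, rng):
    """Drop packets segment by segment with independent per-packet loss;
    a packet already lost in one segment is not offered to the next."""
    lost = [False] * n_packets
    for p in segment_loss:
        for i in range(n_packets):
            if not lost[i] and rng.random() < p:
                lost[i] = True
    return lost

def theoretical_end_to_end_loss(segment_loss):
    """1 minus the product of the per-segment survival probabilities."""
    return 1.0 - math.prod(1.0 - p for p in segment_loss)

rng = random.Random(3)
target = theoretical_end_to_end_loss(SEGMENT_LOSS)
runs = [sum(end_to_end_loss_pattern(400, SEGMENT_LOSS, rng)) / 400 for _ in range(5)]
print(f"theoretical: {target:.4f}, five runs of the same scenario: {runs}")
```

With 400-packet vectors, the five runs of an identical scenario typically land on visibly different egress loss ratios.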
4. Implications for P.564
Most of the ITU-T G.1050 network parameters taken into consideration by ITU-T P.564 directly influence the packet loss ratio applied to test vectors: LAN or access occupancy, out-of-sequence packets and core loss percentage. Moreover, in several test scenarios (1A, 3C, 5C, 6B, 7B), these parameters are used at the same time.
In addition, ITU-T P.564 recommends the use of 8 s speech files. This means that, for a typical packet size of 20 ms, the test vectors are 400 packets long, which is even shorter than the length for which the analysis in chapter 2 was done.
As described in the previous chapters, when packet loss is simulated on short test vectors, the obtained packet loss ratio differs from the demanded ratio. Moreover, if several parameters are used to apply packet loss (V1, V2, etc., as defined in ITU-T P.564), it is difficult to determine the value of each parameter that was actually applied. The demanded values are generally assumed to be those actually applied but, as explained in the previous chapter, this is not necessarily true. Therefore, the obtained MOS scores may in fact correspond to nominal parameter values (V1, V2, V3, V4 or F1) other than those to which they were attributed. As a result, MOS scores measured during different tests for the same condition may differ from each other, even though the test vectors were generated with the same set of network parameters. It is also highly probable that some vectors fall outside the conditions for which they were intended. Overall, however, we cannot claim that this introduces a global bias into the P.564 test plan.
We believe that this may be one of the reasons why the E-model was not validated according to P.564.
5. Conclusions
P.564 describes an excellent procedure for validating candidate parametrical models of voice quality evaluation. However, because it uses the PESQ method to measure the reference MOS, test vectors are limited to 30 s, which corresponds to 1500 packets for the standard 20 ms payload. In addition, the ITU-T G.1050 network model used to generate the test vectors has an important drawback when short test vectors are used. While no other reference method can currently replace PESQ in P.564, the method of test vector generation needs to be discussed. Several possibilities may be considered.
Firstly, if it is decided to continue using the ITU-T G.1050 network model, the test vector generation procedure needs to be changed slightly. For example, several test vectors could be generated for one set of parameters and the mean of their observed packet loss ratios calculated. MOS scores (for both PESQ and the parametric model under validation) should then only be calculated for the test vectors whose observed packet loss ratio is closest to the mean value. The number of test vectors to be generated should be defined empirically.
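The first proposal can be sketched as a small selection routine. This is a minimal illustration under stated assumptions: the function name is hypothetical, the generator is any callable returning a loss pattern (for example the simple Bernoulli algorithm of chapter 2), and the candidate count of 50 is an arbitrary placeholder for the empirically defined number.

```python
import random
import statistics

def select_representative_vectors(generate, n_candidates, n_keep=1):
    """Generate n_candidates loss patterns for one parameter set, compute each
    observed loss ratio, and keep the n_keep patterns closest to the mean ratio.
    `generate` is any zero-argument callable returning a list of bools (True = lost)."""
    patterns = [generate() for _ in range(n_candidates)]
    ratios = [sum(p) / len(p) for p in patterns]
    mean_ratio = statistics.fmean(ratios)
    order = sorted(range(n_candidates), key=lambda i: abs(ratios[i] - mean_ratio))
    return [patterns[i] for i in order[:n_keep]], mean_ratio

# Usage with the simple Bernoulli generator (N = 400 packets, Ppl = 2 %).
rng = random.Random(5)

def bernoulli_400():
    return [rng.random() < 0.02 for _ in range(400)]

kept, mean_ratio = select_representative_vectors(bernoulli_400, n_candidates=50)
print(f"mean observed ratio {mean_ratio:.4f}, kept vector ratio {sum(kept[0]) / 400:.4f}")
```

Only the kept vectors would then be scored by PESQ and by the model under validation.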
A second option would be to use a different network model (or an enhanced G.1050) for test vector generation. This model should employ a packet loss algorithm that is less subject to randomness for short test vectors, while still simulating real network conditions.
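As one possible illustration of a less random loss algorithm (our own example, not taken from G.1050 and not proposed as such in this contribution), the number of lost packets can be fixed to round(Ppl·N) while their positions remain random. This removes run-to-run variation of the realized ratio for independent losses only; it does not model bursty loss, so it merely indicates a direction. The function name is hypothetical.

```python
import random

def exact_count_loss_pattern(n_packets, ppl, rng):
    """Drop exactly round(ppl * n_packets) packets at uniformly random positions.
    Loss positions stay random, but the realized ratio no longer varies between runs."""
    n_lost = round(ppl * n_packets)
    lost_positions = set(rng.sample(range(n_packets), n_lost))
    return [i in lost_positions for i in range(n_packets)]

rng = random.Random(11)
for _ in range(3):
    pattern = exact_count_loss_pattern(400, 0.02, rng)
    print(sum(pattern) / 400)   # 0.02 on every run (8 of 400 packets)
```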
It is necessary that Q15/12 consider this issue, as it may be an important cause of non-conformity of candidate parametrical models that could otherwise have been proven valid.
References
[1] ITU-T Recommendation G.107 (2008), “The E-model: a computational model for use in transmission planning”.
[2] ITU-T Recommendation P.562 (2004), “Analysis and interpretation of INMD voice-service measurements”.
[3] ITU-T Contribution COM 12 – C 100 – E (2007), “E-model P.564 Compliance Testing and E-model evolution proposals”.
[4] ITU-T Recommendation P.862 (2001), “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs”.
_____________