Dallas, TX
March 27-31, 2006
C11-20060424-015
3GPP2/TSG-C1.1
TITLE:
Characterization Final Test Report for EVRC-Release B
SOURCE:
Alan Sharpley
Dynastat
6850 Austin Center Blvd., Suite 150
Austin, TX 78731
Phone: (512) 476-4797
Fax: (512) 472-2883
Email: asharpley@dynastat.com
ABSTRACT:
This contribution provides a report of the activities of the Host and Listening Laboratories for the
Characterization Test for the cdma2000® standardized speech codec EVRC, Release-B. The
standardization effort was undertaken by the Voice Services Sub-Working Group, C1.1, of
3GPP2, TSG-C. The contribution presents results of the subjective tests and statistical analyses
of the test data.
RECOMMENDATION:
Review and approve.
Dynastat grants a free, irrevocable license to 3GPP2 and its Organizational Partners to incorporate text or other copyrightable
material contained in the contribution and any modifications thereof in the creation of 3GPP2 publications; to copyright and sell in
Organizational Partner's name any Organizational Partner's standards publication even though it may include all or portions of
this contribution; and at the Organizational Partner's sole discretion to permit others to reproduce in whole or in part such
contribution or the resulting Organizational Partner's standards publication. Dynastat is also willing to grant licenses under such
contributor copyrights to third parties on reasonable, non-discriminatory terms and conditions for purpose of practicing an
Organizational Partner’s standard which incorporates this contribution.
This document has been prepared by Dynastat to assist the development of specifications by 3GPP2. It is proposed to the
Committee as a basis for discussion and is not to be construed as a binding proposal on Dynastat. Dynastat specifically
reserves the right to amend or modify the material contained herein and to any intellectual property of Dynastat other than
provided in the copyright statement above.
Kansas City, MO
April 24-28, 2006
C11-20060424-015
CONTENTS
1. Introduction
2. Host Lab Activities
2.1. Delivery of Executables
2.2. Processing of the Speech Materials
2.3. Crosschecking of the Processed Speech Materials
2.4. CT0 – Average Data Rate Measurement
3. Listening Lab Activities for the Characterization Test
3.1. Listening Instrument
3.2. Formal Listening Tests
3.2.1. Experiment CT1 - Clean Channel Conditions
3.2.2. Experiment CT2 – Impaired Channel Conditions (FER)
3.2.3. Experiment CT3 – Background Noise Conditions
3.3. Expert Listening Tests
3.3.1. Experiment CT4 – Bad Rate Handling
3.3.2. Experiment CT5 - Performance with Music Signals
4. Terms of Reference Tests
5. Unity Gain Requirement
6. Additional Analyses
6.1. Performance Across Bit-Rates
6.2. Dunnett’s Tests
6.3. Global Analyses
6.3.1. ANOVA across Conditions within Experiments
6.3.2. ANOVA Across all Experiments: CT1, CT2, CT3
7. References
Appendix – Host Lab Processing Scripts
Characterization Test Scripts
Processing Script for Experiment CT1 – Clean Channel conditions
Processing Script for Experiment CT2 – Impaired Channel conditions
Processing Script for Experiment CT3 – Background Noise Conditions
Processing Script for Experiment CT4 – Bad Rate Handling
Processing Script for Experiment CT5 – Music Signals
© 2004, Third Generation Partnership Project 2. All Rights Reserved. Permission is granted for copying, reproducing, or
duplicating this document only for the legitimate purposes of 3GPP2. No other copying, reproduction, duplication, or distribution
is permitted.
1. Introduction
The Voice Services Sub-Working Group of 3GPP2, TSG-C developed a Characterization Test Plan
[1] describing a series of experiments designed to characterize the performance of Release B of the
cdma2000 Enhanced Variable-Rate Speech Codec (EVRC). The codec, designated EVRC-B, has
been designed to operate at various operating points where each operating point corresponds to a
different Average Data Rate (ADR). Subjective quality targets were set for each of three operating
points of EVRC-B relative to the quality of standardized speech codecs, EVRC-A [2] and G.723
[3]. The subjective tests were designed to test whether the performance of the various operating
points of EVRC-B met those quality targets. Statistical tests of the success or failure of meeting
these targets are known as Terms of Reference (ToR) tests. The ToRs for EVRC-B are:
• EVRC-B operating at an ADR of 9.3 kbps (the same ADR as EVRC-A): the quality
target or ToR is to be statistically “better than” (BT) the quality of EVRC-A.
• EVRC-B operating at an ADR of 6.6 kbps: the ToR is to be “better than or equal to”
EVRC-A. That ToR is equivalent to a test that EVRC-B at 6.6 kbps is “not worse than”
(NWT) EVRC-A.
• EVRC-B operating at an ADR of 5.8 kbps: the ToR is NWT G.723 operating at 6.3 kbps.
• EVRC-B half-rate max (HRM): the ToR is BT EVRC-A half-rate max.
Table 1 summarizes the ToRs and the test and reference codecs involved in the Characterization
Test.
Table 1. Summary of Terms of Reference Tests for the EVRC-B Characterization Test
         Codec          ADR        Term of Reference
Ref.     EVRC-A         9.3 kbps   ------
Ref.     G.723          6.3 kbps   ------
Ref.     EVRC-A, HRM    4.8 kbps   ------
Test 1   EVRC-B         9.3 kbps   BT EVRC-A
Test 2   EVRC-B         6.6 kbps   NWT EVRC-A
Test 3   EVRC-B         5.8 kbps   NWT G.723 at 6.3 kbps
Test 4   EVRC-B, HRM    4.8 kbps   BT EVRC-A, HRM
Table 2 shows a summary of the objective and subjective tests involved in the Characterization
Test (CT). Objective test CT0 was designed to verify the ADR of the codec under various test
conditions. Subjective experiments CT1-CT5 were designed to characterize the performance of the
EVRC-B floating-point executable and to test the ToR’s.
Dynastat contracted with Qualcomm to perform the functions of both the Host Laboratory and the
Listening Laboratory for the Characterization Test for EVRC-B. This document reports the results
of those tests.
Table 2. Summary of Subjective Tests involved in the EVRC-B Characterization Test
Test   Description                            Measure
CT0    Average Data Rate (ADR) Measurement    Objective Measure
CT1    Clean Channel Conditions               Naïve Listener MOS (P.800)
CT2    Channel Error Conditions               Naïve Listener MOS (P.800)
CT3    Noise/DTX Operation                    Naïve Listener P-NSA (P.835)
CT4    Bad Rate Handling                      Expert Listening
CT5    Music Handling                         Expert Listener MOS
2. Host Laboratory Activities
Section 2 of the CT test plan described the functions of the Host Laboratory and the processing of
the speech materials for the subjective experiments. Dynastat complied fully with the test plan
specifications for conducting the Host Laboratory processing.
2.1. Delivery of executables
Qualcomm delivered the required executable files for processing the test conditions. These files
included the floating-point encoder and decoder for both the EVRC-B codec and the latest
published version for the EVRC-A codec. Qualcomm also delivered an updated version of the
software tool (fersig27) required for outputting ADR measures for cdma2000 codecs. All other
software tools involved in the processing were a part of the ITU-T Software Tool Library [4].
2.2. Processing of the speech materials
Section 2 of the CT test plan specified the Host Lab processing of the speech and music materials
for the subjective tests. The source speech materials were specified to be the source speech files
used in the 3GPP2 standardization of the SMV speech codec [5]. The test plan also contained
sample scripts for processing each condition involved in the tests. Dynastat developed Windows
Cygwin scripts for processing the input speech and music signals for the test conditions in
compliance with the sample scripts and specifications listed in the CT test plan. The processing
scripts for the subjective tests are contained in the Appendix.
2.3. Crosschecking of the Processed Speech Materials
Dynastat coordinated with Qualcomm to perform a crosscheck of the processed materials for the
subjective experiments to ensure that correct processing was being performed by the Host Lab.
Dynastat provided a short version of the speech database to Qualcomm for crosschecking purposes.
Dynastat processed the short sample through each condition and provided the processed files to
Qualcomm to crosscheck against files generated independently by their own scripts. Bit-exact
comparisons were performed on the two sets of files and in every case discrepancies were resolved
to the satisfaction of both parties.
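A bit-exact crosscheck of this kind can be sketched in a few lines. The example below is only an illustration, not the procedure Dynastat or Qualcomm actually used; the directory layout and the function names `bit_exact` and `compare_sets` are hypothetical.

```python
import filecmp
from pathlib import Path


def bit_exact(host_file, crosscheck_file):
    """Return True only if the two files have identical byte content."""
    # shallow=False forces a content comparison rather than relying on
    # file size and modification time alone.
    return filecmp.cmp(host_file, crosscheck_file, shallow=False)


def compare_sets(host_dir, crosscheck_dir):
    """List names of files in host_dir that are missing from, or differ in,
    crosscheck_dir (hypothetical one-directory-per-lab layout)."""
    mismatches = []
    for host_file in sorted(Path(host_dir).iterdir()):
        other = Path(crosscheck_dir) / host_file.name
        if not other.is_file() or not bit_exact(host_file, other):
            mismatches.append(host_file.name)
    return mismatches
```

An empty list from `compare_sets` would indicate that every processed file in the first directory has a byte-identical counterpart in the second.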
2.4. CT0 – Average Data Rate Measurement
The test plan specified an Average Data Rate test for the EVRC-B codec. The test involved
processing each of six speech source files through eight operating points and recording the ADR
values output by the fersig27 software tool. Table 3 shows the ADR values for the CT0 test.
Table 3. Exp. CT0 - Average Data Rate Values for EVRC-B
ADR (kbps)   Nom      Low      High     Car15    Street15   OffBab20
9.3          9.3422   9.2737   9.3593   9.3021   9.2114     9.1745
8.4          8.4688   8.4692   8.4571   8.4686   8.4925     8.4545
7.8          7.8782   7.9303   7.8790   7.9237   8.0107     7.9841
7.4          7.2865   7.3169   7.2685   7.2419   7.2761     7.2748
7.0          6.9622   7.0094   6.9390   6.9172   7.0022     6.9826
6.6          6.4230   6.4402   6.4222   6.5579   6.6145     6.5431
6.2          6.1238   6.1395   6.1133   6.2685   6.3090     6.2438
5.8          5.8240   5.8065   5.8119   6.0765   6.1295     6.0212
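As a reading aid for the CT0 values, the following sketch computes how far each measured ADR lies from its target operating point. It uses only the nominal-level (Nom) figures reported for CT0; the function name `adr_deviation` is hypothetical, and no pass/fail tolerance is implied because none is stated here.

```python
# Nominal-level ADR values (kbps) from the CT0 results, keyed by target operating point.
measured_nom = {
    9.3: 9.3422, 8.4: 8.4688, 7.8: 7.8782, 7.4: 7.2865,
    7.0: 6.9622, 6.6: 6.4230, 6.2: 6.1238, 5.8: 5.8240,
}


def adr_deviation(measured):
    """Return measured-minus-target ADR (kbps) for each operating point."""
    return {target: round(value - target, 4) for target, value in measured.items()}


deviation = adr_deviation(measured_nom)
# Operating point whose measured ADR is furthest from its target.
worst = max(deviation, key=lambda target: abs(deviation[target]))
```

The same function can be applied to the Low, High, and noise columns by supplying the corresponding values.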
3. Listening Laboratory Activities for the Characterization Test
The test plan described the functions of the Listening Laboratory and the subjective tests to be
conducted for the CT. Dynastat complied fully with the test plan specifications for conducting the
Listening Lab activities. There were no deviations from the test plan in the conduct of the
subjective tests. Furthermore, all subjective tests were conducted according to guidelines contained
in the appropriate ITU-T Recommendations for conducting subjective tests. The test plan described
five subjective tests. Three of those tests involved verification of requirements using naïve
listeners, i.e., formal listening tests. Two tests involved the use of expert listeners to verify codec
requirements.
3.1. Listening Instrument
This section describes the listening instrument for all experiments conducted in the CT.
The speech files were played through a Townshend DAT-LINK+ and recorded on a Panasonic
SV3800 Digital Audio Tape (DAT) recorder in the appropriate randomized presentation order. The
speech materials were presented to the panels of listeners seated at separate, visually-screened
listening stations contained within a Tracoustics soundproof room with an overall ambient noise
level of less than 29dBA. The speech materials were presented monaurally over Sennheiser HD-25
supra-aural/closed-back headsets. The other ear was uncovered and Hoth noise was presented in the
listening room to provide an ambient noise level of 30dBA. A Panasonic SV3800 DAT player was
used to play the materials. The audio output of the DAT player was channeled to an audio
distribution amplifier set to deliver narrowband speech to the listeners at an active level of 79dB
SPL at the ear. Calibration was accomplished using a B&K 4153 Artificial Ear with circumaural
headphone adaptor, 4134 Microphone element, 2669 Microphone Preamplifier, and 2609
Measurement Amplifier. Each listening station in the sound room was equipped with a personal
computer system for presentation of the appropriate rating scales and collection of the listeners’
ratings.
3.2. Formal Listening Tests
The test plan described three formal listening tests to characterize the performance of EVRC-B.
Each experiment included test conditions designed to evaluate the codec performance requirements
and objectives for EVRC-B and test the appropriate ToR’s. The organization of the formal
subjective experiments conformed to the test plans of previous 3GPP2 codec standardization
exercises, i.e., experiments designed to test the codec in Clean Channel conditions, in Impaired
Channel conditions, and in Background Noise.
3.2.1. Experiment CT1 - Clean Channel Conditions
The test plan specified the design parameters for the subjective experiments. Exp. CT1 was
specified as using the Absolute Category Rating (ACR) methodology described in ITU-T Rec.
P.800 [6]. The ACR yields the Mean Opinion Score (MOS) as an estimate of overall speech
quality. The experimental design involved 32 test conditions, eight talkers (four males, four
females), and eight speech samples (i.e., sentence-pairs) per talker. The processed speech materials
were presented to eight panels of four listeners (32 listeners total) in a partially-balanced,
randomized blocks experimental design. Each listening panel heard a randomized presentation
order of 32 conditions x 8 talkers. Table 4 shows summary results for Exp. CT1. For each test
condition involved in the experiment, the table shows a condition description followed by MOS
means and standard deviations for Male Talkers (n=4), Female Talkers (n=4), and All Talkers
(n=8). Also included in the table is the 95% Confidence Interval for the All Talker average. Each
value in the table is based on the ratings of 32 listeners.
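The confidence limits reported for Exp. CT1 appear consistent with a normal-approximation interval around the All Talkers mean. The sketch below is an assumption rather than the documented analysis procedure: it takes the number of ratings per condition to be 32 listeners x 8 talkers = 256 and uses z = 1.96.

```python
import math


def ci95(mean, stdev, n_ratings, z=1.96):
    """Normal-approximation 95% confidence interval for a mean rating.

    n_ratings is assumed to be listeners x talkers; this is an inference
    from the experimental design, not a figure stated in the report.
    """
    half_width = z * stdev / math.sqrt(n_ratings)
    return mean - half_width, mean + half_width


# Condition 1 of Exp. CT1 (EVRC-A, nominal input level): mean 3.91, stdev 0.81.
lower, upper = ci95(3.91, 0.81, 32 * 8)
```

Under these assumptions the limits round to 3.81 and 4.01, which agrees with the interval tabulated for that condition.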
Table 4. Summary Results for MOS Experiment CT1 – Clean Channel Conditions
                                                                Male Talkers  Female Talkers  All Talkers
#   File  Test Condition         Codec                          Mean  Stdev   Mean  Stdev   Mean  Stdev  LCI-95%  UCI-95%
1   1NA0  Nominal input level    EVRC-A at 9.3 kbps             3.95  0.84    3.87  0.79    3.91  0.81   3.81     4.01
2   1NG7  Nominal input level    G.723 at 6.3 kbps              3.61  0.88    3.54  0.90    3.57  0.89   3.47     3.68
3   1NB0  Nominal input level    EVRC-B at 9.3 kbps             3.99  0.87    3.81  0.89    3.90  0.89   3.79     4.01
4   1NB1  Nominal input level    EVRC-B at 6.6 kbps             3.99  0.80    3.69  0.91    3.84  0.87   3.73     3.95
5   1NB2  Nominal input level    EVRC-B at 5.8 kbps             3.84  0.85    3.75  0.89    3.80  0.87   3.69     3.90
6   1LA0  Low input level        EVRC-A at 9.3 kbps             3.98  0.88    3.70  0.93    3.84  0.91   3.73     3.95
7   1LG7  Low input level        G.723 at 6.3 kbps              3.61  0.83    3.54  0.91    3.57  0.87   3.47     3.68
8   1LB0  Low input level        EVRC-B at 9.3 kbps             3.91  0.93    3.84  0.75    3.87  0.85   3.77     3.97
9   1LB1  Low input level        EVRC-B at 6.6 kbps             3.92  0.83    3.84  0.86    3.88  0.85   3.78     3.99
10  1LB2  Low input level        EVRC-B at 5.8 kbps             3.95  0.90    3.78  0.88    3.87  0.89   3.76     3.98
11  1HA0  High input level       EVRC-A at 9.3 kbps             3.87  0.89    3.80  0.85    3.83  0.87   3.73     3.94
12  1HG7  High input level       G.723 at 6.3 kbps              3.72  0.90    3.58  0.87    3.65  0.88   3.54     3.76
13  1HB0  High input level       EVRC-B at 9.3 kbps             3.93  0.78    3.91  0.83    3.92  0.80   3.82     4.02
14  1HB1  High input level       EVRC-B at 6.6 kbps             3.77  0.87    3.80  0.83    3.78  0.85   3.68     3.89
15  1HB2  High input level       EVRC-B at 5.8 kbps             3.69  0.91    3.66  0.88    3.68  0.89   3.57     3.79
16  1TA0  Tandem With EVRC-A     EVRC-A at 9.3 kbps             3.74  0.94    3.67  0.76    3.71  0.86   3.60     3.81
17  1TG7  Tandem With EVRC-A     G.723 at 6.3 kbps              3.74  0.85    3.52  0.82    3.63  0.84   3.53     3.74
18  1TB0  Tandem With EVRC-A     EVRC-B at 9.3 kbps             3.85  0.90    3.78  0.89    3.82  0.89   3.71     3.93
19  1TB1  Tandem With EVRC-A     EVRC-B at 6.6 kbps             3.73  0.88    3.56  0.93    3.64  0.91   3.53     3.76
20  1TB2  Tandem With EVRC-A     EVRC-B at 5.8 kbps             3.79  0.90    3.61  0.82    3.70  0.86   3.59     3.80
21  1AAH  Additional Conditions  EVRC-A half-rate max           2.81  0.94    3.07  0.90    2.94  0.92   2.83     3.05
22  1ABH  Additional Conditions  EVRC-B half-rate max           3.38  0.89    3.39  0.92    3.39  0.90   3.28     3.50
23  1AB0  Additional Conditions  EVRC-A -> EVRC-B at 9.3 kbps   3.73  0.86    3.70  0.90    3.72  0.88   3.61     3.83
24  1AB1  Additional Conditions  EVRC-A -> EVRC-B at 6.6 kbps   3.77  0.87    3.50  0.96    3.63  0.93   3.52     3.75
25  1AB2  Additional Conditions  EVRC-A -> EVRC-B at 5.8 kbps   3.71  0.91    3.54  0.88    3.63  0.89   3.52     3.73
26  1QSR  Direct                                                4.02  0.93    4.07  0.82    4.05  0.87   3.94     4.15
27  1Q03  3 dB MNRU                                             1.34  0.75    1.22  0.74    1.28  0.74   1.19     1.37
28  1Q10  10 dB MNRU                                            1.77  0.82    1.70  0.80    1.73  0.81   1.63     1.83
29  1Q17  17 dB MNRU                                            2.73  0.82    2.56  0.88    2.64  0.85   2.54     2.75
30  1Q24  24 dB MNRU                                            3.55  0.89    3.43  0.92    3.49  0.90   3.38     3.60
31  1Q31  31 dB MNRU                                            3.95  0.84    3.87  0.82    3.91  0.83   3.81     4.01
32  1Q38  38 dB MNRU                                            3.90  0.89    3.92  0.87    3.91  0.88   3.80     4.02
Figure 1 shows the MOS profile for the unprocessed (Direct) and MNRU reference conditions. The
form of the function shown in the figure is typical of those obtained in ACR experiments.
[Figure 1: MOS (vertical axis, 1 to 5) plotted against the MNRU reference conditions (3dB, 10dB, 17dB, 24dB, 31dB, 38dB) and Direct.]
Fig.1. MOS Profile for the MNRU Reference Conditions in Exp. CT1
3.2.2. Experiment CT2 – Impaired Channel Conditions (FER)
Exp. CT2 used the same design parameters as Exp. CT1 -- ACR test, 32 conditions, eight talkers,
32 listeners. Table 5 shows summary results for Exp. CT2 and Fig. 2 shows the MOS profile for the
reference conditions involved in the experiment. Again, the form of the function is typical of those
obtained in ACR experiments.
Table 5. Summary Results for MOS Experiment CT2 – Impaired Channel Conditions
                                                                Male Talkers  Female Talkers  All Talkers
#   File  Test Condition         Codec                          Mean  Stdev   Mean  Stdev   Mean  Stdev  LCI-95%  UCI-95%
1   20A0  0% FER                 EVRC-A at 9.3 kbps             3.87  0.78    3.89  0.83    3.88  0.80   3.78     3.98
2   20B0  0% FER                 EVRC-B at 9.3 kbps             4.09  0.82    3.88  0.76    3.98  0.80   3.89     4.08
3   20B1  0% FER                 EVRC-B at 6.6 kbps             4.02  0.70    3.75  0.83    3.89  0.78   3.79     3.98
4   20B2  0% FER                 EVRC-B at 5.8 kbps             3.78  0.81    3.59  0.86    3.68  0.84   3.58     3.79
5   21A0  1% FER                 EVRC-A at 9.3 kbps             3.79  0.88    3.66  0.79    3.73  0.83   3.62     3.83
6   21B0  1% FER                 EVRC-B at 9.3 kbps             3.92  0.87    3.84  0.83    3.88  0.85   3.78     3.99
7   21B1  1% FER                 EVRC-B at 6.6 kbps             3.90  0.73    3.72  0.82    3.81  0.78   3.71     3.90
8   21B2  1% FER                 EVRC-B at 5.8 kbps             3.79  0.84    3.59  0.87    3.69  0.86   3.58     3.79
9   22A0  2% FER + 1% D&B        EVRC-A at 9.3 kbps             3.54  0.83    3.48  0.86    3.51  0.84   3.41     3.62
10  22B0  2% FER + 1% D&B        EVRC-B at 9.3 kbps             3.34  0.88    3.47  0.80    3.41  0.84   3.30     3.51
11  22B1  2% FER + 1% D&B        EVRC-B at 6.6 kbps             3.45  0.72    3.42  0.75    3.43  0.73   3.34     3.52
12  22B2  2% FER + 1% D&B        EVRC-B at 5.8 kbps             3.38  0.93    3.48  0.79    3.43  0.86   3.32     3.54
13  25A0  5% FER                 EVRC-A at 9.3 kbps             3.51  0.87    3.25  0.82    3.38  0.85   3.27     3.48
14  25B0  5% FER                 EVRC-B at 9.3 kbps             3.52  0.85    3.22  0.79    3.37  0.83   3.27     3.47
15  25B1  5% FER                 EVRC-B at 6.6 kbps             3.48  0.92    3.16  0.84    3.32  0.89   3.21     3.43
16  25B2  5% FER                 EVRC-B at 5.8 kbps             3.36  0.85    3.30  0.85    3.33  0.85   3.22     3.43
17  20AH  0% FER                 EVRC-A half-rate max           2.73  0.77    2.95  0.74    2.84  0.76   2.75     2.93
18  20BH  0% FER                 EVRC-B half-rate max           3.48  0.81    3.32  0.76    3.40  0.79   3.30     3.50
19  21AH  1% FER                 EVRC-A half-rate max           2.59  0.80    2.84  0.73    2.71  0.77   2.62     2.81
20  21BH  1% FER                 EVRC-B half-rate max           3.30  0.85    3.20  0.70    3.25  0.78   3.15     3.34
21  21BA  1% FER                 EVRC-B at 6.2 kbps             3.82  0.78    3.60  0.85    3.71  0.82   3.61     3.81
22  21BB  1% FER                 EVRC-B at 7.0 kbps             3.81  0.86    3.77  0.79    3.79  0.82   3.69     3.89
23  21BC  1% FER                 EVRC-B at 7.4 kbps             3.98  0.79    3.73  0.75    3.86  0.78   3.76     3.95
24  21BD  1% FER                 EVRC-B at 7.8 kbps             3.91  0.79    3.85  0.79    3.88  0.79   3.79     3.98
25  21BE  1% FER                 EVRC-B at 8.4 kbps             3.92  0.76    3.77  0.78    3.84  0.77   3.75     3.94
26  2QSR  Direct                                                4.17  0.88    4.02  0.78    4.10  0.83   4.00     4.20
27  2Q03  3 dB MNRU                                             1.20  0.63    1.13  0.48    1.16  0.56   1.10     1.23
28  2Q10  10 dB MNRU                                            1.76  0.68    1.59  0.67    1.68  0.68   1.59     1.76
29  2Q17  17 dB MNRU                                            2.89  0.92    2.61  0.80    2.75  0.87   2.64     2.86
30  2Q24  24 dB MNRU                                            3.70  0.83    3.48  0.80    3.59  0.82   3.49     3.69
31  2Q31  31 dB MNRU                                            4.05  0.91    3.89  0.74    3.97  0.83   3.87     4.07
32  2Q38  38 dB MNRU                                            4.12  0.81    4.02  0.74    4.07  0.77   3.98     4.17
[Figure 2: MOS (vertical axis, 1 to 5) plotted against the MNRU reference conditions (3dB, 10dB, 17dB, 24dB, 31dB, 38dB) and Direct.]
Fig.2. MOS Profile for the MNRU Reference Conditions in Exp. CT2
3.2.3. Experiment CT3 – Background Noise Conditions
Exp. CT3 was specified as using the P-NSA test methodology described in ITU-T Rec. P.835 [7].
The P-NSA methodology is specifically designed to evaluate the quality of speech in background
noise. It yields a measure of Signal Quality (SIG), a measure of Background Quality (BAK), and a
measure of Overall Quality (OVRL). In general, OVRL scores are highly correlated with MOS but
the OVRL rating scale provides greater sensitivity and precision in test conditions involving
background noise. While the OVRL score is of most interest here, the SIG and BAK scores also
provide valuable diagnostic information.
The experimental design involved 36 test conditions, six talkers (three males, three females), and
eight speech samples (i.e., sentence-triads) per talker. The processed speech materials were
presented to eight panels of four listeners (32 listeners total) in a partially-balanced, randomized
blocks experimental design. Each listening panel heard a randomized presentation order of 36
conditions x 6 talkers. Table 6 shows summary results for Exp. CT3. For each test condition
involved in the experiment the table shows a condition description followed by means and standard
deviations for the SIG, BAK, and OVRL scores. Also included in the table is the 95% Confidence
Interval for the OVRL score. Each value in the table is based on the ratings of 32 listeners.
Table 6. Summary Results for P-NSA Experiment CT3 – Background Noise Conditions
                                                                  Signal        Background    Overall
File  #   Noise Condition               Codec                     Mean  Stdev   Mean  Stdev   Mean  Stdev  LCI-95%  UCI-95%
3CA0  1   Car Noise at 15dB             EVRC-A at 9.3 kbps        3.79  0.87    3.36  0.77    3.54  0.73   3.43     3.64
3CAH  2   Car Noise at 15dB             EVRC-A half-rate max      3.08  0.94    3.18  0.87    2.92  0.68   2.82     3.01
3CB0  3   Car Noise at 15dB             EVRC-B at 9.3 kbps        3.95  0.86    3.51  0.71    3.61  0.73   3.51     3.71
3CB1  4   Car Noise at 15dB             EVRC-B at 6.6 kbps        3.76  0.86    3.42  0.79    3.47  0.78   3.36     3.58
3CB2  5   Car Noise at 15dB             EVRC-B at 5.8 kbps        3.80  0.85    3.45  0.74    3.56  0.73   3.46     3.67
3CBH  6   Car Noise at 15dB             EVRC-B half-rate max      3.44  0.89    3.33  0.81    3.20  0.79   3.09     3.31
3CX0  7   Car Noise at 15dB             EVRC-B at 9.3 kbps w/DTX  4.01  0.81    3.51  0.73    3.61  0.76   3.50     3.72
3CX2  8   Car Noise at 15dB             EVRC-B at 5.8 kbps w/DTX  3.78  0.89    3.51  0.76    3.51  0.76   3.40     3.61
3SA0  9   Street Noise at 15dB + 1%FER  EVRC-A at 9.3 kbps        3.69  0.85    3.34  0.85    3.40  0.80   3.28     3.51
3SAH  10  Street Noise at 15dB + 1%FER  EVRC-A half-rate max      2.96  0.91    3.10  0.88    2.80  0.78   2.69     2.91
3SB0  11  Street Noise at 15dB + 1%FER  EVRC-B at 9.3 kbps        3.83  0.84    3.41  0.80    3.49  0.77   3.38     3.60
3SB1  12  Street Noise at 15dB + 1%FER  EVRC-B at 6.6 kbps        3.65  0.97    3.33  0.83    3.44  0.72   3.34     3.54
3SB2  13  Street Noise at 15dB + 1%FER  EVRC-B at 5.8 kbps        3.72  0.92    3.45  0.80    3.45  0.80   3.34     3.56
3SBH  14  Street Noise at 15dB + 1%FER  EVRC-B half-rate max      3.33  0.91    3.23  0.84    3.15  0.74   3.04     3.25
3SX0  15  Street Noise at 15dB + 1%FER  EVRC-B at 9.3 kbps w/DTX  3.73  0.85    3.30  0.82    3.41  0.70   3.31     3.51
3SX2  16  Street Noise at 15dB + 1%FER  EVRC-B at 5.8 kbps w/DTX  3.63  0.96    3.29  0.82    3.30  0.82   3.19     3.42
3OA0  17  OffBab Noise at 15dB          EVRC-A at 9.3 kbps        4.14  0.86    4.09  0.89    4.01  0.81   3.90     4.12
3OAH  18  OffBab Noise at 15dB          EVRC-A half-rate max      3.26  0.95    3.73  1.05    3.21  0.86   3.09     3.34
3OB0  19  OffBab Noise at 15dB          EVRC-B at 9.3 kbps        4.20  0.84    3.97  0.98    3.98  0.76   3.88     4.09
3OB1  20  OffBab Noise at 15dB          EVRC-B at 6.6 kbps        4.07  0.84    3.95  0.95    3.96  0.80   3.85     4.08
3OB2  21  OffBab Noise at 15dB          EVRC-B at 5.8 kbps        4.01  0.85    4.11  0.84    3.86  0.79   3.75     3.97
3OBH  22  OffBab Noise at 15dB          EVRC-B half-rate max      3.75  0.90    3.83  0.99    3.65  0.83   3.53     3.77
3OX0  23  OffBab Noise at 15dB          EVRC-B at 9.3 kbps w/DTX  4.16  0.86    4.05  0.96    4.06  0.72   3.96     4.16
3OX2  24  OffBab Noise at 15dB          EVRC-B at 5.8 kbps w/DTX  3.95  0.87    4.02  0.95    3.91  0.77   3.80     4.02
3R40  25  Car noise, SNR=40dB           MNRU=0dB                  1.35  0.67    1.96  1.32    1.22  0.58   1.14     1.31
3R41  26  Car noise, SNR=40dB           MNRU=10dB                 2.33  0.92    2.76  1.19    2.26  0.78   2.15     2.37
3R42  27  Car noise, SNR=40dB           MNRU=20dB                 3.68  0.95    3.68  0.83    3.60  0.78   3.49     3.71
3R43  28  Car noise, SNR=40dB           MNRU=30dB                 4.27  0.80    4.10  0.80    4.15  0.70   4.05     4.25
3R44  29  Car noise, SNR=40dB           MNRU=40dB                 4.36  0.75    4.19  0.78    4.27  0.72   4.16     4.37
3R04  30  Car noise, SNR=0dB            MNRU=40dB                 2.19  1.22    1.17  0.51    1.25  0.57   1.17     1.33
3R14  31  Car noise, SNR=10dB           MNRU=40dB                 3.24  1.21    1.76  0.86    2.09  0.78   1.98     2.20
3R24  32  Car noise, SNR=20dB           MNRU=40dB                 3.81  1.05    2.55  0.81    2.96  0.78   2.85     3.07
3R34  33  Car noise, SNR=30dB           MNRU=40dB                 4.10  0.85    3.30  0.81    3.63  0.76   3.52     3.73
3R11  34  Car noise, SNR=10dB           MNRU=10dB                 2.37  1.13    1.47  0.81    1.55  0.71   1.45     1.65
3R22  35  Car noise, SNR=20dB           MNRU=20dB                 3.60  1.02    2.41  0.77    2.79  0.77   2.68     2.90
3R33  36  Car noise, SNR=30dB           MNRU=30dB                 4.10  0.85    3.30  0.72    3.63  0.78   3.51     3.74
Figures 3a, 3b, and 3c show the score profiles for SIG, BAK, and OVRL for the three sets of
reference conditions included in the P-NSA experiment. Figure 3a shows results for conditions
where SNR (car noise) is held constant at 40dB and MNRU is varied from 0dB to 40dB. Figure 3b
shows conditions where MNRU is held constant at 40dB and SNR (car noise) is varied from 0 dB
to 40 dB. Figure 3c shows results for conditions where both SNR (car noise) and MNRU are varied
from 10 dB to 40 dB. These reference conditions provide listeners a frame of reference in two
dimensions – signal quality and background quality. The profiles shown in Figs. 3a, 3b, and 3c are
typical of those obtained in P-NSA experiments.
[Figure 3: three panels plotting P-NSA Scores (SIG, BAK, OVRL; vertical axis 1 to 5).
Fig. 3a - SNR=40dB Car Noise: horizontal axis MNRU, 0 dB to 40 dB.
Fig. 3b - MNRU=40dB: horizontal axis SNR Car Noise, 0 dB to 40 dB.
Fig. 3c - MNRU = SNR: horizontal axis MNRU/SNR Car Noise, 10/10 dB to 40/40 dB.]
Fig.3. SIG, BAK, and OVRL Score Profiles for the P-NSA Reference Conditions
3.3. Expert Listening Tests
3.3.1. Experiment CT4 – Bad Rate Handling
The test plan specified an experiment to verify the performance of EVRC-B relative to EVRC-A
under conditions of bad-rate handling. Dynastat processed eight sentence-pairs for one male and
one female talker through EVRC-A and three operating points of EVRC-B for the condition of 1%
bad rates. The processed materials were presented to 10 expert listeners who judged them on three
rating scales for Annoyance, Frequency, and Severity of artifacts. The top portion of Table 7 shows
the three rating scales and the bottom half summarizes the results of the expert listeners’ ratings.
Table 7. Results of Exp. CT4 - Expert Listener Ratings of Annoyance, Frequency, and Severity of Artifacts.
Annoyance Scale                  Frequency Scale         Severity Scale
5 Not noticeable                 5 Not Noticeable        3 Mild
4 Noticeable but not annoying    4 Infrequent            2 Moderate
3 Somewhat annoying              3 Frequent              1 Severe
2 Annoying                       2 Very frequent
1 Very annoying                  1 Extremely frequent

Codec                 Annoyance Rating   Frequency / Severity of Artifacts
EVRC-A                3.70               infrequent / mild
EVRC-B at 9.3 kbps    3.10               frequent / mild to moderate
EVRC-B at 6.6 kbps    3.05               frequent / mild to moderate
EVRC-B at 5.8 kbps    2.55               frequent / moderate
3.3.2. Experiment CT5 - Performance with Music Signals
The test plan specified an MOS experiment using expert listeners to characterize the performance
of EVRC-B at 9.3 kbps with music signals. The music signal database included four meaningful
music samples for each of three genres: Classical, Pop, and Rock. Each music sample was
approximately 15 sec. in duration. The processed music materials were presented to four panels of
four expert listeners in a partially balanced, randomized blocks experimental design. Each listener
rated the two processed conditions, EVRC-A and EVRC-B, and four MNRU reference conditions
for each of the three music genres. Table 8 and Fig. 4 show summary results for Exp. CT5.
Table 8. Results for Exp.CT5 - MOS for Music Signals
Exp.CT5 - Music signals        Scores by Genre        All Genres
File  Condition                Class.  Pop   Rock     MOS   Stdev  CI-L  CI-U
5MA0  EVRC-A                   1.88    1.75  2.19     1.94  0.73   1.73  2.14
5MB0  EVRC-B at 9.3 kbps       2.00    1.75  1.88     1.88  0.79   1.65  2.10
5M06  MNRU 6dB                 1.63    2.06  2.06     1.92  0.58   1.75  2.08
5M15  MNRU 15dB                3.63    3.56  3.31     3.50  0.65   3.32  3.69
5M24  MNRU 24dB                4.06    3.75  3.69     3.83  0.69   3.64  4.03
5MSR  Source                   4.13    3.44  3.63     3.73  0.77   3.51  3.95
[Figure 4: MOS (vertical axis, 1 to 5) by Test Condition: MNRU 6dB, 15dB, 24dB, Src, EVRC-A, EVRC-B.]
Fig.4. MOS for the Test and Reference Conditions in Exp. CT5 – Music Signals
4. Terms of Reference Tests
Table 9 shows the results of the Terms of Reference tests specified in the test plan. The following
bullet points summarize the ToR tests detailed in Table 9.
• EVRC-B at 9.3 kbps better than EVRC-A – passed in 1 out of 12 tests
• EVRC-B at 6.6 kbps not worse than EVRC-A – passed in all 11 tests
• EVRC-B at 5.8 kbps not worse than G.723 at 6.3 kbps – passed in all 4 tests
• EVRC-B, half rate max better than EVRC-A, half rate max – passed in all 6 tests
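The pass/fail logic behind these ToR results can be sketched as a one-sided t-test on the difference between the reference and test means. The details below are assumptions rather than the documented procedure: the standard error of the mean difference (SEMD) is computed as for independent samples with 256 ratings per condition, and the critical value 1.645 is the large-sample one-sided 5% point. The function names are hypothetical.

```python
import math

Z_CRIT = 1.645  # assumed one-sided 5% critical value (large-sample approximation)


def semd(sd_ref, sd_test, n_ratings):
    """Standard error of the mean difference, assuming independent samples
    with the same number of ratings per condition."""
    return math.sqrt((sd_ref ** 2 + sd_test ** 2) / n_ratings)


def tor_pass(mean_ref, mean_test, se_diff, kind, crit=Z_CRIT):
    """Evaluate a Term of Reference with t = (reference - test) / SEMD.

    "BT"  passes when the test codec is significantly better (t < -crit).
    "NWT" passes unless the test codec is significantly worse (t < crit).
    """
    t = (mean_ref - mean_test) / se_diff
    if kind == "BT":
        return t < -crit
    if kind == "NWT":
        return t < crit
    raise ValueError("kind must be 'BT' or 'NWT'")


# CT2, 1% FER: EVRC-A (3.73, 0.83) versus EVRC-B at 9.3 kbps (3.88, 0.85).
bt_1pct_fer = tor_pass(3.73, 3.88, semd(0.83, 0.85, 256), "BT")
```

With the rounded CT2 means and standard deviations, this sketch gives the same outcome as the report for the 1% FER condition (pass) and for the 0% FER condition (fail), although the t values differ slightly from the tabulated ones because the inputs are rounded.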
Table 9. Details of Terms of Reference Tests for the EVRC-B Characterization Test

Terms of Reference -- Is EVRC-B at 9.3 kbps Better Than EVRC-A
                       Reference Codec        Test Codec                     t-test
Exp.  Cond.            File   Mean   SE       File   Mean   SE      Diff.    SEMD   t       Pass
CT1   Nominal level    1NA0   3.91   0.81     1NB0   3.90   0.89     0.00    0.08    0.05   No
CT1   Low level        1LA0   3.84   0.91     1LB0   3.87   0.85    -0.03    0.08   -0.40   No
CT1   High level       1HA0   3.83   0.87     1HB0   3.92   0.80    -0.09    0.07   -1.17   No
CT1   Tandem           1TA0   3.71   0.86     1TB0   3.82   0.89    -0.11    0.08   -1.42   No
CT2   0%FER            20A0   3.88   0.80     20B0   3.98   0.80    -0.11    0.07   -1.49   No
CT2   1%FER            21A0   3.73   0.83     21B0   3.88   0.85    -0.16    0.07   -2.10   Yes
CT2   2%FER            22A0   3.51   0.84     22B0   3.41   0.84     0.11    0.07    1.41   No
CT2   5%FER            25A0   3.33   0.85     25B0   3.37   0.83    -0.04    0.07   -0.58   No
CT3   Car noise        3CA0   3.54   0.73     3CB0   3.61   0.73    -0.07    0.07   -0.98   No
CT3   Street noise     3SA0   3.40   0.80     3SB0   3.49   0.77    -0.09    0.08   -1.17   No
CT3   Off/Bab noise    3OA0   4.01   0.81     3OB0   3.98   0.76     0.03    0.08    0.32   No
CT5   Music signals    5MA0   1.94   0.73     5MB0   1.88   0.79     0.06    0.15    0.41   No

Terms of Reference -- Is EVRC-B at 6.6 kbps Not Worse Than EVRC-A
                       Reference Codec        Test Codec                     t-test
Exp.  Cond.            File   Mean   SE       File   Mean   SE      Diff.    SEMD   t       Pass
CT1   Nominal level    1NA0   3.91   0.81     1NB1   3.84   0.87     0.07    0.07    0.90   Yes
CT1   Low level        1LA0   3.84   0.91     1LB1   3.88   0.85    -0.04    0.08   -0.55   Yes
CT1   High level       1HA0   3.83   0.87     1HB1   3.78   0.85     0.05    0.08    0.67   Yes
CT1   Tandem           1TA0   3.71   0.86     1TB1   3.64   0.91     0.06    0.08    0.80   Yes
CT2   0%FER            20A0   3.88   0.80     20B1   3.89   0.78    -0.01    0.07   -0.11   Yes
CT2   1%FER            21A0   3.73   0.83     21B1   3.81   0.78    -0.08    0.07   -1.15   Yes
CT2   2%FER            22A0   3.51   0.84     22B1   3.43   0.73     0.08    0.07    1.12   Yes
CT2   5%FER            25A0   3.33   0.85     25B1   3.32   0.89     0.01    0.08    0.15   Yes
CT3   Car noise        3CA0   3.54   0.73     3CB1   3.47   0.78     0.06    0.08    0.80   Yes
CT3   Street noise     3SA0   3.40   0.80     3SB1   3.44   0.72    -0.04    0.08   -0.54   Yes
CT3   Off/Bab noise    3OA0   4.01   0.81     3OB1   3.96   0.80     0.05    0.08    0.56   Yes

Terms of Reference -- Is EVRC-B Half Rate Max Better Than EVRC-A Half Rate Max
                       Reference Codec        Test Codec                     t-test
Exp.  Cond.            File   Mean   SE       File   Mean   SE      Diff.    SEMD   t       Pass
CT1   Nominal level    1AAH   2.94   0.92     1ABH   3.39   0.92    -0.45    0.08   -5.45   Yes
CT2   0%FER            20AH   2.84   0.76     20BH   3.40   0.79    -0.56    0.07   -8.14   Yes
CT2   1%FER            21AH   2.71   0.77     21BH   3.25   0.78    -0.53    0.07   -7.76   Yes
CT3   Car noise        3CAH   2.92   0.68     3CBH   3.20   0.79    -0.29    0.08   -3.80   Yes
CT3   Street noise     3SAH   2.80   0.78     3SBH   3.15   0.74    -0.35    0.08   -4.52   Yes
CT3   Off/Bab noise    3OAH   3.21   0.86     3OBH   3.65   0.83    -0.44    0.09   -5.08   Yes

Terms of Reference -- Is EVRC-B at 5.8 kbps Not Worse Than G.723 at 6.3 kbps
                       Reference Codec        Test Codec                     t-test
Exp.  Cond.            File   Mean   SE       File   Mean   SE      Diff.    SEMD   t       Pass
CT1   Nominal level    1NG7   3.57   0.89     1NB2   3.80   0.87    -0.22    0.08   -2.87   Yes
CT1   Low level        1LG7   3.57   0.87     1LB2   3.87   0.89    -0.29    0.08   -3.75   Yes
CT1   High level       1HG7   3.65   0.88     1HB2   3.68   0.89    -0.03    0.08   -0.35   Yes
CT1   Tandem           1TG7   3.63   0.84     1TB2   3.70   0.86    -0.07    0.08   -0.88   Yes
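The Pass column of Table 9 can be read as a one-sided decision on the tabulated t value, where Diff. is the reference mean minus the test mean. The following Python sketch illustrates that reading. The critical value of 1.645 (one-sided, 5%, large degrees of freedom) is an assumption, since this report does not state the value used; the function and variable names are hypothetical. The t values are copied from Table 9.

```python
# Assumed one-sided 5% critical value for large df; not stated in the report.
T_CRIT = 1.645


def tor_pass(t, requirement, t_crit=T_CRIT):
    """Decide a Terms of Reference test from t = (ref mean - test mean) / SEMD."""
    if requirement == "better":
        # The test codec must score significantly higher than the reference.
        return t < -t_crit
    if requirement == "not_worse":
        # The test codec passes unless the reference scores significantly higher.
        return t < t_crit
    raise ValueError(f"unknown requirement: {requirement}")


# t values copied from Table 9, in row order.
TOR_TESTS = {
    "9.3 kbps better than EVRC-A": (
        "better",
        [0.05, -0.40, -1.17, -1.42, -1.49, -2.10,
         1.41, -0.58, -0.98, -1.17, 0.32, 0.41],
    ),
    "6.6 kbps not worse than EVRC-A": (
        "not_worse",
        [0.90, -0.55, 0.67, 0.80, -0.11, -1.15,
         1.12, 0.15, 0.80, -0.54, 0.56],
    ),
    "Half rate max better than EVRC-A half rate max": (
        "better",
        [-5.45, -8.14, -7.76, -3.80, -4.52, -5.08],
    ),
    "5.8 kbps not worse than G.723 at 6.3 kbps": (
        "not_worse",
        [-2.87, -3.75, -0.35, -0.88],
    ),
}


def summarize(tests=TOR_TESTS):
    """Return {test name: (number passed, number of conditions)}."""
    return {
        name: (sum(tor_pass(t, req) for t in ts), len(ts))
        for name, (req, ts) in tests.items()
    }


for name, (passed, total) in summarize().items():
    print(f"{name}: passed {passed} of {total}")
```

Under the assumed critical value, the pass counts agree with the summary at the start of this section (1 of 12, 11 of 11, 6 of 6, and 4 of 4).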
5. Unity Gain Requirement
Section 2.7 of the test plan specified that the EVRC-B codec must not show more than 0.5 dB
deviation between input and output active speech levels as measured by the ITU-T software tool actlev [4].
Dynastat measured the active speech level of the EVRC-A and EVRC-B codecs under clean, Nominal input
level conditions and compared the values to the active speech level of the input source file. Table 10 shows
the results of those comparisons. The deviation in active speech level is within 0.5 dB for EVRC-A and for
EVRC-B at all three bit-rates.
Table 10. Results of Test of Unity Gain Requirement.
Codec                 Active speech level    Deviation (dB)
Source                -25.711
EVRC-A                -26.145                0.434
EVRC-B at 9.3 kbps    -26.107                0.396
EVRC-B at 6.6 kbps    -26.190                0.479
EVRC-B at 5.8 kbps    -26.197                0.486
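The deviations in Table 10 are the absolute differences between each codec's output level and the source level. A minimal Python sketch of that check follows; the levels are copied from Table 10 and the names are hypothetical, not part of the test plan.

```python
LIMIT_DB = 0.5          # unity gain limit from Section 2.7 of the test plan
SOURCE_LEVEL = -25.711  # active speech level of the input source file

# Active speech levels of the codec outputs, copied from Table 10.
OUTPUT_LEVELS = {
    "EVRC-A": -26.145,
    "EVRC-B at 9.3 kbps": -26.107,
    "EVRC-B at 6.6 kbps": -26.190,
    "EVRC-B at 5.8 kbps": -26.197,
}


def deviations(source, outputs):
    """Absolute level deviation of each output from the source, in dB."""
    return {name: round(abs(level - source), 3) for name, level in outputs.items()}


for name, dev in deviations(SOURCE_LEVEL, OUTPUT_LEVELS).items():
    status = "within limit" if dev <= LIMIT_DB else "exceeds limit"
    print(f"{name}: {dev:.3f} dB ({status})")
```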
6. Additional Analyses
During discussions with Qualcomm, the Listening Laboratory proposed to conduct additional tests
and analyses to further describe the MOS voting data. This section describes those analyses.
6.1. Performance Across Bit-rates
Exp. CT2 included eight test conditions designed to evaluate the performance of EVRC-B across a
wide range of ADR. Figure 5 shows the results (MOS and 95% CI) for EVRC-B over the eight
values of ADR under 1% FER conditions. The figure indicates a slight linear increase in
performance with increase in bit-rate. An Analysis of Variance (ANOVA) for Linear Trend was
conducted on these MOS results. The ANOVA showed that the Linear Trend was not significant (F
= 2.89, df = 1,1785, p = 0.089). In fact, the ANOVA showed that there was no significant
difference in MOS performance across the eight bit-rates (F = 1.94, df = 7,1785, p = 0.060).
[Figure: MOS (y-axis, 3.5 to 4.0) versus ADR in kbps (x-axis: 5.80, 6.20, 6.60, 7.00, 7.40, 7.80, 8.40, 9.30)]
Fig.5. MOS for EVRC-B Across ADR (1% FER)
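As a rough plausibility check on the Linear Trend result, the p value for an F-ratio with 1 numerator degree of freedom can be approximated from the normal distribution when the denominator degrees of freedom are large, as with df = 1,1785 here. This is a sketch using that approximation, not the method used by the Listening Laboratory; the function name is hypothetical.

```python
from math import erfc, sqrt


def approx_p_for_f_1_large_df(f_ratio):
    """Approximate p for F(1, df2) with large df2, treating sqrt(F) as a standard normal z."""
    z = sqrt(f_ratio)
    # Two-sided tail probability of the standard normal distribution.
    return erfc(z / sqrt(2))


p = approx_p_for_f_1_large_df(2.89)
print(f"approximate p = {p:.3f}")
```

The approximation gives a value close to the reported p = 0.089. It does not apply to the second ANOVA result, which has 7 numerator degrees of freedom.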
6.2. Dunnett’s Tests
Dynastat proposed to perform a series of tests using a statistical method, Dunnett’s Test, that is more
appropriate for the data and the experimental design of the subjective experiments. Dunnett’s Test is
a special case of the more general Post Hoc Multiple Means Test, in which multiple treatment means
are statistically compared to a common control mean. In the case of the MOS experiments, the
treatments are the operating points of EVRC-B and the control is EVRC-A under the same test
condition.
The first stage in a Dunnett’s Test is to run an ANOVA for the effects of Codecs x Subjects, where
Codecs (n = 4) includes EVRC-A plus the three operating points of EVRC-B (9.3, 6.6, and 5.8
kbps) and Subjects (n = 32) are the votes for each listener averaged over talkers (8 talkers for
Exps. CT1 and CT2 and 6 talkers for Exp. CT3). If the F-ratio for the Codecs effect is significant
(i.e., p < .05), then there is significant variation among the scores for the four codecs and the test
proceeds to the second stage. If the F-ratio is not significant, there is no significant variation among
the codecs (i.e., the scores for all four codecs are statistically equivalent) and the Dunnett’s Test is
complete. In the second stage of Dunnett’s Test, the treatment means are compared statistically to the
control mean and the mean differences are evaluated for significance.
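The two-stage procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Listening Laboratory's implementation: the function name, the toy scores, and both critical values are hypothetical, and the standard error of a difference is taken as sqrt(2 * MS_error / n), which this report does not confirm. In practice the critical values come from F and Dunnett tables for the actual degrees of freedom.

```python
from math import sqrt


def dunnett_two_stage(scores, f_crit, d_crit):
    """scores[s][c] is the talker-averaged score of subject s for codec c.

    Column 0 is the control. f_crit and d_crit are caller-supplied critical
    values (assumed here; normally read from statistical tables).
    Returns (F-ratio, None) when stage 1 is not significant, otherwise
    (F-ratio, [(statistic, verdict), ...]) with one entry per treatment.
    """
    n_subjects = len(scores)
    n_codecs = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_subjects * n_codecs)
    codec_means = [sum(row[c] for row in scores) / n_subjects for c in range(n_codecs)]
    subject_means = [sum(row) / n_codecs for row in scores]

    # Stage 1: Codecs x Subjects ANOVA; the interaction serves as the error term.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_codecs = n_subjects * sum((m - grand) ** 2 for m in codec_means)
    ss_subjects = n_codecs * sum((m - grand) ** 2 for m in subject_means)
    ss_error = ss_total - ss_codecs - ss_subjects
    ms_codecs = ss_codecs / (n_codecs - 1)
    ms_error = ss_error / ((n_codecs - 1) * (n_subjects - 1))
    f_ratio = ms_codecs / ms_error
    if f_ratio <= f_crit:
        return f_ratio, None  # all codecs treated as statistically equivalent

    # Stage 2: compare each treatment mean with the control mean.
    se_diff = sqrt(2 * ms_error / n_subjects)
    comparisons = []
    for c in range(1, n_codecs):
        d = (codec_means[0] - codec_means[c]) / se_diff
        if d > d_crit:
            verdict = "Tst < Ref"
        elif d < -d_crit:
            verdict = "Tst > Ref"
        else:
            verdict = "Tst = Ref"
        comparisons.append((d, verdict))
    return f_ratio, comparisons


# Toy data for illustration only: 4 subjects x (control + 2 treatments).
TOY_SCORES = [
    [4, 4, 3],
    [3, 4, 2],
    [5, 4, 3],
    [4, 4, 2],
]

f_ratio, comparisons = dunnett_two_stage(TOY_SCORES, f_crit=5.14, d_crit=2.86)
print(f_ratio, comparisons)
```

The verdict labels follow the Result column used in Tables 11 through 13: a positive statistic means the control (Ref) mean is higher than the treatment (Tst) mean.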
Separate Dunnett’s Tests were computed for each test condition involved in Experiments CT1, CT2
and CT3. Table 11 shows the results of the Dunnett’s Tests for Exp. CT1. The table is separated into
five sections, one for each test condition involved in Exp. CT1. For all of the test conditions except
High Input Level, the F-ratio is not significant and therefore all four codecs have equivalent MOS
values. For the High Input Level condition, however, the F-ratio is significant (F = 6.98, p < .05) and
the subsequent Dunnett’s comparisons show that EVRC-B at 5.8 kbps (MOS = 3.68) is significantly
worse than EVRC-A (MOS = 3.83).
Table 11. Results of Dunnett’s Tests of MOS Values for Exp. CT1

Nominal Input Level (F = 1.90, not signif.)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      1NA0   3.91
EVRC-B at 9.3 kbps    1NB0   3.90    0.00    0.07       Tst = Ref
EVRC-B at 6.6 kbps    1NB1   3.84    0.07    1.23       Tst = Ref
EVRC-B at 5.8 kbps    1NB2   3.80    0.11    2.02       Tst = Ref

Low Input Level (F = 0.17, not signif.)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      1LA0   3.84
EVRC-B at 9.3 kbps    1LB0   3.87   -0.03   -0.50       Tst = Ref
EVRC-B at 6.6 kbps    1LB1   3.88   -0.04   -0.69       Tst = Ref
EVRC-B at 5.8 kbps    1LB2   3.87   -0.03   -0.44       Tst = Ref

High Input Level (F = 6.98, signif., p < .05)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      1HA0   3.83
EVRC-B at 9.3 kbps    1HB0   3.92   -0.09   -1.59       Tst = Ref
EVRC-B at 6.6 kbps    1HB1   3.78    0.05    0.94       Tst = Ref
EVRC-B at 5.8 kbps    1HB2   3.68    0.16    2.88       Tst < Ref

Tandem: Codec => EVRC-A (F = 2.66, not signif.)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      1TA0   3.71
EVRC-B at 9.3 kbps    1TB0   3.82   -0.11   -1.76       Tst = Ref
EVRC-B at 6.6 kbps    1TB1   3.64    0.06    1.00       Tst = Ref
EVRC-B at 5.8 kbps    1TB2   3.70    0.01    0.13       Tst = Ref

Tandem: EVRC-A => Codec (F = 1.84, not signif.)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      1TA0   3.71
EVRC-B at 9.3 kbps    1AB0   3.72   -0.01   -0.23       Tst = Ref
EVRC-B at 6.6 kbps    1AB1   3.63    0.07    1.45       Tst = Ref
EVRC-B at 5.8 kbps    1AB2   3.63    0.08    1.60       Tst = Ref
Table 12 shows the results of the Dunnett’s Tests for the conditions in Exp. CT2. Two of the
conditions in Exp. CT2 show significant F-ratios: 0% FER and 1% FER. For the 0% FER condition,
EVRC-B at 5.8 kbps is significantly worse than EVRC-A. However, this is exactly the same
condition as the Nominal Input Level condition in Exp. CT1, where there was no significant
difference. The results of the two experiments contradict each other and indicate that the difference
is on the edge of being significant. In the test for the 1% FER condition, EVRC-B at 9.3 kbps is
actually shown to be significantly higher than the control, EVRC-A.
Table 12. Results of Dunnett’s Tests of MOS Values for Exp. CT2

0% FER (F = 9.52, signif., p < .05)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      20A0   3.88
EVRC-B at 9.3 kbps    20B0   3.98   -0.11   -1.83       Tst = Ref
EVRC-B at 6.6 kbps    20B1   3.89   -0.01   -0.14       Tst = Ref
EVRC-B at 5.8 kbps    20B2   3.68    0.20    3.39       Tst < Ref

1% FER (F = 4.14, signif., p < .05)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      21A0   3.73
EVRC-B at 9.3 kbps    21B0   3.88   -0.16   -2.57       Tst > Ref
EVRC-B at 6.6 kbps    21B1   3.81   -0.08   -1.35       Tst = Ref
EVRC-B at 5.8 kbps    21B2   3.69    0.04    0.64       Tst = Ref

2% FER + 2% D & B (F = 0.89, not signif.)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      22A0   3.51
EVRC-B at 9.3 kbps    22B0   3.41    0.11    1.53       Tst = Ref
EVRC-B at 6.6 kbps    22B1   3.43    0.08    1.13       Tst = Ref
EVRC-B at 5.8 kbps    22B2   3.43    0.08    1.19       Tst = Ref

5% FER (F = 0.44, not signif.)
Codec                 File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)      25A0   3.38
EVRC-B at 9.3 kbps    25B0   3.37    0.01    0.12       Tst = Ref
EVRC-B at 6.6 kbps    25B1   3.32    0.06    0.94       Tst = Ref
EVRC-B at 5.8 kbps    25B2   3.33    0.05    0.77       Tst = Ref
Table 13 shows the results of the Dunnett’s Tests for the conditions in Exp. CT3. The Dunnett’s
Tests for Exp. CT3 include two additional codec-conditions, EVRC-B at 9.3 and 5.8 kbps with
DTX. This results in a Dunnett’s Test with five treatment conditions and one control.
Two of the test conditions in Exp. CT3 show significant F-ratios: Street Noise and Off/Bab Noise.
However, for the Street Noise condition none of the codecs is significantly different from EVRC-A.
For the Off/Bab Noise condition, EVRC-B at 5.8 kbps is significantly worse than EVRC-A.
Based on the results of the Dunnett’s Tests for Experiments CT1, CT2, and CT3, it is concluded that
EVRC-B at 9.3 kbps and 6.6 kbps are equivalent to EVRC-A on all test conditions, except at 1% FER,
where EVRC-B at 9.3 kbps scored significantly higher. In addition,
EVRC-B at 5.8 kbps is equivalent to EVRC-A on most of the conditions involved in the three
experiments (9 out of 12 test conditions).
Table 13. Results of Dunnett’s Tests of OVRL Values for Exp. CT3

Car Noise at 15dB SNR (F = 1.72, not signif.)
Codec                        File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)             3CA0   3.54
EVRC-B at 9.3 kbps           3CB0   3.61   -0.07   -1.22       Tst = Ref
EVRC-B at 6.6 kbps           3CB1   3.47    0.06    1.04       Tst = Ref
EVRC-B at 5.8 kbps           3CB2   3.56   -0.03   -0.45       Tst = Ref
EVRC-B at 9.3 kbps w/DTX     3CX0   3.61   -0.07   -1.22       Tst = Ref
EVRC-B at 5.8 kbps w/DTX     3CX2   3.51    0.03    0.52       Tst = Ref

Street Noise at 15dB SNR (F = 2.56, signif., p < .05)
Codec                        File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)             3SA0   3.40
EVRC-B at 9.3 kbps           3SB0   3.49   -0.09   -1.66       Tst = Ref
EVRC-B at 6.6 kbps           3SB1   3.44   -0.04   -0.74       Tst = Ref
EVRC-B at 5.8 kbps           3SB2   3.45   -0.05   -0.92       Tst = Ref
EVRC-B at 9.3 kbps w/DTX     3SX0   3.41   -0.01   -0.18       Tst = Ref
EVRC-B at 5.8 kbps w/DTX     3SX2   3.30    0.09    1.66       Tst = Ref

Off/Bab Noise at 20dB SNR (F = 2.68, signif., p < .05)
Codec                        File   MOS    Diff    Dunnett’s   Result
EVRC-A (control)             3OA0   4.01
EVRC-B at 9.3 kbps           3OB0   3.98    0.03    0.42       Tst = Ref
EVRC-B at 6.6 kbps           3OB1   3.96    0.05    0.74       Tst = Ref
EVRC-B at 5.8 kbps           3OB2   3.86    0.15    2.45       Tst < Ref
EVRC-B at 9.3 kbps w/DTX     3OX0   4.06   -0.05   -0.76       Tst = Ref
EVRC-B at 5.8 kbps w/DTX     3OX2   3.91    0.10    1.68       Tst = Ref
6.3. Global Analyses
All of the statistical tests described in the previous sections involved tests of the ToRs for EVRC-B
for specific test conditions within experiments, i.e., local analyses. This section describes
statistical analyses and tests comparing the results of EVRC-A and EVRC-B across conditions
within an experiment and across experiments, i.e., global analyses.
6.3.1. ANOVA across Conditions within Experiments
For each of the three formal MOS tests, ANOVAs were conducted comparing the results of EVRC-A
and EVRC-B across three relevant test conditions involved in the experiment. Separate
ANOVAs were conducted for each of three ADRs for EVRC-B: 9.3, 6.6, and 5.8 kbps.
6.3.1.1. ANOVA for Exp. CT1 – Input Level
For Exp. CT1, results were analyzed for the three input level conditions. Each analysis was a three-way
ANOVA for factors Codecs (EVRC-A vs. EVRC-B), Conditions (Nominal, Low, and High
input level), and Subjects (n=32). The effects of interest here are the F-ratios for the Codecs main
effect and the Codecs x Conditions interaction effect. Table 14 shows the results of the three Global
ANOVAs for Exp. CT1. Each row in the table shows the results of the ANOVA for a particular
ADR for EVRC-B. The left side of the table shows the obtained F-ratios for the ANOVA. For the
Codecs main effect, an F-ratio greater than 4.16 indicates a significant difference between the MOS
values for the two codecs, EVRC-A and EVRC-B, over the three test conditions. For the Codecs x
Conditions interaction effect, an F-ratio greater than 3.15 indicates a significant difference between
the patterns of MOS scores for the two codecs over the three test conditions.
There was only one significant F-ratio obtained for Exp. CT1 — the Codecs effect for the
comparison EVRC-A vs. EVRC-B at 5.8 kbps. An examination of the mean values (shown on the
right side of the table) indicates that EVRC-A scored significantly higher (MOS = 3.859) than
EVRC-B at 5.8 kbps (MOS = 3.780). For Exp. CT1 there were no significant Codecs x Conditions
interactions.
Table 14. Results of ANOVA for Exp. CT1
Exp. CT1           F-ratios                   MOS
Comparison         Codecs    Cod. x Cond.     A        B        Diff.
A vs. B at 9.3     2.15      0.79             3.859    3.897    -0.038
A vs. B at 6.6     1.12      1.13             3.859    3.835     0.025
A vs. B at 5.8     4.57      2.44             3.859    3.780     0.079
6.3.1.2. ANOVA for Exp. CT2 - FER
For Exp. CT2, results were analyzed for the three FER test conditions. Each analysis was a three-way
ANOVA for factors Codecs (EVRC-A vs. EVRC-B), Conditions (1%, 2%, and 5% FER), and
Subjects (n=32). Table 15 shows the results of the three Global ANOVAs for Exp. CT2. There was
only one significant F-ratio for Exp. CT2: the Codecs x Conditions interaction effect for the
comparison EVRC-A vs. EVRC-B at 9.3 kbps. The significant interaction indicates a significant
difference in the patterns of MOS scores for the two codecs across FER conditions. Figure 6
illustrates the significant interaction for EVRC-A vs. EVRC-B at 9.3 kbps across the FER
conditions.
Table 15. Results of ANOVA for Exp. CT2
Exp. CT2           F-ratios                   MOS
Comparison         Codecs    Cod. x Cond.     A        B        Diff.
A vs. B at 9.3     0.11      6.12             3.539    3.553    -0.014
A vs. B at 6.6     0.23      1.79             3.539    3.520     0.020
A vs. B at 5.8     1.41      0.14             3.539    3.482     0.057

[Figure: MOS (y-axis, 2.5 to 4.0) for EVRC-A and EVRC-B0 at 1% FER, 2% FER, and 5% FER]
Fig. 6 Significant Interaction of Codecs and FER Conditions for Exp. CT2
6.3.1.3. ANOVA for Exp. CT3 – Background Noise
For Exp. CT3, results were analyzed for the three Background Noise conditions. Each analysis was
a three-way ANOVA for factors Codecs (EVRC-A vs. EVRC-B), Conditions (Car, Street, and
Off/Bab noise), and Subjects (n=32). Table 16 shows the results of the three Global ANOVAs for
Exp. CT3. There was only one significant F-ratio for Exp. CT3: the Codecs x Conditions
interaction effect for the comparison EVRC-A vs. EVRC-B at 5.8 kbps. The significant interaction
indicates a significant difference in the patterns of MOS scores for the two codecs across
background noise conditions. Figure 7 illustrates the significant interaction for EVRC-A vs.
EVRC-B at 5.8 kbps across the background noise conditions.
Table 16. Results of ANOVA for Exp. CT3
Exp. CT3           F-ratios                   OVRL
Comparison         Codecs    Cod. x Cond.     A        B        Diff.
A vs. B at 9.3     1.3       1.77             3.647    3.694    -0.047
A vs. B at 6.6     0.42      0.85             3.647    3.625     0.022
A vs. B at 5.8     0.32      3.89             3.647    3.623     0.024
Fig. 7 Significant Interaction of Codecs and Background Noise Conditions for Exp. CT3
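The significance decisions reported for Tables 14, 15, and 16 can be reproduced by comparing each F-ratio with the critical values quoted in Section 6.3.1.1 (4.16 for the Codecs main effect and 3.15 for the Codecs x Conditions interaction). The Python sketch below does this. It assumes the same critical values apply to Exps. CT2 and CT3, which share the same design (2 codecs, 3 conditions, 32 subjects); the names are hypothetical.

```python
F_CRIT_CODECS = 4.16       # Codecs main effect (Section 6.3.1.1)
F_CRIT_INTERACTION = 3.15  # Codecs x Conditions interaction (Section 6.3.1.1)

# (Codecs F, Cod. x Cond. F) per EVRC-B ADR, copied from Tables 14, 15, and 16.
F_RATIOS = {
    "CT1": {"9.3": (2.15, 0.79), "6.6": (1.12, 1.13), "5.8": (4.57, 2.44)},
    "CT2": {"9.3": (0.11, 6.12), "6.6": (0.23, 1.79), "5.8": (1.41, 0.14)},
    "CT3": {"9.3": (1.3, 1.77), "6.6": (0.42, 0.85), "5.8": (0.32, 3.89)},
}


def significant_effects(f_ratios=F_RATIOS):
    """List (experiment, ADR, effect) for every F-ratio above its critical value."""
    found = []
    for exp, by_adr in f_ratios.items():
        for adr, (f_codecs, f_interaction) in by_adr.items():
            if f_codecs > F_CRIT_CODECS:
                found.append((exp, adr, "Codecs"))
            if f_interaction > F_CRIT_INTERACTION:
                found.append((exp, adr, "Codecs x Conditions"))
    return found


for exp, adr, effect in significant_effects():
    print(f"Exp. {exp}, EVRC-A vs. EVRC-B at {adr} kbps: significant {effect} effect")
```

Exactly one effect per experiment exceeds its critical value, in agreement with the text of Sections 6.3.1.1 through 6.3.1.3.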
6.3.2. ANOVA Across all Experiments — CT1, CT2, CT3
The final set of analyses involved comparing EVRC-A and EVRC-B across all three formal
experiments, CT1, CT2, and CT3. Table 17 shows the results of the Global ANOVA across all
three experiments for each of the three ADRs of EVRC-B.
Table 17. Results of Global ANOVA Over Exps. CT1, CT2, and CT3
CT1, CT2, CT3      F-ratios                   MOS/OVRL
Comparison         Codecs    Cod. x Cond.     A        B        Diff.
A vs. B at 9.3     2.31      0.20             3.682    3.715    -0.033
A vs. B at 6.6     1.28      0.01             3.682    3.660     0.022
A vs. B at 5.8     4.66      0.42             3.682    3.628     0.054
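The mean scores in Table 17 appear to be the unweighted averages of the per-experiment means in Tables 14, 15, and 16. The Python sketch below checks that arithmetic; it is an observation from the tabulated values, not a description of how the Listening Laboratory computed them, and the names are hypothetical.

```python
# Per-experiment means (CT1, CT2, CT3) copied from Tables 14, 15, and 16.
EXPERIMENT_MEANS = {
    "EVRC-A": [3.859, 3.539, 3.647],
    "EVRC-B at 9.3 kbps": [3.897, 3.553, 3.694],
    "EVRC-B at 6.6 kbps": [3.835, 3.520, 3.625],
    "EVRC-B at 5.8 kbps": [3.780, 3.482, 3.623],
}


def global_means(means=EXPERIMENT_MEANS):
    """Unweighted average over the three experiments, rounded as in Table 17."""
    return {codec: round(sum(vals) / len(vals), 3) for codec, vals in means.items()}


for codec, mean in global_means().items():
    print(f"{codec}: {mean:.3f}")
```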
The F-ratio for the ANOVA comparing EVRC-A vs. EVRC-B at 5.8 kbps is significant and a
comparison of the means shows that EVRC-A is significantly better than EVRC-B at 5.8 kbps
across the three subjective experiments.
Figure 8 shows average scores over all three formal subjective experiments for EVRC-A and the
three ADRs of EVRC-B. The figure illustrates the relative performance of EVRC-A and various
ADRs of EVRC-B across a wide range of test conditions. In general the figure shows that:
• EVRC-B at 9.3 kbps performs slightly better than EVRC-A
• EVRC-B at 6.6 kbps performs slightly worse than EVRC-A
• EVRC-B at 5.8 kbps performs significantly worse than EVRC-A
[Figure: MOS/OVRL (y-axis, 3.4 to 4.0) for EVRC-A and EVRC-B at 9.3, 6.6, and 5.8 kbps in Exps. CT1, CT2, and CT3]
Fig. 8 Performance of EVRC-A and EVRC-B Across All Three Experiments.
7. References
[1] 3GPP2-C11-20051205-004b: EVRC-B characterization test plan and schedule; January, 2006.
[2] EVRC-A: TIA/EIA-IS-127, Enhanced Variable Rate Codec, Speech Service Option 3 for
    Wideband Spread Spectrum Digital Systems, September, 1996.
[3] ITU-T Rec. G.723: Dual Rate Speech coder for Multi-Media communication
    Transmitting at 5.3 and 6.3 kbit/sec (11/1996).
[4] ITU-T Rec. G.191: Software Tool Library (12/2000).
[5] SMV: C-S0034-0 v1.0, Minimum Performance Specification for the Selectable Mode
    Vocoder (SMV), Speech Service Option 56 for Wideband Spread Spectrum Digital
    Systems; April, 2004.
[6] ITU-T Rec. P.800: Methods for subjective determination of transmission quality
    (08/1996).
[7] ITU-T Rec. P.835: Subjective test methodology for evaluating speech communication
    systems that include noise suppression algorithm (11/2003).
Appendix – Host Lab Processing scripts
Characterization Test Scripts
Processing Script for Experiment CT1 – Clean Channel conditions
Processing Script for Experiment CT2 – Impaired Channel conditions
Processing Script for Experiment CT3 – Background Noise Conditions
Processing Script for Experiment CT4 – Bad Rate Handling
Processing Script for Experiment CT5 – Music Signals