Dallas, TX March 27-31, 2006 C11-20060424-015
3GPP2/TSG-C1.1

TITLE: Characterization Final Test Report for EVRC-Release B

SOURCE: Alan Sharpley
Dynastat
6850 Austin Center Blvd., Suite 150
Austin, TX 78731
Phone: (512) 476-4797
Fax: (512) 472-2883
Email: asharpley@dynastat.com

ABSTRACT: This contribution provides a report of the activities of the Host and Listening Laboratories for the Characterization Test for the cdma2000® standardized speech codec EVRC, Release B. The standardization effort was undertaken by the Voice Services Sub-Working Group, C1.1, of 3GPP2, TSG-C. The contribution presents results of the subjective tests and statistical analyses of the test data.

RECOMMENDATION: Review and approve.

Dynastat grants a free, irrevocable license to 3GPP2 and its Organizational Partners to incorporate text or other copyrightable material contained in the contribution and any modifications thereof in the creation of 3GPP2 publications; to copyright and sell in Organizational Partner's name any Organizational Partner's standards publication even though it may include all or portions of this contribution; and at the Organizational Partner's sole discretion to permit others to reproduce in whole or in part such contribution or the resulting Organizational Partner's standards publication. Dynastat is also willing to grant licenses under such contributor copyrights to third parties on reasonable, non-discriminatory terms and conditions for purpose of practicing an Organizational Partner's standard which incorporates this contribution.

This document has been prepared by Dynastat to assist the development of specifications by 3GPP2. It is proposed to the Committee as a basis for discussion and is not to be construed as a binding proposal on Dynastat. Dynastat specifically reserves the right to amend or modify the material contained herein and to any intellectual property of Dynastat other than provided in the copyright statement above.
Kansas City, MO April 24-28, 2006 C11-20060424-015

CONTENTS
1. INTRODUCTION
2. HOST LAB ACTIVITIES
2.1. DELIVERY OF EXECUTABLES
2.2. PROCESSING OF THE SPEECH MATERIALS
2.3. CROSSCHECKING OF THE PROCESSED SPEECH MATERIALS
2.4. CT0 – AVERAGE DATA RATE MEASUREMENT
3. LISTENING LAB ACTIVITIES FOR THE CHARACTERIZATION TEST
3.1. LISTENING INSTRUMENT
3.2. FORMAL LISTENING TESTS
3.2.1. Experiment CT1 - Clean Channel Conditions
3.2.2. Experiment CT2 – Impaired Channel Conditions (FER)
3.2.3. Experiment CT3 – Background Noise Conditions
3.3. EXPERT LISTENING TESTS
3.3.1. Experiment CT4 – Bad Rate Handling
3.3.2. Experiment CT5 - Performance with Music Signals
4. TERMS OF REFERENCE TESTS
5. UNITY GAIN REQUIREMENT
6. ADDITIONAL ANALYSES
6.1. PERFORMANCE ACROSS BIT-RATES
6.2. DUNNETT'S TESTS
6.3. GLOBAL ANALYSES
6.3.1. ANOVA across Conditions within Experiments
6.3.2. ANOVA Across all Experiments – CT1, CT2, CT3
7. REFERENCES
APPENDIX – HOST LAB PROCESSING SCRIPTS
CHARACTERIZATION TEST SCRIPTS
Processing Script for Experiment CT1 – Clean Channel conditions
Processing Script for Experiment CT2 – Impaired Channel conditions
Processing Script for Experiment CT3 – Background Noise Conditions
Processing Script for Experiment CT4 – Bad Rate Handling
Processing Script for Experiment CT5 – Music Signals

© 2004, Third Generation Partnership Project 2. All Rights Reserved. Permission is granted for copying, reproducing, or duplicating this document only for the legitimate purposes of 3GPP2. No other copying, reproduction, duplication, or distribution is permitted.

1. Introduction
The Voice Services Sub-Working Group of 3GPP2, TSG-C developed a Characterization Test Plan [1] describing a series of experiments designed to characterize the performance of Release B of the cdma2000 Enhanced Variable-Rate Speech Codec (EVRC).
The codec, designated EVRC-B, is designed to operate at several operating points, each corresponding to a different Average Data Rate (ADR). Subjective quality targets were set for three operating points of EVRC-B, and for its half-rate-max mode, relative to the quality of the standardized speech codecs EVRC-A [2] and G.723 [3]. The subjective tests were designed to determine whether the various operating points of EVRC-B met those quality targets. Statistical tests of whether these targets are met are known as Terms of Reference (ToR) tests. The ToRs for EVRC-B are:
• EVRC-B operating at an ADR of 9.3 kbps (the same ADR as EVRC-A): the quality target, or ToR, is to be statistically “better than” (BT) EVRC-A.
• EVRC-B operating at an ADR of 6.6 kbps: the ToR is to be “better than or equal to” EVRC-A. This ToR is equivalent to a test that EVRC-B at 6.6 kbps is “not worse than” (NWT) EVRC-A.
• EVRC-B operating at an ADR of 5.8 kbps: the ToR is NWT G.723 operating at 6.3 kbps.
• EVRC-B half-rate max (HRM): the ToR is BT EVRC-A half-rate max.

Table 1 summarizes the ToRs and the test and reference codecs involved in the Characterization Test.

Table 1. Summary of Terms of Reference Tests for the EVRC-B Characterization Test
Role | Codec | ADR | Term of Reference
Ref. | EVRC-A | 9.3 kbps | –
Ref. | G.723 | 6.3 kbps | –
Ref. | EVRC-A, HRM | 4.8 kbps | –
Test 1 | EVRC-B | 9.3 kbps | BT EVRC-A
Test 2 | EVRC-B | 6.6 kbps | NWT EVRC-A
Test 3 | EVRC-B | 5.8 kbps | NWT G.723 at 6.3 kbps
Test 4 | EVRC-B, HRM | 4.8 kbps | BT EVRC-A, HRM

Table 2 summarizes the objective and subjective tests involved in the Characterization Test (CT). Objective test CT0 was designed to verify the ADR of the codec under various test conditions. Subjective experiments CT1–CT5 were designed to characterize the performance of the EVRC-B floating-point executable and to test the ToRs.
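The BT and NWT criteria can be read as one-sided tests on the difference between a reference-codec mean and a test-codec mean; Table 9 reports them that way (Diff. = reference mean minus test mean, divided by its standard error, SEMD). The sketch below illustrates the decision rules under stated assumptions: SEMD is taken as the independent-samples standard error sqrt(sd_ref²/n + sd_test²/n) with n = 256 ratings per condition (32 listeners × 8 talkers in Exps. CT1 and CT2), and the one-sided critical value of 1.645 (α = 0.05, large-sample normal approximation) is assumed, since this report does not restate the criterion. The type and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """Summary statistics for one test condition (hypothetical helper type)."""
    mean: float  # MOS or OVRL mean
    sd: float    # standard deviation of the individual ratings
    n: int       # number of ratings behind the mean


def semd(ref: Condition, test: Condition) -> float:
    """Standard error of the difference between two independent means."""
    return math.sqrt(ref.sd ** 2 / ref.n + test.sd ** 2 / test.n)


def tor_t(ref: Condition, test: Condition) -> float:
    """t on Diff = ref - test; negative when the test codec scores higher."""
    return (ref.mean - test.mean) / semd(ref, test)


# Assumed one-sided critical value (alpha = 0.05, large-sample approximation).
T_CRIT = 1.645


def better_than(ref: Condition, test: Condition, t_crit: float = T_CRIT) -> bool:
    """BT passes only if the test codec is significantly higher than the reference."""
    return tor_t(ref, test) < -t_crit


def not_worse_than(ref: Condition, test: Condition, t_crit: float = T_CRIT) -> bool:
    """NWT passes unless the reference is significantly higher than the test codec."""
    return tor_t(ref, test) < t_crit


if __name__ == "__main__":
    n = 256  # assumed ratings per condition: 32 listeners x 8 talkers
    # Exp. CT2, 1% FER: EVRC-A (21A0) vs EVRC-B at 9.3 kbps (21B0), values from Table 9.
    evrc_a = Condition(3.73, 0.83, n)
    evrc_b = Condition(3.88, 0.85, n)
    print(round(tor_t(evrc_a, evrc_b), 2), better_than(evrc_a, evrc_b))
```

With the rounded means from Table 9 this gives t of about -2.02 and a BT pass; Table 9 lists -2.10, a small difference presumably due to rounding of the tabulated means.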
Dynastat contracted with Qualcomm to perform the functions of both the Host Laboratory and the Listening Laboratory for the Characterization Test for EVRC-B. This document reports the results of those tests.

Table 2. Summary of Subjective Tests involved in the EVRC-B Characterization Test
Experiment | Description | Methodology
CT0 | Average Data Rate (ADR) Measurement | Objective Measure
CT1 | Clean Channel Conditions | Naïve Listener MOS (P.800)
CT2 | Channel Error Conditions | Naïve Listener MOS (P.800)
CT3 | Noise/DTX Operation | Naïve Listener P-NSA (P.835)
CT4 | Bad Rate Handling | Expert Listening
CT5 | Music Handling | Expert Listener MOS

2. Host Laboratory Activities
Section 2 of the CT test plan described the functions of the Host Laboratory and the processing of the speech materials for the subjective experiments. Dynastat complied fully with the test plan specifications for conducting the Host Laboratory processing.

2.1. Delivery of executables
Qualcomm delivered the executable files required for processing the test conditions. These included the floating-point encoder and decoder for the EVRC-B codec and for the latest published version of the EVRC-A codec. Qualcomm also delivered an updated version of the software tool (fersig27) that outputs ADR measures for cdma2000 codecs. All other software tools used in the processing were part of the ITU-T Software Tool Library [4].

2.2. Processing of the speech materials
Section 2 of the CT test plan specified the Host Lab processing of the speech and music materials for the subjective tests. The source speech materials were specified to be the source speech files used in the 3GPP2 standardization of the SMV speech codec [5].
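Section 2.1 mentions fersig27, the tool that outputs the ADR values reported for Exp. CT0 in Table 3. As a rough illustration of what such a measure summarizes, the sketch below assumes that ADR is the average, over all 20 ms frames of a file, of the cdma2000 Rate Set 1 channel rate assigned to each frame (9.6, 4.8, 2.4, and 1.2 kbps for full, half, quarter, and eighth rate). How fersig27 itself treats special frames (for example, blanked or erased frames) is not described in this report, and the function name and frame mix are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Assumed cdma2000 Rate Set 1 channel rates, in kbps, for each 20 ms frame.
RATE_SET_1_KBPS = {"full": 9.6, "half": 4.8, "quarter": 2.4, "eighth": 1.2}


def average_data_rate(frame_rates: Iterable[str]) -> float:
    """Return the frame-weighted mean channel rate (kbps) of a sequence of rate decisions."""
    counts = Counter(frame_rates)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no frames supplied")
    unknown = set(counts) - set(RATE_SET_1_KBPS)
    if unknown:
        raise ValueError(f"unknown rate labels: {sorted(unknown)}")
    return sum(RATE_SET_1_KBPS[r] * c for r, c in counts.items()) / total


if __name__ == "__main__":
    # Hypothetical file: 50% full-rate, 20% half-rate, 30% eighth-rate frames.
    frames = ["full"] * 50 + ["half"] * 20 + ["eighth"] * 30
    print(f"ADR = {average_data_rate(frames):.2f} kbps")
```

Counting frames per rate, rather than summing per-frame values directly, keeps the sketch close to how an encoder's rate statistics are usually tallied; the real tool reads the codec's packet files, which this sketch does not attempt to parse.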
The test plan also contained sample scripts for processing each condition involved in the tests. Dynastat developed Windows Cygwin scripts for processing the input speech and music signals for the test conditions in compliance with the sample scripts and specifications in the CT test plan. The processing scripts for the subjective tests are contained in the Appendix.

2.3. Crosschecking of the Processed Speech Materials
Dynastat coordinated with Qualcomm to crosscheck the processed materials for the subjective experiments and so ensure that the Host Lab processing was correct. Dynastat provided a short version of the speech database to Qualcomm for crosschecking purposes, processed the short sample through each condition, and provided the processed files to Qualcomm to compare against files generated independently by Qualcomm's own scripts. Bit-exact comparisons were performed on the two sets of files, and in every case any discrepancies were resolved to the satisfaction of both parties.

2.4. CT0 – Average Data Rate Measurement
The test plan specified an Average Data Rate test for the EVRC-B codec. The test involved processing each of six speech source files through eight operating points and recording the ADR values output by the fersig27 software tool. Table 3 shows the ADR values for the CT0 test; each column is one operating point (labelled by its target ADR) and each row is one source file.

Table 3. Exp. CT0 - Average Data Rate Values for EVRC-B (kbps)
Source \ Target ADR (kbps) | 9.3 | 8.4 | 7.8 | 7.4 | 7.0 | 6.6 | 6.2 | 5.8
Nom | 9.3422 | 8.4688 | 7.8782 | 7.2865 | 6.9622 | 6.4230 | 6.1238 | 5.8240
Low | 9.2737 | 8.4692 | 7.9303 | 7.3169 | 7.0094 | 6.4402 | 6.1395 | 5.8065
High | 9.3593 | 8.4571 | 7.8790 | 7.2685 | 6.9390 | 6.4222 | 6.1133 | 5.8119
Car15 | 9.3021 | 8.4686 | 7.9237 | 7.2419 | 6.9172 | 6.5579 | 6.2685 | 6.0765
Street15 | 9.2114 | 8.4925 | 8.0107 | 7.2761 | 7.0022 | 6.6145 | 6.3090 | 6.1295
OffBab20 | 9.1745 | 8.4545 | 7.9841 | 7.2748 | 6.9826 | 6.5431 | 6.2438 | 6.0212

3. Listening Laboratory Activities for the Characterization Test
The test plan described the functions of the Listening Laboratory and the subjective tests to be conducted for the CT. Dynastat complied fully with the test plan specifications for conducting the Listening Lab activities; there were no deviations from the test plan in the conduct of the subjective tests. Furthermore, all subjective tests were conducted according to the guidelines in the appropriate ITU-T Recommendations for subjective testing.

The test plan described five subjective tests. Three of those tests verified requirements using naïve listeners, i.e., formal listening tests; the other two used expert listeners to verify codec requirements.

3.1. Listening Instrument
This section describes the listening instrument for all experiments conducted in the CT. The speech files were played through a Townshend DAT-LINK+ and recorded on a Panasonic SV3800 Digital Audio Tape (DAT) recorder in the appropriate randomized presentation order. The speech materials were presented to panels of listeners seated at separate, visually screened listening stations in a Tracoustics soundproof room with an overall ambient noise level of less than 29 dBA. The speech materials were presented monaurally over Sennheiser HD-25 supra-aural/closed-back headsets. The other ear was uncovered, and Hoth noise was presented in the listening room to provide an ambient noise level of 30 dBA.
A Panasonic SV3800 DAT player was used to play the materials. The audio output of the DAT player was routed to an audio distribution amplifier set to deliver narrowband speech to the listeners at an active level of 79 dB SPL at the ear. Calibration was performed using a B&K 4153 Artificial Ear with circumaural headphone adaptor, 4134 Microphone element, 2669 Microphone Preamplifier, and 2609 Measurement Amplifier. Each listening station in the sound room was equipped with a personal computer for presenting the appropriate rating scales and collecting the listeners' ratings.

3.2. Formal Listening Tests
The test plan described three formal listening tests to characterize the performance of EVRC-B. Each experiment included test conditions designed to evaluate the codec performance requirements and objectives for EVRC-B and to test the appropriate ToRs. The organization of the formal subjective experiments followed the test plans of previous 3GPP2 codec standardization exercises, i.e., separate experiments testing the codec in Clean Channel conditions, in Impaired Channel conditions, and in Background Noise.

3.2.1. Experiment CT1 - Clean Channel Conditions
The test plan specified the design parameters for the subjective experiments. Exp. CT1 was specified as using the Absolute Category Rating (ACR) methodology described in ITU-T Rec. P.800 [6]. The ACR method yields the Mean Opinion Score (MOS) as an estimate of overall speech quality. The experimental design involved 32 test conditions, eight talkers (four male, four female), and eight speech samples (i.e., sentence-pairs) per talker.
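An ACR MOS is the arithmetic mean of listeners' votes on the five-point P.800 scale, and the result tables for the formal tests report each mean with its standard deviation and a 95% confidence interval. The sketch below shows one way to compute these, assuming a normal-approximation interval (mean ± 1.96·sd/√n); the report does not state the exact interval formula, and the function names are hypothetical. With n = 256 ratings per condition (32 listeners × 8 talkers, per the panel description that follows), the summary-statistics form reproduces the interval for condition 1 of Table 4 (mean 3.91, Stdev 0.81, CI 3.81 to 4.01).

```python
import math
import statistics
from typing import Sequence, Tuple

Z_95 = 1.96  # two-sided 95% normal quantile (assumed; a t quantile differs little at n = 256)


def ci_from_summary(mean: float, sd: float, n: int, z: float = Z_95) -> Tuple[float, float]:
    """95% confidence interval for a mean, from its standard deviation and count."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def mos_summary(votes: Sequence[int]) -> Tuple[float, float, Tuple[float, float]]:
    """MOS, sample standard deviation, and 95% CI for a list of ACR votes (1..5)."""
    if any(v not in (1, 2, 3, 4, 5) for v in votes):
        raise ValueError("ACR votes must be integers from 1 to 5")
    mos = statistics.fmean(votes)
    sd = statistics.stdev(votes)  # requires at least two votes
    return mos, sd, ci_from_summary(mos, sd, len(votes))


if __name__ == "__main__":
    low, high = ci_from_summary(3.91, 0.81, 32 * 8)
    print(f"{low:.2f} to {high:.2f}")  # Table 4, condition 1: 3.81 to 4.01
```

Working from summary statistics makes it easy to check any row of the tables; `mos_summary` is the equivalent starting point when the individual votes are available.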
The processed speech materials were presented to eight panels of four listeners (32 listeners total) in a partially balanced, randomized blocks experimental design. Each listening panel heard a randomized presentation order of 32 conditions × 8 talkers. Table 4 shows summary results for Exp. CT1. For each test condition in the experiment, the table shows a condition description followed by MOS means and standard deviations for Male Talkers (n=4), Female Talkers (n=4), and All Talkers (n=8). The table also includes the 95% Confidence Interval for the All Talkers average. Each value in the table is based on the ratings of 32 listeners.

Table 4. Summary Results for MOS Experiment CT1 – Clean Channel Conditions
# | File | Test Condition | Codec | Male Mean | Male Stdev | Female Mean | Female Stdev | All Mean | All Stdev | LCI-95% | UCI-95%
1 | 1NA0 | Nominal input level | EVRC-A at 9.3 kbps | 3.95 | 0.84 | 3.87 | 0.79 | 3.91 | 0.81 | 3.81 | 4.01
2 | 1NG7 | Nominal input level | G.723 at 6.3 kbps | 3.61 | 0.88 | 3.54 | 0.90 | 3.57 | 0.89 | 3.47 | 3.68
3 | 1NB0 | Nominal input level | EVRC-B at 9.3 kbps | 3.99 | 0.87 | 3.81 | 0.89 | 3.90 | 0.89 | 3.79 | 4.01
4 | 1NB1 | Nominal input level | EVRC-B at 6.6 kbps | 3.99 | 0.80 | 3.69 | 0.91 | 3.84 | 0.87 | 3.73 | 3.95
5 | 1NB2 | Nominal input level | EVRC-B at 5.8 kbps | 3.84 | 0.85 | 3.75 | 0.89 | 3.80 | 0.87 | 3.69 | 3.90
6 | 1LA0 | Low input level | EVRC-A at 9.3 kbps | 3.98 | 0.88 | 3.70 | 0.93 | 3.84 | 0.91 | 3.73 | 3.95
7 | 1LG7 | Low input level | G.723 at 6.3 kbps | 3.61 | 0.83 | 3.54 | 0.91 | 3.57 | 0.87 | 3.47 | 3.68
8 | 1LB0 | Low input level | EVRC-B at 9.3 kbps | 3.91 | 0.93 | 3.84 | 0.75 | 3.87 | 0.85 | 3.77 | 3.97
9 | 1LB1 | Low input level | EVRC-B at 6.6 kbps | 3.92 | 0.83 | 3.84 | 0.86 | 3.88 | 0.85 | 3.78 | 3.99
10 | 1LB2 | Low input level | EVRC-B at 5.8 kbps | 3.95 | 0.90 | 3.78 | 0.88 | 3.87 | 0.89 | 3.76 | 3.98
11 | 1HA0 | High input level | EVRC-A at 9.3 kbps | 3.87 | 0.89 | 3.80 | 0.85 | 3.83 | 0.87 | 3.73 | 3.94
12 | 1HG7 | High input level | G.723 at 6.3 kbps | 3.72 | 0.90 | 3.58 | 0.87 | 3.65 | 0.88 | 3.54 | 3.76
13 | 1HB0 | High input level | EVRC-B at 9.3 kbps | 3.93 | 0.78 | 3.91 | 0.83 | 3.92 | 0.80 | 3.82 | 4.02
14 | 1HB1 | High input level | EVRC-B at 6.6 kbps | 3.77 | 0.87 | 3.80 | 0.83 | 3.78 | 0.85 | 3.68 | 3.89
15 | 1HB2 | High input level | EVRC-B at 5.8 kbps | 3.69 | 0.91 | 3.66 | 0.88 | 3.68 | 0.89 | 3.57 | 3.79
16 | 1TA0 | Tandem with EVRC-A | EVRC-A at 9.3 kbps | 3.74 | 0.94 | 3.67 | 0.76 | 3.71 | 0.86 | 3.60 | 3.81
17 | 1TG7 | Tandem with EVRC-A | G.723 at 6.3 kbps | 3.74 | 0.85 | 3.52 | 0.82 | 3.63 | 0.84 | 3.53 | 3.74
18 | 1TB0 | Tandem with EVRC-A | EVRC-B at 9.3 kbps | 3.85 | 0.90 | 3.78 | 0.89 | 3.82 | 0.89 | 3.71 | 3.93
19 | 1TB1 | Tandem with EVRC-A | EVRC-B at 6.6 kbps | 3.73 | 0.88 | 3.56 | 0.93 | 3.64 | 0.91 | 3.53 | 3.76
20 | 1TB2 | Tandem with EVRC-A | EVRC-B at 5.8 kbps | 3.79 | 0.90 | 3.61 | 0.82 | 3.70 | 0.86 | 3.59 | 3.80
21 | 1AAH | Additional conditions | EVRC-A half-rate max | 2.81 | 0.94 | 3.07 | 0.90 | 2.94 | 0.92 | 2.83 | 3.05
22 | 1ABH | Additional conditions | EVRC-B half-rate max | 3.38 | 0.89 | 3.39 | 0.92 | 3.39 | 0.90 | 3.28 | 3.50
23 | 1AB0 | Additional conditions | EVRC-A -> EVRC-B at 9.3 kbps | 3.73 | 0.86 | 3.70 | 0.90 | 3.72 | 0.88 | 3.61 | 3.83
24 | 1AB1 | Additional conditions | EVRC-A -> EVRC-B at 6.6 kbps | 3.77 | 0.87 | 3.50 | 0.96 | 3.63 | 0.93 | 3.52 | 3.75
25 | 1AB2 | Additional conditions | EVRC-A -> EVRC-B at 5.8 kbps | 3.71 | 0.91 | 3.54 | 0.88 | 3.63 | 0.89 | 3.52 | 3.73
26 | 1QSR | Direct | – | 4.02 | 0.93 | 4.07 | 0.82 | 4.05 | 0.87 | 3.94 | 4.15
27 | 1Q03 | 3 dB MNRU | – | 1.34 | 0.75 | 1.22 | 0.74 | 1.28 | 0.74 | 1.19 | 1.37
28 | 1Q10 | 10 dB MNRU | – | 1.77 | 0.82 | 1.70 | 0.80 | 1.73 | 0.81 | 1.63 | 1.83
29 | 1Q17 | 17 dB MNRU | – | 2.73 | 0.82 | 2.56 | 0.88 | 2.64 | 0.85 | 2.54 | 2.75
30 | 1Q24 | 24 dB MNRU | – | 3.55 | 0.89 | 3.43 | 0.92 | 3.49 | 0.90 | 3.38 | 3.60
31 | 1Q31 | 31 dB MNRU | – | 3.95 | 0.84 | 3.87 | 0.82 | 3.91 | 0.83 | 3.81 | 4.01
32 | 1Q38 | 38 dB MNRU | – | 3.90 | 0.89 | 3.92 | 0.87 | 3.91 | 0.88 | 3.80 | 4.02

Figure 1 shows the MOS profile for the unprocessed (Direct) and MNRU reference conditions. The form of the function shown in the figure is typical of those obtained in ACR experiments.

[Figure: MOS (1 to 5) for MNRU at 3, 10, 17, 24, 31, and 38 dB and for the Direct condition]
Fig.1. MOS Profile for the MNRU Reference Conditions in Exp. CT1

3.2.2. Experiment CT2 – Impaired Channel Conditions (FER)
Exp. CT2 used the same design parameters as Exp. CT1: an ACR test with 32 conditions, eight talkers, and 32 listeners. Table 5 shows summary results for Exp. CT2, and Fig. 2 shows the MOS profile for the reference conditions involved in the experiment. Again, the form of the function is typical of those obtained in ACR experiments.

Table 5. Summary Results for MOS Experiment CT2 – Impaired Channel Conditions
# | File | Test Condition | Codec | Male Mean | Male Stdev | Female Mean | Female Stdev | All Mean | All Stdev | LCI-95% | UCI-95%
1 | 20A0 | 0% FER | EVRC-A at 9.3 kbps | 3.87 | 0.78 | 3.89 | 0.83 | 3.88 | 0.80 | 3.78 | 3.98
2 | 20B0 | 0% FER | EVRC-B at 9.3 kbps | 4.09 | 0.82 | 3.88 | 0.76 | 3.98 | 0.80 | 3.89 | 4.08
3 | 20B1 | 0% FER | EVRC-B at 6.6 kbps | 4.02 | 0.70 | 3.75 | 0.83 | 3.89 | 0.78 | 3.79 | 3.98
4 | 20B2 | 0% FER | EVRC-B at 5.8 kbps | 3.78 | 0.81 | 3.59 | 0.86 | 3.68 | 0.84 | 3.58 | 3.79
5 | 21A0 | 1% FER | EVRC-A at 9.3 kbps | 3.79 | 0.88 | 3.66 | 0.79 | 3.73 | 0.83 | 3.62 | 3.83
6 | 21B0 | 1% FER | EVRC-B at 9.3 kbps | 3.92 | 0.87 | 3.84 | 0.83 | 3.88 | 0.85 | 3.78 | 3.99
7 | 21B1 | 1% FER | EVRC-B at 6.6 kbps | 3.90 | 0.73 | 3.72 | 0.82 | 3.81 | 0.78 | 3.71 | 3.90
8 | 21B2 | 1% FER | EVRC-B at 5.8 kbps | 3.79 | 0.84 | 3.59 | 0.87 | 3.69 | 0.86 | 3.58 | 3.79
9 | 22A0 | 2% FER + 1% D&B | EVRC-A at 9.3 kbps | 3.54 | 0.83 | 3.48 | 0.86 | 3.51 | 0.84 | 3.41 | 3.62
10 | 22B0 | 2% FER + 1% D&B | EVRC-B at 9.3 kbps | 3.34 | 0.88 | 3.47 | 0.80 | 3.41 | 0.84 | 3.30 | 3.51
11 | 22B1 | 2% FER + 1% D&B | EVRC-B at 6.6 kbps | 3.45 | 0.72 | 3.42 | 0.75 | 3.43 | 0.73 | 3.34 | 3.52
12 | 22B2 | 2% FER + 1% D&B | EVRC-B at 5.8 kbps | 3.38 | 0.93 | 3.48 | 0.79 | 3.43 | 0.86 | 3.32 | 3.54
13 | 25A0 | 5% FER | EVRC-A at 9.3 kbps | 3.51 | 0.87 | 3.25 | 0.82 | 3.38 | 0.85 | 3.27 | 3.48
14 | 25B0 | 5% FER | EVRC-B at 9.3 kbps | 3.52 | 0.85 | 3.22 | 0.79 | 3.37 | 0.83 | 3.27 | 3.47
15 | 25B1 | 5% FER | EVRC-B at 6.6 kbps | 3.48 | 0.92 | 3.16 | 0.84 | 3.32 | 0.89 | 3.21 | 3.43
16 | 25B2 | 5% FER | EVRC-B at 5.8 kbps | 3.36 | 0.85 | 3.30 | 0.85 | 3.33 | 0.85 | 3.22 | 3.43
17 | 20AH | 0% FER | EVRC-A half-rate max | 2.73 | 0.77 | 2.95 | 0.74 | 2.84 | 0.76 | 2.75 | 2.93
18 | 20BH | 0% FER | EVRC-B half-rate max | 3.48 | 0.81 | 3.32 | 0.76 | 3.40 | 0.79 | 3.30 | 3.50
19 | 21AH | 1% FER | EVRC-A half-rate max | 2.59 | 0.80 | 2.84 | 0.73 | 2.71 | 0.77 | 2.62 | 2.81
20 | 21BH | 1% FER | EVRC-B half-rate max | 3.30 | 0.85 | 3.20 | 0.70 | 3.25 | 0.78 | 3.15 | 3.34
21 | 21BA | 1% FER | EVRC-B at 6.2 kbps | 3.82 | 0.78 | 3.60 | 0.85 | 3.71 | 0.82 | 3.61 | 3.81
22 | 21BB | 1% FER | EVRC-B at 7.0 kbps | 3.81 | 0.86 | 3.77 | 0.79 | 3.79 | 0.82 | 3.69 | 3.89
23 | 21BC | 1% FER | EVRC-B at 7.4 kbps | 3.98 | 0.79 | 3.73 | 0.75 | 3.86 | 0.78 | 3.76 | 3.95
24 | 21BD | 1% FER | EVRC-B at 7.8 kbps | 3.91 | 0.79 | 3.85 | 0.79 | 3.88 | 0.79 | 3.79 | 3.98
25 | 21BE | 1% FER | EVRC-B at 8.4 kbps | 3.92 | 0.76 | 3.77 | 0.78 | 3.84 | 0.77 | 3.75 | 3.94
26 | 2QSR | Direct | – | 4.17 | 0.88 | 4.02 | 0.78 | 4.10 | 0.83 | 4.00 | 4.20
27 | 2Q03 | 3 dB MNRU | – | 1.20 | 0.63 | 1.13 | 0.48 | 1.16 | 0.56 | 1.10 | 1.23
28 | 2Q10 | 10 dB MNRU | – | 1.76 | 0.68 | 1.59 | 0.67 | 1.68 | 0.68 | 1.59 | 1.76
29 | 2Q17 | 17 dB MNRU | – | 2.89 | 0.92 | 2.61 | 0.80 | 2.75 | 0.87 | 2.64 | 2.86
30 | 2Q24 | 24 dB MNRU | – | 3.70 | 0.83 | 3.48 | 0.80 | 3.59 | 0.82 | 3.49 | 3.69
31 | 2Q31 | 31 dB MNRU | – | 4.05 | 0.91 | 3.89 | 0.74 | 3.97 | 0.83 | 3.87 | 4.07
32 | 2Q38 | 38 dB MNRU | – | 4.12 | 0.81 | 4.02 | 0.74 | 4.07 | 0.77 | 3.98 | 4.17

[Figure: MOS (1 to 5) for MNRU at 3, 10, 17, 24, 31, and 38 dB and for the Direct condition]
Fig.2. MOS Profile for the MNRU Reference Conditions in Exp. CT2

3.2.3. Experiment CT3 – Background Noise Conditions
Exp. CT3 was specified as using the P-NSA test methodology described in ITU-T Rec. P.835 [7]. The P-NSA methodology is specifically designed to evaluate the quality of speech in background noise. It yields a measure of Signal Quality (SIG), a measure of Background Quality (BAK), and a measure of Overall Quality (OVRL). In general, OVRL scores are highly correlated with MOS, but the OVRL rating scale provides greater sensitivity and precision in test conditions involving background noise.
While the OVRL score is of most interest here, the SIG and BAK scores also provide valuable diagnostic information. The experimental design involved 36 test conditions, six talkers (three male, three female), and eight speech samples (i.e., sentence-triads) per talker. The processed speech materials were presented to eight panels of four listeners (32 listeners total) in a partially balanced, randomized blocks experimental design. Each listening panel heard a randomized presentation order of 36 conditions × 6 talkers.

Table 6 shows summary results for Exp. CT3. For each test condition in the experiment, the table shows a condition description followed by means and standard deviations for the SIG, BAK, and OVRL scores. The table also includes the 95% Confidence Interval for the OVRL score. Each value in the table is based on the ratings of 32 listeners.

Table 6. Summary Results for P-NSA Experiment CT3 – Background Noise Conditions
# | File | Noise Condition | Codec / Reference | SIG Mean | SIG Stdev | BAK Mean | BAK Stdev | OVRL Mean | OVRL Stdev | OVRL LCI-95% | OVRL UCI-95%
1 | 3CA0 | Car noise at 15 dB | EVRC-A at 9.3 kbps | 3.79 | 0.87 | 3.36 | 0.77 | 3.54 | 0.73 | 3.43 | 3.64
2 | 3CAH | Car noise at 15 dB | EVRC-A half-rate max | 3.08 | 0.94 | 3.18 | 0.87 | 2.92 | 0.68 | 2.82 | 3.01
3 | 3CB0 | Car noise at 15 dB | EVRC-B at 9.3 kbps | 3.95 | 0.86 | 3.51 | 0.71 | 3.61 | 0.73 | 3.51 | 3.71
4 | 3CB1 | Car noise at 15 dB | EVRC-B at 6.6 kbps | 3.76 | 0.86 | 3.42 | 0.79 | 3.47 | 0.78 | 3.36 | 3.58
5 | 3CB2 | Car noise at 15 dB | EVRC-B at 5.8 kbps | 3.80 | 0.85 | 3.45 | 0.74 | 3.56 | 0.73 | 3.46 | 3.67
6 | 3CBH | Car noise at 15 dB | EVRC-B half-rate max | 3.44 | 0.89 | 3.33 | 0.81 | 3.20 | 0.79 | 3.09 | 3.31
7 | 3CX0 | Car noise at 15 dB | EVRC-B at 9.3 kbps w/DTX | 4.01 | 0.81 | 3.51 | 0.73 | 3.61 | 0.76 | 3.50 | 3.72
8 | 3CX2 | Car noise at 15 dB | EVRC-B at 5.8 kbps w/DTX | 3.78 | 0.89 | 3.51 | 0.76 | 3.51 | 0.76 | 3.40 | 3.61
9 | 3SA0 | Street noise at 15 dB + 1% FER | EVRC-A at 9.3 kbps | 3.69 | 0.85 | 3.34 | 0.85 | 3.40 | 0.80 | 3.28 | 3.51
10 | 3SAH | Street noise at 15 dB + 1% FER | EVRC-A half-rate max | 2.96 | 0.91 | 3.10 | 0.88 | 2.80 | 0.78 | 2.69 | 2.91
11 | 3SB0 | Street noise at 15 dB + 1% FER | EVRC-B at 9.3 kbps | 3.83 | 0.84 | 3.41 | 0.80 | 3.49 | 0.77 | 3.38 | 3.60
12 | 3SB1 | Street noise at 15 dB + 1% FER | EVRC-B at 6.6 kbps | 3.65 | 0.97 | 3.33 | 0.83 | 3.44 | 0.72 | 3.34 | 3.54
13 | 3SB2 | Street noise at 15 dB + 1% FER | EVRC-B at 5.8 kbps | 3.72 | 0.92 | 3.45 | 0.80 | 3.45 | 0.80 | 3.34 | 3.56
14 | 3SBH | Street noise at 15 dB + 1% FER | EVRC-B half-rate max | 3.33 | 0.91 | 3.23 | 0.84 | 3.15 | 0.74 | 3.04 | 3.25
15 | 3SX0 | Street noise at 15 dB + 1% FER | EVRC-B at 9.3 kbps w/DTX | 3.73 | 0.85 | 3.30 | 0.82 | 3.41 | 0.70 | 3.31 | 3.51
16 | 3SX2 | Street noise at 15 dB + 1% FER | EVRC-B at 5.8 kbps w/DTX | 3.63 | 0.96 | 3.29 | 0.82 | 3.30 | 0.82 | 3.19 | 3.42
17 | 3OA0 | OffBab noise at 15 dB | EVRC-A at 9.3 kbps | 4.14 | 0.86 | 4.09 | 0.89 | 4.01 | 0.81 | 3.90 | 4.12
18 | 3OAH | OffBab noise at 15 dB | EVRC-A half-rate max | 3.26 | 0.95 | 3.73 | 1.05 | 3.21 | 0.86 | 3.09 | 3.34
19 | 3OB0 | OffBab noise at 15 dB | EVRC-B at 9.3 kbps | 4.20 | 0.84 | 3.97 | 0.98 | 3.98 | 0.76 | 3.88 | 4.09
20 | 3OB1 | OffBab noise at 15 dB | EVRC-B at 6.6 kbps | 4.07 | 0.84 | 3.95 | 0.95 | 3.96 | 0.80 | 3.85 | 4.08
21 | 3OB2 | OffBab noise at 15 dB | EVRC-B at 5.8 kbps | 4.01 | 0.85 | 4.11 | 0.84 | 3.86 | 0.79 | 3.75 | 3.97
22 | 3OBH | OffBab noise at 15 dB | EVRC-B half-rate max | 3.75 | 0.90 | 3.83 | 0.99 | 3.65 | 0.83 | 3.53 | 3.77
23 | 3OX0 | OffBab noise at 15 dB | EVRC-B at 9.3 kbps w/DTX | 4.16 | 0.86 | 4.05 | 0.96 | 4.06 | 0.72 | 3.96 | 4.16
24 | 3OX2 | OffBab noise at 15 dB | EVRC-B at 5.8 kbps w/DTX | 3.95 | 0.87 | 4.02 | 0.95 | 3.91 | 0.77 | 3.80 | 4.02
25 | 3R40 | Car noise, SNR=40dB | MNRU=0dB | 1.35 | 0.67 | 1.96 | 1.32 | 1.22 | 0.58 | 1.14 | 1.31
26 | 3R41 | Car noise, SNR=40dB | MNRU=10dB | 2.33 | 0.92 | 2.76 | 1.19 | 2.26 | 0.78 | 2.15 | 2.37
27 | 3R42 | Car noise, SNR=40dB | MNRU=20dB | 3.68 | 0.95 | 3.68 | 0.83 | 3.60 | 0.78 | 3.49 | 3.71
28 | 3R43 | Car noise, SNR=40dB | MNRU=30dB | 4.27 | 0.80 | 4.10 | 0.80 | 4.15 | 0.70 | 4.05 | 4.25
29 | 3R44 | Car noise, SNR=40dB | MNRU=40dB | 4.36 | 0.75 | 4.19 | 0.78 | 4.27 | 0.72 | 4.16 | 4.37
30 | 3R04 | Car noise, SNR=0dB | MNRU=40dB | 2.19 | 1.22 | 1.17 | 0.51 | 1.25 | 0.57 | 1.17 | 1.33
31 | 3R14 | Car noise, SNR=10dB | MNRU=40dB | 3.24 | 1.21 | 1.76 | 0.86 | 2.09 | 0.78 | 1.98 | 2.20
32 | 3R24 | Car noise, SNR=20dB | MNRU=40dB | 3.81 | 1.05 | 2.55 | 0.81 | 2.96 | 0.78 | 2.85 | 3.07
33 | 3R34 | Car noise, SNR=30dB | MNRU=40dB | 4.10 | 0.85 | 3.30 | 0.81 | 3.63 | 0.76 | 3.52 | 3.73
34 | 3R11 | Car noise, SNR=10dB | MNRU=10dB | 2.37 | 1.13 | 1.47 | 0.81 | 1.55 | 0.71 | 1.45 | 1.65
35 | 3R22 | Car noise, SNR=20dB | MNRU=20dB | 3.60 | 1.02 | 2.41 | 0.77 | 2.79 | 0.77 | 2.68 | 2.90
36 | 3R33 | Car noise, SNR=30dB | MNRU=30dB | 4.10 | 0.85 | 3.30 | 0.72 | 3.63 | 0.78 | 3.51 | 3.74

Figures 3a, 3b, and 3c show the score profiles for SIG, BAK, and OVRL for the three sets of reference conditions included in the P-NSA experiment. Figure 3a shows results for conditions where SNR (car noise) is held constant at 40 dB and MNRU is varied from 0 dB to 40 dB. Figure 3b shows conditions where MNRU is held constant at 40 dB and SNR (car noise) is varied from 0 dB to 40 dB. Figure 3c shows results for conditions where SNR (car noise) and MNRU are varied together from 10 dB to 40 dB. These reference conditions give listeners a frame of reference in two dimensions: signal quality and background quality. The profiles shown in Figs. 3a, 3b, and 3c are typical of those obtained in P-NSA experiments.

[Fig. 3a – SNR=40dB Car Noise: P-NSA scores (SIG, BAK, OVRL) versus MNRU, 0 dB to 40 dB]
[Fig. 3b – MNRU=40dB: P-NSA scores (SIG, BAK, OVRL) versus SNR (car noise), 0 dB to 40 dB]
[Fig. 3c – MNRU = SNR: P-NSA scores (SIG, BAK, OVRL) versus MNRU/SNR (car noise), 10/10 dB to 40/40 dB]
Fig.3. SIG, BAK, and OVRL Score Profiles for the P-NSA Reference Conditions

3.3. Expert Listening Tests

3.3.1. Experiment CT4 – Bad Rate Handling
The test plan specified an experiment to verify the performance of EVRC-B relative to EVRC-A under conditions of bad-rate handling. Dynastat processed eight sentence-pairs for one male and one female talker through EVRC-A and three operating points of EVRC-B under a condition of 1% bad rates. The processed materials were presented to 10 expert listeners, who judged them on three rating scales: Annoyance, Frequency, and Severity of artifacts. The top portion of Table 7 shows the three rating scales, and the bottom portion summarizes the expert listeners' ratings.

Table 7. Results of Exp. CT4 - Expert Listener Ratings of Annoyance, Frequency, and Severity of Artifacts.
Annoyance Scale Frequency Scale Severity Scale 5 Not noticeable 5 Not Noticeable 3 Mild 4 Noticeable but not annoying 4 Infrequent 2 Moderate 3 Somewhat annoying 3 Frequent 1 Severe 2 Annoying 2 Very frequent 1 Very annoying 1 Extremely frequent Codec Annoyance Rating Frequency / Severity of Artifacts EVRC-A 3.70 infrequent / mild EVRC-B at 9.3 kbps 3.10 frequent / mild to moderate EVRC-B at 6.6 kbps 3.05 frequent / mild to moderate EVRC-B at 5.8 kbps 2.55 frequent / moderate 3.3.2. Experiment CT5 - Performance with Music Signals The test plan specified an MOS experiment using expert listeners to characterize the performance of EVRC-B at 9.3 kbps with music signals. The music signal database included four meaningful music samples for each of three genres: Classical, Pop, and Rock. Each music sample was approximately 15 sec. in duration. The processed music materials were presented to four panels of four expert listeners in a partially balanced, randomized blocks experimental design. Each listener rated the two processed conditions, EVRC-A and EVRC-B, and four MNRU reference conditions for each of the three music genres. Table 8 and Fig. 4 shows summary results for Exp. CT5. © 2004, Third Generation Partnership Project 2. All Rights Reserved. Permission is granted for copying, reproducing, or duplicating this document only for the legitimate purposes of 3GPP2. No other copying, reproduction, duplication, or distribution is permitted. 10 Kansas City, MO April 24-28, 2006 C11-20060424-015 Table 8. Results for Exp.CT5 - MOS for Music Signals Exp.CT5 - Music signals Condition File 5MA0 EVRC-A 5MB0 EVRC-B at 9.3 kbps 5M06 MNRU 6dB 5M15 MNRU 15dB 5M24 MNRU 24dB 5MSR Source Scores by Genre Class. 
Pop Rock 1.88 1.75 2.19 2.00 1.75 1.88 1.63 2.06 2.06 3.63 3.56 3.31 4.06 3.75 3.69 4.13 3.44 3.63

All Genres
  MOS     1.94  1.88  1.92  3.50  3.83  3.73
  Stdev   0.73  0.79  0.58  0.65  0.69  0.77
  CI-L    1.73  1.65  1.75  3.32  3.64  3.51
  CI-U    2.14  2.10  2.08  3.69  4.03  3.95

[Fig. 4. MOS for the Test and Reference Conditions in Exp. CT5 – Music Signals. Axes: MOS (1 to 5) vs. Test Condition (6dB, 15dB, 24dB, Src, EVRC-A, EVRC-B).]

4. Terms of Reference Tests

Table 9 shows the results of the Terms of Reference (ToR) tests specified in the test plan. The ToR tests detailed in Table 9 are summarized below:

  - EVRC-B at 9.3 kbps better than EVRC-A: passed in 1 of 12 tests
  - EVRC-B at 6.6 kbps not worse than EVRC-A: passed in all 11 tests
  - EVRC-B at 5.8 kbps not worse than G.723 at 6.3 kbps: passed in all 4 tests
  - EVRC-B at half rate max better than EVRC-A at half rate max: passed in all 6 tests

Table 9. Details of Terms of Reference Tests for the EVRC-B Characterization Test

Terms of Reference -- Is EVRC-B at 9.3 kbps Better Than EVRC-A

                        Reference Codec     Test Codec          t-test
Exp.  Cond.             File  Mean  SE      File  Mean  SE      Diff.  SEMD      t  Pass
CT1   Nominal level     1NA0  3.91  0.81    1NB0  3.90  0.89     0.00  0.08   0.05  No
CT1   Low level         1LA0  3.84  0.91    1LB0  3.87  0.85    -0.03  0.08  -0.40  No
CT1   High level        1HA0  3.83  0.87    1HB0  3.92  0.80    -0.09  0.07  -1.17  No
CT1   Tandem            1TA0  3.71  0.86    1TB0  3.82  0.89    -0.11  0.08  -1.42  No
CT2   0%FER             20A0  3.88  0.80    20B0  3.98  0.80    -0.11  0.07  -1.49  No
CT2   1%FER             21A0  3.73  0.83    21B0  3.88  0.85    -0.16  0.07  -2.10  Yes
CT2   2%FER             22A0  3.51  0.84    22B0  3.41  0.84     0.11  0.07   1.41  No
CT2   5%FER             25A0  3.33  0.85    25B0  3.37  0.83    -0.04  0.07  -0.58  No
CT3   Car noise         3CA0  3.54  0.73    3CB0  3.61  0.73    -0.07  0.07  -0.98  No
CT3   Street noise      3SA0  3.40  0.80    3SB0  3.49  0.77    -0.09  0.08  -1.17  No
CT3   Off/Bab noise     3OA0  4.01  0.81    3OB0  3.98  0.76     0.03  0.08   0.32  No
CT5   Music signals     5MA0  1.94  0.73    5MB0  1.88  0.79     0.06  0.15   0.41  No

Terms of Reference -- Is EVRC-B at 6.6 kbps Not Worse Than EVRC-A

                        Reference Codec     Test Codec          t-test
Exp.  Cond.             File  Mean  SE      File  Mean  SE      Diff.  SEMD      t  Pass
CT1   Nominal level     1NA0  3.91  0.81    1NB1  3.84  0.87     0.07  0.07   0.90  Yes
CT1   Low level         1LA0  3.84  0.91    1LB1  3.88  0.85    -0.04  0.08  -0.55  Yes
CT1   High level        1HA0  3.83  0.87    1HB1  3.78  0.85     0.05  0.08   0.67  Yes
CT1   Tandem            1TA0  3.71  0.86    1TB1  3.64  0.91     0.06  0.08   0.80  Yes
CT2   0%FER             20A0  3.88  0.80    20B1  3.89  0.78    -0.01  0.07  -0.11  Yes
CT2   1%FER             21A0  3.73  0.83    21B1  3.81  0.78    -0.08  0.07  -1.15  Yes
CT2   2%FER             22A0  3.51  0.84    22B1  3.43  0.73     0.08  0.07   1.12  Yes
CT2   5%FER             25A0  3.33  0.85    25B1  3.32  0.89     0.01  0.08   0.15  Yes
CT3   Car noise         3CA0  3.54  0.73    3CB1  3.47  0.78     0.06  0.08   0.80  Yes
CT3   Street noise      3SA0  3.40  0.80    3SB1  3.44  0.72    -0.04  0.08  -0.54  Yes
CT3   Off/Bab noise     3OA0  4.01  0.81    3OB1  3.96  0.80     0.05  0.08   0.56  Yes

Terms of Reference -- Is EVRC-B at 5.8 kbps Not Worse Than G.723 at 6.3 kbps

                        Reference Codec     Test Codec          t-test
Exp.  Cond.             File  Mean  SE      File  Mean  SE      Diff.  SEMD      t  Pass
CT1   Nominal level     1NG7  3.57  0.89    1NB2  3.80  0.87    -0.22  0.08  -2.87  Yes
CT1   Low level         1LG7  3.57  0.87    1LB2  3.87  0.89    -0.29  0.08  -3.75  Yes
CT1   High level        1HG7  3.65  0.88    1HB2  3.68  0.89    -0.03  0.08  -0.35  Yes
CT1   Tandem            1TG7  3.63  0.84    1TB2  3.70  0.86    -0.07  0.08  -0.88  Yes

Terms of Reference -- Is EVRC-B Half Rate Max Better Than EVRC-A Half Rate Max

                        Reference Codec     Test Codec          t-test
Exp.  Cond.             File  Mean  SE      File  Mean  SE      Diff.  SEMD      t  Pass
CT1   Nominal level     1AAH  2.94  0.92    1ABH  3.39  0.92    -0.45  0.08  -5.45  Yes
CT2   0%FER             20AH  2.84  0.76    20BH  3.40  0.79    -0.56  0.07  -8.14  Yes
CT2   1%FER             21AH  2.71  0.77    21BH  3.25  0.78    -0.53  0.07  -7.76  Yes
CT3   Car noise         3CAH  2.92  0.68    3CBH  3.20  0.79    -0.29  0.08  -3.80  Yes
CT3   Street noise      3SAH  2.80  0.78    3SBH  3.15  0.74    -0.35  0.08  -4.52  Yes
CT3   Off/Bab noise     3OAH  3.21  0.86    3OBH  3.65  0.83    -0.44  0.09  -5.08  Yes

5. Unity Gain Requirement

Section 2.7 of the test plan specified that the EVRC-B codec must not show more than 0.5 dB deviation between input and output active speech levels, as measured by the ITU-T software tool actlev [4]. Dynastat measured the output active speech level of the EVRC-A and EVRC-B codecs under clean, nominal input level conditions and compared the values with the active speech level of the input source file. Table 10 shows the results of those comparisons. The deviation in active speech level is within 0.5 dB for EVRC-A and for EVRC-B at all three bit-rates; the largest deviation, 0.486 dB, is for EVRC-B at 5.8 kbps.

Table 10. Results of Test of Unity Gain Requirement

Codec                 Active speech level (dB)   Deviation from source (dB)
Source                       -25.711                     -
EVRC-A                       -26.145                   0.434
EVRC-B at 6.6 kbps           -26.190                   0.479
EVRC-B at 9.3 kbps           -26.107                   0.396
EVRC-B at 5.8 kbps           -26.197                   0.486

6. Additional Analyses

During discussions with Qualcomm, the Listening Laboratory proposed additional tests and analyses to further describe the MOS voting data. This section describes those analyses.

6.1. Performance Across Bit-rates

Exp. CT2 included eight test conditions designed to evaluate the performance of EVRC-B across a wide range of average data rates (ADR). Figure 5 shows the results (MOS and 95% CI) for EVRC-B over the eight values of ADR under 1% FER conditions. The figure suggests a slight linear increase in MOS with increasing bit-rate. An Analysis of Variance (ANOVA) for linear trend was conducted on these MOS results; the linear trend was not significant (F = 2.89, df = 1, 1785, p = 0.089). In fact, the ANOVA showed no significant difference in MOS across the eight bit-rates (F = 1.94, df = 7, 1785, p = 0.060).

[Fig. 5. MOS for EVRC-B Across ADR (1% FER). Axes: MOS (3.5 to 4.0) vs. ADR (kbps): 5.80, 6.20, 6.60, 7.00, 7.40, 7.80, 8.40, 9.30.]

6.2. Dunnett’s Tests

Dynastat proposed to perform a series of tests using Dunnett’s Test, a statistical method that is more appropriate for the data and the experimental design of the subjective experiments than the pairwise t-tests used for the Terms of Reference. Dunnett’s Test is a post hoc multiple-comparison procedure in which several treatment means are each compared statistically with a common control mean. In the MOS experiments, the treatments are the operating points of EVRC-B and the control is EVRC-A under the same test condition.
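For contrast with Dunnett’s procedure, the pairwise statistic behind the Terms of Reference tests in Table 9 can be approximately reproduced from the tabulated values. The rows are consistent, up to rounding, with t = Diff / SEMD, where Diff is the reference-codec mean minus the test-codec mean and SEMD = sqrt(s_ref^2/n + s_test^2/n). The Python sketch below rests on assumptions the report does not state explicitly: the columns labeled SE are treated as per-vote standard deviations, each codec mean in Exps. CT1 and CT2 is assumed to rest on n = 256 votes (32 listeners × 8 talkers), and the two codecs are treated as independent samples. The function names `semd` and `tor_t` are illustrative only.

```python
import math


def semd(sd_ref: float, sd_test: float, n_ref: int, n_test: int) -> float:
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd_ref ** 2 / n_ref + sd_test ** 2 / n_test)


def tor_t(mean_ref: float, sd_ref: float, mean_test: float, sd_test: float,
          n: int = 256) -> tuple[float, float, float]:
    """Return (Diff, SEMD, t) with Diff = reference mean - test mean.

    Assumption (not stated in the report): n = 256 votes per codec
    (32 listeners x 8 talkers), and the tabulated "SE" values are
    per-vote standard deviations.
    """
    diff = mean_ref - mean_test
    se = semd(sd_ref, sd_test, n, n)
    return diff, se, diff / se


# Table 9, Exp. CT2 at 1% FER: EVRC-A (21A0) vs. EVRC-B at 9.3 kbps (21B0)
diff, se, t = tor_t(3.73, 0.83, 3.88, 0.85)
print(f"Diff = {diff:.2f}, SEMD = {se:.2f}, t = {t:.2f}")
```

For the 1% FER row the sketch gives SEMD of about 0.07 and t of about -2.02, against the tabulated -2.10; the gap reflects the two-decimal rounding of the means (the tabulated Diff is -0.16, while the rounded means differ by 0.15). No pass/fail threshold is applied, because the critical values used for the ToR decisions are not given in this section.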
The first stage in a Dunnett’s Test is to run an ANOVA for the effects of Codecs x Subjects, where Codecs (n = 4) comprises EVRC-A plus the three operating points of EVRC-B (9.3, 6.6, and 5.8 kbps), and Subjects (n = 32) are each listener’s votes averaged over talkers (8 talkers for Exps. CT1 and CT2, and 6 talkers for Exp. CT3). If the F-ratio for the Codecs effect is significant (i.e., p < .05), there is significant variation among the scores for the four codecs and the test proceeds to the second stage. If the F-ratio is not significant, there is no significant variation among the codecs (i.e., the scores for the four codecs cannot be distinguished statistically) and the Dunnett’s Test is complete. In the second stage, each treatment mean is compared statistically with the control mean and the mean differences are evaluated for significance. Separate Dunnett’s Tests were computed for each test condition in Experiments CT1, CT2, and CT3.

Table 11 shows the results of the Dunnett’s Tests for Exp. CT1. The table is separated into five sections, one for each test condition in Exp. CT1. For all of the test conditions except High Input Level, the F-ratio is not significant, and therefore the MOS values of the four codecs do not differ significantly. For the High Input Level condition, however, the F-ratio is significant (F = 6.98, p < .05), and the subsequent Dunnett’s comparisons show that EVRC-B at 5.8 kbps (MOS = 3.68) is significantly worse than EVRC-A (MOS = 3.83).

Table 11. Results of Dunnett’s Tests of MOS Values for Exp. CT1

Nominal Input Level (F = 1.90, not signif.)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       1NA0   3.91
EVRC-B at 9.3 kbps     1NB0   3.90   0.00     0.07      Tst = Ref
EVRC-B at 6.6 kbps     1NB1   3.84   0.07     1.23      Tst = Ref
EVRC-B at 5.8 kbps     1NB2   3.80   0.11     2.02      Tst = Ref

Low Input Level (F = 0.17, not signif.)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       1LA0   3.84
EVRC-B at 9.3 kbps     1LB0   3.87  -0.03    -0.50      Tst = Ref
EVRC-B at 6.6 kbps     1LB1   3.88  -0.04    -0.69      Tst = Ref
EVRC-B at 5.8 kbps     1LB2   3.87  -0.03    -0.44      Tst = Ref

High Input Level (F = 6.98, signif., p < .05)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       1HA0   3.83
EVRC-B at 9.3 kbps     1HB0   3.92  -0.09    -1.59      Tst = Ref
EVRC-B at 6.6 kbps     1HB1   3.78   0.05     0.94      Tst = Ref
EVRC-B at 5.8 kbps     1HB2   3.68   0.16     2.88      Tst < Ref

Tandem: Codec => EVRC-A (F = 2.66, not signif.)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       1TA0   3.71
EVRC-B at 9.3 kbps     1TB0   3.82  -0.11    -1.76      Tst = Ref
EVRC-B at 6.6 kbps     1TB1   3.64   0.06     1.00      Tst = Ref
EVRC-B at 5.8 kbps     1TB2   3.70   0.01     0.13      Tst = Ref

Tandem: EVRC-A => Codec (F = 1.84, not signif.)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       1TA0   3.71
EVRC-B at 9.3 kbps     1AB0   3.72  -0.01    -0.23      Tst = Ref
EVRC-B at 6.6 kbps     1AB1   3.63   0.07     1.45      Tst = Ref
EVRC-B at 5.8 kbps     1AB2   3.63   0.08     1.60      Tst = Ref

Table 12 shows the results of the Dunnett’s Tests for the conditions in Exp. CT2. Two of the conditions in Exp. CT2 show significant F-ratios: 0% FER and 1% FER. For the 0% FER condition, EVRC-B at 5.8 kbps is significantly worse than EVRC-A. However, this is exactly the same condition as the Nominal Input Level condition in Exp. CT1, where there was no significant difference. The two experiments therefore give conflicting results for the same condition, which indicates that this difference is at the margin of significance. For the 1% FER condition, EVRC-B at 9.3 kbps is actually significantly better than the control, EVRC-A.
Table 12. Results of Dunnett’s Tests of MOS Values for Exp. CT2

0% FER (F = 9.52, signif., p < .05)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       20A0   3.88
EVRC-B at 9.3 kbps     20B0   3.98  -0.11    -1.83      Tst = Ref
EVRC-B at 6.6 kbps     20B1   3.89  -0.01    -0.14      Tst = Ref
EVRC-B at 5.8 kbps     20B2   3.68   0.20     3.39      Tst < Ref

1% FER (F = 4.14, signif., p < .05)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       21A0   3.73
EVRC-B at 9.3 kbps     21B0   3.88  -0.16    -2.57      Tst > Ref
EVRC-B at 6.6 kbps     21B1   3.81  -0.08    -1.35      Tst = Ref
EVRC-B at 5.8 kbps     21B2   3.69   0.04     0.64      Tst = Ref

2% FER + 2% D & B (F = 0.89, not signif.)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       22A0   3.51
EVRC-B at 9.3 kbps     22B0   3.41   0.11     1.53      Tst = Ref
EVRC-B at 6.6 kbps     22B1   3.43   0.08     1.13      Tst = Ref
EVRC-B at 5.8 kbps     22B2   3.43   0.08     1.19      Tst = Ref

5% FER (F = 0.44, not signif.)
Codec                  File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)       25A0   3.38
EVRC-B at 9.3 kbps     25B0   3.37   0.01     0.12      Tst = Ref
EVRC-B at 6.6 kbps     25B1   3.32   0.06     0.94      Tst = Ref
EVRC-B at 5.8 kbps     25B2   3.33   0.05     0.77      Tst = Ref

Table 13 shows the results of the Dunnett’s Tests for the conditions in Exp. CT3. The Dunnett’s Tests for Exp. CT3 include two additional codec-conditions, EVRC-B at 9.3 and 5.8 kbps with DTX, which gives a Dunnett’s Test with five treatment conditions and one control. Two of the test conditions in Exp. CT3 show significant F-ratios: Street Noise and Off/Bab Noise. However, for the Street Noise condition none of the codecs is significantly different from EVRC-A. For the Off/Bab Noise condition, EVRC-B at 5.8 kbps is significantly worse than EVRC-A. Based on the results of the Dunnett’s Tests for Experiments CT1, CT2, and CT3, it is concluded that EVRC-B at 9.3 kbps and 6.6 kbps are equivalent to or better than EVRC-A on all test conditions.
In addition, EVRC-B at 5.8 kbps is equivalent to EVRC-A on most of the conditions in the three experiments (9 of 12 test conditions); it is significantly worse than EVRC-A only for CT1 High Input Level, CT2 0% FER, and CT3 Off/Bab Noise.

Table 13. Results of Dunnett’s Tests of OVRL Values for Exp. CT3

Car Noise at 15dB SNR (F = 1.72, not signif.)
Codec                         File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)              3CA0   3.54
EVRC-B at 9.3 kbps            3CB0   3.61  -0.07    -1.22      Tst = Ref
EVRC-B at 6.6 kbps            3CB1   3.47   0.06     1.04      Tst = Ref
EVRC-B at 5.8 kbps            3CB2   3.56  -0.03    -0.45      Tst = Ref
EVRC-B at 9.3 kbps w/DTX      3CX0   3.61  -0.07    -1.22      Tst = Ref
EVRC-B at 5.8 kbps w/DTX      3CX2   3.51   0.03     0.52      Tst = Ref

Street Noise at 15dB SNR (F = 2.56, signif., p < .05)
Codec                         File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)              3SA0   3.40
EVRC-B at 9.3 kbps            3SB0   3.49  -0.09    -1.66      Tst = Ref
EVRC-B at 6.6 kbps            3SB1   3.44  -0.04    -0.74      Tst = Ref
EVRC-B at 5.8 kbps            3SB2   3.45  -0.05    -0.92      Tst = Ref
EVRC-B at 9.3 kbps w/DTX      3SX0   3.41  -0.01    -0.18      Tst = Ref
EVRC-B at 5.8 kbps w/DTX      3SX2   3.30   0.09     1.66      Tst = Ref

Off/Bab Noise at 20dB SNR (F = 2.68, signif., p < .05)
Codec                         File   MOS    Diff   Dunnett’s   Result
EVRC-A (control)              3OA0   4.01
EVRC-B at 9.3 kbps            3OB0   3.98   0.03     0.42      Tst = Ref
EVRC-B at 6.6 kbps            3OB1   3.96   0.05     0.74      Tst = Ref
EVRC-B at 5.8 kbps            3OB2   3.86   0.15     2.45      Tst < Ref
EVRC-B at 9.3 kbps w/DTX      3OX0   4.06  -0.05    -0.76      Tst = Ref
EVRC-B at 5.8 kbps w/DTX      3OX2   3.91   0.10     1.68      Tst = Ref

6.3. Global Analyses

All of the statistical tests described in the previous sections tested the ToRs for EVRC-B on specific test conditions within experiments, i.e., local analyses.
This section describes statistical analyses and tests comparing the results of EVRC-A and EVRC-B across conditions within an experiment and across experiments, i.e., global analyses.

6.3.1. ANOVA across Conditions within Experiments

For each of the three formal MOS tests, ANOVAs were conducted comparing the results of EVRC-A and EVRC-B across three relevant test conditions in the experiment. Separate ANOVAs were conducted for each of the three ADRs of EVRC-B: 9.3, 6.6, and 5.8 kbps.

6.3.1.1. ANOVA for Exp. CT1 – Input Level

For Exp. CT1, results were analyzed for the three input level conditions. Each analysis was a three-way ANOVA for the factors Codecs (EVRC-A vs. EVRC-B), Conditions (Nominal, Low, and High input level), and Subjects (n = 32). The effects of interest here are the F-ratios for the Codecs main effect and the Codecs x Conditions interaction effect.

Table 14 shows the results of the three global ANOVAs for Exp. CT1. Each row in the table shows the results of the ANOVA for a particular ADR of EVRC-B. The left side of the table shows the obtained F-ratios. For the Codecs main effect, an F-ratio greater than 4.16 indicates a significant difference between the MOS values for the two codecs, EVRC-A and EVRC-B, over the three test conditions. For the Codecs x Conditions interaction effect, an F-ratio greater than 3.15 indicates a significant difference between the patterns of MOS scores for the two codecs over the three test conditions. Only one significant F-ratio was obtained for Exp. CT1: the Codecs effect for the comparison EVRC-A vs. EVRC-B at 5.8 kbps.
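The two critical values quoted above match the 5% critical values of the F distribution for df = (1, 31) and df = (2, 62). That is what one would expect if, with the 32 listeners treated as a random factor, the Codecs effect is tested against the Codecs x Subjects term and the interaction against the Codecs x Conditions x Subjects term; this error structure is our assumption, not something the report states. The standard-library Python sketch below computes the critical values by inverting the F cumulative distribution (evaluated through the regularized incomplete beta function with the standard continued-fraction expansion) by bisection, then flags the obtained F-ratios reported in Tables 14 to 16. All function names are illustrative.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the m-th step; both depend only on m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2):
    """P(F <= f) for an F distribution with (df1, df2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))


def f_critical(alpha, df1, df2):
    """Upper-alpha critical value of F(df1, df2), found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < target:  # bracket the quantile
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, df1, df2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def significant(f_ratios, f_crit):
    """Flag each obtained F-ratio that exceeds the critical value."""
    return [f > f_crit for f in f_ratios]


# Obtained F-ratios from Tables 14-16, ordered EVRC-B at 9.3, 6.6, 5.8 kbps.
reported = {
    "CT1": {"codecs": [2.15, 1.12, 4.57], "cod_x_cond": [0.79, 1.13, 2.44]},
    "CT2": {"codecs": [0.11, 0.23, 1.41], "cod_x_cond": [6.12, 1.79, 0.14]},
    "CT3": {"codecs": [1.30, 0.42, 0.32], "cod_x_cond": [1.77, 0.85, 3.89]},
}

# Assumed error df: Codecs x Subjects (1, 31); Codecs x Conditions x Subjects (2, 62).
f_codec = f_critical(0.05, 1, 31)
f_inter = f_critical(0.05, 2, 62)
print(f"F.95(1, 31) = {f_codec:.2f}, F.95(2, 62) = {f_inter:.2f}")
for exp, fs in reported.items():
    print(exp, "Codecs:", significant(fs["codecs"], f_codec),
          "Cod. x Cond.:", significant(fs["cod_x_cond"], f_inter))
```

Under these assumptions the sketch reproduces the 4.16 and 3.15 thresholds and flags one effect per experiment, as the text describes: the Codecs effect for 5.8 kbps in CT1, and the Codecs x Conditions interaction for 9.3 kbps in CT2 and for 5.8 kbps in CT3. Where SciPy is available, `scipy.stats.f.ppf(0.95, dfn, dfd)` returns the same quantiles without the hand-written incomplete beta.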
An examination of the mean values (shown on the right side of Table 14) indicates that EVRC-A scored significantly higher (MOS = 3.859) than EVRC-B at 5.8 kbps (MOS = 3.780). For Exp. CT1 there were no significant Codecs x Conditions interactions.

Table 14. Results of ANOVA for Exp. CT1

                        F-ratios                    MOS
Comparison          Codecs   Cod. x Cond.       A       B       Diff.
A vs. B at 9.3       2.15       0.79          3.859   3.897   -0.038
A vs. B at 6.6       1.12       1.13          3.859   3.835    0.025
A vs. B at 5.8       4.57       2.44          3.859   3.780    0.079

6.3.1.2. ANOVA for Exp. CT2 – FER

For Exp. CT2, results were analyzed for the three FER test conditions. Each analysis was a three-way ANOVA for the factors Codecs (EVRC-A vs. EVRC-B), Conditions (1%, 2%, and 5% FER), and Subjects (n = 32). Table 15 shows the results of the three global ANOVAs for Exp. CT2. Only one significant F-ratio was obtained for Exp. CT2: the Codecs x Conditions interaction effect for the comparison EVRC-A vs. EVRC-B at 9.3 kbps. The significant interaction indicates a significant difference in the patterns of MOS scores for the two codecs across the FER conditions (EVRC-B at 9.3 kbps scored higher than EVRC-A at 1% FER but lower at 2% FER; see Table 12). Figure 6 illustrates the significant interaction for EVRC-A vs. EVRC-B at 9.3 kbps across the FER conditions.

Table 15. Results of ANOVA for Exp. CT2

                        F-ratios                    MOS
Comparison          Codecs   Cod. x Cond.       A       B       Diff.
A vs. B at 9.3       0.11       6.12          3.539   3.553   -0.014
A vs. B at 6.6       0.23       1.79          3.539   3.520    0.020
A vs. B at 5.8       1.41       0.14          3.539   3.482    0.057

[Fig. 6. Significant Interaction of Codecs and FER Conditions for Exp. CT2. Axes: MOS (2.5 to 4.0) vs. FER condition (1% FER, 2% FER, 5% FER); series: EVRC-A, EVRC-B at 9.3 kbps.]

6.3.1.3. ANOVA for Exp. CT3 – Background Noise

For Exp. CT3, results were analyzed for the three Background Noise conditions. Each analysis was a three-way ANOVA for the factors Codecs (EVRC-A vs. EVRC-B), Conditions (Car, Street, and Off/Bab noise), and Subjects (n = 32). Table 16 shows the results of the three global ANOVAs for Exp. CT3. Only one significant F-ratio was obtained for Exp. CT3: the Codecs x Conditions interaction effect for the comparison EVRC-A vs. EVRC-B at 5.8 kbps. The significant interaction indicates a significant difference in the patterns of scores for the two codecs across the background noise conditions (EVRC-B at 5.8 kbps scored slightly higher than EVRC-A in Car and Street noise but lower in Off/Bab noise; see Table 13). Figure 7 illustrates the significant interaction for EVRC-A vs. EVRC-B at 5.8 kbps across the background noise conditions.

Table 16. Results of ANOVA for Exp. CT3

                        F-ratios                    OVRL
Comparison          Codecs   Cod. x Cond.       A       B       Diff.
A vs. B at 9.3       1.3        1.77          3.647   3.694   -0.047
A vs. B at 6.6       0.42       0.85          3.647   3.625    0.022
A vs. B at 5.8       0.32       3.89          3.647   3.623    0.024

[Fig. 7. Significant Interaction of Codecs and Background Noise Conditions for Exp. CT3.]

6.3.2. ANOVA Across all Experiments – CT1, CT2, CT3

The final set of analyses compared EVRC-A and EVRC-B across all three formal experiments, CT1, CT2, and CT3. Table 17 shows the results of the global ANOVA across all three experiments for each of the three ADRs of EVRC-B.

Table 17. Results of Global ANOVA Over Exps. CT1, CT2, and CT3

                        F-ratios                  MOS/OVRL
Comparison          Codecs   Cod. x Cond.       A       B       Diff.
A vs. B at 9.3       2.31       0.20          3.682   3.715   -0.033
A vs. B at 6.6       1.28       0.01          3.682   3.660    0.022
A vs. B at 5.8       4.66       0.42          3.682   3.628    0.054
The F-ratio for the ANOVA comparing EVRC-A vs. EVRC-B at 5.8 kbps is significant, and a comparison of the means shows that EVRC-A is significantly better than EVRC-B at 5.8 kbps across the three subjective experiments.

Figure 8 shows the average scores over all three formal subjective experiments for EVRC-A and the three ADRs of EVRC-B. The figure illustrates the relative performance of EVRC-A and the various ADRs of EVRC-B across a wide range of test conditions. In general, the figure shows that:

  - EVRC-B at 9.3 kbps performs slightly better than EVRC-A
  - EVRC-B at 6.6 kbps performs slightly worse than EVRC-A
  - EVRC-B at 5.8 kbps performs significantly worse than EVRC-A

[Fig. 8. Performance of EVRC-A and EVRC-B Across All Three Experiments. Axes: MOS/OVRL (3.4 to 4.0) vs. experiment (CT1, CT2, CT3); series: EVRC-A, EVRC-B at 9.3 kbps, EVRC-B at 6.6 kbps, EVRC-B at 5.8 kbps.]

7. References

[1] 3GPP2-C11-20051205-004b, EVRC-B characterization test plan and schedule, January 2006.
[2] TIA/EIA-IS-127 (EVRC-A), Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, September 1996.
[3] ITU-T Rec. G.723, Dual Rate Speech coder for Multi-Media communication Transmitting at 5.3 and 6.3 kbit/sec, November 1996.
[4] ITU-T Rec. G.191, Software Tool Library, December 2000.
[5] C-S0034-0 v1.0 (SMV), Minimum Performance Specification for the Selectable Mode Vocoder (SMV), Speech Service Option 56 for Wideband Spread Spectrum Digital Systems, April 2004.
[6] ITU-T Rec. P.800, Methods for subjective determination of transmission quality, August 1996.
[7] ITU-T Rec. P.835, Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm, November 2003.

Appendix – Host Lab Processing Scripts

Characterization Test Scripts
  Processing Script for Experiment CT1 – Clean Channel conditions
  Processing Script for Experiment CT2 – Impaired Channel conditions
  Processing Script for Experiment CT3 – Background Noise Conditions
  Processing Script for Experiment CT4 – Bad Rate Handling
  Processing Script for Experiment CT5 – Music Signals