Liaison Statement

To: ITU SG 16 Q8, 9, 10/16
For Information
Source: IETF CODEC Working Group, Real-Time Applications and Infrastructure Area (RAI)
Date: 9 November 2011

Speech and Audio Coding Standardization

The IETF codec working group would like to thank the ITU for their liaison statement of 25 August 2011, providing comments on the Opus codec. We have taken the comments provided in that document and addressed many of them in the most recent revision of the Opus specification, which you can find at (http://datatracker.ietf.org/doc/draft-ietf-codec-opus/). A second working group last call was issued on October 31st, completing on 19 November, coincident with the conclusion of the Taipei IETF meeting.

Going through each of the comments in the liaison, here is how they have been addressed:

SG16 experts are unsure about the maturity level of the specification and wonder when a final standard will be delivered. It is stated that "the design team believed the codec was complete by June 2011, consequently, the codec group issued a WGLC for the codec on July 8, 2011". However, since that date, several patches and bug fixes have been sent on the IETF reflector, which suggests that the last-call agreement on the codec was based upon an unstable version.

In the IETF process, the purpose of the working group last call is to solicit additional comments, which often result in further modifications of the specification. The last call is issued when the chairs believe that no substantive issues remain, and its purpose is to test that assumption. As such, it is quite appropriate and reasonable for document authors to make minor changes, as well as more substantive ones based on comments.

Misalignments between the specification text and the C-code implementation have also been noted: some algorithmic features performed in the C-code implementation, e.g., warped LPC, are not described in the text, whilst some algorithmic features described in the specification text are not implemented in the source code, e.g., the switching between SILK and CELT at speech/music and music/speech frame transitions.

Thank you for pointing out these errors. Warped LPC is mentioned at a high level on page 135 of the specification. The details are specified in the code, which is the normative specification of the algorithm. As for switching between SILK and CELT, the encoder will switch automatically, but not based on music/speech detection. Such detection could be implemented, but since it is encoder-side and not normative, it does not need to be part of the reference implementation. The group discussed this and agreed that it does not make sense to include it in the specification.

The "readme" file is not in agreement with the "help" output of the executable command line (probably the readme has been written for an older software version?);

This has been fixed.

The C-code still contains some "TODO" comments;

All the TODOs that needed to be done have been done; the others have been changed to NOTE comments or removed.

Parts of the C-code seems to be either unreachable or remain unoptimized: We believe that a significant amount of work still needs to be done to derive an efficient implementation without useless additional complexity;

The code has been run and tested across many different platforms, and the latest version includes test vectors (by reference) which the group believes test all aspects of the decoder. This is also a reference implementation and does not, by design, include platform-specific assembly or other optimizations, which are out of scope for this work.

The portability of the current version is rather limited.
Speech and audio coding standards are expected to have a wide portability so that they can be used in a wide range of environments. The OPUS codec software seems to have been natively developed for Linux (or Cygwin) and does not seem to be easily portable to other platforms. For instance, it cannot be compiled directly on another platform with a different compiler such as DOS/Microsoft Visual Studio and building a Microsoft Visual Studio project will require various modifications to the C-code;

The autoconf and MSVC projects do make things easier on a broader set of systems, but they are about half the size of the whole codec, so they are not really suitable for inclusion in the draft. They are, however, available in the SCM repository linked from http://opus-codec.org/. The web interface there will build tarfile snapshots from the repository. Testing has been done on:

Linux (x86, x86_64, IA64, PPC, Armv7) (GCC 4.7, 4.5, 3.x depending on the platform)
Linux with LLVM compiler (x86, x86_64)
NetBSD (x86)
FreeBSD (x86)
Solaris 10 (Ultrasparc)
Win32 via Mingw
Win32 via LCC-Win32
Win32 via OpenWatcom
Dos32 via OpenWatcom
IBM S/390
VAX (MicroVAX 3900, via SIMH; really, it's quite slow)

Test vectors to check the compliance with the OPUS standard are missing: Speech & audio coding standards should have a minimum set of Test Vectors to check whether the generated executable works properly and any implementation complies with the expected standardized format;

Agreed. The latest version includes test vectors.

The auxiliary functionalities required for VoIP, e.g. time shortening/stretching, are not provided together with the codec. An important justification for the formalization of the IETF Codec WG was that these functionalities were stated to be very crucial for VoIP quality and are not provided in the codecs from other SDOs.

This was discussed as part of the working group last call.
Consensus from the group is that these kinds of algorithms are non-normative and do not need to be included as part of the specification itself. Rather, the decoder includes control parameters which allow a jitter buffer implementation to do this. A pointer to a jitter buffer implementation which does such warping (the Google WebRTC code) was included as an informative reference.

The understanding of SG16 experts was that the primary objective of the IETF Codec WG was to develop a codec which is royalty-free and easily distributable, as given in guidelines (http://datatracker.ietf.org/doc/draft-ietf-codec-guidelines/), and this was the main motivation behind using royalty-free codecs to define the quality requirement references, as given in Section 5.1 of codec requirements (http://datatracker.ietf.org/doc/draft-ietf-codec-requirements/). It is unfortunate that this objective seems to not have been achieved. We believe that the choice of the codecs for quality requirement references were not appropriate and have subsequently been shown to be somewhat misguiding. These requirements should have been set with regard to standardized codecs based on their technical merits rather than their royalty status.

The IETF itself cannot make decisions on the validity of patent claims. That judgment is for the members of the working group to make on their own. The group will decide, as part of its final working group last call, whether participants believe the document to be ready for publication based on our goals for the working group. Our goals and objectives for the work remain the same.

According to test results provided in another IETF deliverable referred in your LS (http://datatracker.ietf.org/doc/draft-valin-codec-results/), OPUS appears to have some promising quality. Yet, this deliverable does not include any formal test results based on a test plan designed with appropriate standardized testing methodologies.
Moreover, it is a compilation of various tests conducted for different purposes using older versions of the codec. Therefore, it is difficult to assess the quality of the final version of OPUS codec which enters WGLC.

The testing document has now been accepted as a working group item (http://datatracker.ietf.org/doc/draft-ietf-codec-results/). The group has discussed the disposition of the older test results and has agreed to move them to an appendix. The document now includes some testing done since WGLC, and will grow to include additional testing performed after the final codec specification is issued.