Telecommunications Industry Association (TIA)                TR-30.1/02-08-110
Waltham, MA, August 6-9, 2002

COMMITTEE CONTRIBUTION
Technical Committee TR-30 Meetings

SOURCE: Cisco Systems, Inc.
TITLE: Proposed data type to represent compressed raw data using Run Length Encoding (RLE)
DISTRIBUTION: Members of TR-30.1
CONTACT: Mehryar Garakani
         Office: +1.805.961.3640
         Email: mgarakan@cisco.com
____________________

ABSTRACT

This document proposes a compressed format for encoding raw data for MoIP.

The company represented by this individual may have patents or published pending patent applications, the use of which may be essential to the practice of all or part of this contribution incorporated in a TIA Publication and the company represented by this individual is willing to grant a license to applicants for such intellectual property contained in this contribution in a manner consistent with 2a) or 2b) of Annex H of the TIA Engineering Manual.

COPYRIGHT STATEMENT: The contributor grants a free, irrevocable license to the Telecommunications Industry Association (TIA) to incorporate text or other copyrightable material contained in this contribution and any modifications thereof in the creation of a TIA Publication; to copyright and sell in TIA's name any TIA Publication even though it may include all or portions of this contribution; and at TIA's sole discretion to permit others to reproduce in whole or in part such contributions or the resulting TIA Publication. This contributor will also be willing to grant licenses under such copyrights to third parties on reasonable, non-discriminatory terms and conditions for purpose of practicing a TIA Publication which incorporates this contribution.

This document has been prepared by the Source Company(s) to assist the TIA Engineering Committee. It is proposed to the Committee as a basis for discussion and is not to be construed as a binding proposal on the Source Company(s).
The Source Company(s) specifically reserves the right to amend or modify the material contained herein and nothing herein shall be construed as conferring or offering licenses or rights with respect to any intellectual property of the Source Company(s) other than provided in the copyright statement above.

1 Introduction

A compressed format is proposed that would allow a simple yet efficient lossless representation of raw data for modem over IP (MoIP). The lossless representation is based on a Run Length Encoding (RLE) scheme that is defined in Section 2. When the raw data contains long runs of repeated octets, the algorithm can substantially reduce the amount of raw data that must be transported across the IP network. The proposed RLE format can easily be extended to include repeats of arbitrary bit fields. This extension is described in Section 3.

Definitions:

The lossless algorithm operates on a string of octets, referred to as the “primary octet string”. The output of the lossless algorithm is referred to as the “derived octet string”. The escape sequence is defined to be the octet 0xFF (8 mark bits).

2 Octet Compressed RLE Encoder for Raw Data Type

The octet compression algorithm works on the octets contained in a single MoIP (e.g., SPRT) packet. Each packet is self-contained, i.e., it can be decoded independently of any other MoIP packet. The “derived octet string” is generated from the “primary octet string” as follows:

1) A single occurrence of the escape sequence in the “primary octet string” encodes to two occurrences of the escape sequence in the “derived octet string”.

2) Two or more (up to 126) consecutive occurrences of the escape sequence in the “primary octet string” encode to three octets as follows: <esc> n <esc>, where <esc> is 0xFF and ‘n’ is an unsigned octet specifying the number of occurrences of the <esc> octet.
3) A single or double occurrence of any octet that is not the escape sequence in the “primary octet string” encodes to the same number of occurrences of this octet in the “derived octet string” (i.e., no change in data).

4) Three or more (up to 126) consecutive occurrences of any octet that is not the escape sequence in the “primary octet string” encode to three octets as follows: <esc> n <non-esc octet>, where ‘n’ is an unsigned octet specifying the number of occurrences of the non-escape octet.

5) An encoded sub-string resulting from steps 1, 2, and 4 must not cross packet boundaries and should be placed as an atomic unit within a single MoIP packet. This is necessary to ensure that each MoIP packet is self-contained and can be decoded independently.

3 Bit Compressed RLE Encoder for Raw Data Type

The scheme of Section 2 can be extended to cover repeats of arbitrary bit fields. This can be achieved by allowing the following additional “extended sub-string format”, which may optionally be used by the transmitter:

<esc> n m o1 o2 o3 o4 …

where:

<esc> is the escape sequence 0xFF;
‘n’ is the number of repeats (up to 126), with the most significant bit of ‘n’ indicating the extended sub-string format;
‘m’ is the number of bits in one repeat (up to 255);
o1 o2 o3 o4 … is the sequence of octets containing the bits of the repeat pattern, zero-padded to the next octet boundary.
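
The octet rules 1 to 4 of Section 2 can be illustrated with a short sketch. The following Python is not part of the proposal; it is one possible reading of those rules, and the names rle_encode, rle_decode, ESC, and MAX_RUN are hypothetical. Two points are assumptions that the text does not settle: a run longer than 126 octets is split into several sub-strings, and the decoder rejects any count outside 1 to 126, so the extended sub-string format of Section 3 is not handled. Each call is assumed to process the octets of one packet, which keeps every encoded sub-string inside that packet as rule 5 requires.

```python
ESC = 0xFF      # escape sequence defined in Section 1
MAX_RUN = 126   # largest repeat count allowed by rules 2 and 4


def rle_encode(primary: bytes) -> bytes:
    """Encode one packet's primary octet string into a derived octet string."""
    derived = bytearray()
    i = 0
    while i < len(primary):
        octet = primary[i]
        # Measure the run starting at i. Capping at MAX_RUN splits longer
        # runs into several sub-strings (an assumption of this sketch).
        run = 1
        while (run < MAX_RUN and i + run < len(primary)
               and primary[i + run] == octet):
            run += 1
        if octet == ESC:
            if run == 1:
                derived += bytes([ESC, ESC])        # rule 1
            else:
                derived += bytes([ESC, run, ESC])   # rule 2
        elif run <= 2:
            derived += bytes([octet]) * run         # rule 3: copied unchanged
        else:
            derived += bytes([ESC, run, octet])     # rule 4
        i += run
    return bytes(derived)


def rle_decode(derived: bytes) -> bytes:
    """Recover the primary octet string from one packet's derived octet string."""
    primary = bytearray()
    i = 0
    while i < len(derived):
        octet = derived[i]
        if octet != ESC:
            primary.append(octet)                   # literal octet (rule 3)
            i += 1
            continue
        if i + 1 >= len(derived):
            raise ValueError("truncated escape sub-string")
        count = derived[i + 1]
        if count == ESC:
            primary.append(ESC)                     # <esc> <esc> (rule 1)
            i += 2
            continue
        if count == 0 or count > MAX_RUN:
            # Includes counts with the most significant bit set, which
            # Section 3 reserves for the extended sub-string format.
            raise ValueError("unsupported repeat count")
        if i + 2 >= len(derived):
            raise ValueError("truncated escape sub-string")
        primary += bytes([derived[i + 2]]) * count  # rules 2 and 4
        i += 3
    return bytes(primary)
```

For example, under these assumptions rle_encode(b"AAAA") yields the three octets 0xFF 0x04 0x41, and rle_decode applied to that result returns the original four octets. The decoder can tell rule 1 from rules 2 and 4 because a repeat count never exceeds 126 and therefore never equals 0xFF.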