Robust Header Compression for Lossy Links

Rohc draft contribution
Zhigang Liu, Nokia
September 12, 2000
This document contains a new version of the text on encoding. It is
based on sections 4.5 and 5.8 of <draft-ietf-rohc-rtp01.txt>; section
5.8 is now merged into section 4.5. The major changes are:




- Renamed "VLE" and "OVLE" to window-based LSB encoding, and rewrote
  the two sections related to LSB encoding.
- Removed sections 4.5.2 and 4.5.3.
- Added a section on scaled RTP TS encoding.
- Added a section on offset IP-ID encoding.
Bormann (ed.)
[Page 1]
INTERNET-DRAFT         Robust Header Compression         July 14, 2000

4.5. Encoding methods
This chapter describes in general various encoding methods that can
be used for different header fields. How the methods are applied to
each field (e.g. values of associated parameters) will be specified
in the packet format chapter.
4.5.1. Least Significant Bits (LSB) encoding
Least Significant Bits (LSB) encoding can be used for a header
field whose values are usually subject to small changes. In simplest
terms, k least significant bits of the field value are sent by the
compressor instead of the original field value, where k is a positive
integer. After receiving the k LSBs, the decompressor derives the
original value using a previously received value as reference
(v_ref).
The scheme is guaranteed to be correct if the compressor and the
decompressor agree on an interpretation interval in which the
original value 1) resides and 2) is the only value that has exactly
the same k LSBs as those transmitted in the compressed packet. (It
will be shown later that this condition can actually be relaxed.) The
interval can be described as a function f(v_ref, k). Without loss of
generality, we choose the function as
f(v_ref, k) = [v_ref - p, v_ref + (2^k - 1) - p], where p is an
integer.
     <-------------- interpretation interval -------------->
     |-------------+---------------------------------------|
 v_ref - p       v_ref                        v_ref + (2^k-1) - p
Note that function f has a good property: for any value k, k LSBs
will uniquely identify a value in the interval. This is referred to
as the uniqueness property of function f, which satisfies the
condition 2) above.
The parameter p is introduced so that the interpretation interval
can be shifted with respect to v_ref. A good value of p can yield
more efficient encoding for fields with certain characteristics. For
example, if a field value observed by the compressor is always
increasing, p should be set to 0, which gives the interval [v_ref,
v_ref + 2^k - 1]. Because LSB encoding will be applied in ROHC to
header fields whose values normally increase, except in some uncommon
situations (e.g. packet misordering, RTP TS for video), only two
values will be assigned to p: p1 = 0 or p2 = 2^(k-2) - 1. The latter
gives the interpretation interval [v_ref - (2^(k-2) - 1), v_ref +
3*2^(k-2)], which can handle negative changes but still gives more
space for the positive changes. See packet format section for more
details.
Following is the procedure of compression and decompression:
1) The compressor (decompressor) always uses as v_ref_c (v_ref_d)
the last value that has been compressed (decompressed);
2) When compressing a value v, the compressor simply calculates the
minimal (for efficiency) but sufficient (for correctness) value
of k such that v falls into the interval (interval_c) as defined
by function f with v_ref = v_ref_c. Let's represent this
procedure as a function k = g(v, v_ref). Note that more than k
LSBs may be sent to fit the packet format (see packet format
section);
3) When receiving m LSBs, the decompressor will derive the
interpretation interval (interval_d) as defined by f with k = m
and v_ref = v_ref_d. Then it picks as the decompressed value the
one in interval_d whose LSBs match the received ones.
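The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the normative algorithm: the function names (interval, g, lsb_encode, lsb_decode, p1, p2) and the max_bits limit are hypothetical, values are treated as unbounded integers (field wraparound is ignored), and p2 is taken as 0 for k < 2, a case the text does not define.

```python
def interval(v_ref, k, p):
    """Interpretation interval f(v_ref, k) = [v_ref - p, v_ref + (2^k - 1) - p]."""
    return v_ref - p, v_ref + (1 << k) - 1 - p


def p1(k):
    """Shift for strictly increasing fields."""
    return 0


def p2(k):
    """Shift 2^(k-2) - 1; taken as 0 for k < 2 (assumption, not in the text)."""
    return (1 << (k - 2)) - 1 if k >= 2 else 0


def g(v, v_ref, p_of_k, max_bits=16):
    """Smallest k such that v lies in f(v_ref, k)."""
    for k in range(1, max_bits + 1):
        lo, hi = interval(v_ref, k, p_of_k(k))
        if lo <= v <= hi:
            return k
    raise ValueError("value cannot be LSB-encoded within max_bits")


def lsb_encode(v, k):
    """Compressor side: keep only the k least significant bits of v."""
    return v & ((1 << k) - 1)


def lsb_decode(lsbs, m, v_ref, p):
    """Decompressor side: the unique value in f(v_ref, m) whose m LSBs match."""
    lo, _ = interval(v_ref, m, p)
    return lo + ((lsbs - lo) % (1 << m))


# An increasing field (p = 0): 103 after reference 100 needs 2 bits.
k = g(103, 100, p1)
print(k, lsb_decode(lsb_encode(103, k), k, 100, p1(k)))

# A small negative change (p = p2): 98 after reference 100.
k = g(98, 100, p2)
print(k, lsb_decode(lsb_encode(98, k), k, 100, p2(k)))
```

The decoder never searches the interval: it takes the lower bound and adds the distance, modulo 2^m, to the first value whose LSBs match. That value is the only candidate because the interval contains exactly 2^m values.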
The scheme is complicated by two factors: packet loss between the
compressor and decompressor, and transmission errors undetected by
the lower layer. In the former case, the compressor and decompressor
will lose synchronization of v_ref, and thus of the interpretation
interval. If v is still covered by the intersection of interval_c and
interval_d, the decompression will be correct. Otherwise, incorrect
decompression will happen. The next section addresses this issue.
In the latter case, the corrupted LSBs will give an incorrectly
decompressed value that will be used later as v_ref, which in turn
will probably lead to damage propagation. This can be solved by the
idea of a secure reference, i.e. a reference value whose correctness
can be verified by a CRC calculated over it. Consequently,
procedure 1) above is modified as follows: the compressor always
uses as v_ref_c the last value that has been compressed and sent with
a CRC. The decompressor always uses as v_ref_d the last correctly (as
proven by the CRC) decompressed value.
4.5.2 Window-based LSB encoding
The idea is simple. Although the compressor cannot determine
the exact value of v_ref_d that the decompressor will use for a
particular value v, it knows the 'candidate' set of values from which
the decompressor may choose v_ref_d. This candidate set is a subset
of all the values that have been sent by the compressor before v.
The compressor then calculates the value of k such that, no matter
which v_ref_d the decompressor uses, the resulting interval_d
covers v.
Because of the order introduced by the fact that the compressor
(decompressor) always uses the last sent (received) value with a CRC
as the reference, the compressor actually maintains a sliding window
(VSW) rather than a set. VSW is initially empty. Below are the
operations performed on VSW by the compressor:
1) After sending a value v (either compressed or uncompressed) with
a CRC, the compressor adds v to the head of the VSW;
2) For each value v being compressed, the compressor chooses k =
max(g(v, v_min), g(v, v_max)), where v_min and v_max are the
minimal and maximal values in VSW, and g is the function defined
in the previous section;
3) The sliding window is advanced when the compressor has
sufficient confidence that values older than a certain value
(i.e. those at the tail of VSW) will no longer be used by the
decompressor as v_ref_d. The confidence may be obtained by
various means, e.g. an ACK from the decompressor when operating
in R mode, in which case values older than the ACKed one can be
removed from VSW. In U/O mode, since there is a CRC to verify
correct decompression, a VSW of a certain depth will be used.
The window depth is an optimization parameter determined during
implementation. A special case is to set it to 1.
Note that the decompressor follows the same procedure as described
in the previous section, except that it MUST ACK each value received
with a CRC (refer to compression/decompression logic section for
details).
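The window operations above can be illustrated with a short Python sketch. The class and method names (WindowLsbCompressor, bits_needed, sent_with_crc, acked) are hypothetical, the helper g uses a fixed p for brevity, and field wraparound is ignored; it is a sketch under those assumptions rather than a definitive implementation.

```python
from collections import deque


def g(v, v_ref, p=0, max_bits=32):
    """Smallest k with v in [v_ref - p, v_ref + (2^k - 1) - p] (fixed p)."""
    for k in range(1, max_bits + 1):
        if v_ref - p <= v <= v_ref + (1 << k) - 1 - p:
            return k
    raise ValueError("value does not fit in max_bits")


class WindowLsbCompressor:
    def __init__(self, depth=None):
        # depth=None: entries are removed only on ACK (R mode).
        # An integer depth bounds the window (U/O mode); the oldest
        # entry is dropped automatically when a new one is added.
        self.vsw = deque(maxlen=depth)

    def bits_needed(self, v):
        """k = max(g(v, v_min), g(v, v_max)) over the current window."""
        if not self.vsw:
            raise ValueError("empty window: send the value uncompressed")
        return max(g(v, min(self.vsw)), g(v, max(self.vsw)))

    def sent_with_crc(self, v):
        """Operation 1): add v at the head of VSW."""
        self.vsw.appendleft(v)

    def acked(self, v):
        """Operation 3), R mode: remove values older than the ACKed one."""
        if v in self.vsw:
            while self.vsw[-1] != v:
                self.vsw.pop()


c = WindowLsbCompressor()
for v in (100, 101, 102):
    c.sent_with_crc(v)
print(c.bits_needed(103))  # window spans 100..102
c.acked(102)
print(c.bits_needed(103))  # window holds only 102
```

The example shows the cost of uncertainty: while any of 100, 101 or 102 could be the decompressor's reference, more bits are needed than after the ACK narrows the window to a single value.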
4.5.3 Scaled RTP Timestamp encoding
The RTP Timestamp (RTP TS) will not increase by an arbitrary
number from packet to packet. Instead, the increase is normally an
integral multiple of some unit (TS_STRIDE). For example, in the
case of audio, the sample rate is normally 8 kHz and one voice frame
covers 20 ms. Furthermore, each voice frame is usually carried in
one RTP packet. Therefore, the RTP TS increment is always n * 160 (=
8000 * 0.02), where n is an integer. Note that silence periods have
no impact on this, as the sample clock at the source normally keeps
running without changing either the frame rate or the frame boundary.
For the case of video, there is a TS_STRIDE as well if we look at
the frame level. The sample rate for most video codecs is 90 kHz. If
the frame rate is fixed, say, at 30 frames/second, the RTP TS
increment will always be n * 3000 (= 90000 / 30) between frames.
Note that one video frame is normally divided into several RTP
packets to achieve robustness against packet loss. Consequently,
several RTP packets can carry the same RTP TS. However, this does
not affect the usage of TS_STRIDE for RTP TS in packets belonging
to different video frames.
Therefore, the RTP TS can be downscaled by a factor of TS_STRIDE
before compression. This will save s bits for each compressed RTP
TS, where s = floor(log2(TS_STRIDE)). Following is the detailed
algorithm:
1) Initialization. The compressor sends to the decompressor the
value of TS_STRIDE (e.g. via in-band signalling, see packet
format section) and the absolute value of one RTP TS (TS_0). The
latter will be used by the decompressor to initialize TS_OFFSET
= TS_0 mod TS_STRIDE.
2) After initialization, the compressor no longer compresses the
original RTP TS values. Instead, it compresses the down-scaled
values: TS_scaled = RTP TS / TS_STRIDE. The compression method
could be either window based LSB encoding or the timer-based
encoding described in the next section, or anything else.
3) When receiving the compressed information of TS_scaled, the
decompressor first derives the value of the original TS_scaled.
Then the original RTP TS can be calculated as RTP TS = TS_scaled
* TS_STRIDE + TS_OFFSET.
4) Note that the wraparound of the 32-bit RTP TS will invalidate
the current value of TS_OFFSET used in the equation above. For
example, let's assume TS_STRIDE = 160 = 0xA0 and the current RTP
TS = 0xFFFFFFF0, which means TS_OFFSET = 0x50 = 80. Then if the
next RTP TS = 0x00000130 (i.e. the increment = 160 * 2 = 320),
the new TS_OFFSET should be 0x00000130 mod 0xA0 = 0x90 = 144,
instead of 80 (or 0x50). Therefore, the decompressor must detect
the wraparound of RTP TS (which is trivial) and update
TS_OFFSET.
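The scaling steps and the wraparound example in step 4 can be checked with a small Python sketch. The function names are hypothetical. The closed-form update in offset_after_wrap is an assumption derived from the arithmetic of the example (the text only requires that the decompressor detect the wraparound and update TS_OFFSET), and it presumes every increment is a multiple of TS_STRIDE.

```python
TS_MODULUS = 1 << 32  # RTP TS is a 32-bit field


def init_offset(ts_0, ts_stride):
    """Step 1: TS_OFFSET = TS_0 mod TS_STRIDE."""
    return ts_0 % ts_stride


def scale(rtp_ts, ts_stride):
    """Step 2: TS_scaled = RTP TS / TS_STRIDE (integer division)."""
    return rtp_ts // ts_stride


def unscale(ts_scaled, ts_stride, ts_offset):
    """Step 3: RTP TS = TS_scaled * TS_STRIDE + TS_OFFSET."""
    return ts_scaled * ts_stride + ts_offset


def offset_after_wrap(ts_offset, ts_stride):
    """Step 4 (assumed update rule): TS_OFFSET after one 32-bit wraparound."""
    return (ts_offset - TS_MODULUS) % ts_stride


stride = 0xA0
offset = init_offset(0xFFFFFFF0, stride)
print(offset)  # offset before the wraparound

next_ts = (0xFFFFFFF0 + 2 * stride) % TS_MODULUS  # wraps to 0x00000130
offset = offset_after_wrap(offset, stride)
print(hex(next_ts), offset)
print(hex(unscale(scale(next_ts, stride), stride, offset)))
```

The offset changes because 2^32 is not a multiple of 160, so the timestamps after the wraparound fall on a different residue class modulo TS_STRIDE.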
In general, this method can be applied to many frame-based codecs.
However, the value of TS_STRIDE might (though unlikely) change
during a session. If that happens, the RTP TS has to be compressed
unscaled until the re-initialization of the new TS_STRIDE and
TS_OFFSET is done.
4.5.4 Timer-Based Compression of RTP Timestamp
By the definition of RTP TS [RFC 1889], the timestamp (when generated
at the source) should closely follow a linear pattern as a function
of the time-of-day clock, particularly when speech is being carried
in the RTP payload. The linear ratio is basically determined by the
source sample rate (assumed fixed), but may be complicated by
packetization (e.g. in the case of video) or some frame
rearrangement (e.g. B-frames in some video codecs).
As in the example shown in the previous section, with a fixed sample
rate of 8 kHz, 20 ms in the time domain is equivalent to an increment
of 1 in the scaled RTP TS domain, or 160 in the unscaled RTP TS
domain.
Consequently, the (scaled) RTP TS in headers arriving at the
decompressor also follows a linear pattern as a function of time, but
less closely, due to the delay jitter between the source and the
decompressor. In normal operation (no crashes or failures), the delay
jitter is bounded, to meet the requirements of conversational
real-time traffic. Hence, by using a local clock to measure packet
arrival time, the decompressor can obtain an approximation of the
(scaled) RTP TS in the header to be decompressed. The approximation
is then refined with the k LSBs of the (scaled) RTP TS carried in the
header. The required value of k to ensure correct decompression is a
function of the jitter between the source and decompressor. The
compressor can estimate the jitter (using a local clock) and
determine k, or alternatively it can have a fixed k, and filter out
the packets with excessive jitter.
The advantages to this scheme:
- The size of the compressed RTP TS is constant and small. In
  particular, it does NOT depend on the length of the silence
  interval. This is in contrast to other RTP TS compression
  techniques, which require a variable number of bits dependent
  on the duration of the preceding silence interval.
- No synchronization is required between the two clocks: one
  local to the compressor and the other to the decompressor.
Note that although this scheme can work in theory with both scaled
and unscaled RTP TS, it is preferable in practice to be combined with
scaled RTP TS encoding because of the less demanding requirement on
the clock resolution (e.g. 20 ms instead of 1/8 ms). Therefore, the
algorithm described below assumes that the scheme works on top of
scaled RTP TS. (The case of unscaled RTP TS is very similar,
with only slight changes to the scale factors.)
Compressor: major task is to determine the value of k.
1) The compressor maintains a sliding window TSW = {(T_j, t_j) |
for each header j sent previously with a CRC}, where T_j = the
scaled RTP TS for header j, t_j = the arrival time of header j.
The TSW has the same purpose as VSW (section 4.5.2).
2) When a new header n arrives with T_n as the scaled RTP TS, the
compressor measures the arrival time t_n. Then it calculates
Max_Jitter_BC = max {|(T_n – T_j) – (t_n – t_j) / TIME_STRIDE|},
where TIME_STRIDE is a period of real time (e.g. 20 ms) that is
equivalent to one TS_STRIDE. Max_Jitter_BC is the maximum jitter
before the compressor (in units of TS_STRIDE), using all the
headers in the sliding window as reference.
3) k is then calculated as k = ceiling(log2(2 * J + 1)), where J =
Max_Jitter_BC + Max_Jitter_CD + 2.
Max_Jitter_CD is the upper bound of the jitter expected on the
communication channel between compressor and decompressor (CD-CC).
It depends only on the characteristics of the CD-CC and is
expected to be reasonably small in order to have good quality
for real-time services.
The additive term 2 accounts for the quantization error caused
by the clocks at the compressor and decompressor, which can be
+/- 1.
Note that the calculation of k follows the same compression
algorithm as described in section 4.5.1, with p = 2^(k-1) – 1.
4) The pair (T_n, t_n) will be added to TSW if header n is sent
with a CRC. TSW is advanced in the same way as described in section
4.5.2.
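The compressor's calculation of k in steps 2) and 3) might be sketched in Python as below. The function names and the millisecond time unit are hypothetical choices for illustration. Rounding Max_Jitter_BC up to a whole number of TS_STRIDE units before forming J is also an assumption; the text does not state how fractional jitter is handled.

```python
import math


def max_jitter_bc(tsw, T_n, t_n, time_stride):
    """max over (T_j, t_j) in TSW of |(T_n - T_j) - (t_n - t_j) / TIME_STRIDE|.

    tsw must be non-empty; times and time_stride share one unit (e.g. ms).
    """
    return max(abs((T_n - T_j) - (t_n - t_j) / time_stride) for T_j, t_j in tsw)


def timer_based_k(tsw, T_n, t_n, time_stride, max_jitter_cd):
    """k = ceiling(log2(2 * J + 1)), J = Max_Jitter_BC + Max_Jitter_CD + 2."""
    # Assumption: fractional jitter is rounded up to whole TS_STRIDE units.
    j = math.ceil(max_jitter_bc(tsw, T_n, t_n, time_stride)) + max_jitter_cd + 2
    return math.ceil(math.log2(2 * j + 1))


# Two earlier headers (scaled TS, arrival time in ms), 20 ms per TS_STRIDE.
tsw = [(10, 200), (11, 221)]
print(max_jitter_bc(tsw, T_n=12, t_n=245, time_stride=20))
print(timer_based_k(tsw, T_n=12, t_n=245, time_stride=20, max_jitter_cd=2))
```

In this example header 12 arrives 5 ms later than the linear pattern predicts relative to header 10, which is a quarter of one TS_STRIDE; most of J therefore comes from the channel jitter bound and the quantization term.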
Decompressor:
1) It always uses as the reference header the last correctly (as
verified by CRC) decompressed header. It maintains the pair (T_ref,
t_ref), where T_ref = the scaled RTP TS in the reference header,
t_ref = the arrival time of the reference header.
2) When receiving a compressed header n at time t_n, it calculates
the approximation of the original scaled TS as: T_approx = T_ref
+ (t_n – t_ref) / TIME_STRIDE.
3) The approximation is then refined by the k LSBs carried in
header n, following the same decompression algorithm as in
section 4.5.1, with p = 2^(k-1) – 1.
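The decompressor steps above can be sketched in Python as follows. The function name timer_based_decode is hypothetical, times are assumed to be in milliseconds on the decompressor's local clock, and rounding the elapsed time to the nearest whole TS_STRIDE is an assumption made for illustration.

```python
def timer_based_decode(lsbs, k, T_ref, t_ref, t_n, time_stride):
    """Recover the scaled RTP TS from k LSBs and the local arrival time."""
    # Step 2: T_approx = T_ref + (t_n - t_ref) / TIME_STRIDE
    # (rounded to the nearest integer; an assumption).
    T_approx = T_ref + round((t_n - t_ref) / time_stride)
    # Step 3: LSB decoding around T_approx with p = 2^(k-1) - 1, i.e. the
    # interval [T_approx - (2^(k-1) - 1), T_approx + 2^(k-1)].
    p = (1 << (k - 1)) - 1
    lo = T_approx - p
    return lo + ((lsbs - lo) % (1 << k))


# Reference header: scaled TS 10, received at local time 1200 ms.
# A header arriving 47 ms later is expected to carry a scaled TS near 12.
print(timer_based_decode(12 & 0xF, 4, T_ref=10, t_ref=1200, t_n=1247, time_stride=20))
# The same arrival time still decodes a jittered header with scaled TS 14.
print(timer_based_decode(14 & 0xF, 4, T_ref=10, t_ref=1200, t_n=1247, time_stride=20))
```

With p = 2^(k-1) - 1 the interval is nearly symmetric around the approximation, so the decoder tolerates a header that arrives early or late by about the same margin.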
Note that the algorithm does not assume any special behavior of the
input to the compressor (i.e. it tolerates reordering, or more
generally, non-increasing RTP timestamp behavior observed prior to
the compressor). Besides, the clock resolution is allowed to be worse
than TIME_STRIDE, in which case the difference (i.e. actual
resolution – TIME_STRIDE) can be considered as an additional jitter
in the calculation of k.
4.5.5 Offset IP-ID encoding
This section assumes that the IPv4 stack at the source host assigns
the IP-ID from a 2-byte counter which is increased by one after each
assignment. Therefore, the IP-ID field of a particular IPv4 packet
flow will be sequential (with an increment of 1 from packet to
packet) except when the sequence is disrupted by other flows.
Consequently, the observation is that RTP SN increases by 1 for each
packet and the IP-ID increases by at least the same amount. Thus, it
is more efficient to compress the offset, i.e. (IP-ID - RTP SN),
instead of the IP-ID itself.
The following text describes how to compress/decompress the sequence
of offsets using window-based LSB encoding/decoding, with p = p2 (see
section 4.5.1).
Compressor:
The compressor maintains a sliding window of IP-ID offsets,
W = {offset_i = ID_i - SN_i | for each header i sent out with a CRC},
where ID_i and SN_i are the values of IP-ID and RTP SN in header i,
respectively.
When compressing packet n in the FO state, it calculates offset_n
= ID_n - SN_n, then compresses offset_n using window-based LSB
encoding, with W as the context.
Decompressor:
Reference header = the last correctly (as verified by CRC)
decompressed header.
When receiving a compressed packet m, it calculates the offset_ref
= ID_ref – SN_ref, where ID_ref and SN_ref are the values of IP-ID
and RTP SN in the reference header, respectively. Then, it
decompresses the IP-ID offset_m following the window-based LSB
decoding, using the LSBs in the packet m and offset_ref as the
decompression context.
Finally, it regenerates IP-ID for packet m = RTP SN in packet m +
decompressed offset_m.
Note that some IPv4 stacks use little endian for the IP-ID field,
instead of big endian (the network byte order). In that case, the
compressor can compress the IP-ID field after swapping the bytes.
Consequently, the decompressor will also swap the bytes after
decompression to regenerate the original IP-ID. This trick works only
if the compressor and the decompressor agree on the byte order
of the IP-ID field (see header format section).
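The offset computation and the byte swap described above can be illustrated with a few lines of Python. The function names are hypothetical. Doing the arithmetic modulo 2^16 is an assumption based on both IP-ID and RTP SN being 16-bit fields; the text does not spell out wraparound handling.

```python
MASK16 = 0xFFFF  # IP-ID and RTP SN are both 16-bit fields


def ip_id_offset(ip_id, rtp_sn):
    """offset = IP-ID - RTP SN, taken modulo 2^16 (assumption)."""
    return (ip_id - rtp_sn) & MASK16


def regenerate_ip_id(rtp_sn, offset):
    """IP-ID = RTP SN + decompressed offset, modulo 2^16."""
    return (rtp_sn + offset) & MASK16


def swap_bytes(ip_id):
    """Exchange the two bytes of a 16-bit IP-ID."""
    return ((ip_id & 0xFF) << 8) | (ip_id >> 8)


# While IP-ID and RTP SN both advance by one, the offset stays constant,
# so very few LSBs are needed to convey it.
print(hex(ip_id_offset(0x1234, 0x0100)), hex(ip_id_offset(0x1235, 0x0101)))

# A stack that increments the IP-ID in little-endian order shows 0x3412
# followed by 0x3512 on the wire; swapping restores a sequential counter.
print(hex(swap_bytes(0x3412)), hex(swap_bytes(0x3512)))
```

The offset would then be fed to the window-based LSB encoder of section 4.5.2 with p = p2, which leaves room for the occasional jump when other flows consume IP-ID values.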