RTP payload format for SVC

D1.2: Understanding the RTP packetization (encapsulation) of SVC
V 1.2
Contents
1. Real-Time Protocol (RTP)
   RTP runs on top of UDP
   RTP Example
   RTP and QoS
   RTP Streams
   RTP Header
   Real-Time Control Protocol (RTCP)
   RTCP Packets
   Synchronization of Streams
   RTCP Bandwidth Scaling
2. RTP Payload Format for H.264 Video
   NAL unit type
3. RTP payload format for SVC
   NAL unit header: 3 octets extended
References
1. Real-Time Protocol (RTP)
RTP specifies a packet structure for packets carrying audio and video data. It is defined in
RFC 3550 [1], which replaces RFC 1889.
An RTP packet provides:
o payload type identification
o packet sequence numbering
o timestamping
RTP runs in the end systems.
RTP packets are encapsulated in UDP segments.
Interoperability: if two Internet phone applications both run RTP, they may be able to
work together.
RTP runs on top of UDP
RTP libraries provide a transport-layer interface
that extends UDP:
• port numbers, IP addresses
• error checking across the segment
• payload type identification
• packet sequence numbering
• time-stamping
RTP Example
This example is taken from [2].
Consider sending 64 kbps PCM-encoded voice over RTP.
The application collects the encoded data in chunks, e.g., one chunk every 20 msec, which
gives 160 bytes per chunk.
The audio chunk together with the RTP header forms the RTP packet, which is encapsulated
in a UDP segment.
The RTP header indicates the type of audio encoding in each packet; senders can change the
encoding during a conference. The RTP header also contains sequence numbers and timestamps.
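The chunk size in this example follows directly from the bit rate and the packetization interval. The short Python sketch below reproduces that arithmetic. The header sizes (12 bytes for an RTP header without CSRC entries, 8 bytes for UDP) are standard values added here for illustration; they are not given in the example itself.

```python
def chunk_size_bytes(bitrate_bps: int, interval_ms: int) -> int:
    """Bytes of encoded audio produced during one packetization interval."""
    bits = bitrate_bps * interval_ms // 1000
    return bits // 8


# Assumed header sizes (not stated in the example above).
RTP_HEADER_BYTES = 12  # fixed RTP header without CSRC entries
UDP_HEADER_BYTES = 8

payload_bytes = chunk_size_bytes(64_000, 20)
udp_segment_bytes = UDP_HEADER_BYTES + RTP_HEADER_BYTES + payload_bytes
packets_per_second = 1000 // 20

print(payload_bytes, udp_segment_bytes, packets_per_second)  # 160 180 50
```

With these assumptions, each 160-byte chunk travels in a 180-byte UDP segment, and the sender emits 50 packets per second.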
RTP and QoS
RTP does not provide any mechanism to ensure timely delivery of data or to provide other
quality of service guarantees.
RTP encapsulation is only seen at the end systems; it is not seen by intermediate
routers.
Routers providing the Internet's traditional best-effort service do not make any special
effort to ensure that RTP packets arrive at the destination in a timely manner.
In order to provide QoS to an application, the Internet must provide a mechanism, such
as RSVP, for the application to reserve network resources.
RTP Streams
RTP allows each source (for example, a camera or a microphone) to be assigned its own
independent RTP stream of packets.
For example, for a videoconference between two participants, four RTP streams
could be opened: two streams for transmitting the audio (one in each direction)
and two streams for the video (again, one in each direction).
However, some popular encoding techniques -- including MPEG1 and MPEG2 -- bundle
the audio and video into a single stream during the encoding process. When the audio
and video are bundled by the encoder, then only one RTP stream is generated in each
direction.
For a many-to-many multicast session, all of the senders and sources typically send
their RTP streams into the same multicast tree with the same multicast address.
RTP Header
Payload Type (7 bits): Indicates the type of encoding that is currently being used.
If a sender changes the encoding in the middle of a conference, the sender informs the
receiver through this payload type field.
• Payload type 31: H.261
• Payload type 33: MPEG-2 transport stream (often listed simply as MPEG2 video)
Sequence Number (16 bits): The sequence number increments by one for each RTP
packet sent; it may be used to detect packet loss and to restore the packet sequence.
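As an illustration of how a receiver might read these fields, the following Python sketch unpacks the 12-byte fixed RTP header. The complete field layout is taken from RFC 3550 rather than from the two fields described above. The names `RtpHeader`, `parse_rtp_header` and `packets_lost_between` are hypothetical helpers introduced for this sketch.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (layout per RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,  # 7-bit payload type
        sequence_number=seq,     # 16-bit sequence number
        timestamp=ts,
        ssrc=ssrc,
    )


def packets_lost_between(prev_seq: int, cur_seq: int) -> int:
    """Packets missing between two packets received in order (modulo 2**16)."""
    return (cur_seq - prev_seq - 1) % 65536


# Version 2, payload type 31 (H.261), sequence number 65535.
example = struct.pack("!BBHII", 0x80, 31, 65535, 160, 0x1234)
header = parse_rtp_header(example)
```

The loss count assumes the two packets arrived in order; a reordered packet would show up as a very large gap, so a real receiver needs additional logic for that case.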
Real-Time Control Protocol (RTCP)
Works in conjunction with RTP.
Each participant in an RTP session periodically transmits RTCP control packets to all
other participants. Each RTCP packet contains sender and/or receiver reports that report
statistics useful to the application.
Statistics include number of packets sent, number of packets lost, interarrival jitter, etc.
This feedback of information to the application can be used to control performance and
for diagnostic purposes.
The sender may modify its transmissions based on the feedback.
- For an RTP session there is typically a single multicast address; all RTP and RTCP packets
belonging to the session use the multicast address.
- RTP and RTCP packets are distinguished from each other through the use of distinct port
numbers.
- To limit traffic, each participant reduces its RTCP traffic as the number of conference
participants increases.
RTCP Packets
Receiver report packets:
• fraction of packets lost, last sequence number, average interarrival jitter.
Sender report packets:
• SSRC of the RTP stream, the current time, the number of packets sent, and the
number of bytes sent.
Source description packets:
• e-mail address of the sender, the sender's name, the SSRC of the associated RTP
stream. Packets provide a mapping between the SSRC and the user/host name.
Synchronization of Streams
• RTCP can be used to synchronize different media streams within an RTP session.
• Consider a videoconferencing application for which each sender generates one RTP
stream for video and one for audio.
• The timestamps in these RTP packets are tied to the video and audio sampling
clocks, and are not tied to the wall-clock time (i.e., to real time).
• Each RTCP sender-report packet contains, for the most recently generated packet in
the associated RTP stream, the timestamp of the RTP packet and the wall-clock time
for when the packet was created. Thus the RTCP sender-report packets associate the
sampling clock with the real-time clock.
• Receivers can use this association to synchronize the playout of audio and video.
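The association described above can be sketched as a simple mapping from an RTP timestamp to wall-clock time. In this Python sketch, `SenderReport` and `to_wallclock` are hypothetical names. The clock rates (8 kHz audio, 90 kHz video) are assumptions for illustration: the rate is a property of the payload format and is not carried in the sender report itself.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    rtp_timestamp: int   # RTP timestamp carried in the sender report
    wallclock_s: float   # wall-clock time (seconds) for that same instant
    clock_rate_hz: int   # sampling clock rate of the stream (assumed known)


def to_wallclock(rtp_ts: int, sr: SenderReport) -> float:
    """Map an RTP timestamp to wall-clock time using the latest sender report."""
    # Signed 32-bit difference, so timestamp wrap-around and packets that are
    # slightly older than the report are both handled.
    delta = (rtp_ts - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr.wallclock_s + delta / sr.clock_rate_hz


audio_sr = SenderReport(rtp_timestamp=16_000, wallclock_s=100.0, clock_rate_hz=8_000)
video_sr = SenderReport(rtp_timestamp=900_000, wallclock_s=100.5, clock_rate_hz=90_000)

audio_time = to_wallclock(16_160, audio_sr)   # 160 ticks after the report
video_time = to_wallclock(945_000, video_sr)  # 45000 ticks after the report
```

Once both streams are expressed on the same wall-clock axis, a receiver can schedule audio and video samples with equal wall-clock times for simultaneous playout.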
RTCP Bandwidth Scaling
RTCP attempts to limit its traffic to 5% of the session bandwidth.
For example, suppose there is one sender, sending video at a rate of 2 Mbps. Then RTCP
attempts to limit its traffic to 100 kbps.
The protocol gives 75% of this rate, or 75 kbps, to the receivers; it gives the remaining
25% of the rate, or 25 kbps, to the sender.
The 75 kbps devoted to the receivers is shared equally among the receivers. Thus, if
there are R receivers, each receiver gets to send RTCP traffic at a rate of 75/R kbps,
and the sender gets to send RTCP traffic at a rate of 25 kbps.
A participant (a sender or receiver) determines its RTCP packet transmission period by
dynamically calculating the average RTCP packet size (across the entire session) and
dividing the average RTCP packet size by its allocated rate.
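The scaling rule can be written out as a small calculation. The Python sketch below follows the single-sender case described above. The figures of 100 receivers and a 100-byte average RTCP packet are hypothetical values chosen for illustration, and the minimum interval and randomization that RFC 3550 additionally applies are left out.

```python
def rtcp_share_kbps(session_bw_kbps: float, n_receivers: int, is_sender: bool) -> float:
    """RTCP rate allotted to one participant (single-sender case)."""
    rtcp_bw_kbps = 0.05 * session_bw_kbps      # 5% of the session bandwidth
    if is_sender:
        return 0.25 * rtcp_bw_kbps             # 25% goes to the sender
    return 0.75 * rtcp_bw_kbps / n_receivers   # 75% is shared by the receivers


def rtcp_period_s(avg_rtcp_packet_bytes: float, share_kbps: float) -> float:
    """Transmission period: average RTCP packet size divided by the allotted rate."""
    return (avg_rtcp_packet_bytes * 8) / (share_kbps * 1000)


# 2 Mbps session, hypothetical 100 receivers and 100-byte average RTCP packet.
receiver_share = rtcp_share_kbps(2000, n_receivers=100, is_sender=False)
sender_share = rtcp_share_kbps(2000, n_receivers=100, is_sender=True)
receiver_period = rtcp_period_s(100, receiver_share)
sender_period = rtcp_period_s(100, sender_share)
```

Under these assumptions each receiver is allotted about 0.75 kbps and reports roughly once per second, while the sender is allotted about 25 kbps.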
2. RTP Payload Format for H.264 Video
The content of this section is taken from [3].
Internally, the Network Abstraction Layer (NAL) uses NAL units. A NAL unit consists of a
one-byte header and the payload byte string. The header indicates the type of the NAL unit,
the (potential) presence of bit errors or syntax violations in the NAL unit payload, and
information regarding the relative importance of the NAL unit for the decoding process.
This RTP payload specification is designed to be unaware of the bit string in the NAL unit
payload.
Some concepts should be noted:
Access unit: includes the primary coded picture
Time: SEI messages plus the RTP timestamp
SEI messages:
The picture timing SEI message enables carriage of multiple timestamps for the same
coded picture, and therefore the 3:2 pulldown process is perfectly controlled. The picture
timing SEI message mechanism is necessary because only one timestamp per coded frame can
be conveyed in the RTP timestamp.
The most important concept in this memo is the NAL unit type.
NAL unit type: the last 5 bits of the NAL unit header
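To make the header layout concrete, here is a minimal Python sketch that splits the one-byte NAL unit header into its three fields and maps the 5-bit type onto the packet kinds summarized in Table 1. The field names follow H.264 and RFC 3984; `parse_nal_header` and `packet_kind` are hypothetical helper names.

```python
def parse_nal_header(first_byte: int) -> tuple:
    """Split the one-byte NAL unit header into (F, NRI, type)."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01  # signals errors or syntax violations
    nal_ref_idc = (first_byte >> 5) & 0x03         # relative importance for decoding
    nal_unit_type = first_byte & 0x1F              # last 5 bits
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


def packet_kind(nal_unit_type: int) -> str:
    """Map a NAL unit type to the packet kinds of RFC 3984, Table 1."""
    if 1 <= nal_unit_type <= 23:
        return "single NAL unit"
    names = {24: "STAP-A", 25: "STAP-B", 26: "MTAP16",
             27: "MTAP24", 28: "FU-A", 29: "FU-B"}
    return names.get(nal_unit_type, "undefined")


f_bit, nri, nal_type = parse_nal_header(0x65)  # 0x65 = 0 11 00101
print(f_bit, nri, nal_type, packet_kind(nal_type))  # 0 3 5 single NAL unit
```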
Table 1. Summary of NAL unit types and their payload structures

Type   Packet     Type name                          Section
--------------------------------------------------------------
0      undefined                                     -
1-23   NAL unit   Single NAL unit packet per H.264   5.6
24     STAP-A     Single-time aggregation packet     5.7.1
25     STAP-B     Single-time aggregation packet     5.7.1
26     MTAP16     Multi-time aggregation packet      5.7.2
27     MTAP24     Multi-time aggregation packet      5.7.2
28     FU-A       Fragmentation unit                 5.8
29     FU-B       Fragmentation unit                 5.8
30-31  undefined                                     -

Packetization Modes
This memo specifies three packetization modes:
o Single NAL unit mode
o Non-interleaved mode
o Interleaved mode
The single NAL unit mode is targeted for conversational systems that comply with
ITU-T Recommendation H.241.
The non-interleaved mode is targeted for conversational systems that may not
comply with ITU-T Recommendation H.241.
In the non-interleaved mode, NAL units are transmitted in NAL unit decoding order.
The interleaved mode is targeted for systems that do not require very low end-to-end
latency. The interleaved mode allows transmission of NAL units out of NAL unit
decoding order.
Fragmentation Units (FUs): a special case
This payload type allows fragmenting a NAL unit into several RTP packets. Doing
so on the application layer instead of relying on lower layer fragmentation (e.g., by
IP) has the following advantages:
o The payload format is capable of transporting, over an IPv4 network, NAL units bigger
than 64 kbytes, which may be present in pre-recorded video, particularly in High
Definition formats (there is a limit on the number of slices per picture, which
results in a limit on the number of NAL units per picture, which may result in big NAL units).
o The fragmentation mechanism allows fragmenting a single picture and applying
generic forward error correction as described in section 12.5 of [3].
The fragments of a NAL unit MUST be reassembled in RTP sequence number order.
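A minimal Python sketch of FU-A fragmentation and reassembly is given below. The two-byte fragment prefix (an FU indicator carrying F, NRI and type 28, followed by an FU header carrying the start bit, the end bit and the original NAL unit type) follows RFC 3984 and is not spelled out in the text above. The function names and the `max_fragment_payload` parameter are hypothetical.

```python
from typing import List

FU_A = 28  # NAL unit type of an FU-A packet


def fragment_fu_a(nal_unit: bytes, max_fragment_payload: int) -> List[bytes]:
    """Split one NAL unit into FU-A payloads, one per RTP packet."""
    if max_fragment_payload < 1:
        raise ValueError("max_fragment_payload must be positive")
    header, body = nal_unit[0], nal_unit[1:]
    if len(body) <= max_fragment_payload:
        # A NAL unit must not be carried in a single FU.
        raise ValueError("NAL unit fits in one packet; send it unfragmented")
    fu_indicator = (header & 0xE0) | FU_A  # keep F and NRI, set type to 28
    original_type = header & 0x1F
    chunks = [body[i:i + max_fragment_payload]
              for i in range(0, len(body), max_fragment_payload)]
    payloads = []
    for index, chunk in enumerate(chunks):
        start_bit = 0x80 if index == 0 else 0x00
        end_bit = 0x40 if index == len(chunks) - 1 else 0x00
        fu_header = start_bit | end_bit | original_type
        payloads.append(bytes([fu_indicator, fu_header]) + chunk)
    return payloads


def reassemble_fu_a(payloads: List[bytes]) -> bytes:
    """Rebuild the NAL unit; payloads must be in RTP sequence number order."""
    if not payloads[0][1] & 0x80:
        raise ValueError("first fragment lacks the start bit")
    if not payloads[-1][1] & 0x40:
        raise ValueError("last fragment lacks the end bit")
    header = (payloads[0][0] & 0xE0) | (payloads[0][1] & 0x1F)
    return bytes([header]) + b"".join(p[2:] for p in payloads)


nal = bytes([0x65]) + bytes(range(10))  # type 5 header plus 10 payload bytes
fragments = fragment_fu_a(nal, max_fragment_payload=4)
```

The sketch assumes that no fragment was lost; a receiver that detects a gap in the sequence numbers would normally discard the whole NAL unit.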
3. RTP payload format for SVC
The content of this section is taken from [4].
In the SVC case, the base layer is anticipated to conform to a non-scalable profile of
H.264/AVC. The enhancement layers conform to the SVC specification.
NAL unit header: 3 octets extended
The first octet is like the AVC NAL unit header. In H.264/AVC, the NAL unit types 20 and 21
(among others) were reserved for future extensions. SVC uses these two NAL unit types to
indicate the presence of the 2nd, extension octet:
P: priority
D: discardable
E: flag signalling the presence of the 3rd octet
temporal_level (TL) indicates the temporal scalability layer. A layer consisting of NAL
units that carry pictures with a smaller TL value has a lower frame rate.
The 3rd octet adds more dependency information.
dependency_id (DID) indicates the inter-layer coding dependency hierarchy. At any temporal
location, a picture with a lower DID value may be used for inter-layer prediction when
coding a picture with a higher DID value.
quality_level (QL) indicates the FGS layer hierarchy. At any temporal location and with
identical dependency_id value, an FGS picture with quality_level value equal to QL uses the
FGS picture or base quality picture (the non-FGS picture when QL-1 = 0) with quality_level
value equal to QL-1 for inter-layer prediction. When QL is larger than 0, the NAL unit
contains an FGS slice or part of one.
(Taken from Section 2 (Network abstraction layer) of [4].)
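The extended header can be illustrated with a short parsing sketch in Python. The bit widths used here (6 bits for P, 1 bit each for D and E, then 3 bits for TL, 3 bits for DID and 2 bits for QL) are an assumption made for illustration only; the field list above does not give them, and they should be checked against the draft [4]. `SvcNalHeader` and `parse_svc_nal_header` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SvcNalHeader(NamedTuple):
    nal_unit_type: int
    priority: int                  # P
    discardable: bool              # D
    temporal_level: Optional[int]  # TL, only present when E is set
    dependency_id: Optional[int]   # DID, only present when E is set
    quality_level: Optional[int]   # QL, only present when E is set


SVC_NAL_TYPES = (20, 21)  # types reserved in H.264/AVC and used by SVC


def parse_svc_nal_header(data: bytes) -> SvcNalHeader:
    """Parse a 2- or 3-octet SVC NAL unit header (bit widths are ASSUMED)."""
    if len(data) < 2:
        raise ValueError("SVC NAL unit header needs at least 2 octets")
    nal_unit_type = data[0] & 0x1F
    if nal_unit_type not in SVC_NAL_TYPES:
        raise ValueError("not an SVC NAL unit type")
    second = data[1]
    priority = second >> 2              # assumed: upper 6 bits
    discardable = bool(second & 0x02)   # assumed: next bit
    has_third_octet = bool(second & 0x01)
    if not has_third_octet:
        return SvcNalHeader(nal_unit_type, priority, discardable, None, None, None)
    if len(data) < 3:
        raise ValueError("E flag set but 3rd octet missing")
    third = data[2]
    return SvcNalHeader(
        nal_unit_type, priority, discardable,
        temporal_level=third >> 5,           # assumed: 3 bits
        dependency_id=(third >> 2) & 0x07,   # assumed: 3 bits
        quality_level=third & 0x03,          # assumed: 2 bits
    )


svc_header = parse_svc_nal_header(bytes([0x74, 0b00010111, 0b01001010]))
```

A media-aware network element could, under the same assumptions, drop NAL units whose TL, DID or QL exceeds what a given receiver can use without looking any further into the payload.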
The other sections of that paper discuss problems in transmission, for example three use
cases that run into problems with firewall pinholes.
Payload specific signaling
In its current form, the SVC RTP payload draft (Wenger and Wang, 2005) follows the other
possible avenue, which is a payload specific solution. The draft suggests that each layer (or
group of layers, the draft is not specific in this regard) be transported in its own RTP
session, which is announced/negotiated as an independent SDP media description. From an
SDP and higher protocol viewpoint, all these descriptions appear to be independent media
streams. Their relationship is described in a single SDP attribute, which carries a binary
description of the layering structure in BASE64 format. The content of this attribute is not
accessible to non-SVC-aware mechanisms.
NAL UNIT AGGREGATION AND FRAGMENTATION
Once more, we believe that the difficulty of defining more packet types can most easily be
overcome by disallowing more than one layer in one RTP session. Alternatively, it could be
possible to require that aggregation only be performed with NAL units belonging to a single
layer. Finally, it could be argued that any devices that wish to re-arrange RTP packets must
necessarily be media aware and therefore can be required to look into the media data
themselves.
Hence, no support from a payload viewpoint is required. This is the reason why (Wenger
and Wang, 2005) does not include new aggregation and fragmentation packets.
References
[1] RFC 3550: RTP
[2] http://connekgroup.net/documents/ebook/computer_science/networking/top_down/PRESENTATIONS/CHAPTER6A.PPT
[3] RFC 3984: RTP Payload Format for H.264 Video
[4] RTP payload format for H.264/SVC scalable video coding [draft],
Stephan Wenger, Ye-Kui Wang, Miska M. Hannuksela