An Alternative Complexity Model for the MPEG-4 Video Verifier Mechanism
João Valentim, Paulo Nunes, Fernando Pereira
Instituto Superior Técnico (IST) – Instituto de Telecomunicações
Av. Rovisco Pais, 1049-001 Lisboa, Portugal
Phone: +351 21 841 84 60; Fax: +351 21 841 84 72
e-mail: {joao.valentim, paulo.nunes, fernando.pereira}@lx.it
ABSTRACT
MPEG-4 is the first object-based audiovisual coding standard.
To control the minimum decoding resources required
at the decoder, the MPEG-4 Visual standard defines the so-called
Video Complexity Verifier (VCV). This paper proposes
an alternative VCV model, based on a set of relative macroblock
(MB) complexity weights assigned to the various MB coding
types used in MPEG-4 video coding. The new VCV model
allows a more efficient use of the available decoding resources
by preventing the over-evaluation of the decoding complexity of
certain MB types, thus making it possible to encode scenes (for
the same profile@level decoding resources) that would otherwise
be considered too demanding.
1. INTRODUCTION
In MPEG-4, the first object-based audiovisual coding standard,
the various video objects composing a scene may vary in size
over time and may be encoded at different temporal rates using
different MB coding types. To limit the decoding complexity of
the corresponding bitstreams, it is then necessary to put some
limits on the variability of the number and type of MB/s as well
as on the bitrate and on the picture memory required to store the
decoded data. These limits are specified in the MPEG-4 Visual
standard [1], through the Video Buffering Verifier (VBV)
mechanism; within this mechanism, the Video Complexity
Verifier (VCV) model deals with the limits related to the
decoding complexity. Although the decoding complexity
associated with the various MB coding tools varies quite a lot,
the MPEG-4 VCV model only distinguishes the boundary MBs
(B-VCV buffer) from all other types of MBs (VCV buffer
including all MBs). This way, different MB coding types with
very different levels of complexity are treated exactly in the same
way by the VCV model, e.g. transparent and opaque
macroblocks. This represents an over-dimensioning of the
decoding complexity for some MBs and thus a very inefficient
use of the available decoding resources.
This paper proposes an alternative to the current MPEG-4
VCV model [1] that exploits the relative decoding complexity of
the various MB coding types used in MPEG-4 video coding.
This model is based on a closer estimate of the actual decoding
complexity of the various video objects composing a scene and
thus allows a much better use of the decoding resources, which
may be a critical factor in application environments where
resources are scarce and expensive, such as mobile and smart
card applications.
2. RELATIVE MACROBLOCK COMPLEXITY WEIGHTS
In order to define a more efficient VCV model, it is necessary to
have a credible measure of the relative decoding complexity for
the various MB coding types. In [2], the decoding time of each
MB type obtained with an optimized version of the MoMuSys
decoder [3] for several representative MPEG-4 test sequences
and different profile@level combinations has been used as the
decoding complexity measure. The complexity results confirm
that the decoding complexity of the various types of non-boundary
MBs is not the same, contrary to what the current
MPEG-4 VCV model assumes. Because the MPEG-4 VCV model
does not distinguish the various types of non-boundary MBs in
terms of decoding complexity, it implicitly requires
MPEG-4 decoders to be designed for the most critical
case, i.e., always for the most complex MB coding type
of the profile in question.
Taking this fact into account, the complexity weights must be
defined relative to the most complex MB type in the context of
each profile, i.e., the complexity weight is set to 1 for
this MB type and all the other weights are relative to this one
and thus less than 1. This solution allows the implementation of
a “trading system”, where it is possible, for example, to trade one
of the most complex MBs for two MBs with half the relative
complexity, while still keeping the bitstream decodable by a
compliant decoder, i.e., without requiring higher
decoding resources.
The relative complexity weight for each MB complexity type is
obtained as the ratio between the maximum decoding time for
the considered type (a conservative solution, since most of
the time the MBs of that type will be less complex) and the
highest maximum decoding time among all the MB types relevant
for the profile in question: the Inter4V+InterCAE type for the
Core Profile and the Inter4V type for the Simple Profile [2].
Table 1 shows the relative decoding complexity weights
assigned to the various MB coding classes as defined in
[2].
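The ratio described above, and the "trading" it enables, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the MB type labels and the decoding times are hypothetical and are not measurements from [2].

```python
def relative_weights(max_decode_time):
    """Weight of each MB type = its maximum decoding time divided by the
    highest maximum decoding time among all types (which gets weight 1.0)."""
    reference = max(max_decode_time.values())
    return {mb_type: t / reference for mb_type, t in max_decode_time.items()}


def equivalent_cost(mb_counts, weights):
    """Weighted number of MBs, expressed in units of the most complex type."""
    return sum(weights[mb_type] * n for mb_type, n in mb_counts.items())


# Hypothetical maximum decoding times in arbitrary units (assumption).
example_times = {"type_a": 50.0, "type_b": 25.0, "type_c": 5.0}
weights = relative_weights(example_times)

# Trading: one MB of the most complex type costs as much as two MBs
# whose relative complexity weight is one half.
cost_one_a = equivalent_cost({"type_a": 1}, weights)
cost_two_b = equivalent_cost({"type_b": 2}, weights)
```

With these hypothetical times, `type_a` is the reference type, and replacing one `type_a` MB by two `type_b` MBs leaves the weighted cost unchanged.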
Table 1 – Relative MB decoding complexity weights (kj).

| MB complexity class (Cj) | MB coding types in each complexity class | kj, Simple Profile | kj, Core Profile |
|---|---|---|---|
| C1, C2, C3, C4 | Inter4V+InterCAE; Inter+InterCAE; Inter4V+IntraCAE; Inter+IntraCAE; Intra+IntraCAE; Inter4V+NoUpdate; Inter+NoUpdate; Intra+NoUpdate; Inter4V+Opaque; Inter+Opaque; Intra+Opaque | – | C1: 1.00; C2: 0.88; C3: 0.77; C4: 0.70 |
| C5 | Skipped+InterCAE | – | 0.40 |
| C6 | Skipped+IntraCAE | – | 0.32 |
| C7 | Skipped+NoUpdate | – | 0.21 |
| C8 | Skipped+Opaque | – | 0.12 |
| C9 | Transparent | – | 0.12 |
| C10, C11, C12 | Inter4V; Inter; Intra; Skipped (rectangular VOs only) | C10: 1.00; C11: 0.89; C12: 0.13 | C10: 0.66; C11: 0.59; C12: 0.09 |

The weights presented above have been defined in a rather
conservative way, by using the most complex coding type within
each MB complexity class, in order to stay within the limits even
if there is some variation due to different implementation
platforms and optimizations.

3. IST VCV MODEL: A MORE EFFICIENT SOLUTION

Using the relative MB decoding complexity weights presented in
[2], this paper proposes an alternative, more efficient VCV
model: the IST VCV model. This model is based on a single
buffer with a single decoder rate, using different MB complexity
weights for the various MB complexity classes [4]:

• Complexity model based on the MB coding tools – The
distinction in terms of decoding complexity between the
various MBs is associated with the different MB texture and
shape coding tools used, i.e., the MB complexity classes are
related to a texture-shape tool combination for which a
relative complexity weight is measured [2].

• Single buffer with relative MB complexity weights – In
the proposed model, a single buffer stores all the coded
MBs, but each MB is weighted according to its complexity
class. Thus, the IST VCV buffer occupancy corresponds to
a weighted sum of the number of coded MBs.

• Single decoding rate – The use of a single buffer with MB
complexity weights implies a single decoding rate. The IST
and MPEG-4 VCV decoding rates are the same, making it
possible to compare the two models in a simple way, since
the decoding computational resources remain the same.

The main advantage of the IST VCV solution, relative to the
MPEG-4 VCV solution, is that it models more closely the real
decoding complexity of a given set of bitstreams building a
visual scene, since the different types of MBs are distinguished
in terms of decoding complexity. Decoding resources are thus
not wasted due to the “killing” assumption that all MBs besides
boundary MBs are equally and maximally difficult (there are
big variations, as shown in Table 1).

For the IST VCV model proposed in this paper, the number of
equivalent MBs for a given Video Object Plane (VOP) i, Mi, that
is added to the VCV buffer at each decoding time instant, ti, is
given by expression (1)

$$M_i = \sum_{j=1}^{12} k_j \cdot Mc_j \qquad (1)$$

where kj is the complexity weight associated with MB class j
and Mcj is the number of MBs in VOP i belonging to class j. The
time it takes to decode VOP i is then given by equation (2)

$$td_i = \frac{M_i}{H} \qquad (2)$$

where Mi is the equivalent number of VOP i MBs, given by (1),
and H is the VCV decoding rate for the profile@level in
question. The time interval in which VOP i is being decoded
extends from time instant si to time instant ei, which are defined
by the expressions

$$s_i = t_i + \frac{VCV(t_i)}{H}, \qquad e_i = t_i + \frac{VCV(t_i) + M_i}{H} \qquad (3)$$

where ti is the VOP i decoding time, VCV(ti) is the VCV
occupancy when the VOP i MBs, Mi, are added to the VCV and
H is the VCV decoding rate for the profile@level in question.

Since the IST VCV decoding rate and buffer size are unchanged
relative to the MPEG-4 VCV model for each profile@level, a
direct comparison between the two models can easily be done,
because the decoder computational resources are maintained.
Besides the direct comparison of the VCV models, it is necessary
to ensure that the limits of the B-VCV model, which accumulates only
the boundary MBs, are not exceeded if precisely the same
decoding resources are to be used (this has been checked but, for
simplicity, the corresponding charts are not included in the figures of the next
section). A comparison between the two VCV models is
presented in the next section.
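Expressions (1) to (3) translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, the decoding rate and buffer occupancy are made-up values rather than profile@level figures from [1], and only two classes are used, with the Core Profile weights quoted in this paper for the reference type (1.00) and for transparent MBs (0.12).

```python
def equivalent_mbs(mb_counts, weights):
    """Expression (1): M_i as the weighted sum of the MB counts per class."""
    return sum(weights[cls] * count for cls, count in mb_counts.items())


def decoding_interval(t_i, vcv_occupancy, m_i, rate_h):
    """Expressions (2) and (3): decoding duration, start and end instants.

    vcv_occupancy is VCV(t_i), the occupancy (in equivalent MBs) found in
    the buffer when the M_i equivalent MBs of VOP i are added at time t_i.
    """
    td_i = m_i / rate_h
    s_i = t_i + vcv_occupancy / rate_h
    e_i = t_i + (vcv_occupancy + m_i) / rate_h
    return td_i, s_i, e_i


# Weights quoted in the text for the Core Profile; class labels are assumed.
core_weights = {"reference": 1.00, "transparent": 0.12}

# Hypothetical VOP: 20 MBs of the reference type and 50 transparent MBs.
m_i = equivalent_mbs({"reference": 20, "transparent": 50}, core_weights)

# Hypothetical decoding rate (MB/s) and buffer occupancy (equivalent MBs).
td_i, s_i, e_i = decoding_interval(t_i=0.0, vcv_occupancy=100.0,
                                   m_i=m_i, rate_h=1000.0)
```

In this example the 70 coded MBs count as only 26 equivalent MBs, and the end instant equals the start instant plus the decoding duration, as expressions (2) and (3) require.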
4. COMPARISON BETWEEN THE IST AND THE MPEG-4 VCV MODELS
The ideal way to compare and validate the IST VCV model
relative to the MPEG-4 VCV model would be to decode
bitstreams which, for a given profile@level, violate the
MPEG-4 VCV model but not the IST VCV model, and to show
that these scenes can be decoded in due time by a compliant
MPEG-4 decoder. This would show that the existing
profile@level decoding resources are sufficient and thus that the
MPEG-4 VCV model wastes these resources (due to complexity
over-dimensioning) when it prevents those bitstreams from being
classified as compliant with the profile@level in question.
However, this comparison and validation can only be done with
a real-time decoder, which is not available [3]. Thus, the
two VCV models were compared through
the occupancy of the two VCV buffers, which shows the
effects of the proposed approach under the assumption that the
measured complexity weights are acceptable (the authors believe
that the weights are conservative, meaning that the real
complexity is even lower than the IST VCV complexity). To
perform this comparison, an encoder with only the VBV (bitrate)
rate control in action was used. The feedback mechanism that
prevents the violation of the VCV and VMV (memory) models
was disabled in order to allow the visualization of the
corresponding buffer occupancy evolution even when it is above
100%. In this case, the coded bitstreams are the same
for both models, but the VCV MB decoding complexity
evaluation is done differently, depending on the considered
model. This comparison methodology makes it possible to verify that, in
many situations, the MPEG-4 VCV model exceeds 100%
buffer occupancy while the IST VCV model does not.
This means that the use of the IST VCV model would allow, for a
given profile@level, encoding in a “compliant way” (meaning
using the same decoding resources) video scenes that the
MPEG-4 VCV model would not allow due to its clear
over-dimensioning of the MB decoding complexity for certain MB
coding types. This is true when similar spatial resolutions and
temporal VOP rates are used, e.g., no VOP skipping.
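The occupancy comparison described above can be illustrated with a simplified Python sketch. It is not the encoder or verifier used by the authors: the buffer is assumed to drain linearly at the decoding rate between VOPs, every MB is assumed to count as one unit in the MPEG-4 VCV, and the rate, buffer size and MB counts are hypothetical. Only the weights 1.00 and 0.12 are taken from this paper.

```python
def occupancy_trace(vop_costs, vop_period, rate_h, buffer_size):
    """Return the buffer occupancy (in %) right after each VOP is added.

    Assumed model: between two VOPs the buffer drains at rate_h (never
    below zero); at each VOP decoding time the VOP cost is added.
    No limit is enforced, so values above 100% remain visible.
    """
    occupancy = 0.0
    trace = []
    for cost in vop_costs:
        occupancy = max(0.0, occupancy - rate_h * vop_period)
        occupancy += cost
        trace.append(100.0 * occupancy / buffer_size)
    return trace


# Hypothetical scene: every VOP has 40 MBs of the most complex type and
# 60 transparent MBs.
num_vops = 10
complex_mbs, transparent_mbs = 40, 60

# MPEG-4 VCV view (assumed): every MB counts as one unit.
mpeg4_costs = [complex_mbs + transparent_mbs] * num_vops
# IST VCV view: weighted sum, with weights 1.00 and 0.12 from the text.
ist_costs = [1.00 * complex_mbs + 0.12 * transparent_mbs] * num_vops

# Hypothetical decoder parameters, identical for both models.
params = dict(vop_period=1.0 / 30, rate_h=2400.0, buffer_size=240.0)
mpeg4_trace = occupancy_trace(mpeg4_costs, **params)
ist_trace = occupancy_trace(ist_costs, **params)
```

With these made-up numbers the unweighted occupancy grows at every VOP and eventually exceeds 100%, while the weighted occupancy stays well below the limit for exactly the same sequence of VOPs.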
4.1 Scenes with one rectangular object
Figure 1 shows the MPEG-4 and IST VCV occupancies for the
MPEG-4 test sequence Akiyo, rectangular, in QCIF format,
encoded at 15 fps with Simple Profile@Level 1 (SP@L1) at 64
kbit/s.
Figure 1 – MPEG-4 and IST VCV occupancy: Akiyo, QCIF, 15
fps, SP@L1, 64 kbit/s.
As can be seen in the chart, the MPEG-4 VCV occupancy
never exceeds 100%, which means that the coded bitstream is MPEG-4
compliant for the given profile@level (according to the MPEG-4
VCV model). If the IST VCV model is used instead, the same
sequence can also be “compliantly” encoded with SP@L1,
because the VCV buffer occupancy stays between 25%
and 50% during the encoding process. Moreover, because of the low
IST VCV occupancy, this sequence could be encoded at a higher
frame rate, e.g., 25 frames/s, due to the low complexity associated
with some MB types. The high MPEG-4 VCV occupancy is due
to the complexity over-estimation of some MB types, mainly the
Skipped MBs for rectangular objects in this case.
4.2 Scenes with several arbitrarily shaped objects
To make a rigorous comparison between the MPEG-4 VCV and
the IST VCV in scenes with arbitrarily shaped objects, e.g., for
the Core Profile, the restriction imposed by the MPEG-4 B-VCV,
requiring that the number of boundary MBs for each decoding
time is not greater than half the B-VCV capacity, must be
considered. This means that, from a complexity point of view,
the MPEG-4 VCV worst-case scenario corresponds to the case
where 50% of the MBs are of the most complex non-boundary MB
type (Inter4V+Opaque) and 50% are of the most complex boundary
MB type (Inter4V+InterCAE). To accommodate this case, and
only for the purpose of the comparison between the two VCV
models, the relative complexity weights have to be changed
by using as reference the average of the maximum
decoding times of the Inter4V+InterCAE and Inter4V+Opaque
types, instead of the Inter4V+InterCAE type proposed in the IST
VCV model, which should be used if the MPEG-4 B-VCV did
not exist.
Figure 2 shows the MPEG-4 and IST VCV occupancies for the
MPEG-4 test sequence Coastguard, with 4 video objects (VOs),
in QCIF format, encoded at 30 fps with Core Profile@Level 1
(CP@L1) at 384 kbit/s.
Figure 2 – MPEG-4 and IST VCV occupancy: Coastguard,
QCIF, 30 fps, CP@L1, 384 kbit/s.
The figure shows that the MPEG-4 VCV buffer overflows, and
thus the bitstream is not compliant. The IST VCV model shows
that the scene is not too complex to be coded at the considered
profile@level, since the VCV occupancy is around 60% during
the encoding process. The MPEG-4 VCV occupancy peak that
can be seen in Figure 2 is caused by a great number of
transparent MBs that appear in those VOPs. Since in the IST
VCV model transparent MBs have a low relative complexity
weight (0.12), the peak is attenuated and the IST VCV buffer
capacity is not exceeded.
It is important to notice that the number of transparent MBs in a
scene strongly influences the MPEG-4 VCV performance.
Figure 3 shows one image of the Children_and_Flag test
sequence, 3 VOs, in QCIF format, encoded at 30 fps with
CP@L1 at 384 kbit/s. The “Children” and “Flag” object
bounding boxes have a high number of transparent MBs and
overlap in the scene, which contributes to the high MPEG-4
VCV occupancy, as can be seen in Figure 4.
Figure 3 – Sample of the Children_and_Flag sequence.
Figure 4 – MPEG-4 and IST VCV occupancy:
Children_and_Flag, QCIF, 30 fps, CP@L1, 384 kbit/s.
Notice that although this scene only has 3 video objects and the
majority of the MBs are transparent (58% transparent, 41%
boundary and 1% opaque), it cannot be coded in CP@L1 using
the MPEG-4 VCV model. The MPEG-4 VCV buffer occupancy
is always very high, due to the high number of transparent MBs
in the “Children” and “Flag” bounding boxes. The occupancy
increases when the “MPEG-4 Logo” object appears, leading to a
non-compliant set of bitstreams because the VCV occupancy
exceeds 100%. In the same figure, it can be seen that the IST
VCV buffer occupancy is always under 50%, because the
complexity weight of transparent MBs is rather low, allowing the
scene to be “compliantly” encoded with CP@L1. The term
“compliant” means here that, with the standardized
decoding resources maintained, the bitstream should be decodable
within the necessary timing limits, since it is not really more
complex than other MPEG-4 “officially” compliant bitstreams.
Another example that shows the weakness of the MPEG-4 VCV
model in the presence of transparent MBs and the effectiveness
of the IST VCV model is shown in Figure 5.
Figure 5 shows the MPEG-4 and IST VCV occupancies for the
MPEG-4 test sequence News, 4 VOs, CIF format, encoded at 30
fps, CP@L2 at 2000 kbit/s. The MPEG-4 VCV buffer capacity
is largely exceeded during the coding process. On the other
hand, the IST VCV occupancy is always around 40%, which
shows that this scene can be encoded in CP@L2. The influence
of transparent MBs on the MPEG-4 VCV is easily verified in this
example. The number of boundary and opaque MBs stays
approximately constant along the scene, while the number of
transparent MBs oscillates between two (rather high and similar)
values. When the number of transparent MBs increases, there is
a corresponding increase in the MPEG-4 VCV occupancy, and
when the number of transparent MBs decreases, the MPEG-4
VCV occupancy decreases. On the other hand, the IST VCV
buffer occupancy stays approximately constant, because of the
low complexity weight that transparent MBs have in this model.
Of course, the differences are not only related to the transparent
MBs, but this is the most critical case, as may be seen from the
weights in Table 1.
Figure 5 – MPEG-4 and IST VCV occupancy: News, CIF, 30
fps, CP@L2, 2000 kbit/s.
5. FINAL REMARKS
This paper proposes an alternative Video Complexity Verifier
model, based on a set of relative MB complexity weights
assigned to the various MB coding types used in MPEG-4 video
coding. These weights allow a more reliable measurement of the real
decoding complexity of a given MPEG-4 encoded scene.
Complexity measurements show that the MPEG-4 VCV model
over-estimates the decoding complexity of some scenes, notably
because some MB types, such as the transparent and skipped
MBs, are over-evaluated (not distinguished from the really
complex ones) in terms of decoding complexity. On the other
hand, the IST VCV model allows the encoding of many of the
scenes considered too complex by the MPEG-4 VCV model, for
a given profile@level. These scenes can be decoded by a
compliant decoder without changing the decoding resources,
thus making better use of these resources. The efficient use of
decoding resources is very important, mainly in applications
where they are scarce and expensive, e.g., mobile terminals.
Mobile applications should be among the first where MPEG-4
will “explode”, as demonstrated by the recent adoption by 3GPP
(3rd Generation Partnership Project), responsible for the UMTS
specification, of the MPEG-4 Visual standard for video coding.
6. REFERENCES
[1] ISO/IEC 14496-2:1999, “Information Technology –
Coding of Audio-visual Objects – Part 2: Visual”.
[2] J. Valentim, P. Nunes, F. Pereira, “Evaluating MPEG-4
Video Decoding Complexity”, 2nd Workshop and
Exhibition on MPEG-4, San Jose, USA, June 2001.
[3] J. Valentim, P. Nunes, L. Ducla-Soares, F. Pereira, “IST
MPEG-4 Video Compliant Framework”, Doc. ISO/IEC
JTC1/SC29/WG11 MPEG2000/M5844, Noordwijkerhout
MPEG Meeting, March 2000.
[4] P. Nunes, F. Pereira, “MPEG-4 Compliant Video
Encoding: Analysis and Rate Control Strategies”,
Proceedings of the ASILOMAR 2000 Conference, Pacific
Grove, CA, USA, October 2000.