An Alternative Complexity Model for the MPEG-4 Video Verifier Mechanism

João Valentim, Paulo Nunes, Fernando Pereira
Instituto Superior Técnico (IST) – Instituto de Telecomunicações
Av. Rovisco Pais, 1049-001 Lisboa, Portugal
Phone: +351 21 841 84 60; Fax: +351 21 841 84 72
e-mail: {joao.valentim, paulo.nunes, fernando.pereira}@lx.it

ABSTRACT

MPEG-4 is the first object-based audiovisual coding standard. To control the minimum decoding complexity resources required at the decoder, the MPEG-4 Visual standard defines the so-called Video Complexity Verifier (VCV). This paper proposes an alternative VCV model based on a set of relative macroblock (MB) complexity weights assigned to the various MB coding types used in MPEG-4 video coding. The new VCV model allows a more efficient use of the available decoding resources by preventing the over-evaluation of the decoding complexity of certain MB types, thus making it possible to encode scenes (for the same profile@level decoding resources) that would otherwise be considered too demanding.

1. INTRODUCTION

In MPEG-4, the first object-based audiovisual coding standard, the various video objects composing a scene may vary in size over time and may be encoded at different temporal rates using different MB coding types. To limit the decoding complexity of the corresponding bitstreams, it is therefore necessary to constrain the variability of the number and type of MBs per second (MB/s), as well as the bitrate and the picture memory required to store the decoded data. These limits are specified in the MPEG-4 Visual standard [1] through the Video Buffering Verifier (VBV) mechanism; within this mechanism, the Video Complexity Verifier (VCV) model deals with the limits related to decoding complexity. Although the decoding complexity associated with the various MB coding tools varies considerably, the MPEG-4 VCV model only distinguishes boundary MBs (B-VCV buffer) from all other MB types (VCV buffer, which includes all MBs).
As a result, MB coding types with very different levels of complexity, e.g., transparent and opaque MBs, are treated in exactly the same way by the VCV model. This over-dimensions the decoding complexity of some MBs and thus leads to a very inefficient use of the available decoding resources. This paper proposes an alternative to the current MPEG-4 VCV model [1] that exploits the relative decoding complexity of the various MB coding types used in MPEG-4 video coding. The proposed model is based on a closer estimate of the actual decoding complexity of the video objects composing a scene and thus allows a much better use of the decoding resources, which may be critical in application environments where resources are scarce and expensive, such as mobile and smart card applications.

2. RELATIVE MACROBLOCK COMPLEXITY WEIGHTS

To define a more efficient VCV model, a credible measure of the relative decoding complexity of the various MB coding types is needed. In [2], the decoding time of each MB type, obtained with an optimized version of the MoMuSys decoder [3] for several representative MPEG-4 test sequences and different profile@level combinations, was used as the decoding complexity measure. The results confirm that, contrary to the assumption of the current MPEG-4 VCV model, the decoding complexity of the various types of non-boundary MBs is not the same. Because the MPEG-4 VCV model does not distinguish between the various types of non-boundary MBs in terms of decoding complexity, it implicitly requires MPEG-4 decoders to be designed for the most critical case, i.e., to assume that every MB uses the most complex MB coding type of the profile in question.
Taking this into account, the complexity weights must be defined relative to the most complex MB type in each profile: the weight of this MB type is set to 1, and all other weights are expressed relative to it and are thus lower than 1. This solution enables a “trading system” in which, for example, one MB of the most complex type can be traded for two MBs with half its relative complexity while the bitstream remains decodable by a compliant decoder, i.e., without requiring additional decoding resources. The relative complexity weight of each MB type is obtained as the ratio between the maximum decoding time measured for that type (a conservative choice, since most MBs of that type will be less complex) and the highest maximum decoding time among all MB types relevant for the profile in question: the Inter4V+InterCAE type for the Core Profile and the Inter4V type for the Simple Profile [2]. Table 1 shows the relative decoding complexity weights assigned to the various MB complexity classes, as defined in [2].

Table 1 – Relative MB decoding complexity weights.

MB complexity class (Cj) | MB coding types in each complexity class | Relative complexity weight (kj), Simple Profile | Relative complexity weight (kj), Core Profile
C1 | Inter4V+InterCAE, Inter+InterCAE | – | 1.00
C2 | Inter4V+IntraCAE, Inter+IntraCAE, Intra+IntraCAE | – | 0.88
C3 | Inter4V+NoUpdate, Inter+NoUpdate, Intra+NoUpdate | – | 0.77
C4 | Inter4V+Opaque, Inter+Opaque, Intra+Opaque | – | 0.70
C5–C8 | Skipped+InterCAE, Skipped+IntraCAE, Skipped+NoUpdate, Skipped+Opaque | – | 0.40, 0.32, 0.21, 0.12
C9 | Transparent | – | 0.12
C10–C12 | Inter4V, Inter, Intra, Skipped (rectangular VOs only) | 1.00, 0.89, 0.13 | 0.66, 0.59, 0.09

3. IST VCV MODEL: A MORE EFFICIENT SOLUTION

Using the relative MB decoding complexity weights presented in [2], this paper proposes an alternative, more efficient VCV model: the IST VCV model.
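Before the model is detailed, the weight definition of Section 2 and the trading system it enables can be sketched in a few lines. The maximum decoding times below are hypothetical (arbitrary units), chosen only so that the resulting Core Profile weights match Table 1 for Inter4V+InterCAE (1.00), Inter4V+Opaque (0.70, assuming it is the type that sets the C4 weight) and Transparent (0.12); the helper names `relative_weights` and `weighted_load` are illustrative, not part of [2].

```python
# Hypothetical worst-case decoding times per MB type (arbitrary units), chosen so
# that the resulting Core Profile weights match Table 1 for these three types.
MAX_DECODING_TIME = {
    "Inter4V+InterCAE": 50.0,  # most complex Core Profile type -> reference (weight 1)
    "Inter4V+Opaque": 35.0,
    "Transparent": 6.0,
}


def relative_weights(max_times: dict[str, float], reference: str) -> dict[str, float]:
    """k_j = worst-case time of type j / worst-case time of the reference type."""
    ref = max_times[reference]
    return {mb_type: t / ref for mb_type, t in max_times.items()}


def weighted_load(mb_counts: dict[str, int], weights: dict[str, float]) -> float:
    """Decoding load expressed in units of reference-type MBs."""
    return sum(weights[mb_type] * n for mb_type, n in mb_counts.items())


weights = relative_weights(MAX_DECODING_TIME, "Inter4V+InterCAE")
print(weights)  # {'Inter4V+InterCAE': 1.0, 'Inter4V+Opaque': 0.7, 'Transparent': 0.12}

# "Trading": the budget of one reference MB can instead be spent on lighter MBs.
one_complex = weighted_load({"Inter4V+InterCAE": 1}, weights)
eight_transparent = weighted_load({"Transparent": 8}, weights)
print(eight_transparent <= one_complex)  # True: 8 x 0.12 = 0.96 <= 1.00
```

Normalizing by the worst case of the reference type is what makes the budget additive: any mix of MBs whose weights sum to no more than the number of reference MBs the decoder was dimensioned for stays within the same resources.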
This model is based on a single buffer with a single decoding rate, using different MB complexity weights for the various MB complexity classes [4]:

Complexity model based on the MB coding tools – The distinction between MBs in terms of decoding complexity is associated with the MB texture and shape coding tools used, i.e., each MB complexity class corresponds to a texture-shape tool combination for which a relative complexity weight has been measured [2].

Single buffer with relative MB complexity weights – In the proposed model, a single buffer stores all the coded MBs, but each MB is weighted according to its complexity class. Thus, the IST VCV buffer occupancy corresponds to a weighted sum of the number of coded MBs.

Single decoding rate – The use of a single buffer with MB complexity weights implies a single decoding rate. The IST VCV and MPEG-4 VCV decoding rates are the same, which makes it simple to compare the two models, since the decoding computational resources remain the same.

The main advantage of the IST VCV solution relative to the MPEG-4 VCV solution is that it models more closely the real decoding complexity of the set of bitstreams building a visual scene: since the different MB types are distinguished in terms of decoding complexity, decoding resources are not wasted by the “killing” assumption that all MBs other than boundary MBs are equally and maximally difficult (Table 1 shows how large the variations are).

For the IST VCV model proposed in this paper, the number of equivalent MBs for a given Video Object Plane (VOP) $i$, $M_i$, that is added to the VCV buffer at each decoding time instant $t_i$ is given by

$$M_i = \sum_{j=1}^{12} k_j \, M_{c_j} \tag{1}$$

where $k_j$ is the complexity weight associated with MB class $j$ and $M_{c_j}$ is the number of MBs in VOP $i$ belonging to class $j$. The weights presented in Table 1 have been defined in a rather conservative way, using the most complex coding type within each MB complexity class, so that the limits are respected even if there is some variation due to different implementation platforms and optimizations. The time needed to decode VOP $i$ is then given by

$$t_{d_i} = \frac{M_i}{H} \tag{2}$$

where $M_i$ is the equivalent number of VOP $i$ MBs, given by (1), and $H$ is the VCV decoding rate for the profile@level in question. VOP $i$ is decoded during the interval from time instant $s_i$ to time instant $e_i$, defined by

$$s_i = t_i + \frac{VCV(t_i)}{H}, \qquad e_i = t_i + \frac{VCV(t_i) + M_i}{H} \tag{3}$$

where $t_i$ is the VOP $i$ decoding time, $VCV(t_i)$ is the VCV occupancy at the instant the VOP $i$ MBs, $M_i$, are added to the VCV, and $H$ is the VCV decoding rate for the profile@level in question; note that $e_i - s_i = t_{d_i}$.

Since the IST VCV decoding rate and buffer size are unchanged relative to the MPEG-4 VCV model for each profile@level, the two models can be compared directly, because the decoder computational resources are maintained. Besides this direct comparison, it is necessary to ensure that the limits of the B-VCV model, which accumulates only the boundary MBs, are not exceeded if precisely the same decoding resources are to be used (this has been checked, but for simplicity the corresponding charts are not included in the figures of the next section). A comparison between the two VCV models is presented in the next section.

4. COMPARISON BETWEEN THE IST AND THE MPEG-4 VCV MODELS

The ideal way to compare and validate the IST VCV model against the MPEG-4 VCV model would be to decode bitstreams which, for a given profile@level, violate the MPEG-4 VCV model but not the IST VCV model, and to show that these scenes can be decoded in due time by a compliant MPEG-4 decoder. This would show that the existing profile@level decoding resources are sufficient and thus that the MPEG-4 VCV model wastes these resources (due to complexity over-dimensioning) when it prevents those bitstreams from being classified as compliant with the profile@level in question.
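Equations (1)–(3) can be exercised with a small leaky-bucket simulation that feeds the same sequence of VOPs to both models, which is essentially the occupancy comparison used in this section. This is a minimal sketch under stated assumptions: the decoding rate `H`, the buffer size `BUFFER`, the frame rate and the per-VOP MB counts are hypothetical illustration values, not the profile@level limits of [1]; only the C1, C4 and C9 Core Profile weights come from Table 1. The MPEG-4 VCV is modelled by giving every MB a weight of 1, the buffer is assumed to drain at rate $H$ whenever it is non-empty, and the B-VCV is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Core Profile weights taken from Table 1 (only classes with unambiguous values).
IST_WEIGHTS = {"C1": 1.00, "C4": 0.70, "C9": 0.12}


@dataclass
class VopTiming:
    t: float          # decoding time instant t_i (s)
    occupancy: float  # VCV(t_i) + M_i, right after the MBs of VOP i are added
    start: float      # s_i from eq. (3)
    end: float        # e_i from eq. (3)


def equivalent_mbs(counts: dict[str, int], weights: Optional[dict[str, float]]) -> float:
    """Eq. (1): M_i = sum_j k_j * M_cj; weights=None models the MPEG-4 VCV (every MB counts as 1)."""
    if weights is None:
        return float(sum(counts.values()))
    return sum(weights[cls] * n for cls, n in counts.items())


def simulate_vcv(times: list[float], vops: list[dict[str, int]], rate_h: float,
                 weights: Optional[dict[str, float]]) -> list[VopTiming]:
    """Leaky-bucket VCV: the buffer drains at rate H whenever it is non-empty."""
    results: list[VopTiming] = []
    level = 0.0                 # occupancy right after the previous VOP was added
    prev_t: Optional[float] = None
    for t_i, counts in zip(times, vops):
        if prev_t is not None:
            level = max(0.0, level - rate_h * (t_i - prev_t))
        vcv_ti = level                         # VCV(t_i): backlog before VOP i is added
        m_i = equivalent_mbs(counts, weights)  # eq. (1)
        start = t_i + vcv_ti / rate_h          # s_i, eq. (3)
        end = t_i + (vcv_ti + m_i) / rate_h    # e_i = s_i + t_di, eqs. (2)-(3)
        level = vcv_ti + m_i
        results.append(VopTiming(t_i, level, start, end))
        prev_t = t_i
    return results


# Hypothetical scene and decoder parameters (illustrative only).
H = 5940.0       # VCV decoding rate (MB/s)
BUFFER = 198.0   # VCV buffer size (MBs)
FPS = 30.0
times = [i / FPS for i in range(5)]
vops = [{"C1": 20, "C4": 30, "C9": 170}] * len(times)  # boundary, opaque, transparent MBs

for name, w in (("MPEG-4 VCV", None), ("IST VCV", IST_WEIGHTS)):
    peak = max(r.occupancy for r in simulate_vcv(times, vops, H, w))
    print(f"{name}: peak occupancy {100 * peak / BUFFER:.0f}%")
# MPEG-4 VCV: peak occupancy 156%
# IST VCV: peak occupancy 31%
```

With these toy numbers, the 220 MBs per instant exceed the 198 MBs drained per VOP period, so the unweighted buffer grows steadily, while the weighted load (about 61 equivalent MBs) is fully decoded before the next VOP arrives; this mirrors the transparent-MB effect discussed for the test sequences, but the figures themselves are not results from this paper.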
However, this comparison and validation can only be done with a real-time decoder, which is not available [3]. Thus, the two VCV models were compared through the occupancy of their buffers, i.e., by assessing the effects of the proposed approach under the assumption that the measured complexity weights are acceptable (the authors believe that the weights are conservative, meaning that the real complexity is even lower than the IST VCV complexity). To perform this comparison, an encoder with only the VBV (bitrate) rate control active was used. The feedback mechanism that prevents violations of the VCV and VMV (memory) models was disabled so that the evolution of the corresponding buffer occupancy could be visualized even above 100% occupancy. In this case, the coded bitstreams are the same for both models; only the evaluation of the VCV MB decoding complexity differs, depending on the model considered. This methodology shows that, in many situations, the MPEG-4 VCV occupancy exceeds 100% while the IST VCV occupancy does not. This means that the IST VCV model would allow, for a given profile@level, encoding in a “compliant way” (i.e., using the same decoding resources) video scenes that the MPEG-4 VCV model would reject due to its clear over-dimensioning of the MB decoding complexity for certain MB coding types. This holds when similar spatial resolutions and temporal VOP rates are used, e.g., without VOP skipping.

4.1 Scenes with one rectangular object

Figure 1 shows the MPEG-4 and IST VCV occupancies for the MPEG-4 test sequence Akiyo (rectangular), in QCIF format, encoded at 15 fps with Simple Profile@Level 1 (SP@L1) at 64 kbit/s.

Figure 1 – MPEG-4 and IST VCV occupancy: Akiyo, QCIF, 15 fps, SP@L1, 64 kbit/s.

As can be seen in the chart, the MPEG-4 VCV occupancy is always 100% and never exceeds it, which means that the coded bitstream is MPEG-4 compliant for the given profile@level (according to the MPEG-4 VCV model). If the IST VCV model is used instead, the same sequence can also be “compliantly” encoded with SP@L1, since the VCV buffer occupancy stays between 25% and 50% during the encoding process. Moreover, because of this low IST VCV occupancy, which results from the low complexity associated with some MB types, this sequence could be encoded at a higher frame rate, e.g., 25 fps. The high MPEG-4 VCV occupancy is due to the over-estimation of the complexity of some MB types, in this case mainly the Skipped MBs of rectangular objects.

4.2 Scenes with several arbitrarily shaped objects

Figure 2 shows the MPEG-4 and IST VCV occupancies for the MPEG-4 test sequence Coastguard, with 4 video objects (VOs), in QCIF format, encoded at 30 fps with Core Profile@Level 1 (CP@L1) at 384 kbit/s.

Figure 2 – MPEG-4 and IST VCV occupancy: Coastguard, QCIF, 30 fps, CP@L1, 384 kbit/s.

To make a rigorous comparison between the MPEG-4 VCV and the IST VCV for scenes with arbitrarily shaped objects, e.g., in the Core Profile, the restriction imposed by the MPEG-4 B-VCV, which requires that the number of boundary MBs at each decoding time does not exceed half of the B-VCV capacity, must be considered. This means that, from a complexity point of view, the MPEG-4 VCV worst-case scenario is the one in which 50% of the MBs are of the most complex non-boundary MB type (Inter4V+Opaque) and 50% are of the most complex boundary MB type (Inter4V+InterCAE). To accommodate this case, and only for the purpose of comparing the two VCV models, the relative complexity weights have to be recomputed using as reference the average of the maximum decoding times of the Inter4V+InterCAE and Inter4V+Opaque types, instead of the Inter4V+InterCAE type proposed in the IST VCV model, which should be used if the MPEG-4 B-VCV did not exist.
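The change of reference described above amounts to a uniform rescaling of the Core Profile weights. Since $k_j = T_j / T_{\text{Inter4V+InterCAE}}$, taking the average $(T_{\text{Inter4V+InterCAE}} + T_{\text{Inter4V+Opaque}})/2$ as the new reference gives $k'_j = 2k_j / (1 + k_{\text{Inter4V+Opaque}})$. A minimal sketch, assuming that the Inter4V+Opaque weight equals the C4 weight of Table 1 (0.70); the rescaled values printed here are derived for illustration and are not reported in the paper.

```python
# Assumption: Inter4V+Opaque sets the C4 weight of Table 1 (most complex opaque type).
K_BOUNDARY_REF = 1.00   # Inter4V+InterCAE (C1), original Core Profile reference
K_OPAQUE_REF = 0.70     # Inter4V+Opaque (C4), most complex non-boundary type


def rescale(k_j: float) -> float:
    """k'_j = T_j / ((T_C1 + T_C4) / 2) = 2 * k_j / (k_C1 + k_C4)."""
    return 2.0 * k_j / (K_BOUNDARY_REF + K_OPAQUE_REF)


for name, k in (("C1 Inter4V+InterCAE", 1.00), ("C4 Inter4V+Opaque", 0.70), ("C9 Transparent", 0.12)):
    print(f"{name}: {k:.2f} -> {rescale(k):.3f}")
# C1 Inter4V+InterCAE: 1.00 -> 1.176
# C4 Inter4V+Opaque: 0.70 -> 0.824
# C9 Transparent: 0.12 -> 0.141

# The 50/50 worst-case mix now costs exactly one unit per MB, i.e. the same
# load per MB that the MPEG-4 VCV assigns, so the two buffers are comparable.
worst_case = 0.5 * rescale(1.00) + 0.5 * rescale(0.70)
print(f"50/50 mix: {worst_case:.3f}")  # 50/50 mix: 1.000
```

In other words, every weight is divided by 0.85, which keeps the ratios between classes unchanged while aligning the worst case allowed by the B-VCV with the unit weight of the MPEG-4 VCV.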
As Figure 2 shows, the MPEG-4 VCV buffer overflows, and thus the bitstream is not compliant. The IST VCV model shows that the scene is not too complex to be coded at the considered profile@level, since the IST VCV occupancy stays around 60% during the encoding process. The MPEG-4 VCV occupancy peak visible in Figure 2 is caused by a large number of transparent MBs in those VOPs. Since transparent MBs have a low relative computational weight (0.12) in the IST VCV model, the peak is attenuated and the IST VCV buffer capacity is not exceeded.

It is important to notice that the number of transparent MBs in a scene strongly influences the MPEG-4 VCV occupancy. Figure 3 shows one image of the Children_and_Flag test sequence, with 3 VOs, in QCIF format, encoded at 30 fps with CP@L1 at 384 kbit/s. The bounding boxes of the “Children” and “Flag” objects contain a high number of transparent MBs and overlap in the scene, which contributes to the high MPEG-4 VCV occupancy shown in Figure 4.

Figure 3 – Sample of the Children_and_Flag sequence.

Figure 4 – MPEG-4 and IST VCV occupancy: Children_and_Flag, QCIF, 30 fps, CP@L1, 384 kbit/s.

Notice that although this scene only has 3 video objects and most of its MBs are transparent (58% transparent, 41% boundary and 1% opaque), it cannot be coded in CP@L1 according to the MPEG-4 VCV model. The MPEG-4 VCV buffer occupancy is always very high due to the large number of transparent MBs in the “Children” and “Flag” bounding boxes, and it increases further when the “MPEG-4 Logo” object appears, leading to a non-compliant set of bitstreams because the VCV occupancy exceeds 100%. The same figure shows that the IST VCV buffer occupancy always stays below 50%, because the complexity weight of transparent MBs is rather low, allowing the scene to be “compliantly” encoded with CP@L1. Here, the term “compliant” means that, with the standardized decoding resources, the bitstream should be decodable within the necessary timing limits, since it is not really more complex than other “officially” compliant MPEG-4 bitstreams.

Another example of the weakness of the MPEG-4 VCV model in the presence of transparent MBs, and of the effectiveness of the IST VCV model, is shown in Figure 5, which presents the MPEG-4 and IST VCV occupancies for the MPEG-4 test sequence News, with 4 VOs, in CIF format, encoded at 30 fps with CP@L2 at 2000 kbit/s. The MPEG-4 VCV buffer capacity is largely exceeded during the coding process, whereas the IST VCV occupancy always stays around 40%, which shows that this scene can be encoded in CP@L2. The influence of transparent MBs on the MPEG-4 VCV is easily verified in this example. The number of boundary and opaque MBs stays approximately constant along the scene, while the number of transparent MBs oscillates between two (rather high and similar) values. When the number of transparent MBs increases, the MPEG-4 VCV occupancy increases correspondingly, and when it decreases, the MPEG-4 VCV occupancy decreases as well. In contrast, the IST VCV buffer occupancy stays approximately constant, because of the low complexity weight that transparent MBs have in this model. Of course, the differences are not only related to transparent MBs, but these are the most critical case, as can be seen from the weights in Table 1.

Figure 5 – MPEG-4 and IST VCV occupancy: News, CIF, 30 fps, CP@L2, 2000 kbit/s.

5. FINAL REMARKS

This paper proposes an alternative Video Complexity Verifier model based on a set of relative MB complexity weights assigned to the various MB coding types used in MPEG-4 video coding. These weights allow the real decoding complexity of a given MPEG-4 encoded scene to be measured more reliably. Complexity measurements show that the MPEG-4 VCV model over-estimates the decoding complexity of some scenes, notably because some MB types, such as transparent and skipped MBs, are over-evaluated in terms of decoding complexity (they are not distinguished from the really complex ones). The IST VCV model, in contrast, allows the encoding of many of the scenes that the MPEG-4 VCV model considers too complex for a given profile@level. These scenes can be decoded by a compliant decoder without changing the decoding resources, thus making better use of these resources. The efficient use of decoding resources is very important, mainly in applications where they are scarce and expensive, e.g., mobile terminals. Mobile applications should be among the first where MPEG-4 will “explode”, as shown by the recent adoption of the MPEG-4 Visual standard for video coding by the 3rd Generation Partnership Project (3GPP), which is responsible for the UMTS specification.

6. REFERENCES

[1] ISO/IEC 14496-2:1999, “Information Technology – Coding of Audio-Visual Objects – Part 2: Visual”.
[2] J. Valentim, P. Nunes, F. Pereira, “Evaluating MPEG-4 Video Decoding Complexity”, 2nd Workshop and Exhibition on MPEG-4, San Jose, USA, June 2001.
[3] J. Valentim, P. Nunes, L. Ducla-Soares, F. Pereira, “IST MPEG-4 Video Compliant Framework”, Doc. ISO/IEC JTC1/SC29/WG11 MPEG2000/M5844, Noordwijkerhout MPEG Meeting, March 2000.
[4] P. Nunes, F. Pereira, “MPEG-4 Compliant Video Encoding: Analysis and Rate Control Strategies”, Proceedings of the ASILOMAR 2000 Conference, Pacific Grove, CA, USA, October 2000.