A Cross-Layer Approach for Prioritized Frame Transmissions of MPEG-4 Over the IEEE 802.11 and IEEE 802.11e Wireless Local Area Networks

Yang Xiao, Senior Member, IEEE, Yan Zhang, Senior Member, IEEE, Mike Nolen, Julia Hongmei Deng, and Jingyuan Zhang

Abstract—In this paper, we study MPEG-4 transmissions over the IEEE 802.11 wireless local area networks (WLANs). First, we provide a simulation of MPEG-4 using OPNET over the WLANs in terms of throughput, the impact of multiple MPEG-4 streams on throughput, and the impact of the compression rate on throughput. Our simulation results show that a higher throughput does not always yield better quality MPEG-4 video. We further observe that if an I frame of an MPEG-4 video is lost, the next N − 1 frames (all P and B frames) are useless [where N is the total number of frames contained within one group of pictures (GoP)]. Furthermore, we observe that the I, P, and B frames are in decreasing order of importance. Therefore, we propose a cross-layer approach to improve MPEG-4 transmissions over WLANs. In the proposed approach: 1) P and B frames are discarded by the MPEG-4 decoder at the receiver's medium access control (MAC) layer if the corresponding I frame is lost; 2) the individual MPEG-4 frames are prioritized at the MAC layer so that I frames have a higher priority than P frames, which in turn have a higher priority than B frames; 3) each frame (I, P, B) carries a time-deadline field so that, if the deadline cannot be met, the frame and the related P and B frames in the same GoP are deleted without further transmissions/retransmissions; and 4) if the delay between an I frame and the next P frame is too long, it may be better to drop the least important B frames in an attempt to allow the video to catch up. Finally, we study MPEG-4 transmissions over the IEEE 802.11e WLANs, and we adopt a measurement-based admission control scheme for IEEE 802.11e. Our results show the advantages of the proposed scheme.

Index Terms—Cross-layer design, IEEE 802.11, IEEE 802.11e, MPEG-4, WiFi, wireless local area networks.

Manuscript received October 26, 2010; revised February 14, 2011; accepted May 12, 2011. Date of publication September 12, 2011; date of current version November 23, 2011. This work was supported in part by the U.S. National Science Foundation, under Grants CCF-0829827, CNS-0716211, CNS-0737325, and CNS-1059265. Y. Xiao and J. Zhang are with the Department of Computer Science, University of Alabama, Tuscaloosa, AL 35487 USA (e-mail: yangxiao@ieee.org; zhang@cs.ua.edu). Y. Zhang is with Simula Research Laboratory, Lysaker 1325, Norway (e-mail: yanzhang@simula.no). M. Nolen is with the Department of Computer Science, University of Memphis, Memphis, TN 38152 USA (e-mail: mnolen@earthlink.net). H. Deng is with Intelligent Automation, Inc., Rockville, MD 20855 USA (e-mail: hdeng@i-a-i.com). Digital Object Identifier 10.1109/JSYST.2011.2165596.

I. Introduction

Cross-layer design is a common solution used to mitigate the side effect of information hiding between the abstract layers [9], [10]. The current architecture of cross-layer designs can be summarized as a "coordination model" that briefly describes the functionality that current cross-layer designs may support [11], [12]. The four existing cross-layer design methods are the intralayer, interlayer, centralized, and distributed methods [11].
The intralayer method allows direct communication between any pair of layers in the open systems interconnection (OSI) protocol stack [11], [13], [14]. The interlayer method introduces one or more coordination planes that share data with some (or all) of the layers in the OSI protocol stack [11], [15], [16]. The centralized method introduces a centralized node (sometimes a base station) or tiers that are organized in a hierarchical manner [11], [17], [18]. The distributed method does not include any centralized node; rather, it carries out cross-layer information sharing among all the nodes in a network [11], [19].

The goals of this paper are to: 1) model MPEG-4 traffic over the IEEE 802.11 wireless local area networks (WLANs), which will be used to study throughput and the impacts of multiple MPEG-4 streams on throughput, and 2) design a cross-layer approach between the medium access control (MAC) layer and the application layer (MPEG-4) to improve MPEG-4 transmissions over WLANs. Some of the questions that might be answered are the following.

1) What is the impact on throughput when the resolution of an MPEG-4 stream is increased if there are multiple sources?
2) Given the limited bandwidth, how many wireless stations can send MPEG-4 streams at different resolutions before the quality of MPEG-4 degrades to the point where it is un-viewable?
3) What happens if we vary the compression rate?
4) How do we design a better MAC layer for MPEG-4 transmissions?

We study MPEG-4 transmissions over the IEEE 802.11e WLANs; we provide a simulation of MPEG-4 using OPNET over the WLANs in terms of throughput, the impact of multiple MPEG-4 streams, and the impact of the compression rate on throughput. Our simulation results show that a higher throughput does not always yield better quality MPEG-4 video. We further observe that if an I frame of an MPEG-4 video is lost, the next N − 1 frames (all P and B frames) are useless [where N is the total number of frames contained within one group of pictures (GoP)]. Furthermore, we observe that I, P, and B frames are in decreasing order of importance.

We further propose a scheme to enhance MPEG-4 transmissions over WLANs by adopting a measurement-based admission control (AC) scheme for IEEE 802.11e. In the proposed prioritized frame cross-layer transmission approach: 1) P and B frames will be discarded by the MPEG-4 decoder at the receiver's MAC layer if the corresponding I frame is lost; 2) the individual MPEG-4 frames are prioritized at the MAC layer so that I frames have a higher priority than P frames, which have a higher priority than B frames; 3) each frame (I, P, B) has a time-deadline field so that, if the deadline cannot be met, the frame and the other related P and B frames in the same GoP are deleted without further transmissions/retransmissions; and 4) if the delay between an I frame and the next P frame is too long, then it may be better to drop the least important B frames in an attempt to allow the video to catch up.

In [5] and [6], we propose a two-level protection and guarantee mechanism for voice and video traffic in the enhanced distributed channel access (EDCA)-based distributed WLANs. In the first-level protection, the existing voice and video flows are protected from new or already existing voice and video flows via a distributed AC with tried-and-known and early-protection enhancements.
In the second-level protection, the voice and video flows are protected from the best-effort data traffic by adopting frame-based and limit-based data control mechanisms. In this paper, we apply the AC and data control schemes of [5] and [6], coupled with the proposed prioritized frame cross-layer transmission approach, to MPEG-4 transmissions.

Joint rate control over video source coding and channel transmission parameters is an important topic [7], [8], especially because compressed video bit streams have characteristics different from generic data; such differences include the unequal importance of the I/P/B frames, the decoding dependencies introduced by inter/intra coding, and the delay constraint of real-time transmission. The contribution of this paper lies in the application and simulation of the proposed prioritized frame cross-layer transmission approach over the IEEE 802.11e WLANs, coupled with the AC and data control schemes. Note that the results of this paper are from the viewpoint of the MAC layer of a single-hop WLAN rather than from end-to-end video applications. The cross-layer design is between the application layer and the IEEE 802.11e MAC layer.

The remainder of this paper is organized as follows. We first briefly introduce MPEG-4 in Section II, and we then provide OPNET simulations of MPEG-4 traffic over WLANs in Section III. In Section IV, we propose a prioritized MPEG-4 frame transmission scheme over WLANs, together with some selected simulation results. We present the AC and data control schemes in Section V. Simulation results are presented in Section VI, and we conclude our paper in Section VII.

II. MPEG-4

One of the most significant improvements in video compression has been the MPEG suite of standards [1], of which MPEG-4 is the newest. These standards take raw video at any frame rate and size and employ a number of techniques to compress a video at rates as high as 60:1 without much loss in quality. The main methods of compression used by MPEG-4 are to remove redundant information found in the individual frames that comprise the video and to predict changes from one frame to the next. These methods, as well as some less significant compression techniques, are what allow MPEG-4 to achieve its high rate of compression.

MPEG-4 encodes an input stream into sequences of frames known as GoPs. It is this encoding that allows for higher compression rates. The higher the compression rate, the more information is lost during encoding. The main concept here is that, from one frame or picture to the next, very few changes may occur in the short time between frames. Due to the minor differences between frames, it is possible to predict the changes from one scene to the next. Based on this, MPEG-4 (like MPEG-1 and MPEG-2) takes a sequence of raw frames at any rate (with the rate being in frames per second, usually around 25-30 for high-quality video) and splits the frames into sequences of pictures called GoPs. One second of video may be decomposed into several GoPs depending on the complexity of the overall scene and the compression rate. The more motion there is in a scene, the higher its complexity and the lower the compression it may have. Fig. 1 illustrates this aspect: a sequence of scenes from the raw video is decomposed into several GoPs, and each GoP is further decomposed into anywhere from 9 to 15 frames, labeled as I, B, and P. These individual MPEG-4 frames are what comprise the original video after it has been compressed.
A brief discussion of the individual frames is given below.

1) I (intra-coded picture): This is a single still compressed image that is used as a starting point for the next sequence of frames (B and P types). This single image is also used to resynchronize the entire scene. In the event that a GoP is lost or corrupted, the next GoP has a fresh image from which it may start. I frames contain no references to other pictures.
2) P (predicted picture): These are pictures that are compressed and used as a reference point for B frames. The compression used here involves motion-compensated prediction. The picture is broken down into small blocks, usually 16 × 16, and the changes from the I frame to the next P frame, and then to the next frame, are used to compress the overall GoP.
3) B (bi-predictive picture): These provide the highest level of compression. Using the previous I or P frame and the next P frame, the B frames may be predicted with a weighted average. The B frame may be predicted by using both the forward and backward directional changes in motion.

Fig. 1. A scene is broken down into a GoP, and the GoP is further decomposed into its individual components: the I, P, and B frames.

In summary, the I frame is a reference point used to start the next GoP and to resynchronize the video during errors in transmission. The P frames are compressed versions of the I frame, and they contain some predicted information. The B frames are compressed even further and are comprised mostly of predicted information from neighboring I and P frames. All of these frames are related, and each I frame has several P and B frames that are derived from it. Therefore, if an I frame is lost, all P and B frames up to the next I frame are of no use.

There are numerous parameters associated with MPEG-4. Some major parameters are the frame rate per second, the frame size, and n (n can be thought of as the number of I f/s, as it is used to control the number of frames in a GoP), and these affect the overall compression of the video. The lower the value of n, the lower the compression may be.
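The decoding dependence described above can be made concrete with a small sketch. The helper below is our own illustration (not part of any MPEG tool): it takes a GoP frame sequence and a set of lost positions, and applies the loss rules stated in this paper, namely, a lost I frame wastes the rest of its GoP, a lost P frame wastes the (up to) four surrounding B frames, and a lost B frame affects nothing else. All names are hypothetical.

# Which frames become useless, given the loss rules stated in the text.
def useless_frames(gop, lost):
    bad = set(lost)
    for idx in sorted(lost):
        if gop[idx] == 'I':
            bad.update(range(idx, len(gop)))       # rest of the GoP is wasted
        elif gop[idx] == 'P':
            for j in (idx - 2, idx - 1, idx + 1, idx + 2):
                if 0 <= j < len(gop) and gop[j] == 'B':
                    bad.add(j)                      # the four neighboring B frames
    return sorted(bad)

# Example with the N = 9, M = 3 pattern I B B P B B P B B:
gop9 = list('IBBPBBPBB')
print(useless_frames(gop9, lost={0}))   # [0..8]: losing the I frame wastes all 9
print(useless_frames(gop9, lost={3}))   # [1, 2, 3, 4, 5]: a lost P costs 5 frames

The second example reproduces the 5-frames-per-lost-P figure used later in Section IV (about 5/30, or 1/6, of a second of video).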
III. OPNET Simulations of MPEG-4 Over WLANs

The goal here is to model MPEG-4 traffic over 802.11. To do this, we need a better understanding of the MPEG-4 traffic generated by a video source. The individual frames produced by MPEG-4 can vary in size. When raw video is encoded into MPEG-4, three types of frames are produced, as discussed previously. The first type is the I frame, which is the largest and most important frame; the second is the P frame, which, though much smaller than an I frame, is still important; and the third is the B frame, which is even smaller than the P frame and is the least important of the three frame types. When a sequence of raw input frames from a video source is encoded into a GoP, each GoP has one I frame and a set of alternating B and P frames. In most implementations, there are twice as many B frames as P frames. This framing sequence leads to a variable bit rate traffic pattern, due to the I frame being very large and the P and B frames containing only the changes from one I frame to the next. The actual sizes of the B and P frames depend on the complexity of the video, which is determined by how much the background changes and how much motion is contained in the video. Based on this understanding of MPEG-4 traffic (the frame size, frame rate, distribution of I, P, and B frames, and approximate sizes of I, P, and B frames), it should be possible to generate MPEG-4 traffic using OPNET.

The following measures are used to study MPEG-4 over WLAN: the number of stations, the resolution, the compression rate, the total throughput, the delay incurred, the average interarrival times between frames, and the frame loss rate (because of the relationship among the different frames, losses of one type of frame affect the decoding of the MPEG-4 stream at the receiver). Each sending station generates MPEG-4 traffic at a rate of 30 f/s, and we observe the frames per second received at the receiver as the number of sending stations increases.

To model MPEG-4 traffic, a process model is needed to generate it; we adopt an enhanced version of an OPNET process model developed at Sharp Labs [2], described further below. In the simulation, no access point is used, and a single MPEG-4 stream is defined as one sender transmitting to exactly one receiving station that does not itself transmit any traffic. The scenarios vary the number of stations (1, 5, 10, 15, 20), the mean I frame size (45 K, 120 K, 260 K), and N (9 or 15).

The traffic generated by MPEG-4 depends on the parameters discussed below. The process model provided has the ability to vary these parameters.

1) Frame rate: Set to 30 f/s; this parameter is the number of frames produced by the process model per second. This includes the three MPEG-4 frame types, labeled I, P, and B; the distribution of these frames is controlled by the parameters M and N.
2) M: Set to 3; this parameter defines the positions of the I and P frames contained within a GoP. The setting of 3 means that every third frame will be either an I or a P frame, with the zero frame always being an I frame and every subsequent third frame being a P frame. The zero frame is always the first frame within a GoP.
3) N: This parameter defines the total number of frames contained within one GoP.

The above three parameters generate the following framing sequence for 1 s of MPEG-4 traffic (N = 15) with two GoPs per second: I B B P B B P B B P B B P B B I B B P B B P B B P B B P B B.

The process model also has the ability to control the sizes of the three frame types listed above. This is handled through a probability distribution function, in terms of a mean frame size in bits and a variance in bits specified for each type of MPEG-4 frame. The exact values are governed by the input frame size; therefore, the values chosen are specific to a particular display resolution. For this work, three estimated resolutions were chosen, and they are listed in Table I. The resolution is given in pixels, assuming true color and extra data for audio.

TABLE I
MPEG-4 Frame Size Estimates

Scenario Name | I Frame (Mean, Var) (Bits) | P Frame (Mean, Var) (Bits) | B Frame (Mean, Var) (Bits) | Estimated Resolution
45 K  | (45 000, 1000)    | (4100, 1000)   | (2000, 800)    | 128 × 120 pixels (11:1 I frame compression rate)
120 K | (120 000, 4000)   | (12 000, 2000) | (4000, 2000)   | 210 × 190 pixels (11:1 I frame compression rate)
260 K | (260 000, 15 000) | (60 000, 5000) | (25 000, 2000) | 300 × 290 pixels (11:1 I frame compression rate)
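To make the traffic model concrete, the sketch below generates one second of MPEG-4-like traffic from the parameters above (frame rate 30 f/s, M = 3, N, and the Table I means and variances). It is a minimal stand-in for, not a reproduction of, the Sharp Labs OPNET process model; the function names and the use of a normal distribution for the frame sizes are our own assumptions.

import random

# Table I parameters for the 45 K scenario: (mean, variance) in bits.
SIZES_45K = {'I': (45_000, 1_000), 'P': (4_100, 1_000), 'B': (2_000, 800)}

def gop_pattern(n, m=3):
    # Frame 0 is the I frame, every m-th frame is a P frame, the rest are B.
    return ['I' if i == 0 else ('P' if i % m == 0 else 'B') for i in range(n)]

def one_second_of_traffic(n, sizes, rate=30):
    # Emit exactly `rate` (type, bits) frames; for N = 9, 30 f/s means
    # 3 1/3 GoPs per second, so the last GoP of the second is truncated.
    frames, pattern = [], gop_pattern(n)
    while len(frames) < rate:
        for ftype in pattern:
            if len(frames) == rate:
                break
            mean, var = sizes[ftype]
            frames.append((ftype, max(1, int(random.gauss(mean, var ** 0.5)))))
    return frames

frames = one_second_of_traffic(15, SIZES_45K)
print(len(frames), 'frames,', sum(s for _, s in frames), 'bits in 1 s')
# N = 15 yields the sequence I B B P B B P B B P B B P B B, twice per second.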
Since MPEG-4 uses JPEG compression for the I frame, the mean compression rate of the I frame is estimated to be 11:1. The mean size of a P frame is usually around one-tenth of that of the I frame, unless the I frame is very large, as in the last entry. The mean size of a B frame is usually around one-half to one-third of that of a P frame, again depending on the size of the I frame. Unless stated otherwise, the size of a GoP is 9. The data rate is the 11 Mb/s of IEEE 802.11b.

A process model already exists that serves the purpose of generating such traffic. We obtained an OPNET process model developed at Sharp Labs [2] that generates MPEG-2 traffic. Although this model is for MPEG-2, it is still applicable to MPEG-4, since the only differences between the two standards are compression techniques and some enhancements that are not relevant to this project. This process model replaces the standard source process model that is included in the standard wireless workstation model.

To study the traffic requirements of MPEG-4, several different scenarios have been created in OPNET. Each has the following items in common. There are two wireless workstations, with one defined as the sender and the other as the receiver (see Fig. 2). The sender uses the provided process model to generate MPEG-4 traffic according to the specified parameters, and the sender has a designated receiver. For this work, no access point is used. Therefore, each sender transmits to exactly one receiving station, and the receiver does not transmit any traffic (hence, traffic is one way). This is known as a single MPEG-4 stream. The pairs of wireless stations are located within range of each other.

Fig. 2. Flow of traffic from one source to one sink.

To study the throughput, delays, and other aspects of MPEG-4 traffic in a WLAN, six groups of scenarios have been created; each group starts with a single pair of workstations to establish a baseline, and then the number of stations is increased. For each group, the resolution and compression rate are varied. Table II lists the different scenarios created in OPNET for this work, together with the number of stations, the mean I frame size, and N.
TABLE II
All Scenarios, With the Number of MPEG-4 Streams, Mean I Frame Size, and N

OPNET Scenario Name | Station Count | Frame Size | N
1_stat_45K_n9   | 1  | 45 K  | 9
5_stat_45K_n9   | 5  | 45 K  | 9
10_stat_45K_n9  | 10 | 45 K  | 9
15_stat_45K_n9  | 15 | 45 K  | 9
20_stat_45K_n9  | 20 | 45 K  | 9
1_stat_45K_n15  | 1  | 45 K  | 15
5_stat_45K_n15  | 5  | 45 K  | 15
10_stat_45K_n15 | 10 | 45 K  | 15
15_stat_45K_n15 | 15 | 45 K  | 15
20_stat_45K_n15 | 20 | 45 K  | 15
1_stat_120K_n9  | 1  | 120 K | 9
5_stat_120K_n9  | 5  | 120 K | 9
10_stat_120K_n9 | 10 | 120 K | 9
15_stat_120K_n9 | 15 | 120 K | 9
20_stat_120K_n9 | 20 | 120 K | 9
1_stat_120K_n15 | 1  | 120 K | 15
5_stat_120K_n15 | 5  | 120 K | 15
10_stat_120K_n15 | 10 | 120 K | 15
15_stat_120K_n15 | 15 | 120 K | 15
20_stat_120K_n15 | 20 | 120 K | 15
1_stat_260K_n9  | 1  | 260 K | 9
5_stat_260K_n9  | 5  | 260 K | 9
10_stat_260K_n9 | 10 | 260 K | 9
15_stat_260K_n9 | 15 | 260 K | 9
20_stat_260K_n9 | 20 | 260 K | 9
1_stat_260K_n15 | 1  | 260 K | 15
5_stat_260K_n15 | 5  | 260 K | 15
10_stat_260K_n15 | 10 | 260 K | 15
15_stat_260K_n15 | 15 | 260 K | 15
20_stat_260K_n15 | 20 | 260 K | 15

The assumptions listed in Table III are mainly concerned with the WLAN environment and the individual stations; some of these assumptions are equated with parameters in the OPNET WLAN MAC protocol.

One item that should be noted at this time is the naming convention used for the scenarios. Because the I frame is the most important and largest frame in an MPEG-4 GoP, it is used to label the different scenarios; thus, a scenario labeled 45 K actually refers to the mean size of the I frame. This convention is used throughout this paper.

Within OPNET, the standard object labeled "WLAN_Stat_adv" is used as the model for both the sender and the receiver. On the sender side, the normal process model used to generate traffic has been replaced with the provided MPEG-4 process model, and, on the receiver side, a new sink process model known as "sink_mod_c" has been created to collect statistics on the interarrival times and on the numbers of I, P, and B frames received. The other parameters related to this model are listed in Table IV.

TABLE III
General Assumptions Made About the WLAN and MAC

Queue size: Infinite (no packet loss due to limited queue size)
Noise: Noise-free environment (no errors during transmission)
Fragmentation: Yes (needed because of the large packets)
Large packet processing: Handled by fragmentation
Background traffic: No competing background traffic
Traffic generation: The source generates packets (MPEG-4) continuously over the course of a simulation
Sink: The sink drops the packets once received
ON/OFF: No ON/OFF traffic
Traffic direction: One way, from sender to receiver

TABLE IV
Relevant WLAN Workstation Model Parameters

Data rate: 11 Mb/s
Queue size: Infinite (1 000 000 000)
Large packet processing: Fragment
Node ID: Assigned numerically
MAC address: Same as node ID
Sender destination address: Sender has a designated receiver assigned
Receiver traffic generation: None
Receiver destination address: Random (default setting)
Access point: None

Fig. 3. OPNET simulations of MPEG-4 over WLAN with different mean I frame sizes and N. (a) MPEG-4 streams (N = 9). (b) MPEG-4 streams (N = 15). (c) MPEG-4 streams (N = 9). (d) MPEG-4 streams (N = 15). (e) MPEG-4 streams. (f) MPEG-4 streams (N = 9). (g) MPEG-4 streams (N = 15). (h) MPEG-4 streams (N = 9). (i) MPEG-4 streams (N = 15).
Fig. 3(a) shows that, as the number of stations increases, the experienced throughput decreases (as expected). From this we can see that, with the lower compression rate (N = 9), the required throughput is higher and that it degrades very rapidly in comparison with the required throughput for N = 15 in Fig. 3(b). The graph shows that, at the highest frame size of 260 K, the throughput is well below 50% at ten stations. In Fig. 3(b), where the size of a GoP is now 15 (i.e., N = 15), we see results similar to those of Fig. 3(a), except that the required throughput is lower due to the higher compression rate. Yet, even at the higher compression rate, the throughput for a single station receiving MPEG-4 traffic with frame sizes of 260 K and 120 K degrades rapidly after the total number of sending stations climbs to ten or more.

Fig. 3(c) shows that, with the lower frame size (45 K), the total throughput continues to increase as the number of sending stations increases, where N = 9. This result is expected and is in contrast to the larger frame sizes, where 260 K and 120 K both experienced a decrease in total throughput as the number of sending stations increased. This is most likely due to the increasing number of collisions. Fig. 3(d) shows that, with the lower frame size (45 K), the total throughput continues to increase as the number of sending stations increases (as expected), and that for both 260 K and 120 K the total throughput decreases as the number of sending stations increases, where N = 15. In Fig. 3(e), all six curves shown in Fig. 3(c) and (d) are plotted together to show what happens when N is changed. One interesting observation in Fig. 3(e) is that the 260 K frame size actually achieves better total throughput, in contrast to the other curves, when N is set to 15 and the number of stations is 15 or above.

Fig. 3(f), where N = 9, shows that, for the lower resolution 45 K, the interarrival time between frames remains relatively constant at the expected rate of 1/30 s (0.033 s) until the number of stations climbs above 15. For the higher resolutions, the interarrival times increase as the number of stations increases. The only exception to this is for 15 and 20 stations at the highest resolution, where we also observe a small dip in the interarrival time. After viewing the raw data, it was determined that the mean interarrival times for 15 stations may have been skewed; at this point, there was quite a large variance in the times. Fig. 3(g), where N = 15, shows changes in interarrival times at different resolutions similar to those in Fig. 3(f). At 260 K with 20 stations, the interarrival time actually drops from around 0.20 to just over 0.15 s. At first, this looks like an improvement, but after a closer look at the raw data, it was determined that the mean interarrival time for 15 stations may have been skewed by a large variance in the times.
Fig. 3(h) and (i) shows the number of frames received per second versus the number of MPEG-4 streams. As the number of MPEG-4 streams increases, the number of received frames per second decreases from the default (30 f/s) to considerably lower values, especially for the larger frame sizes.

Next, we look at the loss of the individual MPEG-4 frame types for the different frame sizes, as well as the total frame loss per second.

Fig. 4. Frame loss with different mean I frame sizes and N. (a) N = 15, mean I frame size = 45 K. (b) N = 15, mean I frame size = 120 K. (c) N = 15, mean I frame size = 260 K. (d) N = 9, mean I frame size = 45 K. (e) N = 9, mean I frame size = 120 K. (f) N = 9, mean I frame size = 260 K.

Fig. 4(a) and (d) shows the number of I, P, and B f/s for increasing numbers of MPEG-4 streams, where the mean I frame size is 45 K. We observe that, at 15 stations, we still average 30 f/s with almost no loss of I and P frames. In Fig. 4(d), where N = 9, the frames per second when 20 stations are sending drop to a little above 25 on average, versus 29 f/s when N = 15. Also in Fig. 4(d), there is a drop, according to the raw data, in the number of I frames from the expected 3.33 I f/s to 2.77, and this drop affects the quality of the video.

Fig. 4(b) and (e) shows the number of I, P, and B f/s for increasing numbers of MPEG-4 streams, where the mean I frame size is 120 K. We observe that, even at ten stations, we are still averaging 30 f/s with almost no loss of I and P frames; yet, as the number of stations increases to 15 and above, the loss of B and P frames increases. In both Fig. 4(b) and (e), the loss rate of I f/s at 20 stations is 50% according to the data, and this means that all P and B frames up to the next I frame are wasted frames because they rely on the previous I frame.

Fig. 4(c) and (f) shows the number of I, P, and B f/s for increasing numbers of MPEG-4 streams, where the mean I frame size is 260 K. We observe that, even at ten stations, the average frames per second is less than 50% of the expected 30 f/s, with only 50% of the I and P frames arriving per second on average. Here, the quality of the video would most likely degrade very rapidly.

The results from the simulations show that, at 11 Mb/s, there is an expected limit on the number of wireless stations that can send MPEG-4 traffic. Decreasing the MPEG-4 video resolution allows us to increase the number of stations sending MPEG-4 traffic, but not by much at the higher resolutions. The results of the simulation show that, at 45 K with the different compression rates (N = 9 and N = 15), 15 or more stations can send MPEG-4 traffic, but the limit may be around 20. At the higher resolutions, only about five stations can send MPEG-4 traffic before the quality of the video is affected by a drop in throughput.

With respect to total throughput, we observe from Fig. 3(c)-(e) that the higher resolution MPEG-4 streams are unable to achieve the required bandwidth above five stations. There is even a drop in the total throughput for the higher resolution streams as the number of stations increases from 10 to 15 to 20; this is likely due to increased collisions caused by so many stations contending for the medium. When looking at the total throughput for the higher resolutions, it would seem that, because a high throughput is present, the video would still be good despite being below the requirement for a particular resolution. Yet, if we view the frames received per second and the number of MPEG-4 frames received, we realize that, based on the relationship between the different MPEG-4 frames, the quality of the video most likely will degrade to the point where it is un-viewable. Again, this is on account of the relationship between the MPEG-4 frames: a lost I frame means that the B and P frames up to the next I frame are useless. Thus, even though a station may receive a large number of B and P frames and high throughput, in reality the MPEG-4 decoder will most likely discard the majority of the B and P frames, with the result that the throughput may be high but the quality of the video may be low.
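A back-of-the-envelope check of these station limits can be made from Table I and the framing pattern. The sketch below (our own estimate, ignoring MAC/PHY overhead, fragmentation, and retransmissions; names are ours) computes the mean offered load per stream. At 260 K it comes to roughly 1.5-1.8 Mb/s, so an 11 Mb/s channel saturates at around five or six such streams, which is consistent with the observations above.

# Rough per-stream offered load (bits/s), ignoring all MAC/PHY overhead.
# Frame-size means are taken from Table I; the pattern assumes M = 3.
MEANS = {'45K': {'I': 45_000, 'P': 4_100, 'B': 2_000},
         '120K': {'I': 120_000, 'P': 12_000, 'B': 4_000},
         '260K': {'I': 260_000, 'P': 60_000, 'B': 25_000}}

def offered_load(means, n, rate=30, m=3):
    # (rate/n) GoPs per second; each GoP holds 1 I frame,
    # (n/m - 1) P frames, and the rest B frames.
    p = n // m - 1                      # P frames per GoP (slot 0 is the I frame)
    b = n - 1 - p                       # the remaining frames are B
    gop_bits = means['I'] + p * means['P'] + b * means['B']
    return gop_bits * rate / n

for name, means in MEANS.items():
    for n in (9, 15):
        print(f"{name}, N = {n}: {offered_load(means, n) / 1e6:.2f} Mb/s")
# 260K, N = 9 -> about 1.77 Mb/s; 260K, N = 15 -> about 1.50 Mb/s, so
# roughly five or six 260 K streams fill an 11 Mb/s channel even before
# contention and protocol overhead are counted.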
Varying the compression rate by changing N from 9 to 15 does affect the throughput and, perhaps, the number of stations that can send MPEG-4 traffic. It should be noted here that changing N from 15 to 9 decreases the compression rate, which, as a result, increases the required bandwidth. Some may argue that sacrificing bandwidth for the quality of the video is desirable; this is due to MPEG-4's ability to resynchronize the video on the next I frame. If the number of I f/s is two (N = 15) and we lose an I frame, then we lose 50% of the video; but if N = 9, where there are three I f/s, the loss will be only about 1/3. The simulations seem to contradict this, since the higher compression rate (N = 15) always performs better with respect to frames received per second.

Finally, we consider the interarrival time for the different resolutions. As expected, the delay between frames increases as the number of sending stations increases and the number of frames received per second decreases. This again affects the video, since the decoder not only has to wait longer between frames (interarrival) but also may have to delay decoding while waiting for a P frame that may never arrive, due to its being dropped after too many collisions. Also, when looking at the graphs concerning the interarrival times and the number of frames per second for the higher resolutions, it appears that these numbers do not agree. The numbers presented are averages over the 2 min of simulation time. After looking at the raw data concerning interarrival times and frames per second, it was found that there was a large variance in the data. This could be caused by the large number of stations sending great amounts of traffic that all compete for the medium.

IV. Prioritized MPEG-4 Frame Transmission

In the above section, we studied MPEG-4 transmissions over WLANs. Our simulation results show that a higher throughput does not always yield better quality MPEG-4 video. The loss of different MPEG-4 frames has a dramatic effect on the video. We observe that I, P, and B frames are in decreasing order of importance. If an I frame of an MPEG-4 video is lost, the next N − 1 frames (all P and B frames) are useless, where N is the total number of frames contained within one GoP. First, these P and B frames are a waste of bandwidth, as they must be discarded by the MPEG-4 decoder at the receiver on account of the lost I frame. These useless P and B frames cause collisions and add to the large delays incurred when too many stations send MPEG-4 traffic. Furthermore, if frames are delayed too long, it may be better to simply drop them in order to relieve the traffic than to let the MPEG-4 decoder handle the resynchronization on its own.
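The trade-off between the compression rate and loss resilience noted above can be checked directly; the small calculation below (ours, using only the 30 f/s rate and the GoP size) reproduces the 50% versus roughly 1/3 figures.

# Fraction of a second of video invalidated by one lost I frame.
# At 30 f/s there are 30/N GoPs per second, and a lost I frame
# wastes its entire GoP of N frames.
for n in (15, 9):
    gops_per_second = 30 / n
    lost_fraction = 1 / gops_per_second      # one GoP out of 30/N
    print(f"N = {n}: lose {lost_fraction:.0%} of that second of video")
# N = 15 -> 50%; N = 9 -> 30% (about 1/3), matching the text.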
Therefore, we propose a cross-layer approach to improve MPEG-4 transmissions over WLANs. The proposed approach comprises the following protocols, for which a combined sketch is given after this discussion.

1) P and B frames will be discarded by the MPEG-4 decoder at the receiver's MAC layer if the corresponding I frame is lost.
2) The individual MPEG-4 frames are prioritized at the MAC layer so that I frames have a higher priority than P frames, which have a higher priority than B frames; we explain several ways to prioritize below.
3) Each frame (I, P, B) has a time-deadline field so that, if the deadline cannot be met, the frame and the other related P and B frames in the same GoP are deleted without further transmissions/retransmissions.
4) If the delay between an I frame and the next P frame is too long, then it may be better to drop the least important B frames in an attempt to allow the video to catch up.
5) In the transmission queue of the MAC layer, the frames are reorganized according to dependence and priority: I frames, related P frames, and then related B frames.

The individual MPEG-4 frames can be prioritized so that the more important I frames are given higher priority than the less important B frames. There are several ways to enhance the priority of I frames: 1) similar to IEEE 802.11e [3], [4], the minimum contention window size (CWmin) of I frames < CWmin of P frames ≤ CWmin of B frames; 2) similar to IEEE 802.11e [3], [4], the arbitration interframe space (AIFS) of I frames < AIFS of P frames ≤ AIFS of B frames; and 3) the retry limit of I frames > the retry limit of P frames ≥ the retry limit of B frames; e.g., the retry limit of I frames can be the same as the retransmission limit in the normal 802.11 standard (i.e., 7 for short frames), the retry limit of P frames can be 3 or 4, and the retry limit of B frames can be 1 or 2.

First, I frames must have the highest priority, and thus the most retransmission retries, because a lost I frame affects so many later P and B frames. Second are the P frames; if one of these is lost, only the four surrounding B frames are affected. As a result, only about 5/30, or 1/6, of the video is affected, versus 1/2 for a lost I frame (assuming a GoP is 15 frames). Finally, we consider the B frames, which have the lowest priority and the smallest effect on the video. If a B frame is lost, no other frames are affected; however, it should be noted that the decoder may incur a delay as it waits for a B frame that will never arrive. Hence, dropping a B frame after only one retransmission attempt due to a collision has the least effect on the video.

MPEG-4 traffic is generated in the order seen in Fig. 1, meaning that the first I frame is followed by the B and P frames that are related to it. If the I frame is dropped due to too many retransmission failures, then there is no reason to attempt to transmit the following B and P frames, since these are useless to the decoder without the lost I frame. Thus, a dependence can be defined as follows: when an I frame is dropped, the next N − 1 frames (P and B frames), i.e., all frames up to the next I frame, will also be dropped, assuming they arrive in order. If the frames arrive in order, then we only need to drop all B and P frames queued up to the next I frame. If this is not the case, then a dependence between frames can be defined in such a way that, if the parent frame (an I frame) is lost, then all of its child P and B frames will be dropped. A similar dependence could be defined between P and B frames, but this has less of an effect, since only four B frames are affected by the loss of a single P frame.

Let us consider the delay between types of frames. If the delay between an I frame and the next P frame is too long, then it may be better to drop the least important B frames in an attempt to allow the video to catch up. This suggests that we should assign a maximum delay to the different frame types, after which a frame is dropped, in an attempt to reduce the effects on the video if the delay between frames becomes too long. Furthermore, each frame (I, P, B) has a time-deadline field so that, if the deadline cannot be met, the frame and the other related P and B frames in the same GoP may be deleted without further transmissions/retransmissions. These suggestions, i.e., the prioritization, the dependence, and the maximum tolerable delay, may be combined to improve the overall quality of MPEG-4 traffic.
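The sketch below illustrates how a MAC transmission queue might apply the three rules together. The retry limits (7/3/1) are the values simulated in this paper; the CWmin and AIFS numbers, and all identifiers, are illustrative assumptions rather than values prescribed by the scheme.

# Sketch of the prioritized-frame MAC queue logic (rules 1-5 above).
# Retry limits 7/3/1 follow the simulations in this paper; the
# cw_min/aifs values are only placeholder EDCA-style parameters.
MAC_PARAMS = {'I': {'retry_limit': 7, 'cw_min': 7,  'aifs': 2},
              'P': {'retry_limit': 3, 'cw_min': 15, 'aifs': 3},
              'B': {'retry_limit': 1, 'cw_min': 31, 'aifs': 4}}

class VideoFrame:
    def __init__(self, ftype, gop_id, deadline):
        self.ftype = ftype        # 'I', 'P', or 'B'
        self.gop_id = gop_id      # GoP that the frame belongs to
        self.deadline = deadline  # absolute time by which it must be sent
        self.retries = 0

def purge_dependents(queue, gop_id):
    # Rules 1 and 3: once a GoP's I frame is gone, its P and B frames are useless.
    return [f for f in queue if not (f.gop_id == gop_id and f.ftype in 'PB')]

def on_tx_failure(queue, frame, now):
    # Rules 2 and 3: per-type retry limits plus a per-frame deadline.
    frame.retries += 1
    if frame.retries >= MAC_PARAMS[frame.ftype]['retry_limit'] or now > frame.deadline:
        queue = [f for f in queue if f is not frame]
        if frame.ftype == 'I':
            queue = purge_dependents(queue, frame.gop_id)
        return queue, 'dropped'   # the MAC would also inform the application layer
    return queue, 'retry'

def tx_order(queue):
    # Rule 5: within each GoP, send I frames first, then P, then B.
    rank = {'I': 0, 'P': 1, 'B': 2}
    return sorted(queue, key=lambda f: (f.gop_id, rank[f.ftype]))

q = [VideoFrame('I', 0, 1.0), VideoFrame('B', 0, 1.0), VideoFrame('P', 0, 1.0)]
q, outcome = on_tx_failure(q, q[0], now=0.5)     # one failed try: the I frame retries
print(outcome, [f.ftype for f in tx_order(q)])   # retry ['I', 'P', 'B']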
For these suggestions, a cross-layer approach is proposed to implement them at both the MAC and application (MPEG-4) layers. Within the application layer, each frame's priority, dependence, and maximum delay must be assigned. The MAC layer then makes the necessary adjustments to the retransmission retries and other priorities based on the frame priority. Also, if a frame is lost, the MAC layer needs to adjust its queue by dropping all dependent frames. It is also necessary for the MAC layer to monitor the maximum delay; this could become complex if it is not implemented correctly, as the delays between frames may trigger the dropping of frames and, in turn, cause the dependence function to drop more frames. Finally, the MAC layer tells the application layer to perform the corresponding actions (e.g., when the MAC layer drops some I and B frames, it informs the application layer of its decisions).

We simulate some of the suggestions and provide a few of their results. In the simulations, we implement the retry-limit priority; the retry limits of I, P, and B frames are 7, 3, and 1, respectively. P and B frames are discarded by the MPEG-4 decoder at the receiver's MAC layer if the corresponding I frame is lost.

Fig. 5. Throughput per station (260 K frame size). (a) MPEG-4 streams (N = 9). (b) MPEG-4 streams (N = 15).

Fig. 5 compares the throughput per station of the original scheme (802.11) and the proposed scheme. As illustrated in Fig. 5, the throughput of the proposed scheme is significantly improved.

Fig. 6. Frame loss. (a) N = 15, mean I frame size = 260 K. (b) N = 9, mean I frame size = 260 K.

Fig. 6 shows the frame loss for the I, P, and B frames; as illustrated in Fig. 6, the I and P frames are not lost. Comparing Fig. 6 to Fig. 4(c) and (f), we observe that the proposed scheme significantly improves the transmissions of the I and P frames. The quality of the video can be significantly improved due to the successful transmissions of the I and P frames.

V. AC and Data Control

In our proposed schemes in [5] and [6], the existing voice and video flows are protected from new and already existing voice and video flows. In the distributed AC for the differentiated services of the IEEE 802.11e EDCA, channel utilization measurements are conducted during each beacon interval (or over several beacon intervals), and available/residual budgets are calculated. When the budget of one class becomes small, new traffic streams belonging to this class can no longer gain transmission time, and existing nodes are not allowed to increase the transmission time that they are already using. Therefore, the existing traffic streams are protected, and the channel capacity is fully utilized.

The distributed AC is developed to protect active quality-of-service (QoS) flows, such as voice and video. The QoS access point (QAP) announces the transmission budget via beacons for each access category (AC) except AC 0. The budget indicates the allowable transmission time per AC in addition to what is already being utilized. QoS stations (QSTAs) determine an internal transmission limit per AC for each beacon interval, based on the transmission count during the previous beacon period and the transmission budget announced by the QAP. The local voice/video transmission time per beacon interval should not exceed the internal transmission limit per AC. When the transmission budget for an AC is depleted, new flows are not able to gain transmission time, while existing flows cannot increase the transmission time per beacon interval that they are already using. This mechanism protects existing flows.
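A schematic of this budget logic is sketched below. It compresses the distributed AC of [5] and [6] into a few lines: the beacon-interval bookkeeping, the single pooled budget, and all names are our own simplifications, not the exact published algorithm.

# Schematic of the measurement-based distributed AC described above.
# Units are seconds of transmission time per beacon interval; the
# structure and names are simplified assumptions, not the exact scheme.
class AdmissionControl:
    def __init__(self, capacity_per_beacon):
        self.capacity = capacity_per_beacon
        self.used = 0.0          # measured voice/video transmission time

    def measure(self, tx_time):
        self.used += tx_time     # accumulated over the beacon interval

    def budget(self):
        # Residual budget the QAP would announce in the next beacon
        # (a real QAP tracks a separate budget per AC, except AC 0).
        return max(0.0, self.capacity - self.used)

    def admit(self, requested_time):
        # A new flow, or an increase by an existing flow, fits only
        # if the announced budget can absorb it.
        return requested_time <= self.budget()

    def next_beacon(self):
        self.used = 0.0          # utilization is re-measured each interval

ac = AdmissionControl(capacity_per_beacon=0.08)   # e.g., 80 ms usable per beacon
ac.measure(0.05)                                  # existing video occupies 50 ms
print(ac.admit(0.02), ac.admit(0.05))             # True (fits), False (budget depleted)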
However, although much of the channel capacity can be utilized by the existing voice and video flows, too many unsuccessful best-effort data transmissions can degrade the existing voice and video flows; this may happen because numerous data transmissions can cause many collisions. The existing voice and video flows thus become vulnerable to data traffic. This occurs because the priority supports are not strict but rather relative, and they are realized stochastically. Accordingly, we propose data control mechanisms in which the existing voice and video flows are protected from the best-effort data traffic. A detailed algorithm is omitted here.

The most effective way to control data transmissions is to reduce the number of collisions, or the collision probability, caused by the data transmissions. Controlling the number of stations accessing the wireless medium for data transmissions would cause unfairness among stations. Moreover, it is quite difficult and unrealistic to obtain the accurate number of active stations for data transmissions, as well as the associated data transmission rate. Here, we adopt the frame-based approaches proposed in [6]. In the frame-based approach, we attempt to dynamically control the EDCA channel access parameters (i.e., AIFS[0], CWmin[0], and CWmax[0]) based on observations of frame transmission behaviors so that, when the number of active data stations is large, the throughputs of the voice and video flows may still be protected; this protection is achieved by increasing the initial contention window size and the interframe space of the best-effort data traffic. Our goal is to control the number of collisions, or the collision probability, independent of the number of active stations for data transmissions.

In the frame-based approach, stations dynamically adjust the EDCA data parameters based on the behaviors of one or multiple frames. Observations of frame behaviors can reveal several traffic conditions. For example, consecutive successful transmissions signify that the traffic load condition is reasonably good, whereas consecutive dropped frames signify that either the traffic condition or the channel condition is bad. In our approach, the EDCA parameters for data (AC 0) announced by the QAP via beacon frames are the default values for the data traffic. Thus, the data EDCA parameters may be adjusted by QSTAs locally. We suggest that the QAP does not update the data EDCA parameters except for periodic maintenance (shutting down), perhaps one night every two weeks (depending on the application). When the QAP is on again, all parameters can be reset as needed.
In the frame-based approach, we can further classify the schemes into intra-frame-based schemes, inter-frame-based schemes, and combined intra-inter-frame-based schemes. In the intra-frame-based scheme, we propose a fast-backoff scheme. In the inter-frame-based scheme, we propose an adaptation approach based on consecutive dropped transmissions, consecutive successful transmissions, consecutive transmissions with a small number of retries, and consecutive transmissions with a larger number of retries. In the combined intra-inter-frame-based scheme, the above two approaches are combined. Details of these schemes are omitted due to limited space; an illustrative sketch is given below.
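Since the details are omitted here, the following is only a plausible reading of the inter-frame-based idea: a QSTA watches streaks of outcomes for its best-effort (AC 0) frames and scales its local CWmin[0]/AIFS[0] away from the beacon-announced defaults when the streaks look bad, drifting back when they look good. The thresholds, scaling factors, and names are all assumptions, not the published algorithm.

# Illustrative inter-frame-based adaptation of the AC 0 (best-effort data)
# EDCA parameters. Defaults mimic beacon-announced values; every threshold
# and factor here is an assumed placeholder.
DEFAULTS = {'cw_min': 31, 'aifs': 7}
CW_CAP = 1023            # never grow CWmin[0] beyond CWmax[0]

class DataEdcaAdapter:
    def __init__(self):
        self.params = dict(DEFAULTS)
        self.success_streak = 0
        self.drop_streak = 0

    def on_result(self, delivered, retries=0):
        if delivered and retries <= 1:       # success with few retries
            self.success_streak += 1
            self.drop_streak = 0
        elif not delivered:                  # frame dropped
            self.drop_streak += 1
            self.success_streak = 0
        if self.drop_streak >= 3:            # bad streak: back off data traffic
            self.params['cw_min'] = min(CW_CAP, self.params['cw_min'] * 2 + 1)
            self.params['aifs'] += 1
            self.drop_streak = 0
        elif self.success_streak >= 5:       # good streak: drift back to defaults
            self.params['cw_min'] = max(DEFAULTS['cw_min'],
                                        (self.params['cw_min'] - 1) // 2)
            self.params['aifs'] = max(DEFAULTS['aifs'], self.params['aifs'] - 1)
            self.success_streak = 0
        return dict(self.params)

adapter = DataEdcaAdapter()
for _ in range(3):
    adapter.on_result(delivered=False)       # three straight drops
print(adapter.params)                        # {'cw_min': 63, 'aifs': 8}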
VI. Simulation Results

In this section, we present simulation results obtained via OPNET. Each sending station generates MPEG-4 traffic at a rate of 30 f/s, and we observe the frames per second received at the receiver as the number of sending stations increases. As before, we use the enhanced version of the OPNET process model developed at Sharp Labs [2] to generate MPEG-4 traffic. Different scenarios were created in OPNET, varying the number of stations (1, 5, 10, 15, 20), with a mean I frame size of 260 K and N of 9 or 15. The traffic generated by MPEG-4 depends on the parameters described in Section III, which the process model can vary: M is set to 3, so that every third frame is either an I or a P frame, with the zero frame (always the first frame within a GoP) being an I frame and every subsequent third frame being a P frame; N defines the total number of frames contained within one GoP.

We simulate some of the suggestions and provide various simulation results. In the simulations, we implement the retry-limit priority; the retry limits of I, P, and B frames are 7, 3, and 1, respectively. P and B frames are discarded by the MPEG-4 decoder at the receiver's MAC layer if the corresponding I frame is lost. We compare the original scheme (802.11), the proposed prioritized MPEG-4 frame transmission (PMFT), and PMFT + AC. The simulation setup is built upon the setup described in Section III.

Fig. 7. Frame loss with different mean I frame sizes and N (original MAC). (a) N = 15, mean I frame size = 260 K. (b) N = 9, mean I frame size = 260 K.

Fig. 7(a) and (b) shows the number of I, P, and B f/s for an increasing number of MPEG-4 streams in the original MAC, where the mean I frame size is 260 K. We observe that, even at ten stations, the average frames per second is less than 50% of the expected 30 f/s, with only 50% of the I and P frames arriving per second on average. Here, the quality of the video will most likely degrade very rapidly.

Fig. 8. Frame loss with different mean I frame sizes and N (PMFT). (a) N = 15, mean I frame size = 260 K. (b) N = 9, mean I frame size = 260 K.

Fig. 8 shows the frame loss for the I, P, and B frames under PMFT. As illustrated in Fig. 8, with PMFT the I and P frames are not lost. Comparing Fig. 8 to Fig. 7(a) and (b), we observe that the proposed scheme significantly improves the transmissions of the I and P frames. The quality of the video can be significantly improved due to the successful transmissions of the I and P frames.

Fig. 9. Frame loss with different mean I frame sizes and N (PMFT + AC): only for accepted flows. (a) N = 15, mean I frame size = 260 K. (b) N = 9, mean I frame size = 260 K.

Fig. 9 shows the frame loss for the I, P, and B frames under PMFT + AC, with only the accepted flows shown. We observe that there is no loss of I, P, or B frames.

Fig. 10. Throughput per station (260 K frame size). (a) MPEG-4 streams (N = 9). (b) MPEG-4 streams (N = 15).

Fig. 10 compares the throughput per station of the original scheme (802.11), PMFT, and PMFT + AC. As illustrated in Fig. 10, the proposed PMFT scheme outperforms the original scheme, and the proposed PMFT + AC is the best (there is no loss).

VII. Conclusion

This paper has first shown how much MPEG-4 traffic can be handled in a WLAN using 802.11 at 11 Mb/s. We have examined different compression rates, resolutions, and numbers of stations sending MPEG-4 traffic. We have also looked briefly at the effects of increasing numbers of stations on the interarrival times and, more importantly, on the types of MPEG-4 frames received. If one considers the relationship among MPEG-4 frames and then reviews the frames received per second for the highest resolution simulated, the received video seems unacceptable; this is because the MPEG-4 decoder would most likely discard the majority of the frames. Based on this, the quality of the video depends not only on the throughput but also on the arrival of the different frames. Our simulation results showed that a higher throughput does not always yield better quality MPEG-4 video. We further observed that, if an I frame of an MPEG-4 video was lost, the next N − 1 frames (all P and B frames) were useless, where N is the total number of frames contained within one GoP. Furthermore, we observed that the I, P, and B frames were in decreasing order of importance. Therefore, we proposed a cross-layer approach between the MAC layer and the application layer to improve MPEG-4 transmissions over WLANs via prioritization, by discarding P and B frames when the I frame on which they depend is lost, by enforcing deadlines for each frame, by reordering frames in the transmission queues, and so on. Our results showed the advantages of the proposed scheme. Finally, we adopted a measurement-based admission control scheme for IEEE 802.11e. Simulation results showed the advantages of the proposed schemes.
References

[1] MPEG-4 Overview (V.21 - Jeju Version), ISO/IEC JTC1/SC29/WG11 N4668, Mar. 2002.
[2] S. Kandala and S. Deshpande, MPEG-2 Model, Sharp Labs, Camas, WA, 2004.
[3] Y. Xiao, "IEEE 802.11e: QoS provisioning at the MAC layer," IEEE Wireless Commun., vol. 11, no. 3, pp. 72-79, Jun. 2004.
[4] IEEE 802.11e-2005, Supplement to Part 11: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Medium Access Control (MAC) Enhancements for Quality of Service (QoS), Nov. 2005.
[5] Y. Xiao, H. Li, and S. Choi, "Protection and guarantee for voice and video traffic in IEEE 802.11e wireless LANs," in Proc. IEEE INFOCOM, Mar. 2004, pp. 2153-2163.
[6] Y. Xiao, H. Li, and S. Choi, "Two-level protection and guarantee for multimedia traffic in IEEE 802.11e distributed WLANs," Wireless Netw., vol. 15, no. 2, pp. 141-161, Feb. 2009.
[7] M.-T. Sun and A. R. Reibman, Compressed Video Over Networks. New York: Marcel Dekker, 2001.
[8] Y. Wang, J. Ostermann, and Y. Zhang, Video Processing and Communications. Englewood Cliffs, NJ: Prentice-Hall, 2001.
[9] A. Goldsmith and B. Girod. (2004). Cross-Layer Design of Ad-Hoc Wireless Networks for Real-Time Media [Online]. Available: http://www.stanford.edu/~zhuxq/adhoc_project
[10] T. Saadawi, "Optimizing airborne networking performance with cross-layer design approach," Res. Foundation, City Univ. New York, New York, Tech. Rep. AFRL-RI-RS-TR-2009-165, 2008.
[11] F. Foukalas, V. Gazis, and N. Alonistioti, "Cross-layer design proposals for wireless mobile networks: A survey and taxonomy," IEEE Commun. Surveys Tutorials, vol. 10, no. 1, pp. 70-85, Jan.-Mar. 2008.
[12] T. Goff, J. Moronski, D. S. Phatak, and V. Gupta, "Freeze-TCP: A true end-to-end TCP enhancement mechanism for mobile environments," in Proc. 19th INFOCOM, Mar. 2000, pp. 1537-1545.
[13] V. T. Raisinghani and S. Iyer, "Cross-layer feedback architecture for mobile device protocol stacks," IEEE Commun. Mag., vol. 44, no. 1, pp. 85-92, Jan. 2006.
[14] T. Kwon, H. Lee, S. Choi, J. Kim, D. Cho, S. Cho, S. Yun, W. Park, and K. Kim, "Design and implementation of a simulator based on a cross-layer protocol between MAC and PHY layers in a WiBro compatible IEEE 802.16e OFDMA system," IEEE Commun. Mag., vol. 43, no. 12, pp. 136-146, Dec. 2005.
[15] G. Carneiro, J. Ruela, and M. Ricardo, "Cross-layer design in 4G wireless terminals," IEEE Wireless Commun., vol. 11, no. 2, pp. 7-13, Apr. 2004.
[16] S. Khan, Y. Peng, E. Steinbach, M. Sgroi, and W. Kellerer, "Application-driven cross-layer optimization for video streaming over wireless networks," IEEE Commun. Mag., vol. 44, no. 1, pp. 122-130, Jan. 2006.
[17] R. Ferrús, L. Alonso, A. Umbert, X. Reves, J. Perez-Romero, and F. Casadevall, "Cross-layer scheduling strategy for UMTS downlink enhancement," IEEE Commun. Mag., vol. 43, no. 6, pp. 24-28, Jun. 2005.
[18] H. Jiang, W. Zhuang, and X. Shen, "Cross-layer design for resource allocation in 3G wireless networks and beyond," IEEE Commun. Mag., vol. 43, no. 12, pp. 120-126, Dec. 2005.
[19] X. Lin, N. B. Shroff, and R. Srikant, "A tutorial on cross-layer optimization in wireless networks," IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1452-1463, Aug. 2006.

Yang Xiao (S'98-M'01-SM'04) is currently with the Department of Computer Science, University of Alabama, Tuscaloosa. His current research interests include security, telemedicine, robots, sensor networks, and wireless networks. He has published more than 300 papers in major journals, refereed conference proceedings, and book chapters related to these research areas. He is currently the Editor-in-Chief of the International Journal of Security and Networks and the International Journal of Sensor Networks. He was the Founder and Editor-in-Chief of the International Journal of Telemedicine and Applications from 2007 to 2009.

Yan Zhang (M'05-SM'10) received the Ph.D. degree from Nanyang Technological University, Singapore. Since August 2006, he has been with the Simula Research Laboratory, Lysaker, Norway, where he is currently a Senior Research Scientist. He is also an Associate Professor (part-time) with the University of Oslo, Oslo, Norway. His current research interests include resources, mobility, spectrum, energy, and data management in wireless communications and networking. Dr. Zhang is a Regional Editor, an Associate Editor, a member of the Editorial Board, or a Guest Editor of a number of international journals.
He is currently serving as a Book Series Editor for the book series on Wireless Networks and Mobile Communications (Auerbach Publications, CRC Press, Taylor and Francis Group). He has served as an Organizing Committee Chair for many international conferences, including AINA 2011, WICON 2010, IWCMC 2010/2009, BODYNETS 2010, BROADNETS 2009, ACM MobiHoc 2008, IEEE ISM 2007, and CHINACOM 2009/2008.

Mike Nolen received the B.S. degree in computer science in 1989 from Lambuth College, Jackson, TN. He was a Doctoral Student with the University of Memphis, Memphis. He has over seven years of work experience with several high-tech telecommunications companies. His current research interests include database systems, networking, operating systems, and software engineering.

Julia Hongmei Deng received the Ph.D. degree in communications and computer networks from the Department of Electrical Engineering, University of Cincinnati, Cincinnati, OH, in 2004. She is currently a Principal Scientist with Intelligent Automation, Inc. (IAI), Rockville, MD. Her current research interests include protocol design, analysis, and implementation in wireless ad hoc/sensor networks, as well as network security and management. At IAI, she is currently leading many network- and security-related projects, such as secure routing for airborne networks, network services for airborne networks, denial-of-service mitigation for tactical networks, trust-aware querying for sensor networks, and agent-based intrusion detection systems, to name a few.

Jingyuan Zhang received the Ph.D. degree in computer science from Old Dominion University, Norfolk, VA, in 1992. He is currently an Associate Professor with the Department of Computer Science, University of Alabama, Tuscaloosa. Prior to joining the University of Alabama, he was a Principal Computer Scientist with ECI Systems and Engineering, Virginia Beach, VA, an Assistant Professor with Elizabeth City State University, Elizabeth City, NC, and an Instructor with Ningbo University, Ningbo, China. His current research interests include wireless networks, mobile computing, and collaborative software.