D1.3: Understanding Evalvid

Contents
Introduction to Evalvid
1. Framework and Design
2. Functionalities
   2.1. Determination of Packet and Frame Loss
        Packet loss
        Frame loss
   2.2. Determination of Delay and Jitter
   2.3. Video Quality Evaluation
3. Evalvid's Components
   Source
   Video Encoder and Video Decoder
   VS (Video Sender)
   ET (Evaluate Trace)
   FV (Fix Video)
   3rd party Tools
References

Introduction to Evalvid

EvalVid is a complete framework and tool-set for evaluating the quality of video transmitted over a real or simulated communication network. It provides packet/frame loss rate, packet/frame jitter, PSNR and MOS metrics for video quality assessment ([2], p. 426).

EvalVid is built to evaluate transmission-distorted videos. It has a modular structure: both the underlying transmission system and the codecs can be exchanged at the user's discretion, so it is applicable to any kind of coding scheme and can be used both in real experimental set-ups and in simulation experiments. The tools are implemented in pure ISO C for maximum portability, and all interaction with the network is done via two trace files, which makes it easy to integrate EvalVid into almost any environment ([1], Section 1).

1. Framework and Design

Figure 1 symbolizes the complete transmission of a digital video: from recording at the source, through encoding, packetization, transmission over the network, jitter reduction by the play-out buffer and decoding, to display for the user. The evaluation of these data is done on the sender side, so the information from the receiver has to be transported back to the sender. Conversely, the video as it would be displayed at the receiver can be reconstructed from the information available at the sender side. The only additional information required from the receiver side is a file containing the time stamps of every received packet. This is much more convenient than transmitting the complete (erroneous and decoded) video files back from the receiver side.

The boxes in Figure 1 named VS, ET, FV, PSNR and MOS are the programs of which the framework actually consists. Interactions between the tools and the network (which is considered a black box) are based on trace files; these files contain all the necessary data. The only file that must be provided by the user of EvalVid is the "receiver trace file". If the network is a real link, this file is obtained with the help of tcpdump (for details see also Section 4 of [1]); if the network is simulated, it must be produced by the receiver entity of the simulation. (Summary of Section 2 of [1].)
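To make the trace-file-based workflow concrete, the sketch below (not one of the EvalVid tools) matches a sender trace against a receiver trace by packet ID, flags lost packets and computes the mean end-to-end delay. The two-column line format `<packet id> <timestamp in seconds>` and the use of a common clock in both traces are assumptions made for this example; the real EvalVid trace files carry additional fields such as packet type and payload size.

```c
/* Illustrative sketch only: match a sender trace and a receiver trace
 * by packet ID to detect lost packets and compute the mean delay.
 * Assumed (simplified) trace format, one line per packet:
 *     <packet id> <timestamp in seconds>
 * Both traces are assumed to use the same clock. */
#include <stdio.h>

#define MAX_PACKETS 100000   /* assumed upper bound on packet IDs */

int main(int argc, char *argv[])
{
    static double snd_time[MAX_PACKETS], rcv_time[MAX_PACKETS];
    static int sent[MAX_PACKETS], received[MAX_PACKETS];
    long id, lost = 0, total = 0;
    double t, delay_sum = 0.0;
    FILE *fs, *fr;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <sender trace> <receiver trace>\n", argv[0]);
        return 1;
    }
    fs = fopen(argv[1], "r");
    fr = fopen(argv[2], "r");
    if (fs == NULL || fr == NULL) {
        perror("fopen");
        return 1;
    }

    /* Read both traces, indexed by packet ID (IDs also neutralize reordering). */
    while (fscanf(fs, "%ld %lf", &id, &t) == 2)
        if (id >= 0 && id < MAX_PACKETS) { sent[id] = 1; snd_time[id] = t; }
    while (fscanf(fr, "%ld %lf", &id, &t) == 2)
        if (id >= 0 && id < MAX_PACKETS) { received[id] = 1; rcv_time[id] = t; }

    /* A sent packet that never appears in the receiver trace is lost;
     * for the others the delay is the receive time minus the send time. */
    for (id = 0; id < MAX_PACKETS; id++) {
        if (!sent[id])
            continue;
        total++;
        if (!received[id])
            lost++;
        else
            delay_sum += rcv_time[id] - snd_time[id];
    }

    printf("packets sent: %ld, lost: %ld (%.2f%%)\n", total, lost,
           total > 0 ? 100.0 * lost / total : 0.0);
    if (total > lost)
        printf("mean end-to-end delay: %.6f s\n", delay_sum / (total - lost));

    fclose(fs);
    fclose(fr);
    return 0;
}
```

The same per-packet timestamps are what the jitter computation of Section 2.2 operates on; the per-frame loss of Section 2.1 additionally needs the video trace file to know which packets belong to which frame and frame type.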
2. Functionalities

2.1. Determination of Packet and Frame Loss

Packet loss
Packet losses are usually calculated on the basis of packet IDs. In measurements, packet IDs are often taken from IP, which provides a unique ID per packet; this unique ID is also used to cancel the effect of reordering. In the context of video transmission it is not only interesting how many packets got lost, but also which kind of data the lost packets carried. The formula used to calculate packet loss in an MPEG video transmission is given in [1], Section 3.1.

Frame loss
Since one frame may be packetized into several packets, frame losses cannot be counted directly; they have to be derived from the losses of the packets that carry each frame.

2.2. Determination of Delay and Jitter
In video transmission systems not only the actual loss is important for the perceived video quality, but also the delay of frames and the variation of that delay, usually referred to as frame jitter. Jitter is compensated by a play-out buffer at the receiver. One extreme is a buffer large enough to hold the entire video: it would eliminate any possible jitter, at the cost of an additional delay of the entire transmission time. The other extreme is a buffer capable of holding exactly one frame: no jitter at all can be eliminated, but no additional delay is introduced. The formal definition of jitter as used in [1] is given there by Equations 3, 4 and 5: it is the variance of the inter-packet or inter-frame time, where the "frame time" is the time at which the last segment of a segmented frame is received.

2.3. Video Quality Evaluation
There are basically two approaches to measuring digital video quality: subjective quality measures and objective quality measures. Subjective quality metrics capture the crucial factor, the impression of the user watching the video, but they are extremely costly: highly time-consuming, with high manpower requirements and a need for special equipment. The human quality impression is usually given on a scale from 5 (best) to 1 (worst), as in Table 1; this scale is called the Mean Opinion Score (MOS). Objective metrics evaluate quality by means of formulas and have been developed to emulate the quality impression of the human visual system (HVS); many types of such metrics are discussed in [1], p. 7. The most widespread method is the calculation of the peak signal-to-noise ratio (PSNR) image by image. PSNR compares the maximum possible signal energy to the noise energy ([1], Section 3.2), i.e. it measures the error between a reconstructed image and the original one ([2], Section 2).
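As an illustration of this image-by-image PSNR calculation, the sketch below (not EvalVid's own PSNR tool) compares two raw YUV 4:2:0 sequences frame by frame on their luminance plane, using PSNR = 20 * log10(255 / sqrt(MSE)) for 8-bit samples. The CIF resolution and the restriction to the Y plane are assumptions made for this example.

```c
/* Illustrative sketch only: frame-by-frame PSNR of the luminance (Y)
 * plane of two raw YUV 4:2:0 sequences (reference vs. distorted).
 * CIF resolution is assumed; use 176x144 for QCIF sources.
 * Compile with the math library, e.g.:  cc psnr_sketch.c -lm */
#include <math.h>
#include <stdio.h>

#define WIDTH       352
#define HEIGHT      288
#define Y_SIZE      (WIDTH * HEIGHT)          /* luminance samples per frame */
#define FRAME_SIZE  (Y_SIZE * 3 / 2)          /* full YUV 4:2:0 frame in bytes */

int main(int argc, char *argv[])
{
    static unsigned char ref[FRAME_SIZE], dist[FRAME_SIZE];
    FILE *fref, *fdist;
    int frame = 0;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <reference.yuv> <distorted.yuv>\n", argv[0]);
        return 1;
    }
    fref  = fopen(argv[1], "rb");
    fdist = fopen(argv[2], "rb");
    if (fref == NULL || fdist == NULL) {
        perror("fopen");
        return 1;
    }

    /* Read one frame from each file per iteration and compare the Y plane. */
    while (fread(ref,  1, FRAME_SIZE, fref)  == FRAME_SIZE &&
           fread(dist, 1, FRAME_SIZE, fdist) == FRAME_SIZE) {
        double mse = 0.0;
        long i;

        for (i = 0; i < Y_SIZE; i++) {
            double d = (double)ref[i] - (double)dist[i];
            mse += d * d;
        }
        mse /= Y_SIZE;

        if (mse == 0.0)
            printf("frame %d: images identical (PSNR is infinite)\n", frame);
        else
            printf("frame %d: PSNR = %.2f dB\n", frame,
                   20.0 * log10(255.0 / sqrt(mse)));
        frame++;
    }

    fclose(fref);
    fclose(fdist);
    return 0;
}
```

Per-frame PSNR values of this kind can then be related to the MOS scale described above.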
3. Evalvid's Components

This section is summarized and condensed from Section 2 of [2].

Source
The video source can be either in the YUV QCIF (176 × 144) or in the YUV CIF (352 × 288) format.

Video Encoder and Video Decoder
Currently, EvalVid only supports single-layer video coding, including H.264/AVC.

VS (Video Sender)
The VS component reads the compressed video file produced by the video encoder, fragments each large video frame into smaller segments, and then transmits these segments in UDP packets over a real or simulated network. VS also generates a video trace file that contains information about every frame in the real video file. The video trace file and the sender trace file are later used for the subsequent video quality evaluation; for examples of trace files, see [2], p. 428.

ET (Evaluate Trace)
The evaluation takes place at the sender side. Therefore, the information about the timestamp, the packet ID and the packet payload size available at the receiver has to be transported back to the sender. Based on the original encoded video file, the video trace file, the sender trace file and the receiver trace file, the ET component creates a frame/packet loss and frame/packet jitter report and generates a reconstructed video file, which corresponds to the possibly corrupted video found at the receiver side as it would be reproduced to an end user. In principle, the generation of the potentially corrupted video can be regarded as a process of copying the original encoded video file frame by frame, omitting the frames indicated as lost or corrupted at the receiver side. Furthermore, the current version of the ET component implements the cumulative inter-frame jitter algorithm for the play-out buffer. More details on ET, including formulas and experimental results, can be found in [1], Section 4.3.

FV (Fix Video)
If the codec cannot handle missing frames, the FV component tackles this problem by inserting the last successfully decoded frame in the place of each lost frame, as a simple error-concealment technique.

3rd party Tools
tcpdump and WinDump: tools for capturing and analyzing the packet flow of the transmission.

References
[1] J. Klaue, B. Rathke, and A. Wolisz, "EvalVid - A Framework for Video Transmission and Quality Evaluation," September 2003.
[2] C.-H. Ke, C.-K. Shieh, W.-S. Hwang, and A. Ziviani, "An Evaluation Framework for More Realistic Simulations of MPEG Video Transmission," Journal of Information Science and Engineering, vol. 24, pp. 425-440, 2008.