ECE 408 Final Project
Fall 2013

Parameters
• Groups of 3 preferred
• Groups of 1-2 possible w/ prior approval
• Look for a group on Piazza
• 2 project options
  – HEVC Intraframe Prediction Competition
  – A topic from your own research

Competition
• Intraframe prediction for HEVC video encoder
• http://x265.org/
• Fixed task; groups compete to see who can build the fastest implementation
• Evaluation metric will be a weighted mixture of PCIe I/O time and total time
• Winning team gets iPads
  – Sponsored by MultiCoreWare

Intra(frame) Prediction
• Part of the H.265 (aka HEVC) video format
  – Successor to H.264, the most popular current format
  – Achieves higher PSNR at a lower bitrate by using more computationally expensive methods
• Idea: real video frames exhibit structure
  – A pixel’s color can be predicted from the color of its neighbors within the same frame (intraframe) or from recent frames (interframe)
  – Encode a block of pixels as a prediction mode + a residual (delta) from that prediction
  – The result should be smaller than coding pixel values directly (compression)

HEVC Intra Prediction Modes
• Frames are processed in blocks of 4x4 to 64x64 pixels in (mostly) top-left to bottom-right order
  – We can use the (previously processed) upper and left neighboring pixels to estimate (predict) the current block of pixels
• Video consists of 1 luma and 2 chroma channels (YCC colorspace)
  – 4:2:0 subsampling means luma has 2x the x and y resolution of chroma
  – Prediction is done separately for all 3 channels
• Three patterns that appear often in video are flat regions, smooth gradients, and straight edges
• We can predict a block of pixels as:
  – The average of its neighbors (DC)
  – A smooth gradient based on its neighbors (Planar)
  – A linear extension of its neighbors in one of 33 directions (Angular)
• 35 total modes (up from 9 in H.264: DC + 8 Angular)

DC Mode
[Figure: Current Block with its Top Neighbor and Left Neighbor; the remaining neighbors are Don’t Care]
• Predict that all pixels in the block are the average of the edge pixels of
the top and left neighbor blocks
• Good at compressing flat regions (one color)

Planar Mode
[Figure: Current Block with its Top Neighbor and Left Neighbor; the remaining neighbors are Don’t Care]
• Predict that the block forms a smooth gradient defined by its top and left neighbors
• Computed as the average of two linear interpolations (less expensive than bilinear)
• Good at compressing smoothly varying regions

Angular Modes
• 33 directions
• More coverage close to horizontal and vertical
• Those directions are more common in real video
• Extend neighbor pixels into the current block at a specific angle
  – Good at compressing areas with straight edges
• Often need to linearly interpolate between 2 neighbor pixels
• Formulated such that it can be done in integer arithmetic
[Figure: H.264 vs. HEVC comparison; label: 11% Lower Bitrate]

SATD
• Sum of Absolute Differences (SAD) is a simple way of measuring the disparity between two blocks of pixels
• Sum of Absolute Transformed Differences (SATD) applies a Hadamard transform to the differences before summing
  – More computationally complex
  – Correlates better with subjective and objective (PSNR) metrics
• SATD on an 8x8 block is commonly called SA8D

Your Task
• For 4x4, 8x8, 16x16, 32x32, and 64x64 prediction
blocks:
  – Assume the entire frame is a regular grid
  – For each luma and chroma block:
    • For each of the 35 prediction modes:
      – Use reference pixels directly for neighbors (no reconstruction)
      – Compute predicted pixel values
      – Compute SATD between prediction and reference pixels
    • Return a list of <mode, SATD> tuples sorted by SATD (best to worst prediction)
• Your kernel may operate on one or multiple frames

Infrastructure
• We will provide a code skeleton and test harness, as with the labs
• We will link to resources with high-level and low-level explanations of intra prediction
• The existing serial and vectorized x265 code is also a good reference
• Your code should compile cleanly and run on the GEM cluster’s C2050s
  – We may get a newer (Kepler) evaluation machine

Evaluation
• We will measure total prediction time and the time for memcpy()s to and from the GPU
• Final metric will be a weighted average of total time and I/O time (exact weights TBA)
• Each member of the winning team by this metric will receive an iPad

Additional Challenge
• Two related challenges, not counted towards the competition or the course grade, are also available:
  – DCT Primitives
  – Loop Filters
• Teams can win iPads for one of these two challenges if they:
  – Meet performance standards (TBA)
  – Perform better than any other team
  – Meet code quality standards
  – Contribute code to open source repository

DCT Primitives
• List of primitives:
  – Discrete Cosine Transform
  – Quantization
  – Dequantization
  – Inverse Discrete Cosine Transform

Loop Filters
• Deblocking Filter (DBF):
  – Block coding results in sharp edges in image
  – DBF removes edges between blocks
  [Figure courtesy of wikipedia.org]
• Sample Adaptive Offset (SAO) Filter:
  – Reconstruct original amplitudes using offsets
  – Band filter:
categorize samples into 32 bands
  – Edge filter: add offsets depending on neighbors

Infrastructure
• Infrastructure similar to the competition will be provided
• Less support than for the competition

Dates
• November 31: Project Proposals due
  – Only for students not doing the competition
  – Oral in class (5 slides / 10 min)
• Week of November 18: Progress Reports
  – Appointment with course staff (15 min)
• December 16: Final Project Presentations
• December 18: Final Project Report due
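
Reference Sketch (CPU)
The competition task (predict a block, score it with SATD, return modes sorted by SATD) can be illustrated with a small host-side C++ sketch. This is not the provided skeleton and not the x265 code. The names dc_predict, satd4x4, ModeCost, and sort_by_satd are hypothetical, pixels are assumed to be 8-bit and row-major, the DC predictor omits any edge smoothing, and the SATD is left unnormalized (encoders differ on scaling). A GPU kernel would parallelize the same arithmetic across blocks and modes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// DC prediction for an n x n block: every predicted pixel is the rounded
// average of the n pixels above and the n pixels to the left of the block.
// Assumption: plain average only, no edge smoothing.
std::vector<uint8_t> dc_predict(const std::vector<uint8_t>& top,
                                const std::vector<uint8_t>& left) {
    const std::size_t n = top.size();
    assert(n > 0 && left.size() == n);
    std::size_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += top[i] + left[i];
    const uint8_t dc = static_cast<uint8_t>((sum + n) / (2 * n));
    return std::vector<uint8_t>(n * n, dc);  // row-major n x n block
}

// Unnormalized 4x4 SATD: take the pixel differences, apply a 4-point
// Hadamard transform to the rows and then the columns, and sum the
// absolute values of the resulting coefficients.
int satd4x4(const uint8_t* orig, const uint8_t* pred) {
    int d[4][4];
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            d[y][x] = int(orig[4 * y + x]) - int(pred[4 * y + x]);

    for (int y = 0; y < 4; ++y) {  // horizontal butterflies
        const int a = d[y][0] + d[y][1], b = d[y][0] - d[y][1];
        const int c = d[y][2] + d[y][3], e = d[y][2] - d[y][3];
        d[y][0] = a + c; d[y][1] = b + e; d[y][2] = a - c; d[y][3] = b - e;
    }

    int sum = 0;
    for (int x = 0; x < 4; ++x) {  // vertical butterflies + accumulation
        const int a = d[0][x] + d[1][x], b = d[0][x] - d[1][x];
        const int c = d[2][x] + d[3][x], e = d[2][x] - d[3][x];
        sum += std::abs(a + c) + std::abs(b + e) +
               std::abs(a - c) + std::abs(b - e);
    }
    return sum;
}

// One <mode, SATD> tuple of the result list.
struct ModeCost {
    int mode;
    int satd;
};

// Order the tuples from best (lowest SATD) to worst prediction.
void sort_by_satd(std::vector<ModeCost>& costs) {
    std::stable_sort(costs.begin(), costs.end(),
                     [](const ModeCost& a, const ModeCost& b) {
                         return a.satd < b.satd;
                     });
}
```

For a 4x4 block, a caller would build the prediction for each of the 35 modes, call satd4x4 against the source pixels, append a ModeCost per mode, and finish with sort_by_satd. Larger blocks need a correspondingly larger Hadamard transform (for example the 8x8 variant called SA8D above).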