DAC50, Designer Track, 156-VB543

Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform
Kazuya YOKOHARI, Koyo NITTA, Mitsuo IKEDA, and Atsushi SHIMIZU
NTT Media Intelligence Laboratories
6/5/2013
Copyright(c) 2013 Nippon Telegraph and Telephone Corporation

Outline
• Introduction
• Proposed Design Methodology
• Case Study: 4K HEVC Intra Codec
• Evaluation
• Conclusion

Video Codec LSI
• MPEG-2 and H.264/AVC are major video coding standards.
• We have developed an MPEG-2 video codec LSI (VASA) and an H.264/AVC codec LSI (SARA).
• Developing a video codec LSI requires many simulations.
[Figure: Test data; Codec LSI (VASA for MPEG-2, SARA for H.264/AVC); Bit Stream (Coded Image). Objective evaluation examples: BD-Bitrate, SSIM, PSNR.]
• Coded images should be evaluated both subjectively and objectively.
• Some degradations in coded images are not detected by objective evaluation.
• Real-time subjective evaluation is important for finding these degradations.

Existing LSI Design Flow
• In the existing design flow, even behavioral design, the fastest simulation environment, takes 100 times as long as real time to simulate.
• A fast simulation environment is important because video codec LSI design requires many simulations.
[Figure: existing LSI design flow. Labels: SystemC source codes; Behavioral design; Verification (Pass/Fail); Stimulus; Existing architecture exploration loop; Behavioral Synthesis; RTL design; Verilog-RTL codes; Verilog-RTL codes (already verified); IP core; Logic Synthesis; Technology Library; Gate-level design; P&R; ASIC; FPGA. Simulation Speed: X100 (on CPU); X10,000 (on CPU), X1,000 (on emulator); X1,000 (on CPU), X100 (on emulator).]

The Problems of Video Codec LSI Development
• Developing a video codec LSI requires many simulations.
• In the existing LSI design flow, simulation takes 100 times as long as real time.
• To resolve these problems, the simulation and circuit design environments must make it easy to check and improve codec LSI performance.
• Simulation environment: FPGA-based platform. Using FPGAs makes real-time simulation possible.
• Circuit design environment: High-level synthesis. Using high-level synthesis makes rapid prototyping possible.

Video Codec Design Platform
• The video codec design platform can run large-scale circuit simulation in real time using many FPGAs.
• The proposed platform inputs and outputs image data in real time through SDI interfaces.
[Figure: FPGA1, FPGA2, FPGA3, FPGA4, FPGA (Center), SDI interface.]
• The proposed platform has many FPGAs because a product-level video codec LSI is very large in scale.
• With many FPGAs, the platform can simulate a product-level circuit.

Proposed Video Codec Design Flow (1/2)
• The proposed design flow enables rapid prototyping using high-level synthesis.
• The proposed design flow enables real-time simulation using the proposed platform.
• When a single architecture exploration loop is used, repeating every design step costs feedback time.
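To make the cost of slow simulation concrete, the sketch below converts a slowdown factor into wall-clock time. The factors (X1, X100, X1,000, X10,000) are the ones quoted on the design-flow slides; the 10-second clip length and the function name `simulation_time` are assumptions chosen only for illustration.

```python
from datetime import timedelta


def simulation_time(clip_seconds: float, slowdown: int) -> timedelta:
    """Wall-clock time needed to simulate a clip at a given slowdown factor.

    A slowdown of 1 means real time; 100 means 100 times longer than real time.
    """
    return timedelta(seconds=clip_seconds * slowdown)


# Hypothetical clip length, not taken from the slides.
CLIP_SECONDS = 10

# Slowdown factors quoted on the slides (X1 is the FPGA-based platform).
for slowdown in (1, 100, 1_000, 10_000):
    print(f"X{slowdown:>6,}: {simulation_time(CLIP_SECONDS, slowdown)}")
```

Under these assumptions, only the X1 case allows a viewer to watch the coded image at normal speed, which is what subjective evaluation needs.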
[Figure: proposed design flow. Labels: SystemC source codes; Behavioral design; Verification (Pass/Fail); Stimulus; Existing architecture exploration loop; Proposed architecture exploration loop; Behavioral Synthesis; RTL design; Verilog-RTL codes; Verilog-RTL codes (already verified); IP core; Logic Synthesis; Technology Library; Gate-level design; P&R; ASIC; FPGA. Simulation Speed: X100 (on CPU); X10,000 (on CPU), X1,000 (on emulator); X1,000 (on CPU), X100 (on emulator); X1 (on video codec design platform).]

Proposed Video Codec Design Flow (2/2)
• The circuit design is subdivided and the parts are designed in parallel, in order to reduce the feedback time caused by repeating every design step.
• With parallel design, architecture exploration proceeds at high speed.
[Figure: the same design flow diagram as on slide (1/2).]

Summary of the Proposed Design Methodology
The proposed parallel design methodology has three features.
1. High-level synthesis.
– With high-level synthesis, a target circuit architecture can be changed and tuned more easily than with an RTL design methodology.
2. Video codec design platform.
– Because the proposed platform simulates in real time, subjective image evaluation can be performed.
3. Parallel design.
– With parallel design and high-level synthesis, functions can be added in smaller units, which reduces feedback time.
Combining these three features, the effect of each function on subjective image quality can be evaluated and used for architecture exploration.

Case Study: 4K HEVC Intra Codec
• HEVC (High Efficiency Video Coding) is a next-generation video coding standard.
• The HEVC intra codec consists of three blocks: intra prediction, transform and quantization, and entropy coding.
[Figure: Video Coding block diagram. Input Data → Intra Prediction → Transform and Quantization → Entropy Coding → Output Stream.]
Intra Prediction generates a prediction difference image from the input data and predicted image data.
Transform and Quantization generates quantized values from the transformed difference image, and a reconstructed image from the quantized values.
Entropy Coding generates a bit stream from the quantized values.

The Specifications of the HEVC Intra Codec
Table columns: STEP1, STEP2 (LOOP#1), STEP2 (LOOP#2), STEP2 (LOOP#3). Entries per row, in slide order:
Intra Prediction: •PU: 32x32 •Prediction Mode: 4 •PU: 64x64, 16x16 •Prediction Mode: 7
Transform and Quantization: •TU: 32x32 •TU: 16x16
Entropy Coding: •CU: 32x32 •CU: 64x64
Base Algorithm: •HM3.0 •HM7.0
Callout on the table: This slide’s scope.
*CU stands for Coding Unit.
*PU stands for Prediction Unit.
*TU stands for Transform Unit.
*HM is the reference software of HEVC.
• Prediction Mode
[Figure: intra prediction mode directions. Labels: 0: Planar; 1: DC; 2; 10; 18; 26; 34.]

Evaluation (1/2)
Circuit Performance and Design Period
[Figure: Area and Cycle (series: IPD, TQ, EC) plotted against Design Period (Month), in panels for STEP1, STEP2 LOOP#1, STEP2 LOOP#2, and STEP2 LOOP#3. Annotations: Subjective Evaluation Period; Feedback data is available.]
The main changes to each block:
• LOOP#1: Upgrade of the base algorithm of each block
• LOOP#2: Functional expansion of IPD
• LOOP#3: Functional expansion of each block
• The circuit performance of each expanded function is evaluated at STEP2.
• Feedback data from the other design loops is available at STEP2.

Evaluation (2/2)
• Using the proposed parallel design methodology, three design loops were tried in only seven months.
• Using the proposed parallel design methodology, cycle*area was reduced to 1/5 in the four months after the preliminary design of LOOP#1, and to 1/4 in the three months after the preliminary design of LOOP#2.
[Figure: Cycle*Area plotted against Design Period (Month) for STEP1, STEP2 (LOOP#1), STEP2 (LOOP#2), and STEP2 (LOOP#3). Annotations: LOOP#1 80% down (four months); LOOP#2 75% down (three months); 90% down.]

Conclusion
• We proposed a new design methodology for video codec LSI. Using the proposed design methodology, we can reduce feedback time, run simulations, and evaluate coded images in real time.
• Using the proposed design methodology, three design loops were tried in only seven months.
• Using the proposed design methodology, cycle*area was reduced to 1/5 in the four months after the preliminary design of LOOP#1, and to 1/4 in the three months after the preliminary design of LOOP#2.
• To realize an HEVC codec, we need to add or expand some functional tools while checking each of them by subjective evaluation.
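The cycle*area figures above are stated both as ratios (1/5, 1/4) and as percentages (80% down, 75% down). The sketch below shows how the two relate. The function names and the sample cycle and area values in the last part are hypothetical; only the two ratios come from the slides.

```python
from fractions import Fraction


def cycle_area(cycles: int, area: Fraction) -> Fraction:
    """Cost metric named on the slides: processing cycles times circuit area."""
    return cycles * area


def percent_reduction(before, after) -> Fraction:
    """Percentage by which `after` is lower than `before`."""
    return (1 - Fraction(after) / Fraction(before)) * 100


# Ratios reported on the slides, relative to each loop's preliminary design.
print(f"LOOP#1: {percent_reduction(1, Fraction(1, 5))}% down")
print(f"LOOP#2: {percent_reduction(1, Fraction(1, 4))}% down")

# Hypothetical measurements, for illustration only.
preliminary = cycle_area(3000, Fraction(1))
tuned = cycle_area(1500, Fraction(2, 5))
print(f"Example: {percent_reduction(preliminary, tuned)}% down")
```

Exact fractions are used instead of floating point so that the ratio-to-percentage conversion carries no rounding error.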