Reproduction of 22.2 multichannel audio with virtual rendering
ITU-R Workshop “Topics on the Future of Audio in Broadcasting”
Wed 15th July 2015, Popov Room, 16:30 - 20:00
Satoshi OODE
Science and Technology Research Laboratories, Japan Broadcasting Corporation (NHK)

Overview
8K SHV broadcasting service
◦ is planned to begin in 2016.
◦ is composed of stereo, 5.1ch and 22.2ch with metadata related to dialogue level control.
Requirement of 22.2 multichannel audio
◦ Three-dimensional spatial impression is achieved by loudspeakers placed at 30-45 degree intervals.
Reproductions of 22.2 multichannel audio for home use
◦ Theatrical environment using 24 loudspeakers or more.
◦ Rendering to other channel configurations such as 9.1ch.
◦ Loudspeakers integrated with the display using virtual rendering (binaural reproduction over loudspeakers).
◦ Headphones using virtual rendering (binaural reproduction).

22.2 multichannel audio specified in Rec. BS.2051
The 22.2 multichannel sound system
◦ is specified as system H in Rec. BS.2051.
◦ consists of three layers.
  Top layer: 9 channels including the overhead loudspeaker.
  Middle layer: 10 channels.
  Bottom layer: 3 channels plus 2 LFE channels.
[Figure: channel layout]
  Top layer (9 channels): TpFL, TpFC, TpFR, TpSiL, TpC, TpSiR, TpBL, TpBC, TpBR
  Middle layer (10 channels): FL, FLc, FC, FRc, FR, SiL, SiR, BL, BC, BR
  Bottom layer (3 channels + 2 LFE): BtFL, BtFC, BtFR, LFE1, LFE2

Roadmap of 8K SHV broadcasting service
◦ 8K SHV pilot broadcasting service is planned to begin in 2016.
◦ Recommendations BT.2020 and BS.2051 were developed and some related Recommendations were revised.
◦ ARIB Standard B32, which specifies audio coding, was also updated in Japan.
[Figure: roadmap timeline from (2000) to 2020 marking the start of R&D, the London, Rio and Tokyo Olympics, pilot broadcasting and full service, together with Rec. BT.2020, Rec. BT.2077, Rec. BS.2051, (Rec. BS.1770), Rec. BS.1116, BS.1196 and ARIB STD-B32.]

Audio service in 8K SHV broadcasting
The revised ARIB Standard STD-B32 provides specifications of audio coding for the advanced satellite broadcasting system. New features are as follows.
◦ Transmission of the down-mixing coefficients for each programme.
◦ Dialogue level control and dialogue replacement.
◦ Audio coding including lossless transmission.
NHK plans to broadcast 8K SHV…
◦ using MPEG-4 AAC in satellite broadcasting.
◦ using stereo, 5.1ch and 22.2ch audio formats.
◦ with metadata related to down-mixing for each programme.
◦ with metadata related to the dialogue level control function.
The information is reported in Report ITU-R BS.2159-7.

Dialogue level control function
◦ There are many complaints about the intelligibility of dialogue, although dialogue is the most important content.
◦ Listeners can control the dialogue level separately from the total level.
◦ In traditional channel-based broadcasting, only the total level can be changed.
◦ In 8K SHV broadcasting, the relative dialogue level can also be changed.
[Figure: level bars for Dialogue and BGM+SE, shown with the dialogue level increased and with the dialogue level decreased.]

Dialogue level control in 22.2 multichannel audio
Some channels are used only for dialogue, depending on the individual programme.
Dialogue channels have a “dialogue” flag.
[Figure: 22.2 channel layout: top layer 9 channels, middle layer 10 channels, bottom layer 3 channels + 2 LFE.]

Metadata for dialogue level control specified in ARIB STD-B32
Descriptor: Explanation
  ext_dialogue_status: Existence of dialogue channels.
  num_dialogue_chans: Number of main dialogue channels.
  num_additional_lang_chans: Number of alternative dialogue channels.
  dialogue_src_index[i]: Index of dialogue channels.
  dialogue_main_lang_comment_bytes: Byte count of characters indicating content of main dialogue.
  dialogue_main_lang_comment_data: Byte data of characters indicating content of main dialogue.
  dialogue_main_lang_code: Language code of main dialogue.
  dialogue_additional_lang_code[i]: Language code of i-th alternative dialogue.
  dialogue_additional_lang_comment_bytes[i]: Byte count of characters indicating content of i-th alternative dialogue.
  dialogue_additional_lang_comment_data[i]: Byte data of characters indicating content of i-th alternative dialogue.
  dialogue_gain_index[i]: Gain of alternative dialogue channels. (0000: 0 dB, 0001: –1 dB, 0010: –2 dB, ..., 1110: –14 dB, 1111: –∞ dB)
  sn_dialogue_plus_index: Maximum gain of dialogue channels in receiver. (000: 0 dB, 001: +3 dB, 010: +6 dB, ..., 110: +18 dB, 111: +∞ dB)
  sn_dialogue_minus_index: Minimum gain of dialogue channels in receiver. (000: 0 dB, 001: –3 dB, 010: –6 dB, ..., 110: –18 dB, 111: –∞ dB)
  additional_dialogue_data_sync: Data stream element in which alternative dialogue data is packed.
  additional_dialogue_index: Index of alternative dialogue channels corresponding to the “i” of dialogue_additional_lang_code[i].
Limiting the range of dialogue level control is important for broadcasters because the dialogue source is not always clean.

Viewing angle of 8K Super Hi-Vision
◦ System parameters are specified in Rec. ITU-R BT.2020.
◦ 8K SHV has a viewing angle of 100° in azimuth and 60° in elevation when the viewer is positioned at a distance of 0.75 times the display height (0.75H).
◦ 8K SHV requires wider and higher sound fields than 5.1ch or stereo to match the sound image with the visual image.
[Figure: left-right sound stage for stereo, 5.1ch and 22.2ch; viewing distances 0.75H and 1.5H.]

Characteristics of 22.2 multichannel audio
◦ Stable localization of frontal sound over the entire area of the large-screen image
◦ Sound image reproduced in all directions around the listener, including elevation
◦ 3D spatial impression augmenting the listener’s sense of reality
◦ Wide listening area with excellent sound quality
◦ Compatible with existing multichannel sound systems
◦ Suitable for live recording, mixing, and transmission
[Figure labels: 5.1, 22.2]

Loudspeaker intervals - Requirements of 22.2 multichannel audio
Spatial impression by 8 loudspeakers is almost the same as that by 24 loudspeakers.
[Figure: reference and test-item loudspeaker arrangements, labelled “24 loudspeakers, 15 degree intervals” and “8 loudspeakers, 45 degree intervals”.]
[Figure: mean difference grade score (0.50 to -3.50) versus loudspeaker arrangement 12, 8, 6b, 6a, 4b, 4a, 3b, 3a (loudspeaker intervals 30°, 45°, 60°, 90°, 120°), for the top layer (45°, 40 subjects) and the middle layer (0°, 20 subjects).]

Reproductions of 22.2 multichannel audio for home use
◦ Theatrical environment using 24 loudspeakers or more.
◦ Rendering to other channel configurations such as 9.1ch.
◦ Down-mixing to stereo or 5.1ch.
◦ Loudspeakers integrated with the display using virtual rendering.
◦ Headphones with virtual rendering (binaural).

Theatrical environment using 24 loudspeakers or more
◦ 22.2 multichannel audio is usually reproduced by 24 loudspeakers.
◦ Additional loudspeakers are added depending on the listening environment.
  Subwoofers for bass management of the full-range channels (room size).
  Full-range loudspeakers on the sides to keep uniformity (room shape).
◦ The relative positions of the channels, not their absolute positions, are what matter.
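The gain codes listed in the ARIB STD-B32 dialogue metadata table can be turned into decibel values with a small lookup. The following is a minimal sketch, not the standard's reference decoder. The function names are hypothetical, and the code assumes that the values elided by “...” in the table follow the same linear step as the listed ones (1 dB per step for dialogue_gain_index, 3 dB per step for sn_dialogue_plus_index and sn_dialogue_minus_index). How a receiver applies the limits is also an assumption: here the listener's requested dialogue gain is simply clamped to the signalled range.

```python
import math


def decode_dialogue_gain_index(code: int) -> float:
    """Gain in dB for a 4-bit dialogue_gain_index.

    Listed values: 0000 -> 0 dB, 0001 -> -1 dB, ..., 1110 -> -14 dB, 1111 -> -inf.
    Intermediate codes are ASSUMED to follow the same 1 dB step.
    """
    if not 0 <= code <= 0b1111:
        raise ValueError("dialogue_gain_index is a 4-bit field")
    return -math.inf if code == 0b1111 else float(-code)


def decode_sn_dialogue_index(code: int, plus: bool) -> float:
    """Limit in dB for a 3-bit sn_dialogue_plus_index / sn_dialogue_minus_index.

    Listed values: 000 -> 0 dB, 001 -> +/-3 dB, ..., 110 -> +/-18 dB, 111 -> +/-inf.
    Intermediate codes are ASSUMED to follow the same 3 dB step.
    """
    if not 0 <= code <= 0b111:
        raise ValueError("sn_dialogue index is a 3-bit field")
    magnitude = math.inf if code == 0b111 else 3.0 * code
    return magnitude if plus else -magnitude


def clamp_user_gain(requested_db: float, plus_code: int, minus_code: int) -> float:
    """Clamp a listener's requested dialogue gain to the broadcaster's limits.

    Hypothetical receiver behaviour, used only to illustrate the metadata.
    """
    upper = decode_sn_dialogue_index(plus_code, plus=True)
    lower = decode_sn_dialogue_index(minus_code, plus=False)
    return max(lower, min(upper, requested_db))
```

For example, with sn_dialogue_plus_index = 010 and sn_dialogue_minus_index = 001, a listener request of +12 dB would be limited to the signalled maximum, and a request of -9 dB to the signalled minimum.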
[Figure: ideal environment, with loudspeakers labelled FL.]

Rendering to other channel configurations such as 9.1ch
Rendering based on channel position
◦ When the number of loudspeakers is large, rendering is used.
Down-mixing
◦ When 7.1ch, 5.1ch or stereo is used, down-mixing coefficients are used to keep dialogue from becoming unclear due to spatial masking.
The spatial impression deteriorates as the number of loudspeakers decreases.
[Figure: 22.2ch, 9.1ch, 5.1ch and stereo loudspeaker layouts with “Rendering” and “Down-mix” labels.]

Loudspeakers integrated with display using virtual rendering
– How to introduce 8K SHV audio into the home environment
  High-quality sound requires 24 discrete loudspeakers.
  Installing 24 loudspeakers is excessive for a home environment.
– Compact and convenient system
  The loudspeaker system should be integrated with the SHV display.
– A 12-loudspeaker system integrated with an 85-inch SHV FPD was developed.
[Figure: 12 loudspeakers integrated with 8K SHV Flat Panel Display.]

Loudspeakers integrated with display using virtual rendering
For the front 11 channels:
◦ 8 channels around the display are directly reproduced (marked as red circles).
◦ 3 front channels on the display are reproduced using vertical pair-wise amplitude panning.
[Figure: vertical pair-wise panning method and horizontal pair-wise panning method.]

Loudspeakers integrated with display using virtual rendering
◦ The 11 side, back and overhead channels around the listener are reproduced by binaural reproduction* over 11 front loudspeakers.
  (* Simulating the acoustic propagation characteristics from the loudspeaker to each ear.)
◦ Binaural reproduction over loudspeakers has been studied since the 1960s.
◦ Some studies suggest that horizontally arrayed loudspeakers can perform binaural reproduction very well.
◦ The system realizes immersive audio with only front loudspeakers.
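Pair-wise amplitude panning, mentioned above for the three front channels on the display, places a phantom source between two loudspeakers by splitting one signal between them with different gains. The sketch below uses a constant-power sine/cosine law. That law is a common choice but an assumption here: the slides do not state which panning law the NHK system uses, and the function names and the `position` parameter are hypothetical.

```python
import math
from typing import Tuple


def pairwise_pan_gains(position: float) -> Tuple[float, float]:
    """Return (gain_a, gain_b) for a loudspeaker pair.

    position = 0.0 puts the source entirely on loudspeaker A (e.g. the lower
    one of a vertical pair), 1.0 entirely on loudspeaker B (the upper one).
    The sine/cosine law keeps gain_a**2 + gain_b**2 equal to 1.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def pan_sample(sample: float, position: float) -> Tuple[float, float]:
    """Split one audio sample between the two loudspeakers of the pair."""
    gain_a, gain_b = pairwise_pan_gains(position)
    return sample * gain_a, sample * gain_b
```

A source intended to appear midway between the pair would use `position = 0.5`, which gives both loudspeakers the same gain while keeping the summed power constant.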
Loudspeakers integrated with display using virtual rendering
Binaural reproduction over loudspeakers
The 22.2 multichannel sound field* is simulated using HRTFs corresponding to each loudspeaker.
(* 11 side, back and overhead channels)
[Figure: the channels TpSiL, TpC, TpSiR, TpBL, TpBC, TpBR, SiL, SiR, BL, BC and BR, with labels z1, z2, FTpBL1, FTpBL2, x1 and x2.]

Loudspeakers integrated with display using virtual rendering
Binaural reproduction over loudspeakers
\[
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
=
\begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}
H
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\]
◦ Crosstalk cancellation is achieved when GH is the unit matrix.
◦ The condition number is one of the factors for system stability.
[Figure: original sound field (x1, x2), filter H, acoustic paths G11, G12, G21 and G22 with acoustic crosstalk, and reproduced sound field (y1, y2).]

Loudspeakers integrated with display using virtual rendering
– The condition number is one of the factors that indicate the stability of binaural reproduction.
– Increasing the number of loudspeakers gives more stable binaural reproduction regardless of the loudspeaker configuration.
[Figure: four panels (two, four, six and twelve loudspeakers) plotting the condition number Cond(G) against frequency [Hz] up to 2 x 10^4; annotations: “2 loudspeaker control”, “12 loudspeaker control”, “≦ 2”.]

Headphone with virtual rendering (Binaural reproduction)
◦ For mobile or personal use, a 22.2 multichannel headphone processor was developed.
◦ The 22.2 multichannel sound field is simulated using HRTFs corresponding to each loudspeaker.
◦ The system, with the audio engineers’ HRTFs installed, has already been used in 22.2 multichannel sound production.
[Figure: 22.2 multichannel headphone processor, showing the 22.2 channel layout with labels z1, z2, FTpBL1, FTpBL2, x1 and x2.]

Conclusion
8K SHV broadcasting service
◦ is composed of stereo, 5.1ch and 22.2ch,
◦ with metadata related to dialogue level control.
  – toward personalization, especially for older listeners.
  – toward adaptation to the listening environment.
Reproductions of 22.2 multichannel audio for home use
◦ Theatrical environment using 24 loudspeakers or more.
◦ Rendering to other channel configurations such as 9.1ch.
◦ Loudspeakers integrated with the display using virtual rendering (binaural reproduction over loudspeakers).
◦ Headphones using virtual rendering (binaural reproduction).

Thank you for your attention
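Supplementary sketch for the crosstalk-cancellation slides: with two loudspeakers and two ears, the reproduced ear signals are y = G H x, so choosing H as the inverse of G makes G H the unit matrix, and the condition number of G indicates how sensitive that inversion is to errors. The example below works at a single frequency with made-up transfer values. Plain matrix inversion is used only for illustration; the slides do not describe the actual filter design, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[complex]]  # 2x2 matrix as nested lists


def inverse_2x2(g: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix; used here as the cancellation filter H."""
    (a, b), (c, d) = g
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("G is singular; crosstalk cannot be cancelled")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p: Matrix, q: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def condition_number_2x2(g: Matrix) -> float:
    """2-norm condition number: largest / smallest singular value of G."""
    (a, b), (c, d) = g
    # Trace and determinant of G^H G give the squared singular values.
    trace = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    abs_det = abs(a * d - b * c)
    if abs_det == 0:
        return math.inf
    disc = math.sqrt(max(trace * trace - 4.0 * abs_det ** 2, 0.0))
    sigma_max_sq = (trace + disc) / 2.0
    # sigma_max * sigma_min = |det G|, so cond = sigma_max^2 / |det G|.
    return sigma_max_sq / abs_det


# Made-up acoustic paths at one frequency: direct paths 1.0, crosstalk 0.5.
G_EXAMPLE: Matrix = [[1.0, 0.5], [0.5, 1.0]]
H_EXAMPLE = inverse_2x2(G_EXAMPLE)
GH_EXAMPLE = matmul_2x2(G_EXAMPLE, H_EXAMPLE)  # close to the unit matrix
```

As the crosstalk terms approach the direct terms, G gets closer to singular and its condition number grows, which is one way to read the stability plots for the two-loudspeaker case.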