Audio Definition Model
for Flexible File Formats
Dave Marston
BBC R&D
Involvement
● EBU Groups:
    ● FAR-BWF (BWF file, audio expertise)
    ● MIM-MM (EBU Core, metadata expertise)
What is the Audio Definition Model?
● Formalised way of describing audio for file formats.
● Initial file format will be Broadcast WAV (BWAV).
● Specified by EBUCore XML schema.
● Model can be used more generally.
● Aim to make it the primary description model for as many formats as possible.
Future Multichannel Audio
● Channel based
    ● e.g. stereo, 5.1, 22.2
● Scene based
    ● e.g. Ambisonics
● Object based
    ● Audio objects with stationary or moving spatial properties.
● Combinations of all three
Cooking with Audio!
● Audio Definition Model is like a shopping list of ingredients.
● Each ingredient has a formal description.
● BWAV file is like a shopping bag containing the actual ingredients.
● BWAV 'chna' chunk is like the bar-codes on each item.
● The ADM is NOT the recipe though!
Terminology
Track: A single set of samples or data in the storage medium.
Stream: A combination of tracks (or one track) required to represent a channel, an object, or a group.
Channel: A single sequence of audio samples.
Block: A division of a channel in time.
Pack: A set of audio channels that belong together.
Object: A pack with time-limited properties.
Type: The type of audio channel, whether direct speakers, Ambisonic component, audio object, etc.
Content: Objects with the actual audio.
Programme: A set of content that is derived from the same material.
Audio Definition Model Diagram
[Diagram. Content part: audioProgramme, audioContent, audioObject, audioTrackUID. Format part: audioPackFormat, audioChannelFormat, audioBlockFormat, audioStreamFormat, audioTrackFormat. The 'chna' chunk lists, for each Track No, an audioTrackUID together with its audioTrackFormatIDRef and audioPackFormatIDRef.]
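The terms and element names above can be sketched as a small set of nested data classes. This is only an illustration of how the pieces relate, not the schema itself: the Python class and field names are assumptions, and the programme and content names in the example are made up. The IDs and channel names are those of the 3.0 example in this deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AudioBlockFormat:
    block_id: str
    rtime: Optional[str] = None      # start time; None where the slide says "start N/A"
    duration: Optional[str] = None


@dataclass
class AudioChannelFormat:
    channel_id: str
    name: str
    type_definition: str
    blocks: List[AudioBlockFormat] = field(default_factory=list)


@dataclass
class AudioPackFormat:
    pack_id: str
    name: str
    channels: List[AudioChannelFormat] = field(default_factory=list)


@dataclass
class AudioObject:
    name: str
    pack: AudioPackFormat
    track_uids: List[str] = field(default_factory=list)
    start: Optional[str] = None      # an object is a pack with time-limited properties
    duration: Optional[str] = None


@dataclass
class AudioContent:
    name: str
    objects: List[AudioObject] = field(default_factory=list)


@dataclass
class AudioProgramme:
    name: str
    contents: List[AudioContent] = field(default_factory=list)


def direct_speaker(channel_id: str, name: str) -> AudioChannelFormat:
    """Build a loudspeaker channel with a single block that has no timing."""
    return AudioChannelFormat(channel_id, name, "DirectSpeakers",
                              [AudioBlockFormat("00000001")])


pack = AudioPackFormat("00010005", "3.0", [
    direct_speaker("00010001", "FrontLeft"),
    direct_speaker("00010002", "FrontRight"),
    direct_speaker("00010003", "Centre"),
])
obj = AudioObject("3.0", pack, ["00000001", "00000002", "00000003"])

# "ExampleContent" and "ExampleProgramme" are hypothetical names for illustration.
programme = AudioProgramme("ExampleProgramme", [AudioContent("ExampleContent", [obj])])

for channel in programme.contents[0].objects[0].pack.channels:
    print(channel.channel_id, channel.name)
```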
Simple Channel Based Example
[Diagram, reconstructed as text]
Track PCM_FrontLeft (00010001_01) -> Stream PCM_FrontLeft (00010001) -> Channel FrontLeft (00010001) -> Block 00000001 (start N/A)
Track PCM_FrontRight (00010002_01) -> Stream PCM_FrontRight (00010002) -> Channel FrontRight (00010002) -> Block 00000001 (start N/A)
Track PCM_Centre (00010003_01) -> Stream PCM_Centre (00010003) -> Channel Centre (00010003) -> Block 00000001 (start N/A)
Pack 3.0 (00010005): FrontLeft, FrontRight, Centre
Object 3.0 (00011005): track UIDs 00000001, 00000002, 00000003

'chna' chunk:
Track No    UID         TrackID        PackID
1           00000001    00010001_01    00010005
2           00000002    00010002_01    00010005
3           00000003    00010003_01    00010005
Coded Audio Example
[Diagram, reconstructed as text]
Track data1 (00040001_01) and Track data2 (00040001_02) -> Stream DolbyE_3.0 (00040001)
Stream DolbyE_3.0 carries:
    Channel FrontLeft (00010001) -> Block 00000001 (start N/A)
    Channel FrontRight (00010002) -> Block 00000001 (start N/A)
    Channel Centre (00010003) -> Block 00000001 (start N/A)
Pack 3.0 (00010005)
Object 3.0 (00011006): track UIDs 00000001, 00000002

'chna' chunk:
Track No    UID         TrackID        PackID
1           00000001    00040001_01    00010005
2           00000002    00040001_02    00010005
Object Based Example
[Diagram, reconstructed as text]
Track PCM_Object1 (00031001_01) -> Stream PCM_Object1 (00031001) -> Channel Object1 (00031001)
Blocks of Channel Object1:
    Block 00000001    start 00:00    dur: 00:05
    Block 00000002    start 00:05    dur: 00:08
    Block 00000003    start 00:13    dur: 00:07
Pack Objects (00031001)
Object Objects (00031001): start 00:30, dur: 00:20, track UID 00000001

'chna' chunk:
Track No    UID         TrackID        PackID
1           00000001    00031001_01    00031001
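The three blocks in this example follow one another without gaps: each block starts where the previous one ends. A minimal sketch of that arithmetic is below. It assumes the times on the slide are minutes:seconds, which the slide does not state, and the helper names are hypothetical. Whether the model requires blocks to be contiguous is not stated here; the check only confirms that these particular values line up.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string (assumed format) to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (block ID, start, duration) as shown for channel Object1 on the slide
blocks = [
    ("00000001", "00:00", "00:05"),
    ("00000002", "00:05", "00:08"),
    ("00000003", "00:13", "00:07"),
]


def end_of_contiguous_blocks(block_list):
    """Return the end time in seconds, raising if a block does not start
    exactly where the previous block ended."""
    expected_start = to_seconds(block_list[0][1])
    for block_id, start, duration in block_list:
        if to_seconds(start) != expected_start:
            raise ValueError(f"block {block_id} does not start at {expected_start}s")
        expected_start += to_seconds(duration)
    return expected_start


end_seconds = end_of_contiguous_blocks(blocks)
print(end_seconds)
```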
XML Representation
Use the new version of the EBUCore schema.
<audioChannelFormat audioChannelFormatID="AC_00031001" audioChannelFormatName="Object1"
                    typeDefinition="Objects">
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000001" rtime="00:00" duration="00:05">
    <position type="azimuth">-20.0</position>
    <position type="elevation">5.0</position>
    <position type="distance">1.0</position>
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000002" rtime="00:05" duration="00:08">
    …
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000003" rtime="00:13" duration="00:07">
    …
  </audioBlockFormat>
</audioChannelFormat>
<audioStreamFormat audioStreamFormatID="AS_00031001" audioStreamFormatName="Object1"
                   typeDefinition="PCM">
  <audioChannelFormatIDRef>AC_00031001</audioChannelFormatIDRef>
  <audioTrackFormatIDRef>AT_00031001_01</audioTrackFormatIDRef>
</audioStreamFormat>
<audioTrackFormat audioTrackFormatID="AT_00031001_01" audioTrackFormatName="Object1"
                  typeDefinition="PCM"/>
Standard Configuration File
● Many configurations will use common channel types (e.g. stereo, 5.1, 22.2, Ambisonics). Therefore use an external standard reference XML file.
<audioChannelFormat audioChannelFormatID="AC_00010001" audioChannelFormatName="FrontLeft"
                    typeDefinition="DirectSpeakers">
  <audioBlockFormat audioBlockFormatID="AB_00010001_00000001">
    <speakerLabel>M-30</speakerLabel>
    <position type="azimuth">-25.0</position>
    <position type="elevation">5.0</position>
    <position type="distance">1.0</position>
  </audioBlockFormat>
</audioChannelFormat>
Custom Configuration
● For non-standard channel definitions, particularly audio objects, a custom configuration file must be generated.
● This is what is carried in the 'axml' chunk.
<audioChannelFormat audioChannelFormatID="AC_00031001" audioChannelFormatName="Object1" typeDefinition="Objects">
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000001" rtime="00:00" duration="00:05">
    <position type="azimuth">-20.0</position>
    <position type="elevation">5.0</position>
    <position type="distance">1.0</position>
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000002" rtime="00:05" duration="00:08">
    <position type="azimuth">-22.0</position>
    <position type="elevation">6.0</position>
    <position type="distance">1.1</position>
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000003" rtime="00:13" duration="00:07">
    <position type="azimuth">-24.0</position>
    <position type="elevation">7.0</position>
    <position type="distance">1.2</position>
  </audioBlockFormat>
</audioChannelFormat>
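As a rough illustration of reading such a custom definition, the sketch below parses the Object1 fragment with Python's standard xml.etree.ElementTree and collects the position of each block. It assumes the fragment has no XML namespace, as on the slide; a full EBUCore document would most likely need namespace-qualified element names. The function name read_blocks is hypothetical.

```python
import xml.etree.ElementTree as ET

# The Object1 channel definition from the slide, with straight quotes.
CUSTOM_XML = """
<audioChannelFormat audioChannelFormatID="AC_00031001" audioChannelFormatName="Object1" typeDefinition="Objects">
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000001" rtime="00:00" duration="00:05">
    <position type="azimuth">-20.0</position>
    <position type="elevation">5.0</position>
    <position type="distance">1.0</position>
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000002" rtime="00:05" duration="00:08">
    <position type="azimuth">-22.0</position>
    <position type="elevation">6.0</position>
    <position type="distance">1.1</position>
  </audioBlockFormat>
  <audioBlockFormat audioBlockFormatID="AB_00031001_00000003" rtime="00:13" duration="00:07">
    <position type="azimuth">-24.0</position>
    <position type="elevation">7.0</position>
    <position type="distance">1.2</position>
  </audioBlockFormat>
</audioChannelFormat>
"""


def read_blocks(channel):
    """Return one dict per audioBlockFormat with its timing and position."""
    result = []
    for block in channel.findall("audioBlockFormat"):
        position = {p.get("type"): float(p.text) for p in block.findall("position")}
        result.append({
            "id": block.get("audioBlockFormatID"),
            "rtime": block.get("rtime"),
            "duration": block.get("duration"),
            "position": position,
        })
    return result


channel = ET.fromstring(CUSTOM_XML.strip())
blocks = read_blocks(channel)

for b in blocks:
    print(b["id"], b["rtime"], b["duration"], b["position"])
```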
What are BWAV and RF64 Files?
● WAV is a RIFF file for audio
● BWAV = Broadcast WAV
● BWF = Broadcast WAV File
● RF64 = WAV file for >4GB size files
● BWAV files have a 'bext' chunk
● MBWF is an RF64 file with a 'bext' chunk
Chunks
● Resource Interchange File Format (RIFF)
● Data stored in chunks: header, length & data.
● WAV chunks:
    ● 'RIFF' : tells you it's a WAVE file
    ● 'fmt ' : contains sample-rate, number of channels, etc.
    ● 'data' : contains audio samples.
● BWAV chunks:
    ● 'bext', 'axml', 'link', 'levl', 'mext', 'qlty', 'dbmd'
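The chunk layout above can be sketched as a short reader that walks a plain RIFF/WAVE byte stream and yields each chunk's four-character ID and data. It is a minimal sketch: it does not handle RF64 (which uses a different top-level ID and 64-bit sizes), and the helper names and the tiny in-memory test file are made up for illustration.

```python
import io
import struct


def iter_chunks(stream):
    """Yield (chunk_id, data) for each chunk in a plain RIFF/WAVE stream."""
    riff_id, _riff_size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff_id != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file
        chunk_id, size = struct.unpack("<4sI", header)
        data = stream.read(size)
        if size % 2:
            stream.read(1)  # chunks are padded to an even number of bytes
        yield chunk_id.decode("ascii"), data


def make_chunk(chunk_id, payload):
    """Build one chunk: 4-byte ID, 32-bit little-endian length, data, pad byte."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Illustrative 3-channel, 48 kHz, 16-bit PCM file holding three silent frames.
fmt_payload = struct.pack("<HHIIHH", 1, 3, 48000, 48000 * 3 * 2, 3 * 2, 16)
body = b"WAVE" + make_chunk(b"fmt ", fmt_payload) + make_chunk(b"data", b"\x00" * 18)
wav_bytes = b"RIFF" + struct.pack("<I", len(body)) + body

chunks = list(iter_chunks(io.BytesIO(wav_bytes)))
chunk_ids = [chunk_id for chunk_id, _ in chunks]

# Decode the 'fmt ' chunk fields: format tag, channels, sample rate, ...
fmt_fields = struct.unpack("<HHIIHH", dict(chunks)["fmt "])
num_channels, sample_rate = fmt_fields[1], fmt_fields[2]
print(chunk_ids, num_channels, sample_rate)
```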
Where does the XML go?
[Diagram: a file containing fmt, bext, chna, data and axml chunks. The chna chunk refers to the Standard XML Definitions and to the Custom XML Definitions. The Custom XML Definitions are stored in the axml chunk.]
● If no custom XML definitions are used, then no axml chunk is required.
● Standard XML definitions do not need to be included in the file.
'chna' chunk
Simple 3.0 Channel Example
           TrackNo    audioTrackUID    audioTrackFormatID    audioPackFormatID
Track 1    1          00000001         00010001_01           00010005
Track 2    2          00000002         00010002_01           00010005
Track 3    3          00000003         00010003_01           00010005
First 4 digits of the audioTrackFormatID specify the type of stream. 0001 = PCM
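The note about the first four digits can be illustrated by splitting an audioTrackFormatID into its parts. This is a sketch only: the field names in the returned dictionary are assumptions, and the lookup table contains just the one mapping given on the slide (0001 = PCM); any other code is reported as unknown rather than guessed.

```python
# Only the mapping stated on the slide is included here.
STREAM_TYPES = {"0001": "PCM"}


def split_track_format_id(track_format_id: str) -> dict:
    """Split an ID such as '00010002_01' into type code, remaining digits
    and the part after the underscore."""
    stream_part, _, track_index = track_format_id.partition("_")
    if len(stream_part) != 8 or not stream_part.isdigit() or not track_index:
        raise ValueError(f"unexpected ID layout: {track_format_id!r}")
    type_code = stream_part[:4]
    return {
        "type_code": type_code,
        "type_name": STREAM_TYPES.get(type_code, "unknown"),
        "number": stream_part[4:],      # hypothetical field name
        "track_index": track_index,     # hypothetical field name
    }


# Rows of the 'chna' table: (TrackNo, audioTrackUID, audioTrackFormatID, audioPackFormatID)
chna_rows = [
    (1, "00000001", "00010001_01", "00010005"),
    (2, "00000002", "00010002_01", "00010005"),
    (3, "00000003", "00010003_01", "00010005"),
]

decoded = [split_track_format_id(row[2]) for row in chna_rows]
for row, info in zip(chna_rows, decoded):
    print(row[0], row[1], info["type_name"], info["number"], info["track_index"])
```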
Current Status
● EBU Tech 3364 “Audio Definition Model” now published.
● EBU Core v1.5 (EBU Tech 3293) schema containing ADM soon to be released.
● ITU contributions being made.
Future Work
● A list of standard configurations will be drawn together.
    ● Database
    ● Reference XML file
● Audio Object parameters need continual refinement.
● Libraries/APIs for parsing and generating ADM metadata to be developed.
● Look at streaming methods.