32 Bit Floating Point Audio Processing – The Sky is the Limit
By Aaron Mick (mick@usc.edu, aaronmick@verizon.net), 410.330.7305.
Aaron Mick is a student of the School of Architecture at the University of Southern California. His
hobbies include songwriting, music production, and sound engineering.
Abstract:
Digital audio technology has a history of making exponential leaps – literally! Audio file formats
have increased in potential depth/quality from 8 bits, to 12, to 16, to 24, and now to a
whopping 32 bits. Over time, this steady increase in the “resolution” of the audio waveform has
allowed music artists and audio engineers to create tracks that possess increasing richness and
sound quality. However, recent advancements past 24 bits into 32 bit "float" have opened brand-new avenues for audio engineering by enabling the restoration of distorted audio waveforms after the fact, and by enabling the recording and processing of super-high-volume sounds with the most detail and life-like quality yet.
Key Terms:
- Bit
- Sound quality
- Waveform
- Dynamic Range
- Decibel (dB)
- Audio engineering
32 Bit Floating Point Audio Processing – The Sky is the Limit
By Aaron Mick
Introduction:
When studio recordists use microphones to record analog sounds produced by physical
instruments or voices, the sound wave and all its harmonic characteristics are captured and
converted via electrical signal into a series of bits – the basic unit of information used by
computers. Another increasingly popular method is the digital audio workstation, which can generate a wide variety of sounds, already in digital form, and help arrange them into a song. In either case, the sound wave is stored on disk digitally as a string of bits. As this string increases in length, so does the length of the audio file. A higher "bit rate" means that more information about the waveform (the shape of the sound itself) can be recorded in the same length of string, so to speak. "Bit depth" describes the maximum range of differing volumes that can be recorded in a particular track at any single moment (Matsushita 790).
For example, a sound recorded at 8 bits uses only 8 binary digits to describe each moment in the track, which translates to 256 possible levels of audio energy that can be represented at any given time, resulting in a dynamic range of about 48 decibels. With each additional bit, the number of possible levels doubles, adding roughly 6 decibels of dynamic range (Matsushita 787). The same sound recorded at 24 bits can represent 16,777,216 levels of sound at any one moment in the track (a dynamic range of about 144 decibels). Imagine the bit depth as the range of potential amplitudes, i.e. the range of loudness of a sound that can be stored in the file at any point in time.
(Image source: http://tweakheadz.com/16-bit-vs-24-bit-audio)
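A minimal sketch of this arithmetic (in Python, purely illustrative): the number of levels is 2 raised to the bit depth, and each bit contributes roughly 6 decibels of dynamic range.

    # Levels = 2**bits; dynamic range ~= 20 * log10(levels) ~= 6.02 dB per bit.
    import math

    for bits in (8, 16, 24):
        levels = 2 ** bits
        dynamic_range_db = 20 * math.log10(levels)
        print(f"{bits} bits -> {levels:,} levels, ~{dynamic_range_db:.0f} dB dynamic range")

Running this reproduces the figures above: 256 levels and about 48 dB at 8 bits, and 16,777,216 levels and about 144 dB at 24 bits.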
What’s the Difference?
Higher bit depth improves audio quality by allowing a higher overall dynamic range. A recording
at 16 bits (CD quality) or above has the potential to capture both very soft and very loud sounds
in the same waveform. However, recording the same sound at a depth of 8 bits (at the same sample rate) will truncate the waveform: high-volume sounds hit the "ceiling" (the maximum loudness of the file) too soon, resulting in distortion and harshness on playback, and very low-volume sounds are likely to be underrepresented or left out of the file entirely because their volume falls below the quietest level the format can represent. When viewing the waveform graphically in a digital audio workstation, one will notice the pattern of the sound wave literally moving off the "edges" of the displayable area on the screen (much like when a line graph goes "off the charts").
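A minimal sketch (in Python with NumPy, purely illustrative) of what happens when a loud signal is forced into a fixed bit depth: any sample above the format's ceiling is simply clamped to the maximum value, flattening the tops of the wave.

    import numpy as np

    def quantize(signal, bits):
        # Number of positive steps available at this bit depth (e.g. 127 for 8-bit signed).
        ceiling = 2 ** (bits - 1) - 1
        # Map the -1.0..1.0 range onto integer steps, clamping anything over the ceiling.
        steps = np.clip(np.round(signal * ceiling), -ceiling - 1, ceiling)
        return steps / ceiling

    t = np.linspace(0, 1, 48_000, endpoint=False)
    loud = 1.5 * np.sin(2 * np.pi * 440 * t)   # peaks 50% above the format's full scale
    clipped = quantize(loud, 8)                # the tops of the wave plateau at +/- full scale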
“Sample rate” is the number of times the audio file is read per second, analogous to the
“refresh rate” on a television or computer monitor. While sample rate is another variable that
can be altered in audio files, it is a separate discussion entirely, and must be treated as a
constant control variable when experimenting with bit depth.
Why 32 bit float?
32 bit float is not simply "the next step up" in number of bits; it differs because of the way the bits are used. "Floating point" in computing refers to a number format whose decimal point can scale within a calculation (Zorpette 362). This means that the maximum potential sound level of the file is not a fixed ceiling as in standard 16 or even 24 bit processing; it can be "stretched" because some of the extra bits act as an exponent that adapts to the volume level of the sound being recorded (Aggarwal 1325). This allows audio hardware and software to perform more accurate calculations than are possible with only 24 bits or fewer, and to record audio information that would normally be lost or distorted at a lower bit depth.
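A minimal sketch (in Python, purely illustrative) of how the 32 bits in a float sample are divided: one sign bit, an 8 bit exponent that acts as the "scalable decimal point," and a 23 bit mantissa that carries the detail of the sample.

    import struct

    def float32_fields(x):
        # Re-interpret the 32 bits of a single-precision float as an unsigned integer.
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        sign = bits >> 31               # 1 bit: positive or negative
        exponent = (bits >> 23) & 0xFF  # 8 bits: where the "decimal point" sits
        mantissa = bits & 0x7FFFFF      # 23 bits: the fine detail of the sample value
        return sign, exponent, mantissa

    print(float32_fields(0.5))   # a quiet sample
    print(float32_fields(8.0))   # a sample far above "full scale", still representable

Because the exponent scales the value up or down, the same precision is available for very quiet and very loud samples alike.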
During post-production (work done after the artist has finished recording the actual sounds),
this extra “headroom” in 32 bit float allows for more flexibility in how sound samples are
handled (Kunqi 68). The most common example is the ability to remove unwanted harshness or
loudness from sounds caused by "clipping": the moment when the sound exceeds the maximum recordable loudness of the file. When this happens, the waveform forms a plateau at the maximum dB value along the region of the file where clipping took place. This
causes distortion, loss of information, and harsh noise when played back through speakers.
Clipping is most often caused by improper recording technique or microphone setup, but may
be unavoidable in cases where a wide dynamic range must be captured on the same track (such
as movie sets).
If the file was recorded and rendered at 24 bits or below, there is no way to repair this recording error after the fact (Chang 9). Lowering the file's overall volume or running it through
additional processing has little chance of producing better results. Some harshness will be
lowered in the clipped region, but the waveform itself will remain the same shape (complete
with loss of information), and no additional sound quality will be restored. However, if the file
was recorded and continually processed in 32 bit float, the waveform can be instantly restored
to its original real-life quality by simply lowering the volume on the clip in the affected region.
This is because the louder information that was “clipped” and never recorded at 24 bits actually
is not “clipped” in 32 bit float – the extra bits allow the information to still be recorded into the
32 bit file, even if the volume of the sound exceeds acceptable dynamic ranges of most
recording equipment and software (Chang 10). Although it may not be seen on-screen or heard
during recording, the louder information is still present in the 32 bit file, and can be accessed by
performing some minor adjustments in the final mix.
Clipping occurring along the selected region of the waveform of the original recording. The samples reach their maximum loudness and flatten out along the "ceiling" of the file, creating a plateau in the waveform where sound information/quality is essentially lost.
16 bit rendering of the same sample, with the volume adjusted. Note the loudness of the file has
been lowered so that it does not reach the maximum “ceiling” anymore, but the plateau still
exists in the clipped region of the waveform, and no information is restored.
32 bit rendering of the sample with volume adjustment. Note the waveform has been fully
restored so that it no longer exceeds the maximum loudness, and the waveform itself possesses
all the detailed peaks and valleys of the original sound (no more plateau).
(Image source: Aaron Mick)
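A minimal sketch (in Python with NumPy, purely illustrative) of the restoration described above: when samples are stored as fixed-point values, the over-range peaks are clamped and lost, but when the same samples are kept in 32 bit float, a simple volume reduction brings the original waveform back.

    import numpy as np

    t = np.linspace(0, 0.01, 480, endpoint=False)
    original = 2.0 * np.sin(2 * np.pi * 440 * t)     # a take that peaks 6 dB over full scale

    fixed_point = np.clip(original, -1.0, 1.0)       # fixed-point path: peaks are clamped and lost
    float_path = original.astype(np.float32)         # 32 bit float path: over-range peaks survive

    # Lowering the volume afterwards only restores the float version.
    print(np.allclose(0.5 * fixed_point, 0.5 * original))  # False: the plateau remains
    print(np.allclose(0.5 * float_path, 0.5 * original))   # True: the original waveform is back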
In the end, 32 bit float allows for a maximum dynamic range of up to 1680 decibels! Extremely
loud sounds and wide swings in volume, such as sound effects in movies or a singer with an extreme vocal range, can be recorded in detail without risk of losing information by exceeding or dropping below the waveform's recordable range. Processing sound samples in 32
bit float also reduces loss of quality over multiple processes (Kunqi 74). Usually a file will gain a
small amount of “noise” (a very minute reduction in sound quality) each time it is run through
an effect rack or some sort of process during post-production. Storing and processing files at 32
bit float is the most effective way to reduce noise and loss of quality during production, and
files can be processed more times without losing clarity.
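A minimal sketch (in Python with NumPy, purely illustrative) of that second point: requantizing to 16 bits after every processing pass adds a little rounding noise each time, while keeping the intermediate results in 32 bit float leaves the signal essentially untouched.

    import numpy as np

    rng = np.random.default_rng(0)
    signal = rng.uniform(-0.5, 0.5, 48_000).astype(np.float32)

    def to_16bit_and_back(x):
        # Requantize to 16-bit integers and back, as a fixed-point workflow would.
        return np.round(x * 32767).astype(np.int16) / np.float32(32767)

    int_path, float_path = signal.copy(), signal.copy()
    for _ in range(100):   # one hundred processing passes (gain down, then back up)
        int_path = to_16bit_and_back(int_path * 0.9) / 0.9
        float_path = (float_path * np.float32(0.9)) / np.float32(0.9)

    print(np.max(np.abs(int_path - signal)))    # accumulated quantization noise
    print(np.max(np.abs(float_path - signal)))  # essentially unchanged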
Drawbacks:
Advantages during production aside, songs recorded at 32 bits will not necessarily sound noticeably different to the human ear when played back. The advantage of 32 bit processing techniques is mainly the extra flexibility and preservation of sound quality during production, and the ability to fix recording errors. In the final mix that is released to the audience, differences in sound level and quality perceptible to the human ear become almost nonexistent in formats past 24 bits (Matsushita 788). Also, there are currently no digital-to-analog converters (the device in your MP3 player or stereo that converts the digital file back into electrical signals sent to the speakers) capable of converting
32-bit files at their full quality – so even if the final cut of the audio track you are listening to is
recorded in full 32 bit float, you would never be able to tell by listening to its analog rendering
through speakers.
As with any technology, higher-quality mastering involves a tradeoff: increased sound quality comes at the price of increased file size. On average, 32 bit files take up twice as much disk space as standard 16 bit CD-quality files (Chang 10). When multiplied over a number of tracks of varying lengths, working at 32 bits can consume massive
amounts of disk space, and require more computing resources when recording, rendering, and
editing files.
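A minimal sketch (in Python, purely illustrative) of the storage arithmetic: an uncompressed file's size is roughly the sample rate times the bit depth times the number of channels times the duration, so doubling the bit depth from 16 to 32 bits doubles the space each track consumes.

    def track_size_mb(sample_rate, bit_depth, channels, seconds):
        # Uncompressed size: samples per second * bits per sample * channels, in megabytes.
        return sample_rate * bit_depth * channels * seconds / 8 / 1_000_000

    print(track_size_mb(44_100, 16, 2, 240))  # ~42 MB for a 4-minute CD-quality track
    print(track_size_mb(44_100, 32, 2, 240))  # ~85 MB for the same track stored at 32 bits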
Conclusion:
For now, 32 bit float is the be-all and end-all of digital sound quality, and probably will be for quite some time, both because we cannot demonstrate its full effect without first creating more powerful analog devices (converters, amplifiers, speakers) that can properly represent this level of quality during playback, and because that level of quality would be beyond human perception anyway. Likewise, the full potential of working in 32 bit formats digitally has not been fully explored by audio engineers, technicians, or even artists (Kunqi 66). Currently, it is a practical tool for advanced recording and sound processing during production: the format enables greater flexibility in recording and allows a much higher level of detail to be captured. 32 bit float is also unique in its ability to correct mistakes made during recording without having to re-record the sound itself. In the future, increasing bit depth may lead to new forms of stereo systems, recording techniques, and perhaps even new discoveries in how music and sound are made and perceived by humans.
Works Cited:
Aggarwal, Ashish. Efficient bit-rate scalability for weighted squared error optimization in audio coding. IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1313-1327.
Chang, Won Ryu. Design of 24 bit DSP for audio algorithms. 2008 International SoC Design Conference (ISOCC 2008), vol. 3, pp. 9-10, 2008.
Kunqi, Gao. Key Issues in Digital Audio Processing. Audio Engineering, vol. 35, no. 7, pp. 65-70, 2011.
Matsushita, Y.; Jibiki, T.; Takahashi, H.; Takamizawa, T. A 32/24 bit digital audio signal processor. IEEE Transactions on Consumer Electronics, vol. 35, no. 4, pp. 785-792, Nov. 1989.
Zorpette, G. 32 bit processing. Elettrotecnica, vol. 74, no. 4, pp. 361-365, April 1987.