32 Bit Floating Point Audio Processing – The Sky is the Limit

By Aaron Mick (mick@usc.edu, aaronmick@verizon.net), 410.330.7305

Aaron Mick is a student of the School of Architecture at the University of Southern California. His hobbies include songwriting, music production, and sound engineering.

Abstract:

Digital audio technology has a history of making exponential leaps, literally! Audio file formats have increased in potential depth and quality from 8 bits, to 12, to 16, to 24, and now to a whopping 32 bits. Over time, this steady increase in the “resolution” of the audio waveform has allowed music artists and audio engineers to create tracks of increasing richness and sound quality. However, recent advancements past 24 bits into 32 bit “float” have opened brand-new avenues for audio engineering by enabling the restoration of distorted audio waveforms after the fact, and by enabling the recording and processing of very high-volume sounds with the most detail and lifelike quality yet.

Key Terms:
- Bit
- Sound quality
- Waveform
- Dynamic Range
- Decibel (dB)
- Audio engineering

Introduction:

When studio recordists use microphones to record analog sounds produced by physical instruments or voices, the sound wave and all its harmonic characteristics are captured as an electrical signal and converted into a series of bits, the basic unit of information used by computers. Another increasingly popular tool used by artists is the digital audio workstation, which can produce a wide variety of sounds, already digitized, and help arrange them into a song. In either case, the sound wave is stored digitally on disk as a string of bits that encodes the changing level of the sound. As this string increases in length, so does the length of the audio file. A higher “bit rate” means that more information about the waveform (the image of the sound itself) is recorded for each second of audio, so to speak.
“Bit depth” describes the number of distinct volume levels that can be recorded in a particular track at any single moment (Matsushita 790). For example, a sound recorded at 8 bits is described by only 8 binary digits at any single moment in the track, which translates to 256 possible levels of audio energy (2 to the 8th power) and a dynamic range of about 48 decibels. With each additional bit, the number of possible levels doubles (Matsushita 787). The same sound recorded at 24 bits can take any of 16,777,216 levels at any one moment in the track (a dynamic range of about 144 decibels). Think of bit depth as the scale of possible amplitudes, i.e. the range of loudness that can be stored in the file at any point in time.

(Image source: http://tweakheadz.com/16-bit-vs-24-bit-audio)

What’s the Difference?

Higher bit depth improves audio quality by allowing a higher overall dynamic range. A recording at 16 bits (CD quality) or above has the potential to capture both very soft and very loud sounds in the same waveform. However, recording the same sound at a depth of 8 bits (at the same sample rate) will truncate the waveform somewhat. High-volume sounds will hit the “ceiling” (the maximum loudness of the file) too soon, resulting in distortion and harshness when played back, and some very low-volume sounds are likely to be underrepresented or missing from the file altogether because their volume falls below the smallest level the file can represent. When viewing the waveform graphically in a digital audio workstation, one will notice the sound wave literally moving off the “edges” of the displayable area on the screen (much like a line graph that goes “off the charts”).

“Sample rate” is the number of times per second that the sound is measured (sampled), analogous to the “refresh rate” on a television or computer monitor.
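The level counts and decibel figures quoted above for 8 and 24 bits can be checked with a few lines of arithmetic. The sketch below is illustrative Python, not taken from the cited sources. The function names are made up for this example, and it assumes the usual rule that dynamic range in decibels is 20 times the base-10 logarithm of the number of levels (roughly 6 dB per bit).

```python
import math

def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude levels available at a given bit depth."""
    return 2 ** bits

def dynamic_range_db(bits: int) -> float:
    """Ratio between the largest and smallest level, expressed in decibels."""
    return 20 * math.log10(quantization_levels(bits))

for bits in (8, 16, 24):
    levels = quantization_levels(bits)
    print(f"{bits:>2} bits: {levels:>10,} levels, about {dynamic_range_db(bits):.0f} dB")
```

Each extra bit doubles the level count, which is why the decibel figure climbs by the same step of about 6 dB every time.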
While sample rate is another variable that can be altered in audio files, it is a separate discussion entirely, and must be held constant when comparing bit depths.

Why 32 bit float?

32 bit float is more than simply “the next step up” in number of bits, because of the way the bits are used. “Floating point” in computing refers to a number format whose decimal point can move, so that the same digits can be scaled up or down (Zorpette 362). This means that the maximum potential sound level of the file is not a fixed point as in standard 16 or even 24 bit processing; it can be “stretched” because some of the bits act as a scale factor that adapts to the volume level of the sound being recorded (Aggarwal 1325). In the common IEEE 754 single-precision format, 8 of the 32 bits hold this scale factor (the exponent). This allows audio processing hardware and software to perform calculations that would overflow or lose precision in fixed-point formats of 24 bits or less, and to record audio information that would normally be lost or distorted at a lower bit depth.

During post-production (work done after the artist has finished recording the actual sounds), this extra “headroom” in 32 bit float allows for more flexibility in how sound samples are handled (Kunqi 68). The most common example is the ability to remove unwanted harshness or loudness caused by “clipping,” which occurs when a sound exceeds the maximum recordable loudness of the file. When this happens, the waveform forms a plateau at the maximum dB value along the region of the file where clipping took place. This causes distortion, loss of information, and harsh noise when played back through speakers. Clipping is most often caused by improper recording technique or microphone setup, but may be unavoidable in cases where a wide dynamic range must be captured on the same track (such as on movie sets). If the file was recorded and rendered at 24 bits or below, there is no way to repair this recording error after the fact (Chang 9).
Lowering the file’s overall volume or running it through additional processing has little chance of producing better results. The clipped region will sound less harsh at the lower volume, but the waveform itself will keep the same shape (complete with loss of information), and no sound quality will be restored.

However, if the file was recorded and continually processed in 32 bit float, the waveform can be restored to its original real-life quality simply by lowering the volume of the clip in the affected region. This is because the louder information that was “clipped” and never recorded at 24 bits is not actually “clipped” in 32 bit float. The extra bits allow the information to be recorded into the 32 bit file even if the volume of the sound exceeds the acceptable dynamic range of most recording equipment and software (Chang 10). Although it may not be seen on-screen or heard during recording, the louder information is still present in the 32 bit file, and can be recovered with some minor adjustments in the final mix.

Clipping occurring along the selected region of the waveform of the original recording. The bits reach their maximum loudness and flatten out along the “ceiling” of the file, creating a plateau in the waveform where sound information and quality are essentially lost.

16 bit rendering of the same sample, with the volume adjusted. Note that the loudness of the file has been lowered so that it no longer reaches the maximum “ceiling,” but the plateau still exists in the clipped region of the waveform, and no information is restored.

32 bit rendering of the sample with volume adjustment. Note that the waveform has been fully restored so that it no longer exceeds the maximum loudness, and the waveform itself possesses all the detailed peaks and valleys of the original sound (no more plateau).

(Image source: Aaron Mick)

In the end, 32 bit float allows for a theoretical dynamic range of well over 1,500 decibels!
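The difference between the 16 bit and 32 bit renderings described above can be reproduced on a synthetic signal. The following is a minimal sketch and not the workflow of any particular recorder or workstation. The helper names are hypothetical, the test tone is a sine wave that peaks at twice full scale, and the sketch assumes the over-level signal reaches the file intact (that is, the analog front end itself has not clipped).

```python
import math
import struct

def sine(amplitude, n=64):
    """One cycle of a sine wave; an amplitude above 1.0 exceeds full scale."""
    return [amplitude * math.sin(2 * math.pi * i / n) for i in range(n)]

def store_fixed(samples, bits=16):
    """Store as signed integers: anything past full scale is clamped (clipped)."""
    top = 2 ** (bits - 1) - 1
    stored = []
    for s in samples:
        code = max(-top - 1, min(top, round(s * top)))
        stored.append(code / top)
    return stored

def store_float32(samples):
    """Round-trip each sample through a 32 bit float, as a float file would."""
    return [struct.unpack("<f", struct.pack("<f", s))[0] for s in samples]

def apply_gain(samples, gain):
    return [s * gain for s in samples]

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

original = sine(2.0)                   # peaks at twice full scale (about +6 dB)
reference = apply_gain(original, 0.5)  # what we hope to get back after turning it down

from_fixed = apply_gain(store_fixed(original), 0.5)
from_float = apply_gain(store_float32(original), 0.5)

print("16 bit error after lowering volume:", max_error(reference, from_fixed))
print("32 bit float error after lowering volume:", max_error(reference, from_float))
```

In the fixed-point case the plateau survives the volume change, so the error stays large. In the float case the peaks were never flattened, so lowering the gain brings back the original shape to within rounding error.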
Extremely loud sounds and deep changes in volume, such as sound effects in movies or a singer with an extreme vocal range, can be recorded in detail without risk of losing information by exceeding or dropping out of the waveform’s recordable range.

Processing sound samples in 32 bit float also reduces loss of quality over multiple processes (Kunqi 74). Usually a file gains a small amount of “noise” (a very minute reduction in sound quality) each time it is run through an effect rack or some other process during post-production. Storing and processing files in 32 bit float is the most effective way to reduce this noise and loss of quality during production, and files can be processed more times without losing clarity.

Drawbacks:

Advantages during production aside, songs recorded at 32 bits will not necessarily present a noticeable difference to the human ear when played back. The advantage of 32 bit processing is mainly the extra flexibility and preservation of sound quality during production, and the ability to fix recording errors. In the final mix that is released to the audience, differences in sound level and quality perceived by the human ear become almost nonexistent in formats past 24 bits (Matsushita 788). Also, there are currently no digital-to-analog converters (the device in your MP3 player or stereo that converts the digital file back into electrical impulses for the speakers) capable of converting 32 bit files at their full quality. Even if the final cut of the audio track you are listening to is recorded in full 32 bit float, you would never be able to tell by listening to its analog rendering through speakers.

As with any technology, higher-quality mastering requires a tradeoff. Increased sound quality comes at the price of increased file size. On average, 32 bit files take up twice as much disk space as standard 16 bit CD-quality files (Chang 10).
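The doubling follows directly from the size of each sample. As a rough illustration, the sketch below estimates the raw size of uncompressed audio at three bit depths. The function name, the three-minute track length, and the 44.1 kHz stereo settings are assumptions chosen for the example, and file headers are ignored.

```python
def pcm_size_bytes(seconds, bits_per_sample, sample_rate=44_100, channels=2):
    """Size of the raw, uncompressed audio data (file headers not included)."""
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate * channels * bytes_per_sample

TRACK_SECONDS = 180  # hypothetical three-minute stereo track

sizes = {bits: pcm_size_bytes(TRACK_SECONDS, bits) for bits in (16, 24, 32)}
for bits, size in sizes.items():
    print(f"{bits} bit: {size / 1_000_000:.1f} MB")
```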
When multiplied over a number of tracks of varying lengths, working at 32 bits can consume massive amounts of disk space and require more computing resources when recording, rendering, and editing files.

Conclusion:

For now, 32 bit float is the be-all and end-all of digital sound quality, and probably will be for quite some time. Its full effect cannot yet be demonstrated without more powerful analog devices (converters, amplifiers, speakers) that can properly represent this level of quality during playback, and the result would likely be beyond human perception anyway. Likewise, the full potential of working in 32 bit formats has not been fully explored by audio engineers, technicians, or even artists (Kunqi 66). Currently, it is a practical tool for advanced recording and processing of sound during production. The format enables greater flexibility in recording, and allows a much higher level of detail to be captured. 32 bit float is also unique in its ability to correct mistakes made during recording without having to re-record the sound itself. In the future, increasing bit depth may lead to new forms of stereo systems, recording techniques, and perhaps even new discoveries in how music and sound are made and perceived by humans.

Works Cited:

Aggarwal, Ashish. Efficient bit-rate scalability for weighted squared error optimization in audio coding. IEEE Transactions on Audio, Speech and Language Processing, v 14, n 4, p 1313-1327.

Chang, Won Ryu. Design of 24 bit DSP for audio algorithms. 2008 International SoC Design Conference, ISOCC 2008, v 3, p 9-10, 2008.

Kunqi, Gao. Key Issues in Digital Audio Processing. Audio Engineering, v 35, n 7, p 65-70, 2011.

Matsushita, Y; Jibiki, T; Takahashi, H; Takamizawa, T. A 32/24 bit digital audio signal processor. IEEE Transactions on Consumer Electronics, ISSN 0098-3063, v 35, n 4, p 785-792, November 1989.

Zorpette, G. 32 bit processing. Elettrotecnica, v 74, n 4, p 361-365, April 1987.