CSC 391/691 Digital Audio and Video
Spring 2009
Burg

Digital Audio and MIDI Compared

There are two basic ways in which sound is stored in a computer: as digital audio and as MIDI.

Digital Audio

Sound is produced by vibrations that cause air pressure to change over time. Mathematically, it can be represented as a function over time and graphed as a waveform.

[Figure: The word "boo" recorded in Audacity]

The amplitude of the wave corresponds to the loudness of the sound. It is customarily measured in decibels. The frequency of the wave corresponds to the pitch of the sound. Frequency is measured in Hertz. One Hertz is one cycle/second. One kilohertz, abbreviated kHz, is 1,000 cycles/second. One megahertz, abbreviated MHz, is 1,000,000 cycles/second.

[Figure: Sound wave in blue, two cycles shown, first cycle ending at the red vertical line]

Sound in digital audio format is stored as a sequence of numbers representing the amplitude of air pressure as it varies over time. Sound is converted to digital audio format by sampling and quantization. When you attach a microphone to your sound card and create a digital recording using a program like Audacity or Music Creator as the interface, the analog-to-digital converter (ADC) in your sound card does this sampling and quantization. The mic detects the changing air pressure amplitude, communicating this information to the ADC at evenly spaced points in time. This is the sampling process. The sound card quantizes these values and sends them to the computer to be stored. When the sound file is played, the reverse process happens. The sound card has a digital-to-analog converter (DAC) that converts the digital samples back to continuously varying air pressure, a form that can be converted to vibrations in the air – sound.

When you make a digital recording, the sampling rate must be at least twice the frequency of the highest-frequency component in the sound. Otherwise, the recording won't contain the true frequencies of the original, so it won't sound exactly like what you were trying to record. This distortion is called aliasing. Since the highest frequency that humans can hear is about 20,000 Hz, the CD-quality sampling rate is set at 44,100 samples/second. "Samples/second" is also abbreviated as Hertz, so CD-quality digital audio is sampled at 44.1 kHz.

When samples of the air pressure amplitude of a sound are taken, each is stored in the computer as a binary number. Binary numbers – base 2, that is – consist of bits, each of which can have a value of 0 or 1. Each number in a computer is contained in a certain number of bits. This is called the bit depth. The bit depth per audio sample puts a limit on the number of values that can be represented. If two bits are used, then four values can be represented.

Base 2   Base 10
00       0
01       1
10       2
11       3

If three bits are used, eight values can be represented.

Base 2   Base 10
000      0
001      1
010      2
011      3
100      4
101      5
110      6
111      7

In general, if b bits are used, 2^b values can be represented. Each value is used to represent one air pressure amplitude level between a minimum and a maximum. Thus, the larger the bit depth, the more precisely you can represent air pressure amplitude: with more bits, you don't have to round values up or down so much to the nearest allowable value. The bit depth for CD-quality digital audio is 16 bits per channel, with two stereo channels for each sample. A byte is equal to eight bits, so that's four bytes per sample for a stereo recording and two bytes per sample for a mono recording.
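If you'd like to see this rounding in code, here is a small sketch in Python (the quantize helper is made up for illustration, not part of any audio library) that rounds a single amplitude value to the nearest allowable level at several bit depths. Notice how the rounding error shrinks as the bit depth grows.

    # A minimal sketch of quantization at different bit depths. Amplitudes
    # are assumed to be normalized to the range -1.0 .. 1.0; "quantize" is
    # a hypothetical helper for illustration only.

    def quantize(x, bits):
        levels = 2 ** bits                # b bits give 2^b representable values
        step = 2.0 / (levels - 1)         # spacing between neighboring levels
        return round((x + 1.0) / step) * step - 1.0   # round to nearest level

    for bits in (2, 3, 8, 16):
        original = 0.61
        rounded = quantize(original, bits)
        print(f"{bits:2d} bits: {rounded:+.6f} (error {abs(rounded - original):.6f})")

At 2 bits the sample must be rounded by more than a quarter of the full range; at 16 bits the error is on the order of one hundred-thousandth.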
When you record digital audio, you'll need to choose the sampling rate and bit depth. CD quality is probably fine for your purposes, so a sampling rate of 44.1 kHz and a bit depth of 16 bits per sample is good. You don't need to record in stereo; you can create stereo channels later in the editing if you want to. Recording in mono is fine.

A digital audio recording captures exactly the sound that is being transmitted at the moment the sound is made. With a high enough sampling rate and bit depth, the resulting recording can have great fidelity to the original. For example, when a singer is being recorded, all the nuances of the performance are captured – the breathing, characteristic resonance of the voice, stylistic performance of the song, subtle shifts in timing, and so forth. This is one of the advantages of digital audio over MIDI.

A disadvantage of digital audio is that it results in a large file. You can easily do the arithmetic. If you have 44,100 samples/second, four bytes per sample, and 60 seconds in each minute, how many bytes of data do you get for a one-minute stereo digital recording?

44,100 samples/second * 4 bytes/sample * 60 seconds/min * 1 min = 10,584,000 bytes/min

That's about 10 megabytes per minute. Because digital audio results in such big files, it is usually compressed before distribution. Often, you keep it in uncompressed form while you are working on it, and then compress it at the end to a format like .mp3. If you want to import an uncompressed audio file into your project, you can use the .wav format.

In summary, digital audio is stored as a sequence of numbers, each representing the air pressure amplitude of a sound wave at a certain moment in time. In CD-quality audio, each number is stored in two bytes, with two parallel sequences of numbers, one for each of the two stereo channels. There are 44,100 of these numbers per second for each channel. This creates a big file, which is why digital audio is compressed for distribution.

MIDI

MIDI stands for Musical Instrument Digital Interface. It is another way in which sound can be stored and communicated in a computer. The recording, encoding, and playing of MIDI is done by the interaction of three basic components:

- a MIDI input device – often an electronic keyboard or other MIDI-enabled instrument – which you attach to a computer;
- a MIDI sequencer – often a piece of software like Cakewalk Music Creator, Logic, or Pro Tools – that receives and records the messages sent by the MIDI input device; and
- a MIDI synthesizer or sampler – e.g., the sound card of your computer or a software synthesizer bundled with a MIDI sequencer – that knows how to interpret the MIDI messages and convert them into sound waves that can be played.

A MIDI file does not consist of audio samples. MIDI uses a different method of encoding sound and music. When you play a note – say, middle C – on a musical keyboard attached to a computer that is running a MIDI sequencer, the keyboard sends a message that says, essentially,

Note On, C, velocity v

When you lift your finger, a Note Off message is sent. Your MIDI keyboard doesn't even have to make a sound. It's just engineered to send a message to the receiving device, which in this case is your computer and the sequencer on it (e.g., Music Creator).
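To make the idea concrete, here is a rough sketch of what those messages look like as raw bytes. It assumes MIDI channel 1 and the standard convention that middle C is note number 60; the note_on and note_off helpers are made up for illustration, not part of any particular MIDI library. A channel message like Note On is one status byte followed by two data bytes.

    # A minimal sketch of the raw bytes behind "Note On, C, velocity v".
    # Hypothetical helpers; channel 0 is what MIDI gear labels "channel 1".

    def note_on(note, velocity, channel=0):
        # Status byte 0x90 | channel, then note number and velocity (0-127 each)
        return bytes([0x90 | channel, note, velocity])

    def note_off(note, channel=0):
        # Status byte 0x80 | channel; release velocity is often just 0
        return bytes([0x80 | channel, note, 0])

    MIDDLE_C = 60
    print(note_on(MIDDLE_C, 100).hex())   # 903c64
    print(note_off(MIDDLE_C).hex())       # 803c00

However the bytes are counted, the point is the contrast in scale: a handful of bytes per note versus tens of thousands of bytes per second of audio samples.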
This is different from hooking a microphone up to the computer, holding the mic close to a music keyboard, and recording yourself playing the note C as digital audio. In that case, your keyboard must make a sound in order for anything to be recorded. When you record digital audio, the microphone and the ADC in your sound card are sampling and quantizing the changing air pressure amplitude caused by your striking the note C. A sequence of samples is stored, lasting however long you make the recording. If it's one second of CD-quality stereo sound, that's 44,100 samples, stored as 176,400 bytes. In comparison, the MIDI message for playing the note C and then releasing it requires only four bytes.

There are also messages that say what type of instrument you want to hear when the file is played back. When the MIDI is played back, it doesn't have to sound like a piano or an electronic keyboard. You can say, in the file, that you want to hear a clarinet, flute, or any other of the 128 standard MIDI instruments. (There can be even more, but 128 instruments are always available in standard MIDI.) Each instrument is called a patch. Indicating in a MIDI file that you want a different instrument at some point is called a patch change. A whole set of 128 instruments is called a bank.

When digital audio is played, the changing air pressure amplitudes that were recorded are reproduced by the playing device so that we hear the sound. MIDI is different, because no air pressure amplitudes are recorded. In the case of MIDI, a synthesizer must know how to create a note C that sounds like the instrument you specified in the file. The synthesizer could be a hardware or software device. There are a number of places where MIDI sounds can be synthesized:

- the sound card of your computer;
- a software synthesizer provided by the operating system of your computer, like the Microsoft GS Wavetable SW Synth; or
- a software synthesizer, such as the Cakewalk TTS-1 provided with Music Creator or the many synthesizers provided by Reason.

The synthesizer converts the MIDI messages into a waveform representation of sound and sends it to the sound card. When the music is played, the DAC in the sound card converts the digitally encoded waveforms into continuously varying air pressure amplitudes – vibrations – that cause the sound to be heard.

A MIDI synthesizer can convert MIDI messages into encoded sound waves in one of two basic ways. The sound waves can be created by mathematical synthesis, using operations that combine sine waves in ways that have been determined to create the desired pitches and timbres of instruments. Alternatively, rather than creating the waves by mathematics, a synthesizer can simply look up sound clips that have been stored in its memory bank. Some people make a distinction between these two methods: they call something a synthesizer if it uses mathematical synthesis and a sampler if it reads from a memory bank of sound samples. Others use the term synthesizer in either case. In this course, we'll use the terms interchangeably. (For a more detailed discussion of the different types of MIDI synthesis, see http://en.wikipedia.org/wiki/Synthesizer.)

Because MIDI sounds are synthesized or read from stored samples, the quality of the sound that you end up with depends on the quality of your synthesizer. The sound card that comes with your computer may or may not give you good MIDI sound. Different software synthesizers offer different numbers of sounds with different qualities.
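As a rough illustration of mathematical synthesis, the sketch below turns a MIDI note number into one second of a sine wave sampled at 44.1 kHz, using the standard equal-tempered tuning convention (note 69 = A at 440 Hz). The helper names are made up; a real synthesizer combines many such waves and shapes them with envelopes and filters to imitate an instrument's timbre.

    # A minimal sketch of mathematical synthesis: one second of a 16-bit,
    # 44.1 kHz sine wave for a given MIDI note. Hypothetical helpers.
    import math

    SAMPLE_RATE = 44100

    def note_to_frequency(note):
        # Equal-tempered tuning: A above middle C (note 69) = 440 Hz
        return 440.0 * 2 ** ((note - 69) / 12)

    def synthesize(note, seconds=1.0, amplitude=0.8):
        freq = note_to_frequency(note)
        n_samples = int(SAMPLE_RATE * seconds)
        # Sample the sine wave at evenly spaced points in time, quantizing
        # each sample to a 16-bit signed integer (-32768..32767)
        return [round(amplitude * 32767 * math.sin(2 * math.pi * freq * t / SAMPLE_RATE))
                for t in range(n_samples)]

    samples = synthesize(60)    # middle C, about 261.63 Hz
    print(len(samples))         # 44100 samples for one second of mono audio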
This is why it's good to experiment with different synthesizers if you have them available, like the ones in Music Creator and Reason.

Another disadvantage of MIDI-created music is that it lacks the individuality of a digitally recorded performance. However, you can compensate for this somewhat by altering the MIDI sounds with filters, pitch bends, and so forth.

An advantage of MIDI is the ease with which you can make changes and corrections. One quick patch change – a click of the mouse – can change your sound from a piano to a violin. The key or tempo in which a piece is played can be changed just as easily. These changes are non-destructive in that you can set things back to their original state any time you want to. If you play music on a keyboard, record it as MIDI, and make a few mistakes, you can go in and change those notes individually, which is impossible in digital audio because the notes aren't stored as separate entities.

Mixing Digital Audio and MIDI

You can combine digital audio and MIDI in a music production if you have a multi-track editor that handles both – e.g., Music Creator, Logic, or Pro Tools. Each track is designated as either a MIDI track or an audio track, and you can assign different inputs and outputs to each track.

You can combine audio and MIDI tracks in a wide variety of ways. For example, you can record drums on one track, a violin on another, and piano on another, all in MIDI. Then you might want to record one voice on one audio track and another voice on a second audio track. All these recordings can be done at the same time or at different times (depending on the power of your computer's processor to handle simultaneous multi-track recording). You can record one voice on one track and then play that track while you record a second voice in harmony.

The advantage of having different instruments and voices on different tracks is that you can then edit them separately, without one interfering with another. The adjustments that you want to make to one voice or instrument may not be the same as the adjustments you want to make to another. You can also route tracks to different output channels, a way to create stereo separation.

When you're done with all your editing, you mix everything down to digital audio and compress it for distribution. However, you should keep a copy of the multi-track file in the file format of your sequencer (e.g., .cwp for Music Creator). That way you can edit it more later if you want to.
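As a rough sketch of what a mixdown does at the sample level, the following illustration combines two mono tracks – lists of 16-bit samples like those in the earlier sketches – by adding them sample by sample and clipping the result back into the 16-bit range. The mix helper is made up for illustration; a real mixdown also applies each track's volume and pan settings before summing.

    # A minimal sketch of a mixdown: sum two mono tracks of 16-bit samples,
    # clipping the result into the signed 16-bit range. Hypothetical helper.

    def mix(track_a, track_b):
        length = max(len(track_a), len(track_b))
        mixed = []
        for i in range(length):
            a = track_a[i] if i < len(track_a) else 0
            b = track_b[i] if i < len(track_b) else 0
            mixed.append(max(-32768, min(32767, a + b)))  # clip the sum
        return mixed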