MUSIC 318 MINI-COURSE ON SPEECH AND SINGING
4. RHYTHM, PROSODY, TONE, LANGUAGE
Science of Sound, Chapter 16
Springer Handbook of Acoustics, Chapter 16

RHYTHM
A STRIKING CHARACTERISTIC OF A FOREIGN LANGUAGE IS ITS RHYTHM.
ENGLISH, RUSSIAN, ARABIC AND THAI ARE STRESS-TIMED LANGUAGES. STRESSED SYLLABLES RECUR AT APPROXIMATELY EQUAL INTERVALS. SYLLABLES MOST OFTEN END WITH A CONSONANT.
FRENCH, SPANISH, GREEK, ITALIAN, YORUBA AND TELUGU ARE SYLLABLE-TIMED LANGUAGES. ALL SYLLABLES, STRESSED OR NOT, RECUR AT APPROXIMATELY EQUAL INTERVALS. SYLLABLES OFTEN END WITH A VOWEL.
RHYTHMIC PATTERNS CAN BE USED TO SIGNAL DIFFERENCES IN SYNTACTIC STRUCTURE. COMPARE:
1. The 2000-year-old skeletons
2. The two 1000-year-old skeletons

PROSODY
IN LINGUISTICS, PROSODY IS THE RHYTHM, STRESS, AND INTONATION OF SPEECH. PROSODY MAY REFLECT VARIOUS FEATURES OF THE SPEAKER OR THE UTTERANCE: THE EMOTIONAL STATE OF THE SPEAKER; WHETHER THE UTTERANCE IS A STATEMENT, A QUESTION, OR A COMMAND; WHETHER THE SPEAKER IS BEING IRONIC OR SARCASTIC; AND EMPHASIS, CONTRAST, AND FOCUS.
IN TERMS OF ACOUSTICS, THE PROSODICS OF ORAL LANGUAGES INVOLVE VARIATION IN SYLLABLE LENGTH, LOUDNESS, PITCH, AND THE FORMANT FREQUENCIES OF SPEECH SOUNDS.
PROSODY IS OF GREAT INTEREST IN AUTOMATIC SPEECH RECOGNITION.

DECLARATIVE, INTERROGATIVE, IMPERATIVE
DECLARATIVE: “You are going home.”
INTERROGATIVE: “You are going home?” (voice is raised at the end of the sentence)
IMPERATIVE: “You ARE going home!” (“are” is emphasized)

EMOTIONAL STATE OF THE SPEAKER
PROSODIC FEATURES TEND TO INDICATE THE EMOTIONAL STATE OF THE SPEAKER. “RAISING ONE’S VOICE” IN ANGER, FOR EXAMPLE, INCREASES BOTH LOUDNESS AND PITCH. A STATE OF EXCITEMENT FREQUENTLY CAUSES AN INCREASE IN THE RATE OF SPEAKING.
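The stress-timed versus syllable-timed distinction above is qualitative. One measure used in the phonetics literature (not part of these slides) to quantify it is the normalized Pairwise Variability Index (nPVI) of successive vowel durations: stress-timed languages tend to score higher, because short reduced vowels alternate with long stressed ones. The sketch below computes nPVI = 100/(m−1) × Σ |d_k − d_{k+1}| / ((d_k + d_{k+1})/2); the durations are invented for illustration, not measured data.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index of successive interval durations.

    nPVI = 100 / (m - 1) * sum(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    """
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    total = 0.0
    for a, b in zip(durations, durations[1:]):
        # Difference between neighbours, normalized by their mean duration
        total += abs(a - b) / ((a + b) / 2)
    return 100 * total / (len(durations) - 1)


# Hypothetical vowel durations in milliseconds (illustrative only)
even = [100, 100, 100, 100]         # perfectly regular, "syllable-timed"-like
alternating = [60, 180, 60, 180]    # short/long alternation, "stress-timed"-like
print(npvi(even))         # 0.0
print(npvi(alternating))  # 100.0
```

Normalizing each pair by its own mean keeps the index from simply tracking overall speaking rate.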
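The interrogative example relies on the voice rising at the end of the sentence. A minimal sketch of that cue, assuming an F0 (pitch) contour has already been measured frame by frame: fit a least-squares slope to the last few F0 values and call the utterance a question when the slope is clearly positive. The contours, the three-frame tail, and the 50 Hz/s threshold are all hypothetical; real systems combine many prosodic features, and this rule would not detect the imperative example, which is marked by emphasis rather than a final rise.

```python
def final_slope(times, f0, tail_points=3):
    """Least-squares slope (Hz per second) of the last few F0 values."""
    t, f = times[-tail_points:], f0[-tail_points:]
    n = len(t)
    t_mean = sum(t) / n
    f_mean = sum(f) / n
    num = sum((ti - t_mean) * (fi - f_mean) for ti, fi in zip(t, f))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def sounds_like_question(times, f0, threshold_hz_per_s=50.0):
    # Hypothetical rule and threshold: a clearly rising final contour suggests a question
    return final_slope(times, f0) > threshold_hz_per_s


times = [i * 0.1 for i in range(10)]                           # frame times in seconds
statement = [220 - 5 * i for i in range(10)]                   # gently falling F0 (Hz)
question = [200, 198, 196, 195, 194, 195, 205, 225, 255, 290]  # F0 rising at the end (Hz)
print(sounds_like_question(times, statement))  # False
print(sounds_like_question(times, question))   # True
```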
ATTEMPTS HAVE BEEN MADE TO ACCOMPLISH ACOUSTIC “LIE DETECTION” BY ANALYZING THE PROSODIC FEATURES OF RECORDED SPEECH FOR EVIDENCE OF STRESS.

EFFECT OF EMOTION ON PHONATION FREQUENCY
FIGURE: PHONATION FREQUENCY vs TIME FOR THREE ACTORS SPEAKING THE SAME SENTENCE (“For God’s sake!”) IN FOUR DIFFERENT MODES (Williams and Stevens 1972)
FIGURE: MEDIAN AND RANGE OF THE PHONATION FREQUENCY FOR THREE ACTORS SPEAKING THE SAME SENTENCE: S = SORROW; N = NEUTRAL; F = FEAR; A = ANGER
FIGURE: RADIO ANNOUNCER SPEAKING BEFORE (top) AND AFTER (bottom) THE CRASH OF THE HINDENBURG DIRIGIBLE (1937)

STRESS
FIGURE: SPECTROGRAMS OF THE WORD “SQUEAL” SPOKEN WITH FOUR DEGREES OF STRESS IN RESPONSE TO A LIST OF QUESTIONS (Brownlee 1996)

TONE
IN SOME LANGUAGES, SUCH AS CHINESE, A SYLLABLE WITH THE SAME CONSONANTS AND VOWELS CAN TAKE ON DIFFERENT MEANINGS DEPENDING ON ITS TONE. THE FOUR TONES OF MANDARIN CHINESE ARE SHOWN IN THE FIGURE.

VOICE QUALITY
VOICE QUALITY IS A BROAD TERM THAT REFERS TO THE EXTRALINGUISTIC ASPECTS OF A SPEAKER’S VOICE WITH REGARD TO IDENTITY, PERSONALITY, HEALTH, AND EMOTIONAL STATE. VOCAL FOLD MASS, VOCAL TRACT LENGTH, TRACHEAL LENGTH, JAW AND TONGUE SIZE, AND NASAL CAVITY VOLUME MAY CONVEY INFORMATION ABOUT AGE, SEX, PHYSIQUE, AND HEALTH.

“High fidelity on the line: please say ‘ahh’”
THIS IS THE TITLE OF AN INTERESTING ARTICLE BY STEN TERNSTRÖM IN THE FALL 2008 ISSUE OF ECHOES. SPECTRA OF SPEECH SOUNDS ARE ESPECIALLY RICH UP TO 4000 Hz AND FALL OFF RAPIDLY ABOVE 5000 Hz, BUT HIGH HARMONICS CAN BE MEASURED UP TO 20 kHz.
EARLY TELEPHONES TRANSMITTED ONLY 300-3500 Hz WITH LITTLE LOSS IN INTELLIGIBILITY (SEE FILTERED SPEECH IN LESSON 3). IN 2000, A WIDE-BAND STANDARD FOR TELEPHONY WAS DEFINED UP TO 7000 Hz, A BIG IMPROVEMENT OVER THE OLD “TELEPHONE SOUND.” HOPEFULLY CELL-PHONE AUDIO WILL SOON SOUND MUCH BETTER.
VOICES HEARD IN LIVE PERFORMANCE MAY SOUND A LITTLE “DULL” OR “FADED” BEYOND THE 15TH ROW, BECAUSE HIGH FREQUENCIES ARE SLIGHTLY DIMINISHED.
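Plots of phonation frequency vs time, like the Williams and Stevens figure, start from an estimate of the fundamental frequency (F0) in each short frame of voiced speech. A common textbook approach, sketched here under simplifying assumptions, picks the autocorrelation lag with the largest peak inside a plausible pitch range (the 60-500 Hz limits are an assumption). A synthetic 200 Hz tone with a second harmonic stands in for a voiced frame; real speech needs windowing, voicing decisions, and octave-error checks that this sketch omits.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=500.0):
    """Rough F0 estimate: the lag of the largest autocorrelation value in the allowed range."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag
        r = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


fs = 8000  # samples per second
tone = [math.sin(2 * math.pi * 200 * n / fs) + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
        for n in range(1600)]  # 0.2 s synthetic "voiced" frame
print(estimate_f0(tone, fs))  # about 200 Hz
```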
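The second Williams and Stevens figure summarizes each utterance by the median and range of its phonation frequency. A minimal sketch of that summary, assuming an F0 track with `None` for unvoiced frames and taking "range" as maximum minus minimum (the slides do not say how the original range was measured). Both tracks are invented values chosen only to mimic the slide's point that anger raises pitch.

```python
import statistics


def f0_summary(f0_track):
    """Median and range (Hz) of a phonation-frequency track, skipping unvoiced frames (None)."""
    voiced = [f for f in f0_track if f is not None]
    return {
        "median_hz": statistics.median(voiced),
        "min_hz": min(voiced),
        "max_hz": max(voiced),
        "range_hz": max(voiced) - min(voiced),
    }


# Hypothetical F0 tracks in Hz, one value per analysis frame (illustrative only)
neutral = [110, 115, None, 120, 118, 112, 105]
anger = [150, 190, None, 240, 260, 210, 170]
print(f0_summary(neutral))  # median 113.5 Hz, range 15 Hz
print(f0_summary(anger))    # median 200.0 Hz, range 110 Hz
```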
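Tone contours are often written with Chao tone numbers, where 1 is the speaker's lowest pitch level and 5 the highest; the four Mandarin tones are conventionally 55 (high level), 35 (rising), 214 (dipping), and 51 (falling). The sketch below turns those numbers into an F0 contour, assuming a hypothetical speaker range of 100-200 Hz with the five levels spaced geometrically (equal musical intervals); both the range and the spacing are illustrative assumptions, not measurements.

```python
# Chao tone numbers for the four Mandarin tones (1 = lowest, 5 = highest pitch level)
CHAO_TONES = {1: [5, 5], 2: [3, 5], 3: [2, 1, 4], 4: [5, 1]}


def tone_contour(tone, f0_low=100.0, f0_high=200.0, points=9):
    """Sample an F0 contour (Hz) for a Mandarin tone.

    Chao levels are interpolated linearly in time and mapped geometrically onto
    an assumed speaker range [f0_low, f0_high].
    """
    levels = CHAO_TONES[tone]
    contour = []
    for i in range(points):
        pos = i / (points - 1) * (len(levels) - 1)  # position along the level list
        k = min(int(pos), len(levels) - 2)          # index of the segment start
        frac = pos - k
        level = levels[k] + frac * (levels[k + 1] - levels[k])
        contour.append(f0_low * (f0_high / f0_low) ** ((level - 1) / 4))
    return contour


for tone in CHAO_TONES:
    print(tone, [round(f) for f in tone_contour(tone)])
```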
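The old telephone band can be imitated by filtering a signal to roughly 300-3500 Hz. This is a sketch, not a model of a real telephone channel: it assumes a second-order Butterworth high-pass at 300 Hz cascaded with a second-order Butterworth low-pass at 3500 Hz, using the biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook. `gain_at` measures the steady-state output amplitude for a unit sine wave at one frequency.

```python
import math


def biquad(samples, b, a):
    """Direct-form-I biquad filter; b = (b0, b1, b2), a = (a0, a1, a2)."""
    b0, b1, b2 = (c / a[0] for c in b)
    _, a1, a2 = (c / a[0] for c in a)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def butterworth_coeffs(kind, cutoff_hz, fs, q=1 / math.sqrt(2)):
    """Cookbook low-/high-pass coefficients; Q = 1/sqrt(2) gives a Butterworth response."""
    w0 = 2 * math.pi * cutoff_hz / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    a = (1 + alpha, -2 * cos_w0, 1 - alpha)
    if kind == "lowpass":
        b = ((1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2)
    elif kind == "highpass":
        b = ((1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2)
    else:
        raise ValueError(kind)
    return b, a


def telephone_band(samples, fs, low_hz=300.0, high_hz=3500.0):
    samples = biquad(samples, *butterworth_coeffs("highpass", low_hz, fs))
    return biquad(samples, *butterworth_coeffs("lowpass", high_hz, fs))


def gain_at(freq_hz, fs=16000):
    """Peak output for a 1 s unit sine, measured after the start-up transient has died out."""
    tone = [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(fs)]
    filtered = telephone_band(tone, fs)
    return max(abs(v) for v in filtered[fs // 2:])


for f in (100, 1000, 6000):
    print(f, round(gain_at(f), 2))  # 1000 Hz passes nearly unchanged; 100 and 6000 Hz are strongly cut
```

Calling `telephone_band(samples, 16000, high_hz=7000.0)` gives a rough wide-band variant.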
NORMAL, “YAWNY”, AND “TWANGY” VOICE
Story, Titze, and Hoffman (2001) did a three-dimensional study of the vocal tract using MRI to determine its shape when the vowels /i/, /æ/, /ɑ/, and /u/ were spoken with NORMAL, “YAWNY”, and “TWANGY” voice. Relative to NORMAL speech, the ORAL CAVITY was widened and the TRACT was lengthened for YAWNY vowels, and F1 and F2 moved closer together. TWANGY vowels were characterized by a shortened TRACT, a widened LIP OPENING, and a slightly constricted ORAL CAVITY, and F1 and F2 moved farther apart.
FIGURES: (Story, Titze, and Hoffman, 2001)

ACCENTS
“TWO COUNTRIES SEPARATED BY A COMMON LANGUAGE”
Have you ever misunderstood someone, or been misunderstood by someone, who speaks with a different accent? The sounds that an American hears as 'Bob the clerk' may be heard by an Australian as 'barb the clock'.
The two most important parameters in determining different vowel sounds are the first two formants, which are frequency bands with increased power. These are the two axes on the graph. The axes are traditionally plotted backwards, as here, so that they approximately correspond to the axes long used by phoneticians and linguists: F1 (vertical) approximately corresponds to the jaw height (which correlates negatively with the extent of the mouth opening). F2 (horizontal) approximately corresponds to the position (forward or back) of the constriction of the vocal tract where the tongue is close to the roof of the mouth. Other important parameters are the length of the vowel and the higher formants.
FIGURE: F1 AND F2 FOR ENGLISH VOWEL SOUNDS SPOKEN BY AUSTRALIAN SPEAKERS (F1 CORRELATES WITH MOUTH OPENING; F2 CORRELATES WITH TONGUE PLACEMENT)
FIGURE PANEL: AUSTRALIAN SPEAKER
For the Australians in this sample, the words "hud" and "hard" have a similar sound; the main difference is the length. For this sample of Americans, it is "hud" and "heard" that are distinguished by length. For an Australian, a long "bud" is a "bard"; for an American, it's a "bird".
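Why does lengthening the tract (YAWNY) or shortening it (TWANGY) shift the formants? A standard first approximation treats the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n − 1)c / (4L). A longer tube lowers every formant and narrows the spacing F2 − F1 = c / (2L); a shorter tube does the opposite, which matches the direction of the MRI results. The tube lengths below are hypothetical, c = 35000 cm/s is an assumed value for warm, moist air, and the changes in cavity shape that Story, Titze, and Hoffman also found are ignored here.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed: approximate speed of sound in warm, moist air


def tube_formants(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end (glottis) and open at the other (lips)."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]


# Hypothetical tract lengths chosen only to show the direction of the effect
for label, length in [("twangy (shorter)", 16.0), ("normal", 17.5), ("yawny (longer)", 19.0)]:
    f = tube_formants(length)
    print(f"{label:17s} L={length} cm  F1={f[0]:.0f}  F2={f[1]:.0f}  F2-F1={f[1] - f[0]:.0f} Hz")
```

For L = 17.5 cm this gives the familiar 500, 1500, 2500 Hz pattern of a neutral vowel.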
FIGURE PANEL: AMERICAN SPEAKER
TO PARTICIPATE IN THIS SURVEY BY WOLFE, SMITH AND COLLEAGUES, VISIT http://project.phys.unsw.edu.au/swe/survey/form.php
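The accent comparison shows that F1 and F2 alone may not separate two vowels, while vowel length can. A minimal nearest-neighbour sketch of that idea: each vowel is a point (F1, F2, duration), and a token is labelled with the closest reference vowel. The reference values and the distance weighting (log ratios for formants, 100 ms as the duration unit) are invented placeholders chosen to mimic the Australian "hud"/"hard" contrast; they are NOT the survey's measurements.

```python
import math

# Hypothetical reference vowels: (F1 Hz, F2 Hz, duration ms). Invented placeholders:
# "hud" and "hard" share formants and differ only in length, as on the Australian chart.
REFERENCE = {
    "hud": (750.0, 1400.0, 90.0),
    "hard": (750.0, 1400.0, 220.0),
    "heed": (300.0, 2500.0, 200.0),
}


def vowel_distance(a, b):
    # Formants compared as log ratios, duration in units of 100 ms (an assumed weighting)
    d_f1 = math.log(a[0] / b[0])
    d_f2 = math.log(a[1] / b[1])
    d_dur = (a[2] - b[2]) / 100.0
    return math.sqrt(d_f1 ** 2 + d_f2 ** 2 + d_dur ** 2)


def classify(token, reference=REFERENCE):
    """Label a (F1, F2, duration) token with the nearest reference vowel."""
    return min(reference, key=lambda name: vowel_distance(token, reference[name]))


print(classify((740.0, 1420.0, 100.0)))  # hud
print(classify((740.0, 1420.0, 210.0)))  # hard: same formants, longer vowel
```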