Pure Tone Frequency Discrimination: An Examination of Experimental Methodology

Robert Mannell
Macquarie University, 2007

Abstract

This series of experiments examines the concept of "just noticeable differences" (jnds) in frequency (also known as frequency difference limens or DLF). A frequency jnd is the smallest difference in frequency (Δf) between two tones presented in succession that can just be detected. Frequency jnds vary with frequency: the higher the frequency of a tone, the greater the change in frequency must be for the second tone to sound different in pitch. This paper examines a number of experimental methodologies for determining frequency jnds. The experiments progress from very simple and rather naive experimental designs to more complex and carefully controlled designs. Issues examined include token timing, subject training and response tasks. This paper clearly illustrates the sensitivity of experimental results in psychoacoustics to issues of experimental design and subject training.

Readings

1. Moore, B.C.J., 2003, The Psychology of Hearing, Academic Press, 5th edition.
   o Frequency Discrimination: Chapter 6, esp. pp 197-204
2. Mannell, R.H., 1994, The perceptual and auditory implications of parametric scaling in synthetic speech, Unpublished Ph.D. dissertation, Macquarie University.
   o Frequency Discrimination: Section 2.2.1 "Frequency"
3. Mannell, R.H., 2007, SPH307 Psychoacoustics lecture notes (only available to SPH307 students via WebCT on the MyMQ Portal).

Experiment 1

In this experiment we attempted to determine the frequency jnd at 1000 Hz using a very simple experimental procedure. To do this we presented a series of 10 conditions. Each condition contained four pairs of tones and had a fixed frequency difference (Δf) between the two members of each pair. For two of the pairs, the first tone was set at 1000 Hz and the second was Δf higher. For the other two pairs, the second tone was set at 1000 Hz and the first was Δf lower.
Δf ranged from 1 to 10 Hz. For each group of four pairs the presentation order was randomised. In each of the four tone pairs, each tone was 500 ms long and the two tones were separated by 250 ms. Successive tone pairs were separated by about 1 second. (These tokens are from an audio CD published by the Institute for Perception Research, Eindhoven, The Netherlands, and the Acoustical Society of America: Houtsma, A.J.M., Rossing, T.D., and Wagenaars, W.M., 1987, Auditory Demonstrations.) Each condition had a smaller Δf value than the preceding condition. For each subject, we attempted to determine the first condition in which fewer than three of the four pairs were discriminated. The jnd was then taken to be the Δf value of the preceding condition.

Results

There were no valid results for this experiment. None of the 18 subjects showed a stable pattern of frequency discrimination at 1000 Hz.

Design Issues

The factor most likely to have caused the negative results for this experiment was the timing of the tone pairs. Firstly, the two member tones of each pair were too far apart in time. Previous work on the determination of frequency jnds suggests that tones are discriminated best when they are very close together. The closer together they are, the less the influence of short-term memory on the performance of the subjects. The other timing factor that may have affected the results was that the different tone pairs may have been too close together. What is needed is for intra-pair tone separation to be smaller and inter-pair tone separation to be greater. Also, a few more extreme tone differences (say of 20 Hz) at the beginning of each condition may better prepare the subjects for this discrimination task. Perhaps most importantly, subjects received no training before attempting this task.
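The condition layout and the jnd criterion described for experiment 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the use of a seeded random generator and the handling of the edge cases (no condition passing, or no condition failing) are assumptions and are not taken from the original procedure.

```python
import random

BASE_HZ = 1000.0  # reference frequency used in experiment 1


def make_condition(delta_f, rng):
    """Return the four tone pairs (first Hz, second Hz) for one condition.

    Two pairs start at 1000 Hz and end delta_f higher; two pairs start
    delta_f below 1000 Hz and end at 1000 Hz. Presentation order is shuffled.
    """
    pairs = [(BASE_HZ, BASE_HZ + delta_f)] * 2 + [(BASE_HZ - delta_f, BASE_HZ)] * 2
    rng.shuffle(pairs)
    return pairs


def estimate_jnd(hits_per_condition, min_hits=3):
    """Apply the experiment 1 criterion to one subject's responses.

    hits_per_condition: list of (delta_f, pairs discriminated out of 4),
    ordered from the largest delta_f to the smallest.
    Returns the delta_f of the condition preceding the first one with fewer
    than min_hits discriminated pairs. Returns None if the very first
    condition already fails (assumed behaviour). If no condition fails, the
    smallest tested delta_f is returned as an upper bound (also assumed).
    """
    previous = None
    for delta_f, hits in hits_per_condition:
        if hits < min_hits:
            return previous
        previous = delta_f
    return previous
```

For example, a hypothetical subject who discriminates 4, 4, 3 and then 2 of the four pairs at Δf values of 10, 9, 8 and 7 Hz would be assigned a jnd of 8 Hz under this rule.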
Experiment 2

This experiment was essentially identical to experiment 1, except that:

• the two tones in a tone pair were abutted (rather than having a 250 ms gap between them)
• subjects were presented with a questionnaire that asked about their hearing and musical experience and training

Results

18 subjects participated in the experiment. The results for one subject were excluded because of significant hearing loss. Of the remaining 17 subjects, 3 failed to discriminate any tokens and 1 made only one discrimination. These subjects were excluded from the following analysis. An additional subject reported discrimination of all tokens. This result might be due to this subject attending to some feature other than the frequency difference (such as temporal discontinuities [i.e. clicks] or phase shifts) or it might be that this subject has above-average frequency discrimination abilities. This subject was also excluded. Another 3 subjects provided anomalous results, such as discriminating the least discriminable pairs better than the most discriminable pairs, or showing a significant drop or increase in the middle conditions relative to the most and least discriminable conditions; these subjects were also excluded. All subjects who were included reported discrimination for at least 12 tokens and reported no discrimination for at least 4 tokens.

Only 9 subjects were finally included in the analysis. Of these nine, eight reported musical experience. That is, of the nine subjects who reported musical experience, only one was excluded from the analysis, and this was because of significant hearing loss. Only one of the subjects who reported no musical experience was included.

In the following figure the results for all 9 included subjects are shown against the ten frequency differences (1 to 10 Hz).

Figure 1: This figure displays the mean number of tokens perceived as having a change in frequency for each of the frequency differences.
The error bars represent one standard deviation.

It should be noted that these results, whilst demonstrating some stability in subject responses, do not provide us with a reliable measure of frequency discrimination at 1000 Hz. From previous research, we would expect a flat maximal response (i.e. a response of 4) for the larger frequency differences (say above 4 Hz), but this was not achieved.

Design Issues

In experiment 1 the two tones in each pair were separated by a silent gap. In experiment 2 there was no separation between the two tones in each pair: they were spliced together with no gap between them. In experiment 1 no subjects were able to perceive changes in frequency, whilst in experiment 2 all musically trained subjects without significant hearing loss were able to detect frequency changes. Previous work on the determination of frequency jnds suggests that tones are discriminated best when they are very close together. The closer together they are, the less the influence of memory on the performance of the subjects.

Another problem with this experiment is the presence of clicks in some tone pairs. Abutting two tones may result in clicks and possibly also in audible phase changes. In future, tones should change more gradually from one frequency to another over several cycles to avoid clicks and to minimise phase artifacts.

Subjects with no musical training or experience typically responded poorly to this experiment, with patterns of no discrimination or inconsistent discrimination. A few more extreme tone differences (say of 20 Hz) at the beginning of each condition may better prepare the subjects for this discrimination task. It was determined that subject training should be considered in much greater detail before trying this experiment again.

Experiment 3

In this experiment subjects were provided with extensive pre-test training (made possible by the purchase of an integrated audiovisual token presentation and response system).
Responses were simple Yes/No responses to questions of the type "Did this tone change in frequency?".

Procedure A: Tone Step

In this experiment we utilised a number of 2-second tones centred on about 1000 Hz which either increased or decreased in frequency near the tone's temporal midpoint. The change occurred smoothly between the two frequencies over a period of 0.2 seconds. For each frequency change two tokens were created, one in which the frequency increased by Δf Hz and the second in which the frequency decreased by Δf Hz. For example, for a frequency change of 10 Hz, one tone changed from 995 Hz to 1005 Hz whilst the other tone changed from 1005 Hz to 995 Hz.

Thirty-three (33) tones were created. These included one tone in which the frequency did not change, plus two tones (rising and falling) for each of 16 frequency changes (50, 45, 40, 35, 30, 25, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2 Hz).

All tokens were presented to experimental participants using an "audience voting system" created by IML (Innovative Group Response). Each token was played to participants via high quality headphones. A computer screen prompted responses from participants and provided feedback when required. Using a hand-held device, participants responded to each tone by pressing "1" if a change in frequency was heard and "2" if no change in frequency could be heard.

Training occurred before the test session. Training was divided into three sequences:

1. The first sequence used tones with very clear frequency changes (50, 45, 40, 35, 30, 25 Hz) presented in order from the largest change (50 Hz) to the smallest change (25 Hz), with rising and falling tones paired and presented in random order. Tones with no change were randomly presented within this sequence. The purpose of this sequence was to familiarise subjects with the sound of rising and falling tones.
2. The second sequence consisted of the 20, 18, 16, 14 and 12 Hz changes presented randomly and interspersed randomly with non-changing tones. These tones were regarded as reasonably discriminable and were repeated in the test sequence.
3. The third and final sequence consisted of the least discriminable tone changes (10, 8, 6, 4 and 2 Hz) presented in order of decreasing frequency change and interspersed with non-changing tones. This order was selected to gradually familiarise subjects with increasingly hard-to-discriminate frequency changes. These tones were also repeated in the test sequence.

During training, feedback was provided in the form of bar charts showing the total yes and no responses of the current group of experimental participants, together with information on whether or not each tone actually did have a change in frequency.

The test session immediately followed the training sessions and comprised the 20, 18, 16, 14, 12, 10, 8, 6, 4 and 2 Hz changes (both directions) presented randomly and interspersed with non-changing tokens. No feedback was provided during the test session.

Procedure B: Frequency Modulation

In this experiment we utilised 3-second tones that varied sinusoidally in frequency around the centre frequency (1000 Hz). The modulation rate was 4 Hz. That is, each tone varied smoothly between its upper and lower frequency limits 4 times per second. For example, a tone with a frequency change of 50 Hz varied sinusoidally 4 times per second between 975 Hz and 1025 Hz.

Seventeen (17) tones were created. These included one tone in which the frequency did not change, plus one tone for each of 16 frequency changes (50, 45, 40, 35, 30, 25, 20, 18, 16, 14, 12, 10, 8, 6, 4, 2 Hz).

As with the Tone Step condition, this experiment commenced with three training sequences similar to the training sequences used for the tone step experiment.
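The two stimulus types described for Procedures A and B can be illustrated with a short synthesis sketch. The paper does not say how the tokens were generated, so everything below is an assumption made for illustration: the 44100 Hz sample rate, the raised-cosine shape of the 0.2 second glide, the function names, and the phase-accumulation method (chosen here because it avoids the clicks that arise when two tones are simply abutted).

```python
import math

SAMPLE_RATE = 44100  # assumed; the source does not state a sample rate


def step_frequency(t, delta_f, rising=True, centre=1000.0, duration=2.0, ramp=0.2):
    """Instantaneous frequency (Hz) at time t for a Procedure A tone step.

    The tone moves between centre - delta_f/2 and centre + delta_f/2, with a
    smooth glide of length `ramp` centred on the temporal midpoint.
    """
    low, high = centre - delta_f / 2, centre + delta_f / 2
    start, end = (low, high) if rising else (high, low)
    ramp_start = duration / 2 - ramp / 2
    x = min(max((t - ramp_start) / ramp, 0.0), 1.0)  # 0 before glide, 1 after
    weight = 0.5 - 0.5 * math.cos(math.pi * x)       # assumed raised-cosine glide
    return start + (end - start) * weight


def fm_frequency(t, delta_f, centre=1000.0, rate=4.0):
    """Instantaneous frequency (Hz) at time t for a Procedure B FM tone."""
    return centre + (delta_f / 2) * math.sin(2 * math.pi * rate * t)


def synthesise(freq_at, duration):
    """Phase-continuous sine wave whose frequency follows freq_at(t)."""
    phase = 0.0
    samples = []
    for n in range(int(duration * SAMPLE_RATE)):
        samples.append(math.sin(phase))
        # Advance the phase by one sample's worth of the current frequency.
        phase += 2 * math.pi * freq_at(n / SAMPLE_RATE) / SAMPLE_RATE
    return samples
```

Under these assumptions, `synthesise(lambda t: step_frequency(t, 10), 2.0)` would give the rising 995 Hz to 1005 Hz token and `synthesise(lambda t: fm_frequency(t, 50), 3.0)` the token that varies between 975 Hz and 1025 Hz.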
The test session immediately followed the training sessions and comprised the 20, 18, 16, 14, 12, 10, 8, 6, 4 and 2 Hz changes presented randomly and interspersed with non-changing tokens. No feedback was provided during the test session.

Subjects

56 subjects participated in this experiment. The majority (about 90%) of subjects were female and their mean age was about 20. Subjects provided information about their hearing and musical background.

A small number of additional subjects participated in this experiment but were excluded because of false positive and/or false negative trends during training. A false positive trend occurs when a subject consistently reports a change in frequency for tokens that do not change in frequency. A false negative trend occurs when a subject gives far more negative responses than average during training to tokens that are otherwise reliably identified as having a clear change. A false positive trend might indicate that a subject is responding to some aspect of the signal other than the change that is being trained. Nearly always (in the present experiments), however, it is accompanied by false negative trends and is a possible indication of hearing loss. In all cases, subjects excluded on these grounds also self-reported hearing problems.

Results

Results for the frequency discrimination experiment are summarised in figure 2.

Figure 2: This figure compares the frequency step and frequency modulation results for the experiment 3 discrimination tasks.

Chi-square statistics were applied to the raw data. The Frequency Step and Frequency Modulation results for a frequency change of 4 Hz were significantly different (p=0.001). No other frequency change showed a significant difference between the two conditions. For the frequency step condition all frequency changes ≥4 Hz were perceived correctly, significantly above chance (i.e. above 50%, p<0.01).
For the frequency modulation condition all frequency changes ≥6 Hz were perceived correctly, significantly above chance (p<0.01).

Musical training and musical experience did not significantly affect the outcome of this experiment. Note, however, that there were significant differences in the time it took different subjects to obtain a stable pattern of results. Some subjects obtained a stable pattern of responses immediately whilst others took nearly the entire (rather long) training session to do so. It may be that musically trained subjects tended to obtain a stable pattern of responses more quickly than other subjects, but this was not tested (this would be a good hypothesis for a future experiment).

Experiment 4

In this experiment an important modification to the experiment 3 method was tried. The experiment 3 yes/no training procedure was repeated, but it was then followed by additional training in an AB decision task. The AB decision task presents two tokens, one of which contains the targeted change whilst the other does not change. In the simple AB decision task the subject must answer "A" if token A was perceived to change or "B" if token B was perceived to change. A response was forced, so the subject had to make a decision, even if it was only a guess.

Figure 3 summarises the results for these experiments. 26 subjects participated in this experiment and all of them are included in these results, as none exhibited false negative or false positive trends during training.

Figure 3: This figure compares the frequency step and frequency modulation results for the 2005 frequency discrimination experiments. In these experiments a forced AB choice was made.

In the above diagram the horizontal line labelled "A" indicates a perfect chance response (i.e. 50% A and 50% B responses). Responses equal to or greater than the line labelled "B" are significantly above chance for p=0.05.
Responses equal to or greater than the line labelled "C" are significantly above chance for p=0.01.

It should be noted that for p=0.01 the results for this experiment are similar to those in the preceding experiment: for the frequency step condition the 4 Hz token was perceived significantly above chance, whilst for the frequency modulation condition the 6 Hz token was the smallest change that was perceived significantly above chance. Note, however, that in the previous experiment the 4 Hz FM token was not significantly above chance for p=0.05 whereas it was in this experiment.

General Conclusions

The general conclusions and observations for this series of experiments are:

1. Similar results are obtained for frequency step and frequency modulation designs. The frequency step condition provides slightly better discrimination for a yes/no decision task, but there is no significant difference for an AB choice task.
2. The AB choice task produced more consistent results across the frequency step and frequency modulation designs than did the yes/no decision task.
3. Pre-test training is essential. Prior musical training only partially compensates for the absence of pre-test training. The inclusion of larger differences in training may increase the efficacy of training by better familiarising subjects with the kind of changes being tested, but this needs to be tested by carrying out training using only the Δf values used in the test stage of the experiment.
4. Token pair timing appears to be critical. The two tones of a pair need to be abutted in order for clear discrimination patterns to emerge.
5. Abutting two tones without a period of transition results in audible clicks which could affect experimental results. The effect of these clicks was not investigated in this series of experiments, but the perceptual effect of such clicks is well attested in the psychoacoustic literature.
Masking the clicks by adding a small amount of background white noise to the tones may remove any such effect, but this would be likely to come at the cost of increased uncertainty in the decisions, as is seen in the gap detection experiments reported in the Gap Detection paper.
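The "significantly above chance" criteria used for the forced AB choice results (the lines labelled "B" and "C" in figure 3) can be illustrated with a small calculation. The paper does not state which test produced those lines or how many responses contributed to each data point, so the sketch below assumes an exact one-sided binomial test against a chance probability of 0.5, and the response count n is hypothetical. The function names are likewise only illustrative.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Probability of k or more correct responses out of n under chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def criterion_count(n, alpha, p=0.5):
    """Smallest number correct out of n that is significant at level alpha.

    Returns None if even n correct out of n would not reach significance.
    """
    for k in range(n + 1):
        if binomial_tail(k, n, p) <= alpha:
            return k
    return None


if __name__ == "__main__":
    n = 26  # hypothetical: one response per subject for a given token
    for alpha in (0.05, 0.01):
        k = criterion_count(n, alpha)
        print(f"alpha={alpha}: at least {k} of {n} correct ({100 * k / n:.0f}%)")
```

With the hypothetical n of 26, this rule would require at least 18 correct responses (about 69%) at p=0.05 and at least 20 (about 77%) at p=0.01. The actual positions of lines "B" and "C" depend on the real number of responses per token, which is not given here.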