AUTOMATIC VOLUME CONTROL SYSTEM FOR COMPENSATION OF VOLUME DIFFERENCE BETWEEN TV CHANNELS

Kyu-Phil Han(1), Kun-Woen Song(1), Zoong-Hee Kim(2), Gwang-Choon Lee(2), and Yeong-Ho Ha(1)
(1) School of Electronic & Electrical Eng., Kyungpook Nat'l Univ., Taegu 702-701, Korea
(2) Display Product Research Lab. of LG Electronics Inc., Kumi 730-360, Korea

IEEE Transactions on Consumer Electronics, Vol. 43, No. 4, November 1997. Contributed Paper. Manuscript received August 14, 1997. 0098 3063/97 $10.00 1997 IEEE

Abstract - The sound levels of TV channels vary greatly according to modulation and demodulation rates. When the television channel or input mode is changed, users have to adjust the volume to obtain a proper sound level. In this paper, an automatic volume control system is proposed to compensate for the volume difference that occurs when the channel changes. A simple power estimation based on the symmetry property of sound and a selective volume control algorithm using a 3-step compensation are presented. The proposed system is designed with consumer ICs for easy implementation in current audio media. In addition, a filter is used to revise the system function so that the frequency response of the designed system is modeled on that of a human at a normal hearing level. Experimental results on a TV set show that the volume fluctuation is considerably reduced.

I. Introduction

The reproduced sound levels in TVs differ between channels because of different modulation factors. A constant sound level, without abrupt changes when the channel is switched, would be more convenient for the TV viewer. In order to compensate for the volume difference without requiring user intervention, the power of the input signal has to be automatically calculated and adjusted. In this paper, a system with power estimation, a selective 3-step volume compensation, and a simple hardware structure is presented. Since human response differs according to frequency [1,2], the hearing sensitivity [3], which is called the "loudness curve", is taken into consideration in the system design.

II. Specific characteristics of sound

The human ear is responsive to frequencies from about 20 to 20,000 Hz, covering a range of 10 octaves. In the voice signal, the major part of the energy is distributed around 1,000 Hz, because the first formant of the vocal tract is at this frequency [4,5]. In general, if the lower frequencies of speech are removed, the articulation index does not change markedly until frequencies above 500 Hz are removed. If frequencies between 500 and 2,500 Hz are removed, the articulation index drops sharply. On the other hand, if high frequencies beyond 2,500 Hz are removed, 80% articulation remains, but the removal of frequencies above 1,000 Hz leads to impractical communication systems, since only 40% of the words spoken are correctly identified [2]. Roughly speaking, 90% of the total power lies between 500 and 3,500 Hz. However, the human ear is most sensitive to frequencies from about 2 to 5 kHz and least sensitive to sounds at the extreme frequencies of the audible range [3]. The most sensitive band is therefore slightly different from the band in which the energy is concentrated. In order to compensate the volume according to human sensitivity, the power of a sound that is a mixture of voice, music, and so on has to be calculated in accordance with the frequency response of human hearing. As shown in Fig. 1, each contour line shows an equivalent loudness, i.e. sounds judged by the listener to be of the same power level. This is the so-called "loudness curve", and it is a real response curve established through experimentation. The phon is a unit of the level of loudness.
The level of a sound, in phons, is numerically equal to the intensity level (in decibels) of a pure 1 kHz tone that is judged equally loud.

Another feature of sound is its unbiased property: the signal is symmetric about the zero level [4]. This is a popular assumption in speech processing [6,7]. It means that the moving average over several tens of milliseconds is always zero. It also implies that the power can be estimated from one half of the signal. Examples of symmetry are shown in Fig. 2.

Fig. 1. The loudness curve. (Horizontal axis: frequency in Hz, 20 to 20,000.)

Fig. 2. Examples of symmetry. These signals were digitized at a 20 kHz rate with 16-bit quantization. The articulations are (a) "one" and (b) "two" in Korean.

III. The proposed power estimation algorithm

3.1 Principle of the proposed algorithm

If the microprocessor of current audio media in TVs could sample the signal at a 40 kHz rate while processing other jobs, the power estimation would be a trivial task. However, it is impossible to calculate the power this way with the current capability of such processors. In order to implement the volume system in TVs, a simple algorithm for the estimation is needed. It is assumed that the total power of sound may be calculated from the upper or lower side of the signal because of the symmetric property mentioned above. A power estimation method is proposed that uses the width of the signal, i.e. the time during which it is greater than a threshold. The threshold value, which classifies a frame into a sound or a silence interval, was set through several experiments. From the experiments, 10% of the maximum value is suitable as the threshold. An example of this division is shown in Fig. 3, which shows only the threshold selection. An example of thresholding is given in Fig. 4, and the power is estimated using only the upper half of the signal, as shown in Fig. 4(b).

Fig. 3. An example of division. The threshold is 10% of the maximum value. (a) Speech signal with 25,000 samples, 22.05 kHz sampling rate, and 16-bit quantization. (b) Sound and silence intervals, which have the values "1" and "0", respectively.

Fig. 4. An example of thresholding. (a) The input signal, (b) the clipped signal.

It is known that the power of sound is proportional to the root-mean-square value, but this is not true for human perception because of the frequency response of hearing. Therefore, the transfer function of the designed system needs to be revised to take account of human response. The detailed system function is given in section 4.3, after the system design.

To treat the frequency response of human hearing, the perceived power, $P_{rms}$, is defined as follows.

$$ P_{rms} = \alpha(n)\, V_{rms}, \qquad V_{rms} = \sqrt{\frac{1}{N_s} \sum_{n=1}^{N_s} v^2(n)} \qquad (1) $$

where $\alpha(n)$, $v(n)$, and $N_s$ are a weighting function depending on a perception ratio, the sampled signal at the nth point, and the number of samples, respectively. Eq. (1) thus consists of a factor $\alpha(n)$, which is related to frequency, and the rms value. Let $s(n)$ be the actual transfer function of the designed system and $f(n)$ be a filter that makes $\alpha(n)$ the inverse of loudness. Then $\alpha(n)$ can be written as

$$ \alpha(n) = s(n) * f(n) \qquad (2\text{-}1) $$

and its Fourier transform as

$$ A(f) = S(f)\, F(f) \qquad (2\text{-}2) $$

where $A(f)$, $S(f)$, and $F(f)$ denote the Fourier transformed functions. Because the desired $A(f)$ and the designed $S(f)$ are known, $F(f)$ can be simply determined. From now on, we refer to $A(f)$ as the total transfer function of the system.

To complete the power estimation procedure, the $V_{rms}$ value of the second term in Eq. (1) has to be calculated. The width of the signal shown in Fig. 4(b) is used for the estimation. It is shown in Fig. 6 that the rms value of sound is directly proportional to the accumulation time, which is the sum of the widths in several frames. Thus, the estimation value, $V_{rms}$, can be defined by

$$ V_{rms} = m\, t_a + c \qquad (3) $$

where $t_a$, $m$, and $c$ are the mean accumulation time within several frames, the slope, and the bias value of the estimation function, respectively.

Fig. 5. Flow chart of the simulation. (Steps: 50 ms frame, thresholding to obtain $t_l$, test $t_l > t_{snd}$, test frame count = 10, calculate $t_a$, estimate rms.)

3.2 $V_{rms}$ estimation

The simulation procedure is shown in Fig. 5. After the sampled signal is classified into a sound or a silence frame by a threshold value, the time spent over the threshold is accumulated in each sound frame. A frame set consists of ten frames, and $t_a$ is the mean accumulation time of the frame set. $t_l$ and $t_{snd}$ denote the accumulation time of one frame and the minimum time of a sound frame, respectively. Finally, we check that the rms value can be approximated by Eq. (3). Table I shows the case of a voice signal and Table II that of music. From these data, $m$ and $c$ were calculated to be 1.4 and 5, respectively. Since the dimensions of $t_a$ (milliseconds) and the rms value (volts) are different, the question arises of how to reconcile the units. However, what matters is that $t_a$ is proportional to the rms value, so the difference in units is of no concern here. The result for 7,200 frame sets is shown in Fig. 6. From these results, we can conclude that $V_{rms}$ is proportional to, and can be estimated by, $t_a$.

Fig. 6. The relation between the mean of $V_{rms}$ and $t_a$ (a total of 7,200 frames are used). (Frame length: 50 ms. Data: classic, jazz, pop, and speech. Axes: accumulation time in ms versus root mean square value.)

Table I. The comparison of V_rms and t_a for an 8-bit quantized voice signal (max. value: 100, threshold: 10, frame length: 50 ms, t_snd: 0.1 ms).

Frame   0      1      2      3      4      5      6      7      8      9
t_l     0.0    4.8    18.0   13.4   18.7   18.8   18.4   10.2   3.4    8.1
V_rms   2.37   11.43  29.16  20.25  23.11  23.34  24.78  13.60  7.26   15.14
t_a: (not recoverable)   mean of V_rms: 17.04

Frame   10     11     12     13     14     15     16     17     18     19
t_l     1.7    13.0   0.6    0.0    0.0    4.9    9.4    8.9    0.0    0.0
V_rms   5.06   18.38  3.56   2.26   2.62   10.29  16.17  14.46  2.37   2.44
t_a: 3.85   mean of V_rms: 7.76

Frame   20     21     22     23     24     25     26     27     28     29
t_l     0.0    0.0    4.8    9.6    19.3   18.2   21.3   19.4   9.3    7.5
V_rms   2.35   2.51   9.69   14.83  25.96  22.81  26.58  25.37  14.96  13.27
t_a: 10.14   mean of V_rms: 14.88

Frame   30
t_l     2.1
V_rms   5.12

Table II. The comparison of V_rms and t_a for an 8-bit quantized music signal (max. value: 100, threshold: 10, frame length: 50 ms, t_snd: 0.1 ms).

Frame   0      1      2      3      4      5      6      7      8      9
t_l     14.8   14.6   12.9   13.8   14.1   13.7   13.8   15.0   13.5   15.3

Frame   10     11     12     13     14     15     16     17     18     19
t_l     13.7   11.7   13.6   14.1   12.4   11.8   14.0   14.1   13.8   13.2

Frame   20     21     22     23     24     25     26     27     28     29     30
t_l     12.9   15.0   12.5   12.9   12.3   12.0   15.8   11.2   11.5   11.9   10.3

V_rms column, in source order (column alignment with the t_a entries is lost): 18.24 22.59 16.91 17.90 20.32 18.14 19.43 19.87 18.86 21.33 18.82 14.15 16.87 17.83 15.15 16.38 17.66 18.15 17.90 18.79 15.28 19.32 16.48
Mean of V_rms column, in source order: 19.36 13.21 16.99 12.54 17.02 18.35 17.93 16.99 19.89 15.72 16.40 16.17 12.98

IV. System and Implementation

The proposed algorithm is implemented in current TV sets as follows.

4.1 The audio system in a TV

A signal received from an antenna, after passing through the tuning and demodulation blocks, enters the A/V (audio and video) switch as shown in Fig. 7. The switch selects between air and line input and between video and audio signals. In the audio control unit, the volume, bass, tone, surround, balance, etc. are adjusted, and the selected and adjusted signal is then transferred to the speaker.

Fig. 7. Block diagram of the audio system in a TV.
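The estimation procedure of Section 3.2 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the 10% threshold is taken relative to a `full_scale` parameter, and frames whose accumulation time does not exceed `t_snd_ms` are assumed to contribute zero to the frame-set mean. The constants m = 1.4 and c = 5 are the values reported in Section 3.2.

```python
import math


def accumulation_time_ms(frame, threshold, sample_rate_hz):
    """Time in ms that the upper half of the signal spends above the threshold."""
    above = sum(1 for v in frame if v > threshold)
    return 1000.0 * above / sample_rate_hz


def estimate_vrms(samples, sample_rate_hz, full_scale=100.0, frame_ms=50.0,
                  frames_per_set=10, threshold_ratio=0.10, t_snd_ms=0.1,
                  m=1.4, c=5.0):
    """Return one (t_a, estimated V_rms) pair per complete frame set, using Eq. (3)."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    threshold = threshold_ratio * full_scale
    results = []
    set_times = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        t_l = accumulation_time_ms(samples[start:start + frame_len],
                                   threshold, sample_rate_hz)
        # Assumption: a frame below t_snd is a silence frame and adds no time.
        set_times.append(t_l if t_l > t_snd_ms else 0.0)
        if len(set_times) == frames_per_set:
            t_a = sum(set_times) / frames_per_set
            results.append((t_a, m * t_a + c))
            set_times = []
    return results


if __name__ == "__main__":
    # Hypothetical test tone: 1 kHz sine, amplitude 50 of 100, sampled at 20 kHz.
    tone = [50.0 * math.sin(2.0 * math.pi * 1000.0 * n / 20000.0)
            for n in range(10000)]
    print(estimate_vrms(tone, 20000))
```

Only comparisons and a counter are needed per sample, which is the point of the method: no squaring or square root is computed on the signal itself.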
4.2 The proposed automatic volume control unit

Since only one volume control unit is used in a TV, as shown in Fig. 7, the level set by the user would be changed at every compensation. Therefore, another volume control chip has to be inserted in front of the existing audio control unit to preserve the absolute level set by the user. The overall and the detailed block diagrams of the proposed system are presented in Figs. 8 and 9, respectively. The proposed automatic volume control system consists of three main stages: the low pass filter (LPF) and amplifier, the trigger and counter, and the signal analysis and control part.

Fig. 8. Block diagram of the audio system with an automatic volume control unit.

Fig. 9. Block diagram of the automatic volume control unit. (Blocks: signal input, LPF and amplifier, trigger circuit, asynchronous counter, D/A converter, microprocessor for signal analysis and control, volume control chip, signal output.)

4.2.1 The first stage

At first, the sound is amplified by an LPF and amplifier circuit as depicted in Fig. 10. Generally, in a TV signal, the magnitude of sound is 0.8 Vp-p and the noise level is about 0.035 volts. In order to operate the trigger circuit, the voltage gain of the amplifier should be 20, or 26 dB. The gain must be 20 because the trigger circuit was designed with a BJT (bipolar junction transistor), and the turn-on voltage ($V_{BE}$) of the transistor must be over 0.7 volts: the noise level multiplied by the gain is 0.035 V x 20 = 0.7 V. When signals higher than the noise level are input, the trigger circuit is activated. The filter is classified as a second-order multiple feedback type, because R3 and C2 are connected between the input and output nodes. The advantage of the filter is that it is more stable than the Butterworth or Chebyshev group because of the multiple feedback. Another merit is that the gain is easily controlled by the ratio of R1 and R3. The cut-off frequency of the filter is calculated by Eq. (4).

Fig. 10. Multiple feedback LPF and amplifier.

4.2.2 The second stage

The amplified signal enters the trigger circuit. The inputs of the AND gate are the transistor output and the clock pulse, and the output of the AND gate is fed into the next-stage counter as shown in Fig. 11. In this way, the rms value of the sound is converted into binary code. A timer and an AND gate are used as the clock generator and the multiplier, respectively. Because of the ±1 gating error in the multiplication, a high frequency clock of above 20 kHz has to be used to reduce the error. However, the higher the clock frequency, the more counter bits are needed. According to some experiments, a 100 kHz clock is sufficient. For example, if a 1 kHz signal is input to the second stage, the outputs are as shown in Fig. 12. The black sections of Fig. 12(c) represent the 100 kHz clock.

Fig. 11. Trigger, clock generator, and multiplier circuit.

Fig. 12. An example of a 1 kHz signal in the second stage. (a), (b), and (c) represent the input signal, the output of the trigger circuit, and the multiplier output, respectively.

To sum up the total flow of the designed system, the input signal is selectively amplified by the LPF and then activates the trigger circuit. The triggered pulse gates the clock. The output clock, which has been gated by the input signal, is fed into the counter. Finally, the signal analysis part reads the count periodically and estimates the rms value of the signal. The volume is then adjusted according to this value.
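The gating and counting of the second stage can be illustrated with a small simulation. This is a sketch under simplifying assumptions rather than a model of the actual circuit: the trigger is treated as an ideal comparator at $V_{BE}$ = 0.7 V, the LPF is reduced to a flat gain of 20, the counter width is not modelled, and the function name `gated_count` is hypothetical.

```python
import math


def gated_count(amplitude_v, signal_hz, window_s, clock_hz=100_000.0,
                gain=20.0, v_be=0.7):
    """Count clock pulses that pass the AND gate while the trigger is on.

    amplitude_v is the peak amplitude of a sine input before amplification.
    """
    n_ticks = int(round(window_s * clock_hz))
    count = 0
    for k in range(n_ticks):
        t = k / clock_hz
        amplified = gain * amplitude_v * math.sin(2.0 * math.pi * signal_hz * t)
        # Ideal trigger: the transistor conducts once the amplified signal
        # exceeds V_BE, so the AND gate passes this clock pulse to the counter.
        if amplified > v_be:
            count += 1
    return count


if __name__ == "__main__":
    # One period of a 1 kHz tone at 0.8 Vp-p (0.4 V peak), 100 kHz clock.
    print(gated_count(0.4, 1000.0, 0.001))
    # A signal below the 0.035 V noise level never turns the trigger on.
    print(gated_count(0.03, 1000.0, 0.001))
```

Because the count grows with the time the signal spends above the trigger level, it plays the role of the accumulation time $t_a$ in Eq. (3).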
In order to reduce the number of ports used, a DAC (digital to analog converter) is employed. If a parallel port is available, the operational amplifier and DAC can be omitted from the circuit. As shown in Fig. 13, the output is fed into the signal analysis and control unit of Fig. 9.

Fig. 13. Asynchronous counter and DAC.

4.2.3 The third stage

The signal analysis/control unit calculates the current power, monitors channel or mode changes, and adjusts the volume. In order to avoid awkwardness for the listener and to decrease the compensation error, the maximum level of a volume change is limited and a three step adjustment is used in this compensation. The control flow is shown in Fig. 14. First, the constants are initialized, and the compensation routine starts if a mode or channel change occurs. The frame count, the silence count, and the threshold are set to 30, 50, and 64 in decimal, respectively. The frame length is set to 15 ms. If 50 consecutive frames are all silence frames, the volume is set to the default value; the routine cannot wait longer than this because the delay would feel awkward. Thus, the minimum time for one compensation is 450 ms (30 frames x 15 ms). Since silence and sound frames are usually mixed, the average time for one compensation is about 600 ms. In current television sets, a mute time of about 300 ms is inserted after every channel change. If the mute were continued until the end of the three step compensation, i.e. for about 1.5 - 1.8 s, users would feel uneasy. Therefore, the mute is cleared after the first adjustment.
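The frame bookkeeping described above can be sketched as follows. This is an illustrative reading, not the authors' firmware: it assumes that the threshold of 64 is compared against the per-frame counter reading to separate sound from silence frames, and the names `collect_frame_set` and `clamp_change` as well as the `max_change` parameter are hypothetical.

```python
FRAME_MS = 15        # frame length
FRAME_COUNT = 30     # sound frames needed for one compensation
SILENCE_COUNT = 50   # consecutive silence frames before falling back to default
THRESHOLD = 64       # assumed: counter reading that separates sound from silence


def collect_frame_set(counts):
    """Consume per-frame counter readings for one compensation step.

    Returns ("sound", mean_count, elapsed_ms) after 30 sound frames,
    ("silence", None, elapsed_ms) after 50 consecutive silence frames,
    or ("incomplete", None, elapsed_ms) if the readings run out first.
    """
    sound = []
    silent_run = 0
    n = 0
    for n, count in enumerate(counts, start=1):
        if count >= THRESHOLD:
            sound.append(count)
            silent_run = 0
            if len(sound) == FRAME_COUNT:
                return ("sound", sum(sound) / FRAME_COUNT, n * FRAME_MS)
        else:
            silent_run += 1
            if silent_run == SILENCE_COUNT:
                return ("silence", None, n * FRAME_MS)
    return ("incomplete", None, n * FRAME_MS)


def clamp_change(delta_steps, max_change):
    """Limit a requested volume change to +/- max_change steps."""
    return max(-max_change, min(max_change, delta_steps))


if __name__ == "__main__":
    print(collect_frame_set([100] * 30))             # sound frames only
    print(collect_frame_set([0] * 10 + [100] * 30))  # silence mixed in
    print(collect_frame_set([0] * 50))               # falls back to default
```

With only sound frames the routine returns after 30 x 15 ms = 450 ms, and every interleaved silence frame adds another 15 ms, which is why the typical time per step is longer than the minimum.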
Since the fluctuation of sound is severe, the three step compensation and the limitation of the maximum volume change make the compensation more effective. Through experiments, the volume step corresponding to a power difference is determined, and the mean (default) level of the volume is set to the center of the most linear part of the volume curve. When the total volume has 128 scales, it is appropriate to set the default level to the 64th scale and the maximum change to 6 steps.

4.3 Analysis and revision of the proposed system

As explained in section 2, the frequency response of the LPF is an important factor in accounting for human perception. Unlike human hearing, the response of the system itself has to be constant over frequency. In practice, the response of the system without the filter (LPF) is not constant. The trigger and the multiplier are more sensitive to high frequencies, which means that the estimated power is higher than the actual power at those frequencies. The accumulation of the ±1 gating error is expected to be the dominant term in the multiplier. The error can be reduced by using a higher frequency clock, but this is not suitable for implementation in current audio systems. Whether or not the response of the system is constant, a filter must be used to correct the system function. For example, when a correction filter is not used and a high frequency is input, a listener perceives low power but the system does not. The system was tested without a filter. The sensitivity to a 20 kHz sine wave was 1.7 times higher than that to a 20 Hz sine wave. That is to say, the time taken for the counter to run from 00 to FF in hexadecimal is about 25 ms at 20 Hz and 15 ms at 20 kHz. In addition, the slope of the response resembles an exponential curve.
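The measurement above can be condensed into a single constant if the second-stage response is modelled as a pure exponential in frequency. The following sketch works through that arithmetic; the exponential model exp(a*f), its normalisation to 1 at low frequency, and the function name are assumptions of the sketch.

```python
import math

# Measured counter fill times (00 to FF) from the test without a filter.
T_20HZ_MS = 25.0
T_20KHZ_MS = 15.0

# A shorter fill time means higher sensitivity; the ratio is about 1.67,
# which is the factor quoted as 1.7 in the text.
measured_ratio = T_20HZ_MS / T_20KHZ_MS

# Assumed model: response(f) = exp(a * f), with response(20 kHz) = 1.7.
a = math.log(1.7) / 20000.0


def second_stage_response(f_hz, a=a):
    """Relative sensitivity of the trigger and multiplier at frequency f_hz."""
    return math.exp(a * f_hz)


if __name__ == "__main__":
    print(measured_ratio)
    print(a)
    print(second_stage_response(20000.0))
```

Rounded to six decimal places, the constant is 0.000027 per hertz.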
Therefore, the response function, $F_{ratio}(f)$, of the second stage can be approximated as

$$ F_{ratio}(f) = e^{af} \qquad (5) $$

where $a$ is a constant. Substituting $f = 20\text{k}$ and $F_{ratio}(f) = 1.7$ into Eq. (5), $a$ is calculated as

$$ a = \frac{\ln 1.7}{20000} = 0.000027 \qquad (6) $$

The response function, $\exp(0.000027 f)$, is shown in Fig. 15. In order to realize the inverse of loudness, the cut-off frequency of the filter has to be about 10 kHz, so the elements of Eq. (4) are set such that

$$ f_c = 10.402\ \text{kHz} \qquad (7) $$

Fig. 14. Flow chart of the volume control routine.

Fig. 15. The response curve of the second stage, which contains the counter and the multiplier. (Plotted curve: exp(0.000027 x frequency), 10 Hz to 100 kHz.)

The response of the filter shown in Fig. 10 is depicted in Fig. 16.

According to Eq. (2-2), the system function, $S(f)$, is equal to $F_{ratio}(f)$. Therefore, the total transfer function can be obtained from Eq. (2-2). Fig. 17 shows $A(f)$. Generally, the normal hearing level corresponds to the fourth or fifth curve from the bottom in Fig. 1 [3]. Compared with these curves, the transfer function of the total system has the inverse form of the "loudness curve" and is more sensitive from 2 to 5 kHz. Therefore, we can say that the system takes the human response into consideration.

V. Experimental results

For the evaluation of the proposed system, the root-mean-square values of the input and the compensated signal are numerically compared as shown in Table III. Subjective hearing tests were undertaken by twenty people.
In the subjective tests, the proposed system was shown to act properly.

Table III. The comparison of the two rms values [unit: volt].
Row labels recovered from the source: air (CH1, CH2, CH3), tape (TA1, TA3), CD (CD1, CD2, CD3), variance.
Values in source order (row and column alignment is lost): 0.301 0.333 0.380 0.368 0.452 0.423 0.437 0.438 0.292 0.617 0.791 0.524 0.025 0.408 0.483 0.514 0.446 0.001

Fig. 16. The transfer function of the filter shown in Fig. 10. The curve is plotted by PSPICE.

Fig. 17. The total response curve of the designed system. (Plotted curve: filter response x exp(0.000027 x frequency), 10 Hz to 100 kHz.)

VI. Conclusion

The volume fluctuations in TVs, caused by the modulation and demodulation ratio of the air signal, the playing and recording power of a stored signal, etc., are considerably reduced by the proposed system. Since the system is designed using consumer ICs such as an operational amplifier, a timer, and a JK F/F, it can be easily applied to current products at a low cost. In addition, to model the frequency response of the designed system on that of a human at a normal hearing level, a 2nd-order multiple feedback LPF is used. Thus, the proposed system can hold a constant volume at a normal hearing level. Although the human response to frequencies is considered, the linear compensation was carried out on the most linear part of the volume scale. Therefore, a logarithmic volume scale can be used for the compensation.

References

[1] Corsi, J. F., Experimental Psychology of Sensory Behaviour, Holt, New York, 1967.
[2] Breger, L., Clinical Cognitive Psychology, Prentice-Hall, Englewood Cliffs, N.J., 1969.
[3] Donald G. Fink and Donald Christiansen, Electronics Engineers' Handbook, 3rd ed., McGraw-Hill, New York, 1989.
[4] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993.
[5] Christopher Schmandt, Voice Communication with Computers, MIT lab., 1993.
[6] S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-28, no. 4, pp. 357-366, Aug. 1980.
[7] V. Zue, J. Glass, M. Phillips, and S. Seneff, "Acoustic segmentation and phonetic classification in the SUMMIT system," in Proc. IEEE Int. Conf. on Acoust., Speech, and Signal Processing, pp. 389-392, May 1989.

Biography

Kyu-Phil Han received the B. S. and M. S. in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1993 and 1995, respectively, and is currently a Ph. D. student in the Department of Electronic Engineering, Kyungpook National University. His main interests are in digital signal processing, image processing, and computer vision.

Kun-Woen Song received the B. S. and M. S. in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1993 and 1995, respectively, and is currently a Ph. D. student in the Department of Electronic Engineering, Kyungpook National University. His main interests are in digital signal processing, non-linear image processing, image coding, and computer vision.

Zoong-Hee Kim received the B. S. and M. S. in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1970 and 1997, respectively, and is working as a manager in Display Product Research Lab. of LG Electronics Inc., Korea. His main research interests are in digital signal processing, circuit design, and TV signal processing.
Gwang-Choon Lee received the B. S. and M. S. in Electronic Engineering from Inha University, Inchon, and Kyungpook National University, Taegu, Korea, in 1970 and 1995, respectively, and is working as a director in Display Product Research Lab. of LG Electronics Inc., Korea. His main research interests are in digital signal processing and TV signal processing.

Yeong-Ho Ha received the B. S. and M. S. degrees in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1976 and 1978, respectively, and the Ph. D. degree in Electrical and Computer Engineering from the University of Texas at Austin, Texas, in 1985. In March 1986, he joined the Department of Electronic Engineering of Kyungpook National University as an Assistant Professor, and is currently a Professor. His main research interests are in image processing, computer vision, TV signal processing, and digital signal processing. He served as TPC co-chair of the 1994 IEEE International Conference on Intelligent Signal Processing and Communication Systems. He is a member of IEEE, Pattern Recognition Society, IS&T, Korea Institute of Telematics and Electronics, and Korean Institute of Communication Sciences.