A preliminary study on the subjective evaluation of the
recording quality of motion cameras
Chen Xiayu
Address: School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China, 200240
e-mail: c2g1816525966@sjtu.edu.cn
Jin Yumeng* (Co-first author, contributed equally)
Address: School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China, 200240
e-mail: 20030708@sjtu.edu.cn
Huang Yu* (Corresponding author)
Address: Institute of Vibration, Shock and Noise, School of Mechanical Engineering, Shanghai Jiao
Tong University, Shanghai, China, 200240
e-mail: yu_huang@sjtu.edu.cn
The demand for high sound quality in recordings made by motion cameras is increasing with the
rapid development of we media. Sounds recorded by two devices with similar electroacoustic
indicators (e.g. distortion, dynamic range and frequency response) may still be perceived very
differently by listeners. This study therefore investigated the influence of various
psychological factors on the subjective preference for audio recordings acquired by two sports
cameras. Fifty subjects compared 14 pairs of audio recordings (six pairs of instrumental music,
five of vocal sounds and three of environmental sounds) using the paired comparison method. They
also rated all 28 recordings, presented in balanced random order, on category rating scales for
multiple dimensions: timbre (dark/light, warm/cold), spatial impression, stereo impression
(location accuracy, stability), sound balance, high transparency, medium transparency and low
transparency. The subjective preference differed significantly between the two devices. The
timbre, location accuracy, sound balance and medium transparency dimensions significantly
affected the subjective preference. Nonlinear curve-fitting models relating preference to the
multi-dimension scores were formulated for three groups of stimuli according to their type
(instrumental, vocal and environmental sounds).
Keywords: recording quality, audio quality subjective evaluation, preference, multi-dimension
1. Introduction
The high penetration rate of the Internet has led to the vigorous development of self-media (we media).[1]
The 28th International Congress on Sound and Vibration (ICSV28), 24-28 July 2022
The demand for high sound quality in the recordings of motion cameras, an important piece of we-media
equipment, is increasing.[2] The sound quality of recording and playback is determined by a series of
electromechanical parameters, including harmonic distortion, frequency response, transient
distortion, phase distortion and group delay.[3] In addition, various descriptors have been proposed and
investigated to characterize and evaluate the subjective impression of sound.[4] Studies have found
relationships between objective parameters and subjective indices, e.g. between clarity and the frequency
characteristics of the speaker, between the sense of space and the transient characteristics, and between
softness and both the transient and frequency characteristics.[5] Although objective measurement and
subjective evaluation techniques are now relatively mature, most of them were developed for loudspeakers.[6]
For motion cameras, experience suggests that the subjective impression of sound quality may differ
significantly between devices even if their primary objective parameters are similar. Here we report an
experiment on the sound quality of several types of sounds recorded by two motion cameras with similar
objective parameters. Subjective ratings on multiple dimensions and subjective preference were investigated
to identify the factors that significantly affect subjective preference. Tentative nonlinear models were
formulated to estimate the preference for the sound quality of a device from the scores on the various
dimensions.
2. Methods
2.1 Apparatus
The experiment was performed in a semi-anechoic chamber. The apparatus used for the listening experiment
consisted of a laptop computer (Thinkpad T450s), a digital-to-analog converter (RME ADI-2 DAC FS) and
a pair of headphones (Sennheiser HD600). We used the foobar2000 software to control the playback duration,
playback interval and playback order of the stimuli. The computer volume was set to 100 and the gain of
the DAC was set to –20 dB.
2.2 Stimuli
We used fourteen audio recordings as test stimuli: six of instrumental music, five of vocal sounds and three
of environmental sounds. The environmental sounds were field recordings made with the two sports cameras
(devices A and B). The other sounds were purchased from a high-quality music website (Qobuz, www.qobuz.com)
and recorded by A and B while being played back through a loudspeaker in the anechoic chamber. All test
stimuli were acquired at a 24-bit depth and a 96 kHz sampling rate. The average sound pressure level of each
stimulus was set to 60 dBA by adjusting the amplitude of the recordings and calibrating via the headphones
on a dummy head (Head Acoustics HMS IV). Figure 1 shows the calibration setup.
Figure 1: The experimental setup and the dummy head used for calibration.
2.3 Participants
Fifty healthy participants (33 males and 17 females, aged 18 to 28 years) took part in the experiment.
They were all students with no symptoms of ear disease, no blockage of the ear canal, no history of
excessive noise exposure, no use of ototoxic drugs and no family history of hearing disease. All
participants signed an informed consent form before the test.
2.4 Protocol
The subjective experiment comprised four parts, i.e. Parts Ⅰ–Ⅳ, as shown in Table 1. In Parts Ⅰ–Ⅲ,
participants rated each stimulus on rating scales, with different dimensions adopted in different parts
of the experiment. The questions for the dimensions were formulated according to the ITU
recommendation[4]:
Timbre (two dimensions): characterizes the tonal character of the recorded sound, such as dark or light,
warm or cold.
Stereo impression (two dimensions): characterizes whether the position of each sound source, with the
original stimulus as a reference, can be located accurately during playback, and whether the localization
remains stable throughout.
Spatial impression (one dimension): with the original stimulus as a reference, characterizes whether the
sound image of the recording is spatially appropriate, i.e. consistent with the imagined size of the space.
Sound balance (one dimension): characterizes whether the individual sound sources are well balanced and
blend well in the overall sound after recording.
Transparency (three dimensions): characterizes how well the recorded sound is resolved in different
frequency ranges, such as whether the bass has impact, whether the midrange is bright and clear, and
whether the treble is loud.
Table 1: Evaluation material and evaluation dimensions
Part | Stimuli index | Type of stimuli | Evaluation dimensions
Part Ⅰ (instrument) | A01, 02, 06, 09–11; B01, 02, 06, 09–11 | Percussion instrument; jazz (brass instrument); orchestral symphony; guitar pop with the audience; string quartet; solo piano | Timbre (dark/light); timbre (warm/cold); stereo impression (location accuracy); stereo impression (stability); spatial impression; sound balance; high transparency; medium transparency; low transparency
Part Ⅱ (vocal) | A03–05, 07; B03–05, 07 | Pure vocal (male); chorus (male and female); oratorio (female); pure vocal (female) | Timbre (dark/light); timbre (warm/cold); spatial impression; medium transparency
Part Ⅲ (environment) | A12–14; B12–14 | Lighting; traffic; park | Stereo impression (location accuracy); stereo impression (stability); sound balance
All stimuli | A01–14; B01–14 | All | Subjective preference
The pairwise comparison method[7] was adopted in Part Ⅳ to compare the preference between the stimuli of
devices A and B. Each participant listened to each pair of stimuli and reported which one they preferred.
All pairs of stimuli were played twice in random order.
2.5 Data analysis
In the pairwise comparison test, if a participant preferred A to B, A was scored 2 and B was scored 0, and
vice versa; if A and B were judged equal, both were scored 1. The subjective preference for each stimulus
was the sum of the fifty participants' scores. The statistical analyses were carried out in IBM SPSS
Statistics (version 25). Shapiro-Wilk tests rejected the assumption of normality for the data sets
(p<0.05), so nonparametric statistics were employed for data analysis [8]. The Wilcoxon test was used for
post-hoc comparisons of preference and of each dimension score between two specific groups of stimuli.
The significance level was adjusted for multiple comparisons using the Bonferroni correction.
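The scoring rule and the Bonferroni adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the "A"/"B"/"tie" labels and the example judgments are hypothetical, and the sketch assumes that every judgment (including repeated presentations) is scored separately and summed, which the text does not spell out.

```python
from typing import Dict, List, Tuple

# One judgment: (stimulus pair id, choice), where choice is "A", "B" or "tie".
# These labels are hypothetical; the paper does not define a data format.
Judgment = Tuple[str, str]


def score_judgment(choice: str) -> Tuple[int, int]:
    """Return (score_A, score_B) for a single paired-comparison judgment."""
    if choice == "A":
        return 2, 0
    if choice == "B":
        return 0, 2
    if choice == "tie":
        return 1, 1
    raise ValueError(f"unknown choice: {choice!r}")


def preference_scores(judgments: List[Judgment]) -> Dict[str, Dict[str, int]]:
    """Sum the scores of all judgments for each stimulus pair."""
    totals: Dict[str, Dict[str, int]] = {}
    for pair_id, choice in judgments:
        score_a, score_b = score_judgment(choice)
        pair_total = totals.setdefault(pair_id, {"A": 0, "B": 0})
        pair_total["A"] += score_a
        pair_total["B"] += score_b
    return totals


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Four hypothetical judgments for stimulus pair "01".
example = [("01", "A"), ("01", "tie"), ("01", "B"), ("01", "A")]
totals = preference_scores(example)
```

With these made-up judgments, device A collects 2 + 1 + 0 + 2 points and device B collects 0 + 1 + 2 + 0 points for pair "01".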
3. Results and discussion
3.1 Preference and multiple dimensions scores
Figure 2 shows the sum of all 50 participants' preference scores. The scores of device A are
significantly higher than those of device B (**p<0.01, Wilcoxon test). Figures 3, 4 and 5 show the results
for the multiple dimensions for each stimulus of Parts Ⅰ, Ⅱ and Ⅲ, respectively.
Figure 2: Subjective preference scores of two devices. **p<0.01 with Wilcoxon test.
For instrumental music in Part Ⅰ, stimulus A01 performed better in stereo impression (stability) than B01
(p=0.016, Wilcoxon test). For vocal sounds in Part Ⅱ, a significant difference in spatial impression was
found between A03 and B03 (pure male voice, windless state, anechoic chamber; p<0.001, Wilcoxon test).
A03 also tended to score higher than B03 in dark/light (p=0.060, Wilcoxon test). For environmental sounds in
Part Ⅲ, a significant difference in location accuracy was found between A13 and B13 (traffic flow, windy
state, outdoor; p<0.05, Wilcoxon test) and in sound balance between A14 and B14 (park, windy state,
outdoor; p<0.05, Wilcoxon test). No significant difference was found for any other dimension
between the stimuli of devices A and B (p>0.05, Wilcoxon test).
Figure 3: Boxplots (median and interquartile ranges) for each pair of stimuli at each dimension, Part Ⅰ
(instrumental music). *p<0.05 with Wilcoxon test.
Figure 4: Boxplots (median and interquartile ranges) for each pair of stimuli at each dimension, Part Ⅱ
(vocal). ***p<0.001 with Wilcoxon test.
Figure 5: Boxplots (median and interquartile ranges) for each pair of stimuli at each dimension, Part Ⅲ
(environmental sounds). *p<0.05 with Wilcoxon test.
3.2 The preference model
The results above suggest that the rating scores in most dimensions differ little between the two devices.
The significant difference in preference might therefore be attributed to a comprehensive mechanism that is
only partly reflected in the multiple dimension ratings. As a preliminary step, we formulated nonlinear
regression models that account for the relationship between preference and the multiple dimension scores
for each of the three groups of stimuli (i.e., six pairs of instrumental music, five of vocal sounds and
three of environmental sounds). The modeling procedure followed work on developing Zwicker's psychoacoustic
annoyance model[9][10]. The form of the model is:
$$ y = N\left(1 + \sqrt{\gamma_0 + \gamma_1 x_1^2 + \gamma_2 x_2^2 + \cdots + \gamma_n x_n^2}\right), \tag{1} $$
where the dependent variable, $y$, is the preference score, the independent variables, $x_1$–$x_n$, are the
scores of each dimension, and $N$ and $\gamma_0$–$\gamma_n$ are coefficients.
The preference scores are normalized using the Min–Max scaling[11][12] as:
$$ \text{Normalized Preference} = \frac{\mathit{preference} - \mathit{preference}_{\min}}{\mathit{preference}_{\max} - \mathit{preference}_{\min}}. \tag{2} $$
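As an illustration of Eq. (2), a minimal Min–Max scaling sketch in plain Python is given here. The function name and the raw scores are hypothetical and are not taken from the experiment.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Scale scores linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Eq. (2) is undefined when all scores are equal (zero denominator).
        raise ValueError("all scores are identical; the range is zero")
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical summed preference scores for four stimuli.
raw = [38, 50, 62, 74]
normalized = min_max_normalize(raw)
```

The smallest raw score maps to 0 and the largest to 1, with the others spaced proportionally in between.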
The range of preference is 0 to 1 after normalization. For each group of stimuli, we set the scores of the
dimensions dark/light, warm/cold, location accuracy, stability, spatial impression, sound balance, high
transparency, medium transparency and low transparency to $x_1$, $x_2$, $x_3$, $x_4$, $x_5$, $x_6$, $x_7$,
$x_8$, $x_9$, respectively. Eq. (3) is then used to obtain the coefficients, $\gamma_1$–$\gamma_9$, for each
dimension:
$$ y = k x^{\gamma}. \tag{3} $$
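One simple way to estimate $k$ and $\gamma$ in a power law of the form of Eq. (3) is ordinary least squares on the log-transformed data, since $\ln y = \ln k + \gamma \ln x$. The sketch below assumes this log-log approach and strictly positive data; the paper does not state which fitting procedure was actually used, and the function name and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = k * x**gamma by linear least squares on (ln x, ln y).

    Returns (k, gamma). Requires strictly positive x and y values.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("log-log fitting requires strictly positive data")

    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)

    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))

    gamma = sxy / sxx                        # slope in log-log space
    k = math.exp(mean_y - gamma * mean_x)    # intercept transformed back
    return k, gamma


# Synthetic, noise-free data generated from y = 0.5 * x**0.5.
x_demo = [1.0, 2.0, 4.0, 8.0]
y_demo = [0.5 * v ** 0.5 for v in x_demo]
k_demo, gamma_demo = fit_power_law(x_demo, y_demo)
```

A log-log fit minimizes the error in the logarithm of $y$, so with noisy data it generally gives slightly different coefficients from a nonlinear least-squares fit in the original scale.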
Finally, the models were formulated by nonlinear curve fitting according to Eq. (1). The resulting equations
are shown below; note that only the five dimensions relatively highly correlated with preference were
selected for Part Ⅰ, i.e. the instrumental music.
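$$ y = 0.00292\left(1 + \sqrt{-9577 + 6.17 x_3^2 - 3.12 x_4^2 + 2.33 x_5^2 - 2.94 x_6^2 + 2.96 x_9^2}\right) \tag{4} $$
$$ y = 0.00302\left(1 + \sqrt{31.6 + 1.68 x_1^2 - 1.10 x_2^2 + 1.17 x_5^2 + 1.01 x_8^2}\right) \tag{5} $$
$$ y = 0.00352\left(1 + \sqrt{-19320 + 2.86 x_3^2 + 2.86 x_4^2 + 4.42 x_6^2}\right) \tag{6} $$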
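To make the model form concrete, the following sketch evaluates an equation of the form of Eq. (1) for a given set of coefficients, using the coefficients of Eq. (5) as an example. The function name, the dictionary layout and the input scores are hypothetical; in particular, the scores of 50 are made up for illustration and say nothing about the rating scale used in the experiment.

```python
import math
from typing import Mapping


def preference_model(
    n_coef: float,
    gamma0: float,
    gammas: Mapping[int, float],
    scores: Mapping[int, float],
) -> float:
    """Evaluate y = N * (1 + sqrt(gamma0 + sum_i gamma_i * x_i**2)).

    `gammas` and `scores` are keyed by the dimension index i of x_i.
    """
    radicand = gamma0 + sum(g * scores[i] ** 2 for i, g in gammas.items())
    if radicand < 0:
        # The square root is not real here, so the model gives no estimate.
        raise ValueError("negative value under the square root")
    return n_coef * (1.0 + math.sqrt(radicand))


# Coefficients of Eq. (5) (vocal sounds): x1, x2, x5 and x8.
EQ5 = {
    "n_coef": 0.00302,
    "gamma0": 31.6,
    "gammas": {1: 1.68, 2: -1.10, 5: 1.17, 8: 1.01},
}

# Hypothetical dimension scores, chosen only for illustration.
hypothetical_scores = {1: 50.0, 2: 50.0, 5: 50.0, 8: 50.0}
y_vocal = preference_model(scores=hypothetical_scores, **EQ5)
```

The explicit check on the radicand matters for Eqs. (4) and (6), whose large negative constant terms make the expression under the square root negative when the scores are small.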
The goodness of fit of Eqs. (4), (5) and (6) is 0.2677, 0.1406 and 0.7621, respectively. The goodness of fit
for groups 1 and 2 is relatively low, probably because the scores depend strongly on the individual sound,
whereas the data of all stimuli in a group were merged for fitting.
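The text does not say which goodness-of-fit statistic is reported. Assuming it is the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, it could be computed as in the sketch below; the function name and the data are hypothetical.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized preference scores and model predictions.
observed = [0.0, 0.25, 0.5, 1.0]
predicted = [0.1, 0.2, 0.6, 0.9]
r2 = r_squared(observed, predicted)
```

For a nonlinear model this statistic can be negative when the model fits worse than the mean of the observations, so it should be read with more caution than in linear regression.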
4. Conclusion
There are significant differences in subjective preference between two devices with similar objective
parameters. The timbre, location accuracy, sound balance and medium transparency dimensions
significantly affected the subjective preference. The influence of the multi-dimension scores also depends
on the sound. The nonlinear regression models account for the relation between preference and the
multi-dimension scores for music, vocal and environmental sounds. The model for environmental sounds
estimates preference with a relatively high goodness of fit. Traditional curve-fitting methods are
limited; future work will consider machine learning for modelling preference more accurately from
multi-dimension auditory quality scores.
5. References
[1]. [Online] DIGITAL 2022: ANOTHER YEAR OF BUMPER GROWTH - We Are Social UK
[2]. V. Philip, L. M. Stewart. The GoPro gaze[J]. Cultural Geographies, 2014, 24(1).
[3]. F. E. Toole. Loudspeaker measurements and their relationship to listener preference[J]. Journal of The
Audio Engineering Society, 1974, 22:402–415.
[4]. Recommendation ITU-R BS.1284-2[M], General methods for the subjective assessment of sound
quality. Geneva: Electronic Publication, 2019.
[5]. A. Furmann, H. Edward, M. Niewiarowicz, P. Perz. On the correlation between the subjective
evaluation of sound and the objective evaluation of acoustic parameters for a selected source. The
Journal of The Acoustical Society of America, 1990, 38(11): 837–844.
[6]. David Clark. Precision measurement of loudspeaker parameters. Journal of The Audio Engineering
Society, 1996, 100: 1777–1786.
[7]. H. A. David. The Method of Paired Comparisons (2nd Edition). Oxford University Press, New York,
NY, USA, 1999.
[8]. S. Siegel and N.J. Castellan. Nonparametric Statistics for the Behavioural Sciences (2nd Edition).
Mcgraw Hill Higher Education, New York, NY, USA. 1988.
[9]. Zwicker, E. and H. Fastl. 2007. Psychoacoustics Facts and Models. 2nd ed. Springer, Berlin.
[10]. More, S. Aircraft noise metrics and characteristics. Dissertation, Purdue University, West Lafayette,
IN, USA, 2011.
[11]. F.A. Wichmann, N.J. Hill. The psychometric function: I. Fitting, sampling, and goodness of fit,
Perception & Psychophysics, 2001, 63: 1293–1313.
[12]. [Online] MinMaxScaling: Min-max scaling for pandas DataFrames and NumPy arrays - mlxtend
(rasbt.github.io)