ITU-T Workshop on
"From Speech to Audio: bandwidth extension, binaural perception"
Lannion, France, 10-12 September 2008
Towards Consistent Assessment of Audio Quality
of Systems with Different Available Bandwidth
Yu Jiao
Sławek Zieliński
Francis Rumsey
Institute of Sound Recording, University of Surrey
Outline
New challenges in quality assessment
Attributes to assess
Standard scale
Summary
Demo I: an example of attribute identification and assessment
Demo II: assessing envelopment, an example of direct anchoring
New Challenges in Speech and Audio Quality Assessment
[Diagram: a listener asking "Where should I go?" between Speech and Audio]
Is there a need for the development of a new, more universal standard for audio quality assessment, regardless of application or bandwidth?
A universal method?
Towards a Universal Method
Two challenges:
• Which perceptual attributes to use?
• How to calibrate the scale so that results obtained from audio quality assessments in different applications can be sensibly compared?
Outline
New challenges in quality assessment
Attributes to assess
Standard scale
Summary
Demo I: an example of attribute identification and assessment
Demo II: assessing envelopment, an example of direct anchoring
Which Attributes to Use?
Most commonly used attributes:
• Speech Quality (ACR, DCR, CCR) – ITU-T Recommendations
• Basic Audio Quality (continuous scale), Stereophonic Image Quality, Front Image Quality, Impression of Surround Quality – ITU-R Recommendations
What about other attributes? (see next slide)
Which Attributes to Use?
[Diagram: attribute hierarchy]
Preferences: Preference, Basic Audio Quality, Timbral Quality, Spatial Quality
Judgements: Front Image Accuracy, Timbral Accuracy, Spatial Accuracy, Localisation error, Dynamics, Loudness, Envelopment, Distortions, Noise presence, Width change, Distance change
This set is not exhaustive.
Which Attributes to Use?
• It is relatively easy to agree upon and standardise high-level attributes.
• It is more difficult to standardise low-level attributes.
• The usefulness of low-level attributes is application specific.
• However, a pool of standardised attributes, together with associated anchors and a scale system, may be of help.
Example of a Spatial Attributes Pool – Rumsey (2002)
One of the most systematic attribute pools for spatial audio assessment.
Outline
New challenges in quality assessment
Attributes to assess
Standard scale
Summary
Demo I: an example of attribute identification and assessment
Demo II: assessing envelopment, an example of direct anchoring
Do we need a standard audio quality scale?
• It may help to reduce bias in listening tests.
• It is essential for calibration of objective models.
• It may help to compare results across different applications with different available bandwidth.
[Diagram: known scale biases (Poulton, 1989): contraction bias, range equalising bias, centring bias, stimulus spacing bias, etc.]
Range Equalising Bias – Zielinski, Rumsey and Bech (2008)
[Figure: the same scale applied to, e.g., wide-band and narrow-band stimuli]
Range Equalising Bias = the "Rubber Ruler Effect"
[Figure: the scale stretches to fit the range of stimuli under test, e.g. full-band, wide-band or narrow-band stimuli]
Range Equalising Bias – data taken from Zielinski et al. (2003 and 2005)
[Figure: means and 95% confidence intervals]
• Systematic upward shift of scores
• Maximum difference ≈ 20 points
• Can the scores be treated as absolute?
Conclusion: do not put confidence in the scale labels.
Range Equalising Bias – Zielinski et al. (2007)
[Figure: means and 95% confidence intervals]
• Systematic upward shift of scores
• Maximum difference = 13 points
• Can the scores be treated as absolute?
Range Equalising Bias – Cheer (2008)
[Figure: ACR scores on the five-point scale: Excellent, Good, Fair, Poor, Bad]
Test based on ITU-T P.800.
Questions:
• Is "Absolute Category Rating" (ACR) really absolute?
• Do we need a better calibrated scale?
Towards Consistent Assessment across Narrow-, Wide- and Full-Band Applications
Use a standard scale:
• If possible, use physical units that are familiar to the listeners (Poulton, 1989). For example, the width of the frontal image can be assessed as an angle expressed in degrees, and the distance between listener and apparent source could be assessed in metres (see the sketch below).
• Use an open-ended ratio scale – biased (Narens, 1996)
• Use verbal anchors along the scale – ineffective (see the previous slide)
• Use auditory anchors – effective but difficult to implement
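To make the physical-units idea concrete, here is a minimal Python sketch (our illustration, not from the talk) that converts a frontal image width and listening distance into the angle the image subtends at the listener:

```python
# Sketch of a physical-units judgement (our illustration): the width of a
# frontal auditory image expressed as the angle it subtends at the listener.
# The geometry values below are hypothetical.
import math

def image_width_deg(image_width_m, listening_distance_m):
    """Angle in degrees subtended at the listener by a frontal image of the given width."""
    return math.degrees(2.0 * math.atan((image_width_m / 2.0) / listening_distance_m))

# A 2 m wide frontal image heard from 2.5 m away subtends about 43.6 degrees.
print(image_width_deg(2.0, 2.5))
```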
Three Types of Auditory Anchors
1. Direct Anchors
Listeners are instructed how to use the scale relative to two or more auditory anchors.
• Help to define a "frame of reference"
• Examples of partial use: ITU-R BS.1116 and ITU-R BS.1534 (MUSHRA)
2. Indirect Anchors
Anchors are included in the set of stimuli under assessment. Listeners are not instructed how to assess them and are unaware of their purpose.
• An effective bias diagnostic tool
• Examples of use: ITU-T P.800 (MNRU reference quality impairments); also the 3.5 kHz anchor in ITU-R BS.1534 (MUSHRA)
3. Background Anchors
Anchors are presented only during the familiarisation phase prior to a listening test and are not included in the test proper. Listeners are not instructed as to how these anchors relate to the scale.
• May help to define a "frame of reference" if used properly
• They do not calibrate the scale but "calibrate" the listeners
• Used very rarely
Example of a Scale with Direct Anchors (A & B) – Conetta (2007)
Only two anchors are used in this example, but more than two can also be used.
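As an illustration of how direct anchors can calibrate a scale, the following sketch (hypothetical values and function names, not Conetta's method) linearly maps raw ratings so that the two anchors land on fixed points of a shared standard scale:

```python
# Minimal sketch of direct-anchor rescaling (hypothetical values and names).
# Raw ratings are mapped linearly so that the two anchor stimuli land on
# fixed points of a shared standard scale, giving every listener and every
# test the same frame of reference.

def rescale_with_anchors(raw, raw_a, raw_b, target_a=20.0, target_b=80.0):
    """Linearly map a raw rating so that anchor A -> target_a and anchor B -> target_b."""
    if raw_a == raw_b:
        raise ValueError("anchor ratings must differ to define a scale")
    gain = (target_b - target_a) / (raw_b - raw_a)
    return target_a + (raw - raw_a) * gain

# One listener rated anchor A at 35 and anchor B at 90 on a 0-100 slider;
# a test stimulus rated 60 then lands at about 47.3 on the standard scale.
print(rescale_with_anchors(60.0, raw_a=35.0, raw_b=90.0))
```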
Challenges of Direct Anchoring
• How to choose the anchors in terms of levels of quality?
• How to make them similar to the stimuli under assessment in terms of perceptual properties?
Hypothetical standard quality scale:
• Anchor 1: full-bandwidth, full surround sound quality?
• Anchor 2: CD quality?
• Anchor 3: FM radio quality?
• Anchor 4: narrow-bandwidth telephone quality?
• Anchor 5: cascaded codecs? drop-outs? modulation noise? hard clipping? etc.
Example of Diagnostics with Indirect Anchors
• Indirect anchors are a useful diagnostic tool to check for bias
• Scores "float" along the scale
• Do not put confidence in the labels
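A minimal sketch of the diagnostic idea (the scores below are invented for illustration): compare the hidden anchor's mean score across two tests; a large shift signals that the two frames of reference differ:

```python
# Hypothetical sketch: using a hidden (indirect) anchor to diagnose scale drift.
# If the same anchor stimulus scores very differently in two tests, the tests
# do not share a frame of reference and their scores cannot be pooled as-is.
from statistics import mean

anchor_test_1 = [38, 42, 35, 40, 37]  # hidden anchor scores, e.g. narrow-band test
anchor_test_2 = [55, 60, 52, 58, 57]  # same anchor, e.g. full-band test

drift = mean(anchor_test_2) - mean(anchor_test_1)
print(f"Anchor drift: {drift:.1f} points")  # 18.0 points: a range equalising bias flag
```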
Outline
New challenges in quality assessment
Attributes to assess
Standard scale
Summary
Demo I: an example of attribute identification and assessment
Demo II: assessing envelopment, an example of direct anchoring
Summary
• There is a need for the development of a new, more universal standard for audio quality assessment, regardless of application or bandwidth.
• More attributes are needed to reveal the nature of quality degradation.
• Comparing audio quality across different applications, e.g. with different audio bandwidth, is problematic due to potentially inconsistent use of the scale (an ill-defined frame of reference).
• A standard scale is needed.
• The direct anchoring technique could be used, but it is difficult to identify suitable auditory anchors.
Demo I: An Example of Attribute Identification and Assessment – Jiao et al. (2007)
Starting point: a newly developed codec.
Attribute identification and selection:
• Basic Audio Quality
• Timbral Distortion
• Spatial Distortion
• Dynamic Spatial Distortion (DSD): level of DSD; dynamicity of DSD
Followed by scale and test method design.
Regression model relating Basic Audio Quality to the identified attributes:

$\mathrm{BAQ} = -0.668 \cdot \mathrm{TD} - 0.350 \cdot L_{\mathrm{DSD}} - 0.179 \cdot D_{\mathrm{DSD}} + 86.45$
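Read as a linear predictor, the regression can be evaluated directly; the sketch below uses hypothetical attribute scores (variable names are ours, not from the paper):

```python
# Sketch: evaluating the Jiao et al. (2007) regression as a linear predictor.
# td = Timbral Distortion, l_dsd = Level of DSD, d_dsd = Dynamicity of DSD;
# the example scores below are hypothetical.

def predict_baq(td, l_dsd, d_dsd):
    """Predict Basic Audio Quality from the three attribute scores."""
    return -0.668 * td - 0.350 * l_dsd - 0.179 * d_dsd + 86.45

print(predict_baq(td=20.0, l_dsd=15.0, d_dsd=10.0))  # ~66.05
```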
Demo II: Assessing Envelopment, an Example of Direct Anchoring – George et al. (2008)
Interface allowing assessment of envelopment arising from surround recordings.
References
Cheer, J. (2008). The Investigation of Potential Biases in Speech Quality Assessment. Final Year Technical Project, Institute of Sound Recording, University of Surrey, UK.
Conetta, R. (2007). Scaling and Predicting Spatial Attributes of Reproduced Sound Using an Artificial Listener. MPhil/PhD Upgrade Report, Institute of Sound Recording, University of Surrey.
George, S., Zielinski, S., Rumsey, F., and Bech, S. (2008). Evaluating the Sensation of Envelopment Arising from 5-Channel Surround Sound Recordings. Presented at the 124th AES Convention, Paper 7298, Amsterdam, The Netherlands.
Jiao, Y., Zielinski, S., and Rumsey, F. (2007). Adaptive Karhunen-Loeve Transform for Multichannel Audio. Presented at the 123rd AES Convention, Paper 7298, New York.
Narens, L. (1996). A Theory of Ratio Magnitude Estimation. Journal of Mathematical Psychology, Vol. 40, pp. 109-129.
Poulton, E. C. (1989). Bias in Quantifying Judgments. Lawrence Erlbaum, London.
Rumsey, F. (2002). Spatial Quality Evaluation for Reproduced Sound: Terminology, Meaning, and a Scene-Based Paradigm. J. Audio Eng. Soc., Vol. 50, No. 9, pp. 651-666.
Zielinski, S., Rumsey, F., and Bech, S. (2003). Effects of Down-Mix Algorithms on Quality of Surround Audio. J. Audio Eng. Soc., Vol. 51, No. 9, pp. 780-798.
Zielinski, S., Rumsey, F., Kassier, R., and Bech, S. (2005). Comparison of Basic Audio Quality and Timbral and Spatial Fidelity Changes Caused by Limitation of Bandwidth and by Down-Mix Algorithms in 5.1 Surround Audio Systems. J. Audio Eng. Soc., Vol. 53, No. 3, pp. 174-192.
Zielinski, S., Hardisty, P., Hummersone, Ch., and Rumsey, F. (2007). Potential Biases in MUSHRA Listening Tests. Presented at the 123rd AES Convention, Paper 7179, New York, NY, USA, 5-8 October.
Zielinski, S., Rumsey, F., and Bech, S. (2008). On Some Biases Encountered in Modern Audio Quality Listening Tests – A Review. J. Audio Eng. Soc., Vol. 56, No. 6, pp. 427-451.
Acknowledgment
Some of this work was carried out with the financial support of
the Engineering and Physical Sciences Research Council, UK
(Grant EP/C527100/1).