Speech signal processing for media accessibility

ITU Workshop on “Making Media Accessible to all: The options and the economics”

(Geneva, Switzerland, 24 (p.m.) – 25 October 2013)

Speech signal processing for media accessibility

Takayuki Ito, Dr. Eng.

Executive Research Engineer,

NHK Engineering System, Inc.

itou.takayuki@nes.or.jp

Geneva, Switzerland, 24 October 2013

Ageing: A Global Issue

The proportion of elderly persons in the population is increasing globally because fertility rates are declining.

Japan, population aged 65 and over: 23% in 2010, 36% in 2040

Elderly persons need to be given the opportunity to continue contributing to society.

(UN 2002 Madrid International Plan of Action on Ageing)

From “supported” to “supporting”

Ageing: degradation of hearing

Hearing loss, especially at higher frequencies

Hearing aids are available.

Background sound interferes with understanding speech.

A better mixing balance for TV programs is needed.

Degradation of cognitive speed

A slower speech rate is preferable.

Compensating for these degradations makes social participation easier for the elderly.

Speech rate conversion technology

Speech rate conversion for elderly people

Elderly people sometimes complain: “Speech in recent TV programs is too fast for me to understand.”

There is a need to slow down the speech rate without degrading sound quality.

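[Figure: waveform segments ① to ⑩ on a time axis, in four rows. Faster: segments ③ and ⑧ are dropped (①②④⑤⑥⑦⑨⑩). Original: ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩. Slower: segments ③ and ⑧ are repeated (①② ③ ③ ④⑤⑥⑦ ⑧ ⑧ ⑨⑩). Analog elongation: all segments ① to ⑩ are stretched along the time axis.]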

TV and radio set with “Slow button”
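The segment diagram on this slide can be sketched in a few lines of code. This is only an illustration of the repeat/drop idea, assuming the speech has already been cut into short waveform segments; the function name `convert_rate` and its parameters are hypothetical, and a practical system would also need to choose segment boundaries carefully and smooth the joins, which is not shown here.

```python
def convert_rate(segments, repeat=(), drop=()):
    """Rebuild a segment sequence, repeating or dropping chosen segments.

    `segments` is a list of waveform segments (any objects).
    `repeat` and `drop` hold 1-based segment numbers, as in the slide.
    """
    out = []
    for number, segment in enumerate(segments, start=1):
        if number in drop:
            continue            # skipping a segment shortens the speech
        out.append(segment)
        if number in repeat:
            out.append(segment)  # playing a segment twice lengthens it
    return out


# Segments are stood in for by their numbers 1..10.
original = list(range(1, 11))
slower = convert_rate(original, repeat={3, 8})
faster = convert_rate(original, drop={3, 8})
```

Because whole segments are repeated or removed rather than stretched, each segment keeps its original waveform, which is the contrast with the "analog elongation" row.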

Speech rate conversion without changing the length of streaming data

[Figure: original and converted waveforms on a common time axis. The starts of speech coincide at the blue line positions. In between, the starts do not coincide, but later they coincide again.]
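One way to keep the overall length unchanged is to lengthen the speech parts and take the extra time out of the pauses between them. The sketch below works on segment durations only and assumes that this pause-shortening approach is acceptable; the slide does not state the actual method, and the name `fit_to_original_length` and the `"speech"` / `"pause"` labels are hypothetical.

```python
def fit_to_original_length(segments, stretch):
    """Stretch speech by `stretch` and shrink pauses to keep the total length.

    `segments` is a list of (kind, duration_in_seconds) pairs, where kind is
    "speech" or "pause". Returns a new list with adjusted durations.
    """
    speech_time = sum(d for kind, d in segments if kind == "speech")
    pause_time = sum(d for kind, d in segments if kind == "pause")
    extra = speech_time * (stretch - 1.0)   # time added by slowing the speech
    if extra > pause_time:
        raise ValueError("not enough pause time to absorb the slower speech")
    pause_scale = (pause_time - extra) / pause_time if pause_time else 1.0
    return [
        (kind, d * stretch if kind == "speech" else d * pause_scale)
        for kind, d in segments
    ]


stream = [("speech", 2.0), ("pause", 1.0), ("speech", 3.0), ("pause", 1.0)]
converted = fit_to_original_length(stream, stretch=1.2)
```

In this example the speech is 20% longer, each pause is halved, and the converted stream still lasts 7 seconds, so it catches up with the original again after each pause.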

Intelligible high speed speech for visually impaired people

Visually impaired people use fast replay to find the main idea in audio books or web pages (audio skimming).

[Figure: recorded data replayed n times faster. Original: BGM, silent and speech parts along a time axis. Converted (same length): the important speech parts are made slower so that they are easier to understand.]
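The "same length" constraint in this figure can be written as simple arithmetic: if the important speech parts are replayed more slowly than the overall n-times target, the remaining parts must be replayed faster to compensate. The sketch below is an assumption about how such a time budget could be split, not the method used in the presentation; `skim_rates` and the segment labels are hypothetical names.

```python
def skim_rates(segments, n, important_rate):
    """Return the replay rate needed for the non-important parts.

    `segments` is a list of (kind, duration_in_seconds) pairs; parts whose
    kind is "speech" are treated as important and replayed at
    `important_rate`. The whole recording must still finish in 1/n of its
    original duration.
    """
    total = sum(d for _, d in segments)
    important = sum(d for kind, d in segments if kind == "speech")
    other = total - important
    target = total / n                       # overall n-times replay budget
    remaining = target - important / important_rate
    if other == 0 or remaining <= 0:
        raise ValueError("cannot meet the target length with these rates")
    return other / remaining


recording = [("bgm", 4.0), ("speech", 6.0), ("silent", 2.0)]
other_rate = skim_rates(recording, n=2, important_rate=1.5)
```

Here the 12-second recording must finish in 6 seconds; replaying the speech at 1.5 times speed uses 4 seconds, so the BGM and silent parts are replayed at 3 times speed.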

Applications of speech rate conversion

Slower:

For elderly people

For foreign language learning

For people with learning disabilities

Faster:

Quick news Internet services

Audio skimming for visually impaired people

Clean audio

A TV receiver with clean audio dial

There are various ways to realize this.

For detailed information, please see FG AVA TR Part 12.

Receiver-side re-mixing for the elderly (Clean Audio)

Speech is separated from background sound using stereo correlation.

The estimated speech component is enhanced for clearer speech.

Speech and background (BG) sound are re-mixed at the listener's preferred ratio.

No changes are needed in production or transmission.

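[Block diagram: the broadcast sound (stereo signal) enters an adaptive filter, which yields estimated speech and estimated BG sound. Other blocks: a spectrum emphasizer, a voice detector that outputs a speech / non-speech flag, multipliers with gains α, β, γ and η, and a stage that re-mixes speech and BG with the specified ratio to produce the output sound.]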
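The re-mixing stage can be illustrated with a much simpler separation than the adaptive filter in the diagram. The sketch below assumes that speech is mixed identically into both stereo channels, so the part common to left and right is taken as the speech estimate and the rest as background. This mid/side split is a stand-in chosen for brevity, not the method of the presentation, and `remix`, `speech_gain` and `bg_gain` are hypothetical names.

```python
def remix(left, right, speech_gain, bg_gain):
    """Re-mix a stereo signal with separate gains for speech and background.

    `left` and `right` are equal-length lists of samples. The component that
    is identical in both channels is treated as speech (an assumption); the
    remainder of each channel is treated as background sound.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        speech = 0.5 * (l + r)      # correlated (centre) component
        bg_left = l - speech        # what is left over in each channel
        bg_right = r - speech
        out_left.append(speech_gain * speech + bg_gain * bg_left)
        out_right.append(speech_gain * speech + bg_gain * bg_right)
    return out_left, out_right


left, right = [1.0, 0.5], [0.0, 0.5]
unchanged = remix(left, right, speech_gain=1.0, bg_gain=1.0)
clearer = remix(left, right, speech_gain=2.0, bg_gain=0.5)
```

With both gains set to 1 the input is reproduced exactly, which is a useful sanity check; raising the speech gain and lowering the background gain corresponds to turning the clean audio dial toward speech.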

Demonstration of the receiver-side clear audio

Conclusions and Recommendations

Compensating for the degraded functions of the elderly helps their social participation.

Speech rate conversion and re-mixing of foreground/background (F/B) sounds are promising technologies for this purpose.

Broadcasters and TV manufacturers are encouraged to provide services and devices with these functions.

Refer to FG AVA Technical Report Part 12 for more information.

Clear audio in studio: mixing balance meter

Indicates the loudness-based mixing balance.

“Elderly emulation mode” indicates a better mix for the elderly.

Young mixing engineers can produce better-balanced audio for the elderly.

[Block diagram: in the studio, speech (narration etc.) and background sounds are combined into the mixed sound. The mixing balance meter calculates loudness and estimates the favorability of the mix level.]
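A loudness-based mixing balance can be thought of as the level difference between the speech and the background sounds. The sketch below uses plain RMS level in decibels as a rough stand-in for a loudness measure; a real meter would use a perceptual loudness measure, and the slide does not say which one. The names `level_db` and `mixing_balance_db` are hypothetical.

```python
import math


def level_db(samples):
    """Return the RMS level of `samples` in dB (relative to amplitude 1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")    # digital silence
    return 10.0 * math.log10(mean_square)


def mixing_balance_db(speech, background):
    """Return how many dB the speech sits above the background."""
    return level_db(speech) - level_db(background)


speech = [0.5, -0.5, 0.5, -0.5]
background = [0.05, -0.05, 0.05, -0.05]
balance = mixing_balance_db(speech, background)
```

In this example the speech amplitude is ten times the background amplitude, which corresponds to a balance of about 20 dB. An "elderly emulation mode" would then judge this value against a preference model for older listeners, which is not sketched here.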
