Aline Remael

Live subtitling with speech recognition
Pilot research project and training at the University of Antwerp and Artesis University College.
I. Research: Tijs Delbeke (research assistant), Mariëlle Leijten, Aline Remael & Luuk Van Waes (supervisors)
II. Training: Veerle Haverhals (Artesis/VTM)
Today’s programme
I. Research at UA-AHA (Oct. 2008 - Jan. 2009)
1. Observational research
2. Experimental research (data to be processed)
II. Training: research & practical at UA-AHA
1. MA dissertations (UA & Artesis)
2. Within the MA in translation/interpreting at Artesis
3. Course structure & content at Artesis
Purpose of the Research
Short term:
• Create a classification of different types of reduction,
error (production), delay and their interaction (delay =
dependent variable)
Longer term:
• Identify the ‘ideal reduction rate’
• Identify the ideal respeaker-profile
• Improve live-subtitling procedures
Two stages in research (both with ‘Inputlog’):
• Observational research
- ‘Real live’ footage
- Sports programs
- Observational
• Experimental research
- Recorded ‘as live’ footage
- Talk show
- Experimentally controlled
Participants
• 12 live subtitlers
• Flemish Public Television (VRT)
• 8 men, 4 women
• Various experience levels (1-7 years)
I. 1. Observational Research
1. Live subtitling process: a schematic overview
2. Corpus
3. Reduction
4. Delay
5. Error production
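1.1. Production of live subtitles: overview
Diagram: spoken TV comment > respeaking > speech recognition > subtitle, with stages numbered (1) to (3).
Timing labels: x (spoken comment) and x+t (subtitle).
Processes along the chain: reduction, correction, error production, delay.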
1.2. First corpus
• Flemish Public Television (VRT)
• 15 hours of sports programs
• Transcriptions & broadcast subtitles
• Time stamps
• Character & word counts
• Audio recordings
• Detailed logging data (Inputlog)
- Speech input
- Keystrokes
- Mouse movements
1.3. Reduction
• Verbatim vs. reduced/summarized/edited/condensed
--> Continuum
--> Largely program dependent
--> Reduction crucial:
- Slower readers
- Speech recognition constraints
- Quantitative analysis
- Qualitative analysis
Reduction
Quantitative analysis
• -30% (football)
• -45% (tennis)
• -60% (cycling)
Reduction table, example
Reduction (2)
Qualitative analysis
• Causes of reduction
• Reduction classification
- Literature: only vaguely described
- 3 main classes
- 30 categories
Reduction (3)
Qualitative analysis
- Reduction to prevent delay (49%)
- Forced Reduction (22%)
- Time-induced reduction (15%)
Reduction (4)
Qualitative analysis
• Prevention of delay
- Deletion of redundant info
Repetition, obvious element, hesitation, interjection, …
SPOKEN COMMENT: But they can forget about that, I think.
SUBTITLE: They can forget about that
- Substitution
Names, metaphors, idioms, …
Reduction (5)
Qualitative analysis
• Forced reduction
Erroneous grammatical construction, too difficult for respeaker/speech recognizer, meaning unclear, …
• Time-induced reduction
Complicated interaction, sudden event, prepared title coming up, not relevant anymore, …
SPOKEN COMMENT: Iachtchouk. De Smet. Passes back. Van Mol. De Sutter. Crosses. Yes. Cercle Brugge very dangerous using that combination.
SUBTITLE: Cercle very dangerous using that combination.
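The reduction in the football example above can be put into numbers. The following is a minimal sketch that assumes reduction is measured on whitespace-separated word counts; the slides mention both character and word counts without saying which one the reported percentages use, and the function names are illustrative only.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simple word definition)."""
    return len(text.split())


def reduction_rate(spoken: str, subtitle: str) -> float:
    """Share of spoken words that did not reach the subtitle (0.0 to 1.0)."""
    spoken_words = word_count(spoken)
    if spoken_words == 0:
        raise ValueError("spoken comment is empty")
    return 1 - word_count(subtitle) / spoken_words


# Example taken from the slide on forced and time-induced reduction.
spoken = ("Iachtchouk. De Smet. Passes back. Van Mol. De Sutter. Crosses. Yes. "
          "Cercle Brugge very dangerous using that combination.")
subtitle = "Cercle very dangerous using that combination."

rate = reduction_rate(spoken, subtitle)
print(f"{word_count(spoken)} spoken words, {word_count(subtitle)} subtitle words")
print(f"reduction: {rate:.1%}")
```

Under this word-based definition, the single example is reduced more strongly than the programme-level averages reported for football, which is expected for a passage chosen to illustrate heavy reduction.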
1.4. Delay
Factors
• Block mode vs. scrolling mode
• Additional corrector vs. self correction
• Reduction degree (mutual process)
Delay table, example
• 6 sec : cycling
• 11 sec : football & tennis
(-30% red.)
(-45 & -60% red.)
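Delay figures like these can be derived from the time stamps in the corpus. The sketch below assumes each subtitle is paired with the moment its source comment was spoken (x) and the moment the subtitle appeared (x+t), so that the delay is t. The time stamps in the example are hypothetical and are not taken from the study.

```python
from statistics import mean


def delays(pairs):
    """Return the delay t in seconds for each (spoken_at, shown_at) pair."""
    result = []
    for spoken_at, shown_at in pairs:
        if shown_at < spoken_at:
            raise ValueError("subtitle cannot appear before the spoken comment")
        result.append(shown_at - spoken_at)
    return result


# Hypothetical time stamps in seconds: (x, x + t)
timestamps = [(0.0, 6.5), (4.0, 9.0), (10.0, 17.0)]

per_title = delays(timestamps)
print("delay per title (s):", per_title)
print(f"mean delay: {mean(per_title):.1f} s")
```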
1.5. Error production
• 6 fragments of 60 titles
Quantitatively
Pure recognition:
• Title level: 72.22% correct (about 7 out of 10 titles)
After correction:
• 84% of errors corrected --> 93% of titles correct.
• 22% by respeaker vs. 78% by corrector
• 12% with speech vs. 88% with keyboard and mouse
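The title-level percentage can be reproduced with a short calculation. This sketch assumes that "6 fragments of 60 titles" means 360 titles in total. The count of 260 correct titles is an inference that happens to match 72.22%, not a number reported in the slides.

```python
def title_accuracy(correct_titles: int, total_titles: int) -> float:
    """Fraction of subtitles that contain no error."""
    if total_titles <= 0:
        raise ValueError("total_titles must be positive")
    return correct_titles / total_titles


FRAGMENTS = 6
TITLES_PER_FRAGMENT = 60
total = FRAGMENTS * TITLES_PER_FRAGMENT  # assumption: 360 titles overall

correct_before_correction = 260  # inferred, not stated in the slides
accuracy = title_accuracy(correct_before_correction, total)
print(f"pure recognition accuracy: {accuracy:.2%}")
```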
1.5. Error production (2)
Qualitatively
• Classification model
• Based on Karat (1999) & Leijten (2007)
1.5. Error production (3)
1. Technical errors (71.6%)
- a. Erroneous recognition
» i. One word
» ii. Multiple words
» iii. Proper names (20.6%)
» iv. Geographical names
- b. Erroneous interpretation
» i. Command as text
» ii. Text as command
» iii. Word as letter
» iv. Letter as word
» v. Abbreviations or acronyms as words
- c. Programming errors
» i. Grammatical error
» ii. Background noise as text
» iii. Crash
1.5. Error production (4)
2. Human errors (14.3%)
- a. (Corrector)
- b. Respeaker
» i. Misinterpretation
» ii. Wrong word
» iii. Additions or transformations
» iv. Formal revision
3. Technical & human errors (1.6%)
- Slurred speech/mumbling or inaccurate recognition?
4. Other errors (12.5%)
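The four top-level classes and their shares can be held in a small data structure, which makes it easy to check that the percentages account for all errors. This is only an illustrative sketch: the dictionary layout and key names are assumptions, while the class names and percentages come from the classification above.

```python
import math

# Top-level error classes with their share of all errors (in percent)
# and the second-level subclasses named in the classification.
TAXONOMY = {
    "Technical errors": {
        "share": 71.6,
        "subclasses": ["Erroneous recognition",
                       "Erroneous interpretation",
                       "Programming errors"],
    },
    "Human errors": {"share": 14.3, "subclasses": ["Corrector", "Respeaker"]},
    "Technical & human errors": {"share": 1.6, "subclasses": []},
    "Other errors": {"share": 12.5, "subclasses": []},
}

total_share = sum(entry["share"] for entry in TAXONOMY.values())
assert math.isclose(total_share, 100.0, abs_tol=1e-9)

# List the classes from most to least frequent.
ranked = sorted(TAXONOMY.items(), key=lambda kv: kv[1]["share"], reverse=True)
for name, entry in ranked:
    print(f"{name:<26} {entry['share']:5.1f}%")
```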
2. Experimental Research
• Infotainment talk show ‘Phara’
• 3 excerpts (15 minutes)
2.1 Method: procedure
• Backward Digit Span
• Reading task
• Verbatim subtitling (9 min)
Aim at 100% subtitling. Quantity > Quality.
• Summarized subtitling (15 min)
Aim at 50% subtitling. Quantity = Quality. (usual)
• Heavily reduced subtitling (15 min)
Aim at 25% subtitling. Quantity < Quality. (no errors)
• Concluding interview
2.2 Results
Quantitative analyses of 1 excerpt
• Reduction
• Error production
• Relation reduction & error production
2.2 Results: Reduction (1)
Figure: Subtitling percentage as a function of reduction mode. X-axis: reduction mode (1, 2, 3); left axis: subtitling % (0% to 100%); right axis: titles per minute (8.00 to 12.50). Series: demanded subtitling %, performed subtitling %, subtitles (number).
2.2 Results: Reduction (2)
• Fairly inaccurate execution of demanded reduction mode
- Subtitling percentage lower than demanded
• Verbatim (100%) --> 51%
• Summarized (50%) --> 38%
--> Important: Theoretical Optimum
Stop words
Repetitions
Hesitations …
- Subtitling percentage higher than demanded
• Highly reduced (25%) --> 35%
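The gap between demanded and performed subtitling percentage can be computed per reduction mode. The sketch below uses only the six percentages reported above; the variable and function names are illustrative.

```python
# (demanded %, performed %) per reduction mode, as reported in the results
MODES = {
    "verbatim": (100, 51),
    "summarized": (50, 38),
    "highly reduced": (25, 35),
}


def deviation(demanded: int, performed: int) -> int:
    """Performed minus demanded, in percentage points (negative = under target)."""
    return performed - demanded


deviations = {mode: deviation(d, p) for mode, (d, p) in MODES.items()}

for mode, (demanded, performed) in MODES.items():
    diff = deviations[mode]
    direction = "below" if diff < 0 else "above"
    print(f"{mode:<15} demanded {demanded:3d}%  performed {performed:3d}%  "
          f"{abs(diff)} points {direction} target")
```

The verbatim mode misses its target by the widest margin, which fits the remark about a theoretical optimum: stop words, repetitions and hesitations are dropped even when subtitlers aim for completeness.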
2.2 Results: Reduction (3)
• Reduction mode affects number of broadcast subtitles
--> Less reduction = more titles
• Reduction mode moderately affects subtitle length
--> Longer titles for verbatim mode
2.2 Results: Error Production
Figure: Error rate per reduction mode. X-axis: reduction mode (1, 2, 3); y-axis: error % (0.00% to 30.00%). Series: title level, word level.
2.2 Results: Error Production (2)
Accuracy per reduction mode
Red. mode        Title level   Word level
Verbatim         73%           95%
Summarized       89%           98%
Highly reduced   96%           99.5%
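The accuracy table can be converted into error rates, which is the form used in the concluding remarks ("error in 3 out of 10 subtitles"). The following is a small sketch using only the percentages from the table; the names are illustrative.

```python
# Accuracy in percent per reduction mode, at title and word level
ACCURACY = {
    "verbatim": {"title": 73.0, "word": 95.0},
    "summarized": {"title": 89.0, "word": 98.0},
    "highly reduced": {"title": 96.0, "word": 99.5},
}


def errors_per_ten(accuracy_pct: float) -> float:
    """Expected number of erroneous items per ten, given accuracy in percent."""
    return (100.0 - accuracy_pct) / 10.0


title_errors = {mode: errors_per_ten(acc["title"]) for mode, acc in ACCURACY.items()}

for mode, value in title_errors.items():
    print(f"{mode:<15} about {value:.1f} erroneous titles per 10")
```

Rounded to whole titles, verbatim subtitling gives roughly 3 erroneous titles in 10 and summarized subtitling roughly 1 in 10.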
2.3 Concluding remarks
• Indication of maximal performance (verbatim subtitling)
- Error in 3 out of 10 subtitles
• Indication of ‘normal’ performance
- Error in 1 out of 10 subtitles
• Subtitle production drops after 10 minutes
• More reduction yields more accurate subtitling
II. Training: 1. MA dissertations
MA dissertations in support of ongoing research: error
analyses, trial classifications, reception research, Dragon
training, …
- UA (Master in multilingual business communication)
- Artesis (Interpreting, 2007-2008)
II. Training: 2. Interpreting – general (1)
At Artesis:
- MA in Interpreting
- European Master in Conference Interpreting
II. Training: 2. Interpreting – general (2)
At Artesis:
MA in Interpreting = initiation in different types
Community Interpreting
Business Interpreting
Includes consecutive interpreting, speech training, research topics, institutions, …
Option: Live subtitling with speech recognition (Dragon)
II. Training: 2. Interpreting – Live subtitling
Research training (beside MA theses)
- Within interpreting programme Artesis
- Within AVT programme Artesis
Practical training
- Within translation programme: subtitling (sem. 1)
- Within interpreting programme Artesis: live subtitling (sem. 2)
Veerle Haverhals: MA in interpreting and full-time respeaker at VTM
II. Training: 2. Interpreting – Live subtitling: course topics practical training (1)
- Initiation to DRAGON: make a profile, try out all the functions, add terminology and test it.
- Working with codes, anticipating mistakes (e.g. TOX-Leterme)
- Test accuracy of the above with CRER (terminology added or not, terminology without ‘TOX’): get acquainted with errors.
II. Training: 2. Interpreting – Live subtitling: course topics practical training (2)
- Live subtitling in Flanders & the Netherlands: programmes, challenges, speed, different speakers + examples
- Visit to VRT: live cycling session
- Introduction to “News production” at VTM, in preparation for the internship at VTM
II. Training: 2. Interpreting – Live subtitling: course topics practical training (3)
Series of sessions to train respeaking (to be expanded)
- Summarizing for deaf/hard of hearing (choice of words)
- The use of colours (or not)
- Multi-tasking in real time: corrections, colours
- Seek compromise: completeness/errors
II. Training: 2. Interpreting – Live subtitling: course topics practical training (4)
Special issues:
- Linguistic variation (or not)
- Onomatopoeia (or not)
II. Training: 2. Interpreting – Live subtitling: course topics practical training (5)
One-day internship at VTM
- Watch news broadcast + question time
- Live simulation of the one o’clock news
Preparation (cf. above):
Learning to use the software(s), marking live passages, combining prepared with live, studying key codes, forwarding the subtitles, correcting and forwarding, …
Literature
• Baaring, I. (2006). "Respeaking-based online subtitling in Denmark." inTRAlinea. Special issue: Respeaking.
• Daelemans, W., A. Höthker, et al. (2004). "Automatic Sentence Simplification for Subtitling in Dutch and English." Proceedings of the 4th International Conference on Language Resources and Evaluation: 1045-1048.
• de Korte, T. (2006). "Live inter-lingual subtitling in the Netherlands." inTRAlinea. Special issue: Respeaking.
• Den Boer, C. (2001). "Live interlingual subtitling." In Gambier & Gottlieb (2001).
• Gambier, Y. and H. Gottlieb, Eds. (2001). (Multi) Media Translation. Concepts, Practices, and Research.
• Jones, R. (2002). Conference Interpreting Explained.
• Karat, C. et al. (1999). "Patterns of entry and correction in large vocabulary continuous speech recognition systems." Paper presented at CHI 99, Pittsburgh.
• Lambourne, A. (2006). "Subtitle respeaking." inTRAlinea. Special issue: Respeaking.
• Lambourne, A., J. Hewitt, et al. (2004). "Speech-based Real-time Subtitling Services." International Journal of Speech Technology 7: 269-279.
• Leijten, M. (2007). "Writing and Speech Recognition: Observing Error and Correction Strategies of Professional Writers." Utrecht: LOT.
• MacArthur, C. A. (2006). The Effects of New Technologies on Writing Processes. Handbook of Writing Research. C. A. MacArthur, S. Graham and J. Fitzgerald.
• Mack, G. (2006). "Detto scritto: un fenomeno, tanti nomi." inTRAlinea. Special issue: Respeaking.
• Ogata, J. and M. Goto (2005). "Speech Repair: Quick Error Correction Just by Using Selection Operation for Speech Input Interfaces." Proceedings of Interspeech 2005: 133-136.
• Remael, A. (2004). Vertaling in beeld: audiovisuele vertaling en ondertitels.
• Robson, G. D. (2004). The Closed Captioning Handbook.
• Slembrouck, S. and M. Van Herrewege (2004). Teletekstondertiteling en tussentaal: de pragmatiek van het alledaagse. Schatbewaarder van de taal. Johan Taeldeman. Liber amicorum. J. De Caluwe, G. De Schutter, M. Devos and J. Van Keymeulen.
• van der Veer, B. (2008). De tolk als respeaker: een kwestie van training.
• Wald, M., Boulain, P., Bell, J., Doody, K. and Gerrard, J. (2007). "Correcting Automatic Speech Recognition Errors in Real Time." International Journal of Speech Technology.
Thank you for your attention