Results of the STEVIN programme
STEVIN Final Event, Rotterdam, Nov 28 2011
Jan Odijk
Overview
• STEVIN Objectives
• Digital Language Infrastructure
– Creation
– Resource Management
– IPR
• Strategic Research
• LST Community Consolidation
• Various Statistics
STEVIN Objectives
• Digital Language Infrastructure (DLI)
• Strategic Research (SR)
• LST community consolidation (CC)
Digital Language Infrastructure
• Creation
• Resource Management
• IPR
DLI: Creation
• Priorities for written language:
a. A large corpus of written Dutch
b. An electronic lexicon
c. Parallel corpora
Realisation: Written (1)
• D-COI + SONAR: 500M-word corpus (a)
• LASSY: 1M-word treebank (a)
• CORNETTO: 40k-entry lexical semantic database (b)
• DPC: 10M-word parallel corpus Dutch-English / Dutch-French (c)
Realisation: Written (2)
• COREA: co-reference corpus (a)
• IRME: 5k MWE lexical database (b)
• DAESO: 1M-word monolingual parallel corpus (c)
• DAISY (a)
• DUOMAN (a)
• PACO-MT (a,c)
Creation: Priorities Speech (1)
a. speech and multimodal corpora for CALL, NAW and CCQA applications;
b. multimodal corpora for
– broadcast news transcription or
– person identification;
c. text corpora for stochastic language models;
Creation: Priorities Speech (2)
d. tools and data for the development of
– robust speech recognition;
– automatic annotation of corpora;
e. speech synthesis.
Realisation: Speech (1)
• AUTONOMATA (a, NAW; e)
• JASMIN-CGN (a, CALL)
• D-COI + SONAR (c)
• SPRAAK (d)
• STEVINcanPRAAT (d)
Realisation: Speech (2)
• Missing
– (b) multimodal corpora
• But partially covered by other projects
– EU: AMI, AMIDA (U Twente)
– NL: IMIX
DLI: Resource Management
• HLT Agency set up
• See presentation by Remco van Veenendaal
DLI: IPR
• Systematic attention to IPR and ethical issues from the start
– Not easy, but
– the only way to ensure that the R&D community can use LRs legally
• Specific regulations on how to deal with IPR in the STEVIN programme and its projects
Strategic Research
• Will be dealt with by Walter in his presentation
• The work programme lists example applications
– How do STEVIN projects contribute to such applications (directly or indirectly)?
SR: Applications (1)
• Information extraction from speech:
– Rechtspraakherkenning, NEON and SNRT
– AUTONOMATA, JASMIN-CGN, SPRAAK, STEVINcanPRAAT, N-BEST, AUTONOMATA TOO and MIDAS
• Detection of accent and identity of speakers
– JASMIN-CGN, SPRAAK, DISCO, Diademo, Rechtspraakherkenning
SR: Applications (2)
• Extraction of information from (monolingual or multilingual) text
– DAESO, DUOMAN, Gemeenteconnect and YourNews
– COREA, IRME, D-COI, SONAR, DPC, LASSY, CORNETTO and PACO-MT
• Semantic web:
– CORNETTO, D-COI and SONAR
SR: Applications (3)
• Dialogue systems and Q&A solutions
– DAISY, DUOMAN, Gemeenteconnect, Web Assess
• Automatic summarization and text generation
– DAESO, Web Assess
– D-COI and SONAR
SR: Applications (4)
• Automatic translation
– DPC, PACO-MT
– D-COI, SONAR, LASSY, IRME, COREA, CORNETTO
• Educational systems
– DISCO, SpelSpiek, Primus, HATCI, WooDy, AAP
– All resource creation projects
LST Community Consolidation
• Create networks
• Consolidate LST activities
• Educate new experts
• Promote discussion
• Promote transfer of knowledge
LST Community Consolidation
• Set aside a specific budget and a dedicated WG
• Joint KI/SME and NL/FL projects preferred
– 330 binary cooperation link occurrences
• Demonstration projects stimulated companies to participate
LST Community Consolidation
• Educational projects (3)
• Master classes (2)
• Networking events organized
– brokerage events, “Taal in Bedrijf” (‘language@work’), STEVIN programme meetings, etc.
• Networking events supported
– e.g. CLIN, Interspeech 2007, ICT-Delta
Money Distribution
• R&D: 76.0%
• Demonstration: 8.5%
• Supporting Activities: 6.0%
• HLT Agency: 2.5%
• STEVIN Management: 6.5%
Strata Coverage
• Basic resources for LST: 51.1%
• Basic research: 23.3%
• Application-oriented research: 15.4%
• Demonstration projects: 10.2%
NL / FL Proportion
• R&D projects: 63%:37%
• Demonstrator projects: 66%:34%
• Overall: 64%:36%
• Educational projects (3): 68%:32%
• Master classes (2): 100%:0%
KI / SME Proportion
• Money: 83%:17%
• R&D projects (by project): 19:13
• R&D projects (by number of participations): 80%:20%
• Demonstration projects: 15%:85%
• Master classes: 0%:100%
• Education activities: 83%:17%
Language / Speech
• Money: 53.1%:46.9%
Funded v. Submitted
• R&D count 1: 19/52 (36.5%)
• R&D count 2: 19/68 (27.9%)
• Demonstration: 14/41 (34.1%)
• Educational: 3/5 (60%)
• Master classes: 2/3 (66.7%)
• Most proposals were very good
– So many more projects could and should be done
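The success rates on the Funded v. Submitted slide are simple ratios of funded to submitted proposals, expressed as a percentage rounded to one decimal. As a quick arithmetic check, the minimal Python sketch below recomputes each rate from the counts listed on that slide; the names `CALLS` and `success_rate` are hypothetical, chosen only for this illustration, and are not part of any STEVIN tooling.

```python
# Funded vs. submitted proposal counts, copied from the slide.
# CALLS and success_rate are illustrative (hypothetical) names.
CALLS = {
    "R&D count 1": (19, 52),
    "R&D count 2": (19, 68),
    "Demonstration": (14, 41),
    "Educational": (3, 5),
    "Master classes": (2, 3),
}


def success_rate(funded: int, submitted: int) -> float:
    """Return the percentage of submitted proposals that were funded,
    rounded to one decimal place."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive count")
    return round(100 * funded / submitted, 1)


if __name__ == "__main__":
    # Print each call type in the same "funded/submitted (rate%)" layout as the slide.
    for name, (funded, submitted) in CALLS.items():
        print(f"{name}: {funded}/{submitted} ({success_rate(funded, submitted)}%)")
```

Rounding to one decimal mirrors the precision used on the slide; a different rounding rule (e.g. truncation) would change the last digit for ratios such as 2/3.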
Thanks for your attention!
Strategic Research
• Priorities for written language:
a. semantic analysis (tagging, integration with syntax and morphology)
b. text pre-processing (tokenization, spelling correction, named entity recognition, ...)
c. morphological analysis (compounding and derivation)
d. syntactic analysis: a robust parser for Dutch
SR: Realisation Written (1)
• COREA: co-reference resolution (a)
• IRME: MWE identification + lexical representation (d, a)
• LASSY: parser (d)
• DAESO: semantic relations and text-to-text generation (a)
SR: Realisation Written (2)
• DAISY: automatic summarization (a)
• DUOMAN: attitude detection (a)
• PACO-MT: machine translation (d, a)
• D-COI / SONAR (a, b)
• Lacking: (c) morphological analysis for derivation and compounding
SR: Priorities Speech
a. robustness of speech recognition;
b. output treatment (inverse text normalization);
c. confidence measures;
d. adaptation;
e. lattices.
SR: Realisation Speech
• AUTONOMATA (a)
• MIDAS (a)
• N-BEST (a)
• SPRAAK (a, b, c, d, e)
• DISCO (a + CALL priority)
• AUTONOMATA TOO (a)