META-NET and META-SHARE: An Overview

Infrastructures and plans boosting Language Technology Research and Innovation
Stelios Piperidis
Athena RC, Greece
spip@ilsp.gr
Co-funded by the 7th Framework Programme of
the European Commission through the contract
T4ME, grant agreement no.: 249119.
Multilingual Europe



Challenge: Providing each language community with the most
advanced technologies for communication and information so that
maintaining their mother tongue does not turn into a disadvantage.
While research has made considerable progress in recent years, the
pace of progress is not fast enough to meet the challenge within the
next 10-20 years.
All stakeholders – researchers, LT user and provider industries,
language communities, funding programmes, policy makers –
should team up for a major dedicated push.
http://www.meta-net.eu
Objectives
META-NET is a network of excellence dedicated to fostering the technological foundations of the European multilingual information society.
Four EU-Funded Projects




Initial project: T4ME (FP7; 13 partners, 10 countries).
Three ICT-PSP consortia since Feb. 2011: CESAR, METANET4U, META-NORD.
All EU member states and several non-member states covered.
META-NET in Nov. 2012: 60 members in 34 countries.
http://www.meta-net.eu/members
META-VISION
Language White Paper Series
Language White Paper Series




Reports on the state of our languages in the digital age and the level of support through language technology.
Series covers 30 languages.
Key communication instruments to address decision makers and journalists.
Inform about societal and technological problems and challenges as well as economic opportunities.
>2 years in the making.
>200 national experts as contributors.
>8,000 copies printed and distributed to politicians and journalists.
30 Languages Covered










Basque, Bulgarian*, Catalan, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*,
Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*,
Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Croatian
* = Official EU language
Cross-Lingual Ranking

In four application areas, each language is assigned to one of five clusters, ranging from excellent LT support to weak/no support:
1. Machine Translation
2. Speech Processing
3. Text Analysis
4. Resources

Results finalised at a meeting in Berlin with representatives of all 30 languages (October 21/22, 2011).
Machine Translation (MT)
excellent: (no language)
good: English
moderate: French, Spanish
fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish

Text Analysis
excellent: (no language)
good: English
moderate: Dutch, French, German, Italian, Spanish
fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian

Speech
excellent: (no language)
good: English
moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian

Resources
excellent: (no language)
good: English
moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese
Europe’s Languages and LT
Language groups, ordered from good support through Language Technology (first line) to weak or no support (last line):
English
Dutch, French, German, Italian, Spanish
Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish
Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian
Key Observations

When it comes to Language Technology support, there are massive differences between Europe’s languages and between technology areas.
LT support for English is ahead of that for any other language.
Even support for English is far from perfect.
The gap between English and the other languages keeps widening!
Several languages (Icelandic, Latvian, Lithuanian, Maltese) receive the weakest score in all four areas!
At least 21 European languages are in danger of digital extinction! (Languages put into the “weak or no support” category at least once.)
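
The two counts above can be rechecked from the "weak or no support" lists on the ranking slide. The following is a minimal sketch in Python: the language lists are copied from that slide, while the variable names and the pairing of each list with an area label are assumptions made for illustration. Both checks are set operations over all four lists, so the result does not depend on which label a list carries.

```python
# "Weak or no support" lists copied from the cross-lingual ranking slide.
# The area labels are an assumption; the checks below only use the sets.
WEAK_OR_NO_SUPPORT = {
    "Machine Translation": {
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish",
    },
    "Text Analysis": {
        "Croatian", "Estonian", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Serbian",
    },
    "Speech": {
        "Croatian", "Icelandic", "Latvian", "Lithuanian", "Maltese",
        "Romanian",
    },
    "Resources": {
        "Icelandic", "Irish", "Latvian", "Lithuanian", "Maltese",
    },
}

# Languages placed in the weakest cluster at least once (set union).
at_least_once = set().union(*WEAK_OR_NO_SUPPORT.values())

# Languages placed in the weakest cluster in all four areas (intersection).
in_all_four = set.intersection(*WEAK_OR_NO_SUPPORT.values())

print(len(at_least_once))   # 21
print(sorted(in_all_four))  # ['Icelandic', 'Latvian', 'Lithuanian', 'Maltese']
```

The union gives the 21 languages behind the "digital extinction" figure, and the intersection gives the four languages that score weakest everywhere.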
META-VISION
Strategic Research Agenda
Three Ingredients
Appropriate Actors: Research & Commercialisation
Appropriate Programme: Vision & Agenda
Appropriate Support: Funding
Strategic Research Agenda




META-NET Strategic Research Agenda for Multilingual Europe 2020.
Addresses the problems we found during the white paper study.
Three priority research themes and application/innovation scenarios.
Can put Europe ahead of its competitors in this technology area.
190+ contributors.
Final version ready today!
SRA will be presented to the EC and national bodies.
Strategic Research Agenda
Priority Themes: 3 + 2

Three Priority Research Themes:
• Translation Cloud
• Social Intelligence and e-Participation
• Socially-Aware Interactive Assistant

Two additional themes:
• European Language Technology Platform
• Core Technologies for Language Analysis and Production
META-SHARE
Open Resource Infrastructure
The power of data





Scientific data has the potential to transform and drastically improve our lives.
Evidence from many domains (geo & earth sciences, biotechnology) shows that data & tools become valuable through opening and sharing
• Both for research and technology development & evaluation
• Supporting innovative applications
Making the Human Genome Project results accessible leveraged a ~€3 billion R&D investment into ~€500 billion in economic activity.
“Alzheimer’s researchers recently pooled genetic data and discovered 5 new genes and important evidence about the disease”
“Data is too valuable to be locked away”
Strategic Research Agenda
LRs in the SRA
LRs Discovery? Availability?



According to past and recent studies, only a portion of language resources (LRs) is known / announced / shared / traded / ...
… despite the fact that data collection, cleaning, annotation, curation and maintenance are very costly.
To make any progress and enable the development of useful applications, we need all those scientific, technical, legal, organisational and societal mechanisms that enable the necessary resources to be shared, recycled and repurposed.
META-SHARE rationale



Language resources (data and tools) are dynamic living entities
• they evolve over time in various dimensions (quantity, annotation levels, conversion to new formats, addition of new languages)
• they are usually the product of collaborative work
• they may come with varying restrictions, ...
Need solutions that enable every language resource provider, at any granularity level (individual/lab/organisation), to
• Create their own repository of LRs
• Describe, document and update LR descriptions
• Link to a network of repositories of other providers
• Keep track of the use of their LRs, trade LRs, …
Need solutions that enable every language resource consumer to
• Discover which LRs suitable for their purposes exist
• Get information about them and download / acquire them
META-SHARE: what it is


META-SHARE tries to match LR providers' and consumers' needs and expectations by enhancing the visibility, documentation, identification, availability and preservation of language data and (basic language processing) tools.
It launches a long-term multidimensional endeavour by which language resources will contribute to boosting research, technology and innovation through wide availability, pooling, openness and sharing.
META-SHARE architecture
[Figure: architecture diagram. Labels: META-SHARE portal; user-oriented and support services; registration, authentication, authorisation; search / browse, licence, download, statistics, mappings, reporting, recommenders, billing / payment; resources provision; META-SHARE inventory; META-SHARE services inventory; metadata harvesting; LR repositories, each with its own inventory; external repos.]
META-SHARE provider side

All facilities for creating your own META-SHARE-compliant repository and linking to the META-SHARE network:
• Open source repository software
• Functionalities for documenting, updating descriptions, storing/linking LRs
• Provider support services (helpdesks, forum, knowledge base)
• Each repository maintains an inventory with the metadata (MD) of all its LRs and exports MD for harvesting
• Harvested MD are stored in synchronised central servers
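
The harvesting flow described here (local inventories, metadata export, synchronised central servers) can be sketched as follows. This is an illustration only, not the META-SHARE software: every class, method and field name is hypothetical, and the sample records are made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical minimal description of one language resource."""
    resource_id: str
    title: str
    language: str


@dataclass
class Repository:
    """Provider-side repository that keeps its own inventory."""
    name: str
    inventory: dict[str, MetadataRecord] = field(default_factory=dict)

    def add(self, record: MetadataRecord) -> None:
        self.inventory[record.resource_id] = record

    def export_metadata(self) -> list[MetadataRecord]:
        # Only descriptions are exported; the resources themselves stay local.
        return list(self.inventory.values())


@dataclass
class CentralServer:
    """Central server storing harvested metadata, keyed by (repo, id)."""
    name: str
    inventory: dict[tuple[str, str], MetadataRecord] = field(default_factory=dict)

    def harvest(self, repo: Repository) -> None:
        for record in repo.export_metadata():
            self.inventory[(repo.name, record.resource_id)] = record

    def synchronise_from(self, other: "CentralServer") -> None:
        # Copy every record the other server holds.
        self.inventory.update(other.inventory)


# Made-up example data.
repo_a = Repository("repo-a")
repo_a.add(MetadataRecord("r1", "Example corpus", "Greek"))
repo_b = Repository("repo-b")
repo_b.add(MetadataRecord("r1", "Example lexicon", "Maltese"))

primary = CentralServer("central-1")
mirror = CentralServer("central-2")
for repo in (repo_a, repo_b):
    primary.harvest(repo)
mirror.synchronise_from(primary)

print(len(mirror.inventory))  # 2
```

Keying the central inventory by repository name and resource identifier is a design choice of this sketch; it keeps two providers from overwriting each other's records when their local identifiers collide.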
META-SHARE user side

Users (LR consumers) can
• search the central inventory
• browse using multiple facets
• access the actual resources by visiting the respective repositories to get legally interoperable licence(s) to download and use them
• get support through an online user forum and helpdesks dedicated to technical, metadata and legal issues
• access a knowledge base
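
Searching and faceted browsing over a central inventory can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the record fields (`title`, `language`, `type`, `repository`) and the function names are hypothetical, the records are made up, and the real portal's search is not described in these slides.

```python
from collections import Counter

# Made-up metadata records standing in for the central inventory.
CENTRAL_INVENTORY = [
    {"title": "Example news corpus", "language": "Greek",
     "type": "corpus", "repository": "repo-a"},
    {"title": "Example lexicon", "language": "Maltese",
     "type": "lexicon", "repository": "repo-b"},
    {"title": "Example speech corpus", "language": "Greek",
     "type": "corpus", "repository": "repo-b"},
]


def search(records, text):
    """Case-insensitive substring search over resource titles."""
    needle = text.lower()
    return [r for r in records if needle in r["title"].lower()]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records)


def browse(records, **selected):
    """Keep the records that match every selected facet value."""
    return [r for r in records
            if all(r.get(key) == value for key, value in selected.items())]


greek_corpora = browse(CENTRAL_INVENTORY, language="Greek", type="corpus")
print(facet_counts(CENTRAL_INVENTORY, "language"))
print([r["repository"] for r in greek_corpora])  # ['repo-a', 'repo-b']
```

Each hit carries the name of the repository that holds the resource, which matches the flow above: the inventory is used for discovery, and the resource itself is obtained from the respective repository.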
Join META-SHARE as ...
Third Party Consumers
Associate members
Depositing-only Members
Repository Service Providers
Hosting (non-local) repositories
Core and User Support Service Providers
Local repositories
Legal provisions for LR sharing
Language Resources Sharing Charter – high level principles
Memorandum of Understanding – aka membership agreement
Licensing templates and deposition agreements
• Inclusive mix of open and openness-inspired models
• Creative Commons licences (starting with Creative Commons Zero (CC-0) and all possible combinations along the CC differentiation of rights of use)
• META-SHARE Commons licences, a fully developed CC-based licensing tool that allows META-SHARE members to make their resources available inside the network only
• META-SHARE “No Redistribution” licences, allowing use and exploitation of the Resources while permitting the LR Owner to have full control over the Resource distribution
• Software tools and web services are provided either through one of the standard Open Source licences or under a custom commercial licence
META-SHARE today…


A network of 24 language resource repositories in 19 EU countries, with >1550 LRs
META-SHARE software, open source, under a permissive licence (BSD), to set up a language resource repository
Legal instruments catering for a range of uses
Software-based services for both LR providers and LR consumers
User support services
• User forum
• Helpdesks
Mapping services to big resource inventories (CLARIN, OLAC, …)
In the immediate future…





More META-SHARE nodes and respective language resources will be integrated – integration of ELRA-supported initiatives, LRE Map, Language Library
Adoption of the META-SHARE platform and framework by ELRA
Full deployment of the services of META-SHARE members – from software availability, maintenance and technical assistance to language resources storage and preservation as well as support related to metadata and legal issues
Coordination with upcoming initiatives (iCordi, Research Data Alliance, …)
Official launch: 25 January 2013
META-NET
Conclusions
Conclusions



Our white paper press campaign shows that Europe is extremely interested in and passionate about its languages.
Two Parliamentary Questions in the European Parliament on the “digital extinction of languages” topic.
Now is the time to move forward with a continent-wide, systematic push and to invest in strategic research.
A modest investment is required.
This push will generate countless opportunities.
Horizon 2020 and the Connecting Europe Facility can provide sufficient resources to make our visions for Europe’s citizens and economy a reality.
Q/A
Thank you very much!
office@meta-net.eu
http://www.meta-net.eu
http://www.facebook.com/META.Alliance