
Nordic co-operation on sign language language planning
Working group "Corpora for the needs of language planning"
Report of the ooVoo meeting 8.6.2012, 10.00 – 11.30 (Swedish/Danish time)
Present:
Jette Hedegaard Kristoffersen
Tommy Lyxell
Leena Savolainen (contact person of the group, wrote this report of ooVoo meeting)
Absent:
Tomas Hedberg
1) Briefly clarifying the task of the corpus group.
THE TASK IS to write an application for Nordic co-operation in SL language planning
corpus work (i.e. the task is not to do the corpus work, nor even to make decisions on
what these corpuses should look like).
We can apply for the Nordic money to be able to discuss, e.g., the following issues:
- What are the needs of SL language planning in each Nordic country or for each sign
language? What are the existing traditions (if any) of doing SL language planning in
each country? Shall we plan the corpus according to the needs of these traditions, or
shall we also attempt to create new means of doing language planning (and the big
question: "What is (or can be) language planning of sign languages?")?
- What kind of language material would a corpus for language planning consist of?
Planned language usage ("standard language usage", like TV news, signed texts
published by the authorities on the Internet etc.), as Leena and Karin suggested, or
something totally different, as Jette and Tommy saw the issue?
- What would be the essential annotation types ("annotation lines" e.g. in Elan) that
language planners would need?
Leena's comment (not discussed during Friday's ooVoo meeting):
While writing this report, I began to think that it may be difficult (or impossible) to
separate the discussion on "What is the language planning of sign languages?" from the
discussion on "What is a corpus made for the needs of SL language planning?". So, my
question is: should we actually apply for money for a seminar or a series of seminars
where we would first discuss the issue of "language planning of SLs in general" and
only after that the issue of corpuses made for the needs of language planning? To begin
with, what term(s) shall we use to refer to what we are doing: språkvård, språkplanering,
språkguidning (language guidance)?
In the ooVoo meeting held on 13 June 2012 it was decided not to start with a separate
seminar on SL language planning issues, but to integrate the matter into the corpus project.
2) Report from LREC "Corpus & lexicon workshop" (5th Workshop on the Representation
and Processing of Sign Languages: Interactions between Corpus and Lexicon. Language
Resources and Evaluation Conference (LREC) Istanbul, May 2012 http://www.signlang.uni-hamburg.de/lrec2012/programme.html).
No real report was given, as it turned out to be difficult to do that briefly: the LREC
workshop presented so many different kinds of approaches to SL corpuses. But the issue of
glosses, which was discussed quite widely at the LREC workshop, seemed to interest the
three of us as well.
Glosses are used in all sign language corpuses around the world, as they are the only easily
accessible, human-readable "names" (indexes) for the signs, and with them it is possible,
e.g., to show in a compact way which signs are used in a signed sentence on a video. With
luck, one may find in a corpus hundreds of contexts where a particular sign is used.
These instances of usage are convenient to look through as a list of glossed sentences.
Despite their handiness, glosses also cause big problems, as they distort people's
understanding of the meanings of the signs. A sign's meaning almost always differs in many
ways from that of the word used as its gloss. Further, glosses tend to bind a sign language
(at least in laymen's minds) to a particular spoken language, and people may get the
impression that there is a one-to-one relation between that particular spoken language and
the sign language in question. The problems that glosses cause could be one of the issues
discussed as part of the co-operation in corpus work. Even though we probably cannot
get rid of the glosses, we could try to find other ways to refer to the signs (e.g. still photos)
alongside the glosses, and together develop their production processes.
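As a rough illustration of what "looking through instances of usage as a list of glossed
sentences" can mean in practice, the following Python sketch builds a small concordance
from gloss annotations. The data layout (sentence ids mapped to lists of glosses) and the
glosses themselves are invented for this example; they are not taken from any of the
corpora or tools mentioned in this report.

```python
from collections import defaultdict

# Hypothetical glossed sentences: each sentence id maps to a list of gloss labels.
# Both the ids and the glosses are invented for this sketch.
SENTENCES = {
    "s1": ["INDEX-1", "HOUSE", "BUY"],
    "s2": ["HOUSE", "BIG"],
    "s3": ["INDEX-1", "CAR", "BUY", "WANT"],
}


def build_concordance(sentences):
    """Map each gloss to the (sentence id, position) pairs where it occurs."""
    concordance = defaultdict(list)
    for sentence_id, glosses in sentences.items():
        for position, gloss in enumerate(glosses):
            concordance[gloss].append((sentence_id, position))
    return dict(concordance)


def contexts(sentences, gloss):
    """Return every glossed sentence that contains the gloss, one line each."""
    return [
        f"{sentence_id}: {' '.join(glosses)}"
        for sentence_id, glosses in sentences.items()
        if gloss in glosses
    ]


if __name__ == "__main__":
    # List all contexts in which the sign glossed as BUY occurs.
    for line in contexts(SENTENCES, "BUY"):
        print(line)
```

Note that the lookup key here is the gloss itself, which is exactly the dependence on
glosses that the paragraph above describes as problematic.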
3) Co-operation within each country.
CLARIN http://www.clarin.eu/page/2799 was briefly discussed, and we agreed that in each
country those responsible for national CLARIN activities should be contacted and
negotiations started in order to get them interested in including material in the national
sign languages in their language banks as well. In Sweden and Finland the contacts with
CLARIN have already been established, and it is thus possible to mention these contacts
in the application. (And Leena adds outside our ooVoo meeting:) Other countries
may need more time to create their own network with the national spoken language corpus
people, and this work could be linked to the activities we are applying for money for. (?)
4) What is the framework for the application (schedule, requirements, format)?
No decisions have yet been made on whether we will apply for money for all three themes
(information on SLs, language planning corpus, material for teaching and learning SLs) this
year. Neither do we know where the groups would send their applications: to three
different sources of money?
During the meeting we agreed that Jette will contact Bodil and ask what would be the best
way to proceed, which sources we could apply to for Nordic money, etc.
We know that the "information on SLs" group has already contacted Bodil on this
matter.
We also agreed that Leena would write the application in Swedish and Tommy would then
take care of the final look of the text, "make it real Swedish". Jette, Tommy and Tomas
will comment on the text. The application and the material connected to it will be placed
on our new blog for others outside our group to see and comment on.
Another ooVoo meeting was scheduled for Wednesday 13.6. at 10 a.m. Swedish/Danish time
(lasting at most an hour), as we ran out of time and couldn't agree on all necessary issues,
such as who does what and when.
Report of the ooVoo meeting 13.6.2012, 10.00 – 11.10 (Swedish/Danish time)
Present:
Jette Hedegaard Kristoffersen
Tommy Lyxell
Leena Savolainen (contact person of the group, wrote this report)
Absent:
Tomas Hedberg
In the second meeting we continued the discussion about the contents and targets of the corpus
project and about writing the application.
In the application we must define what the language planning of sign languages consists of and how
it is done. But as such a definition is not available, we instead need to explain in the application
the varying language planning situation of each of the 6–7 Nordic sign languages.
Leena suggests (not discussed in the meeting): Tommy (and Tomas), Jette and Leena
write a short description of the language planning situation in their respective countries,
and we use these three ”reports” as a model of the kind of information we wish to get
from the other Nordic countries or sign languages (Norway, Iceland, the Faroe Islands and
Greenland).
The corpus can serve as a basis for different ways of doing language planning. And it is always
possible to analyse the material further (add new annotation lines) in order to bring out the features
of the language one wishes to look at. (I.e., one set of material can be used in many ways, and it
isn’t always necessary to collect new material for every different purpose.) Did I get this right?
In conclusion we agreed that we would apply for Nordic money for a test corpus for language
planning purposes for each Nordic sign language. The test corpuses would consist of a short sample
(30–60 minutes?) of each Nordic sign language or variety (the sign language used in Greenland).
The outcomes of this project would be:
- a model for the production of a SL language planning corpus
- an agreement (between Nordic countries) on the technical tools to be used for annotation,
storing lexical data and metadata, and the adaptation of those tools to the SL language
planning purposes
- the Nordic sign languages will be included in the national CLARIN language banks of
Nordic countries
- and, of course, we will get a usable though small corpus of each Nordic sign language
- a possible outcome could also be corpuses that make it possible to compare the Nordic
sign languages with each other
We don’t yet know where we would apply for the money, and because of that it is impossible to
know what the application should look like. E.g., for which parts of the test corpus project can we
apply for money?
Things to be discussed further:
What can be done with the Nordic money, and for what does each country/self-governing country
need to get national funding or use already established research (or other) resources?
- A workshop (?)
  o planning of the project
  o discussion about the issues ”language planning of sign languages” and ”what do
    language planners want/need from a corpus”
  o decisions on the technical tools to be used in building the corpus (programmes for
    metadata, annotation and lexicon)
- equipment (cameras, computers for editing and annotating)
- in each country we need people committed to the project (can we get Nordic money for
  that?)
  o one person who co-ordinates the work in each country/self-governing country
  o people responsible for
    - collecting the language material
    - editing the material
    - producing the metadata
    - annotating
To-do list:
- Jette contacts Bodil Aurstad and asks which would be the possible Nordic sources to apply
  to for the money.
- Leena contacts Han Sloetjes han.sloetjes@mpi.nl at the Max Planck Institute, who has been
  responsible for adapting the lexicon tool LEXUS for sign language purposes. LEXUS could
  be used as a tool integrated into Elan with which we can save and organize information
  about the signs. With the help of that lexicon one can indicate in the annotation exactly which
  sign is used on the video, and the gloss on the annotation line only serves as a
  human-readable index to the sign. About LEXUS see e.g. http://www.lat-mpi.eu/latnews/tag/lexus/
  and http://www.lat-mpi.eu/tools/lexus
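The idea that an annotation points to a lexicon entry while the gloss is only a
human-readable label can be sketched roughly as follows. This is plain Python written for
illustration and does not use or represent the actual LEXUS or Elan data formats; all
names (LexiconEntry, Annotation, the entry ids and the file path) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LexiconEntry:
    entry_id: str     # stable identifier of the sign in the lexicon
    gloss: str        # human-readable label only, may change over time
    still_photo: str  # hypothetical path to a still photo of the sign


@dataclass(frozen=True)
class Annotation:
    start_ms: int
    end_ms: int
    entry_id: str     # the annotation refers to the sign, not to the gloss


# Invented example lexicon with a single entry.
LEXICON = {
    "sign-0001": LexiconEntry("sign-0001", "HOUSE", "stills/sign-0001.jpg"),
}


def label(annotation, lexicon):
    """Look up the gloss to display for an annotation."""
    return lexicon[annotation.entry_id].gloss


def rename_gloss(lexicon, entry_id, new_gloss):
    """Return a copy of the lexicon in which one entry has a new gloss."""
    updated = dict(lexicon)
    updated[entry_id] = replace(lexicon[entry_id], gloss=new_gloss)
    return updated


if __name__ == "__main__":
    annotation = Annotation(start_ms=0, end_ms=400, entry_id="sign-0001")
    print(label(annotation, LEXICON))
    renamed = rename_gloss(LEXICON, "sign-0001", "HOME")
    # The annotation itself is unchanged; only the displayed label differs.
    print(label(annotation, renamed))
```

Under this assumption, changing a gloss, or showing a still photo instead of it, does not
require touching the annotations, because they only store the identifier of the sign.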
Next meeting: Next ooVoo meeting was scheduled for 20 August 9.00–11.30 (Swedish/Danish
time).
Report of the ooVoo meeting 20 August 2012, 9.30 – 11.30 (Swedish/Danish time)
Present:
Jette Hedegaard Kristoffersen
Tommy Lyxell
Leena Savolainen (contact person of the group, wrote this report)
Tomas Hedberg
name? interpreter
Jette: what Danish Sign Language needs from the corpus
- information for building a better lexicon (dictionary)
- frequency information on language usage (signs, structures etc.)
- information on grammatical features => producing teaching material for SL
Why do we want to create the test corpuses, what are the targets?
- to create tagging conventions (focus on certain parameters – not "all")
- to develop the corpus tools
- the possibility to compare the Nordic sign languages (the corpus must be
  annotated in such a manner)
  o To make the corpuses accessible in every country, there must be a spoken language
    translation (in written form) available that can be understood in all Nordic
    countries. The same applies to the glossing.
- finding ways to diminish the dependence on glosses (the use of still pictures and/or a
  notation system to refer to the signs)
Some other thoughts on the test corpuses:
- the length: 5 + 5 + 5 minutes of different kinds of signed discourse
- a suggestion: 5 minutes of each language annotated on a phonological level
o Jette: in the German SL corpus project they have done much for the development of
phonological annotation
Jette and Leena:
The tags are not usually the problem, but their labels are. Linguists often call one phenomenon by
different names depending on the linguistic theory they use (the definition of the phenomenon may
vary as well).
A comprehensive guide for annotation: Johnston, Trevor 2011. Auslan Corpus Annotation
guidelines. http://www.auslan.org.au/about/annotations/
By Jette:
Summary of a talk with Bodil Aurstad on how to finance the Nordic Corpus Cooperation Project
Nordplus was Bodil's first suggestion: Nordplus covers more than 50%. There is a handbook
on applying, which should be read before an application is made
(http://www.nordplusonline.org/sca/media/files/publications_media/ars/ars_brukerveiledning_2012_paa_norsk).
Any informal contact with the administrator is very helpful, and also very important, because the
new procedure for handling applications gives the Council of Ministers great power: an
application that the administrator does not believe in never reaches the program committee
(formerly the program commission read all applications and could overrule the Council of
Ministers' recommendation).
The linguistic expert in both the program committee and the expert committee is Jørn Lund, who
may be contacted and asked if he will assist with guidance.
Velux and AP Moller foundations: We were advised to stress the Nordic cooperation, the fact that
this concerns northerners whose language has not previously been described, and that Nordic
cooperation could mean saving money (both volume discounts on technology and shared manpower
in the development of annotation conventions, etc.). Additionally, we should stress that a shared
project will produce comparable data.
Nordic Council of Ministers Presidency: this year the Presidency is held by Norway, and it
passes to Sweden next year. The Presidency regularly offers funds. We were advised to contact
the Swedes, since the Norwegians have probably already allocated most of the funds for this
year. The Presidency can be contacted for an informal chat about the project idea.
Nordic Cultural Fund currently has a strong focus on children and young people, which can make
things difficult for our project. But by emphasising the novelty of the project, and again that it
concerns a group of northerners whose language has not previously been described, and not
described in relation to each other, we might succeed in getting funding here. We must make an
application for the entire project; the fund then selects the budget items it wants to cover. It can
cover up to 85% of a project but rarely gives more than 300,000 DKK.
The language councils of Sweden, Norway and Finland can ask for additional funds from their
respective ministries of culture; this is not so easy in Denmark.
Next meeting: Next ooVoo meeting has been scheduled for 3 October 2012, 10.00–11.30
(Swedish/Danish time).