Involving the crowd: How to complement corpus data in the process of dictionary making
As soon as sign language corpora are available in an annotated form, they will be an
important resource for sign language research in general and sign language lexicography in
particular. The latter can then be based on corpus data, as is standard practice for spoken
language lexicography today. However, for well-known reasons sign language corpora
will remain rather small in comparison to corpora of spoken languages. As some questions
cannot be answered by a corpus approach at all (e.g. passive vocabulary) and as a corpus of
limited size will not cover low-frequency signs and other low-frequency linguistic phenomena
sufficiently (cf. Zipf’s law), supplementary methods are needed to complement and verify
available corpus data in dictionary compilation.
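The coverage argument can be made concrete with a small sketch. It assumes an idealised Zipf distribution in which the sign of frequency rank r occurs with probability proportional to 1/r; the vocabulary size, the corpus sizes, and the threshold of five attestations are illustrative assumptions, not figures from the project.

```python
def zipf_expected_counts(vocab_size, corpus_tokens, exponent=1.0):
    """Expected number of tokens per rank under an idealised Zipf distribution."""
    # Weight of rank r is 1 / r**exponent; normalising the weights gives probabilities.
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [corpus_tokens * w / total for w in weights]


def covered_types(vocab_size, corpus_tokens, min_occurrences=5, exponent=1.0):
    """Number of sign types expected to be attested at least min_occurrences times."""
    counts = zipf_expected_counts(vocab_size, corpus_tokens, exponent)
    return sum(1 for c in counts if c >= min_occurrences)


if __name__ == "__main__":
    # Hypothetical vocabulary of 10,000 sign types and two hypothetical corpus sizes.
    for tokens in (100_000, 1_000_000):
        n = covered_types(10_000, tokens)
        print(f"{tokens} tokens: {n} of 10000 types expected >= 5 times")
```

Under these assumptions, only the most frequent part of the vocabulary reaches the attestation threshold in the smaller corpus, which is why low-frequency signs need evidence from other sources.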
We propose that methods of crowd sourcing and community sourcing can be put to use to
complement corpus data as well as to verify and complement other data on signs and sign
uses. We combine two different strategies to get members of the language
community involved: a focus group and online feedback. The focus group approach has been
successfully applied in various dictionary projects before and is a form of community
sourcing: qualified community members commit themselves to the project for a longer period,
thus providing continuity and high quality of work. In our case, the focus group consists of 20
signers with high language awareness and some metalinguistic knowledge. They discuss
specific questions on the use of signs that come up in the dictionary compilation process and
that cannot be answered on the basis of available corpus data. This is done by means of
introspection and filmed group discussions, resulting in mostly qualitative data.
In order to involve the language community as a whole (crowd sourcing) we provide an
online feedback platform. Members of the language community may register and answer
questions on signs, their variants, and senses. Results provide evidence for regional
distribution of signs and sign meanings. The answers are analysed quantitatively and provide
information that will complement and verify data from the corpus and other sources. As with
any crowd sourcing approach addressing a rather small community, the crucial point for our
feedback platform is not only how to attract enough first-time users, but also how to make
users check back regularly. Our goal is to keep the number of participants at three times the
number of informants involved in the earlier corpus data collection.
Out of the many approaches suggested in the crowd-sourcing literature, we chose a
moderate-effort approach implementing computer game elements such as high scores and expert
levels that combines well with the target community’s pride in their own language and their
support of the project.
In our paper we introduce the two approaches taken, focussing on the feedback method. We
describe the pre-testing at national and regional Deaf events as well as the IT solution
developed for the online feedback and discuss its advantages and drawbacks.