D. Reject. There is nothing in the paper which is either new or interesting.
Introduction
In the following, I'll call the author of this paper A. A's main
purpose is to describe a project that he managed within the IBM
company.
This is not correct.
His reasons for doing so are that 1) he feels that his project has
been ignored
This is not correct.
and 2) he feels that there are claims that he has committed fraud
in his IBM morphology project of 20-30 years ago.
Not exactly: In my paper, I document allegations of fraud. (Feelings
are of minor importance in this connection.) In fact, these allegations
are the only reason why I wrote this paper.
These two points will be treated separately below. I will also
look at some (other) main problems of the paper: unclear and
undocumented claims, a whining tone (also with undocumented
claims),
What is this supposed to mean?
ignorance of other people's work, as well as some smaller points,
before I conclude.
1. On A's claim that the project has been ignored
On p.6, A says: "Unfortunately, there are signs today, more than
twenty years after the discontinuation of the Norwegian IBM
project, that such an approach [i.e. A's documentation in reports,
reviewer's comment] is not sufficient. The project is hardly ever
mentioned in the relevant literature [...]."
First, I want to contest this claim: The IBM material is
mentioned in several places; here are some that I found by a
quick search:
- on the homepage of the Oslo-Bergen
tagger: http://tekstlab.uio.no/obt-ny/english/read.html
- on the homepage of the Norwegian Word
Bank: http://www.edd.uio.no/prosjekt/ordbanken/
This is not correct. There was no information about the IBM material on
these pages, nor was there any mention of it on the pages of Norsk
språkbank at the time I wrote my paper. (See below). In contrast to
other sources, IBM’s basic contribution has even been omitted from
oral presentations of Norsk språkbank at conferences, e.g. the one
at the Nordic conference on language and technology, 7-8 October
2013.
- in Ruth Vatvedt Fjeld's talk on the lexicographic database at
the MONS 8 conference in Trondheim 1999.
http://www.hum.uit.no/mons8/sammendrag/teknologi.html
The information provided in the presentation - one single sentence - is misleading.
- in Christian Emil Ore's talk on a text corpus at a corpus
seminar in 1998:
http://www.iet.ntnu.no/~torbjorn/korpus/Hell_referat.html
Correct. So, “hardly ever” seems to be an appropriate characteristic.
Be that as it may. When I state that "Unfortunately, there are signs today,
more than twenty years after the discontinuation of the Norwegian
IBM project, that such an approach is not sufficient” etc., the “signs”
are the contentions that a) essentially the project didn’t exist and that b)
what came out of it was all plagiarism. See below.
Second, even if it had been true, A should not be surprised that
the public are ignorant about his work: Private invitations to
IBM's offices are hardly a way to get the public to know about
one's work (cf. p. 20: " Later, linguists from the University of
Oslo were invited on several occasions, individually and in
groups, to see the products at IBM premises (Kolbotn). ")
First of all, this contention is rather peculiar, given the fact that the
reviewer holds, at various instances, that IBM’s contribution is
well-known. Cf. above.
Secondly, this is a clear misunderstanding based on the reviewer’s
own idea as to what my paper is all about. The point is that linguists
have contended that the research and development did not take place,
or that it only consisted in the copying of other sources, in casu
Bokmålsordboka and Nynorskordboka. My information about visiting
linguists was intended to show that there were even other linguists -
not involved in the projects - who were able to see with their own
eyes that the project was actually carried out. There are witnesses
outside of the project staff.
Further, several projects that have used resources that have
incorporated the IBM material have been documented
extensively in papers and conferences (see the long list of
references at the end of this review).
As for “the long list of references”, hardly any of the papers listed
mentions the IBM material – and for a good reason: The projects
they report didn’t have anything to do with the IBM material. (See
below.)
As for the few papers actually reporting projects partly based on
IBM material, IBM’s contribution is usually not mentioned where
relevant, although users were obliged to do so under IBM’s
contract with the University of Oslo, when the material was sold for
a symbolic sum for scientific use only.
A has had ample opportunities to comment or rectify wrong
claims, as indeed he says has been done in two recent papers
from 2009 and 2011 (see my conclusion).
This is not correct. In fact, here the reviewer is led by a strange
conception of time and space. Many years after its publication, I
found out, by chance, that Fjeld 2000, a paper read at the
EURALEX meeting that year, contained allegations of fraud. (The
EURALEX conferences are the biennial conferences of the
European Association for Lexicography.) How could that possibly be
rectified – if not by a personal communication and a paper on the
subject in another forum in 2007 (later to be published as Engh
2009)? Neither of which had any effect, though. Cf. Johannessen and
Fjeld 2008 and Fjeld and Henriksen 2012. (Engh 2011 has a different
subject and is irrelevant in this connection.)
2. A's feeling that he has been accused of fraud
Throughout the paper, A hints at the allegations that are later
revealed at p. 23:
“Revealed”? “Documented” is the appropriate term. The
documentation starts at p. 22, not on p. 23.
p. 6: " even allegations of fraud have been levelled against it. "
[i.e. against the IBM lexical material, reviewer's comment]
p. 6: " In order to refute allegations about their origin and their
very nature based on what is - to adopt a benevolent
interpretation - a selective misinterpretation of earlier
documentation attempts. "
p. 7: "Furthermore, the existence of the Bokmål project is the
one that has been most seriously contested afterwards. "
On p. 23, A finally presents the actual document that he feels is
an accusation of fraud on his part.
This is not correct. There are several documents, and two of them are
duly presented at page 22. A third document is presented at page 23,
the one the reviewer chooses to comment on.
The document is an unpublished power point presentation of 10
slides that was presented at an internal project meeting five
years ago.
This project meeting was important for two reasons: It was a
meeting of the local Norwegian CLARIN project (“CLARIN is one of
the Research Infrastructures that were selected for the European
Research Infrastructures Roadmap by ESFRI, the European
Strategy Forum on Research Infrastructures.” 1), focusing on
“Common Language Resources and Technology Infrastructure
Norway” 2. I assume that there were foreign researchers present
and/or that the report was made known to the European
organization. (It is written in English, not in Norwegian.) This means
that the audience consisted of the “establishment” of this branch of
applied computational linguistics in Norway as well as leading
representatives of the corresponding European environment.
Directly or indirectly. At any rate, it was impossible for me to rectify
the misinformation. I discovered this presentation by chance, by the
way.
1 http://www.clarin.eu/content/general-information [5 October 2014]
2 https://clarin.b.uib.no/2008/12/15/m%C3%B8te-solstrand-15-16-des-2008/
The actual citation turns out to be: " IBM to develop their own
lexicon" mentioned together with six other named and some unnamed projects, under a headline of what projects have used two
UiO (University of Oslo)-developed dictionaries.
The main title of the page is the key to my interpretation of the
sentence mentioned: “research org. as developers, commercial org.
as users”. There can be no doubt about the intention, especially given
the earlier and later statements of one of the authors/speaker.
The citation does not say in what way the dictionaries
have been used
Indeed, it does. By stating that “research org.” are “developers”, and
by mentioning IBM on a par with the TROLL project, which,
according to a paper in the reviewer’s list of relevant morphology
R&D for Norwegian, was 100% based on one of the dictionaries
mentioned, Bokmålsordboka.
(which indeed would probably be very different for each of the
projects on that list), and it does not quantify this. As I see it, the
citation would be false if it turned out that the IBM project had
not used these dictionaries at all. However, A admits that he has
used them.
“Admits”? A strange allegation from a person who supposedly, as a
reviewer on behalf of Language Resources and Evaluation, is an
expert on lexicography. Nothing is “admitted”. We are talking about
a declaration of lexicographic sources, which is something totally
different. As for the way the printed dictionary was used, see
comments below.
Below, I cite his paper:
p. 9: "Native language user linguists supplemented this material
to the best of their linguistic competence, adding new words
while consulting printed dictionaries when necessary - by
looking up single words, one by one. "
The reviewer appears to have rather vague ideas about what is a
legitimate use of a printed dictionary in contrast to plagiarism.
p. 16: " The following written sources were consulted:
[...] Landrø, Marit Ingebjørg, Boye Wangensteen et al.: 1986,
Bokmålsordboka. Bergen: Universitetsforlaget [...]
Here "consulted" means that the printed dictionaries were used
the way their authors and publishers had intended: Words were
looked up in order to verify the linguist's competence when
necessary."
The reviewer also has vague ideas about lexicographical common
practice: declaring other dictionaries that have been lawfully
consulted during the compilation. The reason why I made this
declaration – common use among all competent and decent
lexicographers – is exactly to point out that no copying, i.e.
plagiarism, took place. The keywords here are “consulted” and
“when necessary”. Let me add that Bokmålsordboka was not at all
extensively consulted, quite the contrary.
p.19: "Landrø, Marit Ingebjørg, Boye Wangensteen et al.: 1986,
Bokmålsordboka. Bergen: Universitetsforlaget
These titles were later made available for IBM internal use only
as electronic dictionaries. This meant that IBM employees at the
Norwegian headquarter could look up entry words on their
terminals and PCs. "
As stated, these titles were later made available for IBM internal use
only. Later. This is spelled out clearly. At that time, the morphology
projects were already accomplished. There is no log information
available as to the actual use of these electronic versions. However,
one has every reason to believe that they were used by translators
working for IBM at IBM premises (as vendors) and possibly by staff
members of the sales department with a special interest in the
Norwegian language. For the linguistics development group, the
electronic version was more or less irrelevant.
To repeat my point from the reviewed paper: There is a crucial
difference between normal use of a printed dictionary (consultation),
which is what a dictionary is intended for, and copying of
machine-readable material (plagiarism/theft).
On p. 23, A finally tells us what he thinks is the only possible
way of reading the citation:
" In the light of the heading, emphasising the research
organisations' role as developers and the commercial
organisations' role as users, "Used by: IBM to develop their own
lexicon" can only be construed to mean that 'IBM made their
own lexicon on the basis of Bokmålsordboka and
Nynorskordboka' - the old groundless assertion in new disguise.
Together, these quotations contain one contention and one
insinuation: On the one hand, IBM did not develop its own
lexicon and morphology from scratch, but contented itself with
reformatting a digital version of a published dictionary instead.
On the other, IBM made illicit use of somebody else's
intellectual property, since IBM did not have the right to
commercial use of the published dictionary. Both allegations are
false."
I am unable to see that A's interpretation is the only one, or even
the most likely one. The citation is so short and rudimentary that
almost any interpretation is possible. It would have been
interesting to see what a written version of the power point
presentation would look like,
Why?
but none exists. Alas, it is impossible to push hard on one
interpretation, based on six words that don't even contain a finite
verb.
Nonsense. This is a clear distortion, cf. above. Again, these words
have to be interpreted in the light of the main title of the page. In fact,
even the interpretation of this single page is unambiguous. And my
interpretation is, unfortunately, confirmed by the prior allegations of
one of the authors.
3. Unclear and undocumented claims
p.1: "the major part of the language information accessible has
been converted from printed sources - or simply created by
individual
linguists, drawing on their competences as native language
users."
What does A mean by language information? Be
specific!
Nonsense. This is perfectly clear from the context. Any further
specification at this point would have been felt as an exaggerated
precision and bad style.
The claim is that the major part comes from two very
different sources. What are the possible alternatives? And where
does this information come from?
I fail to grasp the meaning of this. Is this relevant?
p.2: "In fact, existing digital language resources are the result of
a process where all the approaches are involved, although to a
varying degree.
Generally, this process has been poorly documented. "
-
This is not true.
Be specific!
I still fail to find any documentation of such a basic broad-scale
process. Cf. below.
At the end of this review I present a list of 11 talks and papers
on the creation of lexical resources from other resources, and
these are only for Norwegian.
Just a few of these talks and papers are about the very creation of
lexical and morphological resources, and hardly any of them about
how they are created from other resources the way I have
documented it in connection with the IBM project. In fact, these
talks and papers are of a rather different nature, so this is no
argument. (See below.) In fact, they support my initial claim.
I could have found more. A in addition cites 10 papers by Jan
Engh and 3 by Ruth Fjeld, further showing that the situation is
not bleak.
Again, it seems that the reviewer has not read the paper properly. It
would not have been necessary to write my paper if the
documentation I have provided myself had been read and
understood, let alone respected. One who has not understood these
papers, if she has cared to read them at all, is Ruth Vatvedt Fjeld.
And that is exactly the point of my paper. I want to rectify her
allegations – once and for all - and after having tried repeatedly and
by means of several papers to do so … Further, I raise the question
of how to document such a project, so that similar allegations cannot
be made again.
It follows that the lament on the situation, filling most
of this page, including a reference to one publication A mentions
as an exception,
?
is unjustified.
As I have already explained above, this is completely beyond the
point.
4. Whining tone with undocumented claims
The whole paper has a whining and bitter tone that is not
appropriate in a scientific paper.
The tone is polemical and sometimes sarcastic. Given the
preconceived ideas of the reviewer, ‘whining’ is a natural
interpretation, though not the correct one. More about the tone of
the review and the reviewer below.
The state of affairs A alludes to is not at all documented, and
from what I know of the field, neither can it be, since A is
wrong in his claims.
Highly debatable … As far as I can see, the last contention is not
documented.
For example, it cannot be true that computational morphology
has not been regarded as interesting.
This is neither an exact rendering nor a summary of what I contend in my
paper. On the contrary, I state that “descriptive morphology itself
and, especially lexicography, which represents the context of the
morphology development, are not particularly trendy parts of
linguistics” and “The low academic status of lexical and
morphological resources creation”. The latter is based on personal
experience, which should not be unfamiliar to the reviewer. (See
below.) The former is based on observations of all accessible
publications within linguistics during the last 20 years. Nothing less.
In my capacity as academic librarian. (Cf. even Hovdhaugen et al.
2000, 516f. and Norsk lingvistikk. En evaluering av forskningen ved
fem universitetsinstitutter [Norwegian linguistics. An evaluation of the
research at the departments of linguistics at five Norwegian
universities], p. 68. Available
at http://evalueringsportalen.no/evaluering/norsk-lingvistikk-en-evaluering-av-forskningen-ved-fem-universitetsinstitutter) I would
very much appreciate seeing documentation of the opposite. And,
again, it is not the same as contending that “computational
morphology has not been regarded as interesting”. “Computational
morphology” is not mentioned in my paper. On the contrary,
keywords are “descriptive morphology” and “lexical and
morphological resources creation”, which is something utterly
different.
The topic of my paper is how the actual descriptive work – the task
of connecting the basic linguistic terrain to the computational
morphological models - is carried out. In practice, step by step at a
micro level. It should not be necessary to state that this is something
totally different from the ‘computational morphology’ the reviewer
accuses me of having ignored.
In 1991 a book called simply Computational Morphology was
published at the prestigious MIT Press. Further, the important
organisation ACL has a special interest group: ACL Special
Interest Group on Computational Morphology and Phonology.
On Norwegian, a book appeared on morphological analysis and
synthesis in 1990 (Johannessen, Janne Bondi: Automatisk
morfologisk analyse og syntese. Novus Forlag, Oslo 1990). This
book investigates and makes many of the observations that A
mentions on p. 13-14 (e.g. removing a final -e before other
suffixes).
As already mentioned, sifting through the totality of linguistic
publications during the last decades, there is no doubt in my mind
that a far greater number of titles have been devoted to syntax and
semantics than to (theoretical) morphology.
Apart from that, it is interesting that the two titles mentioned date
from the time when the IBM morphology project was already history.
Also, the fact of stating that an –e ought to be removed in a typical
case is not identical to carrying out and implementing a complete
identification process for the total vocabulary, for all possible
inflected forms. Which is exactly one of the important points of my
paper.
A monograph about how one particular theoretical model could be
adapted for Norwegian illustrated by a limited selection of examples
is simply different from a full implementation (of a different model)
of “all” words with their respective inflected forms. In sum: Broad
coverage. All lemmas, all forms – even those nobody has thought
of/commented on/normalised, thereby even charting the consistency
and the adequateness of the linguistic standardisation. Especially this
last task turned out to be a rather time-consuming pioneer work.
There simply does not exist any documentation that this kind of R
& D has been carried out consistently and on a broad scale for
Norwegian by anyone prior to the IBM project.
In fact, Johannessen 1990 is an example of the type of exposition that
I find insufficient from a descriptive linguistics point of view.
Referring to this title simply represents an illustration of one of my
points.
Clearly, the reviewer does not understand what is discussed and
described in my paper…
Here are some examples:
p.2: most of the page.
Be specific.
p.4: " Now, normative linguistics is generally not well seen by
theoretical (descriptive) linguists. "
- Says who? Claim should be documented. Is it
relevant?
This is obvious to the extent that there is no need for specific
documentation. Demanding concrete documentation is the proof of
superficial knowledge of theoretical linguistics – and the reason why
the need was felt to constitute theoretical linguistics as a separate
discipline in the first place. The second question – “Is it relevant?” -
begs another one: Why didn’t the reviewer bother to try to
understand my paper and the reason why it was written? What is
sound review practice?
p.4: " descriptive morphology itself and, especially lexicography,
which represents the context of the morphology development,
are not particularly trendy parts of linguistics. "
- Says who? Claim should be documented. Is it relevant? (See
also the introduction to this section.)
I say so. Based on a survey of all linguistic publications received at
one university library during 20 years (see above). However, also based
on a different kind of personal experience and facts from local
academia: 1) In the evaluation of candidates for the position as the
director of the University of Oslo’s empirical/descriptive
computational linguistics programme, Tekstlaboratoriet, in 1996,
computational and plain lexicography was not considered interesting
as a linguistic field, and competence in the area was not considered of
any value. 2) At the outset, the new Norwegian national system for
registration and subsequent use of the results as a basis for research
funding excluded lexicographic work of any kind. 3) Ironically, the
Linguistics department of the University of Oslo (ILN) has recently
decided to scrap what is left of the former National institute of
lexicography and its collections, old-fashioned analogical or highly
sophisticated digital ones, since it is considered without sufficient
scientific/linguistic merit, according to the head of the department.
Cf. http://www.uniforum.uio.no/nyheter/2014/06/tar-ikkje-ansvaret-for-spraksamlingane.html
I am sorry, the proofs are all around us, especially in the local,
Norwegian academic context.
p.4: " Enrichment, on the other hand, seems to be slightly more
appealing, perhaps because of its closer relationship to
semantics and syntax, which have been the more fashionable
parts of linguistics since the 1950s. "
- Says who? Claim should be documented. Is it
relevant?
The TROLL project mentioned repeatedly in the reviewer’s
bibliography, is part of the proof - and duly mentioned. Cf.
comments above.
p.6: "Unfortunately, there are signs today, more than twenty
years after the discontinuation of the Norwegian IBM project,
that such an approach [i.e. documentation in reports, rev.
comment] is not sufficient. The project is hardly ever mentioned
in the relevant literature [...]."
- Is this documented?
In addition to my comments above: I would like to know how one
can reasonably document something which is not there. The
“counterexamples” (?) mentioned by the reviewer in the
bibliography, are void. Cf. below.
p.6: "The low academic status of lexical and morphological
resources creation is in strong opposition to its importance and
to the quality required. "
- Says who? Claim should be documented. Is it
relevant? (See also the introduction to this section.)
As for the low academic status, see comments above. As for its
importance: Does the reviewer really object to the fact that
lexicographical and morphological high quality resources are
important for other branches of linguistics, e.g. computational syntax?
If so, this is a clear indication of incompetence in the area.
p.14: " Moreover, the creation of the complete morphology on
the basis of discontinuous and often inconsistent morphological
information from printed sources, among other things, was far
from trivial. "
- Who says it's trivial? Claim should be documented. Is
it relevant? (See also the introduction to this section.)
In the first place, because this used to be a general comment to
IBM’s Nordic language projects at that time. Secondly, this is a
natural implication of the fact that this type of linguistic activity is
hardly mentioned in linguistics literature, as shown for instance by
the papers listed by the reviewer.
Irrelevant? When this activity is constantly ignored – even explicitly
considered to be of no scientific/linguistic value. Cf. above.
p.19: " Although probably unheard of in Norway at that time,
this type of student internship was common practice in IBM
internationally "
- The badly hidden criticism of the Norwegian society is totally
irrelevant in this context.
Nonsense. The result of this particular student internship (converting
Bokmålsordboka text files into a database) probably represents the
only clue for the one who presented the fraud allegations in the first
place. Exactly because she apparently did not understand its status
in relationship to the corporation’s R&D activities.
How this can be interpreted as a “hidden criticism of the Norwegian
society” is beyond my comprehension.
p.21: " For unknown reasons, Academia never showed any
interest in the "enrichment" part of IBM Norway's
lexicographical products: information about a variety of
semantic and syntactic properties of words, to which one could
also add information about word compounding and hyphenation.
"
- If it's true, maybe the relevant researchers don't know
about it?
A strange contention, given that the reviewer has spent a great effort
in maintaining that, in fact, the IBM project was well known. Cf.
above.
Public presentations of the IBM project were given at the Nordic
conference of lexicography in 1991 (cf. Engh 1992a) as well as at the
biennial national conference for Norwegian linguistics, MONS, in
1991 (cf. Engh 1992b) etc. This is one of the points where university
linguists’ visits to the project come in. Again, the point of my paper is
that although the IBM project was made known to the Norwegian
linguistics community through regular channels, it was “ignored”.
p. 21: "Thus IBM Norway's lexica and morphologies constitute
an important part of the base of today's electronic infrastructure
for the Norwegian language,64 unfortunately not generally
acknowledged as such. "
- This is untrue, given that the information is on the relevant
web sites.
Unfortunately: No, it isn’t … Nowhere on the pages of Norsk
språkbank was there any mention of IBM Norway’s lexica and
morphology at the time my paper was written. In fact, a leading
member of the board of Norsk språkbank, Marit Hovdnak, sent me,
unsolicited, an e-mail dated 16 September this year (i.e. more than
one month after I received this review) informing me that a reference
to the IBM material had been added to the web pages of Norsk
språkbank on her initiative after personal communication.
5. The author ignores other people's work
In his eagerness to show that he has been wronged, A has
ignored both web sites and papers that are relevant to his paper.
I refer to the references at the bottom of this review, with 11
papers on Norwegian computational lexicography and the two on
computational morphology more generally. In a paper whose
main purpose is to complain about other people's ignorance and
even accusations of fraud, this is an inexcusable oversight.
Did the reviewer read the papers listed? I have my doubts. To the
extent that I have succeeded in finding/reading them, my clear
impression is that they are about something else. Moreover, to the
extent that subjects related to those of my paper are discussed, the
problems are identified as general problems. No extensive, let alone
complete and detailed, analyses are given or discussed.
These papers generally focus on different topics, and/or they have
different angles of attack.
A says: " However, while there is a flourishing literature on the
more formal aspects and the technical innovation part of natural
language processing, documentation on how the basic language
resources were and partly still are established is scarce, and
existing documentation may be ignored. " (Abstract, p.1)
The papers in my list at the bottom of the review are
exactly examples of what the author claims does
not exist.
This is not correct. Cf. remarks above.
6. Other things
- Naming and enumerating all the staff that have worked on
parts of A's dictionary, as A does in the appendix, is not
something that belongs in a scientific journal. Either they should
be co-authors or thanked in a footnote. When there are more
than five or ten people, the group can safely be thanked as a
group, not individually.
Indeed a strange contention, especially since documentation has been
demanded for quite a few well-known truths above. I am perfectly
aware of normal usage as far as scientific publishing is concerned.
However, and as a countermeasure against the fraud allegations, all
the persons involved in the project are mentioned – so that they can
be asked about the work they actually carried out.
- p.21: "Norsk ordbanken". It's called Norsk ordbank.
p.21: " Norsk ordbanken a service from the University of Oslo,
incorporated in the newly established Norsk språkbank under
the auspices of Språkrådet. "
- A is misinformed about the structure of these institutions. The
University of Oslo and Språkrådet together are responsible for
the development and maintenance of Norsk ordbank. It is
available on the web site of the UiO as well as on that of
Språkbanken, which is institutionally part of the National
Library.
The information was based on personal communication from a
person involved in the hosting of Norsk ordbank and the current
web pages of Norsk språkbank. I may have misunderstood parts of
the information provided. However, that should not be of great
importance for the purpose of my paper. In a final, printed version,
any misunderstanding would have easily been corrected.
LREV List of criteria
o Significance of results
The paper reports on very old results that have been published
before.
“Very old results”? That is beyond the point. My paper discusses
how to document a certain type of basic computational linguistics
activity. In general. The “very old results” serve as a case under
discussion – precisely because it belongs to the past and because
allegations concerning it have been made continuously and, indeed,
recently.
o Technical quality
OK, but nothing new.
If it is OK, it is OK. “but nothing new” is irrelevant in this
connection.
o Appropriateness and soundness of the methodology
OK
o Evaluation of results
None
Debatable...
o Knowledge of field
Poor, as regards the field in the decades after A did his
work.
This is not correct, cf. comments above.
o Rigor of arguments
Poor (as regards claims about other work and status of the
field)
This is not correct, cf. comments above. The paper may be seen as
controversial. That is something quite different.
o Originality
Methods described are not original.
Really? In sifting through most linguistics literature published
in Europe or the US during the last 20 years, monographs and
articles in journals or anthologies, I have never seen any published
article or book chapter about how to document such projects and
how to prevent fraud allegations. Also, the reviewer fails to provide
proof to the contrary.
o Clarity of presentation
Poor. Lots of unargued claims.
Nonsense. Cf. above comments.
o Acknowledgement of limitations
None.
What is this supposed to mean?
o Organization
o References to other work:
Poor
Nonsense. Cf. above comments.
o Relevance to the Language Resources and Evaluation
audience
Not relevant. Method would have been relevant some decades
earlier.
Nonsense. This is based on a fundamental misunderstanding as to
what this entire paper is all about. Cf above comments.
Conclusion
The author believes that his work at IBM has been forgotten,
and even that claims about the IBM material may look like fraud
allegations. Having studied the arguments carefully,
Hardly. Cf above comments.
I don't think that A is right in that he has been forgotten (as
discussed above), and I don't think the claims he refers to are
allegations of fraud (as discussed above). Further, since A also
volunteers the information that he put things straight in two
conference papers: Engh 2009 (printed in a volume in History of
Nordic computing, at Springer, and Engh 2011 (printed in
another volume of History of Nordic Computing, also at
Springer), the present paper seems unnecessary.
Studying the arguments carefully, the reviewer would have
understood that this is exactly the point of the paper: 1) The form of
documentation, usual at the time of the project, turned out to be
insufficient. 2) Later documentation efforts to put things straight
after the first series of fraud allegations were not sufficient either. 3)
The current paper represents one last correction attempt – and in
doing so, I discuss documentation of this type of project in general.
(As already mentioned, Engh 2011 has a different topic, and is totally
irrelevant for the matter at issue.)
The description of the work that was put into the IBM
morphology is something that any computational morphologist
will recognize. I have compiled a list of work on Norwegian
computational morphology that comes in addition to A's own
work referred to in the paper.
As already mentioned, these papers may be on aspects of Norwegian
morphology or have some relationship to it (e.g. how to harmonise
the stems and the inflections of a phrase according to
standardization level, “conservative”, “radical” etc. in a linguistic
software function). However, their topic is different. They are simply
irrelevant in the present context.
There is nothing new that the present paper brings along that
makes it worth publishing. For the future, though, I advise A to
stop accusing others of their lack of interest (cf. the section on
whining above),
Again: The paper does not contain any accusation of others for lack
of interest. It tries to repudiate the repeated allegations of fraud,
based on a fundamental lack of knowledge as to how the original
projects were carried out. In this connection, general aspects of
project documentation are discussed.
and instead get on with it himself.
What is this supposed to mean? Norway is a small country, and the
anonymous reviewer knows that I am earning a living as a librarian.
And as the reviewer also knows perfectly well, I was prevented from
continuing my activities as a linguist exactly because of
preconceptions of the type I am referring to in my paper. It is
practically impossible to initiate and carry out any great descriptive
projects within computational linguistics alone without any support
apparatus and in one’s leisure hours. And, I would like to add, in an
institutional setting with the type of moral standards exhibited by the
reviewer.
Some talks and papers on the creation lexical digital resources
for Norwegian
(The list below has been compiled only to counter A's claim that
there is nothing on the linguistic questions regarding the
development of lexical resource. These are in addition to the ten
papers A lists by Jan Engh, and the three by Ruth Fjeld.)
De Smedt, Koenraad; Rosén, Victoria. 2000.
Automatic proofreading for Norwegian: The challenges of
lexical and grammatical variation.. I: NODALIDA '99.
Trondheim: NTNU 2000 s. 206-215
An interesting paper about something else: The handling of the
Norwegian variability in phrases in order to ensure uniformity as to
the level of the linguistic standard – “radical” vs. “conservative”
Bokmål forms etc. in phrases and compounds.
Irrelevant in this context.
Hagen, Kristin, Johannessen, Janne Bondi and Kristoffersen,
Kristian Emil. 1997. Problemer ved bruk av andres lister til
taggerformål. Foredrag på Møter om norsk språk 7,
Universitetet i Trondheim, 20.-22. november.
Not published in the conference report: Jan Terje Faarlund, Brit
Mæhlum, Torbjørn Nordgård (eds.): 1998, MONS 7. Utvalde
artiklar frå det 7. Møtet om norsk språk i Trondheim 1997. Oslo:
Novus. Cf.
http://www.nb.no/nbsok/nb/fefffc98e091b61d16b89c184e62d257.nbdigital?lang=no#3
[accessible in Norway only]. Unpublished paper
according to http://www.tekstlab.uio.no/norsk/bokmaal/english.html
[accessed 18 September 2014]
Hellan, Lars; Nordgård, Torbjørn. 1997.
The NorKompLex and TROLL lexical systems. Workshop on
the Encoding of Verb Constructions
Unpublished?
Johannessen, Janne Bondi. 1998. Elektroniske hjelpemidler leksikografisk fornying. Norskrift 1998; Volum 97. s. 43-68
Focusing on various ways of using information technology while
editing dictionaries. I.e. hardly relevant in the present context.
Losnegaard, Gyri Smørdal; Samdal, Gunn Inger Lyse; Thunes,
Martha; Rosén, Victoria; De Smedt, Koenraad; Dyvik, Helge J.
Jakhelln; Meurer, Paul. 2012. What we have learned from Sofie:
Extending lexical and grammatical coverage in an LFG
parsebank. I: META-RESEARCH Workshop on Advanced
Treebanking at LREC2012. European Language Resources
Association 2012.
Fairly irrelevant. It states the need for information of the kind that
apparently was of no interest to academic research when IBM
produced it, cf. p. 17f. of my paper.
Nordgård, Torbjørn. 1997.
Argument structure in NorKomples. Workshop on the
Representation and Encoding of Verb Constructions and Verbs
Unpublished?
Nordgård, Torbjørn. 1998.
Norwegian Computational Lexicon (NorKompLeks).
Proceedings of the 11th Nordic Conference on Computational
Linguistics 1998 s. 34-44
Not in the stocks of Norwegian university or research libraries
according to the national union catalogue, BIBSYS. One pointer found
on the Internet turned out to be inactive. [18 September 2014]
Nordgård, Torbjørn. 2000.
NORKOMPLEKS - A Norwegian Computational Lexicon.
COMLEX 2000. Workshop on Lexicography and Multimedia
Dictionaries;
A typo for COMPLEX 2000. Not in the national union catalogue, not
on the Internet. [18 September 2014]
Nordgård, Torbjørn. 1999.
From NorKompLeks to HPSG. HPSG-dager i Trondheim; 1999
Unpublished?
Rosén, Victoria. 2002.
Fra Bokmålsutboka via NorKompLeks til et LFG-leksikon for
norsk. MONS 9 Det niende møtet om norsk språk; 2002-11-22 2002-11-24
On how to cope with stylistic variation in phrases and compounds.
Irrelevant in this context.
Rosén, Victoria; De Smedt, Koenraad. 2000.
*Er korrekturlesningsevnen di god? Resultater fra SCARRIE.. I:
Artikler fra 8. møte om norsk språk (MONS 8). Tromsø:
Universitetet i Tromsø 2000 s. 214-228
Another interesting paper about something else: The handling of the
Norwegian variability in phrases in order to ensure uniformity as to
the level of the linguistic standard – “radical” vs. “conservative”
Bokmål forms etc. in phrases and compounds.
Irrelevant in this context.
In fact, a necessary step after the type of work described in my paper.
Similar efforts were also made at IBM – without leading to any
product. However, this is irrelevant to the matter discussed in my
paper, and, consequently, was not mentioned there.
Also mentioned in this review:
Black, Alan, Stephen Guy Pulman, Graeme Donald Ritchie and
Graham Russell. 1991. Computational Morphology. MIT Press.
ACL-MIT Series in Natural Language Processing.
I could have helped the reviewer find more titles about
computational and plain morphology, but the existence of such titles
is simply beside the point. It does not alter the fact that syntax and
semantics have been subject to more interest than morphology since
the late 1950s.
Johannessen, Janne Bondi. 1990. Automatisk morfologisk
analyse og syntese. Novus Forlag, Oslo.
Cf. comments above.
*
In sum:
Partly unpublished and inaccessible conference papers. Partly titles
published in places known only to the happy few.
Every one of the titles actually published was published many years
after the finalisation of the IBM project. Still, no paper discusses in
detail the broad-scale, meticulous work needed to interpret the
highly defective official norms for the two variants of written
Norwegian, nor registration, completion etc.
As proof of my ignoring “other people's work”, this list of titles is
simply void.
One natural question is whether the reviewer ever read these papers.
The report of the second reviewer, who for unknown reasons is
referred to as
Reviewer #3: The paper gives a detailed history of the '80 IBM
project for producing lexica of Norwegian, and disputes the later
claims that the project simply reformatted existing resources to
produce the lexica. This history and the dispute are interesting
mainly in the context of (the history of) Norwegian natural
language processing, but much less to an international audience.
The paper says very little about "How to document the creation
of digital language resources" esp. how his should be done in
2013 - this topic would definitely be interesting for readers,
whereas the history of a decades past project is much less so.
The history of the project has also been published before, in
several publications, also in English. The paper is very long - for
project notes, the limit is 10 pages.
*
Concluding remarks
There is, in fact, little to add. Basically, the “argumentation” of the reviewer tells its own tale.
Cf. my annotations.
Summing up:
Contrary to what the reviewer contends, the main purpose of my paper is neither self-promotion nor promotion of a project of historical interest only. After the documentation
listed and one additional final overview, commissioned by the editor of Maal og minne in
2013 (Engh 2014), there would have been no point in doing so anyway. My paper was written
as a reaction to allegations of fraud seemingly impossible to refute. And since I had noticed
that descriptions of the very craft of creating digital language resources for a language with a
fairly complex – and different – morphology, on top of everything a deficiently standardised
one, were scarce, I thought I might as well address the general problem of project
documentation in the light of my own experience. This is clearly spelled out in my paper for
anyone who knows how to read – and/or has no personal involvement in the case. So, the reviewer
is completely missing the point - intentionally or not.
On the whole, the reviewer’s analysis is poor, to say the least, and so is the rigor of argument.
Clearly, the conclusion came first; the “arguments” were added to support it. Quite a few
dubious assertions are made, especially as far as the context of my paper is concerned. (My
chances to refute the misinformation etc.) The reviewer’s reading is generally tendentious and
includes strange interpretations of isolated sentences, numerous minor misreadings, and
erroneous references, bordering on bluff documentation. Further, the review is spiced up by a
few odd remarks on irrelevant matters (cf. comment to page 19).
A common characteristic throughout the review is professional incompetence at various levels.
Some keywords:
• morphology literature compared to linguistics literature in its entirety
• language resources creation
• lexicography
• prescriptive vs. descriptive linguistics and the motive for establishing descriptive
linguistics as a discipline
• linguistics community culture/folklore: linguists’ attitudes and professional prejudices
• a surprising lack of insight as far as contemporary Norwegian language processing
literature is concerned
From an isolated editing point of view, the review leaves a curious impression as well, as it
• is characterised by contradictions and bewilderment. E.g. the reviewer alternates between
contending that the IBM project was well known - and unknown
• contains defective documentation
• as well as a constant insistence on documentation - for matters that need no documentation
• is replete with dirty rhetorical tricks:
  • focusing on one sentence, misinterpreting it in isolation in order to arrive at
  faulty conclusions
  • displaying polemics against an imagined opponent, attribution of incorrect intentions etc.
  • use of certain loaded words and phrases: “allegations that are later revealed at p. 23”,
  “A finally presents ...”, “A’s feeling that he has been accused of fraud”
  • other tendentious characterisations, e.g. the repeated reference to a “whining tone” 3
The overall tone of the reviewer is aggressive, even malicious, and the author of the paper is
characterised in derogatory terms and “told” what to do in the future…
All in all, the reviewer is lacking in judgement as well as in competence in the
relevant fields of linguistics. The tone makes one even wonder whether she is disqualified as a
party in the case - or acting on someone’s behalf? 4
In fact, this raises the question: How did Language resources and evaluation select this
reviewer?
However, the review is not only interesting with regard to this particular journal. It is even
revealing as far as today’s academic publishing in general, and in particular its peer review
institution, is concerned. There simply isn’t any “peer” involved, neither with regard to
social position nor, unfortunately, scientific insight. It is rather the result of revamped
professorial rule. However, not the open, fairly transparent professorial rule of former times,
but a pernicious, incompetent version protected by anonymity.
The review brings to the surface the reviewer’s intense need to defend what one may call the
national linguistic establishment and local academic circles - and its members’ right to adapt
reality to their advantage in international fora of their peers. Thus the review proves
beyond any doubt why it was necessary for me to write my paper in the first place.
Unfortunately, it even demonstrates why this type of paper will never be properly published –
which implies that practices similar to the one criticised in my paper will most probably go on
without being distracted by “inappropriate” critics. When someone with an academic title
presents incorrect information, even allegations of fraud, in an international forum
(EURALEX, CLARIN) without the knowledge of the person(s) involved, the latter will be
unable to protest in a similar forum (for instance in Language resources and evaluation) once
aware of the allegations. This is the reason why I, in the end, have to publish my paper in this
way. In a “private”, remote, and unnoticed channel. Under the sign of sidelined polemics.
In the long run, this situation will be detrimental to linguistics.
Additional bibliography
Engh, Jan: 2014, “IBMs leksikografiske prosjekt for norsk 1984–1991”. Maal og minne
106/1, 67-101
Hovdhaugen, Even et al.: 2000, The History of linguistics in the Nordic countries.
Helsingfors: Societas Scientiarum Fennica
3
This expression merits a comment. “Whining tone” is a rather infrequent expression, at least in UK English. (A
quick Google search for the phrase in the UK domain, excepting all electronic dictionary entries, produced 44
hits only.) Interestingly, the corresponding Norwegian verb SYTE is quite fashionable especially among women
managers in order to dismiss unwanted professional proposals or criticism. SYTE implies that the criticism is
immature and/or unjustified and as such morally debatable.
4
Oddly enough, at least one linguistic detail points in this direction: “The IBM company” is not a usual
denomination of the ‘IBM corporation’. Incidentally, this phrase also appears in Fjeld 2000.