Presentation - University of Maryland Institute for Advanced

The Web in Theoretical Linguistics Research:
Two Case Studies Using
the Linguist’s Search Engine
Philip Resnik, Aaron Elkiss,
Heather Taylor, and Ellen Lau
University of Maryland
Berkeley Linguistics Society
February 20, 2005
*Theje dberk eobbfid dbeonc kdoeb
Did that sound ok to you?
“a small, imperfect experiment…”
Nature of Grammar
– Data-oriented
– Probabilistic
– Ordered constraints
– Hard / Categorical
Nature of Elicitation
– Conventional / Binary
– {__,?,??,?*,*,**}
– Contrasts
– Magnitude estimation
Schütze (1996)
Cowart (1997)
Bard, Robertson, and Sorace (1996)
Crocker and Keller (2005)
Sorace and Keller (2005)
Corpora
Part-of-speech taggers
Treebanks
Statistical parsers
Semantic role labeling
…etc.
Linguist
?Nature of Elicitation
Source of Language Sample
Naturally occurring
Manning (2003): “…it remains fair to say that these tools have not yet made the transition to the Ordinary Working Linguist without considerable computer skills.”
    % export TGREP_CORPUS=wsj_mrg.crp
    % tgrep -n __ | grep . | gzip > wsj_mrg.txt.gz
    % tgrep2 -C -p wsj_mrg.txt wsj_mrg.t2c.g
    NP !<< PP [> NP | >> VP]
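To make the tgrep2 pattern NP !<< PP [> NP | >> VP] concrete: in tgrep2 notation << means "dominates", > means "is immediately dominated by", >> means "is dominated by", ! negates a relation, and [ … | … ] groups alternatives. The pattern therefore asks for an NP that contains no PP anywhere below it and is either the immediate daughter of an NP or somewhere under a VP. The following is a minimal Python sketch that evaluates just this one pattern over a hand-built toy tree; the Node class and helper functions are hypothetical illustrations, not part of tgrep2 or the LSE.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def tree(label: str, *children: Node) -> Node:
    """Build a node and point each child back at it."""
    node = Node(label, list(children))
    for child in children:
        child.parent = node
    return node


def all_nodes(node: Node) -> Iterator[Node]:
    """Pre-order traversal: the node itself, then its descendants."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def ancestors(node: Node) -> Iterator[Node]:
    parent = node.parent
    while parent is not None:
        yield parent
        parent = parent.parent


def words(node: Node) -> List[str]:
    """Terminal yield of a subtree (leaves are nodes without children)."""
    if not node.children:
        return [node.label]
    return [w for child in node.children for w in words(child)]


def matches(node: Node) -> bool:
    """Hard-coded check for: NP !<< PP [> NP | >> VP]"""
    if node.label != "NP":
        return False
    # !<< PP : no PP anywhere below this NP
    if any(d.label == "PP" for d in all_nodes(node) if d is not node):
        return False
    # > NP : the immediate parent is an NP
    under_np = node.parent is not None and node.parent.label == "NP"
    # >> VP : some ancestor is a VP
    under_vp = any(a.label == "VP" for a in ancestors(node))
    return under_np or under_vp


# Toy, hand-built parse of "Kim read the book on the shelf" (Penn-style labels).
sentence = tree(
    "S",
    tree("NP", tree("NNP", tree("Kim"))),
    tree(
        "VP",
        tree("VBD", tree("read")),
        tree(
            "NP",
            tree("NP", tree("DT", tree("the")), tree("NN", tree("book"))),
            tree(
                "PP",
                tree("IN", tree("on")),
                tree("NP", tree("DT", tree("the")), tree("NN", tree("shelf"))),
            ),
        ),
    ),
)

hits = [" ".join(words(n)) for n in all_nodes(sentence) if matches(n)]
print(hits)  # ['the book', 'the shelf']
```

The subject NP "Kim" is rejected (its only ancestor is S), and "the book on the shelf" is rejected because it dominates a PP. Even this one-line query needs a precise grasp of tree relations, which is the gap the LSE is meant to close.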
If you build it, they will come…
Roadmap
• Motivations
• The Linguist’s Search Engine
• Case Study 1: Psycholinguistics
• Case Study 2: Syntax
• Conclusions
A Brief Illustration of the LSE
• Pollard and Sag (1994); discussion in Manning (2003)
– (a) We consider Kim to be an acceptable candidate
– (b) We consider Kim an acceptable candidate
– (c) We consider Kim quite acceptable
– (d) We consider Kim among the most acceptable candidates
– (e) *We consider Kim as an acceptable candidate
– (f) *We consider Kim as quite acceptable
– (g) *We consider Kim as among the most acceptable candidates
– (h) *We consider Kim as being among the most acceptable candidates
Query By Example
• Type an example of the structure you’re interested in.
• The LSE generates an automatic analysis. (You don’t have to agree with the analysis!)
• Use the mouse to edit the tree.
• A few mouse clicks later, you have a description of the structure you’re looking for.
• The LSE creates the query for you.
• You can choose to match all morphological forms of a word.
• Hit ‘search’ and the LSE retrieves sentences whose analysis matches the structure you specified.
• One more click to look at a sentence in context… or to see the entire Web page where it occurred.
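The retrieval step (find sentences whose analysis contains the edited tree) can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not the LSE’s implementation: parses are nested tuples; a pattern node either leaves a subtree unconstrained, lists acceptable word forms, or requires its children to match the node’s children one-to-one and in order; a trailing "*" on a label is a hypothetical prefix wildcard; and the hand-listed CONSIDER set merely stands in for real morphological expansion. The parses of examples (b), (d), and (e) are hand-built toys.

```python
from typing import Iterator, List, Tuple, Union

# A parse node is (label, word) for a preterminal or (label, [children]) otherwise.
Tree = Tuple[str, Union[str, list]]
# A pattern node is (label, None): any subtree with that label;
# (label, {word forms}): a preterminal whose word is one of the forms;
# (label, [child patterns]): children must match in order, one-to-one.
Pattern = Tuple[str, Union[None, set, list]]


def label_ok(pattern_label: str, label: str) -> bool:
    # A trailing "*" acts as a prefix wildcard, so "VB*" covers VB, VBD, VBZ, ...
    if pattern_label.endswith("*"):
        return label.startswith(pattern_label[:-1])
    return pattern_label == label


def match(pattern: Pattern, node: Tree) -> bool:
    p_label, p_spec = pattern
    label, content = node
    if not label_ok(p_label, label):
        return False
    if p_spec is None:
        return True
    if isinstance(content, str):  # preterminal: compare the word
        return isinstance(p_spec, set) and content.lower() in p_spec
    if isinstance(p_spec, set):  # word constraint on a phrasal node never matches
        return False
    return len(p_spec) == len(content) and all(
        match(p, child) for p, child in zip(p_spec, content))


def subtrees(node: Tree) -> Iterator[Tree]:
    yield node
    _, content = node
    if not isinstance(content, str):
        for child in content:
            yield from subtrees(child)


def search(pattern: Pattern, parses: List[Tree]) -> List[int]:
    """Indices of parses containing at least one subtree that matches."""
    return [i for i, t in enumerate(parses) if any(match(pattern, s) for s in subtrees(t))]


# Hand-listed stand-in for "match all morphological forms of a word".
CONSIDER = {"consider", "considers", "considered", "considering"}
# Query: consider NP as NP
PATTERN = ("VP", [("VB*", CONSIDER), ("NP", None), ("PP", [("IN", {"as"}), ("NP", None)])])


def we_consider_kim(complement: Tree) -> Tree:
    """Toy parse of 'We consider Kim <complement>' with a flat VP."""
    return ("S", [("NP", [("PRP", "We")]),
                  ("VP", [("VBP", "consider"), ("NP", [("NNP", "Kim")]), complement])])


AN_ACCEPTABLE_CANDIDATE = ("NP", [("DT", "an"), ("JJ", "acceptable"), ("NN", "candidate")])
THE_MOST_ACCEPTABLE = ("NP", [("DT", "the"), ("RBS", "most"), ("JJ", "acceptable"), ("NNS", "candidates")])
PARSES = [
    we_consider_kim(AN_ACCEPTABLE_CANDIDATE),                          # (b)
    we_consider_kim(("PP", [("IN", "among"), THE_MOST_ACCEPTABLE])),   # (d)
    we_consider_kim(("PP", [("IN", "as"), AN_ACCEPTABLE_CANDIDATE])),  # (e)
]
print(search(PATTERN, PARSES))  # [2]: only (e) has "consider NP as NP"
```

Leaving the NP nodes unconstrained is what lets one edited example generalize to every sentence with the same shape.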
Two Case Studies
• Focus in this talk:
– What was the study about?
– How was the LSE useful?
In both cases, my co-authors were naïve
users of the Linguist’s Search Engine. I
didn’t discover the LSE had been useful to
them until after the fact.
Case Study I: Psycholinguistics
• Nina Kazanina, Ellen Lau, Moti Lieberman,
Colin Phillips and Masaya Yoshida, “Active
Dependency Formation in the Processing of
Backwards Anaphora”. 17th Annual CUNY
Sentence Processing Conference, University
of Maryland, College Park. March 2004.
http://www.ling.umd.edu/ninaka/Papers/CUNY_2004_slides.pdf
Active Dependency Formation
The teacher asked what the team was laughing about __.
• Wh-word signals upcoming dependency formation
• Active processing of dependency observed → filled gap effect
• Dependency formation constrained by grammar → island constraints
While he was watching TV, John heard the phone ring.
• Early pronoun signals upcoming dependency formation
• Active processing of dependency observed?
• Dependency formation constrained by grammar?
Active Dependency Formation
Original data for testing prediction
Gender mismatch effect
While she was cooking dinner, John listened to the radio.
She was cooking dinner while John listened to the radio.
Principle C rules out coreference in a c-commanded position, so no mismatch effect should be observed.
Results looked good, but there was a confound!
She was cooking dinner while John listened to the radio.
Needed a construction where the target position is expected; otherwise the processor might simply have stopped looking for a target.
Active Dependency Formation
Possible solution: expletive constructions
It was clear to his mother that John should go. (No Principle C)
It was clear to him that John should go. (Principle C)
Question: does this construction really have the right properties?
• Is the second clause consistently expected?
• Is it consistently expletive rather than referential?
Options:
• Rely on experimenter intuition
• Do a pilot study
• Sift through a corpus
Query by example: “It was clear to him” becomes “It AUX [clear to NP]”
Active Dependency Formation
Result:
• Verified that virtually all results of the search did involve
expletive it with a following clause.
• Obtained reassurance in designing the follow-up study
• Later double-checked using an off-line completion study
The LSE made it easy to start with linguists’ intuitions and find
relevant evidence in naturally occurring text.
The LSE also makes it easy to look for additional relevant data
that may not have occurred to the experimenter.
Query by example: It AUX Adj PP that… (any adjective; PP with any preposition)
– Adjectives: clear, important, vital, manifest, interesting, necessary, obvious
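Generalizing the frame to “It AUX Adj PP that…” and tallying which adjectives fill it can be approximated over part-of-speech-tagged text. The sketch below is a hypothetical surface approximation, not how the LSE matches parse trees: AUX is assumed to be a form of be, the PP is assumed to be a preposition plus one to four NP-like tokens, and the input uses Penn Treebank word/TAG tokens. The first two sentences come from the slide above; the third is invented for illustration.

```python
import re
from collections import Counter
from typing import Optional

# Forms of "be" standing in for AUX (an assumption; the LSE's AUX may cover more).
BE = r"(?:is|was|are|were|be|been)"
# A token that can plausibly sit inside a short NP: determiner, pronoun, noun, adjective, number.
NP_TOKEN = r"\S+/(?:DT|PRP\$?|NNP?S?|JJ|CD)"
# It AUX Adj PP that ... over "word/TAG" tokens separated by single spaces.
FRAME = re.compile(
    rf"\bIt/PRP {BE}/VB[DZPN]? (\S+)/JJ \S+/(?:IN|TO)(?: {NP_TOKEN}){{1,4}} that/IN\b",
    re.IGNORECASE,
)


def frame_adjective(tagged: str) -> Optional[str]:
    """Return the adjective filling the frame, or None if the frame is absent."""
    m = FRAME.search(tagged)
    return m.group(1).lower() if m else None


SENTENCES = [
    "It/PRP was/VBD clear/JJ to/TO his/PRP$ mother/NN that/IN John/NNP should/MD go/VB ./.",
    "It/PRP was/VBD clear/JJ to/TO him/PRP that/IN John/NNP should/MD go/VB ./.",
    "It/PRP is/VBZ obvious/JJ to/TO everyone/NN that/IN we/PRP should/MD leave/VB ./.",
    "It/PRP was/VBD clear/JJ ./.",  # no PP + that-clause: not counted
]

counts = Counter(a for a in map(frame_adjective, SENTENCES) if a)
print(counts.most_common())
```

A flat pattern like this cannot tell an expletive “it” from a referential one; checking that still requires reading the retrieved sentences, as the case study did.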
Case Study II: Syntax
• Heather Taylor, “Interclausal
(co)dependency: the case of the
comparative correlative”, Proc. Michigan
Linguistics Society, October 2004.
http://www.ling.umd.edu/events/syntax/abstracts/heather1.PDF
Comparative Correlatives*
The Xer …, the Yer …
– Highlighted in recent debates about the UG approach
– Central question: are these constructions amenable to
an analysis based on UG principles, or do they present
a challenge to the UG view?
Central claim here: the LSE is useful regardless
of which side of the debate you’re on.
*A.k.a. Conditional correlatives, correlative conditionals, “more-more” constructions
Comparative Correlatives
[Tree diagrams: Culicover and Jackendoff (1999) vs. Taylor (2004); each clause has the form [the more XP]i (that) IP … ti …]
– Culicover and Jackendoff (1999): sui generis IP/CP; interclausal relationships accounted for outside the syntax
– Taylor (2004): CP structure; UG analysis relating CCs to conditionals
Comparative Correlatives
• McCawley’s generalization (1988, 1998):
Deletion of copular main verbs in CCs is sensitive to
semantic properties of the subject (generic/specific)
– The better an advisor is, the more successful a student is (generic subject: is can be deleted)
– The more obnoxious Fred is / *Ø, the less attention you should pay (specific subject: is cannot be deleted)
• But analysis of LSE data exposes the role of:
– Phonological weight of the subject
– Parallelism (copula in both clauses, deletion in both clauses)
casting doubt on the generalization’s validity
Comparative Correlatives
*The more obnoxious Fred,
the less attention you should pay to him.
?The more obnoxious Fred’s younger brother,
the less attention you should pay to him.
?The longer the day’s activities are, the sleepier the campers.
?The longer the day’s activities, the sleepier the campers are.
√The longer the day’s activities, the sleepier the campers.
Informant judgments confirm the tendencies indicated
by naturally occurring data.
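Checking the parallelism tendency in retrieved comparative correlatives amounts to recording, for each clause, whether the copula is present. The sketch below is a deliberately crude, hypothetical string heuristic, not Taylor’s (2004) procedure: it assumes the second clause starts at the first “, the”, and it only recognizes a clause-final form of be. The test sentences are the examples from the slides above.

```python
import re
from typing import Tuple

COPULAS = {"is", "are", "was", "were"}


def copula_pattern(sentence: str) -> Tuple[bool, bool]:
    """(clause1 ends in a copula, clause2 ends in a copula) for 'The Xer ..., the Yer ...'."""
    parts = re.split(r",\s+(?=the\b)", sentence, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 2:
        raise ValueError(f"not a two-clause comparative correlative: {sentence!r}")

    def ends_in_copula(clause: str) -> bool:
        tokens = re.findall(r"[\w’']+", clause.lower())
        return bool(tokens) and tokens[-1] in COPULAS

    return ends_in_copula(parts[0]), ends_in_copula(parts[1])


EXAMPLES = [
    "The better an advisor is, the more successful a student is",
    "The longer the day’s activities are, the sleepier the campers.",
    "The longer the day’s activities, the sleepier the campers are.",
    "The longer the day’s activities, the sleepier the campers.",
]
for s in EXAMPLES:
    first, second = copula_pattern(s)
    print(f"{'parallel' if first == second else 'mismatch':8}  {s}")
```

Tabulating these pairs over many retrieved sentences would show whether copula presence tends to match across the two clauses, independently of whether the subject is generic or specific.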
Comparative Correlatives
• Overt then
– The hungrier Romeo gets, then the more pizza he eats.
– Cf. If Romeo gets hungrier, then he eats more pizza.
• LSE searches suggest that overt then is not anomalous.
• Might this support a UG account that provides a
unified treatment of CCs and conditionals?
One more fact to add to the theoretical debate!
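Screening retrieved sentences for an overt then between the two clauses can be sketched as a single pattern. This is a hypothetical heuristic, not the LSE query used in the study: a comparative is approximated as more, less, or any word ending in -er, so it will misfire on words such as “water”. It returns None for sentences that do not look like comparative correlatives at all.

```python
import re
from typing import Optional

CC = re.compile(
    r"^the\s+(?:more|less|\w+er)\b.*?,\s+(then\s+)?the\s+(?:more|less|\w+er)\b",
    re.IGNORECASE,
)


def overt_then(sentence: str) -> Optional[bool]:
    """True/False for a CC with/without overt 'then'; None if not CC-shaped."""
    m = CC.match(sentence.strip())
    if m is None:
        return None
    return m.group(1) is not None


for s in ["The hungrier Romeo gets, then the more pizza he eats.",
          "The longer the day’s activities, the sleepier the campers.",
          "If Romeo gets hungrier, then he eats more pizza."]:
    print(overt_then(s), s)
```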
Conclusions
• The LSE is useful to traditional linguists
– Confirming/disconfirming intuitions (theory → data)
– Exposing a wider range of data (data → theory)
• The LSE complements new methodological
trends
– Magnitude estimation, etc.
• The LSE is available for anyone to use
– http://lse.umiacs.umd.edu
Traditional?!
Backup slides
Conclusions
• Chomsky (1979): “You can also collect butterflies and make
many observations. If you like butterflies, that’s fine; but such
work must not be confounded with research, which is concerned
to discover explanatory principles of some depth and fails if it
does not do so.”
• Einstein (1940): “Science is the attempt to make the chaotic
diversity of our sense-experience correspond to a logically
uniform system of thought [in which] experience must be
correlated with the theoretical structure… What we call physics
comprises that group of natural sciences which base their
concepts on measurements…”
A Web Search Tool for the
Ordinary Working Linguist
• Must have linguist-friendly “look and feel”
• Must minimize learning/ramp-up time
• Must permit real-time interaction
• Must permit large-scale searches
• Must allow search on linguistic criteria
• Must be reliable
• Must evolve with real use
LSE Example:
Text in Parallel Translation
Example: seeing how English “completive particle” usages
(eat up versus simply eat, indicating a telic event) are rendered
in different languages.
LSE Example: Implicit Objects
• Resnik (1993, 1996):
– Information-theoretic model of selectional constraints
– Model makes predictions with respect to implicit objects
• Implicit objects
– John ate Ø (= John ate something edible)
– *John found Ø (can’t mean John found something findable).
• Question from audience:
– “Doesn’t your model then predict that the verb titrate
should permit implicit objects?”
– Options
• Find informants for whom titrate is in the working vocabulary
• Slog through corpora looking for titrate used “intransitively”
Custom collection of
sentences from the
Web
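In Resnik’s information-theoretic model, a verb’s selectional preference strength is the relative entropy between the distribution of its object classes and the prior distribution of classes, S(v) = Σ_c P(c|v) log P(c|v)/P(c). Roughly, verbs that constrain their objects strongly are the ones predicted to allow implicit objects, which is why a highly selective verb like titrate raises the audience question. The sketch below computes S(v) in bits; the three classes and all probabilities are invented toy numbers, not Resnik’s data or class inventory.

```python
import math
from typing import Dict


def preference_strength(p_c_given_v: Dict[str, float], p_c: Dict[str, float]) -> float:
    """S(v) = sum_c P(c|v) * log2(P(c|v) / P(c)): KL divergence from the prior, in bits."""
    return sum(p * math.log2(p / p_c[c]) for c, p in p_c_given_v.items() if p > 0)


# Invented toy distributions over three coarse object classes.
PRIOR = {"food": 0.3, "artifact": 0.6, "solution": 0.1}
OBJECTS_OF = {
    "eat": {"food": 0.9, "artifact": 0.1, "solution": 0.0},
    "find": {"food": 0.25, "artifact": 0.65, "solution": 0.1},
    "titrate": {"food": 0.0, "artifact": 0.0, "solution": 1.0},
}

for verb, dist in OBJECTS_OF.items():
    print(f"{verb:8} S = {preference_strength(dist, PRIOR):.2f} bits")
```

With any such numbers, a verb whose objects look like the prior (find) scores near zero, while a verb restricted to one narrow class (titrate) scores highest; whether titrate actually appears with implicit objects is the empirical question the Web search answers.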
Active Dependency Formation
Gender mismatch effect (van Gompel and Liversedge, 2003)
• When she wasn’t busy, the girl visited the boy very often.
• When she wasn’t busy, the boy visited the girl very often.
Gender mismatch effect
reveals active processing
Can grammatical information constrain the process?
• Principle C: a pronoun can’t co-refer with an antecedent that it c-commands (she … *the boy).
• Prediction: no gender mismatch effect with c-commanded positions
More on Comparative Correlatives
(see Taylor, 2004)
• The two clauses behave like a subordinate and matrix
clause, respectively
– Tag questions form on clause2 and not clause1
– Only clause2 can host subjunctive case
– In German, the word order is consistent with clause1 being
subordinate to matrix clause2
– In Dutch there is flexibility in the word order of clause2
characteristic of matrix clauses
• NPI licensed in clause1 but not in clause2
• Extraction is equally permissible from both
• Conditionals
– Presence of then
– Tag questions form on clause2 and not clause1
– NPI licensed in clause1 but not in clause2
– Extraction from both clauses
– Variable binding facts “shadow” each other
– Lack of Condition C binding between clauses
Codependence
• Each clause depends on the presence of the other
• The licit values of X in the “comparative strings” are
determined by each other
• Parallelism in copula deletion
the ADJer … the ADJer …