User Input and Interactions on
Microsoft Research ESL Assistant
Claudia Leacock, Butler Hill Group
Michael Gamon, Microsoft Research
Chris Brockett, Microsoft Research
... and
William B. Dolan, Jianfeng Gao, Dmitriy Belenko, Lucy Vanderwende (Microsoft Research)
Alexandre Klementiev (University of Illinois at Urbana-Champaign)
Outline
• Who is using it and how often
• How users are interacting with the system
• Does it help the users to improve their writing?
Most frequent errors made by East Asian non-native speakers
Noun Related: Articles (inclusion & choice), Noun Number, Noun of Noun
• I think it’s *a/the best way to resolve issues like this.
• Conversion always takes a lot of *efforts/effort.
• Please send the *feedback of customer/customer feedback to me by mail.
Preposition Related: inclusion & choice
• It seems ok and I did not pay much attention *on/to it.
• I should *to ask/ask a rhetorical question.
Verb Related: Gerund/Infinitive Confusion, Auxiliary Verb Error, Verb Formation Errors (6), Cognate/Verb Confusion, Irregular Verbs
• On Saturday, I with my classmate went *eating/to eat.
• Hope you will *happy/be happy in Taiwan.
• I *teached/taught him all the things I know.
Adjective Related: Adjective Confusion (4), Adjective Order
• She is very *interesting/interested in the problem.
• So *Korea/Korean Government is intensely fostering trade.
User Interface Deployed 6/2008
Page Views per Week
[Figure: chart of page views per week, weekly from 7/20/08 through 5/10/09; y-axis from 0 to 25,000]
User Locations
Rank  Location            Count    Share
1     China               59,276   17.9%
2     United States       55,104   16.6%
3     Taiwan              47,159   14.2%
4     Korea - South       18,730    5.6%
5     Hong Kong           14,259    4.3%
6     Brazil               8,444    2.5%
7     Germany              8,219    2.5%
8     Canada               7,634    2.3%
9     Italy                6,880    2.1%
10    Japan                5,941    1.8%
11    Spain                5,924    1.8%
12    United Kingdom       5,828    1.8%
13    Russian Federation   5,454    1.6%
14    France               3,971    1.2%
15    Saudi Arabia         3,893    1.2%
16    Mexico               3,878    1.2%
17    Netherlands          3,330    1.0%
18    Thailand             3,207    1.0%
Repeat Users
[Figure: chart of return frequency; y-axis: percentage of total visits (0 to 100); categories: once only, 2 times or more, 3 times or more, 4 times or more, 5 times or more]
Frequent Users (4/21/09)
Frequent Users: 854
Sessions: 8,339
Session-Unique Sentences: 66,765
Grammatical Error Flags: 22,542
[Figure: pie chart] noun 61%, prep 27%, verb 10%, adj 2%
Collected Data (4/21/09)
Writing Domains: By Number of Sentences
• Email: 49%
• Non-technical: 25%
• Technical: 17%
• Other: 5%
• Unrelated: 4%
User Interaction 1:
Responses to “Tell us what you think!”
Some users wrote:
• “This is awesome! It works really well.”
• “I found the tool very useful.”
• “Great tool in general – thank you!!!!!!!”
• “I love the feature where it looks for a phrase in web pages.”
Other users wrote:
• “It didn’t work at all.”
• “I hate it.”
• “Terrible job.”
• “The microsoft search results below confuses me.”
Bug reports:
• “When I first opened it, it wouldn’t let me type in any characters at all.”
• “What wearies me is the message ‘Server is temporarily unavailable’.”
Suggestions:
• “There should be some indication that the check is done.”
• “I would like a filter for business and personal use.”
Users Examine 83% of Suggestions
• Accept: 42%
• Look at suggestion but not trigger web search: 31%
• Trigger web search but don't accept: 28%
Conclusion: A significant number of users are inspecting the suggested rewrites and making a deliberate choice whether to accept them.
Inspect >18.3K Flags to Accept 7.6K
Do users make the right choices?
Evaluated ~900 complete user sessions: 6K flags
1. Calculate system performance for ALL suggestions.
2. Calculate performance for ONLY suggestions that were accepted.
3. Compare ratios of good and bad flags.
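The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a `label` of good, neutral, or bad plus an `accepted` boolean), the function names, and the toy annotations are all hypothetical and are not taken from the study's data or tooling.

```python
from collections import Counter

LABELS = ("good", "neutral", "bad")


def label_shares(flags):
    """Share of each evaluation label among the given flags."""
    counts = Counter(f["label"] for f in flags)
    total = len(flags)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


def good_bad_ratio(flags):
    """Ratio of good flags to bad flags (infinite if there are no bad flags)."""
    counts = Counter(f["label"] for f in flags)
    if counts["bad"] == 0:
        return float("inf")
    return counts["good"] / counts["bad"]


def compare(flags):
    """Steps 1-3: score all suggestions, then accepted ones only, side by side."""
    accepted = [f for f in flags if f["accepted"]]
    return {
        "all": (label_shares(flags), good_bad_ratio(flags)),
        "accepted": (label_shares(accepted), good_bad_ratio(accepted)),
    }


# Made-up annotations, for illustration only.
toy_flags = [
    {"label": "good", "accepted": True},
    {"label": "good", "accepted": True},
    {"label": "good", "accepted": False},
    {"label": "neutral", "accepted": True},
    {"label": "neutral", "accepted": False},
    {"label": "bad", "accepted": False},
    {"label": "bad", "accepted": False},
    {"label": "bad", "accepted": True},
]

result = compare(toy_flags)
for subset, (shares, ratio) in result.items():
    print(subset, shares, "good:bad =", ratio)
```

If users are selective, the good-to-bad ratio for the accepted subset should come out higher than the ratio over all suggestions, which is the comparison the slides that follow report.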
Evaluation Categories
Evaluation  SubEval       Description
Good        Correct Flag  The correction fixes a problem in the user input.
Neutral     Both Good     The suggestion is a legitimate alternative of a well-formed original input. Ex: I like working/to work.
Neutral     Misdiagnosis  The original input contained an error but the suggested rewrite neither improves nor further degrades the user input. Ex: If you have fail machine on hand.
Neutral     Non-ascii     A non-ascii or text processing mark-up character is in the immediate context. (Only applies to user data)
Bad         False Flag    The suggestion resulted in an error or would otherwise lead to a degradation over the original user input.
Are users accepting good suggestions?
All differences are significant under the Wilcoxon signed-rank test.
[Figure: paired pie charts of good, neutral, and bad flags, comparing All Suggestions with Accepted suggestions, for four flag types: Noun-related, Adj-related, Verb-related, and Prep-related]
By Domain:
All differences are significant under the Wilcoxon signed-rank test.
[Figure: paired pie charts of good, neutral, and bad flags, comparing all Suggestions with Accepted suggestions, for three writing domains: Email, Non-technical, and Technical]
What do users do with neutral flags?
Neutral Categories
• Misdiagnosis: 78%
• Both OK: 15%
• non-ascii: 7%
Neutral Flags not accepted but sentence edited to produce no flag
• I don't know that you knew or not , this early morning i got a from head office ...
– suggestion: delete “from”
I don't know that you knew or not , this early morning I heard from the head office ...
• Please play with the software and Friday I will be by to work with any questions you may regarding it.
– suggestion: regarding → regard
Please play with the software and Friday I will be by to work with any questions you may have regarding it.
From 1,349 sentences with neutral flags, found 215 subsequently submitted “similar” strings with no error flag.
Users did not accept the suggestion but did something ELSE to make the flag go away.
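One way such a search for “similar” resubmissions could be done is sketched below. The slides do not say how similarity was measured, so everything here is an assumption: the character-level `difflib.SequenceMatcher` ratio, the 0.8 threshold, the function names, and the `(sentence, has_flag)` tuple layout are hypothetical choices for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_unflagged_revisions(flagged_sentence, later_submissions, threshold=0.8):
    """Return later submissions that resemble the flagged sentence but raised no flag.

    later_submissions is a list of (sentence, has_flag) tuples.
    The threshold is an arbitrary illustrative value, not one from the study.
    """
    matches = []
    for sentence, has_flag in later_submissions:
        # Skip anything still flagged and exact resubmissions of the original.
        if has_flag or sentence == flagged_sentence:
            continue
        score = similarity(flagged_sentence, sentence)
        if score >= threshold:
            matches.append((sentence, score))
    return matches


flagged = ("Please play with the software and Friday I will be by to work "
           "with any questions you may regarding it.")
revised = ("Please play with the software and Friday I will be by to work "
           "with any questions you may have regarding it.")

later = [
    (flagged, True),                      # same sentence, still flagged
    (revised, False),                     # user's own repair, no flag
    ("Thank you for your help.", False),  # unrelated sentence
]

matches = find_unflagged_revisions(flagged, later)
for sentence, score in matches:
    print(round(score, 3), sentence)
```

Only the user's own repair is returned here; the unrelated sentence falls well below the threshold, and the unchanged resubmission is excluded because it still carries a flag.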
Users improve 40% of the time
Not Accept Suggestion but Revise Sentence
• Typed in suggestion: 44%
• Revise and improve: 40%
• Revise and not improve: 16%
Identifying the location of an error can help the user.
Conclusions
• Traffic: There is an interest in ESL proofing tools
• Even current state-of-the-art error correction can be useful for ELLs:
– Users do not accept proposed corrections blindly; they are selective in their behavior
– Users make informed choices; they can distinguish correct suggestions from incorrect ones
– Sometimes just identifying the location of an error enables the users to repair the problem themselves
New Interface
www.eslassistant.com