User Input and Interactions on Microsoft Research ESL Assistant

Claudia Leacock, Butler Hill Group
Michael Gamon, Microsoft Research
Chris Brockett, Microsoft Research

... and
William B. Dolan, Jianfeng Gao, Dmitriy Belenko, Lucy Vanderwende (Microsoft Research)
Alexandre Klementiev (University of Illinois at Urbana-Champaign)

Outline
• Who is using it and how often
• How users are interacting with the system
• Does it help the users to improve their writing?

Most frequent errors made by East Asian non-native speakers

Noun Related: Articles (inclusion & choice), Noun Number, Noun of Noun
• I think it’s *a/the best way to resolve issues like this.
• Conversion always takes a lot of *efforts/effort.
• Please send the *feedback of customer/customer feedback to me by mail.

Preposition Related: inclusion & choice
• It seems ok and I did not pay much attention *on/to it.
• I should *to ask/ask a rhetorical question.

Verb Related: Gerund/Infinitive Confusion, Auxiliary Verb Error, Verb Formation Errors (6), Cognate/Verb Confusion, Irregular Verbs
• On Saturday, I with my classmate went *eating/to eat.
• Hope you will *happy/be happy in Taiwan.
• I *teached/taught him all the things I know.

Adjective Related: Adjective Confusion (4), Adjective Order
• She is very *interesting/interested in the problem.
• So *Korea/Korean Government is intensely fostering trade.

User Interface
Deployed 6/2008

Page Views per Week
[Chart: page views per week, 7/20/08 through 5/10/09; y-axis 0 to 25,000]

User Locations

Rank  Location             Count    Share
 1.   China                59,276   17.9%
 2.   United States        55,104   16.6%
 3.   Taiwan               47,159   14.2%
 4.   Korea - South        18,730    5.6%
 5.   Hong Kong            14,259    4.3%
 6.   Brazil                8,444    2.5%
 7.   Germany               8,219    2.5%
 8.   Canada                7,634    2.3%
 9.   Italy                 6,880    2.1%
10.   Japan                 5,941    1.8%
11.   Spain                 5,924    1.8%
12.   United Kingdom        5,828    1.8%
13.   Russian Federation    5,454    1.6%
14.   France                3,971    1.2%
15.   Saudi Arabia          3,893    1.2%
16.   Mexico                3,878    1.2%
17.   Netherlands           3,330    1.0%
18.   Thailand              3,207    1.0%

Repeat Users
[Chart: return frequency as a percentage of total visits (0 to 100), for users visiting once only, 2 times or more, 3 times or more, 4 times or more, 5 times or more]

Frequent Users (4/21/09)

Frequent Users                854
Sessions                    8,339
Session-Unique Sentences   66,765
Grammatical Error Flags    22,542

[Pie chart: noun 61%, prep 27%, verb 10%, adj 2%]

Collected Data (4/21/09)
Writing Domains: By Number of Sentences

Email           49%
Non-technical   25%
Technical       17%
Unrelated        5%
Other            4%

User Interaction 1: Responses to “Tell us what you think!”

Some users wrote:
• “This is awesome! It works really well.”
• “I found the tool very useful.”
• “Great tool in general – thank you!!!!!!!”
• “I love the feature where it looks for a phrase in web pages.”

Other users wrote:
• “It didn’t work at all.”
• “I hate it.”
• “Terrible job.”
• “The microsoft search results below confuses me.”

Bug reports:
• “When I first opened it, it wouldn’t let me type in any characters at all.”
• “What wearies me is the message ‘Server is temporarily unavailable’.”

Suggestions:
• “There should be some indication that the check is done.”
• “I would like a filter for business and personal use.”

Users Examine 83% of Suggestions

Accept                                          42%
Look at suggestion but not trigger web search   31%
Trigger web search but don't accept             28%

Inspect >18.3K flags to accept 7.6K.

Conclusion: A significant number of users are inspecting the suggested rewrites and making a deliberate choice to accept them or not.

Do users make the right choices?

Evaluated ~900 complete user sessions: 6K flags
1. Calculate system performance for ALL suggestions.
2. Calculate performance for ONLY suggestions that were accepted.
3. Compare ratios of good and bad flags.
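
The three evaluation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a `rating` of good, neutral, or bad plus an `accepted` field) and the helper name `rating_shares` are hypothetical, and the sample records are made up rather than taken from the study.

```python
from collections import Counter

RATINGS = ("good", "neutral", "bad")

def rating_shares(flags):
    """Return the percentage of each rating among the given flags."""
    counts = Counter(flag["rating"] for flag in flags)
    total = sum(counts.values())
    if total == 0:
        return {rating: 0.0 for rating in RATINGS}
    return {rating: 100.0 * counts[rating] / total for rating in RATINGS}

# Made-up annotations for illustration only; not data from the study.
flags = (
    [{"rating": "good", "accepted": True}] * 4
    + [{"rating": "good", "accepted": False}] * 1
    + [{"rating": "neutral", "accepted": True}] * 1
    + [{"rating": "neutral", "accepted": False}] * 2
    + [{"rating": "bad", "accepted": False}] * 2
)

# Step 1: performance over ALL suggestions.
all_shares = rating_shares(flags)
# Step 2: performance over ONLY the accepted suggestions.
accepted_shares = rating_shares([f for f in flags if f["accepted"]])

# Step 3: compare the two distributions side by side.
for rating in RATINGS:
    print(f"{rating:8s} all: {all_shares[rating]:5.1f}%   "
          f"accepted: {accepted_shares[rating]:5.1f}%")
```

If users are selective, the accepted column should show a larger share of good flags and a smaller share of bad flags than the column for all suggestions.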

Evaluation Categories

Evaluation: Good
  SubEval: Correct Flag
    The correction fixes a problem in the user input.

Evaluation: Neutral
  SubEval: Both Good
    The suggestion is a legitimate alternative of a well-formed original input.
    Ex: I like working/to work.
  SubEval: Misdiagnosis
    The original input contained an error but the suggested rewrite neither improves nor further degrades the user input.
    Ex: If you have fail machine on hand.
  SubEval: Non-ascii
    A non-ascii or text processing mark-up character is in the immediate context. (Only applies to user data)

Evaluation: Bad
  SubEval: False Flag
    The suggestion resulted in an error or would otherwise lead to a degradation over the original user input.

Are users accepting good suggestions?
All significant in the Wilcoxon signed-ranks test.
[Pie charts: good / neutral / bad shares for All Suggestions vs. Accepted, one pair each for Noun-related, Adj-related, Verb-related, and Prep-related flags]

By Domain:
All significant in the Wilcoxon signed-ranks test.
[Pie charts: good / neutral / bad shares for Suggestions vs. Accepted, one pair each for Email, Non-technical, and Technical writing]

What do users do with neutral flags?

Neutral Categories
Misdiagnosis   78%
Both OK        15%
non-ascii       7%

Neutral Flags not accepted but sentence edited to produce no flag
• I don't know that you knew or not , this early morning i got a from head office ...
  – suggestion: delete “from”
  I don't know that you knew or not , this early morning I heard from the head office ...
• Please play with the software and Friday I will be by to work with any questions you may regarding it.
  – suggestion: regarding → regard
  Please play with the software and Friday I will be by to work with any questions you may have regarding it.

From 1,349 sentences with neutral flags, found 215 subsequently submitted “similar” strings with no error flag. Users did not accept the suggestion but did something ELSE to make the flag go away.

Users improve 40% of the time

Not Accept Suggestion but Revise Sentence
Typed in suggestion      44%
Revise and improve       40%
Revise and not improve   16%

Identifying the location of an error can help the user.

Conclusions
• Traffic: There is an interest in ESL proofing tools
• Even current state-of-the-art error correction can be useful for ELLs:
  – Users do not accept proposed corrections blindly; they are selective in their behavior
  – Users make informed choices; they can distinguish correct suggestions from incorrect ones
  – Sometimes just identifying the location of an error enables the users to repair the problem themselves

New Interface
www.eslassistant.com
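
The search for “similar” resubmitted strings reported on the neutral-flag slides could be approximated as follows. The slides do not say how similarity was measured, so this sketch assumes a character-level ratio from Python's `difflib`; the function name `find_revision` and the 0.8 threshold are hypothetical choices for illustration.

```python
from difflib import SequenceMatcher

def find_revision(flagged, later_unflagged, threshold=0.8):
    """Return the later unflagged sentence most similar to `flagged`,
    or None if no candidate reaches the (assumed) similarity threshold."""
    best, best_score = None, threshold
    for candidate in later_unflagged:
        if candidate == flagged:
            continue  # an identical string is not a revision
        score = SequenceMatcher(None, flagged, candidate).ratio()
        if score >= best_score:
            best, best_score = candidate, score
    return best

# The flagged sentence and its later revision are the example from the slides;
# the unrelated sentence is made up.
flagged = ("Please play with the software and Friday I will be by to work "
           "with any questions you may regarding it.")
revised = ("Please play with the software and Friday I will be by to work "
           "with any questions you may have regarding it.")
unrelated = "Thank you for your help."

match = find_revision(flagged, [unrelated, revised])
print(match)
```

A threshold this simple will miss heavier rewrites and may pair unrelated sentences that share boilerplate, so any counts it produces would still need manual review.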