Usability of CAPTCHAs
Or “usability issues in CAPTCHA design”

Jeff Yan
School of Computing Science, Newcastle University, UK
(Joint work with Ahmad Salah El Ahmad)
SOUPS’08 (CMU, July 2008)

Apology
  2nd time to miss SOUPS …
  nth (n > 2) time to be unable to present my paper …
  All due to the same problem: a US visit visa!
  (started my application in April; I’ve not heard its result yet …)

Does this man look like a terrorist?! ;-)

CAPTCHA
  Why was it invented? Ask any CMU people, or read the cartoon
  Automated Turing tests that computers cannot pass, but humans can
  Almost standard security technology (e.g. for antispam)
  Widespread application on commercial websites

Main CAPTCHAs
  Text-based schemes
    typically require users to solve a text recognition task
    the most widely deployed
  Sound-based schemes
    typically require users to solve a speech recognition task
  Image-based schemes
    typically require users to perform an image recognition task
    Example: Microsoft’s Asirra

This paper
  is about understanding how to design usable and robust CAPTCHAs, with a focus on usability

Isn’t that …
  CAPTCHAs with poor usability should not exist by definition?
  Yes, but … many deployed CAPTCHAs, including famous ones, are still not that usable …

How about robustness?
  When necessary, it will be covered
  However, our major attacks are discussed elsewhere
    Low-cost attacks on schemes by Microsoft, Yahoo and Google (CCS’08, to appear)
    The pixel count attack (ACSAC’07)
      Breaking CAPTCHAs by counting the number of pixels!

A framework for CAPTCHA usability
  Distortion
    distortion techniques employed and their impact on usability
  Content
    content embedded in CAPTCHA challenges and its impact on usability
    e.g. how should the content be organized?
  Presentation
    the way that CAPTCHA challenges are presented and its impact on usability
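
As a rough illustration of the pixel count idea mentioned under “How about robustness?”: suppose every character in a scheme's font is rendered with a distinct number of foreground pixels. Then an already-segmented glyph could be recognised simply by counting its pixels and looking the count up in a table. The sketch below is a toy under that assumption, not the attack code from the ACSAC’07 paper; the 3x3 glyphs and the names GLYPHS, pixel_count and recognise are invented for illustration.

```python
# Toy glyphs: '#' is a foreground pixel. These shapes are made up for
# illustration and chosen so that each character has a distinct pixel count.
GLYPHS = {
    "I": [".#.", ".#.", ".#."],  # 3 pixels
    "L": ["#..", "#..", "###"],  # 5 pixels
    "H": ["#.#", "###", "#.#"],  # 7 pixels
    "O": ["###", "#.#", "###"],  # 8 pixels
}


def pixel_count(glyph):
    """Count foreground ('#') pixels in a glyph given as a list of strings."""
    return sum(row.count("#") for row in glyph)


# Lookup table built once from known renderings: pixel count -> character.
COUNT_TO_CHAR = {pixel_count(g): ch for ch, g in GLYPHS.items()}


def recognise(segments):
    """Map each already-segmented glyph to a character via its pixel count."""
    return "".join(COUNT_TO_CHAR.get(pixel_count(s), "?") for s in segments)


challenge = [GLYPHS["O"], GLYPHS["I"], GLYPHS["L"]]
print(recognise(challenge))  # OIL
```

In this toy, moving a glyph's pixels around without adding or removing any would not change the result, since only the count matters.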
Distortion | confusing characters
  It is well known that, under common distortions, characters such as 1 and l, o and 0, 5 and s cause confusion
  To be secure (or resistant to segmentation attacks), Google and Yahoo CAPTCHAs introduced new confusing characters
    vv or w?   cl or d?   rn or m?
    rm or nn?  cm or an?  nn or m?  …

Distortion | confusing characters
  ~6% of challenges in the Google CAPTCHA, and ~10% in the latest Yahoo scheme (rolled out since Mar 2008), were observed to have such confusing characters.

Content | string length
  A design issue: string length predictable or not?
  Case study: the Microsoft CAPTCHA used a fixed length of 8 characters, which helped its usability
    [Figure: two example challenges] The first object is “7”? The first object is “L”?
  With the length info, users can be pretty sure that the first objects in the above examples are noise.

Content | string length
  However, the length info also helped our automated segmentation attack (success rate: >92%)
  Our program knows when to stop!
    [Figure labels: “Start point”; “Stop: identified 8 chars already”]

Presentation | the use of colour
  Using colour is common practice in CAPTCHA design (for all sorts of reasons)
  However, we have seen many cases in which the use of colour
    is unhelpful for usability,
    has had a negative impact on security, or
    is problematic in terms of both usability and security

Presentation | the use of colour
  Case 1: Gimpy-r (a well-known early scheme)
    [Figure: how humans see it / how machines see it]

Presentation | the use of colour
  Case 1: Gimpy-r
    The dominant colour of the distorted text (often black) is distinguishable: it is always the lowest intensity, and never appears in the background
      easy to extract the text
    Colour background:
      not much use in terms of security
      negative effect on usability (e.g. confusing people)

Presentation | the use of colour
  Case 2: BotBlock
    [Figure: how humans see it / how machines see it]

Presentation | the use of colour
  Case 2: BotBlock
    Sophisticated colour management providing resistance to OCR
    However, the misuse of colour:
      texts have distinguishable colour patterns
      the same foreground colour occurs repeatedly
      easy to extract text automatically
    Negative effect on usability and a false sense of security.

Presentation | the use of colour
  It seems that the “Las Vegas effect” also applies to CAPTCHA design
    No colour might be better than too much colour
  Major CAPTCHAs have started to avoid fancy colour management, including
    Microsoft
    Yahoo
    Google
    reCAPTCHA

The framework: applied to text CAPTCHAs

  Category       Usability issue
  Distortion     Distortion method and level
                 Confusing characters
                 Friendly to foreigners?
  Content        Character set
                 String length (How long? Predictable or not?)
                 Random string or dictionary word?
                 Offensive word
  Presentation   Font type and size
                 Image size
                 Use of colour
                 Integration with web pages

The framework
  Inspired by text-based CAPTCHAs
  Applicable to sound-based schemes
    For details, see our paper
  Also applicable to image-based schemes (e.g. IMAGINATION)
    For schemes such as Asirra and Bongo, in which distortion is absent, only the dimensions of content and presentation apply.

Summary
  First attempt towards a systematic analysis of usability issues in CAPTCHA design (in particular, text-based schemes)
  Proposed a simple but novel framework, which accommodates both novel issues we have identified and known issues scattered in the literature
  The framework is applicable to text, sound and (some) image-based CAPTCHAs.
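
The Gimpy-r observation (Case 1: the text colour is always the lowest-intensity colour and never appears in the background) can be sketched in a few lines. This is a minimal illustration under exactly that assumption, not the authors' attack code; the tiny 3x3 image, the colour values and the names intensity and extract_text_mask are hypothetical.

```python
def intensity(rgb):
    """Plain average of the R, G, B channels (a simple stand-in for luminance)."""
    r, g, b = rgb
    return (r + g + b) / 3


def extract_text_mask(image):
    """Return a binary mask: 1 where a pixel has the lowest-intensity colour, else 0.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    """
    darkest = min((px for row in image for px in row), key=intensity)
    return [[1 if px == darkest else 0 for px in row] for row in image]


# Made-up colours: black "text" on a multi-coloured background.
BLACK, RED, BLUE, YELLOW = (0, 0, 0), (200, 30, 30), (40, 60, 220), (230, 220, 50)
image = [
    [RED,    BLACK, BLUE],
    [YELLOW, BLACK, RED],
    [BLUE,   BLACK, YELLOW],
]

for row in extract_text_mask(image):
    print("".join("#" if v else "." for v in row))
```

Under the stated assumption, the colourful background drops out entirely, which is the sense in which it adds little security while still costing usability.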