Visual CAPTCHA with Handwritten Image Analysis
Amalia Rusu and Venu Govindaraju
CEDAR, University at Buffalo

Background on CAPTCHA
- Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA)
- A CAPTCHA should be automatically generated and graded
- Tests should be taken quickly and easily by human users
- Tests should accept virtually all human users and reject software agents
- Tests should resist automatic attack for many years, despite advances in technology and prior knowledge of the algorithms
- Exploits the difference in abilities between humans and machines (e.g., recognition of text, speech, or facial features)
- A new formulation of Alan Turing's test: "Can machines think?"

Securing Cyberspace Using CAPTCHA
- The user initiates the dialog and has to be authenticated by the server
- [Diagram: user, authentication server, and Internet exchanging a challenge and a response]
- [Figure: automatic authentication session for Web services: initialization, handwritten CAPTCHA challenge, user response, verification]

Objective
- Develop CAPTCHAs based on the ability gap between humans and machines in handwriting recognition, using Gestalt laws of perception

State-of-the-Art in Handwriting Recognition

                 Lexicon Driven Model            Grapheme Model
  Lexicon size   time (s)   Top 1    Top 2       time (s)   Top 1    Top 2
  10             0.027      96.53    98.73       0.021      96.56    98.77
  100            0.044      89.22    94.13       0.031      89.12    94.06
  1000           0.144      75.38    86.29       0.089      75.38    86.29
  20000          1.827      58.14    66.56       0.994      58.14    66.49

  Speed (seconds per image) and accuracy (%) of a handwriting recognizer; feature
  extraction time is excluded [Xue, Govindaraju 2002]. Testing platform: Ultra-SPARC.

H-CAPTCHA Motivation
- Machine recognition of handwriting is more difficult than recognition of machine-printed text
- Handwriting recognition is a task that humans perform easily and reliably
- Several CAPTCHAs based on machine-printed text have already been broken:
  - Greg Mori and Jitendra Malik (UC Berkeley) wrote a program that solves EZ-Gimpy with 83% accuracy
  - Thayananthan, Stenger, Torr, and Cipolla (Cambridge vision group) wrote a program that achieves a 93% correct recognition rate against EZ-Gimpy
  - Gabriel Moy, Nathan Jones, Curt Harkless, and Randy Potter (Areté Associates) wrote a program that achieves 78% accuracy against Gimpy-r
- Speech- and visual-feature-based CAPTCHAs are impractical
- H-CAPTCHAs are thus far unexplored by the research community

H-CAPTCHA Challenges
- Generating random and 'infinitely many' distinct handwritten CAPTCHAs
- Quantifying and exploiting the weaknesses of state-of-the-art handwriting recognizers and OCR systems
- Controlling distortion so that images remain human readable (conforming to Gestalt laws) but not machine readable

Generation of Random and Infinitely Many Distinct Handwritten Text Images
- Use handwritten word images that current recognizers cannot read
  - Handwritten US city name images available from postal applications
  - Collect new handwritten word samples
- Create real (or nonsense) handwritten words and sentences by gluing together isolated upper- and lower-case handwritten characters or word images
- Use a handwriting distorter to generate "human-like" samples
  - Models that change the trajectory/shape of a letter in a controlled fashion (e.g., Hollerbach's oscillation model); a minimal sketch of this idea appears below
- [Figure: original handwritten image (a) and synthetic images (b-f)]
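The following is a minimal Python sketch of the oscillation idea behind such a distorter: pen velocity is modeled as two sinusoidal components (horizontal and vertical) plus a constant rightward drift, and the trajectory is obtained by integrating the velocities. The function name and all parameter values are illustrative assumptions, not the distorter used in this work; jittering the parameters yields distinct but still legible "human-like" strokes.

```python
import numpy as np

def oscillation_trajectory(duration=2.0, dt=0.01,
                           ax=1.0, ay=1.0, wx=8.0, wy=8.0,
                           phase=np.pi / 2, drift=0.8):
    """Generate a handwriting-like pen trajectory from an oscillatory velocity model."""
    t = np.arange(0.0, duration, dt)
    vx = ax * np.sin(wx * t) + drift      # horizontal oscillation plus steady rightward sweep
    vy = ay * np.sin(wy * t + phase)      # vertical oscillation, phase-shifted
    x = np.cumsum(vx) * dt                # integrate velocity to obtain pen position
    y = np.cumsum(vy) * dt
    return x, y

# Slightly perturbing the parameters produces a new, similar variant of the same stroke:
# x2, y2 = oscillation_trajectory(ax=1.1, wy=8.3, phase=np.pi / 2 + 0.15)
```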
Exploit the Sources of Errors of State-of-the-Art Handwriting Recognizers

Word Model Recognizer (WMR) [Kim, Govindaraju 1997]: lexicon-driven model
- Chain-code-based image processing: pre-processing, segmentation, feature extraction, dynamic matching
- Uses static lexicons in the recognition process
- Computes the distance between each lexicon entry and the image; e.g., for the entry "word", the distance between the first character 'w' and image segments 1-4 is 5.0, between 'w' and segments 1-3 is 7.2, and between 'w' and segments 1-2 is 7.6
- Finds the best way of accounting for the characters 'w', 'o', 'r', 'd' by consuming all segments 1 to 8 in the process (a simplified sketch of this matching appears after the Gestalt laws overview below)
- [Figure: over-segmented word image with end points, loops, and junctions marked, and the lattice of character-to-segment distances, e.g., w[5.0], o[7.7], r[5.8], d[4.9]]

Accuscript [Xue, Govindaraju 2002]: grapheme-based model
- Grapheme-based recognizer: extracts high-level structural features from characters, such as loops, turns, junctions, and arcs, without prior segmentation
- Uses a stochastic finite-state automaton model based on the extracted features

Sources of Errors for State-of-the-Art Handwriting Recognizers
- Image quality: background noise, printing surface, writing styles
- Image features: variable stroke width, slope, rotations, stretching, compressing
- Segmentation errors: over-segmentation, merging, fragmentation, ligatures, scrawls
- Recognition errors: confusion with similar lexicon entries, large lexicons

Gestalt Laws
- Gestalt psychology is based on the observation that we often experience things that are not part of our simple sensations
- What we see is an effect of the whole event, not contained in the sum of the parts (a holistic approach)
- Organizing principles: the Gestalt laws
- By no means restricted to perception only (e.g., memory)

Gestalt Laws
1. Law of closure
2. Law of similarity, e.g., the O's are perceived as a diagonal group distinct from the X's:
     OXXXXXX
     XOXXXXX
     XXOXXXX
     XXXOXXX
     XXXXOXX
     XXXXXOX
     XXXXXXO
3. Law of proximity, e.g., the asterisks group into three rows:
     **************
     **************
     **************
4. Law of symmetry, e.g., [ ][ ][ ]
5. Law of continuity
   a) Ambiguous segmentation
   b) Segmentation based on good continuity, following the path of minimal curvature change
   c) Perceptually implausible segmentation
6. Law of familiarity
   a) Ambiguous segmentation
   b) Perceptual segmentation
   c) Segmentation based on good continuity proves to be erroneous
7. Figure and ground
8. Memory
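To make the lexicon-driven matching described for WMR concrete, here is the minimal dynamic-programming sketch referenced above: each character of a lexicon entry consumes a contiguous run of image segments, every segment must be used, and the assignment with the smallest total distance wins. The `char_segment_distance` callback and the per-character segment limit are hypothetical stand-ins, not WMR's actual scoring.

```python
from functools import lru_cache

MAX_SEGS_PER_CHAR = 4  # assumption: a character rarely spans more than 4 segments

def match_word(word: str, n_segments: int, char_segment_distance) -> float:
    """Return the minimum total distance for `word` consuming segments 0..n_segments-1."""
    @lru_cache(maxsize=None)
    def best(ci: int, si: int) -> float:
        if ci == len(word):
            # valid only if every segment has been consumed
            return 0.0 if si == n_segments else float("inf")
        score = float("inf")
        # let character `ci` consume 1..MAX_SEGS_PER_CHAR consecutive segments
        for k in range(1, MAX_SEGS_PER_CHAR + 1):
            if si + k > n_segments:
                break
            d = char_segment_distance(word[ci], si, si + k)
            score = min(score, d + best(ci + 1, si + k))
        return score
    return best(0, 0)

# The lexicon entry with the smallest total distance would be reported as the result:
# best_entry = min(lexicon, key=lambda w: match_word(w, n_segments, char_segment_distance))
```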
Control Overlaps
- Gestalt laws: proximity, symmetry, familiarity, continuity, figure and ground
- Create horizontal or vertical overlaps
- For the same word, use smaller-distance overlaps
- For different words, use larger-distance overlaps

Control Occlusions
- Gestalt laws: closure, proximity, familiarity
- Add occlusions using circles, rectangles, or lines at random angles
- Keep occlusions small enough that they do not hide any letter completely
- Add occlusions by waves running left to right across the entire image, with varying amplitude/wavelength, or rotated by an angle
- Choose areas with more foreground pixels, on the bottom part of the text image (not too low, not too high)

Control Extra Strokes
- Gestalt laws: continuity, figure and ground, familiarity
- Add occluding arcs or lines of varying thickness drawn with the same pixel value as the foreground (black) pixels
- Curved strokes can be confused with part of a character
- Use asymmetric strokes so that the pattern cannot be learned

Control Letter/Word Orientation
- Gestalt laws: memory, internal metrics, familiarity of letters
- Vertical mirror, horizontal mirror, flip-flop
- Change the orientation of the entire word, or of only a few letters
- Use variable rotation, stretching, compressing

General H-CAPTCHA Generation Algorithm
Input:
- Original (randomly selected) handwritten image: an existing US city name image, a synthetic word image 5 to 8 characters long, or a meaningful sentence
- Lexicon containing the image's truth word
Output:
- H-CAPTCHA image
Method:
1. Randomly choose a number of transformations
2. Randomly select that many transformations
3. If more than one transformation is chosen:
   - An a priori order, determined from experimental results, is assigned to each transformation
   - Sort the chosen transformations by their a priori order and apply them in sequence, so that the effect is cumulative
(A simplified code sketch of this procedure follows the testing results below.)

Testing Results on Machines

  Recognizer                        WMR                   Accuscript
  Lexicon size                      4,000     40,000      4,000     40,000
  Occlusion by circles              35.93%    20.28%      32.34%    17.37%
  Vertical Overlap                  27.88%    14.36%      12.64%     3.94%
  Horizontal Overlap (Small)        24.35%    10.70%       2.93%     0.60%
  Black Waves                       16.36%     5.33%       1.57%     0.38%
  Occlusion by waves                15.43%     7.00%      10.56%     4.28%
  Horizontal Overlap (Large)        12.93%     3.56%       2.42%     0.36%
  Overlap Different Words            3.80%     0.48%       4.43%     0.92%
  Flip-Flop                          0.46%     0.14%       0.70%     0.19%
  General Image Transformations      9.28%     N/A         4.41%     N/A

  Accuracy of the handwriting recognizers on images deformed using the Gestalt-laws
  approach; 4,127 images were tested for each type of transformation. Recognizer running
  time grows from a few seconds per image with the 4,000-entry lexicon to several minutes
  per image with the 40,000-entry lexicon.

Testing Results on Humans

  Human Tests                     No. of Tested Images    Accuracy
  All Transforms                  1069                    76.08%
  Occlusion by circles              90                    67.78%
  Vertical Overlap                  88                    87.50%
  Horizontal Overlap (Small)        90                    76.67%
  Black Waves                       90                    80.00%
  Occlusion by waves                87                    80.46%
  Horizontal Overlap (Large)        89                    65.17%

  Accuracy of human readers on images deformed using the Gestalt-laws approach. A word
  image counts as correctly recognized only when all of its characters are recognized.
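Below is the simplified sketch of the generation algorithm promised above, assuming Pillow for image handling; the two transformations shown, their a priori priorities, and all size parameters are illustrative placeholders rather than the exact transformations and settings used in the experiments reported here.

```python
import random
from PIL import Image, ImageDraw

def occlude_with_circles(img: Image.Image) -> Image.Image:
    """Draw a few small circles that partially cover strokes but never hide a whole letter."""
    out = img.copy()
    draw = ImageDraw.Draw(out)
    w, h = out.size
    for _ in range(random.randint(2, 4)):
        r = random.randint(h // 10, h // 5)                      # keep occlusions small
        x, y = random.randint(0, w - 1), random.randint(h // 3, 2 * h // 3)
        draw.ellipse((x - r, y - r, x + r, y + r), outline=0, width=2)
    return out

def mirror_horizontally(img: Image.Image) -> Image.Image:
    """Flip the word image left to right (one of the orientation-change distortions)."""
    return img.transpose(Image.FLIP_LEFT_RIGHT)

# (priority, transform) pairs; lower priority is applied first so effects accumulate
TRANSFORMS = [(1, occlude_with_circles), (2, mirror_horizontally)]

def generate_h_captcha(original: Image.Image) -> Image.Image:
    """Randomly pick transformations, order them a priori, and apply them cumulatively."""
    n = random.randint(1, len(TRANSFORMS))
    chosen = sorted(random.sample(TRANSFORMS, n), key=lambda t: t[0])
    img = original
    for _, transform in chosen:
        img = transform(img)
    return img
```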
[Chart: recognition accuracy (%) per transformation for WMR (lexicons of 4,000 and 40,000 entries), Accuscript (4,000 and 40,000 entries), and human readers]

H-CAPTCHA Evaluation
- No risk of image repetition
- Image generation is completely automated: words, images, and distortions are chosen at random
- The transformed images cannot easily be normalized or rendered noise-free by present computer programs, even though the original images must be public knowledge
- Deformed images do not pose problems to humans: human subjects succeeded on our test images
- Tested against state-of-the-art recognizers (Word Model Recognizer, Accuscript): the CAPTCHAs remain unbroken

Future Work
- Develop general methods to attack H-CAPTCHA (e.g., pre- and post-processing techniques)
- Research lexicon-free approaches to handwriting recognition
- Quantify the gap between humans and machines in reading handwriting by category (of distortions and Gestalt laws)
- Parameterize the difficulty levels of Gestalt-based H-CAPTCHAs

Thank You
Questions?