Visual CAPTCHA with Handwritten Image Analysis
Amalia Rusu and Venu Govindaraju
CEDAR, University at Buffalo
Background on CAPTCHA
 Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA)
 CAPTCHAs should be automatically generated and graded
 Tests should be taken quickly and easily by human users
 Tests should accept virtually all human users and reject software agents
 Tests should resist automatic attack for many years despite advances in technology and prior knowledge of the algorithms
 Exploits the difference in abilities between humans and machines (e.g., recognition of text, speech, or facial features)
 A new formulation of Alan Turing's test: "Can machines think?"
Securing Cyberspace Using CAPTCHA
The user initiates the dialog and has to be authenticated by the server.
[Diagram: User <-> Internet <-> Authentication Server, exchanging Challenge and Response messages for user authentication.]
Automatic Authentication Session for Web Services:
1. Initialization
2. Handwritten CAPTCHA Challenge
3. User Response
4. Verification
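The four-step session above can be sketched as a minimal server-side protocol. This is an illustrative sketch, not the authors' implementation; the class name, the image bank, and the single-use token policy are assumptions added here.

```python
import secrets

class CaptchaServer:
    """Hypothetical sketch of the authentication session: the server
    issues a handwritten CAPTCHA challenge, remembers the truth word
    keyed by a session token, and grades the user's transcription."""

    def __init__(self, image_bank):
        # image_bank: mapping of truth word -> distorted handwritten image
        self.image_bank = image_bank
        self.sessions = {}

    def initialize(self):
        """Steps 1-2: start a session and issue a challenge image."""
        token = secrets.token_hex(8)
        word = secrets.choice(sorted(self.image_bank))
        self.sessions[token] = word
        return token, self.image_bank[word]

    def verify(self, token, response):
        """Step 4: grade the response; tokens are single-use."""
        truth = self.sessions.pop(token, None)
        return truth is not None and response.strip().lower() == truth

bank = {"buffalo": "<image bytes>", "amherst": "<image bytes>"}
server = CaptchaServer(bank)
token, challenge = server.initialize()
# Step 3: the (human) user reads the challenge image and types the word back.
```

Making tokens single-use prevents a software agent from replaying a solved challenge.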
Objective
Develop CAPTCHAs based on the ability gap between humans
and machines in handwriting recognition using Gestalt laws of perception
State-of-the-art in Handwriting Recognition (HR)

Lexicon size | Lexicon Driven: time (s) | Top 1 (%) | Top 2 (%) | Grapheme Model: time (s) | Top 1 (%) | Top 2 (%)
10           | 0.027                    | 96.53     | 98.73     | 0.021                    | 96.56     | 98.77
100          | 0.044                    | 89.22     | 94.13     | 0.031                    | 89.12     | 94.06
1,000        | 0.144                    | 75.38     | 86.29     | 0.089                    | 75.38     | 86.29
20,000       | 1.827                    | 58.14     | 66.56     | 0.994                    | 58.14     | 66.49

Speed and accuracy of handwriting recognizers; feature extraction time is excluded [Xue, Govindaraju 2002]. Testing platform is an Ultra-SPARC.
H-CAPTCHA Motivation
 Machine recognition of handwriting is more difficult than that of printed text
 Handwriting recognition is a task that humans perform easily and reliably
 Several CAPTCHAs based on machine-printed text have already been broken
 Greg Mori and Jitendra Malik of UC Berkeley have written a program that solves Ez-Gimpy with 83% accuracy
 Thayananthan, Stenger, Torr, and Cipolla of the Cambridge vision group have written a program that achieves a 93% correct recognition rate against Ez-Gimpy
 Gabriel Moy, Nathan Jones, Curt Harkless, and Randy Potter of Areté Associates have written a program that achieves 78% accuracy against Gimpy-R
 CAPTCHAs based on speech or visual features are impractical
 H-CAPTCHAs are thus far unexplored by the research community
H-CAPTCHA Challenges
 Generation of random and 'infinitely many' distinct handwritten CAPTCHAs
 Quantifying and exploiting the weaknesses of state-of-the-art handwriting recognizers and OCR systems
 Controlling distortion, so that images are human readable (conform to Gestalt laws) but not machine readable
Generation of random and infinitely many distinct handwritten text images
 Use handwritten word images that current recognizers cannot read
 Handwritten US city name images available from postal applications
 Collect new handwritten word samples
 Create real (or nonsense) handwritten words and sentences by gluing together isolated upper- and lower-case handwritten characters or word images
Generation of random and infinitely many distinct handwritten text images
 Use a handwriting distorter to generate "human-like" samples
 Models that change the trajectory/shape of the letter in a controlled fashion (e.g., Hollerbach's oscillation model)
Original handwritten image (a); synthetic images (b-f).
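Hollerbach's oscillation model treats handwriting as two coupled oscillations superimposed on a constant horizontal drift; perturbing the amplitudes, frequencies, and drift within a small band produces distinct but still legible variants. The sketch below is illustrative only: the parameter names and the 10% jitter band are assumptions, not values from the slides.

```python
import math
import random

def oscillation_trajectory(n_points=200, ax=1.0, ay=1.0,
                           wx=2.0, wy=2.0, phase=math.pi / 2,
                           drift=0.3, dt=0.05):
    """Sample a stroke from Hollerbach-style coupled oscillations:
    x(t) = ax*sin(wx*t) + drift*t,  y(t) = ay*sin(wy*t + phase)."""
    pts = []
    for i in range(n_points):
        t = i * dt
        x = ax * math.sin(wx * t) + drift * t
        y = ay * math.sin(wy * t + phase)
        pts.append((x, y))
    return pts

def distorted_variant(seed_params, jitter=0.1, rnd=None):
    """Perturb each model parameter by at most +/- jitter (relative),
    keeping the trajectory human-like but distinct."""
    rnd = rnd or random.Random(42)
    return {k: v * (1 + rnd.uniform(-jitter, jitter))
            for k, v in seed_params.items()}

base = dict(ax=1.0, ay=1.0, wx=2.0, wy=2.0, drift=0.3)
variant = distorted_variant(base)
stroke = oscillation_trajectory(**variant)
```

Each call to `distorted_variant` with a fresh random stream yields another synthetic sample of the same underlying letter shape.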
Exploit the Source of Errors for State-of-the-art Handwriting Recognizers
 Word Model Recognizer (WMR) [Kim, Govindaraju 1997] - Lexicon Driven Model
- lexicon driven approach
- chain code based image processing
- pre-processing, segmentation, feature extraction
- dynamic matching: finds the best way of accounting for the characters 'w', 'o', 'r', 'd' of a lexicon entry by consuming all image segments 1 to 8 in the process
 Accuscript [Xue, Govindaraju 2002] - Grapheme Based Model
- extracts high-level structural features from characters, such as loops, turns, junctions, and arcs, without prior segmentation
- uses a stochastic finite state automata model based on the extracted features
- uses static lexicons in the recognition process
[Figure: matching the lexicon entry 'word' against an image split into segments 1-9, with loops, junctions, and end points marked. For example, the distance between the first character 'w' and segments 1 and 4 is 5.0, between segments 1 and 3 it is 7.2, and between segments 1 and 2 it is 7.6; per-character distances such as w[5.0], o[6.0], r[3.8], d[4.4] accumulate along the best matching path.]
Source of Errors for State-of-the-art Handwriting
Recognizers
 Image quality
Background noise, printing surface, writing styles
 Image features
Variable stroke width, slope, rotations, stretching, compressing
 Segmentation errors
Over-segmentation, merging, fragmentation, ligatures, scrawls
 Recognition errors
Confusion with similar lexicon entries, large lexicons
Gestalt Laws
 Gestalt psychology is based on the observation that we often
experience things that are not a part of our simple sensations
 What we are seeing is an effect of the whole event, not contained
in the sum of the parts (holistic approach)
 Organizing principles: Gestalt Laws
 By no means restricted to perception (it also applies to memory, for example)
Gestalt Laws
1. Law of closure
2. Law of similarity
OXXXXXX
XOXXXXX
XXOXXXX
XXXOXXX
XXXXOXX
XXXXXOX
XXXXXXO
3. Law of proximity
**************
**************
**************
4. Law of symmetry
[ ][ ][ ]
Gestalt Laws
5. Law of continuity
a) Ambiguous segmentation
b) Segmentation based on good continuity, follows the path of minimal curvature change
c) Perceptually implausible segmentation
6. Law of familiarity
a) Ambiguous segmentation
b) Perceptual segmentation
c) Segmentation based on good continuity proves to be erroneous
Gestalt Laws
7. Figure and ground
8. Memory
Control Overlaps
Gestalt laws: proximity, symmetry, familiarity, continuity, figure and ground
Create horizontal or vertical overlaps
For the same word, use smaller-distance overlaps; for different words, use bigger-distance overlaps
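A minimal sketch of the horizontal-overlap idea, treating word images as binary pixel grids (1 = ink). The image representation and function name are assumptions for illustration; the point is only that a smaller offset increases the overlap between the two words.

```python
def overlap_horizontal(img_a, img_b, offset):
    """Overlay img_b onto img_a shifted right by `offset` columns.
    Images are lists of equal-length rows of 0/1; an offset smaller
    than img_a's width makes the two words overlap."""
    height = max(len(img_a), len(img_b))
    width = max(len(img_a[0]), offset + len(img_b[0]))
    out = [[0] * width for _ in range(height)]
    for r, row in enumerate(img_a):
        for c, px in enumerate(row):
            out[r][c] |= px
    for r, row in enumerate(img_b):
        for c, px in enumerate(row):
            out[r][offset + c] |= px
    return out

# Two tiny stand-in "word" images; per the slide, use a small offset
# for the same word and a larger one for different words.
a = [[1, 1, 1],
     [1, 0, 1]]
b = [[1, 0],
     [1, 1]]
tight = overlap_horizontal(a, b, offset=1)   # heavy overlap
loose = overlap_horizontal(a, b, offset=3)   # words barely touch
```

Vertical overlap is the same operation with rows and columns swapped.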
Control Occlusions
Gestalt laws: closure, proximity, familiarity
Add occlusions by circles, rectangles, lines with random angles
Ensure occlusions are small enough that they do not hide any letter completely
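A sketch of circle occlusion on a binary pixel grid. It is an assumption here that the occluding disc is painted in the background color (erasing ink), so the law of closure lets humans fill in the gap; the slides do not specify the fill color.

```python
def occlude_circle(img, cx, cy, radius):
    """Set pixels inside a disc to background (0), simulating an
    occlusion. Keep `radius` well below the letter height so no
    letter is hidden completely."""
    out = [row[:] for row in img]  # leave the input image untouched
    for r in range(len(out)):
        for c in range(len(out[0])):
            if (r - cy) ** 2 + (c - cx) ** 2 <= radius ** 2:
                out[r][c] = 0
    return out

# All-ink test image; a radius-1 disc at the center knocks out a
# small cross of pixels without erasing whole "letters".
img = [[1] * 7 for _ in range(7)]
occluded = occlude_circle(img, cx=3, cy=3, radius=1)
```

Rectangles and lines work the same way with a different membership test for the occluding shape.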
Control Occlusions
Gestalt laws: closure, proximity, familiarity
Add occlusions by waves running left to right across the entire image, with various amplitudes/wavelengths, or rotate them by an angle
Choose areas with more foreground pixels, in the bottom part of the text image (not too low, not too high)
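The wave occlusion can be sketched as erasing pixels along a sine curve across the image. The function name and the single-pixel thickness default are illustrative assumptions; the baseline would be chosen in the ink-dense lower part of the text, as the slide suggests.

```python
import math

def occlude_wave(img, amplitude, wavelength, baseline, thickness=1):
    """Erase pixels along a sine wave running left to right across a
    binary image (1 = ink). `baseline` is the wave's mean row."""
    out = [row[:] for row in img]
    height, width = len(out), len(out[0])
    for c in range(width):
        y = baseline + amplitude * math.sin(2 * math.pi * c / wavelength)
        for dy in range(thickness):
            r = int(y) + dy
            if 0 <= r < height:
                out[r][c] = 0
    return out

# One full wavelength across an 8-column all-ink image: exactly one
# pixel per column is erased, all within the middle rows.
img = [[1] * 8 for _ in range(6)]
waved = occlude_wave(img, amplitude=1, wavelength=8, baseline=3)
```

Rotating the wave or varying amplitude and wavelength per image keeps the pattern from being learned.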
Control Extra Strokes
Gestalt laws: continuity, figure and ground, familiarity
Add occlusions using the same pixels as the foreground pixels (black pixels): arcs or lines with various thicknesses
Curved strokes can be confused with part of a character
Use asymmetric strokes so that the pattern cannot be learned
Control Letter/Word Orientation
Gestalt laws: memory, internal metrics, familiarity of letters
vertical mirror
horizontal mirror
flip-flop
Change word orientation entirely, or the orientation for few letters only
Use variable rotation, stretching, compressing
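The orientation transforms can be sketched as simple reflections on binary pixel grids. The slides do not define "flip-flop" precisely; here it is assumed, for illustration, to mean mirroring a random subset of the letters while leaving the rest intact.

```python
import random

def mirror_vertical(img):
    """Vertical mirror: reflect each row left <-> right."""
    return [row[::-1] for row in img]

def mirror_horizontal(img):
    """Horizontal mirror: reflect the rows top <-> bottom."""
    return img[::-1]

def flip_flop(glyphs, rnd):
    """Assumed flip-flop rule: mirror roughly half of the per-letter
    glyphs at random; memory/familiarity lets humans still read it."""
    return [mirror_vertical(g) if rnd.random() < 0.5 else g
            for g in glyphs]

# Two stand-in letter glyphs for a short "word".
word_glyphs = [[[1, 0], [1, 1]],
               [[1, 1], [0, 1]]]
mixed = flip_flop(word_glyphs, random.Random(7))
```

Rotation, stretching, and compressing would be further per-glyph transforms applied the same way.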
General H-CAPTCHA Generation Algorithm
Input.
 Original (randomly selected) handwritten image (existing US city name
image or synthetic word image with length 5 to 8 characters or meaningful
sentence)
 Lexicon containing the image’s truth word
Output.
 H-CAPTCHA image
Method.
 Randomly choose a number of transformations
 Randomly select the transformations corresponding to the given number
 If more than one transformation is chosen, then
 An a priori order is assigned to each transformation based on experimental results
 Sort the list of chosen transformations by their a priori order and apply them in sequence, so that the effect is cumulative
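The method above can be sketched as follows. The transformation names and the a priori ordering values are placeholders; the slides state only that the real ordering comes from experimental results.

```python
import random

# Hypothetical a priori order (lower = applied earlier); the actual
# ordering in the paper is derived from experiments.
A_PRIORI_ORDER = {
    "overlap": 1,
    "occlude_circles": 2,
    "occlude_waves": 3,
    "extra_strokes": 4,
    "flip_flop": 5,
}

def generate_h_captcha(image, transforms, rnd=None):
    """Pick a random number of transformations, sort the chosen ones
    by a priori order, and apply them in sequence so the effect is
    cumulative. `transforms`: mapping name -> function(image) -> image."""
    rnd = rnd or random.Random()
    k = rnd.randint(1, len(transforms))
    chosen = rnd.sample(sorted(transforms), k)
    chosen = sorted(chosen, key=A_PRIORI_ORDER.__getitem__)
    for name in chosen:
        image = transforms[name](image)
    return image, chosen

# Demo with string stand-ins for images: each transform just tags the
# "image" with its name, so the application order is visible.
demo_transforms = {name: (lambda n: lambda img: img + "+" + n)(name)
                   for name in A_PRIORI_ORDER}
result, applied = generate_h_captcha("base_image", demo_transforms,
                                     random.Random(1))
```

In a real generator each entry of `transforms` would be one of the Gestalt-based distortions from the previous slides.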
Testing Results on Machines

Transformation                | WMR (4,000) | WMR (40,000) | Accuscript (4,000) | Accuscript (40,000)
Occlusion by circles          | 35.93%      | 20.28%       | 32.34%             | 17.37%
Vertical Overlap              | 27.88%      | 14.36%       | 12.64%             | 3.94%
Horizontal Overlap (Small)    | 24.35%      | 10.70%       | 2.93%              | 0.60%
Black Waves                   | 16.36%      | 5.33%        | 1.57%              | 0.38%
Occlusion by waves            | 15.43%      | 7.00%        | 10.56%             | 4.28%
Horizontal Overlap (Large)    | 12.93%      | 3.56%        | 2.42%              | 0.36%
Overlap Different Words       | 3.80%       | 0.48%        | 4.43%              | 0.92%
Flip-Flop                     | 0.46%       | 0.14%        | 0.70%              | 0.19%
General Image Transformations | 9.28%       | N/A          | 4.41%              | N/A

The accuracy of HR on images deformed using the Gestalt laws approach. The number of tested images is 4,127 for each type of transformation. HR running time increases from a few seconds per image for the 4,000-word lexicon to several minutes per image for the 40,000-word lexicon.
Testing Results on Humans

Human Tests                | Nr. of Tested Images | Accuracy
All Transforms             | 1,069                | 76.08%
Occlusion by circles       | 90                   | 67.78%
Vertical Overlap           | 88                   | 87.50%
Horizontal Overlap (Small) | 90                   | 76.67%
Black Waves                | 90                   | 80.00%
Occlusion by waves         | 87                   | 80.46%
Horizontal Overlap (Large) | 89                   | 65.17%

The accuracy of human readers on images deformed using the Gestalt laws approach. A word image is recognized correctly when all characters are recognized.
[Chart: recognition accuracy (%) per transformation for WMR (lexicon sizes 4,000 and 40,000), Accuscript (4,000 and 40,000), and humans.]
H-CAPTCHA Evaluation
 No risk of image repetition
 Image generation completely automated: words, images and distortions
chosen at random
 The transformed images cannot be easily normalized or rendered noise free by present computer programs, even though the original images are public knowledge
 Deformed images do not pose problems to humans
 Human subjects succeeded on our test images
 Test against state-of-the-art: Word Model Recognizer, Accuscript
 CAPTCHAs unbroken by state-of-the-art recognizers
Future Work
Develop general methods to attack H-CAPTCHAs (e.g., pre- and post-processing techniques)
Research lexicon free approaches for handwriting recognition
Quantify the gap between humans and machines in reading
handwriting by category (of distortions & Gestalt laws)
Parameterize the difficulty levels of Gestalt based H-CAPTCHAs
Thank You
Questions?