Breaking Visual CAPTCHAs with Naïve Pattern Recognition

Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms
Chapter 1
Introduction
CAPTCHA is short for Completely Automated Public Turing test to tell
Computers and Humans Apart. The term "CAPTCHA" was coined in 2000 by Luis von
Ahn, Manuel Blum and Nicholas J. Hopper [1]. CAPTCHAs are challenge-response tests to ensure that
the users are indeed human. The purpose of a CAPTCHA is to block form submissions from
spam bots – automated scripts that harvest email addresses from publicly available web
forms. A common kind of CAPTCHA used on most websites requires the users to enter the
string of characters that appear in a distorted form on the screen.
CAPTCHAs work because it is difficult for computers to extract
the text from such a distorted image, whereas it is relatively easy for a human to understand
the text hidden behind the distortions. Therefore, the correct response to a CAPTCHA
challenge is assumed to come from a human and the user is permitted into the website. A
test that can tell humans and computers apart is needed because some people try
to game the system: they want to exploit weaknesses in the computers running the site.
While these individuals probably make up a minority of all the people on the Internet, their
actions can affect millions of users and Web sites. For example, a free e-mail service might
find itself bombarded by account requests from an automated program. That automated
program could be part of a larger attempt to send out spam mail to millions of people. The
CAPTCHA test helps identify which users are real human beings and which ones are
computer programs [2].
Spammers are constantly trying to build algorithms that read the distorted text
correctly, so strong CAPTCHAs have to be designed and built to thwart the efforts of the
spammers.
The easiest way to add a CAPTCHA to a website is to insert a few
lines of CAPTCHA code into the website’s HTML code, from an open source CAPTCHA
builder, which will provide the authentication services remotely. Most such services are free.
Popular among them are the reCAPTCHA project at www.captcha.net and
Captchaservice.org.
Dept. of CSE, JNNCE.
1.1 Motivation:
The proliferation of the publicly available services on the web is a boon for the
community at large. But unfortunately it has invited new and novel abuses. Programs (bots
and spiders) are being created to steal services and to conduct fraudulent transactions.
Some examples:
• Free online accounts are being registered automatically many times and are being
used to distribute stolen or copyrighted material.
• Recommendation systems are vulnerable to artificial inflation or deflation of
rankings. For example, eBay, a well-known auction website, allows users to rate a
product. Abusers can easily create bots that increase or decrease the rating
of a specific product, possibly changing people’s perception of the product.
• Spammers register free email accounts such as those provided by
Gmail or Hotmail and use their bots to send unsolicited mail to other users of that
email service.
• Online polls are attacked by bots and are susceptible to ballot stuffing. This gives
an unfair advantage to those that benefit from it.
In light of the abuses listed above, and many more, there is a need for a facility that
checks users and allows only human users to access services. CAPTCHA was created
to meet this need.
1.2 Background:
The need for CAPTCHAs arose from the abuse of websites and search engines by bots. In
1997, AltaVista sought ways to block and discourage the automatic submission of URLs
to its search engine. Andrei Broder, Chief Scientist of AltaVista, and his colleagues
developed a filter. Their method was to randomly generate printed text that humans
could read but machine readers could not. Their approach was so effective that within a year “spam add-ons” were reduced by 95%, and a patent was issued in 2001.
In 2000, Yahoo’s popular Messenger chat service was hit by bots that posted
advertising links, annoying the human users of chat rooms. Yahoo, along with Carnegie Mellon
University, developed a CAPTCHA called EZ-GIMPY, which chose a dictionary word
randomly and distorted it with a wide variety of image occlusions and asked the user to input
the distorted word.
In November 1999, slashdot.com released a poll to vote for the best CS college in the
US. Students from the Carnegie Mellon University and the Massachusetts Institute of
Technology created bots that repeatedly voted for their respective colleges. This incident
motivated the use of CAPTCHAs for such online polls to ensure that only human users are
able to take part in the polls.
1.3 CAPTCHAs and the Turing Test:
CAPTCHA technology has its foundation in an experiment called the Turing Test.
Alan Turing, sometimes called the father of modern computing, proposed the test as a way to
examine whether or not machines can think -- or appear to think -- like humans [3]. The
classic test is a game of imitation. In this game, an interrogator asks two participants a series
of questions. One of the participants is a machine and the other is a human. The interrogator
can't see or hear the participants and has no way of knowing which is which. If the
interrogator is unable to figure out which participant is a machine based on the responses, the
machine passes the Turing Test.
Of course, with a CAPTCHA, the goal is to create a test that humans can pass easily
but machines can't. It's also important that the CAPTCHA application is able to present
different CAPTCHAs to different users. If a visual CAPTCHA presented a static image that
is the same for every user, it wouldn't take long before a spammer spotted the form,
deciphered the letters, and programmed an application to type in the correct answer
automatically.
Most, but not all, CAPTCHAs rely on a visual test. Computers lack the sophistication
that human beings have when it comes to processing visual data. Humans can look at an
image and pick out patterns more easily than a computer. But not all CAPTCHAs rely on
visual patterns [10]. In fact, it's important to have an alternative to a visual CAPTCHA.
Otherwise, the Web site administrator runs the risk of disenfranchising any web user who has
a visual impairment. One alternative to a visual test is an audible one. An audio CAPTCHA
usually presents the user with a series of spoken letters or numbers. It's not unusual for the
program to distort the speaker's voice, and it's also common for the program to include
background noise in the recording. This helps thwart voice recognition programs.
Another option is to create a CAPTCHA that asks the reader to interpret a short
passage of text. A contextual CAPTCHA quizzes the reader and tests comprehension skills.
While computer programs can pick out key words in text passages, they aren't very good at
understanding what those words actually mean.
CAPTCHAs are by definition fully automated, requiring little human maintenance or
intervention in administering the test. This has obvious benefits in cost and reliability.
Modern text-based CAPTCHAs are designed in such a way that they require the
simultaneous use of three separate abilities (invariant recognition, segmentation, and
parsing) to correctly complete the task with any consistency.
Invariant recognition refers to the ability to recognize letters despite the large amount of variation in
their shapes. There is a nearly infinite number of versions of each character that a
human brain can successfully identify. The same is not true for a computer, and teaching it to
recognize all those differing formations is an extremely challenging task. Segmentation,
the ability to separate one letter from another, is also made difficult in CAPTCHAs, as
characters are crowded together with no white space in between. Lastly, parsing the text in context is also
critical. A complete holistic view of the CAPTCHA must be taken to correctly identify each
character. For example, in one segment of a CAPTCHA, a letter might look like an “m.” It is
only when the whole word is taken into consideration that it becomes clear that it is actually a
“u” and an “n.” Each of these problems poses a significant challenge for a computer even in
isolation. The presence of all three at the same time is what makes CAPTCHAs so difficult
to solve.
There are various CAPTCHAs that would be insecure if a significant number of sites
started using them. An example of such a puzzle is a text-based question, such as a
mathematical question ("what is 1+1"). Since a parser could easily be written that would
allow bots to bypass this test, such "CAPTCHAs" rely on the fact that few sites use them, and
thus that a bot author has no incentive to program their bot to solve that challenge. True
CAPTCHAs should be secure even after a significant number of websites adopt them [2].
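To illustrate how little effort such a parser takes, the following is a minimal sketch in Python. The question format ("what is <number> <operator> <number>") and the function name are assumptions made for illustration; they are not taken from any particular site.

```python
import operator
import re

# Hypothetical question format: "what is <int> <op> <int>", as in the example above.
_OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def solve_arithmetic_question(question):
    """Return the integer answer, or None if the question has another shape."""
    match = re.search(r"(\d+)\s*([+\-*])\s*(\d+)", question)
    if match is None:
        return None
    left, symbol, right = match.groups()
    return _OPERATIONS[symbol](int(left), int(right))


print(solve_arithmetic_question("what is 1+1"))  # 2
```

A bot author would only need a handful of such patterns to cover the question templates of a widely deployed scheme.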
The introduction, motivation and previous work in the field of CAPTCHAs were
described in this chapter of the report. Chapters 2 and 3 describe various types of CAPTCHA
and a scheme for breaking CAPTCHAs.
Chapter 2
Types of CAPTCHAs
In this section, various classifications of CAPTCHAs are described. CAPTCHAs are
classified based on what is distorted and presented as a challenge to the user. They are:
2.1 Text CAPTCHAs:
These are simple to implement. The simplest yet novel approach is to present the user
with some questions which only a human user can solve [6]. Examples of such questions are:
1. What is twenty minus three?
2. What is the third letter in UNIVERSITY?
3. Which of Yellow, Thursday and Richard is a colour?
4. If yesterday is a Sunday, what is today?
Such questions are very easy for a human user to solve, but it is very difficult to
program a computer to solve them. They are also friendly to people with visual disabilities,
such as those with color blindness.
Other text CAPTCHAs involve text distortions, and the user is asked to identify the
hidden text. The various implementations are:
2.1.1 Gimpy:
Gimpy is a very reliable text CAPTCHA built by CMU in collaboration with Yahoo
for their Messenger service. Gimpy is based on the human ability to read extremely distorted
text and the inability of computer programs to do the same. Gimpy works by choosing ten
words randomly from a dictionary, and displaying them in a distorted and overlapped
manner. Gimpy then asks the users to enter a subset of the words in the image. The human
user is capable of identifying the words correctly, whereas a computer program cannot. Fig.
2.1 depicts the Gimpy CAPTCHA used in Yahoo Messenger.
Fig 2.1 Gimpy CAPTCHA
2.1.2 Ez-Gimpy:
This is a simplified version of the Gimpy CAPTCHA, adopted by Yahoo on their
signup page. Ez-Gimpy randomly picks a single word from a dictionary and applies
distortion to the text. The user is then asked to identify the text correctly. Fig. 2.2 shows the
Ez-Gimpy CAPTCHA used by Yahoo on the signup page.
Fig 2.2 Yahoo’s Ez-Gimpy CAPTCHA
2.1.3 BaffleText:
BaffleText was developed by Henry Baird at the University of California at Berkeley. It
is a variation of Gimpy. It does not use dictionary words; instead it picks random
letters to create nonsense but pronounceable text. Distortions are then added to this text
and the user is challenged to guess the right word. This technique overcomes a drawback of
Gimpy: because Gimpy uses dictionary words, clever bots could be
designed to check the dictionary for the matching word by brute force. Fig. 2.3 is an
example of BaffleText.
Fig 2.3 BaffleText example (image text: “finans”)
2.1.4 MSN Captcha:
Microsoft uses a different CAPTCHA for services provided under the MSN
umbrella. These are popularly called MSN Passport CAPTCHAs. They use eight characters
(upper-case letters and digits). The foreground is dark blue and the background is grey, as depicted in
Fig. 2.4. Warping is used to distort the characters and produce a ripple effect, which makes
computer recognition very difficult.
Fig 2.4 MSN Passport CAPTCHA (image text: “XTNM5YRE”, “L9D28229B”)
2.2 Graphic CAPTCHAs:
Graphic CAPTCHAs are challenges that involve pictures or objects that have some sort of
similarity that the users have to guess [6]. They are visual puzzles, similar to Mensa tests.
The computer generates the puzzles and grades the answers, but is itself unable to solve them.
2.2.1 Bongo:
BONGO is named after M.M. Bongard, who published a book of pattern recognition
problems in the 1970s. BONGO asks the user to solve a visual pattern recognition problem. It
displays two series of blocks, the left and the right. The blocks in the left series differ from
those in the right, and the user must find the characteristic that sets them apart. A possible left
and right series is shown in Fig 2.5.
Fig 2.5 Bongo CAPTCHA
These two sets are different because everything on the left is drawn with thick lines and everything
on the right with thin lines. After seeing the two series, the user is presented with a set of
four single blocks and is asked to determine to which series each block belongs. The
user passes the test if s/he correctly determines to which series the blocks belong.
The programmer should take care that the user is not confused by a large number of
choices.
2.2.2 PIX:
PIX is a program that has a large database of labeled images. All of these images are
pictures of concrete objects (a horse, a table, a house, a flower). The program picks an object
at random, finds six images of that object from its database, presents them to the user and
then asks the question “what are these pictures of?” Current computer programs should not be
able to answer this question, so PIX should be a CAPTCHA. However, PIX, as stated, is not
a CAPTCHA: it is very easy to write a program that can answer the question “what are these
pictures of?” Remember that all the code and data of a CAPTCHA should be publicly
available; in particular, the image database that PIX uses should be public. Hence, writing a
program that can answer the question “what are these pictures of?” is easy: search the
database for the images presented and find their label. Fortunately, this can be fixed. One way
for PIX to become a CAPTCHA is to randomly distort the images before presenting them to
the user, so that computer programs cannot easily search the database for the undistorted
image. Fig 2.6 shows an example of the PIX CAPTCHA.
Fig 2.6 PIX CAPTCHA
2.3 Audio CAPTCHAs:
This type of CAPTCHA is based on sound. The program picks a word or a
sequence of numbers at random, renders the word or the numbers into a sound clip and
distorts the sound clip; it then presents the distorted sound clip to the user and asks the user to
enter its contents [7]. This CAPTCHA is based on the difference in ability between humans
and computers in recognizing spoken language. Nancy Chan of the City University of Hong
Kong was the first to implement a sound-based system of this type. The idea is that a human is
able to efficiently disregard the distortion and interpret the characters being read out, while
software would struggle with the distortion being applied and would need to be effective at
speech-to-text translation in order to be successful. This is a crude way to filter humans and it is not
so popular because the user has to understand the language and the accent in which the sound
clip is recorded. Fig 2.7 depicts the audio CAPTCHA.
Fig 2.7 Audio CAPTCHA
2.4 reCAPTCHA and book digitization:
To counter various drawbacks of the existing implementations, researchers at CMU
developed a redesigned CAPTCHA aptly called the reCAPTCHA. About 200 million
CAPTCHAs are solved by humans around the world every day. In each case, roughly ten
seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate
these little puzzles consume more than 150,000 hours of work each day (200 million × 10 seconds is about 555,000 hours). What if one could
make positive use of this human effort? reCAPTCHA does exactly that by channeling the
effort spent solving CAPTCHAs online into "reading" books.
To archive human knowledge and to make information more accessible to the world,
multiple projects are currently digitizing physical books that were written before the
computer age. The book pages are being photographically scanned, and then transformed into
text using "Optical Character Recognition" (OCR). The transformation into text is useful
because scanning a book produces images, which are difficult to store on small devices,
expensive to download, and cannot be searched. The problem is that OCR is not perfect.
reCAPTCHA improves the process of digitizing books by sending words that cannot
be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More
specifically, each word that cannot be read correctly by OCR is placed on an image and used
as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot
be read correctly.
Fig 2.8 reCAPTCHA
But if a computer can't read such a CAPTCHA, how does the system know the correct
answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is
given to a user in conjunction with another word for which the answer is already known as
shown in the Fig 2.8. The user is then asked to read both words. If they solve the one for
which the answer is known, the system assumes their answer is correct for the new one. The
system then gives the new image to a number of other people to determine, with higher
confidence, whether the original answer is correct. Currently, reCAPTCHA is employed in
digitizing books as part of the Google Books Project.
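The pairing of a known control word with an unknown word can be sketched as follows. This is only an illustration of the idea described above, not reCAPTCHA's actual implementation: the function names, the case-insensitive comparison and the vote threshold of three are all assumptions.

```python
from collections import Counter


def check_response(known_answer, typed_known, typed_unknown, votes):
    """Accept the user if the control word matches; if so, record a vote
    for the user's reading of the unknown word."""
    if typed_known.strip().lower() != known_answer.lower():
        return False
    votes[typed_unknown.strip().lower()] += 1
    return True


def resolved_word(votes, min_votes=3):
    """Return the most common reading once it has min_votes votes
    (an assumed threshold), otherwise None."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


votes = Counter()
for control, unknown in [("Morning", "upon"), ("morning", "upon"), ("morning", "upon")]:
    check_response("morning", control, unknown, votes)
print(resolved_word(votes))  # upon
```

Only users who pass the control word contribute a vote, so a bot that cannot read the known word cannot pollute the transcription of the unknown one.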
Currently, there are mainly two kinds of methods to implement the CAPTCHA
mechanism: the OCR (Optical Character Recognition) visual method and the non-OCR visual method.
CAPTCHAs based on the OCR visual method have advantages in overcoming language
barriers, security and ease of use, and have become the most widely used kind. However, with
the fast development of OCR technology based on neural networks, as well as the emergence
of a variety of character segmentation techniques, the CAPTCHAs of many websites have been
attacked. Though there are many different specific implementations of the non-OCR
visual method, it eventually comes down to a general recognition problem, requiring users to
identify images. It is not so widely used: up to now, apart from some research sites, commercial
sites rarely use it. The non-OCR visual method is designed for special occasions and certain user
groups, and thus has very limited applications.
Chapter 3
Breaking Visual CAPTCHAs with Naïve Pattern
Recognition Algorithms
Visual CAPTCHAs have been widely used across the Internet to defend against
undesirable or malicious bot programs. In this chapter, methods to break the visual schemes
provided at Captchaservice.org, a publicly available web service for CAPTCHA generation, are
documented. These schemes were effectively resistant to attacks conducted using a high-quality Optical Character Recognition program, but were broken with a near 100% success
rate by the novel attacks [3].
Captchaservice.org is the first web service that is designed for the sole purpose of
generating CAPTCHA challenges. It supports multiple visual and non-visual CAPTCHA
schemes. Using the API provided by this service, people can obtain a chosen type of
CAPTCHA challenge generated on the fly to protect their blogs from comment spam attacks,
or to defend against other types of bots.
Captchaservice.org supports the following four visual schemes:
• word_image: In this scheme, a challenge is a distorted image of a six-letter word.
• random_letters_image: A challenge is implemented as a distorted image of a random six-letter sequence.
• user_string_image: A challenge is a distorted image of a user-supplied string of at most 15
characters.
• number_puzzle_text_image: This is a multi-modal scheme, which includes a distorted image
of a random number, as well as a textual description of a puzzle involving the number. A user
can solve such a challenge either by recognising the number in the image, or by solving the
textual puzzle. The advantage of such a multimodal scheme is mainly to improve its usability
and accessibility.
All these schemes use a random_shear distortion technique, which [4] describes, thus:
“the initial image of text is distorted by randomly shearing it both vertically and horizontally.
That is, the pixels in each column of the image are translated up or down by an amount that
varies randomly yet smoothly from one column to the next. Then the same kind of translation
is applied to each row of pixels (with a smaller amount of translation on average)”.
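The quoted description can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code used by Captchaservice.org: the image is assumed to be a list of rows of 0/1 pixels, the "smooth" variation is approximated by a clamped random walk, and the shift limits are made-up parameters.

```python
import random


def smooth_offsets(n, max_shift, rng):
    """Random walk with steps of -1, 0 or +1, clamped to [-max_shift, max_shift],
    so neighbouring columns get similar offsets."""
    offsets, current = [], 0
    for _ in range(n):
        current = max(-max_shift, min(max_shift, current + rng.choice((-1, 0, 1))))
        offsets.append(current)
    return offsets


def shear_columns(image, max_shift, rng):
    """Translate each column up or down; the canvas is padded so no pixel is lost."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height + 2 * max_shift)]
    for x, dy in enumerate(smooth_offsets(width, max_shift, rng)):
        for y in range(height):
            out[y + max_shift + dy][x] = image[y][x]
    return out


def transpose(image):
    return [list(row) for row in zip(*image)]


def random_shear(image, col_shift=3, row_shift=1, seed=None):
    """Shear columns first, then rows (with a smaller shift, as in the quote)."""
    rng = random.Random(seed)
    sheared = shear_columns(image, col_shift, rng)
    # Shearing rows equals shearing the columns of the transposed image.
    return transpose(shear_columns(transpose(sheared), row_shift, rng))
```

Note that a pure translation of rows and columns moves pixels but never creates or destroys them, which is why the number of foreground pixels in the image is unchanged by this distortion.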
In this chapter, algorithms and enhancements for breaking the most commonly used
CAPTCHAs, created using word_image and random_letters_image, are described under the
headings Breaking Scheme 1 and Breaking Scheme 2 respectively.
3.1 Breaking Scheme 1:
The following empirical observations were made about the challenges from Captchaservice.org:
• Only two colors were used in each challenge, one for the background and another for the
foreground, which is the distorted challenge text; the choice of colors is either random or
specified by the user. Therefore, it is easy to separate the text from the background.
• Only capital letters were used. Although a letter might be distorted into a different
shape each time, it consisted of a constant number of foreground pixels in a challenge image.
That is, a letter had a constant pixel count. The pixel count for each of the letters A to Z was
calculated, as in Table 3.1. As plotted in Fig 3.1, most letters had a distinct pixel count.
Table 3.1 Pixel count of each letter used to create CAPTCHA
• Few letters overlapped or touched each other in a challenge, so many challenges were
vulnerable to a vertical segmentation attack: the image could be vertically divided by a
program into segments each containing a single character.
The attack algorithm is largely
based on the above observations. One of its key components is a vertical segmentation
algorithm, which works as follows.
1. Obtain the top-left pixel’s colour value, which defines the background colour of the
image. Any pixel of a different colour value in this image is in the foreground, i.e. part of the
distorted text.
2. Identify the first segmentation line. Map the image into a coordinate system in which
the top-left pixel has coordinates (0, 0), the top-right pixel (image width, 0) and the bottom-left pixel (0, image height). Starting from point (0, 0), a vertical “slicing” process traverses
pixels from top to bottom and then from left to right. This process stops once a pixel with a
non-background colour is detected. The X co-ordinate of this pixel, x1, defines the first
vertical segmentation line X = x1 - 1.
Fig 3.1 Graph Of Pixel Count V/S Letters
3. Vertical slicing continues from (x1+1, 0), until it detects another vertical line that does not
contain any foreground pixels – this is the next segmentation line.
4. Vertical slicing continues from a pixel to the right of the previous segmentation line.
However, the next vertical line that does not contain any foreground pixel is not necessarily
the next segmentation line. It could be a redundant segmentation line, which would be
ignored by the algorithm. Therefore, only when the vertical slicing process cuts through the
next letter, the next vertical line that does not contain any foreground pixels is the next
segmentation line.
5. Step 4 repeats until the algorithm determines the last segmentation line (after which, the
vertical slicing will not find any foreground pixels).
Once a challenge image is vertically
segmented, the attack program simply counts the number of foreground pixels in each
segment. Then, the pixel count obtained is used to look up Table 3.1, giving the letter in each
segment. Fig 3.2 shows how the basic attack has broken a challenge. First, the vertical
segmentation divided the challenge into 6 segments. Second, each segment is scanned to get
the number of foreground pixels in it. Then, the pixel count obtained in the previous step is
used to look up the mapping table, recognising a character Ci for each segment Si (i=1, …,
6). Finally, the string ‘C1 C2 C3 C4 C5 C6’ gives the result.
Fig. 3.2 Basic Attack To Break The CAPTCHA.
Enhancement: dictionary attack
The basic algorithm would fail to break some challenges completely. Fig. 3.3 gives a
failing example, where the vertical segmentation method could not separate letters ‘S’ and
‘K’ because the vertical slicing line could not split the two letters without touching both of
them. The basic attack could not do anything more than give a partially recognized result
“FRI**Y” (‘*’ represents one unrecognized character). However, since Scheme 1
challenges all used words, the basic attack is enhanced by the following “dictionary attack”.
A dictionary of about 6,000 six-letter English words is introduced.
Fig 3.3 Failing Example Of Vertical Segmentation Method
Any partial result returned by the basic algorithm is used as a string pattern to identify
candidate words in the dictionary that match the pattern. Since there could be multiple
candidate words, a simple solution is introduced to find the best possible result as follows.
For each dictionary entry, pixel sum is pre-computed (using Table 3.1), which is the total
number of pixels this word could have when it is embedded in a CAPTCHA challenge. This
pixel sum is stored along with the word in the dictionary. A pixel sum for the unbroken
challenge is also worked out, which is the total number of all foreground pixels in the
challenge. The first candidate word with the same pixel sum as the challenge is returned as
the final recognition result.
Fig 3.4 Enhanced Algorithm
Fig 3.4 illustrates how the enhanced algorithm worked. In this case, the partial result
‘FRI**Y’ obtained by the basic algorithm is used to identify all words that start with ‘FRI’
and end with ‘Y’ in the dictionary. Five candidate words were found: ‘FRIARY’, ‘FRILLY’,
‘FRISKY’, ‘FRIZZY’ and ‘FRIDAY’. However, ‘FRISKY’ is returned as the best possible
result, since it is the only candidate having a pre-computed pixel sum of 987, which equals
the pixel sum of the unbroken challenge (133+208+121+372+153=987).
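The FRISKY example can be reproduced with a short sketch. The pixel counts below are only those that can be read off the text of this chapter (F=133, R=208, I=121, Y=153, L=111, K=178); S=194 is derived here as 372 minus 178 and is therefore an inference, not a value quoted from Table 3.1. The two-word dictionary and the function names are illustrative assumptions.

```python
import re

# Partial pixel-count table taken from figures quoted in the text;
# S is derived as 372 - 178 (the 'SK' segment minus K), an inference.
PIXELS = {"F": 133, "R": 208, "I": 121, "L": 111, "S": 194, "K": 178, "Y": 153}


def pixel_sum(word):
    """Total foreground pixels a word would have in a challenge."""
    return sum(PIXELS[letter] for letter in word)


def dictionary_attack(pattern, challenge_pixel_sum, dictionary):
    """Return the first word matching the pattern ('*' = one unknown letter)
    whose pre-computed pixel sum equals that of the whole challenge."""
    regex = re.compile(pattern.replace("*", "[A-Z]"))
    for word, total in dictionary.items():
        if regex.fullmatch(word) and total == challenge_pixel_sum:
            return word
    return None


# Pixel sums are pre-computed once and stored alongside each word.
DICTIONARY = {word: pixel_sum(word) for word in ("FRILLY", "FRISKY")}

print(DICTIONARY)  # {'FRILLY': 837, 'FRISKY': 987}
print(dictionary_attack("FRI**Y", 987, DICTIONARY))  # FRISKY
```

The pattern alone leaves both words as candidates; the pixel sum of the whole challenge is what singles out one of them.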
To make the dictionary attack work properly, it is crucial to create a correct string
pattern after the vertical segmentation process. For example, when the vertical segmentation
divided an image into only four segments and the corresponding partial result is in the
following form: ‘B□B□’, it is important to determine how many unrecognized letters were in
each box ‘□’. Otherwise, ‘B*B***’, ‘B**B**’ or ‘B***B*’ would give totally different
recognition results. If all these patterns were used to look up the dictionary, it would be likely
to find many candidates with an identical pixel sum. This is a problem of indexing letters in
their correct positions, and it is addressed using the following two-step method.
1) For some cases, it is trivial to work out a string pattern with contextual information. For
instance, if a segmented image contained only one unrecognized segment, e.g. the example in
Fig 3.3, the number of unrecognized characters in the segment is six minus the number of all
recognized characters. Another straightforward case is when no character is recognized in an
image – then the number of unrecognized segments in the image did not really matter. For
example, an image segmented into three unrecognized segments ‘□□□’ would be no
different to one for which the vertical segmentation completely failed.
2) When the above method did not work, e.g. in the case of ‘B□B□’, the number of pixels in
each unrecognized segment was considered in order to deduce how many characters the segment
contained. For example, when the number of pixels in a segment is larger than 239 (the
largest pixel count in Table 3.1, i.e., ‘N’) but smaller than 2×239, it is likely that this segment
had two unrecognized letters. There were exceptions that could not be handled this way.
Although the average pixel count for letters A-Z is 178.80, ‘J’, ‘L’ and ‘I’ had a pixel count
much smaller than the average. For example, the pixel sum of ‘ILL’ or ‘LIL’ is only 343; the
pixel sum of ‘LI’ or ‘IL’ is a mere 232. On the other hand, the combinations ‘LLL’, ‘JK’
and ‘KJ’ never or rarely occur in English words.
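A rough sketch of step 2 follows. The text only states the rule for two letters (more than 239 but fewer than 2×239 pixels); generalising it to "divide by 239 and round up" is an assumption made here, and the segment pixel counts in the example are invented.

```python
import math

MAX_LETTER_PIXELS = 239  # largest pixel count in Table 3.1 ('N'), as quoted in the text


def estimate_letter_count(segment_pixels):
    """Estimate how many letters an unrecognised segment holds.
    Generalises the two-letter rule of the text; an assumption."""
    return max(1, math.ceil(segment_pixels / MAX_LETTER_PIXELS))


def build_pattern(segments):
    """segments: list of (letter, pixels); letter is None when unrecognised.
    Returns a string pattern with one '*' per estimated unknown letter."""
    parts = []
    for letter, pixels in segments:
        if letter is not None:
            parts.append(letter)
        else:
            parts.append("*" * estimate_letter_count(pixels))
    return "".join(parts)


# Hypothetical pixel counts for the two unrecognised boxes of 'B□B□'.
print(build_pattern([("B", None), (None, 372), ("B", None), (None, 400)]))  # B**B**
print(estimate_letter_count(343))  # 2, although 'ILL' (343 pixels) has three letters
```

The second call shows the kind of exception mentioned above: narrow letters such as ‘I’ and ‘L’ make a three-letter segment look like a two-letter one.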
Further enhancements
The following enhancements were developed to handle typical “troublemakers” that
could not be broken by the above techniques.
Letters with an identical pixel count. Letters having an identical pixel count could confuse
the basic algorithm. For example, the challenge in Fig 3.5(a) is successfully segmented into 6
parts, but it is initially recognized as “OELLEY”, leading to an incorrect result. Since ‘O’ and
‘K’ have the same pixel count, the basic algorithm had only a 50% chance of breaking
this challenge.
To overcome this problem, the following “spelling check” method is considered: if a
challenge includes a letter with a pixel count of 111 (‘J’ or ‘L’), 178 (‘K’ or ‘O’), or 162 (‘P’
or ‘V’), each variant is generated, and then multiple dictionary lookups are carried out to rule out
candidate strings that are not proper words. For example, in the above case, both ‘OELLEY’
and ‘KELLEY’ were looked up in the dictionary. Since only “KELLEY” is in the dictionary,
it is returned as the best possible result. This “spelling check” technique is also used to
enhance the string pattern matching in the dictionary attack. For example, if a partial result
recognized by the basic algorithm is “V*B*IC”, then both “V*B*IC” and “P*B*IC” would
be valid matching patterns for identifying candidate words in the dictionary.
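The “spelling check” can be sketched as generating every variant of a candidate string and keeping those found in the dictionary. The three confusable pairs come from the text; the function names and the one-word dictionary are assumptions for illustration.

```python
from itertools import product

# Letter pairs that share a pixel count (111, 178 and 162 respectively).
CONFUSABLE_GROUPS = [("J", "L"), ("K", "O"), ("P", "V")]
ALTERNATIVES = {letter: group for group in CONFUSABLE_GROUPS for letter in group}


def variants(candidate):
    """All strings obtained by swapping each ambiguous letter for its twin."""
    choices = [ALTERNATIVES.get(letter, (letter,)) for letter in candidate]
    return ["".join(combo) for combo in product(*choices)]


def spelling_check(candidate, dictionary):
    """Keep only the variants that are proper dictionary words."""
    return [word for word in variants(candidate) if word in dictionary]


print(len(variants("OELLEY")))                 # 8
print(spelling_check("OELLEY", {"KELLEY"}))    # ['KELLEY']
```

The same variant generation applies to partial patterns such as “V*B*IC”, since the ‘*’ placeholder is simply passed through unchanged.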
Fig 3.5 Typical troublemakers: a) Letters with an identical pixel count; b) Broken letters;
c) Letters with additional or missing pixels.
Broken characters. A few challenges contain broken letters that mislead the segmentation
algorithm. As shown in Fig 3.5 b), due to a break in ‘H’, the letter is segmented into two parts
instead of one. To overcome this problem, a two-step method is introduced. First, once the
vertical segmentation is done, the algorithm tests whether each segment is complete: if the
number of foreground pixels in a segment is smaller than 111, the smallest pixel count in
Table 3.1, then the segment is incomplete; if the number is larger than 111 but smaller than
239 (the largest pixel count in Table 3.1, i.e., ‘N’) and cannot be found in the lookup table,
then the segment is also incomplete. Second, an incomplete segment is merged with its
neighboring segment(s). A proper merge is one for which the combined pixel count leads to a
meaningful recognition result, e.g. the combined count is equal to or less than 239 and can be
found in the lookup table. When multiple proper combinations exist (e.g. S3 can be combined
either with S2 or with S4), the spelling check serves as the last resort to find the best possible
result.
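The two steps can be sketched as follows. This is a minimal sketch, not the original implementation: the lookup table below contains only the pixel counts quoted in this report (the full Table 3.1 has an entry for every letter), the helper names are hypothetical, and the segment counts in the usage example are invented.

```python
# Partial lookup table: only the pixel counts quoted in the text are listed.
# The full Table 3.1 would have an entry for every letter.
LOOKUP = {111: "JL", 162: "PV", 178: "KO", 239: "N"}
MIN_COUNT, MAX_COUNT = 111, 239


def is_incomplete(count):
    """Step 1: a segment is incomplete if its count is too small or unknown."""
    if count < MIN_COUNT:
        return True
    return count < MAX_COUNT and count not in LOOKUP


def merge_candidates(counts, i):
    """Step 2: return (neighbour index, combined count) for each proper merge."""
    proper = []
    for j in (i - 1, i + 1):
        if 0 <= j < len(counts):
            combined = counts[i] + counts[j]
            if combined <= MAX_COUNT and combined in LOOKUP:
                proper.append((j, combined))
    return proper


# Invented pixel counts for four segments; segment 1 is a broken fragment.
counts = [178, 60, 102, 111]
print(is_incomplete(counts[1]))     # prints True
print(merge_candidates(counts, 1))  # prints [(2, 162)]
```

When `merge_candidates` returns more than one pair, each merged result would be passed to the spelling check to pick the best one.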
Additional pixel(s). In a few cases, a letter might contain one or more pixels beyond its pixel
count in the lookup table. For example, an additional pixel occurred above ‘A’ in Fig 3.5 c).
To address this problem, an approximate table lookup is used: when the pixel count for a
segment cannot be located in the lookup table, the segment is recognized as the most likely
letter. This method does not succeed all the time, since some letters have close pixel counts
(e.g., V, E and U; D, Z and S; M and W). However, the spelling check technique can
sometimes be used to find the correct result. For example, when multiple candidate answers
are returned by the approximate method, spelling check can be used to choose the best
possible solution.
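One plausible reading of "most likely letter" is the letter whose table count is nearest to the measured count. The sketch below follows that reading; it is an assumption, as are the `tolerance` parameter and the partial table, which again lists only the counts quoted in this report.

```python
# Partial lookup table: only the pixel counts quoted in the text are listed.
LOOKUP = {111: "JL", 162: "PV", 178: "KO", 239: "N"}


def approximate_lookup(count, table=LOOKUP, tolerance=3):
    """Return the candidate letters whose table count is nearest to `count`.

    `tolerance` (an assumed parameter) is the largest deviation accepted;
    an empty string means no letter is close enough.
    """
    if count in table:
        return table[count]
    best = min(abs(count - known) for known in table)
    if best > tolerance:
        return ""
    # Several counts may be equally near; return all of them as candidates.
    return "".join(table[k] for k in sorted(table) if abs(count - k) == best)


print(approximate_lookup(179))                # prints KO
print(approximate_lookup(170, tolerance=10))  # prints PVKO
```

The second call shows why the method is not always decisive: a count of 170 is equally far from 162 and 178, so four candidate letters come back and the spelling check has to choose among them.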
3.2 Breaking Scheme 2
In Scheme 2 (random_letters_image), each challenge is a distorted image of a random
six-letter sequence, rather than an English word. However, the challenge images in Schemes
1 and 2 share many common characteristics, such as:
• Each image is of the same dimension: 178 × 83 pixels. Only two colors are used in the
image, one for background and another for foreground which is the distorted challenge text.
• Only capital letters are used. Few letters overlap or touch each other.
• Each letter has an (almost) constant pixel count. The one-to-one mapping from a pixel count
to a letter in Table 3.1 is still valid.
The basic attack algorithm in the previous section is also applicable to Scheme 2. However,
the dictionary attack does not work here. It is possible (but expensive) to build a dictionary
of all six-letter strings (26^6 = 308,915,776 dictionary entries), but the pixel sum matching
would often return multiple candidates. Moreover, the spelling check technique is no longer
applicable to differentiate letters with an identical pixel count. To boost the success rate, a
new method is developed, largely based on the following new ideas: a “snake” segmentation
algorithm, which replaces the vertical segmentation since it does a better job of dividing an
image into individual letter components, and some simple geometric analysis algorithms that
differentiate letters with the same pixel count.
Snake segmentation
The snake segmentation method is inspired by the popular “snake” game, which is
supported in most mobile phones. In this game, a player moves a growing snake on the
screen, and tries to avoid collisions between the snake and dynamic blocks. In the algorithm,
a snake is a line that separates the letters in an image. It starts at the top line of the image and
ends at the bottom. The snake can move in four directions: Up, Right, Left and Down, and it
can touch foreground pixels of the image but never cuts through them. Often, a snake can
properly segment a challenge that the vertical segmentation fails to segment. The first step of
the snake segmentation is to preprocess an image to obtain the first and last segmentation
lines, as illustrated in Fig 3.6.
The first segmentation line (X = xfirst) is obtained as in the vertical segmentation
algorithm. Vertical slicing then starts at point (width, 0) and moves leftwards to locate
the last segmentation line (X = xlast). The top and bottom edges of the image between these
two segmentation lines are the starting and ending lines for a snake. Since each letter occupies
some width, the starting line is refined by shifting each segmentation line inward by 10 pixels.
That is, for snakes, all possible starting points are between (xfirst + 10, 0) and (xlast - 10, 0).
Fig 3.6 Snake Segmentation
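The preprocessing step can be sketched as below. The sketch assumes that the image is a list of rows with 1 for foreground and 0 for background, and that xfirst and xlast are simply the first and last columns containing a foreground pixel; the function name `snake_start_range` is hypothetical.

```python
def snake_start_range(image, margin=10):
    """Return (first, last) x-coordinates of the snake starting points.

    image: list of rows, 1 = foreground pixel, 0 = background pixel.
    Returns None when the text is too narrow to leave any starting point.
    """
    width = len(image[0])
    # Columns that contain at least one foreground pixel.
    occupied = [x for x in range(width) if any(row[x] for row in image)]
    if not occupied:
        return None
    x_first, x_last = occupied[0], occupied[-1]
    # Shift both segmentation lines inward by `margin` pixels.
    start, end = x_first + margin, x_last - margin
    return (start, end) if start <= end else None


# Toy 3 x 40 image whose text spans columns 5 to 34.
toy = [[1 if 5 <= x <= 34 else 0 for x in range(40)] for _ in range(3)]
print(snake_start_range(toy))  # prints (15, 24)
```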
Next, snake segmentation divides the pre-processed image into
segments. The following heuristics control the movement of a snake:
1. Whenever feasible, a snake moves down vertically as much as possible. That is, Down is
the direction that has the highest priority.
2. A snake moves down from its starting point until it is immediately above a foreground
pixel.
3. When a snake can move Left and Up only, it moves left one pixel and then moves down
as much as possible.
4. When a snake can move Right and Up only, it moves right one pixel and then moves
down as much as possible.
5. When a snake can move Right and Left only, it goes right. (Priority order: D > R > L > U)
6. When a snake moves left, it cannot go to any point that is to the left of a previously
completed segmentation line.
7. A vertical slicing line could be a legitimate segmentation line.
8. Distance control: when a snake reaches the bottom line, it is done.
9. If a snake cannot reach the bottom, it is aborted and its entire trace is deleted.
10. No matter whether or not the previous snake succeeded in reaching the bottom, the next
snake starts one pixel to the right of the previous starting point.
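The heuristics above can be approximated by a greedy walk that always tries Down, then Right, then Left, then Up. The sketch below is a simplification under several assumptions: the image is a list of rows with 1 for foreground, a snake never revisits a cell (which keeps the walk finite), and the left limit of rule 6 is passed in as a plain parameter. The function names are hypothetical.

```python
def trace_snake(image, start_x, left_limit=0):
    """Trace one snake from (start_x, 0); return its path, or None if aborted."""
    height, width = len(image), len(image[0])
    if image[0][start_x]:
        return None  # the starting point itself is a foreground pixel
    x, y = start_x, 0
    path, visited = [(x, y)], {(x, y)}
    while y < height - 1:  # rule 8: done on reaching the bottom line
        # Priority order Down > Right > Left > Up (rules 1 and 5).
        for dx, dy in ((0, 1), (1, 0), (-1, 0), (0, -1)):
            nx, ny = x + dx, y + dy
            if not (left_limit <= nx < width and 0 <= ny < height):
                continue  # rule 6: stay right of the completed line
            if image[ny][nx] or (nx, ny) in visited:
                continue  # never cut through foreground or revisit a cell
            x, y = nx, ny
            path.append((x, y))
            visited.add((x, y))
            break
        else:
            return None  # rule 9: the snake is stuck, so it is aborted
    return path


def all_snakes(image, first_x, last_x):
    """Rule 10: start a new snake one pixel to the right each time."""
    paths = (trace_snake(image, x) for x in range(first_x, last_x + 1))
    return [p for p in paths if p is not None]


toy = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(trace_snake(toy, 2))  # prints [(2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
```

Because the walk is greedy and does not backtrack, it may abort in cases where a more careful search would still find a path to the bottom.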
There could be multiple snakes between two segments; see Fig 3.6(b), where, for
example, the red block between ‘K’ and ‘S’ is in fact a set of snake lines that touch each
other. Therefore, the last step is to finalize the segments. This process deals with the
following tasks:
1) Getting rid of redundant snakes: if there is no foreground pixel in a segment, then this is an
empty segment and one of its segmentation lines is redundant;
2) When necessary, handling broken characters by merging neighboring segments using the
normal vertical segmentation. Fig 3.6(c) shows the finalized segments of a challenge, one for
which vertical segmentation would fail to segment overlapping letters T, J and K.
Enhancement technique
To enhance the snake segmentation approach, simple algorithms were designed to tell
apart letters with an identical pixel count by analyzing their geometric layouts.
Differentiating between ‘P’ and ‘V’. When a segment has a pixel count of 162, it could be
either ‘P’ or ‘V’. To determine which letter it is, the segment is first normalized: its
left segmentation line is adjusted to cross its left-most foreground pixel vertically, and
similarly for the right segmentation line. Then, a vertical line is drawn in the middle
of the normalized segment. If this middle line cuts through the foreground text only once, the
segment is recognized as ‘V’; otherwise, it is recognized as ‘P’. Fig 3.7 illustrates this
test. It is unlikely for the middle line to cut through ‘V’ twice, since it is rare to use a
rotated ‘V’ in a challenge, presumably due to a usability concern: it would be very difficult for
people to differentiate a rotated ‘V’ from a distorted ‘L’ or ‘J’.
Fig 3.7 Differentiating Between ‘P’ And ‘V’
This method would not work when a ‘P’ or ‘V’ happened to have a crack in the
middle of its normalized segment. However, it is trivial to address this exception: the middle
line can be shifted horizontally a number of times, and each time the number of
intersections with the foreground is counted. If two or more intersections occur more often
than a single intersection, the segment is taken to be ‘P’; otherwise it is ‘V’.
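The middle-line test, including the shifted variant, can be sketched as follows. The segment is assumed to be already normalized and stored as a list of rows with 1 for foreground; the names `crossings` and `p_or_v` and the `shifts` parameter are hypothetical. The 5 × 5 glyphs are toy shapes for illustration and do not have the real pixel count of 162.

```python
def crossings(segment, x):
    """Count the runs of foreground pixels met by the vertical line at column x."""
    runs, inside = 0, False
    for row in segment:
        if row[x] and not inside:
            runs += 1
        inside = bool(row[x])
    return runs


def p_or_v(segment, shifts=(0,)):
    """Classify a normalized segment as 'P' or 'V' by majority vote over columns."""
    width = len(segment[0])
    middle = width // 2
    columns = [middle + s for s in shifts if 0 <= middle + s < width]
    multi = sum(1 for x in columns if crossings(segment, x) >= 2)
    # 'P' only if lines with two or more intersections are in the majority.
    return "P" if multi > len(columns) - multi else "V"


toy_v = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
]
toy_p = [
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
print(p_or_v(toy_v), p_or_v(toy_p, shifts=(-1, 0, 1)))  # prints V P
```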
Telling ‘O’ and ‘K’ apart. When a segment has a pixel count of 178, it could be either ‘K’
or ‘O’. To determine which letter it is, a vertical line is drawn in the middle of the
segment. If this line cuts through the foreground text only once, the segment is
recognized as ‘K’. If this cut-through line has two intersections with the foreground, the letter
could be either ‘O’ or ‘K’. However, it was observed that the distance between the two
intersections, denoted by d, is larger for ‘O’ than for ‘K’. In the algorithm, if this distance is
larger than 14 pixels (an empirical threshold), the letter is recognized as ‘O’; otherwise, it is
recognized as ‘K’. Fig 3.8 shows a segment that is normalized and then successfully
recognized as ‘O’ in this way.
Fig. 3.8 Calculating the diameter
However, this method is not perfect. For example, if there is a break in letter ‘O’ and
this break is exactly in the middle of the normalized segment, then the cut-through line would
cross the foreground only once. Thus, this letter would be wrongly recognized as ‘K’.
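A sketch of the distance test follows. It assumes a normalized segment stored as rows of 0/1 values, and it measures d as the gap between the end of the first intersection and the start of the last one; the text does not say exactly how d is measured, so that choice is an assumption, as are the function names.

```python
def foreground_runs(segment, x):
    """Return (first_row, last_row) for each foreground run in column x."""
    runs, start = [], None
    for y, row in enumerate(segment):
        if row[x] and start is None:
            start = y
        elif not row[x] and start is not None:
            runs.append((start, y - 1))
            start = None
    if start is not None:
        runs.append((start, len(segment) - 1))
    return runs


def o_or_k(segment, threshold=14):
    """Classify a normalized 178-pixel segment as 'O' or 'K'."""
    middle = len(segment[0]) // 2
    runs = foreground_runs(segment, middle)
    if len(runs) < 2:
        return "K"  # a single intersection means 'K'
    d = runs[-1][0] - runs[0][1]  # distance between the two intersections
    return "O" if d > threshold else "K"


# Toy ring, 20 rows high: the middle column meets it at rows 0 and 19.
ring = [[1] * 7] + [[1, 0, 0, 0, 0, 0, 1] for _ in range(18)] + [[1] * 7]
print(o_or_k(ring))  # prints O
```

As noted above, a break in ‘O’ at the middle column leaves only one run, so this sketch would also misclassify such a letter as ‘K’.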
Differentiating between ‘L’ and ‘J’. To tell whether a segment is ‘L’ or ‘J’, the segment is
first normalized. A horizontal line then slices the segment from top
to bottom until it intersects the foreground text. If the intersection is closer to the left
segmentation line, the segment is recognized as ‘L’; if the intersection is closer to the
right segmentation line, the segment is recognized as ‘J’, as depicted in Fig 3.9. If the
intersection is exactly in the middle, the segment is guessed to be ‘L’ by default.
Fig 3.9 Differentiating ‘L’ and ‘J’
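The horizontal slicing test can be sketched as below. The segment is assumed to be normalized, so that its first and last columns are the left and right segmentation lines, and the position of the intersection is taken as the average column of the foreground pixels in the first non-empty row. Both points are assumptions, and the name `l_or_j` is hypothetical.

```python
def l_or_j(segment):
    """Classify a normalized 111-pixel segment as 'L' or 'J'."""
    width = len(segment[0])
    for row in segment:  # slice from top to bottom
        hits = [x for x, pixel in enumerate(row) if pixel]
        if hits:
            centre = sum(hits) / len(hits)
            left_dist = centre                 # distance to the left line
            right_dist = (width - 1) - centre  # distance to the right line
            # A tie is resolved in favour of 'L', as in the text.
            return "J" if right_dist < left_dist else "L"
    return "L"  # empty segment: fall back to the default guess


toy_l = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
toy_j = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
print(l_or_j(toy_l), l_or_j(toy_j))  # prints L J
```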
Chapter 4
Applications
This chapter describes various applications of CAPTCHAs. CAPTCHAs are used
in many web applications to distinguish human users from bots and to restrict access to
humans [2]. Some of these applications are:
1. Online Polls: Bots can wreak havoc on any unprotected online poll. They might
cast a large number of votes, which would then falsely put a poll option in the
spotlight as the winner. This also results in decreased faith in these polls. CAPTCHAs
can be used in websites that have embedded polls to protect them from being accessed
by bots, and hence improve the reliability of the polls.
2. Protecting Web Registration: Several companies offer free email and other services.
Until recently, these service providers suffered from a serious problem: bots. These
bots would take advantage of the service and sign up for a large number of
accounts. This often created problems in account management and also increased the
burden on their servers. CAPTCHAs can effectively be used to filter out the bots and
ensure that only human users are allowed to create accounts.
3. Preventing comment spam: Most bloggers are familiar with programs that submit
a large number of automated posts with the intention of increasing the
search engine rank of some site. CAPTCHAs can be used before a post is submitted to
ensure that only human users can create posts. A CAPTCHA won't stop someone who
is determined to post a rude message or harass an administrator, but it will help
prevent bots from posting messages automatically.
4. Search engine bots: It is sometimes desirable to keep web pages unindexed to
prevent others from finding them easily. There is an HTML tag to prevent search engine
bots from reading web pages. The tag, however, doesn't guarantee that bots won't read
a web page; it only serves to say "no bots, please." Search engine bots, since they
usually belong to large companies, respect web pages that don't want to allow them in.
However, in order to truly guarantee that bots won't enter a web site, CAPTCHAs are
needed.
5. E-Ticketing: Ticket brokers like Ticketmaster also use CAPTCHA applications.
These applications help prevent ticket scalpers from bombarding the service with
massive ticket purchases for big events. Without some sort of filter, it's possible for a
scalper to use a bot to place hundreds or thousands of ticket orders in a matter of
seconds. Legitimate customers become victims as events sell out minutes after tickets
become available. Scalpers then try to sell the tickets above face value. While
CAPTCHA applications don't prevent scalping, they do make it more difficult to scalp
tickets on a large scale.
6. Email spam: CAPTCHAs also present a plausible solution to the problem of spam
emails: a CAPTCHA challenge can be used to verify that a human has indeed sent the
email.
7. Preventing Dictionary Attacks: CAPTCHAs can also be used to prevent dictionary
attacks in password systems. The idea is simple: prevent a computer from being able
to iterate through the entire space of passwords by requiring it to solve a CAPTCHA
after a certain number of unsuccessful logins. This is better than the classic approach
of locking an account after a sequence of unsuccessful logins, since doing so allows
an attacker to lock accounts at will.
8. As a tool to verify digitized books: This is a way of increasing the value of
CAPTCHA as an application. An application called reCAPTCHA harnesses users'
responses in CAPTCHA fields to verify the contents of a scanned piece of paper.
Because computers aren’t always able to identify words from a digital scan, humans
have to verify what a printed page says.
Chapter 5
Conclusion and Future Scope
CAPTCHAs are an effective way to counter bots and reduce spam. They serve a dual
purpose: besides security, they help advance AI knowledge. Their applications are varied,
ranging from stopping bots to character recognition and pattern matching. A good CAPTCHA
system should give consideration both to computer security and to human friendliness.
Designing good CAPTCHAs is a tedious business. Text CAPTCHAs achieve a high level of
practicality, but very often fall short of providing a good balance between usability and
security. CAPTCHAs play an important role in protecting Internet resources from attacks by
automated scripts. However, CAPTCHAs are vulnerable to various attacks that break the
CAPTCHA and thus the security system that relies on it. Fatal design mistakes were exploited
to develop simple attacks that could break, with near 100% success, two visual CAPTCHA
schemes provided by Captchaservice.org. These schemes employed sophisticated
distortions; they were effectively resistant to OCR software attacks and appeared to be
secure. This is alarming, because many of today's CAPTCHAs are likely to provide only a
false sense of security. Systematically breaking representative schemes generates
convincing evidence and establishes valuable insights that will benefit the design of the next
generation of robust and usable CAPTCHAs. Breaking CAPTCHAs thus helps in building
more robust and secure CAPTCHAs, and thereby more secure online authentication.
References
[1] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford. CAPTCHA: Using hard AI
problems for security. Proc. of Int. Conf. on the Theory and Applications of Cryptographic
Techniques (EUROCRYPT 2003), vol. 2656 of LNCS, pp. 294– 311, May 2003
[2] Sarika et al., International Journal of Advanced Research in Computer Science and
Software Engineering: Understanding Captcha: Text and Audio Based Captcha with its
Applications, June - 2013, pp. 106-115
[3] J. Yan and A. S. El Ahmad. Breaking visual CAPTCHAs with naive pattern recognition
algorithms. Proc. of 23rd Annual Computer Security Applications Conference (ACSAC
2007), pp. 279–291, Dec. 2007.
[4] T Converse, “CAPTCHA generation as a web service”, Proc. of Second Int’l Workshop
on Human Interactive Proofs (HIP’05), ed. by HS Baird and DP Lopresti, Springer-Verlag.
LNCS 3517, Bethlehem, PA, USA, 2005. pp. 82-96
[5] E. Athanasopoulos and S. Antonatos. Enhanced CAPTCHAs: Using animation to tell
humans and computers apart. Proc. of 10th Int. Conf. on Communications and Multimedia
Security (CMS 2006), vol. 4237 of LNCS, pp. 97–108, October 2006.
[6] R. Ferzli, R. Bazzi and L. J. Karam. A Captcha Based on the Human Visual Systems
Masking Characteristics. IEEE International Conference on Multimedia and Expo,
2006, pp. 517-520.
[7] T.-Y. Chan. Using a text-to-speech synthesizer to generate a reverse Turing test. Proc. of
15th IEEE Int.Conf. on Tools with Artificial Intelligence (ICTAI 03), pp. 226–232,
November 2003.
[8] C Pope and K Kaur. “Is It Human or Computer? Defending E-Commerce with captcha”,
IEEE IT Professional, March 2005, pp. 43-49
[9] HS Baird, MA Moll and SY Wang. “ScatterType: A Legible but Hard-to-Segment
CAPTCHA”, Eighth International Conference on Document Analysis and Recognition,
August 2005, pp. 935-939
[10] AL Coates, H S Baird and RJ Fateman. “PessimalPrint: A Reverse Turing Test”, Int'l. J.
on Document Analysis & Recognition, Vol. 5, pp. 158-163, 2003.
[11] Gabriel Moy, Nathan Jones, Curt Harkless and Randall Potter. “Distortion Estimation
Techniques in Solving Visual CAPTCHAs”, IEEE Conference on Computer Vision and
Pattern Recognition (CVPR'04), Vol 2, June 2004, pp. 23-28
[12] J. Yan and A. S. El Ahmad. Usability of CAPTCHAs or usability issues in CAPTCHA
design. In SOUPS ‘08, pp. 44–52, New York, NY, USA, 2008.
[13] Jeremy Elson, John R. Douceur, Jon Howell, and Jared Saul. Asirra: a CAPTCHA that
exploits interest-aligned manual image categorization. In ACM Conference on Computer and
Communications Security, pages 366–374. ACM, 2007.
[14] B. Pinkas and T. Sander, “Securing passwords against dictionary attacks,” in Proc. of the
9th CCS, V. Atluri, Ed. ACM Press, Nov. 2002, pp. 161–170.