CAPTCHA Processing
CPRE 583 Fall 2010 Project
Responsibilities
• Brian Washburn – Loading Image into RAM
and Preprocessing and related portion of
writeup/presentation
• Nicholas Rundle – Text Detection, related
portion of writeup/presentation, and
writeup/presentation Assembly
• Daniel Uhrman – Text Recognition and
related portion of writeup/presentation
Motivation
The ever-increasing volume of spam e-mail has led to the
development of CAPTCHAs, which try to distinguish between
humans and computers. Making that distinction is becoming
more difficult as computer systems improve, so new CAPTCHA
systems that are harder to break with a computer are
necessary to maintain security. This project aims to break
current CAPTCHA systems as a means of showing their
inherent weaknesses and motivating improvements to
current designs.
Design
[Block diagram: Image -> FPGA -> Text]
Interface Design
There are two main interfaces into the system:
1.) Ethernet to/from the PPC
2.) Loads and Stores to/from PPC and APU
[Diagram: Terminal, Ethernet, PowerPC 440, Load/Store, Auxiliary Processor Unit]
File Transfer Protocol
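• Client -> Server: TCP Connect
• Server -> Client: “220”
• Client -> Server: “AUTH”
• Server -> Client: “234”
• Client -> Server: “USER Captcha Group”
• Server -> Client: “230”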
Passive FTP
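• Client sends PASV to the Server Control Port
• Server replies 227 with IP, IP, IP, IP, Port, Port
• Client connects to that address on the Server Data Port
(Connect to Addr, ACK)
• DATA is transferred
• Terminate; Server replies “226 Success”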
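The six numbers in the 227 reply encode the data-port address: the first four are the IP octets and the last two are the high and low bytes of the port. A minimal sketch of decoding such a reply follows; the function name `parse_pasv_reply` and the sample reply text are illustrative assumptions, not part of the project code.

```python
import re


def parse_pasv_reply(reply):
    """Return (ip, port) decoded from a 227 reply.

    Hypothetical helper for illustration. The reply is expected to carry
    six comma-separated numbers: IP, IP, IP, IP, Port, Port.
    """
    match = re.search(r"(\d+),\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+),\s*(\d+)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError("not a valid 227 reply: %r" % reply)
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("each field must fit in one byte: %r" % reply)
    ip = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes: high byte first, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return ip, port


if __name__ == "__main__":
    print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,19,137)"))
```

The client then opens its data connection to the returned address and port.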
Features of the Xilinx lwip4 library
(Lightweight IP)
• Standard Berkeley model for sockets
– lwip_socket() (SOCK_STREAM for TCP)
– lwip_bind()
– lwip_listen()
– lwip_accept()
– lwip_write()
– read()
– close()
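The Berkeley call sequence listed above (socket, bind, listen, accept, read/write, close) can be sketched with Python's standard `socket` module, which follows the same model. This is only an analogue for illustration, not the lwIP C API; the helper names `make_listener` and `serve_once` are hypothetical.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a TCP listening socket: socket() -> bind() -> listen()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCK_STREAM = TCP
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))  # port 0 lets the OS pick a free port
    listener.listen(1)
    return listener


def serve_once(listener, handler):
    """accept() one client, read one request, write the reply, close."""
    conn, _addr = listener.accept()
    with conn:  # closes the connection on exit
        data = conn.recv(1024)
        conn.sendall(handler(data))
```

A server thread would call `make_listener()` once and then `serve_once()` for each incoming connection.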
xilkernel library
• Features an easy threading model
• Pthread-like mutexes
[Diagram: FTP Server Thread; Control Port Listen Thread; Process Control Port; Listen Data Port; Process Data Port]
Captcha Controller
• Our Controller coordinates dataflow
between all of our different subsystems
[Diagram: Auxiliary Processor Unit, BRAM, Segmenter, BRAM, Classifier, BRAM]
Future PPC Work
• The PowerPC can be used for preprocessing
– Noise Reduction
– Edge detection
– Color correction
• Also, it could be used to parse the headers
of image files and pass this data along
coherently
Segmenter Unit
• Searches columns of the input image for
the edges of letters and copies these
columns into BRAM.
• For uniformity, each output letter has a fixed size
of 32x32 and is right-filled with white pixels.
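The column search described above can be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware implementation: the image is a list of rows of binary pixels (1 = ink, 0 = white), letters are separated by at least one all-white column, and the names `segment_columns` and `_copy_letter` are hypothetical.

```python
WHITE, BLACK = 0, 1  # assumed pixel encoding
SIZE = 32            # fixed output tile size from the slide


def _copy_letter(image, left, right, size):
    """Copy columns [left, right) into a size x size tile, right-filled with white."""
    tile = [[WHITE] * size for _ in range(size)]
    for r in range(min(len(image), size)):
        for c in range(left, min(right, left + size)):
            tile[r][c - left] = image[r][c]
    return tile


def segment_columns(image, size=SIZE, max_letters=8):
    """Scan columns left to right and cut one tile per run of inked columns."""
    height = len(image)
    width = len(image[0]) if height else 0
    has_ink = [any(image[r][c] == BLACK for r in range(height)) for c in range(width)]
    letters = []
    start = None
    for c in range(width + 1):  # one extra step closes a letter touching the right edge
        ink = c < width and has_ink[c]
        if ink and start is None:
            start = c  # left edge of a letter
        elif not ink and start is not None:
            letters.append(_copy_letter(image, start, c, size))
            start = None
            if len(letters) == max_letters:
                break
    return letters
```

Each returned tile corresponds to one 32x32 letter written to the output BRAM.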
Segmenter Unit
[Diagram: Input BRAM (address 0); Output BRAM (address 0, address 32)]
Segmentation
• Histogram thresholding
• Edge detection
• Region-based
Classifier Unit
• Receives indication of successful segmentation
of up to 8 characters from Segmenter
• Reads Segmented Characters from BRAM.
• Compares each input character to 36 template
characters (A-Z and 0-9).
• Outputs an array of up to 8 ASCII values.
Horizontal Projection
• The segmented characters and template characters
are analyzed using HP (horizontal projection).
• The HP is determined by calculating the sum of each
horizontal row of pixel values for an image.
• For our 32x32 pixel images, the HP values will be
arrays of size 32 containing sums of up to 32 in each
position.
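As a minimal sketch of the horizontal projection, assuming each image is a list of 32 rows of binary pixels (1 = ink, 0 = white), the HP is simply the per-row sum. The function name `horizontal_projection` is illustrative.

```python
def horizontal_projection(image):
    """Return one sum per row; for a 32x32 binary image, 32 values in 0..32."""
    return [sum(row) for row in image]


if __name__ == "__main__":
    tile = [[0] * 32 for _ in range(32)]
    tile[0] = [1] * 32            # a fully inked row
    tile[1][:4] = [1, 1, 1, 1]    # a row with four inked pixels
    print(horizontal_projection(tile)[:3])
```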
Classifier Template BRAM
• The expected HP values are pre-calculated for each
template character.
• These values are stored in a ROM made in a BRAM IP
core that is preconfigured with a .COE file.
• The input images from the segmenter are read from
BRAM and compared to each of the template characters
to find the best match.
Correlation Algorithm
• The HP values are compared utilizing the correlation
function from statistics shown below:
• Where: X and Y are the HP values for an input image
and a given template and N is the length of the HP array.
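Assuming the correlation function referred to here is the standard Pearson correlation coefficient, which matches the symbols X, Y, and N defined above, it can be written as:

```latex
r = \frac{N \sum XY - \sum X \sum Y}
         {\sqrt{\left[ N \sum X^{2} - \left( \sum X \right)^{2} \right]
                \left[ N \sum Y^{2} - \left( \sum Y \right)^{2} \right]}}
```

Here each sum runs over the N positions of the HP arrays, and r lies between -1 and 1, with values near 1 indicating a close match.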
Correlation Algorithm Cont’d
• Due to the constraints below, we used a modified
form of the correlation equation:
– No IP Core for floating point conversion in version 10.1 of tools.
– No IP Core for an integer-based square root function.
– Potential overflows as a result of large summations and
multiplication.
• Implemented with 16 dedicated multipliers, one wider
multiplier, and one dedicated divider.
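One way to meet constraints like these, offered only as an assumption and not necessarily the modification the project used, is to compare the squared Pearson correlation: squaring removes the square root, and scaling the numerator before a single integer division avoids floating point. The function name and the `scale` parameter are hypothetical.

```python
def squared_correlation_scaled(x, y, scale=1024):
    """Integer-only score proportional to sign(r) * r**2 for the Pearson r.

    Uses only integer adds, multiplies, and one integer division.
    Returns a value in [-scale, scale]; 0 if either input is constant.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = (n * sxx - sx * sx) * (n * syy - sy * sy)
    if den == 0:
        return 0  # a constant array has no defined correlation
    magnitude = (num * num * scale) // den  # the single division
    return magnitude if num >= 0 else -magnitude
```

Python integers do not overflow, so this sketch does not model the bit-width limits that a hardware implementation has to respect.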
Potential Future Work
• Implement “learning” functionality in
classifier so that the template ROM is
actually a RAM and can be updated based
upon CAPTCHA techniques it observes.
• Utilize CAPTCHA Detection Unit for name
recognition from security badges, or
license plate identification on speed
cameras.
Integration
• In its current form, the project works fully
in ModelSim with various test inputs.
• In HW, the project works all the way up to
the classifier. The classifier unit has many
multipliers and uses a pipelined divider,
which are potential sources of timing
problems. We are adding pipeline
stages to address these timing issues.
Papers
• Algorithm to Break Visual CAPTCHA (ICETET 2009)
• Bio-inspired unified model of visual segmentation system
for CAPTCHA character recognition (SiPS 2008)
• CAPTCHA Security: A Case Study (Security & Privacy
July 2009)
• Recognizing objects in adversarial clutter: breaking a
visual CAPTCHA (Computer Vision and Pattern
Recognition 2003)
• Reverse Engineering CAPTCHAs (WCRE 2008)