MLP for character recognition

COURSEWORK 1: USING A MULTI-LAYER PERCEPTRON
WITH BACK-PROPAGATION LEARNING FOR CHARACTER
RECOGNITION
In this assessment, you will implement a multi-layer perceptron (MLP) for character recognition using Matlab.
When your program is presented with an image of a single character, it must output the character's class (which
letter of the English alphabet it is). You may use handwritten characters, or scanned printed characters with noise,
as your data. You are not expected to localize characters in a document; use images of single characters.
Please plan to spend approximately three weeks on this coursework, at about two hours per day on average.
STEPS
Here are some steps that you may wish to follow:
1. Download a suitable dataset of characters with ground truth.
   • The database must contain many samples (images cropped to the size of the character) for each class
     (character), along with ground truth. You may not be able to find a database of equal-sized images
     or of grayscale-only images. Therefore, you may need to use the Matlab functions rgb2gray and imresize,
     together with zero padding, to obtain equal-sized images of characters. The function imnoise adds
     noise to images.
   • I have provided some links below for you to download a dataset. You may also find another source or
     create a dataset of your own.
2. Program an MLP with back-propagation for the 2-D three-class problem that we discussed in class.
   • This is for you to learn the MLP in a simpler setting.
3. Based on your simple MLP, program the character recognizer.
   • If each character in the dataset is 60 × 40 pixels, you can have an MLP with 60 × 40 = 2400 inputs and
     26 outputs, one for each character class.
   • One hidden layer will probably be sufficient. However, you must experiment on your own.
   • You should select the number of neurons in the hidden layer(s).
4. Test your character recognizer.
   • Carry out testing on the training set and, more importantly, on test sets that the NN has not
     seen before.
   • Please report the detection rate and the false alarm rate in all your tests.
   • Interpret your results.
   • BONUS: You will get bonus points if you add noise to the characters and plot a curve of detection
     rate against noise level.
5. Write the report.
   • Write the report including the problem definition, background, motivation, literature search,
     method, experimental evaluation, conclusions and references.
   • Experimental evaluation (results, comparison of results, and interpretation) is very important.
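Step 4 asks for the detection rate and the false alarm rate, but these are usually defined for two-class problems. One possible per-class reading is sketched below: for a given class, the detection rate is the fraction of that class's samples that are labelled correctly, and the false alarm rate is the fraction of samples from all other classes that are wrongly given that label. These definitions, the function name per_class_rates, and the toy labels are assumptions made for illustration, not part of the coursework specification. The sketch is in Python for brevity; the coursework itself must be written in Matlab.

```python
def per_class_rates(true_labels, predicted_labels, classes):
    """Return {class: (detection_rate, false_alarm_rate)} under the assumed definitions."""
    pairs = list(zip(true_labels, predicted_labels))
    rates = {}
    for c in classes:
        # Treat class c as "positive" and every other class as "negative".
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        # Guard against classes with no positive (or no negative) samples.
        detection = tp / (tp + fn) if (tp + fn) else float("nan")
        false_alarm = fp / (fp + tn) if (fp + tn) else float("nan")
        rates[c] = (detection, false_alarm)
    return rates


# Toy example with three classes; the labels are made up for illustration.
true_labels = ["A", "A", "A", "B", "B", "C", "C", "C"]
predicted = ["A", "A", "B", "B", "B", "C", "A", "C"]
rates = per_class_rates(true_labels, predicted, ["A", "B", "C"])
for c, (det, fa) in rates.items():
    print(f"{c}: detection rate = {det:.2f}, false alarm rate = {fa:.2f}")
```

Whichever definition you adopt, state it explicitly in the report so that the reported numbers can be interpreted.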
WHAT TO SUBMIT
Prepared by Ranga Rodrigo, The University of Moratuwa, Sri Lanka. ranga@uom.lk
Please submit a report of at most four pages (two double-sided A4 sheets), stapled, with no cover sheet. I will read
only the first four pages.
• The first page should indicate your name and index number.
• Your report must convince me that the work is your own, that your system actually works, and that
  the performance is good.
• You should, thus, include descriptions, block diagrams, images, important parts of the code, results
  and references.
• Please submit a soft copy of the report and the code to your student representative, as I will
  run the reports through Turnitin and the code through MOSS for plagiarism detection.
WHAT WILL MAKE YOU LOSE MARKS
The following will make you lose marks:
• Plagiarizing the report: You may not copy even a single sentence from another source without
  citing the source. If you copy and paste, in addition to citing the source, you must use quotation
  marks. You will be given zero if plagiarism is detected.
• Plagiarizing code: Clearly acknowledge which parts of the code have been borrowed. You may not
  borrow the MLP and back-propagation parts of the code. You will be given zero if plagiarism is
  detected.
• Using Matlab’s NN toolbox: You will be given zero if Matlab’s NN toolbox is used.
• Incomplete report.
• Code that does not work.
• Not presenting results.
• Submitting a report of more than four pages.
GRADING
Total marks: 25 + bonus of 5 marks
1. MLP software (5 marks max.)
   Poor: Evidence of attempting to write an MLP: 2 marks
   Excellent: Evidence of implementing fully functional software: 5 marks
2. Results (10 marks max.)
   Poor: Evidence of recognizing a couple of characters using the student’s own MLP software: 4 marks
   Excellent: Extensive testing of the system with more than one data set, with numerical results
   and comparisons, using the student’s own MLP software: 10 marks
3. Report (5 marks max.)
   Poor: Some notion of MLP, implementation, and results: 2 marks
   Excellent: Presenting the problem definition, background, motivation, literature search,
   method, experimental evaluation and conclusions excellently: 5 marks
4. Discussion (5 marks max.)
   Poor: Including some form of a discussion: 2 marks
   Excellent: Evaluating the system and MLP in general in a critical discussion of the MLP, results,
   problems, and alternatives: 5 marks
WHAT WILL EARN YOU BONUS POINTS (20%)
Any one of the following will earn you bonus points:
• Adding noise to images and plotting a curve of detection rate versus noise level.
• Adapting your MLP for another recognition task.
• Creating a new dataset and making it available online.
• Making a video of your work available online, along with the code and data set, provided that
  your code works excellently well.
• Adapting your work for Sinhala or Tamil with your own character database.
• Augmenting your system with character localization and segmentation.
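For the first bonus item, the experiment behind a curve of detection rate versus noise level can be sketched as follows. Everything in the sketch is an assumption made for illustration: the noise model (each binary pixel is flipped independently with probability equal to the noise level, rather than Matlab's imnoise), the tiny 3 × 3 templates, and the nearest-template classifier, which only stands in for your trained MLP. The sketch is in Python for brevity; the coursework itself must be written in Matlab.

```python
import random


def add_salt_and_pepper(image, level, rng):
    """Flip each binary pixel (0 or 1) independently with probability `level`."""
    return [[1 - px if rng.random() < level else px for px in row] for row in image]


def detection_rate(classify, samples):
    """Fraction of (image, label) pairs that `classify` labels correctly."""
    correct = sum(1 for image, label in samples if classify(image) == label)
    return correct / len(samples)


def noise_curve(classify, clean_samples, levels, trials=20, seed=0):
    """Return [(level, detection_rate)], using `trials` noisy copies of each sample."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    curve = []
    for level in levels:
        noisy = [(add_salt_and_pepper(img, level, rng), lab)
                 for img, lab in clean_samples for _ in range(trials)]
        curve.append((level, detection_rate(classify, noisy)))
    return curve


# Stand-in classifier (NOT an MLP): nearest template by Hamming distance.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}


def hamming(a, b):
    """Number of pixel positions at which two equal-sized images differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(image):
    return min(TEMPLATES, key=lambda label: hamming(image, TEMPLATES[label]))


clean = [(img, lab) for lab, img in TEMPLATES.items()]
curve = noise_curve(classify, clean, levels=[0.0, 0.1, 0.2, 0.3, 0.4])
for level, rate in curve:
    print(f"noise level {level:.1f}: detection rate {rate:.2f}")
```

In your own experiment, replace the stand-in classifier with your trained MLP, apply the noise to the held-out test images only, and plot the resulting (noise level, detection rate) pairs.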
LINKS
The Chars74K dataset: http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/
OCR data set: http://ai.stanford.edu/~btaskar/ocr/