COURSEWORK 1: USING A MULTI-LAYER PERCEPTRON WITH BACK-PROPAGATION LEARNING FOR CHARACTER RECOGNITION

In this assessment, you will implement a multi-layer perceptron (MLP) for character recognition using Matlab. When your program is presented with an image of a single character, it must output the character's class (which letter of the English alphabet it is). You may use handwritten characters, or scanned printed characters with noise, as your data. You are not expected to localize characters in a document; use images of single characters. Please plan to spend approximately three weeks on this coursework, at about two hours a day on average.

STEPS

Here are some steps that you may wish to follow:

1. Download a suitable dataset of characters with ground truth. The dataset must contain many samples (images cropped to the size of the character) for each class (character), along with the ground truth. You may not be able to find a dataset of equal-sized images or of purely grayscale images. Therefore, you may need to use the Matlab functions rgb2gray and imresize, together with zero padding, to obtain equal-sized images of characters. The function imnoise adds noise to images. I have provided some links below for downloading a dataset. You may also find another source or create a dataset of your own.
2. Program an MLP with back-propagation for the 2-D three-class problem that we discussed in class. This is for you to learn the MLP in a simpler setting.
3. Based on your simple MLP, program the character recognizer. If each character in the dataset is 60 × 40 pixels, you can have an MLP with 60 × 40 = 2400 inputs and 26 outputs, one for each character class. One hidden layer will probably be sufficient. However, you must experiment on your own and select the number of neurons in the hidden layer(s).
4. Test your character recognizer. Carry out testing on the training set and, more importantly, on test sets that the NN has not seen before. Please report the detection rate and the false alarm rate in all your tests.
   Interpret your results. BONUS: you will get bonus points if you add noise to the characters and plot a curve of detection rate against noise level.
5. Write the report. Include the problem definition, background, motivation, literature search, method, experimental evaluation, conclusions, and references. The experimental evaluation (results, comparison of results, and interpretation) is very important.

Prepared by Ranga Rodrigo, The University of Moratuwa, Sri Lanka. ranga@uom.lk

WHAT TO SUBMIT

Please submit a report of at most four pages (two double-sided A4 sheets), stapled, with no cover sheet. I will read only the first four pages. The first page should indicate your name and index number. Your report must convince me that the work is your own, that your system actually works, and that its performance is good. You should therefore include descriptions, block diagrams, images, important parts of the code, results, and references. Please submit a soft copy of the report and the code to your student representative, as I will run the reports through Turnitin and the code through MOSS for plagiarism detection.

WHAT WILL MAKE YOU LOSE MARKS

The following will make you lose marks:

• Plagiarizing the report. You may not copy even a single sentence from another source without citing the source. If you copy and paste, you must use quotation marks in addition to citing the source. You will be given zero if plagiarism is detected.
• Plagiarizing code. Clearly acknowledge which parts of the code have been borrowed. You may not borrow the MLP and back-propagation parts of the code. You will be given zero if plagiarism is detected.
• Using Matlab’s NN toolbox. You will be given zero if Matlab’s NN toolbox is used.
• Incomplete report.
• Code that does not work.
• Not presenting results.
• Submitting a report of more than four pages.

GRADING

Total marks: 25 + bonus of 5 marks

1. MLP software (5 marks max.)
   Poor: Evidence of attempting to write an MLP: 2 marks
   Excellent: Evidence of implementing fully functional software: 5 marks
2. Results (10 marks max.)
   Poor: Evidence of recognizing a couple of characters using the student’s own MLP software: 4 marks
   Excellent: Extensive testing of the system with more than one dataset, with numerical results and comparisons, using the student’s own MLP software: 10 marks
3. Report (5 marks max.)
   Poor: Some notion of MLP, implementation, and results: 2 marks
   Excellent: Presenting the problem definition, background, motivation, literature search, method, experimental evaluation, and conclusions excellently: 5 marks
4. Discussion (5 marks max.)
   Poor: Including some form of a discussion: 2 marks
   Excellent: Evaluating the system and MLP in general in a critical discussion of the MLP, results, problems, and alternatives: 5 marks

WHAT WILL EARN YOU BONUS POINTS (20%)

Any one of the following will earn you bonus points:

• Adding noise to images and plotting a curve of detection rate versus noise level.
• Adapting your MLP for another recognition task.
• Creating a new dataset and making it available online.
• Making a video of your work available online, along with the code and dataset, provided that your code works excellently.
• Adapting your work for Sinhala or Tamil with your own character database.
• Augmenting your system with character localization and segmentation.

LINKS

The Chars74K dataset: http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/
OCR data set: http://ai.stanford.edu/~btaskar/ocr/
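
As an illustration of the preprocessing suggested in step 1, here is a minimal Matlab sketch that converts a colour sample to grayscale, resizes it, zero-pads it to a fixed size, optionally adds noise, and flattens it into an input vector. It assumes the Image Processing Toolbox is available. The random stand-in image, the 60 × 40 target size (borrowed from the example in step 3), the variable names, and the noise variance are illustrative assumptions, not requirements of the coursework.

```matlab
% Illustrative preprocessing sketch (assumes the Image Processing Toolbox).
% A random RGB image stands in for a cropped character image; in practice
% you would load a sample from your dataset with imread.
rgb = uint8(255 * rand(50, 30, 3));      % hypothetical 50 x 30 colour sample

targetSize = [60 40];                    % rows x columns (assumed, from step 3)

gray = im2double(rgb2gray(rgb));         % grayscale, values in [0, 1]

% Scale the image to fit inside the target size, keeping the aspect ratio.
scale   = min(targetSize ./ size(gray));
newSize = min(targetSize, max([1 1], round(scale * size(gray))));
resized = imresize(gray, newSize);
resized = min(max(resized, 0), 1);       % bicubic interpolation can overshoot

% Zero-pad: centre the character on a black canvas of the target size.
canvas = zeros(targetSize);
offset = floor((targetSize - newSize) / 2);
rows   = offset(1) + (1:newSize(1));
cols   = offset(2) + (1:newSize(2));
canvas(rows, cols) = resized;

% Optional: zero-mean Gaussian noise (variance 0.01 is an arbitrary choice).
noisy = imnoise(canvas, 'gaussian', 0, 0.01);

% Flatten to a column vector for the MLP input layer.
x = noisy(:);
fprintf('Input vector length: %d\n', numel(x));
```

Centring the character before padding is only one possible choice; padding on one side would work equally well as long as it is applied consistently to the training and test images.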
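
Step 4 asks for the detection rate and the false alarm rate. The handout does not give formal definitions, so the sketch below assumes the common one-vs-rest convention per class: detection rate = TP / (TP + FN) and false alarm rate = FP / (FP + TN). The label vectors are made-up toy data for three classes, not results from any real recognizer.

```matlab
% Per-class detection rate and false alarm rate from a confusion matrix.
% The definitions used here are an assumption (one-vs-rest per class).
trueLabels = [1 1 1 2 2 2 3 3 3 3];      % toy ground truth
predLabels = [1 1 2 2 2 3 3 3 3 1];      % toy classifier output
numClasses = 3;

% C(i, j) counts samples of true class i that were classified as class j.
C = zeros(numClasses);
for k = 1:numel(trueLabels)
    t = trueLabels(k);
    p = predLabels(k);
    C(t, p) = C(t, p) + 1;
end

tp = diag(C)';                           % correctly recognized, per class
fn = sum(C, 2)' - tp;                    % class members that were missed
fp = sum(C, 1) - tp;                     % other classes mistaken for this one
tn = sum(C(:)) - tp - fn - fp;           % everything else

detectionRate  = tp ./ (tp + fn);
falseAlarmRate = fp ./ (fp + tn);

for c = 1:numClasses
    fprintf('Class %d: detection rate %.3f, false alarm rate %.3f\n', ...
            c, detectionRate(c), falseAlarmRate(c));
end
```

For the 26-class recognizer the same code applies with numClasses = 26 and with labels taken from the dataset's ground truth and your own MLP's output.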
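
The bonus experiment (a curve of detection rate versus noise level) could be organized roughly as follows. This is only a sketch of the experimental loop: the classifier is a nearest-template stand-in and is NOT an MLP, the "characters" are random binary patterns, and the tiny 12 × 8 image size, the noise levels, and all variable names are assumptions chosen to keep the example quick to run. In your own work, the classification line would be replaced by a call to your trained network.

```matlab
% Sketch of the bonus experiment: detection rate versus noise level.
numClasses      = 3;
imgSize         = [12 8];                % toy size (assumed), not 60 x 40
samplesPerClass = 50;

% Hypothetical clean "characters": one random binary pattern per class,
% stored as the columns of T.
T = double(rand(prod(imgSize), numClasses) > 0.5);

noiseStd = 0:0.5:4;                      % Gaussian noise standard deviations
detRate  = zeros(size(noiseStd));

for n = 1:numel(noiseStd)
    correct = 0;
    for c = 1:numClasses
        for s = 1:samplesPerClass
            % Noisy sample of class c.
            x = T(:, c) + noiseStd(n) * randn(size(T, 1), 1);

            % Stand-in classifier: nearest template (Euclidean distance).
            % Replace this line with your own MLP's prediction.
            [~, predicted] = min(sum(bsxfun(@minus, T, x).^2, 1));

            correct = correct + (predicted == c);
        end
    end
    % Overall fraction of samples assigned to their true class.
    detRate(n) = correct / (numClasses * samplesPerClass);
end

plot(noiseStd, detRate, '-o');
xlabel('Noise standard deviation');
ylabel('Detection rate');
ylim([0 1.05]);
```

With real images, the noise could instead be added with imnoise, and the curve is more informative if each point is averaged over the whole test set.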