Word & Pic HW3 – Report (Naive Bayes)

Students: Yulong Wang (#108319008), Bo Feng (#108809282)

In this homework, we train a Naive Bayes classifier on bag-of-words (text) features and bag-of-keypoints (SIFT) image features. Code sketches illustrating each part are collected at the end of this report.

Data Preparation

We have four bag-related categories (hobo, shoulder, clutch, totes). The first step is to split the data of every category into two parts: for each bag category, the images numbered 500 and below go into the training set, and the remaining images (those numbered above 500) go into the test set.

Part 1 - Image Representation

The idea of this part is analogous to the bag-of-words feature for text; it is called the "bag of keypoints" feature in the paper we read. We model each image as a histogram of visual-word counts, where the visual words are found by clustering SIFT features.

For each image, we first extract its SIFT descriptors as image features using vl_sift() from the vlfeat package. We then take a large random subset of the SIFT features of the training data and cluster them into a visual-word codebook with k-means via vl_kmeans(). We tried subsets of size 10000, 20000, 40000, 80000, and 160000 (out of roughly 230000 features in total) and found that the results were not affected much by this choice. The reason is that vl_sift() returns a very large number of interest points, so a uniformly random, reasonably large subset of them is already sufficient for clustering. The subset is drawn evenly across categories: for example, to choose 80000 points over the four categories, we randomly pick 20000 points from each category. To choose the number of cluster centers, we analyzed the clustering error for cluster counts ranging from 100 to 3000 in steps of 100.

For each image, each SIFT feature we extracted is labeled with its closest visual word, and the image representation is computed as a histogram of visual-word counts (see the first code sketch at the end of this report).

Part 2 - Text Representation

We model text descriptions as histograms of word counts, just as in the previous homework. The lexicon consists of all words in the shopping descriptions that appear relatively frequently. As usual, we use a regular expression to extract lowercase tokens. For each text document, we then compute the text representation as a histogram of word counts.

Part 3 - Training Classifiers

In this part, we trained our Naive Bayes classifier by estimating P(Fi|Cj) for each feature i and each category j. Since P(Cj) is uniform over the 4 categories, we can simply ignore it in the calculation. Each visual and textual word is a feature in the Naive Bayes classifier. We use the training data for each bag category to estimate the probability of features given categories from counts with Laplace smoothing (simple add-1 smoothing):

    P(Fi|Cj) = (count(Fi, Cj) + 1) / (sum_k count(Fk, Cj) + V)

where count(Fi, Cj) is the number of times feature i occurs in the training data of category j, and V is the total number of features (vocabulary size).

Part 4 - Image Classification and Confusion

Here we classify the test images with the trained classifier. For each test image, we calculate the probability of each category under the classifier (see the Csurka paper, eqn 1, for this calculation) and label the image with the highest-probability category. We then compute a confusion matrix showing, for each category, what percentage of images from that category were confused with each of the other categories (see the Csurka paper for an example). The diagonal of this matrix is our per-category accuracy.
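Appendix - Code Sketches

The following sketch outlines the Part 1 pipeline. It is a minimal MATLAB illustration, assuming the vlfeat toolbox is installed under the path shown; trainFiles, numClusters, and subsetSize are our own placeholder names and values (the report swept several subset and cluster sizes), and the subset here is drawn uniformly over all descriptors rather than evenly per category.

    % Bag-of-keypoints pipeline (minimal sketch; assumes vlfeat is available).
    run('vlfeat/toolbox/vl_setup');          % assumed vlfeat location

    numClusters = 1000;                      % one of the cluster counts we swept
    subsetSize  = 80000;                     % size of the random SIFT subset

    % 1) Extract SIFT descriptors from all training images.
    allDescs = [];
    for i = 1:numel(trainFiles)
        im = im2single(rgb2gray(imread(trainFiles{i})));
        [~, d] = vl_sift(im);                % d is 128 x numKeypoints
        allDescs = [allDescs, single(d)];    %#ok<AGROW>
    end

    % 2) Cluster a random subset into a visual-word codebook.
    idx = randperm(size(allDescs, 2), min(subsetSize, size(allDescs, 2)));
    centers = vl_kmeans(allDescs(:, idx), numClusters);   % 128 x numClusters

    % 3) Represent one image as a histogram of visual-word counts.
    im = im2single(rgb2gray(imread(trainFiles{1})));
    [~, d] = vl_sift(im);
    d = single(d);
    % squared distances between every center and every descriptor (K x N)
    dists = bsxfun(@plus, sum(centers.^2, 1)', sum(d.^2, 1)) - 2 * (centers' * d);
    [~, words] = min(dists, [], 1);          % nearest visual word per keypoint
    hist1 = histc(words, 1:numClusters);     % visual-word count histogram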
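The next sketch illustrates the Part 2 text representation. The example descriptions, the tokenizing regular expression, and the minCount threshold for "relatively frequent" words are all assumptions made for illustration, not the exact values used in our experiments.

    % Text bag-of-words (minimal sketch with made-up example descriptions).
    docs = {'leather hobo bag with zip top', 'small leather clutch with gold chain'};

    % Lowercase word tokens via a regular expression.
    tokens = cellfun(@(s) regexp(lower(s), '[a-z]+', 'match'), docs, 'UniformOutput', false);

    % Lexicon: words that appear at least minCount times across all descriptions.
    minCount = 2;                            % frequency threshold (assumed)
    allWords = [tokens{:}];
    [lexicon, ~, ic] = unique(allWords);
    lexicon = lexicon(histc(ic, 1:numel(lexicon)) >= minCount);

    % Histogram of word counts for one document.
    [isWord, loc] = ismember(tokens{1}, lexicon);
    hist1 = histc(loc(isWord), 1:numel(lexicon));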
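The following sketch shows the Part 3 training step with add-1 smoothing, as in the formula above. It assumes the feature histograms have already been stacked into a matrix; trainHists and trainLabels are placeholder names.

    % Naive Bayes training (minimal sketch). trainHists is an N x V matrix of
    % feature-count histograms (visual and/or textual words); trainLabels is an
    % N x 1 vector with values 1..4 for the four bag categories.
    numCats = 4;
    V = size(trainHists, 2);

    logPFC = zeros(numCats, V);                    % log P(Fi|Cj), add-1 smoothed
    for c = 1:numCats
        counts = sum(trainHists(trainLabels == c, :), 1);      % count(Fi, Cj)
        logPFC(c, :) = log((counts + 1) ./ (sum(counts) + V));
    end
    % P(Cj) is uniform over the four categories, so it is dropped from the score.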
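Finally, a sketch of the Part 4 classification and confusion matrix, assuming testHists and testLabels follow the same layout as the training data and logPFC comes from the previous sketch.

    % Classification and confusion matrix (minimal sketch).
    scores = testHists * logPFC';                  % log-likelihood of each category
    [~, pred] = max(scores, [], 2);                % highest-probability category

    confusion = zeros(numCats);
    for c = 1:numCats
        confusion(c, :) = histc(pred(testLabels == c), 1:numCats)';
        confusion(c, :) = 100 * confusion(c, :) / sum(confusion(c, :));  % percentages
    end
    accuracyPerCat = diag(confusion);              % diagonal = per-category accuracy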